-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Adding basic containment support to the x86 HWIntrinsics #15804
Adding basic containment support to the x86 HWIntrinsics #15804
Conversation
Also CC. @mikedn (you normally end up looking/commenting on stuff like this) |
Thanks for the work. This looks like a good example to help me to consider involving containment into the table-driven framework. Could you provide a bit JITDump to show the effect of this change? |
Here is the full disasm/dumps |
Basically, the delta is the transformation of:
into
|
Cc. @dotnet/jit-contrib |
@CarolEidt, are you the right person to review this? Also, as a more general question: Who are the people we should be tagging on the HWIntrinsic PRs for review? I don't believe there is an "area owners" document, like CoreFX has either, so it isn't always clear who should be tagged for any particular change to ensure that it does get reviewed when ready. |
Yes
Me and @dotnet/jit-contrib, I think.
I believe that we have explicitly avoided this, partly because the fluidity of assignments and focus on the JIT team. Feel free to ping me, especially for HW intrinsic or register allocator related things that are languishing. |
I have no permission to ping @dotnet/jit-contrib, will always ping @CarolEidt 😄 Thank you so much for your help! |
@fiigii - I'm taking a look at this now, but don't plan to scrutinize the encodings part - if you could review that, it would be helpful. |
src/jit/emitxarch.cpp
Outdated
@@ -3798,6 +3800,82 @@ void emitter::emitIns_R_R_I(instruction ins, emitAttr attr, regNumber reg1, regN | |||
emitCurIGsize += sz; | |||
} | |||
|
|||
void emitter::emitIns_R_A( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt, Just as a note. We already have several similar emitIns_R_A
methods, but they are emitIns_R_AI
, emitIns_R_AX
, etc. They only differ based on which of baseReg, indxReg, scale, and offs
are passed in.
I found it significantly easier to have a single method which takes and assigns all the args. This works because when we extract these from an Indir node, the unused ones are set to the correct value (reg = REG_NA, scale = 1, offs = 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to have the others call this, and/or phase them out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I believe so. I'll log a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall (and thanks!), but I'm wondering about handling the IND(LEA(ADDR(lcl),...))
case
{ | ||
case GT_LCL_VAR_ADDR: | ||
{ | ||
varNum = memBase->AsLclVarCommon()->GetLclNum(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If memBase
is a LCL_VAR_ADDR
, it still may be the case that there are other components to the addressing mode (i.e. if the child of the IND
is an LEA
, then it could have all of the various addressing mode components, unless you've excluded those cases from containment, which I don't think you have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had followed emitInsBinary as an example for the various emit paths that needed to be supported.
It looked like, the GT_LCL_VAR_ADDR
just set the varNum
and offset
(https://github.com/dotnet/coreclr/blob/master/src/jit/emitxarch.cpp#L3059) and then it just calls emitIns_R_S
(https://github.com/dotnet/coreclr/blob/master/src/jit/emitxarch.cpp#L3082).
I didn't think it was possible for GetLclNum
to return BAD_VAR_NUM
in this case, so the subsequent code path (which calls into emitIns_R_M
) wasn't needed. Was that an incorrect assumption?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think you're right, and it is probably the case that, if we have anything more than a base, we would use GT_LCL_FLD_ADDR
instead. I think it would be prudent to add assert(mem->Index() == nullptr && mem->Scale() == 1 && mem->Offset() == 0)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, new commit should be validating that varNum
and offset
are correct for the general emitIns_R_S
case and that baseReg
, indxReg
, scale
, and offs
are their default values for the GT_LCL_VAR_ADDR
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tests that will validate various scenarios that should end up with contained nodes here: #15771
All current feedback has been addressed. All tests are passing locally (#15771 has more tests that cover other containment scenarios, which are also passing locally, but is also part of a larger test refactoring for the hardware intrinsics and is getting reviewed independently) |
The encoding part (emitter) looks good to me. Let's run some tests to cover all the encoding cases (x86, x64, SSE, and VEX). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks!
Thanks! Will merge once tests complete (provided they all come back green). |
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
@@ -110,7 +110,7 @@ void CodeGen::genHWIntrinsic_R_R_RM(GenTreeHWIntrinsic* node, instruction ins) | |||
assert(op2->isContained()); | |||
|
|||
GenTreeIndir* mem = op2->AsIndir(); | |||
GenTree* memBase = mem->Base(); | |||
GenTree* memBase = mem->gtOp1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt, discovered this when looking at fixing https://github.com/dotnet/coreclr/issues/15829.
emitInsBinary
does this and they (->Base()
vs ->gtOp1
) actually do slightly different things.
Latest commit should resolve the issues I discovered (see #15804 (comment)). |
Seems the CI gets stuck... |
Ok. This should be good now. After working on fixing https://github.com/dotnet/coreclr/issues/15829 (see PR #15836), I discovered that:
|
test Windows_NT x64 Checked jitincompletehwintrinsic |
test Windows_NT x86 Checked jitincompletehwintrinsic |
@tannergooding Are you going to merge this PR once CI gets green? Or waiting for #15836? I want change #15749 to match the containment. |
@fiigii, I need @CarolEidt to review the new changes and re-approve first. This isn't blocked by #15836 as that is a separate refactoring. I just found some cases I missed originally from it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
id->idIns(ins); | ||
id->idReg1(reg1); | ||
|
||
emitHandleMemOp(indir, id, fmt, ins); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch - I'd forgotten about that.
Ok. The They are also failing in the outerloop:
Going to log a bug. |
This PR introduced desktop build failures for legacy jit, because new function
@tannergooding @CarolEidt could you please fix that? What is the contract for |
Yes. |
I'll have a PR up shortly. @CarolEidt can confirm, but I don't belive HW_INTRINSICS should be defined for legacy JIT |
Yes, that is the case. I don't believe we currently have anything to enable this, but I believe that in a process that uses a legacy altjit, none of the HW intrinsics should be enabled at all. |
@sandreenko, is there a good way to test the legacy JIT build locally (using CoreCLR) or do I need to pull down the TFS sources? |
Ok, so we can enable Or we can disable it for |
@tannergooding We deleted |
At this point, x86 LEGACY_BACKEND only builds on the TFS side. We can probably stop doing that now (I think all the testing that depends on it has now been removed), but it's better to fix the break and I'll do the real LEGACY_BACKEND removal soon enough. |
@BruceForstall, is that change just removing all the Is it worth logging a bug to track that work? |
Essentially. There's also removing all testing infrastructure that depends on the builds. We're not ready to do this yet, though we're getting close -- we need to decide that we have no more need for this flavor, either as product or for internal use, such as performance/CQ/diffs comparison.
IMO, no, or, it doesn't matter to me. We're very well aware of it. |
FYI. @fiigii, @CarolEidt