Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Adding basic containment support to the x86 HWIntrinsics #15804

Merged
merged 3 commits into from
Jan 12, 2018
Merged

Adding basic containment support to the x86 HWIntrinsics #15804

merged 3 commits into from
Jan 12, 2018

Conversation

tannergooding
Copy link
Member

@tannergooding tannergooding commented Jan 10, 2018

@tannergooding
Copy link
Member Author

Also CC. @mikedn (you normally end up looking/commenting on stuff like this)

@fiigii
Copy link

fiigii commented Jan 10, 2018

Thanks for the work. This looks like a good example to help me to consider involving containment into the table-driven framework. Could you provide a bit JITDump to show the effect of this change?

@tannergooding
Copy link
Member Author

@tannergooding
Copy link
Member Author

Basically, the delta is the transformation of:

48B9382804240A020000 mov      rcx, 0x20A24042838
488B09               mov      rcx, gword ptr [rcx]
C4E17D104108         vmovupd  ymm0, ymmword ptr[rcx+8]
48B9402804240A020000 mov      rcx, 0x20A24042840
488B09               mov      rcx, gword ptr [rcx]
C4E17D104908         vmovupd  ymm1, ymmword ptr[rcx+8]
C4E17C58F1           vaddps   ymm6, ymm0, ymm1

into

48B93828594593010000 mov      rcx, 0x19345592838
488B09               mov      rcx, gword ptr [rcx]
C4E17D104108         vmovupd  ymm0, ymmword ptr[rcx+8]
48B94028594593010000 mov      rcx, 0x19345592840
488B09               mov      rcx, gword ptr [rcx]
C4E17C587108         vaddps   ymm6, ymm0, ymmword ptr[rcx+8]

@tannergooding
Copy link
Member Author

Cc. @dotnet/jit-contrib

@tannergooding
Copy link
Member Author

@CarolEidt, are you the right person to review this?

Also, as a more general question: Who are the people we should be tagging on the HWIntrinsic PRs for review?

I don't believe there is an "area owners" document, like CoreFX has either, so it isn't always clear who should be tagged for any particular change to ensure that it does get reviewed when ready.

@CarolEidt
Copy link

@CarolEidt, are you the right person to review this?

Yes

Also, as a more general question: Who are the people we should be tagging on the HWIntrinsic PRs for review?

Me and @dotnet/jit-contrib, I think.

I don't believe there is an "area owners" document, like CoreFX has either, so it isn't always clear who should be tagged for any particular change to ensure that it does get reviewed when ready.

I believe that we have explicitly avoided this, partly because the fluidity of assignments and focus on the JIT team. Feel free to ping me, especially for HW intrinsic or register allocator related things that are languishing.

@fiigii
Copy link

fiigii commented Jan 11, 2018

I have no permission to ping @dotnet/jit-contrib, will always ping @CarolEidt 😄 Thank you so much for your help!

@CarolEidt
Copy link

@fiigii - I'm taking a look at this now, but don't plan to scrutinize the encodings part - if you could review that, it would be helpful.

@@ -3798,6 +3800,82 @@ void emitter::emitIns_R_R_I(instruction ins, emitAttr attr, regNumber reg1, regN
emitCurIGsize += sz;
}

void emitter::emitIns_R_A(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt, Just as a note. We already have several similar emitIns_R_A methods, but they are emitIns_R_AI, emitIns_R_AX, etc. They only differ based on which of baseReg, indxReg, scale, and offs are passed in.

I found it significantly easier to have a single method which takes and assigns all the args. This works because when we extract these from an Indir node, the unused ones are set to the correct value (reg = REG_NA, scale = 1, offs = 0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to have the others call this, and/or phase them out?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I believe so. I'll log a bug.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fiigii
Copy link

fiigii commented Jan 11, 2018

if you could review that, it would be helpful.

Sure, will do.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall (and thanks!), but I'm wondering about handling the IND(LEA(ADDR(lcl),...)) case

{
case GT_LCL_VAR_ADDR:
{
varNum = memBase->AsLclVarCommon()->GetLclNum();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If memBase is a LCL_VAR_ADDR, it still may be the case that there are other components to the addressing mode (i.e. if the child of the IND is an LEA, then it could have all of the various addressing mode components, unless you've excluded those cases from containment, which I don't think you have.

Copy link
Member Author

@tannergooding tannergooding Jan 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had followed emitInsBinary as an example for the various emit paths that needed to be supported.

It looked like, the GT_LCL_VAR_ADDR just set the varNum and offset (https://github.com/dotnet/coreclr/blob/master/src/jit/emitxarch.cpp#L3059) and then it just calls emitIns_R_S (https://github.com/dotnet/coreclr/blob/master/src/jit/emitxarch.cpp#L3082).

I didn't think it was possible for GetLclNum to return BAD_VAR_NUM in this case, so the subsequent code path (which calls into emitIns_R_M) wasn't needed. Was that an incorrect assumption?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I think you're right, and it is probably the case that, if we have anything more than a base, we would use GT_LCL_FLD_ADDR instead. I think it would be prudent to add assert(mem->Index() == nullptr && mem->Scale() == 1 && mem->Offset() == 0)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, new commit should be validating that varNum and offset are correct for the general emitIns_R_S case and that baseReg, indxReg, scale, and offs are their default values for the GT_LCL_VAR_ADDR case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tests that will validate various scenarios that should end up with contained nodes here: #15771

@tannergooding
Copy link
Member Author

All current feedback has been addressed.

All tests are passing locally (#15771 has more tests that cover other containment scenarios, which are also passing locally, but is also part of a larger test refactoring for the hardware intrinsics and is getting reviewed independently)

@fiigii
Copy link

fiigii commented Jan 11, 2018

The encoding part (emitter) looks good to me. Let's run some tests to cover all the encoding cases (x86, x64, SSE, and VEX).

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks!

@tannergooding
Copy link
Member Author

Thanks! Will merge once tests complete (provided they all come back green).

@@ -110,7 +110,7 @@ void CodeGen::genHWIntrinsic_R_R_RM(GenTreeHWIntrinsic* node, instruction ins)
assert(op2->isContained());

GenTreeIndir* mem = op2->AsIndir();
GenTree* memBase = mem->Base();
GenTree* memBase = mem->gtOp1;
Copy link
Member Author

@tannergooding tannergooding Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt, discovered this when looking at fixing https://github.com/dotnet/coreclr/issues/15829.

emitInsBinary does this and they (->Base() vs ->gtOp1) actually do slightly different things.

@tannergooding
Copy link
Member Author

Latest commit should resolve the issues I discovered (see #15804 (comment)).

@fiigii
Copy link

fiigii commented Jan 12, 2018

Seems the CI gets stuck...

@tannergooding
Copy link
Member Author

Ok. This should be good now. After working on fixing https://github.com/dotnet/coreclr/issues/15829 (see PR #15836), I discovered that:

  1. memIndir->gtOp1 and memIndir->Base() do slightly different things
  2. isUsedFromSpillTemp is expected to take precedence over everything else

@tannergooding
Copy link
Member Author

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

@tannergooding
Copy link
Member Author

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

@fiigii
Copy link

fiigii commented Jan 12, 2018

@tannergooding Are you going to merge this PR once CI gets green? Or waiting for #15836? I want change #15749 to match the containment.

@tannergooding
Copy link
Member Author

@fiigii, I need @CarolEidt to review the new changes and re-approve first.

This isn't blocked by #15836 as that is a separate refactoring. I just found some cases I missed originally from it.

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

id->idIns(ins);
id->idReg1(reg1);

emitHandleMemOp(indir, id, fmt, ins);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch - I'd forgotten about that.

@tannergooding
Copy link
Member Author

Ok. The x86_checked_windows_nt_jitx86hwintrinsicnoavx2 failures are not related to this change.

They are also failing in the outerloop:

Going to log a bug.

@tannergooding
Copy link
Member Author

@tannergooding tannergooding merged commit 97efaab into dotnet:master Jan 12, 2018
@sandreenko
Copy link

This PR introduced desktop build failures for legacy jit, because new function emitIns_R_A calls emitHandleMemOp. emitHandleMemOp is defined only for non-legacy backend:

#ifndef LEGACY_BACKEND
    void emitHandleMemOp(GenTreeIndir* indir, instrDesc* id, insFormat fmt, instruction ins);
#endif // !LEGACY_BACKEND

@tannergooding @CarolEidt could you please fix that? What is the contract for FEATURE_HW_INTRINSICS? Should it be undefined for legacy jit?

@fiigii
Copy link

fiigii commented Jan 16, 2018

What is the contract for FEATURE_HW_INTRINSICS? Should it be undefined for legacy jit?

Yes.

@tannergooding
Copy link
Member Author

I'll have a PR up shortly.

@CarolEidt can confirm, but I don't belive HW_INTRINSICS should be defined for legacy JIT

@CarolEidt
Copy link

I don't belive HW_INTRINSICS should be defined for legacy JIT

Yes, that is the case. I don't believe we currently have anything to enable this, but I believe that in a process that uses a legacy altjit, none of the HW intrinsics should be enabled at all.

@tannergooding
Copy link
Member Author

@sandreenko, is there a good way to test the legacy JIT build locally (using CoreCLR) or do I need to pull down the TFS sources?

@sandreenko
Copy link

sandreenko commented Jan 16, 2018

Ok, so we can enable HW_INTRINSICS only for non-legacy jit. Right now, as I see from CMAKE files, we enable it forr default jit in src\jit\CMakeLists.txt and then remove this def in legacy sources, for example jit\armelnonjit\CMakeLists.txt remove_definitions(-DFEATURE_HW_INTRINSICS).

Or we can disable it for frankenjit in desktop sources.

@sandreenko
Copy link

@tannergooding We deleted frankenjit\frankenjit.nativeproj from CoreCLR, so I think you have to pull the TFS sources. But @BruceForstall may know a better way.

@BruceForstall
Copy link
Member

At this point, x86 LEGACY_BACKEND only builds on the TFS side.

We can probably stop doing that now (I think all the testing that depends on it has now been removed), but it's better to fix the break and I'll do the real LEGACY_BACKEND removal soon enough.

@tannergooding
Copy link
Member Author

I'll do the real LEGACY_BACKEND removal soon enough.

@BruceForstall, is that change just removing all the #ifdef LEGACY_BACKEND and #ifndef LEGACY_BACKEND code (on the CoreCLR side, I imagine there is a bit more work on the TFS side)?

Is it worth logging a bug to track that work?

@BruceForstall
Copy link
Member

@BruceForstall, is that change just removing all the #ifdef LEGACY_BACKEND and #ifndef LEGACY_BACKEND code (on the CoreCLR side, I imagine there is a bit more work on the TFS side)?

Essentially. There's also removing all testing infrastructure that depends on the builds. We're not ready to do this yet, though we're getting close -- we need to decide that we have no more need for this flavor, either as product or for internal use, such as performance/CQ/diffs comparison.

Is it worth logging a bug to track that work?

IMO, no, or, it doesn't matter to me. We're very well aware of it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants