Adding basic containment support to the x86 HWIntrinsics #15804

tannergooding · 2018-01-10T05:31:09Z

tannergooding · 2018-01-10T05:31:22Z

Also CC. @mikedn (you normally end up looking/commenting on stuff like this)

fiigii · 2018-01-10T05:47:35Z

Thanks for the work. This looks like a good example to help me to consider involving containment into the table-driven framework. Could you provide a bit JITDump to show the effect of this change?

tannergooding · 2018-01-10T08:35:09Z

Here is the full disasm/dumps

tannergooding · 2018-01-10T08:37:13Z

Basically, the delta is the transformation of:

48B9382804240A020000 mov      rcx, 0x20A24042838
488B09               mov      rcx, gword ptr [rcx]
C4E17D104108         vmovupd  ymm0, ymmword ptr[rcx+8]
48B9402804240A020000 mov      rcx, 0x20A24042840
488B09               mov      rcx, gword ptr [rcx]
C4E17D104908         vmovupd  ymm1, ymmword ptr[rcx+8]
C4E17C58F1           vaddps   ymm6, ymm0, ymm1

into

48B93828594593010000 mov      rcx, 0x19345592838
488B09               mov      rcx, gword ptr [rcx]
C4E17D104108         vmovupd  ymm0, ymmword ptr[rcx+8]
48B94028594593010000 mov      rcx, 0x19345592840
488B09               mov      rcx, gword ptr [rcx]
C4E17C587108         vaddps   ymm6, ymm0, ymmword ptr[rcx+8]

tannergooding · 2018-01-10T15:12:42Z

Cc. @dotnet/jit-contrib

tannergooding · 2018-01-11T19:00:45Z

@CarolEidt, are you the right person to review this?

Also, as a more general question: Who are the people we should be tagging on the HWIntrinsic PRs for review?

I don't believe there is an "area owners" document, like CoreFX has either, so it isn't always clear who should be tagged for any particular change to ensure that it does get reviewed when ready.

CarolEidt · 2018-01-11T19:31:33Z

@CarolEidt, are you the right person to review this?

Yes

Also, as a more general question: Who are the people we should be tagging on the HWIntrinsic PRs for review?

Me and @dotnet/jit-contrib, I think.

I don't believe there is an "area owners" document, like CoreFX has either, so it isn't always clear who should be tagged for any particular change to ensure that it does get reviewed when ready.

I believe that we have explicitly avoided this, partly because the fluidity of assignments and focus on the JIT team. Feel free to ping me, especially for HW intrinsic or register allocator related things that are languishing.

fiigii · 2018-01-11T19:44:37Z

I have no permission to ping @dotnet/jit-contrib, will always ping @CarolEidt 😄 Thank you so much for your help!

CarolEidt · 2018-01-11T20:58:41Z

@fiigii - I'm taking a look at this now, but don't plan to scrutinize the encodings part - if you could review that, it would be helpful.

tannergooding · 2018-01-11T21:01:52Z

src/jit/emitxarch.cpp

@@ -3798,6 +3800,82 @@ void emitter::emitIns_R_R_I(instruction ins, emitAttr attr, regNumber reg1, regN
    emitCurIGsize += sz;
 }

+void emitter::emitIns_R_A(


@CarolEidt, Just as a note. We already have several similar emitIns_R_A methods, but they are emitIns_R_AI, emitIns_R_AX, etc. They only differ based on which of baseReg, indxReg, scale, and offs are passed in.

I found it significantly easier to have a single method which takes and assigns all the args. This works because when we extract these from an Indir node, the unused ones are set to the correct value (reg = REG_NA, scale = 1, offs = 0)

Would it make sense to have the others call this, and/or phase them out?

Yes, I believe so. I'll log a bug.

https://github.com/dotnet/coreclr/issues/15828

fiigii · 2018-01-11T21:10:14Z

if you could review that, it would be helpful.

Sure, will do.

CarolEidt

Looks good overall (and thanks!), but I'm wondering about handling the IND(LEA(ADDR(lcl),...)) case

CarolEidt · 2018-01-11T21:19:45Z

src/jit/hwintrinsiccodegenxarch.cpp

+            {
+                case GT_LCL_VAR_ADDR:
+                {
+                    varNum = memBase->AsLclVarCommon()->GetLclNum();


If memBase is a LCL_VAR_ADDR, it still may be the case that there are other components to the addressing mode (i.e. if the child of the IND is an LEA, then it could have all of the various addressing mode components, unless you've excluded those cases from containment, which I don't think you have.

I had followed emitInsBinary as an example for the various emit paths that needed to be supported.

It looked like, the GT_LCL_VAR_ADDR just set the varNum and offset (https://github.com/dotnet/coreclr/blob/master/src/jit/emitxarch.cpp#L3059) and then it just calls emitIns_R_S (https://github.com/dotnet/coreclr/blob/master/src/jit/emitxarch.cpp#L3082).

I didn't think it was possible for GetLclNum to return BAD_VAR_NUM in this case, so the subsequent code path (which calls into emitIns_R_M) wasn't needed. Was that an incorrect assumption?

Thanks. I think you're right, and it is probably the case that, if we have anything more than a base, we would use GT_LCL_FLD_ADDR instead. I think it would be prudent to add assert(mem->Index() == nullptr && mem->Scale() == 1 && mem->Offset() == 0)

Ok, new commit should be validating that varNum and offset are correct for the general emitIns_R_S case and that baseReg, indxReg, scale, and offs are their default values for the GT_LCL_VAR_ADDR case.

I have tests that will validate various scenarios that should end up with contained nodes here: #15771

tannergooding · 2018-01-11T23:04:39Z

All current feedback has been addressed.

All tests are passing locally (#15771 has more tests that cover other containment scenarios, which are also passing locally, but is also part of a larger test refactoring for the hardware intrinsics and is getting reviewed independently)

fiigii · 2018-01-11T23:18:31Z

The encoding part (emitter) looks good to me. Let's run some tests to cover all the encoding cases (x86, x64, SSE, and VEX).

CarolEidt

LGTM - thanks!

tannergooding · 2018-01-11T23:45:37Z

Thanks! Will merge once tests complete (provided they all come back green).

tannergooding · 2018-01-12T01:58:06Z

src/jit/hwintrinsiccodegenxarch.cpp

@@ -110,7 +110,7 @@ void CodeGen::genHWIntrinsic_R_R_RM(GenTreeHWIntrinsic* node, instruction ins)
            assert(op2->isContained());

            GenTreeIndir* mem     = op2->AsIndir();
-            GenTree*      memBase = mem->Base();
+            GenTree*      memBase = mem->gtOp1;


@CarolEidt, discovered this when looking at fixing https://github.com/dotnet/coreclr/issues/15829.

emitInsBinary does this and they (->Base() vs ->gtOp1) actually do slightly different things.

tannergooding · 2018-01-12T02:33:29Z

Latest commit should resolve the issues I discovered (see #15804 (comment)).

…ecks

fiigii · 2018-01-12T16:56:57Z

Seems the CI gets stuck...

tannergooding · 2018-01-12T17:48:47Z

Ok. This should be good now. After working on fixing https://github.com/dotnet/coreclr/issues/15829 (see PR #15836), I discovered that:

memIndir->gtOp1 and memIndir->Base() do slightly different things
isUsedFromSpillTemp is expected to take precedence over everything else

tannergooding · 2018-01-12T18:16:27Z

test Windows_NT x64 Checked jitincompletehwintrinsic
test Windows_NT x64 Checked jitx86hwintrinsicnoavx
test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
test Windows_NT x64 Checked jitx86hwintrinsicnosimd
test Windows_NT x64 Checked jitnox86hwintrinsic

tannergooding · 2018-01-12T18:16:36Z

test Windows_NT x86 Checked jitincompletehwintrinsic
test Windows_NT x86 Checked jitx86hwintrinsicnoavx
test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
test Windows_NT x86 Checked jitx86hwintrinsicnosimd
test Windows_NT x86 Checked jitnox86hwintrinsic

fiigii · 2018-01-12T18:39:11Z

@tannergooding Are you going to merge this PR once CI gets green? Or waiting for #15836? I want change #15749 to match the containment.

tannergooding · 2018-01-12T18:43:58Z

@fiigii, I need @CarolEidt to review the new changes and re-approve first.

This isn't blocked by #15836 as that is a separate refactoring. I just found some cases I missed originally from it.

CarolEidt

LGTM

CarolEidt · 2018-01-12T19:23:32Z

src/jit/emitxarch.cpp

+    id->idIns(ins);
+    id->idReg1(reg1);
+
+    emitHandleMemOp(indir, id, fmt, ins);


Ah, good catch - I'd forgotten about that.

tannergooding · 2018-01-12T22:51:27Z

Ok. The x86_checked_windows_nt_jitx86hwintrinsicnoavx2 failures are not related to this change.

They are also failing in the outerloop:

Going to log a bug.

tannergooding · 2018-01-12T22:56:13Z

Logged: https://github.com/dotnet/coreclr/issues/15848

sandreenko · 2018-01-16T19:30:14Z

This PR introduced desktop build failures for legacy jit, because new function emitIns_R_A calls emitHandleMemOp. emitHandleMemOp is defined only for non-legacy backend:

#ifndef LEGACY_BACKEND
    void emitHandleMemOp(GenTreeIndir* indir, instrDesc* id, insFormat fmt, instruction ins);
#endif // !LEGACY_BACKEND

@tannergooding @CarolEidt could you please fix that? What is the contract for FEATURE_HW_INTRINSICS? Should it be undefined for legacy jit?

fiigii · 2018-01-16T19:33:10Z

What is the contract for FEATURE_HW_INTRINSICS? Should it be undefined for legacy jit?

Yes.

tannergooding · 2018-01-16T19:38:09Z

I'll have a PR up shortly.

@CarolEidt can confirm, but I don't belive HW_INTRINSICS should be defined for legacy JIT

CarolEidt · 2018-01-16T20:07:06Z

I don't belive HW_INTRINSICS should be defined for legacy JIT

Yes, that is the case. I don't believe we currently have anything to enable this, but I believe that in a process that uses a legacy altjit, none of the HW intrinsics should be enabled at all.

tannergooding · 2018-01-16T21:01:24Z

@sandreenko, is there a good way to test the legacy JIT build locally (using CoreCLR) or do I need to pull down the TFS sources?

sandreenko · 2018-01-16T21:03:08Z

Ok, so we can enable HW_INTRINSICS only for non-legacy jit. Right now, as I see from CMAKE files, we enable it forr default jit in src\jit\CMakeLists.txt and then remove this def in legacy sources, for example jit\armelnonjit\CMakeLists.txt remove_definitions(-DFEATURE_HW_INTRINSICS).

Or we can disable it for frankenjit in desktop sources.

sandreenko · 2018-01-16T21:05:25Z

@tannergooding We deleted frankenjit\frankenjit.nativeproj from CoreCLR, so I think you have to pull the TFS sources. But @BruceForstall may know a better way.

BruceForstall · 2018-01-16T21:20:06Z

At this point, x86 LEGACY_BACKEND only builds on the TFS side.

We can probably stop doing that now (I think all the testing that depends on it has now been removed), but it's better to fix the break and I'll do the real LEGACY_BACKEND removal soon enough.

tannergooding · 2018-01-17T16:12:08Z

I'll do the real LEGACY_BACKEND removal soon enough.

@BruceForstall, is that change just removing all the #ifdef LEGACY_BACKEND and #ifndef LEGACY_BACKEND code (on the CoreCLR side, I imagine there is a bit more work on the TFS side)?

Is it worth logging a bug to track that work?

BruceForstall · 2018-01-17T18:09:53Z

@BruceForstall, is that change just removing all the #ifdef LEGACY_BACKEND and #ifndef LEGACY_BACKEND code (on the CoreCLR side, I imagine there is a bit more work on the TFS side)?

Essentially. There's also removing all testing infrastructure that depends on the builds. We're not ready to do this yet, though we're getting close -- we need to decide that we have no more need for this flavor, either as product or for internal use, such as performance/CQ/diffs comparison.

Is it worth logging a bug to track that work?

IMO, no, or, it doesn't matter to me. We're very well aware of it.

jkotas added the area-CodeGen label Jan 10, 2018

tannergooding mentioned this pull request Jan 10, 2018

Table-driven Intel hardware intrinsic #15749

Merged

tannergooding commented Jan 11, 2018

View reviewed changes

CarolEidt reviewed Jan 11, 2018

View reviewed changes

CarolEidt approved these changes Jan 11, 2018

View reviewed changes

tannergooding commented Jan 12, 2018

View reviewed changes

tannergooding added 2 commits January 12, 2018 08:44

Adding basic containment support to the x86 HWIntrinsics

8225ec3

Adding asserts that values are as expected for certain containment ch…

6804e99

…ecks

Fixing the hwintrin codgen containment checks

592aa82

tannergooding mentioned this pull request Jan 12, 2018

Enable CORINFO_INTRINSIC Round, Ceiling, and Floor to generate ROUNDSS and ROUNDSD #14736

Merged

CarolEidt approved these changes Jan 12, 2018

View reviewed changes

src/jit/emitxarch.cpp

id->idIns(ins);

id->idReg1(reg1);

emitHandleMemOp(indir, id, fmt, ins);

Copy link

CarolEidt Jan 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch - I'd forgotten about that.

tannergooding merged commit 97efaab into dotnet:master Jan 12, 2018

tannergooding mentioned this pull request Jan 16, 2018

Mark emitIns_R_A and emitIns_R_R_A to be not defined for legacy backend #15880

Merged

Adding basic containment support to the x86 HWIntrinsics #15804

Adding basic containment support to the x86 HWIntrinsics #15804

Conversation

tannergooding commented Jan 10, 2018 • edited Loading

tannergooding commented Jan 10, 2018

fiigii commented Jan 10, 2018 • edited Loading

tannergooding commented Jan 10, 2018

tannergooding commented Jan 10, 2018

tannergooding commented Jan 10, 2018

tannergooding commented Jan 11, 2018

CarolEidt commented Jan 11, 2018

fiigii commented Jan 11, 2018

CarolEidt commented Jan 11, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fiigii commented Jan 11, 2018

CarolEidt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding Jan 11, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding commented Jan 11, 2018

fiigii commented Jan 11, 2018

CarolEidt left a comment

Choose a reason for hiding this comment

tannergooding commented Jan 11, 2018

tannergooding Jan 12, 2018 • edited Loading

Choose a reason for hiding this comment

tannergooding commented Jan 12, 2018

fiigii commented Jan 12, 2018

tannergooding commented Jan 12, 2018

tannergooding commented Jan 12, 2018

tannergooding commented Jan 12, 2018

fiigii commented Jan 12, 2018

tannergooding commented Jan 12, 2018

CarolEidt left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

tannergooding commented Jan 12, 2018

tannergooding commented Jan 12, 2018

sandreenko commented Jan 16, 2018

fiigii commented Jan 16, 2018

tannergooding commented Jan 16, 2018

CarolEidt commented Jan 16, 2018

tannergooding commented Jan 16, 2018

sandreenko commented Jan 16, 2018 • edited Loading

sandreenko commented Jan 16, 2018

BruceForstall commented Jan 16, 2018

tannergooding commented Jan 17, 2018

BruceForstall commented Jan 17, 2018

tannergooding commented Jan 10, 2018 •

edited

Loading

fiigii commented Jan 10, 2018 •

edited

Loading

tannergooding Jan 11, 2018 •

edited

Loading

tannergooding Jan 12, 2018 •

edited

Loading

sandreenko commented Jan 16, 2018 •

edited

Loading