JIT ARM64-SVE: Remove INS_SCALABLE_OPTS_SHIFT #99258

amanasifkhalid · 2024-03-04T19:33:09Z

Based on discussion in #99049, remove INS_SCALABLE_OPTS_SHIFT, and instead encode immediates to their shifted format when outputting/displaying instructions. This should marginally improve TP for the relevant encodings.

cstool output for affected encodings:

mov   z0.b, #-0x80
mov   z1.h, #0
mov   z2.s, #5
mov   z3.d, #0x7F
mov   z4.d, #0x100
mov   z5.h, #-0x8000
mov   z6.s, #0x500
mov   z7.d, #0x7F00
mov   z0.b, #0
mov   z1.h, #0
mov   z2.s, #0
mov   z3.d, #0
add   z0.b, z0.b, #0
sqadd z1.h, z1.h, #5
sqsub z2.s, z2.s, #0x80
sub   z3.d, z3.d, #0xFF
subr  z4.d, z4.d, #0x100
uqadd z5.h, z5.h, #0x500
uqsub z6.s, z6.s, #0xFF00

JitDisasm:

mov     z0.b, #-128
mov     z1.h, #0
mov     z2.s, #5
mov     z3.d, #127
mov     z4.d, #1, LSL #8
mov     z5.h, #-128, LSL #8
mov     z6.s, #5, LSL #8
mov     z7.d, #127, LSL #8
mov     z0.b, #0
mov     z1.h, #0
mov     z2.s, #0
mov     z3.d, #0
add     z0.b, z0.b, #0
sqadd   z1.h, z1.h, #5
sqsub   z2.s, z2.s, #128
sub     z3.d, z3.d, #255
subr    z4.d, z4.d, #1, LSL #8
uqadd   z5.h, z5.h, #5, LSL #8
uqsub   z6.s, z6.s, #255, LSL #8
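The two listings above print the same immediates in two forms. cstool shows the full value, while JitDisasm shows an 8-bit immediate plus an optional `LSL #8`. The C++ below is a minimal sketch of that mapping. It assumes MOV takes a signed imm8 and the add/subtract forms take an unsigned imm8, which is consistent with the listings. `splitShiftedImm8` and `formatShiftedImm8` are hypothetical names, not the JIT's API, and the sketch does not model the element-size restriction on shifting.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result of splitting a full immediate into the imm8 + shift form.
struct ShiftedImm8
{
    int32_t imm8;  // value that goes in the 8-bit immediate field
    bool    lsl8;  // true if the value must be shifted left by 8
    bool    valid; // false if the value is not encodable either way
};

// isSigned selects the MOV range (-128..127) vs. the add/sub range (0..255).
ShiftedImm8 splitShiftedImm8(int64_t value, bool isSigned)
{
    const int64_t lo = isSigned ? -128 : 0;
    const int64_t hi = isSigned ? 127 : 255;

    if (value >= lo && value <= hi)
    {
        return {static_cast<int32_t>(value), false, true};
    }
    // Only multiples of 256 whose quotient fits the 8-bit range can use LSL #8.
    if (value % 256 == 0 && value / 256 >= lo && value / 256 <= hi)
    {
        return {static_cast<int32_t>(value / 256), true, true};
    }
    return {0, false, false};
}

// Produces JitDisasm-style operand text, e.g. "#1, LSL #8".
std::string formatShiftedImm8(int64_t value, bool isSigned)
{
    ShiftedImm8 s = splitShiftedImm8(value, isSigned);
    if (!s.valid)
    {
        return "<invalid>";
    }
    std::string text = "#" + std::to_string(s.imm8);
    if (s.lsl8)
    {
        text += ", LSL #8";
    }
    return text;
}
```

For example, `formatShiftedImm8(0x7F00, true)` yields `#127, LSL #8`, which matches the `mov z7.d` line. `formatShiftedImm8(0xFF00, false)` yields `#255, LSL #8`, which matches the `uqsub` line.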

This will conflict with #99049, so let's wait for that to be merged first.

ghost · 2024-03-04T19:33:18Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	amanasifkhalid
Assignees:	amanasifkhalid
Labels:	`area-CodeGen-coreclr`
Milestone:	-

amanasifkhalid · 2024-03-05T02:35:56Z

cc @dotnet/arm64-contrib

kunalspathak · 2024-03-05T05:25:57Z

src/coreclr/jit/emitarm64.cpp

+                // Size specifier must be able to fit a left-shifted immediate
+                assert(isValidSimm8_MultipleOf256(imm)); // iiiiiiii
+                assert(insOptsScalableAtLeastHalf(id->idInsOpt()));
+            }


Can you add an assert in the else clause that id->idInsOpt() is NONE / insScalableOptsNone, or whatever is relevant?

If we encode imm in emitIns, this logic ends up going away.

kunalspathak · 2024-03-05T05:28:18Z

src/coreclr/jit/emitarm64.cpp

+
+            if (!isValidSimm8(imm))
+            {
+                assert(isValidSimm8_MultipleOf256(imm));


We have done enough validation by the time we get here, so I don't think the assert adds any value. We don't usually assert in the encoding portion of the code.

a74nh · 2024-03-05T09:37:58Z

src/coreclr/jit/emitarm64.cpp

+        // MOV <Zd>.<T>, #<imm>{, <shift>}
+        case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated)
+        {
+            imm                 = emitGetInsSC(id);


How about moving this into a helper, e.g. imm = ExtractSimm8_MultipleOf256()?

If we encode imm in emitIns, this logic goes away.

Good, we don't have to check here whether it was a small or large constant. emitGetInsSC() does it for us.

a74nh · 2024-03-05T09:48:43Z

src/coreclr/jit/emitarm64.cpp

-        id = emitNewInstrCns(attr, imm);
-        id->idOptionalShift(hasShift);
-    }
+    instrDesc* id = emitNewInstrSC(attr, imm);


Looks like the imm isn't being converted before calling emitNewInstrSC(). The larger MultipleOf256 immediates are not going to fit into a normal instr?

Yes, I think I noticed that too.

Yeah, that's probably not ideal... Here's my new approach:

For encodings without any shifting, just call emitNewInstrSC.

For unshifted immediates, call emitNewInstrSC. If it allocates a small descriptor, idHasShift will always return false. Otherwise, idHasShift still returns false, as we set the dedicated bit in large descriptors to false after creating the descriptor.

For immediates that need a shift, call emitNewInstrCns so a normal or large descriptor is used. idHasShift will always check the dedicated bit, as it is always available. We set this bit to true after creating the descriptor.

This means we can use small descriptors more often than we could with the prior approach of always calling emitNewInstrCns for encodings that might have shifts. The downside to this approach is that idHasShift also has to check whether the descriptor is small. If we want to prioritize TP over memory usage, then the previous approach of not converting imm before saving it to the descriptor might be better.
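The trade-off above can be made concrete with a toy C++ model. This is not the JIT's code. `ToyInstrDesc`, the `toy*` functions, and the small-constant range are hypothetical stand-ins for the instrDesc variants, emitNewInstrSC, emitNewInstrCns, and idHasShift. The model shows why idHasShift needs the extra small-descriptor check: a small descriptor has no dedicated shift bit to read.

```cpp
#include <cassert>
#include <cstdint>

// Made-up small-constant range; the JIT's real threshold is not modeled here.
constexpr int64_t kSmallCnsMin = -16;
constexpr int64_t kSmallCnsMax = 15;

struct ToyInstrDesc
{
    bool    isSmall;  // small descriptors have no room for a shift bit
    bool    shiftBit; // only meaningful when !isSmall
    int64_t imm;      // stored immediate, already converted to imm8
};

// Stand-in for emitNewInstrSC: use a small descriptor if the constant fits.
ToyInstrDesc toyNewInstrSC(int64_t imm)
{
    bool small = imm >= kSmallCnsMin && imm <= kSmallCnsMax;
    return {small, false, imm}; // shift bit cleared when a larger descriptor is used
}

// Stand-in for emitNewInstrCns: always a normal/large descriptor.
ToyInstrDesc toyNewInstrCns(int64_t imm)
{
    return {false, false, imm};
}

// Unshifted immediates take the SC path; shifted ones force a larger
// descriptor and then set the shift bit.
ToyInstrDesc toyCreate(int64_t imm8, bool needsShift)
{
    if (!needsShift)
    {
        return toyNewInstrSC(imm8);
    }
    ToyInstrDesc id = toyNewInstrCns(imm8);
    id.shiftBit     = true;
    return id;
}

// The extra check: a small descriptor never carries a shift.
bool toyHasShift(const ToyInstrDesc& id)
{
    return !id.isSmall && id.shiftBit;
}
```

In this model, an unshifted `#5` gets a small descriptor, while `#1, LSL #8` always gets a larger one. The cost is the additional `isSmall` test on every shift query, which corresponds to the TP concern raised in the thread.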

prioritize TP over memory usage

Yes, we need to be clear what we're optimising for, and that'll dictate when we should be encoding/decoding. I'll leave it to others to decide which approach we should be going for.

I am ok with this approach.

If we want to prioritize TP over memory usage

For SVE encodings, which I expect to be very rare, I don't think we need to worry about either TP or memory. I would prioritize understandable code first, TP next.

I would be very concerned about the TP/memory impact on non-SVE encodings.

The previous approach of allocating large descriptors for shifted values sacrificed memory usage for SVE encodings in favor of improving TP for non-SVE encodings by removing a branch or two in the affected emitIns_* methods. The diffs showed that only improved TP by <0.1% though, so I'm not too attached to either approach.

I would prioritize understandable code first, TP next.

I think the previous approach is easier to understand since there's less state to track, though that approach requires us to encode the immediate (i.e. right-shift it by 8) each time we print it, encode it in an instruction, etc.

amanasifkhalid · 2024-03-05T18:34:13Z

FYI, widespread CI failures are #99320.

kunalspathak

Changes LGTM. TP regressed, and we should gain it back when we move the SVE code from the emitIns* methods to emitIns_Sve* methods.

kunalspathak · 2024-03-06T14:43:50Z

src/coreclr/jit/emitarm64.cpp

+                // Size specifier must be able to fit a left-shifted immediate
+                assert(isValidSimm8_MultipleOf256(imm)); // iiiiiiii
+                assert(insOptsScalableAtLeastHalf(opt));
+                hasShift = true;


So for the B element size, the sh bit is essentially ignored. That makes sense, but it is not clearly stated in the docs.

Yeah, I don't see it clearly stated either, though I do see this line in the pseudocode description:
if size:sh == '001' then UNDEFINED;
This means the behavior is undefined if the size specifier is B but the sh bit is 1. I think these asserts reflect that.
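The pseudocode condition `size:sh == '001'` concatenates the 2-bit element-size field with the shift bit. The C++ below is a minimal sketch of that check. It assumes the usual SVE size encoding (00 = B, 01 = H, 10 = S, 11 = D). `ElemSize` and `isDefinedSizeSh` are hypothetical names. The emitter-side asserts quoted above (a shifted immediate requires at least H) express the same rule.

```cpp
#include <cassert>
#include <cstdint>

// Assumed 2-bit SVE element-size encoding.
enum class ElemSize : uint32_t
{
    B = 0,
    H = 1,
    S = 2,
    D = 3
};

// Returns false for the UNDEFINED combination size:sh == '001',
// i.e. byte elements with the shift bit set.
bool isDefinedSizeSh(ElemSize size, bool sh)
{
    uint32_t sizeSh = (static_cast<uint32_t>(size) << 1) | (sh ? 1u : 0u);
    return sizeSh != 0b001;
}
```

With byte elements, only the unshifted form is defined. For H, S, and D, both `sh` values are allowed.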

kunalspathak · 2024-03-06T14:47:59Z

src/coreclr/jit/emitarm64.cpp

-        id = emitNewInstrCns(attr, imm);
-        id->idOptionalShift(hasShift);
-    }
+    instrDesc* id = emitNewInstrSC(attr, imm);


I am ok with this approach.

kunalspathak · 2024-03-06T14:49:38Z

src/coreclr/jit/emitarm64.cpp

            imm  = emitGetInsSC(id);
            code = emitInsCodeSve(ins, fmt);
            code |= insEncodeReg_V_4_to_0(id->idReg1());                  // ddddd
+            code |= insEncodeElemsize(optGetSveElemsize(id->idInsOpt())); // xx


Aren't the IF_SVE_EB_1A and IF_SVE_EC_1A cases the same?

Yes, I must've moved this when refactoring and forgot to put it back; I'll fix this.
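IF_SVE_EB_1A's encoding comment shows the field layout `........xx...... ..hiiiiiiiiddddd`. If IF_SVE_EC_1A shares it, as the exchange above suggests, one encoding path can serve both formats. The sketch below assumes a hypothetical helper `encodeSizeShImm8Reg` that receives the base opcode bits (standing in for what emitInsCodeSve returns) and ORs in the size, shift, imm8, and register fields. It masks each field rather than asserting, in line with the earlier review note that encoding code does not usually assert.

```cpp
#include <cassert>
#include <cstdint>

// Field layout:  ........xx...... ..hiiiiiiiiddddd
//   xx       = element size, bits 23-22
//   h        = shift (LSL #8), bit 13
//   iiiiiiii = imm8, bits 12-5 (signed for MOV, unsigned for add/sub)
//   ddddd    = Zd / Zdn register, bits 4-0
uint32_t encodeSizeShImm8Reg(uint32_t baseCode, uint32_t elemSize, bool sh, int32_t imm8, uint32_t reg)
{
    uint32_t code = baseCode;
    code |= (elemSize & 0x3u) << 22;                     // xx
    code |= (sh ? 1u : 0u) << 13;                        // h
    code |= (static_cast<uint32_t>(imm8) & 0xFFu) << 5;  // iiiiiiii (two's complement low byte)
    code |= reg & 0x1Fu;                                 // ddddd
    return code;
}
```

With a zero base opcode, `mov z7.d, #127, LSL #8` packs to `0xC02FE7`: size 3 in bits 23-22, `h` set, 127 in the imm8 field, and register 7.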

kunalspathak · 2024-03-06T14:53:08Z

src/coreclr/jit/emitarm64.cpp

+        // MOV <Zd>.<T>, #<imm>{, <shift>}
+        case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated)
+        {
+            imm                 = emitGetInsSC(id);


Good, we don't have to check here whether it was a small or large constant. emitGetInsSC() does it for us.

amanasifkhalid · 2024-03-07T16:37:41Z

@BruceForstall are you ok with me merging this as-is? This approach has slight TP regressions; the previous approach slightly improves TP for the general case.

BruceForstall · 2024-03-07T17:07:49Z

@BruceForstall are you ok with me merging this as-is? This approach has slight TP regressions; the previous approach slightly improves TP for the general case.

Yes. I leave it to you to make the right trade-off here.

amanasifkhalid · 2024-03-07T17:14:52Z

Yes. I leave it to you to make the right trade-off here.

Got it. I think we should plan to move this SVE-specific logic to a dedicated emitIns_Sve method once the encodings are finished, so this new logic doesn't pollute existing encoding logic. That cleanup should get rid of the above TP regressions in the general case, and we'll still use the smallest instruction descriptors possible in the SVE case.

Remove INS_SCALABLE_OPTS_SHIFT

f15d19e

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 4, 2024

ghost assigned amanasifkhalid Mar 4, 2024

build-analysis bot mentioned this pull request Mar 4, 2024: Tracking issue for CI build timeouts #76454 (Closed)

Merge from main

a9c9815

TIHan approved these changes Mar 5, 2024

kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Mar 5, 2024

kunalspathak reviewed Mar 5, 2024

build-analysis bot mentioned this pull request Mar 5, 2024: slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883 (Open)

a74nh reviewed Mar 5, 2024

amanasifkhalid added 3 commits March 5, 2024 12:43

Convert imm in emitIns

ce4dc70

Remove sopt arg from emitIns_R_I

4bb3e35

Style

02ddf43

Merge branch 'main' into scalable-shift

131ec68

kunalspathak approved these changes Mar 6, 2024

amanasifkhalid added 3 commits March 6, 2024 10:44

Fix code dup

d248988

Revert

fd039c0

This reverts commit d248988.

Fix

462447c

amanasifkhalid merged commit ddd465b into dotnet:main Mar 7, 2024
129 checks passed

amanasifkhalid deleted the scalable-shift branch March 7, 2024 17:15

github-actions bot locked and limited conversation to collaborators Apr 7, 2024
