Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT ARM64-SVE: Remove INS_SCALABLE_OPTS_SHIFT #99258

Merged
merged 9 commits into from
Mar 7, 2024

Conversation

amanasifkhalid
Copy link
Member

Based on discussion in #99049, remove INS_SCALABLE_OPTS_SHIFT, and instead encode immediates to their shifted format when outputting/displaying instructions. This should marginally improve TP for the relevant encodings.

cstool output for affected encodings:

mov   z0.b, #-0x80
mov   z1.h, #0
mov   z2.s, #5
mov   z3.d, #0x7F
mov   z4.d, #0x100
mov   z5.h, #-0x8000
mov   z6.s, #0x500
mov   z7.d, #0x7F00
mov   z0.b, #0
mov   z1.h, #0
mov   z2.s, #0
mov   z3.d, #0
add   z0.b, z0.b, #0
sqadd z1.h, z1.h, #5
sqsub z2.s, z2.s, #0x80
sub   z3.d, z3.d, #0xFF
subr  z4.d, z4.d, #0x100
uqadd z5.h, z5.h, #0x500
uqsub z6.s, z6.s, #0xFF00

JitDisasm:

mov     z0.b, #-128
mov     z1.h, #0
mov     z2.s, #5
mov     z3.d, #127
mov     z4.d, #1, LSL #8
mov     z5.h, #-128, LSL #8
mov     z6.s, #5, LSL #8
mov     z7.d, #127, LSL #8
mov     z0.b, #0
mov     z1.h, #0
mov     z2.s, #0
mov     z3.d, #0
add     z0.b, z0.b, #0
sqadd   z1.h, z1.h, #5
sqsub   z2.s, z2.s, #128
sub     z3.d, z3.d, #255
subr    z4.d, z4.d, #1, LSL #8
uqadd   z5.h, z5.h, #5, LSL #8
uqsub   z6.s, z6.s, #255, LSL #8

This will conflict with #99049, so let's wait for that to be merged first.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 4, 2024
@ghost ghost assigned amanasifkhalid Mar 4, 2024
@ghost
Copy link

ghost commented Mar 4, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Based on discussion in #99049, remove INS_SCALABLE_OPTS_SHIFT, and instead encode immediates to their shifted format when outputting/displaying instructions. This should marginally improve TP for the relevant encodings.

cstool output for affected encodings:

mov   z0.b, #-0x80
mov   z1.h, #0
mov   z2.s, #5
mov   z3.d, #0x7F
mov   z4.d, #0x100
mov   z5.h, #-0x8000
mov   z6.s, #0x500
mov   z7.d, #0x7F00
mov   z0.b, #0
mov   z1.h, #0
mov   z2.s, #0
mov   z3.d, #0
add   z0.b, z0.b, #0
sqadd z1.h, z1.h, #5
sqsub z2.s, z2.s, #0x80
sub   z3.d, z3.d, #0xFF
subr  z4.d, z4.d, #0x100
uqadd z5.h, z5.h, #0x500
uqsub z6.s, z6.s, #0xFF00

JitDisasm:

mov     z0.b, #-128
mov     z1.h, #0
mov     z2.s, #5
mov     z3.d, #127
mov     z4.d, #1, LSL #8
mov     z5.h, #-128, LSL #8
mov     z6.s, #5, LSL #8
mov     z7.d, #127, LSL #8
mov     z0.b, #0
mov     z1.h, #0
mov     z2.s, #0
mov     z3.d, #0
add     z0.b, z0.b, #0
sqadd   z1.h, z1.h, #5
sqsub   z2.s, z2.s, #128
sub     z3.d, z3.d, #255
subr    z4.d, z4.d, #1, LSL #8
uqadd   z5.h, z5.h, #5, LSL #8
uqsub   z6.s, z6.s, #255, LSL #8

This will conflict with #99049, so let's wait for that to be merged first.

Author: amanasifkhalid
Assignees: amanasifkhalid
Labels:

area-CodeGen-coreclr

Milestone: -

@amanasifkhalid
Copy link
Member Author

cc @dotnet/arm64-contrib

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Mar 5, 2024
// Size specifier must be able to fit a left-shifted immediate
assert(isValidSimm8_MultipleOf256(imm)); // iiiiiiii
assert(insOptsScalableAtLeastHalf(id->idInsOpt()));
}
Copy link
Member

@kunalspathak kunalspathak Mar 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add an assert in the else clause about id->idInsOpt() == NONE / insScalableOptsNone or whatever relevant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we encode imm in emitIns, this logic ends up going away.


if (!isValidSimm8(imm))
{
assert(isValidSimm8_MultipleOf256(imm));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have done enough validation until we come here, so don't think the assert here adds any value. We do not usually assert in encoding portion of the code.

// MOV <Zd>.<T>, #<imm>{, <shift>}
case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated)
{
imm = emitGetInsSC(id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving into a helper?
imm = ExtractSimm8_MultipleOf256()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we encode imm in emitIns, this logic goes away.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good we don't have to check here if it was small or large constant. The emitGetInsSC() does it for us.

id = emitNewInstrCns(attr, imm);
id->idOptionalShift(hasShift);
}
instrDesc* id = emitNewInstrSC(attr, imm);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the imm isn't being converted before calling emitNewInstrSC(). The larger MultipleOf256 immediates are not going to fit into a normal instr?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i think I noticed that too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's probably not ideal... Here's my new approach:

  • For encodings without any shifting, just call emitNewInstrSC.
  • For unshifted immediates, call emitNewInstrSC. If it allocates a small descriptor, idHasShift will always return false. Else, idHasShift still returns false, as we set the dedicated bit in large descriptors to false after creating the descriptor.
  • For immediates that need a shift, call emitNewInstrCns so a normal or large descriptor is used. idHasShift will always check the dedicated bit, as it is always available. We set this bit to true after creating the descriptor.

This means we can use small descriptors more often than we could with the prior approach of always calling emitNewInstrCns for encodings that might have shifts. The downside to this approach is idHasShift has to also check if the descriptor is small... If we want to prioritize TP over memory usage, then the previous approach of not converting imm before saving it to the descriptor might be better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prioritize TP over memory usage

Yes, we need to be clear what we're optimising for, and that'll dictate when we should be encoding/decoding. I'll leave it to others to decide which approach we should be going for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with this approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to prioritize TP over memory usage

For SVE encodings, which I expect to be very rare, I don't think we need to worry about either TP or memory. I would prioritize understandable code first, TP next.

I would be very highly be concerned with TP/memory impact on non-SVE encodings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous approach of allocating large descriptors for shifted values sacrificed memory usage for SVE encodings in favor of improving TP for non-SVE encodings by removing a branch or two in the affected emitIns_* methods. The diffs showed that only improved TP by <0.1% though, so I'm not too attached to either approach.

I would prioritize understandable code first, TP next.

I think the previous approach is easier to understand since there's less state to track, though that approach requires us to encode the immediate (i.e. right-shift it by 8) each time we print it, encode it in an instruction, etc.

@amanasifkhalid
Copy link
Member Author

FYI, widespread CI failures are #99320.

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. The TP regressed and we should gain it back when we move the sve code from emitIns* methods to emitIns_Sve* method.

image

// Size specifier must be able to fit a left-shifted immediate
assert(isValidSimm8_MultipleOf256(imm)); // iiiiiiii
assert(insOptsScalableAtLeastHalf(opt));
hasShift = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for B element, essentially the sh bit is ignored, which makes sense, but it is not clearly stated in the docs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I don't see it clearly stated either, though I do see this line in the pseudocode description:
if size:sh == '001' then UNDEFINED;
Which means the behavior is undefined if the size specifier is B, but the sh bit is 1. I think these asserts reflect that.

id = emitNewInstrCns(attr, imm);
id->idOptionalShift(hasShift);
}
instrDesc* id = emitNewInstrSC(attr, imm);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am ok with this approach.

imm = emitGetInsSC(id);
code = emitInsCodeSve(ins, fmt);
code |= insEncodeReg_V_4_to_0(id->idReg1()); // ddddd
code |= insEncodeElemsize(optGetSveElemsize(id->idInsOpt())); // xx
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't IF_SVE_EB_1A and IF_SVE_EC_1A cases same?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I must've moved this when refactoring and forgot to put it back; I'll fix this

// MOV <Zd>.<T>, #<imm>{, <shift>}
case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated)
{
imm = emitGetInsSC(id);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good we don't have to check here if it was small or large constant. The emitGetInsSC() does it for us.

@amanasifkhalid
Copy link
Member Author

@BruceForstall are you ok with me merging this as-is? This approach has slight TP regressions; the previous approach slightly improves TP for the general case.

@BruceForstall
Copy link
Member

@BruceForstall are you ok with me merging this as-is? This approach has slight TP regressions; the previous approach slightly improves TP for the general case.

Yes. I leave it to you to make the right trade-off here.

@amanasifkhalid
Copy link
Member Author

Yes. I leave it to you to make the right trade-off here.

Got it. I think we should plan to move this SVE-specific logic to a dedicated emitIns_Sve method once the encodings are finished, so this new logic doesn't pollute existing encoding logic. That cleanup should get rid of the above TP regressions in the general case, and we'll still use the smallest instruction descriptors possible in the SVE case.

@amanasifkhalid amanasifkhalid merged commit ddd465b into dotnet:main Mar 7, 2024
129 checks passed
@amanasifkhalid amanasifkhalid deleted the scalable-shift branch March 7, 2024 17:15
@github-actions github-actions bot locked and limited conversation to collaborators Apr 7, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants