-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM64-SVE: Remove INS_SCALABLE_OPTS_SHIFT #99258
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsBased on discussion in #99049, remove cstool output for affected encodings:
JitDisasm:
This will conflict with #99049, so let's wait for that to be merged first.
|
cc @dotnet/arm64-contrib |
src/coreclr/jit/emitarm64.cpp
Outdated
// Size specifier must be able to fit a left-shifted immediate | ||
assert(isValidSimm8_MultipleOf256(imm)); // iiiiiiii | ||
assert(insOptsScalableAtLeastHalf(id->idInsOpt())); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an assert in the else
clause about id->idInsOpt() == NONE
/ insScalableOptsNone
or whatever relevant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we encode imm
in emitIns
, this logic ends up going away.
src/coreclr/jit/emitarm64.cpp
Outdated
|
||
if (!isValidSimm8(imm)) | ||
{ | ||
assert(isValidSimm8_MultipleOf256(imm)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have done enough validation until we come here, so don't think the assert here adds any value. We do not usually assert in encoding portion of the code.
// MOV <Zd>.<T>, #<imm>{, <shift>} | ||
case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated) | ||
{ | ||
imm = emitGetInsSC(id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving into a helper?
imm = ExtractSimm8_MultipleOf256()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we encode imm
in emitIns
, this logic goes away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good we don't have to check here if it was small or large constant. The emitGetInsSC()
does it for us.
id = emitNewInstrCns(attr, imm); | ||
id->idOptionalShift(hasShift); | ||
} | ||
instrDesc* id = emitNewInstrSC(attr, imm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the imm
isn't being converted before calling emitNewInstrSC()
. The larger MultipleOf256 immediates are not going to fit into a normal instr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, i think I noticed that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's probably not ideal... Here's my new approach:
- For encodings without any shifting, just call
emitNewInstrSC
. - For unshifted immediates, call
emitNewInstrSC
. If it allocates a small descriptor,idHasShift
will always return false. Else,idHasShift
still returns false, as we set the dedicated bit in large descriptors to false after creating the descriptor. - For immediates that need a shift, call
emitNewInstrCns
so a normal or large descriptor is used.idHasShift
will always check the dedicated bit, as it is always available. We set this bit to true after creating the descriptor.
This means we can use small descriptors more often than we could with the prior approach of always calling emitNewInstrCns
for encodings that might have shifts. The downside to this approach is idHasShift
has to also check if the descriptor is small... If we want to prioritize TP over memory usage, then the previous approach of not converting imm
before saving it to the descriptor might be better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prioritize TP over memory usage
Yes, we need to be clear what we're optimising for, and that'll dictate when we should be encoding/decoding. I'll leave it to others to decide which approach we should be going for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to prioritize TP over memory usage
For SVE encodings, which I expect to be very rare, I don't think we need to worry about either TP or memory. I would prioritize understandable code first, TP next.
I would be very highly be concerned with TP/memory impact on non-SVE encodings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous approach of allocating large descriptors for shifted values sacrificed memory usage for SVE encodings in favor of improving TP for non-SVE encodings by removing a branch or two in the affected emitIns_*
methods. The diffs showed that only improved TP by <0.1% though, so I'm not too attached to either approach.
I would prioritize understandable code first, TP next.
I think the previous approach is easier to understand since there's less state to track, though that approach requires us to encode the immediate (i.e. right-shift it by 8) each time we print it, encode it in an instruction, etc.
FYI, widespread CI failures are #99320. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Size specifier must be able to fit a left-shifted immediate | ||
assert(isValidSimm8_MultipleOf256(imm)); // iiiiiiii | ||
assert(insOptsScalableAtLeastHalf(opt)); | ||
hasShift = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for B
element, essentially the sh
bit is ignored, which makes sense, but it is not clearly stated in the docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't see it clearly stated either, though I do see this line in the pseudocode description:
if size:sh == '001' then UNDEFINED;
Which means the behavior is undefined if the size specifier is B
, but the sh
bit is 1. I think these asserts reflect that.
id = emitNewInstrCns(attr, imm); | ||
id->idOptionalShift(hasShift); | ||
} | ||
instrDesc* id = emitNewInstrSC(attr, imm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with this approach.
imm = emitGetInsSC(id); | ||
code = emitInsCodeSve(ins, fmt); | ||
code |= insEncodeReg_V_4_to_0(id->idReg1()); // ddddd | ||
code |= insEncodeElemsize(optGetSveElemsize(id->idInsOpt())); // xx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't IF_SVE_EB_1A
and IF_SVE_EC_1A
cases same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I must've moved this when refactoring and forgot to put it back; I'll fix this
// MOV <Zd>.<T>, #<imm>{, <shift>} | ||
case IF_SVE_EB_1A: // ........xx...... ..hiiiiiiiiddddd -- SVE broadcast integer immediate (unpredicated) | ||
{ | ||
imm = emitGetInsSC(id); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good we don't have to check here if it was small or large constant. The emitGetInsSC()
does it for us.
@BruceForstall are you ok with me merging this as-is? This approach has slight TP regressions; the previous approach slightly improves TP for the general case. |
Yes. I leave it to you to make the right trade-off here. |
Got it. I think we should plan to move this SVE-specific logic to a dedicated |
Based on discussion in #99049, remove
INS_SCALABLE_OPTS_SHIFT
, and instead encode immediates to their shifted format when outputting/displaying instructions. This should marginally improve TP for the relevant encodings.cstool output for affected encodings:
JitDisasm:
This will conflict with #99049, so let's wait for that to be merged first.