-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT ARM64/SVE2: Add increment/decrement from predicate register instructions #94811
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsThese changes are modeled on Kunal's work in #94529; once that work is merged in, I'll rebase these changes on top of it. I've implemented the following instructions:
|
src/coreclr/jit/emitarm64.cpp
Outdated
case IF_SVE_DN_2A: // ........xx...... .......MMMMddddd -- SVE inc/dec vector by predicate count | ||
case IF_SVE_DP_2A: // ........xx...... .......MMMMddddd -- SVE saturating inc/dec vector by predicate count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these forms be merged?
(Or should we wait to decide on merging until after many/all instructions have been implemented?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with either doing it now or waiting until after we've implemented all of them. @kunalspathak @TIHan do you have any preference for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
waiting until after we've implemented all of them
Will be good to wait so we can track the coverage of them as listed in #94549. We can then have one pass to combine all the duplicate ones.
src/coreclr/jit/emitarm64.cpp
Outdated
{ | ||
// TODO-SVE: Fix once we add predicate registers | ||
assert(isPredicateRegister(reg)); | ||
printf(emitVectorRegName(reg)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like we need a emitPredicateRegName
than handles the "fake" predicate registers but still outputs p0
through p15
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kunal has a temporary implementation for this in #95016. As an aside, there seems to be potential for circular dependencies between these two PRs. Should I wait for Kunal's to be merged in first? I can then update my implementation with the new methods #95016 adds in (like emitPredicateRegName
) and also update the predicate register TODOs to use the aliases you suggested above.
src/coreclr/jit/emitarm64.h
Outdated
inline static bool isPredicateRegister(regNumber reg) | ||
{ | ||
// TODO-SVE: Fix once we add predicate registers | ||
return (reg >= REG_FP_FIRST && reg <= REG_FP_LAST); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use the new REG_P0 to REG_P15. And/or define REG_P_FIRST
and REG_P_LAST
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done in #95016
@amanasifkhalid - I will update #95016 to add the following:
Once #95016 along with these changes are merged, it will become easier to base the changes on top of it. We still won't be able to merge it because of encoding validation. |
I merged in Kunal's changes from #95016 and fixed the unit test additions. I'm waiting for #95105 to be merged in before using the updated machine-generated files, since the new
Would it make sense for me to open more (draft) PRs like this one if we will have to wait to merge them? I guess this is a good way to find issues with the tool-generated code. |
@amanasifkhalid - I have shared few steps offline on validating the encoding, see if you can check them. Please hold off on doing more draft PRs as I am updating the tool to fix few things that came up in Alan's #95127 . You might want to consume the new files. |
@kunalspathak thank you for sharing the steps! The jitdump of the new instructions prints the predicate registers as vector registers for now:
I think I'll have to wait to merge changes in from #95127 to fix this (along with the
Here's the hex I'm passing to cstool.exe (comments mine):
|
@dotnet/arm64-contrib, sorry for the delay; please take a look. Here's the JIT disasm output:
and the cstool output:
|
Can you update the comment at the top? (If nothing else, the "once that work is merged in, I'll rebase these changes on top of it" comment is obsolete) |
cc @a74nh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few small things
src/coreclr/jit/codegenarm64test.cpp
Outdated
@@ -5098,6 +5098,106 @@ void CodeGen::genArm64EmitterUnitTestsSve() | |||
theEmitter->emitIns_R_R_I(INS_sve_uqrshrn, EA_SCALABLE, REG_V15, REG_V12, 1, | |||
INS_OPTS_SCALABLE_H); // UQRSHRN <Zd>.H, {<Zn1>.S-<Zn2>.S }, #<const> | |||
|
|||
// IF_SVE_DM_2A | |||
theEmitter->emitIns_R_R(INS_sve_decp, EA_8BYTE, REG_R0, REG_P0, INS_OPTS_SCALABLE_B); // DECP <Xdn>, <Pm>.<T> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency with instructions elsewhere, INS_OPTS_SCALABLE_B should be INS_OPTS_SCALABLE_B_WITH_SCALAR. Same for anything else where EA_ is a value instead of EA_SCALABLE.
src/coreclr/jit/emitarm64.cpp
Outdated
@@ -1259,6 +1276,8 @@ bool emitter::emitInsMayWriteToGCReg(instrDesc* id) | |||
case IF_DV_2B: // DV_2B .Q.........iiiii ......nnnnnddddd Rd Vn[] (umov - to general) | |||
case IF_DV_2H: // DV_2H X........X...... ......nnnnnddddd Rd Vn (fmov - to general) | |||
|
|||
case IF_SVE_DO_2A: // ........xx...... .....X.MMMMddddd -- SVE saturating inc/dec register by predicate count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why might these two groups write to GC Reg? (It's possible I've been missing some from my PRs)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, didn't mean to push that in the first place. I don't think we should worry about touching emitInsMayWriteToGCReg
for now. Once we start emitting these SVE instructions for actual codegen, we might need to come back to this method.
src/coreclr/jit/emitarm64.cpp
Outdated
// <Xdn>, <Pm>.<T> | ||
case IF_SVE_DM_2A: // ........xx...... .......MMMMddddd -- SVE inc/dec register by predicate count | ||
emitDispReg(id->idReg1(), id->idOpSize(), true); // ddddd | ||
emitDispPredicateReg(id->idReg2(), PREDICATE_NONE, false); // MMMM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once #95918 goes in then this should be
emitDispPredicateReg(id->idReg2(), PREDICATE_SIZED, false);
and then don't call emitDispArrangement()
(This code can be fixed up later if it goes in before 95918)
src/coreclr/jit/emitarm64.cpp
Outdated
@@ -15364,7 +15452,7 @@ void emitter::emitDispLowPredicateReg(regNumber reg, PredicateType ptype, bool a | |||
//------------------------------------------------------------------------ | |||
// emitDispArrangement: Display a SIMD vector arrangement suffix | |||
// | |||
void emitter::emitDispArrangement(insOpts opt) | |||
void emitter::emitDispArrangement(insOpts opt, bool addComma /* = false */) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any use of calling emitDispArrangement()
with addComma
set to true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. (But still can be removed if end up using PREDICATE_SIZED)
Thank you for the review @a74nh! I've applied your feedback, and the unit test outputs remain the same. JitDisasm:
cstool:
|
code = emitInsCodeSve(ins, fmt); | ||
code |= insEncodeReg_R_4_to_0(id->idReg1()); // ddddd | ||
code |= insEncodeReg_P_8_to_5(id->idReg2()); // MMMM | ||
code |= insEncodeVLSElemsize(id->idOpSize()); // X |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
technically, you could combine this with IF_SVE_DM_2A
, but lets do it in cleanup phase.
case IF_SVE_DO_2A:
code |= insEncodeVLSElemsize(id->idOpSize()); // X
FALLTHROUGH;
case IF_SVE_DM_2A:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!
Merged changes from #95996. |
This implements the following instructions:
incp/decp <Xdn>, <Pm>.<T>
: increment/decrement general register by predicate countincp/decp <Zdn>.<T>, <Pm>.<T>
: increment/decrement vector register by predicate countsqdecp/sqincp <Xdn>, <Pm>.<T>, <Wdn>
: signed saturating increment/decrement general register by predicate countsqdecp/sqincp <Zdn>.<T>, <Pm>.<T>
: signed saturating increment/decrement vector register by predicate countuqdecp/uqincp <Wdn>, <Pm>.<T>
: unsigned saturating increment/decrement general register by predicate countuqdecp/uqincp <Zdn>.<T>, <Pm>.<T>
: unsigned saturating increment/decrement vector register by predicate count