-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ARM64 encodings for groups IF_SVE_CE,CF #98409
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsAdds ARM64 encodings for
Contributing towards #94549.
|
@a74nh @kunalspathak @dotnet/arm64-contrib |
Diff results for #98409Throughput diffsThroughput diffs for osx/arm64 ran on windows/x64MinOpts (-0.01% to +0.00%)
Throughput diffs for windows/arm64 ran on windows/x64MinOpts (-0.01% to +0.00%)
Details here |
Diff results for #98409Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64MinOpts (-0.00% to +0.01%)
Throughput diffs for windows/arm64 ran on windows/x64MinOpts (-0.01% to +0.01%)
Details here |
Diff results for #98409Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64MinOpts (-0.00% to +0.01%)
Throughput diffs for windows/arm64 ran on windows/x64MinOpts (-0.01% to +0.01%)
Details here |
src/coreclr/jit/emitarm64.cpp
Outdated
assert(isVectorRegister(reg)); | ||
printf("%s", emitSveRegName(reg)); | ||
|
||
printf("[%d]", static_cast<int>(index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we have emitDispElementIndex
which will do this.
So something like this:
//------------------------------------------------------------------------
// emitDispSveRegIndex: Display a scalable vector register with indexed element
//
void emitter::emitDispSveRegIndex(regNumber reg, ssize_t index, bool addComma)
{
assert(isVectorRegister(reg));
printf("%s", emitSveRegName(reg));
emitDispElementIndex(index, addComma);
}
// Encodes an immediate value in consecutive bits from most signficant position 'hi' to least significant | ||
// position 'lo'. | ||
template <const size_t hi, const size_t lo> | ||
static code_t insEncodeUimm(size_t imm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We tend to write explicit insEncodeUimm*
, but I'm not opposed to have something more generalized.
@BruceForstall @kunalspathak what are the guidelines for writing generalized functions like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. Thanks @snickolls-arm . Will it be possible to replace existing methods with this one and make sure you don't hit any asserts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'll take a look at replacing these. Would you prefer this in a separate patch or on this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate path is fine.
// value between them. The bit ranges are from hi1-lo1, and hi2-lo2 where the second range is at a less | ||
// significant position relative to the first. | ||
template <const size_t hi1, const size_t lo1, const size_t hi2, const size_t lo2> | ||
static code_t insEncodeSplitUimm(size_t imm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the comment for insEncodeUimm
@@ -701,6 +737,13 @@ static bool isValidSimm4_MultipleOf32(ssize_t value) | |||
return (-256 <= value) && (value <= 224) && (value % 32 == 0); | |||
}; | |||
|
|||
template <const size_t bits> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as the comment for insEncodeUimm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just a small nit.
I like the generalized functions being introduced, but we should wait for @kunalspathak or @BruceForstall to weigh-in on it.
src/coreclr/jit/emitarm64.cpp
Outdated
switch (opt) | ||
{ | ||
case INS_OPTS_SCALABLE_D: | ||
assert(isValidUimm3(imm)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use the new methods that you have, here and elsewhere.
src/coreclr/jit/emitarm64.cpp
Outdated
@@ -9203,6 +9264,48 @@ void emitter::emitIns_R_R_I(instruction ins, | |||
fmt = IF_SVE_AM_2A; | |||
break; | |||
|
|||
case INS_sve_pmov: | |||
switch (opt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be better to rewrite this switch/case something like this:
if (isPredicateRegister(reg1) && isVectorRegister(reg2))
{
switch (opt)
{
case INS_OPTS_SCALABLE_D:
assert(isValidUimm3(imm));
fmt = IF_SVE_CE_2B;
break;
case INS_OPTS_SCALABLE_S:
assert(isValidUimm2(imm));
fmt = IF_SVE_CE_2D;
break;
case INS_OPTS_SCALABLE_H:
assert(isValidImm1(imm));
fmt = IF_SVE_CE_2C;
break;
default:
unreached();
}
}
else if (isVectorRegister(reg1) && isPredicateRegister(reg2))
{
switch (opt)
{
case INS_OPTS_SCALABLE_D:
assert(isValidUimm3(imm));
fmt = IF_SVE_CF_2B;
break;
case INS_OPTS_SCALABLE_S:
assert(isValidUimm2(imm));
fmt = IF_SVE_CF_2D;
break;
case INS_OPTS_SCALABLE_H:
assert(isValidImm1(imm));
fmt = IF_SVE_CF_2C;
break;
default:
unreached();
}
}
else
{
unreached();
}
src/coreclr/jit/emitarm64.cpp
Outdated
void emitter::emitDispSveReg(regNumber reg, bool addComma) | ||
{ | ||
assert(isVectorRegister(reg)); | ||
printf("%s", emitSveRegName(reg)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
printf("%s", emitSveRegName(reg)); | |
printf(emitSveRegName(reg)); |
src/coreclr/jit/emitarm64.cpp
Outdated
void emitter::emitDispSveRegIndex(regNumber reg, ssize_t index, bool addComma) | ||
{ | ||
assert(isVectorRegister(reg)); | ||
printf("%s", emitSveRegName(reg)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
printf("%s", emitSveRegName(reg)); | |
printf(emitSveRegName(reg)); |
src/coreclr/jit/emitarm64.cpp
Outdated
assert(isVectorRegister(reg)); | ||
printf("%s", emitSveRegName(reg)); | ||
|
||
printf("[%d]", static_cast<int>(index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
printf("[%d]", static_cast<int>(index)); | |
printf("[%d]", (int)index); |
src/coreclr/jit/emitarm64.h
Outdated
@@ -576,6 +578,40 @@ static code_t insEncodeSveElemsize_dtype(instruction ins, emitAttr size, code_t | |||
// for the 'dtype' field. | |||
static code_t insEncodeSveElemsize_dtype_ld1w(instruction ins, insFormat fmt, emitAttr size, code_t code); | |||
|
|||
// Encodes an immediate value in consecutive bits from most signficant position 'hi' to least significant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
// Encodes an immediate value in consecutive bits from most signficant position 'hi' to least significant | |
// Encodes an immediate value in consecutive bits from most significant position 'hi' to least significant |
src/coreclr/jit/emitarm64.h
Outdated
template <const size_t hi, const size_t lo> | ||
static code_t insEncodeUimm(size_t imm) | ||
{ | ||
static_assert(hi >= lo && hi < sizeof(code_t) * 8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
static_assert(hi >= lo && hi < sizeof(code_t) * 8); | |
static_assert((hi >= lo) && (hi < sizeof(code_t) * BITS_PER_BYTE)); |
src/coreclr/jit/emitarm64.h
Outdated
{ | ||
static_assert(hi >= lo && hi < sizeof(code_t) * 8); | ||
const size_t imm_bits = hi - lo + 1; | ||
const size_t imm_max = 1 << imm_bits; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll never use this function as insEncodeUimm<63,0>(imm)
(for a 64-bit number), but this expression will overflow in that case. Maybe add a assert(imm_bits < sizeof(code_t) * BITS_PER_BYTE)
?
src/coreclr/jit/emitarm64.h
Outdated
template <const size_t hi1, const size_t lo1, const size_t hi2, const size_t lo2> | ||
static code_t insEncodeSplitUimm(size_t imm) | ||
{ | ||
static_assert(hi1 >= lo1 && lo1 > hi2 && hi2 >= lo2); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
static_assert(hi1 >= lo1 && lo1 > hi2 && hi2 >= lo2); | |
static_assert((hi1 >= lo1) && (lo1 > hi2) && (hi2 >= lo2)); |
src/coreclr/jit/emitarm64.h
Outdated
const size_t imm_bits = hi - lo + 1; | ||
const size_t imm_max = 1 << imm_bits; | ||
assert(imm < imm_max); | ||
return static_cast<code_t>(imm << lo); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we have an assert that we're not losing any high-order bits? E.g.,
code_t result = static_cast<code_t>(imm << lo);
assert((result >> lo) == imm);
return result;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth adding this as it can tighten any portability assumptions that come with code_t
and size_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks, please fix the formatting errors.
Diff results for #98409Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64MinOpts (-0.00% to +0.01%)
Throughput diffs for osx/arm64 ran on windows/x64MinOpts (-0.01% to +0.00%)
Throughput diffs for windows/arm64 ran on windows/x64MinOpts (-0.01% to +0.00%)
Details here |
Looking at the output:
On the last bunch of instructions, they should have a destination Z and a source P. Looks like the encodings got mixed up somewhere. Needs fixing before merge. |
Looks like the display code for the vector and predicate registers just need to flip. The encodings look right though. |
We've identified this to come from my implementation relying on @a74nh has suggested adding a new scalable option for this which would fix it in the meantime until the predicates are distinctly defined. |
from predicate registers
src/coreclr/jit/emitarm64.cpp
Outdated
@@ -8629,6 +8675,25 @@ void emitter::emitIns_R_R(instruction ins, | |||
} | |||
break; | |||
|
|||
case INS_sve_pmov: | |||
if (sopt == INS_SCALABLE_OPTS_TO_PREDICATE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a check so that INS_OPTS_SCALABLE_B
is required for opt
(and all the other places IF_SVE_CE_2A
and IF_SVE_CF_2A
are checked)
This ensures the code calling it doesn't call with a different sized scalable. Eg: We want theEmitter->emitIns_R_R(INS_sve_pmov, EA_SCALABLE, REG_P2, REG_V12, INS_OPTS_SCALABLE_H, INS_SCALABLE_OPTS_TO_PREDICATE);
to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, what would be even more useful is if opt
is INS_OPTS_SCALABLE_H/S/D
then it should just do:
return emitIns_R_R_I(INS_sve_pmov, attr, reg1, reg2, 0, opt, sopt, INS_SCALABLE_OPTS_TO_PREDICATE);
Why? Because it's a useful shortcut. Most of the time the code calling all of this will be wanting to do a pmov with immediate 0, and it's a pain to have to write:
if (size == B) then emitIns_R_R(pmov,....) else emitIns_R_R_I(pmov,....,0);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
isPredicateRegister is returning true for vector registers as well.
Yes, unfortunately we will have to wait for this to get added. There are just 1 or 2 places where we rely on this method and hopefully it doesn't cause any problem until the support to handle them comes.
@a74nh - do you have any additional comments? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Adds ARM64 encodings for
pmov
variants. Matching capstone output:Contributing towards #94549.