
Add ARM64 encodings for groups IF_SVE_CE,CF #98409

Merged
merged 11 commits into from
Feb 29, 2024

Conversation

snickolls-arm
Contributor

Adds ARM64 encodings for pmov variants. The output matches capstone:

pmov p2.b, z12
pmov p15.d, z7[7]
pmov p7.d, z16[0]
pmov p0.h, z31[1]
pmov p1.h, z1[0]
pmov p3.s, z9[3]
pmov p10.s, z4[0]
pmov p11.b, z12
pmov p6.d, z8[7]
pmov p9.d, z7[0]
pmov p8.h, z4[1]
pmov p5.h, z9[0]
pmov p14.s, z2[3]
pmov p3.s, z15[0]

Contributing towards #94549.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 14, 2024
@ghost

ghost commented Feb 14, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.


@snickolls-arm
Contributor Author

@a74nh @kunalspathak @dotnet/arm64-contrib

@ryujit-bot

Diff results for #98409

Throughput diffs

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
realworld.run.osx.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%

Details here


@ryujit-bot

Diff results for #98409

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.01% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%
realworld.run.windows.arm64.checked.mch -0.01%

Details here



@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Feb 14, 2024
@kunalspathak kunalspathak requested a review from TIHan February 14, 2024 18:57
assert(isVectorRegister(reg));
printf("%s", emitSveRegName(reg));

printf("[%d]", static_cast<int>(index));
Contributor

nit: we have emitDispElementIndex which will do this.

So something like this:

//------------------------------------------------------------------------
// emitDispSveRegIndex: Display a scalable vector register with indexed element
//
void emitter::emitDispSveRegIndex(regNumber reg, ssize_t index, bool addComma)
{
    assert(isVectorRegister(reg));
    printf("%s", emitSveRegName(reg));
    emitDispElementIndex(index, addComma);
}

// Encodes an immediate value in consecutive bits from most significant position 'hi' to least significant
// position 'lo'.
template <const size_t hi, const size_t lo>
static code_t insEncodeUimm(size_t imm)
Contributor

We tend to write explicit insEncodeUimm*, but I'm not opposed to having something more generalized.
@BruceForstall @kunalspathak what are the guidelines for writing generalized functions like this?

Member

I like this idea. Thanks @snickolls-arm. Will it be possible to replace existing methods with this one and make sure you don't hit any asserts?

Contributor Author

Yes I'll take a look at replacing these. Would you prefer this in a separate patch or on this one?

Member

separate patch is fine.

// value between them. The bit ranges are from hi1-lo1, and hi2-lo2 where the second range is at a less
// significant position relative to the first.
template <const size_t hi1, const size_t lo1, const size_t hi2, const size_t lo2>
static code_t insEncodeSplitUimm(size_t imm)
Contributor

Same as the comment for insEncodeUimm

@@ -701,6 +737,13 @@ static bool isValidSimm4_MultipleOf32(ssize_t value)
return (-256 <= value) && (value <= 224) && (value % 32 == 0);
};

template <const size_t bits>
Contributor

Same as the comment for insEncodeUimm

Contributor

@TIHan TIHan left a comment

LGTM, just a small nit.

I like the generalized functions being introduced, but we should wait for @kunalspathak or @BruceForstall to weigh in on it.

switch (opt)
{
case INS_OPTS_SCALABLE_D:
assert(isValidUimm3(imm));
Member

please use the new methods that you have, here and elsewhere.

@@ -9203,6 +9264,48 @@ void emitter::emitIns_R_R_I(instruction ins,
fmt = IF_SVE_AM_2A;
break;

case INS_sve_pmov:
switch (opt)
Member

might be better to rewrite this switch/case something like this:

if (isPredicateRegister(reg1) && isVectorRegister(reg2))
{
    switch (opt)
    {
        case INS_OPTS_SCALABLE_D:
            assert(isValidUimm3(imm));
            fmt = IF_SVE_CE_2B;
            break;
        case INS_OPTS_SCALABLE_S:
            assert(isValidUimm2(imm));
            fmt = IF_SVE_CE_2D;
            break;
        case INS_OPTS_SCALABLE_H:
            assert(isValidImm1(imm));
            fmt = IF_SVE_CE_2C;
            break;
        default:
            unreached();
    }
}
else if (isVectorRegister(reg1) && isPredicateRegister(reg2))
{
    switch (opt)
    {
        case INS_OPTS_SCALABLE_D:
            assert(isValidUimm3(imm));
            fmt = IF_SVE_CF_2B;
            break;
        case INS_OPTS_SCALABLE_S:
            assert(isValidUimm2(imm));
            fmt = IF_SVE_CF_2D;
            break;
        case INS_OPTS_SCALABLE_H:
            assert(isValidImm1(imm));
            fmt = IF_SVE_CF_2C;
            break;
        default:
            unreached();
    }
}
else
{
  unreached();
}

void emitter::emitDispSveReg(regNumber reg, bool addComma)
{
assert(isVectorRegister(reg));
printf("%s", emitSveRegName(reg));
Member

Suggested change
printf("%s", emitSveRegName(reg));
printf(emitSveRegName(reg));

void emitter::emitDispSveRegIndex(regNumber reg, ssize_t index, bool addComma)
{
assert(isVectorRegister(reg));
printf("%s", emitSveRegName(reg));
Member

Suggested change
printf("%s", emitSveRegName(reg));
printf(emitSveRegName(reg));

assert(isVectorRegister(reg));
printf("%s", emitSveRegName(reg));

printf("[%d]", static_cast<int>(index));
Member

Suggested change
printf("[%d]", static_cast<int>(index));
printf("[%d]", (int)index);

@@ -576,6 +578,40 @@ static code_t insEncodeSveElemsize_dtype(instruction ins, emitAttr size, code_t
// for the 'dtype' field.
static code_t insEncodeSveElemsize_dtype_ld1w(instruction ins, insFormat fmt, emitAttr size, code_t code);

// Encodes an immediate value in consecutive bits from most significant position 'hi' to least significant
Member

nit

Suggested change
// Encodes an immediate value in consecutive bits from most signficant position 'hi' to least significant
// Encodes an immediate value in consecutive bits from most significant position 'hi' to least significant

template <const size_t hi, const size_t lo>
static code_t insEncodeUimm(size_t imm)
{
static_assert(hi >= lo && hi < sizeof(code_t) * 8);
Member

nit

Suggested change
static_assert(hi >= lo && hi < sizeof(code_t) * 8);
static_assert((hi >= lo) && (hi < sizeof(code_t) * BITS_PER_BYTE));

{
static_assert(hi >= lo && hi < sizeof(code_t) * 8);
const size_t imm_bits = hi - lo + 1;
const size_t imm_max = 1 << imm_bits;
Member

We'll never use this function as insEncodeUimm<63,0>(imm) (for a 64-bit number), but this expression will overflow in that case. Maybe add an assert(imm_bits < sizeof(code_t) * BITS_PER_BYTE)?

template <const size_t hi1, const size_t lo1, const size_t hi2, const size_t lo2>
static code_t insEncodeSplitUimm(size_t imm)
{
static_assert(hi1 >= lo1 && lo1 > hi2 && hi2 >= lo2);
Member

nit

Suggested change
static_assert(hi1 >= lo1 && lo1 > hi2 && hi2 >= lo2);
static_assert((hi1 >= lo1) && (lo1 > hi2) && (hi2 >= lo2));

const size_t imm_bits = hi - lo + 1;
const size_t imm_max = 1 << imm_bits;
assert(imm < imm_max);
return static_cast<code_t>(imm << lo);
Member

Should we have an assert that we're not losing any high-order bits? E.g.,

code_t result = static_cast<code_t>(imm << lo);
assert((result >> lo) == imm);
return result;

?

Contributor Author

I think it's worth adding this as it can tighten any portability assumptions that come with code_t and size_t.

Member

@kunalspathak kunalspathak left a comment

LGTM. Thanks, please fix the formatting errors.

@ryujit-bot

Diff results for #98409

Throughput diffs

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch +0.01%

Details here



@ryujit-bot

Diff results for #98409

Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.linux.arm64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.osx.arm64.checked.mch -0.01%

Throughput diffs for windows/arm64 ran on windows/x64

MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%

Details here


@a74nh
Contributor

a74nh commented Feb 22, 2024

Looking at the output:

pmov p2.b, z12
pmov p15.d, z7[7]
pmov p7.d, z16[0]
pmov p0.h, z31[1]
pmov p1.h, z1[0]
pmov p3.s, z9[3]
pmov p10.s, z4[0]
pmov p11.b, z12
pmov p6.d, z8[7]
pmov p9.d, z7[0]
pmov p8.h, z4[1]
pmov p5.h, z9[0]
pmov p14.s, z2[3]
pmov p3.s, z15[0]

The last bunch of instructions should have a destination Z and a source P. Looks like the encodings got mixed up somewhere.

Needs fixing before merge.

@TIHan
Contributor

TIHan commented Feb 22, 2024

Looks like the display code for the vector and predicate registers just needs to be flipped. The encodings look right, though.

@snickolls-arm
Contributor Author

Looking at the output:

pmov p2.b, z12
pmov p15.d, z7[7]
pmov p7.d, z16[0]
pmov p0.h, z31[1]
pmov p1.h, z1[0]
pmov p3.s, z9[3]
pmov p10.s, z4[0]
pmov p11.b, z12
pmov p6.d, z8[7]
pmov p9.d, z7[0]
pmov p8.h, z4[1]
pmov p5.h, z9[0]
pmov p14.s, z2[3]
pmov p3.s, z15[0]

The last bunch of instructions should have a destination Z and a source P. Looks like the encodings got mixed up somewhere.

Needs fixing before merge.

We've identified that this comes from my implementation relying on isPredicateRegister. Because predicate registers are currently aliased to vector registers, isPredicateRegister returns true for vector registers as well, so the code always takes the paths where isPredicateRegister(reg1) && isVectorRegister(reg2) == true and only ever emits the pmov (to predicate) variant. I think the logic itself is written correctly.

@a74nh has suggested adding a new scalable option for this which would fix it in the meantime until the predicates are distinctly defined.

@@ -8629,6 +8675,25 @@ void emitter::emitIns_R_R(instruction ins,
}
break;

case INS_sve_pmov:
if (sopt == INS_SCALABLE_OPTS_TO_PREDICATE)
Contributor

Could you add a check so that INS_OPTS_SCALABLE_B is required for opt (here and in all the other places IF_SVE_CE_2A and IF_SVE_CF_2A are checked)?

This ensures the calling code can't pass a different sized scalable. E.g. we want theEmitter->emitIns_R_R(INS_sve_pmov, EA_SCALABLE, REG_P2, REG_V12, INS_OPTS_SCALABLE_H, INS_SCALABLE_OPTS_TO_PREDICATE); to fail

Contributor

Actually, what would be even more useful is if opt is INS_OPTS_SCALABLE_H/S/D then it should just do:
return emitIns_R_R_I(INS_sve_pmov, attr, reg1, reg2, 0, opt, sopt, INS_SCALABLE_OPTS_TO_PREDICATE);

Why? Because it's a useful shortcut. Most of the time the code calling all of this will be wanting to do a pmov with immediate 0, and it's a pain to have to write:
if (size == B) then emitIns_R_R(pmov,....) else emitIns_R_R_I(pmov,....,0);

Member

@kunalspathak kunalspathak left a comment

LGTM

isPredicateRegister is returning true for vector registers as well.

Yes, unfortunately we will have to wait for this to get added. There are just one or two places where we rely on this method, and hopefully it doesn't cause any problems until the support to handle them lands.

@kunalspathak
Member

@a74nh - do you have any additional comments?

Contributor

@a74nh a74nh left a comment

LGTM

@kunalspathak kunalspathak merged commit 8dcca1c into dotnet:main Feb 29, 2024
127 of 129 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 31, 2024
@snickolls-arm snickolls-arm deleted the github-IF_SVE_CE,CF branch January 17, 2025 16:07