-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Slightly relax "const vector propagation" heuristics #76788
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsFixes #76781 By changing too conservative heuristic I introduced in #70378, also changed SPMI to print PerfScore (will revert before merge) Snippet from #76781 static void Foo(ref byte src, nuint length, ref byte dest)
{
nuint idx = 0;
Vector128<byte> mask = Vector128.Create((byte)0x7F);
if (length >= (uint)Vector128<byte>.Count)
{
nuint vecEnd = length - (uint)Vector128<byte>.Count;
do
{
Vector128<byte> vec = Vector128.LoadUnsafe(ref src, idx);
vec &= mask; // use of hoisted constant
vec.StoreUnsafe(ref dest, idx);
idx += (uint)Vector128<byte>.Count;
} while (idx <= vecEnd);
}
} ; Method Program:Foo(byref,ulong,byref)
G_M5292_IG01:
C5F877 vzeroupper
G_M5292_IG02:
33C0 xor eax, eax
+ C4E179100532000000 vmovupd xmm0, xmmword ptr [reloc @RWD00]
4883FA10 cmp rdx, 16
7223 jb SHORT G_M5292_IG05
G_M5292_IG03:
4883C2F0 add rdx, -16
- 90 align [1 bytes for IG04]
+ 0F1F840000000000 align [8 bytes for IG04]
G_M5292_IG04:
- C5FA6F0401 vmovdqu xmm0, xmmword ptr [rcx+rax]
- C5F9DB0513000000 vpand xmm0, xmm0, xmmword ptr [reloc @RWD00]
- C4C17A7F0400 vmovdqu xmmword ptr [r8+rax], xmm0
+ C5F9DB0C01 vpand xmm1, xmm0, xmmword ptr [rcx+rax]
+ C4C17A7F0C00 vmovdqu xmmword ptr [r8+rax], xmm1
4883C010 add rax, 16
483BC2 cmp rax, rdx
76E4 jbe SHORT G_M5292_IG04
G_M5292_IG05:
C3 ret
RWD00 dq 7F7F7F7F7F7F7F7Fh, 7F7F7F7F7F7F7F7Fh
-; Total bytes of code: 45
+; Total bytes of code: 53
|
@jakobbotsch is this a correct way to switch to PerfScore for superpmi-diff job? |
I think it should work depending on the size of the diffs. If there are massive number of diffs then you will hit the problem we always had before #76238, but if it manages to complete I think the jit-analyze output should show perfscore analysis. |
This need more thinking judging by regressions 🙁 |
9b3ae74
to
c1af69a
Compare
Fixes #76781
By changing too conservative heuristic I introduced in #70378, also changed SPMI to print PerfScore (will revert before merge)
Snippet from #76781