-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: Handle half accesses for SIMDs in local morph #89520
Conversation
Also add a check to the existing SIMD16 -> SIMD12 to verify the source is a SIMD16. Fix dotnet#85359 Fix dotnet#89456
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsWhile it's not generally expected that quarters/halves can be accessed directly (ending up with Also add a check to the existing SIMD16 -> SIMD12 to verify the source is a SIMD16. Some size wise regressions are expected, which come down to
Many of the regressions are in SW implementations of Vector256/Vector512 methods that we usually do not expect to see called with HW acceleration supported.
|
@@ -1125,11 +1192,17 @@ class LocalAddressVisitor final : public GenTreeVisitor<LocalAddressVisitor> | |||
} | |||
else if (indir->TypeIs(TYP_SIMD12)) | |||
{ | |||
if ((offset == 0) && m_compiler->IsBaselineSimdIsaSupported()) | |||
if ((offset == 0) && (varDsc->TypeGet() == TYP_SIMD16) && m_compiler->IsBaselineSimdIsaSupported()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could adjust this to handle TYP_SIMD32
/TYP_SIMD64
if desired, correct? Just not worth it as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's leave that for a separate PR if desired. This was just a drive-by fix for #89456.
cc @dotnet/jit-contrib PTAL @tannergooding Diffs. Size wise regressions as discussed above. |
break; | ||
} | ||
default: | ||
unreached(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we confident that unreached()
is "correct" here or can/should we simply bail out of the transformation?
The main consideration would probably be that Vector<T>
and Vector64<T>
are backed by ulong
fields. That shouldn't normally be an issue, but with all the various Unsafe.As/BitCast
and other tricks, I'm not positive its not possible to see something here still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine -- SelectLocalIndirTransform
only returns WithElement
/GetElement
for the cases that we can handle here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Left a couple questions around certainty the unreached/assertions are "safe" given the various bitcasts and such we could encounter.
Size regression is primarily in tests and won't really impact the real world code most users see. The overall codegen is better (removing two stores) and while there is some more we could do, its a step in the right direction.
-- We do probably want to check the perf scores at some point since its showing the 2x insert
(3 cycles, 1tp, 1uop, 1port
each) is 1 cycle slower than the 2x stores (<11 cycles, 0.5-1tp, 1-2 uops, 2 ports
each)
Failures are known according to build analysis. |
While it's not generally expected that halves can be accessed directly (ending up with
LCL_FLD
), it sometimes happens in some of the SW implementations of Vector256/Vector512 methods. In rare cases the JIT still falls back to these even with there is HW acceleration.In those cases we want to avoid DNER'ing the involved locals, so expand the existing recognition with these patterns.
Also add a check to the existing SIMD16 -> SIMD12 to verify the source is a SIMD16.
Fix #85359
Fix #89456
Some size wise regressions are expected, which come down to
Many of the regressions are in SW implementations of Vector256/Vector512 methods that we usually do not expect to see called with HW acceleration supported.