JIT: Handle half accesses for SIMDs in local morph #89520

jakobbotsch · 2023-07-26T19:07:43Z

While it's not generally expected that halves can be accessed directly (ending up with LCL_FLD), it sometimes happens in some of the SW implementations of Vector256/Vector512 methods. In rare cases the JIT still falls back to these even when there is HW acceleration.
In those cases we want to avoid DNER'ing (marking as do-not-enregister) the involved locals, so expand the existing recognition with these patterns.
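As a rough model of the recognition described above, the C++ sketch below classifies an access to a SIMD local by its size, its byte offset, and whether it is a read or a write. It is a hypothetical illustration, not the JIT's code: the enum `HalfAccess`, the function `classifyHalfAccess`, and the get/with lower/upper naming are assumptions. The local sizes (SIMD16/SIMD32/SIMD64, whose halves are SIMD8/SIMD16/SIMD32) follow this PR; the PR handles SIMD8 halves for ARM64, and this sketch does not model any target-specific gating.

```cpp
// Hypothetical classification of an access to a SIMD local.
// Simplified model for illustration only; not the JIT's implementation.
enum class HalfAccess
{
    None,      // not a recognized half access (would stay a LCL_FLD)
    GetLower,  // read of the lower half
    GetUpper,  // read of the upper half
    WithLower, // write replacing the lower half
    WithUpper, // write replacing the upper half
};

// localSize:  size in bytes of the SIMD local
// accessSize: size in bytes of the indirection
// offset:     byte offset of the access within the local
// isStore:    true for a write, false for a read
HalfAccess classifyHalfAccess(unsigned localSize, unsigned accessSize, unsigned offset, bool isStore)
{
    // Only SIMD16/SIMD32/SIMD64 locals have halves that are themselves
    // SIMD types (SIMD8, SIMD16 and SIMD32 respectively).
    if ((localSize != 16) && (localSize != 32) && (localSize != 64))
    {
        return HalfAccess::None;
    }

    const unsigned half = localSize / 2;
    if (accessSize != half)
    {
        return HalfAccess::None; // e.g. quarter accesses are not handled
    }

    if (offset == 0)
    {
        return isStore ? HalfAccess::WithLower : HalfAccess::GetLower;
    }
    if (offset == half)
    {
        return isStore ? HalfAccess::WithUpper : HalfAccess::GetUpper;
    }
    return HalfAccess::None; // misaligned half access
}
```

For example, a 16-byte read at offset 16 of a 32-byte local classifies as `GetUpper`, while an 8-byte read of the same local is left alone.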

Also add a check to the existing SIMD16 -> SIMD12 to verify the source is a SIMD16.
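The added check can be modeled as a single predicate: a SIMD12 read at offset 0 is only handled by the SIMD16 -> SIMD12 path when the local itself is a SIMD16. The enum `SimdType` and the function `canNarrowToSimd12` are hypothetical stand-ins; the condition mirrors the one in the lclmorph.cpp diff (`offset == 0`, `varDsc->TypeGet() == TYP_SIMD16`, `IsBaselineSimdIsaSupported()`), with the ISA query passed in as a flag.

```cpp
// Hypothetical stand-ins for the JIT's SIMD var types; not the real definitions.
enum class SimdType
{
    Simd8,
    Simd12,
    Simd16,
    Simd32,
    Simd64,
};

// Returns true when a SIMD12 read from a SIMD local may take the
// SIMD16 -> SIMD12 path. In this model, without the localType check a
// SIMD12 read at offset 0 of, e.g., a SIMD32 local would also qualify.
bool canNarrowToSimd12(unsigned offset, SimdType localType, bool baselineSimdIsaSupported)
{
    return (offset == 0) && (localType == SimdType::Simd16) && baselineSimdIsaSupported;
}
```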

Fix #85359
Fix #89456

Some size-wise regressions are expected. They come down to:

- A large number of similar-looking tests now end up enregistering some locals, which requires new upper-half saves/restores. This accounts for most of the size-wise regressions.
- The expansions often do not result in smaller code, because loading/storing the halves directly from/to the stack is smaller than the vector equivalent with extraction/insertion.

Many of the regressions are in SW implementations of Vector256/Vector512 methods that we usually do not expect to see called when HW acceleration is supported.


ghost · 2023-07-26T19:07:56Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Author:	jakobbotsch
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-


tannergooding · 2023-07-26T19:39:17Z

src/coreclr/jit/lclmorph.cpp

```diff
@@ -1125,11 +1192,17 @@ class LocalAddressVisitor final : public GenTreeVisitor<LocalAddressVisitor>
                 }
                 else if (indir->TypeIs(TYP_SIMD12))
                 {
-                    if ((offset == 0) && m_compiler->IsBaselineSimdIsaSupported())
+                    if ((offset == 0) && (varDsc->TypeGet() == TYP_SIMD16) && m_compiler->IsBaselineSimdIsaSupported())
```


We could adjust this to handle TYP_SIMD32/TYP_SIMD64 if desired, correct? Just not worth it as part of this PR?

Yeah, let's leave that for a separate PR if desired. This was just a drive-by fix for #89456.

jakobbotsch · 2023-07-27T13:07:21Z

cc @dotnet/jit-contrib PTAL @tannergooding

Diffs. Size-wise regressions as discussed above.

tannergooding · 2023-07-27T14:27:46Z

src/coreclr/jit/lclmorph.cpp

```diff
+                        break;
+                    }
+                    default:
+                        unreached();
```


Are we confident that unreached() is "correct" here, or can/should we simply bail out of the transformation?

The main consideration would probably be that Vector<T> and Vector64<T> are backed by ulong fields. That shouldn't normally be an issue, but with all the various Unsafe.As/BitCast and other tricks, I'm not positive it's not possible to see something here still.

I think it's fine: SelectLocalIndirTransform only returns WithElement/GetElement for the cases that we can handle here.
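The argument for keeping `unreached()` is that the selector and the consumer agree on which cases exist: the selector only yields an element transform for shapes the consumer's switch handles, so the `default` arm cannot be hit. A minimal sketch of that contract, with hypothetical names (`Transform`, `selectTransform`, `applyTransform`) standing in for `SelectLocalIndirTransform` and its caller, a switch on element size standing in for whatever the real switch is over, and `assert` standing in for `unreached()`:

```cpp
#include <cassert>

// Hypothetical transform kinds; the real JIT has its own enumeration.
enum class Transform
{
    None,
    GetElement,
    WithElement,
};

// Hypothetical selector: only yields element transforms for element sizes
// the consumer knows how to handle (4 and 8 bytes in this sketch).
Transform selectTransform(unsigned elementSize, bool isStore)
{
    if ((elementSize != 4) && (elementSize != 8))
    {
        return Transform::None; // bail out here, before the consumer runs
    }
    return isStore ? Transform::WithElement : Transform::GetElement;
}

// Hypothetical consumer: because selectTransform already filtered the
// sizes, the default arm is unreachable by construction.
const char* applyTransform(Transform t, unsigned elementSize)
{
    if (t == Transform::None)
    {
        return "no transform";
    }
    const bool isGet = (t == Transform::GetElement);
    switch (elementSize)
    {
        case 4:
            return isGet ? "get 32-bit element" : "with 32-bit element";
        case 8:
            return isGet ? "get 64-bit element" : "with 64-bit element";
        default:
            assert(!"unreached"); // stands in for the JIT's unreached()
            return nullptr;
    }
}
```

Bailing out instead, as the review question suggests, would mean returning the untransformed node from the `default` arm. Keeping the assertion documents the invariant and surfaces any selector/consumer mismatch in builds with assertions enabled.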

tannergooding

LGTM.

Left a couple of questions about whether the unreached/assertions are "safe" given the various bitcasts and such we could encounter.

The size regression is primarily in tests and won't really impact the real-world code most users see. The overall codegen is better (removing two stores), and while there is more we could do, it's a step in the right direction.

We probably do want to check the perf scores at some point, since they show the 2x insert (3 cycles, 1 tp, 1 uop, 1 port each) as 1 cycle slower than the 2x stores (<11 cycles, 0.5-1 tp, 1-2 uops, 2 ports each).

jakobbotsch · 2023-07-27T17:06:24Z

Failures are known according to build analysis.

JIT: Handle half/quarter accesses for SIMDs in local morph

06b5d0a

Also add a check to the existing SIMD16 -> SIMD12 to verify the source is a SIMD16. Fix dotnet#85359 Fix dotnet#89456

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jul 26, 2023

ghost assigned jakobbotsch Jul 26, 2023

tannergooding reviewed Jul 26, 2023

src/coreclr/jit/lclmorph.cpp

tannergooding reviewed Jul 26, 2023


jakobbotsch added 2 commits July 27, 2023 11:07

Add a test for dotnet#89456

99a452e

Handle SIMD8 halves for ARM64, remove quarter handling

10d9afb

jakobbotsch changed the title ~~JIT: Handle half/quarter accesses for SIMDs in local morph~~ JIT: Handle halfaccesses for SIMDs in local morph Jul 27, 2023

jakobbotsch changed the title ~~JIT: Handle halfaccesses for SIMDs in local morph~~ JIT: Handle half accesses for SIMDs in local morph Jul 27, 2023

build-analysis bot mentioned this pull request Jul 27, 2023

Test failure profiler/gc/nongcheap/nongcheap.sh #88507

Closed

jakobbotsch marked this pull request as ready for review July 27, 2023 13:06

jakobbotsch requested a review from tannergooding July 27, 2023 13:07

tannergooding reviewed Jul 27, 2023


tannergooding approved these changes Jul 27, 2023


jakobbotsch merged commit 87526fb into dotnet:main Jul 27, 2023

jakobbotsch deleted the fix-85359 branch July 27, 2023 17:07

AndyAyersMS mentioned this pull request Aug 1, 2023

[Perf] Windows/x64: 2 Improvements on 7/27/2023 9:58:25 PM dotnet/perf-autofiling-issues#20311

Closed

This was referenced Aug 3, 2023

[Perf] Linux/arm64: 1 Regression on 7/28/2023 9:42:28 PM dotnet/perf-autofiling-issues#20343

Closed

Check IsBaselineSimdIsaSupported in LocalAddressVisitor #90099

Merged

ghost locked as resolved and limited conversation to collaborators Aug 26, 2023
