Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use intrinsic for genLog2() #69333

Merged
merged 6 commits into from
May 17, 2022
Merged

Conversation

kunalspathak
Copy link
Member

After recent VC++ upgrade, we saw a bad codegen C++ bug that repros only on release build of windows/arrm64. Explanation is in #69222 (comment). To get around the bug, use intrinsic BitForward64. I also simplified the code in BitPosition to always use BitScanForward().

Here is the code that is generated by various compilers/archs for BitScanForward:
vc++: https://godbolt.org/z/h4e7MqMsE
clang/gcc: https://godbolt.org/z/jWc6YPGd1

Also using _BitScanReverse() methods in genFindHighestBit() (although they are not used by anyone).

Fixes: #69222

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 13, 2022
@ghost ghost assigned kunalspathak May 13, 2022
@ghost
Copy link

ghost commented May 13, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

After recent VC++ upgrade, we saw a bad codegen C++ bug that repros only on release build of windows/arrm64. Explanation is in #69222 (comment). To get around the bug, use intrinsic BitForward64. I also simplified the code in BitPosition to always use BitScanForward().

Here is the code that is generated by various compilers/archs for BitScanForward:
vc++: https://godbolt.org/z/h4e7MqMsE
clang/gcc: https://godbolt.org/z/jWc6YPGd1

Also using _BitScanReverse() methods in genFindHighestBit() (although they are not used by anyone).

Fixes: #69222

Author: kunalspathak
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak kunalspathak marked this pull request as ready for review May 14, 2022 21:17
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib

// - Taking the modulo of 'value' with this will produce a unique hash for all
// powers of 2 (which is what "value" is).
// - Entries in hashTable[] which are -1 should never be used. There
// should be PRIME-8*sizeof(value) entries which are -1 .
//
inline
unsigned BitPosition(unsigned value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, but its worth calling out that this is "effectively" doing TrailingZeroCount, but without handling 0.

It might be worth centralizing these "primitive bit operation" methods somewhere and ensuring they are generally reusable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened tracking issue #69401

*/

inline unsigned int genFindHighestBit(unsigned int mask)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. LGTM but this is effectively using LeadingZeroCount (and then masking out everything but the highest bit) but without strictly handling some of the potential edge cases.

It might be worthwhile to centralize these somewhere (maybe as an "up for grabs" work item)

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes LGTM, but some of these methods are "odd" and seemingly restrictive for no reason compared to a more standardized implementation.

We have existing accelerated implementations on the C# side that could be reused in native: https://source.dot.net/#System.Private.CoreLib/BitOperations.cs,af96b203ee2f79f7

Such implementations also handle various edge cases (still optimally) and make the semantics of code clearer IMO. It might be worth an up-for-grabs work item to consider improving some of these more generally outside the necessary bug workaround

@jakobbotsch
Copy link
Member

jakobbotsch commented May 16, 2022

Interesting to note that this saves 0.5% to 1% TP when cross compiling to arm64, while there is no TP diff when compiling for x64. Presumably that's because we only use 64-bit genLog2 when targeting ARM64.
Shame we do not have native ARM64 TP diffs, would be interesting to see what this saved there.

[edit] it does save significantly on TP on win-x86 too though, ~0.3%.

@kunalspathak kunalspathak merged commit 6506705 into dotnet:main May 17, 2022
@kunalspathak kunalspathak deleted the bitposition branch May 17, 2022 03:30
@ghost ghost locked as resolved and limited conversation to collaborators Jun 16, 2022
@kunalspathak
Copy link
Member Author

/backport to release/6.0

@kunalspathak
Copy link
Member Author

/backport to release/6.0

Not sure what is the right text here. @jakobbotsch - do you know?

@jakobbotsch
Copy link
Member

AFAIK you got it right... Maybe it is because the thread is locked? I guess you have to open it manually.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLRJit Access Violation on when attempting to run Windows ARM64 app
4 participants