-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arm64: Use csel and ccmp for conditional moves #67894
Closed
Closed
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
34989bb
Use csel and ccmp for conditional moves
a74nh 84f2ef4
Fold constant compares and better loop detection
a74nh 116f91d
Fix HW intrinsic compares
a74nh a6218da
Add conditional nodes to gtCloneExpr
a74nh e4226c7
Fix formatting
a74nh c65c424
Allow select nodes to constant propagate
a74nh File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious why you chose this rather early stage for this.
On the face of it, doing this early has the significant disadvantage of decanonicalizing the IR w.r.t. relops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs doing after loop information is available. The next phase is PHASE_CLEAR_LOOP_INFO, so I wanted to go before that.
The phase before is loop unrolling. If conversion is only modifying code outside of loops (for now), so it made sense then to do this after loop unrolling to maximise the number of changes.
As far as I'm aware, other compilers also do if conversion fairly early too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that
PHASE_CLEAR_LOOP_INFO
doesn't actually clear much, only information used by cloning.I see you are using the "inside a loop" information as a performance heuristic currently, that should be in a "good enough" shape for that even after the optimizer. We're using the loop table for a similar purpose much later for loop alignment.
Ultimately though the question is if there are actual problems because of this, i. e. if there are any regressions in the diffs. I suppose there aren't any?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok, maybe it can move down a bit then.
A quick scan through the passes, I'm thinking this should be done before loop hoisting too.
I'm not certain either way yet. Certainly, the tests I've tried look good.
I've got three asserts left from "superpmi.py replay" to fix up (there's a comment elsewhere on here about that). Next step after that would be to do a full test build and run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note my thinking with this is to avoid adding all the front-end support required (you've hit this problem with constant folding).
Well, it is currently done "before hoisting". It does not make sense to interject it between SSA and the optimizer phases I think (since it alters the flow graph). I was hoping we could slot it just after the optimizer (whether it'd be before lowering or after range check does not matter much I think, though making it LIR-only will make some things (morph support) unnecessary).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a full test run, of 2000+ tests, I get 11 failures, of which there are 3 unique failures: 1) The constant folding issue I already know about 2) a "pThread" assert, which I get without my patch (will ignore for now) and 3) a segfault - I'll debug this.
Question still remains if this is right place to have the phase. Looking at other compilers:
*LLVM if converts very early, and optimises every occurrence to select nodes. This is because removing the if then branches makes later phases, especially vectorisation, much easier. Then at the end of the compilation, it restores back to if/then branches for certain cases.
*GCC if converts loops early to allow vectorisation. Scalar code is if converted at the end of compilation
*OpenJDK if converts early. It does everything outside of loops and for inside loops uses historical branch taken / not taken counts.
Back to dotnet...
My original patch added if conversion after lowering but the ability to reshape the graph was limited and didn't work for multiple conditions. Agreed with by @EgorBo too.
As a test, I tried pushing if conversion down to just before lowering, and this seems to work in theory - I then get later some later errors, just requires some debugging.