Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use IntegralRange in fgOptimizeCast #59897

Merged
merged 5 commits into from
Oct 8, 2021

Conversation

SingleAccretion
Copy link
Contributor

@SingleAccretion SingleAccretion commented Oct 2, 2021

To get rid of the duplicated handling, reduce the amount of code and have the CQ benefits that come along.

The aforementioned CQ benefits are quite significant: win-x64, win-arm64, win-x86, linux-arm.

The improvements come mainly from the handling of bool-returning managed methods, the regressions are due to more aggressive loop inversion, a few register allocation changes and a few lost shortening of instructions (cmp al, 7 -> cmp eax, 7 because we no longer have a cast above the call).

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 2, 2021
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Oct 2, 2021
@ghost
Copy link

ghost commented Oct 2, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

To get rid of the duplicated handling, reduce the amount of code and have the CQ benefits that come along.

The aforementioned CQ benefits are quite significant: diffs pending.

The improvements come mainly from the handling of bool-returning managed methods, the regressions are due to more aggressive loop inversion, a few register allocation changes and a few lost shortening of instructions (cmp al, 7 -> cmp eax, 7 because we no longer have a cast above the call).

Author: SingleAccretion
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@SingleAccretion SingleAccretion force-pushed the IntegralRange-Further-Work branch from 7c430a2 to e8e4c90 Compare October 2, 2021 16:16
Couple good diffs from the handling of GT_CALL.
The comments are right, the code is wrong...

Fortunately, the bug was a pessimizing one, not a correctness issue.
And reap the benefits.
@SingleAccretion SingleAccretion force-pushed the IntegralRange-Further-Work branch from e8e4c90 to d87cc73 Compare October 2, 2021 16:41
@SingleAccretion SingleAccretion marked this pull request as ready for review October 2, 2021 19:23
@SingleAccretion
Copy link
Contributor Author

@dotnet/jit-contrib

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 59897 in repo dotnet/runtime

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

var_types srcType = oper->TypeGet();
var_types dstType = tree->CastToType();
unsigned dstSize = genTypeSize(dstType);
GenTree* src = cast->CastOp();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing these optimizations even in minopts? Should we be?

Copy link
Contributor Author

@SingleAccretion SingleAccretion Oct 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we are, that is how it has been for a very long time.

The "discard a cast" part seems low value for minopts, optNarrowTree, however, does a lot of work for 32 bit targets CQ-wise (notably, it is somewhat misleadingly under a CLFLG_TREETRANS guard, the only optimization enabled in minopts).

I personally think we should not be doing any optimization in minopts (which then will allow us to selectively enable high-impact optimizations in T0), since it is cheaper to have one big opts.OptimizationEnabled guard somewhere instead of them being scattered throughout, but disabling this folding will have significant CQ impact, so, it is not something for this change I think. Not to mention there is this complicated tradeoff between folding and TP where high-impact folding may lead to an increase in throughput. I am not sure where fgOptimizeCast is on that spectrum. I suppose one thing I know from profiles is that it is pretty cheap.

Copy link
Member

@jakobbotsch jakobbotsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, much simpler to understand and very nice diffs!

@pentp
Copy link
Contributor

pentp commented Oct 4, 2021

a few lost shortening of instructions (cmp al, 7 -> cmp eax, 7 because we no longer have a cast above the call).

Could lowering or emit check the asserted range for the node and still emit the optimized variant based on that? I'm asking because I hope to understand how to use the asserted ranges in a somewhat similar situation in division by constant lowering.

@SingleAccretion
Copy link
Contributor Author

Could lowering or emit check the asserted range for the node and still emit the optimized variant based on that?

Yes, the code here would need to be updated to check for nodes with the range of TYP_UBYTE, not just casts.

I hope to understand how to use the asserted ranges in a somewhat similar situation in division by constant lowering.

Heh... When I was designing IntegralRange, I really wanted it to work for your situation too, but it turned out to not be possible to do without making it slow on 32 bit, so I ended up with this constrained symbolic version that is not able to represent all values. That said, you could certainly utilize it, IntegralRange::ForNode will work in LIR too. You'll just need to add a new helper that extracts the number of bits from the range (I would imagine only positive ranges would be useful there, since the range is signed).

Perhaps we should find some similar abstraction for bits, but I am not sure at this point how it would look.

@SingleAccretion
Copy link
Contributor Author

SingleAccretion commented Oct 4, 2021

Summarizing the failures:

  1. A couple crashes on CoreCLR Pri1 Runtime Tests Run OSX x64 checked - classifying as Test failure baseservices/exceptions/simple/ParallelCrashWorkerThreads/ParallelCrashWorkerThreads.sh #57621.
  2. Suspicious Test Infrastructure Failure: System.IO.IOException: No space left on device in CoreCLR Pri1 Runtime Tests Run R2R_CG2 OSX x64 checked, jitstress - CoreCLR Pri1 Runtime Tests Run OSX arm64 checked - classifying as infrastructure failure.
  3. A few Assert failure(PID 1654 [0x00000676], Thread: 1654 [0x0676]): Verify_TypeLayout 'System.Numerics.Vector'1' failed to verify type layout failures in CoreCLR Pri1 Runtime Tests Run R2R_CG2 Linux arm64 checked, CoreCLR Pri1 Runtime Tests Run R2R_CG2 Linux_musl arm64, CoreCLR Pri1 Runtime Tests Run R2R_CG2 windows arm64 checked. Seen in an earlier run, so classifying as not related.

@pentp
Copy link
Contributor

pentp commented Oct 5, 2021

Perhaps we should find some similar abstraction for bits, but I am not sure at this point how it would look.

I wonder if instead of SymbolicIntegerValue we could just use two int8_t bounds in IntegralRange encoding the maximum number of bits set (with negative values representing significant bits of negative values). So LongMin = -63, ByteMin = -7, Zero = 0, One = 1, ByteMax = 7, UByteMax = 8, IntMax = 31, LongMax = 63 etc.
This would be more flexible and could account for some bitwise and arithmetic operations also (e.g., division by 7 always reducing the number of significant bits by 2).

@SingleAccretion
Copy link
Contributor Author

I wonder if instead of SymbolicIntegerValue we could just use two int8_t bounds in IntegralRange encoding the maximum number of bits set (with negative values representing significant bits of negative values). So LongMin = -63, ByteMin = -7, Zero = 0, One = 1, ByteMax = 7, UByteMax = 8, IntMax = 31, LongMax = 63 etc.
This would be more flexible and could account for some bitwise and arithmetic operations also (e.g., division by 7 always reducing the number of significant bits by 2).

I think this is a great suggestion, I will look into implementing it in the next round of IntegralRange-related changes.

@jakobbotsch jakobbotsch merged commit de00deb into dotnet:main Oct 8, 2021
@jakobbotsch
Copy link
Member

Thanks again!

@SingleAccretion SingleAccretion deleted the IntegralRange-Further-Work branch October 8, 2021 18:33
@ghost ghost locked as resolved and limited conversation to collaborators Nov 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants