Use `IntegralRange` in `fgOptimizeCast` #59897

SingleAccretion · 2021-10-02T16:07:11Z

This removes the duplicated handling, reduces the amount of code, and brings some CQ benefits along with it.

The CQ benefits are quite significant; see the diffs for win-x64, win-arm64, win-x86 and linux-arm.

The improvements come mainly from the handling of bool-returning managed methods. The regressions are due to more aggressive loop inversion, a few register allocation changes, and a few lost instruction shortenings (`cmp al, 7` -> `cmp eax, 7`, because we no longer have a cast above the call).
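The cast removal described above can be illustrated with a minimal sketch. This is not the JIT's code: `Range`, `SmallType`, `rangeForType` and `castIsRedundant` are hypothetical names, and ranges are modelled here as plain 64-bit bounds rather than the symbolic values `IntegralRange` uses. The idea is only that a non-checked cast can be discarded when the known range of its operand already fits in the range of the target type, for example when the operand is a call whose return value is known to lie in the unsigned byte range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an integral value range with inclusive bounds.
struct Range
{
    int64_t lo;
    int64_t hi;

    // True if every value of 'other' also lies in this range.
    bool contains(const Range& other) const
    {
        return lo <= other.lo && other.hi <= hi;
    }
};

// Illustrative integer types only; these are not the JIT's var_types.
enum class SmallType { Byte, UByte, Short, UShort, Int };

Range rangeForType(SmallType type)
{
    switch (type)
    {
        case SmallType::Byte:   return {INT8_MIN, INT8_MAX};
        case SmallType::UByte:  return {0, UINT8_MAX};
        case SmallType::Short:  return {INT16_MIN, INT16_MAX};
        case SmallType::UShort: return {0, UINT16_MAX};
        case SmallType::Int:    return {INT32_MIN, INT32_MAX};
    }
    return {INT64_MIN, INT64_MAX};
}

// A cast without overflow checking changes nothing if every possible
// operand value is already representable in the target type.
bool castIsRedundant(const Range& operandRange, SmallType castTo)
{
    return rangeForType(castTo).contains(operandRange);
}
```

Under these assumptions, `castIsRedundant(rangeForType(SmallType::UByte), SmallType::Int)` holds, while a cast from the full `Int` range down to `UByte` does not qualify and has to stay.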

ghost · 2021-10-02T16:07:17Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Author:	SingleAccretion
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

SingleAccretion · 2021-10-02T19:23:31Z

@dotnet/jit-contrib

azure-pipelines · 2021-10-03T13:13:08Z

Commenter does not have sufficient privileges for PR 59897 in repo dotnet/runtime

jakobbotsch · 2021-10-04T11:54:01Z

/azp run runtime-coreclr outerloop, runtime-coreclr jitstress

azure-pipelines · 2021-10-04T11:54:33Z

Azure Pipelines successfully started running 2 pipeline(s).

jakobbotsch · 2021-10-04T12:12:51Z

src/coreclr/jit/morph.cpp

-    var_types srcType = oper->TypeGet();
-    var_types dstType = tree->CastToType();
-    unsigned  dstSize = genTypeSize(dstType);
+    GenTree* src = cast->CastOp();


Are we doing these optimizations even in minopts? Should we be?

SingleAccretion

Yes, we are; that is how it has been for a very long time.

The "discard a cast" part seems low value for minopts. `optNarrowTree`, however, does a lot of work CQ-wise for 32-bit targets (notably, it is somewhat misleadingly under a `CLFLG_TREETRANS` guard, the only optimization enabled in minopts).

I personally think we should not be doing any optimization in minopts (which would then allow us to selectively enable high-impact optimizations in T0), since it is cheaper to have one big `opts.OptimizationEnabled` guard somewhere instead of having the checks scattered throughout. However, disabling this folding will have significant CQ impact, so it is not something for this change, I think. Not to mention there is a complicated tradeoff between folding and TP, where high-impact folding may lead to an increase in throughput. I am not sure where `fgOptimizeCast` is on that spectrum. One thing I do know from profiles is that it is pretty cheap.

jakobbotsch

LGTM, much simpler to understand and very nice diffs!

pentp · 2021-10-04T17:41:49Z

> a few lost shortening of instructions (cmp al, 7 -> cmp eax, 7 because we no longer have a cast above the call).

Could lowering or emit check the asserted range for the node and still emit the optimized variant based on that? I'm asking because I hope to understand how to use the asserted ranges in a somewhat similar situation in division by constant lowering.

SingleAccretion · 2021-10-04T17:57:31Z

> Could lowering or emit check the asserted range for the node and still emit the optimized variant based on that?

Yes, the code here would need to be updated to check for nodes with the range of TYP_UBYTE, not just casts.
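A rough sketch of what such a check might look like follows. It is not the actual lowering code: `ValueRange`, `fitsInUByte` and `canUseByteCompare` are hypothetical names, and the sketch assumes the narrowed compare is performed as an unsigned one. The point is that the decision depends only on the operand's known range and the constant, not on whether the operand happens to be a cast node.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical inclusive range of values a node may produce.
struct ValueRange
{
    int64_t lo;
    int64_t hi;
};

// True if every value in the range is representable as an unsigned byte.
bool fitsInUByte(const ValueRange& range)
{
    return range.lo >= 0 && range.hi <= UINT8_MAX;
}

// A compare against a constant can use the byte-sized form when both the
// operand's range and the constant fit in an unsigned byte. This assumes
// the resulting compare is treated as unsigned.
bool canUseByteCompare(const ValueRange& operandRange, int64_t constant)
{
    return fitsInUByte(operandRange) && constant >= 0 && constant <= UINT8_MAX;
}
```

With these definitions, an operand known to be in `[0, 255]` compared against `7` qualifies, while the same compare on an operand with the full 32-bit range does not.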

> I hope to understand how to use the asserted ranges in a somewhat similar situation in division by constant lowering.

Heh... When I was designing `IntegralRange`, I really wanted it to work for your situation too, but that turned out not to be possible without making it slow on 32-bit, so I ended up with this constrained symbolic version that is not able to represent all values. That said, you could certainly utilize it: `IntegralRange::ForNode` will work in LIR too. You'll just need to add a new helper that extracts the number of bits from the range (I would imagine only positive ranges would be useful there, since the range is signed).
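As an illustration only, such a helper might look like the sketch below. The name `significantBits` and its signature are assumptions, and it takes plain 64-bit bounds instead of an `IntegralRange`. It returns the number of bits needed to hold any value of a non-negative range, and signals with `-1` that the range is not usable (negative lower bound or malformed bounds).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the number of bits needed to represent every value
// in the inclusive range [lowerBound, upperBound].
// Returns -1 when the range may contain negative values or is malformed.
int significantBits(int64_t lowerBound, int64_t upperBound)
{
    if (lowerBound < 0 || upperBound < lowerBound)
    {
        return -1;
    }

    int bits = 0;
    // Count how many right shifts it takes to clear the upper bound.
    for (uint64_t value = static_cast<uint64_t>(upperBound); value != 0; value >>= 1)
    {
        bits++;
    }
    return bits;
}
```

For example, a range of `[0, 255]` needs 8 bits and `[0, INT32_MAX]` needs 31, which is the kind of information a division-by-constant lowering could consume.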

Perhaps we should find some similar abstraction for bits, but I am not sure at this point how it would look.

SingleAccretion · 2021-10-04T17:58:18Z

Summarizing the failures:

- A couple crashes on CoreCLR Pri1 Runtime Tests Run OSX x64 checked - classifying as Test failure baseservices/exceptions/simple/ParallelCrashWorkerThreads/ParallelCrashWorkerThreads.sh #57621.
- Suspicious Test Infrastructure Failure: System.IO.IOException: No space left on device in CoreCLR Pri1 Runtime Tests Run R2R_CG2 OSX x64 checked, jitstress - CoreCLR Pri1 Runtime Tests Run OSX arm64 checked - classifying as infrastructure failure.
- A few "Assert failure(PID 1654 [0x00000676], Thread: 1654 [0x0676]): Verify_TypeLayout 'System.Numerics.Vector`1' failed to verify type layout" failures in CoreCLR Pri1 Runtime Tests Run R2R_CG2 Linux arm64 checked, CoreCLR Pri1 Runtime Tests Run R2R_CG2 Linux_musl arm64, CoreCLR Pri1 Runtime Tests Run R2R_CG2 windows arm64 checked. Seen in an earlier run, so classifying as not related.

pentp · 2021-10-05T13:12:03Z

> Perhaps we should find some similar abstraction for bits, but I am not sure at this point how it would look.

I wonder if instead of SymbolicIntegerValue we could just use two int8_t bounds in IntegralRange encoding the maximum number of bits set (with negative values representing significant bits of negative values). So LongMin = -63, ByteMin = -7, Zero = 0, One = 1, ByteMax = 7, UByteMax = 8, IntMax = 31, LongMax = 63 etc.
This would be more flexible and could account for some bitwise and arithmetic operations also (e.g., division by 7 always reducing the number of significant bits by 2).
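To make the proposed encoding concrete, here is a small sketch. It is one reading of the suggestion and not an existing JIT type: `BitRange`, `decodeBound`, `fitsIn`, `floorLog2` and `upperBitsAfterDivide` are hypothetical names. A bound `k >= 0` is taken to stand for the value `2^k - 1` and a bound `k < 0` for `-2^(-k)`, which matches the listed examples (`UByteMax = 8`, `ByteMin = -7`). The division helper assumes a non-negative dividend and a divisor of at least 1.

```cpp
#include <cassert>
#include <cstdint>

// Bounds store bit counts instead of values.
struct BitRange
{
    int8_t lowerBits;
    int8_t upperBits;
};

// Translate an encoded bound (assumed to be in [-63, 63]) to its value.
int64_t decodeBound(int8_t bits)
{
    if (bits >= 0)
    {
        return static_cast<int64_t>((uint64_t{1} << bits) - 1);
    }
    if (bits == -63)
    {
        return INT64_MIN; // -2^63 cannot be formed by negating 2^63
    }
    return -(int64_t{1} << -bits);
}

// The encoding is monotonic, so containment is a comparison of bit counts.
bool fitsIn(const BitRange& inner, const BitRange& outer)
{
    return outer.lowerBits <= inner.lowerBits && inner.upperBits <= outer.upperBits;
}

// Floor of log2(value); 'value' must be at least 1.
int floorLog2(uint64_t value)
{
    int result = 0;
    while (value >>= 1)
    {
        result++;
    }
    return result;
}

// Upper bit count after dividing a non-negative value by a constant:
// x <= 2^k - 1 implies x / d <= 2^(k - floorLog2(d)) - 1.
int8_t upperBitsAfterDivide(int8_t upperBits, uint64_t divisor)
{
    int reduced = upperBits - floorLog2(divisor);
    return static_cast<int8_t>(reduced > 0 ? reduced : 0);
}
```

In this model, dividing an unsigned byte value (8 bits) by 7 leaves at most 6 significant bits, which is the "reducing the number of significant bits by 2" case mentioned above.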

SingleAccretion · 2021-10-05T15:05:38Z

> I wonder if instead of SymbolicIntegerValue we could just use two int8_t bounds in IntegralRange encoding the maximum number of bits set (with negative values representing significant bits of negative values). So LongMin = -63, ByteMin = -7, Zero = 0, One = 1, ByteMax = 7, UByteMax = 8, IntMax = 31, LongMax = 63 etc.
> This would be more flexible and could account for some bitwise and arithmetic operations also (e.g., division by 7 always reducing the number of significant bits by 2).

I think this is a great suggestion, I will look into implementing it in the next round of IntegralRange-related changes.

jakobbotsch · 2021-10-08T13:21:46Z

Thanks again!

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Oct 2, 2021

ghost added the community-contribution label (indicates that the PR has been added by a community member) on Oct 2, 2021

SingleAccretion force-pushed the IntegralRange-Further-Work branch from 7c430a2 to e8e4c90 on October 2, 2021 16:16

SingleAccretion added 5 commits October 2, 2021 19:38

05050f7: Set call return type in fgMorphIntoHelperCall
It was just forgotten.

f79516c: Introduce IntegralRange::ForNode

653a45f: Use Range::ForNode in assertion propagation
Couple good diffs from the handling of GT_CALL.

54472c5: Fix IntegralRange::ForCastInput
The comments are right, the code is wrong... Fortunately, the bug was a pessimizing one, not a correctness issue.

d87cc73: Use IntegralRange in fgOptimizeCast
And reap the benefits.

SingleAccretion force-pushed the IntegralRange-Further-Work branch from e8e4c90 to d87cc73 on October 2, 2021 16:41

SingleAccretion marked this pull request as ready for review October 2, 2021 19:23

jakobbotsch reviewed Oct 4, 2021

jakobbotsch approved these changes Oct 4, 2021

jakobbotsch merged commit de00deb into dotnet:main Oct 8, 2021

SingleAccretion deleted the IntegralRange-Further-Work branch October 8, 2021 18:33

SingleAccretion mentioned this pull request Oct 9, 2021

Unable to build CLR: error: assigning to 'var_types' from incompatible type 'unsigned char' #60221

Closed

ghost locked as resolved and limited conversation to collaborators Nov 7, 2021
