-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[InstCombine] Fold (icmp pred (trunc nuw/nsw X), C)
-> (icmp pred X, (zext/sext C))
#87935
[InstCombine] Fold (icmp pred (trunc nuw/nsw X), C)
-> (icmp pred X, (zext/sext C))
#87935
Conversation
@llvm/pr-subscribers-llvm-transforms Author: None (goldsteinn) Changes
Full diff: https://github.com/llvm/llvm-project/pull/87935.diff 2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
index db302d7e526844..6a3f5ba7b0c1e7 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp
@@ -5705,6 +5705,22 @@ Instruction *InstCombinerImpl::foldICmpWithTrunc(ICmpInst &ICmp) {
ICmpInst::Predicate Pred = ICmp.getPredicate();
Value *Op0 = ICmp.getOperand(0), *Op1 = ICmp.getOperand(1);
+ // Match (icmp pred (trunc nuw/nsw X), C)
+ // Which we can convert to (icmp pred X, (sext/zext C))
+ // TODO: Maybe this makes sense as a general canonicalization?
+ if (match(Op1, m_ImmConstant())) {
+ if (auto *TI = dyn_cast<TruncInst>(Op0)) {
+ Value *ExtOp0 = TI->getOperand(0);
+ Value *ExtOp1 = nullptr;
+ if (!ICmp.isSigned() && TI->hasNoUnsignedWrap())
+ ExtOp1 = Builder.CreateZExt(Op1, ExtOp0->getType());
+ else if (!ICmp.isUnsigned() && TI->hasNoSignedWrap())
+ ExtOp1 = Builder.CreateSExt(Op1, ExtOp0->getType());
+ if (ExtOp1)
+ return new ICmpInst(Pred, ExtOp0, ExtOp1);
+ }
+ }
+
// Try to canonicalize trunc + compare-to-constant into a mask + cmp.
// The trunc masks high bits while the compare may effectively mask low bits.
Value *X;
diff --git a/llvm/test/Transforms/InstCombine/icmp-trunc.ll b/llvm/test/Transforms/InstCombine/icmp-trunc.ll
index b2de9dddb21947..13f83ac5db7798 100644
--- a/llvm/test/Transforms/InstCombine/icmp-trunc.ll
+++ b/llvm/test/Transforms/InstCombine/icmp-trunc.ll
@@ -555,3 +555,147 @@ define i1 @shl1_trunc_sgt4(i32 %a) {
%r = icmp sgt i16 %t, 4
ret i1 %r
}
+
+define i1 @eq_nuw(i32 %x) {
+; DL64-LABEL: @eq_nuw(
+; DL64-NEXT: [[TMP1:%.*]] = and i32 [[X:%.*]], 255
+; DL64-NEXT: [[R:%.*]] = icmp eq i32 [[TMP1]], 123
+; DL64-NEXT: ret i1 [[R]]
+;
+; DL8-LABEL: @eq_nuw(
+; DL8-NEXT: [[R:%.*]] = icmp eq i32 [[X:%.*]], 123
+; DL8-NEXT: ret i1 [[R]]
+;
+ %t = trunc nuw i32 %x to i8
+ %r = icmp eq i8 %t, 123
+ ret i1 %r
+}
+
+define i1 @ult_nuw(i32 %x) {
+; CHECK-LABEL: @ult_nuw(
+; CHECK-NEXT: [[R:%.*]] = icmp ult i32 [[X:%.*]], 45
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nuw i32 %x to i8
+ %r = icmp ult i8 %t, 45
+ ret i1 %r
+}
+
+define i1 @ule_nuw(i32 %x) {
+; CHECK-LABEL: @ule_nuw(
+; CHECK-NEXT: [[T:%.*]] = trunc nuw i32 [[X:%.*]] to i8
+; CHECK-NEXT: [[R:%.*]] = icmp ult i32 [[X]], 46
+; CHECK-NEXT: call void @use(i8 [[T]])
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nuw i32 %x to i8
+ %r = icmp ule i8 %t, 45
+ call void @use(i8 %t)
+ ret i1 %r
+}
+
+define i1 @ugt_nuw(i32 %x) {
+; CHECK-LABEL: @ugt_nuw(
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i32 [[X:%.*]], 12
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nuw i32 %x to i8
+ %r = icmp ugt i8 %t, 12
+ ret i1 %r
+}
+
+define i1 @uge_nuw(i48 %x) {
+; CHECK-LABEL: @uge_nuw(
+; CHECK-NEXT: [[T:%.*]] = trunc nuw i48 [[X:%.*]] to i8
+; CHECK-NEXT: [[R:%.*]] = icmp ugt i48 [[X]], 98
+; CHECK-NEXT: call void @use(i8 [[T]])
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nuw i48 %x to i8
+ %r = icmp uge i8 %t, 99
+ call void @use(i8 %t)
+ ret i1 %r
+}
+
+define i1 @sgt_nuw_fail(i32 %x) {
+; CHECK-LABEL: @sgt_nuw_fail(
+; CHECK-NEXT: [[T:%.*]] = trunc nuw i32 [[X:%.*]] to i8
+; CHECK-NEXT: [[R:%.*]] = icmp sgt i8 [[T]], 12
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nuw i32 %x to i8
+ %r = icmp sgt i8 %t, 12
+ ret i1 %r
+}
+
+define i1 @ne_nsw(i32 %x) {
+; DL64-LABEL: @ne_nsw(
+; DL64-NEXT: [[TMP1:%.*]] = and i32 [[X:%.*]], 255
+; DL64-NEXT: [[R:%.*]] = icmp ne i32 [[TMP1]], 123
+; DL64-NEXT: ret i1 [[R]]
+;
+; DL8-LABEL: @ne_nsw(
+; DL8-NEXT: [[R:%.*]] = icmp ne i32 [[X:%.*]], 123
+; DL8-NEXT: ret i1 [[R]]
+;
+ %t = trunc nsw i32 %x to i8
+ %r = icmp ne i8 %t, 123
+ ret i1 %r
+}
+
+define i1 @slt_nsw(i32 %x) {
+; CHECK-LABEL: @slt_nsw(
+; CHECK-NEXT: [[R:%.*]] = icmp slt i32 [[X:%.*]], 45
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nsw i32 %x to i8
+ %r = icmp slt i8 %t, 45
+ ret i1 %r
+}
+
+define i1 @sle_nsw(i32 %x) {
+; CHECK-LABEL: @sle_nsw(
+; CHECK-NEXT: [[T:%.*]] = trunc nsw i32 [[X:%.*]] to i8
+; CHECK-NEXT: [[R:%.*]] = icmp slt i32 [[X]], 46
+; CHECK-NEXT: call void @use(i8 [[T]])
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nsw i32 %x to i8
+ %r = icmp sle i8 %t, 45
+ call void @use(i8 %t)
+ ret i1 %r
+}
+
+define i1 @sgt_nsw(i32 %x) {
+; CHECK-LABEL: @sgt_nsw(
+; CHECK-NEXT: [[R:%.*]] = icmp sgt i32 [[X:%.*]], 12
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nsw i32 %x to i8
+ %r = icmp sgt i8 %t, 12
+ ret i1 %r
+}
+
+define i1 @sge_nsw(i48 %x) {
+; CHECK-LABEL: @sge_nsw(
+; CHECK-NEXT: [[T:%.*]] = trunc nsw i48 [[X:%.*]] to i8
+; CHECK-NEXT: [[R:%.*]] = icmp sgt i48 [[X]], 98
+; CHECK-NEXT: call void @use(i8 [[T]])
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nsw i48 %x to i8
+ %r = icmp sge i8 %t, 99
+ call void @use(i8 %t)
+ ret i1 %r
+}
+
+define i1 @ule_nsw_fail(i32 %x) {
+; CHECK-LABEL: @ule_nsw_fail(
+; CHECK-NEXT: [[T:%.*]] = trunc nsw i32 [[X:%.*]] to i8
+; CHECK-NEXT: [[R:%.*]] = icmp ult i8 [[T]], 46
+; CHECK-NEXT: ret i1 [[R]]
+;
+ %t = trunc nsw i32 %x to i8
+ %r = icmp ule i8 %t, 45
+ ret i1 %r
+}
|
(icmp pred (trunc nuw/nsw X), C)
-> (icmp pred X, (zext/sext C))
Think long term, we can probably generalize just about any fold we where previously doing with |
Generally fine, but before we start using trunc nuw/nsw in optimizations we need to perform some due diligence. Based on past experience, we very likely have folds that incorrectly preserve the nuw/nsw flag now. Approach is basically 1. replace "trunc" with "trunc nuw" in existing tests, 2. check whether there are failures with alive2, 3. repeat with trunc nsw. Let me know if you would be interested in doing that... |
Ill look into that. I would think the only way this really becomes an issue is if we have |
It tends to be either in-place modification, or returning of existing values. Here's a couple examples of issues I ran into with disjoint or: d8bc546 07c18a0 cd31cf5 We had similar with zext nneg. Most likely there will be some for trunc nuw/nsw as well. |
I went ahead and did this myself -- and didn't find any issues this time around. Maybe we don't have that many interesting trunc combines... |
Value *ExtOp1 = nullptr; | ||
if (!ICmp.isSigned() && TI->hasNoUnsignedWrap()) | ||
ExtOp1 = Builder.CreateZExt(Op1, ExtOp0->getType()); | ||
else if (!ICmp.isUnsigned() && TI->hasNoSignedWrap()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is actually also valid for isUnsigned()
predicates: https://alive2.llvm.org/ce/z/MiDiYZ
42fc67c
to
3f3b660
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks basically good to me, the only question I have is whether this should be behind a shouldChangeType check, similar to the other trunc+icmp combine?
If we have a trunc i128 to i8 or so, it's not clear that converting from an icmp i8 to icmp i128 is a good transform. (Similar for the i48 example in the tests.)
Yeah, think |
3f3b660
to
1f86ade
Compare
|
||
// Match (icmp pred (trunc nuw/nsw X), C) | ||
// Which we can convert to (icmp pred X, (sext/zext C)) | ||
if (isDesirableIntType(SrcBits) || shouldChangeType(DstBits, SrcBits)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need both isDesirableIntType and shouldChangeType here?
1f86ade
to
e6e5f52
Compare
ping |
IIRC @regehr is developing a fuzzer that mutates existing instcombine tests :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. There are some regressions. But I don't think we should block this patch.
if (!Cmp.isSigned() && Trunc->hasNoUnsignedWrap()) | ||
return new ICmpInst(Pred, X, ConstantInt::get(SrcTy, C.zext(SrcBits))); | ||
if (Trunc->hasNoSignedWrap()) | ||
return new ICmpInst(Pred, X, ConstantInt::get(SrcTy, C.sext(SrcBits))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the sign-extended constant as it may be cheaper to materialize than the zero-extended one (e.g., -1 vs 4294967296).
Don't do this until we have some performance data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, ill leave as a potential todo for now.
What would likely avoid regressions would be to limit this to one-use truncs. I assume that the main problem is that it's hard to infer facts about the truncated value after conditions have been changed to use the non-truncated value. |
By the way, I don't seem to be getting notifications when you force-push a PR, so I usually don't notice when you address feedback in PRs. Which is kind of weird, because I receive stupendous amounts of force-push spam from AtariDream's pull requests. I don't really get what the difference is. |
|
||
// Match (icmp pred (trunc nuw/nsw X), C) | ||
// Which we can convert to (icmp pred X, (sext/zext C)) | ||
if (shouldChangeType(DstBits, SrcBits)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be passing the types, not the bit width? That way we will not transform vector cases, for which we don't have sensible profitability heuristics here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, in this case the shouldChnageType
that accepts types just forwards the primitiveBitWidth
to the bitwidth helper, but if you think this version is a more future proof happy to change.
Edit: Misread the Type *
impl.
At least IMO, handling Vecs is beneficial. If we have no profitability heuristic think simpler IR is a fair tiebreaker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't really true for vectors. If the previous trunc + icmp sequence worked on legal types and you convert it into one that works on illegal types, that may be catastrophically worse (down to scalarization). There's a reason the special handling for vectors is there.
(This is especially true in the multi-use case, where you're not even removing the trunc.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a trivial example for aarch64: https://llvm.godbolt.org/z/K3hh3bx65 Even in the one-use case, getting rid of the trunc is worse for codegen. I haven't checked, but I expect your patch would currently do this transform because i32 is legal but i8 is not (for scalars).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
it is my student Yuyou who has written this, but I have had a hand in it, and I use it often. if anyone else wants to run it, build this branch of Alive2: https://github.com/Hatsunespica/alive2/tree/mutate it can be used as a standalone mutator, but also it can generate a mutant and analyze it using Alive2 without forking, which makes it very fast. I'm mostly fuzzing the AArch64 backend these days, so if anyone wants to do middle-end fuzzing, that would be good! |
NB, part of this is canonicalization imo. Original motivation was if we should extend the trunc handling in |
Hmm, would it help if I add a comment when I force-push? |
e6e5f52
to
0344081
Compare
…X, (zext/sext C))` This is valid as long as the sign of the wrap flag doesn't differ from the sign of the `pred`. Proofs: https://alive2.llvm.org/ce/z/35NsrR NB: The online Alive2 hasn't been updated with `trunc nuw/nsw` support, so the proofs must be reproduced locally.
0344081
to
5b4e99e
Compare
@nikic, without trying to sound passive aggressive or rushing. Do you get notifications when I comment in a PR, or are those lost like force-push notifications? edit: Wondering if I'm going to need to start tagging you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@goldsteinn I do get the comments. I also got a notification for your last force-push. But not for the preceding one, for example. |
This patch causing false MSAN reports. Would it be OK to revert the patch while I am investigating, to avoid false reports for us and other MSAN users? |
As in breaks MSAN, or MSAN now complains about LLVM?
I guess preferably not. Can you wait until there have some actual reports / an issue justifying. |
Msan reports false bugs in users code.
I have multiple reports on our internal codebase, I just need to minimize them. Inconveniently, I am traveling this week. Also if we wait, it can be branched into clang RC, which will need cherry-picks, potentially with conflicts. |
The thing is, we want this patch (from a codegen POV). It is necessary to cleanup regressions otherwise assosiated w/ #90436. |
I think it's okay to revert in the meantime. Though you might want to check whether #93163 has fixed the msan issue -- I feel like it may be related (incorrect inference of trunc nuw/nsw for values that may be undef).
This is still two months away, so that's unlikely :) |
#93163 didn't help.
I see it in a different way. With a reproducer it would be totally fair to revert this patch. However it's often better to revert early with just a promise to publish a reproducer slightly later. I guess I will have time in the next 24h to investigate. I am still open to 3 possibilities:
I will get back with the result soon. |
trivial reproducer https://godbolt.org/z/TaxWPPrdq before the patch bit field was completely isolated before comparison, now it just compares with different constants: Irrelevant bits of comparison arg are uninitialized. However Msan can't handle it for signed, but unsigned version is OK: https://godbolt.org/z/dMGeqaGEj |
I guess we can fix forward by enabling FYI @eugenis |
(icmp pred (trunc nuw/nsw X), C)
; NFC(icmp pred (trunc nuw/nsw X), C)
->(icmp pred X, (zext/sext C))
This is valid as long as the sign of the wrap flag doesn't differ from
the sign of the
pred
.Proofs: https://alive2.llvm.org/ce/z/mxWoFd
NB: The online Alive2 hasn't been updated with
trunc nuw/nsw
support, so the proofs must be reproduced locally.