-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Canonicalize API #8983
Improve Canonicalize API #8983
Conversation
/// | ||
/// assert_eq!(non_canonicalized, expr); | ||
/// ``` | ||
pub fn with_canonicalize(mut self, canonicalize: bool) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the new api that is part of the ExprSimplifer that mirrors the guarantees
@@ -98,6 +96,7 @@ impl SimplifyExpressions { | |||
// The left and right expressions in a Join on clause are not commutative, | |||
// since the order of the columns must match the order of the children. | |||
LogicalPlan::Join(_) => { | |||
let simplifier = ExprSimplifier::new(info).with_canonicalize(false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about moving the initialization of simplifier out like below? It looks cleaner. :)
let simplifier = match plan {
// Canonicalize step won't reorder expressions in a Join on clause.
// The left and right expressions in a Join on clause are not commutative,
// since the order of the columns must match the order of the children.
LogicalPlan::Join(_) => ExprSimplifier::new(info).with_canonicalize(false),
_ => ExprSimplifier::new(info),
};
let expr = plan
.expressions()
.into_iter()
.map(|e| {
// TODO: unify with `rewrite_preserving_name`
let original_name = e.name_for_alias()?;
let new_e = simplifier.simplify(e)?;
new_e.alias_if_changed(original_name)
})
.collect::<Result<Vec<_>>>()?;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good design! 😃
I helped review it and here are some of my thoughts.
Co-authored-by: Junhao Liu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks clean 👍
/// // Expression: a = c AND 1 = b | ||
/// let expr = col("a").eq(col("c")).and(lit(1).eq(col("b"))); | ||
/// | ||
/// // With canonicalization, the expression is rewritten to canonical form | ||
/// // (though it is no simpler in this case): | ||
/// let canonical = simplifier.simplify(expr.clone()).unwrap(); | ||
/// // Expression has been rewritten to: (c = a AND b = 1) | ||
/// assert_eq!(canonical, col("c").eq(col("a")).and(col("b").eq(lit(1)))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, seeing this just made me realize the doc for Canonicalize struct actually states the opposite order:
Probably can fix as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, that's my previous fault.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 1bb7d09
|
||
// The left and right expressions in a Join on clause are not commutative, | ||
// since the order of the columns must match the order of the children. | ||
// Thus, not reorder expressions in a Join `on` clause while simplifying |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I just took a look at the join node:
Was it actually a problem with the filter
clause and not the on
clause? 🤔
Because the on
here is a vec of tuples to represent left and right (I was assuming it was a single Expr before)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the problem with on
. 🤔
In my previous experience, it's a problem when the tuples of on
keeps as before when optimizing, but the left
and right
of Join
change, so they can't match.
Not pretty sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it is not entirely clear what is wrong with canonicalizing the expressions for joins. I'll add a note to the comment explaining that we don't really understand the issue (maybe someone can look into it as a follow on)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
Thanks for the review @yyy1000 -- I am now just waiting for another committer to approve the PR and I'll merge it |
Which issue does this PR close?
Part of #8724
Follow on to #8780
Rationale for this change
The new canonicalize API added in #8780:
ExprSimplifer::with_guarantees
What changes are included in this PR?
Simplifier::with_canonicalize
and update code to use itAre these changes tested?
Yes, with new doc example as well as existing code
Are there any user-facing changes?
Better api + documentation.
cc @Jefffrey