-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove some Expr clones in EliminateCrossJoin
(3%-5% faster planning)
#10430
Remove some Expr clones in EliminateCrossJoin
(3%-5% faster planning)
#10430
Conversation
@@ -250,90 +265,67 @@ fn find_inner_join( | |||
})) | |||
} | |||
|
|||
fn intersect( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was moved into JoinExprSet
accum.push((*left.clone(), *right.clone())); | ||
} | ||
// insert handles ensuring we don't add the same Join keys multiple times | ||
join_keys.insert(left, right); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JoinExprSet does not require cloning to check if the exprs should be inserted
match op { | ||
Operator::Eq => { | ||
if join_keys.contains(&(*left.clone(), *right.clone())) | ||
|| join_keys.contains(&(*right.clone(), *left.clone())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there were several clone
s (deep copies) here too to simply check if the the join keys contained the expressions which have been removed.
/// | ||
/// 1. Retains insert order | ||
/// 2. Can quickly look up if a pair of expressions are in the set. | ||
#[derive(Debug)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I extracted all the relevant behavior for managing join keys into this file both to document it as well as to allow checking if a join key is present without having to clone Exprs
/// # Returns | ||
/// * `Some()` when there are few remaining predicates in filter_expr | ||
/// * `None` otherwise | ||
fn remove_join_expressions(expr: Expr, join_keys: &JoinKeySet) -> Option<Expr> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API previously cloned on all paths (including to check join keys) -- the new API does not
Also it never returns Err
so I changed the signature to Option
from Result<Option<..>>
EliminateCrossJoin
EliminateCrossJoin
(2%-5% faster planning)
EliminateCrossJoin
(2%-5% faster planning)EliminateCrossJoin
(3%-5% faster planning)
75ef06a
to
cdda848
Compare
@@ -131,7 +131,7 @@ impl OptimizerRule for EliminateCrossJoin { | |||
.map(|f| Some(LogicalPlan::Filter(f))) | |||
} else { | |||
// Remove join expressions from filter: | |||
match remove_join_expressions(predicate, &all_join_keys)? { | |||
match remove_join_expressions(predicate.clone(), &all_join_keys) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This clone used to happen inside remove_join_expressions
so I moved it up so it is clearer (and sets the stage to avoid the clone entirely in the next PR)
@@ -150,7 +150,7 @@ impl OptimizerRule for EliminateCrossJoin { | |||
/// Returns a boolean indicating whether the flattening was successful. | |||
fn try_flatten_join_inputs( | |||
plan: &LogicalPlan, | |||
possible_join_keys: &mut Vec<(Expr, Expr)>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the core change in this PR -- instead of passing around &HashSet or &Vec()
, they are encapsulated into a struct now which is much more careful to clone only when needed
possible_join_keys: &[(Expr, Expr)], | ||
all_join_keys: &mut HashSet<(Expr, Expr)>, | ||
possible_join_keys: &JoinKeySet, | ||
all_join_keys: &mut JoinKeySet, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is nice unification
…liminate_cross_join
|
||
/// Return an iterator over the join keys in this set | ||
pub fn iter(&self) -> impl Iterator<Item = (&Expr, &Expr)> { | ||
self.inner.iter().map(|(l, r)| (l, r)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need this map here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes: It is (annoying) subtle -- this transforms a iterator of &(Expr, Expr)
to an iterator of (&Expr, &Expr)
It is somewhat annoying that Rust does this conversion implicitly in the map closure but can't do it when in the return type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is 💪
Co-authored-by: comphead <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb
Thanks for the review @comphead 🚀 |
apache#10430) * Remove some Expr clones in `EliminateCrossJoin` * Apply suggestions from code review Co-authored-by: comphead <[email protected]> * fix --------- Co-authored-by: comphead <[email protected]>
Draft as it builds on #10427Which issue does this PR close?
Part of #10287
Rationale for this change
I am trying to avoid copying as much in
EliminateCrossJoin
to speed up planning.One thing this rule does is quite a few
Expr
copies while checking exprs for join keysWhat changes are included in this PR?
JoinExprSet
structJoinExprSet
JoinExprSet
Are these changes tested?
Existing CI
Are there any user-facing changes?
No functional changes
Performance tests show 3-5% faster for TPCH / TPDS queries
Details