-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add eliminate group by constant optimizer rule #10591
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a really nice contribution @korowa (as always). I think the code is well tested, documented and explained.
I had some suggestions, but I don't think they are required for merging.
Thanks again
05)------Projection: COUNT(*), t2.t2_int, __always_true | ||
06)--------Aggregate: groupBy=[[t2.t2_int, Boolean(true) AS __always_true]], aggr=[[COUNT(Int64(1)) AS COUNT(*)]] | ||
05)------Projection: COUNT(*), t2.t2_int, Boolean(true) AS __always_true | ||
06)--------Aggregate: groupBy=[[t2.t2_int]], aggr=[[COUNT(Int64(1)) AS COUNT(*)]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this certainly looks like a plan improvement (fewer grouping keys) -- among other thing that would also allow this query to use the faster specialized group accumulator
/// | ||
/// Intended to be used only within this rule, helper function, which heavily | ||
/// reiles on `SimplifyExpressions` result. | ||
fn is_constant_expression(expr: &Expr) -> bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could use TreeNode::Exists
here:
datafusion/datafusion/common/src/tree_node.rs
Line 408 in 045e8fc
fn exists<F: FnMut(&Self) -> Result<bool>>(&self, mut f: F) -> Result<bool> { |
Which handles all types of expressions and would be general
Something like this, perhaps (untested)
fn is_constant_expression(expr: &Expr) -> bool {
!expr.exsts(|expr| {
match expr {
Expr::Column(_) => true,
Expr::ScalarFunction(e) if matches!(
e.func.signature().volatility,
Volatility::Volatile
) => true,
_ => false
})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the hint. I've checked id out, and seems like there is no TreeNode API method for this case, since it requires all arguments of scalar function to be literals, and exists
will return on first match.
Since this PR looks good and makes improvements I am going to merge it in. @korowa perhaps you can consider the feedback on #10591 (comment) as a follow on PR 🚀 |
* feat: eliminate group by constant optimizer rule * fix tests
Which issue does this PR close?
Closes #.
Rationale for this change
Initial intention was to improve clickbench q34 -- it contains aggregation by constant and URL which makes aggregation switch to row-based mode which is slower that single field group by.
In general, elimination of constants from
GROUP BY
seems to be reasonable optimization.What changes are included in this PR?
New logical optimizer rule which removes constant or originated from constant expressions from
Aggregate::group_expr
, and places additional projection on top of aggregation, to satisfy original schema for downstream operators.Optimizer rule is placed between last
SimplifyExpressions
andOptimizeProjections
, since it requires constant evaluation, and projections, produced by this rule, may be merged.Are these changes tested?
New unit tests and sqllogictests
Are there any user-facing changes?
No