Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add eliminate group by constant optimizer rule #10591

Merged
merged 2 commits into from
May 22, 2024

Conversation

korowa
Copy link
Contributor

@korowa korowa commented May 20, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

Initial intention was to improve clickbench q34 -- it contains aggregation by constant and URL which makes aggregation switch to row-based mode which is slower that single field group by.

In general, elimination of constants from GROUP BY seems to be reasonable optimization.

What changes are included in this PR?

New logical optimizer rule which removes constant or originated from constant expressions from Aggregate::group_expr, and places additional projection on top of aggregation, to satisfy original schema for downstream operators.

Optimizer rule is placed between last SimplifyExpressions and OptimizeProjections, since it requires constant evaluation, and projections, produced by this rule, may be merged.

Are these changes tested?

New unit tests and sqllogictests

Are there any user-facing changes?

No

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels May 20, 2024
@alamb alamb changed the title feat: eliminate group by constant optimizer rule feat: Add eliminate group by constant optimizer rule May 21, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really nice contribution @korowa (as always). I think the code is well tested, documented and explained.

I had some suggestions, but I don't think they are required for merging.

Thanks again

05)------Projection: COUNT(*), t2.t2_int, __always_true
06)--------Aggregate: groupBy=[[t2.t2_int, Boolean(true) AS __always_true]], aggr=[[COUNT(Int64(1)) AS COUNT(*)]]
05)------Projection: COUNT(*), t2.t2_int, Boolean(true) AS __always_true
06)--------Aggregate: groupBy=[[t2.t2_int]], aggr=[[COUNT(Int64(1)) AS COUNT(*)]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this certainly looks like a plan improvement (fewer grouping keys) -- among other thing that would also allow this query to use the faster specialized group accumulator

///
/// Intended to be used only within this rule, helper function, which heavily
/// reiles on `SimplifyExpressions` result.
fn is_constant_expression(expr: &Expr) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could use TreeNode::Exists here:

fn exists<F: FnMut(&Self) -> Result<bool>>(&self, mut f: F) -> Result<bool> {

Which handles all types of expressions and would be general

Something like this, perhaps (untested)

fn is_constant_expression(expr: &Expr) -> bool {
  !expr.exsts(|expr| {
    match expr { 
      Expr::Column(_) => true,
      Expr::ScalarFunction(e) if matches!(
                e.func.signature().volatility,
                Volatility::Volatile
      ) => true,
      _ => false
 })

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the hint. I've checked id out, and seems like there is no TreeNode API method for this case, since it requires all arguments of scalar function to be literals, and exists will return on first match.

@alamb
Copy link
Contributor

alamb commented May 22, 2024

Since this PR looks good and makes improvements I am going to merge it in. @korowa perhaps you can consider the feedback on #10591 (comment) as a follow on PR

🚀

@alamb alamb merged commit cde8f59 into apache:main May 22, 2024
24 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* feat: eliminate group by constant optimizer rule

* fix tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants