-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove redundant Aggregate when DISTINCT
& GROUP BY
are in the same query
#11781
Remove redundant Aggregate when DISTINCT
& GROUP BY
are in the same query
#11781
Conversation
Fix deploying DataFusion site error
…eature/eliminate-aggregate # Conflicts: # .github/workflows/docs.yaml # datafusion/optimizer/src/lib.rs # datafusion/optimizer/src/optimizer.rs # datafusion/optimizer/src/single_distinct_to_groupby.rs
implement as rewrite function
…eature/eliminate-aggregate
add additional tests for not removing cases
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
Co-authored-by: Mehmet Ozan Kabak <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reviewed this and it LGTM. Easy win!
DISTINCT
& GROUP BY
DISTINCT
& GROUP BY
DISTINCT
& GROUP BY
are in the same query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @mertak-synnada and @ozankabak
@@ -4536,19 +4536,14 @@ EXPLAIN SELECT DISTINCT c3 FROM aggregate_test_100 group by c3 limit 5; | |||
logical_plan | |||
01)Limit: skip=0, fetch=5 | |||
02)--Aggregate: groupBy=[[aggregate_test_100.c3]], aggr=[[]] | |||
03)----Aggregate: groupBy=[[aggregate_test_100.c3]], aggr=[[]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These certainly look like bettr plans to me.
However, I wonder if these changes now mean the query doesn't fulfill the original intent of the tests
#
# Push limit into distinct group-by aggregation tests
#
Therefore I wonder if we should add another test like
EXPLAIN SELECT DISTINCT c3, min(c1) FROM aggregate_test_100 group by c3 limit 5;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW in my opinion it would be fine to add the other tests as a new PR too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, let's do it that way
Which issue does this PR close?
Closes #.
Rationale for this change
Inefficient planning is produced when a redundant DISTINCT & GROUP BY clause is used together.
Eg.
Consecutive empty aggregates with same group by are unnecessary.
What changes are included in this PR?
It's fixed that if a distinct is the same as a following group by statement, it should not be turned into an Aggregate plan.
Are these changes tested?
Yes
Are there any user-facing changes?
No