-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support unnest
in GROUP BY clause
#11469
Conversation
I wonder why we need to specialize group by expression for unnest, does that mean we need to specialize unnest with filter by too? 🤔 |
So, I tried to implement it in the community version. As far as I know, pg does not currently support unnest in filter, so I'm not sure it needs to be implemented. |
Thank you for your reply. 1 If "keep the recursive unnest logic within the aggregate function body" is implemented, at least the following work needs to be done: 2 If 'Recursive Unnest' is handled like DuckDB, how do we implement the unnest(unnest(list_column)) syntax when the user needs it? By optimization rules? |
@JasonLi-cn I mistakenly thought |
@JasonLi-cn Is it better to move the logic into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @JasonLi-cn -- I think this PR looks good to me. Thank you for the contribution
I think adding some comments about what try_process_group_by_unnest
is doing and why would help improve this code but I don't think it is required prior to merge
Thank you @alamb for your review. I have completed the code modification according to your modification suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments, but I think it can be merged as is. It works well. Perhaps we can try to improve it through subsequent PRs.
datafusion/sql/src/select.rs
Outdated
// Projection: tab.array_col AS unnest(tab.array_col) | ||
// TableScan: tab | ||
// ``` | ||
let mut intermediate_plan = input.clone(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to avoid this clone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed 810ca8b which avoids the cloning in this function
|
||
/// Try converting Unnest(Expr) of group by to Unnest/Projection | ||
/// Return the new input and group_by_exprs of Aggregate. | ||
fn try_process_group_by_unnest( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function seems to have some code repetition with function try_process_unnest.
I wonder if there's a better way to handle this, such as planning unnest before aggregation, and then reusing the current group-by planning logic. This seems more intuitive to me. But I'm not sure about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked in #11498
@@ -297,6 +298,9 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { | |||
input: LogicalPlan, | |||
select_exprs: Vec<Expr>, | |||
) -> Result<LogicalPlan> { | |||
// Try process group by unnest | |||
let input = self.try_process_aggregate_unnest(input)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If unnest has already been processed by try_process_aggregate_unnest
, does the following logic for handling unnest
become redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. We need to verify this logic in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracked in #11498
unnest
in GROUP BY clause
I merged up to resolve a conflict as well |
Thank you @alamb for your help to modify the code, I learned a lot from it. |
Thanks @JasonLi-cn @alamb |
* feat: support group by unnest * pass slt * refactor: mv process_group_by_unnest into try_process_unnest * chore: add some documentation comments and tests * Avoid cloning input * use consistent field names --------- Co-authored-by: Andrew Lamb <[email protected]>
* feat: support group by unnest * pass slt * refactor: mv process_group_by_unnest into try_process_unnest * chore: add some documentation comments and tests * Avoid cloning input * use consistent field names --------- Co-authored-by: Andrew Lamb <[email protected]>
* feat: support group by unnest * pass slt * refactor: mv process_group_by_unnest into try_process_unnest * chore: add some documentation comments and tests * Avoid cloning input * use consistent field names --------- Co-authored-by: Andrew Lamb <[email protected]>
* feat: support group by unnest * pass slt * refactor: mv process_group_by_unnest into try_process_unnest * chore: add some documentation comments and tests * Avoid cloning input * use consistent field names --------- Co-authored-by: Andrew Lamb <[email protected]>
Which issue does this PR close?
Closes #11470
Rationale for this change
support SQLs like:
What changes are included in this PR?
Are these changes tested?
Yes. Has added some test cases into
unnest.slt
.Are there any user-facing changes?
Simplified SQL syntax, eg:
can replace by