Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minor: inconsistent group by position planning #10679

Merged
merged 3 commits into from
May 28, 2024

Conversation

korowa
Copy link
Contributor

@korowa korowa commented May 26, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

While #10591 found that planning for INT literals in GROUP BY clause is inconsistent. For example:

EXPLAIN SELECT "URL", COUNT(1) from hits GROUP BY 1, 3, 4, 5

produces following logical plan (this is logical plan before any optimizations):

Explain
  Projection: hits.URL, COUNT(Int64(1))
    Aggregate: groupBy=[[hits.URL, Int64(3), Int64(4), Int64(5)]], aggr=[[COUNT(Int64(1))]]
      TableScan: hits

1 resolved as column from select clause, and 3, 4, 5 remain as constants. Don't think this behavior is intended + there are couple of tests in sql_integration.rs (modified below), for similar cases, expect queries to fail.

The suggestion is to fail while planning phase for such cases, and return error, specifying first index which can not be resolved as select column.

What changes are included in this PR?

Resolution for positions (integer literals in GROUP BY clause) now returns error on first unresolved position.

SIgnature for resolve_positions_to_exprs changed to avoid unnecessary cloning. resolve_aliases_to_exprs signature has similar changes, for the same reasons (it's been changed just for consistency with resolve_positions_... function).

Are these changes tested?

Added sqllogictests + some test coverage for resolve_positions_to_exprs

Are there any user-facing changes?

Users might experience query planning failures for GROUP BY %idx% where idx is out of bounds of SELECT expressions list (which seems to be OK from SQL semantics point of view -- PG/MySQL handle this case in the same way).

@github-actions github-actions bot added sql SQL Planner sqllogictest SQL Logic Tests (.slt) labels May 26, 2024
@korowa korowa changed the title fix: inconsistent group by position planning minor: inconsistent group by position planning May 26, 2024
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @korowa -- this looks like a nice usability improvement to me.

I had a suggestion for improved message but I don't think it is needed

cc @jonahgao who I think is an expert in this area

Copy link
Member

@jonahgao jonahgao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me 👍

I have verified that the behavior of this ordinal reference aligns with PostgreSQL.
The work to eliminate cloning is also great.

@alamb alamb merged commit 3dc1773 into apache:main May 28, 2024
23 checks passed
@alamb
Copy link
Contributor

alamb commented May 28, 2024

Thanks again @korowa and @jonahgao

findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* fix: inconsistent group by position planning

* Update datafusion/sql/src/utils.rs

Co-authored-by: Andrew Lamb <[email protected]>

* improved error message

---------

Co-authored-by: Andrew Lamb <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants