-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix update_expr for projection pushdown #9096
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -916,7 +916,9 @@ fn update_expr( | |||
.find_map(|(index, (projected_expr, alias))| { | |||
projected_expr.as_any().downcast_ref::<Column>().and_then( | |||
|projected_column| { | |||
column.name().eq(projected_column.name()).then(|| { | |||
(column.name().eq(projected_column.name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is another similar check a bit further down in the file in new_columns_for_join_on
Likewise in new_indices_for_join_filter
I wonder if those checks need to extended to check the index as well? Maybe we could extract the "find matching projection column" logic as a function 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am working on a new, more comprehensive and less error-prone projection optimizer. I plan to open a PR after our internal review. Just so you know, all these utils will be removed in the new version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, then I don't touch other places like new_columns_for_join_on
, but only address the reported issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides, for new_columns_for_join_on
, because it applies on particular joining side (left or right) with left/right joining keys and left/right projection expressions. It should not have the same issue as update_expr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto for new_indices_for_join_filter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If another rewrite is imminent, it is all the more important that we thoroughly test for this behavior
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for new_columns_for_join_on
and new_indices_for_join_filter
, you won't hit a similar issue like this. The reason is #9096 (comment).
Alice 100 | ||
Alice 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems that after this PR the results matches expected ones, in comment above.
Should we update the comment?
the current query results are incorrect, becuase the query was incorrectly rewritten as:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we should have both test-cases (just in case):
SELECT t1.a, t1.b FROM t1 JOIN t2 ON t1.a = t2.a ORDER BY t1.a, t1.b;
Alice 50
Alice 50
Alice 100
Alice 100
SELECT t1.a, t1.b FROM t1 JOIN t2 ON t1.a = t2.a ORDER BY t1.a, t2.b;
Alice 50
Alice 100
Alice 50
Alice 100
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, why do we have test-cases with incorrect results, knowing this?
It seems it would be useful to have a set of tests that can fail, but do not block CI, etc (something like TDD, we know it fails and we can make it work)
It will be much easier to find test-cases that don't work right now. The tests won't even have to be changed, except perhaps excluded from the "expectedly fail" category.
сс @alamb
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch. Yea, this existing test now matches correct behaviors after this PR. Actually it is same issue (order by with same column name in joined sides).
Updated the test case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it would be useful to have a set of tests that can fail, but do not block CI, etc (something like TDD, we know it fails and we can make it work)
I agree this would be useful -- we can do it in the rust tests with #[ignore]
. Maybe we need to add something similar to sqllogictests
It will be much easier to find test-cases that don't work right now. The tests won't even have to be changed, except perhaps excluded from the "expectedly fail" category.
One way to do this is to search for issue links https://github.com...
@viirya huge thanks for such a reactive response to the issue! 🚀 |
Operator::Modulo, | ||
Arc::new(Column::new("e", 5)), | ||
Arc::new(Column::new("e", 4)), | ||
))), | ||
)?), | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than changing this test like that,
]; | |
let projected_exprs: Vec<(Arc<dyn PhysicalExpr>, String)> = vec![ | |
(Arc::new(Column::new("a", 3)), "a".to_owned()), | |
(Arc::new(Column::new("b", 1)), "b_new".to_owned()), | |
(Arc::new(Column::new("c", 0)), "c".to_owned()), | |
(Arc::new(Column::new("d", 2)), "d_new".to_owned()), | |
(Arc::new(Column::new("e", 5)), "e".to_owned()), | |
(Arc::new(Column::new("f", 4)), "f_new".to_owned()), | |
]; | |
these changes will be simpler and greater coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, updated.
Thank you @alamb @DDtKey @berkaysynnada |
BTW #9111 tracks the work @berkaysynnada is doing for ProjectionPushdown in case anyone has thoughts they would like to share |
Which issue does this PR close?
Closes #9091.
Rationale for this change
projection_pushdown
rule uses functionupdate_expr
to check if a plan's expressions could be updated with projection expressions. The function only checks column name so if there are columns with same name, the check will return incorrectRewrittenValid
state. Column index should be checked too in the functionupdate_expr
.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?