-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: panic and incorrect results in LogFunc::output_ordering()
#11571
Conversation
@@ -547,32 +554,32 @@ physical_plan | |||
04)------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c11], output_ordering=[c11@0 ASC NULLS LAST], has_header=true | |||
|
|||
query TT | |||
EXPLAIN SELECT LOG(c11, c12) as log_c11_base_c12 | |||
EXPLAIN SELECT LOG(c12, c11) as log_c11_base_c12 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
base
should be placed in the first parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this rapid fix!
It's not the problem for this PR, but I found this output_ordering calculation is extremely hard to get all cases correct, I think the correct way to implement it is to do some hard coded exact match only for a small number of obvious cases. For the rest cases we could blindly return unordered
.
If the example I gave is not mistaken, we can open an issue to address it and fix later.
match (input[0].sort_properties, input[1].sort_properties) { | ||
let (base_sort_properties, num_sort_properties) = if input.len() == 1 { | ||
// log(x) defaults to log(10, x) | ||
(SortProperties::Singleton, input[0].sort_properties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change makes sense to me 👍🏼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
} else { | ||
(input[0].sort_properties, input[1].sort_properties) | ||
}; | ||
match (num_sort_properties, base_sort_properties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the following logic is not completely correct 🤔
e.g.
base is DESC NULLS FIRST
num is ASC NULLS LAST
This implementation will return log(base, num)
is ASC NULLS LAST
But the actual output might look like
NULL
NULL
1
2
NULL
Which is unordered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. I noticed that other logic also does not consider nulls_first
, for example
datafusion/datafusion/expr/src/sort_properties.rs
Lines 48 to 62 in 36660fe
pub fn add(&self, rhs: &Self) -> Self { | |
match (self, rhs) { | |
(Self::Singleton, _) => *rhs, | |
(_, Self::Singleton) => *self, | |
(Self::Ordered(lhs), Self::Ordered(rhs)) | |
if lhs.descending == rhs.descending => | |
{ | |
Self::Ordered(SortOptions { | |
descending: lhs.descending, | |
nulls_first: lhs.nulls_first || rhs.nulls_first, | |
}) | |
} | |
_ => Self::Unordered, | |
} | |
} |
Maybe @berkaysynnada can help confirm it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. We also need to check the placement of nulls in output_ordering()
(and possibly in other implementations of it for multi-input functions). As @jonahgao noticed, impl SortProperties {}
needs similar handling as well, but that can be addressed in a separate PR. I can create a ticket for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the nulls_first
problem for log
by 3feeaa8. We can create separate PRs for other fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix looks good to me. Thanks, @jonahgao. Could you also address the missing cases of nulls order checks, or do we report a ticket?
match (input[0].sort_properties, input[1].sort_properties) { | ||
let (base_sort_properties, num_sort_properties) = if input.len() == 1 { | ||
// log(x) defaults to log(10, x) | ||
(SortProperties::Singleton, input[0].sort_properties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍🏻
} else { | ||
(input[0].sort_properties, input[1].sort_properties) | ||
}; | ||
match (num_sort_properties, base_sort_properties) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. We also need to check the placement of nulls in output_ordering()
(and possibly in other implementations of it for multi-input functions). As @jonahgao noticed, impl SortProperties {}
needs similar handling as well, but that can be addressed in a separate PR. I can create a ticket for it.
SortProperties::Singleton, | ||
]; | ||
assert_eq!(results, expected); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding these tests. output_ordering()
API of scalar functions are experimental and needs unit tests (#10595). This is a good example for the rest of them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_ordering() API of scalar functions are experimental
What do you mean by "experimental"? If that is the case perhaps we can update the documentation to explain what that means -- I don't see any mention of it
@@ -512,7 +519,7 @@ CREATE EXTERNAL TABLE aggregate_test_100 ( | |||
) | |||
STORED AS CSV | |||
WITH ORDER(c11) | |||
WITH ORDER(c12 DESC) | |||
WITH ORDER(c12 DESC NULLS LAST) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make c11
and c12
have the same nulls_first
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jonahgao @berkaysynnada and @2010YOUY01 -- I think this PR is an improvement over what is on main,
As you have pointed out, filing some tickets to track the remaining work would be a good idea.
SortProperties::Singleton, | ||
]; | ||
assert_eq!(results, expected); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
output_ordering() API of scalar functions are experimental
What do you mean by "experimental"? If that is the case perhaps we can update the documentation to explain what that means -- I don't see any mention of it
Thanks all. The remaining issues are being tracked in ticket #11596 |
@alamb I guess I used a wrong terminology. I meant that |
Which issue does this PR close?
Closes #11549.
Rationale for this change
log
has two parameters, but this does not hold forlog(100)
, which is equivalent tolog(10, 100)
. This will cause panic due to index out of bounds.base
, causing incorrect ordering results.Ref: https://datafusion.apache.org/user-guide/sql/scalar_functions.html#log
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?
No