Fix expr partial ord test #8908

Merged: 3 commits, Jan 22, 2024
12 changes: 8 additions & 4 deletions datafusion/expr/src/expr.rs
@@ -1869,10 +1869,14 @@ mod test {
let exp2 = col("a") + lit(2);
let exp3 = !(col("a") + lit(2));

assert!(exp1 < exp2);
assert!(exp2 > exp1);
assert!(exp2 > exp3);
assert!(exp3 < exp2);
// Since comparisons are done using hash value of the expression
Contributor

Hm, but does that actually mean the comparison itself sometimes does not work correctly?

Contributor

I think the calculation is consistent (it will always be either < or >).

However, I wonder whether the hash values are always consistent (for example, do they vary between x86_64 and M1 platforms?). Does that matter? 🤔

Contributor

Ahash is not a stable hash algorithm: its output may change not only between platforms but also between minor releases. I lack context on why we are ordering based on hashes, but my initial response is that this is probably incorrect, especially if it is inconsistent with the equality relation.
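
To make the stability concern concrete, here is a minimal sketch. It does not use ahash; it assumes the standard library's `RandomState` as a stand-in for any keyed hasher, and `hash_less_than` is a hypothetical helper. Within one hasher state the hash-based order is repeatable, but nothing ties the order under one state to the order under another.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;

/// Hypothetical helper: orders two strings by their hash under `state`.
/// This only mimics the idea of hash-based ordering, not DataFusion's code.
fn hash_less_than(state: &RandomState, a: &str, b: &str) -> bool {
    state.hash_one(a) < state.hash_one(b)
}

fn main() {
    // Each `RandomState::new()` uses different keys, so the two states
    // generally hash the same input to different values.
    let s1 = RandomState::new();
    let s2 = RandomState::new();

    // The two answers may or may not agree; neither result is guaranteed.
    println!("state 1: col_a < col_b is {}", hash_less_than(&s1, "col_a", "col_b"));
    println!("state 2: col_a < col_b is {}", hash_less_than(&s2, "col_a", "col_b"));
}
```

A test that hard-codes `exp1 < exp2` under such an ordering is therefore asserting something that depends on the hasher's keys and implementation.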

Contributor

> but my initial response is this is probably incorrect, especially if it is inconsistent with the equality relation.

Why is it "inconsistent with equality"? If two exprs are equal, they have the same hash, so the order between hash values would be consistent (on a given platform and release).

That is not to say we shouldn't change how ordering is done; I just don't understand this comment.

Contributor @tustvold, Jan 21, 2024

I haven't checked this, but if the PartialEq implementation isn't also using this hash approach, the two would be inconsistent.

Edit: as suspected, it looks like PartialEq is derived using a proc macro and is therefore inconsistent with this implementation of PartialOrd: https://github.com/apache/arrow-datafusion/blob/main/datafusion/expr/src/expr.rs#L87. In particular, PartialOrd could claim two values are equal because of a hash collision when PartialEq would indicate they are not.
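
The inconsistency described above can be sketched in a few lines. Everything here is hypothetical: `Node` is a stand-in rather than DataFusion's `Expr`, and `toy_hash` is deliberately weak so that a collision is easy to trigger. With a real hasher collisions are rare, but the shape of the problem is the same.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for an expression type, with structural equality.
#[derive(Debug, PartialEq)]
struct Node(u64);

impl Node {
    /// Deliberately weak hash (an assumption for illustration only):
    /// values that differ by a multiple of 16 collide.
    fn toy_hash(&self) -> u64 {
        self.0 % 16
    }
}

/// Ordering based purely on the hash value, ignoring structure.
impl PartialOrd for Node {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        self.toy_hash().partial_cmp(&other.toy_hash())
    }
}

fn main() {
    let a = Node(1);
    let b = Node(17); // collides with `a` under `toy_hash`

    // Derived equality says the two values differ...
    assert!(a != b);
    // ...but the hash-based ordering reports them as equal.
    assert_eq!(a.partial_cmp(&b), Some(Ordering::Equal));
    assert!(a <= b && a >= b);
}
```

The `PartialOrd` documentation requires that `a == b` holds if and only if `partial_cmp(a, b) == Some(Equal)`, which this pairing violates whenever two unequal values collide.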

// expr < expr2 may return false, or true. There is no guaranteed result.
// The only guarantee is "<" operator should have the opposite result of ">=" operator
let greater_or_equal = exp1 >= exp2;
assert_eq!(exp1 < exp2, !greater_or_equal);

let greater_or_equal = exp3 >= exp2;
assert_eq!(exp3 < exp2, !greater_or_equal);
}

#[test]
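
The property the updated test checks, that `<` is the negation of `>=`, can be written as a small generic helper. This is a sketch with a hypothetical function name, not code from the PR. It also shows the one assumption the property relies on: `partial_cmp` must never return `None` for the pair being compared.

```rust
/// Returns true when `a < b` is exactly the negation of `a >= b`.
/// Hypothetical helper for illustration; not part of DataFusion.
fn lt_is_negation_of_ge<T: PartialOrd>(a: &T, b: &T) -> bool {
    (a < b) == !(a >= b)
}

fn main() {
    // Holds for any pair that `partial_cmp` can actually order.
    assert!(lt_is_negation_of_ge(&1, &2));
    assert!(lt_is_negation_of_ge(&2, &1));
    assert!(lt_is_negation_of_ge(&"a", &"a"));

    // Fails for incomparable values: NaN is neither `<` nor `>=` anything,
    // so both comparisons are false and the negation does not hold.
    assert!(!lt_is_negation_of_ge(&f64::NAN, &1.0));
}
```

Because a hash-based `partial_cmp` compares two integers, it always returns `Some(..)`, so the property holds regardless of which expression happens to hash lower.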