-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add string aggregate grouping fuzz test, add MemTable::with_sort_exprs
#9190
Conversation
@@ -58,6 +60,9 @@ pub struct MemTable { | |||
pub(crate) batches: Vec<PartitionData>, | |||
constraints: Constraints, | |||
column_defaults: HashMap<String, Expr>, | |||
/// Optional pre-known sort order(s). Must be `SortExpr`s. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I needed this feature to write a test that ran with sorted aggregates
|
||
/// Return a string of random characters of length 1..=max_len | ||
fn random_string(rng: &mut StdRng, max_len: usize) -> String { | ||
// pick characters at random (not just ascii) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats nice....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @alamb
this makes me think that fuzz tests require a lot of code to maintain the tests, so we probably may want to think about having a separate crate for fuzz testing like you did for array funtions in #9113
Thanks @comphead -- I agree keeping the fuzz testing code in a central place is a good ideea. In this case, some non trivial chunk of the code is in the test-utils
crate (not published to crates.io, etc):
https://github.com/apache/arrow-datafusion/tree/main/test-utils
And then is used as a dev dependency
I think longer term consolidating fuzz testing into its own crate (like we did for sqllogictests) would be very helpful
async fn aggregate_test() { | ||
/// Tests that streaming aggregate and batch (non streaming) aggregate produce | ||
/// same results | ||
#[tokio::test(flavor = "multi_thread")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no reason to limit this to 8 threads that I know of. Using multi-thread uses more cores if available
@@ -50,18 +61,18 @@ async fn aggregate_test() { | |||
let n = 300; | |||
let distincts = vec![10, 20]; | |||
for distinct in distincts { | |||
let mut handles = Vec::new(); | |||
let mut join_set = JoinSet::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using JoinSet for consistency -- which automatically cancels outstanding tasks on panic
/// to test the streaming group by case | ||
/// | ||
/// if large is true, the input batches will be LargeStringArray | ||
async fn group_by_string_test( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the new test
}; | ||
join_set.spawn(async move { run_distinct_count_test(generator).await }); | ||
} | ||
for generator in StringBatchGenerator::interesting_cases() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refactored StringBatchGenerator into a common location to share
MemTable::with_sort_exprs
Thanks again for the review @comphead |
Which issue does this PR close?
Part of #7064
Rationale for this change
I want fuzz test coverage for the grouping by single string columns so no regressions are introduced
What changes are included in this PR?
MemTable::with_sort_exprs
which is needed to write this testAre these changes tested?
Yes
Are there any user-facing changes?
MemTable::with_sort_exprs