[test] add fuzz test for topk #7772

Tangruilin · 2023-10-08T12:23:51Z

Which issue does this PR close?

Closes #7749.

Rationale for this change

Add fuzz test for topk

What changes are included in this PR?

This PR adds an i32 fuzz test for topk.
This PR also adds functions that generate f64 and string batches, but they are not used yet (because fn batches_to_vec only works for i32).

TODO: This PR is not complete. I need suggestions on whether I should modify fn batches_to_vec.

Are these changes tested?

This PR itself is a test.

Are there any user-facing changes?

no

Tangruilin · 2023-10-09T16:10:36Z

@alamb Please review it.

Tangruilin · 2023-10-09T16:10:50Z

@alamb Please review it.

Thanks

alamb · 2023-10-09T21:42:18Z

Thank you @Tangruilin -- I plan to review this tomorrow. I look forward to it!

alamb

Thank you @Tangruilin

TODO: This PR is not complete. I need suggestions on whether I should modify fn batches_to_vec.

I am not quite sure what you mean here. I think if you can make a TopKScenario type structure, it may be easier to figure out whether you need to modify batches_to_vec.

alamb · 2023-10-10T14:52:28Z

datafusion/core/tests/fuzz_cases/sort_fuzz.rs

@@ -138,6 +162,44 @@ impl SortTest {
        self
    }

+    async fn run_with_params(


Perhaps naming this run_with_limit would make it easier to understand what is happening

alamb · 2023-10-10T15:30:43Z

datafusion/core/tests/fuzz_cases/sort_fuzz.rs

+}
+
+#[tokio::test]
+async fn test_sort_topk_i32() {


What would you think about encapsulating the limit data and expected value calculation in a structure?

So this test might look like

    let size: usize = ...; // pick a random size
    let scenario = TopKScenario::new()
        // tell the scenario to sort by one column
        .with_sort_column(["i32_column"])
        // specify a limit of 10 rows
        .with_limit(10);
    // stagger the batches in the scenario
    scenario.stagger();
    let collected = SortTest::new()
        // call Scenario::batches to get the input batches
        .with_input(scenario.batches())
        // run the test
        .run_with_limit("t", scenario.sort_cols(), scenario.limit()).await;
    // The scenario handles calculating expected output (as it knows the sort column and limit)
    let expected = scenario.expected();
    let actual = batches_to_vec(&collected);
    assert_eq!(actual, &expected);
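To make the "expected value calculation" part of this idea concrete, here is a minimal std-only sketch. It is an assumption-laden illustration, not the code from this PR: plain `Vec<i32>` batches stand in for Arrow record batches, the constructor takes the batches directly, and sorting is ascending on a single column.

```rust
/// Hypothetical, std-only stand-in for the `TopKScenario` idea.
/// Plain `Vec<i32>` batches replace Arrow record batches.
#[derive(Debug, Clone)]
pub struct TopKScenario {
    /// Input rows, split into batches.
    batches: Vec<Vec<i32>>,
    /// Maximum number of rows the query should return (the "K" in top-K).
    limit: usize,
}

impl TopKScenario {
    /// Build a scenario from already generated input batches (no limit yet).
    pub fn new(batches: Vec<Vec<i32>>) -> Self {
        Self { batches, limit: usize::MAX }
    }

    /// Set the row limit.
    pub fn with_limit(mut self, limit: usize) -> Self {
        self.limit = limit;
        self
    }

    /// Input batches to feed into the test.
    pub fn batches(&self) -> &[Vec<i32>] {
        &self.batches
    }

    /// The configured row limit.
    pub fn limit(&self) -> usize {
        self.limit
    }

    /// Expected output: all rows sorted ascending, truncated to `limit`.
    pub fn expected(&self) -> Vec<i32> {
        let mut all: Vec<i32> = self.batches.iter().flatten().copied().collect();
        all.sort_unstable();
        all.truncate(self.limit);
        all
    }
}
```

Because the scenario owns both the input and the limit, the test body only needs to compare the query output with `expected()`.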

Tangruilin

A good suggestion! I'll implement it tomorrow and push it.

alamb · 2023-10-10T16:03:11Z

Marking as draft to signify this PR is no longer waiting on feedback

Weijun-H

LGTM. Thanks @Tangruilin

Weijun-H · 2023-10-10T16:04:29Z

datafusion/core/tests/fuzz_cases/sort_fuzz.rs

 use std::sync::Arc;
-use test_utils::{batches_to_vec, partitions_to_sorted_vec};
+use test_utils::{batches_to_vec, partitions_to_sorted_vec, stagger_batch};



Suggested change

const KB: u64 = 1 << 10;

Weijun-H · 2023-10-10T16:04:49Z

datafusion/core/tests/fuzz_cases/sort_fuzz.rs

+    for (batch_size, should_spill) in [(5, false), (20000, true), (1000000, true)] {
+        SortTest::new()
+            .with_int32_batches(batch_size)
+            .with_pool_size(10240)


Suggested change

-            .with_pool_size(10240)
+            .with_pool_size(10 * KB)

Weijun-H · 2023-10-10T16:05:19Z

datafusion/core/tests/fuzz_cases/sort_fuzz.rs

+    for (batch_size, should_spill) in [(5, false), (20000, false), (1000000, true)] {
+        SortTest::new()
+            .with_int32_batches(batch_size)
+            .with_pool_size(102400)


Suggested change

-            .with_pool_size(102400)
+            .with_pool_size(100 * KB)
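As a quick check that the suggested constant leaves the pool sizes unchanged, the sketch below evaluates both expressions. The `pool_size_kb` helper is hypothetical and only here for illustration; the `u64` type is taken from the suggestion as written, and the real `with_pool_size` parameter type may differ.

```rust
/// One kibibyte, as written in the suggested change.
const KB: u64 = 1 << 10;

/// Hypothetical helper: a pool size in bytes for a given number of KB.
const fn pool_size_kb(n: u64) -> u64 {
    n * KB
}

// Compile-time checks that the named constant reproduces the literals it replaces.
const _: () = assert!(10 * KB == 10240);
const _: () = assert!(100 * KB == 102400);
```

Spelling the sizes as `10 * KB` and `100 * KB` makes the intended magnitude readable at a glance without changing the values.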

Tangruilin · 2023-10-10T17:13:06Z

I ran ci/scripts/rust_clippy.sh on my Mac, but the result I get is not the same as what the CI reports. I'm confused @alamb

alamb · 2023-10-10T17:44:07Z

I run ci/scripts/rust_clippy.sh on my mac but get

You may have to do:

rustup update to get the latest rust version
merge up from origin/main to get the latest DataFusion code

Tangruilin · 2023-10-12T15:48:13Z

I run ci/scripts/rust_clippy.sh on my mac but get

You may have to do:

rustup update to get the latest rust version

merge up from origin/main to get the latest DataFusion code

Besides that, I found that the Rust toolchain needs to be stable: running rustup update && rustup default stable solved it. Are there docs for this? If not, maybe I can add them.

alamb · 2023-10-13T15:04:36Z

Besides that, I found that the Rust toolchain needs to be stable: running rustup update && rustup default stable solved it. Are there docs for this? If not, maybe I can add them.

I think it is implicit in the "Testing setup" part:

rustup update stable: DataFusion uses the latest stable release of Rust

I agree this could be clearer in the documentation. Perhaps you can help here.

Tangruilin · 2023-10-16T07:23:01Z

I have tried several approaches, but there is no perfect solution. I will update the PR tonight.

Signed-off-by: reilly <[email protected]>

alamb · 2023-10-16T23:52:13Z

Thanks @Tangruilin -- I plan to review this more carefully tomorrow

Tangruilin · 2023-10-19T04:57:21Z

Thanks @Tangruilin -- I plan to review this more carefully tomorrow

Don't forget this~~~

alamb · 2023-10-19T10:14:39Z

I am sorry 😢 -- I have not forgotten, but I am backed up on reviews.

alamb

Thank you very much @Tangruilin and @Weijun-H -- this is very nicely done. I am sorry for my delay in reviewing. I think this PR adds additional coverage and is structured to make future improvements straightforward.

🙏 great job

alamb · 2023-10-21T11:41:40Z

datafusion/core/tests/fuzz_cases/sort_fuzz.rs

+                datafusion_expr::col(topk_scenario.col_name).sort(true, true)
+            ])
+            .unwrap()
+            .limit(0, Some(topk_scenario.limit))
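For readers unfamiliar with these two calls: in DataFusion, the booleans passed to `sort` are `asc` and `nulls_first`, and `limit` takes a number of rows to skip plus an optional number of rows to fetch. The std-only sketch below models that combination on nullable integers; `sort_then_limit` is a hypothetical name used only for illustration, and the sketch does not call DataFusion itself.

```rust
/// Std-only model of `col(..).sort(true, true)` followed by `.limit(skip, fetch)`.
/// `None` plays the role of a SQL NULL.
pub fn sort_then_limit(
    mut rows: Vec<Option<i32>>,
    skip: usize,
    fetch: Option<usize>,
) -> Vec<Option<i32>> {
    // `Option<i32>` orders `None` before every `Some`, which matches
    // "ascending, nulls first".
    rows.sort();
    let rest = rows.into_iter().skip(skip);
    match fetch {
        // Keep at most `n` rows after skipping.
        Some(n) => rest.take(n).collect(),
        // No fetch limit: keep everything after skipping.
        None => rest.collect(),
    }
}
```

With `skip = 0` and `fetch = Some(limit)`, as in the hunk above, the result is the first `limit` rows of the sorted input, which is what a top-K test compares against.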


alamb · 2023-10-21T18:53:31Z

BTW I am working on an extension of this test to support multiple columns

github-actions bot added the core Core DataFusion crate label Oct 8, 2023

Tangruilin force-pushed the test#7749#add_fuzz_test branch 4 times, most recently from f63c3fc to 4144b44 on October 9, 2023 16:10

alamb reviewed Oct 10, 2023

alamb marked this pull request as draft October 10, 2023 16:02

Weijun-H approved these changes Oct 10, 2023

[test] add fuzz test for topk

1920bfe

Signed-off-by: reilly <[email protected]>

Tangruilin force-pushed the test#7749#add_fuzz_test branch from 4144b44 to 1920bfe on October 16, 2023 15:34

Tangruilin marked this pull request as ready for review October 16, 2023 15:35

Tangruilin requested a review from alamb October 16, 2023 15:38

Merge remote-tracking branch 'apache/main' into test#7749#add_fuzz_test

0213052

alamb approved these changes Oct 21, 2023

alamb merged commit 5bdc9af into apache:main Oct 21, 2023

alamb mentioned this pull request Oct 21, 2023

Add multi-column topk fuzz tests #7898

Merged

matthewgapp mentioned this pull request Jan 11, 2024

matt/feat/recursive ctes/config flag matthewgapp/arrow-datafusion#3

Closed
