
Minor: Add SMJ to TPCH benchmark usage #10747

Merged on Jun 1, 2024 (4 commits)

comphead (Contributor)

Which issue does this PR close?

Closes #10100.

Rationale for this change

The fix in #10380 already resolved the issue; this PR only updates the SMJ usage info.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

comphead (Contributor, Author)

I checked that the TPCH benchmarks pass with SMJ enabled and that the row counts are the same:

RUST_BACKTRACE=1 RESULTS_NAME=smj ./benchmarks/bench.sh run tpch_smj
RUST_BACKTRACE=1 RESULTS_NAME=hj ./benchmarks/bench.sh run tpch
RUST_BACKTRACE=1 RESULTS_NAME=smj10 ./benchmarks/bench.sh run tpch_smj10
RUST_BACKTRACE=1 RESULTS_NAME=hj10 ./benchmarks/bench.sh run tpch10

 tpch_mem: TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), query from memory
-tpch10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table
+tpch10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table, hash join
+tpch_smj10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table, sort merge join
comphead (Contributor, Author), review comment on the usage text:

I'm planning to remove tpch_smj* soon and instead take the join type from user input, so that any benchmark can be run with a choice of join type.

Benchmark results

Benchmarks comparing d6ddd23 (main) and 8353d20 (PR)
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃  d6ddd23 ┃  8353d20 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 311.23ms │ 314.48ms │    no change │
│ QQuery 2     │  39.75ms │  44.90ms │ 1.13x slower │
│ QQuery 3     │  58.71ms │  59.99ms │    no change │
│ QQuery 4     │  83.26ms │  85.53ms │    no change │
│ QQuery 5     │  97.94ms │ 100.15ms │    no change │
│ QQuery 6     │  15.20ms │  15.67ms │    no change │
│ QQuery 7     │ 215.63ms │ 217.48ms │    no change │
│ QQuery 8     │  40.10ms │  40.95ms │    no change │
│ QQuery 9     │ 117.77ms │ 118.47ms │    no change │
│ QQuery 10    │ 104.43ms │ 101.81ms │    no change │
│ QQuery 11    │  75.79ms │  77.27ms │    no change │
│ QQuery 12    │  60.18ms │  59.87ms │    no change │
│ QQuery 13    │ 112.28ms │ 109.35ms │    no change │
│ QQuery 14    │  18.76ms │  18.58ms │    no change │
│ QQuery 15    │  30.72ms │  30.86ms │    no change │
│ QQuery 16    │  46.01ms │  45.91ms │    no change │
│ QQuery 17    │ 167.60ms │ 164.57ms │    no change │
│ QQuery 18    │ 465.70ms │ 545.63ms │ 1.17x slower │
│ QQuery 19    │  61.25ms │  60.38ms │    no change │
│ QQuery 20    │ 116.80ms │ 120.37ms │    no change │
│ QQuery 21    │ 335.60ms │ 342.52ms │    no change │
│ QQuery 22    │  30.19ms │  30.47ms │    no change │
└──────────────┴──────────┴──────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 2604.90ms │
│ Total Time (8353d20)   │ 2705.21ms │
│ Average Time (d6ddd23) │  118.40ms │
│ Average Time (8353d20) │  122.96ms │
│ Queries Faster         │         0 │
│ Queries Slower         │         2 │
│ Queries with No Change │        20 │
└────────────────────────┴───────────┘
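The summary rows can be checked by hand: Average Time matches Total Time divided by the 22 TPC-H queries (this is inferred from the numbers, not taken from the comparison script). For the tpch_mem_sf1 run:

```latex
\begin{aligned}
\bar{t} &= \frac{T_{\text{total}}}{22} \\
\bar{t}_{\texttt{d6ddd23}} &= \frac{2604.90\ \text{ms}}{22} \approx 118.40\ \text{ms} \\
\bar{t}_{\texttt{8353d20}} &= \frac{2705.21\ \text{ms}}{22} \approx 122.96\ \text{ms} \\
\frac{T_{\texttt{8353d20}}}{T_{\texttt{d6ddd23}}} &= \frac{2705.21}{2604.90} \approx 1.039
\end{aligned}
```

Of the 100.31 ms total difference, 79.93 ms comes from QQuery 18 alone.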
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃  d6ddd23 ┃  8353d20 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 453.39ms │ 459.54ms │    no change │
│ QQuery 2     │  55.09ms │  57.11ms │    no change │
│ QQuery 3     │ 142.59ms │ 145.54ms │    no change │
│ QQuery 4     │  88.22ms │  89.39ms │    no change │
│ QQuery 5     │ 200.19ms │ 204.77ms │    no change │
│ QQuery 6     │ 105.65ms │ 105.17ms │    no change │
│ QQuery 7     │ 273.96ms │ 287.56ms │    no change │
│ QQuery 8     │ 182.75ms │ 179.28ms │    no change │
│ QQuery 9     │ 283.66ms │ 295.45ms │    no change │
│ QQuery 10    │ 228.35ms │ 233.43ms │    no change │
│ QQuery 11    │  41.04ms │  41.67ms │    no change │
│ QQuery 12    │ 127.32ms │ 129.31ms │    no change │
│ QQuery 13    │ 177.37ms │ 183.10ms │    no change │
│ QQuery 14    │ 124.31ms │ 124.03ms │    no change │
│ QQuery 15    │ 183.90ms │ 186.47ms │    no change │
│ QQuery 16    │  49.60ms │  49.47ms │    no change │
│ QQuery 17    │ 313.09ms │ 321.38ms │    no change │
│ QQuery 18    │ 447.60ms │ 493.86ms │ 1.10x slower │
│ QQuery 19    │ 226.96ms │ 228.16ms │    no change │
│ QQuery 20    │ 189.06ms │ 195.03ms │    no change │
│ QQuery 21    │ 317.85ms │ 315.83ms │    no change │
│ QQuery 22    │  40.09ms │  40.57ms │    no change │
└──────────────┴──────────┴──────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 4252.06ms │
│ Total Time (8353d20)   │ 4366.11ms │
│ Average Time (d6ddd23) │  193.28ms │
│ Average Time (8353d20) │  198.46ms │
│ Queries Faster         │         0 │
│ Queries Slower         │         1 │
│ Queries with No Change │        21 │
└────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃   d6ddd23 ┃   8353d20 ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 4470.71ms │ 4454.97ms │ no change │
│ QQuery 2     │  512.15ms │  491.60ms │ no change │
│ QQuery 3     │ 1709.79ms │ 1718.17ms │ no change │
│ QQuery 4     │  835.04ms │  831.02ms │ no change │
│ QQuery 5     │ 2157.94ms │ 2179.48ms │ no change │
│ QQuery 6     │ 1005.84ms │ 1005.22ms │ no change │
│ QQuery 7     │ 3452.80ms │ 3556.18ms │ no change │
│ QQuery 8     │ 2463.24ms │ 2497.21ms │ no change │
│ QQuery 9     │ 3975.42ms │ 3996.31ms │ no change │
│ QQuery 10    │ 2480.86ms │ 2486.30ms │ no change │
│ QQuery 11    │  343.56ms │  346.09ms │ no change │
│ QQuery 12    │ 1222.34ms │ 1224.75ms │ no change │
│ QQuery 13    │ 2313.42ms │ 2286.39ms │ no change │
│ QQuery 14    │ 1249.23ms │ 1263.20ms │ no change │
│ QQuery 15    │ 1908.59ms │ 1903.24ms │ no change │
│ QQuery 16    │  516.33ms │  509.42ms │ no change │
│ QQuery 17    │ 5413.51ms │ 5443.66ms │ no change │
│ QQuery 18    │ 6777.95ms │ 6896.24ms │ no change │
│ QQuery 19    │ 2243.45ms │ 2267.72ms │ no change │
│ QQuery 20    │ 2615.21ms │ 2579.24ms │ no change │
│ QQuery 21    │ 4479.53ms │ 4403.21ms │ no change │
│ QQuery 22    │  468.13ms │  451.80ms │ no change │
└──────────────┴───────────┴───────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 52615.04ms │
│ Total Time (8353d20)   │ 52791.42ms │
│ Average Time (d6ddd23) │  2391.59ms │
│ Average Time (8353d20) │  2399.61ms │
│ Queries Faster         │          0 │
│ Queries Slower         │          0 │
│ Queries with No Change │         22 │
└────────────────────────┴────────────┘
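The "Change" column compares per-query times between the two commits. Below is a minimal sketch of how such a label could be derived. `classify_change` is a hypothetical helper, not part of DataFusion's benchmark tooling, and the +/-5% "no change" band is an assumption inferred from the tables above: QQuery 7 in tpch_sf1 (about 5% slower) is reported as no change, while QQuery 18 (about 10% slower) is flagged.

```shell
#!/bin/sh
# classify_change BASE_MS NEW_MS
# Hypothetical helper, not part of DataFusion's benchmark scripts.
# ASSUMPTION: a +/-5% "no change" band, inferred from the tables above.
classify_change() {
  awk -v base_ms="$1" -v new_ms="$2" 'BEGIN {
    ratio = base_ms / new_ms          # > 1 means the new commit is faster
    if (ratio > 1.05)       printf "%.2fx faster\n", ratio
    else if (ratio < 0.95)  printf "%.2fx slower\n", 1 / ratio
    else                    print "no change"
  }'
}

# Rows taken from the tables above (d6ddd23 vs 8353d20)
classify_change 39.75 44.90     # tpch_mem_sf1, QQuery 2
classify_change 465.70 545.63   # tpch_mem_sf1, QQuery 18
classify_change 273.96 287.56   # tpch_sf1, QQuery 7
```

With this band, the three calls reproduce the table's labels: "1.13x slower", "1.17x slower" and "no change".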

alamb (Contributor) left a comment:

Thank you @comphead

alamb merged commit 3777114 into apache:main on Jun 1, 2024
23 checks passed
findepi pushed a commit to findepi/datafusion that referenced this pull request on Jul 16, 2024:
* Fix: Sort Merge Join crashes on TPCH Q21

* Fix LeftAnti SMJ join when the join filter is set

* rm dbg

* Add SMJ to TPCH benchmark usage

Linked issue: fix Sort Merge Join to pass TPCH tests

2 participants