-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[C++][Acero] Refine hash join benchmark #45591
Comments
zanmato1984
added a commit
that referenced
this issue
Feb 21, 2025
…rom the project (#45593) ### Rationale for this change See #45591 . ### What changes are included in this PR? 1. Replace the usage of openmp with arrow-native multi-threading primitives; 2. Remove all the occurrences of openmp from the project; 3. Support stats for build side rows in hash join benchmark, and update certain benchmark. ### Are these changes tested? Manually tested. ### Are there any user-facing changes? Removed a public CMake option but I think it shouldn't affect the user. * GitHub Issue: #45591 Authored-by: Rossi Sun <[email protected]> Signed-off-by: Rossi Sun <[email protected]>
Issue resolved by pull request 45593 |
kou
pushed a commit
to kou/arrow
that referenced
this issue
Feb 25, 2025
…enmp from the project (apache#45593) ### Rationale for this change See apache#45591 . ### What changes are included in this PR? 1. Replace the usage of openmp with arrow-native multi-threading primitives; 2. Remove all the occurrences of openmp from the project; 3. Support stats for build side rows in hash join benchmark, and update certain benchmark. ### Are these changes tested? Manually tested. ### Are there any user-facing changes? Removed a public CMake option but I think it shouldn't affect the user. * GitHub Issue: apache#45591 Authored-by: Rossi Sun <[email protected]> Signed-off-by: Rossi Sun <[email protected]>
arashandishgar
pushed a commit
to arashandishgar/arrow
that referenced
this issue
Feb 25, 2025
…enmp from the project (apache#45593) ### Rationale for this change See apache#45591 . ### What changes are included in this PR? 1. Replace the usage of openmp with arrow-native multi-threading primitives; 2. Remove all the occurrences of openmp from the project; 3. Support stats for build side rows in hash join benchmark, and update certain benchmark. ### Are these changes tested? Manually tested. ### Are there any user-facing changes? Removed a public CMake option but I think it shouldn't affect the user. * GitHub Issue: apache#45591 Authored-by: Rossi Sun <[email protected]> Signed-off-by: Rossi Sun <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Describe the enhancement requested
The current hash join benchmark can be refined in the following aspects:
It is using openmp for multi-threading, and this is the solely place that depends on openmp library. Based on my experience of recent benchmarking, openmp can cause tricky inaccurate benchmarking numbers (some obvious improvement shows worse number in one or two out of tens of cases, I still don't know why), and make diagnosis hard (I was using flame graph and openmp makes it hard to interpret, see the following two pictures).


Flame graph with openmp:
Flame graph without openmp:
I'm not entirely sure why openmp was introduced at the first place (maybe @save-buffer can elaborate a bit?), but I think at this point we can trivially replace it with arrow-native multi-threading primitives. (Besides a bunch of simplification of cmake could be done).
In certain benchmarks, we are more interested in the number of rows from the build side. Currently all benchmarks are based on the number of rows from the probe side.
Component(s)
C++
The text was updated successfully, but these errors were encountered: