-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-45591: [C++][Acero] Refine hash join benchmark and remove openmp from the project #45593
GH-45591: [C++][Acero] Refine hash join benchmark and remove openmp from the project #45593
Conversation
|
@@ -141,7 +141,6 @@ else | |||
-DARROW_BUILD_BENCHMARKS=${ARROW_BUILD_BENCHMARKS:-OFF} \ | |||
-DARROW_BUILD_EXAMPLES=${ARROW_BUILD_EXAMPLES:-OFF} \ | |||
-DARROW_BUILD_INTEGRATION=${ARROW_BUILD_INTEGRATION:-OFF} \ | |||
-DARROW_BUILD_OPENMP_BENCHMARKS=${ARROW_BUILD_OPENMP_BENCHMARKS:-OFF} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the hash join benchmark is never ran in our CI. And this is probably why.
@@ -221,36 +221,21 @@ if(ARROW_BUILD_BENCHMARKS) | |||
|
|||
add_arrow_acero_benchmark(aggregate_benchmark SOURCES aggregate_benchmark.cc) | |||
|
|||
if(ARROW_BUILD_OPENMP_BENCHMARKS) | |||
find_package(OpenMP REQUIRED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feeling comfortable of removing one dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so too!
97b4b22
to
c27cf8a
Compare
@ursabot please benchmark |
Benchmark runs are scheduled for commit c27cf8a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
c27cf8a
to
492b429
Compare
06da9aa
to
c6460ac
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
[&](std::function<Status(size_t)> task) -> Status { | ||
return thread_pool_->Spawn([&, task]() { DCHECK_OK(task(thread_indexer_())); }); | ||
}, | ||
thread_pool_->GetCapacity(), settings.num_threads == 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to change to thread_pool_->GetCapacity()
(settings.num_threads
) from 2 * settings.num_threads
? Is the original 2 * settings.num_threads
wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument controls the max number of concurrent tasks in the scheduler, so any value >= settings.num_threads
is fine. (The original doubling isn't wrong though.)
settings.num_threads == 1)); | ||
/*thread_id=*/0, | ||
[&](std::function<Status(size_t)> task) -> Status { | ||
return thread_pool_->Spawn([&, task]() { DCHECK_OK(task(thread_indexer_())); }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you choose Spawn()
+ conditional variable not Submit()
+ Future::status()
(or Spawn()
+ ThreadPool::WaitForIdle()
)? Is it easy to write/maintain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right.
Spawn()
+ WaitForIdle()
is easier and suffice (I was wrongly suspecting WaitForIdle()
not working so changed to using cond var. But later it turned out the problem was something else and forgot to change it back.)
I'm updating it. Thank you for pointing this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -221,36 +221,21 @@ if(ARROW_BUILD_BENCHMARKS) | |||
|
|||
add_arrow_acero_benchmark(aggregate_benchmark SOURCES aggregate_benchmark.cc) | |||
|
|||
if(ARROW_BUILD_OPENMP_BENCHMARKS) | |||
find_package(OpenMP REQUIRED) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so too!
Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit c27cf8a. None of the specified runs were found on the Conbench server. The full Conbench report has more details. |
@ursabot please benchmark |
Benchmark runs are scheduled for commit 4cba288. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete. |
@@ -28,11 +28,11 @@ | |||
#include "arrow/testing/random.h" | |||
#include "arrow/util/thread_pool.h" | |||
|
|||
#include <condition_variable> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, forgot this.
Removing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
#include <cstdint> | ||
#include <cstdio> | ||
#include <memory> | ||
|
||
#include <omp.h> | ||
#include <mutex> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove this too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thanks for reminding!
Though the benchmark result is not ready yet, I've manually checked some of the finished ones and saw hash join benchmarks are executed as expected, w/o baseline as expected as well (they never run before). So everything seems good now. I'm merging. Thanks @kou for reviewing. |
Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 4cba288. There was 1 benchmark result indicating a performance regression:
The full Conbench report has more details. |
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit b36659a. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 21 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…enmp from the project (apache#45593) ### Rationale for this change See apache#45591 . ### What changes are included in this PR? 1. Replace the usage of openmp with arrow-native multi-threading primitives; 2. Remove all the occurrences of openmp from the project; 3. Support stats for build side rows in hash join benchmark, and update certain benchmark. ### Are these changes tested? Manually tested. ### Are there any user-facing changes? Removed a public CMake option but I think it shouldn't affect the user. * GitHub Issue: apache#45591 Authored-by: Rossi Sun <[email protected]> Signed-off-by: Rossi Sun <[email protected]>
…enmp from the project (apache#45593) ### Rationale for this change See apache#45591 . ### What changes are included in this PR? 1. Replace the usage of openmp with arrow-native multi-threading primitives; 2. Remove all the occurrences of openmp from the project; 3. Support stats for build side rows in hash join benchmark, and update certain benchmark. ### Are these changes tested? Manually tested. ### Are there any user-facing changes? Removed a public CMake option but I think it shouldn't affect the user. * GitHub Issue: apache#45591 Authored-by: Rossi Sun <[email protected]> Signed-off-by: Rossi Sun <[email protected]>
Rationale for this change
See #45591 .
What changes are included in this PR?
Are these changes tested?
Manually tested.
Are there any user-facing changes?
Removed a public CMake option but I think it shouldn't affect the user.