Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-45591: [C++][Acero] Refine hash join benchmark and remove openmp from the project #45593

Merged
merged 6 commits into from
Feb 21, 2025

Conversation

zanmato1984
Copy link
Contributor

@zanmato1984 zanmato1984 commented Feb 20, 2025

Rationale for this change

See #45591 .

What changes are included in this PR?

  1. Replace the usage of openmp with arrow-native multi-threading primitives;
  2. Remove all the occurrences of openmp from the project;
  3. Support stats for build side rows in hash join benchmark, and update certain benchmark.

Are these changes tested?

Manually tested.

Are there any user-facing changes?

Removed a public CMake option but I think it shouldn't affect the user.

Copy link

⚠️ GitHub issue #45591 has been automatically assigned in GitHub to PR creator.

@@ -141,7 +141,6 @@ else
-DARROW_BUILD_BENCHMARKS=${ARROW_BUILD_BENCHMARKS:-OFF} \
-DARROW_BUILD_EXAMPLES=${ARROW_BUILD_EXAMPLES:-OFF} \
-DARROW_BUILD_INTEGRATION=${ARROW_BUILD_INTEGRATION:-OFF} \
-DARROW_BUILD_OPENMP_BENCHMARKS=${ARROW_BUILD_OPENMP_BENCHMARKS:-OFF} \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, the hash join benchmark is never ran in our CI. And this is probably why.

@@ -221,36 +221,21 @@ if(ARROW_BUILD_BENCHMARKS)

add_arrow_acero_benchmark(aggregate_benchmark SOURCES aggregate_benchmark.cc)

if(ARROW_BUILD_OPENMP_BENCHMARKS)
find_package(OpenMP REQUIRED)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feeling comfortable of removing one dependency.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so too!

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 20, 2025
@zanmato1984 zanmato1984 force-pushed the refine-hash-join-benchmark branch from 97b4b22 to c27cf8a Compare February 20, 2025 15:27
@zanmato1984
Copy link
Contributor Author

@ursabot please benchmark

@ursabot
Copy link

ursabot commented Feb 20, 2025

Benchmark runs are scheduled for commit c27cf8a. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@zanmato1984 zanmato1984 force-pushed the refine-hash-join-benchmark branch from c27cf8a to 492b429 Compare February 20, 2025 15:59
@zanmato1984 zanmato1984 force-pushed the refine-hash-join-benchmark branch from 06da9aa to c6460ac Compare February 20, 2025 16:55
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

[&](std::function<Status(size_t)> task) -> Status {
return thread_pool_->Spawn([&, task]() { DCHECK_OK(task(thread_indexer_())); });
},
thread_pool_->GetCapacity(), settings.num_threads == 1));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to change to thread_pool_->GetCapacity() (settings.num_threads) from 2 * settings.num_threads? Is the original 2 * settings.num_threads wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument controls the max number of concurrent tasks in the scheduler, so any value >= settings.num_threads is fine. (The original doubling isn't wrong though.)

settings.num_threads == 1));
/*thread_id=*/0,
[&](std::function<Status(size_t)> task) -> Status {
return thread_pool_->Spawn([&, task]() { DCHECK_OK(task(thread_indexer_())); });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you choose Spawn() + conditional variable not Submit() + Future::status() (or Spawn() + ThreadPool::WaitForIdle())? Is it easy to write/maintain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right.

Spawn() + WaitForIdle() is easier and suffice (I was wrongly suspecting WaitForIdle() not working so changed to using cond var. But later it turned out the problem was something else and forgot to change it back.)

I'm updating it. Thank you for pointing this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -221,36 +221,21 @@ if(ARROW_BUILD_BENCHMARKS)

add_arrow_acero_benchmark(aggregate_benchmark SOURCES aggregate_benchmark.cc)

if(ARROW_BUILD_OPENMP_BENCHMARKS)
find_package(OpenMP REQUIRED)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so too!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Feb 20, 2025
Copy link

Thanks for your patience. Conbench analyzed the 0 benchmarking runs that have been run so far on PR commit c27cf8a.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

@zanmato1984
Copy link
Contributor Author

@ursabot please benchmark

@ursabot
Copy link

ursabot commented Feb 21, 2025

Benchmark runs are scheduled for commit 4cba288. Watch https://buildkite.com/apache-arrow and https://conbench.ursa.dev for updates. A comment will be posted here when the runs are complete.

@@ -28,11 +28,11 @@
#include "arrow/testing/random.h"
#include "arrow/util/thread_pool.h"

#include <condition_variable>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgot this.

Removing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#include <cstdint>
#include <cstdio>
#include <memory>

#include <omp.h>
#include <mutex>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks for reminding!

@github-actions github-actions bot removed the awaiting merge Awaiting merge label Feb 21, 2025
@github-actions github-actions bot added the awaiting changes Awaiting changes label Feb 21, 2025
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 21, 2025
@zanmato1984
Copy link
Contributor Author

Though the benchmark result is not ready yet, I've manually checked some of the finished ones and saw hash join benchmarks are executed as expected, w/o baseline as expected as well (they never run before).

So everything seems good now. I'm merging.

Thanks @kou for reviewing.

@zanmato1984 zanmato1984 merged commit b36659a into apache:main Feb 21, 2025
39 checks passed
@zanmato1984 zanmato1984 removed the awaiting change review Awaiting change review label Feb 21, 2025
@zanmato1984 zanmato1984 deleted the refine-hash-join-benchmark branch February 21, 2025 06:10
Copy link

Thanks for your patience. Conbench analyzed the 4 benchmarking runs that have been run so far on PR commit 4cba288.

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details.

Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit b36659a.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 21 possible false positives for unstable benchmarks that are known to sometimes produce them.

kou pushed a commit to kou/arrow that referenced this pull request Feb 25, 2025
…enmp from the project (apache#45593)

### Rationale for this change

See apache#45591 .

### What changes are included in this PR?

1. Replace the usage of openmp with arrow-native multi-threading primitives;
2. Remove all the occurrences of openmp from the project;
3. Support stats for build side rows in hash join benchmark, and update certain benchmark.

### Are these changes tested?

Manually tested.

### Are there any user-facing changes?

Removed a public CMake option but I think it shouldn't affect the user.

* GitHub Issue: apache#45591

Authored-by: Rossi Sun <[email protected]>
Signed-off-by: Rossi Sun <[email protected]>
arashandishgar pushed a commit to arashandishgar/arrow that referenced this pull request Feb 25, 2025
…enmp from the project (apache#45593)

### Rationale for this change

See apache#45591 .

### What changes are included in this PR?

1. Replace the usage of openmp with arrow-native multi-threading primitives;
2. Remove all the occurrences of openmp from the project;
3. Support stats for build side rows in hash join benchmark, and update certain benchmark.

### Are these changes tested?

Manually tested.

### Are there any user-facing changes?

Removed a public CMake option but I think it shouldn't affect the user.

* GitHub Issue: apache#45591

Authored-by: Rossi Sun <[email protected]>
Signed-off-by: Rossi Sun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants