
Worker pool for efficient handling of blocking_operation_wait. #359

Merged
merged 1 commit into main from work-pool on Nov 24, 2024

Conversation

@ioquatix (Member) commented Nov 22, 2024

See #352 for context.

The cost of creating threads and the use of rb_nogvl create inefficiencies.

rb_nogvl should really only be used when the amount of work to be done is greater than some scheduling quantum. In practice, that's hard to achieve, so we also want to minimise the overhead of blocking_operation_wait (abbreviated BOW below).
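As a point of reference, the simplest possible implementation spawns a thread per operation. A minimal sketch, assuming the scheduler hook receives a callable work object (the class name is illustrative):

```ruby
class SimpleScheduler
  # Called when native code would otherwise hold the GVL.
  # `work` is a callable wrapping the native operation.
  def blocking_operation_wait(work)
    # Thread#value joins the thread and returns the block's result.
    # Correct, but thread creation dominates when the native work is
    # smaller than a scheduling quantum:
    Thread.new { work.call }.value
  end
end
```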

I've been using async-cable as a benchmark, as it has a good mixture of IO and nogvl (inflate/deflate) operations. Those operations are typically extremely small, so the per-operation overhead shows up clearly. Another benchmark uses the recently introduced IO::Buffer#copy, which calls rb_nogvl on large buffers. It is the opposite case: highly CPU-bound (memory-bound, actually) work with little IO.
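For a sense of scale, the inflate/deflate calls in question are along these lines (a sketch; Zlib releases the GVL internally while compressing):

```ruby
require "zlib"

# Each message triggers tiny native calls like these, each of which
# completes in far less than a scheduling quantum:
deflated = Zlib::Deflate.deflate("a small message payload")
original = Zlib::Inflate.inflate(deflated)
```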

Semantics remain unchanged.

Async::Cable Benchmarks

This benchmark is mostly network bound, and there are a lot of small calls to inflate/deflate, which use rb_nogvl:

| Configuration | Connection Time | Message Time |
|---------------|-----------------|--------------|
| No BOW        | 0.67ms          | 0.024ms      |
| Thread BOW    | 2.2ms           | 0.96ms       |
| Work Pool BOW | 0.83ms          | 0.045ms      |

Overall, we see a net loss in performance when offloading rb_nogvl via a pure-Ruby implementation. I believe we can attribute this to the offloading thread having to re-acquire the GVL, which creates unnecessary contention. This is fixable, but requires a native code path (probably in the IO::Event scheduler implementation).
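For illustration, the worker pool has roughly the following pure-Ruby shape: a fixed set of persistent threads consuming jobs from a queue, so thread creation is paid once rather than per operation. This is a sketch with illustrative names, not the merged implementation:

```ruby
class WorkerPool
  def initialize(size = 4)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Consume jobs until the pool is closed (nil is the sentinel):
        while job = @jobs.pop
          job.call
        end
      end
    end
  end

  # Run `work` on a pool thread, blocking the caller until it completes.
  # Queue#pop is fiber-scheduler-aware, so only the calling fiber waits.
  def call(work)
    result = Queue.new
    @jobs.push(lambda do
      result.push([:ok, work.call])
    rescue Exception => error
      result.push([:error, error])
    end)
    status, value = result.pop
    raise value if status == :error
    value
  end

  def close
    @workers.each { @jobs.push(nil) }
    @workers.each(&:join)
  end
end
```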

IO::Buffer Benchmarks

This benchmark is more memory bound and there is essentially zero blocking IO:

| Configuration | Task Count | Buffer Size | Duration | Throughput |
|---------------|------------|-------------|----------|------------|
| No BOW        | 1          | 100MiB      | 6.46ms   | 15GB/s     |
| No BOW        | 8          | 100MiB      | 28.93ms  | 27GB/s     |
| No BOW        | 16         | 100MiB      | 56.81ms  | 28GB/s     |
| Thread BOW    | 1          | 100MiB      | 6.99ms   | 14GB/s     |
| Thread BOW    | 8          | 100MiB      | 24.52ms  | 32GB/s     |
| Thread BOW    | 16         | 100MiB      | 44.33ms  | 36GB/s     |
| Work Pool BOW | 1          | 100MiB      | 7.12ms   | 14GB/s     |
| Work Pool BOW | 8          | 100MiB      | 20.41ms  | 39GB/s     |
| Work Pool BOW | 16         | 100MiB      | 43.53ms  | 36GB/s     |

Overall, the thread and work pool configurations perform similarly. I believe we still see GVL contention even on the background threads, as I'd expect the scaling to be a little more linear, although it's also true that memory bandwidth isn't unlimited.
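For reference, a benchmark of this shape can reproduce the memory-bound workload; the buffer size and task count mirror the table above, while the harness details are assumptions:

```ruby
require "async"

SIZE = 100 * 1024 * 1024 # 100MiB per buffer, as in the table above.
source = IO::Buffer.new(SIZE)

Async do |parent|
  tasks = 8.times.map do
    parent.async do
      destination = IO::Buffer.new(SIZE)
      # IO::Buffer#copy releases the GVL on large buffers, so with a
      # scheduler implementing blocking_operation_wait it can be
      # offloaded rather than blocking the event loop:
      destination.copy(source)
    end
  end
  tasks.each(&:wait)
end
```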

Types of Changes

  • Performance improvement.

Contribution

@ioquatix force-pushed the work-pool branch 3 times, most recently from 39419c2 to 785ee50 on November 22, 2024 04:39
@ioquatix changed the title from "Work pool for efficient handling of blocking_operation_wait." to "Worker pool for efficient handling of blocking_operation_wait." on Nov 22, 2024
@ioquatix force-pushed the work-pool branch 3 times, most recently from f66aa44 to d1fcdd4 on November 23, 2024 07:24
@ioquatix merged commit 9449e6f into main on Nov 24, 2024
50 of 58 checks passed
@ioquatix deleted the work-pool branch on November 24, 2024 02:06