
Worker pool for efficient handling of blocking_operation_wait. #359

Merged
merged 1 commit into main from work-pool on Nov 24, 2024

Conversation

@ioquatix (Member) commented Nov 22, 2024

See #352 for context.

The cost of creating threads and the use of rb_nogvl create inefficiencies.

rb_nogvl should really only be used when the amount of work to be done is greater than some scheduling quantum. In practice, that's hard to achieve, so we also want to minimise the overhead of blocking_operation_wait (abbreviated BOW below).
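As a point of reference, the simplest possible implementation spawns a thread per operation. A minimal sketch, assuming the scheduler hook receives a callable work object (the class name is illustrative):

```ruby
class SimpleScheduler
  # Called when native code would otherwise hold the GVL.
  # `work` is a callable wrapping the native operation.
  def blocking_operation_wait(work)
    # Thread#value joins the thread and returns the block's result.
    # Correct, but thread creation dominates when the native work is
    # smaller than a scheduling quantum:
    Thread.new { work.call }.value
  end
end
```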

I've been using async-cable as a benchmark, as it has a good mixture of IO and nogvl (inflate/deflate) operations. Those operations are typically extremely small, so the per-operation overhead shows up clearly. Another benchmark uses the recently introduced IO::Buffer#copy, which calls rb_nogvl on large buffers. It is the opposite case: highly CPU-bound (memory-bound, actually) work with little IO.
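For a sense of scale, the inflate/deflate calls in question are along these lines (a sketch; Zlib releases the GVL internally while compressing):

```ruby
require "zlib"

# Each message triggers tiny native calls like these, each of which
# completes in far less than a scheduling quantum:
deflated = Zlib::Deflate.deflate("a small message payload")
original = Zlib::Inflate.inflate(deflated)
```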

Semantics remain unchanged.

Async::Cable Benchmarks

This benchmark is mostly network bound, and there are a lot of small calls to inflate/deflate, which use rb_nogvl:

| Configuration | Connection Time | Message Time |
|---------------|-----------------|--------------|
| No BOW        | 0.67ms          | 0.024ms      |
| Thread BOW    | 2.2ms           | 0.96ms       |
| Work Pool BOW | 0.83ms          | 0.045ms      |

Overall, we see a net loss in performance when offloading rb_nogvl via a pure-Ruby implementation. I believe we can attribute this to the offloading thread having to re-acquire the GVL, which creates unnecessary contention. This is fixable, but requires a native code path (probably in the IO::Event scheduler implementation).
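For illustration, the worker pool has roughly the following pure-Ruby shape: a fixed set of persistent threads consuming jobs from a queue, so thread creation is paid once rather than per operation. This is a sketch with illustrative names, not the merged implementation:

```ruby
class WorkerPool
  def initialize(size = 4)
    @jobs = Queue.new
    @workers = Array.new(size) do
      Thread.new do
        # Consume jobs until the pool is closed (nil is the sentinel):
        while job = @jobs.pop
          job.call
        end
      end
    end
  end

  # Run `work` on a pool thread, blocking the caller until it completes.
  # Queue#pop is fiber-scheduler-aware, so only the calling fiber waits.
  def call(work)
    result = Queue.new
    @jobs.push(lambda do
      result.push([:ok, work.call])
    rescue Exception => error
      result.push([:error, error])
    end)
    status, value = result.pop
    raise value if status == :error
    value
  end

  def close
    @workers.each { @jobs.push(nil) }
    @workers.each(&:join)
  end
end
```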

IO::Buffer Benchmarks

This benchmark is more memory bound and there is essentially zero blocking IO:

| Configuration | Task Count | Buffer Size | Duration | Throughput |
|---------------|------------|-------------|----------|------------|
| No BOW        | 1          | 100MiB      | 6.46ms   | 15GB/s     |
| No BOW        | 8          | 100MiB      | 28.93ms  | 27GB/s     |
| No BOW        | 16         | 100MiB      | 56.81ms  | 28GB/s     |
| Thread BOW    | 1          | 100MiB      | 6.99ms   | 14GB/s     |
| Thread BOW    | 8          | 100MiB      | 24.52ms  | 32GB/s     |
| Thread BOW    | 16         | 100MiB      | 44.33ms  | 36GB/s     |
| Work Pool BOW | 1          | 100MiB      | 7.12ms   | 14GB/s     |
| Work Pool BOW | 8          | 100MiB      | 20.41ms  | 39GB/s     |
| Work Pool BOW | 16         | 100MiB      | 43.53ms  | 36GB/s     |

Overall, the thread and work pool configurations perform similarly. I believe we still see GVL contention even on the background threads, as I'd expect the scaling to be a little more linear, although it's also true that memory bandwidth isn't unlimited.
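For reference, a benchmark of this shape can reproduce the memory-bound workload; the buffer size and task count mirror the table above, while the harness details are assumptions:

```ruby
require "async"

SIZE = 100 * 1024 * 1024 # 100MiB per buffer, as in the table above.
source = IO::Buffer.new(SIZE)

Async do |parent|
  tasks = 8.times.map do
    parent.async do
      destination = IO::Buffer.new(SIZE)
      # IO::Buffer#copy releases the GVL on large buffers, so with a
      # scheduler implementing blocking_operation_wait it can be
      # offloaded rather than blocking the event loop:
      destination.copy(source)
    end
  end
  tasks.each(&:wait)
end
```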

Types of Changes

  • Performance improvement.

Contribution

@ioquatix force-pushed the work-pool branch 3 times, most recently from 39419c2 to 785ee50 on November 22, 2024 04:39
@ioquatix changed the title from "Work pool for efficient handling of blocking_operation_wait." to "Worker pool for efficient handling of blocking_operation_wait." on Nov 22, 2024
@ioquatix force-pushed the work-pool branch 3 times, most recently from f66aa44 to d1fcdd4 on November 23, 2024 07:24
@ioquatix merged commit 9449e6f into main on Nov 24, 2024
50 of 58 checks passed
@ioquatix deleted the work-pool branch on November 24, 2024 02:06