Worker pool for efficient handling of blocking_operation_wait
#359
See #352 for context.
The cost of creating threads and the usage of `nogvl` creates inefficiencies. `rb_nogvl` should really only be used when the amount of work to be done is greater than some scheduling quantum. In practice, that's hard to achieve, so we also want to minimise the overhead of `blocking_operation_wait` (later abbreviated BOW).

I've been using `async-cable` as a benchmark, as it has a good mixture of IO and nogvl (inflate/deflate) operations. Those operations are typically extremely small, so the offloading overhead is very pronounced. Another benchmark is the recently introduced `IO::Buffer#copy`, which uses nogvl on large buffers. It is the opposite: highly CPU bound (memory bound, actually) work with little IO.

Semantics remain unchanged.
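The idea of a worker pool here is to amortise thread creation: the `blocking_operation_wait` hook receives the offloaded operation as a callable, and rather than spawning a fresh thread per call, it can hand the work to a long-lived background thread. A minimal sketch of such a pool, assuming the hook receives a callable (`WorkerPool` and its internals are illustrative, not the implementation from this PR):

```ruby
# Hypothetical sketch of a fixed-size worker pool for offloaded blocking
# operations. Names and sizes are illustrative, not this PR's implementation.
class WorkerPool
  def initialize(size: 4)
    @queue = Queue.new
    @threads = size.times.map do
      Thread.new do
        # Each worker loops until it pops the nil shutdown sentinel.
        while job = @queue.pop
          job.call
        end
      end
    end
  end

  # Submit a callable and wait for it to complete on a worker thread,
  # returning its result or re-raising its error.
  def call(work)
    done = Queue.new
    @queue.push(proc do
      begin
        done.push([:ok, work.call])
      rescue Exception => error
        done.push([:error, error])
      end
    end)
    status, value = done.pop
    raise value if status == :error
    value
  end

  def close
    @threads.size.times { @queue.push(nil) }
    @threads.each(&:join)
  end
end
```

A scheduler could then delegate the hook to `@pool.call(work)`. Since `Queue#pop` is fiber-scheduler aware, the waiting side should yield to the event loop rather than blocking its thread, though as noted above the offloading thread still has to re-acquire the GVL to run the work.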
`Async::Cable` Benchmarks

This benchmark is mostly network bound, with a lot of small calls to inflate/deflate which use `rb_nogvl`:

Overall, we can see a net loss in performance by offloading `rb_nogvl` with a pure Ruby implementation. I believe we can attribute this to the offloading thread having to re-acquire the GVL, which creates unnecessary contention. This is fixable, but requires a native code path (probably in the `IO::Event` scheduler implementation).

`IO::Buffer` Benchmarks

This benchmark is more memory bound and there is essentially zero blocking IO:

Overall, the thread and work pool are similar. I believe we see GVL contention even on the background threads, as I'd expect the numbers to be more linear otherwise, although it's also true that memory bandwidth isn't unlimited.
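To get a feel for this memory-bound case, here is a small sketch of the kind of measurement involved, comparing serial copies against copies on background threads (the buffer size and repeat count are arbitrary choices, not those used for the numbers above):

```ruby
require "benchmark"

SIZE = 16 * 1024 * 1024 # 16 MiB per buffer: large enough to be memory bound.
REPEATS = 16

# Time REPEATS full-buffer copies between two IO::Buffer instances.
def timed_copies
  source = IO::Buffer.new(SIZE)
  destination = IO::Buffer.new(SIZE)
  Benchmark.realtime { REPEATS.times { destination.copy(source) } }
end

# Run twice serially, then twice concurrently on background threads.
serial = 2.times.sum { timed_copies }
parallel = 2.times.map { Thread.new { timed_copies } }.map(&:value).max
```

If `IO::Buffer#copy` releases the GVL for large buffers, `parallel` should approach half of `serial`; in practice, GVL re-acquisition and finite memory bandwidth keep it from scaling linearly, which matches the non-linear numbers observed above.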