-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bound gossip channels #5104
base: master
Are you sure you want to change the base?
Bound gossip channels #5104
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+r network team approval
pr looks good but i am running some tests to get a better understanding of the behavior |
@@ -71,6 +72,7 @@ impl StreamerReceiveStats { | |||
packet_batches_count: AtomicUsize::default(), | |||
full_packet_batches_count: AtomicUsize::default(), | |||
max_channel_len: AtomicUsize::default(), | |||
num_packets_dropped: AtomicUsize::default(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: is it time to start reporting StreamerReceiveStats
? or is there a reason we don't actually report these stats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can certainly include stat reporting in this PR.
As far as I can tell, when the stats component of streamer was added, reporting of those stats was not added to all usage sites, presumably to satisfy type checking of that change without making assumptions about reporting. I.e., I don't think we're deliberately not reporting stats
9c85fbd
to
cb9bae8
Compare
+1 from me |
Problem
The
unbounded
channel uses what is effectively a linked list of "blocks" of 31 slots. As volume surges, it has to dynamically allocate additional blocks to accommodate the extra slots required. It does this on demand, one-by-one, which adds extra synchronization overhead on the sending and receiving side in addition to the time waiting for allocation / deallocation -- this slows down the entire pipeline of packet processing. Contrast this to abounded
channel, which uses a single contiguous allocation without any of the block synchronization overhead.Summary of Changes
Gossip packet buffers now used
bounded
channels, with a fixed, preallocated, capacity.It was observed that under extreme load, the channel caps out at around 11k packet batches. The channel capacity introduced here is
16,384
(2^14
) packet batches, which is the 11k figure rounded up to the next power of 2. This ensures that load shedding is highly unlikely to occur on the sender and continues to be done on the receiver side.Senders are also updated to use
try_send
and introduce error handling in the event that channels are full. It is highly unlikely that these errors are actually hit in practice given the large capacity of the channels and the draining that is occurring in the receivers. This adjustment is made here for correctness and completeness, and also dovetail with forthcoming updates to load shed at the send site.To reiterate, there is no practical change to current load shedding behavior in this branch, save for edge cases that were never observed in load testing and are highly unlikely to be encountered.
Broken out of #5065