-
-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Congestion controller fixes #975
Conversation
6898182
to
e0bc877
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me, but it probably also needs a review from @Ralith.
e0bc877
to
0c0980d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this is much needed!
if self.bytes_acked >= self.window { | ||
self.bytes_acked -= self.window; | ||
self.window += self.config.max_datagram_size; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be helpful to cite the RFC we got this logic from.
quinn-proto/src/connection/mod.rs
Outdated
@@ -422,6 +428,8 @@ where | |||
|| self.path_response.is_some() | |||
}); | |||
|
|||
let mut was_congestion_blocked = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere (e.g. handle_packet
) we use "was_*" to refer to the state prior to running some logic which may change that state, whereas this is referring to the state after we're done. Could we say is_
here instead? Also, maybe transport_blocked
(as opposed to application) since pacing can be responsible as well as congestion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine with removing was_
. But I think even with pacing congestion_blocked
is fine, because pacing is a form of congestion control
if self.window >= self.ssthresh { | ||
// Exiting slow start | ||
// Initialize `bytes_acked` for congestion avoidance | ||
self.bytes_acked = self.window - self.ssthresh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this logic based on? Also, style nit:
self.bytes_acked = self.window - self.ssthresh | |
self.bytes_acked = self.window - self.ssthresh; |
This change fixes 2 issues in the congestion controller: Extra slow slow start === The congestion controller had an issue where it didn't ramp up by doubling the window in the slow start phase as expected. The reason here was the usage of the "congestion_blocked" field, which aims to remove the avoid growing the window when there is no data to send. The issue with the field was that the first incoming ACK for every batch would increase the congestion window, which would let the next call to `congestion_blocked()` return false, and thereby let the congestion controller ignore all of the following ACKs in a received batch up the point where the next `poll_transmit()` call is made and the window is filled again. Since often all ACKs for one round-trip arrive in a single batch that means only the first ACK had an effect and the others where ignored -> which lead to increasing the window by 1 MTU instead of doubling it in the slow start phase. The new approach introduces a new `app_limited` field which caches whether the last `poll_transmit` attempt couldn't produce data because there was no application data available. Numeric precision issue in congestion avoidance === The formula ``` self.window += self.config.max_datagram_size * bytes / self.window; ``` which is specified in the RFC requires floating point arithmetic. If integer arithmetic is used, the multiplication of the first 2 values yields about 1.7MB for 1300 byte packets. As soon as the window is bigger than this value, the window won't change at all due the division leading to 0. The new implementation follow this guidance from the quic recovery specification: > In congestion avoidance, implementers that use an integer > representation for congestion_window should be careful with division, > and can use the alternative approach suggested in Section 2.1 of > [RFC3465].
0c0980d
to
f3d025e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
This change fixes 2 issues in the congestion controller:
Extra slow slow start
The congestion controller had an issue where it didn't ramp up by
doubling the window in the slow start phase as expected. The reason
here was the usage of the "congestion_blocked" field, which aims to
remove the avoid growing the window when there is no data to send.
The issue with the field was that the first incoming ACK for every batch
would increase the congestion window, which would let the next
call to
congestion_blocked()
return false, and thereby let thecongestion controller ignore all of the following ACKs in a received
batch up the point where the next
poll_transmit()
call is made andthe window is filled again. Since often all ACKs for one round-trip
arrive in a single batch that means only the first ACK had an effect
and the others where ignored -> which lead to increasing the
window by 1 MTU instead of doubling it in the slow start phase.
The new approach introduces a new
app_limited
field whichcaches whether the last
poll_transmit
attempt couldn't producedata because there was no application data available.
Numeric precision issue in congestion avoidance
The formula
which is specified in the RFC requires floating point arithmetic.
If integer arithmetic is used, the multiplication of the first 2 values
yields about 1.7MB for 1300 byte packets. As soon as the window is
bigger than this value, the window won't change at all due the
division leading to 0.
The new implementation follow this guidance from the quic recovery
specification: