-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
respect contexts in a more timely manner #900
Conversation
i believe the 7.5MB bursts are documented in #872. i'll try to test this soon, if i can figure out how to test branches... |
The main issue that this PR 'fixes' is that the network stack does not respect contexts. I think that if we switch over to using grpc, a lot of that will be fixed. |
select { | ||
case <-done: | ||
case <-ctx.Done(): | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if bs.send hangs, we exit leaking a hung goroutine.
the downside of this approach is that it becomes easy to hide hanging things, and thus harder to find problems-- nodes would just slowly deteriorate.
why is bs.send
hanging? is there a way to speed that up and respect the context there, instead of bypassing it here?
For others following this along-- the codebase is full of examples of both things: places where we return asap ad let the slow-to-finish goroutine finish on its own, and places where we explicitly wait for it to be done, so that we know we've fully finished. there's tradeoffs and cases where we must do one or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
send is hanging because of the second stack trace in this gist: https://gist.github.com/whyrusleeping/a04c7bcf791d13e6ce73
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
basically comes down to our network stack not respecting contexts that well (but gRPC does!!!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So-- when we move to gRPC we could revert this change and go back to waiting?
Also, any idea how long those goroutines wait in yamux before exiting? or what's making them wait in the first place? (worried about letting them build up indefinitely, could cause more problems)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea at all what yamux is doing. the method is waitForSendErr
which selects on the error channel and the shutdown channel and exits when either even happens...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm wonder if it's waiting for some sort of ack from the remote side...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- https://github.com/jbenet/go-peerstream/blob/master/transport/yamux/yamux.go
- https://github.com/hashicorp/yamux
- https://godoc.org/github.com/hashicorp/yamux
cant readily see anything im doing wrong. maybe it does wait for an ack to do flowcontrol / backpressure.
eead067
to
5eb08c4
Compare
respect contexts in a more timely manner
This PR fixes the '7MB bursts' that @anarcat reported. While this fix works, if the lower calls 'hang', those goroutines may build up, but the contexts are cancelled so they should exit sooner rather than later. Shouldnt be an issue.