Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

respect contexts in a more timely manner #900

Merged
merged 2 commits into from
Mar 9, 2015
Merged

Conversation

whyrusleeping
Copy link
Member

This PR fixes the '7MB bursts' that @anarcat reported. While this fix works, if the lower calls 'hang', those goroutines may build up, but the contexts are cancelled so they should exit sooner rather than later. Shouldnt be an issue.

@whyrusleeping whyrusleeping added the status/in-progress In progress label Mar 8, 2015
@anarcat
Copy link
Contributor

anarcat commented Mar 8, 2015

i believe the 7.5MB bursts are documented in #872. i'll try to test this soon, if i can figure out how to test branches...

@whyrusleeping
Copy link
Member Author

The main issue that this PR 'fixes' is that the network stack does not respect contexts. I think that if we switch over to using grpc, a lot of that will be fixed.

select {
case <-done:
case <-ctx.Done():
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if bs.send hangs, we exit leaking a hung goroutine.

the downside of this approach is that it becomes easy to hide hanging things, and thus harder to find problems-- nodes would just slowly deteriorate.

why is bs.send hanging? is there a way to speed that up and respect the context there, instead of bypassing it here?


For others following this along-- the codebase is full of examples of both things: places where we return asap ad let the slow-to-finish goroutine finish on its own, and places where we explicitly wait for it to be done, so that we know we've fully finished. there's tradeoffs and cases where we must do one or the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

send is hanging because of the second stack trace in this gist: https://gist.github.com/whyrusleeping/a04c7bcf791d13e6ce73

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically comes down to our network stack not respecting contexts that well (but gRPC does!!!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So-- when we move to gRPC we could revert this change and go back to waiting?

Also, any idea how long those goroutines wait in yamux before exiting? or what's making them wait in the first place? (worried about letting them build up indefinitely, could cause more problems)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea at all what yamux is doing. the method is waitForSendErr which selects on the error channel and the shutdown channel and exits when either even happens...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm wonder if it's waiting for some sort of ack from the remote side...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cant readily see anything im doing wrong. maybe it does wait for an ack to do flowcontrol / backpressure.

@whyrusleeping whyrusleeping force-pushed the fix/bitswap-ctx-respect branch from eead067 to 5eb08c4 Compare March 9, 2015 07:22
jbenet added a commit that referenced this pull request Mar 9, 2015
respect contexts in a more timely manner
@jbenet jbenet merged commit bcbc268 into master Mar 9, 2015
@jbenet jbenet removed the status/in-progress In progress label Mar 9, 2015
@jbenet jbenet deleted the fix/bitswap-ctx-respect branch March 9, 2015 07:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants