-
Notifications
You must be signed in to change notification settings - Fork 112
Cancel/Want short-circuiting is racy #347
Comments
Really, if we fail to send, we should drop the peer and walk away. We can do this by:
That is, we won't forget about the peer. Otherwise, we'll mess up our reference counting. Instead, we just mark it as dead and move on. I'm pretty sure a large part of our issue has been peers that literally don't support the bitswap protocol. We'll currently keep retrying over and over. |
You're right, nice catch 👍 An alternative that occurs to me is that when we call
|
I think we should be disconnecting from these peers regardless. I'd also like to reduce the number of ad-hoc checks. |
Sorry, not disconnecting. We should start ignoring these peers. |
Ah so you're saying we should just eagerly move wants/cancels from pending to sent, and if something goes wrong when trying to send the message we just shut down the message queue for the peer? |
Yeah, basically. We could also try reviving the peer (from scratch) we receive a new "connected" event. But that's a lower priority. It's just that the current logic is really unfriendly to non-ipfs peers as we'll repeatedly hammer them with new streams even if they don't speak the protocol. |
Makes sense 👍 Once they've failed a few dial attempts, do you think we should try to reconnect with a backoff, or just put them in a blacklist forever? |
For now, I just want to drop them (and re-activate when we receive a new "connect" event). |
If the peer doesn't respond for a long time, I think we actually need to simulate it having "disconnected" so that it's removed from the session. This allows the session to stop sending want-block to the peer and to search for better peers. The session will re-add the peer if it receives a block or HAVE from the peer, so it may make sense to wire this change through at the networking layer. I propose:
|
That sounds like a great idea. |
@Stebalien is this issue resolved? If so I will close |
👍 |
Cancel Race:
However, the inverse isn't an issue:
So, I think the solution here is to either:
IMO, the second solution is probably the "more correct" solution.
The text was updated successfully, but these errors were encountered: