
gossipsub + tokio: unexpected warning #2897

Closed
odesenfans opened this issue Sep 12, 2022 · 11 comments

@odesenfans

Summary

Hello! I'm interested in running a gossipsub-based application with Tokio as a runtime. As a first step, I revamped the gossipsub-chat example with Tokio as a runtime (mixing elements from the chat-tokio example). While the communication works, I get two warnings when closing one of the peers:

[2022-09-12T15:23:25Z WARN libp2p_gossipsub::handler] Inbound stream error: Io(Kind(UnexpectedEof))
[2022-09-12T15:23:25Z WARN libp2p_gossipsub::handler] Inbound substream error while closing: Io(Custom { kind: UnexpectedEof, error: "unexpected end of file" })

This warning does not occur with the floodsub example, so I'm wondering what's going on and whether it will have an impact down the line. From what I can determine, it looks like the end of the connection results in an EOF instead of a graceful close.
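For context, the tokio wiring follows the chat-tokio example: the transport is built by hand and the SwarmBuilder is given a tokio executor. Roughly (a simplified sketch, not the exact code from the repository linked below; transport, behaviour and local_peer_id stand in for what the example builds):

    use libp2p::swarm::SwarmBuilder;

    // Spawn connection tasks on the tokio runtime instead of the default executor.
    let mut swarm = SwarmBuilder::new(transport, behaviour, local_peer_id)
        .executor(Box::new(|fut| {
            tokio::spawn(fut);
        }))
        .build();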

Debug Output

Local peer id: PeerId("12D3KooWNXEmmzeEWezBCFcPgB17bLU8XmZxQGBb3puBfPBJBW13")
[2022-09-12T15:26:28Z DEBUG libp2p_gossipsub::behaviour] Subscribing to topic: test-net
[2022-09-12T15:26:28Z DEBUG libp2p_gossipsub::behaviour] Running JOIN for topic: TopicHash { hash: "test-net" }
[2022-09-12T15:26:28Z DEBUG libp2p_gossipsub::behaviour] RANDOM PEERS: Got 0 peers
[2022-09-12T15:26:28Z DEBUG libp2p_gossipsub::behaviour] JOIN: Inserting 0 random peers into the mesh
[2022-09-12T15:26:28Z DEBUG libp2p_gossipsub::behaviour] Completed JOIN for topic: TopicHash { hash: "test-net" }
[2022-09-12T15:26:28Z DEBUG libp2p_gossipsub::behaviour] Subscribed to topic: test-net
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] listening on 0.0.0.0:0
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/172.23.0.1/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/172.23.0.1/tcp/38489"
Listening on "/ip4/172.23.0.1/tcp/38489"
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/172.19.0.1/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/172.19.0.1/tcp/38489"
Listening on "/ip4/172.19.0.1/tcp/38489"
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/192.168.122.1/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/192.168.122.1/tcp/38489"
Listening on "/ip4/192.168.122.1/tcp/38489"
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/192.168.0.147/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/192.168.0.147/tcp/38489"
Listening on "/ip4/192.168.0.147/tcp/38489"
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/172.17.0.1/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/172.17.0.1/tcp/38489"
Listening on "/ip4/172.17.0.1/tcp/38489"
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/172.25.0.1/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/172.25.0.1/tcp/38489"
Listening on "/ip4/172.25.0.1/tcp/38489"
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/127.0.0.1/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/127.0.0.1/tcp/38489"
Listening on "/ip4/127.0.0.1/tcp/38489"
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/172.22.0.1/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/172.22.0.1/tcp/38489"
Listening on "/ip4/172.22.0.1/tcp/38489"
[2022-09-12T15:26:28Z DEBUG libp2p_tcp] New listen address: /ip4/172.21.0.1/tcp/38489
[2022-09-12T15:26:28Z DEBUG libp2p_swarm] Listener ListenerId(11990340746411954256); New address: "/ip4/172.21.0.1/tcp/38489"
Listening on "/ip4/172.21.0.1/tcp/38489"
[2022-09-12T15:26:33Z DEBUG libp2p_gossipsub::behaviour] Starting heartbeat
[2022-09-12T15:26:33Z DEBUG libp2p_gossipsub::behaviour] HEARTBEAT: Mesh low. Topic: test-net Contains: 0 needs: 5
[2022-09-12T15:26:33Z DEBUG libp2p_gossipsub::behaviour] RANDOM PEERS: Got 0 peers
[2022-09-12T15:26:33Z DEBUG libp2p_gossipsub::behaviour] Updating mesh, new mesh: {}
[2022-09-12T15:26:33Z DEBUG libp2p_gossipsub::behaviour] Completed Heartbeat
[2022-09-12T15:26:43Z DEBUG libp2p_gossipsub::behaviour] Starting heartbeat
[2022-09-12T15:26:43Z DEBUG libp2p_gossipsub::behaviour] HEARTBEAT: Mesh low. Topic: test-net Contains: 0 needs: 5
[2022-09-12T15:26:43Z DEBUG libp2p_gossipsub::behaviour] RANDOM PEERS: Got 0 peers
[2022-09-12T15:26:43Z DEBUG libp2p_gossipsub::behaviour] Updating mesh, new mesh: {}
[2022-09-12T15:26:43Z DEBUG libp2p_gossipsub::behaviour] Completed Heartbeat
[2022-09-12T15:26:48Z DEBUG libp2p_tcp] Incoming connection from /ip4/192.168.0.147/tcp/42914 at /ip4/172.25.0.1/tcp/38489
[2022-09-12T15:26:48Z DEBUG multistream_select::listener_select] Listener: confirming protocol: /noise
[2022-09-12T15:26:48Z DEBUG multistream_select::listener_select] Listener: sent confirmed protocol: /noise
[2022-09-12T15:26:48Z DEBUG libp2p_core::upgrade::apply] Successfully applied negotiated protocol
[2022-09-12T15:26:48Z DEBUG multistream_select::listener_select] Listener: confirming protocol: /mplex/6.7.0
[2022-09-12T15:26:48Z DEBUG multistream_select::listener_select] Listener: sent confirmed protocol: /mplex/6.7.0
[2022-09-12T15:26:48Z DEBUG libp2p_mplex::io] New multiplexed connection: 4646deb6793b6f32
[2022-09-12T15:26:48Z DEBUG libp2p_core::upgrade::apply] Successfully applied negotiated protocol
[2022-09-12T15:26:48Z DEBUG libp2p_swarm] Connection established: PeerId("12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV") Listener { local_addr: "/ip4/172.25.0.1/tcp/38489", send_back_addr: "/ip4/192.168.0.147/tcp/42914" }; Total (peer): 1. Total non-banned (peer): 1
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] New peer connected: 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV
[2022-09-12T15:26:48Z DEBUG libp2p_mplex::io] 4646deb6793b6f32: New outbound substream: (0/initiator) (total 1)
[2022-09-12T15:26:48Z DEBUG multistream_select::dialer_select] Dialer: Proposed protocol: /meshsub/1.1.0
[2022-09-12T15:26:48Z DEBUG libp2p_mplex::io] 4646deb6793b6f32: New inbound substream: (0/receiver) (total 2)
[2022-09-12T15:26:48Z DEBUG multistream_select::listener_select] Listener: confirming protocol: /meshsub/1.1.0
[2022-09-12T15:26:48Z DEBUG multistream_select::listener_select] Listener: sent confirmed protocol: /meshsub/1.1.0
[2022-09-12T15:26:48Z DEBUG libp2p_core::upgrade::apply] Successfully applied negotiated protocol
[2022-09-12T15:26:48Z DEBUG multistream_select::dialer_select] Dialer: Received confirmation for protocol: /meshsub/1.1.0
[2022-09-12T15:26:48Z DEBUG libp2p_core::upgrade::apply] Successfully applied negotiated protocol
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] New peer type found: Gossipsub v1.1 for peer: 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] Handling subscriptions: [GossipsubSubscription { action: Subscribe, topic_hash: TopicHash { hash: "test-net" } }], from source: 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] SUBSCRIPTION: Adding gossip peer: 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV to topic: TopicHash { hash: "test-net" }
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] SUBSCRIPTION: Adding peer 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV to the mesh for topic TopicHash { hash: "test-net" }
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] Sending GRAFT to peer 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV for topic TopicHash { hash: "test-net" }
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] Handling GRAFT message for peer: 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] GRAFT: Received graft for peer PeerId("12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV") that is already in topic TopicHash { hash: "test-net" }
[2022-09-12T15:26:48Z DEBUG libp2p_gossipsub::behaviour] Completed GRAFT handling for peer: 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV
[2022-09-12T15:26:49Z DEBUG libp2p_mplex::io] 4646deb6793b6f32: Multiplexed connection failed: Kind(UnexpectedEof)
[2022-09-12T15:26:49Z WARN  libp2p_gossipsub::handler] Inbound stream error: Io(Kind(UnexpectedEof))
[2022-09-12T15:26:49Z WARN  libp2p_gossipsub::handler] Inbound substream error while closing: Io(Custom { kind: UnexpectedEof, error: "unexpected end of file" })
[2022-09-12T15:26:49Z DEBUG libp2p_swarm] Connection closed with error IO(Custom { kind: Other, error: Custom { kind: UnexpectedEof, error: "unexpected end of file" } }): Connected { endpoint: Listener { local_addr: "/ip4/172.25.0.1/tcp/38489", send_back_addr: "/ip4/192.168.0.147/tcp/42914" }, peer_id: PeerId("12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV") }; Total (peer): 0.
[2022-09-12T15:26:49Z DEBUG libp2p_gossipsub::behaviour] Peer disconnected: 12D3KooWH31mjn7gLrxBqLgiz7YfbVHraWhVsRsGT7RA75d1p4VV
[2022-09-12T15:26:53Z DEBUG libp2p_gossipsub::behaviour] Starting heartbeat
[2022-09-12T15:26:53Z DEBUG libp2p_gossipsub::behaviour] HEARTBEAT: Mesh low. Topic: test-net Contains: 0 needs: 5
[2022-09-12T15:26:53Z DEBUG libp2p_gossipsub::behaviour] RANDOM PEERS: Got 0 peers
[2022-09-12T15:26:53Z DEBUG libp2p_gossipsub::behaviour] Updating mesh, new mesh: {}
[2022-09-12T15:26:53Z DEBUG libp2p_gossipsub::behaviour] Completed Heartbeat
[2022-09-12T15:27:03Z DEBUG libp2p_gossipsub::behaviour] Starting heartbeat
[2022-09-12T15:27:03Z DEBUG libp2p_gossipsub::behaviour] HEARTBEAT: Mesh low. Topic: test-net Contains: 0 needs: 5
[2022-09-12T15:27:03Z DEBUG libp2p_gossipsub::behaviour] RANDOM PEERS: Got 0 peers
[2022-09-12T15:27:03Z DEBUG libp2p_gossipsub::behaviour] Updating mesh, new mesh: {}
[2022-09-12T15:27:03Z DEBUG libp2p_gossipsub::behaviour] Completed Heartbeat
[2022-09-12T15:27:13Z DEBUG libp2p_gossipsub::behaviour] Starting heartbeat
[2022-09-12T15:27:13Z DEBUG libp2p_gossipsub::behaviour] HEARTBEAT: Mesh low. Topic: test-net Contains: 0 needs: 5
[2022-09-12T15:27:13Z DEBUG libp2p_gossipsub::behaviour] RANDOM PEERS: Got 0 peers
[2022-09-12T15:27:13Z DEBUG libp2p_gossipsub::behaviour] Updating mesh, new mesh: {}
[2022-09-12T15:27:13Z DEBUG libp2p_gossipsub::behaviour] Completed Heartbeat
[2022-09-12T15:27:23Z DEBUG libp2p_gossipsub::behaviour] Starting heartbeat

I pushed the code for the example to this repository: https://github.com/odesenfans/rust-libp2p-gossipsub-tokio. Any ideas?

Version

  • libp2p version (version number, commit, or branch): 0.47.0

Cheers,
Olivier

@dariusc93
Member

There are connection failure errors before and after those two warnings. I would have to check my logs and experiment with some of my code, but the only time I've seen this is when a peer suddenly disconnects before everything was received (e.g. a crash or the process being terminated), hence the unexpected EOF. I don't believe I've seen it specifically in gossipsub, though. It might not show up in floodsub simply because the warnings aren't there, but I could be wrong.

@thomaseizinger
Contributor

thomaseizinger commented Sep 14, 2022

It might not show up in floodsub simply because floodsub doesn't have as much logging (I didn't check the code).

From what I can determine it looks like the end of the connection results in an EOF instead of a graceful closing.

Are you closing the connection gracefully or just shutting the peer down?

@odesenfans
Author

@thomaseizinger I'm just shutting the peer down, which would indeed explain the warning. I haven't come across it yet: what's the idiomatic way to shut down the connection in rust-libp2p?

@mxinden
Member

mxinden commented Sep 16, 2022

Unfortunately there is no graceful shutdown for a libp2p node today. See #1682 for a previous attempt.

I don't think any of us are going to get to this any time soon. That said, contributions are welcome.

@thomaseizinger
Contributor

@thomaseizinger I'm just shutting the peer down, which would indeed explain the warning. I haven't come across it yet: what's the idiomatic way to shut down the connection in rust-libp2p?

All of the above will - to my knowledge - try to gracefully close the connection before freeing any resources.

@odesenfans
Author

Thanks a lot for the suggestions, I'll try them out and see if the warning persists.

@odesenfans
Author

So, I looked into the docs and tried to implement a graceful disconnection before shutting down the application. To my understanding of the docs, the only public method available to me from the swarm object is Swarm::disconnect_peer_id. I implemented a loop to disconnect peers one by one on shutdown. This does not resolve the warnings mentioned in the issue, even though I poll the swarm until the connection is closed on the node being shut down. The other node will still print these warnings from the gossipsub handler.
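Roughly, the shutdown loop I tried looks like this (a simplified sketch, not the exact code; it assumes swarm is the Swarm built in the example and omits error handling):

    use futures::StreamExt;
    use libp2p::swarm::SwarmEvent;

    // Disconnect every currently connected peer, then keep polling the swarm
    // until all of those connections have reported ConnectionClosed.
    let peers: Vec<_> = swarm.connected_peers().cloned().collect();
    for peer_id in peers {
        // disconnect_peer_id returns Err(()) if there was no active connection.
        let _ = swarm.disconnect_peer_id(peer_id);
    }
    while swarm.connected_peers().next().is_some() {
        if let SwarmEvent::ConnectionClosed { peer_id, .. } = swarm.select_next_some().await {
            println!("Connection to {peer_id} closed");
        }
    }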

So, additional questions/remarks:

  • I may have missed something in the docs. Is there a way to send a NetworkBehaviourAction::CloseConnection event, or close the underlying connection handlers by using methods of Swarm (or NetworkBehaviour)?
  • If not, my understanding of the issue is that the gossipsub handler polls for the next gossipsub event, sees an I/O error and prints the warning. If it is too hard to determine at that point whether the connection was closed gracefully or not, another option could be to unsubscribe from the gossipsub topics on the closing node. I tried this, but it does not work out of the box: to make it work I would have to do some additional work in the event handler for GossipsubEvent::Unsubscribe, and the gossipsub behaviour does not seem to offer that possibility (a rough sketch of the attempt follows this list).
  • What's bugging me here is that the gossipsub handler gets passed an I/O error. Shouldn't there be a handler that catches it beforehand and translates that to the correct event(s) for protocol handlers like gossipsub?
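The unsubscribe attempt from the second bullet looks roughly like this (again a sketch; it assumes the swarm's behaviour is the Gossipsub behaviour from the example and uses the topic from the logs):

    use libp2p::gossipsub::IdentTopic;

    // Unsubscribe before shutting down so the remote side is told about the
    // unsubscription instead of only seeing the stream die.
    let topic = IdentTopic::new("test-net");
    if let Err(e) = swarm.behaviour_mut().unsubscribe(&topic) {
        eprintln!("failed to unsubscribe: {e:?}");
    }
    // The swarm still has to be polled afterwards so the unsubscribe messages
    // are actually flushed to the peers before the process exits.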

I'm slowly diving into the code, but I'm afraid my current knowledge of Rust and libp2p is too narrow to fix this by myself.

@thomaseizinger
Contributor

I think the issue we are facing here is that gossipsub actually keeps substreams alive, which is rather unusual across libp2p protocols. Most protocols open a stream, send some data and close it again, so it is highly unlikely that a connection gets closed while a substream is still active.

For a graceful shutdown of the streams within gossipsub, I think it may depend on the underlying StreamMuxer and whether it can gracefully close streams. Are you using yamux or mplex in your application?

It may also be that we are not properly closing all streams when we shut down a connection.

@odesenfans
Author

For a graceful shutdown of the streams within gossipsub, I think it may depend on the underlying StreamMuxer and whether it can gracefully close streams. Are you using yamux or mplex in your application?

I'm using mplex. Should I try using yamux to see if it makes any difference?

@thomaseizinger
Contributor

For a graceful shutdown of the streams within gossipsub, I think it may depend on the underlying StreamMuxer and whether it can gracefully close streams. Are you using yamux or mplex in your application?

I'm using mplex. Should I try using yamux to see if it makes any difference?

Yes, please try with yamux. I think the issue you are facing might be explained here:

/// > **Note**: No `Close` or `Reset` frames are sent on open substreams
/// > before closing the underlying connection. However, the connection
/// > close implies a flush of any frames already sent.

We do close the muxer as part of graceful connection shutdown but mplex doesn't gracefully close open streams as part of that.
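Switching the multiplexer is a small change when building the transport. A rough sketch following the transport-upgrade pattern from the examples of that era (tcp_transport stands in for whatever TCP transport you already build):

    use libp2p::{core::upgrade, identity, noise, yamux, Transport};

    // Same transport setup as before, but multiplexed with yamux instead of mplex.
    let id_keys = identity::Keypair::generate_ed25519();
    let noise_keys = noise::Keypair::<noise::X25519Spec>::new()
        .into_authentic(&id_keys)
        .expect("signing libp2p-noise static keypair");

    let transport = tcp_transport
        .upgrade(upgrade::Version::V1)
        .authenticate(noise::NoiseConfig::xx(noise_keys).into_authenticated())
        // yamux closes streams with an explicit frame, avoiding the abrupt EOF
        // that mplex produces on open substreams when the connection goes away.
        .multiplex(yamux::YamuxConfig::default())
        .boxed();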

@thomaseizinger
Contributor

Let me know if this is still an issue.
