Design discussion #1

Closed
bigs opened this issue Sep 10, 2018 · 15 comments

@bigs
Contributor

bigs commented Sep 10, 2018

Opening a thread for the discussion of the control API for our daemon.

Control API

Responsibilities

  • Peerstore management
  • Stream creation (connect if connection does not exist)
  • General information retrieval
    • Open connections
    • Open streams
    • Peerstore size
  • Register protocol handlers

Implementation details

Two solid options:

  • HTTP/JSON
    • Requires polling for incoming connections
  • JSON over TCP
    • Bidirectional, so we can push incoming stream notices to users

Stream Proxy

Responsibilities

  • Proxy streams to end users, taking care of all security (secio, TLS) and multiplexing (yamux, etc)
  • Clients should be able to close streams (i.e. close a file handle)
  • Incoming streams on registered protocols should create new streams

Implementation details

Editor's note: I think we can get away with polling the filesystem/shared memory where our streams are created, as opposed to polling the control API, which would make things simpler.

  • Unix sockets
    • One socket per stream
    • Organized on filesystem by connection id
    • Seems Windows support has landed
  • shmem
    • Likely fastest implementation
    • Certainly most complex, least platform agnostic
      • There exists a Windows alternative, but it has a separate API. Perhaps something to be dealt with in a golang library
  • Proxy Filesystems (FUSE, WinFSP)
    • Heavier than unix sockets
@bigs
Contributor Author

bigs commented Sep 10, 2018

i'm feeling pretty good about an HTTP/JSON API with a unix-socket-based stream manager, where the client polls the configured directories in the filesystem for new incoming sockets

@vyzo
Collaborator

vyzo commented Sep 11, 2018

Per discussion with bigs on zoom:

  • We discussed a symmetric unix domain socket stream protocol
    • the daemon listens to a unix socket, and the client initiates streams by opening connection and issuing a protocol header
    • the client listens to a unix socket to provide stream handlers; the daemon connects back on stream open and issues a protocol header
  • The daemon exports the control interface through HTTP/JSON, as it is the simplest and most flexible approach to begin with. We can later implement a binary control protocol over a unix socket as well.

The protocol header must contain at minimum:

  • a disambiguator, to allow later implementing the control protocol multiplexed in the daemon
  • the peer ID and multiaddr
  • the protocol for the stream

We settled on using delimited protobuf for the protocol header, as this saves the need to write custom serializers.

@Stebalien
Member

Before going into design considerations too much, let's flesh out our motivations so we get on the same page: #3

Notes on the current discussion:

  • Serialization Format: I'd like to make a somewhat bold proposal: no JSON. JSON is lacking a "bytes" type and this has caused no end of trouble. How do you feel about mandating CBOR?
  • Multi-Tenant from day 1: We'd like much of this daemon to eventually move to the kernel (or, at least, a system daemon) and it turns out that adding multi-user support later is rather tricky.
  • 100% libp2p: I'd like to at least entertain the idea of going 100% libp2p; that is, no HTTP API. This would make everything we do network transparent (in theory). We may need to have some service expose an HTTP API for simplicity but we should at least consider a micro-kernel like architecture where that's a separate daemon.

@vyzo
Collaborator

vyzo commented Sep 12, 2018

If we are going JSON-less, let's not add another format -- we can use protobufs for the control protocol and multiplex in the UNIX socket.

@vyzo
Collaborator

vyzo commented Sep 12, 2018

For multi-tenant applications we might have the issue of who's handling the streams -- there can be only a single stream handler for each protocol.
I think it makes more sense for each application to run its own daemon; the application can also be composed of multiple processes.

@raulk
Member

raulk commented Sep 12, 2018

Couple of points here:

  1. Another way to view what we're doing is stream virtualisation.

  2. Supposing that the daemon is exposing streams over unix sockets and SHM, we should publish lightweight client bindings/protocol SDKs in different languages. We really don't want users (re)implementing the plumbing across the board to attach their apps to our local, virtualised libp2p transports.

  3. I echo @Stebalien's third point about keeping it 100% libp2p. I'd advocate for the daemon to be started with the listen multiaddr for the control plane, one that only accepts local transports, e.g. --listen /unixsocket/[/var/unix/...], along with an option to enable the --http layer, outputting a warning that the interface is only for development/testing/admin.

For the multi-tenant mode, the master control plane could accept only two commands: new, attach.

  • With new an app starts a new session, does an encryption handshake, and receives a private socket/shm assignment for its app control plane (which is encrypted for that app only), along with a token to re-attach in the future.
  • With attach, an existing app could reattach by providing the token.

Just some initial brainstorming, really.

@vyzo
Collaborator

vyzo commented Sep 12, 2018

I think that requiring clients to implement yamux/secio is a huge burden for bindings implementors (speaking as one :)

@raulk
Member

raulk commented Sep 12, 2018

Agree. Bindings should not perform multiplexing, that's precisely what the daemon does for them, tunnelling each stream onto a physical mapping atop a local transport. So there'd be a 1:1 mapping between a local resource (e.g. shm, socket) and a backing stream.

Regarding secio, if we want multitenancy and isolation in the future, I guess we'll need an encryption mechanism to avoid apps cross-reading streams. But yeah, that complicates binding implementations. Alternatively, we could leverage OS resources like cgroups to provide the isolation.

@Stebalien
Member

I'd advocate for the daemon to be started with the listen multiaddr for the control plane, that only accepts local transports

Technically, we don't even need to do that as long as we can whitelist (although we may want to anyways).

I think that requiring clients to implement yamux/secio is a huge burden for bindings implementors (speaking as one :)

My thinking here is that we'd use a super special local-only transport. That is, we build a SHM/unix domain socket transport that does 90% of the work on the server side. We could even have multiple: a simple one that uses a new file descriptor per stream and a fancy one that uses memory mapping and a single socket for control information. We also don't need to do any secio/encryption as it's all local and privacy/authentication can be enforced by the kernel. The real tricky part here would be key management because we currently expect all "peers" to be identified by a public key. We could do an authentication round using signatures but that feels like overkill.

The only tricky part here would be peer IDs but, in theory,

@vyzo
Collaborator

vyzo commented Sep 13, 2018

Let's not have so many words and no code!
Initial implementation: #4

@raulk raulk mentioned this issue Sep 25, 2018
@cheatfate

I'm sorry, but is an SHM transport really needed?
In a situation where

client -> shm -> libp2p-daemon -> network

the libp2p-daemon becomes a bottleneck. SHM would let clients send data much faster than the libp2p-daemon can handle and forward to the network, because the network is much slower than SHM.

In that case the libp2p-daemon would need to hold large buffers to keep all the incoming packets, or block clients until it is able to send the data over the network.

@bigs
Contributor Author

bigs commented Oct 5, 2018

@cheatfate the spec has actually been formalized (in its current state) in SPEC.md. shm was thrown around as a more direct, efficient method of IPC. we're not approaching it at the moment.

@raulk
Member

raulk commented Oct 5, 2018

Yeah, as @bigs says SHM is just on the radar, but not an immediate priority. We're aware of the complexity, and it'll warrant much experimentation.

I think your remark boils down to needing a mechanism for backpressure. Unix domain sockets inherently provide this (I believe). With SHM, it'll need to be part of the protocol agreed between both processes.

I have a lot of investigating to do before I can provide better answers, but for now SHM is just in the wishlist ;-)

@bigs
Contributor Author

bigs commented Oct 5, 2018 via email

@bigs
Contributor Author

bigs commented Oct 16, 2018

at this point, implementation details are starting to settle a bit. i'm going to close this conversation for now.
