Proposal: GraphSync (A) #66
Conversation
These are the current thoughts about GraphSync written down in a single document. This also contains the results from the Deep-Dive session at the Developer Meeting 2018 in Berlin.
I'm having trouble getting a picture of the protocol from this document, even as a starting point. I'm seeing:
Where the "Consumer" for selector type Y "executes" selectors of type Y, puppeting the GraphSync "client". Is that correct? If so, I'd like to be careful to avoid putting too much logic in the "Consumer" as we don't want implementing new selectors to be hard. |
I think you are correct. The point of the Consumer is that the GraphSync part on the Server can be pretty minimal. Implementing new selectors would then mostly happen in the Consumer as the Server has already the basic building blocks implemented. |
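To make that division of labour concrete, here is a minimal sketch in Go of how the split could look. The interface names and the `GetSubDAG` primitive are hypothetical, not an existing API; the point is only that the Server-side building block stays small while selector logic lives in the Consumer:

```go
package graphsync

import (
	"context"

	cid "github.com/ipfs/go-cid"
)

// Client is the minimal building block the Server side has to implement.
// Everything selector-specific lives above it.
type Client interface {
	// GetSubDAG fetches the raw blocks of the sub-DAG rooted at root,
	// at most maxDepth levels deep (0 meaning "no limit").
	GetSubDAG(ctx context.Context, root cid.Cid, maxDepth int) ([][]byte, error)
}

// Consumer implements one selector type (e.g. a UnixFS range read) by
// issuing one or more of these primitive requests and interpreting the
// returned blocks itself.
type Consumer interface {
	Execute(ctx context.Context, client Client) error
}
```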
/cc @ajbouh |
block: message.block
}
}
// Server has only a subset of the requested DAG
Is this saying that we are adding an "I don't have this" response message?
That would be my plan.
@vmx do you have plans around wire format changes? Also, any thoughts towards real world performance of such algorithms? A lot goes into making bitswap both fast, and not wasteful. The duplicate blocks issue is pretty significant, and worth designing solutions that take it into account. For example, in the happy case, we can ask one person for the data, they can tell us what they don't have, and we can then ask others for that data. But that relies on us trusting that the other peer will be honest, and fast. |
@whyrusleeping Currently GraphSync is becoming more of an RPC call thing, not a real Bitswap replacement. Perhaps GraphSync could then be used as a building block. While implementing what I think GraphSync is, I'm getting more and more doubts about whether it is useful. |
@vmx don't get me wrong, I think GraphSync (in some form) will be incredibly useful. The hard part is just figuring out what that looks like. I've been grappling with the latency vs bandwidth waste vs centralization tradeoffs lately, and it's tough. Some tools that I'm thinking might be useful:
|
Jumping in again to wave hands about graph manifests. I brought this up in cursory fashion at this session, but have had some time to marinate, and think it's a concept worth revisiting. For every discrete DAG g one can construct a manifest, which is a second DAG of only block names and links (no content): These manifests are relatively small. If expressed as a set of two lists (one of array-positional links and one of names/hashes) it should be possible to represent many gigs' worth of IPFS DAG content in < 100kb of CBOR. IMHO, the power of IPFS is derived from the dual expression of blocks as both graphs and flat lists. This is also a fault line that shows up in the seam between bitswap and graph sync. I think graph manifests are a missing "primitive" from IPFS. These manifests have a few properties that are nice:
If I want to plan my requests for blocks efficiently, I really want this manifest as soon as possible. Once I have a manifest I can trust, I know a shit tonne of important things:
So this might be a graph-sync thing, but it could also be a structural outgrowth of a bitswap session: establish a trusted graph, then divvy up block requests among the session. If block sizes are also in the manifest, one can match larger blocks to faster peers. The point being, a manifest gives me a primitive to plan my block requests, and makes optimizing request planning a matter of better matching
Downsides:
Both of those downsides can be mitigated by implementing manifests as a protocol, where peers can dynamically generate manifests of arbitrary graphs & subgraphs, which is the only reason I think it should exist at the IPFS layer. Adding in Graph manifests is kinda like turning IPFS into dynamic bittorrent 🤷♂️. |
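For illustration, a manifest along those lines could be expressed as two flat lists. This is only a sketch with hypothetical Go types (the actual spike lives in go-ipld-manifest, linked further down in the thread):

```go
package manifest

import cid "github.com/ipfs/go-cid"

// Manifest describes the shape of a DAG without carrying any block content.
type Manifest struct {
	// Nodes lists every block in the DAG; by convention index 0 is the root.
	Nodes []cid.Cid `json:"nodes"`
	// Links are (from, to) pairs of positions into Nodes.
	Links [][2]int `json:"links"`
	// Sizes optionally records each block's size in bytes, index-aligned
	// with Nodes, so a requester can match larger blocks to faster peers.
	Sizes []uint64 `json:"sizes,omitempty"`
}
```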
I wrote this yesterday, before the two new comments from @whyrusleeping and @b5. I'll just keep it as it is and post a follow-up comment on how this all relates to each other.

Definitions
Intro

I finally took the time to code what I had in mind (based on this PR). After tackling "give me the full sub-DAG", I wanted to tackle an obvious candidate for GraphSync: UnixFS v1. I then got deep into a rabbit hole. I thought I'd just execute the UnixFS Engine code on the Server, so I don't have to re-implement it. It would then return all the Nodes it visits, which are exactly the ones needed to perform the same query on the Client. It turned out that such an RPC-like call isn't really useful. It won't serve the purpose of being something that is a better Bitswap: if you already had a subset of that Graph, you'd still get a lot of Nodes you don't actually need. I came to that realisation after reading @whyrusleeping's comment (thanks!). I then thought I'd need to go back to the drawing board and talk to lots of people with more knowledge, as I had really hit a wall and needed to start from scratch.

A better way

Suddenly I had my own ideas, and after a bit of thinking I found a way to move forward which aligns with the stuff I already have: make GraphSync less powerful than I intended and let the application layer deal with the rest. GraphSync will only support getting a full sub-DAG combined with a maximum depth. So if you want to get a single Node, you just use a maximum depth of 1. Let me use UnixFS v1 as an example of how this is still powerful enough.

Getting a full file

The easiest case is if you request the full contents of a file. It's just the full sub-DAG of a specific path without any depth limitation.

Getting only the first few bytes of a file

You wouldn't want to transfer all Nodes of the file, as only a small part is needed. For such a traversal you would need to keep track of the sizes of the Nodes that were already transmitted. That's a lot of logic and out of scope for GraphSync. Instead, UnixFS needs a bit more logic. It could fall back to how things currently work with Bitswap and request one block after another. Or it could be smarter and e.g. request all children of a certain Node, which would be a request with a maximum depth of 2. It could then inspect those Nodes and do subsequent requests, e.g. for full sub-DAGs from some Nodes without a maximum depth limitation.

Getting a slice of a file

This case is about getting only a few bytes combined with a certain offset. It works similarly to the case above, just with the offset added.

Getting another slice of the same file

So far the cases would've worked just as well with the approach described in the intro: doing a UnixFS traversal on the Server and transmitting all visited Nodes. But this case is more interesting. If you want a slice of a file you previously got another slice from, you may already have some of the Nodes stored locally. It would be a waste to request all of those again from the Server. The current system handles traversals where some Nodes are missing well: thanks to Bitswap, it will get those missing Nodes from the network. GraphSync can't be used in such a transparent way, as more context is needed (you could use GraphSync like Bitswap by always requesting with a maximum depth of 1, but that wouldn't improve anything). The traversal would signal that the requested Node is not available locally, and then you can decide what to do. It could be that you request the full sub-DAG, or perhaps only the direct children. It's up to the current context and the traversal that is going on which is best suited.

If such a signal for a missing Node is provided by the traversal, it can be re-used for partial GraphSync replies. If you request a full sub-DAG, it could well be that the Server has only a subset of the data. The logic already in place could then deal with such conditions.

Outro

There are still a lot of open questions around how to process the incoming Nodes from a GraphSync request, but at the moment I think those are just implementation details that can be solved. |
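To make the "full sub-DAG plus maximum depth" idea concrete, here is a sketch of what the request/response pair could look like. The field names are hypothetical and this is not a wire-format proposal; the `Missing` list is the "signal for a missing Node" mentioned above, which is what makes partial replies possible:

```go
package graphsync

import cid "github.com/ipfs/go-cid"

// Request asks for the full sub-DAG below Root, at most MaxDepth levels
// deep. MaxDepth == 1 returns just the root Node; MaxDepth == 0 means
// "no depth limit".
type Request struct {
	Root     cid.Cid
	MaxDepth int
}

// Response carries the blocks the Server actually has. Missing lists CIDs
// inside the requested sub-DAG the Server could not provide, so the
// Client's traversal can decide how to fetch them elsewhere.
type Response struct {
	Blocks  [][]byte
	Missing []cid.Cid
}
```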
@whyrusleeping I fully agree that the hard part is what GraphSync should look like. That's exactly what I struggle with. My "better way" addresses the "NACK response" part. It could be extended to a "do you have the data?" request, although I guess if a peer has the data we would want it anyway, so having a "NACK response" would be enough. Or a "would you send me this?" could also be combined with @b5's Graph Manifests and would not only reply with information about a single block, but with the whole sub-DAG this block links to. Provider Hints could be the Graph Manifests. @b5 Thanks for the detailed information on the Graph Manifests. I can see how those could help to optimise the things I described in my "better way". |
Something related that I've been thinking about is creating an abstraction above a Block Store that stores metadata about whether or not the store contains the entire graph linked to in the block. This need came up in a proof-of-concept I wrote for "pushing" a graph called graph-push. Essentially, it exposed both a "shallow" and a "deep" push based on whether or not the service has a block. Pushing this decision to the client was highly problematic: it means the client would have to choose between being either fast/efficient or reliable.
The reason I bring this up is, I don't see how a singular manifest scales well for very large graphs. It means that you either keep a static representation of the graph index for every CID, or you do a fairly expensive query over a simpler index every time you generate the manifest. The manifest could also be incredibly large, which leads me to think about all kinds of performance concerns. You can imagine solving these issues with depth definitions and options, but this starts to get very complicated very fast and is always going to have cases that make any solution more or less optimized (deep vs. shallow graphs, for instance). It may be more flexible to simply be able to say "I contain all the blocks in the graph for this CID" or "I don't know how much of this graph I have." The client should be able to figure out the best way to prioritize getting the graph based on this information. It can traverse down the graph with a peer that has some of the data until it hits a block that peer doesn't have. As it makes its way down the graph and has to find new peers in a very large graph, it will see more peers that have the entire graph and can prioritize those peers. |
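A minimal sketch of such an abstraction, assuming a hypothetical `GraphStore` interface layered on top of an ordinary block store:

```go
package graphstore

import (
	"context"

	blocks "github.com/ipfs/go-block-format"
	cid "github.com/ipfs/go-cid"
)

// GraphStore wraps a block store with one extra piece of metadata.
type GraphStore interface {
	// Get returns a single block, exactly like a plain block store.
	Get(ctx context.Context, c cid.Cid) (blocks.Block, error)

	// HasGraph answers the only extra question this layer adds:
	// "do I hold every block reachable from c?". A false return means
	// "unknown or incomplete", not "definitely missing".
	HasGraph(ctx context.Context, c cid.Cid) (bool, error)
}
```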
That's a really good question IMHO: how much could a graph manifest practically hold? If it's not enough info, then it's a bad design choice. Given that @vmx's better way might be able to make use of these manifests, I've coded up a quick spike implementation to get a feel & see if this is worth discussing further:

Example Code

https://github.com/qri-io/go-ipld-manifest

There's a test in there that runs some extremely rough numbers for a 4-tiered DAG, where the first three tiers are small "link-only nodes" and the bottom ~3k nodes are all 256kb blocks. Running that test with go test -v reports:

manifest representing 4043 nodes and 1.024210Gb of content is 253.921997kb as CBOR
So based on this very rough example, you could get around 1 Gig of content represented in a single manifest if stored as CBOR. I'm assuming a manifest should fit in a single block for caching purposes, but that may not necessarily be true. To keep the example "real" (lol) I've added in a list of block sizes to the manifest. Whether that's acceptable is, well, a question for y'all. It's worth noting this total-storable figure will drop with the switch to base32 CIDs.
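As a rough sanity check of those numbers (a back-of-envelope sketch only; it just divides the figures reported by the test and ignores CBOR framing details):

```go
package main

import "fmt"

func main() {
	const (
		nodes      = 4043       // nodes in the test DAG
		manifestKB = 253.921997 // reported CBOR manifest size
		contentGB  = 1.024210   // reported content size
	)
	perNode := manifestKB * 1024 / nodes  // manifest bytes per node, roughly 64
	ratio := contentGB * 1e6 / manifestKB // content bytes per manifest byte, roughly 4000
	fmt.Printf("~%.0f bytes of manifest per node, ~%.0f:1 content-to-manifest ratio\n", perNode, ratio)
}
```

At roughly 64 bytes of manifest per node and a ~4000:1 content-to-manifest ratio, the "about 1 Gig per single-block manifest" figure checks out for 256kb leaf blocks; smaller blocks would shrink that ratio proportionally.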
I'm assuming we're operating in a peer-2-peer environment, and I'm having trouble seeing how me (as a peer) having a list of all the blocks I need, before I go get them, isn't worth the trouble. I'm guessing there are details & a good war story here that I'm having trouble getting to b/c of the client/server terminology. As far as I understand, we're trying to figure out a protocol and implementation to retrieve a subgraph of a DAG by providing a CID plus some meta information, which clearly has a connection to bitswap; the question is where to draw the lines between those APIs, and what API GraphSync should expose (which I fully trust @vmx will handle ;) ). I don't think graph manifests solve this problem. I'm proposing that manifests are a missing building block in that process, and that there are other use cases for a graph manifest outside of graph sync (the big one being a proper progress indicator).
There's a third option: only keeping manifests of important CIDs. In the common use cases that means root hashes. No need to keep a manifest of every CID, but being able to generate a manifest of any graph is a useful property. Manifests of immutable content are also immutable, so caching here is a win, but not vital. Being able to generate manifests at the protocol level would alleviate the need for users to see this stuff, and open the door to future work with subgraph manifests. The code example provided isn't usable as a measurement of performance b/c it's not doing any real node resolving. If the network is involved, yes, this will be a very expensive operation that should be avoided entirely IMHO (@mikeal here I think we're in agreement that a peer either having the full graph or not is a vital piece of info for decision making). If the peer has the full graph locally, calculating a manifest should be cheap. How cheap depends on plumbing I'm not super familiar with. Performance could indeed be a reason for not using the concept of a manifest at all, but to me, if we can't generate a fast manifest of a complete graph we have locally, something is wrong.
I have two concerns here:
- This conversation is happening over the network. Network is expensive.
- The logic that drives this is, IMHO, really hard when you put multiple peers speaking concurrently into the mix.
To me the goal of a graph manifest is to get the client/requesting peer out of an information deficit as early as possible in the graph-sync process, allowing the requester to perform coordination duties, and to be able to concoct different strategies for delegating requests to peers in parallel. To me those "coordination duties" are where the graph sync work starts. If others can benefit from having manifests (I know we would), then I think it's a candidate for pushing lower into the stack.
> Graph Store: Boolean CID index on top of Block Store.

To me this is, like, super solid, which I interpret as part of the "just store your graph information in a graph database" school of thought. This has been suggested elsewhere (I think @lgierth is one of its proponents). A graph database / index does sound smarter than one-off manifests, but I think even in that context they can work in tandem: generate a manifest from the graph DB so the requester can update its knowledge of the merkle forest. Sounds like a lot of planning work that's above my pay grade ;). |
@b5 for standard 'wide' graphs, what is the advantage of the graph manifest over simply doing a breadth first search over the dag? |
Looks like awesome work! Datasets like ImageNet have 10^6 entries (image files) in a single directory. IPFS really falls down when trying to handle scenarios like this. In the abstract, a manifest sounds like a good solution. Though it certainly won't fit in a single block!
|
locally or over the network? Locally the advantage is very little if any. To me the advantage shows up over the network, giving a requesting peer a small payload of trustable knowledge of what they're after. I think they'd make a great extension when kicking off a bitswap session. For any DAG with less than some threshold of blocks, a manifest would be overkill, and should be skipped. |
@b5 I'm talking about over the network. Say I'm fetching a really large file. If I use a selector to fetch the first three layers of the graph, it should give me quite a few hashes to request further, in a trustable way, without being too much data. |
Also potentially relevant for some, an issue I wrote up on selectors a while back: ipfs/notes#272 (comment) |
@whyrusleeping using the first example from your selector thoughts:
One approach would be to optionally return a manifest of H, or at least the hash-of-manifest of H if the peer has a manifest on offer. A peer could elect not to compute a manifest for a number of reasons, so it should be optional. In this context, the manifest is the "quite a few hashes to request further" without being too much data. It's "trustable" in the DHT sense, where manifests should probs be vetted against multiple responding peers or something. If you do end up with a trustable manifest, you can now construct selector-like queries locally & just ask for blocks, because you have the entire graph, just not the content. You don't know which peer has which blocks, but that's less relevant than knowing what blocks you need. Recursive fetching strategies that hone in on outstanding blocks become a thing, which should cut down on complex selector construction & fulfillment, and parallelize across peers better. |
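A sketch of the "vetted against multiple responding peers" step, assuming a hypothetical helper that does a simple majority vote over the manifest CIDs peers report (any other quorum rule would work just as well):

```go
package manifestvet

import cid "github.com/ipfs/go-cid"

// pickTrustedManifest tallies the manifest CIDs reported by the peers we
// asked (keyed by peer ID string) and returns the most common one, plus
// whether a strict majority of peers agreed on it.
func pickTrustedManifest(reported map[string]cid.Cid) (cid.Cid, bool) {
	counts := map[cid.Cid]int{}
	for _, m := range reported {
		counts[m]++
	}
	var best cid.Cid
	bestN := 0
	for m, n := range counts {
		if n > bestN {
			best, bestN = m, n
		}
	}
	return best, bestN*2 > len(reported)
}
```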
Any solution we go with here is going to be more optimized for one case vs. another. That said, I don't think we should be assuming optimally sized ~250K blocks as our go-to use case. Optimal file chunking for large binary files like media would be based on keyframe windows, and with text files we probably want to use a rabin chunker for better updates, which will result in many blocks of a much smaller size. I think we need a better idea of what use cases we're trying to optimize for. I can't think of a use case for large structured data where a manifest is not prohibitively expensive. As a general rule, the more structured the data, the larger the indexes are, and a manifest is effectively an index.
Couple notes here. Whether or not a peer has the full graph is a single bit, we could just stick it in the DHT and let the client use it when prioritizing peer selection. Being that network is expensive, I don't see why we'd want clients to pull down the entire manifest when they may only want a portion of the graph. |
You make good points about needing to outline the use cases we're targeting. Let me ask some silly questions: Without a manifest of some kind, how will someone know what entries they want? Are we assuming that IPFS should always rely on out of band coordination for distribution of CIDs? This out of band bit seems like the implicit assumption in most of IPFS's design. I believe it is a source of many surprising (and disappointing) performance characteristics. |
Reading through this again, I'm starting to see some big holes in this approach.
I don't quite see how we're going to securely and efficiently put this much logic on the "server" side of the transaction. It's a nice idea in theory to just have one end of the connection start sending blocks without the need for another request, but this opts us out of any opportunity to not send blocks one side already has, and the client can't really be responsible for parallelizing across multiple peers if it isn't responsible for the traversal of the graph. Similarly, I don't see how a client could make use of a manifest. There's no guarantee that the peer isn't lying about the manifest, although you could detect inconsistencies as you parse the blocks and go from there. Other peers could make use of a client's manifest when sending blocks back, but this still isn't sufficient, because the client's block store can contain several trees: it could have a sub-tree but be missing the link between that sub-tree and the root of this particular tree, so the sub-tree wouldn't have appeared in the manifest for that root even though it was probably in another. This is going to happen a lot in static site deployments; people have lots of similar shared assets across sites, and there are changes to those assets in subtrees all the time. |
One more thing, can we assume fully duplexed connections are available? If so, there are ways that we can optimize performance by concurrently asking for blocks rather than trying to come up with ways for one end to send many requested blocks serially. |
Ok, might be worth backing up to make this a little clearer with a story. First, the selector conversation is separate from graph manifests. For the sake of argument, let's put selectors aside for a second and walk through an example of how this might work.

First, I add some content to IPFS, which generates the classic DAG and CID

Later on peer Sandra comes along and asks me for the content at CID

Sandra's been asking a few others for hash of

Em then connects and asks both me and Sandra for CID A. This time we both populate the manifest field with

At this point Em has a complete list of every block in CID

Before Em does anything else, Em does a set intersection between their local blocks and the blocks listed in the manifest. Turns out Em already has 15 of the 70 blocks listed in the manifest, so they can skip asking for those. Em wants the whole DAG, so they do the easy thing & just cut the remaining list of 55 blocks in half, asking me for one half and Sandra for the other. Sandra's quicker than me and finishes her list first, so Em cuts my remaining list in half and gives the other list of blocks to Sandra again to fulfill, letting my weak-sauce tethered 3G connection close out the 4 blocks I can contribute.

While this is happening Em is seeing a progress bar, because they know exactly how many blocks are left, which they have, and which they need. One day in future versions of IPFS, Em might use that information to construct fancy selectors that carve up the manifest, asking for a subgraph of available content. If the manifest came back with, say, a larger size than Em's allowed repo, Em may elect to abort the process entirely before asking for more blocks.

While blocks are transmitting, Em is doing the usual checking of the blocks coming over the wire. If at any point the blocks Em's requesting aren't adding up to correct hashes, the whole process can be aborted. In this example Em's local 15 blocks happen to be a subgraph that adds up to a file

Peers are incentivized not to lie about manifests, because if a peer ever transmits a malicious manifest and you acquire the real manifest, you know they're misbehaving: there's a deterministic algorithm connecting the content to the manifest. Because you can generate the manifest locally once you have the full DAG, you can check for malicious responses after the fact.

Ideally, all of this is pretty low level, and structured as an opt-in speed-up-happy-path, falling back to the way things work today (because it works!). Finally, it's worth pointing out this approach is chunking-strategy agnostic: graph manifests will work on any DAG.

To me, selectors enter the conversation after manifests. Manifests by no means answer all the questions you would want to ask of a DAG, but a manifest makes constructing those selector queries simpler and faster. As @vmx mentioned, something akin to manifests would be something graph sync builds upon. I think @ajbouh hit the nail on the head with this:
I'd be happy to outline how I plan to use graph manifests out in IPFS userland, but would rather avoid clogging all y'all's inboxes if we don't have clarity on the concept 😄. |
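The coordination step in that story boils down to a set intersection followed by dealing the remaining wants out to peers. A sketch, with hypothetical helper names and a naive round-robin split (the reassign-work-when-a-peer-finishes part is left out):

```go
package fetchplan

import cid "github.com/ipfs/go-cid"

// planWants drops blocks we already hold locally and deals the rest out
// to the given peers in round-robin order.
func planWants(manifest []cid.Cid, have map[cid.Cid]bool, peers []string) map[string][]cid.Cid {
	plan := make(map[string][]cid.Cid, len(peers))
	if len(peers) == 0 {
		return plan
	}
	i := 0
	for _, c := range manifest {
		if have[c] {
			continue // e.g. the 15 blocks Em already had from another DAG
		}
		p := peers[i%len(peers)]
		plan[p] = append(plan[p], c)
		i++
	}
	return plan
}
```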
@b5 Hrm... I'm still not seeing how much the manifest improves on the situation. For the 15MB file example, you end up with a 1 deep graph, where the root node has links to all the leaf nodes. So the root 'A' of that file contains all the information that the manifest would. Then, at some point the graph gets too big for the manifest file to be represented as a single object, so you would have to shard it. This runs into the same issue as before... If I could have a selector that said "Give me all non-leaf nodes in graph A" it would not be too much more data than the proposed manifest, and actually contain data that we need for the graph. |
@b5 looks interesting, though I can't dig into it in-depth. Could you try to build a manifest over one of my datasets, see how that behaves? ( yes, I still need to clean up the go-ipfs patch to render the metadata locked in this set, $real-world is really messing with my available time ) |
Good to share here a video that just got uploaded, Volker's talk on GraphSync from LabDay. |
@jbenet and @whyrusleeping produced a specification for GraphSync and IPLD Selectors during the Go IPFS Hack Week. It contains all the thinking for these two systems from the last 3 years + thinking about this (first record was Jeromy's Bitswap Talk, circa Dec 2015). You can watch Juan's presentation on the GraphSync and IPLD Selectors Spec here |
@jbenet can you provide the docs produced ASAP? I believe that @vmx and @mikeal are still working on the direction that came out of their recent discussions vs leveraging the spec you produced. @vmx @mikeal one of the valuable outputs of the discussions in Glasgow is that, independently of who is right when it comes to GraphSync design, any GraphSync design and implementation will have to go through a series of tests/benchmarks with multiple graph topologies. Can you list those here? AFAIK we at least have:
@hannahhoward I believe you are working on benchmarks for a potential GraphSync for go-ipfs, do you have a list of topologies you are about to test for? |
That latest set of docs for IPLD Selectors should also be linked to on ipfs/notes#272 :) |
Looking at Juan's screen in his talk, nothing in or linked to on this page matches what is up on his screen :( |
This is a good starter list. Once we have the benchmarks somewhere we can always add data sets; I'd rather just get a few of these going and iterate than try to front-load a ton of work when we're currently operating with zero benchmarks. The much harder part of this will be multiplying the data sets with peer/network conditions. For each of these data sets we need to benchmark situations in which:
The issue with the old design wasn't so much that it didn't work well under a specific data-set but that it completely broke down once you were getting the set from multiple peers. |
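One way to organize that matrix is a table-driven setup along these lines (a sketch with made-up names; the concrete data sets and peer conditions would come from the lists above):

```go
package graphsyncbench

// availability captures who holds the data in a given scenario.
type availability int

const (
	onePeerAll       availability = iota // a single peer has the whole DAG
	manyPeersAll                         // several peers each have the whole DAG
	manyPeersPartial                     // blocks spread across peers, none complete
)

// scenario pairs a data-set shape with a peer/availability condition.
type scenario struct {
	name    string
	dataset string // e.g. "unixfs-large-file", "deep-narrow", "shallow-wide"
	avail   availability
}

var scenarios = []scenario{
	{"large file, single seed", "unixfs-large-file", onePeerAll},
	{"large file, many full seeds", "unixfs-large-file", manyPeersAll},
	{"large file, scattered blocks", "unixfs-large-file", manyPeersPartial},
	{"deep DAG, scattered blocks", "deep-narrow", manyPeersPartial},
}
```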
@mikeal you might be interested in the tests i wrote in go-bitswap recently: https://github.com/ipfs/go-bitswap/pull/8/files |
Hey folks, sorry for delay. I’ll put the docs we made in Glasgow up in the next day |
|
Other notes about the manifests approach discussed here:
And thoughts on provable versions of these. (not relevant for the short term -- <1yr)
|
Thanks for the info. This is super helpful! The whole reason for making a stink comes from pain points we've uncovered building user experiences on top of IPFS: I want to show our users meaningful progress bars when fetching a DAG. That's it. It's a small point, but an extremely crucial one. Unless I'm missing something, IPFS peers lack the info needed to show how many blocks remain and that they're arriving in parallel. Not being able to show "bittorrent style" progress bars means we can't build UI that shows users one of the greatest upsides of block-based content addressing: when performing a fetch, there's a chance your node already has some/many of the blocks you need. If you happen to be building, say, a version control system, there's a very high chance you have lots of the necessary blocks already. Nothing else I've seen has this property. It's the detail that made me pick IPFS over dat, and I really want to show it off to the world in a way I think they'll immediately understand. It's absolutely true that most (all?) manifests would be pretty close to the size of "the whole graph minus leaf nodes". The entire manifest is a tax. The advantage of a manifest is not in the size, but in getting a fetching peer out of an information-poor context as soon as possible. The tax should be covered by being able to make smarter choices with that knowledge. Anyway, I'm just after progress bars. Building this sort of thing in userland is, well, tough. As for provable versions of manifests, that's well above my pay grade; I'll happily leave that to y'all 😉. |
@b5 I think we can solve the progress bars problem (especially in your ipld usecase) by adding a small amount of extra metadata in each node that lets us know roughly how many nodes are behind each link. You should actually be able to do this today by simply adding that to your existing datastructures. Does that seem reasonable? (also, we should open a new issue for 'progress bars on ipld' or something) |
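A sketch of how that per-link metadata could drive a progress bar, assuming a hypothetical `TotalNodes` count carried on each link (dag-pb links already carry a cumulative byte size, `Tsize`, in a similar spirit):

```go
package progress

// Link is a link annotated with how much work sits behind it.
type Link struct {
	Name       string
	TotalNodes uint64 // nodes reachable through this link, including the target
}

// Fraction reports fetched/total as soon as the root node is known, since
// the root's links already sum up the size of the whole DAG.
func Fraction(fetched uint64, rootLinks []Link) float64 {
	total := uint64(1) // count the root itself
	for _, l := range rootLinks {
		total += l.TotalNodes
	}
	return float64(fetched) / float64(total)
}
```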
Totally. Apologies all (particularly @vmx), I've hijacked this thread for long enough. I'd be happy to end the manifest discussion here and move the progress-bar chat to a new issue. Thanks all! |
Here is a playground for you -- ipfs/interop#44 (comment). Customizable file-exchange tests between JS and Go (go<->go, go<->js, js<->js) that cover large files (as large as you want) and directories (as nested as you want). It is pretty easy to try it out with different bundles of go-ipfs and js-ipfs; check the Readme https://github.com/ipfs/interop#run-the-tests |
I'm closing this PR. The contents live on in the design history; see #159 for more information. |
These are the current thoughts about GraphSync written down in a single document. This also contains the results from the Deep-Dive session at the Developer Meeting 2018 in Berlin.
This document should be seen as a starting point, not as a complete, ready to merge thing.
/cc @b5 @diasdavid @jbenet @mib-kd743naq @Stebalien