Skip to content

Latest commit

 

History

History
39 lines (28 loc) · 2.09 KB

sync.md

File metadata and controls

39 lines (28 loc) · 2.09 KB

Syncing the list of files

We have been offline for a day and now want to see what cats have been added since last time we were online. We pick a random online user in our list of peers and want to sync the list of cats. If the list is small, we could just send the lists of file CIDs, but this doesn't scale. Here is a more sophisticated, but still fairly simple, sync protocol.

We need to run sync(x, y) where x and y represent two sets of file hashes:

x = [ 06b645, 00f4a0, 00e0ad, 141599, 1d8b4e, 1a2287, 101114,         c8d1b0 ]
y = [ 06b645, 00f4a0,         141599, 1d8b4e, 1a2287, 101114, c78f11, c8d1b0 ]

We can split the entire set into subsets based on the 1st hash digit:

Then we compare the subset hashes and continue this process recursively:

In this diagram:

  1. X and Y exchange with hashes of the entire sets. They see that the two hashes are different, so some elements need to be synced.
  2. X splits x into subsets based on the 1st hex digit of subset hash and sends the 3 hashes to Y:
    • 902787 = H(x:0) = H(06b645,00f4a0,00e0ad)
    • 882729 = H(x:1) = H(141599,1d8b4e,1a2287,101114)
    • 877000 = H(x:c) = H(c8d1b0)
  3. Y does the same with y.
  4. X and Y see that x:1 and y:1 have the same hash 882729, so there is no need to sync them further.
  5. X needs to continue this process recursively with x:0 and split it into subsets with hash prefixes 00, 01, 02 and so on. However it sees that x:0 has only 3 items, so it just sends the 3 hashes to Y. Same with the x:c subset.
  6. Y does the same with y:0 and y:c.
  7. Now X knows that it was missing file c78f11 and Y knows that it was missing file 00e0ad.

Note, that sync(x, y) doesn't have to sync the entire sets of files the two nodes have. Instead, X may choose to sync only a part of it, e.g. all file added the last week and as long as Y can apply the same filter to y, they can sync the two subsets:

sync(
  x.filter(f => f.time > Date.now() - 10 * day),
  y.filter(f => f.time > Date.now() - 10 * day));