We have been offline for a day and now want to see what cats have been added since last time we were online. We pick a random online user in our list of peers and want to sync the list of cats. If the list is small, we could just send the lists of file CID
s, but this doesn't scale. Here is a more sophisticated, but still fairly simple, sync protocol.
We need to run sync(x, y)
where x
and y
represent two sets of file hashes:
x = [ 06b645, 00f4a0, 00e0ad, 141599, 1d8b4e, 1a2287, 101114, c8d1b0 ]
y = [ 06b645, 00f4a0, 141599, 1d8b4e, 1a2287, 101114, c78f11, c8d1b0 ]
We can split the entire set into subsets based on the 1st hash digit:
Then we compare the subset hashes and continue this process recursively:
In this diagram:
X
andY
exchange with hashes of the entire sets. They see that the two hashes are different, so some elements need to be synced.X
splitsx
into subsets based on the 1st hex digit of subset hash and sends the 3 hashes toY
:902787 = H(x:0) = H(06b645,00f4a0,00e0ad)
882729 = H(x:1) = H(141599,1d8b4e,1a2287,101114)
877000 = H(x:c) = H(c8d1b0)
Y
does the same withy
.X
andY
see thatx:1
andy:1
have the same hash882729
, so there is no need to sync them further.X
needs to continue this process recursively withx:0
and split it into subsets with hash prefixes00
,01
,02
and so on. However it sees thatx:0
has only 3 items, so it just sends the 3 hashes toY
. Same with thex:c
subset.Y
does the same withy:0
andy:c
.- Now
X
knows that it was missing filec78f11
andY
knows that it was missing file00e0ad
.
Note, that sync(x, y)
doesn't have to sync the entire sets of files the two nodes have. Instead, X
may choose to sync only a part of it, e.g. all file added the last week and as long as Y
can apply the same filter to y
, they can sync the two subsets:
sync(
x.filter(f => f.time > Date.now() - 10 * day),
y.filter(f => f.time > Date.now() - 10 * day));