Defaulting to Rabin in unixfsv2 #31

mikeal · 2019-10-05T17:46:54Z

In my initial implementation of unixfsv2 I’ve defaulted to using the Rabin chunker.

This gives us much better default behavior in terms of de-duplication, especially with text data, but it does noticeably slow down imports.
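To make the de-duplication side of that tradeoff concrete, here is a minimal sketch comparing fixed-size chunking with content-defined chunking when a single byte is inserted at the front of a file. It is not the Rabin chunker: a simple multiplicative rolling hash stands in for Rabin fingerprinting, and every constant and function name (`window`, `mult`, `mask`, `minSize`, `maxSize`, `contentChunks`, and so on) is a hypothetical choice made for illustration.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/rand"
)

// All constants are illustrative assumptions, not values from any IPFS chunker.
const (
	window  = 16       // bytes covered by the rolling hash
	mult    = 16777619 // odd multiplier for the polynomial hash
	mask    = 1<<8 - 1 // cut when the low 8 bits of the hash are zero
	minSize = 64       // never cut before this many bytes
	maxSize = 1024     // always cut at this many bytes
)

// fixedChunks splits data into size-byte pieces (the last may be shorter).
func fixedChunks(data []byte, size int) [][]byte {
	var out [][]byte
	for len(data) > size {
		out = append(out, data[:size])
		data = data[size:]
	}
	if len(data) > 0 {
		out = append(out, data)
	}
	return out
}

// contentChunks cuts where a rolling hash of the last `window` bytes hits the mask.
func contentChunks(data []byte) [][]byte {
	var pow uint32 = 1 // mult^(window-1): weight of the byte leaving the window
	for i := 0; i < window-1; i++ {
		pow *= mult
	}
	var out [][]byte
	var h uint32
	start := 0
	for i := range data {
		if i-start >= window {
			h -= uint32(data[i-window]) * pow // drop the oldest byte
		}
		h = h*mult + uint32(data[i]) // add the newest byte
		size := i - start + 1
		if (size >= minSize && h&mask == 0) || size >= maxSize {
			out = append(out, data[start:i+1])
			start, h = i+1, 0
		}
	}
	if start < len(data) {
		out = append(out, data[start:])
	}
	return out
}

// shared counts chunks of b whose SHA-256 digest also appears in a.
func shared(a, b [][]byte) int {
	seen := make(map[[32]byte]bool)
	for _, c := range a {
		seen[sha256.Sum256(c)] = true
	}
	n := 0
	for _, c := range b {
		if seen[sha256.Sum256(c)] {
			n++
		}
	}
	return n
}

// sample returns n deterministic pseudo-random bytes.
func sample(n int) []byte {
	r := rand.New(rand.NewSource(1))
	out := make([]byte, n)
	for i := range out {
		out[i] = byte(r.Intn(256))
	}
	return out
}

func main() {
	old := sample(1 << 16)
	edited := append([]byte{0xff}, old...) // one byte inserted at the front
	fmt.Println("fixed shared:  ", shared(fixedChunks(old, 256), fixedChunks(edited, 256)))
	fmt.Println("content shared:", shared(contentChunks(old), contentChunks(edited)))
}
```

With fixed-size chunks the inserted byte shifts every boundary, so essentially no chunk of the edited file matches one from the original. The content-defined boundaries depend only on nearby bytes, so they realign shortly after the edit. The per-byte hashing needed to find those boundaries is where the import slowdown comes from.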

I want to make sure we have a real discussion about this tradeoff before it becomes solidified.

Kubuxu · 2019-10-05T19:59:05Z

If we even want to think about using Rabin we should have a good implementation of it to evaluate.

Calling the current implementation crappy is an understatement:

BenchmarkRabin-4     	      12	  95914912 ns/op	 174.92 MB/s	19145327 B/op	     186 allocs/op
BenchmarkDefault-4   	     672	   1721040 ns/op	9748.30 MB/s	     487 B/op	       2 allocs/op

There are multiple ways of making it better: it allocates per byte, it doesn't use libp2p/go-buffer-pool, and the implementation of the GF(2) polynomial evaluation could be optimized a lot by specializing it for just one polynomial.

PR with benchmark: ipfs/go-ipfs-chunker#15

rvagg · 2022-12-06T01:29:58Z

closing for archival

rvagg added the closed for archival label Dec 6, 2022

rvagg closed this as completed Dec 6, 2022
