This repository has been archived by the owner on Dec 6, 2022. It is now read-only.

Defaulting to Rabin in unixfsv2 #31

Closed
mikeal opened this issue Oct 5, 2019 · 2 comments

Comments

@mikeal
Contributor

mikeal commented Oct 5, 2019

In my initial implementation of unixfsv2 I’ve defaulted to using the Rabin chunker.

This gives us much better default behavior in terms of de-duplication, especially with text data, but it does noticeably slow down imports.
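
For anyone weighing the tradeoff, here is a minimal Go sketch of why content-defined chunking helps de-duplication. It is not the Rabin chunker from go-ipfs-chunker: it uses a simple gear-style rolling hash as a stand-in, and `fixedChunks`, `contentChunks`, `shared`, and `sampleData` are hypothetical names made up for this illustration. It inserts one byte at the front of a buffer and counts how many chunks survive unchanged under each strategy.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// xorshift is a tiny deterministic PRNG (illustrative only).
type xorshift uint64

func (x *xorshift) next() uint64 {
	v := uint64(*x)
	v ^= v << 13
	v ^= v >> 7
	v ^= v << 17
	*x = xorshift(v)
	return v
}

// sampleData returns n pseudo-random bytes, the same on every run.
func sampleData(n int) []byte {
	rng := xorshift(88172645463325252)
	out := make([]byte, n)
	for i := range out {
		out[i] = byte(rng.next() >> 24)
	}
	return out
}

// fixedChunks cuts every size bytes, regardless of content.
func fixedChunks(data []byte, size int) [][]byte {
	var out [][]byte
	for len(data) > size {
		out = append(out, data[:size])
		data = data[size:]
	}
	if len(data) > 0 {
		out = append(out, data)
	}
	return out
}

// contentChunks cuts wherever the top `bits` bits of a rolling hash are zero.
// The hash only "remembers" the last 64 bytes, so cut points depend on local
// content rather than on absolute offsets. Assumption: a gear-style hash
// stands in for a real Rabin fingerprint here.
func contentChunks(data []byte, bits uint) [][]byte {
	var gear [256]uint64
	rng := xorshift(0x9E3779B97F4A7C15)
	for i := range gear {
		gear[i] = rng.next()
	}
	var out [][]byte
	var h uint64
	start := 0
	for i, b := range data {
		h = (h << 1) + gear[b] // older bytes shift out after 64 steps
		if h>>(64-bits) == 0 {
			out = append(out, data[start:i+1])
			start = i + 1
		}
	}
	if start < len(data) {
		out = append(out, data[start:])
	}
	return out
}

// shared counts chunks of b whose SHA-256 digest also appears among a.
func shared(a, b [][]byte) int {
	seen := make(map[[32]byte]bool, len(a))
	for _, c := range a {
		seen[sha256.Sum256(c)] = true
	}
	n := 0
	for _, c := range b {
		if seen[sha256.Sum256(c)] {
			n++
		}
	}
	return n
}

func main() {
	data := sampleData(1 << 16)
	edited := append([]byte{'!'}, data...) // one byte inserted at the front
	fmt.Println("fixed-size chunks shared:",
		shared(fixedChunks(data, 256), fixedChunks(edited, 256)))
	fmt.Println("content-defined chunks shared:",
		shared(contentChunks(data, 8), contentChunks(edited, 8)))
}
```

With fixed-size chunks the inserted byte shifts every boundary, so essentially nothing is shared. With content-defined boundaries the cut points resynchronize once the rolling window has moved past the edit, at the cost of hashing every byte.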

I want to make sure we have a real discussion about this tradeoff before it becomes solidified.

@Kubuxu

Kubuxu commented Oct 5, 2019

If we even want to think about using Rabin we should have a good implementation of it to evaluate.

Calling the current implementation crappy is an understatement:

BenchmarkRabin-4     	      12	  95914912 ns/op	 174.92 MB/s	19145327 B/op	     186 allocs/op
BenchmarkDefault-4   	     672	   1721040 ns/op	9748.30 MB/s	     487 B/op	       2 allocs/op

There are multiple ways to make it better: it allocates per byte, it doesn't use libp2p/go-buffer-pool, and the GF(2) polynomial evaluation could be optimized a lot by specializing it for just one polynomial.
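
As a rough illustration of the buffer-reuse point, a scan loop can borrow its scratch buffer from a pool instead of allocating as it goes. This sketch uses the standard library's `sync.Pool` as a stand-in for libp2p/go-buffer-pool, and `blockPool`, `sumBytes`, and `checksum` are hypothetical names. It only sums bytes and is not the chunker's actual read path.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

const blockSize = 4096

// blockPool hands out reusable scratch buffers. Storing *[]byte avoids an
// extra allocation when the slice header is boxed into an interface.
var blockPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, blockSize)
		return &b
	},
}

// sumBytes reads r in blockSize pieces using a pooled buffer and walks the
// bytes in place, so the hot loop itself does not allocate.
func sumBytes(r io.Reader) (uint64, error) {
	bufp := blockPool.Get().(*[]byte)
	defer blockPool.Put(bufp)
	buf := *bufp

	var sum uint64
	for {
		n, err := r.Read(buf)
		for _, b := range buf[:n] { // process data before checking err
			sum += uint64(b)
		}
		if err == io.EOF {
			return sum, nil
		}
		if err != nil {
			return sum, err
		}
	}
}

// checksum is a small convenience wrapper for in-memory input.
func checksum(data []byte) uint64 {
	sum, _ := sumBytes(bytes.NewReader(data)) // bytes.Reader only returns io.EOF
	return sum
}

func main() {
	fmt.Println(checksum([]byte("hello")))
}
```

Running a variant of this under `go test -bench . -benchmem` would show whether the pooled buffer actually reduces `allocs/op` for a given workload.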


PR with benchmark: ipfs/go-ipfs-chunker#15

@rvagg
Member

rvagg commented Dec 6, 2022

closing for archival

@rvagg rvagg closed this as completed Dec 6, 2022