Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

croc immediately exits with "Seek: invalid offset" when using imohash with certain files #786

Closed
Hakkin opened this issue Aug 19, 2024 · 14 comments

Comments

@Hakkin
Copy link

Hakkin commented Aug 19, 2024

Using croc version v10.0.11, I was able to reproduce this on both a Linux (Go 1.23.0) and Windows (Go 1.22.4) host.

It seems to have to do with the filesize, I encountered it while trying to send a folder with many files, but I was able to reproduce it using just this:

$ truncate -s 998258 test.txt
$ croc send --hash imohash test.txt
2024/08/19 12:03:54 Seek: invalid offset
@schollz
Copy link
Owner

schollz commented Aug 20, 2024

What happens without imohash

@Hakkin
Copy link
Author

Hakkin commented Aug 20, 2024

What happens without imohash

The transfer works correctly, but takes a very long time to hash large files.

@Hakkin
Copy link
Author

Hakkin commented Aug 20, 2024

Here's all the hash algorithms, only imohash has issues:

$ croc send --hash xxhash test.txt
Sending 'test.txt' (974.9 kB)
Code is: xxxx-xxxxxx-xxxxx-xxxxxxx
On the other computer run

croc xxxx-xxxxxx-xxxxx-xxxxxxx

$ croc send --hash md5 test.txt
Sending 'test.txt' (974.9 kB)
Code is: xxxx-xxxxx-xxxxxxx-xxxxx
On the other computer run

croc xxxx-xxxxx-xxxxxxx-xxxxx

$ croc send --hash highway test.txt
Sending 'test.txt' (974.9 kB)
Code is: xxxx-xxxxx-xxxx-xxxx
On the other computer run

croc xxxx-xxxxx-xxxx-xxxx

$ croc send --hash imohash test.txt
2024/08/20 07:28:50 Seek: invalid offset

@schollz
Copy link
Owner

schollz commented Aug 20, 2024

It might be an imohash issue then

@Hakkin
Copy link
Author

Hakkin commented Aug 20, 2024

This seems to be the commit that broke it: 5cf4d7c
1 commit before that works correctly.

$ go install github.com/schollz/croc/v10@5cf4d7c103b6d6506990c1b80f5fd5eb3251aa2d
go: downloading github.com/schollz/croc/v10 v10.0.8-0.20240606171923-5cf4d7c103b6

$ croc send --hash imohash test.txt
2024/08/20 12:53:57 Seek: invalid offset

$ go install github.com/schollz/croc/v10@c28c4786a1296f937e400a55a22d1a90b152629c
go: downloading github.com/schollz/croc/v10 v10.0.8-0.20240606025349-c28c4786a129

$ croc send --hash imohash test.txt
Sending 'test.txt' (974.9 kB)
Code is: xxx-xxx-xxx-xxx
On the other computer run

croc xxx-xxx-xxx-xxx

@Hakkin
Copy link
Author

Hakkin commented Aug 20, 2024

croc calls SumFile with a sampleSize of 16*16*8*1024 (2097152) and sampleThreshold of 128*1024 (131072)

croc/src/utils/utils.go

Lines 189 to 197 in 5cf4d7c

var imofull = imohash.NewCustom(0, 0)
var imopartial = imohash.NewCustom(16*16*8*1024, 128*1024)
// IMOHashFile returns imohash
func IMOHashFile(fname string) (hash []byte, err error) {
b, err := imopartial.SumFile(fname)
hash = b[:]
return
}

if f.Size() > sampleThreshold, imohash tries to Seek() -sampleSize from end of file:
https://github.com/kalafut/imohash/blob/3f885db6c6c295ad8076096f9f02238490b91584/imohash.go#L124-L126

SectionReader.Seek() returns Seek: invalid offset if offset < s.base:
https://cs.opensource.google/go/go/+/refs/tags/go1.23.0:src/io/io.go;l=535-537;drc=beea7c1ba6a93c2a2991e79936ac4050bae851c4

s.base comes from the off parameter in NewSectionReader, which imohash always sets to 0, so if the filesize is greater than sampleThreshold but less than sampleSize, imohash will return an error.

Seems like a bug in imohash, I'll open an issue there.

@schollz
Copy link
Owner

schollz commented Aug 20, 2024

nice work!! ping me when that's merged and I can update it here

@kalafut
Copy link

kalafut commented Aug 22, 2024

I'll update the imohash library soon, but I need to think through it a bit. It's simple to put a fix in, but at its root, this is an under-specification error. There are a few other ports of the imohash spec, so I want to adjust the spec in a good, ideally backwards-compatible way, and let them know too.

@kalafut
Copy link

kalafut commented Aug 27, 2024

@schollz imohash v1.1.0 has been released and should fix this. See kalafut/imohash#16

schollz added a commit that referenced this issue Aug 28, 2024
@schollz schollz reopened this Aug 28, 2024
@schollz
Copy link
Owner

schollz commented Aug 28, 2024

@kalafut thanks. its a breaking change so I'm going to wait for the next major version of croc to release it...otherwise it just creates a huge influx of users asking "why doesn't it work with imohash" if they have incompatible croc versions.

@Hakkin
Copy link
Author

Hakkin commented Aug 28, 2024

FYI, I believe 5cf4d7c was already a breaking change. People using imohash with v10.0.7 or lower will get hash mismatches if the other end is using v10.0.8 or above.

@schollz
Copy link
Owner

schollz commented Aug 28, 2024

Good point

@kalafut
Copy link

kalafut commented Aug 28, 2024

This is good context because I wondered how this hadn't been hit/reported for so long in a popular application like croc. It sounds like these parameters are relatively new.

As you think about a future major change, perhaps also consider some means of transporting options/cfg around imohash. I'd like to upgrade the underlying hash and allow for a configurable chuck count. These would be optional, but if, say, subsequently versions of croc want to use them, it would be good for the two ends to agree on terms. Just a thought.

@schollz
Copy link
Owner

schollz commented Aug 30, 2024

Also good idea!

Merged now, so closing

@schollz schollz closed this as completed Aug 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants