This repository has been archived by the owner on Aug 3, 2021. It is now read-only.

Support for exotic chunking #36

Open
chrysn opened this issue Jan 10, 2019 · 1 comment

Comments

chrysn commented Jan 10, 2019

Given the considerations mentioned in ipld/legacy-unixfs-v2#15 (comment) (following which I found ipfs-pack) for data that is chunked in a content-aware fashion, I think there are practical cases in which a fmtstr that only names the chunker and its parameters is insufficient to reproduce data already published on IPFS. The linked comment mentions video streams chunked at keyframes; my own applications would rather involve raw camera files that contain a literal JPEG, images whose XMP is also served as a copy in a sidecar file, or zip files without compression.

Would it be viable to have the option to explicitly describe the chunking, rather than giving rules the recipient might not be able to follow?

Straw man example:

Qm... <f0000120001> file1.jpg
Qm... <(3200, f0000120001), (4096000, f0000120001), (50000000, f0000120001)> file1.raw
Qm... <f0000120001> .

Here file1.jpg is 4096000 bytes long and is embedded somewhere in the first part of file1.raw, preceded by about 3k of header and followed by 50M of actual raw image data at the tail. To reproduce the hash of file1.raw as published on IPFS, one needs to run the DefaultRaw chunker separately over the first 3k, the JPEG data and the rest, and concatenate the results, rather than running it over the whole file.
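
To make the straw man concrete, here is a minimal Python sketch of chunking each explicitly described segment on its own and concatenating the results. Everything in it is an assumption for illustration: a plain fixed-size splitter stands in for the DefaultRaw chunker, `CHUNK_SIZE` is an assumed parameter, and the function names are hypothetical and not part of ipfs-pack.

```python
from typing import Iterable, List

# Assumed chunk size; stands in for whatever parameters the fmtstr encodes.
CHUNK_SIZE = 262144


def fixed_size_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into consecutive chunks of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def segmented_chunks(data: bytes, segment_lengths: Iterable[int],
                     size: int = CHUNK_SIZE) -> List[bytes]:
    """Chunk every described segment separately and concatenate the chunk lists.

    `segment_lengths` plays the role of the explicit description in the
    straw man, e.g. [3200, 4096000, 50000000] for file1.raw.
    """
    chunks: List[bytes] = []
    offset = 0
    for length in segment_lengths:
        segment = data[offset:offset + length]
        if len(segment) != length:
            raise ValueError("segment lengths exceed the file size")
        # The chunker restarts at every segment boundary.
        chunks.extend(fixed_size_chunks(segment, size))
        offset += length
    if offset != len(data):
        raise ValueError("segment lengths do not cover the whole file")
    return chunks
```

Because the chunker restarts at each segment boundary, the chunks of the embedded JPEG segment are the same as those obtained from chunking file1.jpg by itself, which is what a single pass over the whole file would generally not give.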

I am currently unaware of tools that produce such content-aware chunkings of data (I have only tried it out manually so far), but the above-linked suggestion indicates that this is more than a pet peeve of mine.

chrysn commented Jan 10, 2019

I just found the notes on future importers in ipfs/notes#216 (section "importers notes", paragraph on sub-importers), which indicate that such importers will simply be added to an ever-growing versioned list of importers.

I'm still not closing this issue right away for three reasons:

  • I'd expect that bespoke importers can be a thing (e.g. as camera vendors' raw file formats change over time and might not be easily recognized).
  • Importing might produce better results if knowledge of the environment is used. Think of PDF files, which allow append-only changes: a PDF chunker might not want to cut at every possible location, but would cut at a given location if the file up to that point had been published earlier.
  • Creating packs for arbitrary IPFS-published data would become tricky if there's no fallback path of explicitly stating chunks: at pack creation time, all known importers and settings would need to be tried (though heuristics can certainly cut away most), and that can still fail if the originally used chunker is not known to the pack creator.
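
The fallback path from the last point could look roughly like the following Python sketch. It is only an illustration under stated assumptions: published chunks are identified by plain SHA-256 digests rather than real CIDs, the chunk lengths of the published data are assumed to be known to the pack creator, and all names (`describe_chunking`, the chunker labels) are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Sequence, Tuple, Union

Chunker = Callable[[bytes], List[bytes]]


def digests(chunks: Sequence[bytes]) -> List[str]:
    """Stand-in for chunk identifiers: SHA-256 hex digests, not real CIDs."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]


def fixed(size: int) -> Chunker:
    """Build a fixed-size chunker (assumed stand-in for a known importer)."""
    return lambda data: [data[i:i + size] for i in range(0, len(data), size)]


def describe_chunking(
    data: bytes,
    published_digests: List[str],
    published_lengths: Sequence[int],
    known: Dict[str, Chunker],
) -> Tuple[str, Union[str, List[int]]]:
    """Prefer naming a known chunker; otherwise state the chunks explicitly."""
    # First try every known chunker, as a rule-based fmtstr would require.
    for name, chunker in known.items():
        if digests(chunker(data)) == published_digests:
            return ("chunker", name)
    # Fallback: cut at the published lengths and check the result.
    chunks, offset = [], 0
    for length in published_lengths:
        chunks.append(data[offset:offset + length])
        offset += length
    if offset == len(data) and digests(chunks) == published_digests:
        return ("explicit", list(published_lengths))
    raise ValueError("published chunking cannot be reproduced from this data")
```

With only the rule-based branch, data chunked by an importer unknown to the pack creator would end in the error case; the explicit branch is what keeps such data packable.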

If you think these are all non-issues, feel free to close without further comment (though obviously, I'd appreciate one).
