Proposal: IntactPack - storing UnixFS data in contiguous blocks for multi-mode Filecoin retrieval #513
Comments
This is primarily viewed from a retrieval perspective, given the strong allergies appearing toward using the plain old IPLD-based protocols. I keep encountering a strong preference for just piece-based retrievals instead. Unfortunately I never got around to doing the benchmarking I was going to do on this with the Project Motion data I had stored live, to compare the same data fetched both ways and quantify what's been called "the IPLD tax" as it pertains to transfer speeds. There also seems to be a growing meme that if the bytes aren't stored contiguously then it's inefficient. Juan tackled this recently, maybe it was at FilDev Iceland, and I broadly agree with him in that your data is almost never stored contiguously anyway, because the various layers of storage abstractions all do funky things with it. But it's a strong meme, and often these things sway users. I also buy the argument that it'd be simpler for a storage provider to just enable piece retrievals. If we insist on them all exposing trustless, or even bitswap, then we make life harder. If the minimum viable SP stack with retrievals is piece retrievals, then we can make it work with this protocol. Trustless is a value-add on top of it.
I'm proposing that the majority of this work is done via a special API in go-car itself, not the standard API, but an API that opts in to seeing the data in this way. With that, you just treat byte sequences derived from UnixFS offsets the same way we treat them today in "untrusted" CARs: if the bytes are invalid for the CID, they're rejected, just as if you uploaded an invalid CAR today. You could make the Filecoin deal for it, but you're not going to get the data back, except via piece retrieval, without verification. That verification can be done either on the server (Boost) when serving this trustless style, or on the client side when retrieving byte ranges via piece retrieval. The CID still exists in the reference layer and its digest must match the bytes in the looked-up range.
And yes, :ack: on the inefficiencies of double-streaming to first calculate the reference layer and then stream in the file bytes. That's a trade-off that should be acknowledged but my argument is that it's worth the cost if you want maximally flexible retrievability.
Maybe worth comparing this to a variant where the data is uploaded / downloaded fully as raw bytes (maybe as file-delineated objects in a CAR?) and the IPLD references can be generated as look-aside metadata by/on the SP, for its ability to also serve that data with full IPLD semantics.
One drawback there is that you can't do small-chunk trustless retrieval with just a piece endpoint. By packing the proof tree at the start you make this all navigable via piece endpoint with range requests. You can fetch the metadata first, verifying that all makes sense, then fetch the bytes - streaming the whole file, and prove it as you go with the tree you already have. Making /piece/ + Range work for IPLD is one of the key pieces in this proposal. Of course that's just me saying that's an ideal goal, given what I see as the likely trajectory of SPs turning off bitswap and trustless given the overhead and just offering piece retrieval unless there's some other forcing mechanism that works. Right now "serves retrievals" is satisfied with piece retrievals and I suspect it'll stay that way unless Retriev or something else offers token incentives for providing data trustless. CommPact gets part way there but essentially jumps out of an IPLD world, even while staying content addressed. It would be a shame to lose the ecosystem of tooling and techniques because that's all we can do.
Thank you for writing this up as an issue @rvagg 👍 Regarding this proposal, I am mentally stuck on the "Why Both?" question. Mixing raw bytes into a CAR file would certainly make things more complex than the pure CID+block sections in the current CAR format. This brings me to ask the same sort of question that motivated the CommPact proposal:
For me the answer to that is about understanding the original shape of data, minimal adoption barrier, contextual awareness, the "Semantic Web" avenue of thinking. It is less about a cross-cutting representation that would be applied to everything. I believe that the most impactful thing we can do is to help make IPLD representation shine in specific use-cases where existing state of the art representations fall short.
I feel like I have not explained the core idea behind CommPact very well: The whole point of it is that it keeps things IPLD friendly without forcing data prep until we figure out why one should go through with it. |
I don't think the fact that a filesystem implementation may repartition data it writes to disc drivers is a substantive justification for producing additional re-encodings of data written to/from that filesystem which force additional copies of the data to be produced. Filesystems may re-partition data in the process of maintaining a large logical store that continuously presents that data to other layers in the way it received it. The filesystem does not leak this abstraction the way IPFS encoding does when that filesystem is networked; the cost of partitioning appears in the place where that cost has a justification, in the filesystem next to the disc driver. All data said to be available is made available "as it appears", and all of these systems maintain a stable API between protocols from network to syscall wrt "ranges of bytes", as there is no need for further consensus or adoption of the encoding structure the filesystem implements. Networked filesystems "network" the layer above the partitioning, rather than below.

IPFS, when conceptualized as an encoding rather than a cryptographic address for an abstract "range of bytes," presents several new standards which have to be adopted and agreed upon related to the encoding, and in the case of files this is merely to produce a semantic representation of "a range of bytes." There's nowhere to implement this where it doesn't materialize as a forced copy of the data, and there's nowhere that this won't show up as an additional overall cost of IPFS over any alternative technology, because it doesn't do anything except produce an IPLD representation of a "range of bytes" which they already have. If you want the merkle tree in order to do incremental verification then you can publish enough information in content discovery to produce any IPFS tree locally from a generic proof, and then you aren't fighting over abstract encodings of what is addressed anywhere but the client, which can produce whatever encoding it prefers.
I wrote this up hastily before the holidays as "Encodingless IPFS" https://hackmd.io/@gozala/content-claims#Encodingless-IPFS
This is intended to be entirely compatible with the CAR format as-is. The "raw bytes" are still CID prefixed; it's just that the block is quite large and the CID won't be linked by anything else in the CAR. As @magik6k pointed out though, current go-car has some size limitations for blocks.
Throwing some thoughts out there I've seen in the process of working on web3.storage.
sidebar: anyone know what the motivation was for encoding lengths & CIDs next to the blocks? Like, if we started from scratch on an indexed CAR, wouldn't it make sense to have a header not for CARv1 streaming, but for CARs with indexes?
I think the original design just came from Jeromy wanting to spew a bunch of blocks into a file and then slurp them back in again, both for filecoin chain data, and then to do the whole deal thing to get a CommP that in some way relates to the IPLD structure; not a whole lot of thought beyond that afaik, the original go-car wasn't very sophisticated other than spew+slurp.
This started as a Notion document but it's kind of hidden there, although I've now published it for public viewing and commenting. I don't want it to get lost as I start to focus on other things. I did intend to code up a version of this in go-car and work up a spec PR, but I'm worried that it'll get lost in the wash, so I'm recording this here for permanence and hopefully a bit more engagement with other folks.
Why Not Both? Packing Content for IPLD vs Piece (IntactPack)
TL;DR
Instead of packing and storing file / blob data like this:
Pack and store it like this:
Background
There currently exist two views, and two separate uses, of Filecoin storage of content: IPLD block focused, and Filecoin as an opaque block device. These exist as lenses that are used to view the nature of data stored on Filecoin, but they also impact the technology choices as we move up the stack.
Vision 1: IPLD block focused
Vision 2: Filecoin is an opaque block-device
The competition between visions of Filecoin as an IPLD-block storage layer vs an opaque byte store is older than the network. Opaque bytes make everything on the Storage Provider side significantly simpler and cheaper. Yet deep IPLD integration provides significant value to the network (making this case is beyond the scope of this document, but it is a strongly held conviction by some).
Planning for rearchitecting miner and markets software takes the cost and complexity view that the IPLD layer is a value-add and should be considered an optional extra, to be added if/when it's needed/demanded. Storing and proving pieces is the critical activity; retrieving pieces is significantly simpler and more efficient (by many definitions of "efficient"), and indexing at the IPLD-block level is complexity that has been persistently difficult to solve and is one of the largest costs and risks for scalability.
Finding “Why Not Both?” Approaches
The ideal for storage providers is a choose-your-complexity model, where the simplest form of the software stack has the lowest complexity and cost, but complexity can be added to increase the value-add to a storage provider’s customers where it is needed.
CommPact is one attempt at describing a world where piece data can also be viewed through an IPLD lens, by re-purposing the piece commitment proving tree over the raw piece data. This path is one form of in-between ("why not both?") approach. In this form, the data is piece-first, in that it is prepared with the perspective of it being stored in a piece, and the IPLD lens can be applied after-the-fact as required (i.e. the IPLD lens is optional).
Why Not Both: IPLD-first
To serve both worlds, we could take an approach where data preparation still involves packing data in a CAR, including (some) UnixFS encoding, but lays out that data in such a way that it can also be retrieved in whole (or large) portions via piece retrieval. Markets software that is aware of this format can serve content both as standard, small, IPLD blocks, and/or as raw pieces. A storage provider who does not want or need to index IPLD blocks can still serve the content to clients that have stored metadata about content location, while a storage provider that wants to value-add with full Bitswap and HTTP retrieval can extract and serve blocks using the minimal metadata included in the packed form. Most importantly, this proposal does not break existing specifications for UnixFS or the CAR format, but it does add some challenges for existing use-cases making assumptions about what they contain.
IntactPack
Input data can be opaque bytes, or it can be file data. In both cases, it is UnixFS encoded to the reference layer; i.e. this “reference layer” is the UnixFS DAG prepared above the raw byte leaves, as if those leaves were also going to be stored in the CAR. This reference layer is packed in a CAR, and then the whole byte blob is added to the CAR as a single (potentially very large) IPLD block, complete with its own CID prefix.
This leaves us with two separate parts to a CAR: the reference layer which has links to blocks that don’t technically exist as sections in the CAR, and the large byte block, which has a CID that is not referenced by anything.
The process can be repeated for any number of blobs in order to fill up a CAR to the required size. Likewise, blobs that are larger than the required size can be split up into smaller chunks and treated as separate blobs (note there are some complexities in this case which require special handling; this is described below).
Standard UnixFS encoding and packing files / blobs into CAR format introduces challenges for Piece retrievals:
Standard UnixFS encoding of a file / blob: Raw byte blocks chunked at bottom layer, reference layer above to contain the leaves in a graph.
Standard UnixFS file / blob packed in order in CAR format.
A full CAR with multiple files / blobs
Retrieving a file / blob using HTTP piece protocol involves:
UnixFS encoding of a file / blob with the reference layer only, leaves are virtual and the UnixFS blocks contain the details of where they exist in the original stream of bytes
UnixFS reference layer blocks packed together in a CAR followed by single raw block containing original (non-chunked) bytes
A full CAR with multiple files / blobs with non-standard format, original files are fully contiguous
Retrieving a file / blob using this modified UnixFS packing requires less accounting (in minimal form) and a simpler process:
Handling Very-large Blobs
When packing within Filecoin sector constraints, CARs must be under the ~32G (or ~64G) threshold to make a storable piece. Data that is larger than this threshold can be handled by splitting the bytes into roughly sector-sized chunks and storing those chunks prefixed by a full, valid reference layer (only) for that chunk.
The first chunk CAR is straightforward to read and understand. Understanding subsequent chunk CARs requires inspecting the initial portion of the reference layer to find the first leaf block’s file offset. The length of the chunk can be either inferred from the raw block’s length or from the reference layer by also inspecting the last block’s offset and length.
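The chunk-splitting arithmetic can be sketched as follows; the names and the fixed byte budget are illustrative, and a real packer would also have to budget for each chunk's reference layer and CAR framing overhead:

```go
package main

import "fmt"

// chunkRange describes where one chunk's bytes sit in the original blob.
// (Illustrative type: a real packer must also account for per-chunk
// reference-layer and CAR framing overhead within the budget.)
type chunkRange struct {
	Offset int64 // file offset of the chunk's first leaf
	Length int64
}

// splitBlob divides a blob of total bytes into chunks of at most budget bytes.
func splitBlob(total, budget int64) []chunkRange {
	var chunks []chunkRange
	for off := int64(0); off < total; off += budget {
		n := budget
		if rem := total - off; rem < n {
			n = rem
		}
		chunks = append(chunks, chunkRange{Offset: off, Length: n})
	}
	return chunks
}

func main() {
	// A 100 GiB blob against a ~32 GiB piece budget yields four chunks.
	for _, c := range splitBlob(100<<30, 32<<30) {
		fmt.Printf("chunk at offset %d, %d bytes\n", c.Offset, c.Length)
	}
}
```

Each resulting chunk's `Offset` is what a reader recovers by inspecting the first leaf block's file offset in that chunk's reference layer.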
Tooling and Protocol Integration
This packing format does not strictly break any existing specifications:
CAR Format Integration
This format does not strictly require changes to the CAR format, however in some cases it is useful to signal that a CAR is packed using this format, with reference layers packed before the full contiguous bytes of a file / blob. The CARv2 format offers a lightweight “capabilities” signalling bitmap in its prefix which can be used to provide a boolean “is IntactPack format” signal to consumers. This minimally just requires a CARv2 header prefixing the CARv1 payload.
It is theoretically possible to determine whether a file is packed in this format by reading its contents, determining that the order of blocks is as expected, and verifying that the raw block chunks match the expected raw leaves contained in the reference layer. However, this requires both a full ingest of the CAR and hashing of many blocks, so it is less optimal than capability signalling.
Storage Provider Integration
A minimum requirement of HTTP piece retrievals is necessary, but a storage provider may optionally index and serve the individual raw leaf IPLD blocks of files within these CARs. Boost can be IntactPack aware by:
Retrieval Tooling Integration
Retrievals can take multiple forms with this format, each of these can be built in to Lassie or other retrieval clients: