Feather file with compression and larger than RAM #340
Comments
Hmmmm... we'll have to see what we can do here. I've had the idea for a while of a Tables.jl-wide feature to support projection/filter push-down for sources in a generic way. That would translate really well to Arrow and would let us more easily avoid decompressing when not necessary. There's probably more we can do in the short term, though, to avoid materializing when not needed.
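To illustrate the projection push-down idea mentioned above, here is a minimal, language-agnostic sketch (in Python, with zlib standing in for the real buffer compression): column buffers stay compressed until a consumer actually asks for a column, so untouched columns are never inflated. The `LazyTable` name and the CSV-in-a-compressed-blob layout are purely illustrative assumptions, not Arrow.jl's or Tables.jl's API.

```python
import zlib

class LazyTable:
    """Hypothetical table whose columns are decompressed only on access."""

    def __init__(self, compressed_columns):
        # dict of name -> zlib-compressed bytes of comma-separated integers
        # (an illustrative stand-in for compressed Arrow buffers)
        self._compressed = compressed_columns
        self._cache = {}

    def column(self, name):
        # Decompress on first access only; other columns stay compressed.
        if name not in self._cache:
            raw = zlib.decompress(self._compressed[name])
            self._cache[name] = [int(x) for x in raw.decode().split(",")]
        return self._cache[name]

    def select(self, names):
        # Projection push-down: touch only the requested columns.
        return {n: self.column(n) for n in names}

cols = {
    "a": zlib.compress(b"1,2,3"),
    "b": zlib.compress(b"4,5,6"),
}
tbl = LazyTable(cols)
print(tbl.select(["a"]))  # only column "a" is ever decompressed
```

A generic Tables.jl-style interface would let the source see the projected column set up front, so the `select` call is the whole trick: the "b" buffer is never inflated.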
This is the whole thing we do in UnROOT.jl for a physics-community-only format, so we will get there eventually. This way, at most one row group's worth of data ever lives in RAM; in fact, that's the minimal amount you need in RAM, because you only know the start/end row numbers for an entire row group and have to count inside it. But yeah, this is a whole thing in UnROOT.jl, and it's mission-critical because our data are like O(100) GB compressed all the time.
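The one-row-group-in-RAM pattern described above can be sketched as a generator that inflates a single group, yields its rows, and lets that buffer be collected before touching the next group. This is an illustrative Python sketch (zlib-compressed, newline-separated rows as a stand-in layout), not UnROOT.jl's actual implementation.

```python
import zlib

def iter_rows(row_groups):
    """Yield rows one at a time while keeping at most one decompressed
    row group in memory. Each element of row_groups is zlib-compressed
    bytes holding newline-separated rows (an illustrative layout)."""
    for blob in row_groups:
        rows = zlib.decompress(blob).decode().splitlines()
        # After this loop iteration, `rows` is dropped, so the next
        # group's buffer replaces it rather than accumulating.
        yield from rows

groups = [zlib.compress(b"r0\nr1"), zlib.compress(b"r2\nr3\nr4")]
print(list(iter_rows(groups)))  # -> ['r0', 'r1', 'r2', 'r3', 'r4']
```

The row group really is the granularity floor here: since only group boundaries carry known global row offsets, you cannot inflate less than one group to reach a row inside it.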
Ping for comments on #412.
Last time I checked, `mmap` breaks down for files with compression. This is understandable, because the compressed buffers clearly can't be re-interpreted without inflation. But the larger a file is, the more likely it is to be compressed. Can we decompress only a single "row group" (and only the relevant columns, of course) on the fly yet? This is for the case where a user is doing per-row iteration.
If the user accesses the table by `tbl[range, range]`, then clearly we might need to read more than one row group and chop off the head/tail depending on where the overlap is.