parquet arrow writer doesn't track memory size correctly for fixed sized lists #6839
Comments
The issue is that `GenericColumnWriter::memory_size` does not account for the `data_pages` it has buffered while waiting for the dictionary page to be flushed. This should be a relatively straightforward case of changing it to also count those buffered pages, and adding an appropriate test.

FYI @wiedld, who added this in #5967.

Edit: This should probably actually be reported as part of `get_estimated_total_bytes`.
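To make the accounting gap concrete, here is a minimal std-only sketch of a writer that holds finished pages back until a flush. It is a toy model, not the parquet crate's code: `ToyColumnWriter`, `ToyPage`, `memory_size_encoder_only`, and the page size limit are all hypothetical names chosen for illustration, and it counts `len()` rather than allocated capacity.

```rust
/// Hypothetical stand-in for a finished data page (not a parquet crate type).
struct ToyPage {
    buf: Vec<u8>,
}

/// Toy column writer that holds finished pages until `flush` is called,
/// loosely mirroring pages waiting on the dictionary page.
struct ToyColumnWriter {
    /// Bytes encoded but not yet cut into a page.
    encoder_buf: Vec<u8>,
    /// Finished pages that have not been written out yet.
    data_pages: Vec<ToyPage>,
    /// Assumed threshold at which the encoder buffer becomes a page.
    page_size_limit: usize,
}

impl ToyColumnWriter {
    fn new(page_size_limit: usize) -> Self {
        Self {
            encoder_buf: Vec::new(),
            data_pages: Vec::new(),
            page_size_limit,
        }
    }

    /// Appends bytes and cuts a page once the limit is reached.
    fn write(&mut self, bytes: &[u8]) {
        self.encoder_buf.extend_from_slice(bytes);
        if self.encoder_buf.len() >= self.page_size_limit {
            let buf = std::mem::take(&mut self.encoder_buf);
            self.data_pages.push(ToyPage { buf });
        }
    }

    /// Under-counting estimate: ignores the buffered pages.
    fn memory_size_encoder_only(&self) -> usize {
        self.encoder_buf.len()
    }

    /// Estimate that also counts the buffered pages.
    fn memory_size(&self) -> usize {
        let pages: usize = self.data_pages.iter().map(|p| p.buf.len()).sum();
        self.encoder_buf.len() + pages
    }

    /// Drops everything buffered and returns how many bytes were held.
    fn flush(&mut self) -> usize {
        let held = self.memory_size();
        self.encoder_buf.clear();
        self.data_pages.clear();
        held
    }
}
```

In this model the encoder-only estimate stays small no matter how much is written, because each cut page empties the encoder buffer, while the page-aware estimate grows until `flush` and then returns to zero.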
Unfortunately I've been swamped and probably don't have time to fix it, @alamb. I hope my reproduction will be enough for someone to pick it up!
100% -- much appreciated 🙏
Describe the bug
The arrow writer doesn't track memory size correctly: it seems to treat `FixedSizeList` columns as having a fixed memory usage. That is, the reported memory usage doesn't grow even though the buffers actually grow in memory.
To Reproduce
Expected behavior
We should see the reported memory usage rise over time, then as flush is triggered, it should go down to around zero. Then repeat.
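The expected behavior can be phrased as a check over a series of readings. The sketch below is only an illustration under assumptions: `Sample`, `follows_expected_pattern`, and the `near_zero` tolerance are hypothetical names, and the readings would have to come from whatever memory estimate the writer reports. It also assumes every write between flushes raises the reading, which is stricter than a real writer may guarantee.

```rust
/// One reading of the writer's reported in-memory size (hypothetical type).
#[derive(Clone, Copy)]
struct Sample {
    reported_bytes: usize,
    /// True if a flush completed immediately before this reading.
    after_flush: bool,
}

/// Returns true if readings strictly rise between flushes and drop to at
/// most `near_zero` bytes right after each flush.
fn follows_expected_pattern(samples: &[Sample], near_zero: usize) -> bool {
    let mut prev: Option<usize> = None;
    for s in samples {
        if s.after_flush {
            // Right after a flush the estimate should be close to zero.
            if s.reported_bytes > near_zero {
                return false;
            }
        } else if let Some(p) = prev {
            // Between flushes the estimate should keep growing.
            if s.reported_bytes <= p {
                return false;
            }
        }
        prev = Some(s.reported_bytes);
    }
    true
}
```

A flat series of readings, which is the symptom described in this issue, fails the check, as does a reading that stays large right after a flush.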