Build sst file in more resource friendly way #486
Labels
feature
New feature or request
Comments
Glad to see parquet has accepted my proposal; we just need to wait for the next releases of both DataFusion and parquet.
2 tasks
Proposal
Current procedure:
New procedure:
However, after reviewing the API of the parquet writer, I found that async writing is not supported yet; the related issue is apache/arrow-rs#1269. To solve this problem, I guess there are two ways:
Personally, I vote for the first way.
Describe This Problem
In the current implementation, an SST write involves two loops:
Two loops mean more CPU usage; what's worse is that this may consume too much memory.
Proposal
Ideally, we can reduce the build procedure to one loop; this depends on apache/arrow-rs#3356.
If this is not possible, we may need to spill RecordBatch data to disk in order to reduce memory consumption.
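To illustrate the one-loop idea, here is a minimal sketch using only the Rust standard library. It assumes the extra pass exists only to gather file-level metadata, which this issue does not spell out. `Batch`, `FileMeta`, `SinglePassWriter`, and `build_sst` are hypothetical names for illustration, not the project's or parquet's API. Each batch is written as soon as it arrives, metadata is accumulated along the way, and the footer is written at the end.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for an Arrow RecordBatch: just a list of row keys.
pub struct Batch {
    pub keys: Vec<u64>,
}

/// File-level metadata accumulated while batches stream through the writer.
#[derive(Debug, Default, PartialEq)]
pub struct FileMeta {
    pub num_rows: u64,
    pub min_key: Option<u64>,
    pub max_key: Option<u64>,
}

/// Hypothetical writer that emits data and collects metadata in one pass.
pub struct SinglePassWriter<W: Write> {
    sink: W,
    meta: FileMeta,
}

impl<W: Write> SinglePassWriter<W> {
    pub fn new(sink: W) -> Self {
        Self { sink, meta: FileMeta::default() }
    }

    /// Write one batch immediately and update the running metadata,
    /// so no batch has to be kept in memory after this call returns.
    pub fn write_batch(&mut self, batch: &Batch) -> io::Result<()> {
        for &key in &batch.keys {
            self.sink.write_all(&key.to_le_bytes())?;
            self.meta.num_rows += 1;
            self.meta.min_key = Some(self.meta.min_key.map_or(key, |m| m.min(key)));
            self.meta.max_key = Some(self.meta.max_key.map_or(key, |m| m.max(key)));
        }
        Ok(())
    }

    /// Append the metadata as a footer once all batches have been written.
    pub fn finish(mut self) -> io::Result<(W, FileMeta)> {
        let min = self.meta.min_key.unwrap_or(0);
        let max = self.meta.max_key.unwrap_or(0);
        self.sink.write_all(&self.meta.num_rows.to_le_bytes())?;
        self.sink.write_all(&min.to_le_bytes())?;
        self.sink.write_all(&max.to_le_bytes())?;
        self.sink.flush()?;
        Ok((self.sink, self.meta))
    }
}

/// The single loop: every batch is visited exactly once.
pub fn build_sst<W: Write>(
    batches: impl IntoIterator<Item = Batch>,
    sink: W,
) -> io::Result<(W, FileMeta)> {
    let mut writer = SinglePassWriter::new(sink);
    for batch in batches {
        writer.write_batch(&batch)?;
    }
    writer.finish()
}
```

Peak memory in this sketch is bounded by one batch plus the small metadata struct. It only works if the file format allows metadata to be supplied after the data has been written.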
Additional Context
No response