-
Notifications
You must be signed in to change notification settings - Fork 881
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmarks for Arrow IPC reader #7091
Conversation
}); | ||
} | ||
|
||
// copied from the zero_copy_ipc example. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did want to benchmark using mmap as I think it is an important usecase
Maybe we should contemplate making this IPCBufferDecoder
its own real API 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tustvold I would be particularly interested in your feedback on this idea (I can file ticket / work with someone to make the code but don't want to do it if you know of some reason not to)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see no issue with this, the motivation for the lower level API was to support use-cases where people want to have more control over the IO (as opposed to just punting it to mmap). Having a more ergonomic API for the simple case, seems harmless to me
}) | ||
}); | ||
|
||
group.bench_function("FileReader/read_10/mmap", |b| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
6cd7e5d
to
7a6a00a
Compare
Thank you @tustvold |
Which issue does this PR close?
Rationale for this change
Similarly to #7090 we need a benchmark to show efforts to improve validation will actually speed it up.
What changes are included in this PR?
Are there any user-facing changes?
No, this is only benchmark code
As you can see by the flamegraph, indeed most of the time is spent validating data (UTF8 validation specifically)