-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use BufReader
for LocalFileReader to revert performance regression in parquet reading
#1366
Conversation
BufRead
for ChunkObjectReader to improve performanceBufRead
for ChunkObjectReader to revert performance regression in parquet reading
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Dandandan -- I am not sure about the need for the BufRead
trait, but otherwise looks good to me.
Very nice 🕵️ work
@@ -18,7 +18,7 @@ | |||
//! Object store that represents the Local File System. | |||
|
|||
use std::fs::{self, File, Metadata}; | |||
use std::io::{Read, Seek, SeekFrom}; | |||
use std::io::{BufRead, BufReader, Read, Seek, SeekFrom}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL BufRead
trait
// A new file descriptor is opened for each chunk reader. | ||
// This okay because chunks are usually fairly large. | ||
let mut file = File::open(&self.file.path)?; | ||
file.seek(SeekFrom::Start(start))?; | ||
Ok(Box::new(file.take(length as u64))) | ||
|
||
let file = BufReader::new(file.take(length as u64)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this is the actual fix, right? Is the change to require the BufRead
trait needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Rerunning some benchmarks now without the trait.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - looks like BufRead
wasn't needed 🎉
BufRead
for ChunkObjectReader to revert performance regression in parquet readingBufReader
for ChunkObjectReader to revert performance regression in parquet reading
BufReader
for ChunkObjectReader to revert performance regression in parquet readingBufReader
for LocalFileReader to revert performance regression in parquet reading
🎉 |
Thanks @Dandandan ! Can you quickly explain what the reason for the slowdown was exactly? |
As far as I can explain: The earlier code used the By introducing the object storage abstraction, we were directly reading from a By wrapping the Maybe a potential improvement would be having a bit more control, such as setting the capacity of the buffer. |
Which issue does this PR close?
Closes #1363
Rationale for this change
Parquet reading was slow. This recovers the performance regression in the TPC-H benchmark.
There is still a slowdown in query 10 - and other queries, but this is unrelated to Parquet reading #1367 (and performance still improves from roughly 10 to 7s on that query).
What changes are included in this PR?
Use the
BufReadtrait instead of
Read`.Are there any user-facing changes?