-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: refactor writing to NippyJar
#5669
Conversation
This reverts commit b9c51e9.
offsets_file, | ||
tmp_buf: Vec::with_capacity(1_000_000), | ||
uncompressed_row_size: 0, | ||
offsets: Vec::with_capacity(1_000_000), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would move these magic nums to constants just for clarity
} | ||
|
||
impl<'a, H: NippyJarHeader> NippyJarWriter<'a, H> { | ||
pub fn new(jar: &'a mut NippyJar<H>) -> Result<Self, NippyJarError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
document that this does consistency checks and self healing on open
} | ||
|
||
/// Flushes offsets to disk. | ||
pub(crate) fn commit_offsets(&mut self) -> Result<(), NippyJarError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should probably do sync_all
to attempt to tell the OS to actually flush the data to disk and not just its i/o buffer https://doc.rust-lang.org/std/fs/struct.File.html#method.sync_all
} | ||
|
||
/// Commits configuration and offsets to disk. It drains the internal offset list. | ||
pub fn commit(&mut self) -> Result<(), NippyJarError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason this does not attempt to flush data to the data file using e.g. sync_all
?
Co-authored-by: Bjerg <[email protected]>
Co-authored-by: Bjerg <[email protected]>
Refactors
NippyJar
for compatibility with the new snapshot scope.NippyJarWriter
: extracts certain logic fromNippyJar
that writes to file, refactors, and extends it. It should now be possible to create it non atomically (start, stop, continue), and prune (aka unwind) rows from the end.*.conf
: contains configuration data used in consistency checks (e.g., number of rows, columns, etc.).*.off
: holds the offset list with one change: the last offset now represents the expected size of the data file (used in consistency checks).*.index
: PHF/Cuckoo filter-related data..conf
(NippyJar
) is always the last file to be written to disk. On openingNippyJarWriter
, the offset list file and data file sizes are verified. If there's any discrepancy, they are truncated to match the number of rows set byNippyJar
.PR on top of #5583