Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: refactor writing to NippyJar #5669

Merged
merged 49 commits into from
Dec 18, 2023
Merged

Conversation

joshieDo
Copy link
Collaborator

@joshieDo joshieDo commented Dec 3, 2023

Refactors NippyJar for compatibility with the new snapshot scope.

  • Introduces NippyJarWriter: extracts certain logic from NippyJar that writes to file, refactors, and extends it. It should now be possible to create it non atomically (start, stop, continue), and prune (aka unwind) rows from the end.
  • The data file, which previously included both table data and configuration, has been split. So now, there are four files per snapshot:
    • Data file: holds only the table data.
    • *.conf: contains configuration data used in consistency checks (e.g., number of rows, columns, etc.).
    • *.off: holds the offset list with one change: the last offset now represents the expected size of the data file (used in consistency checks).
    • *.index: PHF/Cuckoo filter-related data.
  • Adds consistency checks.
    • .conf (NippyJar) is always the last file to be written to disk. On opening NippyJarWriter, the offset list file and data file sizes are verified. If there's any discrepancy, they are truncated to match the number of rows set by NippyJar.

PR on top of #5583

@joshieDo joshieDo requested review from gakonst and mattsse December 3, 2023 21:33
@joshieDo joshieDo added C-enhancement New feature or request A-static-files Related to static files labels Dec 3, 2023
@joshieDo joshieDo removed the request for review from gakonst December 3, 2023 21:39
@joshieDo joshieDo marked this pull request as ready for review December 4, 2023 13:48
Base automatically changed from joshie/optional-filters to feat/snapshots December 12, 2023 16:02
@joshieDo joshieDo mentioned this pull request Dec 12, 2023
@joshieDo joshieDo mentioned this pull request Dec 13, 2023
offsets_file,
tmp_buf: Vec::with_capacity(1_000_000),
uncompressed_row_size: 0,
offsets: Vec::with_capacity(1_000_000),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would move these magic nums to constants just for clarity

}

impl<'a, H: NippyJarHeader> NippyJarWriter<'a, H> {
pub fn new(jar: &'a mut NippyJar<H>) -> Result<Self, NippyJarError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document that this does consistency checks and self healing on open

}

/// Flushes offsets to disk.
pub(crate) fn commit_offsets(&mut self) -> Result<(), NippyJarError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should probably do sync_all to attempt to tell the OS to actually flush the data to disk and not just its i/o buffer https://doc.rust-lang.org/std/fs/struct.File.html#method.sync_all

}

/// Commits configuration and offsets to disk. It drains the internal offset list.
pub fn commit(&mut self) -> Result<(), NippyJarError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason this does not attempt to flush data to the data file using e.g. sync_all?

@shekhirin shekhirin merged commit c9893c0 into feat/snapshots Dec 18, 2023
@shekhirin shekhirin deleted the joshie/writer-jar branch December 18, 2023 15:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-static-files Related to static files C-enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants