Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(snapshots): refactor snapshot data access #5583

Merged
merged 23 commits into from
Dec 12, 2023

Conversation

joshieDo
Copy link
Collaborator

@joshieDo joshieDo commented Nov 27, 2023

  • load filters and offsets only if requested. (default is off on reth node )
  • offsets are stored in their own file uncompressed. (last/max one sets how many bytes per offset)
  • accessing data means reading the offset from the offset file and then the actual data from the datafile (MmapHandle becomes DataReader)

This allows the loading of snapshots to be quite faster (~30μs), while allowing us to hold more in memory (if not loading filters)

@joshieDo joshieDo added C-enhancement New feature or request A-static-files Related to static files labels Nov 27, 2023
@joshieDo joshieDo changed the title feat(snapshots): refactor snapshot reading feat(snapshots): refactor snapshot data access Nov 27, 2023
Copy link
Collaborator

@mattsse mattsse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smol doc nit,
pending @shekhirin

/// File descriptor. Needs to be kept alive as long as the mmap handle.
/// Manages the reading of snapshot data using memory-mapped files.
///
/// Holds file and mmap descriptors of the data and offsets files of a snapshot.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

offsets files?

@@ -235,39 +235,61 @@ where

// SAFETY: File is read-only and its descriptor is kept alive as long as the mmap handle.
let data_reader = unsafe { memmap2::Mmap::map(&data_file)? };
let mut obj: Self = bincode::deserialize_from(data_reader.as_ref())?;
let max_row_size: [u8; 8] =
data_reader.as_ref()[0..8].try_into().expect("slice with incorrect length");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the error message is not obvious, should it rather be something about missing header?

self.offsets.iter().max().expect("should exist.").leading_zeros() as usize +
7) /
8;
file.write_all(&[bytes_needed as u8])?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we writing max_row_size here?

@joshieDo joshieDo changed the base branch from main to feat/snapshots December 12, 2023 16:00
@joshieDo joshieDo merged commit 0dffeb8 into feat/snapshots Dec 12, 2023
@joshieDo joshieDo deleted the joshie/optional-filters branch December 12, 2023 16:02
@joshieDo joshieDo mentioned this pull request Dec 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-static-files Related to static files C-enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants