file size limit? #237
base: master
Conversation
Is this change backward compatible? For example, if I create a snapshot for a node with the …
The parameter cannot be changed; it is written in the metadata, so it should error when trying to open with a wrong parameter.
Great stuff. One thing I'm concerned about is hitting the OS limit for memory maps (…).
src/file.rs
Outdated
#[cfg(not(test))]
const RESERVE_ADDRESS_SPACE: usize = 1024 * 1024 * 1024; // 1 Gb
// Use a different value for tests to work around docker limits on the test machine.
#[cfg(test)]
const RESERVE_ADDRESS_SPACE: usize = 64 * 1024 * 1024; // 64 Mb

let map_len = len + RESERVE_ADDRESS_SPACE; // TODO should be max??
Wondering if it should be: std::cmp::max(RESERVE_ADDRESS_SPACE, len)
RESERVE_ADDRESS_SPACE is supposed to be additional reserved space. I.e. if RESERVE_ADDRESS_SPACE is 1024Mb and we try to map 1025Mb, it should reserve additional space up to 2048Mb so that the next time we try to map 1026Mb it can use the same space.
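To make the two readings concrete, here is a minimal standalone sketch (illustration only, not the PR's code) contrasting the "floor" interpretation from the first comment with the "headroom" interpretation described above:

const RESERVE_ADDRESS_SPACE: usize = 1024 * 1024 * 1024; // 1 Gb, as in the diff above

// Floor interpretation: never map less than the reserved constant.
fn map_len_as_floor(len: usize) -> usize {
    std::cmp::max(RESERVE_ADDRESS_SPACE, len)
}

// Headroom interpretation (what the diff does): always reserve extra address
// space on top of the current length, so the next few grows reuse the mapping.
fn map_len_as_headroom(len: usize) -> usize {
    len + RESERVE_ADDRESS_SPACE
}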
If max_len is set, it should be used as the max map size instead of RESERVE_ADDRESS_SPACE.
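A hedged sketch of that suggestion (the name max_len follows the comment; the actual field in the PR may differ):

// If a per-file size cap is configured, the file can never grow past it,
// so reserving address space beyond the cap is pointless.
fn reserved_map_len(len: usize, max_len: Option<usize>, reserve: usize) -> usize {
    match max_len {
        Some(cap) => cap.max(len), // cap is the hard upper bound on the mapping
        None => len + reserve,     // otherwise keep extra headroom beyond len
    }
}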
let file = self.create_file()?;
let len = GROW_SIZE_BYTES;
try_io!(file.set_len(len));
let map = mmap(&file, 0)?;
Here I switched from mmap(&file, 0) to mmap(&file, len).
Yes, sure, it is convenient to be able to define 1Mb files (for testing), but I would not really expect anything lower than 512Mb as the max size. Still, with a db of 200Gb that would be more than 400 open mmaps: clearly something to test.
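A quick back-of-the-envelope check of that number (illustration only):

// One mmap per file, so the map count is the database size divided by the
// per-file limit, rounded up: 200 GiB at 512 MiB per file is 400 maps.
fn maps_needed(db_size: u64, max_file_size: u64) -> u64 {
    (db_size + max_file_size - 1) / max_file_size
}

fn main() {
    assert_eq!(maps_needed(200 * 1024 * 1024 * 1024, 512 * 1024 * 1024), 400);
}

For comparison, the Linux default for vm.max_map_count is 65530, so 400 maps stays far below the OS limit mentioned earlier in the thread.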
src/index.rs
Outdated
madvise_random(&mut map);
log::debug!(target: "parity-db", "Opened existing index {}", id);
Ok(Some(IndexTable { id, path, map: RwLock::new(Some(map)) }))
try_io!(file.set_len(file_size(id.index_bits())));
Looks like all the index files are getting the original total size rather than the smaller size based on max_chunks.
great catch, thanks 👍
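For context, a hypothetical sketch of the fix being agreed on here (the names and chunk sizing are assumptions, not the PR's actual code): when a file size limit is active, each index file should be sized from the number of chunks it actually holds rather than from the full table size.

// Hypothetical illustration: cap the per-file size by the configured number
// of chunks per file instead of always using the full table size.
fn capped_file_size(total_chunks: u64, chunk_size: u64, max_chunks_per_file: Option<u64>) -> u64 {
    let chunks = max_chunks_per_file.map_or(total_chunks, |m| total_chunks.min(m));
    chunks * chunk_size
}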
src/index.rs
Outdated
.create_new(true)
.open(path.as_path()));
log::debug!(target: "parity-db", "Created new index {}", self.id);
try_io!(file.set_len(file_size(self.id.index_bits())));
Same file size issue as above.
src/ref_count.rs
Outdated
Ok(file) => file,
};

try_io!(file.set_len(file_size(id.index_bits())));
Same as with index.
src/ref_count.rs
Outdated
.create_new(true)
.open(path.as_path()));
log::debug!(target: "parity-db", "Created new ref count {}", self.id);
try_io!(file.set_len(file_size(self.id.index_bits())));
Last one.
src/file.rs
Outdated
@@ -87,84 +97,111 @@ const GROW_SIZE_BYTES: u64 = 256 * 1024;

#[derive(Debug)]
pub struct TableFile {
pub map: RwLock<Option<(memmap2::MmapMut, std::fs::File)>>,
pub path: std::path::PathBuf,
pub map: RwLock<Vec<(memmap2::MmapMut, std::fs::File)>>,
Should be called maps or mem_maps.
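A sketch of the suggested rename (illustration only; RwLock is assumed to be parking_lot's, as used in the crate, and the struct is trimmed to the fields shown in the diff above):

use parking_lot::RwLock;

#[derive(Debug)]
pub struct TableFile {
    // One mapping per on-disk file, hence the plural name.
    pub maps: RwLock<Vec<(memmap2::MmapMut, std::fs::File)>>,
    pub path: std::path::PathBuf,
}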
src/file.rs
Outdated
},
};
let capacity = new_len / entry_size as u64;
let per_page_capacity = self.max_size.map(|m| m / entry_size as usize).unwrap_or(0);
entries_per_file. Let's not introduce terms such as "page" here.
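As a standalone sketch, the renamed computation from the diff line above would look like this (same expression, only the name changed; 0 stays as the "no limit" sentinel):

// Number of entries that fit in one file when a maximum file size is set.
fn entries_per_file(max_size: Option<usize>, entry_size: u16) -> usize {
    max_size.map(|m| m / entry_size as usize).unwrap_or(0)
}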
src/file.rs
Outdated
let new_map = mmap(&file, new_len as usize)?;
let old_map = std::mem::replace(map, new_map);
try_io!(old_map.flush());
let map_and_file_len = map_and_file.len();
num_maps
src/file.rs
Outdated
@@ -180,44 +217,72 @@ impl TableFile {

pub fn grow(&self, entry_size: u16) -> Result<()> {
let mut map_and_file = self.map.write();
maps
src/file.rs
Outdated
capacity += GROW_SIZE_BYTES / entry_size as u64;
try_io!(file.set_len(GROW_SIZE_BYTES));
} else {
capacity = len / entry_size as u64;
Think this should be +=. Though it doesn't cause any obvious problems: if the value is incorrect then grow is called and the correct capacity is calculated there.
Thanks, sure, += is right here.
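A hedged sketch of the agreed correction (standalone illustration, not the exact PR code): the total capacity should accumulate across all files backing the table rather than being overwritten by the last file's length.

use std::fs::File;
use std::io::Result;

// Sum the capacity contributed by every file; note `+=` rather than `=`.
fn total_capacity(files: &[File], entry_size: u16) -> Result<u64> {
    let mut capacity = 0u64;
    for file in files {
        let len = file.metadata()?.len();
        capacity += len / entry_size as u64;
    }
    Ok(capacity)
}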
Been asked a few times for a size limit on parity-db files.
In itself it does not really make sense, but practically I believe many tools are pretty bad at handling big files (like not even chunking them).
Still, this should not be the default, but it can be of use.
So I tried a quick implementation to see if it could be a good idea to have.
Got this branch that seems to run the stress test: the option max_file_size is in megabytes.
A few TODOs to handle and tests to modify (so we have some running with multiple files) and this could be good to go (and maybe use smallvec instead of vec).
cc @arkpar @MattHalpinParity @PierreBesson
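For illustration, a hypothetical usage of the option described above, assuming it ends up as a field on parity_db::Options (the name and megabyte unit are taken from this description and may change before merge):

use parity_db::{Db, Options};

fn open_limited(path: &std::path::Path) -> Result<Db, parity_db::Error> {
    let mut options = Options::with_columns(path, 1);
    // Proposed option from this PR: cap each on-disk file, value in megabytes.
    options.max_file_size = Some(512);
    Db::open_or_create(&options)
}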