-
Notifications
You must be signed in to change notification settings - Fork 4.7k
Add accounts hard-link files into the bank snapshot directory #29496
Add accounts hard-link files into the bank snapshot directory #29496
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@brooksprumo hit all & more of the things I would have commented on here
8de9cde
to
340f1c9
Compare
a45da45
to
379df31
Compare
0194e90
to
aa74ca6
Compare
bd94a92
to
196c679
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will need to go over this again, but submitting some initial thoughts + nits.
@xiangzhu70 I think one thing that would be really valuable for the PR (and for people looking back without as much context in the future) is if you added an example of what the new snapshot & accounts directories look like in the PR description - preferrably with multiple accounts_dirs.
d991fe5
to
72868f0
Compare
@@ -141,14 +141,16 @@ pub struct SnapshotRequestHandler { | |||
pub accounts_package_sender: Sender<AccountsPackage>, | |||
} | |||
|
|||
type AccountStorages = Vec<Arc<AccountStorageEntry>>; | |||
|
|||
impl SnapshotRequestHandler { | |||
// Returns the latest requested snapshot slot, if one exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment is out of date; we also return the snapshot storages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually...is this comment even correct either? It returns blockheight
, not slot. @brooksprumo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not related to the changes in this PR. not sure if I should make this change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing + updating the comment. I was originally suggesting it should be done here until I noticed the blockheight/slot mismatch - but better to have it fixed now that we've seen it!
/// Hard-link the files from accounts/ to snapshot/<bank_slot>/accounts/ | ||
/// This keeps the appendvec files alive and with the bank snapshot. The slot and id | ||
/// in the file names are also updated in case its file is a recycled one with inconsistent slot | ||
/// and id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing where are the slot and id in the file names updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great finding! I had the code which does the updating. Somehow it was lost when making the multi account_path changes. Now I get the code back. Thanks!
36f26b0
to
3b2dc90
Compare
d1478f9
to
5799adf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
heh, do you want me to re-review after I've approved the PR? |
Oh, I didn't notice your approval earlier. No need to re-review again. Many thanks for all the review work! |
Problem
To construct a bank from the bank snapshot directory, we need to build up the snapshot directory to contain the full state of the snapshot. This is one of the major step, to add the accounts appendvec files into the snapshot as hardlinks.
Because hardlinking only works on the same file system partition, a hardlink directory snapshot/ is created for every slot and every account_path provided. Each <account_path>/snapshot// contains the hardlinks of the account files in run/. The bank snapshot accounts_hardlinks contain symlinks to these directories.
For example, if the accounts_paths vector has the following:
~/ledger/accounts/
/mnt/accounts/
The bank snapshot directory snapshot/ will have an accounts_hardlinks/ directory, which contains symlinks
account_path_0 -> ~/ledger/accounts/snapshot/. # hardlinks of files in ~/ledger/accounts/run
account_path_1 -> /mnt/accounts/snapshot # hardlinks of files in /mnt/accounts/run
This is a split of the PR #28745
Summary of Changes
In add_bank_snapshot, add the hard-links of the accounts/ appendvec files.
Hold the reference counts of the appendvecs for the latest snapshot, to prevent the appendvecs from being recycled. The recycling would change the content of the hardlinked files. For now, a vector of SnapshotStorages is used to hold the reference counts. It could be changed to use option. But some data corruption error happened. Will debug this issue later.
There are also some other small debugging info changes to help debug the appendvec change history.
Fixes #