Startup optimization in shrink - don't shrink non-shrinkable slots #17405
Conversation
force-pushed from edaa72d to 409af0d
Codecov Report

```diff
@@           Coverage Diff            @@
##           master   #17405   +/-   ##
========================================
  Coverage    82.6%    82.7%
========================================
  Files         425      425
  Lines      118840   118543   -297
========================================
- Hits        98269    98057   -212
+ Misses      20571    20486    -85
```
force-pushed from d47e292 to 395ada1
runtime/src/accounts_db.rs (outdated)
```rust
drop(accounts_index_map_lock);
index_read_elapsed.stop();
let aligned_total: u64 = self.page_align(alive_total);

// If not saving memory maps or more than 1 page of data, then skip the shrink
if num_stores == 1 && aligned_total + PAGE_SIZE >= original_bytes {
```
I think it's also a good idea to extract this into a reusable `is_bytes_saved_sufficient_for_shrink()` that can be added to the check where we add to the `shrink_candidates`: https://github.com/solana-labs/solana/blob/master/runtime/src/accounts_db.rs#L4592-L4607. Otherwise, we'll continually add to the shrink candidates and incur the overhead of running/scanning at the beginning of this `do_shrink_slot_stores()` function as we delete accounts from the same slot, even if those slots continually fail this new `aligned_total + PAGE_SIZE >= original_bytes` check.
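A minimal sketch of the suggested helper, assuming it only needs the three values already used in the inline check (the name comes from the comment above; the signature and placement are assumptions, not the actual accounts_db code):

```rust
const PAGE_SIZE: u64 = 4096;

/// Hedged sketch: returns true when shrinking would actually save at
/// least one page of data. Name taken from the review comment; this is
/// not the real accounts_db API.
fn is_bytes_saved_sufficient_for_shrink(
    num_stores: usize,
    aligned_alive_bytes: u64,
    original_bytes: u64,
) -> bool {
    // Mirrors the inline check: a single store whose page-aligned live
    // bytes are within one page of the original size saves nothing.
    !(num_stores == 1 && aligned_alive_bytes + PAGE_SIZE >= original_bytes)
}

fn main() {
    // 4096 live bytes out of 8192 total: 4096 + PAGE_SIZE >= 8192, skip.
    assert!(!is_bytes_saved_sufficient_for_shrink(1, 4096, 8192));
    // 4096 live bytes out of 16384 total: shrinking saves two pages.
    assert!(is_bytes_saved_sufficient_for_shrink(1, 4096, 16384));
    println!("ok");
}
```

Such a helper could then gate both the candidate-selection site and the early check in `do_shrink_slot_stores()` with the same condition.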
Also, just out of curiosity, do we know the snapshot sizes before and after? Theoretically, if every slot had an extra `PAGE_SIZE = 4096` bytes, we might see an extra 432000 slots * 4096 bytes ~= 1.8GB of raw size before compression, but I wonder if the results reflect this.
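The back-of-the-envelope arithmetic above can be checked directly (illustrative only; 432000 is the mainnet-beta slots-per-epoch figure used in the comment):

```rust
const PAGE_SIZE: u64 = 4096; // bytes per page, as used in accounts_db.rs
const SLOTS_PER_EPOCH: u64 = 432_000; // mainnet-beta epoch length, from the comment

// Worst-case extra raw snapshot size if every slot carried one extra page.
fn extra_snapshot_bytes() -> u64 {
    SLOTS_PER_EPOCH * PAGE_SIZE
}

fn main() {
    let bytes = extra_snapshot_bytes();
    // 432000 * 4096 = 1,769,472,000 bytes, i.e. roughly 1.8 GB raw.
    println!("{} bytes ~= {:.1} GB", bytes, bytes as f64 / 1e9);
}
```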
I ran this for a while on mainnet; snapshot sizes didn't seem to be affected:

```
Every 2.0s: ls -alh mainnet-beta/validator-ledger/snapshot-*
-rw-rw-r-- 1 stephen_solana_com stephen_solana_com 6.3G May  4 15:00 mainnet-beta/validator-ledger/snapshot-76776676-EcV4U9gqqdXuhqgP69WaLhWmoyDQcJYxM1ZE5YuSspdF.tar.zst
-rw-rw-r-- 1 stephen_solana_com stephen_solana_com 6.6G May 27 15:13 mainnet-beta/validator-ledger/snapshot-80249526-9NwTfazvvgmdb4Wr8VnAhC1kEWqnU9APZjfpmrktjpRS.tar.zst
-rw-rw-r-- 1 stephen_solana_com stephen_solana_com 6.6G May 27 15:18 mainnet-beta/validator-ledger/snapshot-80249917-2WYkAHaKCYFEyCtA8gBF4Qxh6iNbrcxRbXUrqUQSvPai.tar.zst
```
Looks great 😃
```rust
if num_stores == 1 && aligned_total + PAGE_SIZE >= original_bytes {
    for pubkey in unrefed_pubkeys {
        if let Some(locked_entry) = self.accounts_index.get_account_read_entry(pubkey) {
            locked_entry.addref();
```
I wonder if we could avoid this `addref()` by either:

1. Removing the earlier `unref()`, and instead just iterating through the `unrefed_pubkeys` and calling `unref()` after we've passed the `aligned_total + PAGE_SIZE >= original_bytes` check above. The downside is an extra pass through the `unrefed_pubkeys` keys.

2. We have the utility methods `store.alive_bytes()` and `store.total_bytes()` for tracking the alive/dead sizes. These should be accurate as long as a full `clean()` runs before `shrink()`, which I think is true in `verify_snapshot_bank()`. Moreover, `clean()` needs to run before `shrink()` for the current `is_alive` check in shrink to be accurate anyway:

```rust
let is_alive = locked_entry.slot_list().iter().any(|(_slot, i)| {
    i.store_id == stored_account.store_id
        && i.offset == stored_account.account.offset
});
```

Maybe using these we could move the `aligned_total + PAGE_SIZE >= original_bytes` check earlier in the function, before the scan, and exit there if the check fails (which would also avoid the overhead of the additional scan through the storages). Something like:

```rust
if num_stores == 1
    && self.page_align(store.alive_bytes() as u64) + PAGE_SIZE >= store.total_bytes()
{
    return 0;
}
```
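Option 1 (deferring the unref until after the size check) might look roughly like this self-contained sketch; the types below are stand-ins for illustration, not the real `accounts_db` structures:

```rust
use std::collections::HashMap;

const PAGE_SIZE: u64 = 4096;

/// Stand-in for the accounts index refcounts (not the real AccountsIndex).
struct ToyIndex {
    refcounts: HashMap<String, u64>,
}

impl ToyIndex {
    fn unref(&mut self, pubkey: &str) {
        if let Some(c) = self.refcounts.get_mut(pubkey) {
            *c = c.saturating_sub(1);
        }
    }
}

/// Sketch of option 1: collect candidates during the scan, then only
/// unref once the shrink is known to be worthwhile, so no compensating
/// addref() is ever needed on the skip path.
fn shrink_if_productive(
    index: &mut ToyIndex,
    unrefed_pubkeys: &[String],
    num_stores: usize,
    aligned_total: u64,
    original_bytes: u64,
) -> bool {
    if num_stores == 1 && aligned_total + PAGE_SIZE >= original_bytes {
        return false; // skip: refcounts were never touched
    }
    // The extra pass over unrefed_pubkeys, the downside noted above.
    for pubkey in unrefed_pubkeys {
        index.unref(pubkey);
    }
    true
}

fn main() {
    let mut index = ToyIndex {
        refcounts: HashMap::from([("a".to_string(), 2)]),
    };
    let keys = vec!["a".to_string()];

    // Not worth shrinking (8000 + 4096 >= 8192): refcounts stay intact.
    assert!(!shrink_if_productive(&mut index, &keys, 1, 8000, 8192));
    assert_eq!(index.refcounts["a"], 2);

    // Worth shrinking: the deferred unref runs exactly once.
    assert!(shrink_if_productive(&mut index, &keys, 1, 4096, 16384));
    assert_eq!(index.refcounts["a"], 1);
    println!("ok");
}
```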
Right, for 1: I am thinking this is the not-common case, since we queued it up to shrink and we have other signals (like alive_bytes/written_bytes) that tell us shrinking is likely productive. So we should optimize for the case where we are actually doing the shrink.

Clean does run before this shrink. I was thinking those values may not be accurate, hence the location of this check, but I think you might be right; they might actually be somewhat accurate.
lgtm
Pull request has been modified.
force-pushed from 4951a62 to afa4ac4
force-pushed from df846bb to 7a43a8d
@carllin any other concerns on this?
Did one final pass, looks good!
(cherry picked from commit 14c52ab)

# Conflicts:
#	runtime/src/accounts_db.rs
…ackport #17405) (#17792)

* Skip shrink when it doesn't save anything (#17405)

(cherry picked from commit 14c52ab)

# Conflicts:
#	runtime/src/accounts_db.rs

* fix merge error

Co-authored-by: sakridge <[email protected]>
Co-authored-by: Jeff Washington (jwash) <[email protected]>
Problem

Startup shrink and clean operations are less efficient than they could be.

Summary of Changes

- Skip the shrink during `generate_index` when it doesn't save anything

Fixes #