This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

set hash bins to 65k #17912

Merged
merged 1 commit into from
Jun 16, 2021

Conversation

jeffwashington
Contributor

@jeffwashington jeffwashington commented Jun 12, 2021

Problem

More bins break the accounts to be scanned into smaller groups, allowing higher parallelism.
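As a rough illustration of the idea, the sketch below groups 32-byte keys into a power-of-two number of bins using the first two bytes of each key. The names `bin_from_pubkey` and `group_into_bins` are hypothetical and are not taken from this PR; the two-byte prefix is an assumption chosen because it caps the bin count at 65,536.

```rust
/// Hypothetical helper: map a 32-byte key to one of `bins` bins using
/// the top bits of its first two bytes.
/// Assumption: `bins` is a power of two no larger than 65,536.
fn bin_from_pubkey(pubkey: &[u8; 32], bins: usize) -> usize {
    assert!(bins.is_power_of_two() && bins <= 65_536);
    // Interpret the first two bytes as a big-endian 16-bit prefix.
    let prefix = u16::from_be_bytes([pubkey[0], pubkey[1]]) as usize;
    // Keep only the top log2(bins) bits of that prefix.
    let bits = bins.trailing_zeros();
    prefix >> (16 - bits)
}

/// Hypothetical helper: split keys into `bins` groups. Each group can then
/// be sorted and hashed independently, for example on separate threads.
fn group_into_bins(pubkeys: &[[u8; 32]], bins: usize) -> Vec<Vec<[u8; 32]>> {
    let mut groups = vec![Vec::new(); bins];
    for key in pubkeys {
        groups[bin_from_pubkey(key, bins)].push(*key);
    }
    groups
}
```

With 256 bins, two keys that share a first byte land in the same group; with 65,536 bins they are separated as soon as their second bytes differ, so each group is smaller and there are more independent units of work.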

Summary of Changes

65k is our current structural limit and is probably a good stopping point for the moment.
Fixes #

@codecov

codecov bot commented Jun 12, 2021

Codecov Report

Merging #17912 (5aaba21) into master (361c1bd) will increase coverage by 0.0%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #17912   +/-   ##
=======================================
  Coverage    82.6%    82.6%           
=======================================
  Files         431      431           
  Lines      121184   121179    -5     
=======================================
+ Hits       100158   100174   +16     
+ Misses      21026    21005   -21     

@jeffwashington
Contributor Author

jeffwashington commented Jun 15, 2021

colo, mnb snapshot.
Total improves slightly with the current algorithm. The algorithm is about to improve. Data below.

65k:
calculate_accounts_hash_without_index accounts_scan=1189663i eliminate_zeros=217116i hash=71724i hash_time_pre_us=71509i sort=202693i hash_total=70524929i flatten=1283538i storage_sort_us=1393i unreduced_entries=71426722i collect_snapshots_us=51704i num_snapshot_storage=345908i total=3017831i
calculate_accounts_hash_without_index accounts_scan=1073849i eliminate_zeros=198102i hash=67582i hash_time_pre_us=67410i sort=762645i hash_total=70524929i flatten=1155190i storage_sort_us=1532i unreduced_entries=71426722i collect_snapshots_us=72095i num_snapshot_storage=345908i total=3330995i
normal:
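To compare individual stages between runs, the datapoint lines can be parsed into key/value pairs. This is a minimal sketch, assuming each field has the form `name=<integer>i` as in the lines above; `parse_datapoint` is a hypothetical helper, not part of the codebase.

```rust
use std::collections::HashMap;

/// Hypothetical helper: parse one logged datapoint line into a map from
/// field name to integer value. Tokens without `=` (such as the leading
/// datapoint name) and values that are not `<integer>i` are skipped.
fn parse_datapoint(line: &str) -> HashMap<String, i64> {
    line.split_whitespace()
        .filter_map(|field| {
            let (key, value) = field.split_once('=')?;
            // Drop the trailing `i` that marks an integer field.
            let value = value.strip_suffix('i')?.parse::<i64>().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}
```

Parsing both lines and subtracting matching fields (for example `sort` or `total`) shows which stages account for the difference between the two runs.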

@jeffwashington jeffwashington requested a review from sakridge June 15, 2021 04:59
@jeffwashington jeffwashington marked this pull request as ready for review June 15, 2021 04:59
@jeffwashington
Contributor Author

jeffwashington commented Jun 15, 2021

Some raw data. The upshot: more bins plus a better algorithm give better performance.
lemond, 147M accounts, 200k slot subset of kin simulation accounts

current algorithm (us)
9169758 - 256 bins
7081799 - 65k bins

refined algorithm with no flatten (us)
6399554 - 256 bins
5093599 - 65k bins
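The ratios implied by these timings can be checked with a small helper. This is only a sketch; `speedup` is a hypothetical name, and both inputs are assumed to be durations in microseconds.

```rust
/// Hypothetical helper: how many times faster `after_us` is than `before_us`.
fn speedup(before_us: u64, after_us: u64) -> f64 {
    before_us as f64 / after_us as f64
}
```

Applied to the numbers above, going from 256 to 65k bins gives roughly 1.29x with the current algorithm and roughly 1.26x with the refined one, while the refined algorithm at 65k bins is roughly 1.80x faster than the current algorithm at 256 bins.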

@jeffwashington jeffwashington merged commit 55ee3b5 into solana-labs:master Jun 16, 2021
mergify bot pushed a commit that referenced this pull request Jun 16, 2021
(cherry picked from commit 55ee3b5)
mergify bot added a commit that referenced this pull request Jun 16, 2021
(cherry picked from commit 55ee3b5)

Co-authored-by: Jeff Washington (jwash) <[email protected]>
@brooksprumo brooksprumo mentioned this pull request Aug 23, 2021

2 participants