Core/snapshot: Dynamic state snapshots #372

Closed
wants to merge 87 commits into from

@endale98 commented Jul 17, 2023

This PR creates a secondary data structure for storing the Ethereum state, called a snapshot. This snapshot is special as it dynamically follows the chain and can also handle small-ish reorgs:

  • At the very bottom, the snapshot consists of a disk layer, which is essentially a semi-recent full flat dump of the account and storage contents. This is stored in LevelDB as a <hash> -> <account> mapping for the account trie and <account-hash><slot-hash> -> <slot-value> mapping for the storage tries. The layout permits fast iteration over the accounts and storage, which will be used for a new sync algorithm.
  • Above the disk layer there is a tree of in-memory diff layers that each represent one block's worth of state mutations. Every time a new block is processed, it is linked on top of the existing diff tree, and the bottom layers are flattened together to keep the maximum tree depth reasonable. At the very bottom, the first diff layer acts as an accumulator which only gets flattened into the disk layer when it outgrows its memory allowance. This is done mostly to avoid thrashing LevelDB.
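The layering described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the names `layer`, `diskLayer`, `diffLayer`, `flatten` and `buildExample` are hypothetical and not taken from the PR, a plain map stands in for LevelDB, string keys stand in for hashes, and only accounts (not storage slots) are modelled.

```go
package main

import "fmt"

// layer is a hypothetical read interface shared by the disk and diff layers.
type layer interface {
	account(hash string) ([]byte, bool)
}

// diskLayer stands in for the flat on-disk dump; a map replaces LevelDB here.
type diskLayer struct {
	accounts map[string][]byte
}

func (d *diskLayer) account(hash string) ([]byte, bool) {
	v, ok := d.accounts[hash]
	return v, ok
}

// diffLayer holds one block's worth of account mutations on top of a parent.
// A nil value marks an account deleted in that block (an assumption of this sketch).
type diffLayer struct {
	parent   layer
	accounts map[string][]byte
}

// account answers from this layer if the block touched the account,
// otherwise it falls through to the parent layer.
func (d *diffLayer) account(hash string) ([]byte, bool) {
	if v, ok := d.accounts[hash]; ok {
		return v, v != nil
	}
	return d.parent.account(hash)
}

// flatten merges the bottom-most diff layer into the disk layer.
// Relinking the children of d onto the disk layer is omitted for brevity.
func flatten(d *diffLayer, disk *diskLayer) {
	for h, v := range d.accounts {
		if v == nil {
			delete(disk.accounts, h)
		} else {
			disk.accounts[h] = v
		}
	}
}

// buildExample creates a disk layer with two diff layers stacked on top.
func buildExample() (*diskLayer, *diffLayer, *diffLayer) {
	disk := &diskLayer{accounts: map[string][]byte{"a": {1}, "b": {2}}}
	block1 := &diffLayer{parent: disk, accounts: map[string][]byte{"a": {3}, "b": nil}}
	block2 := &diffLayer{parent: block1, accounts: map[string][]byte{"c": {4}}}
	return disk, block1, block2
}

func main() {
	disk, block1, block2 := buildExample()
	v, ok := block2.account("a") // resolved by block1, one layer down
	fmt.Println(v, ok)
	_, ok = block2.account("b") // deleted in block1, so not found
	fmt.Println(ok)
	flatten(block1, disk) // the disk layer now reflects block1
	v, ok = disk.account("a")
	fmt.Println(v, ok)
}
```

A reorg in this model simply means linking a new diff layer onto an older parent; nothing below the fork point needs to be rewritten.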

The snapshot can be built fully online, during the live operation of a Geth node. This is harder than it seems because rebuilding the snapshot for mainnet takes 9 hours, and over that time in-memory garbage collection will long since have deleted the state needed for a single capture.

  • The PR achieves this by gradually iterating the state tries and maintaining a marker for the account/storage slot position up to which the snapshot has already been generated. Every time a new block is executed, state mutations prior to the marker get applied directly (the ones afterwards get discarded) and the snapshot builder switches to iterating the new root hash.
  • To handle reorgs, the builder operates on HEAD-128 and is capable of suspending/resuming if a state is missing (a restart will only write out some tries, not all cached in memory).
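The marker logic can be illustrated with a small sketch. It rests on several assumptions that are not in the PR text: the type `generator` and its methods `covered`, `generate` and `applyBlock` are hypothetical names, a map stands in for both the state trie and the disk layer, the marker is treated as inclusive, and suspend/resume and the HEAD-128 offset are left out.

```go
package main

import (
	"bytes"
	"fmt"
	"sort"
)

// generator is a hypothetical stand-in for the background snapshot builder.
type generator struct {
	marker []byte            // last key already written; nil means nothing generated yet
	disk   map[string][]byte // stands in for the on-disk snapshot
}

// covered reports whether key lies in the already generated range.
func (g *generator) covered(key []byte) bool {
	return g.marker != nil && bytes.Compare(key, g.marker) <= 0
}

// generate copies up to n not-yet-covered entries of state into the snapshot,
// in key order, advancing the marker as it goes.
func (g *generator) generate(state map[string][]byte, n int) {
	keys := make([]string, 0, len(state))
	for k := range state {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if n == 0 {
			return
		}
		if g.covered([]byte(k)) {
			continue
		}
		g.disk[k] = state[k]
		g.marker = []byte(k)
		n--
	}
}

// applyBlock applies a block's mutations to the covered range only.
// Keys past the marker are dropped: the generator picks up their new
// values later, when it iterates the new state.
func (g *generator) applyBlock(mutations map[string][]byte) (applied, discarded int) {
	for k, v := range mutations {
		if !g.covered([]byte(k)) {
			discarded++
			continue
		}
		if v == nil {
			delete(g.disk, k)
		} else {
			g.disk[k] = v
		}
		applied++
	}
	return applied, discarded
}

func main() {
	g := &generator{disk: map[string][]byte{}}
	state := map[string][]byte{"a": {1}, "b": {2}, "c": {3}, "d": {4}}
	g.generate(state, 2) // snapshot now covers "a" and "b"

	// A new block changes "a" (covered) and "d" (not yet covered).
	applied, discarded := g.applyBlock(map[string][]byte{"a": {9}, "d": {7}})
	state["a"], state["d"] = []byte{9}, []byte{7}

	g.generate(state, len(state)) // continue from the marker on the new state
	fmt.Println(applied, discarded, len(g.disk))
}
```

Once generation finishes, the snapshot in this sketch matches the newest state even though it was never captured at a single root.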

The benefit of the snapshot is that it acts as an acceleration structure for state accesses:

  • Instead of doing O(log N) disk reads (+leveldb overhead) to access an account / storage slot, the snapshot can provide direct, O(1) access time. This should be a small improvement in block processing and a huge improvement in eth_call evaluations.
  • The snapshot supports account and storage iteration at O(1) complexity per entry + sequential disk access, which should enable remote nodes to retrieve state data far more cheaply than before (the sort order is the state trie leaf order, so responses can directly be assembled into tries too).
  • The presence of the snapshot can also enable more exotic use cases such as deleting and rebuilding the entire state trie (guerrilla pruning) as well as building alternative state tries (e.g. binary vs. hexary), which might be needed in the future.
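To make the access pattern concrete, here is a minimal sketch of a flat, sorted key space with direct lookups and prefix iteration. Everything in it is an assumption for illustration: `flatStore`, `newFlatStore`, `storageKey`, `get` and `values` are hypothetical names, an in-memory sorted slice stands in for LevelDB's ordered keys, and two-character strings stand in for fixed-width hashes.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// flatStore is a hypothetical stand-in for an ordered key-value store.
type flatStore struct {
	keys []string          // kept sorted, as an ordered database would
	vals map[string]string // direct key to value lookup
}

func newFlatStore(entries map[string]string) *flatStore {
	s := &flatStore{vals: entries}
	for k := range entries {
		s.keys = append(s.keys, k)
	}
	sort.Strings(s.keys)
	return s
}

// storageKey concatenates the account hash and the slot hash. Hashes are
// assumed to be fixed width, so an account hash is an unambiguous prefix.
func storageKey(accountHash, slotHash string) string {
	return accountHash + slotHash
}

// get is a single direct lookup, with no trie traversal.
func (s *flatStore) get(key string) (string, bool) {
	v, ok := s.vals[key]
	return v, ok
}

// values returns all values under prefix in key order: one seek to the
// first matching key, then a sequential walk.
func (s *flatStore) values(prefix string) []string {
	var out []string
	i := sort.SearchStrings(s.keys, prefix)
	for ; i < len(s.keys) && strings.HasPrefix(s.keys[i], prefix); i++ {
		out = append(out, s.vals[s.keys[i]])
	}
	return out
}

func main() {
	store := newFlatStore(map[string]string{
		storageKey("aa", "01"): "x",
		storageKey("aa", "02"): "y",
		storageKey("bb", "01"): "z",
	})
	fmt.Println(store.get(storageKey("bb", "01")))
	fmt.Println(store.values("aa")) // all slots of account "aa", in slot-hash order
}
```

Because keys sort by hash, the iteration order in this sketch is the same as trie leaf order, which is the property the description relies on for assembling responses into tries.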

trinhdn97 and others added 30 commits June 16, 2023 14:30
trinhdn2 and others added 28 commits July 31, 2023 17:46
Unify multiple keccak interface into one inside crypto package
Move statedb.Account struct to types.StateAccount
@tungng98 tungng98 deleted the branch BuildOnViction:upgrade-core-develop December 10, 2023 16:51
@tungng98 tungng98 closed this Dec 10, 2023
6 participants