EPIC: CacheKV Improvements #14377

Closed · 1 of 3 tasks
yihuang opened this issue Dec 21, 2022 · 6 comments

yihuang (Collaborator) commented Dec 21, 2022

Summary

cachekv is an important component in cosmos-sdk execution. We have identified several performance and thread-safety improvement opportunities.

Problem Definition

Work Breakdown

  • make cachekv store thread-safe again #14376
  • duplicated caches in nested cache stores
    In the nested case, each layer of cache store caches the kv-pairs it reads.
    Solution: only cache clean pairs in the lowest layer?
  • Unnecessary sorting in the Write method in nested cases.
    The Write method always collects the keys and sorts them before writing them into the parent store one by one, but if the parent store is also a cache store, this sorting effort is wasted.
    Solution: support a batch Write method in the store interface and let the underlying store decide whether it wants to sort the change sets (a sketch follows this list).
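
As a rough illustration of the batch-write idea above (a sketch only: the BatchWriter interface, the WriteBatch method, and the stripped-down Store fields are hypothetical, not existing SDK APIs), the branch could hand its whole dirty set to the parent in one call and let only non-cache parents pay for the sort:

package cachekvsketch

import "sort"

// KVWriter is the minimal parent interface used by the existing write path.
type KVWriter interface {
	Set(key, value []byte)
	Delete(key []byte)
}

// BatchWriter is the hypothetical extension: a parent that accepts the whole
// change set at once and decides for itself whether sorting is needed.
type BatchWriter interface {
	WriteBatch(changes map[string][]byte) // nil value means deletion
}

// Store is a stripped-down stand-in for the cachekv store: dirty holds the
// pending writes, parent is the next layer down.
type Store struct {
	dirty  map[string][]byte
	parent KVWriter
}

// Write flushes the dirty set to the parent. If the parent is another cache
// layer implementing BatchWriter, the sort is skipped entirely.
func (s *Store) Write() {
	if bw, ok := s.parent.(BatchWriter); ok {
		bw.WriteBatch(s.dirty)
		return
	}
	// Fallback: deterministic order for non-cache parents, as today.
	keys := make([]string, 0, len(s.dirty))
	for k := range s.dirty {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if v := s.dirty[k]; v == nil {
			s.parent.Delete([]byte(k))
		} else {
			s.parent.Set([]byte(k), v)
		}
	}
}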
alexanderbez changed the title from "epic: cachekv improvement ideas" to "EPIC: CacheKV Improvements" on Dec 21, 2022
yihuang (Collaborator, Author) commented Dec 27, 2022

I have an idea for the other two items, which is just to copy the cache store itself.

Context

The current implementation of nested cache stores has O(N) complexity in the nesting depth for most operations: Get/Iterate need to recursively traverse the nested stack, and on commit, Write is called on each layer, which sorts the keys and writes them one by one into the layer above.
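
To make the depth dependence concrete, here is a simplified sketch (not the actual SDK code) of how a read falls through nested layers today; a key missing from every cache costs one lookup per layer:

// Simplified sketch: a miss in each cache layer recurses into its parent,
// so lookup cost grows with nesting depth.
type layer struct {
	cache  map[string][]byte
	parent *layer // nil for the root (committed) store
}

func (l *layer) Get(key string) []byte {
	if v, ok := l.cache[key]; ok {
		return v // hit in this layer
	}
	if l.parent == nil {
		return nil // reached the root without finding the key
	}
	return l.parent.Get(key) // falls through: O(depth) in the worst case
}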

Nested performance matters because, first of all, there are at least two layers of nesting in tx execution (deliverState.ms plus anteCtx or runMsgCtx), and some msg logic creates further layers on top of that.

Another important use case for nested caches is better integration with the EVM, which supports message calls that can revert on exception. Currently ethermint supports this by adopting the journal log approach from go-ethereum itself, but that prevents a more natural integration with cosmos native functionalities in the form of precompiled contracts [1]. For this we need an efficient cache store with an arbitrary depth of nesting.

So it's important to make nested cache stores operate with O(1) complexity regardless of nesting depth.

Copy-On-Write BTree

We have already migrated the cache store to tidwall/btree and use its copy-on-write feature for safe iteration, but there is more potential in it.

First of all, let's do this [2] to unify the cache store into a single btree.

Then we can support a cheap Clone operation on the whole cache store:

// Clone the cache store. This is a copy-on-write operation and is very fast because
// it only performs a shallow copy.
func (store *Store) Clone() *Store {
	return &Store{
		cache:  store.cache.Copy(),
		parent: store.parent,
	}
}

So instead of doing:

msCache := ctx.ms.CacheMultiStore()
...
msCache.Get(key)  // recursively calls into each layer of the cache stack when not found in this cache.
...
msCache.Write()  // significant overhead

we can do:

cacheMS := ctx.ms.Clone()
...
cacheMS.Get(key) // it only calls the uncached parent when not found in cache.
...
ctx = ctx.WithMultiStore(cacheMS) // zero overhead

Consequences

Positives

  • Performance improvement for Get/Iterate/Write operations on nested cache stores.
  • Clears a roadblock for using nested cache stores to integrate the EVM in ethermint.
  • Simpler cache store implementation.

Negatives

  • Slower Set operation on the cache store, due to the difference between a btree and a Go map, but this can be considered compensated by the improvement in the Write operation.
  • Breaking behavior: previously, after branching out, the original store could also be written to, and when the branch wrote back, the two were merged. In the new model the original store cannot be used for writing; the branch simply replaces the original one on commit (see the sketch below).
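
To spell out that behavioral difference, here is a self-contained toy example (the cacheStore type below is illustrative, not the real cachekv store): under today's merge semantics a late write on the original survives the branch's Write, while under the proposed replace semantics it is silently dropped once the clone takes over.

package main

import "fmt"

// Toy cache layer over a plain map, just to contrast the two semantics.
type cacheStore struct {
	cache  map[string]string
	parent map[string]string
}

func (s *cacheStore) Set(k, v string) { s.cache[k] = v }

// Write implements today's merge semantics: flush the branch into its parent.
func (s *cacheStore) Write() {
	for k, v := range s.cache {
		s.parent[k] = v
	}
}

// Clone implements the proposed replace semantics: a copy that is meant to
// supersede the original rather than merge back into it.
func (s *cacheStore) Clone() *cacheStore {
	c := make(map[string]string, len(s.cache))
	for k, v := range s.cache {
		c[k] = v
	}
	return &cacheStore{cache: c, parent: s.parent}
}

func main() {
	// Old model: a write on the original after branching survives the merge.
	orig := &cacheStore{cache: map[string]string{}, parent: map[string]string{}}
	branch := &cacheStore{cache: map[string]string{}, parent: orig.cache}
	orig.Set("a", "1")
	branch.Set("b", "2")
	branch.Write()
	fmt.Println(orig.cache) // map[a:1 b:2] — both writes kept

	// New model: the clone replaces the original, so the late write is dropped.
	orig2 := &cacheStore{cache: map[string]string{}, parent: map[string]string{}}
	clone := orig2.Clone()
	orig2.Set("c", "3") // nobody reads orig2 again
	clone.Set("d", "4")
	fmt.Println(clone.cache) // map[d:4] — "c" is lost
}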

Footnotes

  [1] https://github.com/evmos/ethermint/pull/1131

  [2] https://github.com/cosmos/cosmos-sdk/pull/14350

alexanderbez (Contributor) commented:

@yihuang I love this idea but I have concerns about:

Breaking behavior: previously, after branching out, the original store could also be written to, and when the branch wrote back, the two were merged. In the new model the original store cannot be used for writing; the branch simply replaces the original one on commit.

The fact that we cannot write on the parent seems like a design flaw to me. I think if we can devise a way in which we can still write to the original root/parent store, then this approach would be sound. WDYT?

yihuang (Collaborator, Author) commented Dec 28, 2022

The fact that we cannot write on the parent seems like a design flaw to me. I think if we can devise a way in which we can still write to the original root/parent store, then this approach would be sound. WDYT?

It seems almost all the use cases for branching out a cache store don't use the original one simultaneously.
Maybe we can support both APIs at the same time, implementing this as a new one?

yihuang (Collaborator, Author) commented Dec 28, 2022

To put it another way, we could support something like this:

snapshot = ctx.CacheMultiStore().Snapshot()
...
if failure {
  ctx.CacheMultiStore().Restore(snapshot)
}
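
One possible shape for that, as a sketch building on the copy-on-write Clone idea above (the BTree stand-in, the field layout, and the Snapshot/Restore names are assumptions, not an existing API):

// BTree stands in for the copy-on-write btree backing the cache.
type BTree interface {
	Copy() BTree // cheap shallow copy, thanks to copy-on-write
	Set(key, value []byte)
	Get(key []byte) ([]byte, bool)
}

// KVStore is the minimal uncached parent interface needed here.
type KVStore interface {
	Get(key []byte) []byte
}

// Store stands in for the cachekv store from the Clone sketch above.
type Store struct {
	cache  BTree
	parent KVStore
}

// Snapshot captures the current cache contents; it is cheap because Copy is
// copy-on-write in the underlying btree.
func (store *Store) Snapshot() BTree {
	return store.cache.Copy()
}

// Restore rolls the cache back to a previously captured snapshot, discarding
// every write made since.
func (store *Store) Restore(snapshot BTree) {
	store.cache = snapshot
}

With that, the failure path in the snippet above only swaps a pointer back; there is no key collection, sorting, or write-back.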

alexanderbez (Contributor) commented:

Seems reasonable to me :)

yihuang added a commit to yihuang/cosmos-sdk that referenced this issue Jan 20, 2023
Closes: cosmos#14377

Solution:
- Unify cachekv's cache content with a single tidwall/btree.
- Use the copy-on-write support of tidwall/btree to implement cheap `Clone`/`Restore` methods in the cache store.
- Add a `RunAtomic` method to Context to provide more efficient store branching;
  it doesn't notify the tracing module because it no longer has the `Write` step.
- API breaking: Context always holds a `CacheMultiStore` instead of a `MultiStore`.
- Refactor runTx and other module logic to use the new `RunAtomic` method instead of `CacheContext`.
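
The `RunAtomic` mentioned in the commit above could roughly take the following shape, shown here only as a hypothetical sketch (the Snapshotter interface, the Context stand-in, and the exact signature are assumptions, not the merged API):

// Snapshot is an opaque handle; in the sketches above it is a copy-on-write
// btree copy.
type Snapshot any

// Snapshotter is whatever the cache multistore would expose for rollback.
type Snapshotter interface {
	Snapshot() Snapshot
	Restore(Snapshot)
}

// Context is a stand-in holding the cache multistore.
type Context struct {
	ms Snapshotter
}

// RunAtomic runs fn against the live store and rolls it back on error; on
// success the writes simply stay in place, so there is no Write/merge step
// and nothing to sort.
func (ctx Context) RunAtomic(fn func(Context) error) error {
	snap := ctx.ms.Snapshot()
	if err := fn(ctx); err != nil {
		ctx.ms.Restore(snap)
		return err
	}
	return nil
}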
itsdevbear commented:

To put it another way, we could support something like this:

snapshot = ctx.CacheMultiStore().Snapshot()
...
if failure {
  ctx.CacheMultiStore().Restore(snapshot)
}

We're rocking this in Polaris: https://github.com/berachain/polaris/blob/main/store/snapmulti/store.go
