EPIC: CacheKV Improvements #14377

Closed · 1 of 3 tasks
yihuang opened this issue Dec 21, 2022 · 6 comments

yihuang (Collaborator) commented Dec 21, 2022

Summary

cachekv is an important component in cosmos-sdk execution. We have identified several performance and thread-safety improvement opportunities.

Problem Definition

Work Breakdown

  • make cachekv store thread-safe again #14376
  • duplicated caches in nested cache stores
    In the nested case, each layer of cache store caches the kv-pairs it reads.
    Solution: only cache clean pairs in the lowest layer?
  • Unnecessary sorting in the Write method in nested cases.
    The Write method always collects the keys and sorts them before writing them into the parent store one by one, but if the parent store is also a cache store, this sorting effort is wasted.
    Solution: support a batch Write method in the store interface and let the underlying store decide whether it wants to sort the change sets (a sketch follows this list).
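
As a rough illustration of the batch-write idea above (a sketch only: the BatchWriter interface, the WriteBatch method, and the stripped-down Store fields are hypothetical, not existing SDK APIs), the branch could hand its whole dirty set to the parent in one call and let only non-cache parents pay for the sort:

package cachekvsketch

import "sort"

// KVWriter is the minimal parent interface used by the existing write path.
type KVWriter interface {
	Set(key, value []byte)
	Delete(key []byte)
}

// BatchWriter is the hypothetical extension: a parent that accepts the whole
// change set at once and decides for itself whether sorting is needed.
type BatchWriter interface {
	WriteBatch(changes map[string][]byte) // nil value means deletion
}

// Store is a stripped-down stand-in for the cachekv store: dirty holds the
// pending writes, parent is the next layer down.
type Store struct {
	dirty  map[string][]byte
	parent KVWriter
}

// Write flushes the dirty set to the parent. If the parent is another cache
// layer implementing BatchWriter, the sort is skipped entirely.
func (s *Store) Write() {
	if bw, ok := s.parent.(BatchWriter); ok {
		bw.WriteBatch(s.dirty)
		return
	}
	// Fallback: deterministic order for non-cache parents, as today.
	keys := make([]string, 0, len(s.dirty))
	for k := range s.dirty {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		if v := s.dirty[k]; v == nil {
			s.parent.Delete([]byte(k))
		} else {
			s.parent.Set([]byte(k), v)
		}
	}
}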
alexanderbez changed the title from "epic: cachekv improvement ideas" to "EPIC: CacheKV Improvements" on Dec 21, 2022
yihuang (Collaborator, Author) commented Dec 27, 2022

I have an idea for the other two items, which is just to copy the cache store itself.

Context

The current implementation of nested cache stores has O(N) complexity in the nesting depth for most operations: Get/Iterate need to recursively traverse the nested stack, and on commit, Write is called on each layer, which sorts the keys and writes them one by one into the layer above.
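
To make the depth dependence concrete, here is a simplified sketch (not the actual SDK code) of how a read falls through nested layers today; a key missing from every cache costs one lookup per layer:

// Simplified sketch: a miss in each cache layer recurses into its parent,
// so lookup cost grows with nesting depth.
type layer struct {
	cache  map[string][]byte
	parent *layer // nil for the root (committed) store
}

func (l *layer) Get(key string) []byte {
	if v, ok := l.cache[key]; ok {
		return v // hit in this layer
	}
	if l.parent == nil {
		return nil // reached the root without finding the key
	}
	return l.parent.Get(key) // falls through: O(depth) in the worst case
}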

Nested performance matters because, first of all, there are at least two layers of nesting in tx execution (deliverState.ms plus anteCtx or runMsgCtx), and some msg logic creates further layers on top of that.

Another important use case for nested caches is better integration with the EVM, which supports message calls that can revert on exception. Currently ethermint supports this by adopting the journal log approach from go-ethereum itself, but that prevents a more natural integration with cosmos native functionalities in the form of precompiled contracts [1]. For this we need an efficient cache store with an arbitrary depth of nesting.

So it's important to make nested cache stores operate with O(1) complexity regardless of nesting depth.

Copy-On-Write BTree

We have already migrated the cache store to tidwall/btree and use its copy-on-write feature for safe iteration, but there is more potential in it.

First of all, let's do this [2] to unify the cache store into a single btree.

Then we can support a cheap Clone operation on the whole cache store:

// Clone the cache store. This is a copy-on-write operation and is very fast because
// it only performs a shallow copy.
func (store *Store) Clone() *Store {
	return &Store{
		cache:  store.cache.Copy(),
		parent: store.parent,
	}
}

So instead of doing:

msCache := ctx.ms.CacheMultiStore()
...
msCache.Get(key)  // recursively calls into each layer of the cache stack when not found in this cache.
...
msCache.Write()  // significant overhead

we can do:

cacheMS := ctx.ms.Clone()
...
cacheMS.Get(key) // it only calls the uncached parent when not found in cache.
...
ctx = ctx.WithMultiStore(cacheMS) // zero overhead

Consequences

Positives

  • Performance improvement for Get/Iterate/Write operations on nested cache stores.
  • Clears a roadblock for using nested cache stores to integrate the EVM in ethermint.
  • Simpler cache store implementation.

Negatives

  • Slower Set operation on the cache store, due to the difference between a btree and a Go map, but this can be considered compensated by the improvement in the Write operation.
  • Breaking behavior: previously, after branching out, the original store could also be written to, and when the branch wrote back, the two were merged. In the new model the original store cannot be used for writing; the branch simply replaces the original one on commit (see the sketch below).
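
To spell out that behavioral difference, here is a self-contained toy example (the cacheStore type below is illustrative, not the real cachekv store): under today's merge semantics a late write on the original survives the branch's Write, while under the proposed replace semantics it is silently dropped once the clone takes over.

package main

import "fmt"

// Toy cache layer over a plain map, just to contrast the two semantics.
type cacheStore struct {
	cache  map[string]string
	parent map[string]string
}

func (s *cacheStore) Set(k, v string) { s.cache[k] = v }

// Write implements today's merge semantics: flush the branch into its parent.
func (s *cacheStore) Write() {
	for k, v := range s.cache {
		s.parent[k] = v
	}
}

// Clone implements the proposed replace semantics: a copy that is meant to
// supersede the original rather than merge back into it.
func (s *cacheStore) Clone() *cacheStore {
	c := make(map[string]string, len(s.cache))
	for k, v := range s.cache {
		c[k] = v
	}
	return &cacheStore{cache: c, parent: s.parent}
}

func main() {
	// Old model: a write on the original after branching survives the merge.
	orig := &cacheStore{cache: map[string]string{}, parent: map[string]string{}}
	branch := &cacheStore{cache: map[string]string{}, parent: orig.cache}
	orig.Set("a", "1")
	branch.Set("b", "2")
	branch.Write()
	fmt.Println(orig.cache) // map[a:1 b:2] — both writes kept

	// New model: the clone replaces the original, so the late write is dropped.
	orig2 := &cacheStore{cache: map[string]string{}, parent: map[string]string{}}
	clone := orig2.Clone()
	orig2.Set("c", "3") // nobody reads orig2 again
	clone.Set("d", "4")
	fmt.Println(clone.cache) // map[d:4] — "c" is lost
}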

Footnotes

  [1] https://github.com/evmos/ethermint/pull/1131

  [2] https://github.com/cosmos/cosmos-sdk/pull/14350

alexanderbez (Contributor) commented:

@yihuang I love this idea but I have concerns about:

Breaking behavior: previously, after branching out, the original store could also be written to, and when the branch wrote back, the two were merged. In the new model the original store cannot be used for writing; the branch simply replaces the original one on commit.

The fact that we cannot write on the parent seems like a design flaw to me. I think if we can devise a way in which we can still write to the original root/parent store, then this approach would be sound. WDYT?

yihuang (Collaborator, Author) commented Dec 28, 2022

The fact that we cannot write on the parent seems like a design flaw to me. I think if we can devise a way in which we can still write to the original root/parent store, then this approach would be sound. WDYT?

It seems almost all the use cases for branching out a cache store don't use the original one simultaneously.
Maybe we can support both APIs at the same time, implementing this as a new one?

yihuang (Collaborator, Author) commented Dec 28, 2022

To put it another way, we could support something like this:

snapshot = ctx.CacheMultiStore().Snapshot()
...
if failure {
  ctx.CacheMultiStore().Restore(snapshot)
}
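
One possible shape for that, as a sketch building on the copy-on-write Clone idea above (the BTree stand-in, the field layout, and the Snapshot/Restore names are assumptions, not an existing API):

// BTree stands in for the copy-on-write btree backing the cache.
type BTree interface {
	Copy() BTree // cheap shallow copy, thanks to copy-on-write
	Set(key, value []byte)
	Get(key []byte) ([]byte, bool)
}

// KVStore is the minimal uncached parent interface needed here.
type KVStore interface {
	Get(key []byte) []byte
}

// Store stands in for the cachekv store from the Clone sketch above.
type Store struct {
	cache  BTree
	parent KVStore
}

// Snapshot captures the current cache contents; it is cheap because Copy is
// copy-on-write in the underlying btree.
func (store *Store) Snapshot() BTree {
	return store.cache.Copy()
}

// Restore rolls the cache back to a previously captured snapshot, discarding
// every write made since.
func (store *Store) Restore(snapshot BTree) {
	store.cache = snapshot
}

With that, the failure path in the snippet above only swaps a pointer back; there is no key collection, sorting, or write-back.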

alexanderbez (Contributor) commented:

Seems reasonable to me :)

yihuang added a commit to yihuang/cosmos-sdk that referenced this issue Jan 20, 2023
Closes: cosmos#14377

Solution:
- Unify cachekv's cache content with a single tidwall/btree.
- Use the copy-on-write support of tidwall/btree to implement cheap `Clone`/`Restore` methods in the cache store.
- Add a `RunAtomic` method to Context to provide more efficient store branching;
  it doesn't notify the tracing module because it no longer has the `Write` step.
- API breaking: Context always holds a `CacheMultiStore` instead of a `MultiStore`.
- Refactor runTx and other module logic to use the new `RunAtomic` method instead of `CacheContext`.
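
The `RunAtomic` mentioned in the commit above could roughly take the following shape, shown here only as a hypothetical sketch (the Snapshotter interface, the Context stand-in, and the exact signature are assumptions, not the merged API):

// Snapshot is an opaque handle; in the sketches above it is a copy-on-write
// btree copy.
type Snapshot any

// Snapshotter is whatever the cache multistore would expose for rollback.
type Snapshotter interface {
	Snapshot() Snapshot
	Restore(Snapshot)
}

// Context is a stand-in holding the cache multistore.
type Context struct {
	ms Snapshotter
}

// RunAtomic runs fn against the live store and rolls it back on error; on
// success the writes simply stay in place, so there is no Write/merge step
// and nothing to sort.
func (ctx Context) RunAtomic(fn func(Context) error) error {
	snap := ctx.ms.Snapshot()
	if err := fn(ctx); err != nil {
		ctx.ms.Restore(snap)
		return err
	}
	return nil
}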
itsdevbear commented:

To put it another way, we could support something like this:

snapshot = ctx.CacheMultiStore().Snapshot()
...
if failure {
  ctx.CacheMultiStore().Restore(snapshot)
}

We're rocking this in Polaris: https://github.com/berachain/polaris/blob/main/store/snapmulti/store.go
