perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast, avoid keys sort (backport #10486) (#10848)

* perf: store/cachekv: avoid a map lookup if unnecessary, clear maps fast (#10486)

We can shave off some milliseconds, but also cut down some Megabytes of
RAM consumed by only requesting from the cache if needed, but also using
the map clearing idiom which is recognized by the compiler to make fast
code.

Noticed in profiles from Tharsis' Ethermint per https://github.com/tharsis/ethermint/issues/710

- Before
* Memory profiles
```shell
   19.50MB    19.50MB    134:	store.cache = make(map[string]*cValue)
   18.50MB    18.50MB    135:	store.deleted = make(map[string]struct{})
   15.50MB    15.50MB    136:	store.unsortedCache = make(map[string]struct{})
```

* CPU profiles
```go
         .          .    118:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    119:	// at least happen atomically.
     150ms      150ms    120:	for _, key := range keys {
     220ms      3.64s    121:		cacheValue := store.cache[key]
         .          .    122:
         .          .    123:		switch {
         .      250ms    124:		case store.isDeleted(key):
         .          .    125:			store.parent.Delete([]byte(key))
     210ms      210ms    126:		case cacheValue.value == nil:
         .          .    127:			// Skip, it already doesn't exist in parent.
         .          .    128:		default:
     240ms     27.94s    129:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    130:		}
         .          .    131:	}

...

      10ms       60ms    134:	store.cache = make(map[string]*cValue)
         .       40ms    135:	store.deleted = make(map[string]struct{})
         .       50ms    136:	store.unsortedCache = make(map[string]struct{})
         .      110ms    137:	store.sortedCache = dbm.NewMemDB()
```

- After
* Memory profiles
```shell
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .          .    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .          .    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .          .    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
```

* CPU profiles
```shell
         .          .    111:	// TODO: Consider allowing usage of Batch, which would allow the write to
         .          .    112:	// at least happen atomically.
     110ms      110ms    113:	for _, key := range keys {
         .      210ms    114:		if store.isDeleted(key) {
         .          .    115:			// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
         .          .    116:			// be sure if the underlying store might do a save with the byteslice or
         .          .    117:			// not. Once we get confirmation that .Delete is guaranteed not to
         .          .    118:			// save the byteslice, then we can assume only a read-only copy is sufficient.
         .          .    119:			store.parent.Delete([]byte(key))
         .          .    120:			continue
         .          .    121:		}
         .          .    122:
      50ms      2.45s    123:		cacheValue := store.cache[key]
     910ms      920ms    124:		if cacheValue.value != nil {
         .          .    125:			// It already exists in the parent, hence delete it.
     120ms     29.56s    126:			store.parent.Set([]byte(key), cacheValue.value)
         .          .    127:		}
         .          .    128:	}
         .          .    129:
         .          .    130:	// Clear the cache using the map clearing idiom
         .          .    131:	// and not allocating fresh objects.
         .          .    132:	// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
         .      210ms    133:	for key := range store.cache {
         .          .    134:		delete(store.cache, key)
         .          .    135:	}
         .       10ms    136:	for key := range store.deleted {
         .          .    137:		delete(store.deleted, key)
         .          .    138:	}
         .      170ms    139:	for key := range store.unsortedCache {
         .          .    140:		delete(store.unsortedCache, key)
         .          .    141:	}
         .      260ms    142:	store.sortedCache = dbm.NewMemDB()
         .       10ms    143:}

```

Fixes #10487
Updates https://github.com/tharsis/ethermint/issues/710

(cherry picked from commit 5399e72f32)

# Conflicts:
#	CHANGELOG.md

* fix conflicts

Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Co-authored-by: Aleksandr Bezobchuk <aleks.bezobchuk@gmail.com>
This commit is contained in:
mergify[bot] 2022-01-03 14:40:15 +01:00 committed by GitHub
parent 7115d3b2c8
commit 14cd943a2e
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 27 additions and 11 deletions

View File

@ -51,6 +51,10 @@ Ref: https://keepachangelog.com/en/1.0.0/
### Improvements
* [\#10486](https://github.com/cosmos/cosmos-sdk/pull/10486) store/cachekv's `Store.Write` conservatively
looks up keys, but also uses the [map clearing idiom](https://bencher.orijtech.com/perfclinic/mapclearing/)
to reduce the RAM usage, CPU time usage, and garbage collection pressure from clearing maps,
instead of allocating new maps.
* (store) [\#10741](https://github.com/cosmos/cosmos-sdk/pull/10741) Significantly speedup iterator creation after delete heavy workloads. Significantly improves IBC migration times.
* (types) [\#10076](https://github.com/cosmos/cosmos-sdk/pull/10076) Significantly speedup and lower allocations for `Coins.String()`.

View File

@ -118,22 +118,34 @@ func (store *Store) Write() {
// TODO: Consider allowing usage of Batch, which would allow the write to
// at least happen atomically.
for _, key := range keys {
cacheValue := store.cache[key]
switch {
case store.isDeleted(key):
if store.isDeleted(key) {
// We use []byte(key) instead of conv.UnsafeStrToBytes because we cannot
// be sure if the underlying store might do a save with the byteslice or
// not. Once we get confirmation that .Delete is guaranteed not to
// save the byteslice, then we can assume only a read-only copy is sufficient.
store.parent.Delete([]byte(key))
case cacheValue.value == nil:
// Skip, it already doesn't exist in parent.
default:
continue
}
cacheValue := store.cache[key]
if cacheValue.value != nil {
// It already exists in the parent, hence delete it.
store.parent.Set([]byte(key), cacheValue.value)
}
}
// Clear the cache
store.cache = make(map[string]*cValue)
store.deleted = make(map[string]struct{})
store.unsortedCache = make(map[string]struct{})
// Clear the cache using the map clearing idiom
// and not allocating fresh objects.
// Please see https://bencher.orijtech.com/perfclinic/mapclearing/
for key := range store.cache {
delete(store.cache, key)
}
for key := range store.deleted {
delete(store.deleted, key)
}
for key := range store.unsortedCache {
delete(store.unsortedCache, key)
}
store.sortedCache = dbm.NewMemDB()
}