change(doc): Document state format upgrade implementation and testing (#7799)

* Explain verification modes
* Explain block write modes
* Expand upgrade goals
* Document testing requirements
* Update current format docs
* Move the detailed checklist to the ticket template
* Move an example back
* Reword confusing paragraphs

Co-authored-by: Arya <aryasolhi@gmail.com>

* Explain the difference between major and minor/patch versions
* Simplify upgrade wording

Co-authored-by: Marek <mail@marek.onl>

---------

Co-authored-by: Arya <aryasolhi@gmail.com>
Co-authored-by: Marek <mail@marek.onl>
2023-10-26 08:18:41 +10:00 · 2023-10-26 08:18:41 +10:00 · a126acb160
parent b244d5628c
commit a126acb160
1 changed files with 119 additions and 24 deletions
--- a/book/src/dev/state-db-upgrades.md
+++ b/book/src/dev/state-db-upgrades.md
@@ -1,18 +1,64 @@
 # Zebra Cached State Database Implementation

+## Current Implementation
+
+### Verification Modes
+[verification]: #verification
+
+Zebra's state has two verification modes:
+- block hash checkpoints, and
+- full verification.
+
+This means that verification uses two different codepaths, and they must produce the same results.
+
+By default, Zebra uses as many checkpoints as it can, because they are more secure against rollbacks
+(and some other kinds of chain attacks). Then it uses full verification for the last few thousand
+blocks.
+
+When Zebra gets more checkpoints in each new release, it checks the previously verified cached
+state against those checkpoints. This checks that the two codepaths produce the same results.
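A minimal sketch of that consistency check follows. It assumes the cached state and the new checkpoint list can both be read as maps from height to block hash. `BlockHash`, `CheckpointMismatch`, and `check_against_checkpoints` are hypothetical names for illustration, not Zebra's checkpoint verifier API.

```rust
use std::collections::BTreeMap;

/// Hypothetical 32-byte block hash type, used only in this sketch.
pub type BlockHash = [u8; 32];

/// The first height where the cached state disagrees with a checkpoint.
#[derive(Debug, PartialEq, Eq)]
pub struct CheckpointMismatch {
    pub height: u32,
    pub cached: BlockHash,
    pub checkpoint: BlockHash,
}

/// Compares each checkpoint (in ascending height order) against the block hash
/// that was previously verified and stored in the cached state.
///
/// Checkpoint heights that are not in the cached state are skipped, because
/// those blocks have not been verified yet.
pub fn check_against_checkpoints(
    cached_hashes: &BTreeMap<u32, BlockHash>,
    checkpoints: &BTreeMap<u32, BlockHash>,
) -> Result<(), CheckpointMismatch> {
    for (&height, &checkpoint) in checkpoints {
        if let Some(&cached) = cached_hashes.get(&height) {
            if cached != checkpoint {
                return Err(CheckpointMismatch { height, cached, checkpoint });
            }
        }
    }
    Ok(())
}
```

In this sketch a mismatch is reported rather than repaired: it means the full verification and checkpoint codepaths disagreed about a block, so the cached state can't be trusted.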
+
 ## Upgrading the State Database
+[upgrades]: #upgrades

 For most state upgrades, we want to modify the database format of the existing database. If we
 change the major database version, every user needs to re-download and re-verify all the blocks,
 which can take days.

-### In-Place Upgrade Goals
+### Writing Blocks to the State
+[write-block]: #write-block

+Blocks can be written to the database via two different code paths, and both must produce the same results:
+
+- Upgrading a pre-existing database to the latest format
+- Writing newly-synced blocks in the latest format
+
+This code is high risk, because discovering bugs is tricky, and fixing bugs can require a full reset
+and re-write of an entire column family.
+
+Most Zebra instances will do an upgrade, because they already have a cached state, and upgrades are
+faster. But we run a full sync in CI every week, because new users use that codepath. (And it is
+their first experience of Zebra.)
+
+When Zebra starts up and shuts down (and periodically in CI tests), we run checks on the state
+format. This makes sure that the two codepaths produce the same state on disk.
+
+To reduce code and testing complexity:
+- when a previous Zebra version opens a newer state, the entire state is considered to have that lower version, and
+- when a newer Zebra version opens an older state, each required upgrade is run on the entire state.
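These two rules, together with the rule that a major version change creates a new empty state, can be sketched as a small decision function. This is a minimal sketch assuming a simplified `DiskVersion` type; `DiskVersion`, `OpenAction`, and `action_on_open` are hypothetical and are not Zebra's actual startup code.

```rust
use std::cmp::Ordering;

/// Hypothetical simplified disk format version. The derived ordering compares
/// major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct DiskVersion {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// What the running Zebra should do with the cached state it finds on disk.
#[derive(Debug, PartialEq, Eq)]
pub enum OpenAction {
    /// Different major version: a breaking change, so start a new empty state and re-sync.
    NewEmptyState,
    /// Same version: nothing to do.
    UseAsIs,
    /// Older minor/patch on disk: run each required upgrade over the entire state.
    RunUpgrades { from: DiskVersion, to: DiskVersion },
    /// Newer minor/patch on disk: treat the entire state as the running (lower) version.
    MarkAsRunningVersion { running: DiskVersion },
}

pub fn action_on_open(running: DiskVersion, on_disk: DiskVersion) -> OpenAction {
    if running.major != on_disk.major {
        return OpenAction::NewEmptyState;
    }
    match on_disk.cmp(&running) {
        Ordering::Equal => OpenAction::UseAsIs,
        Ordering::Less => OpenAction::RunUpgrades { from: on_disk, to: running },
        Ordering::Greater => OpenAction::MarkAsRunningVersion { running },
    }
}
```

Marking a newer state with the lower version means a later newer Zebra will see an older version and re-run its upgrades, instead of trusting data that an older Zebra may have written in an older format.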
+
+### In-Place Upgrade Goals
+[upgrade-goals]: #upgrade-goals
+
+Here are the goals of in-place upgrades:
 - avoid a full download and rebuild of the state
- the previous state format must be able to be loaded by the new state
+- Zebra must be able to upgrade the format from previous minor or patch versions of its disk format
+  (Major disk format versions are breaking changes. They create a new empty state and re-sync the whole chain.)
  - this is checked the first time CI runs on a PR with a new state version.
    After the first CI run, the cached state is marked as upgraded, so the upgrade doesn't run
    again. If CI fails on the first run, any cached states with that version should be deleted.
+- the upgrade and full sync formats must be identical
+  - this is partially checked by the state validity checks for each upgrade (see above)
 - previous zebra versions should be able to load the new format
  - this is checked by other PRs that run using the upgraded cached state, but only if a Rust PR
    runs after the new PR's CI finishes and before it merges
@@ -30,17 +76,26 @@ This means that:
  - it can't give incorrect results, because that can affect verification or wallets
  - it can return an error
  - it can only return an `Option` if the caller handles it correctly
- multiple upgrades must produce a valid state format
+- full syncs and upgrades must write the same format
+  - the same write method should be called from both the full sync and upgrade code,
+    which helps prevent data inconsistencies
+- repeated upgrades must produce a valid state format
  - if Zebra is restarted, the format upgrade will run multiple times
  - if an older Zebra version opens the state, data can be written in an older format
- the format must be valid before and after each database transaction or API call, because an upgrade can be cancelled at any time
+- the format must be valid before and after each database transaction or API call, because an
+  upgrade can be cancelled at any time
  - multi-column family changes should be made in database transactions
-  - if you are building new column family, disable state queries, then enable them once it's done
+  - if you are building a new column family:
+    - disable state queries, then enable them once it's done, or
+    - do the upgrade in an order that produces correct results
+      (for example, some data is valid from genesis forward, and some from the tip backward)
  - if each database API call produces a valid format, transactions aren't needed
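A toy, in-memory sketch of two of these rules follows: one write method shared by the full sync and upgrade paths, and an upgrade that commits one batch per block and checks a cancel flag between batches. `ToyDb`, `WriteBatch`, `prepare_new_format`, and the doubling "new format" are all hypothetical stand-ins, not Zebra or RocksDB APIs.

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, Ordering};

/// Toy in-memory database with one new column family, keyed by height.
/// (Hypothetical stand-in for a RocksDB column family.)
#[derive(Debug, Default)]
pub struct ToyDb {
    pub new_cf: BTreeMap<u32, u64>,
}

/// Toy write batch: a list of entries that `ToyDb::write` applies in one call.
#[derive(Debug, Default)]
pub struct WriteBatch {
    entries: Vec<(u32, u64)>,
}

impl ToyDb {
    /// Applies every entry in the batch. In this single-threaded toy, nothing can
    /// observe the database between entries, which models an atomic transaction.
    pub fn write(&mut self, batch: WriteBatch) {
        for (height, value) in batch.entries {
            self.new_cf.insert(height, value);
        }
    }
}

/// The single write method shared by the full sync and upgrade code paths.
/// The "new format" value is a hypothetical transformation of the block data.
pub fn prepare_new_format(batch: &mut WriteBatch, height: u32, block_data: u32) {
    batch.entries.push((height, u64::from(block_data) * 2));
}

/// Full sync path: writes the new format as each block is committed.
pub fn commit_block(db: &mut ToyDb, height: u32, block_data: u32) {
    let mut batch = WriteBatch::default();
    prepare_new_format(&mut batch, height, block_data);
    db.write(batch);
}

/// Upgrade path: rebuilds the new column family from already-stored blocks,
/// writing one batch per block and checking for cancellation between batches.
/// Re-running it rewrites the same values, so a cancelled upgrade can restart.
/// Returns `false` if the upgrade was cancelled before it finished.
pub fn run_upgrade(db: &mut ToyDb, blocks: &[(u32, u32)], cancel: &AtomicBool) -> bool {
    for &(height, block_data) in blocks {
        if cancel.load(Ordering::Relaxed) {
            return false;
        }
        let mut batch = WriteBatch::default();
        prepare_new_format(&mut batch, height, block_data);
        db.write(batch);
    }
    true
}
```

Because both paths call `prepare_new_format`, comparing a synced database with an upgraded one checks the upgrade loop itself, not two separate encoders.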

-If there is an upgrade failure, it can panic and tell the user to delete their cached state and re-launch Zebra.
+If there is an upgrade failure, panic and tell the user to delete their cached state and re-launch
+Zebra.

-### Performance Constraints
+#### Performance Constraints
+[performance]: #performance

 Some column family access patterns can lead to very poor performance.

@@ -62,17 +117,50 @@ But we need to use iterators for some operations, so our alternatives are (in pr
 Currently only UTXOs require key deletion, and only `utxo_loc_by_transparent_addr_loc` requires
 deletion and iterators.

-### Implementation Steps
+### Required Tests
+[testing]: #testing

- [ ] update the [database format](https://github.com/ZcashFoundation/zebra/blob/main/book/src/dev/state-db-upgrades.md#current) in the Zebra docs
- [ ] increment the state minor version
- [ ] write the new format in the block write task
- [ ] update older formats in the format upgrade task
- [ ] test that the new format works when creating a new state, and updating an older state
+State upgrades are a high-risk change. They permanently modify the state format on production Zebra
+instances. Format issues are tricky to diagnose, and require extensive testing and a new release to
+fix. Deleting and rebuilding an entire column family can also be costly, taking minutes or hours the
+first time a cached state is upgraded to a new Zebra release.

-See the [upgrade design docs](https://github.com/ZcashFoundation/zebra/blob/main/book/src/dev/state-db-upgrades.md#design) for more details.
+Some format bugs can't be fixed, and require an entire rebuild of the state: for example, bugs that
+delete or corrupt transactions or block headers.

-These steps can be copied into tickets.
+So testing format upgrades is extremely important. Every state format upgrade should test:
+- new format serializations
+- new calculations or data processing
+- the upgrade produces a valid format
+- a full sync produces a valid format
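A serialization test for a new format can be as simple as a round trip plus a length check. This is a minimal sketch: `DiskFormat`, `NewEntry`, and `check_round_trip` are hypothetical, and Zebra's real `IntoDisk`/`FromDisk` traits have different signatures.

```rust
/// Simplified stand-in for a disk serialization trait (hypothetical).
pub trait DiskFormat: Sized {
    fn to_disk(&self) -> Vec<u8>;
    fn from_disk(bytes: &[u8]) -> Option<Self>;
}

/// Hypothetical new value written by an upgrade: a 24-bit end height and a 32-byte root.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NewEntry {
    pub end_height: u32,
    pub root: [u8; 32],
}

impl DiskFormat for NewEntry {
    fn to_disk(&self) -> Vec<u8> {
        // Keep the low 3 bytes of the height, big-endian. Heights of 2^24 or more
        // are silently truncated, which the round-trip check would catch.
        let mut bytes = self.end_height.to_be_bytes()[1..].to_vec();
        bytes.extend_from_slice(&self.root);
        bytes
    }

    fn from_disk(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 3 + 32 {
            return None;
        }
        let end_height = u32::from_be_bytes([0, bytes[0], bytes[1], bytes[2]]);
        let mut root = [0u8; 32];
        root.copy_from_slice(&bytes[3..]);
        Some(NewEntry { end_height, root })
    }
}

/// Round-trip check for a new serialization: encoding then decoding must give back
/// the original value, and the encoded length must match the documented format.
pub fn check_round_trip<T: DiskFormat + PartialEq + std::fmt::Debug>(value: &T, expected_len: usize) {
    let bytes = value.to_disk();
    assert_eq!(bytes.len(), expected_len, "unexpected serialized length");
    let decoded = T::from_disk(&bytes).expect("serialized bytes must deserialize");
    assert_eq!(&decoded, value, "round-trip changed the value");
}
```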
+
+Together, the tests should cover every code path. For example, the note commitment subtree upgrade
+needed tests for subtrees that end mid-block and at the end of a block, for both Sapling and
+Orchard. Those tests mainly relied on the validity checks for coverage.
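A validity check of the kind these tests relied on might look like the following sketch. It assumes a hypothetical list of `(subtree index, end height)` pairs read from a column family in key order. The specific rules (indexes contiguous from 0, end heights never decreasing) are illustrative and are not a copy of Zebra's checks.

```rust
/// Hypothetical validity check over an upgraded column family:
/// subtree indexes must be contiguous from 0, and end heights must never
/// decrease. Returns a description of the first problem found.
pub fn check_subtrees(subtrees: &[(u16, u32)]) -> Result<(), String> {
    let mut previous_end: Option<u32> = None;
    for (position, &(index, end_height)) in subtrees.iter().enumerate() {
        if usize::from(index) != position {
            return Err(format!("expected subtree index {position}, found {index}"));
        }
        if let Some(previous) = previous_end {
            if end_height < previous {
                return Err(format!(
                    "subtree {index} ends at height {end_height}, before the previous subtree at {previous}"
                ));
            }
        }
        previous_end = Some(end_height);
    }
    Ok(())
}
```

Running the same check after an upgrade and after a full sync exercises both write paths with one piece of test code.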
+
+Each test should be followed by a restart, a sync of 200+ blocks, and another restart. This
+simulates typical user behaviour.
+
+Ideally, the tests should also include:
+- an upgrade from the earliest supported Zebra version
+  (the CI sync-past-checkpoint tests do this on every PR)
+
+#### Manually Triggering a Format Upgrade
+[manual-upgrade]: #manual-upgrade
+
+Zebra stores the current state minor and patch versions in a `version` file in the database
+directory. This path varies based on the OS, major state version, network, and config.
+
+For example, the default mainnet state version file on Linux is at:
+`~/.cache/zebra/state/v25/mainnet/version`
+
+To upgrade a cached Zebra state from `v25.0.0` to the latest disk format, delete the version file.
+To upgrade from a specific version `v25.x.y`, edit the file so it contains `x.y`.
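A minimal sketch of how the major version from the directory name and the `version` file contents combine into a full disk version, assuming the file contains only `minor.patch` as described above. `disk_version` is a hypothetical helper, not Zebra's code.

```rust
/// Builds a full `(major, minor, patch)` disk version from the major version of the
/// state directory (for example `v25`) and the contents of its `version` file.
///
/// Assumes the file contains only `minor.patch`. A missing file is treated as
/// `major.0.0`, matching the rule that deleting the file upgrades from `v25.0.0`.
pub fn disk_version(major: u64, version_file: Option<&str>) -> Result<(u64, u64, u64), String> {
    let contents = match version_file {
        Some(contents) => contents,
        None => return Ok((major, 0, 0)),
    };
    let (minor, patch) = contents
        .trim()
        .split_once('.')
        .ok_or_else(|| format!("expected `minor.patch`, found {contents:?}"))?;
    let minor = minor.parse::<u64>().map_err(|e| format!("invalid minor version: {e}"))?;
    let patch = patch.parse::<u64>().map_err(|e| format!("invalid patch version: {e}"))?;
    Ok((major, minor, patch))
}
```

A caller could read the file with `std::fs::read_to_string`, treating `std::io::ErrorKind::NotFound` as `None` and reporting any other I/O error, so that a permissions problem isn't mistaken for a deleted file.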
+
+Editing the file and running Zebra will trigger a re-upgrade over an existing state.
+Re-upgrades can hide format bugs. For example, if the old code was correct, and the
+new code skips blocks, the validity checks won't find that bug.
+
+So it is better to test with a full sync and with an older cached state.

 ## Current State Database Format
 [current]: #current
@@ -124,8 +212,13 @@ We use the following rocksdb column families:
 | `history_tree`                     | `()`                   | `NonEmptyHistoryTree`         | Update  |
 | `tip_chain_value_pool`             | `()`                   | `ValueBalance`                | Update  |

-Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`.
-Other structures are encoded using `IntoDisk`/`FromDisk`.
+### Data Formats
+[rocksdb-data-format]: #rocksdb-data-format
+
+We use big-endian encoding for keys, to allow database index prefix searches.
+
+Most Zcash protocol structures are encoded using `ZcashSerialize`/`ZcashDeserialize`.
+Other structures are encoded using custom `IntoDisk`/`FromDisk` implementations.
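A small sketch of why big-endian keys matter: in a store that orders keys bytewise (RocksDB's default comparator does), big-endian integers sort in numeric order, so all keys that share a prefix are contiguous and a prefix search becomes one forward scan. Here a `BTreeMap` stands in for RocksDB, and `composite_key` and `prefix_scan` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

/// Encodes a height as 3 big-endian bytes, like the 24-bit `Height` format.
/// Panics if the height does not fit in 24 bits.
pub fn height_key(height: u32) -> [u8; 3] {
    assert!(height < (1 << 24), "height {height} does not fit in 24 bits");
    let [_, a, b, c] = height.to_be_bytes();
    [a, b, c]
}

/// The same height in little-endian order, only for comparison.
pub fn height_key_le(height: u32) -> [u8; 3] {
    let [a, b, c, _] = height.to_le_bytes();
    [a, b, c]
}

/// Hypothetical composite key: a 24-bit height followed by a 16-bit index,
/// both big-endian, so all keys for one height share a 3-byte prefix.
pub fn composite_key(height: u32, index: u16) -> Vec<u8> {
    let mut key = height_key(height).to_vec();
    key.extend_from_slice(&index.to_be_bytes());
    key
}

/// Prefix search over a bytewise-ordered map: seek to the prefix, then read
/// forward until the keys stop matching it.
pub fn prefix_scan<'a, V>(map: &'a BTreeMap<Vec<u8>, V>, prefix: &[u8]) -> Vec<&'a V> {
    map.range(prefix.to_vec()..)
        .take_while(|(key, _)| key.starts_with(prefix))
        .map(|(_, value)| value)
        .collect()
}
```

With little-endian keys, height 255 would sort after height 256, so neither range scans nor prefix searches would follow height order.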

 Block and Transaction Data:
 - `Height`: 24 bits, big-endian, unsigned (allows for ~30 years' worth of blocks)
@@ -145,17 +238,21 @@ Block and Transaction Data:
 - `NoteCommitmentSubtreeIndex`: 16 bits, big-endian, unsigned
 - `NoteCommitmentSubtreeData<{sapling, orchard}::tree::Node>`: `Height \|\| {sapling, orchard}::tree::Node`

-We use big-endian encoding for keys, to allow database index prefix searches.
-
 Amounts:
 - `Amount`: 64 bits, little-endian, signed
 - `ValueBalance`: `[Amount; 4]`
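A minimal sketch of the `ValueBalance` layout as four consecutive signed 8-byte little-endian amounts. It uses plain `i64` instead of Zebra's `Amount` type and skips the range checks a real amount type would enforce; the order of the four pools is an assumption.

```rust
/// Encodes four signed amounts as 32 bytes, each amount 8 bytes little-endian.
/// The pool order is an assumption for this sketch.
pub fn value_balance_to_bytes(amounts: [i64; 4]) -> [u8; 32] {
    let mut bytes = [0u8; 32];
    for (chunk, amount) in bytes.chunks_exact_mut(8).zip(amounts) {
        chunk.copy_from_slice(&amount.to_le_bytes());
    }
    bytes
}

/// Decodes the 32-byte layout above back into four signed amounts.
pub fn value_balance_from_bytes(bytes: [u8; 32]) -> [i64; 4] {
    let mut amounts = [0i64; 4];
    for (amount, chunk) in amounts.iter_mut().zip(bytes.chunks_exact(8)) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        *amount = i64::from_le_bytes(buf);
    }
    amounts
}
```

Little-endian is acceptable here because these are values, not keys, so their byte order never affects database ordering or prefix searches.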

-Derived Formats:
+Derived Formats (legacy):
 - `*::NoteCommitmentTree`: `bincode` using `serde`
  - stored note commitment trees always have cached roots
- `NonEmptyHistoryTree`: `bincode` using `serde`, using `zcash_history`'s `serde` implementation
+- `NonEmptyHistoryTree`: `bincode` using `serde`, using our copy of an old `zcash_history` `serde`
+  implementation

+`bincode` is a risky format to use, because it depends on the exact order and type of struct fields.
+Do not use it for new column families.
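To illustrate the risk, here is a std-only sketch that writes struct fields back to back in declaration order, with no field names or type tags, which is the property of bincode that makes it fragile. It does not call the `bincode` crate, and `TreeV1`/`TreeV2` are hypothetical. When a later version reorders the fields, the old bytes still decode without any error, but into the wrong values.

```rust
use std::convert::TryInto;

/// Version 1 of a hypothetical stored struct.
#[derive(Debug, PartialEq)]
pub struct TreeV1 {
    pub size: u64,
    pub root_height: u32,
}

/// Version 2: the same fields, declared in a different order.
#[derive(Debug, PartialEq)]
pub struct TreeV2 {
    pub root_height: u32,
    pub size: u64,
}

/// Writes fields in declaration order with no field names or type tags.
pub fn encode_v1(tree: &TreeV1) -> Vec<u8> {
    let mut bytes = tree.size.to_le_bytes().to_vec();
    bytes.extend_from_slice(&tree.root_height.to_le_bytes());
    bytes
}

/// Reads the same bytes using the version 2 field order.
pub fn decode_v2(bytes: &[u8]) -> Option<TreeV2> {
    if bytes.len() != 4 + 8 {
        return None;
    }
    let root_height = u32::from_le_bytes(bytes[0..4].try_into().ok()?);
    let size = u64::from_le_bytes(bytes[4..12].try_into().ok()?);
    Some(TreeV2 { root_height, size })
}
```

Encoding `TreeV1 { size: 5, root_height: 1_000 }` and decoding it as `TreeV2` gives `root_height == 5` and `size == 1_000 << 32`, with no error to signal the corruption.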
+
+#### Address Format
+[rocksdb-address-format]: #rocksdb-address-format

 The following figure helps visualize the address index, which is the most complicated part.
 Numbers in brackets are array sizes; bold arrows are compositions (i.e. `TransactionLocation` is the
@@ -200,6 +297,7 @@ Each column family handles updates differently, based on its specific consensus
  - Each key-value entry is created once.
  - Keys can be deleted, but values are never updated.
  - Code called by ReadStateService must ignore deleted keys, or use a read lock.
+  - We avoid deleting keys, and avoid using iterators on `Delete` column families, for performance.
  - TODO: should we prevent re-inserts of keys that have been deleted?
 - Update:
  - Each key-value entry is created once.
@@ -353,8 +451,6 @@ So they should not be used for consensus-critical checks.
  the Merkle tree nodes as required to insert new items.
  For each block committed, the old tree is deleted and a new one is inserted
  by its new height.
-  **TODO:** store the sprout note commitment tree by `()`,
-  to avoid ReadStateService concurrent write issues.

 - The `{sapling, orchard}_note_commitment_tree` stores the note commitment tree
  state for every height, for the specific pool. Each tree is stored
@@ -368,7 +464,6 @@ So they should not be used for consensus-critical checks.
  state. There is always a single entry for it. The tree is stored as the set of "peaks"
  of the "Merkle mountain range" tree structure, which is what is required to
  insert new items.
-  **TODO:** store the history tree by `()`, to avoid ReadStateService concurrent write issues.

 - Each `*_anchors` stores the anchor (the root of a Merkle tree) of the note commitment
  tree of a certain block. We only use the keys since we just need the set of anchors,