From 16cc095484465ec433ff232c87cea9bd82bdfc36 Mon Sep 17 00:00:00 2001 From: Henry de Valence Date: Mon, 21 Sep 2020 12:05:31 -0700 Subject: [PATCH] RFC: state updates (#902) * wip Co-authored-by: Jane Lusby * wip2: add nullifiers Co-authored-by: Jane Lusby * Update book/src/dev/rfcs/0003-state-updates.md Co-authored-by: teor * Move to RFC number 5 * rfc: add PR link to state update RFC * rfc: change state RFC to store blocks by height. The rationale for this change is described in the document: it means that we write blocks only to one end of the Sled tree, and hopefully helps us with spatial access patterns. This should help alleviate a major cause of memory use in Zebra's current WIP Sled structure, which is that: - blocks are stored in random, sparse order (by hash) in the B-tree; - the `Request::GetDepth` method opens the entire block store and queries a random part of its block data to determine whether a hash is present; - if present, it deserializes the complete block data of both the given block and the current tip block, to compute the difference in block heights. This access pattern forces a large amount of B-tree data to remain resident, and could probably be avoided if we didn't do that. * rfc: add sprout and sapling anchors to sled trees. Co-authored-by: Deirdre Connolly * rfc: fill in details of state service requests. * rfc: extract commit process from API description * rfc: add anchor parameters to CommitBlock. These have to be computed by a verifier, so passing them as parameters means we don't recompute them. * WIP for in memory state structs * tweeks from end of session with henry * more updates from pairing * rewrite non-finalized state sections * update query instructions for each request * more updates * updates from pairing with henry * updates from proofreading solo * add guide level explanation to state rfc * add drawbacks section * Update book/src/dev/rfcs/0005-state-updates.md Co-authored-by: Henry de Valence * Apply suggestions from code review Co-authored-by: Henry de Valence * Update book/src/dev/rfcs/0005-state-updates.md Co-authored-by: Henry de Valence * apply changes from code review * clarify iteration * Apply suggestions from code review Co-authored-by: teor * apply changes from code review * Update book/src/dev/rfcs/0005-state-updates.md Co-authored-by: teor * Apply suggestions from code review Co-authored-by: teor * Apply suggestions from code review Co-authored-by: teor * Apply suggestions from code review Co-authored-by: Deirdre Connolly * Apply suggestions from code review Co-authored-by: teor * add info about default constructing chains when forking from finalized state * Update book/src/dev/rfcs/0005-state-updates.md Co-authored-by: teor * move contextual verification out of Chain Co-authored-by: Jane Lusby Co-authored-by: teor Co-authored-by: Deirdre Connolly Co-authored-by: Jane Lusby --- book/src/SUMMARY.md | 1 + book/src/dev/rfcs/0005-state-updates.md | 778 ++++++++++++++++++++++++ 2 files changed, 779 insertions(+) create mode 100644 book/src/dev/rfcs/0005-state-updates.md diff --git a/book/src/SUMMARY.md b/book/src/SUMMARY.md index 4adadf0bb..de80dda91 100644 --- a/book/src/SUMMARY.md +++ b/book/src/SUMMARY.md @@ -14,6 +14,7 @@ - [Parallel Verification](dev/rfcs/0002-parallel-verification.md) - [Inventory Tracking](dev/rfcs/0003-inventory-tracking.md) - [Asynchronous Script Verification](dev/rfcs/0004-asynchronous-script-verification.md) + - [State Updates](dev/rfcs/0005-state-updates.md) - [Diagrams](dev/diagrams.md) - [Network 
Architecture](dev/diagrams/zebra-network.md)
- [zebra-checkpoints](dev/zebra-checkpoints.md)
diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md
new file mode 100644
index 000000000..dd9b3a8c7
--- /dev/null
+++ b/book/src/dev/rfcs/0005-state-updates.md
@@ -0,0 +1,778 @@
# State Updates

- Feature Name: state_updates
- Start Date: 2020-08-14
- Design PR: https://github.com/ZcashFoundation/zebra/pull/902
- Zebra Issue: https://github.com/ZcashFoundation/zebra/issues/1049


# Summary
[summary]: #summary

Zebra manages chain state in the `zebra-state` crate, which allows state
queries via asynchronous RPC (in the form of a Tower service). The state
system is responsible for contextual verification in the sense of [RFC2],
checking that new blocks are consistent with the existing chain state before
committing them. This RFC describes how the state is represented internally,
and how state updates are performed.

[RFC2]: ./0002-parallel-verification.md

# Motivation
[motivation]: #motivation

We need to be able to access and modify the chain state, and we want to have
a description of how this happens and what guarantees are provided by the
state service.

# Definitions
[definitions]: #definitions

* **state data**: Any data the state service uses to represent chain state.

* **structural/semantic/contextual verification**: as defined in [RFC2].

* **block chain**: A sequence of valid blocks linked by inclusion of the
  previous block hash in the subsequent block. Chains are rooted at the
  *genesis* block and extend to a *tip*.

* **chain state**: The state of the ledger after application of a particular
  sequence of blocks (state transitions).

* **cumulative difficulty**: The cumulative proof-of-work from genesis to the
  chain tip.

* **best chain**: The chain with the greatest cumulative difficulty. This
  chain represents the consensus state of the Zcash network and transactions.

* **side chain**: A chain which is not contained in the best chain.
  Side chains are pruned at the reorg limit, when they are no longer
  connected to the finalized state.

* **chain reorganization**: Occurs when a new best chain is found and the
  previous best chain becomes a side chain.

* **reorg limit**: The longest chain reorganization accepted by `zcashd`,
  100 blocks.

* **orphaned block**: A block which is no longer included in the best chain.

* **non-finalized state**: State data corresponding to blocks above the reorg
  limit. This data can change in the event of a chain reorg.

* **finalized state**: State data corresponding to blocks below the reorg
  limit. This data cannot change in the event of a chain reorg.

* **non-finalized tips**: The highest blocks in each non-finalized chain. These
  tips might be at different heights.

* **finalized tip**: The highest block in the finalized state. The tip of the
  best chain is usually 100 blocks (the reorg limit) above the finalized tip.
  But it can be lower during the initial sync, and after a chain
  reorganization, if the new best chain is at a lower height.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

The `zebra-state` crate provides an implementation of the chain state storage
logic in a Zcash consensus node. Its main responsibility is to store chain
state, validating new blocks against the existing chain state in the process,
and to allow later querying of that chain state.
`zebra-state` provides this
interface via a `tower::Service` based on the actor model, with a
request/response interface for passing messages back and forth between the
state service and the rest of the application.

The main entry point for the `zebra-state` crate is the `init` function. This
function takes a `zebra_state::Config` and constructs a new state service,
which it returns wrapped by a `tower::Buffer`. This service is then interacted
with via the `tower::Service` trait.

```rust
use tower::{Service, ServiceExt};

let state = zebra_state::on_disk::init(state_config, network);
let request = zebra_state::Request::BlockLocator;
let response = state.ready_and().await?.call(request).await?;

assert!(matches!(response, zebra_state::Response::BlockLocator(_)));
```

**Note**: The `tower::Service` API requires that `ready` is always called
exactly once before each `call`. It is up to users of the zebra state service
to uphold this contract.

The `tower::Buffer` wrapper is `Clone`able, allowing shared access to a common
state service. This allows different tasks to share access to the chain state.

The set of operations supported by `zebra-state` is encoded in its `Request`
enum. This enum has one variant for each supported operation.

```rust
pub enum Request {
    CommitBlock {
        block: Arc<Block>,
    },
    CommitFinalizedBlock {
        block: Arc<Block>,
    },
    Depth(Hash),
    Tip,
    BlockLocator,
    Transaction(Hash),
    Block(HashOrHeight),

    // .. some variants omitted
}
```

`zebra-state` breaks down its requests into two categories and provides
different guarantees for each category: requests that modify the state, and
requests that do not. Requests that update the state are guaranteed to run
sequentially and will never race against each other. Requests that read state
are performed asynchronously and are guaranteed to read at least the state
present when the service processed the request, or a later state present when
the returned future is executed. The state service avoids race conditions
between the read state and the written state by doing all contextual
verification internally.

# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

## State Components

Zcash (as implemented by `zcashd`) differs from Bitcoin in its treatment of
transaction finality. If a new best chain is detected that does not extend
the previous best chain, blocks at the end of the previous best chain become
orphaned (no longer included in the best chain). Their state updates are
therefore no longer included in the best chain's chain state. The process of
rolling back orphaned blocks and applying new blocks is called a chain
reorganization. Bitcoin allows chain reorganizations of arbitrary depth,
while `zcashd` limits chain reorganizations to 100 blocks. (In `zcashd`, the
new best chain must be a side-chain that forked within 100 blocks of the tip
of the current best chain.)

This difference means that in Bitcoin, chain state only has probabilistic
finality, while in Zcash, chain state is final once it is beyond the reorg
limit. To simplify our implementation, we split the representation of the
state data at the finality boundary provided by the reorg limit.

State data from blocks *above* the reorg limit (*non-finalized state*) is
stored in memory and handles multiple chains. State data from blocks *below*
the reorg limit (*finalized state*) is stored persistently using `sled` and
only tracks a single chain.
This allows a simplification of our state
handling, because only finalized data is persistent, and the logic for
finalized data maintains fewer invariants.

One downside of this design is that restarting the node loses the last 100
blocks, but node restarts are relatively infrequent and a short re-sync is
cheap relative to the cost of additional implementation complexity.

Another downside of this design is that we do not achieve exactly the same
behavior as `zcashd` in the event of a 51% attack: `zcashd` limits *each* chain
reorganization to 100 blocks, but permits multiple reorgs, while Zebra limits
*all* chain reorgs to 100 blocks. In the event of a successful 51% attack on
Zcash, this could be resolved by wiping the Sled state and re-syncing the new
chain, but in this scenario there are worse problems.

## Service Interface
[service-interface]: #service-interface

The state is accessed asynchronously through a Tower service interface.
Determining what guarantees the state service can and should provide to the
rest of the application requires considering two sets of behaviors:

1. behaviors related to the state's external API (a `Buffer`ed `tower::Service`);
2. behaviors related to the state's internal implementation (using `sled`).

Making this distinction helps us to ensure we don't accidentally leak
"internal" behaviors into "external" behaviors, which would violate
encapsulation and make it more difficult to replace `sled`.

In the first category, our state is presented to the rest of the application
as a `Buffer`ed `tower::Service`. The `Buffer` wrapper allows shared access
to a service using an actor model, moving the service to be shared into a
worker task and passing messages to it over a multi-producer single-consumer
(mpsc) channel. The worker task receives messages and makes `Service::call`s.
The `Service::call` method returns a `Future`, and the service is allowed to
decide how much work it wants to do synchronously (in `call`) and how much
work it wants to do asynchronously (in the `Future` it returns).

This means that our external API ensures that the state service sees a
linearized sequence of state requests, although the exact ordering is
unpredictable when there are multiple senders making requests.

In the second category, the Sled API presents itself synchronously, but
database and tree handles are cloneable and can be moved between threads. All
that's required to process some request asynchronously is to clone the
appropriate handle, move it into an async block, and make the call as part of
the future. (We might want to use Tokio's blocking API for this, but this is
an implementation detail.)

Because the state service has exclusive access to the sled database, and the
state service sees a linearized sequence of state requests, we have an easy
way to opt in to asynchronous database access. We can perform sled operations
synchronously in the `Service::call`, waiting for them to complete, and be
sure that all future requests will see the resulting sled state. Or, we can
perform sled operations asynchronously in the future returned by
`Service::call`.

If we perform all *writes* synchronously and allow reads to be either
synchronous or asynchronous, we ensure that writes cannot race each other.
Asynchronous reads are guaranteed to read at least the state present at the
time the request was processed, or a later state.
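As a minimal, self-contained sketch of this pattern (with hypothetical
`Req`/`Rsp` types standing in for Zebra's real request and response enums),
the write below happens synchronously inside `call`, while the read is moved
into the returned future by cloning the sled `Tree` handle:

```rust
use std::future::Future;

enum Req { Write([u8; 4], Vec<u8>), Read([u8; 4]) }
enum Rsp { Written, Value(Option<sled::IVec>) }

fn call(tree: &sled::Tree, req: Req) -> impl Future<Output = sled::Result<Rsp>> {
    // Clone the handle so the returned future does not borrow `tree`.
    let tree = tree.clone();

    // Synchronous write: performed before `call` returns, so every later
    // request is guaranteed to observe the new state.
    let written = match &req {
        Req::Write(key, value) => Some(tree.insert(key, value.as_slice()).map(|_| Rsp::Written)),
        Req::Read(_) => None,
    };

    async move {
        match req {
            Req::Write(..) => written.expect("write results are recorded above"),
            // Asynchronous read: runs when the future is polled, so it sees
            // at least the state present at `call` time, or a later state.
            Req::Read(key) => Ok(Rsp::Value(tree.get(key)?)),
        }
    }
}
```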
### Summary

- **Sled reads** may be done synchronously (in `call`) or asynchronously (in
  the `Future`), depending on the context;

- **Sled writes** must be done synchronously (in `call`).

## In-memory data structures
[in-memory]: #in-memory

At a high level, the in-memory data structures store a collection of chains,
each rooted at the highest finalized block. Each chain consists of a map from
heights to blocks. Chains are stored using an ordered map from cumulative work
to chains, so that the map ordering is the ordering of worst to best chains.

### The `Chain` type
[chain-type]: #chain-type


The `Chain` type represents a chain of blocks. Each block represents an
incremental state update, and the `Chain` type caches the cumulative state
update from its root to its tip.

The `Chain` type is used to represent the non-finalized portion of a complete
chain of blocks rooted at the genesis block. The parent block of the root of
a `Chain` is the tip of the finalized portion of the chain. As an exception,
the finalized portion of the chain is initially empty, until the genesis block
has been finalized.

The `Chain` type supports several operations to manipulate chains: `push`,
`pop_root`, and `fork`. `push` is the most fundamental operation and handles
contextual validation of chains as they are extended. `pop_root` is provided
for finalization, and is how we move blocks from the non-finalized portion of
the state to the finalized portion. `fork`, on the other hand, handles creating
new chains for `push` when new blocks arrive whose parent isn't a tip of an
existing chain.

**Note:** The `Chain` type's API is only designed to handle non-finalized
data. The genesis block and all pre-Sapling blocks are always considered to
be finalized blocks and should not be handled via the `Chain` type through
`CommitBlock`. They should instead be committed directly to the finalized
state with `CommitFinalizedBlock`. This is particularly important with the
genesis block, since the `Chain` will panic if used while the finalized state
is completely empty.

The `Chain` type is defined by the following struct and API:

```rust
struct Chain {
    blocks: BTreeMap<block::Height, Arc<Block>>,
    height_by_hash: HashMap<block::Hash, block::Height>,
    tx_by_hash: HashMap<transaction::Hash, (block::Height, usize)>,

    utxos: HashSet<transparent::OutPoint>,
    sapling_anchors: HashSet<sapling::tree::Root>,
    sprout_anchors: HashSet<sprout::tree::Root>,
    sapling_nullifiers: HashSet<sapling::Nullifier>,
    sprout_nullifiers: HashSet<sprout::Nullifier>,
    partial_cumulative_work: PartialCumulativeWork,
}
```

#### `pub fn push(&mut self, block: Arc<Block>)`

Push a block into a chain as the new tip.

1. Update cumulative data members
   - Add block to end of `self.blocks`
   - Add hash to `height_by_hash`
   - for each `transaction` in `block`
     - add key: `transaction.hash` and value: `(height, tx_index)` to `tx_by_hash`
   - Add new utxos and remove consumed utxos from `self.utxos`
   - Add anchors to the appropriate `self.<version>_anchors`
   - Add nullifiers to the appropriate `self.<version>_nullifiers`
   - Add work to `self.partial_cumulative_work`

#### `pub fn pop_root(&mut self) -> Arc<Block>`

Remove the lowest height block of the non-finalized portion of a chain.

1. Remove the lowest height block from `self.blocks`

2. Update cumulative data members
   - Remove the block's hash from `self.height_by_hash`
   - for each `transaction` in `block`
     - remove `transaction.hash` from `tx_by_hash`
   - Remove new utxos from `self.utxos`
   - Remove the anchors from the appropriate `self.<version>_anchors`
   - Remove the nullifiers from the appropriate `self.<version>_nullifiers`

3. Return the block
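As an illustration, here is a minimal sketch of the `push` and `pop_root`
bookkeeping over simplified stand-in types (`u32` heights, `[u8; 32]` hashes,
`u128` work); the real `Chain` also maintains the transaction, UTXO, anchor,
and nullifier members described above:

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::Arc;

// Simplified stand-ins for the zebra-chain types.
struct Block { height: u32, hash: [u8; 32], work: u128 }

#[derive(Default)]
struct Chain {
    blocks: BTreeMap<u32, Arc<Block>>,
    height_by_hash: HashMap<[u8; 32], u32>,
    partial_cumulative_work: u128,
}

impl Chain {
    /// Push a block into the chain as the new tip.
    fn push(&mut self, block: Arc<Block>) {
        self.height_by_hash.insert(block.hash, block.height);
        self.partial_cumulative_work += block.work;
        // ... update tx_by_hash, utxos, anchors, and nullifiers here ...
        self.blocks.insert(block.height, block);
    }

    /// Remove and return the lowest height block, for finalization.
    fn pop_root(&mut self) -> Arc<Block> {
        let root_height = *self.blocks.keys().next().expect("chain is not empty");
        let block = self.blocks.remove(&root_height).expect("root height is present");
        self.height_by_hash.remove(&block.hash);
        // ... remove the block's transactions, utxos, anchors, and nullifiers ...
        block
    }
}
```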
#### `pub fn fork(&self, new_tip: block::Hash) -> Option<Chain>`

Fork a chain at the block with the given hash, if it is part of this chain.

1. If `self` does not contain `new_tip`, return `None`

2. Clone `self` as `forked`

3. While the tip of `forked` is not equal to `new_tip`
   - call `forked.pop_tip()` and discard the old tip

4. Return `forked`

#### `fn pop_tip(&mut self) -> Arc<Block>`

Remove the highest height block of the non-finalized portion of a chain.

1. Remove the highest height `block` from `self.blocks`

2. Update cumulative data members
   - Remove the corresponding hash from `self.height_by_hash`
   - for each `transaction` in `block`
     - remove `transaction.hash` from `tx_by_hash`
   - Add consumed utxos and remove new utxos from `self.utxos`
   - Remove anchors from the appropriate `self.<version>_anchors`
   - Remove the nullifiers from the appropriate `self.<version>_nullifiers`
   - Subtract work from `self.partial_cumulative_work`

3. Return the block

#### `Ord`

The `Chain` type implements `Ord` for reorganizing chains. Chains are first
compared by their `partial_cumulative_work`. Ties are then broken by
comparing the `block::Hash`es of the tips of each chain.

**Note**: Unlike `zcashd`, Zebra does not use block arrival times as a
tie-breaker for the best tip. Since Zebra downloads blocks in parallel,
download times are not guaranteed to be unique. Using the `block::Hash`
provides a consistent tip order. (As a side-effect, the tip order is also
consistent after a node restart, and between nodes.)
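A minimal sketch of this ordering over simplified stand-in types (the real
implementation compares `PartialCumulativeWork` values and `block::Hash`es):

```rust
use std::cmp::Ordering;

#[derive(Eq, PartialEq)]
struct Chain {
    partial_cumulative_work: u128, // stand-in for PartialCumulativeWork
    tip_hash: [u8; 32],            // stand-in for the tip's block::Hash
}

impl Ord for Chain {
    fn cmp(&self, other: &Self) -> Ordering {
        // Compare by cumulative work first...
        self.partial_cumulative_work
            .cmp(&other.partial_cumulative_work)
            // ...then break ties by tip hash, which is consistent across
            // restarts and between nodes.
            .then_with(|| self.tip_hash.cmp(&other.tip_hash))
    }
}

impl PartialOrd for Chain {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
```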
#### `Default`

The `Chain` type implements `Default` for constructing new chains whose
parent block is the tip of the finalized state. This implementation should be
handled by `#[derive(Default)]`.

1. initialise cumulative data members
   - Construct an empty `self.blocks`, `height_by_hash`, `tx_by_hash`,
     `self.utxos`, `self.<version>_anchors`, `self.<version>_nullifiers`
   - Zero `self.partial_cumulative_work`

**Note:** The chain can be empty:
  - after a restart, when the non-finalized state is empty
  - during a fork from the finalized tip, when the forked `Chain` is empty,
    because all its blocks have been `pop`ped


### `ChainSet` Type
[chainset-type]: #chainset-type

The `ChainSet` type represents the set of all non-finalized state. It
consists of a set of non-finalized but verified chains and a set of
unverified blocks which are waiting for the full context needed to verify
them to become available.

`ChainSet` is defined by the following structure and API:

```rust
struct ChainSet {
    chains: BTreeSet<Chain>,

    queued_blocks: BTreeMap<block::Hash, QueuedBlock>,
    queued_by_parent: BTreeMap<block::Hash, Vec<block::Hash>>,
    queued_by_height: BTreeMap<block::Height, Vec<block::Hash>>,
}
```

#### `pub fn finalize(&mut self) -> Arc<Block>`

Finalize the lowest height block in the non-finalized portion of the best
chain and update all side chains to match.

1. Extract the best chain from `self.chains` into `best_chain`

2. Extract the rest of the chains into a `side_chains` temporary variable, so
   they can be mutated

3. Remove the lowest height block from the best chain with
   `let block = best_chain.pop_root();`

4. Add `best_chain` back to `self.chains`

5. For each remaining `chain` in `side_chains`
   - If `chain` starts with `block`, remove `block` and add `chain` back to
     `self.chains`
   - Else, drop `chain`

6. Calculate the new finalized tip height from the new `best_chain`

7. For each `height` in `self.queued_by_height` where the height is lower
   than the new reorg limit
   - for each `hash` in `self.queued_by_height.remove(height)`
     - Remove the key `hash` from `self.queued_blocks` and store the removed
       `block`
     - Find and remove `hash` from `self.queued_by_parent` using
       `block.parent`'s hash

8. Return `block`

#### `pub fn queue(&mut self, block: QueuedBlock)`

Queue a non-finalized block to be committed to the state.

After queueing a non-finalized block, this method checks whether the newly
queued block (and any of its descendants) can be committed to the state.

1. Check if the parent block exists in any current chain

2. If it does, call `let ret = self.commit_block(block)`
   - If `ret` is `Some(new_parent)`, call `self.process_queued(new_parent)`

3. Else, add `block` to `self.queued_blocks` and related members and return

#### `fn process_queued(&mut self, new_parent: block::Hash)`

1. Create a list of `new_parents` and populate it with `new_parent`

2. While `let Some(parent) = new_parents.pop()`
   - for each `hash` in `self.queued_by_parent.remove(&parent)`
     - lookup the `block` for `hash`
     - remove `block` from `self.queued_blocks`
     - remove `hash` from `self.queued_by_height`
     - let `result = self.commit_block(block);`
     - add `result` to `new_parents` if it is `Some`

#### `fn commit_block(&mut self, block: QueuedBlock) -> Option<block::Hash>`

Try to commit `block` to the non-finalized state. Returns `None` if the block
cannot be committed due to missing context.

1. Search for the first chain where `block.parent` == `chain.tip`. If it
   exists:
   - push `block` onto that chain
   - broadcast `result` via `block.rsp_tx`
   - return `Some(block.hash)` if `result.is_ok()`

2. Find the first chain that contains `block.parent` and fork it with
   `block.parent` as the new tip
   - `let fork = self.chains.iter().find_map(|chain| chain.fork(block.parent));`

3. If `fork` is `Some`
   - push `block` onto `fork`
   - add `fork` to `self.chains`
   - broadcast `result` via `block.rsp_tx`
   - return `Some(block.hash)` if `result.is_ok()`

4. Else panic, this should be unreachable because `commit_block` is only
   called when `block` is ready to be committed.
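A minimal sketch of the queueing flow described above, with `process_queued`
driving `commit_block`, over simplified stand-in types (`[u8; 32]` hashes, a
stubbed `QueuedBlock`, and the `queued_by_height` bookkeeping elided):

```rust
use std::collections::BTreeMap;

struct QueuedBlock; // stand-in for the real queued block type

#[derive(Default)]
struct ChainSet {
    queued_blocks: BTreeMap<[u8; 32], QueuedBlock>,
    queued_by_parent: BTreeMap<[u8; 32], Vec<[u8; 32]>>,
}

impl ChainSet {
    fn process_queued(&mut self, new_parent: [u8; 32]) {
        // Each successful commit may unblock blocks queued on the newly
        // committed block, so keep a worklist of new parent hashes.
        let mut new_parents = vec![new_parent];
        while let Some(parent) = new_parents.pop() {
            for hash in self.queued_by_parent.remove(&parent).unwrap_or_default() {
                let block = self
                    .queued_blocks
                    .remove(&hash)
                    .expect("all queued blocks are indexed by hash");
                if let Some(committed) = self.commit_block(block) {
                    new_parents.push(committed);
                }
            }
        }
    }

    fn commit_block(&mut self, _block: QueuedBlock) -> Option<[u8; 32]> {
        // Elided: push onto a matching chain, or fork one, as described above.
        None
    }
}
```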
### Summary

- `Chain` represents the non-finalized portion of a single chain

- `ChainSet` represents the non-finalized portion of all chains and all
  unverified blocks that are waiting for context to be available.

- `ChainSet::queue` handles queueing and/or committing blocks and
  reorganizing chains (via `commit_block`) but not finalizing them

- Finalized blocks are returned from `finalize` and must still be committed
  to disk afterwards

- `finalize` handles pruning queued blocks that are past the reorg limit

## Committing non-finalized blocks

Given the above structures for manipulating the non-finalized state, new
non-finalized blocks are committed in two steps: first, the block is
committed to the in-memory state; then, all lowest height blocks that are
past the reorg limit are finalized, and any queued blocks that are now past
the reorg limit are pruned.

1. Run contextual validation on `block` against the finalized and
   non-finalized state

2. If `block.parent` == `finalized_tip.hash`
   - Construct a new `Chain` with `Chain::default`
   - push `block` onto that chain
   - add the new chain to `chain_set.chains`
   - broadcast `result` via `block.rsp_tx`
   - return `Some(block.hash)` if `result.is_ok()`

3. Commit or queue the block to the non-finalized state with
   `chain_set.queue(block);`

4. If the best chain is longer than the reorg limit
   - Finalize all lowest height blocks in the best chain, and commit them to
     disk with `CommitFinalizedBlock`:

     ```rust
     while self.best_chain().len() > reorg_limit {
         let finalized = chain_set.finalize()?;
         let request = CommitFinalizedBlock { finalized };
         sled_state.ready_and().await?.call(request).await?;
     };
     ```

## Sled data structures
[sled]: #sled

Sled provides a persistent, thread-safe `BTreeMap<&[u8], &[u8]>`. Each map is
a distinct "tree". Keys are sorted using lex order on byte strings, so
integer keys should be stored using big-endian encoding (so that the lex
order on byte strings matches the numeric ordering).

We use the following Sled trees:

| Tree                 | Keys                  | Values                              |
|----------------------|-----------------------|-------------------------------------|
| `hash_by_height`     | `BE32(height)`        | `block::Hash`                       |
| `height_by_hash`     | `block::Hash`         | `BE32(height)`                      |
| `block_by_height`    | `BE32(height)`        | `Block`                             |
| `tx_by_hash`         | `transaction::Hash`   | `BE32(height) || BE32(tx_index)`    |
| `utxo_by_outpoint`   | `OutPoint`            | `TransparentOutput`                 |
| `sprout_nullifiers`  | `sprout::Nullifier`   | `()`                                |
| `sapling_nullifiers` | `sapling::Nullifier`  | `()`                                |
| `sprout_anchors`     | `sprout::tree::Root`  | `()`                                |
| `sapling_anchors`    | `sapling::tree::Root` | `()`                                |

Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`.

**Note:** We do not store the cumulative work for the finalized chain,
because the finalized work is the same for all non-finalized chains. So the
additional non-finalized work can be used to calculate the relative chain
order, and choose the best chain.

### Notes on Sled trees

- The `hash_by_height` and `height_by_hash` trees provide the bijection between
  block heights and block hashes. (Since the Sled state only stores finalized
  state, this is actually a bijection.)

- Blocks are stored by height, not by hash. This has the downside that looking
  up a block by hash requires an extra level of indirection. The upside is
  that blocks with adjacent heights are adjacent in the database, and many
  common access patterns, such as helping a client sync the chain or doing
  analysis, access blocks in (potentially sparse) height order. In addition,
  the fact that we commit blocks in order means we're writing only to the end
  of the Sled tree, which may help save space.

- Transaction references are stored as a `(height, index)` pair referencing the
  height of the transaction's parent block and the transaction's index in that
  block. This would more traditionally be a `(hash, index)` pair, but because
  we store blocks by height, storing the height saves one level of indirection.
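As an illustrative sketch (not Zebra's actual code), big-endian height keys
make sled's lexicographic byte ordering match numeric height order, so the
finalized tip is simply the last entry of the `hash_by_height` tree:

```rust
fn tip_example(db: &sled::Db, height: u32, hash: [u8; 32]) -> sled::Result<()> {
    let hash_by_height = db.open_tree("hash_by_height")?;

    // BE32(height): big-endian bytes sort in numeric order under sled's
    // lexicographic key ordering.
    hash_by_height.insert(&height.to_be_bytes(), &hash[..])?;

    // The finalized tip is therefore the entry with the highest key.
    if let Some((tip_height_be, tip_hash)) = hash_by_height.last()? {
        assert_eq!(tip_height_be.len(), 4);
        assert_eq!(tip_hash.len(), 32);
    }

    Ok(())
}
```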
## Committing finalized blocks

If the parent block is not committed, add the block to an internal queue for
future processing. Otherwise, commit the block as described below, then
commit any queued children. (Although the checkpointer generates verified
blocks in order when it completes a checkpoint, the blocks are committed in
the response futures, so they may arrive out of order.)

Committing a block to the sled state should be implemented as a wrapper around
a function also called by [`Request::CommitBlock`](#request-commit-block),
which should:

1. Obtain the highest entry of `hash_by_height` as `(old_height, old_tip)`.
   Check that `block`'s parent hash is `old_tip` and its height is
   `old_height + 1`, or panic. This check is performed as defense-in-depth to
   prevent database corruption, but it is the caller's responsibility (e.g.
   the zebra-state service's responsibility) to commit finalized blocks in
   order.

   The genesis block does not have a parent block. For genesis blocks, check
   that `block`'s parent hash is `null` (all zeroes) and its height is `0`.

2. Insert:
   - `(hash, height)` into `height_by_hash`;
   - `(height, hash)` into `hash_by_height`;
   - `(height, block)` into `block_by_height`.

3. If the block is a genesis block, skip any transaction updates.

   (Due to a [bug in zcashd](https://github.com/ZcashFoundation/zebra/issues/559),
   genesis block transactions are ignored during validation.)

4. Update the `sprout_anchors` and `sapling_anchors` trees with the Sprout
   and Sapling anchors.

5. Iterate over the enumerated transactions in the block. For each
   transaction:

   1. Insert `(transaction_hash, BE32(height) || BE32(tx_index))` into
      `tx_by_hash`;

   2. For each `TransparentInput::PrevOut { outpoint, .. }` in the
      transaction's `inputs()`, remove `outpoint` from `utxo_by_outpoint`.

   3. For each `output` in the transaction's `outputs()`, construct the
      `outpoint` that identifies it, and insert `(outpoint, output)` into
      `utxo_by_outpoint`.

   4. For each [`JoinSplit`] description in the transaction, insert
      `(nullifiers[0], ())` and `(nullifiers[1], ())` into
      `sprout_nullifiers`.

   5. For each [`Spend`] description in the transaction, insert
      `(nullifier, ())` into `sapling_nullifiers`.

**Note**: The Sprout and Sapling anchors are the roots of the Sprout and
Sapling note commitment trees that have already been calculated for the last
transaction(s) in the block that have `JoinSplit`s in the Sprout case and/or
`Spend`/`Output` descriptions in the Sapling case. These should be passed as
fields in the `Commit*Block` requests.

[`JoinSplit`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.JoinSplit.html
[`Spend`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.Spend.html

These updates can be performed in a batch or without necessarily iterating
over all transactions, if the data is available by other means; they're
specified this way for clarity.


## Request / Response API
[request-response]: #request-response

The state API is provided by a pair of `Request`/`Response` enums. Each
`Request` variant corresponds to particular `Response` variants, and it's
fine (and encouraged) for caller code to unwrap the expected variants with
`unreachable!` on the unexpected variants. This is slightly inconvenient but
it means that we have a unified state interface with unified backpressure.

This API includes both write and read calls. Spotting `Commit` requests in
code review should not be a problem, but in the future, if we need to
restrict access to write calls, we could implement a wrapper service that
rejects these, and export "read" and "write" frontends to the same inner
service.
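For example, a hypothetical caller that wants the best chain tip can unwrap
the expected variant and treat all others as unreachable:

```rust
use tower::{Service, ServiceExt};

let response = state.ready_and().await?.call(zebra_state::Request::Tip).await?;
let tip = match response {
    zebra_state::Response::Tip(hash) => hash,
    // Tip requests never return any other Response variant.
    _ => unreachable!("Tip requests always return Tip responses"),
};
```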
### `Request::CommitBlock`
[request-commit-block]: #request-commit-block

```rust
CommitBlock {
    block: Arc<Block>,
    sprout_anchor: sprout::tree::Root,
    sapling_anchor: sapling::tree::Root,
}
```

Performs contextual validation of the given block, committing it to the state
if successful. Returns `Response::Added(block::Hash)` with the hash of the
newly committed block, or an error.

### `Request::CommitFinalizedBlock`
[request-commit-finalized-block]: #request-commit-finalized-block

```rust
CommitFinalizedBlock {
    block: Arc<Block>,
    sprout_anchor: sprout::tree::Root,
    sapling_anchor: sapling::tree::Root,
}
```

Commits a finalized block to the sled state, skipping contextual validation.
This is exposed for use in checkpointing, which produces in-order finalized
blocks. Returns `Response::Added(block::Hash)` with the hash of the
committed block if successful.

### `Request::Depth(block::Hash)`
[request-depth]: #request-depth

Computes the depth in the best chain of the block identified by the given
hash, returning

- `Response::Depth(Some(depth))` if the block is in the best chain;
- `Response::Depth(None)` otherwise.

Implemented by querying:

- (non-finalized) the `height_by_hash` map in the best chain, and
- (finalized) the `height_by_hash` tree

### `Request::Tip`
[request-tip]: #request-tip

Returns `Response::Tip(block::Hash)` with the current best chain tip.

Implemented by querying:

- (non-finalized) the highest height block in the best chain
- (finalized) if the non-finalized state is empty, the highest height block
  in the `hash_by_height` tree

### `Request::BlockLocator`
[request-block-locator]: #request-block-locator

Returns `Response::BlockLocator(Vec<block::Hash>)` with hashes starting from
the current chain tip and reaching backwards towards the genesis block. The
first hash is the best chain tip. The last hash is the tip of the finalized
portion of the state. If the finalized and non-finalized states are both
empty, the block locator is also empty.

This can be used by the sync component to request hashes of subsequent
blocks.

Implemented by querying:

- (non-finalized) the `hash_by_height` map in the best chain
- (finalized) the `hash_by_height` tree.

### `Request::Transaction(transaction::Hash)`
[request-transaction]: #request-transaction

Returns

- `Response::Transaction(Some(Transaction))` if the transaction identified by
  the given hash is contained in the state;

- `Response::Transaction(None)` if the transaction identified by the given
  hash is not contained in the state.

Implemented by querying:

- (non-finalized) the `tx_by_hash` map of each chain, starting with the best
  chain (to get the height of the block containing the transaction), and then
  that chain's `blocks` map (to get the block containing the transaction
  data);

and, if the transaction is not in any non-finalized chain:

- (finalized) the `tx_by_hash` tree (to get the height of the block
  containing the transaction), and then the `block_by_height` tree (to get
  the block containing the transaction data).
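The read requests above share a common "non-finalized first, then finalized"
lookup pattern. A minimal sketch for `Depth`, over simplified stand-in types
(`[u8; 32]` hashes, with `best_chain_heights` standing in for the best
chain's in-memory `height_by_hash` map):

```rust
use std::collections::HashMap;
use std::convert::TryInto;

fn depth(
    best_chain_heights: &HashMap<[u8; 32], u32>, // in-memory height_by_hash map
    finalized_height_by_hash: &sled::Tree,       // height_by_hash sled tree
    tip_height: u32,
    hash: [u8; 32],
) -> sled::Result<Option<u32>> {
    // Non-finalized: check the best chain's in-memory map first.
    if let Some(&height) = best_chain_heights.get(&hash) {
        return Ok(Some(tip_height - height));
    }

    // Finalized: fall back to the sled tree.
    if let Some(bytes) = finalized_height_by_hash.get(&hash)? {
        let be32: [u8; 4] = bytes.as_ref().try_into().expect("stored heights are BE32");
        return Ok(Some(tip_height - u32::from_be_bytes(be32)));
    }

    Ok(None)
}
```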
### `Request::Block(block::Hash)`
[request-block]: #request-block

Returns

- `Response::Block(Some(Arc<Block>))` if the block identified by the given
  hash is contained in the state;

- `Response::Block(None)` if the block identified by the given hash is not
  contained in the state.

Implemented by querying:

- (non-finalized) the `height_by_hash` map of each chain, starting with the
  best chain (to get the block height), and then that chain's `blocks` map
  (to get the block data);

and, if the block is not in any non-finalized chain:

- (finalized) the `height_by_hash` tree (to get the block height) and then
  the `block_by_height` tree (to get the block data).


# Drawbacks
[drawbacks]: #drawbacks

- Restarts can cause `zebrad` to redownload up to the last one hundred blocks
  it verified in the best chain, and potentially some recent side-chain
  blocks.

- The service interface puts some extra responsibility on callers to ensure
  it is used correctly, and does not verify the usage is correct at compile
  time.

- The service API is verbose and requires manually unwrapping enums.

- We do not handle reorgs the same way `zcashd` does, and could in theory need
  to delete our entire on-disk state and resync the chain in some
  pathological reorg cases.

- Testnet rollbacks are infrequent, but possible, due to bugs in testnet
  releases. Each testnet rollback will require additional state service code.