RFC: state updates (#902)

* wip

Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* wip2: add nullifiers

Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* Update book/src/dev/rfcs/0003-state-updates.md

Co-authored-by: teor <teor@riseup.net>

* Move to RFC number 5

* rfc: add PR link to state update RFC

* rfc: change state RFC to store blocks by height.

The rationale for this change is described in the document: it means
that we write blocks only to one end of the Sled tree, and hopefully
helps us with spatial access patterns.

This should help alleviate a major cause of memory use in Zebra's
current WIP Sled structure, which is that:

- blocks are stored in random, sparse order (by hash) in the B-tree;
- the `Request::GetDepth` method opens the entire block store and
  queries a random part of its block data to determine whether a hash is
  present;
- if present, it deserializes the complete block data of both the given
  block and the current tip block, to compute the difference in block
  heights.

This access pattern forces a large amount of B-tree data to remain
resident; storing blocks by height avoids it.

* rfc: add sprout and sapling anchors to sled trees.

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>

* rfc: fill in details of state service requests.

* rfc: extract commit process from API description

* rfc: add anchor parameters to CommitBlock.

These have to be computed by a verifier, so passing them as parameters
means we don't recompute them.

* WIP for in memory state structs

* tweeks from end of session with henry

* more updates from pairing

* rewrite non-finalized state sections

* update query instructions for each request

* more updates

* updates from pairing with henry

* updates from proofreading solo

* add guide level explanation to state rfc

* add drawbacks section

* Update book/src/dev/rfcs/0005-state-updates.md

Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>

* Apply suggestions from code review

Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>

* Update book/src/dev/rfcs/0005-state-updates.md

Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>

* apply changes from code review

* clarify iteration

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* apply changes from code review

* Update book/src/dev/rfcs/0005-state-updates.md

Co-authored-by: teor <teor@riseup.net>

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Apply suggestions from code review

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* add info about default constructing chains when forking from finalized state

* Update book/src/dev/rfcs/0005-state-updates.md

Co-authored-by: teor <teor@riseup.net>

* move contextual verification out of Chain

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
Co-authored-by: Jane Lusby <jane@zfnd.org>
Henry de Valence 2020-09-21 12:05:31 -07:00, committed by GitHub
parent c616c7ffd2
commit 16cc095484
2 changed files with 779 additions and 0 deletions


@@ -14,6 +14,7 @@
- [Parallel Verification](dev/rfcs/0002-parallel-verification.md)
- [Inventory Tracking](dev/rfcs/0003-inventory-tracking.md)
- [Asynchronous Script Verification](dev/rfcs/0004-asynchronous-script-verification.md)
- [State Updates](dev/rfcs/0005-state-updates.md)
- [Diagrams](dev/diagrams.md)
- [Network Architecture](dev/diagrams/zebra-network.md)
- [zebra-checkpoints](dev/zebra-checkpoints.md)


@@ -0,0 +1,778 @@
# State Updates
- Feature Name: state_updates
- Start Date: 2020-08-14
- Design PR: https://github.com/ZcashFoundation/zebra/pull/902
- Zebra Issue: https://github.com/ZcashFoundation/zebra/issues/1049
# Summary
[summary]: #summary
Zebra manages chain state in the `zebra-state` crate, which allows state
queries via asynchronous RPC (in the form of a Tower service). The state
system is responsible for contextual verification in the sense of [RFC2],
checking that new blocks are consistent with the existing chain state before
committing them. This RFC describes how the state is represented internally,
and how state updates are performed.
[RFC2]: ./0002-parallel-verification.md
# Motivation
[motivation]: #motivation
We need to be able to access and modify the chain state, and we want to have
a description of how this happens and what guarantees are provided by the
state service.
# Definitions
[definitions]: #definitions
* **state data**: Any data the state service uses to represent chain state.
* **structural/semantic/contextual verification**: as defined in [RFC2].
* **block chain**: A sequence of valid blocks linked by inclusion of the
previous block hash in the subsequent block. Chains are rooted at the
*genesis* block and extend to a *tip*.
* **chain state**: The state of the ledger after application of a particular
sequence of blocks (state transitions).
* **difficulty**: The cumulative proof-of-work from genesis to the chain tip.
* **best chain**: The chain with the greatest difficulty. This chain
represents the consensus state of the Zcash network and transactions.
* **side chain**: A chain which is not contained in the best chain.
Side chains are pruned at the reorg limit, when they are no longer
connected to the finalized state.
* **chain reorganization**: Occurs when a new best chain is found and the
previous best chain becomes a side chain.
* **reorg limit**: The longest reorganization accepted by Zcashd, 100 blocks.
* **orphaned block**: A block which is no longer included in the best chain.
* **non-finalized state**: State data corresponding to blocks above the reorg
limit. This data can change in the event of a chain reorg.
* **finalized state**: State data corresponding to blocks below the reorg
limit. This data cannot change in the event of a chain reorg.
* **non-finalized tips**: The highest blocks in each non-finalized chain. These
tips might be at different heights.
* **finalized tip**: The highest block in the finalized state. The tip of the best
chain is usually 100 blocks (the reorg limit) above the finalized tip. But it can
be lower during the initial sync, and after a chain reorganization, if the new
best chain is at a lower height.
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation
The `zebra-state` crate provides an implementation of the chain state storage
logic in a Zcash consensus node. Its main responsibilities are to store chain
state, to validate new blocks against the existing chain state as they
arrive, and to answer queries about the stored state. `zebra-state` provides
this interface via a `tower::Service` based on the actor model, with a
request/response interface for passing messages back and forth between the
state service and the rest of the application.
The main entry point for the `zebra-state` crate is the `init` function. This
function takes a `zebra_state::Config` and constructs a new state service,
which it returns wrapped by a `tower::Buffer`. This service is then interacted
with via the `tower::Service` trait.
```rust
use tower::{Service, ServiceExt};
let mut state = zebra_state::on_disk::init(state_config, network);
let request = zebra_state::Request::BlockLocator;
let response = state.ready_and().await?.call(request).await?;
assert!(matches!(response, zebra_state::Response::BlockLocator(_)));
```
**Note**: The `tower::Service` API requires that `ready` is always called
exactly once before each `call`. It is up to users of the zebra state service
to uphold this contract.
The `tower::Buffer` wrapper is `Clone`able, so different tasks can share
access to a common state service and, through it, to the chain state.
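As an illustration of that sharing, this sketch (assuming tower 0.4 and Tokio; the `service_fn` closure is a stand-in for the real state service) wraps a toy service in a `Buffer` and calls it from several tasks:
```rust
use tower::{buffer::Buffer, service_fn, BoxError, Service, ServiceExt};

#[tokio::main]
async fn main() -> Result<(), BoxError> {
    // A toy service standing in for the state service: returns the
    // length of the request string.
    let svc = service_fn(|req: String| async move { Ok::<usize, BoxError>(req.len()) });

    // `Buffer` moves the service into a worker task; the returned
    // handle is cheap to clone and share. The bound of 16 is arbitrary.
    let state = Buffer::new(svc, 16);

    let mut tasks = Vec::new();
    for i in 0..4 {
        let mut state = state.clone();
        tasks.push(tokio::spawn(async move {
            // `ready_and` must be called exactly once before each `call`.
            state.ready_and().await?.call(format!("request-{i}")).await
        }));
    }
    for task in tasks {
        assert!(task.await.unwrap().is_ok());
    }
    Ok(())
}
```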
The set of operations supported by `zebra-state` are encoded in its `Request`
enum. This enum has one variant for each supported operation.
```rust
pub enum Request {
CommitBlock {
block: Arc<Block>,
},
CommitFinalizedBlock {
block: Arc<Block>,
},
Depth(Hash),
Tip,
BlockLocator,
Transaction(Hash),
Block(HashOrHeight),
// .. some variants omitted
}
```
`zebra-state` breaks down its requests into two categories, with different
guarantees for each: requests that modify the state, and requests that do
not. Requests that update the state are guaranteed to run sequentially and
will never race against each other. Requests that read state run
asynchronously and are guaranteed to read at least the state present at the
time the request was processed by the service, or a later state present when
the request's future is executed. The state service avoids race conditions
between read and written state by performing all contextual verification
internally.
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation
## State Components
Zcash (as implemented by `zcashd`) differs from Bitcoin in its treatment of
transaction finality. If a new best chain is detected that does not extend
the previous best chain, blocks at the end of the previous best chain become
orphaned (no longer included in the best chain). Their state updates are
therefore no longer included in the best chain's chain state. The process of
rolling back orphaned blocks and applying new blocks is called a chain
reorganization. Bitcoin allows chain reorganizations of arbitrary depth,
while `zcashd` limits chain reorganizations to 100 blocks. (In `zcashd`, the
new best chain must be a side-chain that forked within 100 blocks of the tip
of the current best chain.)
This difference means that in Bitcoin, chain state only has probabilistic
finality, while in Zcash, chain state is final once it is beyond the reorg
limit. To simplify our implementation, we split the representation of the
state data at the finality boundary provided by the reorg limit.
State data from blocks *above* the reorg limit (*non-finalized state*) is
stored in-memory and handles multiple chains. State data from blocks *below*
the reorg limit (*finalized state*) is stored persistently using `sled` and
only tracks a single chain. This simplifies our state handling, because only
finalized data is persistent and the logic for finalized data maintains fewer
invariants.
One downside of this design is that restarting the node loses the last 100
blocks, but node restarts are relatively infrequent and a short re-sync is
cheap relative to the cost of additional implementation complexity.
Another downside of this design is that we do not achieve exactly the same
behavior as Zcashd in the event of a 51% attack: Zcashd limits *each* chain
reorganization to 100 blocks, but permits multiple reorgs, while Zebra limits
*all* chain reorgs to 100 blocks. In the event of a successful 51% attack on
Zcash, this could be resolved by wiping the Sled state and re-syncing the new
chain, but in this scenario there are worse problems.
## Service Interface
[service-interface]: #service-interface
The state is accessed asynchronously through a Tower service interface.
Determining what guarantees the state service can and should provide to the
rest of the application requires considering two sets of behaviors:
1. behaviors related to the state's external API (a `Buffer`ed `tower::Service`);
2. behaviors related to the state's internal implementation (using `sled`).
Making this distinction helps us to ensure we don't accidentally leak
"internal" behaviors into "external" behaviors, which would violate
encapsulation and make it more difficult to replace `sled`.
In the first category, our state is presented to the rest of the application
as a `Buffer`ed `tower::Service`. The `Buffer` wrapper allows shared access
to a service using an actor model, moving the service to be shared into a
worker task and passing messages to it over a multi-producer, single-consumer
(mpsc) channel. The worker task receives messages and makes `Service::call`s.
The `Service::call` method returns a `Future`, and the service is allowed to
decide how much work it wants to do synchronously (in `call`) and how much
work it wants to do asynchronously (in the `Future` it returns).
This means that our external API ensures that the state service sees a
linearized sequence of state requests, although the exact ordering is
unpredictable when there are multiple senders making requests.
In the second category, the Sled API presents itself synchronously, but
database and tree handles are cloneable and can be moved between threads. All
that's required to process a request asynchronously is to clone the
appropriate handle, move it into an async block, and make the call as part of
the future. (We might want to use Tokio's blocking API for this, but that is
an implementation detail.)
Because the state service has exclusive access to the sled database, and the
state service sees a linearized sequence of state requests, we have an easy
way to opt in to asynchronous database access. We can perform sled operations
synchronously in the `Service::call`, waiting for them to complete, and be
sure that all future requests will see the resulting sled state. Or, we can
perform sled operations asynchronously in the future returned by
`Service::call`.
If we perform all *writes* synchronously and allow reads to be either
synchronous or asynchronous, we ensure that writes cannot race each other.
Asynchronous reads are guaranteed to read at least the state present at the
time the request was processed, or a later state.
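As a minimal sketch of this pattern (assuming sled 0.34 and Tokio, with an illustrative key and value), the following example performs a synchronous write followed by an asynchronous read through a cloned tree handle:
```rust
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let db = sled::Config::new().temporary(true).open()?;
    let hash_by_height = db.open_tree("hash_by_height")?;

    // A write, performed synchronously (as it would be in `Service::call`):
    // by the time `call` returns, the new state is visible to all readers.
    hash_by_height.insert(0u32.to_be_bytes(), vec![0u8; 32])?;

    // A read, performed asynchronously: clone the tree handle, move it
    // into the future, and run the blocking call off the async threads.
    let reader = hash_by_height.clone();
    let value = tokio::task::spawn_blocking(move || reader.get(0u32.to_be_bytes())).await??;
    assert_eq!(value, Some(vec![0u8; 32].into()));
    Ok(())
}
```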
### Summary
- **Sled reads** may be done synchronously (in `call`) or asynchronously (in
the `Future`), depending on the context;
- **Sled writes** must be done synchronously (in `call`)
## In-memory data structures
[in-memory]: #in-memory
At a high level, the in-memory data structures store a collection of chains,
each rooted at the highest finalized block. Each chain consists of a map from
heights to blocks. Chains are stored using an ordered map from cumulative work to
chains, so that the map ordering is the ordering of best to worst chains.
### The `Chain` type
[chain-type]: #chain-type
The `Chain` type represents a chain of blocks. Each block represents an
incremental state update, and the `Chain` type caches the cumulative state
update from its root to its tip.
The `Chain` type is used to represent the non-finalized portion of a complete
chain of blocks rooted at the genesis block. The parent block of the root of
a `Chain` is the tip of the finalized portion of the chain. As an exception, the finalized
portion of the chain is initially empty, until the genesis block has been finalized.
The `Chain` type supports several operations for manipulating chains:
`push`, `pop_root`, and `fork`. `push` is the most fundamental operation and
handles contextual validation of chains as they are extended. `pop_root` is
provided for finalization, and is how we move blocks from the non-finalized
portion of the state to the finalized portion. `fork`, on the other hand,
handles creating new chains for `push` when new blocks arrive whose parent
isn't the tip of an existing chain.
**Note:** The `Chain` type's API is only designed to handle non-finalized
data. The genesis block and all pre-Sapling blocks are always considered
finalized blocks and should not be handled via the `Chain` type through
`CommitBlock`. They should instead be committed directly to the finalized
state with `CommitFinalizedBlock`. This is particularly important for the
genesis block, since the `Chain` type will panic if used while the finalized
state is completely empty.
The `Chain` type is defined by the following struct and API:
```rust
struct Chain {
blocks: BTreeMap<block::Height, Arc<Block>>,
height_by_hash: HashMap<block::Hash, block::Height>,
    tx_by_hash: HashMap<transaction::Hash, (block::Height, usize)>, // (height, tx_index)
utxos: HashSet<transparent::Output>,
sapling_anchors: HashSet<sapling::tree::Root>,
sprout_anchors: HashSet<sprout::tree::Root>,
sapling_nullifiers: HashSet<sapling::Nullifier>,
sprout_nullifiers: HashSet<sprout::Nullifier>,
partial_cumulative_work: PartialCumulativeWork,
}
```
#### `pub fn push(&mut self, block: Arc<Block>)`
Push a block into a chain as the new tip; a sketch of this operation follows
the list below.
1. Update cumulative data members
- Add block to end of `self.blocks`
- Add hash to `height_by_hash`
- for each `transaction` in `block`
- add key: `transaction.hash` and value: `(height, tx_index)` to `tx_by_hash`
- Add new utxos and remove consumed utxos from `self.utxos`
- Add anchors to the appropriate `self.<version>_anchors`
- Add nullifiers to the appropriate `self.<version>_nullifiers`
- Add work to `self.partial_cumulative_work`
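A hedged sketch of `push` along these lines, assuming illustrative `hash()`, `height()`, and `work()` helpers (not the real zebra-chain API) and eliding the UTXO, anchor, and nullifier bookkeeping:
```rust
// Sketch only: helper methods and the elided updates are assumptions.
pub fn push(&mut self, block: Arc<Block>) {
    let height = block.height();
    self.height_by_hash.insert(block.hash(), height);

    for (tx_index, transaction) in block.transactions.iter().enumerate() {
        self.tx_by_hash
            .insert(transaction.hash(), (height, tx_index));
        // ... add created UTXOs, remove spent UTXOs, and record the
        // block's anchors and nullifiers here ...
    }

    self.partial_cumulative_work += block.work();
    self.blocks.insert(height, block);
}
```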
#### `pub fn pop_root(&mut self) -> Arc<Block>`
Remove the lowest height block of the non-finalized portion of a chain.
1. Remove the lowest height block from `self.blocks`
2. Update cumulative data members
- Remove the block's hash from `self.height_by_hash`
- for each `transaction` in `block`
- remove `transaction.hash` from `tx_by_hash`
- Remove new utxos from `self.utxos`
- Remove the anchors from the appropriate `self.<version>_anchors`
- Remove the nullifiers from the appropriate `self.<version>_nullifiers`
3. Return the block
#### `pub fn fork(&self, new_tip: block::Hash) -> Option<Self>`
Fork a chain at the block with the given hash, if it is part of this chain;
a sketch follows the list below.
1. If `self` does not contain `new_tip` return `None`
2. Clone self as `forked`
3. While the tip of `forked` is not equal to `new_tip`
- call `forked.pop_tip()` and discard the old tip
4. Return `forked`
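Expressed in terms of `pop_tip` (described next), `fork` might look like the following sketch, assuming `Chain: Clone` and a hypothetical `tip_hash()` helper that returns the hash of the highest block:
```rust
pub fn fork(&self, new_tip: block::Hash) -> Option<Self> {
    // 1. The new tip must be part of this chain.
    if !self.height_by_hash.contains_key(&new_tip) {
        return None;
    }
    // 2-3. Clone the chain, then pop blocks until `new_tip` is the tip.
    let mut forked = self.clone();
    while forked.tip_hash() != new_tip {
        forked.pop_tip();
    }
    // 4. Return the forked chain.
    Some(forked)
}
```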
#### `fn pop_tip(&mut self) -> Arc<Block>`
Remove the highest height block of the non-finalized portion of a chain.
1. Remove the highest height `block` from `self.blocks`
2. Update cumulative data members
- Remove the corresponding hash from `self.height_by_hash`
- for each `transaction` in `block`
- remove `transaction.hash` from `tx_by_hash`
- Add consumed utxos and remove new utxos from `self.utxos`
- Remove anchors from the appropriate `self.<version>_anchors`
- Remove the nullifiers from the appropriate `self.<version>_nullifiers`
- Subtract work from `self.partial_cumulative_work`
3. Return the block
#### `Ord`
The `Chain` type implements `Ord` for reorganizing chains. First chains
are compared by their `partial_cumulative_work`. Ties are then broken by
comparing `block::Hash`es of the tips of each chain.
**Note**: Unlike `zcashd`, Zebra does not use block arrival times as a
tie-breaker for the best tip. Since Zebra downloads blocks in parallel,
download times are not guaranteed to be unique. Using the `block::Hash`
provides a consistent tip order. (As a side-effect, the tip order is also
consistent after a node restart, and between nodes.)
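The following self-contained sketch demonstrates this ordering with stand-in types for work and hashes; a `BTreeSet<Chain>` then yields the best chain as its last element:
```rust
use std::cmp::Ordering;
use std::collections::BTreeSet;

// Stand-in types for illustration only; the real types live in zebra-chain.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Work(u128);

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
struct Hash([u8; 32]);

#[derive(PartialEq, Eq, Debug)]
struct Chain {
    partial_cumulative_work: Work,
    tip_hash: Hash,
}

impl Ord for Chain {
    // Compare by cumulative work, breaking ties by the hash of the tip block.
    fn cmp(&self, other: &Self) -> Ordering {
        self.partial_cumulative_work
            .cmp(&other.partial_cumulative_work)
            .then_with(|| self.tip_hash.cmp(&other.tip_hash))
    }
}

impl PartialOrd for Chain {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

fn main() {
    let mut chains = BTreeSet::new();
    chains.insert(Chain { partial_cumulative_work: Work(10), tip_hash: Hash([1; 32]) });
    chains.insert(Chain { partial_cumulative_work: Work(10), tip_hash: Hash([2; 32]) });
    chains.insert(Chain { partial_cumulative_work: Work(7), tip_hash: Hash([3; 32]) });

    // The best chain is the last element of the ordered set.
    let best = chains.iter().next_back().unwrap();
    assert_eq!(best.tip_hash, Hash([2; 32]));
}
```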
#### `Default`
The `Chain` type implements `Default` for constructing new chains whose
parent block is the tip of the finalized state. This implementation should be
handled by `#[derive(Default)]`.
1. initialise cumulative data members
- Construct an empty `self.blocks`, `height_by_hash`, `tx_by_hash`,
`self.utxos`, `self.<version>_anchors`, `self.<version>_nullifiers`
- Zero `self.partial_cumulative_work`
**Note:** The chain can be empty if:
- after a restart - the non-finalized state is empty
- during a fork from the finalized tip - the forked Chain is empty, because
all its blocks have been `pop`ped
### `ChainSet` Type
[chainset-type]: #chainset-type
The `ChainSet` type represents the set of all non-finalized state. It
consists of a set of non-finalized but verified chains and a set of
unverified blocks which are waiting for the full context needed to verify
them to become available.
`ChainSet` is defined by the following structure and API:
```rust
struct ChainSet {
chains: BTreeSet<Chain>,
queued_blocks: BTreeMap<block::Hash, QueuedBlock>,
queued_by_parent: BTreeMap<block::Hash, Vec<block::Hash>>,
queued_by_height: BTreeMap<block::Height, Vec<block::Hash>>,
}
```
#### `pub fn finalize(&mut self) -> Arc<Block>`
Finalize the lowest height block in the non-finalized portion of the best
chain, and update all side chains to match.
1. Extract the best chain from `self.chains` into `best_chain`
2. Extract the rest of the chains into a `side_chains` temporary variable, so
they can be mutated
3. Remove the lowest height block from the best chain with
`let block = best_chain.pop_root();`
4. Add `best_chain` back to `self.chains`
5. For each remaining `chain` in `side_chains`
- If `chain` starts with `block`, remove `block` and add `chain` back to
`self.chains`
- Else, drop `chain`
6. calculate the new finalized tip height from the new `best_chain`
7. for each `height` in `self.queued_by_height` where the height is lower than the
new reorg limit
- for each `hash` in `self.queued_by_height.remove(height)`
- Remove the key `hash` from `self.queued_blocks` and store the removed `block`
- Find and remove `hash` from `self.queued_by_parent` using `block.parent`'s hash
8. Return `block`
#### `pub fn queue(&mut self, block: QueuedBlock)`
Queue a non-finalized block to be committed to the state.
After queueing a non-finalized block, this method checks whether the newly
queued block (and any of its descendants) can be committed to the state.
1. Check if the parent block exists in any current chain
2. If it does, call `let ret = self.commit_block(block);`
   - Call `self.process_queued(new_parents)` if `ret` is `Some`
3. Otherwise, add `block` to `self.queued_blocks` and the related members and return
#### `fn process_queued(&mut self, new_parent: block::Hash)`
1. Create a list of `new_parents` and populate it with `new_parent`
2. While `let Some(parent) = new_parents.pop()`
   - for each `hash` in `self.queued_by_parent.remove(&parent.hash)`
     - lookup the `block` for `hash`
     - remove `block` from `self.queued_blocks`
     - remove `hash` from `self.queued_by_height`
     - `let result = self.commit_block(block);`
     - add `result` to `new_parents`
#### `fn commit_block(&mut self, block: QueuedBlock) -> Option<block::Hash>`
Try to commit `block` to the non-finalized state. Returns `None` if the block
cannot be committed due to missing context.
1. Search for the first chain where `block.parent` == `chain.tip`. If it exists:
   - push `block` onto that chain
   - broadcast `result` via `block.rsp_tx`
   - return `Some(block.hash)` if `result.is_ok()`
2. Find the first chain that contains `block.parent` and fork it with
   `block.parent` as the new tip
   - `let fork = self.chains.iter().find_map(|chain| chain.fork(block.parent));`
3. If `fork` is `Some`
   - push `block` onto that chain
   - add `fork` to `self.chains`
   - broadcast `result` via `block.rsp_tx`
   - return `Some(block.hash)` if `result.is_ok()`
4. Else panic: this should be unreachable, because `commit_block` is only
   called when the block is ready to be committed.
### Summary
- `Chain` represents the non-finalized portion of a single chain
- `ChainSet` represents the non-finalized portion of all chains and all
unverified blocks that are waiting for context to be available.
- `ChainSet::queue` handles queueing and/or committing blocks and
  reorganizing chains (via `commit_block`) but not finalizing them
- Finalized blocks are returned from `finalize` and must still be committed
to disk afterwards
- `finalize` handles pruning queued blocks that are past the reorg limit
## Committing non-finalized blocks
Given the above structures for manipulating the non-finalized state, new
non-finalized blocks are committed as follows. First we commit the block to
the in-memory state, then we finalize all lowest height blocks that are past
the reorg limit, and finally we process any queued blocks and prune any that
are now past the reorg limit.
1. Run contextual validation on `block` against the finalized and
   non-finalized state
2. If `block.parent` == `finalized_tip.hash`
   - Construct a new `Chain` with `Chain::default`
   - push `block` onto that chain
   - add the new chain to `chain_set.chains`
   - broadcast `result` via `block.rsp_tx`
   - return `Some(block.hash)` if `result.is_ok()`
3. commit or queue the block to the non-finalized state with
`chain_set.queue(block);`
4. If the best chain is longer than the reorg limit
- Finalize all lowest height blocks in the best chain, and commit them to
disk with `CommitFinalizedBlock`:
```rust
while chain_set.best_chain().len() > reorg_limit {
    let finalized = chain_set.finalize();
    let request = CommitFinalizedBlock { block: finalized };
    sled_state.ready_and().await?.call(request).await?;
}
```
## Sled data structures
[sled]: #sled
Sled provides a persistent, thread-safe `BTreeMap<&[u8], &[u8]>`. Each map is
a distinct "tree". Keys are sorted using lex order on byte strings, so
integer values should be stored using big-endian encoding (so that the lex
order on byte strings is the numeric ordering).
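As a quick self-contained check of this property (assuming sled 0.34 and Rust 2021; the tree name matches the table below):
```rust
fn main() -> sled::Result<()> {
    let db = sled::Config::new().temporary(true).open()?;
    let tree = db.open_tree("hash_by_height")?;
    for height in [2u32, 256, 1] {
        tree.insert(height.to_be_bytes(), vec![0u8; 32])?;
    }
    let heights: Vec<u32> = tree
        .iter()
        .keys()
        .map(|key| u32::from_be_bytes(key.unwrap().as_ref().try_into().unwrap()))
        .collect();
    // With little-endian keys, 256 ([0x00, 0x01, 0x00, 0x00]) would sort
    // before 1 ([0x01, 0x00, 0x00, 0x00]); big-endian keys sort numerically.
    assert_eq!(heights, vec![1, 2, 256]);
    Ok(())
}
```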
We use the following Sled trees:
| Tree | Keys | Values |
|----------------------|-----------------------|-------------------------------------|
| `hash_by_height` | `BE32(height)` | `block::Hash` |
| `height_by_hash` | `block::Hash` | `BE32(height)` |
| `block_by_height` | `BE32(height)` | `Block` |
| `tx_by_hash` | `transaction::Hash` | `BE32(height) || BE32(tx_index)` |
| `utxo_by_outpoint` | `OutPoint` | `TransparentOutput` |
| `sprout_nullifiers` | `sprout::Nullifier` | `()` |
| `sapling_nullifiers` | `sapling::Nullifier` | `()` |
| `sprout_anchors` | `sprout::tree::Root` | `()` |
| `sapling_anchors` | `sapling::tree::Root` | `()` |
Zcash structures are encoded using `ZcashSerialize`/`ZcashDeserialize`.
**Note:** We do not store the cumulative work for the finalized chain, because the finalized work is the same for all non-finalized chains, so the partial non-finalized work is sufficient to order chains and choose the best chain.
### Notes on Sled trees
- The `hash_by_height` and `height_by_hash` trees provide the bijection between
block heights and block hashes. (Since the Sled state only stores finalized
state, this is actually a bijection).
- Blocks are stored by height, not by hash. This has the downside that looking
up a block by hash requires an extra level of indirection. The upside is
that blocks with adjacent heights are adjacent in the database, and many
common access patterns, such as helping a client sync the chain or doing
analysis, access blocks in (potentially sparse) height order. In addition,
the fact that we commit blocks in order means we're writing only to the end
of the Sled tree, which may help save space.
- Transaction references are stored as a `(height, index)` pair referencing the
height of the transaction's parent block and the transaction's index in that
block. This would more traditionally be a `(hash, index)` pair, but because
we store blocks by height, storing the height saves one level of indirection.
## Committing finalized blocks
If the parent block is not committed, add the block to an internal queue for
future processing. Otherwise, commit the block as described below, then
commit any queued children. (Although the checkpointer generates verified
blocks in order when it completes a checkpoint, the blocks are committed in
the response futures, so they may arrive out of order.)
Committing a block to the sled state should be implemented as a wrapper around
a function also called by [`Request::CommitBlock`](#request-commit-block),
which should:
1. Obtain the highest entry of `hash_by_height` as `(old_height, old_tip)`.
Check that `block`'s parent hash is `old_tip` and its height is
`old_height+1`, or panic. This check is performed as defense-in-depth to
prevent database corruption, but it is the caller's responsibility (e.g. the
zebra-state service's responsibility) to commit finalized blocks in order.
The genesis block does not have a parent block. For genesis blocks,
check that `block`'s parent hash is `null` (all zeroes) and its height is `0`.
2. Insert:
- `(hash, height)` into `height_by_hash`;
- `(height, hash)` into `hash_by_height`;
- `(height, block)` into `block_by_height`.
3. If the block is a genesis block, skip any transaction updates.
   (Due to a [bug in zcashd](https://github.com/ZcashFoundation/zebra/issues/559), genesis block transactions
   are ignored during validation.)
4. Update the `sprout_anchors` and `sapling_anchors` trees with the Sprout and Sapling anchors.
5. Iterate over the enumerated transactions in the block. For each transaction:
   1. Insert `(transaction_hash, BE32(height) || BE32(tx_index))` to
      `tx_by_hash`;
   2. For each `TransparentInput::PrevOut { outpoint, .. }` in the
      transaction's `inputs()`, remove `outpoint` from `utxo_by_outpoint`.
   3. For each `output` in the transaction's `outputs()`, construct the
      `outpoint` that identifies it, and insert `(outpoint, output)` into
      `utxo_by_outpoint`.
   4. For each [`JoinSplit`] description in the transaction,
      insert `(nullifiers[0], ())` and `(nullifiers[1], ())` into
      `sprout_nullifiers`.
   5. For each [`Spend`] description in the transaction, insert
      `(nullifier, ())` into `sapling_nullifiers`.
**Note**: The Sprout and Sapling anchors are the roots of the Sprout and
Sapling note commitment trees that have already been calculated for the last
transaction(s) in the block that have `JoinSplit`s in the Sprout case and/or
`Spend`/`Output` descriptions in the Sapling case. These should be passed as
fields in the `Commit*Block` requests.
[`JoinSplit`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.JoinSplit.html
[`Spend`]: https://doc.zebra.zfnd.org/zebra_chain/transaction/struct.Spend.html
These updates can be performed in a batch or without necessarily iterating
over all transactions, if the data is available by other means; they're
specified this way for clarity.
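As an illustration, step 2 of this process might be sketched as follows; the `hash`, `height`, and `block_bytes` parameters are assumed to be already computed and serialized by the caller:
```rust
// Sketch of step 2: write the three block indexes for one finalized block.
fn insert_block_indexes(
    hash_by_height: &sled::Tree,
    height_by_hash: &sled::Tree,
    block_by_height: &sled::Tree,
    hash: [u8; 32],
    height: u32,
    block_bytes: Vec<u8>,
) -> sled::Result<()> {
    // Heights are big-endian so that lex order matches numeric order.
    let height_key = height.to_be_bytes();
    hash_by_height.insert(height_key, &hash[..])?;
    height_by_hash.insert(&hash[..], &height_key[..])?;
    block_by_height.insert(height_key, block_bytes)?;
    Ok(())
}
```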
## Request / Response API
[request-response]: #request-response
The state API is provided by a pair of `Request`/`Response` enums. Each
`Request` variant corresponds to particular `Response` variants, and it's
fine (and encouraged) for caller code to unwrap the expected variants with
`unreachable!` on the unexpected variants. This is slightly inconvenient but
it means that we have a unified state interface with unified backpressure.
This API includes both write and read calls. Spotting `Commit` requests in
code review should not be a problem, but in the future, if we need to
restrict access to write calls, we could implement a wrapper service that
rejects these, and export "read" and "write" frontends to the same inner service.
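For example, a caller expecting a tip might unwrap the response as in this sketch, where `state` is the buffered service from the guide-level example:
```rust
let response = state.ready_and().await?.call(zebra_state::Request::Tip).await?;
let tip = match response {
    zebra_state::Response::Tip(hash) => hash,
    _ => unreachable!("wrong response variant for Request::Tip"),
};
```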
### `Request::CommitBlock`
[request-commit-block]: #request-commit-block
```rust
CommitBlock {
block: Arc<Block>,
sprout_anchor: sprout::tree::Root,
sapling_anchor: sapling::tree::Root,
}
```
Performs contextual validation of the given block, committing it to the state
if successful. Returns `Response::Added(block::Hash)` with the hash of
the newly committed block, or an error.
### `Request::CommitFinalizedBlock`
[request-commit-finalized-block]: #request-commit-finalized-block
```rust
CommitFinalizedBlock {
block: Arc<Block>,
sprout_anchor: sprout::tree::Root,
sapling_anchor: sapling::tree::Root,
}
```
Commits a finalized block to the sled state, skipping contextual validation.
This is exposed for use in checkpointing, which produces in-order finalized
blocks. Returns `Response::Added(block::Hash)` with the hash of the
committed block if successful.
### `Request::Depth(block::Hash)`
[request-depth]: #request-depth
Computes the depth in the best chain of the block identified by the given
hash, returning
- `Response::Depth(Some(depth))` if the block is in the best chain;
- `Response::Depth(None)` otherwise.
Implemented by querying:
- (non-finalized) the `height_by_hash` map in the best chain, and
- (finalized) the `height_by_hash` tree
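In other words, the depth is the difference between the tip height and the block's height. A sketch, assuming hypothetical lookup helpers and a tuple-struct `block::Height`:
```rust
fn depth(&self, hash: block::Hash) -> Option<u32> {
    // Check the best non-finalized chain first, then the finalized state.
    let height = self
        .best_chain_height_by_hash(hash)
        .or_else(|| self.finalized_height_by_hash(hash))?;
    Some(self.tip_height().0 - height.0)
}
```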
### `Request::Tip`
[request-tip]: #request-tip
Returns `Response::Tip(block::Hash)` with the current best chain tip.
Implemented by querying:
- (non-finalized) the highest height block in the best chain;
- (finalized) if the non-finalized state is empty, the highest height block
  in the `hash_by_height` tree
### `Request::BlockLocator`
[request-block-locator]: #request-block-locator
Returns `Response::BlockLocator(Vec<block::Hash>)` with hashes starting from
the current chain tip and reaching backwards towards the genesis block. The
first hash is the best chain tip. The last hash is the tip of the finalized
portion of the state. If the finalized and non-finalized states are both
empty, the block locator is also empty.
This can be used by the sync component to request hashes of subsequent
blocks.
Implemented by querying:
- (non-finalized) the `hash_by_height` map in the best chain
- (finalized) the `hash_by_height` tree.
### `Request::Transaction(Hash)`
[request-transaction]: #request-transaction
Returns
- `Response::Transaction(Some(Transaction))` if the transaction identified by
the given hash is contained in the state;
- `Response::Transaction(None)` if the transaction identified by the given
hash is not contained in the state.
Implemented by querying:
- (non-finalized) the `tx_by_hash` map of each chain, starting with the best
  chain, to find the height of the block containing the transaction, then
  looking up that block in the chain's `blocks` map (to get the transaction
  data);
then, if the transaction is not in any non-finalized chain:
- (finalized) the `tx_by_hash` tree (to get the height of the block that
  contains the transaction) and then the `block_by_height` tree (to get the
  block containing the transaction data).
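A sketch of this two-level lookup, assuming a hypothetical `chains_best_first()` iterator and a `finalized_transaction` helper for the sled fallback:
```rust
fn transaction(&self, hash: transaction::Hash) -> Option<Arc<Transaction>> {
    // Non-finalized: find the containing block via each chain's tx_by_hash.
    for chain in self.chains_best_first() {
        if let Some((height, tx_index)) = chain.tx_by_hash.get(&hash) {
            let block = &chain.blocks[height];
            return Some(block.transactions[*tx_index].clone());
        }
    }
    // Finalized: tx_by_hash tree, then block_by_height tree.
    self.finalized_transaction(hash)
}
```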
### `Request::Block(BlockHeaderHash)`
[request-block]: #request-block
Returns
- `Response::Block(Some(Arc<Block>))` if the block identified by the given
  hash is contained in the state;
- `Response::Block(None)` if the block identified by the given hash is not
  contained in the state.
Implemented by querying:
- (non-finalized) the `height_by_hash` map of each chain, starting with the
  best chain, then looking up the block in that chain's `blocks` map (to get
  the block data);
then, if the block is not in any non-finalized chain:
- (finalized) the `height_by_hash` tree (to get the block height) and then
  the `block_by_height` tree (to get the block data).
# Drawbacks
[drawbacks]: #drawbacks
- Restarts can cause `zebrad` to redownload up to the last one hundred blocks
  it verified in the best chain, and potentially some recent side-chain blocks.
- The service interface puts some extra responsibility on callers to ensure
  it is used correctly, and does not verify correct usage at compile time.
- The service API is verbose and requires manually unwrapping enums.
- We do not handle reorgs the same way zcashd does, and could in theory need
  to delete our entire on-disk state and re-sync the chain in some
  pathological reorg cases.
- Testnet rollbacks are infrequent, but possible, due to bugs in testnet
  releases. Each testnet rollback will require additional state service code.