diff --git a/book/src/dev/rfcs/0005-state-updates.md b/book/src/dev/rfcs/0005-state-updates.md
index 1129a5c11..04d9eb943 100644
--- a/book/src/dev/rfcs/0005-state-updates.md
+++ b/book/src/dev/rfcs/0005-state-updates.md
@@ -611,7 +611,7 @@ We use the following rocksdb column families:
 | `hash_by_tx_loc` | `TransactionLocation` | `transaction::Hash` | Never |
 | `tx_loc_by_hash` | `transaction::Hash` | `TransactionLocation` | Never |
 | *Transparent* | | | |
-| `utxo_by_out_loc` | `OutputLocation` | `transparent::Output` | Delete |
+| `utxo_by_out_loc` | `OutputLocation` | `Output \|\| AddressLocation` | Delete |
 | `balance_by_transparent_addr` | `transparent::Address` | `Amount \|\| AddressLocation` | Update |
 | `utxo_by_transparent_addr_loc` | `AddressLocation` | `AtLeastOne` | Up/Del |
 | `tx_by_transparent_addr_loc` | `AddressLocation` | `AtLeastOne` | Append |
@@ -622,11 +622,11 @@ We use the following rocksdb column families:
 | *Sapling* | | | |
 | `sapling_nullifiers` | `sapling::Nullifier` | `()` | Never |
 | `sapling_anchors` | `sapling::tree::Root` | `()` | Never |
-| `sapling_note_commitment_tree` | `block::Height` | `sapling::tree::NoteCommitmentTree` | Delete |
+| `sapling_note_commitment_tree` | `block::Height` | `sapling::tree::NoteCommitmentTree` | Never |
 | *Orchard* | | | |
 | `orchard_nullifiers` | `orchard::Nullifier` | `()` | Never |
 | `orchard_anchors` | `orchard::tree::Root` | `()` | Never |
-| `orchard_note_commitment_tree` | `block::Height` | `orchard::tree::NoteCommitmentTree` | Delete |
+| `orchard_note_commitment_tree` | `block::Height` | `orchard::tree::NoteCommitmentTree` | Never |
 | *Chain* | | | |
 | `history_tree` | `block::Height` | `NonEmptyHistoryTree` | Delete |
 | `tip_chain_value_pool` | `()` | `ValueBalance` | Update |
@@ -664,14 +664,47 @@ Derived Formats:
 Each column family handles updates differently, based on its specific consensus rules:
 - Never: Keys are never deleted, values are never updated. The value for each key is inserted once.
 - Delete: Keys can be deleted, but values are never updated. The value for each key is inserted once.
+  - Code called by ReadStateService must ignore deleted keys, or use a read lock.
   - TODO: should we prevent re-inserts of keys that have been deleted?
 - Update: Keys are never deleted, but values can be updated.
+  - Code called by ReadStateService must accept old or new values, or use a read lock.
 - Append: Keys are never deleted, existing values are never updated,
   but sets of values can be extended with more entries.
-- Up/Del: Keys can be deleted, existing entries can be removed,
-  sets of values can be extended with more entries.
+  - Code called by ReadStateService must accept truncated or extended sets, or use a read lock.
+- Up/Del: Keys can be deleted, and values can be added or removed from sets.
+  - Code called by ReadStateService must ignore deleted keys and values,
+    accept truncated or extended sets, and accept old or new values.
+    Or it should use a read lock.
+
+### RocksDB read locks
+[rocksdb-read-locks]: #rocksdb-read-locks
 
-Currently, there are no column families that both delete and update keys.
+The read-only ReadStateService needs to handle concurrent writes and deletes of the finalized
+column families it reads. It must also handle overlaps between the cached non-finalized `Chain`,
+and the current finalized state database.
+
+The StateService uses RocksDB transactions for each block write.
+So ReadStateService queries that only access a single key or value will always see
+a consistent view of the database.
+
+If a ReadStateService query only uses column families that have keys and values appended
+(`Never` in the Updates table above), it should ignore extra appended values.
+Most queries do this by default.
+
+For more complex queries, there are several options:
+
+Reading across multiple column families:
+1. Ignore deleted values using custom Rust code
+2. Take a database snapshot - https://docs.rs/rocksdb/latest/rocksdb/struct.DBWithThreadMode.html#method.snapshot
+
+Reading a single column family:
+3. multi_get - https://docs.rs/rocksdb/latest/rocksdb/struct.DBWithThreadMode.html#method.multi_get_cf
+4. iterator - https://docs.rs/rocksdb/latest/rocksdb/struct.DBWithThreadMode.html#method.iterator_cf
+
+RocksDB also has read transactions, but they don't seem to be exposed in the Rust crate.
+
+### Low-Level Implementation Details
+[rocksdb-low-level]: #rocksdb-low-level
 
 RocksDB ignores duplicate puts and deletes, preserving the latest values.
 If rejecting duplicate puts or deletes is consensus-critical,
@@ -693,6 +726,7 @@ and merge operators are unreliable (or have undocumented behaviour).
 So they should not be used for consensus-critical checks.
 
 ### Notes on rocksdb column families
+[rocksdb-column-families]: #rocksdb-column-families
 
 - The `hash_by_height` and `height_tx_count_by_hash` column families provide a bijection between
   block heights and block hashes. (Since the rocksdb state only stores finalized
@@ -748,32 +782,40 @@ So they should not be used for consensus-critical checks.
   addresses with large UTXO sets. It also stores the `AddressLocation` for each
   address, which allows for efficient lookups.
 
-- `utxo_by_transparent_addr_loc` stores unspent transparent output locations by address.
-  UTXO locations are appended by each block. If an address lookup discovers a UTXO
-  has been spent in `utxo_by_outpoint`, that UTXO location can be deleted from
-  `utxo_by_transparent_addr_loc`. (We don't do these deletions every time a block is
-  committed, because that requires an expensive full index search.)
+- `utxo_by_transparent_addr_loc` stores unspent transparent output locations
+  by address. UTXO locations are appended by each block.
   This list includes the `AddressLocation`, if it has not been spent.
   (This duplicate data is small, and helps simplify the code.)
+
+- When a block write deletes a UTXO from `utxo_by_outpoint`,
+  that UTXO location should be deleted from `utxo_by_transparent_addr_loc`.
+  This is an index optimisation.
 
 - `tx_by_transparent_addr_loc` stores transaction locations by address.
   This list includes transactions containing spent UTXOs.
   It also includes the `TransactionLocation` from the `AddressLocation`.
   (This duplicate data is small, and helps simplify the code.)
 
-- Each `*_note_commitment_tree` stores the note commitment tree state
+- The `sprout_note_commitment_tree` stores the note commitment tree state
   at the tip of the finalized state, for the specific pool. There is always
-  a single entry for those; they are indexed by height just to make testing
-  and debugging easier (so for each block committed, the old tree is
-  deleted and a new one is inserted by its new height). Each tree is stored
+  a single entry. Each tree is stored
+  as a "Merkle tree frontier" which is basically a (logarithmic) subset of
+  the Merkle tree nodes as required to insert new items.
+  For each block committed, the old tree is deleted and a new one is inserted
+  by its new height.
+  **TODO:** store the sprout note commitment tree by `()`,
+  to avoid ReadStateService concurrent write issues.
+
+- The `{sapling, orchard}_note_commitment_tree` stores the note commitment tree
+  state for every height, for the specific pool. Each tree is stored
   as a "Merkle tree frontier" which is basically a (logarithmic) subset of
   the Merkle tree nodes as required to insert new items.
 
 - `history_tree` stores the ZIP-221 history tree state at the tip of the finalized
-  state. There is always a single entry for it; it is indexed by height just
-  to make testing and debugging easier. The tree is stored as the set of "peaks"
+  state. There is always a single entry for it. The tree is stored as the set of "peaks"
   of the "Merkle mountain range" tree structure, which is what is required to
   insert new items.
+  **TODO:** store the history tree by `()`, to avoid ReadStateService concurrent write issues.
 
 - Each `*_anchors` stores the anchor (the root of a Merkle tree) of the note commitment
   tree of a certain block. We only use the keys since we just need the set of anchors,
diff --git a/zebra-state/src/service/finalized_state/zebra_db/chain.rs b/zebra-state/src/service/finalized_state/zebra_db/chain.rs
index c026a97d9..fade94802 100644
--- a/zebra-state/src/service/finalized_state/zebra_db/chain.rs
+++ b/zebra-state/src/service/finalized_state/zebra_db/chain.rs
@@ -90,9 +90,11 @@ impl DiskWriteBatch {
             self.zs_delete(history_tree_cf, h);
         }
 
-        // TODO: just store a single history tree, using `()` as the key,
-        //       and remove the delete (like the chain value pool balances).
-        //       This requires a database version update.
+        // TODO: if we ever need concurrent read-only access to the history tree,
+        //       store it by `()`, not height.
+        //       Otherwise, the ReadStateService could access a height
+        //       that was just deleted by a concurrent StateService write.
+        //       This requires a database version update.
         if let Some(history_tree) = history_tree.as_ref() {
             self.zs_insert(history_tree_cf, height, history_tree);
         }
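
The TODO in the `chain.rs` hunk describes a reader looking up a height that a concurrent write has just deleted. The RFC text lists "ignore deleted values using custom Rust code" as one way to handle this. The following is a minimal, std-only sketch of that option, not the Zebra implementation: `TreeByHeight`, `write_tip`, and `read_tree` are hypothetical names, a `BTreeMap` stands in for the height-keyed column family, and falling back to the highest remaining entry is an assumption about what a reader would want.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a single-entry, height-keyed column family
/// such as `history_tree` (not the real Zebra or RocksDB types).
type TreeByHeight = BTreeMap<u32, String>;

/// Writer side: delete the entry at the old height (if any), then insert
/// the new tree at the new height, like the delete/insert pair in the hunk.
fn write_tip(cf: &mut TreeByHeight, old: Option<u32>, new: u32, tree: &str) {
    if let Some(height) = old {
        cf.remove(&height);
    }
    cf.insert(new, tree.to_string());
}

/// Reader side: `cached_height` may be stale. A missing key is treated as
/// "deleted by a newer write" rather than an error, and the reader falls
/// back to the highest entry that is still present (an assumed policy).
fn read_tree(cf: &TreeByHeight, cached_height: u32) -> Option<(u32, String)> {
    match cf.get(&cached_height) {
        Some(tree) => Some((cached_height, tree.clone())),
        None => cf
            .iter()
            .next_back()
            .map(|(height, tree)| (*height, tree.clone())),
    }
}
```

Storing the tree under `()` instead, as the TODO suggests, would remove the need for the fallback branch, because the key a reader uses could never be deleted underneath it.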