change(doc): Simplify the database design using prefix iterators (#3916)

* Simplify the database design using prefix iterators

* Fix typos and missed changes

Co-authored-by: Marek <mail@marek.onl>

teor 2022-03-22 08:38:23 +10:00 committed by GitHub
parent 17056f3c00
commit 2b51f9098d
1 changed files with 26 additions and 25 deletions

@@ -604,7 +604,7 @@ We use the following rocksdb column families:
 | ------------------------------ | ---------------------- | ----------------------------------- | ------- |
 | *Blocks* | | | |
 | `hash_by_height` | `block::Height` | `block::Hash` | Never |
-| `height_tx_count_by_hash` | `block::Hash` | `HeightTransactionCount` | Never |
+| `height_by_hash` | `block::Hash` | `block::Height` | Never |
 | `block_header_by_height` | `block::Height` | `block::Header` | Never |
 | *Transactions* | | | |
 | `tx_by_loc` | `TransactionLocation` | `Transaction` | Never |
@@ -639,7 +639,6 @@ Block and Transaction Data:
 - `TransactionIndex`: 16 bits, big-endian, unsigned (max ~23,000 transactions in the 2 MB block limit)
 - `TransactionCount`: same as `TransactionIndex`
 - `TransactionLocation`: `Height \|\| TransactionIndex`
-- `HeightTransactionCount`: `Height \|\| TransactionCount`
 - `OutputIndex`: 24 bits, big-endian, unsigned (max ~223,000 transfers in the 2 MB block limit)
 - transparent and shielded input indexes, and shielded output indexes: 16 bits, big-endian, unsigned (max ~49,000 transfers in the 2 MB block limit)
 - `OutputLocation`: `TransactionLocation \|\| OutputIndex`
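
The concatenated big-endian formats in this hunk are what make prefix iteration work: every key for a given height shares the same leading bytes, and byte-wise ordering matches chain order. The following is a minimal sketch of such an encoding, not the project's actual serialization code. The function names are hypothetical, and the 24-bit `Height` width is an assumption, since this hunk does not state it.

```rust
/// Assumption: `Height` is serialized as 3 big-endian bytes (24 bits).
/// The hunk above does not state the width of `Height`.
const HEIGHT_LEN: usize = 3;

/// `OutputIndex` is 24 bits, as stated in the format list.
const OUTPUT_INDEX_LEN: usize = 3;

/// Keep the low `N` bytes of `value`, most significant byte first.
/// Values that do not fit in `N` bytes are silently truncated in this sketch.
fn be_bytes<const N: usize>(value: u32) -> [u8; N] {
    let full = value.to_be_bytes(); // always 4 bytes
    let mut out = [0u8; N];
    out.copy_from_slice(&full[4 - N..]);
    out
}

/// Hypothetical helper: `TransactionLocation` = `Height || TransactionIndex`.
fn tx_location_key(height: u32, tx_index: u16) -> Vec<u8> {
    let mut key = Vec::with_capacity(HEIGHT_LEN + 2);
    key.extend_from_slice(&be_bytes::<{ HEIGHT_LEN }>(height));
    key.extend_from_slice(&tx_index.to_be_bytes());
    key
}

/// Hypothetical helper: `OutputLocation` = `TransactionLocation || OutputIndex`.
fn output_location_key(height: u32, tx_index: u16, output_index: u32) -> Vec<u8> {
    let mut key = tx_location_key(height, tx_index);
    key.extend_from_slice(&be_bytes::<{ OUTPUT_INDEX_LEN }>(output_index));
    key
}
```

Because every field is fixed-width and big-endian, comparing two keys as byte strings gives the same result as comparing `(height, tx_index, output_index)` tuples, so a database that sorts keys lexicographically stores them in chain order.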
@@ -728,28 +727,27 @@ So they should not be used for consensus-critical checks.
 ### Notes on rocksdb column families
 [rocksdb-column-families]: #rocksdb-column-families
-- The `hash_by_height` and `height_tx_count_by_hash` column families provide a bijection between
+- The `hash_by_height` and `height_by_hash` column families provide a bijection between
 block heights and block hashes. (Since the rocksdb state only stores finalized
 state, they are actually a bijection).
-- Similarly, the `tx_by_hash` and `hash_by_tx` column families provide a bijection between
+- Similarly, the `tx_loc_by_hash` and `hash_by_tx_loc` column families provide a bijection between
 transaction locations and transaction hashes.
 - The `block_header_by_height` column family provides a bijection between block
 heights and block header data. There is no corresponding `height_by_block` column
-family: instead, hash the block, and use the hash from `height_tx_count_by_hash`. (Since the
-rocksdb state only stores finalized state, they are actually a bijection).
+family: instead, hash the block header, and use the hash from `height_by_hash`.
+(Since the rocksdb state only stores finalized state, they are actually a bijection).
 Similarly, there are no column families that go from transaction data
-to transaction locations: hash the transaction and use `tx_by_hash`.
+to transaction locations: hash the transaction and use `tx_loc_by_hash`.
 - Block headers and transactions are stored separately in the database,
 so that individual transactions can be accessed efficiently.
 Blocks can be re-created on request using the following process:
-- Look up `height` and `tx_count` in `height_tx_count_by_hash`
+- Look up `height` in `height_by_hash`
 - Get the block header for `height` from `block_header_by_height`
-- Use [`prefix_iterator`](https://docs.rs/rocksdb/0.17.0/rocksdb/struct.DBWithThreadMode.html#method.prefix_iterator)
-or [`multi_get`](https://github.com/facebook/rocksdb/wiki/MultiGet-Performance)
-to get each transaction from `0..tx_count` from `tx_by_location`
+- Use a [`prefix_iterator`](https://docs.rs/rocksdb/0.17.0/rocksdb/struct.DBWithThreadMode.html#method.prefix_iterator)
+to get each transaction with `height` from `tx_by_loc`
 - Block headers are stored by height, not by hash. This has the downside that looking
 up a block by hash requires an extra level of indirection. The upside is
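
The new three-step block re-creation process in this hunk no longer needs a stored transaction count, because every transaction in a block shares the block's height as a key prefix. The sketch below illustrates that idea with in-memory maps standing in for the column families. It does not call the `rocksdb` crate: `BTreeMap::range` is swapped in for a prefix iterator, and the `Db` struct, `block_by_hash` method, and byte-vector payloads are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap};

/// Stand-in for `block::Hash`.
type Hash = [u8; 32];

/// In-memory stand-ins for three of the column families named above.
struct Db {
    height_by_hash: HashMap<Hash, u32>,
    block_header_by_height: BTreeMap<u32, Vec<u8>>,
    /// Keyed by `(height, tx_index)`, which sorts the same way as the
    /// big-endian byte string `Height || TransactionIndex`.
    tx_by_loc: BTreeMap<(u32, u16), Vec<u8>>,
}

impl Db {
    /// Re-create a block as `(header, transactions)` from its hash.
    fn block_by_hash(&self, hash: &Hash) -> Option<(Vec<u8>, Vec<Vec<u8>>)> {
        // Step 1: look up `height` in `height_by_hash`.
        let height = *self.height_by_hash.get(hash)?;

        // Step 2: get the block header for `height`.
        let header = self.block_header_by_height.get(&height)?.clone();

        // Step 3: scan every key that starts with `height`.
        // A sorted range over the shared prefix plays the role of a prefix iterator.
        let transactions = self
            .tx_by_loc
            .range((height, 0)..=(height, u16::MAX))
            .map(|(_, tx)| tx.clone())
            .collect();

        Some((header, transactions))
    }
}
```

The scan stops on its own at the first key with a different height, which is why the old `tx_count` value is redundant under this design.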
@@ -759,26 +757,27 @@ So they should not be used for consensus-critical checks.
 the fact that we commit blocks in order means we're writing only to the end
 of the rocksdb column family, which may help save space.
-- Similarly, transaction data is stored in chain order in `tx_by_location`,
-and chain order within each vector in `tx_by_transparent_address`.
+- Similarly, transaction data is stored in chain order in `tx_by_loc` and `utxo_by_out_loc`,
+and chain order within each vector in `utxo_by_transparent_addr_loc` and
+`tx_by_transparent_addr_loc`.
 - `TransactionLocation`s are stored as a `(height, index)` pair referencing the
 height of the transaction's parent block and the transaction's index in that
 block. This would more traditionally be a `(hash, index)` pair, but because
 we store blocks by height, storing the height saves one level of indirection.
-Transaction hashes can be looked up using `hash_by_tx`.
+Transaction hashes can be looked up using `hash_by_tx_loc`.
-- Similarly, UTXOs are stored in `utxo_by_outpoint` by `OutputLocation`,
-rather than `OutPoint`. `OutPoint`s can be looked up using `tx_by_hash`,
-and reconstructed using `hash_by_tx`.
+- Similarly, UTXOs are stored in `utxo_by_out_loc` by `OutputLocation`,
+rather than `OutPoint`. `OutPoint`s can be looked up using `tx_loc_by_hash`,
+and reconstructed using `hash_by_tx_loc`.
-- The `Utxo` type can be constructed from the `Output` data,
-`height: TransactionLocation.height`, and
-`is_coinbase: TransactionLocation.index == 0`
+- The `Utxo` type can be constructed from the `OutputLocation` and `Output` data,
+`height: OutputLocation.height`, and
+`is_coinbase: OutputLocation.transaction_index == 0`
 (coinbase transactions are always the first transaction in a block).
 - `balance_by_transparent_addr` is the sum of all `utxo_by_transparent_addr_loc`s
 that are still in `utxo_by_out_loc`. It is cached to improve performance for
 addresses with large UTXO sets. It also stores the `AddressLocation` for each
 address, which allows for efficient lookups.
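
The revised `Utxo` note in this hunk derives both extra fields from the `OutputLocation` key, so only the `Output` itself has to be stored as the value. A minimal sketch of that construction follows. The struct layouts, the `Output` fields, and the `utxo_from_parts` function are hypothetical; only the `height` and `transaction_index` field names and the two derivation rules come from the text above.

```rust
/// Hypothetical layout; `height` and `transaction_index` are the field
/// names used in the note above.
#[derive(Debug, Clone, PartialEq)]
struct OutputLocation {
    height: u32,
    transaction_index: u16,
    output_index: u32,
}

/// Hypothetical stand-in for a transparent output.
#[derive(Debug, Clone, PartialEq)]
struct Output {
    value: u64,
    lock_script: Vec<u8>,
}

#[derive(Debug, Clone, PartialEq)]
struct Utxo {
    output: Output,
    height: u32,
    is_coinbase: bool,
}

/// Build a `Utxo` from a database key (`OutputLocation`) and value (`Output`).
fn utxo_from_parts(location: &OutputLocation, output: Output) -> Utxo {
    Utxo {
        output,
        // `height: OutputLocation.height`
        height: location.height,
        // Coinbase transactions are always the first transaction in a block.
        is_coinbase: location.transaction_index == 0,
    }
}
```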
@@ -787,9 +786,9 @@ So they should not be used for consensus-critical checks.
 This list includes the `AddressLocation`, if it has not been spent.
 (This duplicate data is small, and helps simplify the code.)
-- When a block write deletes a UTXO from `utxo_by_outpoint`,
+- When a block write deletes a UTXO from `utxo_by_out_loc`,
 that UTXO location should be deleted from `utxo_by_transparent_addr_loc`.
-This is an index optimisation.
+This is an index optimisation, which does not affect query results.
 - `tx_by_transparent_addr_loc` stores transaction locations by address.
 This list includes transactions containing spent UTXOs.
@@ -822,6 +821,8 @@ So they should not be used for consensus-critical checks.
 regardless of where they come from. The exception is `sprout_anchors` which also maps
 the anchor to the matching note commitment tree. This is required to support interstitial
 treestates, which are unique to Sprout.
+**TODO:** store the `Root` hash in `sprout_note_commitment_tree`, and use it to look up the
+note commitment tree. This de-duplicates tree state data. But we currently only store one sprout tree by height.
 - The value pools are only stored for the finalized tip.