Add proposed design for db_ledger (#2253)

Commit 8116fe8def (parent 7c6dcc8c73)

The basic responsibilities of the window and the ledger in a Solana fullnode
are:

1. Window: serve as a temporary, RAM-backed store of blobs of the PoH chain
   for reordering and assembly into contiguous blocks to be sent to the bank
   for verification.
2. Window: serve as a RAM-backed repair facility for other validator nodes,
   which may query the network for as-yet unreceived blobs.
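
The two window roles above can be illustrated with a small in-memory sketch. The `Window` type and its `insert`, `take_contiguous`, and `repair` methods are hypothetical names chosen for this example, and the `BTreeMap` store is an assumption, not the fullnode's actual window implementation.

```rust
use std::collections::BTreeMap;

/// Hypothetical RAM-backed window. Blobs may arrive in any order; they are
/// handed to the bank only once they form a contiguous run, and they stay in
/// the window so that repair requests from other validators can be served.
struct Window {
    consumed: u64,                 // every blob with index < consumed has been sent to the bank
    blobs: BTreeMap<u64, Vec<u8>>, // blob index -> blob payload
}

impl Window {
    fn new() -> Self {
        Window { consumed: 0, blobs: BTreeMap::new() }
    }

    /// Store a blob; a duplicate index keeps the first copy.
    fn insert(&mut self, index: u64, data: Vec<u8>) {
        self.blobs.entry(index).or_insert(data);
    }

    /// Return the newly contiguous blobs starting at `consumed` (the block to
    /// send to the bank for verification), advancing `consumed` past them.
    fn take_contiguous(&mut self) -> Vec<Vec<u8>> {
        let mut block = Vec::new();
        while let Some(data) = self.blobs.get(&self.consumed) {
            block.push(data.clone());
            self.consumed += 1;
        }
        block
    }

    /// Serve a repair request from another node, if this window holds the blob.
    fn repair(&self, index: u64) -> Option<&Vec<u8>> {
        self.blobs.get(&index)
    }
}
```

Because blobs stay in the map after they are handed to the bank, the same structure can answer repair queries for blobs that other nodes have not yet received.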

…preserving the chain of origination.

…(i.e. dealing with forks) will have to be used for the most recent entries in
the EntryTree.

### EntryTree Design

1. Entries in the EntryTree are stored as key-value pairs, where the key is the concatenated
   slot index and blob index for an entry, and the value is the entry data. Note that blob indexes are zero-based for each slot (i.e. they're slot-relative).

2. The EntryTree maintains metadata for each slot in the `SlotMeta` struct, which contains:
    * `slot_index` - The index of this slot
    * `num_blocks` - The number of blocks in the slot (used for chaining to a previous slot)
    * `consumed` - The highest index `n` such that for all `m < n`, there exists a blob in this slot with blob index equal to `m` (i.e. one past the highest consecutive blob index, counting from 0)
    * `received` - The highest received blob index for the slot
    * `next_slots` - A list of future slots this slot could chain to. Used when rebuilding
      the ledger to find possible fork points.
    * `consumed_ticks` - Tick height of the highest received blob (used to identify when a slot is full)
    * `is_trunk` - True iff every block from 0...slot forms a full sequence without any holes. We can derive `is_trunk` for each slot with the following rules. Let `slot(n)` be the slot with index `n`, and let `slot(n).contains_all_ticks()` be true if the slot with index `n` has all the ticks expected for that slot. Let `is_trunk(n)` be the statement "`slot(n).is_trunk` is true". Then:

        `is_trunk(0)`

        `is_trunk(n+1) iff (is_trunk(n) and slot(n).contains_all_ticks())`

3. Chaining - When a blob for a new slot `x` arrives, we check the number of blocks (`num_blocks`) for that new slot (this information is encoded in the blob). We then know that this new slot chains to slot `x - num_blocks`.

4. Subscriptions - The EntryTree records a set of slots that have been "subscribed" to. This means that entries chaining to these slots will be sent on the EntryTree channel for consumption by the ReplayStage. See the EntryTree APIs section below for details.

5. Update notifications - The EntryTree notifies listeners when `slot(n).is_trunk` is flipped from false to true for any `n`.
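
The key layout, the `consumed`/`received` bookkeeping, the chaining rule, the `is_trunk` recurrence, and the update notifications described above can be sketched together. This is a minimal sketch under stated assumptions: the 16-byte big-endian key encoding, the `BTreeSet` of received blob indexes, the field types, and the helper names (`entry_key`, `decode_key`, `insert_blob`, `parent_slot`, `derive_is_trunk`, `flipped_to_trunk`) are hypothetical, and `contains_all_ticks()` is supplied as a precomputed flag per slot rather than derived from ticks.

```rust
use std::collections::BTreeSet;

/// Hypothetical key layout: slot index then slot-relative blob index, each as
/// a big-endian u64. Big-endian makes byte order match numeric (slot, blob)
/// order, so an ordered key-value store keeps each slot's entries contiguous.
fn entry_key(slot_index: u64, blob_index: u64) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[..8].copy_from_slice(&slot_index.to_be_bytes());
    key[8..].copy_from_slice(&blob_index.to_be_bytes());
    key
}

/// Inverse of `entry_key`: recover (slot_index, blob_index).
fn decode_key(key: &[u8; 16]) -> (u64, u64) {
    let mut slot = [0u8; 8];
    let mut blob = [0u8; 8];
    slot.copy_from_slice(&key[..8]);
    blob.copy_from_slice(&key[8..]);
    (u64::from_be_bytes(slot), u64::from_be_bytes(blob))
}

/// Per-slot metadata with the fields listed above (types are assumptions).
#[allow(dead_code)]
#[derive(Debug, Default)]
struct SlotMeta {
    slot_index: u64,
    num_blocks: u64,
    consumed: u64, // first missing blob index: every index below it is present
    received: u64, // highest blob index seen so far
    next_slots: Vec<u64>,
    consumed_ticks: u64,
    is_trunk: bool,
}

/// Record blob `blob_index` for a slot and advance `consumed` past every
/// consecutive index now present. `have` holds the slot's received indexes.
fn insert_blob(meta: &mut SlotMeta, have: &mut BTreeSet<u64>, blob_index: u64) {
    have.insert(blob_index);
    meta.received = meta.received.max(blob_index);
    while have.contains(&meta.consumed) {
        meta.consumed += 1;
    }
}

/// Chaining rule: slot `x` with `num_blocks` chains to slot `x - num_blocks`.
/// Returns None if the subtraction would underflow.
fn parent_slot(x: u64, num_blocks: u64) -> Option<u64> {
    x.checked_sub(num_blocks)
}

/// is_trunk(0) holds; is_trunk(n + 1) iff is_trunk(n) and
/// slot(n).contains_all_ticks(). Input: one contains_all_ticks flag per slot.
fn derive_is_trunk(contains_all_ticks: &[bool]) -> Vec<bool> {
    let mut is_trunk: Vec<bool> = Vec::with_capacity(contains_all_ticks.len());
    for n in 0..contains_all_ticks.len() {
        let value = n == 0 || (is_trunk[n - 1] && contains_all_ticks[n - 1]);
        is_trunk.push(value);
    }
    is_trunk
}

/// Slots whose is_trunk flipped from false to true: the slots the EntryTree
/// would report to listeners as update notifications.
fn flipped_to_trunk(before: &[bool], after: &[bool]) -> Vec<u64> {
    after
        .iter()
        .enumerate()
        .filter(|&(n, &now)| now && !before.get(n).copied().unwrap_or(false))
        .map(|(n, _)| n as u64)
        .collect()
}
```

With this encoding, a range scan from `entry_key(s, 0)` up to (but excluding) `entry_key(s + 1, 0)` returns exactly slot `s`'s entries in blob order, which is the main reason to prefer big-endian over little-endian bytes for the key.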

### EntryTree APIs

The EntryTree offers a subscription-based API that ReplayStage uses to ask for the entries it's interested in. The entries are sent on a channel exposed by the EntryTree. The subscription APIs are as follows:

1. `fn get_slots_since(slot_indexes: &[u64]) -> Vec<SlotMeta>`: Returns new slots connecting to any element of the list `slot_indexes`.

2. `fn get_slot_entries(slot_index: u64, entry_start_index: usize, max_entries: Option<u64>) -> Vec<Entry>`: Returns the entry vector for the slot, starting at `entry_start_index`. If `max_entries == Some(max)`, the result is capped at `max` entries; if `max_entries` is `None`, no upper limit is imposed on the length of the returned vector.

Note: Taken together, this means that the replay stage will now have to know when a slot is finished and subscribe to the next slot it's interested in to get the next set of entries. Previously, the burden of chaining slots fell on the EntryTree.
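
The two calls can be prototyped against an in-memory store to pin down their semantics. In this minimal sketch the `EntryTree` struct, its `BTreeMap` fields, the placeholder `Entry` payload, and the trimmed `SlotMeta` (only `slot_index` and `next_slots`) are assumptions. The methods take `&self`, which the signatures above leave implicit, and `get_slots_since` interprets "new slots connecting to" a slot as the slots listed in that slot's `next_slots`.

```rust
use std::collections::BTreeMap;

/// Placeholder entry payload (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    tick_height: u64,
}

/// Trimmed SlotMeta: only the fields this sketch needs.
#[derive(Debug, Clone, PartialEq)]
struct SlotMeta {
    slot_index: u64,
    next_slots: Vec<u64>,
}

/// Hypothetical in-memory stand-in for the EntryTree.
struct EntryTree {
    metas: BTreeMap<u64, SlotMeta>,     // slot index -> metadata
    entries: BTreeMap<u64, Vec<Entry>>, // slot index -> entries in order
}

impl EntryTree {
    /// Metadata for every known slot that chains to any slot in `slot_indexes`,
    /// deduplicated and ordered by slot index.
    fn get_slots_since(&self, slot_indexes: &[u64]) -> Vec<SlotMeta> {
        let mut children: Vec<u64> = slot_indexes
            .iter()
            .filter_map(|s| self.metas.get(s))
            .flat_map(|meta| meta.next_slots.iter().copied())
            .collect();
        children.sort_unstable();
        children.dedup();
        children
            .into_iter()
            .filter_map(|s| self.metas.get(&s).cloned())
            .collect()
    }

    /// Entries of `slot_index` from `entry_start_index` on, capped at `max`
    /// when `max_entries == Some(max)` and uncapped when it is `None`.
    fn get_slot_entries(
        &self,
        slot_index: u64,
        entry_start_index: usize,
        max_entries: Option<u64>,
    ) -> Vec<Entry> {
        let entries = match self.entries.get(&slot_index) {
            Some(entries) => entries,
            None => return Vec::new(),
        };
        // Empty if the start index is past the end of the slot.
        let tail = entries.get(entry_start_index..).unwrap_or(&[]);
        let len = match max_entries {
            Some(max) => tail.len().min(max as usize),
            None => tail.len(),
        };
        tail[..len].to_vec()
    }
}
```

Under these assumptions, a replay stage that has finished slot `s` could call `get_slots_since(&[s])` to discover the slots chaining to it, then page through each one with `get_slot_entries`, which is the chaining work the note above moves out of the EntryTree.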

### Interfacing with Bank

The bank exposes the following to the replay stage:

1. `prev_id`: which PoH chain it's working on, as indicated by the id of the last
   entry it processed
2. `tick_height`: the ticks in the PoH chain currently being verified by this
   bank
3. `votes`: a stack of records that contain:

    1. `prev_ids`: what anything after this vote must chain to in PoH
    2. `tick_height`: the tick height at which this vote was cast
    3. `lockout period`: how long a chain must be observed to be in the ledger to
       be able to be chained below this vote
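
The state the bank exposes to the replay stage can be summarized as a small Rust type. This is a hypothetical sketch: `BankView`, `Vote`, `cast_vote`, `latest_vote`, and `allows_chaining_below` are names invented for illustration, the 32-byte `EntryId` stands in for an entry id, lockouts are assumed to be measured in ticks, and `allows_chaining_below` encodes one literal reading of the lockout sentence rather than a confirmed rule.

```rust
/// Placeholder for an entry id / PoH hash (assumed 32 bytes).
type EntryId = [u8; 32];

/// One record on the bank's vote stack.
#[derive(Debug, Clone, PartialEq)]
struct Vote {
    prev_ids: Vec<EntryId>, // what anything after this vote must chain to in PoH
    tick_height: u64,       // tick height at which this vote was cast
    lockout_period: u64,    // assumed to be measured in ticks
}

impl Vote {
    /// One literal reading of the lockout rule (an assumption): a chain may be
    /// placed below this vote once it has been observed in the ledger for at
    /// least `lockout_period` ticks.
    fn allows_chaining_below(&self, observed_ticks: u64) -> bool {
        observed_ticks >= self.lockout_period
    }
}

/// What the bank exposes to the replay stage.
struct BankView {
    prev_id: EntryId, // id of the last entry processed: which PoH chain we are on
    tick_height: u64, // ticks in the PoH chain currently being verified
    votes: Vec<Vote>, // used as a stack; the most recent vote is last
}

impl BankView {
    /// Push a vote for the current chain position. Recording the current
    /// `prev_id` as the vote's `prev_ids` is an assumption of this sketch.
    fn cast_vote(&mut self, lockout_period: u64) {
        self.votes.push(Vote {
            prev_ids: vec![self.prev_id],
            tick_height: self.tick_height,
            lockout_period,
        });
    }

    /// Top of the vote stack, if any vote has been cast.
    fn latest_vote(&self) -> Option<&Vote> {
        self.votes.last()
    }
}
```

Keeping `votes` as a `Vec` used only through `push` and `last` gives stack semantics without a dedicated type.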

Replay stage uses EntryTree APIs to find the longest chain of entries it can