Add proposed design for db_ledger (#2253)

* Add proposed design for db_ledger
carllin 2019-01-03 14:12:55 -08:00 committed by GitHub
parent 7c6dcc8c73
commit 8116fe8def
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 40 additions and 7 deletions

@@ -11,7 +11,7 @@ The basic responsibilities of the window and the ledger in a Solana fullnode
are:
1. Window: serve as a temporary, RAM-backed store of blobs of the PoH chain
-for re-ordering and assembly into contiguous blocks to be sent to the bank
+for reordering and assembly into contiguous blocks to be sent to the bank
for verification.
2. Window: serve as a RAM-backed repair facility for other validator nodes,
which may query the network for as-yet unreceived blobs.
@@ -90,19 +90,52 @@ preserving the chain of origination.
(i.e. dealing with forks) will have to be used for the most recent entries in
the EntryTree.
### EntryTree Design
1. Entries in the EntryTree are stored as key-value pairs, where the key is the concatenated
slot index and blob index for an entry, and the value is the entry data. Note blob indexes are zero-based for each slot (i.e. they're slot-relative).
2. The EntryTree maintains metadata for each slot, in the `SlotMeta` struct containing:
* `slot_index` - The index of this slot
* `num_blocks` - The number of blocks in the slot (used for chaining to a previous slot)
* `consumed` - The highest blob index `n`, such that for all `m < n`, there exists a blob in this slot with blob index equal to `m` (i.e. the highest consecutive blob index).
* `received` - The highest received blob index for the slot
* `next_slots` - A list of future slots this slot could chain to. Used when rebuilding
the ledger to find possible fork points.
* `consumed_ticks` - Tick height of the highest received blob (used to identify when a slot is full)
* `is_trunk` - True iff every block from 0...slot forms a full sequence without any holes. We can derive is_trunk for each slot with the following rules. Let slot(n) be the slot with index `n`, and slot(n).contains_all_ticks() is true if the slot with index `n` has all the ticks expected for that slot. Let is_trunk(n) be the statement that "the slot(n).is_trunk is true". Then:
is_trunk(0)
is_trunk(n+1) iff (is_trunk(n) and slot(n).contains_all_ticks())
3. Chaining - When a blob for a new slot `x` arrives, we check the number of blocks (`num_blocks`) for that new slot (this information is encoded in the blob). We then know that this new slot chains to slot `x - num_blocks`.
4. Subscriptions - The EntryTree records a set of slots that have been "subscribed" to. This means entries that chain to these slots will be sent on the EntryTree channel for consumption by the ReplayStage. See the `EntryTree APIs` for details.
5. Update notifications - The EntryTree notifies listeners when slot(n).is_trunk is flipped from false to true for any `n`.
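The rules above can be sketched in a few small functions. This is only an illustrative sketch, not the proposed implementation: the function names (`entry_key`, `advance_consumed`, `parent_slot`, `derive_is_trunk`) are hypothetical, the big-endian key encoding is an assumption (the design only says the slot index and blob index are concatenated), and `derive_is_trunk` assumes slots are indexed contiguously from 0 as in the `is_trunk` rule.

```rust
use std::collections::BTreeSet;

/// Hypothetical key layout: slot index followed by the slot-relative blob
/// index. Big-endian is an assumption made here so that byte-wise key order
/// matches (slot, blob) order.
fn entry_key(slot_index: u64, blob_index: u64) -> [u8; 16] {
    let mut key = [0u8; 16];
    key[..8].copy_from_slice(&slot_index.to_be_bytes());
    key[8..].copy_from_slice(&blob_index.to_be_bytes());
    key
}

/// Advance `consumed` to the highest `n` such that every blob index
/// `m < n` is present in `received`.
fn advance_consumed(consumed: u64, received: &BTreeSet<u64>) -> u64 {
    let mut n = consumed;
    while received.contains(&n) {
        n += 1;
    }
    n
}

/// Chaining rule: slot `x` chains to slot `x - num_blocks`.
/// Returns `None` if the subtraction would underflow.
fn parent_slot(x: u64, num_blocks: u64) -> Option<u64> {
    x.checked_sub(num_blocks)
}

/// Derive `is_trunk` for slots `0..len` from per-slot fullness flags:
/// is_trunk(0) holds, and
/// is_trunk(n+1) iff (is_trunk(n) and slot(n).contains_all_ticks()).
fn derive_is_trunk(contains_all_ticks: &[bool]) -> Vec<bool> {
    let mut trunk = Vec::with_capacity(contains_all_ticks.len());
    for n in 0..contains_all_ticks.len() {
        let value = if n == 0 {
            true
        } else {
            trunk[n - 1] && contains_all_ticks[n - 1]
        };
        trunk.push(value);
    }
    trunk
}
```

With this encoding, every key in slot 1 sorts before every key in slot 2, so a range scan over a sorted key-value store would return a slot's blobs in index order.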
### EntryTree APIs
The EntryTree offers a subscription-based API that ReplayStage uses to ask for entries it's interested in. The entries will be sent on a channel exposed by the EntryTree. These subscription APIs are as follows:
1. `fn get_slots_since(slot_indexes: &[u64]) -> Vec<SlotMeta>`: Returns new slots connecting to any element of the list `slot_indexes`.
2. `fn get_slot_entries(slot_index: u64, entry_start_index: usize, max_entries: Option<u64>) -> Vec<Entry>`: Returns the entry vector for the slot starting with `entry_start_index`. If `max_entries == Some(max)`, the result is capped at `max` entries; otherwise no upper limit is imposed on the length of the returned vector.
Note: Cumulatively, this means that the replay stage will now have to know when a slot is finished, and subscribe to the next slot it's interested in to get the next set of entries. Previously, the burden of chaining slots fell on the EntryTree.
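As a rough illustration of the note above, the following sketch shows how a replay loop might drive the two calls against an in-memory stand-in. Everything except the two function signatures is hypothetical: `MockEntryTree`, the reduced `SlotMeta`, the placeholder `Entry` type, the `&self` receiver, and the policy of always following the first child slot are assumptions made for the example. Fork choice is out of scope here.

```rust
use std::collections::BTreeMap;

/// Placeholder entry payload (hypothetical; real entries carry more data).
#[derive(Clone, Debug, PartialEq)]
struct Entry(u64);

/// Reduced SlotMeta with only the fields this sketch needs.
#[derive(Clone, Debug)]
struct SlotMeta {
    slot_index: u64,
    num_blocks: u64,
}

/// In-memory stand-in for the EntryTree (hypothetical).
struct MockEntryTree {
    slots: BTreeMap<u64, (SlotMeta, Vec<Entry>)>,
}

impl MockEntryTree {
    /// Returns slots whose parent (`slot_index - num_blocks`) is in `slot_indexes`.
    fn get_slots_since(&self, slot_indexes: &[u64]) -> Vec<SlotMeta> {
        self.slots
            .values()
            .map(|(meta, _)| meta)
            // A slot with num_blocks == 0 would chain to itself; skip it.
            .filter(|meta| meta.num_blocks > 0)
            .filter(|meta| {
                meta.slot_index
                    .checked_sub(meta.num_blocks)
                    .map_or(false, |parent| slot_indexes.contains(&parent))
            })
            .cloned()
            .collect()
    }

    /// Returns entries of a slot from `entry_start_index`, capped by `max_entries`.
    fn get_slot_entries(
        &self,
        slot_index: u64,
        entry_start_index: usize,
        max_entries: Option<u64>,
    ) -> Vec<Entry> {
        let entries = match self.slots.get(&slot_index) {
            Some((_, entries)) => entries,
            None => return Vec::new(),
        };
        let limit = max_entries.map_or(usize::MAX, |max| max as usize);
        entries
            .iter()
            .skip(entry_start_index)
            .take(limit)
            .cloned()
            .collect()
    }
}

/// Replays a chain starting at `start`, always following the first child slot.
fn replay_from(tree: &MockEntryTree, start: u64) -> Vec<Entry> {
    let mut replayed = Vec::new();
    let mut current = Some(start);
    while let Some(slot) = current {
        replayed.extend(tree.get_slot_entries(slot, 0, None));
        current = tree.get_slots_since(&[slot]).first().map(|m| m.slot_index);
    }
    replayed
}
```

The caller, not the tree, decides which slot to read next, which is the shift in responsibility the note describes.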
### Interfacing with Bank
The bank exposes to replay stage:
-1. prev_id: which PoH chain it's working on as indicated by the id of the last
+1. `prev_id`: which PoH chain it's working on as indicated by the id of the last
entry it processed
-2. tick_height: the ticks in the PoH chain currently being verified by this
+2. `tick_height`: the ticks in the PoH chain currently being verified by this
bank
-3. votes: a stack of records that contain
+3. `votes`: a stack of records that contain:
-1. prev_ids: what anything after this vote must chain to in PoH
-2. tick height: the tick_height at which this vote was cast
-3. lockout period: how long a chain must be observed to be in the ledger to
+1. `prev_ids`: what anything after this vote must chain to in PoH
+2. `tick_height`: the tick height at which this vote was cast
+3. `lockout period`: how long a chain must be observed to be in the ledger to
be able to be chained below this vote
Replay stage uses EntryTree APIs to find the longest chain of entries it can