Replace the leader rotation chapter with the latest RFC

The content that was originally copied was split into multiple
RFCs, leaving the book copy to bitrot.
This commit is contained in:
Greg Fitzgerald 2018-12-12 10:04:39 -07:00
parent cefbb7c27d
commit 13d4e3f29f
2 changed files with 42 additions and 158 deletions

View File

@@ -1,116 +1,53 @@
# Leader Rotation
A property of any permissionless blockchain is that the entity choosing the next block is randomly selected. In proof of stake systems,
that entity is typically called the "leader" or "block producer." In Solana, we call it the leader. Under the hood, a leader is
simply a mode of the fullnode. A fullnode runs as either a leader or a validator. In this chapter, we describe how a fullnode determines
which node is the leader, how that mechanism may choose different leaders at the same time, and if so, how the system converges in response.
At any given moment, a cluster expects only one fullnode to produce ledger
entries. By having only one leader at a time, all validators are able to replay
identical copies of the ledger. The drawback of only one leader at a time,
however, is that a malicious leader is capable of censoring votes and
transactions. Since censoring cannot be distinguished from the network dropping
packets, the cluster cannot simply elect a single node to hold the leader role
indefinitely. Instead, the cluster minimizes the influence of a malicious
leader by rotating which node takes the lead.
## Leader Seed Generation
Each validator selects the expected leader using the same algorithm, described
below. When the validator receives a new signed ledger entry, it can be certain
that entry was produced by the expected leader.
Leader scheduling is decided via a random seed. The process is as follows:
## Leader Schedule Generation
1. Periodically, at a specific PoH tick count, select the signatures of the votes that made up the last supermajority
2. Concatenate the signatures
3. Hash the resulting string for `N` counts (i.e., hash it `N` times)
4. The resulting hash is the random seed for the next `M` leader slots, where `M > N`
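
A minimal sketch of this seed derivation, assuming SHA-256 via the `sha2` crate and treating vote signatures as raw byte strings; `derive_seed` and its types are illustrative, not the node's actual API:

```rust
use sha2::{Digest, Sha256};

/// Derive the leader seed from the vote signatures of the last supermajority:
/// concatenate the signatures, then hash the result `n` times (n >= 1 assumed).
fn derive_seed(vote_signatures: &[Vec<u8>], n: u64) -> [u8; 32] {
    // 1-2. Concatenate the signatures into one byte string.
    let mut bytes = Vec::new();
    for sig in vote_signatures {
        bytes.extend_from_slice(sig);
    }
    // 3. Hash the resulting string `n` times.
    let mut seed: [u8; 32] = Sha256::digest(&bytes).into();
    for _ in 1..n {
        seed = Sha256::digest(&seed).into();
    }
    // 4. The final hash is the random seed for the next `M` leader slots.
    seed
}
```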
The leader schedule is generated using a predefined seed. The process is as follows:
## Leader Rotation
1. Periodically use the PoH tick height (a monotonically increasing counter) to
seed a stable pseudo-random algorithm.
2. At that height, sample the bank for all the staked accounts with leader
identities that have voted within a cluster-configured number of ticks. The
sample is called the *active set*.
3. Sort the active set by stake weight.
4. Use the random seed to select nodes weighted by stake to create a
stake-weighted ordering.
5. This ordering becomes valid after a cluster-configured number of ticks.
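
The sketch below illustrates steps 1 through 4 under stated assumptions: a toy xorshift generator stands in for whatever seeded PRNG the cluster actually uses, and `Validator` and `leader_ordering` are illustrative names rather than real types:

```rust
/// A member of the active set, reduced to the two facts the schedule needs.
struct Validator {
    id: String,  // leader identity (a pubkey in practice)
    stake: u64,  // stake weight sampled from the bank
}

/// Minimal deterministic PRNG (xorshift64) so every node derives the same ordering.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Seed from the PoH tick height, sort the active set by stake, then
/// repeatedly draw validators with probability proportional to their stake.
fn leader_ordering(tick_height: u64, mut active_set: Vec<Validator>) -> Vec<Validator> {
    // Sort by stake weight (ties broken by id) so the ordering is stable across nodes.
    active_set.sort_by(|a, b| b.stake.cmp(&a.stake).then(a.id.cmp(&b.id)));

    let mut rng = XorShift64(tick_height | 1); // xorshift must not start at zero
    let mut ordering = Vec::with_capacity(active_set.len());

    while !active_set.is_empty() {
        let total: u64 = active_set.iter().map(|v| v.stake).sum();
        let mut pick = rng.next() % total.max(1);
        // Walk the active set, subtracting stakes until the draw lands on a validator.
        let index = active_set
            .iter()
            .position(|v| {
                if pick < v.stake {
                    true
                } else {
                    pick -= v.stake;
                    false
                }
            })
            .unwrap_or(0);
        ordering.push(active_set.remove(index));
    }
    ordering
}
```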
1. The leader is chosen via a random seed generated from stake weights and votes (the leader schedule)
2. The leader is rotated every `T` PoH ticks (leader slot), according to the leader schedule
3. The schedule is applicable for `M` voting rounds
The seed that is selected is predictable but unbiasable. There is no grinding
attack to influence its outcome. The active set, however, can be biased by a
leader by censoring validator votes. To reduce the likelihood of censorship,
the active set is sampled many slots in advance, such that votes will have been
collected by multiple leaders. If even one node is honest, the malicious
leaders will not be able to use censorship to influence the leader schedule.
Leaders transmit for a count of `T` PoH ticks. When `T` is reached, all the validators should switch to the next scheduled leader. To schedule leaders, the supermajority + `M` nodes are shuffled using the random seed calculated above.
## Appending Entries
All `T` ticks must be observed from the current leader for that part of PoH to be accepted by the network. If `T` ticks (and any intervening transactions) are not observed, the network optimistically fills in the `T` ticks, and continues with PoH from the next leader.
A leader schedule is split into *slots*, where each slot has a duration of `T`
PoH ticks.
## Partitions, Forks
A leader transmits entries during its slot. After `T` ticks, all the
validators switch to the next scheduled leader. Validators must ignore entries
sent outside a leader's assigned slot.
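
A sketch of how a validator might map a tick height to its slot and scheduled leader, and reject entries sent outside that leader's slot; the `String` identities and function names are assumptions for illustration:

```rust
/// Map a PoH tick height to the leader assigned to its slot. `ticks_per_slot`
/// is `T`; `schedule` holds one leader identity per slot in the current schedule.
fn leader_for_tick(schedule: &[String], tick_height: u64, ticks_per_slot: u64) -> &str {
    assert!(!schedule.is_empty() && ticks_per_slot > 0);
    let slot = tick_height / ticks_per_slot;
    // The schedule repeats until a newly generated schedule takes effect.
    &schedule[(slot % schedule.len() as u64) as usize]
}

/// A validator ignores entries whose author is not the leader assigned to the
/// slot containing `tick_height`.
fn entry_in_assigned_slot(
    schedule: &[String],
    author: &str,
    tick_height: u64,
    ticks_per_slot: u64,
) -> bool {
    leader_for_tick(schedule, tick_height, ticks_per_slot) == author
}
```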
Forks can arise at PoH tick counts that correspond to leader rotations, because leader nodes may or may not have observed the previous leader's data. The empty ticks that fill such gaps are generated by all nodes in the network at a network-specified rate of `Z` hashes per tick.
There are only two possible versions of the PoH during a voting round: PoH with `T` ticks and entries generated by the current leader, or PoH with just ticks. The "just ticks" version of the PoH can be thought of as a virtual ledger, one that all nodes in the network can derive from the last tick in the previous slot.
Validators can ignore forks at other points (e.g. from the wrong leader), or slash the leader responsible for the fork.
Validators vote on the longest chain that contains their previous vote, or a longer chain if the lockout on their previous vote has expired.
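
A hedged sketch of that fork-choice rule, with `Fork` reduced to the two facts the rule needs; the real data structures are richer:

```rust
/// A candidate fork as seen by one validator; field names are illustrative.
struct Fork {
    length: u64,              // height of this fork's tip
    contains_prev_vote: bool, // does it descend from our last voted slot?
}

/// Vote on the longest fork that contains the previous vote, or on the
/// longest fork overall once the lockout on the previous vote has expired.
/// Returns the index of the chosen fork, if any.
fn choose_fork(forks: &[Fork], lockout_expired: bool) -> Option<usize> {
    forks
        .iter()
        .enumerate()
        .filter(|(_, f)| lockout_expired || f.contains_prev_vote)
        .max_by_key(|(_, f)| f.length)
        .map(|(i, _)| i)
}
```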
#### Validator's View
##### Time Progression
The diagram below represents a validator's view of the PoH stream with possible forks over time. L1, L2, etc. are leader slots, and `E`s represent entries from that leader during that leader's slot. The `x`s represent ticks only, and time flows downwards in the diagram.
<img alt="Leader scheduler" src="img/leader-scheduler.svg" class="center"/>
Note that an `E` appearing on 2 branches at the same slot is a slashable condition, so a validator observing `L3` and `L3'` can slash L3 and safely choose `x` for that slot. Once a validator observes a supermajority vote on any branch, other branches can be discarded below that tick count. For any slot, validators need only consider a single "has entries" chain or a "ticks only" chain.
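
The slashable condition can be detected with a simple bookkeeping structure; the sketch below assumes illustrative types for the leader identity and branch id:

```rust
use std::collections::HashMap;

/// Track which branch a leader's entries were first seen on for each slot.
/// Entries from the same leader on two different branches in the same slot
/// are evidence for slashing.
#[derive(Default)]
struct EntryObservations {
    // (leader identity, slot) -> branch on which entries were first observed
    first_seen: HashMap<(String, u64), u64>,
}

impl EntryObservations {
    /// Record an observation; returns true if it proves a slashable offense.
    fn observe(&mut self, leader: &str, slot: u64, branch: u64) -> bool {
        let key = (leader.to_string(), slot);
        if let Some(&first) = self.first_seen.get(&key) {
            // Same leader, same slot, different branch: slashable.
            return first != branch;
        }
        self.first_seen.insert(key, branch);
        false
    }
}
```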
##### Time Division
It's useful to consider leader rotation over PoH tick count as time division of the job of encoding state for the network. The following table presents the above tree of forks as a time-divided ledger.
leader slot      | L1 | L2 | L3 | L4 | L5
-----------------|----|----|----|----|----
data             | E1 | E2 | E3 | E4 | E5
ticks since prev |    |    |    | x  | xx
Note that only data from leader `L3` will be accepted during leader slot
`L3`. Data from `L3` may include "catchup" ticks back to a slot other than
`L2` if `L3` did not observe `L2`'s data. `L4` and `L5`'s transmissions
include the "ticks since prev" PoH entries.
This arrangement of the network data streams permits nodes to save exactly this
structure to the ledger for replay, restart, and checkpoints.
#### Leader's View
When a new leader begins a slot, it must first transmit any PoH (ticks)
required to link the new slot with the most recently observed and voted
slot.
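
A sketch of the catchup-tick count a new leader would need to produce, assuming slots of `T` ticks and glossing over boundary conventions:

```rust
/// Number of "catchup" ticks a new leader transmits to link the start of its
/// slot to the last tick it observed and voted on. `ticks_per_slot` is `T`.
fn catchup_ticks(my_slot: u64, last_voted_tick: u64, ticks_per_slot: u64) -> u64 {
    let slot_start_tick = my_slot * ticks_per_slot;
    // Zero when the previous leader's entries were observed right up to the
    // slot boundary; otherwise the gap is filled with ticks only.
    slot_start_tick.saturating_sub(last_voted_tick)
}
```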
## Examples
### Small Partition
1. Network partition `M` occurs for 10% of the nodes
2. The larger partition `K`, with 90% of the stake weight, continues to operate as
normal
3. `M` cycles through the ranks until one of them is leader, generating ticks for
slots where the leader is in `K`.
4. `M` validators observe only 10% of the vote pool, so confirmation is not reached.
5. `M` and `K` reconnect.
6. `M` validators cancel their votes on `M`, which has not reached confirmation, and
re-cast on `K` (after their vote lockout on `M`).
### Leader Timeout
1. Next rank leader node `V` observes a timeout from current leader `A`, fills in
`A`'s slot with virtual ticks and starts sending out entries.
2. Nodes observing both streams keep track of the forks, waiting for:
* their vote on leader `A` to expire in order to be able to vote on `B`
* a supermajority on `A`'s slot
3. If the first case occurs, leader `B`'s slot is filled with ticks. If the
second case occurs, `A`'s slot is filled with ticks.
4. Partition is resolved just like in the [Small Partition](#small-partition)
above
## Network Variables
* `A` - name of a node
* `B` - name of a node
* `K` - number of nodes in the supermajority to whom leaders broadcast their
PoH hash for validation
* `M` - number of nodes outside the supermajority to whom leaders broadcast their
PoH hash for validation
* `N` - number of voting rounds for which a leader schedule is considered before
a new leader schedule is used
* `T` - number of PoH ticks per leader slot (also voting round)
* `V` - name of a node that will create virtual ticks
* `Z` - number of hashes per PoH tick
All `T` ticks must be observed by the next leader for it to build its own
entries on. If entries are not observed (leader is down) or entries are invalid
(leader is buggy or malicious), the next leader must produce ticks to
fill the previous leader's slot. Note that the next leader should do repair
requests in parallel, and postpone sending ticks until it is confident other
validators also failed to observe the previous leader's entries. If a leader
incorrectly builds on its own ticks, the leader following it must replace all
its ticks.
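
As a rough decision sketch (not the node's actual logic), the next leader's behavior toward the previous slot might look like:

```rust
/// What the next leader transmits for the previous leader's slot: repair is
/// attempted first, and filler ticks are sent only after repair times out.
enum PreviousSlot {
    /// Valid entries were observed or repaired; build directly on them.
    BuildOnEntries,
    /// Entries were missing or invalid; fill the slot with `T` ticks.
    FillWithTicks(u64),
}

fn plan_previous_slot(
    observed_valid_entries: bool,
    repair_timed_out: bool,
    ticks_per_slot: u64,
) -> Option<PreviousSlot> {
    if observed_valid_entries {
        Some(PreviousSlot::BuildOnEntries)
    } else if repair_timed_out {
        // Waited long enough to be confident other validators also failed to
        // observe the previous leader's entries.
        Some(PreviousSlot::FillWithTicks(ticks_per_slot))
    } else {
        // Keep issuing repair requests in parallel; postpone sending ticks.
        None
    }
}
```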

View File

@@ -1,53 +0,0 @@
# Leader Rotation
At any given moment, a cluster expects only one fullnode to produce ledger
entries. By having only one leader at a time, all validators are able to replay
identical copies of the ledger. The drawback of only one leader at a time,
however, is that a malicious leader is capable of censoring votes and
transactions. Since censoring cannot be distinguished from the network dropping
packets, the cluster cannot simply elect a single node to hold the leader role
indefinitely. Instead, the cluster minimizes the influence of a malicious
leader by rotating which node takes the lead.
Each validator selects the expected leader using the same algorithm, described
below. When the validator receives a new signed ledger entry, it can be certain
that entry was produced by the expected leader.
## Leader Schedule Generation
The leader schedule is generated using a predefined seed. The process is as follows:
1. Periodically use the PoH tick height (a monotonically increasing counter) to
seed a stable pseudo-random algorithm.
2. At that height, sample the bank for all the staked accounts with leader
identities that have voted within a cluster-configured number of ticks. The
sample is called the *active set*.
3. Sort the active set by stake weight.
4. Use the random seed to select nodes weighted by stake to create a
stake-weighted ordering.
5. This ordering becomes valid after a cluster-configured number of ticks.
The seed that is selected is predictable but unbiasable. There is no grinding
attack to influence its outcome. The active set, however, can be biased by a
leader by censoring validator votes. To reduce the likelihood of censorship,
the active set is sampled many slots in advance, such that votes will have been
collected by multiple leaders. If even one node is honest, the malicious
leaders will not be able to use censorship to influence the leader schedule.
## Appending Entries
A leader schedule is split into *slots*, where each slot has a duration of `T`
PoH ticks.
A leader transmits entries during its slot. After `T` ticks, all the
validators switch to the next scheduled leader. Validators must ignore entries
sent outside a leader's assigned slot.
All `T` ticks must be observed by the next leader for it to build its own
entries on. If entries are not observed (leader is down) or entries are invalid
(leader is buggy or malicious), the next leader must produce ticks to
fill the previous leader's slot. Note that the next leader should do repair
requests in parallel, and postpone sending ticks until it is confident other
validators also failed to observe the previous leader's entries. If a leader
incorrectly builds on its own ticks, the leader following it must replace all
its ticks.