Add a design for leader schedule rotation and genesis. (#2714)

Leader schedule rotation.
This commit is contained in:
anatoly yakovenko 2019-02-15 16:34:34 -08:00 committed by GitHub
parent 573116e259
commit c74b8b6df3
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 124 additions and 9 deletions

View File

@ -3,17 +3,102 @@
At any given moment, a cluster expects only one fullnode to produce ledger
entries. By having only one leader at a time, all validators are able to replay
identical copies of the ledger. The drawback of only one leader at a time,
however, is that a malicious leader is cabable of censoring votes and
however, is that a malicious leader is capable of censoring votes and
transactions. Since censoring cannot be distinguished from the network dropping
packets, the cluster cannot simply elect a single node to hold the leader role
indefinitely. Instead, the cluster minimizes the influence of a malcioius
indefinitely. Instead, the cluster minimizes the influence of a malicious
leader by rotating which node takes the lead.
Each validator selects the expected leader using the same algorithm, described
below. When the validator receives a new signed ledger entry, it can be certain
that entry was produced by the expected leader.
that entry was produced by the expected leader. The order of slots which each
leader is assigned a slot is called a *leader schedule*.
## Leader Schedule Generation
## Leader Schedule Rotation
A validator rejects blocks that are not signed by the *slot leader*. The list
of identities of all slot leaders is called a *leader schedule*. The leader
schedule is recomputed locally and periodically. It assigns slot leaders for a
duration of time called an _epoch_. The schedule must be computed far in advance
of the slots it assigns, such that the ledger state it uses to compute the
schedule is finalized. That duration is called the *leader schedule offset*.
Solana sets the offset to the duration of slots until the next epoch. That is,
the leader schedule for an epoch is calculated from the ledger state at the
start of the previous epoch. The offset of one epoch is fairly arbitrary and
assumed to be sufficiently long such that all validators will have finalized
their ledger state before the next schedule is generated. A cluster may choose
to shorten the offset to reduce the time between stake changes and leader
schedule updates.
While operating without partitions lasting longer than an epoch, the schedule
only needs to be generated when the root fork crosses the epoch boundary. Since
the schedule is for the next epoch, any new stakes committed to the root fork
will not be active until the next epoch. The block used for generating the
leader schedule is the first block to cross the epoch boundary.
Without a partition lasting longer than an epoch, the cluster will work as
follows:
1. A validator continuously updates its own root fork as it votes.
2. The validator updates its leader schedule each time the slot height crosses
an epoch boundary.
For example:
The epoch duration is 100 slots. The root fork is updated from fork computed at
slot height 99 to a fork computed at slot height 102. Forks with slots at height
100,101 were skipped because of failures. The new leader schedule is computed
using fork at slot height 102. It is active from slot 200 until it is updated
again.
No inconsistency can exist because every validator that is voting with the
cluster has skipped 100 and 101 when its root passes 102. All validators,
regardless of voting pattern, would be committing to a root that is either 102,
or a descendant of 102.
### Leader Schedule Rotation with Epoch Sized Partitions.
The duration of the leader schedule offset has a direct relationship to the
likelihood of a cluster having an inconsistent view of the correct leader
schedule.
Consider the following scenario:
Two partitions that are generating half of the blocks each. Neither is coming
to a definitive supermajority fork. Both will cross epoch 100 and 200 without
actually committing to a root and therefore a cluster wide commitment to a new
leader schedule.
In this unstable scenario, multiple valid leader schedules exist.
* A leader schedule is generated for every fork whose direct parent is in the
previous epoch.
* The leader schedule is valid after the start of the next epoch for descendant
forks until it is updated.
Each partition's schedule will diverge after the partition lasts more than an
epoch. For this reason, the epoch duration should be selected to be much much
larger then slot time and the expected length for a fork to be committed to
root.
After observing the cluster for a sufficient amount of time, the leader schedule
offset can be selected based on the median partition duration and its standard
deviation. For example, an offset longer then the median partition duration
plus six standard deviations would reduce the likelihood of an inconsistent
ledger schedule in the cluster to 1 in 1 million.
## Leader Schedule Generation at Genesis
The genesis block declares the first leader for the first epoch. This leader
ends up scheduled for the first two epochs because the leader schedule is also
generated at slot 0 for the next epoch. The length of the first two epochs can
be specified in the genesis block as well. The minimum length of the first
epochs must be greater than or equal to the maximum rollback depth as defined in
[fork selection](fork-selection.md).
## Leader Schedule Generation Algorithm
Leader schedule is generated using a predefined seed. The process is as follows:
@ -27,12 +112,42 @@ Leader schedule is generated using a predefined seed. The process is as follows
stake-weighted ordering.
5. This ordering becomes valid after a cluster-configured number of ticks.
## Schedule Attack Vectors
### Seed
The seed that is selected is predictable but unbiasable. There is no grinding
attack to influence its outcome. The active set, however, can be biased by a
leader by censoring validator votes. To reduce the likelihood of censorship,
the active set is sampled many slots in advance, such that votes will have been
collected by multiple leaders. If even one node is honest, the malicious
leaders will not be able to use censorship to influence the leader schedule.
attack to influence its outcome.
### Active Set
A leader can bias the active set by censoring validator votes. Two possible
ways exist for leaders to censor the active set:
* Ignore votes from validators
* Refuse to vote for blocks with votes from validators
To reduce the likelihood of censorship, the active set is calculated at the
leader schedule offset boundary over an *active set sampling duration*. The
active set sampling duration is long enough such that votes will have been
collected by multiple leaders.
### Staking
Leaders can censor new staking transactions or refuse to validate blocks with
new stakes. This attack is similar to censorship of validator votes.
### Validator operational key loss
Leaders and validators are expected to use ephemeral keys for operation, and
stake owners authorize the validators to do work with their stake via
delegation.
The cluster should be able to recover from the loss of all the ephemeral keys
used by leaders and validators, which could occur through a common software
vulnerability shared by all the nodes. Stake owners should be able to vote
directly co-sign a validator vote even though the stake is currently delegated
to a validator.
## Appending Entries