Add design proposal for reliable vote transmission (#2601)

* reliable vote transmission design proposal

* summary

* comments
anatoly yakovenko 2019-01-31 07:34:49 -08:00 committed by GitHub
parent 2754ceec60
commit e104941569
2 changed files with 125 additions and 0 deletions

@ -29,6 +29,7 @@
- [Fork Selection](fork-selection.md)
- [Blocktree](blocktree.md)
- [Data Plane Fanout](data-plane-fanout.md)
- [Reliable Vote Transmission](reliable-vote-transmission.md)
- [Economic Design](ed_overview.md)
- [Validation-client Economics](ed_validation_client_economics.md)

@ -0,0 +1,124 @@
# Reliable Vote Transmission
Validator votes are messages that have a critical function for consensus and
continuous operation of the network. Therefore it is critical that they are
reliably delivered and encoded into the ledger.
## Challenges
1. Leader rotation is triggered by PoH, which is a clock with high drift. So many
nodes are likely to have an incorrect view of whether the next leader is active
in real time.
2. The next leader may easily be flooded. Thus a DDoS would prevent delivery not
only of regular transactions, but also of consensus messages.
3. UDP is unreliable, and our asynchronous protocol requires any message that is
transmitted to be retransmitted until it is observed in the ledger.
Retransmission could potentially cause an unintentional *thundering herd*
against the leader with a large number of validators. Worst case flood would be
`(num_nodes * num_retransmits)`.
4. Tracking via the ledger whether a vote has been transmitted does not
guarantee it will appear in a confirmed block. The current observed block may
be unrolled. Validators would need to maintain state for each vote and fork.
## Design
1. Send votes as a push message through gossip. This ensures delivery of the
vote to all upcoming leaders, not just the immediate next one.
2. Leaders will read the Crds table for new votes and encode any newly received
votes into the blocks they propose. This allows for validator votes to be
included in rollback forks by all the future leaders.
3. Validators that receive votes in the ledger will add them directly to their
local Crds table rather than as a push request. This shortcuts the push message
protocol, so the validation messages do not need to be retransmitted twice
around the network.
4. CrdsValue for vote should look like this: `Votes(Vec<Transaction>)`.
Each vote transaction should maintain a `wallclock` in its userdata. The merge
strategy for Votes will keep the last N votes, as configured by the local
client. For push/pull the vector is traversed recursively and each Transaction
is treated as an individual CrdsValue with its own local wallclock and
signature.
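
The merge strategy in item 4 can be sketched as follows. This is a minimal illustration, not the Crds implementation: `VoteTx`, `VoteSet`, `max_votes`, and `merge` are hypothetical names, the real `Transaction` type carries a signature and userdata, and deduplicating by `wallclock` alone is an assumption made to keep the sketch short.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a signed vote transaction.
#[derive(Clone, Debug, PartialEq)]
struct VoteTx {
    wallclock: u64, // timestamp the proposal keeps in the vote's userdata
    slot: u64,      // illustrative payload only
}

/// Illustrative container for one validator's `Votes` entry.
struct VoteSet {
    max_votes: usize,        // the locally configured N
    votes: VecDeque<VoteTx>, // ordered oldest to newest by wallclock
}

impl VoteSet {
    fn new(max_votes: usize) -> Self {
        VoteSet { max_votes, votes: VecDeque::new() }
    }

    /// Merge one incoming vote; returns true if the stored set changed.
    fn merge(&mut self, incoming: VoteTx) -> bool {
        if self.max_votes == 0 {
            return false;
        }
        // Assumption: two votes with the same wallclock are the same vote.
        if self.votes.iter().any(|v| v.wallclock == incoming.wallclock) {
            return false;
        }
        // When full, a vote older than everything held cannot be among the last N.
        if self.votes.len() == self.max_votes {
            if let Some(oldest) = self.votes.front() {
                if incoming.wallclock < oldest.wallclock {
                    return false;
                }
            }
        }
        // Insert in wallclock order, then evict the oldest entries beyond N.
        let pos = self
            .votes
            .iter()
            .position(|v| v.wallclock > incoming.wallclock)
            .unwrap_or(self.votes.len());
        self.votes.insert(pos, incoming);
        while self.votes.len() > self.max_votes {
            self.votes.pop_front();
        }
        true
    }
}
```

Because each vote is merged individually, the same routine could serve both push and pull: a receiver walks the incoming vector and calls `merge` once per transaction.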
Gossip is designed for efficient propagation of state. Messages that are sent
through gossip-push are batched and propagated with a minimum spanning tree to
the rest of the network. Any partial failures in the tree are actively repaired
with the gossip-pull protocol while minimizing the amount of data transferred
between any nodes.
## How this design solves the Challenges
1. Because there is no easy way for validators to be in sync with leaders on the
leader's "active" state, gossip allows for eventual delivery regardless of that
state.
2. Gossip will deliver the messages to all the subsequent leaders, so if the
current leader is flooded the next leader will already have received these
votes and can encode them.
3. Gossip minimizes the number of requests through the network by maintaining an
efficient spanning tree, and using bloom filters to repair state. So retransmit
back-off is not necessary and messages are batched.
4. Leaders that read the Crds table for votes will encode all the new valid
votes that appear in the table. Even if this leader's block is unrolled, the
next leader will try to add the same votes without any additional work done by
the validator. This ensures not only eventual delivery, but also eventual
encoding into the ledger.
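
Item 4 can be illustrated with a small sketch. It assumes a leader picks every vote in its table that is not yet present on the fork it is building on; `VoteId` and `votes_to_encode` are hypothetical names, and a real leader would also validate each vote before encoding it.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a vote transaction's signature.
type VoteId = u64;

/// Return the votes from the local table that the current fork does not
/// contain yet, preserving table order.
fn votes_to_encode(crds_votes: &[VoteId], on_fork: &HashSet<VoteId>) -> Vec<VoteId> {
    crds_votes
        .iter()
        .copied()
        .filter(|id| !on_fork.contains(id))
        .collect()
}
```

If a block is unrolled, its votes are no longer in `on_fork`, so the next leader running the same selection over its own table picks them up again without the validator resending anything.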
## Performance
1. Worst case propagation time to the next leader is Log(N) hops with a base
depending on the fanout. With our current default fanout of 6, it is about 6
hops to 20k nodes.
2. The leader should receive 20k validation votes aggregated by gossip-push into
64KB blobs, which would reduce the number of packets for a 20k network to about
80 blobs.
3. Each validator's votes are replicated across the entire network. To maintain a
queue of 5 previous votes the Crds table would grow by 25 megabytes: `(20,000
nodes * 256 bytes * 5)`.
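
The estimates above can be reproduced with a few lines of arithmetic. This is a rough sketch: the function names are hypothetical, the 256-byte vote size and 64KB blob size are the figures used in this proposal, and the hop count assumes an ideal tree in which every node forwards to `fanout` new nodes.

```rust
const VOTE_BYTES: u64 = 256;       // per-vote size used in the estimates
const BLOB_BYTES: u64 = 64 * 1024; // size of one gossip-push blob

/// Smallest number of hops h such that fanout^h >= nodes.
fn worst_case_hops(nodes: u64, fanout: u64) -> u32 {
    assert!(fanout >= 2, "fanout must be at least 2");
    let mut reach: u64 = 1;
    let mut hops = 0;
    while reach < nodes {
        reach = reach.saturating_mul(fanout);
        hops += 1;
    }
    hops
}

/// Number of blobs needed to carry one vote from every node (rounded up).
fn vote_blobs(nodes: u64) -> u64 {
    (nodes * VOTE_BYTES + BLOB_BYTES - 1) / BLOB_BYTES
}

/// Bytes added to the Crds table when `votes_kept` votes are stored per node.
fn crds_vote_bytes(nodes: u64, votes_kept: u64) -> u64 {
    nodes * VOTE_BYTES * votes_kept
}
```

For 20,000 nodes this gives 6 hops at a fanout of 6, 79 blobs (rounded to 80 in the text), and 25,600,000 bytes for 5 votes per node.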
## Two step implementation rollout
Initially the network can perform reliably with just 1 vote transmitted and
maintained through the network with the current Vote implementation. For small
networks a fanout of 6 is sufficient. With a small network the memory and push
overhead is minor.
### Sub 1k validator network
1. Crds just maintains the validator's latest vote.
2. Votes are pushed and retransmitted regardless of whether they appear in the
ledger.
3. Fanout of 6.
* Worst case 256KB memory overhead per node.
* Worst case 4 hops to propagate to every node.
* Leader should receive the entire validator vote set in 4 push message blobs.
### Sub 20k network
Everything above plus the following:
1. Crds table maintains a vector of the 5 latest validator votes.
2. Votes encode a wallclock. CrdsValue::Votes is a type that recurses into the
transaction vector for all the gossip protocols.
3. Increase fanout to 20.
* Worst case 25MB memory overhead per node.
* Under 4 hops worst case to deliver to the entire network.
* 80 blobs received by the leader for all the validator messages.