update turbine docs for redundant path disabled (#29985)

* update turbine docs for redundant path disabled

* Create data-plane-propagation.png

* PR feedback - scrub out neighborhoods
Brennan 2023-06-21 09:46:17 -07:00 committed by GitHub
parent 42aa5d243c
commit 42ccc5cf40
6 changed files with 63 additions and 122 deletions


@ -1,19 +0,0 @@
+------------------------------------------------------------------+
| |
| +-----------------+ Neighborhood 0 +-----------------+ |
| | +--------------------->+ | |
| | Validator 1 | | Validator 2 | |
| | Root | | | |
| +--------+-+------+ +------+-+--------+ |
| | | | | |
| | +-----------------------------+ | | |
| | +------------------------+------+ | |
| | | | | |
+------------|------|------------------------|--------|------------+
| | | |
v v v v
+---------+------+---+ +-+--------+---------+
| | | |
| Neighborhood 1 | | Neighborhood 2 |
| | | |
+--------------------+ +--------------------+


@ -1,28 +0,0 @@
+---------------------------------------------------------------------------------------------------------+
| Neighborhood Above |
| +-----------------+-----------------------+-------------------------+ |
| | | | | |
| | v v v |
| +--------------+-+ +----------------+ +----------------+ +----------------+ |
| | | | | | | | | |
| | Neighbor 1 | | Neighbor 2 | | Neighbor 3 | | Neighbor 4 | |
| | Anchor | | | | | | | |
| +--+-------------+ +---+------------+ +------+---------+ +---+------------+ |
| | | | | |
+---------|-------------------------|---------------------------|---------------------|-------------------+
| | | |
| | | |
| | | |
| | | |
+---------|-------------------------|---------------------------|---------------------|-------------------+
| | | Neighborhood Below | | |
| v v v v |
| +--+-------------+ +---+------------+ +------+---------+ +---+------------+ |
| | | | | | | | | |
| | Neighbor 1 | | Neighbor 2 | | Neighbor 3 | | Neighbor 4 | |
| | Anchor | | | | | | | |
| +--------------+-+ +----------------+ +----------------+ +----------------+ |
| | ^ ^ ^ |
| | | | | |
| +-----------------+-----------------------+-------------------------+ |
+---------------------------------------------------------------------------------------------------------+


@ -1,15 +0,0 @@
+--------------+
| |
+------------+ Leader |
| | |
| +--------------+
|
+------------|-----------------------------------------------------+
| v |
| +--------+--------+ Neighborhood 0 +-----------------+ |
| | +--------------------->+ | |
| | Validator 1 | | Validator 2 | |
| | Root | | | |
| +-----------------+ +-----------------+ |
| |
+------------------------------------------------------------------+


@ -1,18 +0,0 @@
+--------------------+
| |
+--------+ Neighborhood 0 +----------+
| | | |
| +--------------------+ |
v v
+---------+----------+ +----------+---------+
| | | |
| Neighborhood 1 | | Neighborhood 2 |
| | | |
+---+-----+----------+ +----------+-----+---+
| | | |
v v v v
+------------------+-+ +-+------------------+ +------------------+-+ +-+------------------+
| | | | | | | |
| Neighborhood 3 | | Neighborhood 4 | | Neighborhood 5 | | Neighborhood 6 |
| | | | | | | |
+--------------------+ +--------------------+ +--------------------+ +--------------------+


@ -2,70 +2,91 @@
title: Turbine Block Propagation
---
A Solana cluster uses a multi-layer block propagation mechanism called _Turbine_ to broadcast transaction shreds to all nodes with minimal amount of duplicate messages. The cluster divides itself into small collections of nodes, called _neighborhoods_. Each node is responsible for propagating any data it receives on to a small set of nodes in downstream neighborhoods and possibly sharing data with the other nodes in its neighborhood. This way each node only has to communicate with a small number of nodes.
A Solana cluster uses a multi-layer block propagation mechanism called _Turbine_
to broadcast ledger entries to all nodes. The cluster divides itself into layers
of nodes, and each node in a given layer is responsible for propagating any data
it receives on to a small set of nodes in the next downstream layer. This way
each node only has to communicate with a small number of nodes.
## Neighborhood Assignment - Weighted Selection
## Layer Structure
In order for data plane fanout to work, the entire cluster must agree on how the cluster is divided into neighborhoods. To achieve this, all the recognized validator nodes \(the TVU peers\) are sorted by stake and stored in a list. This list is then indexed in different ways to figure out neighborhood boundaries and retransmit peers. For example, the leader will simply select the first `DATA_PLANE_FANOUT` nodes to make up layer 1. These will automatically be the highest stake holders, allowing the heaviest votes to come back to the leader first. Layer 1 and lower-layer nodes use the same logic to find their neighbors and next layer peers.
The leader communicates with a special root node. The root can be thought of as
layer 0 and communicates with layer 1, which is made up of at most
`DATA_PLANE_FANOUT` nodes. If the cluster has more nodes than layer 1 can hold,
then the data plane fanout mechanism adds layers below. The number of nodes in
each additional layer grows by a factor of `DATA_PLANE_FANOUT`.
To reduce the possibility of attack vectors, each shred is transmitted over a random tree of neighborhoods. Each node uses the same set of nodes representing the cluster. A random tree is generated from the set for each shred using a seed derived from the slot leader id, slot, shred index, and shred type.
A good way to think about this: layer 0 holds a single node, layer 1 holds up to
`fanout` nodes, layer 2 holds up to `fanout * number of nodes in layer 1` nodes,
and so on.
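The layer sizes described above can be sketched in a few lines of Python. This is only an illustration: the function name `layer_sizes` is hypothetical, and it assumes `num_nodes` counts the root and every node below it (the leader is not included).

```python
def layer_sizes(num_nodes: int, fanout: int) -> list[int]:
    """Return the number of nodes in each layer, starting with layer 0 (the root).

    Hypothetical helper: each layer can hold `fanout` times as many nodes as
    the layer above it, and a layer fills up before a new one is started.
    """
    if num_nodes < 0 or fanout < 1:
        raise ValueError("num_nodes must be >= 0 and fanout must be >= 1")
    sizes = []
    capacity = 1  # layer 0 holds only the root
    remaining = num_nodes
    while remaining > 0:
        size = min(capacity, remaining)
        sizes.append(size)
        remaining -= size
        capacity *= fanout  # the next layer grows by a factor of fanout
    return sizes
```

For example, `layer_sizes(7, 2)` yields one root, two nodes in layer 1, and four nodes in layer 2. Only the last layer may be partially filled.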
## Layer and Neighborhood Structure
### Layer Assignment - Weighted Selection
The leader can be thought of as layer 0 and communicates with layer 1, which is made up of at most `DATA_PLANE_FANOUT` nodes. If this layer 1 is smaller than the number of nodes in the cluster, then the data plane fanout mechanism adds layers below. Subsequent layers follow these constraints to determine layer-capacity: Each neighborhood contains `DATA_PLANE_FANOUT` nodes. Layer 1 starts with 1 neighborhood. The number of nodes in each additional neighborhood/layer grows by a factor of `DATA_PLANE_FANOUT`.
In order for data plane fanout to work, the entire cluster must agree on how the
cluster is divided into layers. To achieve this, all the recognized validator
nodes \(the TVU peers\) are shuffled with a stake weighting and stored in a
list. This list is then indexed in different ways to figure out layer boundaries
and retransmit peers; the resulting structure is referred to as the turbine
tree. For example, once the list is shuffled, the leader selects the first node
to be the root node, and the root node selects the next `DATA_PLANE_FANOUT`
nodes to make up layer 1. The shuffle is biased towards higher staked nodes,
allowing heavier votes to come back to the leader first. Layer 2 and lower-layer
nodes use the same logic to find their next layer peers.
A good way to think about this is, layer 1 starts with 1 neighborhood with fanout nodes, layer 2 adds fanout neighborhoods, each with fanout nodes and layer 3 will have `fanout * number of nodes in layer 2` and so on.
The following diagram shows a three layer cluster with a fanout of 2.
![Two layer cluster with a Fanout of 2](/img/data-plane.svg)
To reduce the attack surface, the list is reshuffled and re-indexed for every
shred. The turbine tree is generated from the set of validator nodes for each
shred using a seed derived from the slot leader id, slot, shred index, and shred
type.
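A minimal Python sketch of a per-shred, stake-weighted shuffle follows. It is an assumption-laden illustration rather than the validator's actual algorithm: the hash function, the field order and widths in `shred_seed`, and the sampling scheme in `weighted_shuffle` are all hypothetical choices made here for clarity.

```python
import hashlib
import random


def shred_seed(leader_id: bytes, slot: int, shred_index: int, shred_type: int) -> int:
    """Derive a deterministic seed from the four inputs named in the text.

    The hash and the byte layout are assumptions made for this sketch.
    """
    h = hashlib.sha256()
    h.update(leader_id)
    h.update(slot.to_bytes(8, "little"))
    h.update(shred_index.to_bytes(4, "little"))
    h.update(shred_type.to_bytes(1, "little"))
    return int.from_bytes(h.digest(), "little")


def weighted_shuffle(stakes: dict[str, int], seed: int) -> list[str]:
    """Order nodes by repeated stake-weighted sampling without replacement.

    Assumes every node has a positive stake. Higher-staked nodes tend to land
    near the front of the list, but any order is possible.
    """
    rng = random.Random(seed)
    remaining = sorted(stakes)  # fixed starting order so every node agrees
    order = []
    while remaining:
        weights = [stakes[node] for node in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order
```

Because every node derives the same seed for a given shred, every node computes the same ordering without any extra communication; index 0 of the result would play the role of the root.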
### Configuration Values
`DATA_PLANE_FANOUT` - Determines the size of layer 1. Subsequent layers grow by a factor of `DATA_PLANE_FANOUT`. The number of nodes in a neighborhood is equal to the fanout value. Neighborhoods will fill to capacity before new ones are added, i.e., if a neighborhood isn't full, it _must_ be the last one.
`DATA_PLANE_FANOUT` - Determines the size of layer 1. Subsequent layers grow by
a factor of `DATA_PLANE_FANOUT`. Layers will fill to capacity before new ones are
added, i.e., if a layer isn't full, it _must_ be the last one.
Currently, configuration is set when the cluster is launched. In the future, these parameters may be hosted on-chain, allowing modification on the fly as the cluster sizes change.
Currently, configuration is set when the cluster is launched. In the future,
these parameters may be hosted on-chain, allowing modification on the fly as the
cluster sizes change.
## Shred Propagation Flow
During its slot, the leader node \(layer 0\) makes its initial broadcasts to a special root node sitting atop the turbine tree. This root node is rotated every shred. The root shares data within its neighborhood \(layer 1\). Nodes in this neighborhood then retransmit shreds to one node in some neighborhoods in the next layer \(layer 2\). In general, the layer-1 root/anchor node (first node in the neighborhood, rotated on every shred) shares their data with their neighborhood peers, and every node in layer-1 retransmits to nodes in the next layer, etc, until all nodes in the cluster have received all the shreds.
During its slot, the leader node makes its initial broadcasts to a special root
node \(layer 0\) sitting atop the turbine tree. This root node is rotated every
shred based on the weighted shuffle previously mentioned. The root shares data
with layer 1. Nodes in this layer then retransmit shreds to a subset of nodes in
the next layer \(layer 2\). In general, every node in a layer retransmits to a
unique subset of nodes in the next layer, and so on, until all nodes in the
cluster have received all the shreds.
As mentioned above, each node in a layer only has to broadcast its shreds to exactly 1 node in some next-layer neighborhoods (and to its neighbors if it is the anchor node), instead of to every TVU peer in the cluster. In this way, each node only has to communicate with a maximum of `2 * DATA_PLANE_FANOUT - 1` nodes if it is the anchor node and `DATA_PLANE_FANOUT` if it is not the anchor node.
To prevent redundant transmission, each node uses the deterministically
generated turbine tree, its own index in the tree, and `DATA_PLANE_FANOUT` to
iterate through the tree and identify downstream nodes. Each node in a layer
only has to broadcast its shreds to a maximum of `DATA_PLANE_FANOUT` nodes in
the next layer instead of to every TVU peer in the cluster.
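One way to picture how a node finds its downstream peers from its own index is the sketch below. It assumes a plain heap-style layout over the shuffled list (index 0 is the root, and the children of index `i` are the next `fanout` slots starting at `i * fanout + 1`); this layout and the name `downstream_indices` are illustrative assumptions, and the validator's actual index arithmetic may differ.

```python
def downstream_indices(index: int, fanout: int, num_nodes: int) -> list[int]:
    """Return the list positions this node retransmits to.

    Hypothetical heap-style layout over the shuffled node list: position 0 is
    the root, and position i forwards to positions i*fanout+1 .. i*fanout+fanout
    that exist in the list.
    """
    first = index * fanout + 1
    return [i for i in range(first, first + fanout) if i < num_nodes]
```

Under this layout every non-root position has exactly one upstream node, so no node receives the same shred from two different peers, and no node sends to more than `fanout` peers.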
The following diagram shows how the leader sends shreds with a fanout of 2 to the root from Neighborhood 0 in Layer 1 and how the root from Neighborhood 0 shares its data with its neighbors.
The following diagram shows how shreds propagate through a cluster with 15 nodes
and a fanout of 3.
![Leader sends shreds to Neighborhood 0 in Layer 1](/img/data-plane-seeding.svg)
The following diagram shows how Neighborhood 0 fans out to Neighborhoods 1 and 2.
![Neighborhood 0 Fanout to Neighborhood 1 and 2](/img/data-plane-fanout.svg)
### Neighborhood Interaction
The following diagram shows how two neighborhoods in different layers interact. To cripple a neighborhood, enough nodes \(erasure codes +1\) from the neighborhood above need to fail. Since each neighborhood receives shreds from multiple nodes in a neighborhood in the upper layer, we'd need a big network failure in the upper layers to end up with incomplete data.
![Inner workings of a neighborhood](/img/data-plane-neighborhood.svg)
![Shred propagation through 15 node cluster with fanout of 3](/img/data-plane-propagation.png)
## Calculating the required FEC rate
Turbine relies on retransmission of packets between validators. Due to
retransmission, any network wide packet loss is compounded, and the
probability of the packet failing to reach its destination increases
on each hop. The FEC rate needs to take into account the network wide
packet loss, and the propagation depth.
retransmission, any network wide packet loss is compounded, and the probability
of the packet failing to reach its destination increases on each hop. The FEC
rate needs to take into account the network wide packet loss, and the
propagation depth.
A shred group is the set of data and coding packets that can be used
to reconstruct each other. Each shred group has a chance of failure,
based on the likelihood of the number of packets failing that exceeds
the FEC rate. If a validator fails to reconstruct the shred group,
then the block cannot be reconstructed, and the validator has to rely
on repair to fix up the blocks.
A shred group is the set of data and coding packets that can be used to
reconstruct each other. Each shred group has a chance of failure, based on the
likelihood that the number of failed packets exceeds what the FEC rate can
tolerate. If a validator fails to reconstruct the shred group, then the block
cannot be reconstructed, and the validator has to rely on repair to fix up the
blocks.
The probability of the shred group failing can be computed using the
binomial distribution. If the FEC rate is `16:4`, then the group size
is 20, and at least 4 of the shreds must fail for the group to fail.
Which is equal to the sum of the probability of 4 or more trials failing
out of 20.
The probability of the shred group failing can be computed using the binomial
distribution. If the FEC rate is `16:4`, then the group size is 20, and at least
4 of the shreds must fail for the group to fail. This probability is equal to
the sum of the probabilities of 4 or more trials failing out of 20.
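The binomial sum described above can be written out directly. The sketch below is illustrative: the function name and the `p_loss` parameter (the probability that a single shred fails to arrive) are assumptions, and the defaults follow the `16:4` example in the text, with a group of 20 and a failure threshold of 4.

```python
from math import comb


def group_failure_probability(
    p_loss: float, group_size: int = 20, min_failures: int = 4
) -> float:
    """Probability that at least `min_failures` of `group_size` shreds are lost.

    Treats each shred as an independent trial that fails with probability
    `p_loss`, and sums the binomial probabilities for min_failures..group_size.
    """
    return sum(
        comb(group_size, k) * p_loss**k * (1.0 - p_loss) ** (group_size - k)
        for k in range(min_failures, group_size + 1)
    )
```

Calling `group_failure_probability(p)` for a few values of `p` shows how quickly the failure probability grows as per-shred loss compounds across hops.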
Probability of a block succeeding in turbine:

Binary file not shown. Size: 140 KiB