diff --git a/book/src/cluster/turbine-block-propagation.md b/book/src/cluster/turbine-block-propagation.md index c792cdad34..cbccc843bc 100644 --- a/book/src/cluster/turbine-block-propagation.md +++ b/book/src/cluster/turbine-block-propagation.md @@ -36,6 +36,59 @@ Finally, the following diagram shows a two layer cluster with a Fanout of 2. Currently, configuration is set when the cluster is launched. In the future, these parameters may be hosted on-chain, allowing modification on the fly as the cluster sizes change. +## Calcuating the required FEC rate + +Turbine relies on retransmission of packets between validators. Due to +retransmission, any network wide packet loss is compounded, and the +probability of the packet failing to reach is destination increases +on each hop. The FEC rate needs to take into account the network wide +packet loss, and the propagation depth. + +A shred group is the set of data and coding packets that can be used +to reconstruct each other. Each shred group has a chance of failure, +based on the likelyhood of the number of packets failing that exceeds +the FEC rate. If a validator fails to reconstruct the shred group, +then the block cannot be reconstructed, and the validator has to rely +on repair to fixup the blocks. + +The probability of the shred group failing can be computed using the +binomial distribution. If the FEC rate is `16:4`, then the group size +is 20, and at least 4 of the shreds must fail for the group to fail. +Which is equal to the sum of the probability of 4 or more trails failing +out of 20. + +Probability of a block succeeding in turbine: + +* Probability of packet failure: `P = 1 - (1 - network_packet_loss_rate)^2` +* FEC rate: `K:M` +* Number of trials: `N = K + M` +* Shred group failure rate: `S = SUM of i=0 -> M for binomial(prob_failure = P, trials = N, failures = i)` +* Shreds per block: `G` +* Block success rate: `B = (1 - S) ^ (G / N) ` +* Binomial distribution for exactly `i` results with probability of P in N trials is defined as `(N choose i) * P^i * (1 - P)^(N-i)` + +For example: + +* Network packet loss rate is 15%. +* 50kpts network generates 6400 shreds per second. +* FEC rate increases the total shres per block by the FEC ratio. + +With a FEC rate: `16:4` +* `G = 8000` +* `P = 1 - 0.85 * 0.85 = 1 - 0.7225 = 0.2775` +* `S = SUM of i=0 -> 4 for binomial(prob_failure = 0.2775, trials = 20, failures = i) = 0.689414` +* `B = (1 - 0.689) ^ (8000 / 20) = 10^-203` + +With FEC rate of `16:16` +* `G = 12800` +* `S = SUM of i=0 -> 32 for binomial(prob_failure = 0.2775, trials = 64, failures = i) = 0.002132` +* `B = (1 - 0.002132) ^ (12800 / 32) = 0.42583` + +With FEC rate of `32:32` +* `G = 12800` +* `S = SUM of i=0 -> 32 for binomial(prob_failure = 0.2775, trials = 64, failures = i) = 0.000048` +* `B = (1 - 0.000048) ^ (12800 / 64) = 0.99045` + ## Neighborhoods The following diagram shows how two neighborhoods in different layers interact. To cripple a neighborhood, enough nodes \(erasure codes +1\) from the neighborhood above need to fail. Since each neighborhood receives shreds from multiple nodes in a neighborhood in the upper layer, we'd need a big network failure in the upper layers to end up with incomplete data.