Commit Graph

39 Commits

Author SHA1 Message Date
behzad nouri 1c7662a37f
asserts that cluster-info keypair is consistent with contact-info id (#29818) 2023-01-24 16:57:55 +00:00
behzad nouri 64c13b74d8
errors out when retransmit loopbacks to the slot leader (#29789)
When broadcasting shreds, turbine excludes the slot leader from the
random shuffle. Doing so, shreds should never loopback to the leader.
If shreds reaching retransmit stage are from the node's own leader slots
they should not be retransmited to any nodes.
2023-01-20 17:20:51 +00:00
behzad nouri 80a39bd6a5
adds feature to (temporarily) drop merkle shreds from testnet (#29711) 2023-01-15 15:41:58 +00:00
behzad nouri 8c212f59ad
renames ContactInfo to LegacyContactInfo (#29566)
Working towards adding a new ContactInfo where new sockets can be
added in a backward compatible way.
2023-01-08 16:00:55 +00:00
behzad nouri 456d06785d
experiments different turbine fanouts for propagating shreds (#29393)
The commit allocates 2% of slots to running experiments with different
turbine fanouts based on the slot number.
The experiment is feature gated with an additional feature to disable
the experiment.
2022-12-26 14:18:56 +00:00
behzad nouri 7d99cddb9f
dedups turbine retransmit peers by tvu socket addresses (#28944)
No need to send duplicate shreds if several nodes have the same tvu
socket address because they are behind a relayer or whatever.
2022-11-28 19:23:02 +00:00
behzad nouri abfb996135
tracks number of staked/stale/dead nodes in turbine cluster-nodes (#27915) 2022-09-19 18:16:04 +00:00
behzad nouri 3b87aa9227
reverts wide fanout in broadcast when the root node is down (#26359)
A change included in
https://github.com/solana-labs/solana/pull/20480
was that when the root node in turbine broadcast tree is down, the
leader will broadcast the shred to all nodes in the first layer.
The intention was to mitigate the impact of dead nodes on shreds
propagation, because if the root node is down, then the entire cluster
will miss out the shred.
On the other hand, if x% of stake is down, this will cause 200*x% + 1
packets/shreds ratio at the broadcast stage which might contribute to
line-rate saturation and packet drop.
To avoid this bandwidth saturation issue, this commit reverts that logic
and always broadcasts shreds from the leader only to the root node.
As before we rely on erasure codes to recover shreds lost due to staked
nodes being offline.
2022-08-16 19:40:06 +00:00
behzad nouri 88599fd760
skips shreds deserialization before retransmit (#26230)
Fully deserializing shreds in window-service before sending them to
retransmit stage adds latency to shreds propagation.
This commit instead channels through the payload and relies on only
partial deserialization of a few required fields: slot, shred-index,
shred-type.
2022-06-30 12:13:00 +00:00
behzad nouri 67936aaa74
moves Shred::seed to ShredId and adds test coverage (#26251)
Following commits will skip shreds deserializaton before retransmit, and
so we will only have a ShredId and not a fully deserialized shred to
obtain the shuffling seed from.
2022-06-27 17:58:43 +00:00
behzad nouri 47e62add5b
removes feature gate code adding shred-type to shred seed (#25963)
The feature is already activated on all clusters, and does not impact
processing of ledger/snapshots.
2022-06-20 14:39:24 +00:00
behzad nouri b3d1f8d1ac
tracks number of shreds sent and received at different distances from the root (#25989) 2022-06-17 21:33:23 +00:00
behzad nouri cafa85bfbb
includes shred-type when computing turbine broadcast seed (#25556)
Indices for code and data shreds of the same slot overlap; and so they
will have the same random number generator seed when shuffling cluster
nodes for turbine broadcast.

This results in the same propagation path for code and data shreds of
the same index and effectively smaller sample size for re-transmitter
nodes. For example a 32:32 batch (32 code + 32 data shreds), is
retransmitted through _at most_ 32 unique nodes, whereas ideally we want
~64 unique re-transmitters.

This commit adds shred-type to seed function so that code and data
sherds of the same (slot, index) will (most likely) have different
propagation paths.
2022-05-25 20:31:53 +00:00
behzad nouri 039488b562
drops redundant turbine propagation path (#24351)
Most nodes in the cluster receive the same shred from two different
nodes: parent, and the first node of their neighborhood:
https://github.com/solana-labs/solana/blob/a8c695ba5/core/src/cluster_nodes.rs#L178-L197

Because of the erasure codings, half of the shreds are already
redundant. So this redundant propagation path will only add extra
overhead.

Additionally the very first node of the broadcast tree has 2x fanout
(i.e. 400 nodes) which adds too much load at one node.

This commit simplifies the broadcast tree by dropping the redundant
propagation path and removing the 2x fanout at root node.
2022-04-19 00:11:29 +00:00
behzad nouri 2b718d00b0 removes legacy compatibility turbine peers shuffle code 2022-04-05 12:04:12 +00:00
behzad nouri d0b850cdd9 removes turbine peers shuffle patch feature 2022-04-05 12:04:12 +00:00
behzad nouri 855801cc95 removes deterministic-shred-seed feature 2022-04-05 12:04:12 +00:00
behzad nouri 7cb3b6cbe2
demotes WeightedShuffle failures to error metrics (#24079)
Since call-sites are calling unwrap anyways, panicking seems too punitive
for our use cases.
2022-04-03 16:20:06 +00:00
Tao Zhu c478fe2047 add timing metrics, some renaming 2022-03-17 19:31:28 -05:00
Tao Zhu fd515097d8 leader qos part 2: add stage to find sender stake, set to packet meta 2022-03-17 19:31:28 -05:00
Jeff Biseda c69e3b73ff
bench get_retransmit_peers (#23292) 2022-03-01 19:10:29 -08:00
behzad nouri dccbddad80
adds reverse lookup index to cluster-nodes (#22892)
retransmit has to exclude slot leader from set of nodes for each shred; 
which currently requires a linear scan:
https://github.com/solana-labs/solana/blob/e3b137066/core/src/cluster_nodes.rs#L238-L242

This commit adds a reverse lookup index to avoid linear scan.
2022-02-02 19:27:50 +00:00
behzad nouri e3b137066d
caches WeightedShuffle struct in ClusterNodes (#22877)
Instead of reconstructing WeightedShuffle struct for each shred
broadcast or retransmit, we can use the same struct with minimal
mutations.
2022-02-02 15:12:26 +00:00
behzad nouri 45e09664b8
removes Rng field from WeightedShuffle struct (#22850) 2022-02-01 15:27:23 +00:00
behzad nouri 604ca9316c
includes zero weighted entries in WeightedShuffle (#22829)
Current WeightedShuffle implementation excludes zero weighted entries
from the shuffle:
https://github.com/solana-labs/solana/blob/13e631dcf/gossip/src/weighted_shuffle.rs#L29-L30

Though mathematically this might make more sense, for our use-cases
(turbine specifically), this results in less efficient code:
https://github.com/solana-labs/solana/blob/13e631dcf/core/src/cluster_nodes.rs#L409-L430

This commit changes the implementation so that zero weighted indices are
also included in the shuffle but appear only at the end after non-zero
weighted indices.
2022-01-31 16:23:50 +00:00
behzad nouri 1297a13586
adds metrics tracking crds writes and votes (#20953) 2021-10-26 13:02:30 +00:00
Michael Vines 350bb561eb Clippy 2021-10-23 08:21:20 +00:00
behzad nouri 0c0384ec32
revises turbine peers shuffling order (#20480)
Turbine randomly shuffles cluster nodes on a broadcast tree for each
shred. This requires knowing the stakes and nodes' contact-infos (from
gossip).

However gossip is subject to partitioning and propogation delays.
Additionally unstaked nodes may join and leave the cluster at any
moment, changing the cluster view from one node to another.

This commit:
* Always arranges the unstaked nodes at the bottom of turbine broadcast
  tree.
* Staked nodes are always included regardless of if their contact-info
  is available in gossip or not.
* Uses the unbiased WeightedShuffle construct for shuffling nodes.
2021-10-14 15:09:36 +00:00
behzad nouri 6d9818b8e4
skips retransmit for shreds with unknown slot leader (#19472)
Shreds' signatures should be verified before they reach retransmit
stage, and if the leader is unknown they should fail signature check.
Therefore retransmit-stage can as well expect to know who the slot
leader is and otherwise just skip the shred.

Blockstore checking signature of recovered shreds before sending them to
retransmit stage:
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/blockstore.rs#L884-L930

Shred signature verifier:
https://github.com/solana-labs/solana/blob/4305d4b7b/core/src/sigverify_shreds.rs#L41-L57
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/sigverify_shreds.rs#L105
2021-09-01 15:44:26 +00:00
behzad nouri 1deb4add81
removes Slot from TransmitShreds (#19327)
An earlier version of the code was funneling through stakes along with
shreds to broadcast:
https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127

This was changed to only slots as stakes computation was pushed further
down the pipeline in:
https://github.com/solana-labs/solana/pull/18971

However shreds themselves embody which slot they belong to. So pairing
them with slot is redundant and adds rooms for bugs should they become
inconsistent.
2021-08-20 13:48:33 +00:00
behzad nouri e4be00fece falls back on working-bank if root-bank::epoch-staked-nodes is none
bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).

At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.

To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.
2021-08-05 21:47:33 +00:00
behzad nouri eaf927cf49 allows only one thread to update cluster-nodes cache entry for an epoch
If two threads simultaneously call into ClusterNodesCache::get for the
same epoch, and the cache entry is outdated, then both threads recompute
cluster-nodes for the epoch and redundantly overwrite each other.

This commit wraps ClusterNodesCache entries in Arc<Mutex<...>>, so that
when needed only one thread does the computations to update the entry.
2021-08-05 21:47:33 +00:00
behzad nouri fb69f45f14 adds fallback & metric for when epoch staked-nodes are none 2021-08-05 21:47:33 +00:00
behzad nouri 50d0e830c9 unifies cluster-nodes computation & caching across turbine stages
Broadcast-stage is using epoch_staked_nodes based on the same slot that
shreds belong to:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

But retransmit-stage is using bank-epoch of the working-bank:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/retransmit_stage.rs#L272-L289

So the two are not consistent at epoch boundaries where some nodes may
have a working bank (or similarly a root bank) lagging other nodes. As a
result the node which obtains a packet may construct turbine broadcast
tree inconsistently with its parent node in the tree and so some packets
may fail to reach all nodes in the tree.
2021-08-05 21:47:33 +00:00
behzad nouri ecc1c7957f implements cluster-nodes cache
Cluster nodes are cached keyed by the respective epoch from which stakes
are obtained, and so if epoch changes cluster-nodes will be recomputed.

A time-to-live eviction policy is enforced to refresh entries in case
gossip contact-infos are updated.
2021-08-05 21:47:33 +00:00
behzad nouri d2d5f36a3c
adds validator flag to allow private ip addresses (#18850) 2021-07-23 15:25:03 +00:00
carllin 588c0464b8
Add sampling logic and DuplicateSlotRepairStatus module (#18721) 2021-07-21 11:15:08 -07:00
behzad nouri cf31afdd6a
makes CrdsGossip thread-safe (#18615) 2021-07-14 22:27:17 +00:00
behzad nouri 04787be8b1
encapsulates turbine peers computations of broadcast & retransmit stages (#18238)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.

Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
2021-07-07 00:35:25 +00:00