Current implementation of weighted_shuffle:
https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37
uses a heuristic which results in biased samples.
For example, if the weights are [1, 10, 100], then the 3rd index should
come first 100 times more often than the 1st index. However,
weighted_shuffle is picking the 3rd index 200+ times more often than the
1st index, showing a disproportional bias in favor of higher weights.
This commit implements weighted shuffle using binary indexed tree to
maintain cumulative sum of weights while sampling. The resulting samples
are demonstrably unbiased and precisely proportional to the weights.
Additionally the iterator interface allows to skip computations when
not all indices are processed.
Of the use cases of weighted_shuffle, changing turbine code requires
feature-gating to keep the cluster in sync. That is not updated in
this commit, but can be done together with future updates to turbine.
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.
Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.
This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
Calling ClusterInfo::id repeatedly in for loops or iterators is
inefficient, because it acquires a lock on ClusterInfo.my_contact_info,
and clones the entire contact-info.
filter_by_shred_version does not check the shred-version of the owner of
the crds-value. It only checks the shred-version of the node which is
relaying the value:
https://github.com/solana-labs/solana/blob/5cc073420/gossip/src/cluster_info.rs#L2274-L2289
So crds-values with different shred versions can still pass through this
function as long as they are relayed by a node with matching shred
version; and so, a single node can bridge different shred values
through-out the cluster.
When starting a validator, the node initially joins gossip with
shred_verison = 0, until it adopts the entrypoint's shred-version:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417
Depending on the load on the entrypoint, this adopting entrypoint
shred-version through gossip sometimes becomes very slow, and causes
several problems in gossip because we have to partially support
shred_version == 0 which is a source of leaking crds values from one
cluster to another. e.g. see
https://github.com/solana-labs/solana/pull/17899
and the other linked issues there.
In order to remove shred_version == 0 from gossip, this commit adds
shred-version to ip-echo-server response. Once the entrypoints are
updated, on validator start-up, if --expected_shred_version is not
specified we will obtain shred-version from the entrypoint using
ip-echo-server.
Crds values of nodes with different shred versions are creeping into
gossip table resulting in runtime issues as the one addressed in:
https://github.com/solana-labs/solana/pull/17899
This commit works towards enforcing more checks and filtering based on
shred version by adding necessary mapping and api to gossip table.
Once populated, pubkey->shred-version mapping persists as long as there
are any values associated with the pubkey.
epoch-slots may be overwritten before they are written to crds table:
https://github.com/solana-labs/solana/issues/17711
This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The functions is still not thread-safe as commented in
the code, however currently only one threads is invoking this code.
If the crds entry belongs to the caller itself, then the caller will
always have the more recent version of it, regardless of it being
filtered out by the bloom filter or not.
The exception is node-instance types which are meant to detect duplicate
running instances, and those are exempted.
* Move gossip modules to solana-gossip
* Update Protocol abi digest due to move
* Move gossip benches and hook up CI
* Remove unneeded Result entries
* Single use statements
--gossip-port now specifies exactly that, the gossip port to use. The
new --gossip-host argument can be used to specify the DNS name/IP
address for gossip if --entrypoint is not supplied (when --entrypoint is
supplied, the gossip address is automatically set to the node's ip
address as observed by the entrypoint)
* Revert "Revert "Default log level to to RUST_LOG=solana=info (#5296)" (#5302)"
This reverts commit 7796e87814.
* Default to error logs, override with info only for those programs that need it
* Correctly remove replicator from data plane after its done repairing
* Update discover to report nodes and replicators separately
* Fix print and condition to be spy