In order to debug this panic on the clusters:
panicked at 'assertion failed: (vote_index as usize) <
MAX_LOCKOUT_HISTORY', core/src/cluster_info.rs:1012:9
Clusters are kept separate using the shred-versions obtained from
contact-infos. However, this mechanism breaks if there are 2 instances
of the same identity key running on different clusters, because then one
of the two contact-infos has the right shred-version.
If a node has the contact-info with the matching shred-version, then it
will pass all associated crds values even if they belong to the other
instance. So the shred-version check breaks.
As a result we cannot support 2 instances of the same identity key
running on different clusters. To prevent that, this commit exempts
node-instances from the shred-version check so that they are always
propagated across clusters and halt one of the running duplicate
instances.
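The exemption described above can be sketched roughly as follows. This
is a minimal illustration, not the actual gossip code: the Value enum,
the Entry struct, should_keep and the shred_versions map are all
hypothetical stand-ins, and Pubkey is reduced to a byte array.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real Pubkey type.
type Pubkey = [u8; 32];

// Hypothetical, heavily simplified set of crds value kinds.
enum Value {
    NodeInstance { token: u64 },
    Vote { slot: u64 },
}

struct Entry {
    owner: Pubkey,
    value: Value,
}

/// Decides whether an entry passes the shred-version filter.
/// Node-instances are always kept so that duplicate identities can be
/// detected across clusters; every other value is kept only if the
/// shred version known for its owner matches ours.
fn should_keep(
    entry: &Entry,
    self_shred_version: u16,
    shred_versions: &HashMap<Pubkey, u16>,
) -> bool {
    match entry.value {
        Value::NodeInstance { .. } => true,
        _ => shred_versions.get(&entry.owner) == Some(&self_shred_version),
    }
}
```

With this shape, a node-instance from an unknown or foreign owner still
propagates, while a vote from the same owner is dropped.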
* Version transaction message and add new message format
* Update abi digest due to message path change
* Update v0.rs
Fix comment
* Update original.rs
* Update message versions name and address map indexes field name
* s/original/legacy
* update comment
* cargo fmt
* Update abi digest due to legacy rename
This commit adds the CrdsEntry trait, which allows generic lookups into
the crds table. For example, to get the ContactInfo or LowestSlot
associated with a Pubkey, the lookup code would be respectively:
crds.get::<&ContactInfo>(pubkey)
crds.get::<&LowestSlot>(pubkey)
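One way such a trait could be shaped is sketched below. Only the
CrdsEntry name and the crds.get::<&T>(pubkey) call form come from this
commit; the Crds struct with one map per value type, the field names
and the u64 Pubkey are simplifying assumptions made for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real Pubkey type.
type Pubkey = u64;

#[derive(Debug, PartialEq)]
struct ContactInfo {
    shred_version: u16,
}

#[derive(Debug, PartialEq)]
struct LowestSlot {
    lowest: u64,
}

// Assumed layout: one map per value type. The real table differs.
#[derive(Default)]
struct Crds {
    contact_infos: HashMap<Pubkey, ContactInfo>,
    lowest_slots: HashMap<Pubkey, LowestSlot>,
}

/// Implemented by every type that can be looked up in the table.
trait CrdsEntry<'a>: Sized {
    fn get_entry(crds: &'a Crds, pubkey: Pubkey) -> Option<Self>;
}

impl<'a> CrdsEntry<'a> for &'a ContactInfo {
    fn get_entry(crds: &'a Crds, pubkey: Pubkey) -> Option<Self> {
        crds.contact_infos.get(&pubkey)
    }
}

impl<'a> CrdsEntry<'a> for &'a LowestSlot {
    fn get_entry(crds: &'a Crds, pubkey: Pubkey) -> Option<Self> {
        crds.lowest_slots.get(&pubkey)
    }
}

impl Crds {
    /// Generic lookup; the requested type selects the implementation.
    fn get<'a, V: CrdsEntry<'a>>(&'a self, pubkey: Pubkey) -> Option<V> {
        V::get_entry(self, pubkey)
    }
}
```

The call site picks the value type with a turbofish, so adding a new
lookup only requires another trait implementation.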
* Fix link target in doc comment
* Fix formatting of log examples in process_instruction
* Fix doc markdown in solana-gossip
* Fix doc markdown in solana-runtime
* Escape square brackets in doc comments to avoid warnings
* Surround 'account references' doc items in code spans to avoid warnings
* Fix code block in loader_upgradeable_instruction
* Fix doctest for loader_upgradeable_instruction
push_lowest_slot cannot sign the new crds-value unless the id (pubkey)
argument passed-in is the same pubkey as in ClusterInfo::keypair(), in
which case the id argument is redundant:
https://github.com/solana-labs/solana/blob/bb41cf346/gossip/src/cluster_info.rs#L824-L845
Additionally, the lookup is done with self.id(), but insert is done with
the id argument, which is logically a bug.
ClusterInfo is the gateway to CrdsGossip function calls, and it already
has the node's pubkey and shred version (full ContactInfo and Keypair in
fact).
Duplicating this data in CrdsGossip adds redundancy and the possibility
of bugs should it become inconsistent with ClusterInfo.
The current implementation of weighted_shuffle:
https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37
uses a heuristic which results in biased samples.
For example, if the weights are [1, 10, 100], then the 3rd index should
come first 100 times more often than the 1st index. However,
weighted_shuffle picks the 3rd index 200+ times more often than the
1st index, showing a disproportionate bias in favor of higher weights.
This commit implements weighted shuffle using a binary indexed tree to
maintain the cumulative sum of weights while sampling. The resulting
samples are demonstrably unbiased and precisely proportional to the
weights. Additionally, the iterator interface allows skipping
computations when not all indices are processed.
Of the use cases of weighted_shuffle, changing turbine code requires
feature-gating to keep the cluster in sync. That is not updated in
this commit, but can be done together with future updates to turbine.
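A minimal sketch of the technique, assuming u64 weights and a
caller-supplied random source, is shown below. It is not the code in
this commit: the struct layout, the FnMut() -> u64 random source and
the splitmix64 helper are illustrative choices. Zero-weight indices are
never yielded here, and reducing the random number with % introduces a
negligible modulo bias that a real implementation would avoid with a
proper uniform range sampler.

```rust
/// Weighted shuffle over indices, backed by a binary indexed (Fenwick)
/// tree holding the weights of the indices not yet yielded.
struct WeightedShuffle<R> {
    weights: Vec<u64>,
    tree: Vec<u64>, // 1-based; tree[0] is unused
    total: u64,     // sum of the remaining weights
    rng: R,         // assumed to return uniformly random u64 values
}

impl<R: FnMut() -> u64> WeightedShuffle<R> {
    fn new(weights: &[u64], rng: R) -> Self {
        let n = weights.len();
        let mut tree = vec![0u64; n + 1];
        for (i, &w) in weights.iter().enumerate() {
            let mut k = i + 1;
            while k <= n {
                tree[k] += w;
                k += k & k.wrapping_neg(); // jump to the next covering node
            }
        }
        let total = weights.iter().sum();
        Self { weights: weights.to_vec(), tree, total, rng }
    }

    /// Returns the 0-based index i such that
    /// prefix_sum(i) <= target < prefix_sum(i + 1).
    fn search(&self, mut target: u64) -> usize {
        let n = self.tree.len() - 1;
        let mut pos = 0;
        let mut step = n.next_power_of_two();
        while step > 0 {
            let next = pos + step;
            if next <= n && self.tree[next] <= target {
                pos = next;
                target -= self.tree[next];
            }
            step >>= 1;
        }
        pos
    }

    /// Removes the weight of `index` so it cannot be sampled again.
    fn remove(&mut self, index: usize) {
        let w = self.weights[index];
        let n = self.tree.len() - 1;
        let mut k = index + 1;
        while k <= n {
            self.tree[k] -= w;
            k += k & k.wrapping_neg();
        }
        self.total -= w;
    }
}

impl<R: FnMut() -> u64> Iterator for WeightedShuffle<R> {
    type Item = usize;

    // Each call costs O(log n); unconsumed indices cost nothing.
    fn next(&mut self) -> Option<usize> {
        if self.total == 0 {
            return None;
        }
        let target = (self.rng)() % self.total;
        let index = self.search(target);
        self.remove(index);
        Some(index)
    }
}

/// Small deterministic generator, used only to drive the sketch.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E3779B97F4A7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}
```

For weights [1, 10, 100], the first index drawn is the 3rd one with
probability 100/111, and a caller that only needs the first few
indices can stop iterating early.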
Broadcast stage and retransmit stage should arrange nodes on the turbine
broadcast tree in exactly the same order. Additionally, any changes to
this ordering (e.g. updating how unstaked nodes are handled) require
feature gating to keep the cluster in sync.
The current implementation is scattered across several public methods
and exposes too many implementation details (e.g. usize indices into
the peers vector), which makes code changes and checking for feature
activations more difficult.
This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
Calling ClusterInfo::id repeatedly in for loops or iterators is
inefficient, because it acquires a lock on ClusterInfo.my_contact_info,
and clones the entire contact-info.
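The pattern can be illustrated as below, with a deliberately
simplified ClusterInfo. The struct fields, the u64 id and the
count_other_nodes helper are hypothetical; only the lock-and-clone
behavior of id() mirrors the description above.

```rust
use std::sync::RwLock;

#[derive(Clone)]
struct ContactInfo {
    id: u64, // hypothetical stand-in for a Pubkey
    addrs: Vec<String>,
}

struct ClusterInfo {
    my_contact_info: RwLock<ContactInfo>,
}

impl ClusterInfo {
    /// Takes the read lock and clones the whole contact-info just to
    /// return the id, as described above.
    fn id(&self) -> u64 {
        self.my_contact_info.read().unwrap().clone().id
    }
}

/// Counts peers other than this node, calling id() once up front
/// instead of once per element inside the iterator.
fn count_other_nodes(cluster_info: &ClusterInfo, peers: &[u64]) -> usize {
    let self_id = cluster_info.id(); // single lock acquisition and clone
    peers.iter().filter(|&&peer| peer != self_id).count()
}
```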
filter_by_shred_version does not check the shred-version of the owner of
the crds-value. It only checks the shred-version of the node which is
relaying the value:
https://github.com/solana-labs/solana/blob/5cc073420/gossip/src/cluster_info.rs#L2274-L2289
So crds-values with different shred versions can still pass through this
function as long as they are relayed by a node with a matching shred
version; and so, a single node can bridge different shred versions
throughout the cluster.
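The difference between checking the relayer and checking the owner can
be sketched as follows. The function names, the CrdsValue struct and
the shred_versions map are hypothetical and much simpler than the
linked code; they only illustrate why a relayer-only check lets foreign
values through.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real Pubkey type.
type Pubkey = u64;

struct CrdsValue {
    owner: Pubkey,
}

/// Relayer-only check: passes whenever the node that forwarded the
/// value has a matching shred version, whoever owns the value.
fn relayer_matches(
    relayer: Pubkey,
    self_shred_version: u16,
    shred_versions: &HashMap<Pubkey, u16>,
) -> bool {
    shred_versions.get(&relayer) == Some(&self_shred_version)
}

/// Owner check: passes only if the node that created the value has a
/// matching shred version.
fn owner_matches(
    value: &CrdsValue,
    self_shred_version: u16,
    shred_versions: &HashMap<Pubkey, u16>,
) -> bool {
    shred_versions.get(&value.owner) == Some(&self_shred_version)
}
```

A value owned by a node on shred version 9 and relayed by a node on
shred version 7 passes the first check on a version-7 node but fails
the second.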