Commit Graph

117 Commits

Author SHA1 Message Date
Michael Vines 71efac46cb Hoist keypair() out of some loops 2021-07-01 17:50:04 -07:00
Michael Vines b6792a3328 Add ability to change the validator identity at runtime 2021-07-01 17:50:04 -07:00
Michael Vines bf157506e8 Remove id ref 2021-07-01 17:50:04 -07:00
behzad nouri 9d983a34a0
debug logs when crds table trim failed (#18307)
There are reports of this error being possibly spammy:
https://discord.com/channels/428295358100013066/689412830075551748/859441080054710293

This commit changes the log level to debug and adds a new metric to
track how frequently the error occurs.
2021-06-29 19:39:46 +00:00
behzad nouri d7b8329b45
removes repeated calls to ClusterInfo::id in iterators and contact-info clone (#18174)
Calling ClusterInfo::id repeatedly in for loops or iterators is
inefficient, because it acquires a lock on ClusterInfo.my_contact_info,
and clones the entire contact-info.
2021-06-23 16:30:14 +00:00
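
A minimal sketch of the pattern described in the commit above, assuming simplified stand-in types. `ContactInfo`, `ClusterInfo`, and both helper functions here are hypothetical and use a `u64` id; they are not the actual solana-gossip definitions. The sketch only illustrates why reading the id once before the iterator avoids a lock acquisition and a clone per element.

```rust
use std::sync::RwLock;

// Hypothetical stand-ins for illustration; not the real solana-gossip types.
#[derive(Clone)]
struct ContactInfo {
    id: u64,
    #[allow(dead_code)]
    addrs: Vec<String>,
}

struct ClusterInfo {
    my_contact_info: RwLock<ContactInfo>,
}

impl ClusterInfo {
    // Acquires the read lock and clones the entire contact-info on each call.
    fn id(&self) -> u64 {
        let info: ContactInfo = self.my_contact_info.read().unwrap().clone();
        info.id
    }
}

// Before: one lock acquisition and one clone per element of `peers`.
fn peers_excluding_self_slow(cluster: &ClusterInfo, peers: &[u64]) -> Vec<u64> {
    peers
        .iter()
        .copied()
        .filter(|peer| *peer != cluster.id())
        .collect()
}

// After: the id is read once and reused inside the iterator.
fn peers_excluding_self(cluster: &ClusterInfo, peers: &[u64]) -> Vec<u64> {
    let self_id = cluster.id();
    peers
        .iter()
        .copied()
        .filter(|peer| *peer != self_id)
        .collect()
}
```

Both functions return the same result; only the number of lock acquisitions and clones differs.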
behzad nouri 69a5f0e6cd
filters crds values obtained through gossip by their shred version (#18072)
filter_by_shred_version does not check the shred-version of the owner of
the crds-value. It only checks the shred-version of the node which is
relaying the value:
https://github.com/solana-labs/solana/blob/5cc073420/gossip/src/cluster_info.rs#L2274-L2289

So crds-values with different shred versions can still pass through this
function as long as they are relayed by a node with a matching shred
version; a single node can therefore bridge different shred versions
throughout the cluster.
2021-06-23 14:16:05 +00:00
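
The difference between checking the relayer and checking the owner can be sketched as follows. This is an illustration under stated assumptions, not the code from the commit: `Pubkey` is a plain `u64`, `CrdsValue` is a reduced hypothetical struct, and dropping values whose owner has an unknown shred version is a choice made for this sketch.

```rust
use std::collections::HashMap;

// Hypothetical, simplified types; the names are for illustration only.
type Pubkey = u64;

#[derive(Clone, Debug, PartialEq)]
struct CrdsValue {
    owner: Pubkey, // node that originated the value
    payload: String,
}

// Relayer-based check: every value is accepted when the relaying node has
// a matching shred version, regardless of who owns the value.
fn filter_by_relayer(
    values: Vec<CrdsValue>,
    relayer_shred_version: u16,
    self_shred_version: u16,
) -> Vec<CrdsValue> {
    if relayer_shred_version == self_shred_version {
        values
    } else {
        Vec::new()
    }
}

// Owner-based check: a value is kept only if its owner is known to have a
// matching shred version. Owners with an unknown shred version are dropped
// (an assumption of this sketch).
fn filter_by_owner(
    values: Vec<CrdsValue>,
    shred_versions: &HashMap<Pubkey, u16>,
    self_shred_version: u16,
) -> Vec<CrdsValue> {
    values
        .into_iter()
        .filter(|value| shred_versions.get(&value.owner) == Some(&self_shred_version))
        .collect()
}
```

With the relayer-based check, a value owned by a node on another shred version passes as long as the relayer matches; the owner-based check rejects it.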
Michael Vines 84b9de8c18 Shredder no longer holds a keypair 2021-06-21 21:29:52 -07:00
Michael Vines 553fc210f5 Remove duplicated id field 2021-06-21 21:29:52 -07:00
Alexander Meißner 6514096a67 chore: cargo +nightly clippy --fix -Z unstable-options 2021-06-18 10:42:46 -07:00
behzad nouri 5a99fa3790
adds mapping from nodes pubkeys to their shred-version (#17940)
Crds values of nodes with different shred versions are creeping into
the gossip table, resulting in runtime issues such as the one addressed in:
https://github.com/solana-labs/solana/pull/17899

This commit works towards enforcing more checks and filtering based on
shred version by adding the necessary mapping and API to the gossip table.
Once populated, the pubkey->shred-version mapping persists as long as
there are any values associated with the pubkey.
2021-06-18 15:56:04 +00:00
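
One way to picture the persistence rule from the commit above is a table that counts values per owner and drops the shred-version entry only when the last value goes away. The `GossipTable` type, its fields, and its methods below are hypothetical and assume `u64` pubkeys; the actual gossip table may track this differently.

```rust
use std::collections::HashMap;

// Hypothetical simplification; pubkeys are plain integers here.
type Pubkey = u64;

#[derive(Default)]
struct GossipTable {
    // Number of values currently stored per owner.
    num_values: HashMap<Pubkey, usize>,
    // pubkey -> shred-version, populated when a value carries a shred version.
    shred_versions: HashMap<Pubkey, u16>,
}

impl GossipTable {
    // Records a value owned by `owner`. `shred_version` is `Some` only for
    // values that carry one (for example a contact-info).
    fn insert_value(&mut self, owner: Pubkey, shred_version: Option<u16>) {
        *self.num_values.entry(owner).or_insert(0) += 1;
        if let Some(version) = shred_version {
            self.shred_versions.insert(owner, version);
        }
    }

    // Removes one value owned by `owner`. The shred-version entry is purged
    // only when no values remain for that owner.
    fn remove_value(&mut self, owner: Pubkey) {
        let remaining = match self.num_values.get_mut(&owner) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return,
        };
        if remaining == 0 {
            self.num_values.remove(&owner);
            self.shred_versions.remove(&owner);
        }
    }

    fn get_shred_version(&self, owner: &Pubkey) -> Option<u16> {
        self.shred_versions.get(owner).copied()
    }
}
```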
sakridge eeee75c5be
Don't use pinned memory when unnecessary (#17832)
There are reports of excessive GPU memory usage and errors
from cudaHostRegister. In some cases pinning is
not required.
2021-06-14 16:10:04 +02:00
behzad nouri 985280ec0b
excludes epoch-slots from nodes with unknown or different shred version (#17899)
Inspecting the TDS gossip table shows that crds values of nodes with
different shred-versions are creeping in. Their epoch-slots are
accumulated in ClusterSlots, causing bogus slots very far from the
current root which are not purged, and so cause ClusterSlots to keep
consuming more memory:
https://github.com/solana-labs/solana/issues/17789
https://github.com/solana-labs/solana/issues/14366#issuecomment-769896036
https://github.com/solana-labs/solana/issues/14366#issuecomment-832754654

This commit updates ClusterInfo::get_epoch_slots, and discards entries
from nodes with unknown or different shred-version.

Follow-up commits will patch gossip not to waste bandwidth and memory
on crds values of nodes with a different shred-version.
2021-06-13 14:08:08 +00:00
behzad nouri cab30e2356
parallelizes gossip packets receiver with processing of requests (#17647)
Gossip packet processing is composed of two stages:
  * The first is consuming packets from the socket, deserializing,
    sanitizing and verifying them:
    https://github.com/solana-labs/solana/blob/7f0349b29/gossip/src/cluster_info.rs#L2510-L2521
  * The second is actually processing the requests/messages:
    https://github.com/solana-labs/solana/blob/7f0349b29/gossip/src/cluster_info.rs#L2585-L2605

The former does not acquire any locks and so can be parallelized with
the latter, allowing better pipelining properties and lower latency in
responding to gossip requests or propagating messages.
2021-06-07 18:36:06 +00:00
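
The two-stage arrangement described in the commit above can be sketched with two threads connected by a channel. This is only an illustration using the standard library: `verify_packet` and `run_pipeline` are hypothetical names, packets are raw byte vectors, and "processing" is reduced to formatting a string.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Stage 1 (hypothetical): deserialize, sanitize and verify a raw packet.
// This stage touches no shared state, so it needs no locks.
fn verify_packet(raw: &[u8]) -> Option<String> {
    String::from_utf8(raw.to_vec())
        .ok()
        .filter(|msg| !msg.is_empty())
}

// Runs the receiver stage and the processing stage on separate threads,
// connected by a channel, and returns the processed messages in order.
fn run_pipeline(packets: Vec<Vec<u8>>) -> Vec<String> {
    let (sender, receiver) = channel::<String>();
    let receiver_stage = thread::spawn(move || {
        for raw in packets {
            if let Some(msg) = verify_packet(&raw) {
                // Stop early if the processing stage has gone away.
                if sender.send(msg).is_err() {
                    break;
                }
            }
        }
        // `sender` is dropped here, which closes the channel.
    });
    let processing_stage = thread::spawn(move || {
        // Stage 2 (hypothetical): handle each verified message as it arrives,
        // while stage 1 keeps verifying later packets.
        receiver
            .iter()
            .map(|msg| format!("processed:{}", msg))
            .collect::<Vec<String>>()
    });
    receiver_stage.join().unwrap();
    processing_stage.join().unwrap()
}
```

Because the second stage starts consuming as soon as the first message is sent, verification of later packets overlaps with processing of earlier ones.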
behzad nouri 60b0a13444
writes epoch-slots to crds table synchronously (#17719)
epoch-slots may be overwritten before they are written to crds table:
https://github.com/solana-labs/solana/issues/17711

This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The function is still not thread-safe, as commented in
the code; however, currently only one thread invokes this code.
2021-06-04 13:56:51 +00:00
behzad nouri be957f25c9
adds fallback logic if retransmit multicast fails (#17714)
In retransmit-stage, based on the packet.meta.seed and resulting
children/neighbors, each packet is sent to a different set of peers:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457

However, the current code errors out as soon as a multicast call fails,
which skips all the remaining packets:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can exacerbate packet loss in turbine.

This commit:
  * keeps iterating over the retransmit packets in the for loop even if
    some intermediate sends fail.
  * adds a fallback to UdpSocket::send_to if multicast fails.

Recent discord chat:
https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733
2021-06-04 12:16:37 +00:00
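
The control flow described in the commit above (keep looping on failure, fall back to per-peer sends) can be sketched against a small trait. Everything here is an assumption for illustration: `PacketSender`, `retransmit`, and the `FlakySender` test double are hypothetical, peers are plain strings, and no real socket is involved. In the real code the fallback is UdpSocket::send_to.

```rust
use std::io;

// Hypothetical sender abstraction so the sketch runs without a network.
trait PacketSender {
    fn multicast(&mut self, packet: &[u8], peers: &[&str]) -> io::Result<()>;
    fn send_to(&mut self, packet: &[u8], peer: &str) -> io::Result<()>;
}

// Retransmits every packet to its own set of peers. A failed multicast no
// longer aborts the loop; it falls back to one send per peer instead.
// Returns the number of sends that still failed after the fallback.
fn retransmit<S: PacketSender>(sender: &mut S, packets: &[(Vec<u8>, Vec<&str>)]) -> usize {
    let mut failed = 0;
    for (packet, peers) in packets {
        if sender.multicast(packet, peers).is_ok() {
            continue;
        }
        // Fallback: try each peer individually and keep going on errors.
        for peer in peers {
            if sender.send_to(packet, peer).is_err() {
                failed += 1;
            }
        }
    }
    failed
}

// Test double: multicast always fails; send_to fails only for peer "bad".
struct FlakySender {
    sent: Vec<String>,
}

impl PacketSender for FlakySender {
    fn multicast(&mut self, _packet: &[u8], _peers: &[&str]) -> io::Result<()> {
        Err(io::Error::new(io::ErrorKind::Other, "multicast failed"))
    }

    fn send_to(&mut self, _packet: &[u8], peer: &str) -> io::Result<()> {
        if peer == "bad" {
            return Err(io::Error::new(io::ErrorKind::Other, "send failed"));
        }
        self.sent.push(peer.to_string());
        Ok(())
    }
}
```

With `FlakySender`, a packet addressed to `["a", "bad"]` followed by one addressed to `["b"]` still reaches `a` and `b`, and only the send to `bad` is counted as failed.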
Tyera Eulberg 3a647c4bea
Rename ValidatorExit and move to sdk (#17728) 2021-06-04 03:06:13 +00:00
Tyera Eulberg 9a5330b7eb
Move gossip modules into solana-gossip crate (#17352)
* Move gossip modules to solana-gossip

* Update Protocol abi digest due to move

* Move gossip benches and hook up CI

* Remove unneeded Result entries

* Single use statements
2021-05-26 09:15:46 -06:00