162 Commits

Author SHA1 Message Date
Michael Vines 84b9de8c18 Shredder no longer holds a keypair 2021-06-21 21:29:52 -07:00
Michael Vines 553fc210f5 Remove duplicated id field 2021-06-21 21:29:52 -07:00
behzad nouri 598093b5db adds shred-version to ip-echo-server response
When starting a validator, the node initially joins gossip with
shred_version = 0, until it adopts the entrypoint's shred-version:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417

Depending on the load on the entrypoint, adopting the entrypoint's
shred-version through gossip is sometimes very slow. This causes
several problems in gossip because we have to partially support
shred_version == 0, which is a source of crds values leaking from one
cluster to another; e.g. see
https://github.com/solana-labs/solana/pull/17899
and the other linked issues there.

In order to remove shred_version == 0 from gossip, this commit adds
shred-version to ip-echo-server response. Once the entrypoints are
updated, on validator start-up, if --expected_shred_version is not
specified we will obtain shred-version from the entrypoint using
ip-echo-server.
2021-06-21 19:37:16 +00:00
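The start-up behaviour described in commit 598093b5db can be sketched roughly as follows. This is a minimal illustration, not the validator's code: `IpEchoResponse`, its `shred_version` field, and `resolve_shred_version` are hypothetical names, and the entrypoint query is passed in as a closure so that no real network API is implied.

```rust
/// Hypothetical, simplified view of an ip-echo-server response.
/// The field name is illustrative and not taken from the actual code.
#[derive(Debug, Clone, Copy, PartialEq)]
struct IpEchoResponse {
    /// Shred version reported by the entrypoint. `None` models an
    /// entrypoint that has not yet been updated to include it.
    shred_version: Option<u16>,
}

/// Picks the shred version to join gossip with (hypothetical helper).
/// An explicitly expected shred version wins; otherwise the entrypoint
/// is queried, and the result may still be unknown (`None`).
fn resolve_shred_version<F>(expected: Option<u16>, query_entrypoint: F) -> Option<u16>
where
    F: FnOnce() -> Option<IpEchoResponse>,
{
    match expected {
        // The operator specified the expected shred version: no query needed.
        Some(version) => Some(version),
        // Otherwise ask the entrypoint through its ip-echo-server.
        None => query_entrypoint().and_then(|response| response.shred_version),
    }
}
```

The closure is only invoked when no expected shred version was given, which mirrors the "if --expected_shred_version is not specified" condition in the commit message.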
Alexander Meißner 789f33e8db chore: cargo fmt 2021-06-18 10:42:46 -07:00
Alexander Meißner 6514096a67 chore: cargo +nightly clippy --fix -Z unstable-options 2021-06-18 10:42:46 -07:00
behzad nouri 5a99fa3790
adds mapping from nodes pubkeys to their shred-version (#17940)
Crds values of nodes with different shred versions are creeping into
the gossip table, resulting in runtime issues such as the one addressed in:
https://github.com/solana-labs/solana/pull/17899

This commit works towards enforcing more checks and filtering based on
shred version by adding the necessary mapping and API to the gossip table.
Once populated, pubkey->shred-version mapping persists as long as there
are any values associated with the pubkey.
2021-06-18 15:56:04 +00:00
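The persistence rule in commit 5a99fa3790 (the pubkey->shred-version mapping lives as long as the pubkey has any values in the table) might look like the sketch below. `GossipTable`, its fields, and the use of `u64` as a pubkey are assumptions made for illustration; the real gossip table is structured differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a node's public key.
type Pubkey = u64;

/// Simplified table: tracks how many values each pubkey owns, plus the
/// last shred version learned for that pubkey.
#[derive(Default)]
struct GossipTable {
    value_counts: HashMap<Pubkey, usize>,
    shred_versions: HashMap<Pubkey, u16>,
}

impl GossipTable {
    /// Records one value for `pubkey`. Only some values (e.g. a
    /// contact-info) carry a shred version, hence the `Option`.
    fn insert(&mut self, pubkey: Pubkey, shred_version: Option<u16>) {
        *self.value_counts.entry(pubkey).or_insert(0) += 1;
        if let Some(version) = shred_version {
            self.shred_versions.insert(pubkey, version);
        }
    }

    /// Removes one value of `pubkey`. The shred-version entry is dropped
    /// only when the pubkey's last value is gone.
    fn remove(&mut self, pubkey: Pubkey) {
        let emptied = match self.value_counts.get_mut(&pubkey) {
            Some(count) => {
                *count -= 1;
                *count == 0
            }
            None => false,
        };
        if emptied {
            self.value_counts.remove(&pubkey);
            self.shred_versions.remove(&pubkey);
        }
    }

    fn shred_version(&self, pubkey: Pubkey) -> Option<u16> {
        self.shred_versions.get(&pubkey).copied()
    }
}
```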
sakridge eeee75c5be
Don't use pinned memory when unnecessary (#17832)
There have been reports of excessive GPU memory usage and errors
from cudaHostRegister. There are some cases where pinning is
not required.
2021-06-14 16:10:04 +02:00
behzad nouri cca46308bc
short cuts expiration check if origin's contact-info is still valid (#17918)
Crds::find_old_labels can skip checking values' timestamps if the
origin's contact info hasn't expired yet:
https://github.com/solana-labs/solana/blob/985280ec0/gossip/src/crds.rs#L394-L408
2021-06-13 19:47:07 +00:00
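A simplified sketch of the shortcut in commit cca46308bc: when an origin's contact-info is still fresh, none of its values need their timestamps checked. The `Entry` and `Origin` types and the single `cutoff` parameter are assumptions for illustration; the linked `Crds::find_old_labels` works on the real table and its own timeout handling.

```rust
use std::collections::HashMap;

/// Hypothetical crds entry: a label plus the time it was last updated.
struct Entry {
    label: &'static str,
    timestamp: u64,
}

/// Hypothetical per-origin record: the contact-info timestamp (if the
/// origin has a contact-info at all) and the origin's other values.
struct Origin {
    contact_info_timestamp: Option<u64>,
    values: Vec<Entry>,
}

/// Returns labels of values older than `cutoff`, sorted for determinism.
fn find_old_labels(table: &HashMap<u64, Origin>, cutoff: u64) -> Vec<&'static str> {
    let mut old = Vec::new();
    for origin in table.values() {
        // Shortcut: a still-valid contact-info means the per-value
        // timestamp checks for this origin are skipped entirely.
        if matches!(origin.contact_info_timestamp, Some(ts) if ts >= cutoff) {
            continue;
        }
        old.extend(
            origin
                .values
                .iter()
                .filter(|entry| entry.timestamp < cutoff)
                .map(|entry| entry.label),
        );
    }
    old.sort_unstable();
    old
}
```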
behzad nouri 985280ec0b
excludes epoch-slots from nodes with unknown or different shred version (#17899)
Inspecting the TDS gossip table shows that crds values of nodes with
different shred-versions are creeping in. Their epoch-slots are
accumulated in ClusterSlots, causing bogus slots very far from the current
root, which are not purged and so cause ClusterSlots to keep consuming more
memory:
https://github.com/solana-labs/solana/issues/17789
https://github.com/solana-labs/solana/issues/14366#issuecomment-769896036
https://github.com/solana-labs/solana/issues/14366#issuecomment-832754654

This commit updates ClusterInfo::get_epoch_slots, and discards entries
from nodes with unknown or different shred-version.

Follow-up commits will patch gossip not to waste bandwidth and memory
over crds values of nodes with different shred-version.
2021-06-13 14:08:08 +00:00
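The filtering rule from commit 985280ec0b (discard epoch-slots from nodes whose shred version is unknown or differs from ours) can be illustrated as below. The `EpochSlots` struct, the `u64` pubkey, and the free function `filter_epoch_slots` are simplified assumptions and do not reproduce `ClusterInfo::get_epoch_slots`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a node's public key.
type Pubkey = u64;

/// Simplified epoch-slots entry: who sent it and which slots it reports.
#[derive(Debug, Clone, PartialEq)]
struct EpochSlots {
    from: Pubkey,
    slots: Vec<u64>,
}

/// Keeps only entries whose sender has a known shred version equal to
/// our own. Unknown senders fail the comparison and are dropped too.
fn filter_epoch_slots(
    entries: Vec<EpochSlots>,
    shred_versions: &HashMap<Pubkey, u16>,
    self_shred_version: u16,
) -> Vec<EpochSlots> {
    entries
        .into_iter()
        .filter(|entry| shred_versions.get(&entry.from) == Some(&self_shred_version))
        .collect()
}
```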
behzad nouri cab30e2356
parallelizes gossip packets receiver with processing of requests (#17647)
Gossip packet processing is composed of two stages:
  * The first is consuming packets from the socket, deserializing,
    sanitizing and verifying them:
    https://github.com/solana-labs/solana/blob/7f0349b29/gossip/src/cluster_info.rs#L2510-L2521
  * The second is actually processing the requests/messages:
    https://github.com/solana-labs/solana/blob/7f0349b29/gossip/src/cluster_info.rs#L2585-L2605

The former does not acquire any locks and so can be parallelized with
the latter, allowing better pipelining properties and lower latency in
responding to gossip requests or propagating messages.
2021-06-07 18:36:06 +00:00
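The two-stage structure described in commit cab30e2356 can be sketched with two threads joined by a channel. This is only an illustration under simplifying assumptions: `RawPacket`, `Message`, `verify`, and `run_pipeline` are hypothetical, UTF-8 decoding stands in for deserializing, sanitizing and verifying, and a `Vec` stands in for the state that the second stage updates.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Hypothetical stand-ins for raw socket packets and verified messages.
type RawPacket = Vec<u8>;
type Message = String;

/// Stage 1 work: deserialize, sanitize and verify. It takes no locks,
/// so it can run concurrently with stage 2.
fn verify(packet: RawPacket) -> Option<Message> {
    String::from_utf8(packet).ok().filter(|msg| !msg.is_empty())
}

/// Runs both stages on separate threads and returns what stage 2 processed.
fn run_pipeline(packets: Vec<RawPacket>) -> Vec<Message> {
    let (sender, receiver) = channel::<Message>();
    // Stage 1: consume packets and forward only the valid ones.
    let consumer = thread::spawn(move || {
        for packet in packets {
            if let Some(msg) = verify(packet) {
                if sender.send(msg).is_err() {
                    break; // Stage 2 has gone away.
                }
            }
        }
        // Dropping `sender` here closes the channel and ends stage 2.
    });
    // Stage 2: process messages as they arrive.
    let processor = thread::spawn(move || {
        let mut processed = Vec::new();
        for msg in receiver {
            processed.push(msg);
        }
        processed
    });
    consumer.join().expect("consumer thread panicked");
    processor.join().expect("processor thread panicked")
}
```

Because stage 2 starts processing as soon as the first verified message arrives, it does not have to wait for the whole batch to be verified.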
behzad nouri 60b0a13444
writes epoch-slots to crds table synchronously (#17719)
epoch-slots may be overwritten before they are written to crds table:
https://github.com/solana-labs/solana/issues/17711

This commit writes new epoch-slots to crds table synchronously with
push_epoch_slots. The function is still not thread-safe, as commented in
the code; however, currently only one thread is invoking this code.
2021-06-04 13:56:51 +00:00
behzad nouri be957f25c9
adds fallback logic if retransmit multicast fails (#17714)
In retransmit-stage, based on the packet.meta.seed and resulting
children/neighbors, each packet is sent to a different set of peers:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457

However, the current code errors out as soon as a multicast call fails,
which will skip all the remaining packets:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can exacerbate packet loss in turbine.

This commit:
  * keeps iterating over the retransmit packets for-loop even if some
    intermediate sends fail.
  * adds a fallback to UdpSocket::send_to if multicast fails.

Recent discord chat:
https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733
2021-06-04 12:16:37 +00:00
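The two changes listed in commit be957f25c9 can be sketched as follows. This is a hedged illustration rather than the retransmit-stage code: `retransmit` is a hypothetical function, and the batched multicast call is passed in as a closure because its real signature is not shown here. Only `UdpSocket::send_to` is a real standard-library call.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Sends each packet to its own set of peers. Returns the number of
/// individual sends that still failed after the fallback.
fn retransmit<F>(
    socket: &UdpSocket,
    packets: &[(Vec<u8>, Vec<SocketAddr>)],
    mut multicast: F,
) -> usize
where
    // Hypothetical batched send standing in for the multicast call.
    F: FnMut(&UdpSocket, &[u8], &[SocketAddr]) -> io::Result<()>,
{
    let mut failed = 0;
    for (packet, peers) in packets {
        if multicast(socket, packet.as_slice(), peers.as_slice()).is_ok() {
            continue;
        }
        // Fallback: send to each peer individually instead of erroring
        // out, so the remaining packets are not skipped.
        for peer in peers {
            if socket.send_to(packet.as_slice(), peer).is_err() {
                failed += 1;
            }
        }
    }
    failed
}
```

Counting failures instead of returning early is one possible design; it keeps the loop going while still letting the caller report how many sends were lost.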
Tyera Eulberg 3a647c4bea
Rename ValidatorExit and move to sdk (#17728) 2021-06-04 03:06:13 +00:00
behzad nouri 7cf6e66ddd
excludes caller's crds values from pull responses (#17542)
If the crds entry belongs to the caller itself, then the caller will
always have the more recent version of it, regardless of it being
filtered out by the bloom filter or not.

The exception is node-instance types which are meant to detect duplicate
running instances, and those are exempted.
2021-05-28 13:19:14 +00:00
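The rule from commit 7cf6e66ddd (skip the caller's own values in a pull response, except node-instance values) amounts to a simple filter. The `Kind` and `Value` types and the `u64` pubkey below are hypothetical simplifications, not the actual crds types.

```rust
/// Hypothetical value kinds; only the node-instance distinction matters here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    ContactInfo,
    EpochSlots,
    NodeInstance,
}

/// Hypothetical crds value: its owner and its kind.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Value {
    pubkey: u64,
    kind: Kind,
}

/// Drops values owned by the caller, since the caller always has the
/// more recent version. Node-instance values are exempt because they
/// exist to detect duplicate running instances.
fn filter_pull_response(caller: u64, candidates: Vec<Value>) -> Vec<Value> {
    candidates
        .into_iter()
        .filter(|value| value.pubkey != caller || value.kind == Kind::NodeInstance)
        .collect()
}
```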
Tyera Eulberg 9a5330b7eb
Move gossip modules into solana-gossip crate (#17352)
* Move gossip modules to solana-gossip

* Update Protocol abi digest due to move

* Move gossip benches and hook up CI

* Remove unneeded Result entries

* Single use statements
2021-05-26 09:15:46 -06:00
behzad nouri cf1acfb021 uses Duration type for gossip discover timeout 2021-05-22 19:17:36 +00:00
Michael Vines a911ae00ba clippy 2021-04-18 20:55:02 -07:00
Michael Vines 24ab84936e Break up RPC API into three categories: minimal, full and admin 2021-03-04 16:39:44 -08:00
Michael Vines 328f59ebef --gossip-host may now be specified with --entrypoint 2020-11-13 06:20:15 +00:00
Michael Vines a1e2357d12 `solana-gossip spy` can now be given an identity keypair (`--identity` argument) 2020-08-22 17:00:50 -07:00
sakridge 0c72f62e96
Refactor gossip code from one huge function (#10701) 2020-06-18 22:20:52 -07:00
sakridge c4a096d8d4 Wait 15 seconds for gossip rpc url (#10053) 2020-05-15 13:23:40 -07:00
Jack May eb1acaf927
Remove archiver and storage program (#9992)
automerge
2020-05-14 18:22:47 -07:00
Michael Vines 4e4a21f9b7
`solana-gossip spy` can now specify a shred version (#10040) 2020-05-13 19:37:40 -07:00
Michael Vines 2521f75c18
Advertise node software version in gossip (#9981)
* Advertise node version in gossip

* Remove solana_clap_utils::version! macro
2020-05-11 15:02:01 -07:00
Michael Vines d1cbccd9ba
solana-dos can now DoS gossip nodes (#9652)
automerge
2020-04-23 11:46:12 -07:00
Michael Vines 448b957a13
Add --bind-address and --rpc-bind-address validator arguments (#8628) 2020-03-04 22:46:43 -07:00
Michael Vines 356f246a74 Remove get-/show- prefix from cli commands 2020-01-21 08:43:07 -07:00
Jack May 07855e3125
Allow override of RUST_LOG (#7705) 2020-01-08 09:19:12 -08:00
Michael Vines 48a36f59a6 Add get-rpc-url --any option 2020-01-02 17:20:59 -07:00
Michael Vines 965b132664 Permit --gossip-host with --entrypoint 2020-01-02 17:20:59 -07:00
Michael Vines b0271394cd
Clean up --gossip-port argument (#7067)
--gossip-port now specifies exactly that, the gossip port to use.  The
new --gossip-host argument can be used to specify the DNS name/IP
address for gossip if --entrypoint is not supplied (when --entrypoint is
supplied, the gossip address is automatically set to the node's ip
address as observed by the entrypoint)
2019-11-20 15:21:34 -07:00
Michael Vines c3926e6af0
`solana-gossip spy` no longer requires an entrypoint (#6999) 2019-11-16 14:16:28 -07:00
Ryo Onodera 4fc767b3f6
Move version! from core:: to clap_utils:: (#6944)
* Move version! from core to clap-utils

* Completely move version! from core:: to clap_utils::

* rustfmt

* Do remaining transition after rebase
2019-11-14 13:10:38 +09:00
Ryo Onodera 3faeb7fa79 Rename solana-netutil to solana-net-utils for consistency (#6895)
* sed -i -e 's/netutil/net_utils/g' $(git grep --files-with-matches netutil :**.rs)

* sed -i -e 's/netutil/net-utils/g' $(git grep --files-with-matches netutil)

* git mv netutil/ net-utils

* Tweak a bit

* Fix rustfmt & clippy
2019-11-12 13:37:13 -07:00
Ryo Onodera d84f367317
Extract duplicate clap helpers into clap-utils (#6812) 2019-11-12 09:42:08 +09:00
Michael Vines 4be646c695
discover() by gossip sockaddr instead of just by gossip ip address (#6865) 2019-11-11 12:42:58 -07:00
Michael Vines 0fbd508c5f
Only check the entrypoint's RPC address (#6851) 2019-11-09 00:56:31 -07:00
Michael Vines 397ea05aa7
spy nodes are now gossip entrypoints (#6532) 2019-10-24 15:35:33 -07:00
Michael Vines 8e5e48dd92
Add get-rpc-url --all flag (#6533) 2019-10-24 10:44:05 -07:00
Greg Fitzgerald 9232057e95
Rename replicator to archiver (#6464)
* Rename replicator to archiver

* cargo fmt

* Fix grammar
2019-10-21 11:29:37 -06:00
Greg Fitzgerald 322fcea6e5
More fullnode to validator renaming (#6337) 2019-10-11 13:30:52 -06:00
Michael Vines f9f5bc2eb5
More clippy 2019-10-02 21:21:07 -07:00
Michael Vines 8e888059d8
Use built-in solana-gossip timeout for better error messages (#6189) 2019-10-01 12:30:11 -07:00
Michael Vines e3a6c9234a Entrypoint RPC service discovery now blocks until the entrypoint is actually found (#5756)
automerge
2019-08-30 16:12:58 -07:00
Michael Vines 22667d64d1 Add various missing cli validators (#5745)
automerge
2019-08-30 09:27:35 -07:00
Michael Vines 3450b9a44d
Rename solana to solana-core (#5583) 2019-08-21 10:23:33 -07:00
Michael Vines 08f6a2ea3e
debash: Add `solana-gossip get-rpc-url` command to avoid hard coding (#5513) 2019-08-13 10:49:48 -07:00
Michael Vines 7981431f09
--entrypoint is a global arg 2019-08-12 16:08:45 -07:00
Michael Vines 4e093525c7
Default to error logs, override with info only for those programs that need it (#5321)
* Revert "Revert "Default log level to to RUST_LOG=solana=info (#5296)" (#5302)"

This reverts commit 7796e87814.

* Default to error logs, override with info only for those programs that need it
2019-07-29 10:57:00 -07:00
Michael Vines 7796e87814
Revert "Default log level to to RUST_LOG=solana=info (#5296)" (#5302)
This reverts commit c63a38ae57.
2019-07-27 07:46:45 -07:00
Michael Vines c63a38ae57
Default log level to to RUST_LOG=solana=info (#5296) 2019-07-26 16:29:16 -07:00
Sagar Dhawan 65adce65fa
Always send pull responses to the origin addr (#4894) 2019-07-01 16:49:05 -07:00
Sagar Dhawan a0ffbf50a5
Correctly remove replicator from data plane after it's done repairing (#4301)
* Correctly remove replicator from data plane after it's done repairing

* Update discover to report nodes and replicators separately

* Fix print and condition to be spy
2019-05-16 07:14:58 -07:00
Michael Vines f3f416b7ba
Rename --network argument to --entrypoint (#4149) 2019-05-03 15:00:19 -07:00
Michael Vines 7fe3c75c6b
Add a node-specific ip echo service to remove dependency on ifconfig.co (#4137) 2019-05-03 11:01:35 -07:00
Sagar Dhawan 9add8d0afc Add alternative to Spy Nodes that can fully participate in Gossip (#4087)
automerge
2019-04-30 16:42:56 -07:00
Michael Vines 05bcb7f292
Add stop node command to solana-gossip (#3928) 2019-04-22 14:51:20 -07:00
Michael Vines 149d809e86
Minor cli help cleanup (#3786) 2019-04-15 13:36:14 -07:00
Michael Vines 0767c0c07f Add DNS resolution to cli tools 2019-04-14 21:25:46 -07:00
Michael Vines 2277a39dd2 Default solana-gossip log-level to 'info' 2019-04-14 07:07:15 -07:00
Tyera Eulberg af97ad3d68 Add solana-gossip module 2019-04-01 23:05:25 -06:00