Commit Graph

322 Commits

Author SHA1 Message Date
Brooks 685c22ff41
Inlines variables into format strings (#29945) 2023-01-27 06:23:03 +00:00
behzad nouri d6fbf3fb17
adds new contact-info with forward compatible sockets (#29596)
The commit implements a new ContactInfo where
* Ports and IP addresses are specified separately so that unique IP
  addresses can only be specified once.
* Different sockets (tvu, tpu, etc) are specified by opaque u8 tags so
  that adding and removing sockets is backward and forward compatible.
* solana_version::Version is also embedded so that it won't need to
  be gossiped separately.
* NodeInstance is also rolled in by adding a field identifying when the
  instance was first created so that it won't need to be gossiped
  separately.

Update plan:
* Once the cluster is able to ingest the new type (i.e. this patch), a
  2nd patch will start gossiping the new ContactInfo along with the
  LegacyContactInfo.
* Once all nodes in the cluster gossip the new ContactInfo, a 3rd patch
  will start solely using the new ContactInfo while still gossiping the
  old LegacyContactInfo.
* Once all nodes in the cluster solely use the new ContactInfo, a 4th
  patch will stop gossiping the old LegacyContactInfo.
2023-01-26 17:02:18 +00:00
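A minimal sketch of the layout described in #29596 above, not the actual ContactInfo: IP addresses are stored once, and each socket is an opaque u8 tag plus an index into the address list and a port. The type names, field names and tag values here are hypothetical.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

// Hypothetical tags; the real tag values are not given in the commit message.
const TAG_GOSSIP: u8 = 0;
const TAG_TVU: u8 = 1;

/// One socket: an opaque tag, an index into `addrs`, and a port.
struct SocketEntry {
    tag: u8,
    addr_index: u8,
    port: u16,
}

/// Sketch of a contact-info that stores each unique IP address only once.
struct ContactInfoSketch {
    addrs: Vec<IpAddr>,
    sockets: Vec<SocketEntry>,
}

impl ContactInfoSketch {
    /// Returns the socket for `tag`, or None if the tag is absent or its
    /// address index is out of range.
    fn socket(&self, tag: u8) -> Option<SocketAddr> {
        let entry = self.sockets.iter().find(|entry| entry.tag == tag)?;
        let ip = self.addrs.get(usize::from(entry.addr_index))?;
        Some(SocketAddr::new(*ip, entry.port))
    }
}

fn example_contact_info() -> ContactInfoSketch {
    ContactInfoSketch {
        addrs: vec![IpAddr::V4(Ipv4Addr::new(10, 0, 0, 1))],
        sockets: vec![
            SocketEntry { tag: TAG_GOSSIP, addr_index: 0, port: 8001 },
            SocketEntry { tag: TAG_TVU, addr_index: 0, port: 8002 },
            // A tag this reader does not know about; it is never looked up,
            // so older readers are unaffected by newly added sockets.
            SocketEntry { tag: 200, addr_index: 0, port: 9000 },
        ],
    }
}
```

Because readers look sockets up by tag, an entry with an unknown tag is simply skipped, which is the forward-compatibility property the commit message describes.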
behzad nouri 1c7662a37f
asserts that cluster-info keypair is consistent with contact-info id (#29818) 2023-01-24 16:57:55 +00:00
Kevin Ji dd92f225bb
Use Ipv4Addr::{LOCALHOST, UNSPECIFIED} constants (#29813) 2023-01-23 16:49:51 -06:00
behzad nouri 590b75140f
removes legacy retransmit tests (#29817)
Retransmit code has moved to core/src/cluster_nodes.rs and has been
significantly revised.
gossip/tests/cluster_info.rs is testing the old code which is no longer
relevant.
2023-01-21 22:28:48 +00:00
behzad nouri 272e667cb2
deprecates Pubkey::new in favor of Pubkey::{,try_}from (#29805)
The commit deprecates Pubkey::new which lacks type-safety and instead
implements TryFrom<&[u8]> and TryFrom<Vec<u8>> for Pubkey.
2023-01-21 18:06:27 +00:00
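To illustrate the idea behind #29805 above, here is a sketch using a hypothetical Key32 type (a stand-in, not the real Pubkey): a fallible TryFrom<&[u8]> rejects slices of the wrong length, while From<[u8; 32]> cannot fail.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for a 32-byte public key.
#[derive(Debug, PartialEq, Eq)]
struct Key32([u8; 32]);

impl TryFrom<&[u8]> for Key32 {
    type Error = std::array::TryFromSliceError;

    fn try_from(bytes: &[u8]) -> Result<Self, Self::Error> {
        // Fails unless the slice is exactly 32 bytes long.
        <[u8; 32]>::try_from(bytes).map(Key32)
    }
}

impl From<[u8; 32]> for Key32 {
    fn from(bytes: [u8; 32]) -> Self {
        Key32(bytes)
    }
}
```

A caller holding a slice of unknown length has to handle the error case explicitly, which is the type-safety the commit message refers to.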
Wen b36791956e
Ingest duplicate proofs sent through Gossip (#29227)
* First draft of ingesting duplicate proofs in Gossip into blockstore.

* Add more unittests.

* Add more unittests for bad cases.

* Fix lint errors for tests.

* More linter fixes for tests.

* Lint fixes

* Rename get_entries, move location of comment.

* Some renaming changes and comment fixes.

* Fix compile warning, this enum is not used.

* Fix lint errors.

* Slow down cleanup because this could potentially be expensive.

* Forgot to reset cleanup count.

* Add protection against attackers when constructing chunk map when
we ingest Gossip proofs.

* Use duplicate shred index instead of get_entries.

* Rename ClusterInfoDuplicateShredListener and fix a few small problems.

* Use into_shreds to piece together the proof.

* Remove redundant code.

* Address a few small errors.

* Discard slots too advanced in the future.

* - Use oldest proof for each pubkey
- Limit number of pubkeys in each slot to 100

* Disable duplicate shred handling for now.

* Revert "Disable duplicate shred handling for now."

This reverts commit c3fcf403876cfbf90afe4d2265a826f21a5e24ab.
2023-01-19 13:00:56 -08:00
behzad nouri 9f2910e962
factors out common gossip {push,pull}_options code (#29737) 2023-01-18 17:43:09 +00:00
behzad nouri 0941d133a8
adds new solana_version::Version with ClientId (#29649) 2023-01-17 22:21:14 +00:00
behzad nouri d4ce59eee7
reworks weights for gossip pull-requests peer sampling (#28463)
Amplifying gossip peer sampling weights by the time since the last
pull-request has the undesired consequence that a node coming back online
will see a huge number of pull requests all at once.
This "time since last request" is also unnecessary to include in
weights because as long as sampling probabilities are non-zero, a node
will be almost surely periodically selected in the sample.
The commit reworks peer sampling probabilities by just using (dampened)
stakes as weights.
2023-01-14 15:44:38 +00:00
Illia Bobyr e410d021ea
gossip: crds::test::test_update_timestamp: Remove hash comparison (#29567)
It was not immediately clear why the second `CrdsValue` insertion in the
test must always succeed.  Turns out the test was relying on hash values
having a specific relationship.  It is confusing to someone not deeply
familiar with the test.

As overwrite based on the hash value is not part of the behavior that we
consider valuable, we just remove that check.

Unified assertion between two checks into one.
2023-01-12 00:19:44 -08:00
behzad nouri d89cf0d28b
includes origin's stake in gossip push nodes sampling (#29343)
Gossip push samples nodes by stake. This is unnecessarily wasteful and
creates too much congestion at high staked nodes if the CRDS value to be
propagated is from a node with low or zero stake.
This commit instead maintains several active-sets for push, each
corresponding with a stake bucket. Peer sampling weights are accordingly
capped by the respective bucket stake.
2023-01-11 19:46:32 +00:00
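The bucketing described in #29343 above can be sketched as follows. The bucket count, the log2 mapping and the squared weight are assumptions made for illustration; the commit message only states that there are several stake buckets and that peer weights are capped by the bucket's stake.

```rust
const LAMPORTS_PER_SOL: u64 = 1_000_000_000;
const NUM_BUCKETS: u64 = 25; // assumed number of active-sets

/// Maps a stake to a bucket index in 0..NUM_BUCKETS, roughly log2 of the
/// stake in SOL.
fn stake_bucket(stake_lamports: u64) -> u64 {
    let sol = stake_lamports / LAMPORTS_PER_SOL;
    // Number of bits needed to represent `sol`; 0 when sol == 0.
    let bits = u64::from(u64::BITS - sol.leading_zeros());
    bits.min(NUM_BUCKETS - 1)
}

/// Sampling weight of a peer in the active-set for `origin_bucket`: the
/// peer's bucket is capped by the bucket of the value's origin.
fn push_weight(peer_stake_lamports: u64, origin_bucket: u64) -> u64 {
    let bucket = stake_bucket(peer_stake_lamports).min(origin_bucket);
    (bucket + 1) * (bucket + 1)
}
```

With this cap, a value from a zero-stake origin samples all peers with equal weight, so high-staked nodes are not favored when propagating it.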
behzad nouri 677b6d6458
removes LegacyContactInfo::is_valid_tvu_address (#29570)
Since
https://github.com/solana-labs/solana/pull/20480
turbine includes all epoch staked nodes in tree construction and no
longer relies on obtaining their contact-info from gossip; and so
distinguishing between is_valid_address and is_valid_tvu_address is no
longer necessary and the latter can be removed.
2023-01-08 22:53:45 +00:00
behzad nouri 8c212f59ad
renames ContactInfo to LegacyContactInfo (#29566)
Working towards adding a new ContactInfo where new sockets can be
added in a backward compatible way.
2023-01-08 16:00:55 +00:00
behzad nouri 283a2b1540
removes #[allow(clippy::same_item_push)] (#29543) 2023-01-06 17:32:26 +00:00
behzad nouri e5323166b3
dedups gossip addresses, taking the one with highest weight (#29421)
dedups gossip addresses, keeping only the one with the highest weight

In order to avoid traffic congestion or sending duplicate packets, when
sampling gossip nodes if several nodes have the same gossip address
(because they are behind a relayer or whatever), they need to be
deduplicated into one.
2023-01-03 19:37:43 +00:00
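A sketch of the deduplication described in #29421 above, assuming each node is represented by a hypothetical (name, gossip address, weight) tuple: for each address, only the entry with the highest weight is kept.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Keeps, for each gossip address, only the node with the highest weight.
/// On equal weights the node seen first is kept.
fn dedup_by_address(
    nodes: &[(&'static str, SocketAddr, u64)],
) -> HashMap<SocketAddr, (&'static str, u64)> {
    let mut best: HashMap<SocketAddr, (&'static str, u64)> = HashMap::new();
    for &(name, addr, weight) in nodes {
        let entry = best.entry(addr).or_insert((name, weight));
        if weight > entry.1 {
            *entry = (name, weight);
        }
    }
    best
}
```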
behzad nouri 2d849a2eae
indexes duplicate-shreds in gossip crds table (#29317)
Also adding Crds::get_duplicate_shreds which retrieves all upserted
duplicate-shreds since a given cursor using the index.
2022-12-20 13:48:05 +00:00
behzad nouri 78a04ed432
ignores pubkey in Protocol::PruneMessage (#29280)
Protocol::PruneMessage(Pubkey, _) is the same as PruneData.pubkey and so
is redundant and can be ignored:
https://github.com/solana-labs/solana/blob/95d339300/gossip/src/cluster_info.rs#LL277-L279
https://github.com/solana-labs/solana/blob/95d339300/gossip/src/cluster_info.rs#L361-L367
2022-12-15 17:51:12 +00:00
behzad nouri a5c8c7c536
locks crds table only once to process push messages (#29218)
Processing push messages is locking and unlocking crds table for each
push message:
https://github.com/solana-labs/solana/blob/536b879aa/gossip/src/cluster_info.rs#L2266-L2276
https://github.com/solana-labs/solana/blob/536b879aa/gossip/src/crds_gossip_push.rs#L215C9-L260

This commit instead locks the crds table once for all the received push
messages.
2022-12-15 16:02:46 +00:00
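The change in #29218 above amounts to hoisting the lock out of the per-message loop. A sketch with a hypothetical table type (key to wallclock) standing in for the crds table:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the crds table: key -> wallclock.
type Table = HashMap<String, u64>;

/// Inserts every message under a single write lock instead of re-acquiring
/// the lock once per message. Returns the number of upserts.
fn process_push_messages(table: &RwLock<Table>, messages: Vec<(String, u64)>) -> usize {
    let mut table = table.write().unwrap(); // lock acquired once
    let mut upserts = 0;
    for (key, wallclock) in messages {
        // Accept the value only if it is newer than what the table holds.
        let newer = table.get(&key).map_or(true, |&old| wallclock > old);
        if newer {
            table.insert(key, wallclock);
            upserts += 1;
        }
    }
    upserts
} // lock released here, when the guard goes out of scope
```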
behzad nouri 95d3393008 prunes gossip nodes based on timeliness of delivered messages
As described here:
https://github.com/solana-labs/solana/issues/28642#issuecomment-1337449607
the current gossip pruning code fails to maintain spanning trees across
the cluster.

This commit instead implements pruning code based on the timeliness of
delivered messages. If a message is delivered timely enough (in terms
of the number of duplicates already observed for that value), it counts
towards the respective node's score. Once there are sufficiently many CRDS
upserts from a specific origin, redundant nodes are pruned based on the
tracked score.

Since the pruning leaves some configurable redundancy and the scores are
reset frequently, it should better tolerate active-set rotations.
2022-12-15 13:28:27 +00:00
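The scoring scheme in commit 95d3393008 above can be sketched for a single origin as follows. The struct, its fields and the thresholds are hypothetical; only the overall idea (score timely deliveries, prune the lowest scorers once enough upserts were seen, then reset) comes from the commit message.

```rust
use std::collections::HashMap;

/// Scores, for one origin, the nodes that relayed its values.
#[derive(Default)]
struct ReceivedCacheSketch {
    scores: HashMap<&'static str, u64>, // relaying node -> score
    num_upserts: u64,
}

impl ReceivedCacheSketch {
    /// Records one delivery. `num_dups` is how many copies of the value had
    /// already been seen; the delivery is timely if it is below `max_dups`.
    fn record(&mut self, node: &'static str, num_dups: u64, max_dups: u64) {
        if num_dups == 0 {
            self.num_upserts += 1;
        }
        let score = self.scores.entry(node).or_insert(0);
        if num_dups < max_dups {
            *score += 1;
        }
    }

    /// Once enough upserts were seen, keeps the `keep` best-scoring nodes,
    /// returns the rest to be pruned, and resets the scores.
    fn prune(&mut self, min_upserts: u64, keep: usize) -> Vec<&'static str> {
        if self.num_upserts < min_upserts {
            return Vec::new();
        }
        let mut nodes: Vec<_> = self.scores.drain().collect();
        // Highest score first; ties broken by name for a deterministic result.
        nodes.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
        self.num_upserts = 0;
        nodes.into_iter().skip(keep).map(|(node, _)| node).collect()
    }
}
```

The `keep` parameter corresponds to the configurable redundancy mentioned in the commit message: more than one relaying node survives each pruning round.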
behzad nouri b06656cbba tracks number of gossip push duplicates
The commit tracks the number of times duplicates of a CRDS value are received
from gossip push. Following commits will utilize this metric to score
gossip nodes in terms of timeliness of their push messages, in order to
better pick which nodes to prune.
2022-12-15 13:28:27 +00:00
behzad nouri 8ea5dd8b28
removes metric for process_push_success (#29211)
This is already tracked in CrdsDataStats:
https://github.com/solana-labs/solana/blob/5e799ad56/gossip/src/crds.rs#L96-L106
https://github.com/solana-labs/solana/blob/5e799ad56/gossip/src/cluster_info_metrics.rs#L652-L656
and so is duplicated.
Removing the metric would simplify this code path for upcoming commits.
2022-12-12 22:10:38 +00:00
behzad nouri 9524c9dbff patches errors from clippy::uninlined_format_args
https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
2022-12-06 19:32:15 +00:00
Jon Cinque b1340d77a2
sdk: Make Packet::meta private, use accessor functions (#29092)
sdk: Make packet meta private
2022-12-06 12:54:49 +01:00
behzad nouri 718f433206
adds metrics for gossip push fanout (#29065) 2022-12-04 15:20:51 +00:00
Tyera c32377b5af
Split out quic- and udp-client definitions (#28762)
* Move ConnectionCache back to solana-client, and duplicate ThinClient, TpuClient there

* Dedupe thin_client modules

* Dedupe tpu_client modules

* Move TpuClient to TpuConnectionCache

* Move ThinClient to TpuConnectionCache

* Move TpuConnection and quic/udp trait implementations back to solana-client

* Remove enum_dispatch from solana-tpu-client

* Move udp-client to its own crate

* Move quic-client to its own crate
2022-11-18 12:21:45 -07:00
Brooks Prumo d1ba42180d
clippy for rust 1.65.0 (#28765) 2022-11-09 19:39:38 +00:00
behzad nouri f703275fc4
pings peers before sending push messages (#28537) 2022-10-25 00:01:23 +00:00
behzad nouri e283461d99
enforces hash domain for ping-pong protocol (#28433)
https://github.com/solana-labs/solana/pull/27193
added hash domain to ping-pong protocol.
For backward compatibility responses both with and without domain were
generated and accepted.
Now that all clusters are upgraded, this commit enforces the hash domain
by removing the response without the domain.
2022-10-18 18:17:12 +00:00
Brooks Prumo 12df0f234d
Upgrade to Rust 1.64.0 (#28034) 2022-09-29 09:32:24 -04:00
behzad nouri abfaf06e87
counts gossip packets received before excess packets are dropped (#28086)
Currently, gossip packets are counted after excess packets are dropped.
This makes it difficult to debug gossip traffic spikes if the majority
of the packets are dropped.

This commit instead counts gossip packets received before excess packets
are dropped
2022-09-27 13:43:35 +00:00
Jeff Biseda 8b0f9b4917
make ping cache rate limit delay configurable (#27955) 2022-09-26 14:16:56 -07:00
behzad nouri f49beb0cbc
caches reed-solomon encoder/decoder instance (#27510)
ReedSolomon::new(...) initializes a matrix and a data-decode-matrix cache:
https://github.com/rust-rse/reed-solomon-erasure/blob/273ebbced/src/core.rs#L460-L466

In order to cache this computation, this commit caches the reed-solomon
encoder/decoder instance for each (data_shards, parity_shards) pair.
2022-09-25 18:09:47 +00:00
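The caching in #27510 above can be sketched with a placeholder Codec type (not the reed-solomon-erasure API): one shared instance is built per (data_shards, parity_shards) pair and handed out behind an Arc.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Placeholder for an encoder/decoder whose construction is expensive.
struct Codec {
    data_shards: usize,
    parity_shards: usize,
}

#[derive(Default)]
struct CodecCache {
    entries: Mutex<HashMap<(usize, usize), Arc<Codec>>>,
    constructions: Mutex<usize>, // counts how often a Codec was built
}

impl CodecCache {
    /// Returns the cached instance for the pair, building it on first use.
    fn get(&self, data_shards: usize, parity_shards: usize) -> Arc<Codec> {
        let mut entries = self.entries.lock().unwrap();
        let codec = entries
            .entry((data_shards, parity_shards))
            .or_insert_with(|| {
                *self.constructions.lock().unwrap() += 1;
                Arc::new(Codec { data_shards, parity_shards })
            });
        Arc::clone(codec)
    }
}
```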
behzad nouri 97c9af4c6b plumbs through flag to generate merkle variant of shreds 2022-09-23 16:45:18 +00:00
behzad nouri 9a57c64f21
patches clippy errors from new rust nightly release (#27996) 2022-09-22 22:23:03 +00:00
Will Hickey 4aa2a42cc7
Fix test_bench_tps_local_cluster_solana (#27448)
* Fix test_bench_tps_local_cluster_solana
* Remove #[ignore] annotations from dos tests (which are also fixed by this change)
* Remove #[ignore] annotations from local cluster tests (which are also fixed by this change)
2022-08-30 13:04:31 -05:00
Tyera Eulberg b8b3d723da
Use new client crates (#27360)
* Update ancillary cli crates

* Update cli

* Update command-line tools

* Update rpc, etc

* Update client-test

* Update core, validator

* Update local-cluster
2022-08-24 10:47:02 -06:00
Brennan Watt e4a7d01e10
Rust v1.63 (#27303)
* Upgrade to Rust v1.63.0

* Add nightly_clippy_allows

* Resolve some new clippy nightly lints

* Increase QUIC packets completion timeout

* Update quinn-udp crate

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-08-22 18:01:03 -07:00
Michael Vines 3f4731b37f Standardize thread names
Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use Camel case. It's more character dense than Snake or Kebab case
2022-08-20 07:49:39 -07:00
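The first two tenets of commit 3f4731b37f above can be checked mechanically. A sketch with hypothetical helper names; the 15-character limit matches the Linux thread-name limit of 16 bytes including the terminating NUL.

```rust
use std::thread;

const MAX_THREAD_NAME_LEN: usize = 15;

/// Checks a name against the tenets: at most 15 characters, "sol" prefix.
fn is_valid_thread_name(name: &str) -> bool {
    name.len() <= MAX_THREAD_NAME_LEN && name.starts_with("sol")
}

/// Spawns a thread with the given name and returns the name the thread
/// itself observes.
fn spawn_and_report(name: &str) -> Option<String> {
    assert!(is_valid_thread_name(name), "invalid thread name: {name}");
    thread::Builder::new()
        .name(name.to_string())
        .spawn(|| thread::current().name().map(str::to_string))
        .expect("failed to spawn thread")
        .join()
        .expect("thread panicked")
}
```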
Brennan Watt 7573000d87
Revert "Rust v1.63.0 (#27148)" (#27245)
This reverts commit a2e7bdf50a.
2022-08-19 09:19:44 +01:00
behzad nouri 6928b2a5af
adds hash domain to ping-pong protocol (#27193)
In order to maintain backward compatibility, for now the responding node
will hash the token both with and without domain so that the other node
will accept the response regardless of its upgrade status.
Once the cluster has upgraded to the new code, we will remove the legacy
domain = false case.
2022-08-18 22:39:31 +00:00
Brennan Watt a2e7bdf50a
Rust v1.63.0 (#27148)
* Upgrade to Rust v1.63.0

* Add nightly_clippy_allows

* Resolve some new clippy nightly lints

* Increase QUIC packets completion timeout

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-08-17 15:48:33 -07:00
behzad nouri fea66c8b63
derives Error trait for ClusterInfoError and core::result::Error (#27208) 2022-08-17 22:01:51 +00:00
Brennan Watt 521c550ccd
Sleep between vote refreshes (#27115)
* Sleep between vote refreshes in unit test
2022-08-12 13:45:00 -07:00
Tyera Eulberg 45c0da8597
Fix quic client on TestValidator, alternative (#27046)
Add new method to enable custom offset
2022-08-10 15:27:12 +00:00
Jeff Biseda 857be1e237
sign repair requests (#26833) 2022-07-31 15:48:51 -07:00
behzad nouri 128226c6cc
patches flaky test_push_votes_with_tower (#26554)
cargo test --package solana-gossip --release test_push_votes_with_tower

occasionally fails because with --release all votes are generated at
the same wallclock (milliseconds resolution) and so the new ones will
not necessarily override existing entries in the table.

The commit ensures that the new vote is pushed with a wallclock later
than existing entries.
2022-07-11 12:56:31 +00:00
behzad nouri ba785cf8ab
removes erroneous uses of std::mem::swap (#26536)
All instances should be replaced by std::mem::{replace,take},
or just plain assignment.
2022-07-11 11:33:15 +00:00
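For #26536 above, a short sketch of the two replacements with hypothetical function names. Each comment shows the swap-based form that the std::mem call replaces.

```rust
use std::mem;

/// Drains a buffer with `mem::take`, leaving an empty Vec behind.
fn drain_buffer(buffer: &mut Vec<u8>) -> Vec<u8> {
    // Instead of: let mut out = Vec::new(); mem::swap(buffer, &mut out); out
    mem::take(buffer)
}

/// Installs a new value with `mem::replace` and returns the previous one.
fn rotate_value(current: &mut String, next: String) -> String {
    // Instead of: let mut next = next; mem::swap(current, &mut next); next
    mem::replace(current, next)
}
```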
behzad nouri df616a0dda
removes redundant clone in gossip PruneData::signable_data (#26510)
PruneData::signable_data redundantly clones inner fields, while only
references suffice:
https://github.com/solana-labs/solana/blob/d1370f2c7/gossip/src/cluster_info.rs#L219-L233
2022-07-10 13:13:07 +00:00
Greg Cusack 032bee13ab
Add Gossip Loop metrics (#26195)
* add three gossip metrics measuring gossip loop times

* add 5 metrics

* rm space

* rm space

* Update SECURITY.md

- fix nav link
- add bounty split policy for duplicate reports

* Add transaction index in slot to geyser plugin TransactionInfo (#25688)

* Define shuffle to prep using same shuffle for multiple slices

* Determine transaction indexes and plumb to execute_batch

* Pair transaction_index with transaction in TransactionStatusService

* Add new ReplicaTransactionInfoVersion

* Plumb transaction_indexes through BankingStage

* Prepare BankingStage to receive transaction indexes from PohRecorder

* Determine transaction indexes in PohRecorder; add field to WorkingBank

* Add PohRecorder::record unit test

* Only pass starting_transaction_index around PohRecorder

* Add helper structs to simplify test DashMap

* Pass entry and starting-index into process_entries_with_callback together

* Add tx-index checks to test_rebatch_transactions

* Revert shuffle definition and use zip/unzip

* Only zip/unzip if randomize

* Add confirm_slot_entries test

* Review nits

* Add type alias to make sender docs more clear

* Update SECURITY.md

finish filling out the table....

* rpc: fix possible deadlock in rpc (#26051)

* Add StatusCache::root_slot_deltas() and use it (#26170)

* Remove InMemAccountsIndex::map() and use map_internal directly (#26189)

* [quic]Decrement total_streams correctly (#26158)

* remove comment

* alphabetical metrics. no abbreviations

* remove trailing white space

* cargo fmt to update code format/readability

Co-authored-by: Trent Nelson <trent@solana.com>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
Co-authored-by: Boqin Qin(秦 伯钦) <Bobbqqin@gmail.com>
Co-authored-by: Brooks Prumo <brooks@solana.com>
Co-authored-by: Miles Obare <bdhobare@gmail.com>
2022-06-29 11:55:41 -06:00
behzad nouri f534b8981b
maps number of data shreds to erasure batch size (#25917)
In preparation for
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
  erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
  replace and remove entries_to_data_shreds from the public interface.
2022-06-23 13:27:54 +00:00
Jon Cinque 79a8ecd0ac
client: Remove static connection cache, plumb it instead (#25667)
* client: Remove static connection cache, plumb it instead

* Add TpuClient::new_with_connection_cache to not break downstream

* Refactor get_connection and RwLock into ConnectionCache

* Fix merge conflicts from new async TpuClient

* Remove `ConnectionCache::set_use_quic`

* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
2022-06-08 13:57:12 +02:00
HaoranYi 4223f82922
Fix format alignment for cluster info trace (#25741)
* double shrinking

* remove pre/post shrink time

* fix cluster info trace alignment

* add test

* format

* typo

* add checks in cluster info trace test

* update cargo lock

* clippy

* clippy

* move regex deps to dev deps

* cargo lock
2022-06-06 09:51:00 -05:00
Pankaj Garg 1c2ae470c5
Fix forwarding of transactions over QUIC (#25674)
* Spawn QUIC server to receive forwarded txs

* Update validator port range

* forward votes using UDP

* no forwarding from unstaked nodes

* forwarding stats in banking stage

* fix test builds

* fix lifetime of forward sender
2022-06-02 11:14:58 -07:00
behzad nouri 69cbbaf483
patches flaky gossip pull from entrypoint test (#25589)
test_pull_from_entrypoint_if_not_present relies on a deterministic
ordering for the entries when generating gossip pull requests.

https://github.com/solana-labs/solana/pull/25460
changed an intermediate type for gossip pull-requests from Vec to
HashMap, and so the entries are no longer deterministically ordered.
This causes the test to be flaky.

The commit updates the test so that it no longer relies on the ordering.
2022-05-26 18:58:06 +00:00
behzad nouri 1925b4f5cb
fans out gossip pull-requests to many randomly selected peers (#25460)
Each time a node generates gossip pull-requests, it sends out all the
requests to a single randomly selected peer:
https://github.com/solana-labs/solana/blob/fd7ad31ee/gossip/src/crds_gossip_pull.rs#L253-L266

This causes a burst of pull-requests at a single node at once. In order
to make gossip in-bound traffic less bursty, this commit fans out gossip
pull-requests to several randomly selected peers.

This should reduce spikes in inbound gossip traffic without changing the
average load which may help reduce number of times outbound data budget is
exhausted when responding to gossip pull-requests at the receiving node, and
reduce number of pull-requests dropped.
2022-05-26 12:45:53 +00:00
Justin Starry cad1c41ce2 Add Packet::deserialize_slice convenience method 2022-05-24 17:31:14 +08:00
steviez ec7ca411dd
Make PacketBatch packets vector non-public (#25413)
Upcoming changes to PacketBatch to support variable sized packets will
modify the internals of PacketBatch. So, this change removes usage of
the internal packet struct and instead uses accessors (which are
currently just wrappers of Vector functions but will change down the
road).
2022-05-23 15:30:15 -05:00
behzad nouri c248fb3f51
renames Packet Meta::{,set_}addr methods to {,set_}socket_addr (#25478)
In order to distinguish between Meta.addr field which is an IpAddr and
the methods which refer to a SocketAddr.
2022-05-23 15:48:59 +00:00
Michael Vines 3608801a54 Avoid clippy::significant_drop_in_scrutinee 2022-05-22 22:22:21 -07:00
Michael Vines b05c7d91ed Fix derive_partial_eq_without_eq clippy lint 2022-05-22 22:22:21 -07:00
Michael Vines c54e06355f
voteSubscribe pubsub notification now includes the vote transaction signature (#25291) 2022-05-19 18:28:46 -07:00
sakridge 3d96a1ab76
Block packets in vote-only mode (#24906) 2022-05-14 17:53:37 +02:00
Justin Starry 7100f1c94b
Collect stats in streamer receiver and report fetch stage metrics (#25010) 2022-05-06 02:56:18 +08:00
steviez b48fd4eec2
Construct PacketBatches from PongMessages directly (#24708)
Serialize pongs directly into PacketBatch to save copying the data from
intermediate packets into PacketBatch.
2022-04-26 21:30:00 -05:00
behzad nouri 12ae8d3be5
returns Error when Shred::sanitize fails (#24653)
Including the error in the output makes it possible to debug why
Shred::sanitize fails.
2022-04-25 23:19:37 +00:00
behzad nouri 895f76a93c
hides implementation details of shred from its public interface (#24563)
Working towards embedding versioning into shreds binary, so that a new
variant of shred struct can include merkle tree hashes of the erasure
set.
2022-04-25 12:43:22 +00:00
behzad nouri 1d50832389
replaces counters with datapoints in gossip metrics (#24451) 2022-04-18 23:14:59 +00:00
HaoranYi 6d1b6bdd7c
typo (#24400) 2022-04-18 08:52:56 -05:00
Tyera Eulberg afeb1d3cca
Bump lru crate (#24150) 2022-04-06 16:18:42 -06:00
behzad nouri cd09390367
reduces gossip crds stats (#24132) 2022-04-06 15:35:25 +00:00
BG Zhu 22224127e0
Refactor thin_client::create_client (#24067)
Refactor the thin_client::create_client to take addresses separately instead of as a tuple

Co-authored-by: Bijie Zhu <bijiezhu@Bijies-MBP.cable.rcn.com>
2022-04-06 11:03:38 -04:00
ryleung-solana a38bd4acc8
Use LRU in connection-cache (#24109)
Switch to using LRU for connection-cache
2022-04-06 10:58:32 -04:00
behzad nouri db23295e1c
removes legacy weighted_shuffle and weighted_best methods (#24125)
Older weighted_shuffle is based on a heuristic which results in biased
samples as shown in:
https://github.com/solana-labs/solana/pull/18343
and can be replaced with WeightedShuffle.

Also, as described in:
https://github.com/solana-labs/solana/pull/13919
weighted_best can be replaced with rand::distributions::WeightedIndex,
or WeightedShuffle::first.
2022-04-05 19:19:22 +00:00
behzad nouri 2b718d00b0 removes legacy compatibility turbine peers shuffle code 2022-04-05 12:04:12 +00:00
behzad nouri 7cb3b6cbe2
demotes WeightedShuffle failures to error metrics (#24079)
Since call-sites are calling unwrap anyways, panicking seems too punitive
for our use cases.
2022-04-03 16:20:06 +00:00
ryleung-solana 8b72200afb
Thin client quic (#23973)
Change thin-client to use connection-cache
2022-03-31 15:47:00 -04:00
Michael Vines 7ef18f220a Update Version CrdsData on node identity changes 2022-03-28 15:57:16 -07:00
ryleung-solana 17b00ad3a4
Add quic-client module (#23166)
* Add quic-client module to send transactions via quic, abstracted behind the TpuConnection trait (along with a legacy UDP implementation of TpuConnection) and change thin-client to use TpuConnection
2022-03-09 21:33:05 -05:00
sakridge a4f4ac5279
add plumbing to allow for arbitrary tpu address in gossip (#22703)
* add plumbing to allow for arbitrary tpu address in gossip

* make clippy happy

* Review comments

Co-authored-by: CherryWorm <nico.gruendel@web.de>
2022-03-02 09:42:14 +01:00
behzad nouri 1282277126
bumps up crds-shards-bits (#23220)
The commit adjusts CRDS_SHARDS_BITS up to be in line with mask_bits in
gossip pull request. This will avoid redundant filtering of irrelevant
crds entries when responding to pull requests.
2022-03-01 15:14:11 +00:00
sakridge 514aab46d9
Search for consecutive ports (#22979) 2022-02-07 17:53:40 +01:00
sakridge 5a230f418d
Add quic port for accepting transactions (#22753)
using quinn library

streamer: Sign TLS cert with validator identity key

Handle multiple incoming chunks
2022-02-04 15:27:09 +01:00
behzad nouri e3b137066d
caches WeightedShuffle struct in ClusterNodes (#22877)
Instead of reconstructing WeightedShuffle struct for each shred
broadcast or retransmit, we can use the same struct with minimal
mutations.
2022-02-02 15:12:26 +00:00
behzad nouri 45e09664b8
removes Rng field from WeightedShuffle struct (#22850) 2022-02-01 15:27:23 +00:00
behzad nouri 604ca9316c
includes zero weighted entries in WeightedShuffle (#22829)
Current WeightedShuffle implementation excludes zero weighted entries
from the shuffle:
https://github.com/solana-labs/solana/blob/13e631dcf/gossip/src/weighted_shuffle.rs#L29-L30

Though mathematically this might make more sense, for our use-cases
(turbine specifically), this results in less efficient code:
https://github.com/solana-labs/solana/blob/13e631dcf/core/src/cluster_nodes.rs#L409-L430

This commit changes the implementation so that zero weighted indices are
also included in the shuffle but appear only at the end after non-zero
weighted indices.
2022-01-31 16:23:50 +00:00
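The ordering property described in #22829 above can be sketched as follows. This is a simple quadratic-time illustration, not the WeightedShuffle implementation; the xorshift generator is a stand-in so the sketch needs no external crate, and the modulo reduction introduces a small bias that a real implementation would avoid.

```rust
/// Tiny xorshift generator; the seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Returns the indices of `weights` in weighted-random order. Indices with
/// non-zero weight come first, sampled without replacement proportionally
/// to weight; zero-weighted indices follow in shuffled order.
/// Assumes the sum of weights fits in a u64.
fn weighted_shuffle(weights: &[u64], rng: &mut XorShift) -> Vec<usize> {
    let mut remaining: Vec<usize> =
        (0..weights.len()).filter(|&i| weights[i] > 0).collect();
    let mut zeros: Vec<usize> =
        (0..weights.len()).filter(|&i| weights[i] == 0).collect();
    let mut total: u64 = remaining.iter().map(|&i| weights[i]).sum();
    let mut out = Vec::with_capacity(weights.len());
    while !remaining.is_empty() {
        // Pick a point in [0, total) and find the index whose span covers it.
        let mut target = rng.next() % total;
        let pos = remaining
            .iter()
            .position(|&i| {
                if target < weights[i] {
                    true
                } else {
                    target -= weights[i];
                    false
                }
            })
            .unwrap();
        let index = remaining.remove(pos);
        total -= weights[index];
        out.push(index);
    }
    // Fisher-Yates shuffle of the zero-weighted tail.
    for i in (1..zeros.len()).rev() {
        let j = (rng.next() % (i as u64 + 1)) as usize;
        zeros.swap(i, j);
    }
    out.extend(zeros);
    out
}
```

Callers that need every index, such as a broadcast tree that must place unstaked nodes somewhere, get them all from one call, with the zero-weighted ones at the end.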
Michael Vines 6d5bbca630 Pacify clippy 2022-01-21 19:12:57 -08:00
Justin Starry 7f20c6149e
Refactor: move simple vote parsing to runtime (#22537) 2022-01-20 10:39:21 +08:00
anatoly yakovenko d343713f61
Optimize packet dedup (#22571)
* Use bloom filter to dedup packets

* dedup first

* Update bloom/src/bloom.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* fixup

* fixup

* fixup

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-01-19 13:58:20 -08:00
Jeff Biseda 8b66625c95
convert std::sync::mpsc to crossbeam_channel (#22264) 2022-01-11 02:44:46 -08:00
behzad nouri 49da347d84
limits gossip vote stats to the top most voted slots (#22416) 2022-01-10 21:23:41 +00:00
behzad nouri c9c78622a8
discards serialized gossip crds votes if cannot parse tx (#22129) 2021-12-29 19:31:26 +00:00
behzad nouri 65d59f4ef0
tracks erasure coding shreds' indices explicitly (#21822)
The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921

However with the upcoming changes to erasure schema, there will be more
erasure coding shreds than data shreds and we can no longer infer coding
shreds indices from data shreds.

The commit adds constructs to track coding shreds indices explicitly.
2021-12-19 22:37:55 +00:00
carllin 7f6fb6937a
Ensure AncestorHashesService selects an open port (#21919) 2021-12-18 00:44:01 -05:00
Jeff Biseda 97a1fa10a6
streamer send destination metrics for repair, gossip (#21564) 2021-12-17 15:21:05 -08:00
behzad nouri 89d66c3210
removes next_shred_index from return value of entries to shreds api (#21961)
next-shred-index is already readily available from returned data shreds.
The commit simplifies the api for upcoming changes to erasure coding
schema which will require explicit tracking of indices for coding shreds
as well as data shreds.
2021-12-17 15:01:55 +00:00
Justin Starry 254ef3e7b6
Rename Packets to PacketBatch (#21794) 2021-12-11 09:44:15 -05:00
Ashwin Sekar f0acf7681e
Add vote instructions that directly update on chain vote state (#21531)
* Add vote state instructions

UpdateVoteState and UpdateVoteStateSwitch

* cargo tree

* extract vote state version conversion to common fn
2021-12-07 16:47:26 -08:00
Michael Vines b8837c04ec Reformat imports to a consistent style
rustfmt.toml configuration:
  imports_granularity = "One"
  group_imports = "One"
2021-12-03 09:19:13 -08:00
behzad nouri 9886366977
exempts AccountsHashes from stake check (#21565)
Otherwise getHealth fails if account hashes are not propagated.
2021-12-02 18:01:32 +00:00
behzad nouri 57057f8d39 uses enum for shred type
Current code is using u8 which does not have any type-safety and can
contain invalid values:
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L167

Checks for invalid shred-types are scattered through the code:
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/blockstore.rs#L849-L851
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L346-L348

The commit uses enum for shred type with #[repr(u8)]. Backward
compatibility is maintained by implementing Serialize and Deserialize
compatible with u8, and adding a test to assert that.
2021-11-19 14:16:39 +00:00
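A sketch of the approach in commit 57057f8d39 above, using a hypothetical ShredKind enum: the type stays one byte on the wire through #[repr(u8)], and validation of unknown bytes happens in a single TryFrom<u8> implementation instead of being scattered through the code. The discriminant values are illustrative assumptions.

```rust
use std::convert::TryFrom;

/// Sketch of a shred type backed by a u8 on the wire.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ShredKind {
    Data = 0b1010_0101, // assumed value, for illustration
    Code = 0b0101_1010, // assumed value, for illustration
}

impl From<ShredKind> for u8 {
    fn from(kind: ShredKind) -> u8 {
        kind as u8
    }
}

impl TryFrom<u8> for ShredKind {
    type Error = u8;

    /// Rejects any byte that is not a known variant, returning it as the error.
    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            b if b == ShredKind::Data as u8 => Ok(ShredKind::Data),
            b if b == ShredKind::Code as u8 => Ok(ShredKind::Code),
            other => Err(other),
        }
    }
}
```

A round-trip check like the one the commit message mentions would assert that converting a variant to u8 and back yields the same variant.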
carllin b30c94ce55
ClusterInfoVoteListener send only missing votes to BankingStage (#20873) 2021-11-18 15:20:41 -08:00
behzad nouri 3fc858eb60 adds methods to obtain data/coding shreds indices from ErasureMeta 2021-11-13 17:08:05 +00:00
Michael Keleti b0ca335463
Rename "trusted" to "known" in `validators/` (#21197)
* Replaced trusted with known validator

* Format Convention
2021-11-12 11:57:55 -07:00
behzad nouri eea3fb327f
seeds rng for test_build_crds_filter test (#21031) 2021-10-28 18:29:32 +00:00
behzad nouri 43168e6365
doubles crds unique pubkey capacity (#20947) 2021-10-26 13:06:55 +00:00
behzad nouri 1297a13586
adds metrics tracking crds writes and votes (#20953) 2021-10-26 13:02:30 +00:00
Brooks Prumo 14af1957d6
Make pub IncrementalSnapshotHashes fields (#20727) 2021-10-18 13:38:43 -05:00
behzad nouri 0c0384ec32
revises turbine peers shuffling order (#20480)
Turbine randomly shuffles cluster nodes on a broadcast tree for each
shred. This requires knowing the stakes and nodes' contact-infos (from
gossip).

However gossip is subject to partitioning and propagation delays.
Additionally unstaked nodes may join and leave the cluster at any
moment, changing the cluster view from one node to another.

This commit:
* Always arranges the unstaked nodes at the bottom of turbine broadcast
  tree.
* Staked nodes are always included regardless of if their contact-info
  is available in gossip or not.
* Uses the unbiased WeightedShuffle construct for shuffling nodes.
2021-10-14 15:09:36 +00:00
Brooks Prumo 1fcfbfccbb
Add fn to push IncrementalSnapshotHashes to cluster via gossip (#20395) 2021-10-08 08:20:35 -05:00
Brooks Prumo 57592e463e
Add get_incremental_snapshot_hash_for_node() to gossip (#20394) 2021-10-07 19:47:14 -05:00
behzad nouri 0da661de62
adds metrics for number of nodes vs number of pubkeys (#20512) 2021-10-07 18:56:05 +00:00
Tao Zhu 177a375479
Tpu vote 1.7 (#20187) (#20494)
* Add separate vote processing tpu port

* Add feature to send to tpu vote port

* Add vote rejecting sigverify mode

* use packet.meta.is_simple_vote_tx in place of deserialization

* consolidate code that identifies vote tx at common path for cpu and gpu

* new key for feature set

* banking forward tpu vote

* add tpu vote port to dockerfile and other review changes

* Simplify thread id compare

* fix a test; updated cluster_info ABI change

Co-authored-by: Tao Zhu <tao@solana.com>

Co-authored-by: sakridge <sakridge@gmail.com>
2021-10-07 09:38:23 +00:00
Brooks Prumo 4e3818e5c1
Add CrdsData::IncrementalSnapshotHashes (#20374) 2021-10-05 09:57:46 -05:00
Brooks Prumo 5d141fe01d
Rename CRDS SnapshotHash to SnapshotHashes (#20421) 2021-10-04 19:03:28 -05:00
carllin ee8621a8bd
Add metric measuring number of successfully inserted push messages (#20275)
* Add number of successfully inserted push messages
2021-09-28 21:41:17 -07:00
behzad nouri 43ed727ba7
reverts #17542 (#20259)
https://github.com/solana-labs/solana/pull/17542
excludes caller's crds values from pull responses.

Reverting that commit so that when a (staked) node restarts, it can
obtain its crds values before restart from other nodes.
2021-09-27 22:03:26 +00:00
sakridge 013e1d9d49
Limit transaction forwarding from banking_stage (#19940) 2021-09-21 08:49:41 -07:00
sakridge 44c8b1bca2
Remove clippy (#19793) 2021-09-13 20:08:28 -07:00
behzad nouri d7051b0d21
adds logs when push-vote panics with invalid vote-index (#19485)
In order to debug this panic on the clusters:

  panicked at 'assertion failed: (vote_index as usize) <
  MAX_LOCKOUT_HISTORY', core/src/cluster_info.rs:1012:9
2021-08-31 12:15:07 +00:00
behzad nouri 6909a79b6f
removes require-stake-for-gossip feature (#19476)
The feature is already activated on all clusters.
2021-08-27 21:17:15 +00:00
behzad nouri 3efccbffab sends shreds (instead of packets) to retransmit stage
Working towards channelling through shreds recovered from erasure codes
to retransmit stage.
2021-08-17 13:44:10 +00:00
behzad nouri 140abec6ef
exempts node-instances from shred-version check (#19190)
Clusters are kept separate using the shred-versions obtained from
contact-infos. However, this mechanism breaks if there are 2 instances
of the same identity key running on different clusters, because then one
of the two contact-infos has the right shred-version.
If a node has the contact-info with the matching shred-version, then it
will pass all associated crds values even if they belong to the other
instance. So the shred-version check breaks.
As a result we cannot support 2 instances of the same identity key
running on different clusters. To prevent that, this commit is exempting
node-instances from shred-version check so that they are always
propagated across clusters and halt one of the running duplicate
instances.
2021-08-14 00:47:44 +00:00
behzad nouri 7a789e0763
filters for recent contact-infos when checking for live stake (#19204)
Contact-infos are saved to disk:
https://github.com/solana-labs/solana/blob/9dfeee299/gossip/src/cluster_info.rs#L1678-L1683

and restored on validator start-up:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L450

Staked nodes entries will not expire until an epoch after. So when the
validator checks for online stake it is erroneously picking up
contact-infos restored from disk, which breaks the entire
wait-for-supermajority logic:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L1515-L1561

This commit adds an extra check for the age of contact-info entries and
filters out old ones.
2021-08-13 12:12:40 +00:00
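The age check described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the validator's code: `ContactInfo`, `wallclock_ms`, `online_stake`, and the `max_age_ms` parameter are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a node identity key.
type Pubkey = [u8; 32];

// Hypothetical, stripped-down contact-info: only the fields the check needs.
pub struct ContactInfo {
    pub id: Pubkey,
    // Time (in milliseconds) at which the entry was last refreshed.
    pub wallclock_ms: u64,
}

/// Sums the stake of nodes whose contact-info is recent enough.
/// Entries older than `max_age_ms` (for example, ones restored from disk)
/// are ignored, so they do not count towards the online stake.
pub fn online_stake(
    infos: &[ContactInfo],
    stakes: &HashMap<Pubkey, u64>,
    now_ms: u64,
    max_age_ms: u64,
) -> u64 {
    infos
        .iter()
        // saturating_sub treats entries stamped in the future as age zero.
        .filter(|info| now_ms.saturating_sub(info.wallclock_ms) <= max_age_ms)
        .filter_map(|info| stakes.get(&info.id))
        .sum()
}
```

With this shape, a stale entry restored from disk contributes nothing to the total until the node re-announces itself with a fresh wallclock.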
behzad nouri f302774cf7
implements copy-on-write for staked-nodes (#19090)
Bank::staked_nodes and Bank::epoch_staked_nodes redundantly clone
staked-nodes HashMap even though an immutable reference will suffice:
https://github.com/solana-labs/solana/blob/a9014cece/runtime/src/vote_account.rs#L77

This commit implements copy-on-write semantics for staked-nodes by
wrapping the underlying HashMap in Arc<...>.
2021-08-10 12:59:12 +00:00
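A minimal sketch of the copy-on-write idea from the commit above, using `Arc::make_mut` from the standard library. The `StakedNodes` wrapper, its methods, and the `Pubkey` alias are assumptions made for illustration and are not the types in `runtime/src/vote_account.rs`.

```rust
use std::{collections::HashMap, sync::Arc};

// Hypothetical stand-in for a node identity key.
type Pubkey = [u8; 32];

// Hypothetical wrapper: the map lives behind an Arc so that readers can
// share it without cloning the whole HashMap.
#[derive(Clone, Default)]
pub struct StakedNodes(Arc<HashMap<Pubkey, u64>>);

impl StakedNodes {
    /// Cheap read access: only the reference count is incremented.
    pub fn get(&self) -> Arc<HashMap<Pubkey, u64>> {
        Arc::clone(&self.0)
    }

    /// Write access: Arc::make_mut clones the map only if another Arc
    /// still points at it; otherwise it mutates in place.
    pub fn add_stake(&mut self, node: Pubkey, stake: u64) {
        *Arc::make_mut(&mut self.0).entry(node).or_default() += stake;
    }
}
```

Readers that took a snapshot through `get` keep seeing the old map after a later `add_stake`, which is the usual trade-off of copy-on-write: reads are cheap, and a write pays for a clone only when a snapshot is still alive.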
Justin Starry 8817f59b6e
Version transaction message and add new message format (#18725)
* Version transaction message and add new message format

* Update abi digest due to message path change

* Update v0.rs

Fix comment

* Update original.rs

* Update message versions name and address map indexes field name

* s/original/legacy

* update comment

* cargo fmt

* Update abi digest due to legacy rename
2021-08-09 22:03:39 -07:00
behzad nouri 049fb0417f
allows sendmmsg api taking owned values (as well as references) (#18999)
Current signature of api in sendmmsg requires a slice of inner
references:
https://github.com/solana-labs/solana/blob/fe1ee4980/streamer/src/sendmmsg.rs#L130-L152

That forces the call-site to convert owned values to references even
though doing so is redundant and adds an extra level of indirection:
https://github.com/solana-labs/solana/blob/fe1ee4980/core/src/repair_service.rs#L291

This commit expands the api using AsRef and Borrow traits to allow
calling the method with owned values (as well as references like
before).
2021-07-30 20:58:49 +00:00
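The signature change in the commit above can be illustrated with a small sketch. It only demonstrates the `AsRef` and `Borrow` bounds; it does not call sendmmsg. The `batch_send` name and the `send` callback (a stand-in for the socket) are assumptions for this example.

```rust
use std::{borrow::Borrow, io, net::SocketAddr};

/// Sends every (payload, address) pair through `send` and returns the
/// number of packets handed over. `send` is a hypothetical stand-in for
/// the real socket call.
///
/// Because of the AsRef and Borrow bounds, the slice may hold owned
/// values (Vec<u8>, SocketAddr) or references (&[u8], &SocketAddr).
pub fn batch_send<S, T, F>(packets: &[(T, S)], mut send: F) -> io::Result<usize>
where
    S: Borrow<SocketAddr>,
    T: AsRef<[u8]>,
    F: FnMut(&[u8], &SocketAddr) -> io::Result<usize>,
{
    let mut sent = 0;
    for (payload, addr) in packets {
        // Fully qualified call so the SocketAddr impl of Borrow is picked.
        send(payload.as_ref(), <S as Borrow<SocketAddr>>::borrow(addr))?;
        sent += 1;
    }
    Ok(sent)
}
```

A call site holding `Vec<(Vec<u8>, SocketAddr)>` and one holding `Vec<(&[u8], &SocketAddr)>` can both pass their slice directly, without first building a vector of references.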
behzad nouri 81026f9ea5
passes through --allow-private-addr to validators in system perf tests (#18876) 2021-07-29 19:04:45 +00:00
behzad nouri f1198fc6d5
filters crds values in parallel when responding to gossip pull-requests (#18877)
When responding to gossip pull-requests, filter_crds_values takes a lot of time
while holding onto read-lock:
https://github.com/solana-labs/solana/blob/f51d64868/gossip/src/crds_gossip_pull.rs#L509-L566

This commit will filter-crds-values in parallel using rayon thread-pools.
2021-07-26 17:13:11 +00:00
behzad nouri d2d5f36a3c
adds validator flag to allow private ip addresses (#18850) 2021-07-23 15:25:03 +00:00
carllin 588c0464b8
Add sampling logic and DuplicateSlotRepairStatus module (#18721) 2021-07-21 11:15:08 -07:00
behzad nouri bbd22f06f4
implements generic lookups into gossip crds table (#18765)
This commit adds CrdsEntry trait which allows generic lookups into crds
table. For example to get ContactInfo or LowestSlot associated with a
Pubkey, the lookup code would be respectively:
   crds.get::<&ContactInfo>(pubkey)
   crds.get::<&LowestSlot>(pubkey)
2021-07-21 12:16:26 +00:00
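One way such a trait-driven lookup can be structured is sketched below. Only the `CrdsEntry` name and the `crds.get::<&ContactInfo>(pubkey)` call shape come from the commit message; the table layout, the struct fields, and the `get_entry` method are simplified assumptions.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a node identity key.
type Pubkey = [u8; 32];

// Hypothetical, stripped-down value types.
#[derive(Debug, PartialEq)]
pub struct ContactInfo {
    pub shred_version: u16,
}

#[derive(Debug, PartialEq)]
pub struct LowestSlot {
    pub lowest: u64,
}

// Hypothetical table: one map per value type keeps the sketch short.
#[derive(Default)]
pub struct Crds {
    pub contact_infos: HashMap<Pubkey, ContactInfo>,
    pub lowest_slots: HashMap<Pubkey, LowestSlot>,
}

/// Each type that can be looked up knows how to find itself in the table.
pub trait CrdsEntry<'a>: Sized {
    fn get_entry(crds: &'a Crds, pubkey: &Pubkey) -> Option<Self>;
}

impl<'a> CrdsEntry<'a> for &'a ContactInfo {
    fn get_entry(crds: &'a Crds, pubkey: &Pubkey) -> Option<Self> {
        crds.contact_infos.get(pubkey)
    }
}

impl<'a> CrdsEntry<'a> for &'a LowestSlot {
    fn get_entry(crds: &'a Crds, pubkey: &Pubkey) -> Option<Self> {
        crds.lowest_slots.get(pubkey)
    }
}

impl Crds {
    /// Generic lookup: the requested type selects the implementation.
    pub fn get<'a, V: CrdsEntry<'a>>(&'a self, pubkey: &Pubkey) -> Option<V> {
        V::get_entry(self, pubkey)
    }
}
```

The caller picks the value type with a turbofish, for example `crds.get::<&LowestSlot>(&pubkey)`, and adding another lookup type only requires one more `impl` block.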
Justin Starry 207c90bd8b
Shorten long SerializeWith type paths in abi digest (#18734) 2021-07-20 08:59:50 -05:00
behzad nouri 8da261cf5c
locks crds only once in ClusterInfo::repair_peers (#18752)
ClusterInfo::repair_peers locks crds table twice, and shows performance
regression if the RwLock is not reader-preferred:
https://github.com/solana-labs/solana/blob/269028360/gossip/src/cluster_info.rs#L1188-L1210
2021-07-18 16:55:58 +00:00
behzad nouri e316586516 excludes private ip addresses 2021-07-16 20:05:48 -06:00
Jeff Biseda ae5ad5cf9b
sendmmsg cleanup #18589
Rationalize usage of sendmmsg(2). Skip packets which failed to send and track failures.
2021-07-16 14:36:49 -07:00
Brian Anderson 37ee0b5599
Eliminate doc warnings and fix some markdown (#18566)
* Fix link target in doc comment

* Fix formatting of log examples in process_instruction

* Fix doc markdown in solana-gossip

* Fix doc markdown in solana-runtime

* Escape square braces in doc comments to avoid warnings

* Surround 'account references' doc items in code spans to avoid warnings

* Fix code block in loader_upgradeable_instruction

* Fix doctest for loader_upgradable_instruction
2021-07-16 00:40:07 +00:00
behzad nouri cf31afdd6a
makes CrdsGossip thread-safe (#18615) 2021-07-14 22:27:17 +00:00
sakridge 7f2254225e
Move entry/poh to own crate to speed up poh bench build (#18225) 2021-07-14 14:16:29 +02:00
behzad nouri c90af3cd63
removes id from push_lowest_slot args (#18645)
push_lowest_slot cannot sign the new crds-value unless the id (pubkey)
argument passed-in is the same pubkey as in ClusterInfo::keypair(), in
which case the id argument is redundant:
https://github.com/solana-labs/solana/blob/bb41cf346/gossip/src/cluster_info.rs#L824-L845

Additionally, the lookup is done with self.id(), but insert is done with
the id argument, which is logically a bug.
2021-07-13 22:32:59 +00:00
behzad nouri 90f8cf0920
makes CrdsGossipPush thread-safe (#18581) 2021-07-13 14:04:25 +00:00
behzad nouri e7a1f2c9b0
makes CrdsGossipPull thread-safe (#18578) 2021-07-11 15:32:10 +00:00
carllin 175083c4c1
Add updated duplicate broadcast test (#18506) 2021-07-10 22:22:07 -07:00
behzad nouri 918b5c28b2
removes redundant (mutable) self receivers (#18574) 2021-07-10 22:16:33 +00:00
behzad nouri fd9c10c2e2
adds a generic implementation of Gossip{Read,Write}Lock (#18559) 2021-07-10 14:13:52 +00:00
behzad nouri 4e1333fbe6
removes id and shred_version from CrdsGossip (#18505)
ClusterInfo is the gateway to CrdsGossip function calls, and it already
has node's pubkey and shred version (full ContactInfo and Keypair in
fact).
Duplicating these data in CrdsGossip adds redundancy and possibility for
bugs should they not be consistent with ClusterInfo.
2021-07-09 13:10:08 +00:00
behzad nouri 27cc7577a1
skips process_push_message for local messages (#18493)
received_cache is not relevant for local messages, and does not need to
be updated:
https://github.com/solana-labs/solana/blob/92c5cdab6/gossip/src/crds_gossip_push.rs#L166-L189
2021-07-09 01:42:13 +00:00
Michael Vines 1e0942e900 Rename ClusterInfo::send_vote to ClusterInfo::send_transaction 2021-07-07 15:51:14 -07:00
Justin Starry 92c5cdab62
Fix cargo check (#18499) 2021-07-07 14:21:08 -05:00
behzad nouri dba42c57b4
implements an unbiased weighted shuffle using binary indexed tree (#18343)
Current implementation of weighted_shuffle:
https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37
uses a heuristic which results in biased samples.

For example, if the weights are [1, 10, 100], then the 3rd index should
come first 100 times more often than the 1st index. However,
weighted_shuffle is picking the 3rd index 200+ times more often than the
1st index, showing a disproportional bias in favor of higher weights.

This commit implements weighted shuffle using binary indexed tree to
maintain cumulative sum of weights while sampling. The resulting samples
are demonstrably unbiased and precisely proportional to the weights.

Additionally, the iterator interface makes it possible to skip
computations when not all indices are processed.

Of the use cases of weighted_shuffle, changing turbine code requires
feature-gating to keep the cluster in sync. That is not updated in
this commit, but can be done together with future updates to turbine.
2021-07-07 14:14:43 +00:00
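The technique named in the commit above, sampling without replacement while a binary indexed (Fenwick) tree maintains the cumulative weights, can be sketched as follows. This is an illustration under stated assumptions, not the code in gossip/src/weighted_shuffle.rs: the `XorShift64` generator is a hypothetical stand-in for a real RNG, reducing its output with `%` introduces a small modulo bias that a production sampler would avoid, zero-weight indices are simply never emitted, and overflow of the weight sum is ignored.

```rust
/// Tiny xorshift generator, used only to keep the sketch dependency-free.
pub struct XorShift64(u64);

impl XorShift64 {
    pub fn new(seed: u64) -> Self {
        // The all-zero state is a fixed point of xorshift, so avoid it.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    pub fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

pub struct WeightedShuffle {
    tree: Vec<u64>,    // 1-based binary indexed tree over the weights
    weights: Vec<u64>, // remaining weight per index (0 once emitted)
    total: u64,        // sum of the remaining weights
}

impl WeightedShuffle {
    pub fn new(weights: &[u64]) -> Self {
        let n = weights.len();
        let mut tree = vec![0u64; n + 1];
        for (i, &w) in weights.iter().enumerate() {
            let mut k = i + 1;
            while k <= n {
                tree[k] += w;
                k += k & k.wrapping_neg(); // step by the lowest set bit
            }
        }
        Self { tree, weights: weights.to_vec(), total: weights.iter().sum() }
    }

    // Removes the weight at `index` from the tree and from the total.
    fn remove(&mut self, index: usize) {
        let w = std::mem::take(&mut self.weights[index]);
        self.total -= w;
        let mut k = index + 1;
        while k < self.tree.len() {
            self.tree[k] -= w;
            k += k & k.wrapping_neg();
        }
    }

    // Returns the 0-based index whose cumulative weight first exceeds
    // `target`. Requires target < self.total.
    fn search(&self, mut target: u64) -> usize {
        let n = self.tree.len() - 1;
        let mut pos = 0;
        let mut step = n.next_power_of_two();
        while step > 0 {
            let next = pos + step;
            if next <= n && self.tree[next] <= target {
                target -= self.tree[next];
                pos = next;
            }
            step >>= 1;
        }
        pos
    }

    /// Lazily yields indices; each draw is proportional to the weights
    /// that remain, and costs O(log n).
    pub fn shuffle<'a>(mut self, rng: &'a mut XorShift64) -> impl Iterator<Item = usize> + 'a {
        std::iter::from_fn(move || {
            if self.total == 0 {
                return None;
            }
            let index = self.search(rng.next_u64() % self.total);
            self.remove(index);
            Some(index)
        })
    }
}
```

Because the result is an iterator, a caller that needs only the first few indices, for example `WeightedShuffle::new(&weights).shuffle(&mut rng).take(2)`, pays only for the draws it consumes.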