solana

Commit Graph

Author	SHA1	Message	Date
behzad nouri	abfaf06e87	counts gossip packets received before excess packets are dropped (#28086 ) Currently, gossip packets are counted after excess packets are dropped. This makes it difficult to debug gossip traffic spikes if the majority of the packets are dropped. This commit instead counts gossip packets received before excess packets are dropped	2022-09-27 13:43:35 +00:00
Jeff Biseda	8b0f9b4917	make ping cache rate limit delay configurable (#27955 )	2022-09-26 14:16:56 -07:00
behzad nouri	f49beb0cbc	caches reed-solomon encoder/decoder instance (#27510 ) ReedSolomon::new(...) initializes a matrix and a data-decode-matrix cache: https://github.com/rust-rse/reed-solomon-erasure/blob/273ebbced/src/core.rs#L460-L466 In order to cache this computation, this commit caches the reed-solomon encoder/decoder instance for each (data_shards, parity_shards) pair.	2022-09-25 18:09:47 +00:00
behzad nouri	97c9af4c6b	plumbs through flag to generate merkle variant of shreds	2022-09-23 16:45:18 +00:00
behzad nouri	9a57c64f21	patches clippy errors from new rust nightly release (#27996 )	2022-09-22 22:23:03 +00:00
Will Hickey	4aa2a42cc7	Fix test_bench_tps_local_cluster_solana (#27448 ) * Fix test_bench_tps_local_cluster_solana * Remove #[ignore] annotations from dos tests (which are also fixed by this change) * Remove #[ignore] annotations from local cluster tests (which are also fixed by this change)	2022-08-30 13:04:31 -05:00
Tyera Eulberg	b8b3d723da	Use new client crates (#27360 ) * Update ancillary cli crates * Update cli * Update command-line tools * Update rpc, etc * Update client-test * Update core, validator * Update local-cluster	2022-08-24 10:47:02 -06:00
Brennan Watt	e4a7d01e10	Rust v1.63 (#27303 ) * Upgrade to Rust v1.63.0 * Add nightly_clippy_allows * Resolve some new clippy nightly lints * Increase QUIC packets completion timeout * Update quinn-udp crate Co-authored-by: Michael Vines <mvines@gmail.com>	2022-08-22 18:01:03 -07:00
Michael Vines	3f4731b37f	Standardize thread names Tenets: 1. Limit thread names to 15 characters 2. Prefix all Solana-controlled threads with "sol" 3. Use Camel case. It's more character dense than Snake or Kebab case	2022-08-20 07:49:39 -07:00
Brennan Watt	7573000d87	Revert "Rust v1.63.0 (#27148 )" (#27245 ) This reverts commit `a2e7bdf50a`.	2022-08-19 09:19:44 +01:00
behzad nouri	6928b2a5af	adds hash domain to ping-pong protocol (#27193 ) In order to maintain backward compatibility, for now the responding node will hash the token both with and without domain so that the other node will accept the response regardless of its upgrade status. Once the cluster has upgraded to the new code, we will remove the legacy domain = false case.	2022-08-18 22:39:31 +00:00
Brennan Watt	a2e7bdf50a	Rust v1.63.0 (#27148 ) * Upgrade to Rust v1.63.0 * Add nightly_clippy_allows * Resolve some new clippy nightly lints * Increase QUIC packets completion timeout Co-authored-by: Michael Vines <mvines@gmail.com>	2022-08-17 15:48:33 -07:00
behzad nouri	fea66c8b63	derives Error trait for ClusterInfoError and core::result::Error (#27208 )	2022-08-17 22:01:51 +00:00
Brennan Watt	521c550ccd	Sleep between vote refreshes (#27115 ) * Sleep between vote refreshes in unit test	2022-08-12 13:45:00 -07:00
Tyera Eulberg	45c0da8597	Fix quic client on TestValidator, alternative (#27046 ) Add new method to enable custom offset	2022-08-10 15:27:12 +00:00
Jeff Biseda	857be1e237	sign repair requests (#26833 )	2022-07-31 15:48:51 -07:00
behzad nouri	128226c6cc	patches flaky test_push_votes_with_tower (#26554 ) cargo test --package solana-gossip --release test_push_votes_with_tower occasionally fails because with --release all votes are generated at the same wallclock (milliseconds resolution) and so the new ones will not necessarily override existing entries in the table. The commit ensures that the new vote is pushed with a wallclock later than existing entries.	2022-07-11 12:56:31 +00:00
behzad nouri	ba785cf8ab	removes erroneous uses of std::mem::swap (#26536 ) All instances should be replace by std::mem::{replace,take}, or just plain assignment.	2022-07-11 11:33:15 +00:00
behzad nouri	df616a0dda	removes redundant clone in gossip PruneData::signable_data (#26510 ) PruneData::signable_data redundantly clones inner fields, while only references suffice: https://github.com/solana-labs/solana/blob/d1370f2c7/gossip/src/cluster_info.rs#L219-L233	2022-07-10 13:13:07 +00:00
Greg Cusack	032bee13ab	Add Gossip Loop metrics (#26195 ) * add three gossip metrics measuring gossip loop times * add 5 metrics * rm space * rm space * Update SECURITY.md - fix nav link - add bounty split policy for duplicate reports * Add transaction index in slot to geyser plugin TransactionInfo (#25688) * Define shuffle to prep using same shuffle for multiple slices * Determine transaction indexes and plumb to execute_batch * Pair transaction_index with transaction in TransactionStatusService * Add new ReplicaTransactionInfoVersion * Plumb transaction_indexes through BankingStage * Prepare BankingStage to receive transaction indexes from PohRecorder * Determine transaction indexes in PohRecorder; add field to WorkingBank * Add PohRecorder::record unit test * Only pass starting_transaction_index around PohRecorder * Add helper structs to simplify test DashMap * Pass entry and starting-index into process_entries_with_callback together * Add tx-index checks to test_rebatch_transactions * Revert shuffle definition and use zip/unzip * Only zip/unzip if randomize * Add confirm_slot_entries test * Review nits * Add type alias to make sender docs more clear * Update SECURITY.md finish filling out the table.... * rpc: fix possible deadlock in rpc (#26051) * Add StatusCache::root_slot_deltas() and use it (#26170) * Remove InMemAccountsIndex::map() and use map_internal directly (#26189) * [quic]Decrement total_streams correctly (#26158) * remove comment * alphabetical metrics. no abbreviations * remove trailing white space * cargo fmt to update code format/readability Co-authored-by: Trent Nelson <trent@solana.com> Co-authored-by: Tyera Eulberg <tyera@solana.com> Co-authored-by: Boqin Qin(秦伯钦) <Bobbqqin@gmail.com> Co-authored-by: Brooks Prumo <brooks@solana.com> Co-authored-by: Miles Obare <bdhobare@gmail.com>	2022-06-29 11:55:41 -06:00
behzad nouri	f534b8981b	maps number of data shreds to erasure batch size (#25917 ) In prepration of https://github.com/solana-labs/solana/pull/25807 which reworks erasure batch sizes, this commit: * adds a helper function mapping the number of data shreds to the erasure batch size. * adds ProcessShredsStats to Shredder::entries_to_shreds in order to replace and remove entries_to_data_shreds from the public interface.	2022-06-23 13:27:54 +00:00
Jon Cinque	79a8ecd0ac	client: Remove static connection cache, plumb it instead (#25667 ) * client: Remove static connection cache, plumb it instead * Add TpuClient::new_with_connection_cache to not break downstream * Refactor get_connection and RwLock into ConnectionCache * Fix merge conflicts from new async TpuClient * Remove `ConnectionCache::set_use_quic` * Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()	2022-06-08 13:57:12 +02:00
HaoranYi	4223f82922	Fix format alignment for cluster info trace (#25741 ) * double shrinking * remove pre/post shrink time * fix cluster info trace alignemnt * add test * format * typo * add checks in cluster info trace test * update cargo lock * clippy * clippy * move regex deps to dev deps * cargo lock	2022-06-06 09:51:00 -05:00
Pankaj Garg	1c2ae470c5	Fix forwarding of transactions over QUIC (#25674 ) * Spawn QUIC server to receive forwarded txs * Update validator port range * forward votes using UDP * no forwarding from unstaked nodes * forwarding stats in banking stage * fix test builds * fix lifetime of forward sender	2022-06-02 11:14:58 -07:00
behzad nouri	69cbbaf483	patches flaky gossip pull from entrypoint test (#25589 ) test_pull_from_entrypoint_if_not_present relies on a deterministic ordering for the entries when generating gossip pull requests. https://github.com/solana-labs/solana/pull/25460 changed an intermediate type for gossip pull-requests from Vec to HashMap, and so the entries are no longer deterministically ordered. This causes the test to be flaky. The commit updates the test so that it no longer relies on the ordering.	2022-05-26 18:58:06 +00:00
behzad nouri	1925b4f5cb	fans out gossip pull-requests to many randomly selected peers (#25460 ) Each time a node generates gossip pull-requests, it sends out all the requests to a single randomly selected peer: https://github.com/solana-labs/solana/blob/fd7ad31ee/gossip/src/crds_gossip_pull.rs#L253-L266 This causes a burst of pull-requests at a single node at once. In order to make gossip in-bound traffic less bursty, this commit fans out gossip pull-requests to several randomly selected peers. This should reduce spikes in inbound gossip traffic without changing the average load which may help reduce number of times outbound data budget is exhausted when responding to gossip pull-requests at the receiving node, and reduce number of pull-requests dropped.	2022-05-26 12:45:53 +00:00
Justin Starry	cad1c41ce2	Add Packet::deserialize_slice convenience method	2022-05-24 17:31:14 +08:00
steviez	ec7ca411dd	Make PacketBatch packets vector non-public (#25413 ) Upcoming changes to PacketBatch to support variable sized packets will modify the internals of PacketBatch. So, this change removes usage of the internal packet struct and instead uses accessors (which are currently just wrappers of Vector functions but will change down the road).	2022-05-23 15:30:15 -05:00
behzad nouri	c248fb3f51	renames Packet Meta::{,set_}addr methods to {,set_}socket_addr (#25478 ) In order to distinguish between Meta.addr field which is an IpAddr and the methods which refer to a SocketAddr.	2022-05-23 15:48:59 +00:00
Michael Vines	3608801a54	Avoid clippy::significant_drop_in_scrutinee	2022-05-22 22:22:21 -07:00
Michael Vines	b05c7d91ed	Fix derive_partial_eq_without_eq clippy lint	2022-05-22 22:22:21 -07:00
Michael Vines	c54e06355f	voteSubscribe pubsub notification now includes the vote transaction signature (#25291 )	2022-05-19 18:28:46 -07:00
sakridge	3d96a1ab76	Block packets in vote-only mode (#24906 )	2022-05-14 17:53:37 +02:00
Justin Starry	7100f1c94b	Collect stats in streamer receiver and report fetch stage metrics (#25010 )	2022-05-06 02:56:18 +08:00
steviez	b48fd4eec2	Construct PacketBatches from PongMessages directly (#24708 ) Serialize pongs directly into PacketBatch to save copying the data from intermediate packets into PacketBatch.	2022-04-26 21:30:00 -05:00
behzad nouri	12ae8d3be5	returns Error when Shred::sanitize fails (#24653 ) Including the error in the output allows to debug when Shred::sanitize fails.	2022-04-25 23:19:37 +00:00
behzad nouri	895f76a93c	hides implementation details of shred from its public interface (#24563 ) Working towards embedding versioning into shreds binary, so that a new variant of shred struct can include merkle tree hashes of the erasure set.	2022-04-25 12:43:22 +00:00
behzad nouri	1d50832389	replaces counters with datapoints in gossip metrics (#24451 )	2022-04-18 23:14:59 +00:00
HaoranYi	6d1b6bdd7c	typo (#24400 )	2022-04-18 08:52:56 -05:00
Tyera Eulberg	afeb1d3cca	Bump lru crate (#24150 )	2022-04-06 16:18:42 -06:00
behzad nouri	cd09390367	reduces gossip crds stats (#24132 )	2022-04-06 15:35:25 +00:00
BG Zhu	22224127e0	Refactor thin_client::create_client (#24067 ) Refactor the thin_client::create_client to take addresses separately instead of as a tuple Co-authored-by: Bijie Zhu <bijiezhu@Bijies-MBP.cable.rcn.com>	2022-04-06 11:03:38 -04:00
ryleung-solana	a38bd4acc8	Use LRU in connection-cache (#24109 ) Switch to using LRU for connection-cache	2022-04-06 10:58:32 -04:00
behzad nouri	db23295e1c	removes legacy weighted_shuffle and weighted_best methods (#24125 ) Older weighted_shuffle is based on a heuristic which results in biased samples as shown in: https://github.com/solana-labs/solana/pull/18343 and can be replaced with WeightedShuffle. Also, as described in: https://github.com/solana-labs/solana/pull/13919 weighted_best can be replaced with rand::distributions::WeightedIndex, or WeightdShuffle::first.	2022-04-05 19:19:22 +00:00
behzad nouri	2b718d00b0	removes legacy compatibility turbine peers shuffle code	2022-04-05 12:04:12 +00:00
behzad nouri	7cb3b6cbe2	demotes WeightedShuffle failures to error metrics (#24079 ) Since call-sites are calling unwrap anyways, panicking seems too punitive for our use cases.	2022-04-03 16:20:06 +00:00
ryleung-solana	8b72200afb	Thin client quic (#23973 ) Change thin-client to use connection-cache	2022-03-31 15:47:00 -04:00
Michael Vines	7ef18f220a	Update Version CrdsData on node identity changes	2022-03-28 15:57:16 -07:00
ryleung-solana	17b00ad3a4	Add quic-client module (#23166 ) * Add quic-client module to send transactions via quic, abstracted behind the TpuConnection trait (along with a legacy UDP implementation of TpuConnection) and change thin-client to use TpuConnection	2022-03-09 21:33:05 -05:00
sakridge	a4f4ac5279	add plumbing to allow for arbitrary tpu address in gossip (#22703 ) * add plumbing to allow for arbitrary tpu address in gossip * make clippy happy * Review comments Co-authored-by: CherryWorm <nico.gruendel@web.de>	2022-03-02 09:42:14 +01:00
behzad nouri	1282277126	bumps up crds-shards-bits (#23220 ) The commit adjust CRDS_SHARDS_BITS up to be in-line with mask_bits in gossip pull request. This will avoid redundant filtering of irrelevant crds entries when responding to pull requests.	2022-03-01 15:14:11 +00:00
sakridge	514aab46d9	Search for consecutive ports (#22979 )	2022-02-07 17:53:40 +01:00
sakridge	5a230f418d	Add quic port for accepting transactions (#22753 ) using quinn library streamer: Sign TLS cert with validator identity key Handle multiple incoming chunks	2022-02-04 15:27:09 +01:00
behzad nouri	e3b137066d	caches WeightedShuffle struct in ClusterNodes (#22877 ) Instead of reconstructing WeightedShuffle struct for each shred broadcast or retransmit, we can use the same struct with minimal mutations.	2022-02-02 15:12:26 +00:00
behzad nouri	45e09664b8	removes Rng field from WeightedShuffle struct (#22850 )	2022-02-01 15:27:23 +00:00
behzad nouri	604ca9316c	includes zero weighted entries in WeightedShuffle (#22829 ) Current WeightedShuffle implementation excludes zero weighted entries from the shuffle: https://github.com/solana-labs/solana/blob/13e631dcf/gossip/src/weighted_shuffle.rs#L29-L30 Though mathematically this might make more sense, for our use-cases (turbine specifically), this results in less efficient code: https://github.com/solana-labs/solana/blob/13e631dcf/core/src/cluster_nodes.rs#L409-L430 This commit changes the implementation so that zero weighted indices are also included in the shuffle but appear only at the end after non-zero weighted indices.	2022-01-31 16:23:50 +00:00
Michael Vines	6d5bbca630	Pacify clippy	2022-01-21 19:12:57 -08:00
Justin Starry	7f20c6149e	Refactor: move simple vote parsing to runtime (#22537 )	2022-01-20 10:39:21 +08:00
anatoly yakovenko	d343713f61	Optimize packet dedup (#22571 ) * Use bloom filter to dedup packets * dedup first * Update bloom/src/bloom.rs Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com> * Update core/src/sigverify_stage.rs Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com> * Update core/src/sigverify_stage.rs Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com> * Update core/src/sigverify_stage.rs Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com> * fixup * fixup * fixup Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>	2022-01-19 13:58:20 -08:00
Jeff Biseda	8b66625c95	convert std::sync::mpsc to crossbeam_channel (#22264 )	2022-01-11 02:44:46 -08:00
behzad nouri	49da347d84	limits gossip vote stats to the top most voted slots (#22416 )	2022-01-10 21:23:41 +00:00
behzad nouri	c9c78622a8	discards serialized gossip crds votes if cannot parse tx (#22129 )	2021-12-29 19:31:26 +00:00
behzad nouri	65d59f4ef0	tracks erasure coding shreds' indices explicitly (#21822 ) The indices for erasure coding shreds are tied to data shreds: https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921 However with the upcoming changes to erasure schema, there will be more erasure coding shreds than data shreds and we can no longer infer coding shreds indices from data shreds. The commit adds constructs to track coding shreds indices explicitly.	2021-12-19 22:37:55 +00:00
carllin	7f6fb6937a	Ensure AncestorHashesSerice selects an open port (#21919 )	2021-12-18 00:44:01 -05:00
Jeff Biseda	97a1fa10a6	streamer send destination metrics for repair, gossip (#21564 )	2021-12-17 15:21:05 -08:00
behzad nouri	89d66c3210	removes next_shred_index from return value of entries to shreds api (#21961 ) next-shred-index is already readily available from returned data shreds. The commit simplifies the api for upcoming changes to erasure coding schema which will require explicit tracking of indices for coding shreds as well as data shreds.	2021-12-17 15:01:55 +00:00
Justin Starry	254ef3e7b6	Rename Packets to PacketBatch (#21794 )	2021-12-11 09:44:15 -05:00
Ashwin Sekar	f0acf7681e	Add vote instructions that directly update on chain vote state (#21531 ) * Add vote state instructions UpdateVoteState and UpdateVoteStateSwitch * cargo tree * extract vote state version conversion to common fn	2021-12-07 16:47:26 -08:00
Michael Vines	b8837c04ec	Reformat imports to a consistent style for imports rustfmt.toml configuration: imports_granularity = "One" group_imports = "One"	2021-12-03 09:19:13 -08:00
behzad nouri	9886366977	exempts AccountsHashes from stake check (#21565 ) Otherwise getHealth fails if account hashes are not propagated.	2021-12-02 18:01:32 +00:00
behzad nouri	57057f8d39	uses enum for shred type Current code is using u8 which does not have any type-safety and can contain invalid values: https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L167 Checks for invalid shred-types are scattered through the code: https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/blockstore.rs#L849-L851 https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L346-L348 The commit uses enum for shred type with #[repr(u8)]. Backward compatibility is maintained by implementing Serialize and Deserialize compatible with u8, and adding a test to assert that.	2021-11-19 14:16:39 +00:00
carllin	b30c94ce55	ClusterInfoVoteListener send only missing votes to BankingStage (#20873 )	2021-11-18 15:20:41 -08:00
behzad nouri	3fc858eb60	adds methods to obtain data/coding shreds indices from ErasureMeta	2021-11-13 17:08:05 +00:00
Michael Keleti	b0ca335463	Rename "trusted" to "known" in `validators/` (#21197 ) * Replaced trusted with known validator * Format Convention	2021-11-12 11:57:55 -07:00
behzad nouri	eea3fb327f	seeds rng for test_build_crds_filter test (#21031 )	2021-10-28 18:29:32 +00:00
behzad nouri	43168e6365	doubles crds unique pubkey capacity (#20947 )	2021-10-26 13:06:55 +00:00
behzad nouri	1297a13586	adds metrics tracking crds writes and votes (#20953 )	2021-10-26 13:02:30 +00:00
Brooks Prumo	14af1957d6	Make pub IncrementalSnapshotHashes fields (#20727 )	2021-10-18 13:38:43 -05:00
behzad nouri	0c0384ec32	revises turbine peers shuffling order (#20480 ) Turbine randomly shuffles cluster nodes on a broadcast tree for each shred. This requires knowing the stakes and nodes' contact-infos (from gossip). However gossip is subject to partitioning and propogation delays. Additionally unstaked nodes may join and leave the cluster at any moment, changing the cluster view from one node to another. This commit: * Always arranges the unstaked nodes at the bottom of turbine broadcast tree. * Staked nodes are always included regardless of if their contact-info is available in gossip or not. * Uses the unbiased WeightedShuffle construct for shuffling nodes.	2021-10-14 15:09:36 +00:00
Brooks Prumo	1fcfbfccbb	Add fn to push IncrementalSnapshotHashes to cluster via gossip (#20395 )	2021-10-08 08:20:35 -05:00
Brooks Prumo	57592e463e	Add get_incremental_snapshot_hash_for_node() to gossip (#20394 )	2021-10-07 19:47:14 -05:00
behzad nouri	0da661de62	adds metrics for number of nodes vs number of pubkeys (#20512 )	2021-10-07 18:56:05 +00:00
Tao Zhu	177a375479	Tpu vote 1.7 (#20187 ) (#20494 ) * Add separate vote processing tpu port * Add feature to send to tpu vote port * Add vote rejecting sigverify mode * use packet.meta.is_simple_vote_tx in place of deserialization * consolidate code that identifies vote tx atcommon path for cpu and gpu * new key for feature set * banking forward tpu vote * add tpu vote port to dockerfile and other review changes * Simplify thread id compare * fix a test; updated cluster_info ABI change Co-authored-by: Tao Zhu <tao@solana.com> Co-authored-by: sakridge <sakridge@gmail.com>	2021-10-07 09:38:23 +00:00
Brooks Prumo	4e3818e5c1	Add CrdsData::IncrementalSnapshotHashes (#20374 )	2021-10-05 09:57:46 -05:00
Brooks Prumo	5d141fe01d	Rename CRDS SnapshotHash to SnapshotHashes (#20421 )	2021-10-04 19:03:28 -05:00
carllin	ee8621a8bd	Add metric measuring number of successfully inserted push messages (#20275 ) * Add number of successfully inserted push messages	2021-09-28 21:41:17 -07:00
behzad nouri	43ed727ba7	reverts #17542 (#20259 ) https://github.com/solana-labs/solana/pull/17542 excludes caller's crds values from pull responses. Reverting that commit so that when a (staked) node restarts, it can obtain its crds values before restart from other nodes.	2021-09-27 22:03:26 +00:00
sakridge	013e1d9d49	Limit transaction forwarding from banking_stage (#19940 )	2021-09-21 08:49:41 -07:00
sakridge	44c8b1bca2	Remove clippy (#19793 )	2021-09-13 20:08:28 -07:00
behzad nouri	d7051b0d21	adds logs when push-vote panics with invalid vote-index (#19485 ) In order to debug this panic on the clusters: panicked at 'assertion failed: (vote_index as usize) < MAX_LOCKOUT_HISTORY', core/src/cluster_info.rs:1012:9	2021-08-31 12:15:07 +00:00
behzad nouri	6909a79b6f	removes require-stake-for-gossip feature (#19476 ) The feature is already activated on all clusters.	2021-08-27 21:17:15 +00:00
behzad nouri	3efccbffab	sends shreds (instead of packets) to retransmit stage Working towards channelling through shreds recovered from erasure codes to retransmit stage.	2021-08-17 13:44:10 +00:00
behzad nouri	140abec6ef	exempts node-instances from shred-version check (#19190 ) Clusters are kept separate using the shred-versions obtained from contact-infos. However, this mechanism breaks if there are 2 instances of the same identity key running on different clusters, because then one of the two contact-infos have the right shred-version. If a node has the contact-info with the matching shred-version, then it will pass all associated crds values even if they belong to the other instance. So the shred-version check breaks. As a result we cannot support 2 instances of the same identity key running on different clusters. To prevent that, this commit is exempting node-instances from shred-version check so that they are always propagated across clusters and halt one of the running duplicate instances.	2021-08-14 00:47:44 +00:00
behzad nouri	7a789e0763	filters for recent contact-infos when checking for live stake (#19204 ) Contact-infos are saved to disk: https://github.com/solana-labs/solana/blob/9dfeee299/gossip/src/cluster_info.rs#L1678-L1683 and restored on validator start-up: https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L450 Staked nodes entries will not expire until an epoch after. So when the validator checks for online stake it is erroneously picking up contact-infos restored from disk, which breaks the entire wait-for-supermajority logic: https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L1515-L1561 This commit adds an extra check for the age of contact-info entries and filters out old ones.	2021-08-13 12:12:40 +00:00
behzad nouri	f302774cf7	implements copy-on-write for staked-nodes (#19090 ) Bank::staked_nodes and Bank::epoch_staked_nodes redundantly clone staked-nodes HashMap even though an immutable reference will suffice: https://github.com/solana-labs/solana/blob/a9014cece/runtime/src/vote_account.rs#L77 This commit implements copy-on-write semantics for staked-nodes by wrapping the underlying HashMap in Arc<...>.	2021-08-10 12:59:12 +00:00
Justin Starry	8817f59b6e	Version transaction message and add new message format (#18725 ) * Version transaction message and add new message format * Update abi digest due to message path change * Update v0.rs Fix comment * Update original.rs * Update message versions name and address map indexes field name * s/original/legacy * update comment * cargo fmt * Update abi digest due to legacy rename	2021-08-09 22:03:39 -07:00
behzad nouri	049fb0417f	allows sendmmsg api taking owned values (as well as references) (#18999 ) Current signature of api in sendmmsg requires a slice of inner references: https://github.com/solana-labs/solana/blob/fe1ee4980/streamer/src/sendmmsg.rs#L130-L152 That forces the call-site to convert owned values to references even though doing so is redundant and adds an extra level of indirection: https://github.com/solana-labs/solana/blob/fe1ee4980/core/src/repair_service.rs#L291 This commit expands the api using AsRef and Borrow traits to allow calling the method with owned values (as well as references like before).	2021-07-30 20:58:49 +00:00
behzad nouri	81026f9ea5	passes through --allow-private-addr to validators in system perf tests (#18876 )	2021-07-29 19:04:45 +00:00
behzad nouri	f1198fc6d5	filters crds values in parallel when responding to gossip pull-requests (#18877 ) When responding to gossip pull-requests, filter_crds_values takes a lot of time while holding onto read-lock: https://github.com/solana-labs/solana/blob/f51d64868/gossip/src/crds_gossip_pull.rs#L509-L566 This commit will filter-crds-values in parallel using rayon thread-pools.	2021-07-26 17:13:11 +00:00
behzad nouri	d2d5f36a3c	adds validator flag to allow private ip addresses (#18850 )	2021-07-23 15:25:03 +00:00
carllin	588c0464b8	Add sampling logic and DuplicateSlotRepairStatus module (#18721 )	2021-07-21 11:15:08 -07:00
behzad nouri	bbd22f06f4	implements generic lookups into gossip crds table (#18765 ) This commit adds CrdsEntry trait which allows generic lookups into crds table. For example to get ContactInfo or LowestSlot associated with a Pubkey, the lookup code would be respectively: crds.get::<&ContactInfo>(pubkey) crds.get::<&LowestSlot>(pubkey)	2021-07-21 12:16:26 +00:00
Justin Starry	207c90bd8b	Shorten long SerializeWith type paths in abi digest (#18734 )	2021-07-20 08:59:50 -05:00
behzad nouri	8da261cf5c	locks crds only once in ClusterInfo::repair_peers (#18752 ) ClusterInfo::repair_peers locks crds table twice, and shows performance regression if the RwLock is not reader-preferred: https://github.com/solana-labs/solana/blob/269028360/gossip/src/cluster_info.rs#L1188-L1210	2021-07-18 16:55:58 +00:00
behzad nouri	e316586516	excludes private ip addresses	2021-07-16 20:05:48 -06:00
Jeff Biseda	ae5ad5cf9b	sendmmsg cleanup #18589 Rationalize usage of sendmmsg(2). Skip packets which failed to send and track failures.	2021-07-16 14:36:49 -07:00
Brian Anderson	37ee0b5599	Eliminate doc warnings and fix some markdown (#18566 ) * Fix link target in doc comment * Fix formatting of log examples in process_instruction * Fix doc markdown in solana-gossip * Fix doc markdown in solana-runtime * Escape square braces in doc comments to avoid warnings * Surround 'account references' doc items in code spans to avoid warnings * Fix code block in loader_upgradeable_instruction * Fix doctest for loader_upgradable_instruction	2021-07-16 00:40:07 +00:00
behzad nouri	cf31afdd6a	makes CrdsGossip thread-safe (#18615 )	2021-07-14 22:27:17 +00:00
sakridge	7f2254225e	Move entry/poh to own crate to speed up poh bench build (#18225 )	2021-07-14 14:16:29 +02:00
behzad nouri	c90af3cd63	removes id from push_lowest_slot args (#18645 ) push_lowest_slot cannot sign the new crds-value unless the id (pubkey) argument passed-in is the same pubkey as in ClusterInfo::keypair(), in which case the id argument is redundant: https://github.com/solana-labs/solana/blob/bb41cf346/gossip/src/cluster_info.rs#L824-L845 Additionally, the lookup is done with self.id(), but insert is done with the id argument, which is logically a bug.	2021-07-13 22:32:59 +00:00
behzad nouri	90f8cf0920	makes CrdsGossipPush thread-safe (#18581 )	2021-07-13 14:04:25 +00:00
behzad nouri	e7a1f2c9b0	makes CrdsGossipPull thread-safe (#18578 )	2021-07-11 15:32:10 +00:00
carllin	175083c4c1	Add updated duplicate broadcast test (#18506 )	2021-07-10 22:22:07 -07:00
behzad nouri	918b5c28b2	removes redundant (mutable) self receivers (#18574 )	2021-07-10 22:16:33 +00:00
behzad nouri	fd9c10c2e2	adds a generic implementation of Gossip{Read,Write}Lock (#18559 )	2021-07-10 14:13:52 +00:00
behzad nouri	4e1333fbe6	removes id and shred_version from CrdsGossip (#18505 ) ClusterInfo is the gateway to CrdsGossip function calls, and it already has node's pubkey and shred version (full ContactInfo and Keypair in fact). Duplicating these data in CrdsGossip adds redundancy and possibility for bugs should they not be consistent with ClusterInfo.	2021-07-09 13:10:08 +00:00
behzad nouri	27cc7577a1	skips process_push_message for local messages (#18493 ) received_cache is not relevant for local messages, and does not need to be updated: https://github.com/solana-labs/solana/blob/92c5cdab6/gossip/src/crds_gossip_push.rs#L166-L189	2021-07-09 01:42:13 +00:00
Michael Vines	1e0942e900	Rename ClusterInfo::send_vote to ClusterInfo::send_transaction	2021-07-07 15:51:14 -07:00
Justin Starry	92c5cdab62	Fix cargo check (#18499 )	2021-07-07 14:21:08 -05:00
behzad nouri	dba42c57b4	implements an unbiased weighted shuffle using binary indexed tree (#18343 ) Current implementation of weighted_shuffle: https://github.com/solana-labs/solana/blob/b08f8bd1b/gossip/src/weighted_shuffle.rs#L11-L37 uses a heuristic which results in biased samples. For example, if the weights are [1, 10, 100], then the 3rd index should come first 100 times more often than the 1st index. However, weighted_shuffle is picking the 3rd index 200+ times more often than the 1st index, showing a disproportional bias in favor of higher weights. This commit implements weighted shuffle using binary indexed tree to maintain cumulative sum of weights while sampling. The resulting samples are demonstrably unbiased and precisely proportional to the weights. Additionally the iterator interface allows to skip computations when not all indices are processed. Of the use cases of weighted_shuffle, changing turbine code requires feature-gating to keep the cluster in sync. That is not updated in this commit, but can be done together with future updates to turbine.	2021-07-07 14:14:43 +00:00
behzad nouri	04787be8b1	encapsulates turbine peers computations of broadcast & retransmit stages (#18238 ) Broadcast stage and retransmit stage should arrange nodes on turbine broadcast tree in exactly same order. Additionally any changes to this ordering (e.g. updating how unstaked nodes are handled) requires feature gating to keep the cluster in sync. Current implementation is scattered out over several public methods and exposes too much of implementation details (e.g. usize indices into peers vector) which makes code changes and checking for feature activations more difficult. This commit encapsulates turbine peer computations into a new struct, and only exposes two public methods, get_broadcast_peer and get_retransmit_peers, for call-sites.	2021-07-07 00:35:25 +00:00
Michael Vines	c17451ca73	Acquire instance read lock once	2021-07-01 17:50:04 -07:00
Michael Vines	db3a9ae7fb	Fully replace NodeInstance	2021-07-01 17:50:04 -07:00
Michael Vines	71efac46cb	Hoist keypair() out of some loops	2021-07-01 17:50:04 -07:00
Michael Vines	b6792a3328	Add ability to change the validator identity at runtime	2021-07-01 17:50:04 -07:00
Michael Vines	bf157506e8	Remove id ref	2021-07-01 17:50:04 -07:00
Ashwin Sekar	f4fb5de545	Consider all peers as potential candidates during pull-request in case of offline nodes (#18333 ) * Try all peers during pull-request in case of offline nodes * fix clippy err	2021-07-01 12:00:10 -07:00
behzad nouri	9d983a34a0	debug logs when crds table trim failed (#18307 ) reports of this error being possibly spammy: https://discord.com/channels/428295358100013066/689412830075551748/859441080054710293 The commit changes the log level to debug. Additionally adding a new metric to understand the frequency of this error.	2021-06-29 19:39:46 +00:00
behzad nouri	d7b8329b45	removes repeated calls to ClusterInfo::id in iterators and contact-info clone (#18174 ) Calling ClusterInfo::id repeatedly in for loops or iterators is inefficient, because it acquires a lock on ClusterInfo.my_contact_info, and clones the entire contact-info.	2021-06-23 16:30:14 +00:00
behzad nouri	69a5f0e6cd	filters crds values obtained through gossip by their shred version (#18072 ) filter_by_shred_version does not check the shred-version of the owner of the crds-value. It only checks the shred-version of the node which is relaying the value: https://github.com/solana-labs/solana/blob/5cc073420/gossip/src/cluster_info.rs#L2274-L2289 So crds-values with different shred versions can still pass through this function as long as they are relayed by a node with matching shred version; and so, a single node can bridge different shred values through-out the cluster.	2021-06-23 14:16:05 +00:00
Michael Vines	84b9de8c18	Shredder no longer holds a keypair	2021-06-21 21:29:52 -07:00
Michael Vines	553fc210f5	Remove duplicated id field	2021-06-21 21:29:52 -07:00
behzad nouri	598093b5db	adds shred-version to ip-echo-server response When starting a validator, the node initially joins gossip with shred_verison = 0, until it adopts the entrypoint's shred-version: https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417 Depending on the load on the entrypoint, this adopting entrypoint shred-version through gossip sometimes becomes very slow, and causes several problems in gossip because we have to partially support shred_version == 0 which is a source of leaking crds values from one cluster to another. e.g. see https://github.com/solana-labs/solana/pull/17899 and the other linked issues there. In order to remove shred_version == 0 from gossip, this commit adds shred-version to ip-echo-server response. Once the entrypoints are updated, on validator start-up, if --expected_shred_version is not specified we will obtain shred-version from the entrypoint using ip-echo-server.	2021-06-21 19:37:16 +00:00
Alexander Meißner	789f33e8db	chore: cargo fmt	2021-06-18 10:42:46 -07:00
Alexander Meißner	6514096a67	chore: cargo +nightly clippy --fix -Z unstable-options	2021-06-18 10:42:46 -07:00
behzad nouri	5a99fa3790	adds mapping from nodes pubkeys to their shred-version (#17940 ) Crds values of nodes with different shred versions are creeping into gossip table resulting in runtime issues as the one addressed in: https://github.com/solana-labs/solana/pull/17899 This commit works towards enforcing more checks and filtering based on shred version by adding necessary mapping and api to gossip table. Once populated, pubkey->shred-version mapping persists as long as there are any values associated with the pubkey.	2021-06-18 15:56:04 +00:00
sakridge	eeee75c5be	Don't use pinned memory when unnecessary (#17832 ) Reports of excessive GPU memory usage and errors from cudaHostRegister. There are some cases where pinning is not required.	2021-06-14 16:10:04 +02:00
behzad nouri	cca46308bc	short cuts expiration check if origin's contact-info is still valid (#17918 ) Crds::find_old_labels can skip checking values timestamps if the origin's contact info hasn't expired yet: https://github.com/solana-labs/solana/blob/985280ec0/gossip/src/crds.rs#L394-L408	2021-06-13 19:47:07 +00:00
behzad nouri	985280ec0b	excludes epoch-slots from nodes with unknown or different shred version (#17899 ) Inspecting TDS gossip table shows that crds values of nodes with different shred-versions are creeping in. Their epoch-slots are accumulated in ClusterSlots causing bogus slots very far from current root which are not purged and so cause ClusterSlots keep consuming more memory: https://github.com/solana-labs/solana/issues/17789 https://github.com/solana-labs/solana/issues/14366#issuecomment-769896036 https://github.com/solana-labs/solana/issues/14366#issuecomment-832754654 This commit updates ClusterInfo::get_epoch_slots, and discards entries from nodes with unknown or different shred-version. Follow up commits will patch gossip not to waste bandwidth and memory over crds values of nodes with different shred-version.	2021-06-13 14:08:08 +00:00
behzad nouri	cab30e2356	parallelizes gossip packets receiver with processing of requests (#17647 ) Gossip packet processing is composed of two stages: * The first is consuming packets from the socket, deserializing, sanitizing and verifying them: https://github.com/solana-labs/solana/blob/7f0349b29/gossip/src/cluster_info.rs#L2510-L2521 * The second is actually processing the requests/messages: https://github.com/solana-labs/solana/blob/7f0349b29/gossip/src/cluster_info.rs#L2585-L2605 The former does not acquire any locks and so can be parallelized with the later, allowing better pipelineing properties and smaller latency in responding to gossip requests or propagating messages.	2021-06-07 18:36:06 +00:00
behzad nouri	60b0a13444	writes epoch-slots to crds table synchronously (#17719 ) epoch-slots may be overwritten before they are written to crds table: https://github.com/solana-labs/solana/issues/17711 This commit writes new epoch-slots to crds table synchronously with push_epoch_slots. The functions is still not thread-safe as commented in the code, however currently only one threads is invoking this code.	2021-06-04 13:56:51 +00:00
behzad nouri	be957f25c9	adds fallback logic if retransmit multicast fails (#17714 ) In retransmit-stage, based on the packet.meta.seed and resulting children/neighbors, each packet is sent to a different set of peers: https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457 However, current code errors out as soon as a multicast call fails, which will skip all the remaining packets: https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can exacerbate packets loss in turbine. This commit: * keeps iterating over retransmit packets for loop even if some intermediate sends fail. * adds a fallback to UdpSocket::send_to if multicast fails. Recent discord chat: https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733	2021-06-04 12:16:37 +00:00
Tyera Eulberg	3a647c4bea	Rename ValidatorExit and move to sdk (#17728 )	2021-06-04 03:06:13 +00:00
behzad nouri	7cf6e66ddd	excludes caller's crds values from pull responses (#17542 ) If the crds entry belongs to the caller itself, then the caller will always have the more recent version of it, regardless of it being filtered out by the bloom filter or not. The exception is node-instance types which are meant to detect duplicate running instances, and those are exempted.	2021-05-28 13:19:14 +00:00
Tyera Eulberg	9a5330b7eb	Move gossip modules into solana-gossip crate (#17352 ) * Move gossip modules to solana-gossip * Update Protocol abi digest due to move * Move gossip benches and hook up CI * Remove unneeded Result entries * Single use statements	2021-05-26 09:15:46 -06:00
behzad nouri	cf1acfb021	uses Duration type for gossip discover timeout	2021-05-22 19:17:36 +00:00
Michael Vines	a911ae00ba	clippy	2021-04-18 20:55:02 -07:00
Michael Vines	24ab84936e	Break up RPC API into three categories: minimal, full and admin	2021-03-04 16:39:44 -08:00
Michael Vines	328f59ebef	--gossip-host may now be specified with --entrypoint	2020-11-13 06:20:15 +00:00
Michael Vines	a1e2357d12	`solana-gossip spy` can now be given an identity keypair (`--identity` argument)	2020-08-22 17:00:50 -07:00

1 2 3 4 5 ...

292 Commits