solana

Commit Graph

Author	SHA1	Message	Date
Michael Vines	3f4731b37f	Standardize thread names Tenets: 1. Limit thread names to 15 characters 2. Prefix all Solana-controlled threads with "sol" 3. Use Camel case. It's more character dense than Snake or Kebab case	2022-08-20 07:49:39 -07:00
behzad nouri	3b87aa9227	reverts wide fanout in broadcast when the root node is down (#26359 ) A change included in https://github.com/solana-labs/solana/pull/20480 was that when the root node in turbine broadcast tree is down, the leader will broadcast the shred to all nodes in the first layer. The intention was to mitigate the impact of dead nodes on shreds propagation, because if the root node is down, then the entire cluster will miss out on the shred. On the other hand, if x% of stake is down, this will cause 200*x% + 1 packets/shreds ratio at the broadcast stage which might contribute to line-rate saturation and packet drop. To avoid this bandwidth saturation issue, this commit reverts that logic and always broadcasts shreds from the leader only to the root node. As before we rely on erasure codes to recover shreds lost due to staked nodes being offline.	2022-08-16 19:40:06 +00:00
behzad nouri	88599fd760	skips shreds deserialization before retransmit (#26230 ) Fully deserializing shreds in window-service before sending them to retransmit stage adds latency to shreds propagation. This commit instead channels through the payload and relies on only partial deserialization of a few required fields: slot, shred-index, shred-type.	2022-06-30 12:13:00 +00:00
behzad nouri	f534b8981b	maps number of data shreds to erasure batch size (#25917 ) In preparation of https://github.com/solana-labs/solana/pull/25807 which reworks erasure batch sizes, this commit: * adds a helper function mapping the number of data shreds to the erasure batch size. * adds ProcessShredsStats to Shredder::entries_to_shreds in order to replace and remove entries_to_data_shreds from the public interface.	2022-06-23 13:27:54 +00:00
behzad nouri	fe3c1d3d49	removes erroneous uses of &Arc<...> from broadcast-stage (#25962 )	2022-06-15 13:44:24 +00:00
Michael Vines	b05c7d91ed	Fix derive_partial_eq_without_eq clippy lint	2022-05-22 22:22:21 -07:00
behzad nouri	895f76a93c	hides implementation details of shred from its public interface (#24563 ) Working towards embedding versioning into shreds binary, so that a new variant of shred struct can include merkle tree hashes of the erasure set.	2022-04-25 12:43:22 +00:00
Jeff Biseda	8b66625c95	convert std::sync::mpsc to crossbeam_channel (#22264 )	2022-01-11 02:44:46 -08:00
behzad nouri	65d59f4ef0	tracks erasure coding shreds' indices explicitly (#21822 ) The indices for erasure coding shreds are tied to data shreds: https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921 However with the upcoming changes to erasure schema, there will be more erasure coding shreds than data shreds and we can no longer infer coding shreds indices from data shreds. The commit adds constructs to track coding shreds indices explicitly.	2021-12-19 22:37:55 +00:00
carllin	385efae4b3	Remove need to send bank in retransmit request from ReplayStage (#21943 ) * Remove need to send bank in retransmitter	2021-12-16 21:11:01 -05:00
Michael Vines	b8837c04ec	Reformat imports to a consistent style for imports rustfmt.toml configuration: imports_granularity = "One" group_imports = "One"	2021-12-03 09:19:13 -08:00
behzad nouri	0c0384ec32	revises turbine peers shuffling order (#20480 ) Turbine randomly shuffles cluster nodes on a broadcast tree for each shred. This requires knowing the stakes and nodes' contact-infos (from gossip). However gossip is subject to partitioning and propagation delays. Additionally unstaked nodes may join and leave the cluster at any moment, changing the cluster view from one node to another. This commit: * Always arranges the unstaked nodes at the bottom of turbine broadcast tree. * Staked nodes are always included regardless of if their contact-info is available in gossip or not. * Uses the unbiased WeightedShuffle construct for shuffling nodes.	2021-10-14 15:09:36 +00:00
behzad nouri	6d9818b8e4	skips retransmit for shreds with unknown slot leader (#19472 ) Shreds' signatures should be verified before they reach retransmit stage, and if the leader is unknown they should fail signature check. Therefore retransmit-stage can as well expect to know who the slot leader is and otherwise just skip the shred. Blockstore checking signature of recovered shreds before sending them to retransmit stage: https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/blockstore.rs#L884-L930 Shred signature verifier: https://github.com/solana-labs/solana/blob/4305d4b7b/core/src/sigverify_shreds.rs#L41-L57 https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/sigverify_shreds.rs#L105	2021-09-01 15:44:26 +00:00
behzad nouri	1deb4add81	removes Slot from TransmitShreds (#19327 ) An earlier version of the code was funneling through stakes along with shreds to broadcast: https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127 This was changed to only slots as stakes computation was pushed further down the pipeline in: https://github.com/solana-labs/solana/pull/18971 However shreds themselves embody which slot they belong to. So pairing them with slot is redundant and adds room for bugs should they become inconsistent.	2021-08-20 13:48:33 +00:00
behzad nouri	aa32738dd5	uses cluster-nodes cache in broadcast-stage * Current caching mechanism does not update cluster-nodes when the epoch (and so epoch staked nodes) changes: https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344 * Additionally, the cache update has a concurrency bug in which the thread which does compare_and_swap may be blocked when it tries to obtain the write-lock on cache, while other threads will keep running ahead with the outdated cache (since the atomic timestamp is already updated). In the new ClusterNodesCache, entries are keyed by epoch, and so if epoch changes cluster-nodes will be recalculated. The time-to-live eviction policy is also encapsulated and rigidly enforced.	2021-08-05 21:47:33 +00:00
behzad nouri	44b11154ca	sends slots (instead of stakes) through broadcast flow Current broadcast code is computing stakes for each slot before sending them down the channel: https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228 https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349 Since the stakes are a function of epoch the slot belongs to (and so does not necessarily change from one slot to another), forwarding the slot itself would allow better caching downstream. In addition we need to invalidate the cache if the epoch changes (which the current code does not do), and that requires knowing which slot (and so epoch) the currently broadcast shreds belong to: https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344	2021-08-05 21:47:33 +00:00
Jeff Washington (jwash)	14361906ca	for all tests, bank::new -> bank::new_for_tests (#19064 )	2021-08-05 08:42:38 -05:00
behzad nouri	049fb0417f	allows sendmmsg api taking owned values (as well as references) (#18999 ) Current signature of api in sendmmsg requires a slice of inner references: https://github.com/solana-labs/solana/blob/fe1ee4980/streamer/src/sendmmsg.rs#L130-L152 That forces the call-site to convert owned values to references even though doing so is redundant and adds an extra level of indirection: https://github.com/solana-labs/solana/blob/fe1ee4980/core/src/repair_service.rs#L291 This commit expands the api using AsRef and Borrow traits to allow calling the method with owned values (as well as references like before).	2021-07-30 20:58:49 +00:00
sakridge	84e78316b1	Write helper for multithread update (#18808 )	2021-07-29 03:16:36 +02:00
behzad nouri	d2d5f36a3c	adds validator flag to allow private ip addresses (#18850 )	2021-07-23 15:25:03 +00:00
behzad nouri	e316586516	excludes private ip addresses	2021-07-16 20:05:48 -06:00
Jeff Biseda	ae5ad5cf9b	sendmmsg cleanup #18589 Rationalize usage of sendmmsg(2). Skip packets which failed to send and track failures.	2021-07-16 14:36:49 -07:00
sakridge	7f2254225e	Move entry/poh to own crate to speed up poh bench build (#18225 )	2021-07-14 14:16:29 +02:00
carllin	175083c4c1	Add updated duplicate broadcast test (#18506 )	2021-07-10 22:22:07 -07:00
jbiseda	a86ced0bac	generate deterministic seeds for shreds (#17950 ) * generate shred seed from leader pubkey * clippy * clippy * review * review 2 * fmt * review * check * review * cleanup * fmt	2021-07-07 08:21:12 -07:00
behzad nouri	04787be8b1	encapsulates turbine peers computations of broadcast & retransmit stages (#18238 ) Broadcast stage and retransmit stage should arrange nodes on turbine broadcast tree in exactly same order. Additionally any changes to this ordering (e.g. updating how unstaked nodes are handled) requires feature gating to keep the cluster in sync. Current implementation is scattered out over several public methods and exposes too much of implementation details (e.g. usize indices into peers vector) which makes code changes and checking for feature activations more difficult. This commit encapsulates turbine peer computations into a new struct, and only exposes two public methods, get_broadcast_peer and get_retransmit_peers, for call-sites.	2021-07-07 00:35:25 +00:00
Michael Vines	b6792a3328	Add ability to change the validator identity at runtime	2021-07-01 17:50:04 -07:00
Michael Vines	84b9de8c18	Shredder no longer holds a keypair	2021-06-21 21:29:52 -07:00
Michael Vines	4a12c715a3	Drop Error suffix from enum values to avoid the enum_variant_names clippy lint	2021-06-18 23:02:13 +00:00
Alexander Meißner	6514096a67	chore: cargo +nightly clippy --fix -Z unstable-options	2021-06-18 10:42:46 -07:00
Justin Starry	050bb5446d	Add local cluster tests that broadcast duplicate slots (#13995 ) * Add duplicate node local cluster test * fix clippy * remove dupe test	2021-06-09 15:01:48 -07:00
Tyera Eulberg	544b3c0d17	Create solana-poh and move remaining rpc modules to solana-rpc (#17698 ) * Create solana-poh crate * Move BigTableUploadService to solana-ledger * Add solana-rpc to workspace * Move dependencies to solana-rpc * Move remaining rpc modules to solana-rpc * Single use statement solana-poh * Single use statement solana-rpc	2021-06-04 09:23:06 -06:00
Tyera Eulberg	9a5330b7eb	Move gossip modules into solana-gossip crate (#17352 ) * Move gossip modules to solana-gossip * Update Protocol abi digest due to move * Move gossip benches and hook up CI * Remove unneeded Result entries * Single use statements	2021-05-26 09:15:46 -06:00
behzad nouri	37b8587d4e	expands number of erasure coding shreds in the last batch in slots (#16484 ) Number of parity coding shreds is always less than the number of data shreds in FEC blocks: https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719 Data shreds are batched in chunks of 32 shreds each: https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714 However the very last batch of data shreds in a slot can be small, in which case the loss rate can be exacerbated. This commit expands the number of coding shreds in the last FEC block in slots to: 64 - number of data shreds; so that FEC blocks are always 64 data and parity coding shreds each. As a consequence of this, the last FEC block has more parity coding shreds than data shreds. So for some shred indices we will have a coding shred but no data shreds. This should not cause any kind of overlapping FEC blocks as in: https://github.com/solana-labs/solana/pull/10095 since this is done only for the very last batch in a slot, and the next slot will reset the shred index.	2021-04-21 12:47:50 +00:00
behzad nouri	570fd3f810	makes turbine peer computation consistent between broadcast and retransmit (#14910 ) get_broadcast_peers is using tvu_peers: https://github.com/solana-labs/solana/blob/84e52b606/core/src/broadcast_stage.rs#L362-L370 which is potentially inconsistent with retransmit_peers: https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1332-L1345 Also, the leader does not include its own contact-info when broadcasting shreds: https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1324 but on the retransmit side, slot leader is removed only _after_ neighbors and children are computed: https://github.com/solana-labs/solana/blob/84e52b606/core/src/retransmit_stage.rs#L383-L384 So the turbine broadcast tree is different between the two stages. This commit: * Removes retransmit_peers. Broadcast and retransmit stages will use tvu_peers consistently. * Retransmit stage removes slot leader _before_ computing children and neighbors.	2021-03-24 13:34:48 +00:00
behzad nouri	4f82b897bc	buffers data shreds to make larger erasure coded sets (#15849 ) Broadcast stage batches up to 8 entries: https://github.com/solana-labs/solana/blob/79280b304/core/src/broadcast_stage/broadcast_utils.rs#L26-L29 which will be serialized into some number of shreds and chunked into FEC sets of at most 32 shreds each: https://github.com/solana-labs/solana/blob/79280b304/ledger/src/shred.rs#L576-L597 So depending on the size of entries, FEC sets can be small, which may aggravate loss rate. For example 16 FEC sets of 2:2 data/code shreds each have higher loss rate than one 32:32 set. This commit broadcasts data shreds immediately, but also buffers them until it has a batch of 32 data shreds, at which point 32 coding shreds are generated and broadcasted.	2021-03-23 14:52:38 +00:00
Michael Vines	5df36aec7d	Pacify clippy	2021-02-19 20:08:41 -08:00
behzad nouri	e1021d9f83	removes redundant epoch stakes cache in retransmit (#14781 ) Following `d6d76219b`, staked nodes computed from vote accounts are already cached in runtime::Stakes, so the caching in retransmit_stage is redundant.	2021-01-24 21:15:09 +00:00
Michael Vines	cbffab7850	Upgrade to Rust v1.49.0	2021-01-23 19:16:36 -08:00
sakridge	f8a4afc7c1	Fix flaky broadcast test (#14329 )	2020-12-29 12:35:04 -08:00
sakridge	c693ffaa08	Fix subtraction overflow in metrics (#14290 )	2020-12-27 16:26:22 -08:00
behzad nouri	d6d76219b6	caches staked nodes computed from vote-accounts (#13929 )	2020-12-17 21:22:50 +00:00
Michael Vines	7143aaa89b	Clippy	2020-12-14 08:03:29 -08:00
sakridge	b4cf968e14	Add back shredding broadcast stats (#13463 )	2020-11-09 23:04:27 -08:00
Michael Vines	d15173ad9d	Address latest nightly clippy lints, but globally disable stable_sort_primitive	2020-08-17 22:36:10 -07:00
sakridge	2cf719ac2c	Cache tvu peers for broadcast (#10373 )	2020-06-03 08:24:05 -07:00
carllin	97f2bcff69	master: Add nonce to shreds repairs, add shred data size to header (#10109 ) * Add nonce to shreds/repairs * Add data shred size to header Co-authored-by: Carl <carl@solana.com>	2020-05-19 12:38:18 -07:00
Kristofer Peterson	58ef02f02b	9951 clippy errors in the test suite (#10030 ) automerge	2020-05-15 09:35:43 -07:00
carllin	bab3502260	Push down cluster_info lock (#9594 ) * Push down cluster_info lock * Rework budget decrement Co-authored-by: Carl <carl@solana.com>	2020-04-21 12:54:45 -07:00
anatoly yakovenko	77fb4230d6	Calculate distance between u64 without overflow (#9592 ) * fix overflow * fixed num_live_peers overflow	2020-04-19 23:05:26 -07:00

123 Commits