solana

Commit Graph

Author	SHA1	Message	Date
behzad nouri	f1b82ec44d	factors out common retransmit work for shreds of the same slot (#26218) Shreds arriving at a node for retransmit tend to belong to the same slot (or just a couple of different slots). Slot leader and cluster nodes are common for the shreds of the same slot, and so the common work to look up these values can be factored out. This commit first groups shreds by slot to factor out that common lookup work.	2022-06-25 15:49:05 +00:00
behzad nouri	1f0f5dc03e	verifies shred-version in fetch stage Shred versions are not verified until window-service where resources are already wasted to sig-verify and deserialize shreds. The commit verifies shred-version earlier in the pipeline in fetch stage.	2022-06-22 12:17:37 +00:00
behzad nouri	75425521b4	moves slot updates notifications after shreds retransmit (#26094) RetransmitSlotStats can already be utilized to track when the first shred for a slot was received; therefore first_shreds_received: &Mutex<BTreeSet<Slot>> is redundant. Sending update notifications after shreds retransmit will also bypass the need for a mutex.	2022-06-21 17:19:40 -04:00
behzad nouri	d2afa6b418	moves packet-hasher out of the mutex (#26091) Packet-hasher is not mutated across threads and does not need to be wrapped in a mutex.	2022-06-21 16:29:27 +00:00
behzad nouri	b3d1f8d1ac	tracks number of shreds sent and received at different distances from the root (#25989)	2022-06-17 21:33:23 +00:00
behzad nouri	eff59193db	enforces that LAST_SHRED_IN_SLOT is also DATA_COMPLETE_SHRED (#24892) A data shred cannot be LAST_SHRED_IN_SLOT if not also DATA_COMPLETE_SHRED. So LAST_SHRED_IN_SLOT should also imply DATA_COMPLETE_SHRED: https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shredder.rs#L116-L117 https://github.com/solana-labs/solana/blob/74b586ae7/core/src/broadcast_stage/standard_broadcast_run.rs#L80-L81 However, current shred constructs allow specifying a shred which is LAST_SHRED_IN_SLOT but not DATA_COMPLETE_SHRED: https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L117-L118 https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L272-L273 The commit updates ShredFlags so that if a shred is not DATA_COMPLETE_SHRED it cannot be LAST_SHRED_IN_SLOT either.	2022-05-02 23:33:53 +00:00
behzad nouri	0f60665100	replaces Shred::new_empty_coding with Shred::new_from_parity_shard (#24749) Removing implementation details of shreds and payload offsets from shredder, so that shredder does not need to mutate payload: https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L968-L977 Also, Shred::new_from_data can simply obtain a slice as opposed to Option<&[u8]>: https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L268-L278	2022-04-27 18:04:10 +00:00
behzad nouri	895f76a93c	hides implementation details of shred from its public interface (#24563) Working towards embedding versioning into shreds binary, so that a new variant of shred struct can include merkle tree hashes of the erasure set.	2022-04-25 12:43:22 +00:00
Justin Starry	c544742091	Local cluster test cleanup and refactoring (#24559) * remove FixedSchedule.start_epoch * use duration for timing * Rename partition bool to turbine_disabled * simplify partition config	2022-04-22 12:14:07 +08:00
behzad nouri	1d50832389	replaces counters with datapoints in gossip metrics (#24451)	2022-04-18 23:14:59 +00:00
behzad nouri	2282571493	removes outdated and flaky test_skip_repair from retransmit-stage (#24121) test_skip_repair in retransmit-stage is no longer relevant because following: https://github.com/solana-labs/solana/pull/19233 repair packets are filtered out earlier in window-service and so retransmit stage does not know if a shred is repaired or not. Also, following turbine peer shuffle changes: https://github.com/solana-labs/solana/pull/24080 the test has become flaky since it does not take into account how peers are shuffled for each shred.	2022-04-05 16:02:53 +00:00
HaoranYi	fedf4e984f	typo (#23910)	2022-03-24 15:21:59 -05:00
Michael Vines	390dc24608	Create leader schedule before processing blockstore	2022-03-14 15:29:58 -07:00
Michael Vines	115f376465	Factor out bank_forks_utils::load_bank_forks()	2022-03-14 15:29:58 -07:00
Michael Vines	c2ce152be8	Inline do_process_blockstore_from_root	2022-03-14 15:29:58 -07:00
Jeff Biseda	c69e3b73ff	bench get_retransmit_peers (#23292)	2022-03-01 19:10:29 -08:00
Jeff Biseda	8b66625c95	convert std::sync::mpsc to crossbeam_channel (#22264)	2022-01-11 02:44:46 -08:00
behzad nouri	01a096adc8	adds bitflags to Packet.Meta Instead of a separate bool type for each flag, all the flags can be encoded as type-safe bitflags in a single u8: https://github.com/solana-labs/solana/blob/d6ec103be/sdk/src/packet.rs#L19-L31	2022-01-04 13:53:40 +00:00
carllin	7f6fb6937a	Ensure AncestorHashesSerice selects an open port (#21919)	2021-12-18 00:44:01 -05:00
behzad nouri	4ceb2689f5	adds ShredId uniquely identifying each shred (#21820)	2021-12-14 17:34:02 +00:00
Justin Starry	254ef3e7b6	Rename Packets to PacketBatch (#21794)	2021-12-11 09:44:15 -05:00
behzad nouri	cd17f63d81	adds back position field to coding-shred-header (#21600) https://github.com/solana-labs/solana/pull/17004 removed position field from coding-shred-header because as it stands the field is redundant and unused. However, with the upcoming changes to erasure coding schema this field will no longer be redundant and needs to be populated.	2021-12-05 14:42:09 +00:00
Michael Vines	b8837c04ec	Reformat imports to a consistent style for imports rustfmt.toml configuration: imports_granularity = "One" group_imports = "One"	2021-12-03 09:19:13 -08:00
behzad nouri	57057f8d39	uses enum for shred type Current code is using u8 which does not have any type-safety and can contain invalid values: https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L167 Checks for invalid shred-types are scattered through the code: https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/blockstore.rs#L849-L851 https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L346-L348 The commit uses enum for shred type with #[repr(u8)]. Backward compatibility is maintained by implementing Serialize and Deserialize compatible with u8, and adding a test to assert that.	2021-11-19 14:16:39 +00:00
behzad nouri	5e1cf39c74	adds metrics for number of outgoing shreds in retransmit stage (#20882)	2021-10-24 13:12:27 +00:00
behzad nouri	0c0384ec32	revises turbine peers shuffling order (#20480) Turbine randomly shuffles cluster nodes on a broadcast tree for each shred. This requires knowing the stakes and nodes' contact-infos (from gossip). However, gossip is subject to partitioning and propagation delays. Additionally, unstaked nodes may join and leave the cluster at any moment, changing the cluster view from one node to another. This commit: * Always arranges the unstaked nodes at the bottom of turbine broadcast tree. * Always includes staked nodes, regardless of whether their contact-info is available in gossip. * Uses the unbiased WeightedShuffle construct for shuffling nodes.	2021-10-14 15:09:36 +00:00
Lijun Wang	fe97cb2ddf	AccountsDb plugin framework (#20047) Summary of Changes Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows: * Data stores of connection strings/credentials to be configured, * Accounts with patterns to be streamed * PostgreSQL implementation of the streaming for different destination stores to be plugged in. The code comprises 4 major parts: * accountsdb-plugin-intf: defines the plugin interface which concrete plugins should implement. * accountsdb-plugin-manager: manages the load/unload of plugins and provides interfaces through which the validator can notify plugins of accounts updates. * accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL. * The validator integrations: updates are streamed right after snapshot restore and after account updates from transaction processing or other real updates. The plugin is optionally loaded on demand by a new validator CLI argument -- there is no impact if the plugin is not loaded.	2021-09-30 14:26:17 -07:00
Brooks Prumo	a0552e5b46	Make startup aware of Incremental Snapshots (#19600)	2021-09-07 20:43:43 +00:00
behzad nouri	01a7ec8198	uses rayon thread-pool for retransmit-stage parallelization (#19486)	2021-09-07 15:15:01 +00:00
Brooks Prumo	e9374d32a3	Revert "Make startup aware of Incremental Snapshots (#19550)" (#19599) This reverts commit `d45ced0a5d`.	2021-09-02 19:14:41 -05:00
Brooks Prumo	d45ced0a5d	Make startup aware of Incremental Snapshots (#19550)	2021-09-02 19:05:15 -05:00
behzad nouri	6d9818b8e4	skips retransmit for shreds with unknown slot leader (#19472) Shreds' signatures should be verified before they reach retransmit stage, and if the leader is unknown they should fail signature check. Therefore retransmit-stage can likewise expect to know who the slot leader is and otherwise just skip the shred. Blockstore checking signature of recovered shreds before sending them to retransmit stage: https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/blockstore.rs#L884-L930 Shred signature verifier: https://github.com/solana-labs/solana/blob/4305d4b7b/core/src/sigverify_shreds.rs#L41-L57 https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/sigverify_shreds.rs#L105	2021-09-01 15:44:26 +00:00
behzad nouri	7a8807b8bb	retransmits shreds recovered from erasure codes Shreds recovered from erasure codes have not been received from turbine and have not been retransmitted to other nodes downstream. This results in more repairs across the cluster which is slower. This commit channels through recovered shreds to retransmit stage in order to further broadcast the shreds to downstream nodes in the tree.	2021-08-17 13:44:10 +00:00
behzad nouri	3efccbffab	sends shreds (instead of packets) to retransmit stage Working towards channelling through shreds recovered from erasure codes to retransmit stage.	2021-08-17 13:44:10 +00:00
behzad nouri	6e413331b5	removes erroneous uses of Arc<...> from retransmit stage	2021-08-17 13:44:10 +00:00
behzad nouri	bf437b0336	removes packet-count metrics from retransmit stage Working towards sending shreds (instead of packets) to retransmit stage so that shreds recovered from erasure codes are as well retransmitted. Following commit will add these metrics back to window-service, earlier in the pipeline.	2021-08-17 13:44:10 +00:00
behzad nouri	b64eeb7729	removes erroneous uses of &Arc<...> from window-service	2021-08-13 17:26:31 +00:00
behzad nouri	e4be00fece	falls back on working-bank if root-bank::epoch-staked-nodes is none bank.get_leader_schedule_epoch(shred_slot) is one epoch after epoch_schedule.get_epoch(shred_slot). At epoch boundaries, shred is already one epoch after the root-slot. So we need epoch-stakes 2 epochs ahead of the root. But the root bank only has epoch-stakes for one epoch ahead, and as a result looking up epoch staked-nodes from the root-bank fails. To be backward compatible with the current master code, this commit implements a fallback on working-bank if epoch staked-nodes obtained from the root-bank is none.	2021-08-05 21:47:33 +00:00
behzad nouri	50d0e830c9	unifies cluster-nodes computation & caching across turbine stages Broadcast-stage is using epoch_staked_nodes based on the same slot that shreds belong to: https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228 https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349 But retransmit-stage is using bank-epoch of the working-bank: https://github.com/solana-labs/solana/blob/19bd30262/core/src/retransmit_stage.rs#L272-L289 So the two are not consistent at epoch boundaries where some nodes may have a working bank (or similarly a root bank) lagging other nodes. As a result the node which obtains a packet may construct turbine broadcast tree inconsistently with its parent node in the tree and so some packets may fail to reach all nodes in the tree.	2021-08-05 21:47:33 +00:00
behzad nouri	30bec3921e	uses cluster-nodes cache in retransmit stage The new cluster-nodes cache will: * ensure cluster-nodes are recalculated if the epoch (and so the epoch staked nodes) changes. * encapsulate time-to-live eviction policy.	2021-08-05 21:47:33 +00:00
Ryo Onodera	da480bdb5f	Fix unstable retransmit-num_nodes (#18970)	2021-07-29 17:32:32 +00:00
behzad nouri	d06dc6c8a6	shares cluster-nodes between retransmit threads (#18947) cluster_nodes and last_peer_update are not shared between retransmit threads, as each thread has its own value: https://github.com/solana-labs/solana/blob/65ccfed86/core/src/retransmit_stage.rs#L476-L477 Additionally, with shared references, this code: https://github.com/solana-labs/solana/blob/0167daa11/core/src/retransmit_stage.rs#L315-L328 has a concurrency bug where the thread which does compare_and_swap updates cluster_nodes much later, after other threads have run with outdated cluster_nodes for a while. In particular, the write-lock there may block.	2021-07-29 16:20:15 +00:00
sakridge	84e78316b1	Write helper for multithread update (#18808)	2021-07-29 03:16:36 +02:00
carllin	c0704d4ec9	Plumb signal from replay to ancestor hashes service (#18880)	2021-07-26 20:59:00 -07:00
carllin	1ee64afb12	Introduce AncestorHashesService (#18812)	2021-07-23 16:54:47 -07:00
behzad nouri	d2d5f36a3c	adds validator flag to allow private ip addresses (#18850)	2021-07-23 15:25:03 +00:00
behzad nouri	04787be8b1	encapsulates turbine peers computations of broadcast & retransmit stages (#18238) Broadcast stage and retransmit stage should arrange nodes on turbine broadcast tree in exactly the same order. Additionally, any changes to this ordering (e.g. updating how unstaked nodes are handled) require feature gating to keep the cluster in sync. The current implementation is scattered over several public methods and exposes too many implementation details (e.g. usize indices into peers vector), which makes code changes and checking for feature activations more difficult. This commit encapsulates turbine peer computations into a new struct, and only exposes two public methods, get_broadcast_peer and get_retransmit_peers, for call-sites.	2021-07-07 00:35:25 +00:00
Jeff Washington (jwash)	ec2f930475	user process.accounts_db_test_hash_calculation for debug_verify hash (#18053)	2021-06-21 10:20:27 -05:00
Michael Vines	4a12c715a3	Drop Error suffix from enum values to avoid the enum_variant_names clippy lint	2021-06-18 23:02:13 +00:00
Michael Vines	fa04531c7a	Extricate RpcCompletedSlotsService from RetransmitStage	2021-06-16 16:20:35 -07:00
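Commit f1b82ec44d groups incoming shreds by slot so that the per-slot work (looking up the slot leader and cluster nodes) runs once per slot rather than once per shred. The Rust sketch below illustrates only that group-by pattern. `Shred`, `group_by_slot`, and `retransmit_grouped` are hypothetical simplifications, not the retransmit-stage code, and the `lookup` closure stands in for whatever per-slot work the real stage performs.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified shred carrying only the fields this sketch needs.
#[derive(Clone, Debug, PartialEq)]
pub struct Shred {
    pub slot: u64,
    pub index: u32,
}

/// Groups shreds by slot; within a slot, arrival order is preserved.
pub fn group_by_slot(shreds: Vec<Shred>) -> HashMap<u64, Vec<Shred>> {
    let mut groups: HashMap<u64, Vec<Shred>> = HashMap::new();
    for shred in shreds {
        groups.entry(shred.slot).or_default().push(shred);
    }
    groups
}

/// Runs `lookup` once per distinct slot (a stand-in for slot-leader and
/// cluster-nodes lookups), then calls `retransmit` for every shred of that
/// slot with the shared result. Returns how many lookups were performed.
pub fn retransmit_grouped<L, T, F>(shreds: Vec<Shred>, mut lookup: L, mut retransmit: F) -> usize
where
    L: FnMut(u64) -> T,
    F: FnMut(&T, &Shred),
{
    let mut lookups = 0;
    for (slot, group) in group_by_slot(shreds) {
        // Common per-slot work, factored out of the per-shred loop.
        let ctx = lookup(slot);
        lookups += 1;
        for shred in &group {
            retransmit(&ctx, shred);
        }
    }
    lookups
}
```

With four shreds spread over two slots, `lookup` runs twice while `retransmit` runs four times. A `HashMap` is used only for brevity; the order in which slots are processed does not matter for the saving, and the actual commit may use a different collection.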

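Commit eff59193db makes LAST_SHRED_IN_SLOT imply DATA_COMPLETE_SHRED. One way to get that guarantee is to give the "last in slot" flag a bit pattern that contains the "data complete" bit, so that setting the former always sets the latter. The sketch below assumes the values 0b0100_0000 and 0b1100_0000, and it uses a plain `u8` newtype instead of the bitflags type mentioned in commit 01a096adc8. Treat both the values and the hypothetical `from_bits` check as illustrative, not as Solana's actual ShredFlags.

```rust
/// "This shred completes a data block" (bit value assumed for illustration).
pub const DATA_COMPLETE_SHRED: u8 = 0b0100_0000;
/// "Last shred in slot" includes the DATA_COMPLETE_SHRED bit, so setting it
/// necessarily sets DATA_COMPLETE_SHRED as well.
pub const LAST_SHRED_IN_SLOT: u8 = 0b1100_0000;

/// Hypothetical flag byte; a std-only stand-in for a bitflags type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShredFlags(u8);

impl ShredFlags {
    pub fn empty() -> Self {
        ShredFlags(0)
    }

    /// Sets every bit of `flag`.
    pub fn insert(&mut self, flag: u8) {
        self.0 |= flag;
    }

    /// True only if every bit of `flag` is set.
    pub fn contains(self, flag: u8) -> bool {
        (self.0 & flag) == flag
    }

    /// Parses a raw byte, rejecting "last in slot" without "data complete".
    pub fn from_bits(bits: u8) -> Option<Self> {
        // The extra bit LAST_SHRED_IN_SLOT adds on top of DATA_COMPLETE_SHRED.
        let last_only = LAST_SHRED_IN_SLOT & !DATA_COMPLETE_SHRED;
        if (bits & last_only) != 0 && (bits & DATA_COMPLETE_SHRED) == 0 {
            None
        } else {
            Some(ShredFlags(bits))
        }
    }
}
```

Because `contains` checks all bits of the argument, a shred flagged only DATA_COMPLETE_SHRED does not report LAST_SHRED_IN_SLOT, while the reverse implication always holds.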

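Commit 57057f8d39 replaces a raw `u8` shred type with a `#[repr(u8)]` enum while staying compatible with the `u8` wire encoding. A std-only sketch of that idea follows. The discriminant values are assumptions, and the `From`/`TryFrom` conversions stand in for the custom `Serialize`/`Deserialize` impls the commit describes, which would require the serde crate.

```rust
use std::convert::TryFrom;

/// Hypothetical shred type; the discriminant values are assumptions.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ShredType {
    Data = 0b1010_0101,
    Code = 0b0101_1010,
}

impl From<ShredType> for u8 {
    /// Wire encoding: the enum's u8 discriminant.
    fn from(shred_type: ShredType) -> u8 {
        shred_type as u8
    }
}

impl TryFrom<u8> for ShredType {
    /// The rejected byte is returned as the error.
    type Error = u8;

    /// Accepts only known discriminants, so an invalid shred type cannot be
    /// constructed and validity checks need not be scattered across callers.
    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        if byte == ShredType::Data as u8 {
            Ok(ShredType::Data)
        } else if byte == ShredType::Code as u8 {
            Ok(ShredType::Code)
        } else {
            Err(byte)
        }
    }
}
```

Round-tripping every variant through `u8` is the same kind of backward-compatibility check the commit mentions adding as a test.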
160 Commits