solana-with-rpc-optimizations

Commit Graph

Author	SHA1	Message	Date
Andrew Fitzgerald	ee2f760d3d	MultiIteratorScanner - improve banking stage performance with high contention	2022-11-17 10:54:12 -06:00
Brooks Prumo	d1ba42180d	clippy for rust 1.65.0 (#28765)	2022-11-09 19:39:38 +00:00
Ashwin Sekar	f207af765e	Split out voting and banking threads in banking stage (#27931) * Split out voting and banking threads in banking stage Additionally this allows us to aggressively prune the buffer for voting threads as with the new vote state only the latest vote from each validator is necessary. * Update local cluster test to use new Vote ix * Encapsulate transaction storage filtering better * Address pr comments * Commit cargo lock change * clippy * Remove unsafe impls * pr comments * compute_sanitized_transaction -> build_sanitized_transaction * &Arc -> Arc * Move test * Refactor metrics enums * clippy	2022-10-20 21:10:48 +00:00
Tao Zhu	82e65593ee	Batch filtering invalid transactions before forwarding (#26798) - Batch filtering invalid transactions (fail to sanitize, too old or already processed) before forwarding - Combine packet filtering and forwarding to share sanitized transactions - `iter_desc` is no longer needed, remove it; - Add a method to share the logic of removing packets from buffer after they were removed from MinMaxHeap - Add test coverage for forward_packet_batches_by_accounts - rebase, resolve conflicts	2022-09-29 16:33:40 -05:00
Ashwin Sekar	84acef007c	Add bench test for voting threads (#28031)	2022-09-27 12:12:22 -07:00
behzad nouri	f49beb0cbc	caches reed-solomon encoder/decoder instance (#27510) ReedSolomon::new(...) initializes a matrix and a data-decode-matrix cache: https://github.com/rust-rse/reed-solomon-erasure/blob/273ebbced/src/core.rs#L460-L466 In order to cache this computation, this commit caches the reed-solomon encoder/decoder instance for each (data_shards, parity_shards) pair.	2022-09-25 18:09:47 +00:00
behzad nouri	9ee53e594d	patches clippy errors from new rust nightly release (#28028)	2022-09-23 20:57:27 +00:00
behzad nouri	97c9af4c6b	plumbs through flag to generate merkle variant of shreds	2022-09-23 16:45:18 +00:00
apfitzge	452866dbcf	shredder: clippy nightly fixes (#27522) clippy nightly fixes	2022-09-07 15:04:32 -05:00
Tyera Eulberg	b8b3d723da	Use new client crates (#27360) * Update ancillary cli crates * Update cli * Update command-line tools * Update rpc, etc * Update client-test * Update core, validator * Update local-cluster	2022-08-24 10:47:02 -06:00
behzad nouri	ac91cdab74	removes buffering when generating coding shreds in broadcast (#25807) Given the 32:32 erasure recovery schema, current implementation requires exactly 32 data shreds to generate coding shreds for the batch (except for the final erasure batch in each slot). As a result, when serializing ledger entries to data shreds, if the number of data shreds is not a multiple of 32, the coding shreds for the last batch cannot be generated until there are more data shreds to complete the batch to 32 data shreds. This adds latency in generating and broadcasting coding shreds. In addition, with Merkle variants for shreds, data shreds cannot be signed and broadcast until coding shreds are also generated. As a result, both code and data shreds will be delayed before broadcast if we still require exactly 32 data shreds for each batch. This commit instead always generates and broadcasts coding shreds as soon as any number of data shreds is available. When serializing entries to shreds: * if the number of resulting data shreds is less than 32, then more coding shreds will be generated so that the resulting erasure batch has the same recovery probabilities as a 32:32 batch. * if the number of data shreds is more than 32, then the data shreds are split uniformly into erasure batches with _at least_ 32 data shreds in each batch. Each erasure batch will have the same number of code and data shreds. For example: * If there are 19 data shreds, 27 coding shreds are generated. The resulting 19(data):27(code) erasure batch has the same recovery probabilities as a 32:32 batch. * If there are 107 data shreds, they are split into 3 batches of 36:36, 36:36 and 35:35 data:code shreds each. A consequence of this change is that code and data shred indices will no longer align as there will be more coding shreds than data shreds (not only in the last batch in each slot but also in the intermediate ones);	2022-08-11 12:44:27 +00:00
Nicholas Clarke	ee0a40937e	Add validator argument log_messages_bytes_limit to change log truncation limit. Add new cli argument log_messages_bytes_limit to solana-validator to control how long program logs can be before truncation	2022-07-11 10:53:18 -05:00
Tao Zhu	c1d89ad749	forward packets by prioritization in desc order (#25406) - Forward packets by prioritization in desc order - Add support of cost-tracking by transaction requested compute units - Hook up account buckets to forwarder - Add metrics for forwardable batches count - Remove redundant invalid packets filtering at end of slot since forwarder will do the same when batch forwardable packets - Add bench test for forwarding	2022-07-05 23:24:58 -05:00
behzad nouri	61f0a7d9c3	replaces Mutex<PohRecorder> with RwLock<PohRecorder> (#26370) Mutex causes superfluous lock contention when a read-only reference suffices.	2022-07-05 14:29:44 +00:00
behzad nouri	88599fd760	skips shreds deserialization before retransmit (#26230) Fully deserializing shreds in window-service before sending them to retransmit stage adds latency to shreds propagation. This commit instead channels through the payload and relies on only partial deserialization of a few required fields: slot, shred-index, shred-type.	2022-06-30 12:13:00 +00:00
behzad nouri	67936aaa74	moves Shred::seed to ShredId and adds test coverage (#26251) Following commits will skip shreds deserialization before retransmit, and so we will only have a ShredId and not a fully deserialized shred to obtain the shuffling seed from.	2022-06-27 17:58:43 +00:00
Tyera Eulberg	a6ba5a9a05	Add transaction index in slot to geyser plugin TransactionInfo (#25688) * Define shuffle to prep using same shuffle for multiple slices * Determine transaction indexes and plumb to execute_batch * Pair transaction_index with transaction in TransactionStatusService * Add new ReplicaTransactionInfoVersion * Plumb transaction_indexes through BankingStage * Prepare BankingStage to receive transaction indexes from PohRecorder * Determine transaction indexes in PohRecorder; add field to WorkingBank * Add PohRecorder::record unit test * Only pass starting_transaction_index around PohRecorder * Add helper structs to simplify test DashMap * Pass entry and starting-index into process_entries_with_callback together * Add tx-index checks to test_rebatch_transactions * Revert shuffle definition and use zip/unzip * Only zip/unzip if randomize * Add confirm_slot_entries test * Review nits * Add type alias to make sender docs more clear	2022-06-23 13:37:38 -06:00
behzad nouri	f534b8981b	maps number of data shreds to erasure batch size (#25917) In preparation for https://github.com/solana-labs/solana/pull/25807 which reworks erasure batch sizes, this commit: * adds a helper function mapping the number of data shreds to the erasure batch size. * adds ProcessShredsStats to Shredder::entries_to_shreds in order to replace and remove entries_to_data_shreds from the public interface.	2022-06-23 13:27:54 +00:00
behzad nouri	b3d1f8d1ac	tracks number of shreds sent and received at different distances from the root (#25989)	2022-06-17 21:33:23 +00:00
Jon Cinque	79a8ecd0ac	client: Remove static connection cache, plumb it instead (#25667) * client: Remove static connection cache, plumb it instead * Add TpuClient::new_with_connection_cache to not break downstream * Refactor get_connection and RwLock into ConnectionCache * Fix merge conflicts from new async TpuClient * Remove `ConnectionCache::set_use_quic` * Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()	2022-06-08 13:57:12 +02:00
behzad nouri	81231a89b9	adds support for different variants of ShredCode and ShredData The commit implements two new types: pub enum ShredCode { Legacy(legacy::ShredCode), } pub enum ShredData { Legacy(legacy::ShredData), } Following commits will extend these types by adding merkle variants: pub enum ShredCode { Legacy(legacy::ShredCode), Merkle(merkle::ShredCode), } pub enum ShredData { Legacy(legacy::ShredData), Merkle(merkle::ShredData), }	2022-06-02 18:55:50 +00:00
HaoranYi	d3ac4e941b	Bench: preshrink + sigverify (#25480) * double shrinking * add bench * rename * aggregate timing * remove pre/post shrink time * update api after merge	2022-06-02 09:19:01 -05:00
carllin	9651cdad99	Refactor Sigverify trait (#25359)	2022-05-24 16:01:41 -05:00
steviez	ec7ca411dd	Make PacketBatch packets vector non-public (#25413) Upcoming changes to PacketBatch to support variable sized packets will modify the internals of PacketBatch. So, this change removes usage of the internal packet struct and instead uses accessors (which are currently just wrappers of Vector functions but will change down the road).	2022-05-23 15:30:15 -05:00
Brooks Prumo	f8842032c6	clippy: fix "this let-binding has unit value" warnings (#25429)	2022-05-22 12:17:59 -04:00
Tao Zhu	b1b3702e6d	Prioritize transactions in banking stage by their compute unit price (#25178) * - get prioritization fee from compute_budget instruction; - update compute_budget::process_instruction function to take instruction iter to support sanitized versioned message; - updated runtime.md * update transaction fee calculation for prioritization fee rate as lamports per 10K CUs * review changes * fix test * fix a bpf test * fix bpf test * patch feedback * fix clippy * fix bpf test * feedback * rename prioritization fee rate to compute unit price * feedback Co-authored-by: Justin Starry <justin@solana.com>	2022-05-16 12:06:33 +08:00
carllin	870ac80b79	Prioritize BankingStage packets individually in min-max heap (#24187)	2022-05-04 21:50:56 -05:00
behzad nouri	eff59193db	enforces that LAST_SHRED_IN_SLOT is also DATA_COMPLETE_SHRED (#24892) A data shred cannot be LAST_SHRED_IN_SLOT if not also DATA_COMPLETE_SHRED. So LAST_SHRED_IN_SLOT should also imply DATA_COMPLETE_SHRED: https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shredder.rs#L116-L117 https://github.com/solana-labs/solana/blob/74b586ae7/core/src/broadcast_stage/standard_broadcast_run.rs#L80-L81 However current shred constructs allow specifying a shred which is LAST_SHRED_IN_SLOT but not DATA_COMPLETE_SHRED: https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L117-L118 https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L272-L273 The commit updates ShredFlags so that if a shred is not DATA_COMPLETE_SHRED it cannot be LAST_SHRED_IN_SLOT either.	2022-05-02 23:33:53 +00:00
Justin Starry	4e58b3870c	Update all BankForks methods to return owned values (#24801)	2022-04-28 18:51:00 +00:00
sakridge	5a430c15e2	Separate sigverify metrics for each verifier (#24744)	2022-04-28 01:16:17 -07:00
behzad nouri	0f60665100	replaces Shred::new_empty_coding with Shred::new_from_parity_shard (#24749) Removing implementation details of shreds and payload offsets from shredder, so that shredder does not need to mutate payload: https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L968-L977 Also, Shred::new_from_data can simply obtain a slice as opposed to Option<&[u8]>: https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L268-L278	2022-04-27 18:04:10 +00:00
behzad nouri	081c844d6e	removes Shred::new_empty_data_shred (#24714) Shred::new_empty_data_shred returns an invalid shred (i.e. shred.sanitize() returns error). The method is only used in tests and can be easily replaced with Shred::new_from_data. To keep the shred api surface small, this commit removes this method.	2022-04-26 23:13:12 +00:00
behzad nouri	895f76a93c	hides implementation details of shred from its public interface (#24563) Working towards embedding versioning into shreds binary, so that a new variant of shred struct can include merkle tree hashes of the erasure set.	2022-04-25 12:43:22 +00:00
behzad nouri	d0b850cdd9	removes turbine peers shuffle patch feature	2022-04-05 12:04:12 +00:00
Stephen Akridge	976b138e76	Add tx weighting stage	2022-03-17 19:31:28 -05:00
Tao Zhu	8590911b0a	Replace type alias with newtype for UnprocessedPacketBatches	2022-03-14 13:14:27 -05:00
Tao Zhu	35d1235ed0	- move `unprocessed_packet_batches` from `BankingStage` to its own (#23508) module - deserialize packets during receiving and buffering	2022-03-10 18:47:46 +00:00
Jeff Biseda	c69e3b73ff	bench get_retransmit_peers (#23292)	2022-03-01 19:10:29 -08:00
buffalu	70ebab2c82	Add rustfmt.toml and `cargo fmt` (#23238) * fmt * formatted Co-authored-by: Lucas B <buffalu@jito.network>	2022-02-19 13:32:29 +08:00
anatoly yakovenko	83d31c9e65	shrink batches when over 80% of the space is wasted (#23066) * shrink batches when over 80% of the space is wasted	2022-02-16 08:18:17 -08:00
carllin	2f9e30a1f7	Introduce slot-specific packet metrics (#22906)	2022-02-11 03:07:45 -05:00
Tao Zhu	e52e48076e	bench should update leader schedule cache (#22991)	2022-02-08 02:28:28 +00:00
anatoly yakovenko	d343713f61	Optimize packet dedup (#22571) * Use bloom filter to dedup packets * dedup first * Update bloom/src/bloom.rs Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com> * Update core/src/sigverify_stage.rs Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com> * Update core/src/sigverify_stage.rs Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com> * Update core/src/sigverify_stage.rs Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com> * fixup * fixup * fixup Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>	2022-01-19 13:58:20 -08:00
behzad nouri	dcf44d2523	improves sigverify discard_excess_packets performance (#22577) As shown by the added benchmark, current code does worse if there is a spam address plus a lot of unique addresses. On current master: test bench_packet_discard_many_senders ... bench: 1,997,960 ns/iter (+/- 103,715) test bench_packet_discard_mixed_senders ... bench: 14,256,116 ns/iter (+/- 534,865) test bench_packet_discard_single_sender ... bench: 1,306,809 ns/iter (+/- 61,992) With this commit: test bench_packet_discard_many_senders ... bench: 1,644,025 ns/iter (+/- 83,715) test bench_packet_discard_mixed_senders ... bench: 1,089,789 ns/iter (+/- 86,324) test bench_packet_discard_single_sender ... bench: 955,234 ns/iter (+/- 55,953)	2022-01-19 18:10:02 +00:00
sakridge	49443406fd	Use VecDeque instead of Vec in sigverify stage (#22538) avoid bad performance of remove(0) for a single sender	2022-01-17 18:37:05 +01:00
Tao Zhu	9c9f2dd5bd	port counting vote CUs to block cost (#22477)	2022-01-14 10:50:29 -06:00
Eric Warehime	8161cee70f	Remove unnecessary var in banking_stage bench (#22408)	2022-01-11 22:25:21 -06:00
Jeff Biseda	8b66625c95	convert std::sync::mpsc to crossbeam_channel (#22264)	2022-01-11 02:44:46 -08:00
behzad nouri	01a096adc8	adds bitflags to Packet.Meta Instead of a separate bool type for each flag, all the flags can be encoded in a type-safe bitflags encoded in a single u8: https://github.com/solana-labs/solana/blob/d6ec103be/sdk/src/packet.rs#L19-L31	2022-01-04 13:53:40 +00:00
behzad nouri	73a7741c49	uses std::net::IpAddr type for Packet.Meta.addr	2022-01-04 13:53:40 +00:00
Justin Starry	93c776ce19	Refactor packet deduplication and harden bench test (#22080)	2021-12-22 23:05:10 -06:00
Tao Zhu	dd80a525ef	Leader QoS service metrics (#21708) * - qos_service metrics tagged with leader thread ids to separate gossip/tpu votes and transactions; - qos_service metrics is reported with bank slot; - replaced timer-based reporting with signal via channel; removed async report test as qos_service now lives within a thread * - add tpu live packets (eg, not buffered packets) states to qos metrics reporting	2021-12-22 21:39:59 +00:00
behzad nouri	65d59f4ef0	tracks erasure coding shreds' indices explicitly (#21822) The indices for erasure coding shreds are tied to data shreds: https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921 However with the upcoming changes to erasure schema, there will be more erasure coding shreds than data shreds and we can no longer infer coding shreds indices from data shreds. The commit adds constructs to track coding shreds indices explicitly.	2021-12-19 22:37:55 +00:00
behzad nouri	89d66c3210	removes next_shred_index from return value of entries to shreds api (#21961) next-shred-index is already readily available from returned data shreds. The commit simplifies the api for upcoming changes to erasure coding schema which will require explicit tracking of indices for coding shreds as well as data shreds.	2021-12-17 15:01:55 +00:00
Justin Starry	254ef3e7b6	Rename Packets to PacketBatch (#21794)	2021-12-11 09:44:15 -05:00
Michael Vines	b8837c04ec	Reformat imports to a consistent style for imports rustfmt.toml configuration: imports_granularity = "One" group_imports = "One"	2021-12-03 09:19:13 -08:00
Tao Zhu	0ca255220e	- Encapsulate QoS Service metrics reporting within QosService, so client (#21191) code (eg banking_stage) doesn't need to worry about it. - Remove dead cost_* stats from banking_stage, clean up call path.	2021-11-18 15:35:30 -06:00
Justin Starry	66fa062f13	rename process_entries to indicate it's only for tests (#21321)	2021-11-17 20:53:40 +01:00
behzad nouri	5fb0ab9d00	removes redundant args from Shredder::try_recovery (#21226) Shredder::try_recovery is already taking a Vec<Shred> as an argument. All the other arguments are embedded in the shreds, and so are redundant.	2021-11-10 21:19:03 +00:00
Tao Zhu	c2bfce90b3	- cost_tracker is data member of a bank, it can report metrics when bank is frozen (#20802) - removed cost_tracker_stats and histogram - move stats reporting outside of bank freeze	2021-10-24 22:19:23 -05:00
Tao Zhu	7496b5784b	- make cost_tracker a member of bank, remove shared instance from TPU; (#20627) - decouple cost_model from cost_tracker; allowing one cost_model instance being shared within a validator; - update cost_model api to calculate_cost(&self...)->transaction_cost	2021-10-19 14:37:33 -05:00
sakridge	09e7782d76	Refactor code to get block signatures in get_confirmed_signatures_for_address2 (#20575) * Refactor get_confirmed_signatures_for_address2 * Move blockstore benches to ledger where they belong	2021-10-13 09:55:19 +02:00
Tao Zhu	005d6863fd	- move cost tracker into bank, so each bank has its own cost tracker; (#20527) - move related modules to runtime	2021-10-12 08:51:33 -05:00
Tao Zhu	177a375479	Tpu vote 1.7 (#20187) (#20494) * Add separate vote processing tpu port * Add feature to send to tpu vote port * Add vote rejecting sigverify mode * use packet.meta.is_simple_vote_tx in place of deserialization * consolidate code that identifies vote tx at common path for cpu and gpu * new key for feature set * banking forward tpu vote * add tpu vote port to dockerfile and other review changes * Simplify thread id compare * fix a test; updated cluster_info ABI change Co-authored-by: Tao Zhu <tao@solana.com> Co-authored-by: sakridge <sakridge@gmail.com>	2021-10-07 09:38:23 +00:00
Tao Zhu	03913f6661	add tx count and thread id to stats, each stat reports and resets when slot changes (#20451)	2021-10-06 00:09:19 -05:00
Tao Zhu	6ff508c643	add transaction cost histogram metrics (#20350)	2021-10-05 08:57:39 -05:00
sakridge	94668c95c2	Prune sigverify queue (#20331)	2021-09-30 05:41:05 +02:00
Jon Cinque	567f30aa1a	windows: Make solana-test-validator work (#20099) * windows: Make solana-test-validator work The important changes to get this going on Windows: * ledger lock needs to be done on a file instead of the directory * IPC service needs to use the Windows pipe naming scheme * always disable the JIT * file logging not possible yet because we can't redirect stderr, but this will change once env_logger fixes the pipe output target! * Integrate review feedback	2021-09-22 23:10:35 +02:00
behzad nouri	01a7ec8198	uses rayon thread-pool for retransmit-stage parallelization (#19486)	2021-09-07 15:15:01 +00:00
behzad nouri	1deb4add81	removes Slot from TransmitShreds (#19327) An earlier version of the code was funneling through stakes along with shreds to broadcast: https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127 This was changed to only slots as stakes computation was pushed further down the pipeline in: https://github.com/solana-labs/solana/pull/18971 However shreds themselves embody which slot they belong to. So pairing them with slot is redundant and adds room for bugs should they become inconsistent.	2021-08-20 13:48:33 +00:00
Justin Starry	c50b01cb60	Store versioned transactions in the ledger, disabled by default (#19139) * Add support for versioned transactions, but disable by default * merge conflicts * trent's feedback * bump Cargo.lock * Fix transaction error encoding * Rename legacy_transaction method * cargo clippy * Clean up casts, int arithmetic, and unused methods * Check for duplicates in sanitized message conversion * fix clippy * fix new test * Fix bpf conditional compilation for message module	2021-08-17 15:17:56 -07:00
behzad nouri	3efccbffab	sends shreds (instead of packets) to retransmit stage Working towards channelling through shreds recovered from erasure codes to retransmit stage.	2021-08-17 13:44:10 +00:00
behzad nouri	6e413331b5	removes erroneous uses of Arc<...> from retransmit stage	2021-08-17 13:44:10 +00:00
Michael Vines	e9722474eb	Move tower storage into its own module	2021-08-11 00:20:46 -07:00
Michael Vines	397801a2d8	Extract tower storage details from Tower struct	2021-08-06 10:04:37 -07:00
Jeff Washington (jwash)	a9014ceceb	Bank::default_for_tests() (#19084)	2021-08-05 11:53:29 -05:00
Jeff Washington (jwash)	bde9b4de94	Bank::new -> Bank::new_for_benches (#19063)	2021-08-04 17:30:43 -05:00
carllin	03353d500f	Actively manage dead slots in AncestorHashesService (#18912)	2021-08-02 14:33:28 -07:00
behzad nouri	d06dc6c8a6	shares cluster-nodes between retransmit threads (#18947) cluster_nodes and last_peer_update are not shared between retransmit threads, as each thread has its own value: https://github.com/solana-labs/solana/blob/65ccfed86/core/src/retransmit_stage.rs#L476-L477 Additionally, with shared references, this code: https://github.com/solana-labs/solana/blob/0167daa11/core/src/retransmit_stage.rs#L315-L328 has a concurrency bug where the thread which does compare_and_swap updates cluster_nodes much later, after other threads have run with outdated cluster_nodes for a while. In particular, the write-lock there may block.	2021-07-29 16:20:15 +00:00
sakridge	84e78316b1	Write helper for multithread update (#18808)	2021-07-29 03:16:36 +02:00
behzad nouri	d2d5f36a3c	adds validator flag to allow private ip addresses (#18850)	2021-07-23 15:25:03 +00:00
Justin Starry	d166b9856a	Move transaction sanitization earlier in the pipeline (#18655) * Move transaction sanitization earlier in the pipeline * Renamed HashedTransaction to SanitizedTransaction * Implement deref for sanitized transaction * bring back process_transactions test method * Use sanitized transactions for cost model calculation	2021-07-15 22:51:27 -05:00
sakridge	7f2254225e	Move entry/poh to own crate to speed up poh bench build (#18225)	2021-07-14 14:16:29 +02:00
carllin	4d3e301ee4	Introduce slot dumping to ReplayStage (#18160)	2021-07-08 19:07:32 -07:00
jbiseda	a86ced0bac	generate deterministic seeds for shreds (#17950) * generate shred seed from leader pubkey * clippy * clippy * review * review 2 * fmt * review * check * review * cleanup * fmt	2021-07-07 08:21:12 -07:00
behzad nouri	04787be8b1	encapsulates turbine peers computations of broadcast & retransmit stages (#18238) Broadcast stage and retransmit stage should arrange nodes on turbine broadcast tree in exactly same order. Additionally any changes to this ordering (e.g. updating how unstaked nodes are handled) require feature gating to keep the cluster in sync. Current implementation is scattered out over several public methods and exposes too much of implementation details (e.g. usize indices into peers vector) which makes code changes and checking for feature activations more difficult. This commit encapsulates turbine peer computations into a new struct, and only exposes two public methods, get_broadcast_peer and get_retransmit_peers, for call-sites.	2021-07-07 00:35:25 +00:00
Tao Zhu	0e039b4094	Aggregate cost_model into cost_tracker (#18374) * * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions * review fixes	2021-07-06 15:41:25 +00:00
Tao Zhu	9d6f1ebef4	investigate system performance test degradation (#17919) * Add stats and counter around cost model ops, mainly: - calculate transaction cost - check transaction can fit in a block - update block cost tracker after transactions are added to block - replay_stage to update/insert execution cost to table * Change mutex on cost_tracker to RwLock * removed cloning cost_tracker for local use, as the metrics show clone is very expensive. * acquire and hold locks for block of TXs, instead of acquire and release per transaction; * remove redundant would_fit check from cost_tracker update execution path * refactor cost checking with less frequent lock acquiring * avoid many Transaction_cost heap allocation when calculate cost, which is in the hot path - executed per transaction. * create hashmap with new_capacity to reduce runtime heap realloc. * code review changes: categorize stats, replace explicit drop calls, concisely initiate to default * address potential deadlock by acquiring locks one at a time	2021-06-28 21:34:04 -05:00
Michael Vines	84b9de8c18	Shredder no longer holds a keypair	2021-06-21 21:29:52 -07:00
Alexander Meißner	6514096a67	chore: cargo +nightly clippy --fix -Z unstable-options	2021-06-18 10:42:46 -07:00
behzad nouri	161838655c	removes port-based forwarding logic from turbine retransmit (#17716) Turbine retransmit logic is based on which socket it received the packet from (i.e. `packet.meta.forward`): https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470 This can leave the cluster vulnerable to spoofing and selective propagation of packets; see https://github.com/solana-labs/solana/issues/6672 https://github.com/solana-labs/solana/pull/7774 This commit identifies if the node is on the "critical path" based on its index in the shuffled cluster. If so, it forwards the packet to both neighbors and children; otherwise, the packet is only forwarded to the children. The metrics added in https://github.com/solana-labs/solana/pull/17351 show that cases where the index does not match the port are very rare, and therefore this change should be safe.	2021-06-15 13:19:41 +00:00
Tao Zhu	ae27fcbcda	replay stage feed back program cost (#17731) * replay stage feeds back realtime per-program execution cost to cost model; * program cost execution table is initialized into empty table, no longer populated with hardcoded numbers; * changed cost unit to microsecond, using value collected from mainnet; * add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.	2021-06-09 17:10:59 -05:00
Tyera Eulberg	544b3c0d17	Create solana-poh and move remaining rpc modules to solana-rpc (#17698) * Create solana-poh crate * Move BigTableUploadService to solana-ledger * Add solana-rpc to workspace * Move dependencies to solana-rpc * Move remaining rpc modules to solana-rpc * Single use statement solana-poh * Single use statement solana-rpc	2021-06-04 09:23:06 -06:00
Tao Zhu	b000d490ce	Cost Model to limit transactions which are not parallelizable (#16694) * * Add following to banking_stage: 1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions. 2. CostTracker which is shared between threads, tracks transaction costs for each block. * replace hard coded program ID with id() calls * Add Account Access Cost as part of TransactionCost. Account Access costs are weighted differently between read and write, signed and non-signed. * Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table. * add test for cost_tracker atomically try_add operation, serves as safety guard for future changes * check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker; * bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations	2021-06-01 09:16:17 -05:00
Tyera Eulberg	9a5330b7eb	Move gossip modules into solana-gossip crate (#17352) * Move gossip modules to solana-gossip * Update Protocol abi digest due to move * Move gossip benches and hook up CI * Remove unneeded Result entries * Single use statements	2021-05-26 09:15:46 -06:00
behzad nouri	9d112cf41f	encapsulates purged values bookkeeping into crds module (#17265) For all code paths (gossip push, pull, purge, etc) that remove or override a crds value, it is necessary to record hash of values purged from crds table, in order to exclude them from subsequent pull-requests; otherwise the next pull request will likely return outdated values, wasting bandwidth: https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491 Currently this is done all over the place in multiple modules, and this has caused bugs in the past where purged values were not recorded. This commit encapsulates this bookkeeping into crds module, so that any code path which removes or overrides a crds value also records the hash of the purged value in-place.	2021-05-24 13:47:21 +00:00
Tyera Eulberg	827355a6b1	Create solana-rpc crate and move subscriptions (#17320) * Move non_circulating_supply to runtime * Add solana-rpc crate and move max_slots * Move subscriptions to solana-rpc * Single use statements	2021-05-19 00:54:28 -06:00
behzad nouri	1ac2a8cfa5	removes delayed crds inserts when upserting gossip table (#16806) It is crucial that VersionedCrdsValue::insert_timestamp does not go backward in time: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L67-L79 Otherwise methods such as get_votes and get_epoch_slots_since will break, which will break their downstream flow, including vote-listener and optimistic confirmation: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215 https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298 For that, Crds::new_versioned is intended to be called "atomically" with Crds::insert_versioned (as the comment already says): https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L126-L129 However, currently this is violated in the code. For example, filter_pull_responses creates VersionedCrdsValues (with the current timestamp), then acquires an exclusive lock on gossip, then process_pull_responses writes those values to the crds table: https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L2375-L2392 Depending on the workload and lock contention, the insert_timestamps may well be in the past when these values finally are inserted into gossip. To avoid such scenarios, this commit: * removes Crds::new_versioned and Crds::insert_versioned. * makes VersionedCrdsValue constructor private, only invoked in Crds::insert, so that insert_timestamp is populated right before insert. This will improve insert_timestamp monotonicity as long as Crds::insert is not called with a stalled timestamp. Following commits may further improve this by calling timestamp() inside Crds::insert, and/or switching to std::time::Instant which guarantees monotonicity.	2021-04-28 11:56:13 +00:00
behzad nouri	03194145c0	removes first_coding_index from erasure recovery code (#16646) first_coding_index is the same as the set_index and so is redundant: https://github.com/solana-labs/solana/blob/37b8587d4/ledger/src/blockstore_meta.rs#L49-L60	2021-04-23 12:00:37 +00:00
behzad nouri	37b8587d4e	expands number of erasure coding shreds in the last batch in slots (#16484) Number of parity coding shreds is always less than the number of data shreds in FEC blocks: https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719 Data shreds are batched in chunks of 32 shreds each: https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714 However the very last batch of data shreds in a slot can be small, in which case the loss rate can be exacerbated. This commit expands the number of coding shreds in the last FEC block in slots to: 64 - number of data shreds; so that FEC blocks are always 64 data and parity coding shreds each. As a consequence of this, the last FEC block has more parity coding shreds than data shreds. So for some shred indices we will have a coding shred but no data shreds. This should not cause any kind of overlapping FEC blocks as in: https://github.com/solana-labs/solana/pull/10095 since this is done only for the very last batch in a slot, and the next slot will reset the shred index.	2021-04-21 12:47:50 +00:00
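The batch-splitting rule described in #25807 can be checked against its 107-shred example. The following minimal Rust sketch assumes one reading of that rule: once there are 32 or more data shreds, use the largest number of batches that still leaves at least 32 data shreds in each, and spread the shreds as evenly as possible. `split_into_erasure_batches` is a hypothetical name, not the solana-ledger API, and the sketch does not reproduce the mapping that gives batches under 32 data shreds their larger coding count (such as 19 data to 27 coding).

```rust
/// Baseline data shreds per erasure batch in the 32:32 schema.
const DATA_SHREDS_PER_ERASURE_BATCH: usize = 32;

/// Hypothetical helper (not the solana-ledger API): returns the number of data
/// shreds in each erasure batch under the rule described in #25807.
///
/// * Fewer than 32 data shreds stay in a single batch; that batch's coding-shred
///   count comes from a separate mapping that is not reproduced here.
/// * With 32 or more, shreds are split as evenly as possible into the largest
///   number of batches that keeps at least 32 data shreds in each one; each such
///   batch then gets as many coding shreds as data shreds.
fn split_into_erasure_batches(num_data_shreds: usize) -> Vec<usize> {
    if num_data_shreds == 0 {
        return Vec::new();
    }
    // Integer division rounds down, so every batch keeps >= 32 data shreds.
    let num_batches = (num_data_shreds / DATA_SHREDS_PER_ERASURE_BATCH).max(1);
    let base = num_data_shreds / num_batches;
    let remainder = num_data_shreds % num_batches;
    // The first `remainder` batches take one extra shred, so sizes differ by at most 1.
    (0..num_batches)
        .map(|i| base + usize::from(i < remainder))
        .collect()
}
```

For 107 data shreds this yields batches of 36, 36 and 35 data shreds, matching the commit message; under the k:k rule each is paired with the same number of coding shreds.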
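#24892 makes LAST_SHRED_IN_SLOT imply DATA_COMPLETE_SHRED. One way to make that invariant hold by construction is to give LAST_SHRED_IN_SLOT a bit pattern that includes the DATA_COMPLETE_SHRED bit, so setting it always sets both, and testing for it requires both. The sketch below uses plain `u8` constants rather than the `bitflags` crate; the bit values and helper names are illustrative assumptions, not necessarily those used in `ShredFlags`.

```rust
/// Illustrative flag bits (assumed values; a std-only stand-in for a bitflags type).
const DATA_COMPLETE_SHRED: u8 = 0b0100_0000;
/// Superset of DATA_COMPLETE_SHRED: its own high bit plus the data-complete bit.
const LAST_SHRED_IN_SLOT: u8 = 0b1100_0000;

/// True if every bit of `flag` is set in `flags`.
fn contains(flags: u8, flag: u8) -> bool {
    (flags & flag) == flag
}

fn is_data_complete(flags: u8) -> bool {
    contains(flags, DATA_COMPLETE_SHRED)
}

/// Requires both bits, so a shred that is not data-complete can never
/// report itself as the last shred in its slot.
fn is_last_in_slot(flags: u8) -> bool {
    contains(flags, LAST_SHRED_IN_SLOT)
}
```

With this encoding, a stray high bit on its own (`0b1000_0000`) is not enough to mark a shred as last-in-slot.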
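#23066 shrinks packet batches once more than 80% of their space is wasted. That threshold can be tested with integer arithmetic only: (total - used) / total > 0.8 is equivalent to used / total < 0.2, i.e. 5 × used < total. The sketch below pairs that predicate with a naive compaction step. Representing discarded packets as `None`, and the names `should_shrink` and `compact`, are assumptions for illustration, not the sigverify stage's code.

```rust
/// True when more than 80% of `total_slots` is wasted, i.e. fewer than 20% are
/// used: (total - used) / total > 0.8  <=>  5 * used < total.
fn should_shrink(total_slots: usize, used_slots: usize) -> bool {
    used_slots.saturating_mul(5) < total_slots
}

/// Hypothetical compaction: moves surviving packets (`Some`) into as few batches
/// of `capacity` as possible, preserving their relative order.
fn compact<T>(batches: Vec<Vec<Option<T>>>, capacity: usize) -> Vec<Vec<T>> {
    assert!(capacity > 0, "batch capacity must be non-zero");
    let mut out: Vec<Vec<T>> = Vec::new();
    // The first flatten walks every slot; the second drops the `None` slots.
    for packet in batches.into_iter().flatten().flatten() {
        if out.last().map_or(true, |batch| batch.len() == capacity) {
            out.push(Vec::with_capacity(capacity));
        }
        out.last_mut().expect("a batch was pushed above").push(packet);
    }
    out
}
```

Note that exactly 80% waste (for example 20 used out of 100) does not trigger a shrink, since the commit's wording is "over 80%".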
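#22571 switches packet dedup to a Bloom filter. As a rough illustration of the technique (not the `solana-bloom` crate's API), the sketch below sets `num_hashes` bit positions per packet, each derived by hashing a seed together with the packet bytes using the standard library's `DefaultHasher`. A packet whose bits are all already set is treated as a duplicate. That check never misses a true repeat but can give false positives, so an occasional fresh packet may be dropped; because this sketch never clears its bits, its false-positive rate also climbs as it fills. `PacketBloom` and `dedup_packets` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter for packet dedup (hypothetical; not the solana-bloom API).
struct PacketBloom {
    words: Vec<u64>,
    num_bits: u64,
    num_hashes: u64,
}

impl PacketBloom {
    fn new(num_bits: u64, num_hashes: u64) -> Self {
        assert!(num_bits > 0 && num_hashes > 0);
        let num_words = ((num_bits + 63) / 64) as usize;
        PacketBloom { words: vec![0; num_words], num_bits, num_hashes }
    }

    /// Derives one bit position by hashing `seed` together with the item.
    fn bit_position<T: Hash + ?Sized>(&self, item: &T, seed: u64) -> u64 {
        let mut hasher = DefaultHasher::new();
        seed.hash(&mut hasher);
        item.hash(&mut hasher);
        hasher.finish() % self.num_bits
    }

    /// Records `item` and returns true if it was (probably) seen before.
    fn check_and_insert<T: Hash + ?Sized>(&mut self, item: &T) -> bool {
        let mut seen = true;
        for seed in 0..self.num_hashes {
            let pos = self.bit_position(item, seed);
            let (word, mask) = ((pos / 64) as usize, 1u64 << (pos % 64));
            if (self.words[word] & mask) == 0 {
                seen = false; // at least one bit was unset: definitely new
                self.words[word] |= mask;
            }
        }
        seen
    }
}

/// Returns the indices of packets whose bytes have not (probably) been seen before.
fn dedup_packets(bloom: &mut PacketBloom, packets: &[&[u8]]) -> Vec<usize> {
    packets
        .iter()
        .enumerate()
        .filter(|(_, p)| !bloom.check_and_insert(**p))
        .map(|(i, _)| i)
        .collect()
}
```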
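#22577 and #22538 tune `discard_excess_packets` in the sigverify stage, but the log does not spell out the selection algorithm. The sketch below assumes per-sender queues drained round-robin, so that one spamming address cannot crowd out everyone else. It also shows why a `VecDeque` matters there: taking a sender's oldest packet is `pop_front`, which is O(1), whereas `Vec::remove(0)` shifts every remaining element, so a large single-sender backlog degrades to quadratic work. The function name and signature are hypothetical.

```rust
use std::collections::{HashMap, VecDeque};
use std::net::IpAddr;

/// Hypothetical sketch: returns the indices (in arrival order) of at most
/// `max_packets` packets to keep, taken round-robin across sender addresses.
fn select_round_robin(senders: &[IpAddr], max_packets: usize) -> Vec<usize> {
    // Group packet indices by sender, remembering first-seen sender order.
    let mut queues: HashMap<IpAddr, VecDeque<usize>> = HashMap::new();
    let mut order: Vec<IpAddr> = Vec::new();
    for (index, addr) in senders.iter().enumerate() {
        queues
            .entry(*addr)
            .or_insert_with(|| {
                order.push(*addr);
                VecDeque::new()
            })
            .push_back(index);
    }

    let mut kept = Vec::with_capacity(max_packets.min(senders.len()));
    // Each pass takes the oldest remaining packet from every sender that still
    // has one; senders whose queue runs dry drop out of the rotation.
    while kept.len() < max_packets && !order.is_empty() {
        order.retain(|addr| {
            if kept.len() == max_packets {
                return true; // budget exhausted; the outer loop will stop
            }
            let queue = queues.get_mut(addr).expect("every sender has a queue");
            if let Some(index) = queue.pop_front() {
                kept.push(index); // O(1) on VecDeque, unlike Vec::remove(0)
            }
            !queue.is_empty()
        });
    }
    kept.sort_unstable();
    kept
}
```

For example, with five packets from one address followed by one packet each from two others and a budget of three, the sketch keeps the first packet from each sender rather than three from the spammer.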

275 Commits