solana-with-rpc-optimizations


Author	SHA1	Message	Date
behzad nouri	1a69c3486b	removes #[derive(Ord, PartialOrd)] from LegacyContactInfo (#868 ) Ord and PartialOrd traits are not necessary for LegacyContactInfo and removing them will simplify new ContactInfo migration.	2024-04-17 19:33:04 +00:00
behzad nouri	50f10284bb	allows gossip pull requests with new contact-info (#803 ) Current code is only allowing gossip pull requests with legacy contact-info: https://github.com/anza-xyz/agave/blob/8c5a33a81/gossip/src/cluster_info.rs#L1958-L1966 Working towards migrating to the new contact-info, the commit allows gossip pull requests with both legacy and new contact-infos.	2024-04-15 17:37:26 +00:00
steviez	7138ea7517	Plumb CLI arg to control number of TVU receive threads/sockets (#550 ) The parameter directly controls the number of sockets that are created; the sockets later have one thread created per socket to listen.	2024-04-15 16:56:10 +02:00
carllin	d5c291a934	Remove send snapshot hard unwrap (#326 )	2024-04-11 12:18:52 -04:00
behzad nouri	293414f482	pads last erasure batch with empty data shreds (#639 ) For duplicate block prevention, we want to verify that the last erasure batch was sufficiently propagated through turbine. This requires additional bookkeeping because, depending on the erasure coding schema, the entire batch might be recovered from only a few coding shreds. To simplify the above, this commit instead ensures that the last erasure batch has >= 32 data shreds so that the batch cannot be recovered unless 32+ shreds are received from turbine or repair.	2024-04-11 14:50:43 +00:00
Andrew Fitzgerald	1744e9efd7	BankingStage Forwarding Filter (#685 ) * add PacketFlags::FROM_STAKED_NODE * Only forward packets from staked node * fix local-cluster test forwarding * review comment * tpu_votes get marked as from_staked_node	2024-04-09 23:12:26 +00:00
steviez	64765bf817	Introduce NodeConfig for parameters to Node type (#533 ) The parameter list is already kind of long, so squash the parameters into a config struct	2024-04-02 11:59:03 -05:00
steviez	79e316eb56	Reduce the default number of IP echo server threads (#354 ) The IP echo server currently spins up a worker thread for every thread on the machine. Observing some data for nodes, - MNB validators and RPC nodes look to get several hundred of these requests per day - MNB entrypoint nodes look to get 2-3 requests per second on average. In both instances, the current threadpool is severely overprovisioned, which is a waste of resources. This PR plumbs a flag to control the number of worker threads for this pool as well as setting a default of two threads for this server. Two threads allow for one thread to always listen on the TCP port while the other thread processes requests.	2024-04-01 10:24:59 -05:00
Greg Cusack	04feed2cf5	add metric for duplicate push messages (#321 ) * add metric for duplicate push messages * add in num_total_push * address comments. don't lock stats each time * address comments. remove num_total_push * change dup push message name in code to reflect metric name	2024-03-29 12:12:12 -07:00
behzad nouri	30eecd62b1	implements weighted shuffle using N-ary tree (#259 ) This is a port of firedancer's implementation of weighted shuffle: https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c https://github.com/anza-xyz/agave/pull/185 implemented weighted shuffle using a binary tree. Though asymptotically a binary tree has better performance than a Fenwick tree, it has less cache locality, resulting in smaller improvements and, in particular, a slower WeightedShuffle::new. In order to improve cache locality and reduce the overheads of traversing the tree, this commit instead uses a generalized N-ary tree with fanout of 16, showing significant improvements in both WeightedShuffle::new and WeightedShuffle::shuffle. With 4000 weights: N-ary tree (fanout 16): test bench_weighted_shuffle_new ... bench: 36,244 ns/iter (+/- 243) test bench_weighted_shuffle_shuffle ... bench: 149,082 ns/iter (+/- 1,474) Binary tree: test bench_weighted_shuffle_new ... bench: 58,514 ns/iter (+/- 229) test bench_weighted_shuffle_shuffle ... bench: 269,961 ns/iter (+/- 16,446) Fenwick tree: test bench_weighted_shuffle_new ... bench: 39,413 ns/iter (+/- 179) test bench_weighted_shuffle_shuffle ... bench: 364,771 ns/iter (+/- 2,078) The improvements become even more significant as there are more items to shuffle. With 20_000 weights: N-ary tree (fanout 16): test bench_weighted_shuffle_new ... bench: 200,659 ns/iter (+/- 4,395) test bench_weighted_shuffle_shuffle ... bench: 941,928 ns/iter (+/- 26,492) Binary tree: test bench_weighted_shuffle_new ... bench: 881,114 ns/iter (+/- 12,343) test bench_weighted_shuffle_shuffle ... bench: 1,822,257 ns/iter (+/- 12,772) Fenwick tree: test bench_weighted_shuffle_new ... bench: 276,936 ns/iter (+/- 14,692) test bench_weighted_shuffle_shuffle ... bench: 2,644,713 ns/iter (+/- 49,252)	2024-03-26 05:21:54 +00:00
behzad nouri	b6d2237403	implements weighted shuffle using binary tree (#185 ) This is a partial port of firedancer's implementation of weighted shuffle: https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c Though Fenwick trees use less space, inverse queries require an additional O(log n) factor for binary search, resulting in an overall O(n log n log n) performance for weighted shuffle. This commit instead uses a binary tree where each node contains the sum of all weights in its left sub-tree. The weights themselves are implicitly stored at the leaves. Inverse queries and updates to the tree can all be done in O(log n), resulting in an overall O(n log n) weighted shuffle implementation. Based on benchmarks, this results in a 24% improvement in WeightedShuffle::shuffle: Fenwick tree: test bench_weighted_shuffle_new ... bench: 36,686 ns/iter (+/- 191) test bench_weighted_shuffle_shuffle ... bench: 342,625 ns/iter (+/- 4,067) Binary tree: test bench_weighted_shuffle_new ... bench: 59,131 ns/iter (+/- 362) test bench_weighted_shuffle_shuffle ... bench: 260,194 ns/iter (+/- 11,195) Though WeightedShuffle::new is now slower, it generally can be cached and reused as in Turbine: https://github.com/anza-xyz/agave/blob/b3fd87fe8/turbine/src/cluster_nodes.rs#L68 Additionally the new code has better asymptotic performance. For example with 20_000 weights WeightedShuffle::shuffle is 31% faster: Fenwick tree: test bench_weighted_shuffle_new ... bench: 255,071 ns/iter (+/- 9,591) test bench_weighted_shuffle_shuffle ... bench: 2,466,058 ns/iter (+/- 9,873) Binary tree: test bench_weighted_shuffle_new ... bench: 830,727 ns/iter (+/- 10,210) test bench_weighted_shuffle_shuffle ... bench: 1,696,160 ns/iter (+/- 75,271)	2024-03-23 13:53:46 +00:00
carllin	e963f87da9	Evict oldest vote on vote refresh after restart (#327 )	2024-03-21 17:54:17 -04:00
Greg Cusack	792d7454d9	switch to `solana-tpu-client` from `solana_client::tpu_client` for `bench-tps`, `dos/`, `LocalCluster`, `gossip/` (#310 ) * switch over to solana-tpu-client for bench-tps, dos, gossip, local-cluster * put TpuClientWrapper back in solana_client	2024-03-21 09:25:54 -07:00
sakridge	b3fd87fe81	Fix gossip contact trace (#241 )	2024-03-14 19:43:59 +01:00
Greg Cusack	d49ceb0e3f	Add in metrics for detecting Redundant Pulls (#199 )	2024-03-14 11:22:52 -07:00
Yihau Chen	51dc7e6fb7	[anza migration]: add 'agave=info' to default log level (#223 )	2024-03-14 20:35:33 +08:00
Greg Cusack	151675b5ca	update changelog and remove deprecated label on `gossip_service::get_client()` (#227 ) update changelog and remove deprecated label on get_client	2024-03-13 13:26:54 -07:00
Greg Cusack	218de23ce2	Remove `ThinClient` from `dos/` (#117 ) * remove `ThinClient` from `dos/` and replace `ThinClient` with `TpuClient` * remove test for valid_client_facing_addr since it is no longer used	2024-03-11 18:19:48 -04:00
Greg Cusack	209924d220	bump deprecated version numbers for `get_client` and `get_multi_client` (#184 ) bump deprecated version numbers	2024-03-11 15:33:19 -04:00
behzad nouri	f205d0e729	expands weighted-shuffle benchmarks (#179 ) Adding separate benchmarks for WeightedShuffle::new and WeightedShuffle::shuffle.	2024-03-11 18:49:35 +00:00
Greg Cusack	00c984fe4d	deprecate `get_client` and `get_multi_client` (#177 ) deprecate get_client and get_multi_client	2024-03-11 13:13:56 -04:00
steviez	7d6f1d5911	Give streamer::receiver() threads unique names (#35369 ) The name was previously hard-coded to solReceiver. The use of the same name makes it hard to figure out which thread is which when these threads are handling many services (Gossip, Tvu, etc).	2024-03-01 13:36:08 -06:00
Brooks	c8cdd0087f	Removes pushing and pulling account hashes in gossip (#34979 )	2024-01-29 17:19:55 -05:00
behzad nouri	79bbe4381a	adds chained_merkle_root to shredder arguments (#34952 ) Working towards chaining Merkle root of erasure batches, the commit adds chained_merkle_root to shredder arguments.	2024-01-27 15:04:31 +00:00
Ashwin Sekar	93271d91b0	gossip: notify state machine of duplicate proofs (#32963 ) * gossip: notify state machine of duplicate proofs * Add feature flag for ingesting duplicate proofs from Gossip. * Use the Epoch the shred is in instead of the root bank epoch. * Fix unittest by activating the feature. * Add a test for feature disabled case. * EpochSchedule is now not copyable, clone it explicitly. * pr feedback: read epoch schedule on startup, add guard for ff recache * pr feedback: bank_forks lock, -cached_slots_in_epoch, init ff * pr feedback: bank.forks_try_read() -> read() * pr feedback: fix local-cluster setup * local-cluster: do not expose gossip internals, use retry mechanism instead * local-cluster: split out case 4b into separate test and ignore * pr feedback: avoid taking lock if ff is already found * pr feedback: do not cache ff epoch * pr feedback: bank_forks lock, revert to cached_slots_in_epoch * pr feedback: move local variable into helper function * pr feedback: use let else, remove epoch 0 hack --------- Co-authored-by: Wen <crocoxu@gmail.com>	2024-01-26 07:58:37 -08:00
Wen	0d92254736	Add push_heaviest_fork and get_heaviest_fork. (#34892 ) Add push_heaviest_fork and get_heaviest_fork.	2024-01-24 08:57:50 -08:00
Wen	4a2871f384	Add RestartHeaviestFork to Gossip (#34161 ) * Add RestartHeaviestFork to Gossip. * Add a test for out of bound value. * Send observed_stake and total_epoch_stake in RestartHeaviestFork. * Remove total_epoch_stake from RestartHeaviestFork. * Forgot to update ABI digest. * Remove checking of whether stake is zero. * Remove unnecessary new function and make new_rand pub(crate).	2024-01-19 13:59:25 -08:00
Greg Cusack	8ed149a3f2	Add ContactInfo handling for shred versioning (#34286 ) * handle ContactInfo in places where only LegacyContactInfo was used * missed a spot * missed a spot * import contact info for crds lookup * cargo fmt * rm contactinfo from crds_entry. not supported yet * typo * remove crds.nodes insert for ContactInfo. not supported yet * forgot to remove clusterinfo in remove() * move around contactinfo match arm * remove contactinfo updating crds.shred_version	2023-12-21 14:15:50 -08:00
GoodDaisy	03386cc7b9	Fix typos (#34459 ) * Fix typos * Fix typos * fix typo	2023-12-21 13:06:00 -07:00
Andrew Fitzgerald	f0ff69b9cb	Rename: AtomicBloom to ConcurrentBloom (#34483 )	2023-12-21 06:59:20 -08:00
Lucas Steuernagel	b97b3dd4ab	Use BankForks on tests - Part 3 (#34248 ) * Add BankForks to core tests * Refactor functions under DCOU	2023-12-01 13:47:22 -03:00
Greg Cusack	0a2ff8525a	Increase pull request clusterinfo probability (#34231 ) * ensure new contactinfo propagated quicker when handling pull requests * improve readability	2023-11-28 16:08:12 -08:00
Wen	ae4b62c6f5	Move Gossip values added for wen_restart into restart_crds_values. (#34128 ) * Rename file and change orders of definitions. * Use .from() on u16 to usize which shouldn't fail. * Update ABI digest.	2023-11-17 10:13:25 -08:00
Ashwin Sekar	ca6ab08555	gossip: process duplicate proofs for merkle root conflicts (#34066 ) * gossip: process duplicate proofs for merkle root conflicts * pr comments + abi	2023-11-17 11:59:53 -05:00
Wen	3081b4378d	Add push and get methods for RestartLastVotedForkSlots (#33613 ) * Add push and get methods for RestartLastVotedForkSlots * Improve expression format. * Remove fill() from RestartLastVotedForkSlots and move into constructor. * Update ABI signature. * Use flate2 compress directly instead of relying on CompressedSlots. * Make constructor of RestartLastVotedForkSlots return error if necessary. * Use minmax and remove unnecessary code. * Replace flate2 with run-length encoding in RestartLastVotedForkSlots. * Remove accidentally added file. * The passed-in last_voted_fork doesn't need to be mutable anymore. * Switch to different type of run-length encoding. * Fix typo. * Move constant into RestartLastVotedForkSlots. * Use BitVec in RawOffsets. * Remove the unnecessary clone. * Use iter functions for RLE. * Use take_while instead of loop. * Change Run length encoding to iterator implementation. * Allow one slot in RestartLastVotedForkSlots. * Various simplifications. * Fix various errors and use customized error type. * Various simplifications. * Return error from push_get_restart_last_voted_fork_slots and remove unnecessary constraints in to_slots. * Allow 81k slots on RestartLastVotedForkSlots. * Limit MAX_SLOTS to 65535 so we can go back to u16. * Use u16::MAX instead of 65535.	2023-11-16 12:35:34 -08:00
behzad nouri	ba0a49b436	propagates the new contact-info through gossip (#34092 ) Working towards migrating from legacy contact-info to the new contact-info: https://github.com/solana-labs/solana/pull/29596	2023-11-15 19:02:21 +00:00
steviez	b91da2242d	Change Blockstore max_root from RwLock<Slot> to AtomicU64 (#33998 ) The Blockstore currently maintains a RwLock<Slot> of the maximum root it has seen inserted. The value is initialized during Blockstore::open() and updated during calls to Blockstore::set_roots(). The max root is queried fairly often for several use cases, and caching the value is cheaper than constructing an iterator to look it up every time. However, the access pattern of this RwLock matches that of an atomic. That is, there is no critical section of code that is run while the lock is held. Rather, read/write locks are acquired in order to read/update, respectively. So, change the RwLock<u64> to an AtomicU64.	2023-11-10 17:27:43 -06:00
Jeff Biseda	3f805ad06d	improve batch_send error handling (#33936 )	2023-10-31 23:39:26 -07:00
Pankaj Garg	9d42cd7efe	Initialize fork graph in program cache during bank_forks creation (#33810 ) * Initialize fork graph in program cache during bank_forks creation * rename BankForks::new to BankForks::new_rw_arc * fix compilation * no need to set fork_graph on insert() * fix partition tests	2023-10-23 09:32:41 -07:00
behzad nouri	afd044e296	removes redundant ClusterInfo::drain_push_queue (#33753 )	2023-10-18 18:53:58 +00:00
behzad nouri	2465abce5c	simplifies pull-responses handling (#33743 ) Following https://github.com/solana-labs/solana/pull/33722, the `from` pubkey in PullResponse is no longer used in processing pull-responses, so the code can be simplified.	2023-10-18 18:06:14 +00:00
behzad nouri	c699bc9cab	down samples outgoing gossip pull requests (#33719 ) Push message propagation has improved in recent versions of the gossip code and we don't rely on pull requests as much as before. Handling pull requests is also inefficient and expensive. The commit reduces number of outgoing pull requests by down sampling.	2023-10-18 13:41:42 +00:00
Greg Cusack	6efc7ec61d	remove redundant pubkey update record (#33722 ) * remove redundant pubkey update record * from became unused, so removed from all process_pull_response() calls	2023-10-17 10:34:12 -07:00
Wen	0a3810854f	Add RestartLastVotedForkSlots for wen_restart. (#33239 ) * Add RestartLastVotedForkSlots and RestartHeaviestFork for wen_restart. * Fix linter errors. * Revert RestartHeaviestFork, it will be added in another PR. * Update frozen abi message. * Fix wrong number in test generation, change to pub(crate) to limit scope. * Separate push_epoch_slots and push_restart_last_voted_fork_slots. * Add RestartLastVotedForkSlots data structure. * Remove unused parts to make PR smaller. * Remove unused clone. * Use CompressedSlotsVec to share code between EpochSlots and RestartLastVotedForkSlots. * Add total_messages to show how many messages are there. * Reduce RestartLastVotedForkSlots to one packet (16k slots). * Replace last_vote_slot with shred_version, revert CompressedSlotsVec.	2023-10-09 15:01:50 -07:00
behzad nouri	1d91b60a57	removes unused legacy-snapshot-hashes api in gossip (#33593 ) https://github.com/solana-labs/solana/pull/33576 stops broadcasting legacy snapshot hashes over gossip, and this commit removes the unused legacy snapshot hashes code in gossip.	2023-10-09 15:22:34 +00:00
Wen	630feeddf2	Add wen_restart module (#33344 ) * Add wen_restart module: - Implement reading LastVotedForkSlots from blockstore. - Add proto file to record the intermediate results. - Also link wen_restart into validator. - Move recreation of tower outside replay_stage so we can get last_vote. * Update lock file. * Fix linter errors. * Fix dependencies order. * Update wen_restart explanation and small fixes. * Generate tower outside tvu. * Update validator/src/cli.rs Co-authored-by: Tyera <teulberg@gmail.com> * Update wen-restart/protos/wen_restart.proto Co-authored-by: Tyera <teulberg@gmail.com> * Update wen-restart/build.rs Co-authored-by: Tyera <teulberg@gmail.com> * Update wen-restart/src/wen_restart.rs Co-authored-by: Tyera <teulberg@gmail.com> * Rename proto directory. * Rename InitRecord to MyLastVotedForkSlots, add imports. * Update wen-restart/Cargo.toml Co-authored-by: Tyera <teulberg@gmail.com> * Update wen-restart/src/wen_restart.rs Co-authored-by: Tyera <teulberg@gmail.com> * Move prost-build dependency to project toml. * No need to continue if the distance between slot and last_vote is already larger than MAX_SLOTS_ON_VOTED_FORKS. * Use 16k slots instead of 81k slots, a few more wording changes. * Use AncestorIterator which does the same thing. * Update Cargo.lock * Update Cargo.lock --------- Co-authored-by: Tyera <teulberg@gmail.com>	2023-10-06 15:04:37 -07:00
Greg Cusack	1261b3d496	gossip test update (#33431 ) fix bug in gossip test	2023-09-29 08:57:32 -07:00
Pankaj Garg	f50342a790	Split vote related code from runtime to its own crate (#32882 ) * Move vote related code to its own crate * Update imports in code and tests * update programs/sbf/Cargo.lock * fix check errors * update abi_digest * rebase fixes * fixes after rebase	2023-09-19 10:46:37 -07:00
behzad nouri	528a03f32a	removes outdated matches crate from dependencies (#33172 ) removes outdated matches crate from the dependencies. std::matches has been stable since Rust 1.42.0. Other use cases are covered by the assert_matches crate.	2023-09-07 12:52:57 +00:00
Alexander Meißner	9e703f85de	Upgrades Rust to 1.72.0 & nightly-2023-08-25 (#32961 ) * allow pedantic invalid cast lint * allow lint with false-positive triggered by `test-case` crate * nightly `fmt` correction * adapt to rust layout changes * remove dubious test * Use transmute instead of pointer cast and de/ref when check_aligned is false. * Renames clippy::integer_arithmetic to clippy::arithmetic_side_effects. * bump rust nightly to 2023-08-25 * Upgrades Rust to 1.72.0 --------- Co-authored-by: Trent Nelson <trent@solana.com>	2023-09-01 07:26:13 +00:00


451 Commits
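Commit 293414f482 pads the last erasure batch so that it holds at least 32 data shreds. The following is a minimal sketch of that padding step only. It assumes data shreds are modelled as plain `Vec<u8>` payloads and uses a hypothetical constant name. It does not reproduce the real shredder, which also has to build coding shreds and Merkle proofs.

```rust
/// Threshold stated in the commit message; the constant name is hypothetical.
const MIN_DATA_SHREDS_IN_LAST_BATCH: usize = 32;

/// Pads the data-shred payloads of the last erasure batch with empty payloads
/// until the batch holds at least MIN_DATA_SHREDS_IN_LAST_BATCH entries.
/// Returns how many padding payloads were appended.
fn pad_last_erasure_batch(last_batch: &mut Vec<Vec<u8>>) -> usize {
    let missing = MIN_DATA_SHREDS_IN_LAST_BATCH.saturating_sub(last_batch.len());
    // Each padding entry is an empty payload, standing in for an empty data shred.
    last_batch.extend(std::iter::repeat(Vec::new()).take(missing));
    missing
}

fn main() {
    // A short final batch of 5 data payloads gets 27 empty ones appended.
    let mut batch: Vec<Vec<u8>> = (0..5u8).map(|i| vec![i; 10]).collect();
    let added = pad_last_erasure_batch(&mut batch);
    println!("appended {added} empty payloads, batch now has {}", batch.len());
}
```

Batches that already meet the threshold are left untouched, so only the final, possibly short, batch pays the padding cost.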
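Commit 1744e9efd7 adds a `PacketFlags::FROM_STAKED_NODE` flag and forwards only packets that carry it. The sketch below illustrates that filter. `PacketFlags` and `Packet` here are hypothetical stand-ins written without the `bitflags` crate, and the bit value is made up. They are not the SDK's actual definitions.

```rust
/// Hypothetical stand-in for the SDK's PacketFlags type; the bit value is made up.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct PacketFlags(u8);

impl PacketFlags {
    const FROM_STAKED_NODE: PacketFlags = PacketFlags(1 << 0);

    /// True when every bit of `other` is also set in `self`.
    fn contains(self, other: PacketFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

/// Hypothetical packet type: just flags plus a payload.
#[derive(Debug)]
struct Packet {
    flags: PacketFlags,
    data: Vec<u8>,
}

/// Keeps only the packets marked as coming from a staked node.
fn packets_to_forward(packets: &[Packet]) -> Vec<&Packet> {
    packets
        .iter()
        .filter(|p| p.flags.contains(PacketFlags::FROM_STAKED_NODE))
        .collect()
}

fn main() {
    let packets = vec![
        Packet { flags: PacketFlags::FROM_STAKED_NODE, data: vec![1] },
        Packet { flags: PacketFlags::default(), data: vec![2] },
    ];
    let forwarded = packets_to_forward(&packets);
    println!("forwarding {} of {} packets", forwarded.len(), packets.len());
}
```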
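Commit 30eecd62b1 replaces the binary tree with an N-ary tree of fanout 16, which gives better cache locality. The sketch below assumes one possible layout, which may differ from the real code's storage:

- Each internal node keeps the subtree sums of all 16 children contiguously.
- Nodes use 0-indexed heap order, so the children of node `i` are `FANOUT * i + 1 + j`.
- Leaves are implicit after the internal nodes.

A descent step is then a short linear scan over one array, and the tree depth drops from log2(n) to log16(n). The SplitMix64 generator is an assumption that stands in for the `rand` RNG the real code takes. Zero-weight items are simply omitted from the output here.

```rust
const FANOUT: usize = 16;

/// Minimal SplitMix64 PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Value in [0, n); modulo bias is ignored for brevity.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// tree[i][j] holds the total weight under child j of internal node i.
struct NaryWeightedShuffle {
    tree: Vec<[u64; FANOUT]>,
    weights: Vec<u64>,
    total: u64,
}

impl NaryWeightedShuffle {
    fn new(weights: &[u64]) -> Self {
        // Count internal nodes needed so the leaf level can hold every weight.
        let (mut internal, mut leaves) = (0usize, 1usize);
        while leaves < weights.len() {
            internal += leaves;
            leaves *= FANOUT;
        }
        let mut tree = vec![[0u64; FANOUT]; internal];
        for (k, &w) in weights.iter().enumerate() {
            // Walk from leaf k up to the root, adding w to each ancestor's child slot.
            let mut node = internal + k;
            while node > 0 {
                let parent = (node - 1) / FANOUT;
                tree[parent][(node - 1) % FANOUT] += w;
                node = parent;
            }
        }
        Self { tree, weights: weights.to_vec(), total: weights.iter().sum() }
    }

    /// Samples an index with probability proportional to its weight, then removes it.
    fn pop(&mut self, rng: &mut SplitMix64) -> Option<usize> {
        if self.total == 0 {
            return None;
        }
        let internal = self.tree.len();
        let mut r = rng.below(self.total);
        let mut node = 0;
        while node < internal {
            // Linear scan over at most FANOUT contiguous child sums.
            let mut j = 0;
            while r >= self.tree[node][j] {
                r -= self.tree[node][j];
                j += 1;
            }
            node = FANOUT * node + 1 + j;
        }
        let k = node - internal;
        let w = self.weights[k];
        let mut n = node;
        while n > 0 {
            let parent = (n - 1) / FANOUT;
            self.tree[parent][(n - 1) % FANOUT] -= w;
            n = parent;
        }
        self.weights[k] = 0;
        self.total -= w;
        Some(k)
    }

    fn shuffle(mut self, rng: &mut SplitMix64) -> Vec<usize> {
        let mut order = Vec::with_capacity(self.weights.len());
        while let Some(k) = self.pop(rng) {
            order.push(k);
        }
        order
    }
}

fn main() {
    let weights: Vec<u64> = (1..=40).collect();
    let order = NaryWeightedShuffle::new(&weights).shuffle(&mut SplitMix64(42));
    println!("{order:?}");
}
```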
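Commit b6d2237403 describes a binary tree where each node stores the sum of its left sub-tree and the weights sit implicitly at the leaves. With that layout, sampling and removal are both a single O(log n) root-to-leaf walk. The sketch below assumes a heap layout padded to a power of two. It uses SplitMix64 in place of a real RNG and omits zero-weight items. It illustrates the idea and is not the agave `WeightedShuffle` code.

```rust
/// Minimal SplitMix64 PRNG so the sketch needs no external crates.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
    /// Value in [0, n); modulo bias is ignored for brevity.
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Heap layout: root at index 1, children of i at 2i and 2i+1, and leaves at size..2*size.
/// tree[i] stores the sum of the weights in the left sub-tree of node i.
struct BinaryWeightedShuffle {
    tree: Vec<u64>,
    size: usize,
    weights: Vec<u64>,
    total: u64,
}

impl BinaryWeightedShuffle {
    fn new(weights: &[u64]) -> Self {
        let size = weights.len().next_power_of_two();
        let mut tree = vec![0u64; size];
        for (k, &w) in weights.iter().enumerate() {
            // Add w to every ancestor whose left sub-tree contains leaf k.
            let mut child = size + k;
            while child > 1 {
                if child % 2 == 0 {
                    tree[child / 2] += w;
                }
                child /= 2;
            }
        }
        Self { tree, size, weights: weights.to_vec(), total: weights.iter().sum() }
    }

    /// Inverse query: maps a uniform r in [0, total) to a leaf, then removes that leaf.
    fn pop(&mut self, rng: &mut SplitMix64) -> Option<usize> {
        if self.total == 0 {
            return None;
        }
        let mut r = rng.below(self.total);
        let mut node = 1;
        while node < self.size {
            if r < self.tree[node] {
                node = 2 * node; // r falls inside the left sub-tree
            } else {
                r -= self.tree[node]; // skip past the left sub-tree
                node = 2 * node + 1;
            }
        }
        let k = node - self.size;
        let w = self.weights[k];
        let mut child = node;
        while child > 1 {
            if child % 2 == 0 {
                self.tree[child / 2] -= w;
            }
            child /= 2;
        }
        self.weights[k] = 0;
        self.total -= w;
        Some(k)
    }

    fn shuffle(mut self, rng: &mut SplitMix64) -> Vec<usize> {
        let mut order = Vec::with_capacity(self.weights.len());
        while let Some(k) = self.pop(rng) {
            order.push(k);
        }
        order
    }
}

fn main() {
    let order = BinaryWeightedShuffle::new(&[5, 0, 3, 2, 7]).shuffle(&mut SplitMix64(42));
    println!("{order:?}");
}
```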
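Commit 7d6f1d5911 gives each `streamer::receiver()` thread a unique name instead of the hard-coded `solReceiver`. The sketch below shows the general technique with the standard library's `std::thread::Builder::name`. The `solRcvr` prefix and the service labels are hypothetical and are not the names the commit chose.

```rust
use std::thread;

/// Spawns a receiver-style thread whose name includes the service it serves.
/// The "solRcvr" prefix is a hypothetical naming scheme.
fn spawn_receiver(service: &str) -> thread::JoinHandle<String> {
    thread::Builder::new()
        .name(format!("solRcvr{service}"))
        .spawn(|| {
            // A real receiver would loop on its socket here; this just reports its name.
            thread::current().name().unwrap_or("unnamed").to_string()
        })
        .expect("failed to spawn receiver thread")
}

fn main() {
    for service in ["Gossip", "Tvu"] {
        let name = spawn_receiver(service).join().unwrap();
        println!("{name}");
    }
}
```

Distinct names make it easy to tell the Gossip receiver apart from the Tvu receiver in tools such as `top -H` or a debugger's thread list.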
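Commit 3081b4378d replaces flate2 compression in RestartLastVotedForkSlots with run-length encoding over a bitmap. It also caps MAX_SLOTS at 65535 so that run lengths fit in a `u16`. The sketch below is one hypothetical version of such an encoding, and the commit does not say which exact scheme was kept. It makes two assumptions:

- Bit `i` marks slot `last_voted_slot - i`.
- Runs alternate and start with a run of set bits, which may have length 0.

```rust
type Slot = u64;

/// Cap from the commit message: u16::MAX bits, so any single run fits in a u16.
const MAX_SLOTS: usize = u16::MAX as usize;

/// Hypothetical layout: bit i is set when slot `last_voted_slot - i` is on the voted fork.
fn slots_to_bits(last_voted_slot: Slot, slots: &[Slot]) -> Vec<bool> {
    let offsets: Vec<usize> = slots
        .iter()
        .map(|&s| last_voted_slot.checked_sub(s).expect("slot newer than last vote") as usize)
        .collect();
    let len = offsets.iter().max().map_or(0, |&m| m + 1);
    assert!(len <= MAX_SLOTS, "too many slots for u16 run lengths");
    let mut bits = vec![false; len];
    for o in offsets {
        bits[o] = true;
    }
    bits
}

/// Encodes a bitmap as alternating run lengths, starting with a (possibly empty) run of set bits.
fn rle_encode(bits: &[bool]) -> Vec<u16> {
    assert!(bits.len() <= MAX_SLOTS);
    let mut runs = Vec::new();
    let (mut current, mut len) = (true, 0u16);
    for &b in bits {
        if b == current {
            len += 1;
        } else {
            runs.push(len);
            current = b;
            len = 1;
        }
    }
    if len > 0 {
        runs.push(len);
    }
    runs
}

/// Inverse of rle_encode: even-indexed runs are set bits, odd-indexed runs are clear bits.
fn rle_decode(runs: &[u16]) -> Vec<bool> {
    let mut bits = Vec::new();
    for (i, &len) in runs.iter().enumerate() {
        bits.extend(std::iter::repeat(i % 2 == 0).take(len as usize));
    }
    bits
}

fn main() {
    let bits = slots_to_bits(100, &[100, 98, 95]);
    let runs = rle_encode(&bits);
    println!("{runs:?}"); // [1, 1, 1, 2, 1]
    assert_eq!(rle_decode(&runs), bits);
}
```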
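Commit b91da2242d observes that the Blockstore's cached max root is only ever read or bumped, never held across a critical section, so an `AtomicU64` suffices. The sketch below shows that access pattern with `AtomicU64::fetch_max`. `MaxRoot` is a hypothetical wrapper, and `Ordering::Relaxed` is an assumption, since the commit does not state which ordering is used.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

type Slot = u64;

/// Hypothetical stand-in for the Blockstore's cached max root.
struct MaxRoot(AtomicU64);

impl MaxRoot {
    /// Initial value, e.g. as read from storage when the Blockstore is opened.
    fn new(initial: Slot) -> Self {
        Self(AtomicU64::new(initial))
    }

    /// Lock-free read of the cached value.
    fn get(&self) -> Slot {
        self.0.load(Ordering::Relaxed)
    }

    /// Update performed when new roots are set: the cached value only moves forward.
    fn update(&self, roots: &[Slot]) {
        if let Some(&max) = roots.iter().max() {
            self.0.fetch_max(max, Ordering::Relaxed);
        }
    }
}

fn main() {
    let max_root = Arc::new(MaxRoot::new(0));
    let handles: Vec<_> = (0..4u64)
        .map(|t| {
            let max_root = Arc::clone(&max_root);
            thread::spawn(move || max_root.update(&[t * 10, t * 10 + 5]))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("max root = {}", max_root.get()); // max root = 35
}
```

Because `fetch_max` is a single atomic read-modify-write, concurrent updaters cannot move the value backwards, which a separate load followed by a store could do.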