reworks max number of outgoing push messages (#3016)
max_bytes for outgoing push messages is quite outdated and does not
allow gossip to function properly at the current testnet cluster size.
In particular, it does not allow clearing out the queue of pending push
messages unless new_push_messages is called very frequently, which
involves repeatedly locking/unlocking the CRDS table.
Additionally, leaving gossip entries in the queue for the next round
adds delay to propagating push messages, which can compound as messages
go through several hops.
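A rough sketch of the idea (illustrative names, not the actual agave
code): capping each round by message count instead of payload bytes
lets a handful of calls always drain the queue.

```rust
struct CrdsValue; // stand-in for a gossip CRDS entry

/// Drain up to `max_num_messages` pending entries per round so the
/// queue cannot grow without bound between calls.
fn new_push_messages(
    queue: &mut Vec<CrdsValue>,
    max_num_messages: usize,
) -> Vec<CrdsValue> {
    let num = queue.len().min(max_num_messages);
    queue.drain(..num).collect()
}
```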
(cherry picked from commit 489f483e1d)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
excludes node's pubkey from bloom filter of pruned origins (#2990)
The bloom filter of pruned origins can return a false positive for a
node's own pubkey, but a node should always be able to push its own
values to other nodes in the cluster.
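A minimal sketch of the guard, using a HashSet as a stand-in for the
real bloom filter (type and method names are illustrative):

```rust
use std::collections::HashSet;

type Pubkey = [u8; 32];

/// Stand-in for the bloom filter of pruned origins; a real bloom
/// filter may report false positives, which is the case guarded here.
struct PrunedOrigins(HashSet<Pubkey>);

impl PrunedOrigins {
    /// A node must always be able to push its own values, so its own
    /// pubkey is never treated as pruned.
    fn contains(&self, origin: &Pubkey, self_pubkey: &Pubkey) -> bool {
        origin != self_pubkey && self.0.contains(origin)
    }
}
```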
(cherry picked from commit bce28c0282)
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
gossip: demote invalid duplicate proof errors to info (#2754)
* gossip: demote invalid duplicate proof errors to info
* pr feedback: explicitly list every enum
(cherry picked from commit 7b6e6c179f)
Co-authored-by: Ashwin Sekar <ashwin@anza.xyz>
* customizes override logic for gossip ContactInfo (#2579)
If there are two running instances of the same node, we want the
ContactInfo with the more recent start time to be propagated through
gossip regardless of wallclocks.
The commit adds custom override logic for ContactInfo that first
compares by outset timestamp.
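A sketch of the override rule over an assumed two-field ContactInfo:
outset (instance start time) is compared first, and wallclock only
breaks ties between the same instance.

```rust
use std::cmp::Ordering;

/// Illustrative subset of ContactInfo fields relevant here.
struct ContactInfo {
    outset: u64,    // timestamp when this node instance started
    wallclock: u64, // last time the value was refreshed
}

/// A value from a more recently started instance wins regardless of
/// wallclock; wallclock only decides between equal outsets.
fn overrides(new: &ContactInfo, old: &ContactInfo) -> bool {
    match new.outset.cmp(&old.outset) {
        Ordering::Greater => true,
        Ordering::Less => false,
        Ordering::Equal => new.wallclock > old.wallclock,
    }
}
```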
* updates ContactInfo.outset when hot-swapping identity (#2613)
When hot-swapping identity, ContactInfo.outset should be updated so
that the new ContactInfo overrides the older node with the same pubkey.
* patches bug causing false duplicate nodes error (#2666)
The bootstrap code during validator start pushes a contact-info with a
more recent timestamp to gossip. If the node is staked, the
contact-info lingers in gossip, causing false duplicate-node-instance
errors when the fully initialized node joins gossip later on.
The commit refreshes the timestamp on the contact-info so that it
overrides the one pushed by bootstrap and avoids the false-duplicates
error.
---------
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
* checks for duplicate instances using the new ContactInfo (#2506)
Working towards deprecating the NodeInstance CRDS value, the commit
adds a check for duplicate instances using the new ContactInfo.
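A hedged sketch of such a check (field names are assumptions, not the
actual ContactInfo definition):

```rust
type Pubkey = [u8; 32];

struct ContactInfo {
    pubkey: Pubkey,
    outset: u64, // instance start timestamp
}

#[derive(Debug)]
struct DuplicateInstance;

/// Another entry with our pubkey but a different outset implies a
/// second instance of this node is running somewhere.
fn check_duplicate_instance(
    self_info: &ContactInfo,
    other: &ContactInfo,
) -> Result<(), DuplicateInstance> {
    if other.pubkey == self_info.pubkey && other.outset != self_info.outset {
        Err(DuplicateInstance)
    } else {
        Ok(())
    }
}
```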
(cherry picked from commit 1d825df4e1)
* removes unwrap
---------
Co-authored-by: behzad nouri <behzadnouri@gmail.com>
gossip: do not allow duplicate proofs for incorrect shred versions (#1931)
* gossip: do not allow duplicate proofs for incorrect shred versions
* pr feedback: refactor test function to take shred_version
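A minimal sketch of the version check described above (the error type
is illustrative):

```rust
#[derive(Debug)]
enum Error {
    InvalidShredVersion(u16),
}

/// Duplicate-shred proofs built from shreds with a mismatched shred
/// version are rejected up front; they cannot refer to this cluster.
fn verify_shred_version(proof_version: u16, cluster_version: u16) -> Result<(), Error> {
    if proof_version == cluster_version {
        Ok(())
    } else {
        Err(Error::InvalidShredVersion(proof_version))
    }
}
```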
(cherry picked from commit 69ea21e947)
Co-authored-by: Ashwin Sekar <ashwin@anza.xyz>
* put most AbiExample derivations behind a cfg_attr (see the sketch after this list)
* feature gate all `extern crate solana_frozen_abi_macro;`
* use cfg_attr wherever we were deriving both AbiExample and AbiEnumVisitor
* fix cases where AbiEnumVisitor was still being derived unconditionally
* fix a case where AbiExample was derived unconditionally
* fix more cases where both AbiEnumVisitor and AbiExample were derived unconditionally
* two more cases where AbiExample and AbiEnumVisitor were unconditionally derived
* fix remaining unconditional derivations of AbiEnumVisitor
* fix cases where AbiExample is the first thing derived
* fix most remaining unconditional derivations of AbiExample
* move all `frozen_abi(digest =` behind cfg_attr
* replace incorrect cfg with cfg_attr
* fix one more unconditionally derived AbiExample
* feature gate AbiExample impls
* add frozen-abi feature to required Cargo.toml files
* make frozen-abi features activate recursively
* fmt
* add missing feature gating
* fix accidentally changed digest
* activate frozen-abi in relevant test scripts
* don't activate solana-program's frozen-abi in sdk dev-dependencies
* update to handle AbiExample derivation on new AppendVecFileBacking enum
* revert toml formatting
* remove unused frozen-abi entries from address-lookup-table Cargo.toml
* remove toml references to solana-address-lookup-table-program/frozen-abi
* update lock file
* remove no-longer-used generic param
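The pattern behind most of the bullets above looks roughly like this
(the struct and enum are illustrative; AbiExample and AbiEnumVisitor
come from solana-frozen-abi-macro):

```rust
// The macro crate itself is only pulled in when the feature is on.
#[cfg(feature = "frozen-abi")]
#[macro_use]
extern crate solana_frozen_abi_macro;

// Derivations compile only when the `frozen-abi` feature is enabled,
// so normal builds drop them entirely.
#[cfg_attr(feature = "frozen-abi", derive(AbiExample))]
pub struct Node {
    pub id: u64,
}

#[cfg_attr(feature = "frozen-abi", derive(AbiExample, AbiEnumVisitor))]
pub enum Message {
    Ping(u64),
    Pong(u64),
}
```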
For duplicate-blocks prevention we want to verify that the last erasure
batch was sufficiently propagated through turbine. This requires
additional bookkeeping because, depending on the erasure coding schema,
the entire batch might be recovered from only a few coding shreds.
In order to simplify the above, this commit instead ensures that the
last erasure batch has >= 32 data shreds, so that the batch cannot be
recovered unless 32+ shreds are received from turbine or repair.
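One way to sketch the guarantee (this is not the actual shredder logic;
the constant name is an assumption) is to pad the final batch up to the
minimum data-shred count:

```rust
/// Minimum data shreds in the last erasure batch of a slot, so that the
/// batch cannot be recovered from just a few coding shreds.
const MIN_DATA_SHREDS_PER_LAST_BATCH: usize = 32;

/// Pad the final batch with empty data shreds until it meets the
/// minimum, guaranteeing 32+ shreds must arrive via turbine or repair.
fn pad_last_erasure_batch(mut data_shreds: Vec<Vec<u8>>) -> Vec<Vec<u8>> {
    while data_shreds.len() < MIN_DATA_SHREDS_PER_LAST_BATCH {
        data_shreds.push(Vec::new()); // placeholder/padding data shred
    }
    data_shreds
}
```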
* add PacketFlags::FROM_STAKED_NODE
* Only forward packets from staked nodes (see the sketch after this list)
* fix local-cluster test forwarding
* review comment
* tpu_votes get marked as from_staked_node
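A sketch of the flag and the forwarding predicate, assuming the
bitflags crate (the real PacketFlags type has more variants):

```rust
use bitflags::bitflags;

bitflags! {
    /// Illustrative subset of packet flags.
    struct PacketFlags: u8 {
        const FROM_STAKED_NODE = 0b0000_0001;
    }
}

/// Only packets marked as coming from a staked node are forwarded.
fn should_forward(flags: &PacketFlags) -> bool {
    flags.contains(PacketFlags::FROM_STAKED_NODE)
}
```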
The IP echo server currently spins up a worker thread for every thread
on the machine. Observing some data for nodes:
- MNB validators and RPC nodes look to get several hundred of these
requests per day
- MNB entrypoint nodes look to get 2-3 requests per second on average

In both instances, the current threadpool is severely overprovisioned,
which is a waste of resources. This PR plumbs a flag to control the
number of worker threads for this pool and sets a default of two
threads for this server. Two threads allow one thread to always listen
on the TCP port while the other processes requests.
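A hedged sketch of the plumbing, assuming tokio (the thread name and
default constant are illustrative, not the actual flag wiring):

```rust
/// Default worker threads: one can keep listening on the TCP port
/// while the other processes a request.
const DEFAULT_IP_ECHO_SERVER_THREADS: usize = 2;

fn ip_echo_runtime(num_threads: usize) -> std::io::Result<tokio::runtime::Runtime> {
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(num_threads)
        .thread_name("solIpEchoSrv")
        .enable_all()
        .build()
}
```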
* add metric for duplicate push messages
* add in num_total_push
* address comments. don't lock stats each time
* address comments. remove num_total_push
* change dup push message name in code to reflect metric name
This is a port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c
https://github.com/anza-xyz/agave/pull/185 implemented weighted shuffle
using a binary tree. Though asymptotically a binary tree has better
performance compared to a Fenwick tree, it has less cache locality,
resulting in smaller improvements and, in particular, a slower
WeightedShuffle::new.
In order to improve cache locality and reduce the overheads of
traversing the tree, this commit instead uses a generalized N-ary tree
with fanout of 16, showing significant improvements in both
WeightedShuffle::new and WeightedShuffle::shuffle.
With 4000 weights:
N-ary tree (fanout 16):
test bench_weighted_shuffle_new ... bench: 36,244 ns/iter (+/- 243)
test bench_weighted_shuffle_shuffle ... bench: 149,082 ns/iter (+/- 1,474)
Binary tree:
test bench_weighted_shuffle_new ... bench: 58,514 ns/iter (+/- 229)
test bench_weighted_shuffle_shuffle ... bench: 269,961 ns/iter (+/- 16,446)
Fenwick tree:
test bench_weighted_shuffle_new ... bench: 39,413 ns/iter (+/- 179)
test bench_weighted_shuffle_shuffle ... bench: 364,771 ns/iter (+/- 2,078)
The improvements become even more significant as there are more items to
shuffle. With 20_000 weights:
N-ary tree (fanout 16):
test bench_weighted_shuffle_new ... bench: 200,659 ns/iter (+/- 4,395)
test bench_weighted_shuffle_shuffle ... bench: 941,928 ns/iter (+/- 26,492)
Binary tree:
test bench_weighted_shuffle_new ... bench: 881,114 ns/iter (+/- 12,343)
test bench_weighted_shuffle_shuffle ... bench: 1,822,257 ns/iter (+/- 12,772)
Fenwick tree:
test bench_weighted_shuffle_new ... bench: 276,936 ns/iter (+/- 14,692)
test bench_weighted_shuffle_shuffle ... bench: 2,644,713 ns/iter (+/- 49,252)
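The core of the fanout-16 idea can be reduced to a single descend step:
each node stores 16 per-child subtree sums, so picking a child is a
linear scan over contiguous memory (a sketch, not the agave
implementation):

```rust
const FANOUT: usize = 16;

/// Given one node's per-child subtree sums and a draw `u` within this
/// subtree, pick the child to descend into and the residual draw.
fn descend(child_sums: &[u64; FANOUT], mut u: u64) -> (usize, u64) {
    for (child, &sum) in child_sums.iter().enumerate() {
        if u < sum {
            return (child, u);
        }
        u -= sum;
    }
    unreachable!("u must be less than this node's total weight");
}
```

Each level touches one small contiguous array instead of scattered
binary-tree nodes, which is where the locality win comes from.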
This is a partial port of firedancer's implementation of weighted
shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c
Though Fenwick trees use less space, inverse queries require an
additional O(log n) factor for binary search, resulting in an overall
O(n log n log n) performance for weighted shuffle.
This commit instead uses a binary tree where each node contains the sum
of all weights in its left sub-tree. The weights themselves are
implicitly stored at the leaves. Inverse queries and updates to the
tree can all be done in O(log n), resulting in an overall O(n log n)
weighted-shuffle implementation.
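A compact sketch of the scheme, assuming the rand crate (illustrative,
not the agave implementation): each internal node stores its
left-subtree sum, and a draw in [0, total) is mapped to a leaf and
removed in O(log n).

```rust
use rand::Rng;

struct WeightTree {
    left_sums: Vec<u64>, // 1-indexed heap layout of left-subtree sums
    weights: Vec<u64>,   // leaf weights, zeroed once sampled
    num_leaves: usize,   // power of two >= weights.len()
    total: u64,
}

impl WeightTree {
    fn new(weights: Vec<u64>) -> Self {
        let num_leaves = weights.len().next_power_of_two();
        let mut sums = vec![0u64; 2 * num_leaves]; // full subtree sums
        sums[num_leaves..num_leaves + weights.len()].copy_from_slice(&weights);
        let mut left_sums = vec![0u64; num_leaves];
        for i in (1..num_leaves).rev() {
            left_sums[i] = sums[2 * i];
            sums[i] = sums[2 * i] + sums[2 * i + 1];
        }
        let total = sums[1];
        Self { left_sums, weights, num_leaves, total }
    }

    /// Inverse query: map a draw in [0, total) to a leaf, then remove
    /// its weight along the same O(log n) path.
    fn pop(&mut self, mut u: u64) -> usize {
        let mut i = 1;
        while i < self.num_leaves {
            if u < self.left_sums[i] {
                i *= 2; // descend left
            } else {
                u -= self.left_sums[i];
                i = 2 * i + 1; // descend right
            }
        }
        let leaf = i - self.num_leaves;
        let w = std::mem::take(&mut self.weights[leaf]);
        self.total -= w;
        // Walk back up; subtract w wherever we were in a left subtree.
        while i > 1 {
            if i % 2 == 0 {
                self.left_sums[i / 2] -= w;
            }
            i /= 2;
        }
        leaf
    }

    /// O(n log n) weighted shuffle of the original indices; indices
    /// with zero weight are never drawn in this sketch.
    fn shuffle<R: Rng>(mut self, rng: &mut R) -> Vec<usize> {
        let mut out = Vec::with_capacity(self.weights.len());
        while self.total > 0 {
            let u = rng.gen_range(0..self.total);
            out.push(self.pop(u));
        }
        out
    }
}
```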
Based on benchmarks, this results in a 24% improvement in
WeightedShuffle::shuffle:
Fenwick tree:
test bench_weighted_shuffle_new ... bench: 36,686 ns/iter (+/- 191)
test bench_weighted_shuffle_shuffle ... bench: 342,625 ns/iter (+/- 4,067)
Binary tree:
test bench_weighted_shuffle_new ... bench: 59,131 ns/iter (+/- 362)
test bench_weighted_shuffle_shuffle ... bench: 260,194 ns/iter (+/- 11,195)
Though WeightedShuffle::new is now slower, it can generally be cached
and reused, as in Turbine:
https://github.com/anza-xyz/agave/blob/b3fd87fe8/turbine/src/cluster_nodes.rs#L68
Additionally, the new code has better asymptotic performance. For
example, with 20_000 weights, WeightedShuffle::shuffle is 31% faster:
Fenwick tree:
test bench_weighted_shuffle_new ... bench: 255,071 ns/iter (+/- 9,591)
test bench_weighted_shuffle_shuffle ... bench: 2,466,058 ns/iter (+/- 9,873)
Binary tree:
test bench_weighted_shuffle_new ... bench: 830,727 ns/iter (+/- 10,210)
test bench_weighted_shuffle_shuffle ... bench: 1,696,160 ns/iter (+/- 75,271)
The name was previously hard-coded to solReceiver. Using the same name
makes it hard to figure out which thread is which when these threads
are handling many services (Gossip, Tvu, etc.).
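For example (the service-specific names here are illustrative, not the
actual ones), std::thread::Builder makes distinct per-service names
straightforward:

```rust
use std::thread::{Builder, JoinHandle};

/// Spawn a receiver thread with a service-specific name (e.g.
/// "solRcvrGossip") so it can be told apart in profilers and dumps.
fn spawn_receiver(name: &str) -> std::io::Result<JoinHandle<()>> {
    Builder::new().name(name.to_string()).spawn(|| {
        // ... packet receive loop ...
    })
}
```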
* gossip: notify state machine of duplicate proofs
* Add feature flag for ingesting duplicate proofs from Gossip (see the sketch after this list).
* Use the Epoch the shred is in instead of the root bank epoch.
* Fix unittest by activating the feature.
* Add a test for feature disabled case.
* EpochSchedule is now not copyable, clone it explicitly.
* pr feedback: read epoch schedule on startup, add guard for ff recache
* pr feedback: bank_forks lock, -cached_slots_in_epoch, init ff
* pr feedback: bank.forks_try_read() -> read()
* pr feedback: fix local-cluster setup
* local-cluster: do not expose gossip internals, use retry mechanism instead
* local-cluster: split out case 4b into separate test and ignore
* pr feedback: avoid taking lock if ff is already found
* pr feedback: do not cache ff epoch
* pr feedback: bank_forks lock, revert to cached_slots_in_epoch
* pr feedback: move local variable into helper function
* pr feedback: use let else, remove epoch 0 hack
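A hypothetical sketch of the epoch-based gating (names and exact
semantics are assumptions based on the bullets above, not the actual
feature-set API):

```rust
/// The duplicate-proof feature is checked against the epoch the
/// shred's slot belongs to, not the root bank's epoch, since a proof
/// may reference a slot in a newer epoch than the root.
fn feature_active_for_shred(
    feature_activated_slot: Option<u64>, // None => feature not active
    first_slot_in_shred_epoch: u64,
) -> bool {
    feature_activated_slot
        .is_some_and(|slot| slot <= first_slot_in_shred_epoch)
}
```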
---------
Co-authored-by: Wen <crocoxu@gmail.com>
* Add RestartHeaviestFork to Gossip (see the sketch after this list).
* Add a test for out of bound value.
* Send observed_stake and total_epoch_stake in RestartHeaviestFork.
* Remove total_epoch_stake from RestartHeaviestFork.
* Forgot to update ABI digest.
* Remove checking of whether stake is zero.
* Remove unnecessary new function and make new_rand pub(crate).
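After these changes, the value's shape is roughly as follows (field
names and types are assumptions, not the exact definition):

```rust
type Pubkey = [u8; 32];
type Hash = [u8; 32];
type Slot = u64;

/// Illustrative shape of the new CRDS value: total_epoch_stake was
/// dropped, observed_stake kept.
struct RestartHeaviestFork {
    from: Pubkey,
    wallclock: u64,
    last_slot: Slot,
    last_slot_hash: Hash,
    observed_stake: u64,
}
```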
* handle ContactInfo in places where only LegacyContactInfo was used (see the sketch after this list)
* missed a spot
* missed a spot
* import contact info for crds lookup
* cargo fmt
* rm contactinfo from crds_entry. not supported yet
* typo
* remove crds.nodes insert for ContactInfo. not supported yet
* forgot to remove clusterinfo in remove()
* move around contactinfo match arm
* remove contactinfo updating crds.shred_version
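A sketch of the recurring change, with stub types standing in for the
real CRDS definitions:

```rust
type Pubkey = [u8; 32];

struct LegacyContactInfo { pubkey: Pubkey }
struct ContactInfo { pubkey: Pubkey }

/// Illustrative subset of the CRDS data enum.
enum CrdsData {
    LegacyContactInfo(LegacyContactInfo),
    ContactInfo(ContactInfo),
}

/// Match arms that used to handle only LegacyContactInfo now handle
/// the new ContactInfo as well.
fn pubkey(data: &CrdsData) -> &Pubkey {
    match data {
        CrdsData::LegacyContactInfo(node) => &node.pubkey,
        CrdsData::ContactInfo(node) => &node.pubkey,
    }
}
```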