solana

Commit Graph

Author	SHA1	Message	Date
carllin	c2e8814dce	Add limit and shrink policy for recycler (#15320 )	2021-02-24 00:15:58 -08:00
Michael Vines	5df36aec7d	Pacify clippy	2021-02-19 20:08:41 -08:00
behzad nouri	aa3aac766f	adds metrics for inbound/outbound gossip packets counts (#15407 )	2021-02-19 22:49:35 +00:00
behzad nouri	076c20f1ca	checks that prune-messages have the same inner/outer pubkey (#15352 )	2021-02-16 21:06:18 +00:00
behzad nouri	0ad063f4e9	adds flag to disable duplicate instance check (#15006 )	2021-02-03 16:26:17 +00:00
dependabot[bot]	1df93fa2be	chore: bump serde from 1.0.112 to 1.0.118 (#14828 ) * chore: bump serde from 1.0.112 to 1.0.122 Bumps [serde](https://github.com/serde-rs/serde) from 1.0.112 to 1.0.122. - [Release notes](https://github.com/serde-rs/serde/releases) - [Commits](https://github.com/serde-rs/serde/compare/v1.0.112...v1.0.122) Signed-off-by: dependabot[bot] <support@github.com> * [auto-commit] Update all Cargo lock files * Update frozen_abi digest following serde update * Revert "chore: bump serde from 1.0.112 to 1.0.122" This reverts commit a3ef4442a4c985144ae2bd7ceaf8899a7ab8d7c0. * Revert "[auto-commit] Update all Cargo lock files" This reverts commit c41c3b005fb1ccade55155302c52cd5736c4b55f. * chore: bump serde from 1.0.112 to 1.0.118 Bumps [serde](https://github.com/serde-rs/serde) from 1.0.112 to 1.0.118. - [Release notes](https://github.com/serde-rs/serde/releases) - [Commits](https://github.com/serde-rs/serde/compare/v1.0.112...v1.0.118) Signed-off-by: dependabot[bot] <support@github.com> * [auto-commit] Update all Cargo lock files * Remove serum-dex pinning * blind commit! Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: dependabot-buildkite <dependabot-buildkite@noreply.solana.com> Co-authored-by: Ryo Onodera <ryoqun@gmail.com>	2021-02-02 23:28:16 +09:00
behzad nouri	e1021d9f83	removes redundant epoch stakes cache in retransmit (#14781 ) Following `d6d76219b`, staked nodes computed from vote accounts are already cached in runtime::Stakes, so the caching in retransmit_stage is redundant.	2021-01-24 21:15:09 +00:00
behzad nouri	491b059755	broadcasts duplicate shreds through gossip (#14699 )	2021-01-24 15:47:43 +00:00
behzad nouri	8e581601d6	patches crds vote-index assignment bug (#14438 ) If tower is full, old votes are evicted from the front of the deque: https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L367-L373 whereas recent votes, if expired, are evicted from the back: https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L529-L537 As a result, from a single tower_index scalar, we cannot infer which crds-vote should be overwritten: https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L576 In addition there is an off-by-one bug in the existing code. tower_index is bounded by MAX_LOCKOUT_HISTORY - 1: https://github.com/solana-labs/solana/blob/2074e407c/core/src/consensus.rs#L382 So, it is at most 30, whereas MAX_VOTES is 32: https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L29 Which means that this branch is never taken: https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L590-L593 so crds table always keeps 29 oldest votes by wallclock, and then only overrides the 30th one each time. (i.e. a tally of only two most recent votes).	2021-01-21 13:08:07 +00:00
behzad nouri	b5fd0ed859	rewrites turbine retransmit peers computation (#14584 )	2021-01-19 04:18:47 +00:00
Michael Vines	9ddd6f08e8	Persist gossip contact info	2020-12-27 20:46:54 -08:00
behzad nouri	2fd38d9912	indexes votes in crds table (#14272 )	2020-12-27 13:31:05 +00:00
behzad nouri	49019c6613	obtains staked-nodes from the root-bank (#14257 ) ... as opposed to the working bank	2020-12-27 13:28:05 +00:00
Michael Vines	ace360ade2	Multiple entrypoint support	2020-12-22 18:35:31 -08:00
Michael Vines	3373082ffa	Update entrypoint contact info even when shred version adoption is not requested	2020-12-22 18:35:31 -08:00
behzad nouri	a14cfd660a	removes &Arc<Self> receivers (#14234 )	2020-12-22 23:51:53 +00:00
behzad nouri	691031fefd	limits number of crds values returned when responding to pull requests (#13739 ) Crds values buffered when responding to pull-requests can be very large taking a lot of memory. Added a limit for number of buffered crds values based on outbound data budget.	2020-12-18 18:45:12 +00:00
behzad nouri	6a3797e164	adds crds-value for broadcasting duplicate shreds through gossip (#14133 ) In gossip, the header overhead we get from: https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/cluster_info.rs#L434-L435 https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L31-L36 https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L73 already exceeds SIZE_OF_NONCE in shreds. We also need additional meta-data (wallclock, source pubkey, ...), which means that given the SHRED_PAYLOAD_SIZE, we cannot fit all these in PACKET_DATA_SIZE: https://github.com/solana-labs/solana/blob/de9ac43eb/ledger/src/shred.rs#L80 On top of that, we need 2 shred payloads as the proof of duplicate. So each DuplicateShred crds value includes only a chunk of the payload, along with the meta-data to reconstruct the full payload from the chunks on the receiving end.	2020-12-18 14:32:43 +00:00
behzad nouri	d6d76219b6	caches staked nodes computed from vote-accounts (#13929 )	2020-12-17 21:22:50 +00:00
Michael Vines	7143aaa89b	Clippy	2020-12-14 08:03:29 -08:00
behzad nouri	409fe3bca1	adds the instance token to crds-labels for node-instance crds-values (#14037 ) If a node "a" receives instance-info from node "b1" it will override any instance-info associated with "b1" pubkey in its crds table. This makes it less likely that when "b1" receives crds values from "a" (either through pull or push), it sees other instances of itself (because node "a" discarded them when it received "b1" instance info). In order for the crds table to contain all instance-info associated with the same pubkey at the same time, we need to add the instance tokens to the keys in the crds table (i.e. the CrdsValueLabel).	2020-12-10 17:01:55 +00:00
behzad nouri	1d267eae6b	std::process::exit to kill all threads	2020-12-09 10:24:23 -08:00
behzad nouri	895d7d6a65	removes RwLock on ClusterInfo.instance	2020-12-09 10:24:23 -08:00
behzad nouri	542198180a	pushes node-instance along with version early in gossip	2020-12-09 10:24:23 -08:00
behzad nouri	8cd5eb9863	checks for duplicate validator instances using gossip	2020-12-09 10:24:23 -08:00
behzad nouri	6706f2b3bb	removes recursive read-locks on gossip (#13973 ) ClusterInfo::tvu_peers acquires a read-lock on gossip: https://github.com/solana-labs/solana/blob/f0e934145/core/src/cluster_info.rs#L1171-L1185 and so, ClusterInfo::repair_peers is recursively locking gossip for read twice: https://github.com/solana-labs/solana/blob/f0e934145/core/src/cluster_info.rs#L1202-L1223 But std::sync::RwLock is not re-entrant (recursive).	2020-12-06 15:14:49 +00:00
behzad nouri	c3048b451d	samples repair peers using WeightedIndex (#13919 ) To output one random sample, weighted_best generates n random numbers: https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/weighted_shuffle.rs#L38-L63 WeightedIndex does so with only one random number: https://github.com/rust-random/rand/blob/eb02f0e46/src/distributions/weighted_index.rs#L223-L240 Additionally, if the index is already constructed, it only does O(log(n)) work in total, which can be achieved if RepairCache caches the weighted index: https://github.com/solana-labs/solana/blob/f751a5d4e/core/src/serve_repair.rs#L83 Also, the repair-peers code can be reorganized to have less redundant unlock-then-lock code.	2020-12-03 14:26:07 +00:00
Tyera Eulberg	10c81a2448	Remove rpc_banks from validator (#13882 ) * Remove rpc_banks from validator * Bump abi-digest	2020-12-02 03:25:09 +00:00
behzad nouri	26bf2b7e45	processes pull-request callers only once per unique caller (#13750 ) process_pull_requests acquires a write lock on crds table to update records timestamp for each of the pull-request callers: https://github.com/solana-labs/solana/blob/3087c9049/core/src/crds_gossip_pull.rs#L287-L300 However, pull-requests overlap a lot in callers and this function ends up doing a lot of redundant duplicate work. This commit obtains unique callers before acquiring an exclusive lock on crds table.	2020-11-22 17:51:14 +00:00
sakridge	c1eb350c47	Allow contact debug interval to be adjusted (#13737 )	2020-11-20 14:47:37 -08:00
behzad nouri	b58f69297f	makes crds fields private (#13703 ) Crds fields should maintain several invariants between themselves, so exposing them as public fields can be bug prone. In addition these invariants are asserted on every write: https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L138-L154 https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L239-L262 which adds extra instructions and is not optimal. Should these fields be private the asserts will be redundant.	2020-11-19 20:57:40 +00:00
behzad nouri	1ffab5de77	breaks prunes data into chunks to fit into packets (#13613 ) Validator logs show that prune messages are dropped because they exceed packet data size: https://github.com/solana-labs/solana/blob/f25c969ad/perf/src/packet.rs#L90-L92 This can exacerbate gossip traffic by redundantly increasing push messages across network. The workaround is to break prunes into smaller chunks and send over in multiple messages.	2020-11-19 16:38:01 +00:00
behzad nouri	5e8490ab9d	packs more crds-values in a single gossip packet (#13500 ) split_gossip_messages: https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L1536-L1574 splits crds-values into chunks to fit into a gossip packet. However it is using a global upper-bound for the header-size across all protocols: https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L90-L93 This can be wasteful as the specific gossip protocol can have smaller header than this upper-bound (e.g. Protocol::PushMessage is 170 bytes smaller). Adding more crds-values in one gossip packet can avoid the overheads of separate packets and reduce total number of bytes sent over the wire. This commit updates the splitting function to take a max-chunk-size argument. At call-site, this value is set to the size of the protocol which the values are sent over.	2020-11-15 18:23:59 +00:00
behzad nouri	cbea9ebc34	indexes nodes' contact infos in crds table (#13553 ) In several places in gossip code, the entire crds table is scanned only to filter out nodes' contact infos. Currently on mainnet, crds table is of size ~70k, while there are only ~470 nodes. So the full table scan is inefficient. Instead we may maintain an index of only nodes' contact infos.	2020-11-15 16:38:04 +00:00
behzad nouri	73ac104df2	propagates errors out of Packet::from_data (#13445 ) Packet::from_data is ignoring serialization errors: https://github.com/solana-labs/solana/blob/d08c3232e/sdk/src/packet.rs#L42-L48 This is likely never useful as the packet will be sent over the wire taking bandwidth but at the receiving end will either fail to deserialize or it will be invalid. This commit will propagate the errors out of the function to the call-site, allowing the call-site to handle the error.	2020-11-08 15:10:03 +00:00
behzad nouri	7f4debdad5	drops older gossip packets when load shedding (#13364 ) Gossip drops incoming packets when overloaded: https://github.com/solana-labs/solana/blob/f6a73098a/core/src/cluster_info.rs#L2462-L2475 However newer packets are dropped in favor of the older ones. This is probably not ideal as newer packets are more likely to contain more recent data, so dropping them will keep the validator state lagging.	2020-11-05 17:14:28 +00:00
behzad nouri	8f0796436a	shares the lock on gossip when processing prune messages (#13339 ) Processing prune messages acquires an exclusive lock on gossip: https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L1824-L1825 This can be reduced to a shared lock if active-sets are changed to use atomic bloom filters: https://github.com/solana-labs/solana/blob/55b0428ff/core/src/crds_gossip_push.rs#L50	2020-11-05 15:42:00 +00:00
behzad nouri	118ce47b97	measures processing time of each kind of gossip packets (#13366 )	2020-11-05 15:34:34 +00:00
behzad nouri	10fa4f45ab	uses thread-pool when handling push messages (#13338 ) From runtime profiles, the majority time of solana-listen thread: https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L2720 is spent handling push messages. The code here: https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L2272-L2364 may utilize the idle gossip thread-pool.	2020-11-04 19:15:58 +00:00
Michael Vines	df8dab9d2b	Native/builtin programs now receive an InvokeContext	2020-10-29 21:45:24 -07:00
behzad nouri	3738611f5c	adds more parallel processing to gossip packets handling (#12988 )	2020-10-29 15:17:19 +00:00
behzad nouri	ae91270961	implements ping-pong packets between nodes (#12794 ) https://hackerone.com/reports/991106 > It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker > can spoof IP address in UDP packet when sending PullRequest to the node. > There's no any validation if provided source IP address is not spoofed and > the node can send much larger PullResponse to victim's IP. As I checked, > PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means > that amplification is about 34x. This way an attacker can easily perform DDoS > attack both on Solana node and third-party server. > > To prevent it, need for example to implement ping-pong mechanism similar as > in Ethereum: Before accepting requests from remote client needs to validate > his IP. Local node sends Ping packet to the remote node and it needs to reply > with Pong packet that contains hash of matching Ping packet. Content of Ping > packet is unpredictable. If hash from Pong packet matches, local node can > remember IP where Ping packet was sent as correct and allow further > communication. > > More info: > https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof > https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol The commit adds a PingCache, which maintains records of remote nodes which have returned a valid response to a ping message, and on-the-fly ping messages pending a pong response from the remote node. When handling pull-requests, those from addresses which have not passed the ping-pong check are filtered out, and additionally ping packets are added for addresses which need to be (re)verified.	2020-10-28 17:03:02 +00:00
behzad nouri	4bfda3e766	marks pull request creation time only once per peer (#13113 ) mark_pull_request_creation time requires an exclusive lock on gossip: https://github.com/solana-labs/solana/blob/16944e218/core/src/cluster_info.rs#L1547-L1548 Current code is redundantly marking each peer once for each request. There are at most only 2 unique peers, whereas there are hundreds of requests per each. So the lock is acquired hundreds of times longer than necessary.	2020-10-26 17:11:31 +00:00
Michael Vines	a4956844bd	Update frozen_abi hashes The movement of files in sdk/ caused ABI hashes to change	2020-10-24 08:37:55 -07:00
behzad nouri	37c8842bcb	scans crds table in parallel for finding old labels (#13073 ) From runtime profiles, the majority time of ClusterInfo::handle_purge https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1605-L1626 is spent scanning crds table finding old labels: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/crds.rs#L175-L197 This can be done in parallel given that gossip thread-pool: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1637-L1641 is idle when handle_purge is invoked: https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1681	2020-10-23 14:17:37 +00:00
Justin Starry	c95f6c4b83	Remove spammy invalid rpc log (#13100 )	2020-10-23 07:05:29 +00:00
Justin Starry	8b0242a5d8	Allow nodes to advertise a different rpc address over gossip (#13053 ) * Allow nodes to advertise a different rpc address over gossip * Feedback	2020-10-22 03:31:48 +00:00
Michael Vines	959880db60	Remove unused pubkey::Pubkey imports	2020-10-21 19:08:13 -07:00
Michael Vines	7bc073defe	Run `codemod --extensions rs Pubkey::new_rand solana_sdk::pubkey::new_rand`	2020-10-21 19:08:13 -07:00
behzad nouri	75d62ca095	improves threads' utilization in processing gossip packets (#12962 ) ClusterInfo::process_packets handles incoming packets in a thread_pool: https://github.com/solana-labs/solana/blob/87311cce7/core/src/cluster_info.rs#L2118-L2134 However, profiling runtime shows that threads are not well utilized and a lot of the processing is done sequentially. This commit redistributes the work done in parallel. Testing on a gce cluster shows 20%+ improvement in processing gossip packets with much smaller variations.	2020-10-19 19:03:38 +00:00
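
Commit 5e8490ab9d above describes changing the splitting function to take a max-chunk-size argument, so that each protocol can pack values up to its own budget instead of a global upper bound. A minimal sketch of that idea follows. The function name `split_into_chunks`, the use of plain sizes in place of serialized crds-values, and the choice to skip oversized items are all assumptions made for illustration; this is not the code from the repository.

```rust
/// Hypothetical sketch: greedily packs item sizes (in bytes) into chunks
/// whose total size never exceeds `max_chunk_size`.
/// Items that are larger than `max_chunk_size` on their own are skipped,
/// since they could not fit in any single packet (an assumed policy).
fn split_into_chunks(sizes: &[usize], max_chunk_size: usize) -> Vec<Vec<usize>> {
    let mut chunks: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_size = 0usize;
    for &size in sizes {
        if size > max_chunk_size {
            continue; // cannot fit in any chunk
        }
        if current_size + size > max_chunk_size {
            // The current chunk is full: close it and start a new one.
            chunks.push(std::mem::take(&mut current));
            current_size = 0;
        }
        current.push(size);
        current_size += size;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

With a larger per-protocol budget the same values fit in fewer chunks, which is the saving the commit message points to: three 400-byte values need two chunks under a 1000-byte budget but only one under a 1200-byte budget.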

281 Commits
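
Commit ae91270961 above describes a PingCache that records which remote addresses have answered a ping with a matching pong, and which pings are still pending. The sketch below illustrates that bookkeeping only. The type `PingCacheSketch`, its methods, and the `token_hash` helper are hypothetical names; `DefaultHasher` stands in for a cryptographic hash and would not be suitable in practice; and the caller is assumed to supply an unpredictable token.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, HashSet};
use std::hash::{Hash, Hasher};
use std::net::SocketAddr;

/// Stand-in for the hash a remote node returns in its pong.
/// Assumption: a real implementation would use a cryptographic hash.
fn token_hash(token: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    token.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical, simplified ping cache: no expiry and no rate limiting.
#[derive(Default)]
struct PingCacheSketch {
    /// Pings sent and still waiting for a pong, keyed by remote address.
    pending: HashMap<SocketAddr, u64>,
    /// Addresses that returned a pong matching the pending ping.
    verified: HashSet<SocketAddr>,
}

impl PingCacheSketch {
    /// Records an outgoing ping carrying `token` (assumed unpredictable).
    fn ping(&mut self, addr: SocketAddr, token: u64) {
        self.pending.insert(addr, token);
    }

    /// Handles a pong; marks `addr` verified only if the hash matches.
    fn on_pong(&mut self, addr: SocketAddr, hash: u64) -> bool {
        let expected = self.pending.get(&addr).map(|&t| token_hash(t));
        if expected == Some(hash) {
            self.pending.remove(&addr);
            self.verified.insert(addr);
            true
        } else {
            false
        }
    }

    /// Pull requests from unverified addresses would be filtered out.
    fn is_verified(&self, addr: &SocketAddr) -> bool {
        self.verified.contains(addr)
    }
}
```

Because a spoofed source address never receives the ping, it cannot produce the matching pong, so large pull responses are only sent to addresses that have proven they can receive traffic.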