Commit Graph

241 Commits

Author SHA1 Message Date
behzad nouri 3738611f5c
adds more parallel processing to gossip packets handling (#12988) 2020-10-29 15:17:19 +00:00
behzad nouri ae91270961
implements ping-pong packets between nodes (#12794)
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
2020-10-28 17:03:02 +00:00
behzad nouri 4bfda3e766
marks pull request creation time only once per peer (#13113)
mark_pull_request_creation time requires an exclusive lock on gossip:
https://github.com/solana-labs/solana/blob/16944e218/core/src/cluster_info.rs#L1547-L1548
Current code is redundantly marking each peer once for each request.
There are at most only 2 unique peers, whereas there are hundreds of
requests per each. So the lock is acquired hundreds of time longer than
necessary.
2020-10-26 17:11:31 +00:00
Michael Vines a4956844bd Update frozen_abi hashes
The movement of files in sdk/ caused ABI hashes to change
2020-10-24 08:37:55 -07:00
behzad nouri 37c8842bcb
scans crds table in parallel for finding old labels (#13073)
From runtime profiles, the majority time of ClusterInfo::handle_purge
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1605-L1626
is spent scanning crds table finding old labels:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/crds.rs#L175-L197

This can be done in parallel given that gossip thread-pool:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1637-L1641
is idle when handle_purge is invoked:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1681
2020-10-23 14:17:37 +00:00
Justin Starry c95f6c4b83
Remove spammy invalid rpc log (#13100) 2020-10-23 07:05:29 +00:00
Justin Starry 8b0242a5d8
Allow nodes to advertise a different rpc address over gossip (#13053)
* Allow nodes to advertise a different rpc address over gossip

* Feedback
2020-10-22 03:31:48 +00:00
Michael Vines 959880db60 Remove unused pubkey::Pubkey imports 2020-10-21 19:08:13 -07:00
Michael Vines 7bc073defe Run `codemod --extensions rs Pubkey::new_rand solana_sdk::pubkey::new_rand` 2020-10-21 19:08:13 -07:00
behzad nouri 75d62ca095
improves threads' utilization in processing gossip packets (#12962)
ClusterInfo::process_packets handles incoming packets in a thread_pool:
https://github.com/solana-labs/solana/blob/87311cce7/core/src/cluster_info.rs#L2118-L2134

However, profiling runtime shows that threads are not well utilized and
a lot of the processing is done sequentially.

This commit redistributes the work done in parallel. Testing on a gce
cluster shows 20%+ improvement in processing gossip packets with much
smaller variations.
2020-10-19 19:03:38 +00:00
behzad nouri 48283161c3
passes through feature-set to gossip requests handling (#12878)
* passes through feature-set to down to gossip requests handling
* takes the feature-set from root_bank instead of working_bank
2020-10-15 20:54:21 +00:00
behzad nouri 05cf15a382
implements DataBudget using atomics (#12856) 2020-10-15 11:33:58 +00:00
sakridge 1f1eb9f26e
Add separate push queue to reduce push lock contention (#12713) 2020-10-13 18:10:25 -07:00
behzad nouri 1866521df6
retains hash value of outdated responses received from pull requests (#12513)
pull_response_fail_inserts has been increasing:
https://cdn.discordapp.com/attachments/478692221441409024/759096187587657778/pull_response_fail_insert.png
but for outdated values which fail to insert:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L332-L344
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds.rs#L104-L108
are not recorded anywhere, and so the next pull request may obtain the
same redundant payload again, unnecessary taking bandwidth.

This commit holds on to the hashes of failed-inserts for a while, similar
to purged_values:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L380
and filter them out for the next pull request:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L204
2020-10-01 00:39:22 +00:00
behzad nouri 537bbde22e
builds crds filters in parallel (#12360)
Based on run-time profiles, the majority time of new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.

This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
2020-09-29 23:06:02 +00:00
behzad nouri 0d5258b6d3
separates out ClusterInfo::{gossip,listen} thread-pools (#12535)
https://github.com/solana-labs/solana/pull/12402
moved gossip-work threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/cluster_info.rs#L2330-L2334
to ClusterInfo::new as a new field in the ClusterInfo struct:
https://github.com/solana-labs/solana/blob/35208c5ee/core/src/cluster_info.rs#L249
So that they can be shared between listen and gossip threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67

However, in testing https://github.com/solana-labs/solana/pull/12360
it turned out this will cause breakage:
https://buildkite.com/solana-labs/solana/builds/31646
https://buildkite.com/solana-labs/solana/builds/31651
https://buildkite.com/solana-labs/solana/builds/31655
Whereas with separate thread pools all is good. It might be the case
that one thread is slowing down the other by exhausting the thread-pool
whereas with separate thread-pools we get fair scheduling guarantees
from the os.

This commit reverts https://github.com/solana-labs/solana/pull/12402
and instead adds separate thread-pools for listen and gossip threads:
https://github.com/solana-labs/solana/blob/afd9bfc45/core/src/gossip_service.rs#L54-L67
2020-09-29 09:05:31 +00:00
Michael Vines 35f5f9fc7b Add feature set identifier to gossiped version information 2020-09-25 11:40:36 -07:00
behzad nouri 42f1ef8acb
moves gossip-work thread pool cons to ClusterInfo::new (#12402) 2020-09-24 18:36:31 +00:00
Michael Vines daae638781 Add --gossip-validator argument 2020-09-14 20:18:27 -07:00
carllin 0f0a2ddafe
Filter push/pulls from spies (#11620)
* Filter push/pulls from spies

* Don't pull from peers with shred version == 0, don't push to people with shred_version == 0

Co-authored-by: Carl <carl@solana.com>
2020-08-18 18:52:45 -07:00
Michael Vines d15173ad9d Address latest nightly clippy lints, but globally disable stable_sort_primitive 2020-08-17 22:36:10 -07:00
sakridge 54137e3446
Add incoming pull response counter (#11591) 2020-08-12 14:07:05 -07:00
carllin 1b238dd63e
Gossip log (#11555)
Co-authored-by: Carl <carl@solana.com>
2020-08-11 21:03:54 +00:00
anatoly yakovenko 713851b68d
filter out old gossip pull requests (#11448)
* init

* builds

* stats

* revert

* tests

* clippy

* add some jitter

* shorter jitter timer

* update

* fixup! update

* use saturating_sub

* fix filters
2020-08-11 06:26:42 -07:00
Greg Fitzgerald bad486823c
Add a client for BankForks (#10728)
Also:
* Use BanksClient in solana-tokens
2020-08-07 08:45:17 -06:00
Michael Vines eefcf484cb clippy 2020-08-03 18:35:15 +00:00
Ryo Onodera 39b3ac6a8d
Introduce automatic ABI maintenance mechanism (2/2; rollout) (#8012)
* Introduce automatic ABI maintenance mechanism (2/2; rollout)

* Fix stable clippy

* Change to symlink

* Freeze abi of Tower

* fmt...

* Improve dev-experience!

* Update BankSlotDelta

$ diff -u /tmp/abi8/*7dg6BreYxTuxiVz6aLvk3p2Z7GQk2cJqfGvC9h4FAoSj* /tmp/abi8/*9chBcbXVJ4fK7uGgydQzam5aHipaAKFw6V4LDFpjbE4w*
--- /tmp/abi8/bank__BankSlotDelta_frozen_abi__test_abi_digest_7dg6BreYxTuxiVz6aLvk3p2Z7GQk2cJqfGvC9h4FAoSj      2020-06-18 18:01:22.831228087 +0900
+++ /tmp/abi8/bank__BankSlotDelta_frozen_abi__test_abi_digest_9chBcbXVJ4fK7uGgydQzam5aHipaAKFw6V4LDFpjbE4w      2020-07-03 15:59:58.430695244 +0900
@@ -140,7 +140,7 @@
                                                         field u8
                                                             primitive u8
                                                         field solana_sdk::instruction::InstructionError
-                                                            enum InstructionError (variants = 34)
+                                                            enum InstructionError (variants = 35)
                                                                 variant(0) GenericError (unit)
                                                                 variant(1) InvalidArgument (unit)
                                                                 variant(2) InvalidInstructionData (unit)
@@ -176,6 +176,7 @@
                                                                 variant(31) CallDepth (unit)
                                                                 variant(32) MissingAccount (unit)
                                                                 variant(33) ReentrancyNotAllowed (unit)
+                                                                variant(34) MaxSeedLengthExceeded (unit)
                                                     variant(9) CallChainTooDeep (unit)
                                                     variant(10) MissingSignatureForFee (unit)
                                                     variant(11) InvalidAccountIndex (unit)

* Fix some merge conflicts...
2020-07-06 20:22:23 +09:00
sakridge d9b389f510
Reduce logging lines (#10835) 2020-06-29 15:57:28 -07:00
Greg Fitzgerald 6ee222363e
Move BankForks to solana_runtime (#10637)
* Move BankForks to solana_runtime

* Update imports
2020-06-17 15:27:03 +00:00
sakridge 1eca9b19ab
Entry verify cleanup and gossip counters (#10632)
* Add prune message counter

* Switch to us verification time to match other counters

* Add separate transaction/poh verify timing
2020-06-16 14:00:29 -07:00
anatoly yakovenko ba83e4ca50
Fix fannout gossip bench (#10509)
* Gossip benchmark

* Rayon tweaking

* push pulls

* fanout to max nodes

* fixup! fanout to max nodes

* fixup! fixup! fanout to max nodes

* update

* multi vote test

* fixup prune

* fast propagation

* fixups

* compute up to 95%

* test for specific tx

* stats

* stats

* fixed tests

* rename

* track a lagging view of which nodes have the local node in their active set in the local received_cache

* test fixups

* dups are old now

* dont prune your own origin

* send vote to tpu

* tests

* fixed tests

* fixed test

* update

* ignore scale

* lint

* fixup

* fixup

* fixup

* cleanup

Co-authored-by: Stephen Akridge <sakridge@gmail.com>
2020-06-13 22:03:38 -07:00
sakridge 4c140acb3b
ClusterInfo cleanup (#10504)
automerge
2020-06-10 17:00:17 -07:00
sakridge 6eb5ef6ac7
Add back missing pull_response success counter (#10491) 2020-06-10 09:17:57 -07:00
sakridge ecb6959720
Optimize process pull responses (#10460)
* Batch process pull responses

* Generate pull requests at 1/2 rate

* Do filtering work of process_pull_response in read lock

Only take write lock to insert if needed.
2020-06-09 17:08:13 -07:00
anatoly yakovenko 832d324a23
Revert "Gossip PullRequests tend to return a lot of duplicates. (#10326)" (#10455)
This reverts commit 31e20eff82.
2020-06-09 07:27:00 -07:00
Kristofer Peterson e23340d89e
Clippy cleanup for all targets and nighly rust (also support 1.44.0) (#10445)
* address warnings from 'rustup run beta cargo clippy --workspace'

minor refactoring in:
- cli/src/cli.rs
- cli/src/offline/blockhash_query.rs
- logger/src/lib.rs
- runtime/src/accounts_db.rs

expect some performance improvement AccountsDB::clean_accounts()

* address warnings from 'rustup run beta cargo clippy --workspace --tests'

* address warnings from 'rustup run nightly cargo clippy --workspace --all-targets'

* rustfmt

* fix warning stragglers

* properly fix clippy warnings test_vote_subscribe()
replace ref-to-arc with ref parameters where arc not cloned

* Remove lock around JsonRpcRequestProcessor (#10417)

automerge

* make ancestors parameter optional to avoid forcing construction of empty hash maps

Co-authored-by: Greg Fitzgerald <greg@solana.com>
2020-06-09 09:38:14 +09:00
Greg Fitzgerald af8c21c559
Remove lock around JsonRpcRequestProcessor (#10417)
automerge
2020-06-07 20:54:03 -07:00
sakridge 0645a0c96d
Gossip cleanup remove duplicate gossip metrics and name worker threads (#10435)
Refactor into functions
2020-06-06 15:05:45 -07:00
sakridge 3d2230f1a9
Add pull request count metrics (#10421) 2020-06-05 09:36:31 -07:00
anatoly yakovenko 31e20eff82
Gossip PullRequests tend to return a lot of duplicates. (#10326)
* filter messages that are likely to be pushed from the response

* tests

* tests

* wait to start filtering responses, and push stats to influx

* wait to start filtering responses, and push stats to influx

* reduce the timers to match the publish self timeout

* fmt

* fmt
2020-06-05 08:01:45 -07:00
sakridge ef37b82ffa
More cluster stats and add epoch stakes cache in retransmit stage (#10345)
* More cluster info metrics for push request/response counts

* Cache staked peers for the epoch
2020-06-01 08:37:54 -07:00
Kristofer Peterson fb4d8e1f62
cleanup clippy tests (#10172)
automerge
2020-05-29 00:26:06 -07:00
sakridge 3f508b37fd
Skip gossip requests with different shred version and split lock (#10240) 2020-05-28 11:38:13 -07:00
sakridge 7ebd8ee531
Cluster info metrics (#10215) 2020-05-25 15:03:34 -07:00
carllin 3aae98c8be
Add switching vote instruction (#10197)
* Add switching vote

* Make sure vote size stays under gossip limit

Co-authored-by: Carl <carl@solana.com>
2020-05-24 15:38:35 -07:00
Kristofer Peterson 58ef02f02b
9951 clippy errors in the test suite (#10030)
automerge
2020-05-15 09:35:43 -07:00
Jack May eb1acaf927
Remove archiver and storage program (#9992)
automerge
2020-05-14 18:22:47 -07:00
Michael Vines 4e4a21f9b7
`solana-gossip spy` can now specify a shred version (#10040) 2020-05-13 19:37:40 -07:00
Michael Vines 2521f75c18
Advertise node software version in gossip (#9981)
* Advertise node version in gossip

* Remove solana_clap_utils::version! macro
2020-05-11 15:02:01 -07:00
Michael Vines 16ddd001f6
Gossip no longer pushes/pulls from nodes with a different shred version (#9868) 2020-05-05 20:15:19 -07:00