Commit Graph

37 Commits

Author SHA1 Message Date
behzad nouri b58f69297f
makes crds fields private (#13703)
Crds fields should maintain several invariants between themselves, so
exposing them as public fields can be bug prone. In addition these
invariants are asserted on every write:
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L138-L154
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L239-L262
which adds extra instructions and is not optimal. Should these fields be
private the asserts will be redundant.
2020-11-19 20:57:40 +00:00
behzad nouri 8f0796436a
shares the lock on gossip when processing prune messages (#13339)
Processing prune messages acquires an exclusive lock on gossip:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L1824-L1825
This can be reduced to a shared lock if active-sets are changed to use
atomic bloom filters:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/crds_gossip_push.rs#L50
2020-11-05 15:42:00 +00:00
behzad nouri ae91270961
implements ping-pong packets between nodes (#12794)
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
2020-10-28 17:03:02 +00:00
behzad nouri 37c8842bcb
scans crds table in parallel for finding old labels (#13073)
From runtime profiles, the majority time of ClusterInfo::handle_purge
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1605-L1626
is spent scanning crds table finding old labels:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/crds.rs#L175-L197

This can be done in parallel given that gossip thread-pool:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1637-L1641
is idle when handle_purge is invoked:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1681
2020-10-23 14:17:37 +00:00
Michael Vines 7bc073defe Run `codemod --extensions rs Pubkey::new_rand solana_sdk::pubkey::new_rand` 2020-10-21 19:08:13 -07:00
sakridge 1f1eb9f26e
Add separate push queue to reduce push lock contention (#12713) 2020-10-13 18:10:25 -07:00
behzad nouri a5c6a78f6d
filters out inactive nodes from push options (#12674)
* filters out inactive nodes from push options

https://github.com/solana-labs/solana/pull/12620
patched the DDOS issue with nodes which go offline:
https://github.com/solana-labs/solana/issues/12409

However, offline nodes still see (much lesser) traffic spike, likely
because no origins are pruned from their bloom filter in active set:
https://github.com/solana-labs/solana/blob/aaf3790d8/core/src/crds_gossip_push.rs#L276-L286
and so multiple nodes push redundant duplicate messages to them
simultaneously:
https://github.com/solana-labs/solana/blob/aaf3790d8/core/src/crds_gossip_push.rs#L254-L255

This commit will filter out inactive peers from potential push targets
entirely. To mitigate eclipse attacks, staked nodes are retried
periodically.

* uses current timestamp in test/crds_gossip
2020-10-06 13:48:32 +00:00
behzad nouri 2c669f65f1
limits number of threads in core/tests/crds_gossip.rs (#12615)
crds_gossip tests start large networks, which with large thread-pools
will exhaust system resources, causing failures in ci tests:
https://buildkite.com/solana-labs/solana/builds/31953

The commit limits size of thread-pools in the test.
2020-10-02 16:55:44 +00:00
behzad nouri 1866521df6
retains hash value of outdated responses received from pull requests (#12513)
pull_response_fail_inserts has been increasing:
https://cdn.discordapp.com/attachments/478692221441409024/759096187587657778/pull_response_fail_insert.png
but for outdated values which fail to insert:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L332-L344
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds.rs#L104-L108
are not recorded anywhere, and so the next pull request may obtain the
same redundant payload again, unnecessary taking bandwidth.

This commit holds on to the hashes of failed-inserts for a while, similar
to purged_values:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L380
and filter them out for the next pull request:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L204
2020-10-01 00:39:22 +00:00
behzad nouri 537bbde22e
builds crds filters in parallel (#12360)
Based on run-time profiles, the majority time of new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.

This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
2020-09-29 23:06:02 +00:00
Michael Vines daae638781 Add --gossip-validator argument 2020-09-14 20:18:27 -07:00
anatoly yakovenko 713851b68d
filter out old gossip pull requests (#11448)
* init

* builds

* stats

* revert

* tests

* clippy

* add some jitter

* shorter jitter timer

* update

* fixup! update

* use saturating_sub

* fix filters
2020-08-11 06:26:42 -07:00
sakridge ecb6959720
Optimize process pull responses (#10460)
* Batch process pull responses

* Generate pull requests at 1/2 rate

* Do filtering work of process_pull_response in read lock

Only take write lock to insert if needed.
2020-06-09 17:08:13 -07:00
anatoly yakovenko 832d324a23
Revert "Gossip PullRequests tend to return a lot of duplicates. (#10326)" (#10455)
This reverts commit 31e20eff82.
2020-06-09 07:27:00 -07:00
anatoly yakovenko 31e20eff82
Gossip PullRequests tend to return a lot of duplicates. (#10326)
* filter messages that are likely to be pushed from the response

* tests

* tests

* wait to start filtering responses, and push stats to influx

* wait to start filtering responses, and push stats to influx

* reduce the timers to match the publish self timeout

* fmt

* fmt
2020-06-05 08:01:45 -07:00
sakridge 3f508b37fd
Skip gossip requests with different shred version and split lock (#10240) 2020-05-28 11:38:13 -07:00
sakridge 7ebd8ee531
Cluster info metrics (#10215) 2020-05-25 15:03:34 -07:00
Kristofer Peterson 58ef02f02b
9951 clippy errors in the test suite (#10030)
automerge
2020-05-15 09:35:43 -07:00
Sagar Dhawan fa00803fbf
Filter old CrdsValues received via Pull Responses in Gossip (#8150)
* Add CrdsValue timeout checks on Pull Responses

* Allow older values to enter Crds as long as a ContactInfo exists

* Allow staked contact infos to be inserted into crds if they haven't expired

* Try and handle oveflows

* Fix test

* Some comments

* Fix compile

* fix test deadlock

* Add a test for processing timed out values received via pull response
2020-02-07 12:38:24 -08:00
anatoly yakovenko b150da837a Use epoch as the gossip purge timeout for staked nodes. (#7005)
automerge
2019-11-20 11:25:18 -08:00
Sagar Dhawan f108f483b7
Remove Blobs and switch to Packets (#6937)
* Remove Blobs and switch to Packets

* Fix some gossip messages not respecting MTU size

* Failure to serialize is not fatal

* Add log macros

* Remove unused extern

* Apparently macro use is required

* Explicitly scope macro

* Fix test compile
2019-11-14 10:24:53 -08:00
Sagar Dhawan 568475e2db
Fix incorrectly signed CrdsValues (#6696) 2019-11-03 10:07:51 -08:00
Michael Vines 3450b9a44d
Rename solana to solana-core (#5583) 2019-08-21 10:23:33 -07:00
Sagar Dhawan 4ee212ae4c
Coalesce gossip pull requests and serve them in batches (#5501)
* Coalesce gossip pull requests and serve them in batches

* batch all filters and immediately respond to messages in gossip

* Fix tests

* make download_from_replicator perform a greedy recv
2019-08-15 17:04:45 -07:00
Sagar Dhawan 1d0608200c
Restore blob size fix (#5516)
* Revert "Revert "Fix gossip messages growing beyond blob size  (#5460)" (#5512)"

This reverts commit 97d57d168b.

* Fix Crds filters
2019-08-13 18:04:14 -07:00
Sagar Dhawan 97d57d168b Revert "Fix gossip messages growing beyond blob size (#5460)" (#5512)
This reverts commit a8eb0409b7.
2019-08-13 10:29:26 -07:00
Sagar Dhawan a8eb0409b7
Fix gossip messages growing beyond blob size (#5460)
* fixed bloom filter math

* Add split each pull request into multiple pulls with different filters

* Rework CrdsFilter to generate all possible masks to cover the keyspace

* Limit the bloom sizes such that each pull request is no larger than mtu
2019-08-12 13:51:29 -07:00
Pankaj Garg b05b42d74d
Reduce max blob size (#5345)
* Reduce max blob size

* ignore test_star_network_push_rstar_200
2019-07-30 22:15:07 -07:00
Justin Starry bf5bce50a4
Fix stake pruning test (#5124) 2019-07-16 13:20:03 -04:00
Justin Starry 861d6468ca Stake weighted pruning for the gossip network (#4769)
* Stake weighted pruning

* Fix compile error

* Fix clippy errors

* Add helper for creating a connected staked network

* Bug fixes and test groundwork

* Small refactor

* Anatoly's feedback and tests

* Doc updates

* @rob-solana's feedback

* Fix test bug and add log trace

* @rob-solana's feedback
2019-06-26 00:30:16 -07:00
Sathish 182096dc1a
Create bank snapshots (#4244)
* Revert "Revert "Create bank snapshots (#3671)" (#4243)"

This reverts commit 81fa69d347.

* keep saved and unsaved copies of status cache

* fix format check

* bench for status cache serialize

* misc cleanup

* remove appendvec storage on purge

* fix accounts restore

* cleanup

* Pass snapshot path as args

* Fix clippy
2019-05-30 21:31:35 -07:00
Sagar Dhawan 335dfdc4d5
Fix Gossip skipping push for some values (#4463)
* Make gossip skip over values from Pruned nodes

* Add test and init blooms to contain the origin
2019-05-28 18:39:40 -07:00
Rob Walker 81fa69d347
Revert "Create bank snapshots (#3671)" (#4243)
This reverts commit abf2b300da.
2019-05-09 19:27:27 -07:00
Sathish abf2b300da Create bank snapshots (#3671)
* Be able to create bank snapshots

* fix clippy

* load snapshot on start

* regenerate account index from the storage

* Remove rc feature dependency

* cleanup

* save snapshot for slot 0
2019-05-09 19:27:06 -07:00
carllin b8fd51e97d
Add new gossip structure for supporting repairs (#4205)
* Add Epoch Slots to gossip

* Add new gossip structure to support Repair

* remove unnecessary clones

* Setup dummy fast repair in repair_service

* PR comments
2019-05-08 13:50:32 -07:00
Greg Fitzgerald fcef54d062 Add a constructor to generate random pubkeys 2019-03-31 16:23:18 -06:00
Rob Walker c70412d7bb
move core tests to core (#3355)
* move core tests to core

* remove window

* fix up flaky tests

* test_entryfication needs a singly-threaded banking_stage

* move core benches to core

* remove unnecessary dependencies

* remove core as a member for now, test it like runtime

* stop running tests twice

* remove duplicate runs of tests in perf
2019-03-18 22:08:21 -07:00