Commit Graph

49 Commits

Author SHA1 Message Date
behzad nouri e867d7f3b8
removes Crds::lookup and lookup_versioned (#17438) 2021-05-24 18:21:54 +00:00
behzad nouri 9d112cf41f
encapsulates purged values bookkeeping into crds module (#17265)
For all code paths (gossip push, pull, purge, etc) that remove or
override a crds value, it is necessary to record hash of values purged
from crds table, in order to exclude them from subsequent pull-requests;
otherwise the next pull request will likely return outdated values,
wasting bandwidth:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491

Currently this is done all over the place in multiple modules, and this
has caused bugs in the past where purged values were not recorded.

This commit encapsulated this bookkeeping into crds module, so that any
code path which removes or overrides a crds value, also records the hash
of purged value in-place.
2021-05-24 13:47:21 +00:00
behzad nouri 2adce67260
extends crds values timeouts if stakes are unknown (#17261)
If stakes are unknown, then timeouts will be short, resulting in values
being purged from the crds table, and consequently higher pull-response
load when they are obtained again from gossip. In particular, this slows
down validator start where almost all values obtained from entrypoint
are immediately discarded.
2021-05-21 15:55:22 +00:00
behzad nouri 0e646d10bb
prunes received-cache only once per unique owner's key (#17039) 2021-05-13 13:50:16 +00:00
behzad nouri 22c02b917e reads gossip push messages off crds ordinal index
Having an ordinal index on crds values based on insert order allows to
efficiently filter values using a cursor. In particular
CrdsGossipPush::push_messages hash-map can be replaced with a cursor,
saving on the bookkeepings, purging, etc
2021-05-09 22:40:41 +00:00
behzad nouri 7cea2c4466 validates gossip addresses before sending pull-requests
IP addresses need to be validated before sending packets to them.
This commit, sends a ping packet to nodes before any pull requests.
Pull requests are then only sent to the nodes which have responded with
the correct hash of their respective ping packet.
2021-05-03 18:21:06 +00:00
behzad nouri 25054bfd35
retains peer's contact-info when making pull requests (#16715)
ClusterInfo::new_pull_requests has to lookup contact-infos:
https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/cluster_info.rs#L1663-L1673

when it was already available when making pull requests:
https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/crds_gossip_pull.rs#L232
2021-04-28 13:19:12 +00:00
behzad nouri f2865dfd63
requires stakes for propagating crds values through gossip (#15561) 2021-03-12 15:50:14 +00:00
Trent Nelson 7f7370c306 Re-allow clippy::integer_arithmetic at crate-level 2021-02-17 13:55:08 -07:00
Tom Parker-Shemilt 01230a0105
Remove serial_test_derive dependency (#14891) 2021-01-28 22:35:31 -07:00
behzad nouri 691031fefd
limits number of crds values returned when responding to pull requests (#13739)
Crds values buffered when responding to pull-requests can be very large taking a lot of memory.
Added a limit for number of buffered crds values based on outbound data budget.
2020-12-18 18:45:12 +00:00
Michael Vines 7143aaa89b Clippy 2020-12-14 08:03:29 -08:00
behzad nouri b58f69297f
makes crds fields private (#13703)
Crds fields should maintain several invariants between themselves, so
exposing them as public fields can be bug prone. In addition these
invariants are asserted on every write:
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L138-L154
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L239-L262
which adds extra instructions and is not optimal. Should these fields be
private the asserts will be redundant.
2020-11-19 20:57:40 +00:00
behzad nouri 8f0796436a
shares the lock on gossip when processing prune messages (#13339)
Processing prune messages acquires an exclusive lock on gossip:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/cluster_info.rs#L1824-L1825
This can be reduced to a shared lock if active-sets are changed to use
atomic bloom filters:
https://github.com/solana-labs/solana/blob/55b0428ff/core/src/crds_gossip_push.rs#L50
2020-11-05 15:42:00 +00:00
behzad nouri ae91270961
implements ping-pong packets between nodes (#12794)
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
2020-10-28 17:03:02 +00:00
behzad nouri 37c8842bcb
scans crds table in parallel for finding old labels (#13073)
From runtime profiles, the majority time of ClusterInfo::handle_purge
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1605-L1626
is spent scanning crds table finding old labels:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/crds.rs#L175-L197

This can be done in parallel given that gossip thread-pool:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1637-L1641
is idle when handle_purge is invoked:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1681
2020-10-23 14:17:37 +00:00
Michael Vines 7bc073defe Run `codemod --extensions rs Pubkey::new_rand solana_sdk::pubkey::new_rand` 2020-10-21 19:08:13 -07:00
sakridge 1f1eb9f26e
Add separate push queue to reduce push lock contention (#12713) 2020-10-13 18:10:25 -07:00
behzad nouri a5c6a78f6d
filters out inactive nodes from push options (#12674)
* filters out inactive nodes from push options

https://github.com/solana-labs/solana/pull/12620
patched the DDOS issue with nodes which go offline:
https://github.com/solana-labs/solana/issues/12409

However, offline nodes still see (much lesser) traffic spike, likely
because no origins are pruned from their bloom filter in active set:
https://github.com/solana-labs/solana/blob/aaf3790d8/core/src/crds_gossip_push.rs#L276-L286
and so multiple nodes push redundant duplicate messages to them
simultaneously:
https://github.com/solana-labs/solana/blob/aaf3790d8/core/src/crds_gossip_push.rs#L254-L255

This commit will filter out inactive peers from potential push targets
entirely. To mitigate eclipse attacks, staked nodes are retried
periodically.

* uses current timestamp in test/crds_gossip
2020-10-06 13:48:32 +00:00
behzad nouri 2c669f65f1
limits number of threads in core/tests/crds_gossip.rs (#12615)
crds_gossip tests start large networks, which with large thread-pools
will exhaust system resources, causing failures in ci tests:
https://buildkite.com/solana-labs/solana/builds/31953

The commit limits size of thread-pools in the test.
2020-10-02 16:55:44 +00:00
behzad nouri 1866521df6
retains hash value of outdated responses received from pull requests (#12513)
pull_response_fail_inserts has been increasing:
https://cdn.discordapp.com/attachments/478692221441409024/759096187587657778/pull_response_fail_insert.png
but for outdated values which fail to insert:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L332-L344
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds.rs#L104-L108
are not recorded anywhere, and so the next pull request may obtain the
same redundant payload again, unnecessary taking bandwidth.

This commit holds on to the hashes of failed-inserts for a while, similar
to purged_values:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L380
and filter them out for the next pull request:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L204
2020-10-01 00:39:22 +00:00
behzad nouri 537bbde22e
builds crds filters in parallel (#12360)
Based on run-time profiles, the majority time of new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.

This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
2020-09-29 23:06:02 +00:00
Michael Vines daae638781 Add --gossip-validator argument 2020-09-14 20:18:27 -07:00
anatoly yakovenko 713851b68d
filter out old gossip pull requests (#11448)
* init

* builds

* stats

* revert

* tests

* clippy

* add some jitter

* shorter jitter timer

* update

* fixup! update

* use saturating_sub

* fix filters
2020-08-11 06:26:42 -07:00
sakridge ecb6959720
Optimize process pull responses (#10460)
* Batch process pull responses

* Generate pull requests at 1/2 rate

* Do filtering work of process_pull_response in read lock

Only take write lock to insert if needed.
2020-06-09 17:08:13 -07:00
anatoly yakovenko 832d324a23
Revert "Gossip PullRequests tend to return a lot of duplicates. (#10326)" (#10455)
This reverts commit 31e20eff82.
2020-06-09 07:27:00 -07:00
anatoly yakovenko 31e20eff82
Gossip PullRequests tend to return a lot of duplicates. (#10326)
* filter messages that are likely to be pushed from the response

* tests

* tests

* wait to start filtering responses, and push stats to influx

* wait to start filtering responses, and push stats to influx

* reduce the timers to match the publish self timeout

* fmt

* fmt
2020-06-05 08:01:45 -07:00
sakridge 3f508b37fd
Skip gossip requests with different shred version and split lock (#10240) 2020-05-28 11:38:13 -07:00
sakridge 7ebd8ee531
Cluster info metrics (#10215) 2020-05-25 15:03:34 -07:00
Kristofer Peterson 58ef02f02b
9951 clippy errors in the test suite (#10030)
automerge
2020-05-15 09:35:43 -07:00
Sagar Dhawan fa00803fbf
Filter old CrdsValues received via Pull Responses in Gossip (#8150)
* Add CrdsValue timeout checks on Pull Responses

* Allow older values to enter Crds as long as a ContactInfo exists

* Allow staked contact infos to be inserted into crds if they haven't expired

* Try and handle oveflows

* Fix test

* Some comments

* Fix compile

* fix test deadlock

* Add a test for processing timed out values received via pull response
2020-02-07 12:38:24 -08:00
anatoly yakovenko b150da837a Use epoch as the gossip purge timeout for staked nodes. (#7005)
automerge
2019-11-20 11:25:18 -08:00
Sagar Dhawan f108f483b7
Remove Blobs and switch to Packets (#6937)
* Remove Blobs and switch to Packets

* Fix some gossip messages not respecting MTU size

* Failure to serialize is not fatal

* Add log macros

* Remove unused extern

* Apparently macro use is required

* Explicitly scope macro

* Fix test compile
2019-11-14 10:24:53 -08:00
Sagar Dhawan 568475e2db
Fix incorrectly signed CrdsValues (#6696) 2019-11-03 10:07:51 -08:00
Michael Vines 3450b9a44d
Rename solana to solana-core (#5583) 2019-08-21 10:23:33 -07:00
Sagar Dhawan 4ee212ae4c
Coalesce gossip pull requests and serve them in batches (#5501)
* Coalesce gossip pull requests and serve them in batches

* batch all filters and immediately respond to messages in gossip

* Fix tests

* make download_from_replicator perform a greedy recv
2019-08-15 17:04:45 -07:00
Sagar Dhawan 1d0608200c
Restore blob size fix (#5516)
* Revert "Revert "Fix gossip messages growing beyond blob size  (#5460)" (#5512)"

This reverts commit 97d57d168b.

* Fix Crds filters
2019-08-13 18:04:14 -07:00
Sagar Dhawan 97d57d168b Revert "Fix gossip messages growing beyond blob size (#5460)" (#5512)
This reverts commit a8eb0409b7.
2019-08-13 10:29:26 -07:00
Sagar Dhawan a8eb0409b7
Fix gossip messages growing beyond blob size (#5460)
* fixed bloom filter math

* Add split each pull request into multiple pulls with different filters

* Rework CrdsFilter to generate all possible masks to cover the keyspace

* Limit the bloom sizes such that each pull request is no larger than mtu
2019-08-12 13:51:29 -07:00
Pankaj Garg b05b42d74d
Reduce max blob size (#5345)
* Reduce max blob size

* ignore test_star_network_push_rstar_200
2019-07-30 22:15:07 -07:00
Justin Starry bf5bce50a4
Fix stake pruning test (#5124) 2019-07-16 13:20:03 -04:00
Justin Starry 861d6468ca Stake weighted pruning for the gossip network (#4769)
* Stake weighted pruning

* Fix compile error

* Fix clippy errors

* Add helper for creating a connected staked network

* Bug fixes and test groundwork

* Small refactor

* Anatoly's feedback and tests

* Doc updates

* @rob-solana's feedback

* Fix test bug and add log trace

* @rob-solana's feedback
2019-06-26 00:30:16 -07:00
Sathish 182096dc1a
Create bank snapshots (#4244)
* Revert "Revert "Create bank snapshots (#3671)" (#4243)"

This reverts commit 81fa69d347.

* keep saved and unsaved copies of status cache

* fix format check

* bench for status cache serialize

* misc cleanup

* remove appendvec storage on purge

* fix accounts restore

* cleanup

* Pass snapshot path as args

* Fix clippy
2019-05-30 21:31:35 -07:00
Sagar Dhawan 335dfdc4d5
Fix Gossip skipping push for some values (#4463)
* Make gossip skip over values from Pruned nodes

* Add test and init blooms to contain the origin
2019-05-28 18:39:40 -07:00
Rob Walker 81fa69d347
Revert "Create bank snapshots (#3671)" (#4243)
This reverts commit abf2b300da.
2019-05-09 19:27:27 -07:00
Sathish abf2b300da Create bank snapshots (#3671)
* Be able to create bank snapshots

* fix clippy

* load snapshot on start

* regenerate account index from the storage

* Remove rc feature dependency

* cleanup

* save snapshot for slot 0
2019-05-09 19:27:06 -07:00
carllin b8fd51e97d
Add new gossip structure for supporting repairs (#4205)
* Add Epoch Slots to gossip

* Add new gossip structure to support Repair

* remove unnecessary clones

* Setup dummy fast repair in repair_service

* PR comments
2019-05-08 13:50:32 -07:00
Greg Fitzgerald fcef54d062 Add a constructor to generate random pubkeys 2019-03-31 16:23:18 -06:00
Rob Walker c70412d7bb
move core tests to core (#3355)
* move core tests to core

* remove window

* fix up flaky tests

* test_entryfication needs a singly-threaded banking_stage

* move core benches to core

* remove unnecessary dependencies

* remove core as a member for now, test it like runtime

* stop running tests twice

* remove duplicate runs of tests in perf
2019-03-18 22:08:21 -07:00