Commit Graph

58 Commits

Author SHA1 Message Date
Michael Vines cbffab7850 Upgrade to Rust v1.49.0 2021-01-23 19:16:36 -08:00
behzad nouri 8e581601d6
patches crds vote-index assignment bug (#14438)
If tower is full, old votes are evicted from the front of the deque:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L367-L373
whereas recent votes if expire are evicted from the back:
https://github.com/solana-labs/solana/blob/2074e407c/programs/vote/src/vote_state/mod.rs#L529-L537

As a result, from a single tower_index scalar, we cannot infer which crds-vote
should be overwritten:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L576

In addition there is an off by one bug in the existing code. tower_index is
bounded by MAX_LOCKOUT_HISTORY - 1:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/consensus.rs#L382
So, it is at most 30, whereas MAX_VOTES is 32:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L29
Which means that this branch is never taken:
https://github.com/solana-labs/solana/blob/2074e407c/core/src/crds_value.rs#L590-L593
so crds table alwasys keeps 29 **oldest** votes by wallclock, and then
only overrides the 30st one each time. (i.e a tally of only two most
recent votes).
2021-01-21 13:08:07 +00:00
behzad nouri 691031fefd
limits number of crds values returned when responding to pull requests (#13739)
Crds values buffered when responding to pull-requests can be very large taking a lot of memory.
Added a limit for number of buffered crds values based on outbound data budget.
2020-12-18 18:45:12 +00:00
Michael Vines 7143aaa89b Clippy 2020-12-14 08:03:29 -08:00
behzad nouri a8c29505f0
sanitizes bloom filters to avoid division by zero (#13714)
Pull requests received over the wire can cause a validator to panic
because of division by zero in bloom filters:
https://github.com/solana-labs/solana/blob/af08ba93e/runtime/src/bloom.rs#L86-L88
2020-11-19 23:35:22 +00:00
behzad nouri b58f69297f
makes crds fields private (#13703)
Crds fields should maintain several invariants between themselves, so
exposing them as public fields can be bug prone. In addition these
invariants are asserted on every write:
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L138-L154
https://github.com/solana-labs/solana/blob/9668dd85d/core/src/crds.rs#L239-L262
which adds extra instructions and is not optimal. Should these fields be
private the asserts will be redundant.
2020-11-19 20:57:40 +00:00
behzad nouri cbea9ebc34
indexes nodes' contact infos in crds table (#13553)
In several places in gossip code, the entire crds table is scanned only
to filter out nodes' contact infos. Currently on mainnet, crds table is
of size ~70k, while there are only ~470 nodes. So the full table scan is
inefficient. Instead we may maintain an index of only nodes' contact
infos.
2020-11-15 16:38:04 +00:00
behzad nouri 4e4e12b384
filters out offline nodes from pull options (#13533)
Inactive nodes are still observing incoming gossip traffic:
https://discord.com/channels/428295358100013066/670512312339398668/776140351291260968
likely because of pull-requests.

Previous related issues and commits:
https://github.com/solana-labs/solana/issues/12409
https://github.com/solana-labs/solana/pull/12620
https://github.com/solana-labs/solana/pull/12674

This commit implements same logic as
https://github.com/solana-labs/solana/pull/12674
to exclude inactive nodes from pull options, with the same periodic
retry logic for offline staked nodes in order to mitigate eclipse
attack.
2020-11-12 16:09:37 +00:00
behzad nouri ae91270961
implements ping-pong packets between nodes (#12794)
https://hackerone.com/reports/991106

> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol

The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.

When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
2020-10-28 17:03:02 +00:00
behzad nouri 37c8842bcb
scans crds table in parallel for finding old labels (#13073)
From runtime profiles, the majority time of ClusterInfo::handle_purge
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1605-L1626
is spent scanning crds table finding old labels:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/crds.rs#L175-L197

This can be done in parallel given that gossip thread-pool:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1637-L1641
is idle when handle_purge is invoked:
https://github.com/solana-labs/solana/blob/0776fa05c/core/src/cluster_info.rs#L1681
2020-10-23 14:17:37 +00:00
Michael Vines 17c391121a Run `codemod --extensions rs Hash::new_rand solana_sdk:#️⃣:new_rand` 2020-10-21 19:08:13 -07:00
Michael Vines 7bc073defe Run `codemod --extensions rs Pubkey::new_rand solana_sdk::pubkey::new_rand` 2020-10-21 19:08:13 -07:00
behzad nouri a5c6a78f6d
filters out inactive nodes from push options (#12674)
* filters out inactive nodes from push options

https://github.com/solana-labs/solana/pull/12620
patched the DDOS issue with nodes which go offline:
https://github.com/solana-labs/solana/issues/12409

However, offline nodes still see (much lesser) traffic spike, likely
because no origins are pruned from their bloom filter in active set:
https://github.com/solana-labs/solana/blob/aaf3790d8/core/src/crds_gossip_push.rs#L276-L286
and so multiple nodes push redundant duplicate messages to them
simultaneously:
https://github.com/solana-labs/solana/blob/aaf3790d8/core/src/crds_gossip_push.rs#L254-L255

This commit will filter out inactive peers from potential push targets
entirely. To mitigate eclipse attacks, staked nodes are retried
periodically.

* uses current timestamp in test/crds_gossip
2020-10-06 13:48:32 +00:00
behzad nouri 1866521df6
retains hash value of outdated responses received from pull requests (#12513)
pull_response_fail_inserts has been increasing:
https://cdn.discordapp.com/attachments/478692221441409024/759096187587657778/pull_response_fail_insert.png
but for outdated values which fail to insert:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L332-L344
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds.rs#L104-L108
are not recorded anywhere, and so the next pull request may obtain the
same redundant payload again, unnecessary taking bandwidth.

This commit holds on to the hashes of failed-inserts for a while, similar
to purged_values:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L380
and filter them out for the next pull request:
https://github.com/solana-labs/solana/blob/a5c3fc14b3/core/src/crds_gossip_pull.rs#L204
2020-10-01 00:39:22 +00:00
behzad nouri 537bbde22e
builds crds filters in parallel (#12360)
Based on run-time profiles, the majority time of new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.

This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
2020-09-29 23:06:02 +00:00
behzad nouri 9b866d79fb
shards crds values based on their hash prefix (#12187)
filter_crds_values checks every crds filter against every hash value:
https://github.com/solana-labs/solana/blob/ee646aa7/core/src/crds_gossip_pull.rs#L432
which can be inefficient if the filter's bit-mask only matches small
portion of the entire crds table.

This commit shards crds values into separate tables based on shard_bits
first bits of their hash prefix. Given a (mask, mask_bits) filter,
filtering crds can be done by inspecting only relevant shards.

If CrdsFilter.mask_bits <= shard_bits, then precisely only the crds
values which match (mask, mask_bits) bit pattern are traversed.
If CrdsFilter.mask_bits > shard_bits, then approximately only
1/2^shard_bits of crds values are inspected.

Benchmarking on a gce cluster of 20 nodes, I see ~10% improvement in
generate_pull_responses metric, but with larger clusters, crds table and
2^mask_bits are both larger, so the impact should be more significant.
2020-09-17 14:05:16 +00:00
Michael Vines daae638781 Add --gossip-validator argument 2020-09-14 20:18:27 -07:00
behzad nouri d6ec03f13c
patches default impl for crds filter (#12199)
In CrdsFilter.mask all bits after mask_bits are set to 1:
https://github.com/solana-labs/solana/blob/555252f4/core/src/crds_gossip_pull.rs#L65
However the default implementation, sets both mask and mask_bits to zero
which is inconsistent with CrdsFilter::compute_mask for a mask_bits of
zero.

This commit changes the default implementation by setting mask to
`!0u64` (i.e all bits set to one). As a result, for the default crds
filter, `test_mask` will always return true, whereas previously it was
always returning false.
https://github.com/solana-labs/solana/blob/555252f4/core/src/crds_gossip_pull.rs#L85

This is only used in tests and benchmarks, but causes some benchmarks to
be misleading by short circuiting in this line:
https://github.com/solana-labs/solana/blob/555252f4/core/src/crds_gossip_pull.rs#L429
2020-09-13 13:08:25 +00:00
behzad nouri 28f2fa3fd5
uses rust intrinsics to convert hashes to u64 (#12097) 2020-09-09 15:28:17 +00:00
behzad nouri 114c211b66
adds new CrdsFilterSet type for Vec<CrdsFilter> (#12029) 2020-09-04 13:04:47 +00:00
behzad nouri bc7adb97ed
builds crds filters without looping over filters (#11998) 2020-09-03 20:32:23 +00:00
behzad nouri 418b483af6
Fix filter_crds_values output alignment with the inputs (#11734) 2020-08-21 12:32:37 -07:00
carllin 0f0a2ddafe
Filter push/pulls from spies (#11620)
* Filter push/pulls from spies

* Don't pull from peers with shred version == 0, don't push to people with shred_version == 0

Co-authored-by: Carl <carl@solana.com>
2020-08-18 18:52:45 -07:00
sakridge f519fdecc2
generate_pull_response optimization (#11597) 2020-08-12 22:45:19 -07:00
carllin 1b238dd63e
Gossip log (#11555)
Co-authored-by: Carl <carl@solana.com>
2020-08-11 21:03:54 +00:00
Ryo Onodera 2910fd467f
Fix rust fmt (#11537) 2020-08-11 22:53:55 +09:00
anatoly yakovenko 713851b68d
filter out old gossip pull requests (#11448)
* init

* builds

* stats

* revert

* tests

* clippy

* add some jitter

* shorter jitter timer

* update

* fixup! update

* use saturating_sub

* fix filters
2020-08-11 06:26:42 -07:00
Ryo Onodera 39b3ac6a8d
Introduce automatic ABI maintenance mechanism (2/2; rollout) (#8012)
* Introduce automatic ABI maintenance mechanism (2/2; rollout)

* Fix stable clippy

* Change to symlink

* Freeze abi of Tower

* fmt...

* Improve dev-experience!

* Update BankSlotDelta

$ diff -u /tmp/abi8/*7dg6BreYxTuxiVz6aLvk3p2Z7GQk2cJqfGvC9h4FAoSj* /tmp/abi8/*9chBcbXVJ4fK7uGgydQzam5aHipaAKFw6V4LDFpjbE4w*
--- /tmp/abi8/bank__BankSlotDelta_frozen_abi__test_abi_digest_7dg6BreYxTuxiVz6aLvk3p2Z7GQk2cJqfGvC9h4FAoSj      2020-06-18 18:01:22.831228087 +0900
+++ /tmp/abi8/bank__BankSlotDelta_frozen_abi__test_abi_digest_9chBcbXVJ4fK7uGgydQzam5aHipaAKFw6V4LDFpjbE4w      2020-07-03 15:59:58.430695244 +0900
@@ -140,7 +140,7 @@
                                                         field u8
                                                             primitive u8
                                                         field solana_sdk::instruction::InstructionError
-                                                            enum InstructionError (variants = 34)
+                                                            enum InstructionError (variants = 35)
                                                                 variant(0) GenericError (unit)
                                                                 variant(1) InvalidArgument (unit)
                                                                 variant(2) InvalidInstructionData (unit)
@@ -176,6 +176,7 @@
                                                                 variant(31) CallDepth (unit)
                                                                 variant(32) MissingAccount (unit)
                                                                 variant(33) ReentrancyNotAllowed (unit)
+                                                                variant(34) MaxSeedLengthExceeded (unit)
                                                     variant(9) CallChainTooDeep (unit)
                                                     variant(10) MissingSignatureForFee (unit)
                                                     variant(11) InvalidAccountIndex (unit)

* Fix some merge conflicts...
2020-07-06 20:22:23 +09:00
anatoly yakovenko ba83e4ca50
Fix fannout gossip bench (#10509)
* Gossip benchmark

* Rayon tweaking

* push pulls

* fanout to max nodes

* fixup! fanout to max nodes

* fixup! fixup! fanout to max nodes

* update

* multi vote test

* fixup prune

* fast propagation

* fixups

* compute up to 95%

* test for specific tx

* stats

* stats

* fixed tests

* rename

* track a lagging view of which nodes have the local node in their active set in the local received_cache

* test fixups

* dups are old now

* dont prune your own origin

* send vote to tpu

* tests

* fixed tests

* fixed test

* update

* ignore scale

* lint

* fixup

* fixup

* fixup

* cleanup

Co-authored-by: Stephen Akridge <sakridge@gmail.com>
2020-06-13 22:03:38 -07:00
sakridge ecb6959720
Optimize process pull responses (#10460)
* Batch process pull responses

* Generate pull requests at 1/2 rate

* Do filtering work of process_pull_response in read lock

Only take write lock to insert if needed.
2020-06-09 17:08:13 -07:00
anatoly yakovenko 832d324a23
Revert "Gossip PullRequests tend to return a lot of duplicates. (#10326)" (#10455)
This reverts commit 31e20eff82.
2020-06-09 07:27:00 -07:00
anatoly yakovenko 31e20eff82
Gossip PullRequests tend to return a lot of duplicates. (#10326)
* filter messages that are likely to be pushed from the response

* tests

* tests

* wait to start filtering responses, and push stats to influx

* wait to start filtering responses, and push stats to influx

* reduce the timers to match the publish self timeout

* fmt

* fmt
2020-06-05 08:01:45 -07:00
sakridge 3f508b37fd
Skip gossip requests with different shred version and split lock (#10240) 2020-05-28 11:38:13 -07:00
sakridge 7ebd8ee531
Cluster info metrics (#10215) 2020-05-25 15:03:34 -07:00
Kristofer Peterson 58ef02f02b
9951 clippy errors in the test suite (#10030)
automerge
2020-05-15 09:35:43 -07:00
Michael Vines 16ddd001f6
Gossip no longer pushes/pulls from nodes with a different shred version (#9868) 2020-05-05 20:15:19 -07:00
Michael Vines c11abf88b7
Clean up `use` to keep rust 1.43.0 from complaining (#9740) 2020-04-27 16:54:11 -07:00
anatoly yakovenko 8ef097bf6f
Input values are not sanitized after they are deserialized, making it far too easy for Leo to earn SOL (#9706)
* sanitize gossip protocol messages
* sanitize transactions
* crds protocol sanitize
2020-04-27 11:06:00 -07:00
sakridge fa20963b93
Revert shred fs (#9712)
* Revert "Untar is called for shred archives that do not exist. (#9565)"

This reverts commit 729cb5eec6.

* Revert "Dont insert shred payload into rocksdb (#9366)"

This reverts commit 5ed39de8c5.
2020-04-24 15:04:23 -07:00
anatoly yakovenko 5ed39de8c5
Dont insert shred payload into rocksdb (#9366)
automerge
2020-04-16 18:20:55 -07:00
carllin 7aa4d401f7
Fix broadcast metrics (#9461)
* Rework broadcast metrics to support multiple threads

* Update dashboards

Co-authored-by: Carl <carl@solana.com>
2020-04-15 15:22:16 -07:00
Sagar Dhawan fa00803fbf
Filter old CrdsValues received via Pull Responses in Gossip (#8150)
* Add CrdsValue timeout checks on Pull Responses

* Allow older values to enter Crds as long as a ContactInfo exists

* Allow staked contact infos to be inserted into crds if they haven't expired

* Try and handle oveflows

* Fix test

* Some comments

* Fix compile

* fix test deadlock

* Add a test for processing timed out values received via pull response
2020-02-07 12:38:24 -08:00
anatoly yakovenko b150da837a Use epoch as the gossip purge timeout for staked nodes. (#7005)
automerge
2019-11-20 11:25:18 -08:00
Sagar Dhawan 568475e2db
Fix incorrectly signed CrdsValues (#6696) 2019-11-03 10:07:51 -08:00
Sagar Dhawan bd193535c9
Cap CrdsFilter sizes such that PullRequest no longer exceeds MTU (#5561) 2019-08-19 18:14:10 -07:00
Sagar Dhawan 4ee212ae4c
Coalesce gossip pull requests and serve them in batches (#5501)
* Coalesce gossip pull requests and serve them in batches

* batch all filters and immediately respond to messages in gossip

* Fix tests

* make download_from_replicator perform a greedy recv
2019-08-15 17:04:45 -07:00
Sagar Dhawan 1d0608200c
Restore blob size fix (#5516)
* Revert "Revert "Fix gossip messages growing beyond blob size  (#5460)" (#5512)"

This reverts commit 97d57d168b.

* Fix Crds filters
2019-08-13 18:04:14 -07:00
Sagar Dhawan 97d57d168b Revert "Fix gossip messages growing beyond blob size (#5460)" (#5512)
This reverts commit a8eb0409b7.
2019-08-13 10:29:26 -07:00
Sagar Dhawan a8eb0409b7
Fix gossip messages growing beyond blob size (#5460)
* fixed bloom filter math

* Add split each pull request into multiple pulls with different filters

* Rework CrdsFilter to generate all possible masks to cover the keyspace

* Limit the bloom sizes such that each pull request is no larger than mtu
2019-08-12 13:51:29 -07:00
Sathish 182096dc1a
Create bank snapshots (#4244)
* Revert "Revert "Create bank snapshots (#3671)" (#4243)"

This reverts commit 81fa69d347.

* keep saved and unsaved copies of status cache

* fix format check

* bench for status cache serialize

* misc cleanup

* remove appendvec storage on purge

* fix accounts restore

* cleanup

* Pass snapshot path as args

* Fix clippy
2019-05-30 21:31:35 -07:00