https://hackerone.com/reports/991106
> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol
The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.
When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
mark_pull_request_creation time requires an exclusive lock on gossip:
https://github.com/solana-labs/solana/blob/16944e218/core/src/cluster_info.rs#L1547-L1548
Current code is redundantly marking each peer once for each request.
There are at most only 2 unique peers, whereas there are hundreds of
requests per each. So the lock is acquired hundreds of time longer than
necessary.
ClusterInfo::process_packets handles incoming packets in a thread_pool:
https://github.com/solana-labs/solana/blob/87311cce7/core/src/cluster_info.rs#L2118-L2134
However, profiling runtime shows that threads are not well utilized and
a lot of the processing is done sequentially.
This commit redistributes the work done in parallel. Testing on a gce
cluster shows 20%+ improvement in processing gossip packets with much
smaller variations.
* Follow up to persistent tower
* Ignore for now...
* Hard-code validator identities for easy reasoning
* Add a test for opt. conf violation without tower
* Fix compile with rust < 1.47
* Remove unused method
* More move of assert tweak to the asser pr
* Add comments
* Clean up
* Clean the test addressing various review comments
* Clean up a bit
* introduce store program logs in blockstore / bigtable
* fix test, transaction logs created for successful transactions
* fix test for legacy bincode implementation around log_messages
* only api nodes should record logs
* truncate transaction logs to 100KB
* refactor log truncate for improved coverage
* Include post balance information for rewards
* Add post-balance to stored Reward struct
* Handle extended Reward in bigtable
Co-authored-by: Michael Vines <mvines@gmail.com>
Based on run-time profiles, the majority time of new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.
This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
Current code only returns values which are expired based on the default
timeout. Example from the added unit test:
- value inserted at time 0
- pubkey specific timeout = 1
- default timeout = 3
Then at now = 2, the value is expired, but the function fails to return
the value because it compares with the default timeout.
* Add service to track the most recent optimistically confirmed bank
* Plumb service into ClusterInfoVoteListener and ReplayStage
* Clean up test
* Use OptimisticallyConfirmedBank in RPC
* Remove superfluous notifications from RpcSubscriptions
* Use crossbeam to avoid mpsc recv_timeout panic
* Review comments
* Remove superfluous last_checked_slots, but pass in OptimisticallyConfirmedBank for complete correctness
* Add blockstore column to store performance sampling data
* introduce timer and write performance metrics to blockstore
* introduce getRecentPerformanceSamples rpc
* only run on rpc nodes enabled with transaction history
* add unit tests for get_recent_performance_samples
* remove RpcResponse from rpc call
* refactor to use Instant::now and elapsed for timer
* switch to root bank and ensure not negative subraction
* Add PerfSamples to purge/compaction
* refactor to use Instant::now and elapsed for timer
* switch to root bank and ensure not negative subraction
* remove duplicate constants
Co-authored-by: Tyera Eulberg <tyera@solana.com>
* Save/restore Tower
* Avoid unwrap()
* Rebase cleanups
* Forcibly pass test
* Correct reconcilation of votes after validator resume
* d b g
* Add more tests
* fsync and fix test
* Add test
* Fix fmt
* Debug
* Fix tests...
* save
* Clarify error message and code cleaning around it
* Move most of code out of tower save hot codepath
* Proper comment for the lack of fsync on tower
* Clean up
* Clean up
* Simpler type alias
* Manage tower-restored ancestor slots without banks
* Add comment
* Extract long code blocks...
* Add comment
* Simplify returned tuple...
* Tweak too aggresive log
* Fix typo...
* Add test
* Update comment
* Improve test to require non-empty stray restored slots
* Measure tower save and dump all tower contents
* Log adjust and add threshold related assertions
* cleanup adjust
* Properly lower stray restored slots priority...
* Rust fmt
* Fix test....
* Clarify comments a bit and add TowerError::TooNew
* Further clean-up arround TowerError
* Truly create ancestors by excluding last vote slot
* Add comment for stray_restored_slots
* Add comment for stray_restored_slots
* Use BTreeSet
* Consider root_slot into post-replay adjustment
* Tweak logging
* Add test for stray_restored_ancestors
* Reorder some code
* Better names for unit tests
* Add frozen_abi to SavedTower
* Fold long lines
* Tweak stray ancestors and too old slot history
* Re-adjust error conditon of too old slot history
* Test normal ancestors is checked before stray ones
* Fix conflict, update tests, adjust behavior a bit
* Fix test
* Address review comments
* Last touch!
* Immediately after creating cleaning pr
* Revert stray slots
* Revert comment...
* Report error as metrics
* Revert not to panic! and ignore unfixable test...
* Normalize lockouts.root_slot more strictly
* Add comments for panic! and more assertions
* Proper initialize root without vote account
* Clarify code and comments based on review feedback
* Fix rebase
* Further simplify based on assured tower root
* Reorder code for more readability
Co-authored-by: Michael Vines <mvines@gmail.com>
filter_crds_values checks every crds filter against every hash value:
https://github.com/solana-labs/solana/blob/ee646aa7/core/src/crds_gossip_pull.rs#L432
which can be inefficient if the filter's bit-mask only matches small
portion of the entire crds table.
This commit shards crds values into separate tables based on shard_bits
first bits of their hash prefix. Given a (mask, mask_bits) filter,
filtering crds can be done by inspecting only relevant shards.
If CrdsFilter.mask_bits <= shard_bits, then precisely only the crds
values which match (mask, mask_bits) bit pattern are traversed.
If CrdsFilter.mask_bits > shard_bits, then approximately only
1/2^shard_bits of crds values are inspected.
Benchmarking on a gce cluster of 20 nodes, I see ~10% improvement in
generate_pull_responses metric, but with larger clusters, crds table and
2^mask_bits are both larger, so the impact should be more significant.
* Check bank capitalization
* Simplify and unify capitalization calculation
* Improve and add tests
* Avoid overflow and inhibit automatic restart
* Fix test
* Tweak checked sum for cap. and add tests
* Fix broken build after merge conflicts..
* Rename to ClusterType
* Rename confusing method
* Clarify comment
* Verify cap. in rent and inflation tests
Co-authored-by: Stephen Akridge <sakridge@gmail.com>
* Add blockstore column to cache block times
* Add method to cache block time
* Add service to cache block time
* Update rpc getBlockTime to use new method, and refactor blockstore slightly
* Return block_time with confirmed block, if available
* Add measure and warning to cache-block-time
* Use untagged RpcSignatureResult enum to avoid breaking downstream consumers of current signature subscriptions
* Clean up client duplication
* Clippy
* Add rpc endpoint to return the state of multiple accounts from the same bank
* Add docs
* Review comments: Dedupe account code, default to base64, add max const
* Add get_multiple_accounts to rpc-client
* Update account-decoder to spl-token v2.0
* Update transaction-status to spl-token v2.0
* Update rpc to spl-token v2.0
* Update getTokenSupply to pull from Mint directly
* Fixup to spl-token v2.0.1
* Submit a timestamp for every vote
* Submit at most one vote timestamp per second
* Submit a timestamp for every new vote
Co-authored-by: Tyera Eulberg <tyera@solana.com>
* Filter push/pulls from spies
* Don't pull from peers with shred version == 0, don't push to people with shred_version == 0
Co-authored-by: Carl <carl@solana.com>
* wip: re-do rent collection check on rent-exempt account
* Let's see how the ci goes
* Restore previous code
* Well, almost all new changes are revertable
* Update doc
* Add test and gating
* Fix tests
* Fix tests, especially avoid to change abi...
* Fix more tests...
* Fix snapshot restore
* Align to _new_ with better uninitialized detection
* Refactor bigtable apis to accept start and end keys
* Make helper fn to deserialize cell data
* Refactor get_confirmed_signatures_for_address to use get_row_data range
* Add until param to get_confirmed_signatures_for_address
* Add until param to blockstore api
* Plumb until through client/cli
* Simplify client params
* Pass pubkey in to account-decoder for sysvars
* Decode sysvar accounts
* Decode config accounts; move validator-info lower
* Decode stake accounts
* Review comments
* Stringify any account lamports and epochs that can be set to u64::MAX
* Remove support for Budget
Also:
* Make "pay" command a deprecated alias for the "transfer" command
* chore: remove budget from web3.js
* Drop Budget depedency from core
Validators no longer ship with builtin Budget
* Withdraw authority no longer implies a custodian
Before this change, if the withdraw authority and custodian had
the same public key, then a withdraw authority signature would
imply a custodian signature and lockup would be not be enforced.
After this change, the client's withdraw instruction must
explictly reference a custodian account in its optional sixth
account argument.
Likewise, the fee-payer no longer implies either a withdraw
authority or custodian.
* Fix test
The test was configuring the stake account with the fee-payer as
the withdraw authority, but then passing in a different key to
the withdraw instruction's withdraw authority parameter. It only
worked because the second transaction was signed by the fee-payer.
* Simplify account-decoder program ids + spl_token helper
* Spl program namespace version
* Add getTokenAccountBalance endpoint
* Remove token program id from getTokenAccountBalance request
* Add getTokenSupply endpoint
* Remove token program id from getTokenSupply request
* Add getTokenAccountsByOwner/Delegate endpoints
* Remove token program id from getTokenAccountsByOwner/Delegate requests
* Named parameter
* Add check in cluster_info_vote_listenere to see if optimstic conf was achieved
Add OptimisticConfirmationVerifier
* More fixes
* Fix merge conflicts
* Remove gossip notificatin
* Add dashboards
* Fix rebase
* Count switch votes as well toward optimistic conf
* rename
Co-authored-by: Carl <carl@solana.com>
* Return root when bank not found
* Apply suggestions from code review
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
* Plumb replay vote channel for notifying vote listener of replay votes
* Keep gossip only notification for debugging gossip in the future
Co-authored-by: Carl <carl@solana.com>
* Send transaction upon recv
This will allow us to move the channel to the public interface
* Use a channel, not a method, to communicate
* Pipeline the services
* Ignore unused return values
* Fix clippy warning
* Plumb votes into repair service
* Remove refactoring
* Fix tests
* Switch to using RepairWeight for generating repairs
* Revert "Weight repair slots based on vote stake (#10741)"
This reverts commit cabd0a09c3.
* Update logging
Co-authored-by: Carl <carl@solana.com>
* Add RpcFilterType, and implement CompareBytes for getProgramAccounts
* Accept bytes in bs58
* Rename to memcmp
* Add Memcmp optional encoding field
* Add dataSize filter
* Update docs
* Clippy
* Simplify tests that don't need to test account contents; add multiple-filter tests
* Fix comment and make less pub
* Add account-decoder crate and use to decode vote and system (nonce) accounts
* Update docs
* Rename RpcAccount struct
* s/Rpc/Display
* Call it jsonParsed and update docs
* Revert "s/Rpc/Display"
This reverts commit 6e7149f503f560f1e9237981058ff05642bb7db5.
* s/Rpc/Ui
* Add tests
* Ui more things
* Comments
* Remove Blockstore member variable from BlockCommitmentCache
* Hoist is_confirmed_rooted() to its only caller
BlockCommitmentCache no longer depends on Blockstore
* Move BlockCommitmentCache to solana_runtime
* Make Message::new_with_payer the default constructor
* Remove Transaction::new_[un]signed_instructions
These guess the fee-payer instead of stating it explicitly
* Remove unused StakeLockout::lockout
* Revert...
* Really revert to the original behavior...
* Use consistent naming after StakeLockout removal
* Furhter clean up
* Missed type aliases...
* More...
* Even more...