Crds values buffered when responding to pull-requests can be very large taking a lot of memory.
Added a limit for number of buffered crds values based on outbound data budget.
* Move bank drop to AccountsBackgroundService
* Send to ABS on drop instead, protects against other places banks are dropped
* Fix Abi
* test
Co-authored-by: Carl Lin <carl@solana.com>
* Add TestValidator::new_with_fees constructor, and warning for low bootstrap_validator_lamports
* Add logging to solana-tokens integration test to help catch low bootstrap_validator_lamports in the future
* Reasonable TestValidator mint_lamports
https://hackerone.com/reports/991106
> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol
The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.
When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
crds_gossip tests start large networks, which with large thread-pools
will exhaust system resources, causing failures in ci tests:
https://buildkite.com/solana-labs/solana/builds/31953
The commit limits size of thread-pools in the test.
Based on run-time profiles, the majority time of new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.
This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
* Add service to track the most recent optimistically confirmed bank
* Plumb service into ClusterInfoVoteListener and ReplayStage
* Clean up test
* Use OptimisticallyConfirmedBank in RPC
* Remove superfluous notifications from RpcSubscriptions
* Use crossbeam to avoid mpsc recv_timeout panic
* Review comments
* Remove superfluous last_checked_slots, but pass in OptimisticallyConfirmedBank for complete correctness
* Use untagged RpcSignatureResult enum to avoid breaking downstream consumers of current signature subscriptions
* Clean up client duplication
* Clippy
* wip: re-do rent collection check on rent-exempt account
* Let's see how the ci goes
* Restore previous code
* Well, almost all new changes are revertable
* Update doc
* Add test and gating
* Fix tests
* Fix tests, especially avoid to change abi...
* Fix more tests...
* Fix snapshot restore
* Align to _new_ with better uninitialized detection
* Add check in cluster_info_vote_listenere to see if optimstic conf was achieved
Add OptimisticConfirmationVerifier
* More fixes
* Fix merge conflicts
* Remove gossip notificatin
* Add dashboards
* Fix rebase
* Count switch votes as well toward optimistic conf
* rename
Co-authored-by: Carl <carl@solana.com>
* Move bank (de)serialisation logic from bank and snapshot_utils to serde_snapshot.
Add sanity assertions between genesis config and bank fields on deserialisation.
Atomically update atomic bool in quote_for_specialization_detection().
Use same genesis config when restoring snapshots in test cases.
* Tidy up namings and duplicate structs to version
* Apply struct renames to tests
* Update abi hashes
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
* Add RpcFilterType, and implement CompareBytes for getProgramAccounts
* Accept bytes in bs58
* Rename to memcmp
* Add Memcmp optional encoding field
* Add dataSize filter
* Update docs
* Clippy
* Simplify tests that don't need to test account contents; add multiple-filter tests
* Fix comment and make less pub
* Add account-decoder crate and use to decode vote and system (nonce) accounts
* Update docs
* Rename RpcAccount struct
* s/Rpc/Display
* Call it jsonParsed and update docs
* Revert "s/Rpc/Display"
This reverts commit 6e7149f503f560f1e9237981058ff05642bb7db5.
* s/Rpc/Ui
* Add tests
* Ui more things
* Comments
* Remove Blockstore member variable from BlockCommitmentCache
* Hoist is_confirmed_rooted() to its only caller
BlockCommitmentCache no longer depends on Blockstore
* Move BlockCommitmentCache to solana_runtime
* Gossip benchmark
* Rayon tweaking
* push pulls
* fanout to max nodes
* fixup! fanout to max nodes
* fixup! fixup! fanout to max nodes
* update
* multi vote test
* fixup prune
* fast propagation
* fixups
* compute up to 95%
* test for specific tx
* stats
* stats
* fixed tests
* rename
* track a lagging view of which nodes have the local node in their active set in the local received_cache
* test fixups
* dups are old now
* dont prune your own origin
* send vote to tpu
* tests
* fixed tests
* fixed test
* update
* ignore scale
* lint
* fixup
* fixup
* fixup
* cleanup
Co-authored-by: Stephen Akridge <sakridge@gmail.com>
* Batch process pull responses
* Generate pull requests at 1/2 rate
* Do filtering work of process_pull_response in read lock
Only take write lock to insert if needed.
* filter messages that are likely to be pushed from the response
* tests
* tests
* wait to start filtering responses, and push stats to influx
* wait to start filtering responses, and push stats to influx
* reduce the timers to match the publish self timeout
* fmt
* fmt
* Multi-version snapshot support
* rustfmt
* Remove CLI options and runtime support for selection output snapshot version.
Address some clippy complaints.
* Muzzle clippy type complexity warning.
Despite clippy's suggestion, it is not currently possible to create type aliases
for traits and so everything within the 'Box<...>' cannot be type aliased.
This then leaves creating full blown traits, and either implementing
said traits by closure (somehow) or moving the closures into new structs
implementing said traits which seems a bit of a palaver.
Alternatively it is possible to define and use the type alias 'type ResultBox<T> = Result<Box<T>>'
which does seems rather pointless and not a great reduction in complexity but is enough to keep clippy happy.
In the end I simply went with squelching the clippy warning.
* Remove now unused Serialize/Deserialize trait implementations for AccountStorageEntry and AppendVec
* refactor versioned de/serialisers
* rename serde_utils to serde_snapshot
* move call to accounts_db.generate_index() back down to context_accountsdb_from_stream()
* update version 1.1.1 to 1.2.0
remove nested use of serialize_bytes
* cleanups
* Add back measurement of account storage entry serialization.
Remove construction of Vec and HashMap temporaries during serialization.
* consolidate serialisation test cases into serde_snapshot.
clean up leakage of implementation details in serde_snapshot.
* move short term / legacy snapshot code into child module
* add serialize_iter_as_tuple
* preliminary integration of following commit
commit 6d58b73c47294bfb93465d5a83cd2175660b6e6d
Author: Ryo Onodera <ryoqun@gmail.com>
Date: Wed May 20 14:02:02 2020 +0900
Confine snapshot 1.1 relic to versioned codepath
* refactored serde_snapshot, rustfmt
legacy accounts_db format now "owns" both leading u64s, legacy bank_rc format has none
* reduce type complexity (clippy)
* Switch AccountsIndex.account_maps from HashMap to BTreeMap
* Introduce eager rent collection
* Start to add tests
* Avoid too short eager rent collection cycles
* Add more tests
* Add more tests...
* Refacotr!!!!!!
* Refactoring follow up
* More tiny cleanups
* Don't rewrite 0-lamport accounts to be deterministic
* Refactor a bit
* Do hard fork, restore tests, and perf. mitigation
* Fix build...
* Refactor and add switch over for testnet (TdS)
* Use to_be_bytes
* cleanup
* More tiny cleanup
* Rebase cleanup
* Set Bank::genesis_hash when resuming from snapshot
* Reorder fns and clean ups
* Better naming and commenting
* Yet more naming clarifications
* Make prefix width strictly uniform for 2-base partition_count
* Fix typo...
* Revert cluster-dependent gate
* kick ci?
* kick ci?
* kick ci?
* Switch subscriptions to use commitment instead of confirmations
* Add bank method to return account and last-modified slot
* Add last_modified_slot to subscription data and use to filter account subscriptions
* Update tests to non-zero last_notified_slot
* Add accounts subscriptions to test; fails at higher tx load
* Pass BankForks to RpcSubscriptions
* Use BankForks on add_account_subscription to properly initialize last_notified_slot
* Bundle subscriptions
* Check for non-equality
* Use commitment to initialize last_notified_slot; revert context.slot chage
* Revert "Untar is called for shred archives that do not exist. (#9565)"
This reverts commit 729cb5eec6.
* Revert "Dont insert shred payload into rocksdb (#9366)"
This reverts commit 5ed39de8c5.
* Add largest_confirmed_root to BlockCommitmentCache
* clippy
* Add blockstore to BlockCommitmentCache to check root
* Add rooted_stake helper fn and test
* Nodes that are behind should correctly id confirmed roots
* Simplify rooted_stake collector
* Add runtime methods to simply get status and slot
* Add helper function to get slot confirmation_count from BlockCommitmentCache
* Return cluster confirmations in getSignatureStatus
* Remove use of invalid get_signature_confirmation_status
* Remove unused methods
* Update pubsub to use cluster confirmations
* Fix test_check_signature_subscribe failure
* Refactor confirmations to read commitment cache only once
* Review comments
* Use bank, root from BlockCommitmentCache
* Update docs
* Add metric for block-commitment aggregations
Co-authored-by: Justin Starry <justin@solana.com>
* Add CrdsValue timeout checks on Pull Responses
* Allow older values to enter Crds as long as a ContactInfo exists
* Allow staked contact infos to be inserted into crds if they haven't expired
* Try and handle oveflows
* Fix test
* Some comments
* Fix compile
* fix test deadlock
* Add a test for processing timed out values received via pull response
* save limit deserialize
* save
* Save
* Clean up
* rustfmt
* rustfmt
* Just comment out to please CI
* Fix ci...
* Move code
* Rustfmt
* Crean up control flow
* Add another comment
* Introduce predetermined constant limit on snapshot data files (deserialize side)
* Introduce predetermined constant limit on snapshot data files (serialize side)
* rustfmt
* Tweak message
* Revert dynamic memory limit
* Limit size of snapshot data file (de)serialization
* Fix test breakage
* Clean up
* Fix uses formatting
* Rename: deserialize_{for,from}_snapshot
* Simplify comment
* Use Slot
* Provide slot for status cache
* Align variable name with snapshot_status_cache_file_path
* Define serialize_snapshot_data_file_with_metrics
* Fix build.......
* De-marco serialize_snapshot_data_file_with_metrics
* Revert u64 => Slot
* Pass blocktree into execute_batch, if persist_transaction_status
* Add validator arg to enable persistent transaction status store
* Pass blocktree into banking_stage, if persist_transaction_status
* Add validator params to bash scripts
* Expose actual transaction statuses outside Bank; add tests
* Fix benches
* Offload transaction status writes to a separate thread
* Enable persistent transaction status along with rpc service
* nudge
* Review comments
* Remove the name "blob" from archivers
* Remove the name "blob" from broadcast
* Remove the name "blob" from Cluset Info
* Remove the name "blob" from Repair
* Remove the name "blob" from a bunch more places
* Remove the name "blob" from tests and book
* Remove Blobs and switch to Packets
* Fix some gossip messages not respecting MTU size
* Failure to serialize is not fatal
* Add log macros
* Remove unused extern
* Apparently macro use is required
* Explicitly scope macro
* Fix test compile
* credit_only_credits_forwarding
* whack transfer_now()
* fixup
* bench should retry the airdrop TX
* fixup
* try to make bench-exchange a bit more robust, informative
* Insert data shreds in blocktree and database
* Integrate data shreds with rest of the code base
* address review comments, and some clippy fixes
* Fixes to some tests
* more test fixes
* ignore some local cluster tests
* ignore replicator local cluster tests
* Coalesce gossip pull requests and serve them in batches
* batch all filters and immediately respond to messages in gossip
* Fix tests
* make download_from_replicator perform a greedy recv
* fixed bloom filter math
* Add split each pull request into multiple pulls with different filters
* Rework CrdsFilter to generate all possible masks to cover the keyspace
* Limit the bloom sizes such that each pull request is no larger than mtu
* Decouple Segments from Turns in Storage
* Get replicator local cluster tests running in a reasonable amount of time
* Fix unused imports
* Document new RPC APIs
* Check for exit while polling
* Ensure signable data is not out of range
* Add a broadcast stage that puts bad sizes in blobs
* Resign blob after modifyign size
* Remove assertions that fail when size != meta.size
* Add crate to serialize some tests
* Ignore unused attribute warning
* Enable parallel run in CI
* Try to fix lograte tests
* Fix interdependent counter tests
* Broadcast run for injecting fake blobs in turbine
* address review comments
* new local cluster test that uses fake blob broadcast
* added a test to make sure tvu_peers ordering is guaranteed
* Add DeadSlots column family
* Filter dead forks from get_slots_since
* Mark erroring slots as dead in replay stage, add test
* Mark dead forks in progress instead of removing them
* Fix logging process_entries failures in replay_stage
* Unignore test_fail_entry_verification_leader
* Refactor BroadcastStage to support custom implementations, add FailEntryVerificationBroadcastRun implementation
* Plumb switch on broadcast type through validator
* Add test for validator generating non-verifiable entries to local_cluster
* Fix bad initializers
* Refactor broadcast run code into utils
* add MiningPool fund validator MinigPools from inflation
* fixup
* finish rename of MINIMUM_SLOT_LENGTH to MINIMUM_SLOTS_PER_EPOCH
* deterministic miningpool location
* point_value, not credit_value... use f64
* Add signature to blob
* Change Signable trait to support returning references to signable data
* Add signing to broadcast
* Verify signatures in window_service
* Add testing for signatures to erasure
* Add RPC for getting current slot, consume RPC call in test_repairman_catchup for more deterministic results