While reviewing PR #18565, as issue was brought up to refactor some code
around verifying the bank after rebuilding from snapshots. A new
top-level function has been added to get the latest snapshot archives
and load the bank then verify. Additionally, new tests have been
written and existing tests have been updated to use this new function.
Fixes#18973
While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled. Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`. And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats. This bundling simplifies bank
rebuilding.
This commit also renames `snapshot_interval_slots` to
`full_snapshot_archive_interval_slots`, updates the comments on the
fields, and make appropriate updates where SnapshotConfig is used.
This PR solves #18815. Note that I had to make the snapshot prefix
constants inside `snapshot_utils.rs` public at the crate level in order
to make this work. I'm not sure whether or not introducing this
dependency is entirely good, either way the `snapshot_utils.rs` file
needs a lot of rework so things will move around, I believe this does
the work in the meantime. Any feedback will be greatly appreciated.
This commit builds on PR #18504 by adding a test to core/tests/snapshot.rs for Incremental Snapshots. The test adds banks to bank forks in a loop and takes both full snapshots and incremental snapshots at intervals, and validates they are rebuild-able.
For background info about Incremental Snapshots, see #17088.
Fixes#18829 and #18972
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks. This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.
Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use a common code paths for the different snapshots.
Also of note, some renaming has happened:
1. Snapshots are now either `full_` or `incremental_` throughout the
codebase. If not specified, the code applies to both.
2. Bank snapshots now are called "bank snapshots"
(before they were called "slot snapshots", "bank snapshots", or
just "snapshots"). The one exception is within `Bank`, where they
are still just "snapshots", because they are already "bank
snapshots".
3. Snapshot archives now have `_archive` in the code. This
should clear up an ambiguity between bank snapshots and snapshot
archives.
1. Added both options for measuring space usage using total accounts usage and for individual store shrink ratio using an enum. Validator CLI options: --accounts-shrink-optimize-total-space and --accounts-shrink-ratio
2. Added code for selecting candidates based on total usage in a separate function select_candidates_by_total_usage
3. Added unit tests for the new functions added
4. The default implementations is kept at 0.8 shrink ratio with --accounts-shrink-optimize-total-space set to true
Fixes#17544
* Update rocksdb to v0.16.0
* Promote the infrequent and important log to info!
* Force background compaction by ttl without manual compaction
* Fix test
* Support no compaction mode in test_ledger_cleanup_compaction
* Fix comment
* Make compaction_interval customizable
* Avoid major compaction with periodic filtering...
* Adress lazy_static, special cfs and range check
* Clean up a bit and add comment
* Add comment
* More comments...
* Config code cleanup
* Add comment
* Use .conflicts_with()
* Nullify unneeded delete_range ops for special CFs
* Some clean ups
* Clarify the locking intention
* Ensure special CFs' consistency with PurgeType::CompactionFilter
* Fix comment
* Fix bad copy paste
* Fix various types...
* Don't use tuples
* Add a unit test for compaction_filter
* Fix typo...
* Remove flag and just use new behavior always
* Fix wrong condition negation...
* Doc. about no set_last_purged_slot in purge_slots
* Write a test and fix off-by-one bug....
* Apply suggestions from code review
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
* Follow up to github review suggestions
* Fix line-wrapping
* Fix conflict
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
* Move gossip modules to solana-gossip
* Update Protocol abi digest due to move
* Move gossip benches and hook up CI
* Remove unneeded Result entries
* Single use statements
For all code paths (gossip push, pull, purge, etc) that remove or
override a crds value, it is necessary to record hash of values purged
from crds table, in order to exclude them from subsequent pull-requests;
otherwise the next pull request will likely return outdated values,
wasting bandwidth:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491
Currently this is done all over the place in multiple modules, and this
has caused bugs in the past where purged values were not recorded.
This commit encapsulated this bookkeeping into crds module, so that any
code path which removes or overrides a crds value, also records the hash
of purged value in-place.
In order to remove port-based forwarding logic in turbine, we need to
first track how often the turbine retransmit/broadcast trees mismatch
across nodes.
One consistency condition is that if the node is on the critical path
(i.e. the first node in each neighborhood), then we expect that the
packet arrives at tvu socket as opposed to tvu-forwards.
This commit adds a metric to track how often above condition is not met.
If stakes are unknown, then timeouts will be short, resulting in values
being purged from the crds table, and consequently higher pull-response
load when they are obtained again from gossip. In particular, this slows
down validator start where almost all values obtained from entrypoint
are immediately discarded.
* Upgrade Rust to 1.52.0
update nightly_version to newly pushed docker image
fix clippy lint errors
1.52 comes with grcov 0.8.0, include this version to script
* upgrade to Rust 1.52.1
* disabling Serum from downstream projects until it is upgraded to Rust 1.52.1
* purge_old_snapshot_archives is changed to take an extra argument 'maximum_snapshots_to_retain' to control the max number of latest snapshot archives to retain. Note the oldest snapshot is always retained as before and is not subjected to this new options.
* The validator and ledger-tool executables are modified with a CLI argument --maximum-snapshots-to-retain. And the options are propagated down the call chains. Their corresponding shell scripts were changed accordingly.
* SnapshotConfig is modified to have an extra field for the maximum_snapshots_to_retain
* Unit tests are developed to cover purge_old_snapshot_archives
Having an ordinal index on crds values based on insert order allows to
efficiently filter values using a cursor. In particular
CrdsGossipPush::push_messages hash-map can be replaced with a cursor,
saving on the bookkeepings, purging, etc
VersionedCrdsValue.insert_timestamp is used for fetching crds values
inserted since last query:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298
So it is crucial that insert_timestamp does not go backward in time when
new values are inserted into the table. However std::time::SystemTime is
not monotonic, or due to workload, lock contention, thread scheduling,
etc, ... new values may be inserted with a stalled timestamp way in the
past. Additionally, reading system time for the above purpose is
inefficient/unnecessary.
This commit adds an ordinal index to crds values indicating their insert
order. Additionally, it implements a new Cursor type for fetching values
inserted since last query.
IP addresses need to be validated before sending packets to them.
This commit, sends a ping packet to nodes before any pull requests.
Pull requests are then only sent to the nodes which have responded with
the correct hash of their respective ping packet.
rocksdb compaction can cause long stalls, so
make it more configurable to try and reduce those stalls
and also to coordinate between multiple nodes to not induce
stall at the same time.
* Update tonic & prost, and regenerate proto
* Reignore doc code
* Revert pull #14367, but pin tokio to v0.2 for jsonrpc
* Bump backoff and goauth -> and therefore tokio
* Bump tokio in faucet, net-utils
* Bump remaining tokio, plus tarpc
* Deprecate commitment variants
* Add new CommitmentConfig builders
* Add helpers to avoid allowing deprecated variants
* Remove deprecated transaction-status code
* Include new commitment variants in runtime commitment; allow deprecated as long as old variants persist
* Remove deprecated banks code
* Remove deprecated variants in core; allow deprecated in rpc/rpc-subscriptions for now
* Heavier hand with rpc/rpc-subscription commitment
* Remove deprecated variants from local-cluster
* Remove deprecated variants from various tools
* Remove deprecated variants from validator
* Update docs
* Remove deprecated client code
* Add new variants to cli; remove deprecated variants as possible
* Don't send new commitment variants to old clusters
* Retain deprecated method in test_validator_saves_tower
* Fix clippy matches! suggestion for BPF solana-sdk legacy compile test
* Refactor node version check to handle commitment variants and transaction encoding
* Hide deprecated variants from cli help
* Add cli App comments
Crds values buffered when responding to pull-requests can be very large taking a lot of memory.
Added a limit for number of buffered crds values based on outbound data budget.
* Move bank drop to AccountsBackgroundService
* Send to ABS on drop instead, protects against other places banks are dropped
* Fix Abi
* test
Co-authored-by: Carl Lin <carl@solana.com>
* Add TestValidator::new_with_fees constructor, and warning for low bootstrap_validator_lamports
* Add logging to solana-tokens integration test to help catch low bootstrap_validator_lamports in the future
* Reasonable TestValidator mint_lamports
https://hackerone.com/reports/991106
> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol
The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, and on-the-fly
ping messages pending a pong response from the remote node.
When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and additionally ping packets are
added for addresses which need to be (re)verified.
crds_gossip tests start large networks, which with large thread-pools
will exhaust system resources, causing failures in ci tests:
https://buildkite.com/solana-labs/solana/builds/31953
The commit limits size of thread-pools in the test.
Based on run-time profiles, the majority time of new_pull_requests is
spent building bloom filters, in hashing and bit-vec ops.
This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
* Add service to track the most recent optimistically confirmed bank
* Plumb service into ClusterInfoVoteListener and ReplayStage
* Clean up test
* Use OptimisticallyConfirmedBank in RPC
* Remove superfluous notifications from RpcSubscriptions
* Use crossbeam to avoid mpsc recv_timeout panic
* Review comments
* Remove superfluous last_checked_slots, but pass in OptimisticallyConfirmedBank for complete correctness
* Use untagged RpcSignatureResult enum to avoid breaking downstream consumers of current signature subscriptions
* Clean up client duplication
* Clippy
* wip: re-do rent collection check on rent-exempt account
* Let's see how the ci goes
* Restore previous code
* Well, almost all new changes are revertable
* Update doc
* Add test and gating
* Fix tests
* Fix tests, especially avoid to change abi...
* Fix more tests...
* Fix snapshot restore
* Align to _new_ with better uninitialized detection
* Add check in cluster_info_vote_listenere to see if optimstic conf was achieved
Add OptimisticConfirmationVerifier
* More fixes
* Fix merge conflicts
* Remove gossip notificatin
* Add dashboards
* Fix rebase
* Count switch votes as well toward optimistic conf
* rename
Co-authored-by: Carl <carl@solana.com>
* Move bank (de)serialisation logic from bank and snapshot_utils to serde_snapshot.
Add sanity assertions between genesis config and bank fields on deserialisation.
Atomically update atomic bool in quote_for_specialization_detection().
Use same genesis config when restoring snapshots in test cases.
* Tidy up namings and duplicate structs to version
* Apply struct renames to tests
* Update abi hashes
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
* Add RpcFilterType, and implement CompareBytes for getProgramAccounts
* Accept bytes in bs58
* Rename to memcmp
* Add Memcmp optional encoding field
* Add dataSize filter
* Update docs
* Clippy
* Simplify tests that don't need to test account contents; add multiple-filter tests
* Fix comment and make less pub
* Add account-decoder crate and use to decode vote and system (nonce) accounts
* Update docs
* Rename RpcAccount struct
* s/Rpc/Display
* Call it jsonParsed and update docs
* Revert "s/Rpc/Display"
This reverts commit 6e7149f503f560f1e9237981058ff05642bb7db5.
* s/Rpc/Ui
* Add tests
* Ui more things
* Comments
* Remove Blockstore member variable from BlockCommitmentCache
* Hoist is_confirmed_rooted() to its only caller
BlockCommitmentCache no longer depends on Blockstore
* Move BlockCommitmentCache to solana_runtime
* Gossip benchmark
* Rayon tweaking
* push pulls
* fanout to max nodes
* fixup! fanout to max nodes
* fixup! fixup! fanout to max nodes
* update
* multi vote test
* fixup prune
* fast propagation
* fixups
* compute up to 95%
* test for specific tx
* stats
* stats
* fixed tests
* rename
* track a lagging view of which nodes have the local node in their active set in the local received_cache
* test fixups
* dups are old now
* dont prune your own origin
* send vote to tpu
* tests
* fixed tests
* fixed test
* update
* ignore scale
* lint
* fixup
* fixup
* fixup
* cleanup
Co-authored-by: Stephen Akridge <sakridge@gmail.com>
* Batch process pull responses
* Generate pull requests at 1/2 rate
* Do filtering work of process_pull_response in read lock
Only take write lock to insert if needed.
* filter messages that are likely to be pushed from the response
* tests
* tests
* wait to start filtering responses, and push stats to influx
* wait to start filtering responses, and push stats to influx
* reduce the timers to match the publish self timeout
* fmt
* fmt
* Multi-version snapshot support
* rustfmt
* Remove CLI options and runtime support for selection output snapshot version.
Address some clippy complaints.
* Muzzle clippy type complexity warning.
Despite clippy's suggestion, it is not currently possible to create type aliases
for traits and so everything within the 'Box<...>' cannot be type aliased.
This then leaves creating full blown traits, and either implementing
said traits by closure (somehow) or moving the closures into new structs
implementing said traits which seems a bit of a palaver.
Alternatively it is possible to define and use the type alias 'type ResultBox<T> = Result<Box<T>>'
which does seems rather pointless and not a great reduction in complexity but is enough to keep clippy happy.
In the end I simply went with squelching the clippy warning.
* Remove now unused Serialize/Deserialize trait implementations for AccountStorageEntry and AppendVec
* refactor versioned de/serialisers
* rename serde_utils to serde_snapshot
* move call to accounts_db.generate_index() back down to context_accountsdb_from_stream()
* update version 1.1.1 to 1.2.0
remove nested use of serialize_bytes
* cleanups
* Add back measurement of account storage entry serialization.
Remove construction of Vec and HashMap temporaries during serialization.
* consolidate serialisation test cases into serde_snapshot.
clean up leakage of implementation details in serde_snapshot.
* move short term / legacy snapshot code into child module
* add serialize_iter_as_tuple
* preliminary integration of following commit
commit 6d58b73c47294bfb93465d5a83cd2175660b6e6d
Author: Ryo Onodera <ryoqun@gmail.com>
Date: Wed May 20 14:02:02 2020 +0900
Confine snapshot 1.1 relic to versioned codepath
* refactored serde_snapshot, rustfmt
legacy accounts_db format now "owns" both leading u64s, legacy bank_rc format has none
* reduce type complexity (clippy)
* Switch AccountsIndex.account_maps from HashMap to BTreeMap
* Introduce eager rent collection
* Start to add tests
* Avoid too short eager rent collection cycles
* Add more tests
* Add more tests...
* Refacotr!!!!!!
* Refactoring follow up
* More tiny cleanups
* Don't rewrite 0-lamport accounts to be deterministic
* Refactor a bit
* Do hard fork, restore tests, and perf. mitigation
* Fix build...
* Refactor and add switch over for testnet (TdS)
* Use to_be_bytes
* cleanup
* More tiny cleanup
* Rebase cleanup
* Set Bank::genesis_hash when resuming from snapshot
* Reorder fns and clean ups
* Better naming and commenting
* Yet more naming clarifications
* Make prefix width strictly uniform for 2-base partition_count
* Fix typo...
* Revert cluster-dependent gate
* kick ci?
* kick ci?
* kick ci?
* Switch subscriptions to use commitment instead of confirmations
* Add bank method to return account and last-modified slot
* Add last_modified_slot to subscription data and use to filter account subscriptions
* Update tests to non-zero last_notified_slot
* Add accounts subscriptions to test; fails at higher tx load
* Pass BankForks to RpcSubscriptions
* Use BankForks on add_account_subscription to properly initialize last_notified_slot
* Bundle subscriptions
* Check for non-equality
* Use commitment to initialize last_notified_slot; revert context.slot chage
* Revert "Untar is called for shred archives that do not exist. (#9565)"
This reverts commit 729cb5eec6.
* Revert "Dont insert shred payload into rocksdb (#9366)"
This reverts commit 5ed39de8c5.
* Add largest_confirmed_root to BlockCommitmentCache
* clippy
* Add blockstore to BlockCommitmentCache to check root
* Add rooted_stake helper fn and test
* Nodes that are behind should correctly id confirmed roots
* Simplify rooted_stake collector