In preparation for
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
replace and remove entries_to_data_shreds from the public interface.
* working on local snapshot
* Parallelization for slot storage minimization
* Additional clean-up and fixes
* make --minimize an option of create-snapshot
* remove now unnecessary function
* Parallelize parts of minimized account set generation
* clippy fixes
* Add rent collection accounts and voting node_pubkeys
* Simplify programdata_accounts generation
* Loop over storages to get slot set
* Parallelize minimized slot set generation
* Parallelize adding owners and programdata_accounts
* Remove some now unnecessary checks on the blockstore
* Add a warning for minimized snapshots across epoch boundary
* Simplify ledger-tool minimize
* Clarify names of bank's minimization helper functions
* Remove unnecessary function, fix line spacing
* Use DashSets instead of HashSets for minimized account and slot sets
* Filter storages uses all threads instead of thread_pool
* Add some additional comments on functions for minimization
* Moved more into bank and parallelized
* Update programs/bpf/Cargo.lock for dashmap in ledger
* Clippy fix
* ledger-tool: convert minimize_bank_for_snapshot Measure into measure!
* bank.rs: convert minimize_bank_for_snapshot Measure into measure!
* accounts_db.rs: convert minimize_accounts_db Measure into measure!
* accounts_db.rs: add comment about use of minimize_accounts_db
* ledger-tool: CLI argument clarification
* minimization functions: make infos unique
* bank.rs: Add test_get_rent_collection_accounts_between_slots
* bank.rs: Add test_minimization_add_vote_accounts
* bank.rs: Add test_minimization_add_stake_accounts
* bank.rs: Add test_minimization_add_owner_accounts
* bank.rs: Add test_minimization_add_programdata_accounts
* accounts_db.rs: Add test_minimize_accounts_db
* bank.rs: Add negative case and comments in test_get_rent_collection_accounts_between_slots
* bank.rs: Negative test in test_minimization_add_programdata_accounts
* use new static runtime and sdk ids
* bank comments to doc comments
* Only need to insert the maximum slot a key is found in
* rename remove_pubkeys to purge_pubkeys
* add comment on builtins::get_pubkeys
* prevent excessive logging of removed dead slots
* don't need to remove slot from shrink slot candidates
* blockstore.rs: get_accounts_used_in_range shouldn't return Result
* blockstore.rs: get_accounts_used_in_range: parallelize slot loop
* report filtering progress on time instead of count
* parallelize loop over snapshot storages
* WIP: move some bank minimization functionality into a new class
* WIP: move some accounts_db minimization functionality into SnapshotMinimizer
* WIP: Use new SnapshotMinimizer
* SnapshotMinimizer: fix use statements
* remove bank and accounts_db minimization code, where possible
* measure! doesn't take a closure
* fix use statement in blockstore
* log_dead_slots does not need pub(crate)
* get_unique_accounts_from_storages does not need pub(crate)
* different way to get stake accounts/nodes
* fix tests
* move rent collection account functionality to snapshot minimizer
* move accounts_db minimize behavior to snapshot minimizer
* clean up
* Use bank reference instead of Arc. Additional comments
* Add a comment to blockstore function
* Additional clarifying comments
* Moved all non-transaction account accumulation into the SnapshotMinimizer.
* transaction_account_set does not need to be mutable now
* Add comment about load_to_collect_rent_eagerly
* Update log_dead_slots comment
* remove duplicate measure/print of get_minimized_slot_set
Shred versions are not verified until window-service, by which point
resources have already been spent on sig-verifying and deserializing shreds.
This commit verifies the shred version earlier in the pipeline, in fetch stage.
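The early check can be pictured with the minimal sketch below. It only illustrates the idea of discarding mismatched shreds before any expensive work; the `Packet` struct and `filter_by_shred_version` function are hypothetical stand-ins, not the actual fetch-stage types.

```rust
/// Hypothetical, simplified packet. Real shreds carry the version inside
/// the serialized payload, which is not modeled here.
#[derive(Debug, Clone, PartialEq)]
struct Packet {
    shred_version: u16,
    payload: Vec<u8>,
}

/// Keeps only packets whose shred version matches `expected` and
/// returns how many packets were discarded.
fn filter_by_shred_version(packets: &mut Vec<Packet>, expected: u16) -> usize {
    let before = packets.len();
    packets.retain(|p| p.shred_version == expected);
    before - packets.len()
}
```

Filtering in place keeps the later stages unchanged: they simply see fewer packets.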
A slot may be purged from the blockstore with clear_unconfirmed_slot().
If the slot is added back, the slot should only exist once in its
parent SlotMeta::next_slots Vec. Prior to this change, repeated clearing
and re-adding of a slot could result in the slot existing in parent's
next_slots multiple times. As a result, the next time the parent slot
is processed (node restart or ledger-tool-replay), the slot could be
added to the queue of slots to play multiple times.
Added a test that failed before the change and passes now.
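The invariant can be sketched as follows. This is a simplified illustration, not the blockstore code: the `SlotMeta` struct here models only the `next_slots` field, and `chain_child` is a hypothetical helper name.

```rust
/// Simplified stand-in for the blockstore's SlotMeta; only the field
/// relevant to this fix is modeled.
#[derive(Debug, Default)]
struct SlotMeta {
    next_slots: Vec<u64>,
}

/// Records `child` as a successor of `parent` at most once.
/// Returns true if the child was newly added.
fn chain_child(parent: &mut SlotMeta, child: u64) -> bool {
    if parent.next_slots.contains(&child) {
        return false;
    }
    parent.next_slots.push(child);
    true
}
```

With this guard, clearing and re-adding the same slot any number of times leaves a single entry in the parent's `next_slots`.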
Fix pre-check of blockstore slots during load_bank_forks. Now iterates from starting_slot to halt_slot via slot_meta.next_slots to confirm they are connected.
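A rough sketch of such a connectivity check is shown below. It assumes an in-memory map from slot to its `next_slots` in place of SlotMeta lookups in the blockstore; the function name `slots_connected` is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if `halt_slot` is reachable from `starting_slot` by
/// following `next_slots` links. The map is a hypothetical stand-in for
/// reading SlotMeta entries from the blockstore.
fn slots_connected(
    next_slots: &HashMap<u64, Vec<u64>>,
    starting_slot: u64,
    halt_slot: u64,
) -> bool {
    let mut visited = HashSet::new();
    let mut stack = vec![starting_slot];
    while let Some(slot) = stack.pop() {
        if slot == halt_slot {
            return true;
        }
        // Skip slots that were already explored.
        if !visited.insert(slot) {
            continue;
        }
        if let Some(children) = next_slots.get(&slot) {
            // Children past halt_slot cannot lead back to it.
            stack.extend(children.iter().copied().filter(|s| *s <= halt_slot));
        }
    }
    false
}
```

The traversal follows every fork, so the check succeeds as long as at least one chain of slots links the two endpoints.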
#### Problem
blockstore_db.rs has a mutual dependency with blockstore_metrics.rs.
#### Summary of Changes
This PR removes the mutual dependency by moving the option-related code
out of blockstore_db.rs into a new file, blockstore_options.rs.
This resolves the mutual dependency and also makes the code cleaner.
In addition to thread_local -> lazy_static change, a number of thread-pools are
initialized with get_max_thread_count to achieve parity with the older code in
terms of number of validator threads.
* initial work for poh timing report service
* add poh_timing_report_service to validator
* fix comments
* clippy
* improve test coverage
* delete record when complete
* rename shred full to slot full.
* debug logging
* fix slot full
* remove debug comments
* adding fmt trait
* derive default
* default for poh timing reporter
* better comments
* remove commented code
* fix test
* more test fixes
* delete timestamps for slots that are older than root_slot
* debug log
* record poh start end in bank reset
* report full to start time instead
* fix poh slot offset
* report poh start for normal ticks
* fix typo
* refactor out poh point report fn
* rename
* optimize delete - delete only when last_root changed
* change log level to trace
* convert if to match
* remove redundant check
* fix SlotPohTiming comments
* review feedback on poh timing reporter
* review feedback on poh_recorder
* add test case for out-of-order arrival of timing points and incomplete timing points
* refactor poh_timing_points into its own mod
* remove option for poh_timing_report service
* move poh_timing_point_sender to constructor
* clippy
* better comments
* more clippy
* more clippy
* add slot poh timing point macro
* clippy
* assert in test
* comments and display fmt
* fix check
* assert format
* revise comments
* refactor
* extract send fn
* revert reporting_poh_timing_point
* align logging
* small refactor
* move type declaration to the top of the module
* replace macro with constructor
* clippy: remove redundant closure
* review comments
* simplify poh timing point creation
Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
Now that nodes correctly populate position field in coding shreds, and
first_coding_index in erasure meta, the old code to maintain backward
compatibility can be removed.
The commit is working towards changing erasure coding schema to 32:64.
Currently, slot stats are removed when the slot is full or every 30 seconds
if the slot is before root:
https://github.com/solana-labs/solana/blob/493a8e234/ledger/src/blockstore.rs#L2017-L2027
In order to track if the slot is ultimately marked as dead or rooted and
emit more metrics, this commit expands lifetime of SlotStats while
bounding total size of cache using an LRU eviction policy.
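The bounded-cache idea can be sketched with the standard library alone. This is an assumption-laden illustration: the `SlotStats` fields, the `SlotStatsCache` type, and its methods are hypothetical, and the actual commit may rely on a dedicated LRU crate instead.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical per-slot counters; the real SlotStats fields are not shown here.
#[derive(Debug, Default, Clone, PartialEq)]
struct SlotStats {
    num_shreds: usize,
    is_full: bool,
}

/// Minimal LRU cache keyed by slot, bounded to `capacity` entries.
struct SlotStatsCache {
    capacity: usize,
    stats: HashMap<u64, SlotStats>,
    // Front is least recently used, back is most recently used.
    order: VecDeque<u64>,
}

impl SlotStatsCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            stats: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Returns a mutable entry for `slot`, creating it if needed and
    /// evicting the least recently used slot when the cache is full.
    fn get_or_insert(&mut self, slot: u64) -> &mut SlotStats {
        // Mark the slot as most recently used (O(n), fine for a sketch).
        self.order.retain(|s| *s != slot);
        self.order.push_back(slot);
        if !self.stats.contains_key(&slot) && self.stats.len() >= self.capacity {
            if let Some(evicted) = self.order.pop_front() {
                self.stats.remove(&evicted);
            }
        }
        self.stats.entry(slot).or_default()
    }

    fn contains(&self, slot: u64) -> bool {
        self.stats.contains_key(&slot)
    }
}
```

Because eviction depends on recency rather than on the slot being full or rooted, stats for a slot can outlive those events and still be available when the slot is later marked dead or rooted.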
This PR does a refactoring on column family-related metrics reporting.
As metric reporting is done on a per-column-family basis, the PR creates
a ColumnMetrics trait and moves the metric reporting logic into it.
This refactoring will make future column metric reporting (such as
read PerfContext) much cleaner.
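The shape of such a trait might look like the sketch below. Everything in it is illustrative: the associated constant, the `format_metric` method, the column type names, and the output string format are assumptions made for the example, not the trait introduced by the PR.

```rust
/// Hypothetical per-column metrics trait; names are illustrative only.
trait ColumnMetrics {
    /// Name of the column family the metrics belong to.
    const CF_NAME: &'static str;

    /// Formats one metric data point tagged with the column family name.
    fn format_metric(name: &str, value: i64) -> String {
        format!(
            "blockstore_rocksdb_cfs,cf_name={} {}={}",
            Self::CF_NAME,
            name,
            value
        )
    }
}

/// Illustrative column types; each one only supplies its own name.
struct TransactionStatusColumn;
struct SlotMetaColumn;

impl ColumnMetrics for TransactionStatusColumn {
    const CF_NAME: &'static str = "transaction_status";
}

impl ColumnMetrics for SlotMetaColumn {
    const CF_NAME: &'static str = "meta";
}
```

Putting the shared logic in a default method means a new column only has to declare its name to get reporting.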
* transaction-status: Add return data to meta
* Add return data to simulation results
* Use pretty-hex for printing return data
* Update arg name, make TransactionRecord struct
* Rename TransactionRecord -> ExecutionRecord
This PR adds `--rocksdb-ledger-compression` as a hidden argument to the validator
for specifying the compression algorithm for TransactionStatus. Available compression
algorithms include `lz4`, `snappy`, `zlib`. The default value is `none`.
Experimental results show that with lz4 compression, we can achieve ~37% size-reduction
on the TransactionStatus column family, or ~8% size-reduction of the ledger store size.
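Parsing the argument value could look roughly like the sketch below. The `LedgerCompression` enum and its error message are hypothetical; only the accepted strings (`none`, `lz4`, `snappy`, `zlib`) and the `none` default come from the description above.

```rust
use std::str::FromStr;

/// Compression choices named in the description above.
/// The type itself is a hypothetical stand-in.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum LedgerCompression {
    #[default]
    None,
    Lz4,
    Snappy,
    Zlib,
}

impl FromStr for LedgerCompression {
    type Err = String;

    /// Maps the command-line string to a compression choice.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "none" => Ok(Self::None),
            "lz4" => Ok(Self::Lz4),
            "snappy" => Ok(Self::Snappy),
            "zlib" => Ok(Self::Zlib),
            other => Err(format!("unsupported compression type: {}", other)),
        }
    }
}
```

Rejecting unknown values at parse time means a typo in the argument is caught at startup instead of silently falling back to `none`.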
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
Creating a new ledger implicitly means that no other process could have
previously held access to it. Additionally, creating a new ledger
implicitly requires writing, so it follows that Primary access is
required and we can drop access type as an argument.
#### Summary of Changes
This PR further enables group by operation on storage type in blockstore_rocksdb_cfs metrics.
Such group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.
To make things extensible, this PR introduces BlockstoreAdvancedOptions and moves shred_storage_type into it.
All fields in BlockstoreAdvancedOptions will support group-by operation in blockstore_rocksdb_cfs.
Dependency: #23580
This PR enables blockstore to periodically report RocksDB column family properties.
The reported properties are under blockstore_rocksdb_cfs, and the properties also
support group by operation on cf_name.
#### Summary of Changes
To avoid mixing the use of different shred storage types, each shred storage type
will have its blockstore in a different directory.
This PR still keeps the RocksFifo setting hidden. The default ShredStorageType and
blockstore directory are still RocksLevel and `rocksdb`.
Follow-up PRs will make the FIFO option public in ledger-tool and validator.
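The directory selection can be sketched as below. The enum variants and the `rocksdb` / `rocksdb-fifo` directory names come from this description; the helper function names are hypothetical, and the real variants may carry additional configuration.

```rust
use std::path::{Path, PathBuf};

/// Simplified storage type with the two variants named above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ShredStorageType {
    #[default]
    RocksLevel,
    RocksFifo,
}

/// Directory name used for each shred storage type.
fn blockstore_directory(storage: ShredStorageType) -> &'static str {
    match storage {
        ShredStorageType::RocksLevel => "rocksdb",
        ShredStorageType::RocksFifo => "rocksdb-fifo",
    }
}

/// Full blockstore path under a ledger directory (hypothetical helper).
fn blockstore_path(ledger_path: &Path, storage: ShredStorageType) -> PathBuf {
    ledger_path.join(blockstore_directory(storage))
}
```

Keeping one directory per storage type means a ledger written with one type is never opened with the other by accident.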
#### Test Plan
* Added a new test to verify the existence of `rocksdb-fifo` directory when FIFO compaction is used.
* Updated existing test to verify the current setting still stores ledger under `rocksdb` directory.
* Manually ran ledger_cleanup_test with both level and fifo compaction and verified the resulting ledger.
* Ran a validator with this PR.
* Bump first-available block to first complete block
* Remove obsolete purges in tests (PrimaryIndex toggling no longer in use)
* Check first-available block in Rpc check_slot_cleaned_up