* initial work for poh timing report service
* add poh_timing_report_service to validator
* fix comments
* clippy
* imrove test coverage
* delete record when complete
* rename shred full to slot full.
* debug logging
* fix slot full
* remove debug comments
* adding fmt trait
* derive default
* default for poh timing reporter
* better comments
* remove commented code
* fix test
* more test fixes
* delete timestamps for slot that are older than root_slot
* debug log
* record poh start end in bank reset
* report full to start time instead
* fix poh slot offset
* report poh start for normal ticks
* fix typo
* refactor out poh point report fn
* rename
* optimize delete - delete only when last_root changed
* change log level to trace
* convert if to match
* remove redudant check
* fix SlotPohTiming comments
* review feedback on poh timing reporter
* review feedback on poh_recorder
* add test case for out-of-order arrival of timing points and incomplete timing points
* refactor poh_timing_points into its own mod
* remove option for poh_timing_report service
* move poh_timing_point_sender to constructor
* clippy
* better comments
* more clippy
* more clippy
* add slot poh timing point macro
* clippy
* assert in test
* comments and display fmt
* fix check
* assert format
* revise comments
* refactor
* extrac send fn
* revert reporting_poh_timing_point
* align loggin
* small refactor
* move type declaration to the top of the module
* replace macro with constructor
* clippy: remove redundant closure
* review comments
* simplify poh timing point creation
Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
Now that nodes correctly populate position field in coding shreds, and
first_coding_index in erasure meta, the old code to maintain backward
compatibility can be removed.
The commit is working towards changing erasure coding schema to 32:64.
Current slot stats are removed when the slot is full or every 30 seconds
if the slot is before root:
https://github.com/solana-labs/solana/blob/493a8e234/ledger/src/blockstore.rs#L2017-L2027
In order to track if the slot is ultimately marked as dead or rooted and
emit more metrics, this commit expands lifetime of SlotStats while
bounding total size of cache using an LRU eviction policy.
This PR does a refactoring on column family-related metrics reporting.
As the metric reporting is per column family basis, the PR creates
ColumnMetrics trait and move the metric reporting logic into it.
This refactoring will make future column metric reporting (such as
read PerfContext) much cleaner.
* transaction-status: Add return data to meta
* Add return data to simulation results
* Use pretty-hex for printing return data
* Update arg name, make TransactionRecord struct
* Rename TransactionRecord -> ExecutionRecord
This PR adds `--rocksdb-ledger-compression` as a hidden argument to the validator
for specifying the compression algorithm for TransactionStatus. Available compression
algorithms include `lz4`, `snappy`, `zlib`. The default value is `none`.
Experimental results show that with lz4 compression, we can achieve ~37% size-reduction
on the TransactionStatus column family, or ~8% size-reduction of the ledger store size.
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
As we start adding more options into BlockstoreOptions, it's better to allow
new_cf_descriptor to take the reference to BlockstoreOptions so that
we can avoid future function API changes on new_cf_descriptor.
Creating a new ledger implicitly means that no other process could have
previously held access to it. Additionally, creating a new ledger
implicitly requires writing, so it follows that Primary access is
required and we can drop access type as an argument.
#### Summary of Changes
This PR further enables group by operation on storage type in blockstore_rocksdb_cfs metrics.
Such group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.
To make things extensible, this PR introduces BlockstoreAdvancedOptions and move shred_storage_type.
All fields in BlockstoreAdvancedOptions will support group-by operation in blockstore_rocksdb_cfs.
Dependency: #23580
* Rename excludes_from_compaction to should_exclude_from_compaction
* Make subfunction to create all cf descriptors
* Condense logic for when to disable compactions
This PR enables blockstore to periodically report RocksDB column family properties.
The reported properties are under blockstore_rocksdb_cfs, and the properties also
support group by operation on cf_name.
#### Summary of Changes
This PR adds two hidden arguments to the validator that allow users to use RocksDB's FIFO compaction for storing shreds.
--shred-storage <SHRED_STORAGE>
EXPERIMENTAL: Controls how RocksDB compacts shreds. *WARNING*: You will lose your ledger data
when you switch between options. Possible values are: 'level': stores shreds using RocksDB's default (level)
compaction. 'fifo': stores shreds under RocksDB's FIFO compaction. This option is more efficient on
disk-write-bytes of the ledger store. [default: level] [possible values: level, fifo]
--shred-storage-size <SHRED_STORAGE_SIZE_BYTES>
The shred storage size in bytes. The suggested value is 50% of your ledger storage size in bytes. [default:
268435456000]
#### Summary of Changes
To avoid mixing the use of different shred storage types, each shred storage type
will have its blockstore in a different directory.
This PR still keeps the RocksFifo setting hidden. The default ShredStorageType and
blockstore directory are still RocksLevel and `rocksdb`.
Will follow-up with PRs on making FIFO option public in ledger-tool and validator.
#### Test Plan
* Added a new test to verify the existence of `rocksdb-fifo` directory when FIFO compaction is used.
* Updated existing test to verify the current setting still store ledger under `rocksdb` directory.
* Manually ran ledger_cleanup_test with both level and fifo compaction and verified the resulting ledger.
* Ran a validator with this PR.
* Bump first-available block to first complete block
* Remove obsolete purges in tests (PrimaryIndex toggling no longer in use
* Check first-available block in Rpc check_slot_cleaned_up