#### Summary of Changes
This PR enables RocksDB read side performance metrics to report to blockstore_rocksdb_read_perf.
The sampling rate is controlled by an env arg `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
specifies the number of perf samples for every 1000 operations. The default value is set to 10, meaning
we will report 10 out of 1000 (or 1/100) reads.
The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
It includes many useful metrics including block read time, cache hit rate, and time spent on decompressing the block.
* initial work for poh timing report service
* add poh_timing_report_service to validator
* fix comments
* clippy
* imrove test coverage
* delete record when complete
* rename shred full to slot full.
* debug logging
* fix slot full
* remove debug comments
* adding fmt trait
* derive default
* default for poh timing reporter
* better comments
* remove commented code
* fix test
* more test fixes
* delete timestamps for slot that are older than root_slot
* debug log
* record poh start end in bank reset
* report full to start time instead
* fix poh slot offset
* report poh start for normal ticks
* fix typo
* refactor out poh point report fn
* rename
* optimize delete - delete only when last_root changed
* change log level to trace
* convert if to match
* remove redudant check
* fix SlotPohTiming comments
* review feedback on poh timing reporter
* review feedback on poh_recorder
* add test case for out-of-order arrival of timing points and incomplete timing points
* refactor poh_timing_points into its own mod
* remove option for poh_timing_report service
* move poh_timing_point_sender to constructor
* clippy
* better comments
* more clippy
* more clippy
* add slot poh timing point macro
* clippy
* assert in test
* comments and display fmt
* fix check
* assert format
* revise comments
* refactor
* extrac send fn
* revert reporting_poh_timing_point
* align loggin
* small refactor
* move type declaration to the top of the module
* replace macro with constructor
* clippy: remove redundant closure
* review comments
* simplify poh timing point creation
Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
Now that nodes correctly populate position field in coding shreds, and
first_coding_index in erasure meta, the old code to maintain backward
compatibility can be removed.
The commit is working towards changing erasure coding schema to 32:64.
Current slot stats are removed when the slot is full or every 30 seconds
if the slot is before root:
https://github.com/solana-labs/solana/blob/493a8e234/ledger/src/blockstore.rs#L2017-L2027
In order to track if the slot is ultimately marked as dead or rooted and
emit more metrics, this commit expands lifetime of SlotStats while
bounding total size of cache using an LRU eviction policy.
This PR does a refactoring on column family-related metrics reporting.
As the metric reporting is per column family basis, the PR creates
ColumnMetrics trait and move the metric reporting logic into it.
This refactoring will make future column metric reporting (such as
read PerfContext) much cleaner.
* transaction-status: Add return data to meta
* Add return data to simulation results
* Use pretty-hex for printing return data
* Update arg name, make TransactionRecord struct
* Rename TransactionRecord -> ExecutionRecord
This PR adds `--rocksdb-ledger-compression` as a hidden argument to the validator
for specifying the compression algorithm for TransactionStatus. Available compression
algorithms include `lz4`, `snappy`, `zlib`. The default value is `none`.
Experimental results show that with lz4 compression, we can achieve ~37% size-reduction
on the TransactionStatus column family, or ~8% size-reduction of the ledger store size.
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
As we start adding more options into BlockstoreOptions, it's better to allow
new_cf_descriptor to take the reference to BlockstoreOptions so that
we can avoid future function API changes on new_cf_descriptor.
Creating a new ledger implicitly means that no other process could have
previously held access to it. Additionally, creating a new ledger
implicitly requires writing, so it follows that Primary access is
required and we can drop access type as an argument.
#### Summary of Changes
This PR further enables group by operation on storage type in blockstore_rocksdb_cfs metrics.
Such group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.
To make things extensible, this PR introduces BlockstoreAdvancedOptions and move shred_storage_type.
All fields in BlockstoreAdvancedOptions will support group-by operation in blockstore_rocksdb_cfs.
Dependency: #23580
* Rename excludes_from_compaction to should_exclude_from_compaction
* Make subfunction to create all cf descriptors
* Condense logic for when to disable compactions
This PR enables blockstore to periodically report RocksDB column family properties.
The reported properties are under blockstore_rocksdb_cfs, and the properties also
support group by operation on cf_name.
#### Summary of Changes
This PR adds two hidden arguments to the validator that allow users to use RocksDB's FIFO compaction for storing shreds.
--shred-storage <SHRED_STORAGE>
EXPERIMENTAL: Controls how RocksDB compacts shreds. *WARNING*: You will lose your ledger data
when you switch between options. Possible values are: 'level': stores shreds using RocksDB's default (level)
compaction. 'fifo': stores shreds under RocksDB's FIFO compaction. This option is more efficient on
disk-write-bytes of the ledger store. [default: level] [possible values: level, fifo]
--shred-storage-size <SHRED_STORAGE_SIZE_BYTES>
The shred storage size in bytes. The suggested value is 50% of your ledger storage size in bytes. [default:
268435456000]
#### Summary of Changes
To avoid mixing the use of different shred storage types, each shred storage type
will have its blockstore in a different directory.
This PR still keeps the RocksFifo setting hidden. The default ShredStorageType and
blockstore directory are still RocksLevel and `rocksdb`.
Will follow-up with PRs on making FIFO option public in ledger-tool and validator.
#### Test Plan
* Added a new test to verify the existence of `rocksdb-fifo` directory when FIFO compaction is used.
* Updated existing test to verify the current setting still store ledger under `rocksdb` directory.
* Manually ran ledger_cleanup_test with both level and fifo compaction and verified the resulting ledger.
* Ran a validator with this PR.
* Bump first-available block to first complete block
* Remove obsolete purges in tests (PrimaryIndex toggling no longer in use
* Check first-available block in Rpc check_slot_cleaned_up
* Add failing test for precompile transition
* Skip adding builtins if they will be removed
* cargo clean
* nits
* fix abi check
* remove workaround
Co-authored-by: Jack May <jack@solana.com>
Transaction logs are not being saved to the database through the plugin interface.
Summary of Changes
Retain the transaction logs when transaction notification plugin is loaded.
Fixes #
lijunwangs/solana-accountsdb-plugin-postgres#6
* Refactor Bank::load_and_execute_transactions
* Refactor: improve type safety of TransactionExecutionResult
* Add enum for extra type safety in execution results
* feedback
* get_signatures_for_address does not correctly account for result sets that span Blockstore and Bigtable.
This causes Bigtable to return `RowNotFound` until the new tx is uploaded.
Check that `before` exists in Bigtable, and if not, set it to `None` to return the full data set.
References #21442Closes#22110
* Differentiate between before sig not found and no newer signatures
* Dedupe bigtable results to account for potential upload race
Co-authored-by: Tyera Eulberg <tyera@solana.com>
The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921
However with the upcoming changes to erasure schema, there will be more
erasure coding shreds than data shreds and we can no longer infer coding
shreds indices from data shreds.
The commit adds constructs to track coding shreds indices explicitly.
next-shred-index is already readily available from returned data shreds.
The commit simplifies the api for upcoming changes to erasure coding
schema which will require explicit tracking of indices for coding shreds
as well as data shreds.
The number of shreds that result from a given number of entries is
variable and in our test case, somewhat unintuitive to think about when
trying to determine how much data we're pushing into the blockstore. So,
this change converts the unit of test parameters from entries to shreds.
This change also cleans up some variable naming for clarity and prints.
SlotMeta.last_index may be unknown and current code is using u64::MAX to
indicate that:
https://github.com/solana-labs/solana/blob/6c108c8fc/ledger/src/blockstore_meta.rs#L169-L174
This lacks type-safety and can introduce bugs if not always checked for
Several instances of slot_meta.last_index + 1 are also subject to
overflow.
This commit updates the type to Option<u64>. Backward compatibility is
maintained by customizing serde serialize/deserialize implementations.
https://github.com/solana-labs/solana/pull/16646
removed first_coding_index since the field is currently redundant and
always equal to fec_set_index.
However, with upcoming changes to erasure coding schema, this will no
longer be the same as fec_set_index and so requires a separate field to
represent.
https://github.com/solana-labs/solana/pull/17004
removed position field from coding-shred-header because as it stands the
field is redundant and unused.
However, with the upcoming changes to erasure coding schema this field
will no longer be redundant and needs to be populated.
Allows the use of GPU acceleration in verifying the signatures in Entry's after deserialization in the replay stage
Co-authored-by: Stephen Akridge <sakridge@gmail.com>
Co-authored-by: Ryan Leung <ryan.leung@solana.com>
Problem
Blockstore currently removes erasure shreds if the data shreds are
successfully recovered on insert, which is an issue if we want to
serve coding shreds over repair.
Summary of Changes
This diff keeps all coding shreds even on successful recovery and
changes change the signature of prev_inserted_codes to immutable
reference to ensure its immunity.
Fixes#20968
* Cache owners in TransactionTokenBalances
* Light cleanup
* Use return struct, and remove pub
* Single-use statements
* Why not, just do the whole crate
* Add metrics
* Make datapoint_debug to prevent spam unless troubleshooting
* add filler accounts to bloat validator and predict failure
* assert no accounts match filler
* cleanup magic numbers
* panic if can't load from snapshot with filler accounts specified
* some renames
* renames
* into_par_iter
* clean filler accts, too
Now that CRDS supports incremental snapshot hashes,
SnapshotPackagerService needs to push 'em!
This commit does two main things:
1. SnapshotPackagerService now knows about incremental snapshot hashes,
and will push SnapshotPackage::IncrementalSnapshot hashes to CRDS.
2. At startup, when loading from a full + incremental snapshot, the
hashes need to be passed all the way to SnapshotPackagerService so it
can push these starting hashes to CRDS. Those values have been piped
through.
Fixes#20441 and #20423
Summary of Changes
Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows
Data stores of connection strings/credentials to be configured,
Accounts with patterns to be streamed
PostgreSQL implementation of the streaming for different destination stores to be plugged in.
The code comprises 4 major parts:
accountsdb-plugin-intf: defines the plugin interface which concrete plugin should implement.
accountsdb-plugin-manager: manages the load/unload of plugins and provide interfaces which the validator can notify of accounts update to plugins.
accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL
The validator integrations: updated streamed right after snapshot restore and after account update from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
Use TempDir::path() instead of TempDir::into_path(); into_path() returns
the underlying PathBuf and voids the promise to delete the directory
when TempDir goes out of scope.
This is a bugfix.
When `load_frozen_forks()` called `snapshot_bank()`, the accounts hash
was not computed. So then when a snapshot archive was eventually
created, its hash would be all 1s, which is clearly wrong. The fix is
to update the bank's accounts hash before taking the bank snapshot.
* Add TransactionMemos column family
* Traitify extract_memos
* Write TransactionMemos in TransactionStatusService
* Populate memos from column
* Dedupe and add unit test
Some of the blockstore_processor tests weren't using
`test_process_blockstore()`, but could. I updated those tests, and also
changed to using `..` for ignoring, instead of `_`, since I'll be adding
another return value here shortly (thus reducing the need to change
these again).
```text
error: all if blocks contain the same code at the end
--> ledger/src/blockstore.rs:3473:5
|
3473 | / Ok(insert_map.get(&slot).unwrap().clone())
3474 | | }
| |_____^
|
= note: `-D clippy::branches-sharing-code` implied by `-D warnings`
= note: The end suggestion probably needs some adjustments to use the expression result correctly
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#branches_sharing_code
help: consider moving the end statements out like this
|
3473 ~ }
3474 + Ok(insert_map.get(&slot).unwrap().clone())
|
error: could not compile `solana-ledger` due to previous error
```
#### Problem
Snapshot names are overloaded, and there are multiple terms that mean the same thing. This is confusing. Here's a list of ones in the codebase that I've found:
```
- snapshot_dir
- snapshots_dir
- snapshot_path
- snapshot_output_dir
- snapshot_package_output_path
- snapshot_archives_dir
```
#### Summary of Changes
For all the ones that are about the directory where snapshot archives are stored, ensure they are `snapshot_archives_dir`. For the ones about the (bank) snapshots directory, set to `bank_snapshots_dir`.
Co-authored-by: Michael Vines <mvines@gmail.com>
* increase block_cost_max by 10 times to accommodate updated account read and write costs; Otherwise during performace test, banking_stage will start reject and retry transactions in next block, result regressed performance.
* fix typo
Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.
This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.
* Handle cleaning zero-lamport accounts
Handle cleaning zero-lamport accounts in slots higher than the last full
snapshot slot. This is part of the Incremental Snapshot work.
Fixes#18825
* added realtime cost checking logic to reject block that would exceed max limit:
- defines max limits at block_cost_limits.rs
- right after each bath's execution, accumulate its cost and check again
limit, return error if limit is exceeded
* update abi that changed due to adding additional TransactionError
* To avoid counting stats mltiple times, only accumulate execute-timing when a bank is completed
* gate it by a feature
* move cost const def into block_cost_limits.rs
* redefine the cost for signature and account access, removed signer part as it is not well defined for now
* check if per_program_timings of execute_timings before sending
While reviewing PR #18565, as issue was brought up to refactor some code
around verifying the bank after rebuilding from snapshots. A new
top-level function has been added to get the latest snapshot archives
and load the bank then verify. Additionally, new tests have been
written and existing tests have been updated to use this new function.
Fixes#18973
While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled. Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`. And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats. This bundling simplifies bank
rebuilding.
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks. This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.
Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use a common code paths for the different snapshots.
Also of note, some renaming has happened:
1. Snapshots are now either `full_` or `incremental_` throughout the
codebase. If not specified, the code applies to both.
2. Bank snapshots now are called "bank snapshots"
(before they were called "slot snapshots", "bank snapshots", or
just "snapshots"). The one exception is within `Bank`, where they
are still just "snapshots", because they are already "bank
snapshots".
3. Snapshot archives now have `_archive` in the code. This
should clear up an ambiguity between bank snapshots and snapshot
archives.
* Move transaction sanitization earlier in the pipeline
* Renamed HashedTransaction to SanitizedTransaction
* Implement deref for sanitized transaction
* bring back process_transactions test method
* Use sanitized transactions for cost model calculation
* Fix link target in doc comment
* Fix formatting of log examples in process_instruction
* Fix doc markdown in solana-gossip
* Fix doc markdown in solana-runtime
* Escape square braces in doc comments to avoid warnings
* Surround 'account references' doc items in code spans to avoid warnings
* Fix code block in loader_upgradeable_instruction
* Fix doctest for loader_upgradable_instruction
* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()`
* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup
* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.
1. Added both options for measuring space usage using total accounts usage and for individual store shrink ratio using an enum. Validator CLI options: --accounts-shrink-optimize-total-space and --accounts-shrink-ratio
2. Added code for selecting candidates based on total usage in a separate function select_candidates_by_total_usage
3. Added unit tests for the new functions added
4. The default implementations is kept at 0.8 shrink ratio with --accounts-shrink-optimize-total-space set to true
Fixes#17544
* Create solana-poh crate
* Move BigTableUploadService to solana-ledger
* Add solana-rpc to workspace
* Move dependencies to solana-rpc
* Move remaining rpc modules to solana-rpc
* Single use statement solana-poh
* Single use statement solana-rpc
* Update rocksdb to v0.16.0
* Promote the infrequent and important log to info!
* Force background compaction by ttl without manual compaction
* Fix test
* Support no compaction mode in test_ledger_cleanup_compaction
* Fix comment
* Make compaction_interval customizable
* Avoid major compaction with periodic filtering...
* Adress lazy_static, special cfs and range check
* Clean up a bit and add comment
* Add comment
* More comments...
* Config code cleanup
* Add comment
* Use .conflicts_with()
* Nullify unneeded delete_range ops for special CFs
* Some clean ups
* Clarify the locking intention
* Ensure special CFs' consistency with PurgeType::CompactionFilter
* Fix comment
* Fix bad copy paste
* Fix various types...
* Don't use tuples
* Add a unit test for compaction_filter
* Fix typo...
* Remove flag and just use new behavior always
* Fix wrong condition negation...
* Doc. about no set_last_purged_slot in purge_slots
* Write a test and fix off-by-one bug....
* Apply suggestions from code review
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
* Follow up to github review suggestions
* Fix line-wrapping
* Fix conflict
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
Refactor a few functions that are on the load-from-snapshot path, to facilitate
adding in incremental snapshots more easily.
Additionally, add some tests and doc comments.
* Add BlockHeight CF to blockstore
* Rename CacheBlockTimeService to be more general
* Cache block-height using service
* Fixup previous proto mishandling
* Add block_height to block structs
* Add block-height to solana block
* Fallback to BankForks if block time or block height are not yet written to Blockstore
* Add docs
* Review comments
* Add blockstore-root-scan for api nodes on boot
* Ensure cluster-confirmed root and parents are set as root in blockstore in load_frozen_forks()
* Plumb rpc-scan-and-fix-roots validator flag
* Upgrade Rust to 1.52.0
update nightly_version to newly pushed docker image
fix clippy lint errors
1.52 comes with grcov 0.8.0, include this version to script
* upgrade to Rust 1.52.1
* disabling Serum from downstream projects until it is upgraded to Rust 1.52.1
* Require that blockstore block-time only be recognized slot, instead of root
* Move cache_block_time to after Bank freeze
* Single use statement
* Pass transaction_status_sender by reference
* Remove unnecessary slot-existence check before caching block time altogether
* Move block-time existence check into Blockstore::cache_block_time, Blockstore no longer needed in blockstore_processor helper
CodingShredHeader.position is equal to
ShredCommonHeader.index - ShredCommonHeader.fec_set_index
and is so redundant. The extra position field can add bugs if not
consistent with index and fec_set_index.
Strip the zero-padding off of data shreds before insertion into blockstore
Co-authored-by: Stephen Akridge <sakridge@gmail.com>
Co-authored-by: Nathan Hawkins <utsl@utsl.org>
Number of parity coding shreds is always less than the number of data
shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719
Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714
However the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.
This commit expands the number of coding shreds in the last FEC block in
slots to: 64 - number of data shreds; so that FEC blocks are always 64
data and parity coding shreds each.
As a consequence of this, the last FEC block has more parity coding
shreds than data shreds. So for some shred indices we will have a coding
shred but no data shreds. This should not cause any kind of overlapping
FEC blocks as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot will reset the shred index.
* Track transaction check time separately from account loads
* banking packet process metrics
* Remove signature clone in status cache lookup
* Reduce allocations when converting packets to transactions
* Add blake3 hash of transaction messages in status cache
* Bug fixes
* fix tests and run fmt
* Address feedback
* fix simd tx entry verification
* Fix rebase
* Feedback
* clean up
* Add tests
* Remove feature switch and fall back to signature check
* Bump programs/bpf Cargo.lock
* clippy
* nudge benches
* Bump `BankSlotDelta` frozen ABI hash`
* Add blake3 to sdk/programs/Cargo.lock
* nudge bpf tests
* short circuit status cache checks
Co-authored-by: Trent Nelson <trent@solana.com>
If the vector is pinned and has a recycler, From<PinnedVec>
implementation of Vec should clone (instead of consuming) the underlying
vector so that the next allocation of a PinnedVec will recycle an
already pinned one.