* Use client certs in QUIC to get peer's stake
* fixes to cert processing
* integrate the code
* clippy
* more cleanup
* sort cargo deps
* test fixes
* info -> debug
* Remove UseQuic type
Move to storing the UdpSocket on ConnectionCache and accepting a bool
* Remove use_quic from ConnectionCache constructor
Replace with separate with_udp constructor to force callers to choose
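For illustration, a minimal sketch of the constructor split, with placeholder internals rather than the real ConnectionCache fields:

```rust
// A sketch of forcing callers to choose UDP explicitly; the fields are
// illustrative placeholders, not the real ConnectionCache internals.
pub struct ConnectionCache {
    use_quic: bool,
    connection_pool_size: usize,
}

impl ConnectionCache {
    /// The default constructor uses QUIC.
    pub fn new(connection_pool_size: usize) -> Self {
        Self { use_quic: true, connection_pool_size }
    }

    /// Callers that want UDP must opt in explicitly.
    pub fn with_udp(connection_pool_size: usize) -> Self {
        Self { use_quic: false, connection_pool_size }
    }
}
```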
* allow initial hash calc to occur in bg
* validator_initialized -> startup_verification_complete
* add infos for leader and vote
* rework snapshot for startup verification
* change to assert
Shred slot and parent are not verified until window-service, by which
point resources have already been wasted sig-verifying and
deserializing the shreds. This commit moves that verification earlier
in the pipeline, into fetch stage.
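A minimal sketch of the kind of early check fetch stage can apply, assuming a hypothetical `should_discard_shred` helper rather than the actual shred sanitization code:

```rust
/// Hypothetical early check: discard a shred whose slot/parent fields are
/// inconsistent before spending cycles on sig-verify and deserialization.
fn should_discard_shred(slot: u64, parent_offset: u16, root: u64) -> bool {
    // Parent slot implied by the shred header.
    let Some(parent) = slot.checked_sub(u64::from(parent_offset)) else {
        return true; // offset reaches below slot 0: malformed
    };
    // The parent must be strictly earlier than the shred's slot and must
    // not be older than the local root.
    parent >= slot || parent < root
}
```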
* Make sure to root local slots even with hard fork
* Address review comments
* Cleanup a bit
* Further clean up
* Further clean up a bit
* Add comment
* Tweak hard fork reconciliation code placement
* Connection pool in connection cache and handle connection errors
1. The connection cache now has a pool of connections per address, configurable, default 4
2. The connections per address share a lazily initialized endpoint
3. Handle connection issues better, avoid race conditions
4. Various log improvements to help debug connection issues
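A rough sketch of the pooling scheme under illustrative types (placeholder `Endpoint`/`Connection`, pool assumed pre-filled), not the PR's actual code:

```rust
use std::collections::HashMap;
use std::net::SocketAddr;
use std::sync::{Arc, OnceLock};

const DEFAULT_CONNECTION_POOL_SIZE: usize = 4;

struct Endpoint;   // placeholder for the lazily created QUIC endpoint
struct Connection; // placeholder for one QUIC connection

struct ConnectionPool {
    endpoint: Arc<OnceLock<Endpoint>>, // shared by all connections to one address
    connections: Vec<Arc<Connection>>, // assumed filled at construction
    next: usize,
}

impl ConnectionPool {
    /// Round-robin over the (configurable, default 4) connections.
    fn borrow(&mut self) -> Arc<Connection> {
        self.endpoint.get_or_init(|| Endpoint); // create endpoint on first use
        let conn = self.connections[self.next % self.connections.len()].clone();
        self.next += 1;
        conn
    }
}

struct ConnectionCache {
    pools: HashMap<SocketAddr, ConnectionPool>,
}
```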
#### Problem
blockstore clean and compact is quite slow with wait-for-supermajority purge and can take 20-30 minutes
as described in #25710.
#### Summary of Changes
This PR removes the compaction logic in backup_and_clear_blockstore, as the
actual restoration from a bad fork is handled by `blockstore.purge_slots`
(which is done by issuing a rocksdb range-delete that makes the bad fork
unavailable).
Compaction is irrelevant to the shred version, as its main job in this context
is to reclaim disk storage from the deleted slots, which we can leave to
rocksdb's automatic background compaction.
Fixes #25710
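For illustration, a minimal sketch of a slot-range delete using the `rocksdb` crate directly; the column family name and big-endian slot keys are assumptions, not the blockstore's actual schema:

```rust
use rocksdb::DB;

/// Mark [from_slot, to_slot] unavailable with a single range-delete; the
/// freed space is reclaimed later by RocksDB's background compaction.
fn purge_slots(db: &DB, from_slot: u64, to_slot: u64) -> Result<(), rocksdb::Error> {
    let cf = db.cf_handle("data_shred").expect("column family exists");
    db.delete_range_cf(cf, from_slot.to_be_bytes(), (to_slot + 1).to_be_bytes())
}
```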
* client: Remove static connection cache, plumb it instead
* Add TpuClient::new_with_connection_cache to not break downstream
* Refactor get_connection and RwLock into ConnectionCache
* Fix merge conflicts from new async TpuClient
* Remove `ConnectionCache::set_use_quic`
* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
Add some CPU utilization metrics, such as: number of vCPUs, clock frequency, average load across different time intervals, and total number of threads
* Spawn QUIC server to receive forwarded txs
* Update validator port range
* forward votes using UDP
* no forwarding from unstaked nodes
* forwarding stats in banking stage
* fix test builds
* fix lifetime of forward sender
#### Problem
blockstore_db.rs and blockstore_metrics.rs have a mutual dependency.
#### Summary of Changes
This PR removes the mutual dependency by moving the option-related code
out of blockstore_db.rs to its new home, blockstore_options.rs.
By doing this, we address the mutual dependency and also make the code cleaner.
* panic when test times out
* nonblocking send when dropping banks
* debug log
* timeout for tvu
* unused variable
* timeout for tpu
* Revert "debug log"
This reverts commit da780a3301a51d7c496141a85fcd35014fe6dff5.
* add timeout const
* fix typo
* Revert "nonblocking send when when droping banks".
I will create another pull request for this.
This reverts commit 088c98ec0facf825b5eca058fb860deba6d28888.
* Update core/src/tpu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tpu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tvu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tvu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/validator.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* initial work for poh timing report service
* add poh_timing_report_service to validator
* fix comments
* clippy
* improve test coverage
* delete record when complete
* rename shred full to slot full.
* debug logging
* fix slot full
* remove debug comments
* adding fmt trait
* derive default
* default for poh timing reporter
* better comments
* remove commented code
* fix test
* more test fixes
* delete timestamps for slots that are older than root_slot
* debug log
* record poh start end in bank reset
* report full to start time instead
* fix poh slot offset
* report poh start for normal ticks
* fix typo
* refactor out poh point report fn
* rename
* optimize delete - delete only when last_root changed
* change log level to trace
* convert if to match
* remove redundant check
* fix SlotPohTiming comments
* review feedback on poh timing reporter
* review feedback on poh_recorder
* add test case for out-of-order arrival of timing points and incomplete timing points
* refactor poh_timing_points into its own mod
* remove option for poh_timing_report service
* move poh_timing_point_sender to constructor
* clippy
* better comments
* more clippy
* more clippy
* add slot poh timing point macro
* clippy
* assert in test
* comments and display fmt
* fix check
* assert format
* revise comments
* refactor
* extract send fn
* revert reporting_poh_timing_point
* align logging
* small refactor
* move type declaration to the top of the module
* replace macro with constructor
* clippy: remove redundant closure
* review comments
* simplify poh timing point creation
Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
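A minimal sketch of the overall flow, assuming illustrative type and variant names: timing points are sent over a channel and aggregated per slot by the report service.

```rust
use crossbeam_channel::{unbounded, Receiver};

/// Illustrative timing points; the PR's actual variants differ.
#[derive(Debug)]
enum PohTimingPoint {
    PohSlotStart { slot: u64, timestamp_ms: u64 },
    PohSlotEnd { slot: u64, timestamp_ms: u64 },
    FullSlotReceived { slot: u64, timestamp_ms: u64 },
}

fn report_service(receiver: Receiver<PohTimingPoint>) {
    while let Ok(point) = receiver.recv() {
        // Aggregate per slot; once a slot has all its points, report the
        // full-to-start offset, and drop entries older than the last root.
        println!("timing point: {point:?}");
    }
}

fn main() {
    let (sender, receiver) = unbounded();
    let service = std::thread::spawn(move || report_service(receiver));
    sender
        .send(PohTimingPoint::PohSlotStart { slot: 1, timestamp_ms: 0 })
        .unwrap();
    drop(sender); // closing the channel ends the service loop
    service.join().unwrap();
}
```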
* run validator_exit_test sequentially
* limit validator exit run to its own serial run subset
add 10ms delay in the validator exit tests
* fix intermittent validator exit failure
* no sleep
* undo the code move
* transaction-status: Add return data to meta
* Add return data to simulation results
* Use pretty-hex for printing return data
* Update arg name, make TransactionRecord struct
* Rename TransactionRecord -> ExecutionRecord
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass this struct down to LedgerColumn to allow it to perform metric reporting.
#### Summary of Changes
This PR further enables the group-by operation on storage type in blockstore_rocksdb_cfs metrics.
Such a group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.
To make things extensible, this PR introduces BlockstoreAdvancedOptions and moves shred_storage_type
into it. All fields in BlockstoreAdvancedOptions will support the group-by operation in blockstore_rocksdb_cfs.
Dependency: #23580
#### Summary of Changes
This PR adds two hidden arguments to the validator that allow users to use RocksDB's FIFO compaction for storing shreds.
--shred-storage <SHRED_STORAGE>
    EXPERIMENTAL: Controls how RocksDB compacts shreds. *WARNING*: You will lose your ledger data
    when you switch between options. Possible values are: 'level': stores shreds using RocksDB's
    default (level) compaction. 'fifo': stores shreds under RocksDB's FIFO compaction. This option
    is more efficient on disk-write-bytes of the ledger store. [default: level]
    [possible values: level, fifo]
--shred-storage-size <SHRED_STORAGE_SIZE_BYTES>
    The shred storage size in bytes. The suggested value is 50% of your ledger storage size in
    bytes. [default: 268435456000]
#### Problem
Transaction logs are not being saved to the database through the plugin interface.
#### Summary of Changes
Retain the transaction logs when the transaction notification plugin is loaded.
Fixes lijunwangs/solana-accountsdb-plugin-postgres#6
Tpu::new() now matches Tvu::new() in taking a config struct to reduce the
argument list. Additionally, Rust supports partial moves, so there is no need
to clone the Tvu sockets out of the Node object.
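A minimal sketch of the argument-struct pattern, with illustrative field names:

```rust
use std::net::UdpSocket;

// Bundling the sockets lets Tpu::new() take one struct, and Rust's partial
// moves let callers move these out of a larger Node struct without cloning.
struct TpuSockets {
    transactions: Vec<UdpSocket>,
    transaction_forwards: Vec<UdpSocket>,
    vote: Vec<UdpSocket>,
    broadcast: Vec<UdpSocket>,
}
```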
Adds the TransactionNotifierInterface for notifying of transactions.
Changes transaction_status_service to notify the notifier of the transaction data.
Adds an interface to query the plugin's interest in transaction data.
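A rough sketch of what such an interface can look like; trait and method names here are approximations, not the exact definitions:

```rust
/// A sketch of a transaction notifier trait; names and argument types are
/// illustrative, not the exact interface added by this change.
pub trait TransactionNotifier {
    /// Called by transaction_status_service for each processed transaction.
    fn notify_transaction(&self, slot: u64, signature: &str, status_json: &str);
}

/// A sketch of how a plugin advertises interest, letting the service skip
/// the extra work when no plugin wants transaction data.
pub trait PluginInterest {
    fn transaction_notifications_enabled(&self) -> bool;
}
```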
#### Problem
Slot status can be used in other scenarios in addition to account information, such as transactions and blocks. The current implementation is too tightly coupled.
#### Summary of Changes
Decoupled the slot status notification from accounts notification. Created a new slot status notification module.
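A minimal sketch of the decoupled notification, with illustrative names:

```rust
/// Illustrative slot status enum and notifier trait, decoupled from account
/// notifications; not the exact module added by this change.
#[derive(Clone, Copy, Debug)]
pub enum SlotStatus {
    Processed,
    Confirmed,
    Rooted,
}

pub trait SlotStatusNotifier {
    /// Invoked whenever a slot changes status, independently of any
    /// account update notifications.
    fn notify_slot_status(&self, slot: u64, parent: Option<u64>, status: SlotStatus);
}
```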
* Move test-validator to own module to reduce core dependencies
* Fix a few TestValidator paths
* Use solana_test_validator crate for solana_test_validator bin
* Move client int tests to separate crate
Co-authored-by: Tyera Eulberg <tyera@solana.com>
Support connection pooling and multiple threads for Postgres db operations. Performance improved from 1,500 RPS to 40,000 RPS, measured during validator start.
Support multiple plugins at the same time.
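A minimal sketch of the fan-out idea, assuming crossbeam-channel and placeholder work items; each worker would own one pooled Postgres connection:

```rust
use crossbeam_channel::unbounded;
use std::thread;

fn main() {
    // Hypothetical work items; the real plugin batches SQL statements.
    let (sender, receiver) = unbounded::<String>();

    let workers: Vec<_> = (0..8)
        .map(|_| {
            let receiver = receiver.clone();
            thread::spawn(move || {
                while let Ok(stmt) = receiver.recv() {
                    // Each worker would execute `stmt` on its own pooled
                    // Postgres connection, so writes proceed in parallel.
                    drop(stmt);
                }
            })
        })
        .collect();

    sender.send("INSERT INTO account ...".to_string()).unwrap();
    drop(sender); // close the channel so workers exit
    for worker in workers {
        worker.join().unwrap();
    }
}
```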
Now that CRDS supports incremental snapshot hashes,
SnapshotPackagerService needs to push 'em!
This commit does two main things:
1. SnapshotPackagerService now knows about incremental snapshot hashes,
and will push SnapshotPackage::IncrementalSnapshot hashes to CRDS.
2. At startup, when loading from a full + incremental snapshot, the
hashes need to be passed all the way to SnapshotPackagerService so it
can push these starting hashes to CRDS. Those values have been piped
through.
Fixes #20441 and #20423
* Add separate vote processing tpu port
* Add feature to send to tpu vote port
* Add vote rejecting sigverify mode
* use packet.meta.is_simple_vote_tx in place of deserialization
* consolidate code that identifies vote tx at common path for cpu and gpu
* new key for feature set
* banking forward tpu vote
* add tpu vote port to dockerfile and other review changes
* Simplify thread id compare
* fix a test; updated cluster_info ABI change
Co-authored-by: Tao Zhu <tao@solana.com>
Co-authored-by: sakridge <sakridge@gmail.com>
#### Summary of Changes
Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows:
1. Data stores of connection strings/credentials to be configured,
2. Accounts with patterns to be streamed,
3. PostgreSQL implementation of the streaming for different destination stores to be plugged in.
The code comprises 4 major parts:
1. accountsdb-plugin-intf: defines the plugin interface which concrete plugins should implement.
2. accountsdb-plugin-manager: manages the load/unload of plugins and provides interfaces through which the validator can notify plugins of account updates.
3. accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL.
4. The validator integrations: updates are streamed right after snapshot restore and after account updates from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
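A rough sketch of the plugin interface idea, with approximate names and signatures rather than the published accountsdb-plugin-intf definitions:

```rust
use std::error::Error;

/// Illustrative account payload; the real interface passes more fields.
pub struct ReplicaAccountInfo<'a> {
    pub pubkey: &'a [u8],
    pub lamports: u64,
    pub data: &'a [u8],
}

/// A sketch of the plugin trait that concrete plugins implement; method
/// names and signatures are approximations.
pub trait AccountsDbPlugin {
    fn name(&self) -> &'static str;

    /// Called once at load time with the plugin's config file, which holds
    /// connection strings/credentials and account patterns to stream.
    fn on_load(&mut self, config_file: &str) -> Result<(), Box<dyn Error>>;

    /// Called for each account update, both right after snapshot restore
    /// and for real updates from transaction processing.
    fn update_account(
        &mut self,
        account: ReplicaAccountInfo,
        slot: u64,
        is_startup: bool,
    ) -> Result<(), Box<dyn Error>>;
}
```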
* reimplement rpc pubsub with a broadcast queue
* update tests for new pubsub implementation
* fix: fix review suggestions
* chore(rpc): add additional pubsub metrics
* integrate max subscriptions check into SubscriptionTracker to reduce locking
* separate subscription control from tracker
* limit memory usage of items in pubsub broadcast queue, improve error handling
* add more pubsub metrics
* add final count metrics to pubsub
* add metric for total number of subscriptions
* fix small review suggestions
* remove by_params from SubscriptionTracker and add node_progress_watchers map instead
* add subscription tracker tests
* add metrics for number of pubsub notifications as a counter
* ignore clippy lint in TokenCounter
* fix underflow in token counter
* reduce queue capacity in pubsub tests
* fix(rpc): fix test timeouts
* fix race in account subscription test
* Add RpcSubscriptions::new_for_tests
Co-authored-by: Pavel Strakhov <p.strakhov@iconic.vc>
Co-authored-by: Nikita Podoliako <n.podoliako@zubr.io>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
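A minimal sketch of the broadcast-queue idea using tokio's broadcast channel; the types and the capacity are illustrative, not the PR's actual implementation:

```rust
use tokio::sync::broadcast;

#[derive(Clone, Debug)]
struct Notification {
    subscription_id: u64,
    payload: String,
}

#[tokio::main]
async fn main() {
    // Bounded queue: slow subscribers lag and drop notifications instead
    // of growing memory without limit.
    let (sender, _) = broadcast::channel::<Notification>(1024);

    let mut receiver = sender.subscribe();
    let subscriber = tokio::spawn(async move {
        while let Ok(notification) = receiver.recv().await {
            println!("got: {notification:?}");
        }
    });

    sender
        .send(Notification { subscription_id: 1, payload: "{}".into() })
        .unwrap();
    drop(sender); // closing the channel ends the subscriber loop
    subscriber.await.unwrap();
}
```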
Add `--incremental-snapshots` flag to enable incremental snapshots.
This will allow setting `--full-snapshot-interval-slots` and
`--incremental-snapshot-interval-slots`.
Also added `--maximum-incremental-snapshots-to-retain`.
Co-authored-by: Michael Vines <mvines@gmail.com>
This is the 2nd installment of the AccountsDb replication.
#### Summary of Changes
The basic Google protocol buffer protocol for replicating updated slots and accounts; tonic/tokio is used for transporting the messages.
The basic framework of the client and server for replicating slots and accounts. The persisting of accounts on the replica side will be done in the next PR; right now, the accounts are streamed to the replica-node and dumped. Replication of information about the Bank is also not done in this PR, to be addressed in the next PR to limit the change size.
Functionality used by both the client and server sides is encapsulated in the replica-lib crate.
There is no impact to the existing validator by default.
Tests:
Observe the confirmed slots replicated to the replica-node.
Observe the accounts for the confirmed slot are received at the replica-node side.
AccountsBackgroundService now knows about incremental snapshots. It is
now also in charge of deciding if an AccountsPackage is destined to be a
SnapshotPackage or not (or just used by AccountsHashVerifier).
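A minimal sketch of that decision, under assumed interval parameters and illustrative names:

```rust
/// Illustrative decision logic: where should an accounts package go?
enum PackageDestination {
    FullSnapshot,
    IncrementalSnapshot { base_slot: u64 },
    AccountsHashVerificationOnly,
}

fn classify(
    slot: u64,
    full_interval: u64,
    incremental_interval: u64,
    last_full_snapshot_slot: Option<u64>,
) -> PackageDestination {
    if slot % full_interval == 0 {
        PackageDestination::FullSnapshot
    } else if let Some(base_slot) = last_full_snapshot_slot {
        if slot % incremental_interval == 0 {
            // An incremental snapshot only makes sense on top of a full one.
            PackageDestination::IncrementalSnapshot { base_slot }
        } else {
            PackageDestination::AccountsHashVerificationOnly
        }
    } else {
        PackageDestination::AccountsHashVerificationOnly
    }
}
```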
!!! New behavior changes !!!
Taking snapshots (both bank and archive) **MUST** succeed.
This is required because of how the last full snapshot slot is
calculated, which is used by AccountsBackgroundService when calling
`clean_accounts()`.
File system calls are now unwrapped and will result in a crash on failure. As Trent told me:
>Well I think if a snapshot fails due to some IO error, it's very likely that the operator is going to have to intervene before it works. We should exit error in this case, otherwise the validator might happily spin for several more hours, never successfully writing a complete snapshot, before something else brings it down. This would leave the validator's last local snapshot many more slots behind than it would be had we exited outright and potentially force the operator to abandon ledger continuity in favor of a quick catchup
Other errors will set the `exit` flag to `true`, and the node will gracefully shutdown.
Fixes #19167, fixes #19168
#### Problem
Snapshot names are overloaded, and there are multiple terms that mean the same thing. This is confusing. Here's a list of ones in the codebase that I've found:
```
- snapshot_dir
- snapshots_dir
- snapshot_path
- snapshot_output_dir
- snapshot_package_output_path
- snapshot_archives_dir
```
#### Summary of Changes
For all the ones that are about the directory where snapshot archives are stored, ensure they are `snapshot_archives_dir`. For the ones about the (bank) snapshots directory, set to `bank_snapshots_dir`.
Co-authored-by: Michael Vines <mvines@gmail.com>
While reviewing PR #18565, an issue was brought up about refactoring some
code around verifying the bank after rebuilding from snapshots. A new
top-level function has been added to get the latest snapshot archives,
load the bank, and then verify it. Additionally, new tests have been
written and existing tests have been updated to use this new function.
Fixes #18973
While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled. Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`. And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats. This bundling simplifies bank
rebuilding.
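A minimal sketch of the bundling, with approximate field names:

```rust
use std::path::PathBuf;

/// Illustrative archive format enum; the real one has more variants.
pub enum ArchiveFormat {
    TarBzip2,
    TarZstd,
}

/// Illustrative bundling of archive metadata; field names approximate the
/// real SnapshotArchiveInfo.
pub struct SnapshotArchiveInfo {
    pub path: PathBuf,
    pub slot: u64,
    pub hash: [u8; 32], // stand-in for the bank hash type
    pub archive_format: ArchiveFormat,
}
```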
This commit also renames `snapshot_interval_slots` to
`full_snapshot_archive_interval_slots`, updates the comments on the
fields, and makes appropriate updates where SnapshotConfig is used.
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks. This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.
Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use a common code path for the different snapshot types.
Also of note, some renaming has happened:
1. Snapshots are now either `full_` or `incremental_` throughout the
codebase. If not specified, the code applies to both.
2. Bank snapshots now are called "bank snapshots"
(before they were called "slot snapshots", "bank snapshots", or
just "snapshots"). The one exception is within `Bank`, where they
are still just "snapshots", because they are already "bank
snapshots".
3. Snapshot archives now have `_archive` in the code. This
should clear up an ambiguity between bank snapshots and snapshot
archives.
* update ledger tool to restore cost model from blockstore when compute-slot-cost
* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool
* refactor and simplify a test
* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to the compaction exclusion list alongside TransactionStatusIndex in one place: `excludes_from_compaction()`
* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Delete programs from `ProgramCosts` in blockstore when they are removed from the in-memory cost_table
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup
* Offload `cost_model`-related operations from the replay main thread to a dedicated service thread; add a channel to send execute_timings between these threads
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.
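A minimal sketch of the persist-on-change bookkeeping, with illustrative types (`ProgramId` standing in for Pubkey):

```rust
use std::collections::HashMap;

type ProgramId = [u8; 32]; // stand-in for Pubkey

/// Illustrative persist-on-change bookkeeping for the cost table.
struct CostTable {
    costs: HashMap<ProgramId, u64>, // program -> execution cost units
    dirty: bool,
}

impl CostTable {
    fn upsert(&mut self, program: ProgramId, units: u64) {
        let previous = self.costs.insert(program, units);
        self.dirty |= previous != Some(units);
    }

    /// Only touch the blockstore's ProgramCosts column when something
    /// actually changed since the last persist.
    fn maybe_persist(&mut self, persist: impl FnOnce(&HashMap<ProgramId, u64>)) {
        if self.dirty {
            persist(&self.costs);
            self.dirty = false;
        }
    }
}
```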
When starting a validator, the node initially joins gossip with
shred_version = 0, until it adopts the entrypoint's shred-version:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417
Depending on the load on the entrypoint, adopting the entrypoint's
shred-version through gossip sometimes becomes very slow, and causes
several problems in gossip because we have to partially support
shred_version == 0, which is a source of leaking crds values from one
cluster to another; e.g. see
https://github.com/solana-labs/solana/pull/17899
and the other linked issues there.
In order to remove shred_version == 0 from gossip, this commit adds
shred-version to the ip-echo-server response. Once the entrypoints are
updated, on validator start-up, if --expected-shred-version is not
specified we will obtain the shred-version from the entrypoint using
ip-echo-server.
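A minimal sketch of the extended response shape, with illustrative names:

```rust
use std::net::IpAddr;

/// Illustrative response shape: alongside the echoed public address, the
/// entrypoint now reports its shred version (None if not yet known).
struct IpEchoServerResponse {
    address: IpAddr,
    shred_version: Option<u16>,
}
```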