Refactor the thin_client::create_client to take addresses separately instead of as a tuple
Co-authored-by: Bijie Zhu <bijiezhu@Bijies-MBP.cable.rcn.com>
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
#### Summary of Changes
This PR further enables group by operation on storage type in blockstore_rocksdb_cfs metrics.
Such group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.
To make things extensible, this PR introduces BlockstoreAdvancedOptions and move shred_storage_type.
All fields in BlockstoreAdvancedOptions will support group-by operation in blockstore_rocksdb_cfs.
Dependency: #23580
* Add quic-client module to send transactions via quic, abstracted behind the TpuConnection trait (along with a legacy UDP implementation of TpuConnection) and change thin-client to use TpuConnection
#### Summary of Changes
This PR adds two hidden arguments to the validator that allow users to use RocksDB's FIFO compaction for storing shreds.
--shred-storage <SHRED_STORAGE>
EXPERIMENTAL: Controls how RocksDB compacts shreds. *WARNING*: You will lose your ledger data
when you switch between options. Possible values are: 'level': stores shreds using RocksDB's default (level)
compaction. 'fifo': stores shreds under RocksDB's FIFO compaction. This option is more efficient on
disk-write-bytes of the ledger store. [default: level] [possible values: level, fifo]
--shred-storage-size <SHRED_STORAGE_SIZE_BYTES>
The shred storage size in bytes. The suggested value is 50% of your ledger storage size in bytes. [default:
268435456000]
* Add feature gate for new vote instruction and plumb through replay
Add tower versions
* Add check for slot hashes history
* Update is_recent check to exclude voting on hard fork root slot
* Move tower rollback test to flaky and ignore it until #22551 lands
Support using connection pooling and use multiple threads to do Postgres db operations. The performance is improved from 1500 RPS to 40,000 RPS measured during validator start.
Support multiple plugins at the same time.
Summary of Changes
Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows
Data stores of connection strings/credentials to be configured,
Accounts with patterns to be streamed
PostgreSQL implementation of the streaming for different destination stores to be plugged in.
The code comprises 4 major parts:
accountsdb-plugin-intf: defines the plugin interface which concrete plugin should implement.
accountsdb-plugin-manager: manages the load/unload of plugins and provide interfaces which the validator can notify of accounts update to plugins.
accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL
The validator integrations: updated streamed right after snapshot restore and after account update from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
This is the 2nd installment for the AccountsDb replication.
Summary of Changes
The basic google protocol buffer protocol for replicating updated slots and accounts. tonic/tokio is used for transporting the messages.
The basic framework of the client and server for replicating slots and accounts -- the persisting of accounts in the replica-side will be done at the next PR -- right now -- the accounts are streamed to the replica-node and dumped. Replication for information about Bank is also not done in this PR -- to be addressed in the next PR to limit the change size.
Functionality used by both the client and server side are encapsulated in the replica-lib crate.
There is no impact to the existing validator by default.
Tests:
Observe the confirmed slots replicated to the replica-node.
Observe the accounts for the confirmed slot are received at the replica-node side.
#### Problem
Snapshot names are overloaded, and there are multiple terms that mean the same thing. This is confusing. Here's a list of ones in the codebase that I've found:
```
- snapshot_dir
- snapshots_dir
- snapshot_path
- snapshot_output_dir
- snapshot_package_output_path
- snapshot_archives_dir
```
#### Summary of Changes
For all the ones that are about the directory where snapshot archives are stored, ensure they are `snapshot_archives_dir`. For the ones about the (bank) snapshots directory, set to `bank_snapshots_dir`.
Co-authored-by: Michael Vines <mvines@gmail.com>
While reviewing PR #18565, as issue was brought up to refactor some code
around verifying the bank after rebuilding from snapshots. A new
top-level function has been added to get the latest snapshot archives
and load the bank then verify. Additionally, new tests have been
written and existing tests have been updated to use this new function.
Fixes#18973
While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled. Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`. And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats. This bundling simplifies bank
rebuilding.
This commit also renames `snapshot_interval_slots` to
`full_snapshot_archive_interval_slots`, updates the comments on the
fields, and make appropriate updates where SnapshotConfig is used.
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks. This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.
Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use a common code paths for the different snapshots.
Also of note, some renaming has happened:
1. Snapshots are now either `full_` or `incremental_` throughout the
codebase. If not specified, the code applies to both.
2. Bank snapshots now are called "bank snapshots"
(before they were called "slot snapshots", "bank snapshots", or
just "snapshots"). The one exception is within `Bank`, where they
are still just "snapshots", because they are already "bank
snapshots".
3. Snapshot archives now have `_archive` in the code. This
should clear up an ambiguity between bank snapshots and snapshot
archives.
1. Added both options for measuring space usage using total accounts usage and for individual store shrink ratio using an enum. Validator CLI options: --accounts-shrink-optimize-total-space and --accounts-shrink-ratio
2. Added code for selecting candidates based on total usage in a separate function select_candidates_by_total_usage
3. Added unit tests for the new functions added
4. The default implementations is kept at 0.8 shrink ratio with --accounts-shrink-optimize-total-space set to true
Fixes#17544
* Create solana-poh crate
* Move BigTableUploadService to solana-ledger
* Add solana-rpc to workspace
* Move dependencies to solana-rpc
* Move remaining rpc modules to solana-rpc
* Single use statement solana-poh
* Single use statement solana-rpc
Convert to use type alias for the callback and cascade the changes to callers. Thanks @jeffwashington for the help making it possible.
Changed the closure for the progress update in the validator main to FnMut and modify the abort count in the closure which is more reliable.
* Move gossip modules to solana-gossip
* Update Protocol abi digest due to move
* Move gossip benches and hook up CI
* Remove unneeded Result entries
* Single use statements
1. Allow the validator bootstrap code to specify the minimal snapshot download speed. If the snapshot download speed is detected below that, a different RPC can be retried. The default is 10MB/sec.
2. To prevent spinning on a number of sub-optimal choices and not making progress, the abort/retry logic is implemented with the following safe guards:
2.1 at maximum we do this retry for 5 times -- this number is configurable with default 5.
2.2 if the download in one notification round (5 second) is more than 2%, do not do retry -- it is not as bad anyway.
2.3 if the remaining estimate time is less than 1 minutes, do not abort retry as it will be done quickly anyway.
2.4 We do this abort/retry logic only at the first notification to avoid wasting download efforts -- the reasoning is being opportunistic and too greedy may not achieve overall shorter download time.
3. The download_snapshot and download_file is modified with the option allowing caller to notified of download progress via a callback. This allows the business logic of retrying to the place it belongs.
* Upgrade Rust to 1.52.0
update nightly_version to newly pushed docker image
fix clippy lint errors
1.52 comes with grcov 0.8.0, include this version to script
* upgrade to Rust 1.52.1
* disabling Serum from downstream projects until it is upgraded to Rust 1.52.1
* purge_old_snapshot_archives is changed to take an extra argument 'maximum_snapshots_to_retain' to control the max number of latest snapshot archives to retain. Note the oldest snapshot is always retained as before and is not subjected to this new options.
* The validator and ledger-tool executables are modified with a CLI argument --maximum-snapshots-to-retain. And the options are propagated down the call chains. Their corresponding shell scripts were changed accordingly.
* SnapshotConfig is modified to have an extra field for the maximum_snapshots_to_retain
* Unit tests are developed to cover purge_old_snapshot_archives
* Deprecate commitment variants
* Add new CommitmentConfig builders
* Add helpers to avoid allowing deprecated variants
* Remove deprecated transaction-status code
* Include new commitment variants in runtime commitment; allow deprecated as long as old variants persist
* Remove deprecated banks code
* Remove deprecated variants in core; allow deprecated in rpc/rpc-subscriptions for now
* Heavier hand with rpc/rpc-subscription commitment
* Remove deprecated variants from local-cluster
* Remove deprecated variants from various tools
* Remove deprecated variants from validator
* Update docs
* Remove deprecated client code
* Add new variants to cli; remove deprecated variants as possible
* Don't send new commitment variants to old clusters
* Retain deprecated method in test_validator_saves_tower
* Fix clippy matches! suggestion for BPF solana-sdk legacy compile test
* Refactor node version check to handle commitment variants and transaction encoding
* Hide deprecated variants from cli help
* Add cli App comments
* Discard pre hard fork persisted tower if hard-forking
* Relax config.require_tower
* Add cluster test
* nits
* Remove unnecessary check
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
Co-authored-by: Carl Lin <carl@solana.com>
* Fix tower/blockstore unsync due to external causes
* Add and clean up long comments
* Clean up test
* Comment about warped_slot_history
* Run test_future_tower with master-only/master-slave
* Update comments about false leader condition
* Follow up to persistent tower
* Ignore for now...
* Hard-code validator identities for easy reasoning
* Add a test for opt. conf violation without tower
* Fix compile with rust < 1.47
* Remove unused method
* More move of assert tweak to the asser pr
* Add comments
* Clean up
* Clean the test addressing various review comments
* Clean up a bit
* Save/restore Tower
* Avoid unwrap()
* Rebase cleanups
* Forcibly pass test
* Correct reconcilation of votes after validator resume
* d b g
* Add more tests
* fsync and fix test
* Add test
* Fix fmt
* Debug
* Fix tests...
* save
* Clarify error message and code cleaning around it
* Move most of code out of tower save hot codepath
* Proper comment for the lack of fsync on tower
* Clean up
* Clean up
* Simpler type alias
* Manage tower-restored ancestor slots without banks
* Add comment
* Extract long code blocks...
* Add comment
* Simplify returned tuple...
* Tweak too aggresive log
* Fix typo...
* Add test
* Update comment
* Improve test to require non-empty stray restored slots
* Measure tower save and dump all tower contents
* Log adjust and add threshold related assertions
* cleanup adjust
* Properly lower stray restored slots priority...
* Rust fmt
* Fix test....
* Clarify comments a bit and add TowerError::TooNew
* Further clean-up arround TowerError
* Truly create ancestors by excluding last vote slot
* Add comment for stray_restored_slots
* Add comment for stray_restored_slots
* Use BTreeSet
* Consider root_slot into post-replay adjustment
* Tweak logging
* Add test for stray_restored_ancestors
* Reorder some code
* Better names for unit tests
* Add frozen_abi to SavedTower
* Fold long lines
* Tweak stray ancestors and too old slot history
* Re-adjust error conditon of too old slot history
* Test normal ancestors is checked before stray ones
* Fix conflict, update tests, adjust behavior a bit
* Fix test
* Address review comments
* Last touch!
* Immediately after creating cleaning pr
* Revert stray slots
* Revert comment...
* Report error as metrics
* Revert not to panic! and ignore unfixable test...
* Normalize lockouts.root_slot more strictly
* Add comments for panic! and more assertions
* Proper initialize root without vote account
* Clarify code and comments based on review feedback
* Fix rebase
* Further simplify based on assured tower root
* Reorder code for more readability
Co-authored-by: Michael Vines <mvines@gmail.com>
* Use untagged RpcSignatureResult enum to avoid breaking downstream consumers of current signature subscriptions
* Clean up client duplication
* Clippy
* Implement Debug for MessageProcessor
* Switch from delta-based gating to whole-set gating
* Remove dbg!
* Fix clippy
* Clippy
* Add test
* add loader to stable operating mode at proper epoch
* refresh_programs_and_inflation after ancestor setup
* Callback via snapshot; avoid account re-add; Debug
* Fix test
* Fix test and fix the past history
* Make callback management stricter and cleaner
* Fix test
* Test overwrite and frozen for native programs
* Test epoch callback with genesis-programs
* Add assertions for parent bank
* Add tests and some minor cleaning
* Remove unsteady assertion...
* Fix test...
* Fix DOS
* Skip ensuring account by dual (whole/delta) gating
* Fix frozen abi implementation...
* Move compute budget constatnt init back into bank
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>