* Fixed the local-cluster test case test_optimistic_confirmation_violation_detection, which is an existing test issue exposed when we disable the udp based tpu port. In the test, when we restart the node, we restarted the entry point validator. And when the gossip port of the restarted node changes, the two nodes are not able to talk to each other via gossip. I have changed the restart the logic to make the second validator the heavier node and restart it instead of the entypoint one (the first node).
* Addressed a review comment from Carl
* Fix test_bench_tps_local_cluster_solana
* Remove #[ignore] annotations from dos tests (which are also fixed by this change)
* Remove #[ignore] annotations from local cluster tests (which are also fixed by this change)
* Keypair: implement clone()
This was not implemented upstream in ed25519-dalek to force everyone to
think twice before creating another copy of a potentially sensitive
private key in memory.
See https://github.com/dalek-cryptography/ed25519-dalek/issues/76
However, there are now 9 instances of
Keypair::from_bytes(&keypair.to_bytes())
in the solana codebase and it would be preferable to have a function.
In particular since this also comes up when writing programs and can
cause users to either start messing with lifetimes or discover the
from_bytes() workaround themselves.
This patch opts to not implement the Clone trait. This avoids automatic
use in order to preserve some of the original "let developers think
twice about this" intention.
* Use Keypair::clone
Shred slot and parent are not verified until window-service where
resources are already wasted to sig-verify and deserialize shreds.
This commit moves above verification to earlier in the pipeline in fetch
stage.
* Make sure to root local slots even with hard fork
* Address review comments
* Cleanup a bit
* Further clean up
* Further clean up a bit
* Add comment
* Tweak hard fork reconciliation code placement
* client: Remove static connection cache, plumb it instead
* Add TpuClient::new_with_connection_cache to not break downstream
* Refactor get_connection and RwLock into ConnectionCache
* Fix merge conflicts from new async TpuClient
* Remove `ConnectionCache::set_use_quic`
* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
#### Problem
blockstore_db.rs has a mutual dependency between blockstore_metrics.rs.
#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.
By doing this, we address the mutual dependency and also make the code cleaner.
Refactor the thin_client::create_client to take addresses separately instead of as a tuple
Co-authored-by: Bijie Zhu <bijiezhu@Bijies-MBP.cable.rcn.com>
* Add quic-client module to send transactions via quic, abstracted behind the TpuConnection trait (along with a legacy UDP implementation of TpuConnection) and change thin-client to use TpuConnection
* Add feature gate for new vote instruction and plumb through replay
Add tower versions
* Add check for slot hashes history
* Update is_recent check to exclude voting on hard fork root slot
* Move tower rollback test to flaky and ignore it until #22551 lands
#### Problem
Snapshot names are overloaded, and there are multiple terms that mean the same thing. This is confusing. Here's a list of ones in the codebase that I've found:
```
- snapshot_dir
- snapshots_dir
- snapshot_path
- snapshot_output_dir
- snapshot_package_output_path
- snapshot_archives_dir
```
#### Summary of Changes
For all the ones that are about the directory where snapshot archives are stored, ensure they are `snapshot_archives_dir`. For the ones about the (bank) snapshots directory, set to `bank_snapshots_dir`.
Co-authored-by: Michael Vines <mvines@gmail.com>
While reviewing PR #18565, as issue was brought up to refactor some code
around verifying the bank after rebuilding from snapshots. A new
top-level function has been added to get the latest snapshot archives
and load the bank then verify. Additionally, new tests have been
written and existing tests have been updated to use this new function.
Fixes#18973
While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled. Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`. And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats. This bundling simplifies bank
rebuilding.
This commit also renames `snapshot_interval_slots` to
`full_snapshot_archive_interval_slots`, updates the comments on the
fields, and make appropriate updates where SnapshotConfig is used.
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks. This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.
Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use a common code paths for the different snapshots.
Also of note, some renaming has happened:
1. Snapshots are now either `full_` or `incremental_` throughout the
codebase. If not specified, the code applies to both.
2. Bank snapshots now are called "bank snapshots"
(before they were called "slot snapshots", "bank snapshots", or
just "snapshots"). The one exception is within `Bank`, where they
are still just "snapshots", because they are already "bank
snapshots".
3. Snapshot archives now have `_archive` in the code. This
should clear up an ambiguity between bank snapshots and snapshot
archives.
Convert to use type alias for the callback and cascade the changes to callers. Thanks @jeffwashington for the help making it possible.
Changed the closure for the progress update in the validator main to FnMut and modify the abort count in the closure which is more reliable.
* Move gossip modules to solana-gossip
* Update Protocol abi digest due to move
* Move gossip benches and hook up CI
* Remove unneeded Result entries
* Single use statements
1. Allow the validator bootstrap code to specify the minimal snapshot download speed. If the snapshot download speed is detected below that, a different RPC can be retried. The default is 10MB/sec.
2. To prevent spinning on a number of sub-optimal choices and not making progress, the abort/retry logic is implemented with the following safe guards:
2.1 at maximum we do this retry for 5 times -- this number is configurable with default 5.
2.2 if the download in one notification round (5 second) is more than 2%, do not do retry -- it is not as bad anyway.
2.3 if the remaining estimate time is less than 1 minutes, do not abort retry as it will be done quickly anyway.
2.4 We do this abort/retry logic only at the first notification to avoid wasting download efforts -- the reasoning is being opportunistic and too greedy may not achieve overall shorter download time.
3. The download_snapshot and download_file is modified with the option allowing caller to notified of download progress via a callback. This allows the business logic of retrying to the place it belongs.
* purge_old_snapshot_archives is changed to take an extra argument 'maximum_snapshots_to_retain' to control the max number of latest snapshot archives to retain. Note the oldest snapshot is always retained as before and is not subjected to this new options.
* The validator and ledger-tool executables are modified with a CLI argument --maximum-snapshots-to-retain. And the options are propagated down the call chains. Their corresponding shell scripts were changed accordingly.
* SnapshotConfig is modified to have an extra field for the maximum_snapshots_to_retain
* Unit tests are developed to cover purge_old_snapshot_archives