There are operations in bank_fork_utils that may fail; we explicitly
call std::process::exit() on several of these. Granted we may end up
exiting the process higher up the callstack, bubbling the errors up
allow a caller that could handle the error to do so.
Currently, the file is generated when a node drops a block that was
produced by another node. However, it would also be beneficial to see
the account state when a node drops its' own block.
Output the file in this additional failure codepath
The Blockstore currently maintains a RwLock<Slot> of the maximum root
it has seen inserted. The value is initialized during
Blockstore::open() and updated during calls to Blockstore::set_roots().
The max root is queried fairly often for several use cases, and caching
the value is cheaper than constructing an iterator to look it up every
time.
However, the access patterns of these RwLock match that of an atomic.
That is, there is no critical section of code that is run while the
lock is head. Rather, read/write locks are acquired in order to read/
update, respectively. So, change the RwLock<u64> to an AtomicU64.
These services currently live in core/; however, they operate on the
ledger. Mores so, these two services operate on the blockstore only,
and not necessarily the entire ledger. So, it makes sense to move these
services out of core and into ledger. We've recently been doing similar
changes with breaking things out into individual crates in order to
reduce the scope of core.
So, this change moves the services from core/ to ledger/, and replaces
ledger with blockstore.
* Remove RWLock from EntryNotifier because it causes perf degradation when entry notifications are enabled on geyser
* remove unused RWLock
* Remove RWLock
* Initialize fork graph in program cache during bank_forks creation
* rename BankForks::new to BankForks::new_rw_arc
* fix compilation
* no need to set fork_graph on insert()
* fix partition tests
This macro is used a lot for tests to create a ledger path in order to
open a Blockstore. Files will be left on disk unless the test remembers
to call Blockstore::destroy() on the directory. So, instead of requiring
this, use the get_tmp_ledger_path_auto_delete!() macro that creates a
TempDir (which automatically deletes itself when it goes out of scope).
The commit implements lazy eviction for turbine QUIC connections.
The cache is allowed to grow to 2 x capacity at which point at least
half of the entries with lowest stake are evicted, resulting in an
amortized O(1) performance.
The commit implements lazy eviction for repair QUIC connections.
The cache is allowed to grow to 2 x capacity at which point at least
half of the entries with lowest stake are evicted, resulting in an
amortized O(1) performance.
Currently each outgoing repair request will attempt to establish a
connection if one does not already exist. This is very wasteful and
consumes many tokio tasks if the remote node is down or unresponsive.
The commit decouples routing packets from establishing connections by
adding a buffering channel for each remote address. Outgoing packets are
always sent down this channel to be processed once the connection is
established. If connecting attempt fails, all packets already pushed to
the channel are dropped at once, reducing the number of attempts to make
a connection if the remote node is down or unresponsive.
The current getHealth mechanism checks a local accounts hash slot vs.
those of other nodes as specified by --known-validator. This is a
very coarse comparison given that the default for this value is 100
slots. More so, any nodes using a value larger than the default
(ie --incremental-snapshot-interval 500) will likely see getHealth
return status behind at some point.
Change the underlying mechanism of how health is computed. Instead of
using the accounts hash slots published in gossip, use the latest
optimistically confirmed slot from the cluster. Even when a node is
behind, it is able to observe cluster optimistically confirmed by slots
by viewing votes published in gossip.
Thus, the latest cluster optimistically confirmed slot can be compared
against the latest optimistically confirmed bank from replay to
determine health. This new comparison is much more granular, and not
needing to depend on individual known validators is also a plus.
* Enable frozen_abi on banking trace file
* Fix ci with really correct bugfix...
* Remove tracker_callers
* Fix typo...
* Fix AbiExample for Arc/Rc's Weaks
* Added comment for AbiExample impl of SystemTime
* Simplify and document EvenAsOpaque with new usage
* Minor clean-ups
* Simplify SystemTime::example() with UNIX_EPOCH...
* Add comment for AbiExample subtleties
* Add wen_restart module:
- Implement reading LastVotedForkSlots from blockstore.
- Add proto file to record the intermediate results.
- Also link wen_restart into validator.
- Move recreation of tower outside replay_stage so we can get last_vote.
* Update lock file.
* Fix linter errors.
* Fix depencies order.
* Update wen_restart explanation and small fixes.
* Generate tower outside tvu.
* Update validator/src/cli.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/protos/wen_restart.proto
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/build.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/src/wen_restart.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Rename proto directory.
* Rename InitRecord to MyLastVotedForkSlots, add imports.
* Update wen-restart/Cargo.toml
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/src/wen_restart.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Move prost-build dependency to project toml.
* No need to continue if the distance between slot and last_vote is
already larger than MAX_SLOTS_ON_VOTED_FORKS.
* Use 16k slots instead of 81k slots, a few more wording changes.
* Use AncestorIterator which does the same thing.
* Update Cargo.lock
* Update Cargo.lock
---------
Co-authored-by: Tyera <teulberg@gmail.com>
* Separate simple-vote transaction cost from non-vote transaction cost
* remove is_simple_vote flag from transaction UsageCostDetails
* update test and comment
* set static usage cost for SimpleVote transaction
* Move vote related code to its own crate
* Update imports in code and tests
* update programs/sbf/Cargo.lock
* fix check errors
* update abi_digest
* rebase fixes
* fixes after rebase
* Adds a module `address_lookup_table` to the SDK.
* Adds a module `address_lookup_table::instruction` to the SDK.
* Adds a module `address_lookup_table::error` to the SDK.
* Adds a module `address_lookup_table::state` to the SDK.
* Moves AddressLookupTable into SDK as well.
* Moves AddressLookupTableAccount into address_lookup_table.
* Adds deprecation messages.
* Disentangles dependencies across cargo files.
The commit implements server-side of repair using QUIC protocol.
UDP repair requests are adapted as RemoteRequest and sent down the same
channel as remote requests arriving over QUIC, and the rest of the
server code is update to process over RemoteRequest type.
Working towards using QUIC protocol for repair, the commit adds a QUIC
endpoint for repair service.
Outgoing local requests are sent as
struct LocalRequest {
remote_address: SocketAddr,
bytes: Vec<u8>,
num_expected_responses: usize,
response_sender: Sender<(SocketAddr, Vec<u8>)>,
}
to the client-side of the endpoint. The client opens a bidirectional
stream with the LocalRequest.remote_address and once received the
response, sends it down the LocalRequest.response_sender channel.
Incoming requests from remote nodes are received from bidirectional
streams and sent as
struct RemoteRequest {
remote_pubkey: Option<Pubkey>,
remote_address: SocketAddr,
bytes: Vec<u8>,
response_sender: Option<OneShotSender<Vec<Vec<u8>>>>,
}
to the repair-service. The response is received from the receiver end of
RemoteRequest.response_sender channel and send back to the remote node
using the send side of the bidirectional stream.
removes outdated matches crate from the dependencies
std::matches has been stable since rust 1.42.0.
Other use-cases are covered by assert_matches crate.
This function used to contain feature gate activation checks that
required access to a bank. Those checks have been cleaned up, so we no
longer need access to a full Bank. Rather, we can momentarily get a Bank
from BankForks, calculate the necessary results and then drop the Bank
along with the BankForks read lock.
* allow pedantic invalid cast lint
* allow lint with false-positive triggered by `test-case` crate
* nightly `fmt` correction
* adapt to rust layout changes
* remove dubious test
* Use transmute instead of pointer cast and de/ref when check_aligned is false.
* Renames clippy::integer_arithmetic to clippy::arithmetic_side_effects.
* bump rust nightly to 2023-08-25
* Upgrades Rust to 1.72.0
---------
Co-authored-by: Trent Nelson <trent@solana.com>
- BankForks is not an optional argument, so remove dated comment
- Given that BankForks is always present, no need for special values to
initialize variables before the loop
- Root slot can be retrieved from root bank, no need to call
BankForks::root() which will load the underlying atomic a second time
- Use BankForks::highest_slot() instead of .slot() on .working_bank() to
avoid the extra clone that .working_bank() performs
- Move several operations outside of BankForks read lock scope to
minimize lock time
* remove unnecessary hashes around raw string literals
* remove unncessary literal `unwrap()`s
* remove panicking `unwrap()`
* remove unnecessary `unwrap()`
* use `[]` instead of `vec![]` where applicable
* remove (more) unnecessary explicit `into_iter()` calls
* remove redundant pattern matching
* don't cast to same type and constness
* do not `cfg(any(...` a single item
* remove needless pass by `&mut`
* prefer `or_default()` to `or_insert_with(T::default())`
* `filter_map()` better written as `filter()`
* incorrect `PartialOrd` impl on `Ord` type
* replace "slow zero-filled `Vec` initializations"
* remove redundant local bindings
* add required lifetime to associated constant
* sdk: Add concurrent support for rand 0.7 and 0.8
* Update rand, rand_chacha, and getrandom versions
* Run command to replace `gen_range`
Run `git grep -l gen_range | xargs sed -i'' -e 's/gen_range(\(\S*\), /gen_range(\1../'
* sdk: Fix users of older `gen_range`
* Replace `hash::new_rand` with `hash::new_with_thread_rng`
Run:
```
git grep -l hash::new_rand | xargs sed -i'' -e 's/hash::new_rand([^)]*/hash::new_with_thread_rng(/'
```
* perf: Use `Keypair::new()` instead of `generate`
* Use older rand version in zk-token-sdk
* program-runtime: Inline random key generation
* bloom: Fix clippy warnings in tests
* streamer: Scope rng usage correctly
* perf: Fix clippy warning
* accounts-db: Map to char to generate a random string
* Remove `from_secret_key_bytes`, it's just `keypair_from_seed`
* ledger: Generate keypairs by hand
* ed25519-tests: Use new rand
* runtime: Use new rand in all tests
* gossip: Clean up clippy and inline keypair generators
* core: Inline keypair generation for tests
* Push sbf lockfile change
* sdk: Sort dependencies correctly
* Remove `hash::new_with_thread_rng`, use `Hash::new_unique()`
* Use Keypair::new where chacha isn't used
* sdk: Fix build by marking rand 0.7 optional
* Hardcode secret key length, add static assertion
* Unify `getrandom` crate usage to fix linking errors
* bloom: Fix tests that require a random hash
* Remove some dependencies, try to unify others
* Remove unnecessary uses of rand and rand_core
* Update lockfiles
* Add back some dependencies to reduce rebuilds
* Increase max rebuilds from 14 to 15
* frozen-abi: Remove `getrandom`
* Bump rebuilds to 17
* Remove getrandom from zk-token-proof
In most cases, either a &Bank or an Arc<Bank> is more proper.
- &Bank is used if the function only needs a momentary reference
- Arc<Bank> is used if the function needs its' own copy
This PR leaves several instances of &Arc<Bank> around; these instances
are situations where a clone may only happen conditionally.
When a consensus divergance occurs, the current workflow involves a
handful of manual steps to hone in on the offending slot and
transaction. This process isn't overly difficult to execute; however, it
is tedious and currently involves creating and parsing logs.
This change introduces functionality to output a debug file that
contains the components go into the bank hash. The file can be generated
in two ways:
- Via solana-validator when the node realizes it has diverged
- Via solana-ledger-tool verify by passing a flag
When a divergance occurs now, the steps to debug would be:
- Grab the file from the node that diverged
- Generate a file for the same slot with ledger-tool with a known good
version
- Diff the files, they are pretty-printed json
Some of the cleanup tasks include ...
- Make subfunctions return a Result and allow error handling above
- Add some clarifying comments
- Give backup directory name a more meaningful name
- Add some additional logs (with timing info) for long running parts
The existing signature unpacked elements from a Shred and took an owned
Vec<u8>, forcing a .clone() from the caller. The Shred can be passed in
directly to simplify argument list and avoid the clone.
* separates out turbine QUIC from TPU implementation
Turbine being tied to QUIC implementation for TPU hinders development
and makes it hard to optimize QUIC specifically for turbine.
The commit separates out turbine QUIC from TPU implementation.
* Update core/src/validator.rs
Co-authored-by: Jon Cinque <me@jonc.dev>
* Update turbine/src/retransmit_stage.rs
Co-authored-by: Jon Cinque <me@jonc.dev>
---------
Co-authored-by: Jon Cinque <me@jonc.dev>
* Move CostModel and CostTracker to its own crate
* compile new crate and update imports
* update sbf Cargo.lock
* fix AbiExample
* fix cargo sort
* Fix AbiExample
The optional args allow reuse by ledger-tool repair roots command Also,
hold cleanup lock for duration of Blockstore::scan_and_fix_roots().
This prevents a scenario where scan_and_fix_roots() could identify a
slot as needing to be marked root, that slot getting cleaned by
LedgerCleanupService, and then scan_and_fix_roots() marking the slot as
root on the now purged slot.