* Initialize fork graph in program cache during bank_forks creation
* rename BankForks::new to BankForks::new_rw_arc
* fix compilation
* no need to set fork_graph on insert()
* fix partition tests
The commit implements lazy eviction for turbine QUIC connections.
The cache is allowed to grow to 2 x capacity at which point at least
half of the entries with lowest stake are evicted, resulting in an
amortized O(1) performance.
The commit implements lazy eviction for repair QUIC connections.
The cache is allowed to grow to 2 x capacity at which point at least
half of the entries with lowest stake are evicted, resulting in an
amortized O(1) performance.
The current getHealth mechanism checks a local accounts hash slot vs.
those of other nodes as specified by --known-validator. This is a
very coarse comparison given that the default for this value is 100
slots. More so, any nodes using a value larger than the default
(ie --incremental-snapshot-interval 500) will likely see getHealth
return status behind at some point.
Change the underlying mechanism of how health is computed. Instead of
using the accounts hash slots published in gossip, use the latest
optimistically confirmed slot from the cluster. Even when a node is
behind, it is able to observe cluster optimistically confirmed by slots
by viewing votes published in gossip.
Thus, the latest cluster optimistically confirmed slot can be compared
against the latest optimistically confirmed bank from replay to
determine health. This new comparison is much more granular, and not
needing to depend on individual known validators is also a plus.
* Add wen_restart module:
- Implement reading LastVotedForkSlots from blockstore.
- Add proto file to record the intermediate results.
- Also link wen_restart into validator.
- Move recreation of tower outside replay_stage so we can get last_vote.
* Update lock file.
* Fix linter errors.
* Fix depencies order.
* Update wen_restart explanation and small fixes.
* Generate tower outside tvu.
* Update validator/src/cli.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/protos/wen_restart.proto
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/build.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/src/wen_restart.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Rename proto directory.
* Rename InitRecord to MyLastVotedForkSlots, add imports.
* Update wen-restart/Cargo.toml
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/src/wen_restart.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Move prost-build dependency to project toml.
* No need to continue if the distance between slot and last_vote is
already larger than MAX_SLOTS_ON_VOTED_FORKS.
* Use 16k slots instead of 81k slots, a few more wording changes.
* Use AncestorIterator which does the same thing.
* Update Cargo.lock
* Update Cargo.lock
---------
Co-authored-by: Tyera <teulberg@gmail.com>
The commit implements server-side of repair using QUIC protocol.
UDP repair requests are adapted as RemoteRequest and sent down the same
channel as remote requests arriving over QUIC, and the rest of the
server code is update to process over RemoteRequest type.
In most cases, either a &Bank or an Arc<Bank> is more proper.
- &Bank is used if the function only needs a momentary reference
- Arc<Bank> is used if the function needs its' own copy
This PR leaves several instances of &Arc<Bank> around; these instances
are situations where a clone may only happen conditionally.
Some of the cleanup tasks include ...
- Make subfunctions return a Result and allow error handling above
- Add some clarifying comments
- Give backup directory name a more meaningful name
- Add some additional logs (with timing info) for long running parts
* separates out turbine QUIC from TPU implementation
Turbine being tied to QUIC implementation for TPU hinders development
and makes it hard to optimize QUIC specifically for turbine.
The commit separates out turbine QUIC from TPU implementation.
* Update core/src/validator.rs
Co-authored-by: Jon Cinque <me@jonc.dev>
* Update turbine/src/retransmit_stage.rs
Co-authored-by: Jon Cinque <me@jonc.dev>
---------
Co-authored-by: Jon Cinque <me@jonc.dev>
The optional args allow reuse by ledger-tool repair roots command Also,
hold cleanup lock for duration of Blockstore::scan_and_fix_roots().
This prevents a scenario where scan_and_fix_roots() could identify a
slot as needing to be marked root, that slot getting cleaned by
LedgerCleanupService, and then scan_and_fix_roots() marking the slot as
root on the now purged slot.
Slot::MAX was used to specify that a type of snapshots should not be
created; define a constant to be that value and reference the constant
to have a single point of edit.
* Restrict access to Bank's HardForks
Callers could previously obtain a a lock to read/write HardForks from
any Bank. This would allow any caller to modify, and creates the
opportunity for inconsistent handling of what is considered a valid hard
fork (ie too old).
This PR adds a function to Bank so consistent sanity checks can be
applied; the caller will already have a Bank as that is where they would
have obtained the HardForks from in the first place. Additionally,
change the getter to return a copy of HardForks (simple Vec).
* Allow hard fork at bank slot if bank is not yet frozen
`Arc` is already a reference internally, so it does not seem to be
beneficial to pass a reference to it. Just adds an extra layer of
indirection.
Functions that need to be able to increment `Arc` reference count need
to take `Arc<AtomicBool>`, but those that just want to read the
`AtomicBool` value can accept `&AtomicBool`, making them a bit more
generic.
This change focuses specifically on `Arc<AtomicBool>`. There are other
uses of `&Arc<T>` in the code base that could be converted in a similar
manner. But it would make the change even larger.
The callstack updated in this PR passed an &Arc<...> down only to have
the bottom level clone the reference. Thus, we are giving shared
ownership so the reference is a bit redundant and arguably obscures the
intention to clone further down the callstack.
* Move entry_notifier_interface
* Add EntryNotifierService
* Use descriptive struct in sender/receiver
* Optionally initialize EntryNotifierService in validator
* Plumb EntryNotfierSender into Tvu, blockstore_processor
* Plumb EntryNotfierSender into Tpu
* Only return one option when constructing EntryNotifierService