* Restrict access to Bank's HardForks
Callers could previously obtain a lock to read/write HardForks from
any Bank. This allowed any caller to modify them, and created the
opportunity for inconsistent handling of what is considered a valid hard
fork (i.e. too old).
This PR adds a function to Bank so consistent sanity checks can be
applied; the caller will already have a Bank as that is where they would
have obtained the HardForks from in the first place. Additionally,
change the getter to return a copy of HardForks (simple Vec).
* Allow hard fork at bank slot if bank is not yet frozen
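A minimal sketch of the shape described above, not the actual implementation. The types are simplified stand-ins, and the `frozen` field, the `bool` return value, and the exact "too old" rule are assumptions made for illustration:

```rust
use std::sync::RwLock;

type Slot = u64;

/// Simplified stand-in: a sorted list of (slot, registration count) pairs.
#[derive(Clone, Debug, Default, PartialEq)]
struct HardForks {
    hard_forks: Vec<(Slot, usize)>,
}

impl HardForks {
    fn register(&mut self, slot: Slot) {
        let existing = self.hard_forks.iter().position(|(s, _)| *s == slot);
        match existing {
            Some(i) => self.hard_forks[i].1 += 1,
            None => {
                self.hard_forks.push((slot, 1));
                self.hard_forks.sort();
            }
        }
    }
}

/// Hypothetical, heavily reduced Bank.
struct Bank {
    slot: Slot,
    frozen: bool,
    // The lock stays private; callers never get direct access to it.
    hard_forks: RwLock<HardForks>,
}

impl Bank {
    /// Getter returns a copy, so callers cannot mutate the bank's state.
    fn hard_forks(&self) -> HardForks {
        self.hard_forks.read().unwrap().clone()
    }

    /// Single entry point for registration, so the sanity check is applied
    /// consistently. Returns whether the hard fork was accepted (assumption).
    fn register_hard_fork(&self, new_slot: Slot) -> bool {
        // Assumed rule: reject slots older than the bank, and the bank's own
        // slot once the bank is frozen.
        let too_old = new_slot < self.slot || (new_slot == self.slot && self.frozen);
        if too_old {
            return false;
        }
        self.hard_forks.write().unwrap().register(new_slot);
        true
    }
}
```

Because the getter hands out a clone, mutating the returned value has no effect on the Bank; the only way to change its hard forks is through `register_hard_fork()`.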
The core/src/ directory is already pretty crowded, and moving these
items into the subdirectory more clearly identifies that they are tied
to banking_stage.
* Add TpuEntryNotifier to send EntryNotifications from Tpu
* Optionally run TpuEntryNotifier to send out EntrySummarys alongside BroadcastStage messages
* Track entry index in TpuEntryNotifier
* Allow for leader slots that switch forks
* Exit if broadcast send fails
`Arc` is already a reference internally, so it does not seem to be
beneficial to pass a reference to it; doing so just adds an extra layer
of indirection.
Functions that need to be able to increment `Arc` reference count need
to take `Arc<AtomicBool>`, but those that just want to read the
`AtomicBool` value can accept `&AtomicBool`, making them a bit more
generic.
This change focuses specifically on `Arc<AtomicBool>`. There are other
uses of `&Arc<T>` in the code base that could be converted in a similar
manner, but that would make the change even larger.
The callstack updated in this PR passed an &Arc<...> down only to have
the bottom level clone the reference. Thus, we are giving shared
ownership so the reference is a bit redundant and arguably obscures the
intention to clone further down the callstack.
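The distinction can be sketched as follows. The function names `should_exit` and `spawn_worker` are hypothetical and only illustrate the two kinds of signature:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Only reads the flag, so a plain `&AtomicBool` is enough. Callers holding
/// an `Arc<AtomicBool>` can pass `&exit` thanks to deref coercion.
fn should_exit(exit: &AtomicBool) -> bool {
    exit.load(Ordering::Relaxed)
}

/// Needs shared ownership because the spawned thread outlives this call,
/// so it takes `Arc<AtomicBool>` by value and the caller decides to clone.
fn spawn_worker(exit: Arc<AtomicBool>) -> JoinHandle<u64> {
    thread::spawn(move || {
        let mut iterations = 0;
        while !should_exit(&exit) {
            iterations += 1;
            thread::yield_now();
        }
        iterations
    })
}
```

With this shape the clone is visible at the call site (`spawn_worker(exit.clone())`) rather than hidden at the bottom of the callstack.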
* Notify replay of pruned duplicate confirmed slots
* Ingest replay signal and run ancestor hashes for pruned
* Forward PDC to ancestor hashes and ingest pruned dumps from ancestor hashes service
* Add local-cluster test
* pass include_slot_in_hash through hash calcs to allow rehashing
* tests use each include_slot_in_hash value
* move include_slot_in_hash
* typo
* reorder struct init
* spelling is hard
* Move entry_notifier_interface
* Add EntryNotifierService
* Use descriptive struct in sender/receiver
* Optionally initialize EntryNotifierService in validator
* Plumb EntryNotifierSender into Tvu, blockstore_processor
* Plumb EntryNotifierSender into Tpu
* Only return one option when constructing EntryNotifierService
Counters incur additional overhead in sending points to the MetricsAgent
over a crossbeam channel. Additionally, some of these counters would be
submitted by non-voting nodes which is just extra overhead and noise.
This change condenses several updates of a counter into a field of the
existing BankingStageStats metrics struct.
replay_stage-voted_empty_bank has been converted into a datapoint that
now includes slot number. replay_stage-replay_transactions has been
removed altogether as we can get similar information on a per-slot basis
from replay-slot-stats metric.
* Fixed missing Root notifications via geyser plugin framework
* Renamed a variable
* fmt issue
* Do not try the loop if no subscribers.
* Addressing some feedback -- passing parent roots from replay_stage to avoid race conditions
* clippy issue
* Address some reviewing findings
* Addressed some feedback from Carl
* fix a clippy issue
* Added comments on optimistically_confirmed_bank_tracker module to explain the workflow
* Addressed Trent's review
* Remove snapshot_links
* Change the function name from snapshot_dir to bank_snapshot_dir
* Format fix
* Fix test_concurrent_snapshot_packaging
* Fix clippy error
* Fix nits
* Fix nits 2nd try
* Use get_bank_snapshots_dir
* Use slot_dir
* Revert "Use get_bank_snapshots_dir" because get_bank_snapshots_dir is private to crate
This reverts commit 1ed9b3b2c8e84689a918beee7159f63c56500a96.
* Enforce used_underscore_binding
* Fix all
* Work around for cfg()-ed code...
* ci....
* Make clippy fixes more pleasant
* Clone exit signal while intentionally shadowing
* Use more verbose code to avoid any #[allow(...)]s
* AHV processes the snapshot dirs in place
Let the accounts package use the snapshot dir, so AHV computes the
accounts hash and turns the pre snapshot dir into a post snapshot dir
* fix status cache path to maintain the archive layout for the in-place snapshot dir archiving
* fix test_package_snapshots
* Fix test_concurrent_snapshot_packaging
* Remove debug change.
* Fix snapshot_links path
* change to borrow for bank_snapshots_dir
* Reverted changes in create_and_verify_snapshot
* Fix param errors
* Fix rebase errors
* Remove NOTE 1
* Remove unwrap
* Remove the variables to make it apparent that snapshot_links is the bank_snapshots_dir
* Use soft link instead of hard link for snapshot and status cache
* After switching to soft symlinking, the src path should be absolute
When ReplayStage repeatedly fails to compute the correct hash for a block
after purging and repairing, it panics on the assumption that something
is very wrong and will require human intervention.
If this is the case, there is typically something to be debugged, and
having the slot available locally is valuable. This change performs the
retry check, which may panic, before purging the failed slot.
Working towards LegacyContactInfo => ContactInfo migration, the commit
hides some implementation details of LegacyContactInfo and expands API
parity with the new ContactInfo.
* Fix bug where ReplayStage holds an Arc<Bank> for process lifetime
When ReplayStage::new() kicks off, it does some setup with the working
bank before entering the main processing loop. A bug left an Arc<Bank>
in scope after the processing loop had been entered. The processing loop
is only exited when the process exits, so the Bank was being held for
the lifetime of the process. This wastes resources and prevents
background cleanup.
* clippy
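A reduced illustration of this class of bug, assuming hypothetical `Bank` and `BankForks` stand-ins; the loop here is bounded so the effect can be observed through `Arc::strong_count`:

```rust
use std::sync::Arc;

// Hypothetical, heavily reduced stand-ins for the real types.
struct Bank {
    slot: u64,
}

struct BankForks {
    working: Arc<Bank>,
}

impl BankForks {
    fn working_bank(&self) -> Arc<Bank> {
        Arc::clone(&self.working)
    }
}

/// Buggy shape: `working_bank` stays in scope for as long as the loop runs.
/// Returns the start slot and the highest reference count seen in the loop.
fn loop_holding_bank(bank_forks: &BankForks, iterations: usize) -> (u64, usize) {
    let working_bank = bank_forks.working_bank();
    let start_slot = working_bank.slot;
    let mut max_refs: usize = 0;
    for _ in 0..iterations {
        max_refs = max_refs.max(Arc::strong_count(&bank_forks.working));
    }
    (start_slot, max_refs)
}

/// Fixed shape: setup is confined to a block, so the clone is dropped
/// before the loop is entered.
fn loop_after_scoped_setup(bank_forks: &BankForks, iterations: usize) -> (u64, usize) {
    let start_slot = {
        let working_bank = bank_forks.working_bank();
        working_bank.slot
    }; // `working_bank` is dropped here
    let mut max_refs: usize = 0;
    for _ in 0..iterations {
        max_refs = max_refs.max(Arc::strong_count(&bank_forks.working));
    }
    (start_slot, max_refs)
}
```

Rust drops a binding at the end of its scope rather than at its last use, which is why the first variant keeps the extra reference alive for the whole loop.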
Add a new vote state version that replaces Lockout with LandedVote to
allow vote latency to be tracked in a future change.
Includes a feature that, when enabled, causes the vote state to be
written in the new form.
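As a rough sketch of the idea, LandedVote can be pictured as a Lockout plus a latency field. The field names, the zero default for latency, and the `upgrade` helper below are assumptions for illustration, not the actual definitions:

```rust
/// Simplified stand-in for the existing per-vote record.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Lockout {
    slot: u64,
    confirmation_count: u32,
}

/// Sketch of the new record: the old Lockout plus room for a latency value.
#[derive(Clone, Copy, Debug, PartialEq)]
struct LandedVote {
    // Assumption: 0 means "latency not tracked" until a later change fills it in.
    latency: u8,
    lockout: Lockout,
}

impl From<Lockout> for LandedVote {
    fn from(lockout: Lockout) -> Self {
        Self { latency: 0, lockout }
    }
}

/// Hypothetical helper: convert an old-form vote list into the new form.
fn upgrade(votes: &[Lockout]) -> Vec<LandedVote> {
    votes.iter().copied().map(LandedVote::from).collect()
}
```

Wrapping the old record instead of replacing its fields keeps the conversion from the previous version lossless.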
* Add a type_select param for purge_old_bank_snapshots
* Use flags to make the function calls more readable
* Remove the extra purge calls
* replace select_type with filter_by_type
* Add test
* Use matches
* Fix CI test on reference
* use match and call do_purge once
* Let bank_snapshots_dir be TempDir
* remove account_paths in the test
* replace bank with _bank
* Remove create_snapshot_dirs_for_tests, will take the latest from master
* Fix merge errors
* Let AHV hold and update last_snapshot_storages
* Clean up comment
* Move cloning after enqueued_time
* Minor position change
* Remove type last_snapshot_storages annotation
* refactor fault hash injection out of AccountsHashVerifier
* refactor fault injector out of AccountsHashVerifier
* use type alias
* Apply suggestions from code review
Co-authored-by: Brooks <brooks@prumo.org>
* move type alias
* rename
---------
Co-authored-by: Brooks <brooks@prumo.org>
Measure total time spent inside `confirm_slot_entries()`. Useful metric
in addition to `replay_elapsed` and
`poh_verify_elapsed`/`transaction_verify_elapsed`, as it shows how PoH
and transaction verification interact with the replay process.
Extracted time metrics related to transaction execution into a separate
structure. This allows me to call `process_entries_with_callback()`
without locking the whole instance of `ConfirmationTiming`, passing just
the `BatchExecutionTiming` part.
I want to add a new metric that starts at the beginning of the
`confirm_slot_entries()` call and runs until the very end. In order to
use a `scopeguard::defer`, I need to hold an exclusive reference to it
for the whole body of `confirm_slot_entries()`.
Plus a few minor renamings to clarify what the verification and result
variables actually store. Also corrected a few messages that
incorrectly stated PoH verification when they were actually issued
for transaction verification failures.
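The borrow structure described above can be sketched like this. To stay dependency-free, the sketch swaps `scopeguard::defer` for a small hand-written `Drop` guard; the field names and the `process_entries` stand-in are hypothetical:

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Execution-related metrics, split out so they can be borrowed on their own.
#[derive(Default)]
struct BatchExecutionTiming {
    batches: u64,
}

#[derive(Default)]
struct ConfirmationTiming {
    confirm_elapsed_us: u64,
    batch_execute: BatchExecutionTiming,
}

/// Adds the time since `start` to `total_us` when it goes out of scope.
struct ElapsedGuard<'a> {
    start: Instant,
    total_us: &'a mut u64,
}

impl Drop for ElapsedGuard<'_> {
    fn drop(&mut self) {
        *self.total_us += self.start.elapsed().as_micros() as u64;
    }
}

/// Stand-in for `process_entries_with_callback()`: needs only the batch part.
fn process_entries(timing: &mut BatchExecutionTiming) {
    thread::sleep(Duration::from_millis(1)); // placeholder for real work
    timing.batches += 1;
}

fn confirm_slot_entries(timing: &mut ConfirmationTiming) {
    // Destructuring yields two disjoint exclusive borrows, so the guard can
    // hold one field for the whole body while the other is passed along.
    let ConfirmationTiming {
        confirm_elapsed_us,
        batch_execute,
    } = timing;
    let _guard = ElapsedGuard {
        start: Instant::now(),
        total_us: confirm_elapsed_us,
    };
    process_entries(batch_execute);
} // `_guard` is dropped here and records the total elapsed time
```

Without the split, the guard's exclusive borrow would cover the whole timing struct and the inner call could not receive a mutable reference to any part of it.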