* Initialize fork graph in program cache during bank_forks creation
* rename BankForks::new to BankForks::new_rw_arc
* fix compilation
* no need to set fork_graph on insert()
* fix partition tests
This macro is used a lot for tests to create a ledger path in order to
open a Blockstore. Files will be left on disk unless the test remembers
to call Blockstore::destroy() on the directory. So, instead of requiring
this, use the get_tmp_ledger_path_auto_delete!() macro that creates a
TempDir (which automatically deletes itself when it goes out of scope).
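
The cleanup-on-drop idea can be sketched with the standard library alone.
This is an illustration only: `AutoDeleteDir` is a hypothetical stand-in,
not the macro's real expansion and not the real `TempDir` type.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for a self-deleting ledger directory.
pub struct AutoDeleteDir {
    path: PathBuf,
}

impl AutoDeleteDir {
    /// Creates a fresh directory under the system temp dir.
    pub fn new(prefix: &str) -> io::Result<Self> {
        // Process id plus a timestamp keeps names distinct enough for a sketch.
        let nanos = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos())
            .unwrap_or(0);
        let name = format!("{}-{}-{}", prefix, std::process::id(), nanos);
        let path = std::env::temp_dir().join(name);
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for AutoDeleteDir {
    fn drop(&mut self) {
        // Best effort: a failed cleanup should not panic inside drop.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

With this shape a test cannot forget the cleanup call, because the
directory is removed when the guard leaves scope, including on panic.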
* Add wen_restart module:
- Implement reading LastVotedForkSlots from blockstore.
- Add proto file to record the intermediate results.
- Also link wen_restart into validator.
- Move recreation of tower outside replay_stage so we can get last_vote.
* Update lock file.
* Fix linter errors.
* Fix dependencies order.
* Update wen_restart explanation and small fixes.
* Generate tower outside tvu.
* Update validator/src/cli.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/protos/wen_restart.proto
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/build.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/src/wen_restart.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Rename proto directory.
* Rename InitRecord to MyLastVotedForkSlots, add imports.
* Update wen-restart/Cargo.toml
Co-authored-by: Tyera <teulberg@gmail.com>
* Update wen-restart/src/wen_restart.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Move prost-build dependency to project toml.
* No need to continue if the distance between slot and last_vote is
already larger than MAX_SLOTS_ON_VOTED_FORKS.
* Use 16k slots instead of 81k slots, a few more wording changes.
* Use AncestorIterator which does the same thing.
* Update Cargo.lock
* Update Cargo.lock
---------
Co-authored-by: Tyera <teulberg@gmail.com>
* Move vote related code to its own crate
* Update imports in code and tests
* update programs/sbf/Cargo.lock
* fix check errors
* update abi_digest
* rebase fixes
* fixes after rebase
* remove unnecessary hashes around raw string literals
* remove unnecessary literal `unwrap()`s
* remove panicking `unwrap()`
* remove unnecessary `unwrap()`
* use `[]` instead of `vec![]` where applicable
* remove (more) unnecessary explicit `into_iter()` calls
* remove redundant pattern matching
* don't cast to same type and constness
* do not `cfg(any(...` a single item
* remove needless pass by `&mut`
* prefer `or_default()` to `or_insert_with(T::default())`
* `filter_map()` better written as `filter()`
* incorrect `PartialOrd` impl on `Ord` type
* replace "slow zero-filled `Vec` initializations"
* remove redundant local bindings
* add required lifetime to associated constant
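
A few of the lint fixes listed above, shown on made-up code. `Stake`,
`count_words` and `zeroed` are hypothetical names used only for
illustration; none of them come from the code base.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Stake(pub u64);

impl Ord for Stake {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.cmp(&other.0)
    }
}

// On a type that is `Ord`, `partial_cmp` should delegate to `cmp` so the
// two orderings can never disagree.
impl PartialOrd for Stake {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

pub fn count_words(words: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in words {
        // `or_default()` instead of spelling out the default value by hand.
        *counts.entry(word.to_string()).or_default() += 1;
    }
    counts
}

pub fn zeroed(len: usize) -> Vec<u8> {
    // `vec![0; len]` instead of `Vec::with_capacity` followed by `resize`.
    vec![0; len]
}
```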
In most cases, either a &Bank or an Arc<Bank> is more appropriate.
- &Bank is used if the function only needs a momentary reference
- Arc<Bank> is used if the function needs its own copy
This PR leaves several instances of &Arc<Bank> around; these instances
are situations where a clone may only happen conditionally.
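
A minimal sketch of the three signatures. `Bank` here is a hypothetical
one-field stand-in for the real type, and the function names are invented
for illustration.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the real `Bank`.
pub struct Bank {
    pub slot: u64,
}

/// Only needs a momentary look at the bank: take `&Bank`.
pub fn slot_of(bank: &Bank) -> u64 {
    bank.slot
}

/// Needs its own copy that outlives the caller's borrow: take `Arc<Bank>`.
pub fn spawn_reader(bank: Arc<Bank>) -> thread::JoinHandle<u64> {
    thread::spawn(move || bank.slot)
}

/// May or may not keep a copy: `&Arc<Bank>` avoids a clone on the path
/// that does not retain the bank.
pub fn maybe_retain(bank: &Arc<Bank>, retained: &mut Vec<Arc<Bank>>) {
    if bank.slot % 2 == 0 {
        retained.push(Arc::clone(bank));
    }
}
```

A caller holding an `Arc<Bank>` can still call `slot_of(&bank)`, because
`&Arc<Bank>` dereferences to `&Bank` automatically.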
When a consensus divergence occurs, the current workflow involves a
handful of manual steps to home in on the offending slot and
transaction. This process isn't overly difficult to execute; however, it
is tedious and currently involves creating and parsing logs.
This change introduces functionality to output a debug file that
contains the components that go into the bank hash. The file can be
generated in two ways:
- Via solana-validator when the node realizes it has diverged
- Via solana-ledger-tool verify by passing a flag
When a divergence occurs now, the steps to debug would be:
- Grab the file from the node that diverged
- Generate a file for the same slot with ledger-tool with a known good
version
- Diff the files; they are pretty-printed JSON
* When there are too many pubkeys in one slot, kick the one with lowest
stake out.
* Cache last_root to reduce read locks we need.
* Use slots_in_epoch to limit number of slots in the map.
* Fix lint errors.
* Only cache stake and slots per epoch once per epoch.
* Revert "Only cache stake and slots per epoch once per epoch."
This reverts commit 8658aad0083456794b4c4403adaf9c74d1a71d09.
* Vote at the tip of current fork if last vote is outside SlotHash
of the tip and last vote expired.
* Add unittest when last vote is outside slothash, we should vote at the tip
of the current fork.
* Revert "Use slots_in_epoch to limit number of slots in the map."
This reverts commit 93574f57a48d2a70fbbc0f62fa8810d3b6bee0af.
* Revert "Cache last_root to reduce read locks we need."
This reverts commit bb114ec2b62cb9c0207328b19c415f6116be0f1c.
* Revert "When there are too many pubkeys in one slot, kick the one with lowest"
This reverts commit 711e29a6a025fd4f11fbc97dcbbe90e4832be04c.
* Move new vote generation when last vote is outside slothash into the
main path, this actually makes more sense since we don't select where
to vote in two different places, and all the vote generation logic
is seamlessly inherited.
* - Move vote refresh to be behind select vote and do not refresh vote if a new
vote is selected.
- Check whether last vote is inside slothash inside select_vote_and_reset_forks
- rename slot_within_slothash to is_in_slothashes_history
- remove one unittest for now, more tests will be added in a separate CL
* Remove new test, it will be in another file.
* Add is_in_slot_hashes_history test in the new file.
* Add unittest for the case when last vote is outside slot hashes.
* Small improvements and more unittests.
* Fix bad merge.
* Update docs/src/terminology.md
Co-authored-by: mvines <mvines@gmail.com>
* Put SwitchForkDecision::FailedSwitchThreshold logic into separate function.
* Make linter happy.
---------
Co-authored-by: mvines <mvines@gmail.com>
`Arc` is already a reference internally, so it does not seem to be
beneficial to pass a reference to it. Doing so just adds an extra layer
of indirection.
Functions that need to be able to increment `Arc` reference count need
to take `Arc<AtomicBool>`, but those that just want to read the
`AtomicBool` value can accept `&AtomicBool`, making them a bit more
generic.
This change focuses specifically on `Arc<AtomicBool>`. There are other
uses of `&Arc<T>` in the code base that could be converted in a similar
manner, but that would make the change even larger.
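
A small sketch of the distinction. `should_exit` and `spawn_worker` are
hypothetical names, not functions from the code base.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Only reads the flag, so `&AtomicBool` is enough. Callers may pass a
/// flag that lives in an `Arc`, on the stack, or in a static.
pub fn should_exit(exit: &AtomicBool) -> bool {
    exit.load(Ordering::Relaxed)
}

/// Moves the flag into another thread, so it needs shared ownership and
/// takes `Arc<AtomicBool>` by value.
pub fn spawn_worker(exit: Arc<AtomicBool>) -> thread::JoinHandle<u64> {
    thread::spawn(move || {
        let mut iterations = 0;
        // `&exit` is a `&Arc<AtomicBool>`, which derefs to `&AtomicBool`.
        while !should_exit(&exit) {
            iterations += 1;
            thread::yield_now();
        }
        iterations
    })
}
```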
* Notify replay of pruned duplicate confirmed slots
* Ingest replay signal and run ancestor hashes for pruned
* Forward PDC to ancestor hashes and ingest pruned dumps from ancestor hashes service
* Add local-cluster test
* Move entry_notifier_interface
* Add EntryNotifierService
* Use descriptive struct in sender/receiver
* Optionally initialize EntryNotifierService in validator
* Plumb EntryNotifierSender into Tvu, blockstore_processor
* Plumb EntryNotifierSender into Tpu
* Only return one option when constructing EntryNotifierService
replay_stage-voted_empty_bank has been converted into a datapoint that
now includes slot number. replay_stage-replay_transactions has been
removed altogether as we can get similar information on a per-slot basis
from replay-slot-stats metric.
* Fixed missing Root notifications via geyser plugin framework
* Renamed a variable
* fmt issue
* Do not try the loop if no subscribers.
* Addressing some feedback -- passing parent roots from replay_stage to avoid race conditions
* clippy issue
* Address some reviewing findings
* Addressed some feedback from Carl
* fix a clippy issue
* Added comments on optimistically_confirmed_bank_tracker module to explain the workflow
* Addressed Trent's review
When ReplayStage repeatedly fails to compute the correct hash for a block
after purging and repairing, it panics on the assumption that something
is very wrong and will require human intervention.
If this is the case, there is typically something to be debugged, and
having the slot available locally is valuable. This change performs the
retry check (and its panic) before purging the failed slot.
* Fix bug where ReplayStage holds an Arc<Bank> for process lifetime
When ReplayStage::new() kicks off, it needs to do some setup with the
working bank prior to entering the main processing loop. However, a bug
made it such that an Arc<Bank> remained in scope after the processing
loop had been entered. The processing loop is only exited when the
process exits, so this means that Bank was being held for the lifetime
of the process. This is a waste of resources and prevents background
cleanup.
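
The general shape of this kind of bug, and one way to avoid it, can be
sketched as follows. `Bank`, `Forks` and both functions are hypothetical;
the actual fix may look different.

```rust
use std::sync::Arc;

/// Hypothetical stand-ins for the real types.
pub struct Bank {
    pub slot: u64,
}

pub struct Forks {
    pub working_bank: Arc<Bank>,
}

/// Buggy shape: the cloned `Arc` is bound in the function's outer scope,
/// so it stays alive while the "loop" runs, even though it is never used
/// again. Returns the strong count seen from inside the loop.
pub fn count_with_lingering_binding(forks: &Forks) -> usize {
    let working_bank = Arc::clone(&forks.working_bank);
    let _start_slot = working_bank.slot;
    // Stand-in for the long-running processing loop.
    Arc::strong_count(&forks.working_bank)
}

/// Fixed shape: the setup happens in its own block, so the clone is
/// dropped before the loop starts.
pub fn count_with_scoped_setup(forks: &Forks) -> usize {
    let _start_slot = {
        let working_bank = Arc::clone(&forks.working_bank);
        working_bank.slot
    };
    // Stand-in for the long-running processing loop.
    Arc::strong_count(&forks.working_bank)
}
```

Rust drops a binding at the end of its scope, not at its last use, which
is why the first version keeps the extra reference alive.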
* clippy
Extracted time metrics related to transaction execution into a separate
structure. This allows me to call `process_entries_with_callback()`
without locking the whole instance of `ConfirmationTiming`, passing just
the `BatchExecutionTiming` part.
I want to add a new metric that starts at the beginning of the
`confirm_slot_entries()` call and runs until the very end. In order to
use a `scopeguard::defer`, I need to be able to hold an exclusive
reference to it for the whole body of `confirm_slot_entries()`.
Plus a few minor renamings to clarify what the verification and result
variables actually store. Also corrected a few messages that
incorrectly stated PoH verification when they were actually issued
for transaction verification failures.
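
A rough sketch of why the split helps, using a hand-written drop guard
from the standard library in place of `scopeguard::defer`. The field
names, `ElapsedGuard` and `process_entries` are assumptions made for
illustration, not the real definitions.

```rust
use std::time::{Duration, Instant};

/// Hypothetical, heavily reduced versions of the two timing structs.
#[derive(Default)]
pub struct BatchExecutionTiming {
    pub batches: u64,
}

#[derive(Default)]
pub struct ConfirmationTiming {
    pub confirm_elapsed: Duration,
    pub batch_execute: BatchExecutionTiming,
}

/// Adds the elapsed time to `total` when dropped. It holds an exclusive
/// reference to `total` for as long as it is alive.
struct ElapsedGuard<'a> {
    started: Instant,
    total: &'a mut Duration,
}

impl Drop for ElapsedGuard<'_> {
    fn drop(&mut self) {
        *self.total += self.started.elapsed();
    }
}

/// Stand-in for the callee that only needs the batch execution part.
pub fn process_entries(timing: &mut BatchExecutionTiming) {
    timing.batches += 1;
}

pub fn confirm_slot_entries(timing: &mut ConfirmationTiming) {
    // Borrow the two fields separately: the guard keeps one exclusive
    // reference for the whole body while the other is passed along.
    let ConfirmationTiming {
        confirm_elapsed,
        batch_execute,
    } = timing;
    let _guard = ElapsedGuard {
        started: Instant::now(),
        total: confirm_elapsed,
    };
    process_entries(batch_execute);
}
```

If `process_entries` took the whole `ConfirmationTiming` instead, the
borrow checker would reject the call while the guard is alive.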
* Add RewardsMessage enum
* Cache and update max_complete_rewards_slot
* Plumb max_complete_rewards_slot into JsonRpcRequestProcessor
* Use max_complete_rewards_slot to check get_block requests
* Use max_complete_rewards_slot to limit Bigtable uploads
* Plumb max_complete_rewards_slot into RpcSubscriptions
* Use max_complete_rewards_slot to limit block subscriptions
* Nit: fix test