* When there are too many pubkeys in one slot, kick the one with lowest
stake out.
* Cache last_root to reduce read locks we need.
* Use slots_in_epoch to limit number of slots in the map.
* Fix lint errors.
* Only cache stake and slots per epoch once per epoch.
* Revert "Only cache stake and slots per epoch once per epoch."
This reverts commit 8658aad0083456794b4c4403adaf9c74d1a71d09.
* Vote at the tip of current fork if last vote is outside SlotHash
of the tip and last vote expired.
* Add unittest when last vote is outside slothash, we should vote at the tip
of the current fork.
* Revert "Use slots_in_epoch to limit number of slots in the map."
This reverts commit 93574f57a48d2a70fbbc0f62fa8810d3b6bee0af.
* Revert "Cache last_root to reduce read locks we need."
This reverts commit bb114ec2b62cb9c0207328b19c415f6116be0f1c.
* Revert "When there are too many pubkeys in one slot, kick the one with lowest"
This reverts commit 711e29a6a025fd4f11fbc97dcbbe90e4832be04c.
* Move new vote generation when last vote is outside slothash into the
main path, this actually makes more sense since we don't select where
to vote in two different places, and all the vote generation logic
is seamlessly inherited.
* - Move vote refresh to be behind select vote and do not refresh vote if a new
vote is selected.
- Check whether last vote is inside slothash inside select_vote_and_reset_forks
- rename slot_within_slothash to is_in_slothashes_history
- remove one unittest for now, more tests will be added in a separate CL
* Remove new test, it will be in another file.
* Add is_in_slot_hashes_history test in the new file.
* Add unittest for the case when last vote is outside slot hashes.
* Small improvements and more unittests.
* Fix bad merge.
* Update docs/src/terminology.md
Co-authored-by: mvines <mvines@gmail.com>
* Put SwitchForkDecision::FailedSwitchThreshold logic into separate function.
* Make linter happy.
---------
Co-authored-by: mvines <mvines@gmail.com>
`Arc` is already a reference internally, so it does not seem to be
beneficial to pass a reference to it. Just adds an extra layer of
indirection.
Functions that need to be able to increment `Arc` reference count need
to take `Arc<AtomicBool>`, but those that just want to read the
`AtomicBool` value can accept `&AtomicBool`, making them a bit more
generic.
This change focuses specifically on `Arc<AtomicBool>`. There are other
uses of `&Arc<T>` in the code base that could be converted in a similar
manner. But it would make the change even larger.
* Notify replay of pruned duplicate confirmed slots
* Ingest replay signal and run ancestor hashes for pruned
* Forward PDC to ancestor hashes and ingest pruned dumps from ancestor hashes service
* Add local-cluster test
* Move entry_notifier_interface
* Add EntryNotifierService
* Use descriptive struct in sender/receiver
* Optionally initialize EntryNotifierService in validator
* Plumb EntryNotfierSender into Tvu, blockstore_processor
* Plumb EntryNotfierSender into Tpu
* Only return one option when constructing EntryNotifierService
replay_stage-voted_empty_bank has been converted into a datapoint that
now includes slot number. replay_stage-replay_transactions has been
removed altogether as we can get similar information on a per-slot basis
from replay-slot-stats metric.
* Fixed missing Root notifications via geyser plugin framework
* Renamed a variable
* fmt issue
* Do not try the loop if no subscribers.
* Addressing some feedback -- passing parent roots from replay_stage to avoid race conditions
* clippy issue
* Address some reviewing findings
* Addressed some feedback from Carl
* fix a clippy issue
* Added comments on optimistically_confirmed_bank_tracker module to explain the workflow
* Addressed Trent's review
When ReplayStage repeatedly fails to compute the correct for a block
after purging and repairing, it panics on the assumption that something
is very wrong and will require human intervention.
If this is the case, there is typically something to be debugged, and
having the slot available locally is valuable. This change does the
retry check that will panic before purging the failure slot.
* Fix bug where ReplayStage holds an Arc<Bank> for process lifetime
When ReplayStage::new() kicks off, it needs to do some setup with the
working bank prior to entering the main processing loop. This setup is
done before entering the main processing loop; however, a bug made it
such that an Arc<Bank> remained in scope after the processing loop had
been entered. The processing loop is only exited when the process exits,
so this means that Bank was being held for the lifetime of the process.
This is a waste of resources and prevents background cleanup.
* clippy
Extracted time metrics related to transaction execution into a separate
structure. This allows me to call `process_entries_with_callback()`
without locking the whole instance of `ConfirmationTiming`, passing just
the `BatchExecutionTiming` part.
I want to add a new metric that starts at the beginning of the
`confirm_slot_entries()` call and ends until the very end. In order to
use a `scopeguard::defer`, I need to be able to have an excursive
reference to it for the whole body of `confirm_slot_entries()`.
Plus a few minor renamings to clarify which verifications and results
variables actually store. And corrected a few messages, that
incorrectly stated PoH verification, while they were actually issued
for transaction verification failures.
* Add RewardsMessage enum
* Cache and update max_complete_rewards_slot
* Plumb max_complete_rewards_slot into JsonRpcRequestProcesseor
* Use max_complete_rewards_slot to check get_block requests
* Use max_complete_rewards_slot to limit Bigtable uploads
* Plumb max_complete_rewards_slot into RpcSubscriptions
* Use max_complete_rewards_slot to limit block subscriptions
* Nit: fix test
* Add fully-reproducible online tracer for banking
* Don't use eprintln!()...
* Update programs/sbf/Cargo.lock...
* Remove meaningless assert_eq
* Group test-only code under aptly named mod
* Remove needless overflow handling in receive_until
* Delay stat aggregation as it's possible now
* Use Cow to avoid needless heap allocs
* Properly consume metrics action as soon as hold
* Trace UnprocessedTransactionStorage::len() instead
* Loosen joining api over type safety for replaystage
* Introce hash event to override these when simulating
* Use serde_with/serde_as instead of hacky workaround
* Update another Cargo.lock...
* Add detailed comment for Packet::buffer serialize
* Rename sender_overhead_minimized_receiver_loop()
* Use type interference for TraceError
* Another minor rename
* Retire now useless ForEach to simplify code
* Use type alias as much as possible
* Properly translate and propagate tracing errors
* Clarify --enable-banking-trace with better naming
* Consider unclean (signal-based) node restarts..
* Tweak logging and cli
* Remove Bank events as it's not needed anymore
* Make tpu own banking tracer thread
* Reduce diff a bit..
* Use latest serde_with
* Finally use the published rolling-file crate
* Make test code change more consistent
* Revive dead and non-terminating test code path...
* Dispose batches early now that possible
* Split off thread handle very early at ::new()
* Tweak message for TooSmallDirByteLimitl
* Remove too much of indirection
* Remove needless pub from ::channel()
* Clarify test comments
* Avoid needless event creation if tracer is disabled
* Write tests around file rotation and spill-over
* Remove unneeded PathBuf::clone()s...
* Introduce inner struct instead of tuple...
* Remove unused enum BankStatus...
* Avoid .unwrap() for the case of disabled tracer...
Problem
The plugins need to know when all transactions for a block have been all notified to serve getBlock request correctly. As block and transaction notifications are sent asynchronously to each other it will be difficult.
Summary of Changes
Include the executed transaction count in block notification which can be used to check if all transactions have been notified.
* Plumb dumps from replay_stage to repair
When dumping a slot from replay_stage as a result of duplicate or
ancestor hashes, properly update repair subtrees to keep weighting and
forks view accurate.
* add test
* pr comments
* Add dump_node to update stake for heaviest subtrees
Additionally refactor subtrees to store children as a hashset
* Add a more complicated forks test
* chose -> choose
* remove is_dumped flag and reuse latest_invalid_ancestor instead
* Update cost model to use requested_cu instead of estimated cu #27608
* remove CostUpdate and CostModel from replay/tvu
* revive cost update service to send cost tracker stats
* CostModel is now static
* remove unused package
Co-authored-by: Tao Zhu <tao@solana.com>
We have transactions counted in replay-slot-end-to-end-stats, but that
metric is broken down to report things per thread.
So, report total_transactions for the entire slot (all threads) in
replay-slot-stats.