Commit Graph

511 Commits

Author SHA1 Message Date
Jeff Biseda bad5197cb0
refactor core to create repair module (#32303) 2023-07-05 12:20:46 -07:00
Ashwin Sekar e1576b5352
Don't attempt to refresh votes on non voting validators (#32315) 2023-06-30 17:53:06 -07:00
Jeff Biseda 87c1b67d53
refactor core to create consensus module (#32282) 2023-06-27 17:25:08 -07:00
Wen 6f72258e3e
Vote refresh fix when outside slothash (#29948)
* When there are too many pubkeys in one slot, kick the one with lowest
stake out.

* Cache last_root to reduce read locks we need.

* Use slots_in_epoch to limit number of slots in the map.

* Fix lint errors.

* Only cache stake and slots per epoch once per epoch.

* Revert "Only cache stake and slots per epoch once per epoch."

This reverts commit 8658aad0083456794b4c4403adaf9c74d1a71d09.

* Vote at the tip of current fork if last vote is outside SlotHash
of the tip and last vote expired.

* Add unittest: when last vote is outside slothash, we should vote at the tip
of the current fork.

* Revert "Use slots_in_epoch to limit number of slots in the map."

This reverts commit 93574f57a48d2a70fbbc0f62fa8810d3b6bee0af.

* Revert "Cache last_root to reduce read locks we need."

This reverts commit bb114ec2b62cb9c0207328b19c415f6116be0f1c.

* Revert "When there are too many pubkeys in one slot, kick the one with lowest"

This reverts commit 711e29a6a025fd4f11fbc97dcbbe90e4832be04c.

* Move new vote generation when last vote is outside slothash into the
main path. This makes more sense, since we no longer select where
to vote in two different places, and all the vote generation logic
is inherited seamlessly.

* - Move vote refresh to be behind select vote and do not refresh vote if a new
  vote is selected.
- Check whether last vote is inside slothash inside select_vote_and_reset_forks
- rename slot_within_slothash to is_in_slothashes_history
- remove one unittest for now, more tests will be added in a separate CL

* Remove new test, it will be in another file.

* Add is_in_slot_hashes_history test in the new file.

* Add unittest for the case when last vote is outside slot hashes.

* Small improvements and more unittests.

* Fix bad merge.

* Update docs/src/terminology.md

Co-authored-by: mvines <mvines@gmail.com>

* Put SwitchForkDecision::FailedSwitchThreshold logic into separate function.

* Make linter happy.

---------

Co-authored-by: mvines <mvines@gmail.com>
2023-06-26 18:21:24 -07:00
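The rule in the bullet "Vote at the tip of current fork if last vote is outside SlotHash of the tip and last vote expired", combined with "do not refresh vote if a new vote is selected", can be sketched as below. This is a hypothetical model using only slot numbers, not the ReplayStage code: `VoteAction`, `select_vote`, and the newest-first `slot_hashes` slice are assumptions, and only the name `is_in_slot_hashes_history` comes from the commit.

```rust
/// Hypothetical outcome of one vote-selection pass (slots only).
#[derive(Debug, PartialEq)]
enum VoteAction {
    /// Cast a new vote at the tip of the current fork.
    VoteAtTip(u64),
    /// Re-send the previous vote unchanged.
    RefreshLastVote(u64),
    /// Nothing to do this pass.
    NoVote,
}

/// `slot_hashes` is assumed newest-first, like the SlotHashes sysvar of the
/// tip bank (hashes omitted). A slot counts as "in history" when it is not
/// older than the oldest retained entry.
fn is_in_slot_hashes_history(slot: u64, slot_hashes: &[u64]) -> bool {
    slot_hashes.last().map_or(false, |&oldest| slot >= oldest)
}

/// Selection runs first; a refresh is only considered when no new vote
/// was selected.
fn select_vote(tip: u64, last_vote: u64, last_vote_expired: bool, slot_hashes: &[u64]) -> VoteAction {
    if last_vote_expired && !is_in_slot_hashes_history(last_vote, slot_hashes) {
        // A refreshed vote for this slot would presumably no longer land,
        // so vote at the tip instead.
        VoteAction::VoteAtTip(tip)
    } else if last_vote_expired {
        VoteAction::RefreshLastVote(last_vote)
    } else {
        VoteAction::NoVote
    }
}
```

Putting both branches in one function mirrors the commit's reasoning that vote selection should happen in a single place.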
Jeff Biseda 5ca1b40f11
refactor core to create cluster_slots_service module (#32119) 2023-06-26 08:54:49 -07:00
behzad nouri f6e039b0b3
moves turbine to a separate crate out of solana/core (#32226) 2023-06-22 16:22:11 +00:00
Ashwin Sekar 8135cf35bf
Only dump duplicate descendants in dump & repair (#31559) 2023-06-21 11:28:42 -07:00
Ashwin Sekar 01d3546de0
Increment timestamp on refreshed votes (#31908) 2023-06-15 10:38:22 -07:00
Illia Bobyr 4353ac6797
Pass Arc<AtomicBool> by value, not by reference. (#31916)
`Arc` is already a reference internally, so passing a reference to it
does not seem beneficial; it just adds an extra layer of
indirection.

Functions that need to be able to increment `Arc` reference count need
to take `Arc<AtomicBool>`, but those that just want to read the
`AtomicBool` value can accept `&AtomicBool`, making them a bit more
generic.

This change focuses specifically on `Arc<AtomicBool>`. There are other
uses of `&Arc<T>` in the code base that could be converted in a similar
manner, but that would make the change even larger.
2023-06-01 17:25:48 -07:00
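The convention described above can be sketched as follows. The function names `should_exit` and `spawn_worker` are hypothetical; the sketch only shows that a read-only helper can take `&AtomicBool` (callers holding an `Arc` pass `&exit` via deref coercion), while code that must keep the flag alive on another thread takes `Arc<AtomicBool>` by value.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};

/// Only reads the flag, so it borrows `&AtomicBool`. A caller holding an
/// `Arc<AtomicBool>` can pass `&exit` thanks to deref coercion.
fn should_exit(exit: &AtomicBool) -> bool {
    exit.load(Ordering::Relaxed)
}

/// Must keep the flag alive on another thread, so it takes the `Arc` by
/// value; the caller decides whether to clone.
fn spawn_worker(exit: Arc<AtomicBool>) -> JoinHandle<u64> {
    thread::spawn(move || {
        let mut iterations = 0u64;
        while !should_exit(&exit) {
            iterations += 1;
            thread::yield_now();
        }
        iterations
    })
}
```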
Andrew Fitzgerald 02ac8a46d6
set_bank takes owned Arc<Bank> (#31717) 2023-05-23 09:41:27 -07:00
Lijun Wang 917f3d2586
Use unwrap_or_else for efficiency (#31747)
Use unwrap_or_else for efficiency.
2023-05-22 09:58:24 -07:00
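The commit does not show the affected call site, so the sketch below only illustrates the general reason `unwrap_or_else` can be cheaper: `unwrap_or` evaluates its fallback argument before the call even when the `Option` is `Some`, whereas `unwrap_or_else` takes a closure that runs only on `None`. `expensive_default`, `eager`, and `lazy` are hypothetical names.

```rust
use std::cell::Cell;

/// Stand-in for an expensive fallback; `calls` records how often it ran.
fn expensive_default(calls: &Cell<u32>) -> String {
    calls.set(calls.get() + 1);
    "default".to_string()
}

/// The fallback is computed before `unwrap_or` runs, even for `Some`.
fn eager(value: Option<String>, calls: &Cell<u32>) -> String {
    value.unwrap_or(expensive_default(calls))
}

/// The closure is only called when `value` is `None`.
fn lazy(value: Option<String>, calls: &Cell<u32>) -> String {
    value.unwrap_or_else(|| expensive_default(calls))
}
```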
Ashwin Sekar 3e8f5bad81
refactor: highest_cluster_confirmed_root -> highest_super_majority_root (#31619) 2023-05-14 00:42:03 -07:00
Ashwin Sekar ef75f1cb4e
Add ancestor hashes to state machine (#31627)
* Notify replay of pruned duplicate confirmed slots

* Ingest replay signal and run ancestor hashes for pruned

* Forward PDC to ancestor hashes and ingest pruned dumps from ancestor hashes service

* Add local-cluster test
2023-05-13 02:05:44 -07:00
Tyera 3f70ddb2c5
Add entry notification service for geyser (#31290)
* Move entry_notifier_interface

* Add EntryNotifierService

* Use descriptive struct in sender/receiver

* Optionally initialize EntryNotifierService in validator

* Plumb EntryNotifierSender into Tvu, blockstore_processor

* Plumb EntryNotifierSender into Tpu

* Only return one option when constructing EntryNotifierService
2023-05-10 17:20:51 -06:00
steviez 4300d84c68
Remove counters from ReplayStage (#31532)
replay_stage-voted_empty_bank has been converted into a datapoint that
now includes the slot number. replay_stage-replay_transactions has been
removed altogether, as similar information is available on a per-slot
basis from the replay-slot-stats metric.
2023-05-09 11:44:02 -05:00
behzad nouri 8e638b785a
removes feature gate code sending votes to tpu-vote-port (#31529) 2023-05-08 18:12:35 +00:00
Lijun Wang 7cf50e60fc
Fixed missing Root notifications via geyser plugin framework (#31180)
* Fixed missing Root notifications via geyser plugin framework

* Renamed a variable

* fmt issue

* Skip the loop if there are no subscribers.

* Addressing some feedback -- passing parent roots from replay_stage to avoid race conditions

* clippy issue

* Address some review findings

* Addressed some feedback from Carl

* fix a clippy issue

* Added comments on optimistically_confirmed_bank_tracker module to explain the workflow

* Addressed Trent's review
2023-05-03 18:50:00 +08:00
steviez 758bc1ca75
Make ReplayStage panic before dumping repeated-repair-attempt slots (#31333)
When ReplayStage repeatedly fails to compute the correct bank hash for a block
after purging and repairing, it panics on the assumption that something
is very wrong and will require human intervention.

If this is the case, there is typically something to be debugged, and
having the slot available locally is valuable. This change performs the
retry check (which may panic) before purging the failed slot.
2023-04-25 11:50:47 -05:00
Andrew Fitzgerald 10d637d2e6
PohRecorder take Arc not &Arc for blockstore (#31234) 2023-04-19 11:41:18 -07:00
steviez 377ba53a31
Fix bug where ReplayStage holds an Arc<Bank> for process lifetime (#31267)
* Fix bug where ReplayStage holds an Arc<Bank> for process lifetime

When ReplayStage::new() kicks off, it needs to do some setup with the
working bank before entering the main processing loop. However, a bug
caused an Arc<Bank> to remain in scope after the processing loop had
been entered. The processing loop is only exited when the process exits,
so that Bank was held for the lifetime of the process. This wastes
resources and prevents background cleanup.

* clippy
2023-04-19 18:12:34 +00:00
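The fix pattern the message describes, keeping the setup's `Arc<Bank>` clone confined to a block so it is dropped before the long-running loop, can be sketched like this. `Bank` here is a hypothetical stand-in struct, and `initial_setup` and `start` are illustrative names, not the ReplayStage code.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for `Bank`; only its reference count matters here.
struct Bank {
    slot: u64,
}

/// One-time setup that only needs to read the bank.
fn initial_setup(bank: &Bank) -> u64 {
    bank.slot
}

/// Returns the setup result and the strong count observed where the
/// (elided) main loop would start.
fn start(shared_bank: &Arc<Bank>) -> (u64, usize) {
    let start_slot = {
        // This clone exists only inside the setup block.
        let working_bank = Arc::clone(shared_bank);
        initial_setup(&working_bank)
    }; // `working_bank` is dropped here, before the loop.

    // Had `working_bank` been bound at function scope instead, this count
    // would stay at 2 for as long as the loop ran.
    let refs_during_loop = Arc::strong_count(shared_bank);
    (start_slot, refs_during_loop)
}
```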
Trent Nelson f34a6bcfce
runtime: transpose `VoteAccount::vote_state()` return to improve ergonomics (#31256) 2023-04-18 14:48:52 -06:00
Ashwin Sekar 85dbd3d94d
Add stake breakdown to metrics for HeaviestForkFailures (#31067) 2023-04-05 20:35:12 -06:00
Brennan 60c4a718a5
enhance replay partition metrics (#31010)
* enhance replay partition metrics
2023-04-04 19:57:09 -07:00
Tyera 3442f184f7
Remove unneeded `clippy::new_ret_no_self` allows (#31035)
Remove unneeded allows
2023-04-03 20:35:20 -06:00
Illia Bobyr 564f8c9b17
ledger: Extract `BatchExecutionTiming` (#30806)
Extracted time metrics related to transaction execution into a separate
structure. This allows me to call `process_entries_with_callback()`
without locking the whole instance of `ConfirmationTiming`, passing just
the `BatchExecutionTiming` part.

I want to add a new metric that starts at the beginning of the
`confirm_slot_entries()` call and lasts until the very end. In order to
use a `scopeguard::defer`, I need to be able to hold an exclusive
reference to it for the whole body of `confirm_slot_entries()`.

Also includes a few minor renames to clarify which verification results
the variables actually store, and corrects a few messages that
incorrectly referred to PoH verification when they were actually issued
for transaction verification failures.
2023-03-28 15:37:34 -07:00
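Why the split helps can be sketched with disjoint field borrows: a guard can hold `&mut` to one timing field for the whole function body while the batch-execution part is borrowed separately. The commit uses `scopeguard::defer`; this sketch swaps in a std-only `Drop` guard (`ElapsedGuard`, hypothetical), and the struct fields shown are simplified assumptions rather than the real `ConfirmationTiming` layout.

```rust
use std::time::Instant;

/// Hypothetical, trimmed-down versions of the two timing structs.
#[derive(Default)]
struct BatchExecutionTiming {
    executed_batches: u64,
}

#[derive(Default)]
struct ConfirmationTiming {
    confirmation_elapsed_us: u64,
    batch_execution: BatchExecutionTiming,
}

/// Std-only stand-in for `scopeguard::defer`: adds the elapsed time to
/// `total_us` when the guard goes out of scope.
struct ElapsedGuard<'a> {
    start: Instant,
    total_us: &'a mut u64,
}

impl Drop for ElapsedGuard<'_> {
    fn drop(&mut self) {
        *self.total_us += self.start.elapsed().as_micros() as u64;
    }
}

/// Only needs the batch-execution part, not the whole `ConfirmationTiming`.
fn process_entries(batch_timing: &mut BatchExecutionTiming, batches: u64) {
    batch_timing.executed_batches += batches;
}

fn confirm_slot_entries(timing: &mut ConfirmationTiming, batches: u64) {
    // The guard mutably borrows one field for the whole body...
    let _guard = ElapsedGuard {
        start: Instant::now(),
        total_us: &mut timing.confirmation_elapsed_us,
    };
    // ...while a disjoint field is mutably borrowed here.
    process_entries(&mut timing.batch_execution, batches);
}
```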
Ryo Onodera 74970a0b5d
Remove unused ProcessOptions::entry_callback (#30600)
* Confine entry_callback under cfg(test) for clarity

* Fix ci

* Actually remove entry_callback altogether

* fix clippy
2023-03-16 09:33:18 +09:00
Tyera b389d509a8
Track max_complete_rewards_slot for use in rpc, bigtable (#30698)
* Add RewardsMessage enum

* Cache and update max_complete_rewards_slot

* Plumb max_complete_rewards_slot into JsonRpcRequestProcessor

* Use max_complete_rewards_slot to check get_block requests

* Use max_complete_rewards_slot to limit Bigtable uploads

* Plumb max_complete_rewards_slot into RpcSubscriptions

* Use max_complete_rewards_slot to limit block subscriptions

* Nit: fix test
2023-03-14 12:08:48 -06:00
behzad nouri c4b2639a86
patches flaky test_retransmit_latest_unpropagated_leader_slot (#30686) 2023-03-12 22:46:05 +00:00
behzad nouri f9805b6fbb
stops nodes from broadcasting slots twice (#30681)
https://github.com/solana-labs/solana/blob/94ef881de/core/src/progress_map.rs#L178
always returns true the first time around because retry_time is None,
so every slot is broadcast twice.
2023-03-11 02:46:08 +00:00
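The failure mode described above, where an unset `retry_time` is treated as "retry now", can be sketched as follows. `RetransmitInfo` and both methods are hypothetical simplifications, and the fixed variant (starting the timer on the first check) is only one possible remedy, not necessarily what the upstream patch does.

```rust
use std::time::{Duration, Instant};

/// Hypothetical retransmit bookkeeping, loosely modeled on the commit text.
struct RetransmitInfo {
    retry_time: Option<Instant>,
    base_delay: Duration,
}

impl RetransmitInfo {
    /// Buggy shape: `None` maps to `true`, so the very first check fires
    /// immediately and the slot is sent a second time.
    fn reached_threshold_buggy(&self) -> bool {
        self.retry_time
            .map(|t| t.elapsed() > self.base_delay)
            .unwrap_or(true)
    }

    /// Possible fix (an assumption): start the timer on the first check
    /// instead of treating `None` as already due.
    fn reached_threshold(&mut self) -> bool {
        let start = *self.retry_time.get_or_insert_with(Instant::now);
        start.elapsed() > self.base_delay
    }
}
```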
Ashwin Sekar 67f644473b
Fix repair behavior concerning our own leader slots (#30200)
Panic when trying to dump & repair a block that we produced.
2023-02-09 14:30:12 -07:00
steviez d3dab24bbe
chore: Use `i` over `ix` variable name when naming worker threads (#30206) 2023-02-09 01:24:57 +00:00
Kirill Fomichev b4d1769688
geyser: add parent slot/blockhash to block (#29855) 2023-01-25 14:20:24 -08:00
Ryo Onodera 40bbf99c74
Add fully-reproducible online tracer for banking (#29196)
* Add fully-reproducible online tracer for banking

* Don't use eprintln!()...

* Update programs/sbf/Cargo.lock...

* Remove meaningless assert_eq

* Group test-only code under aptly named mod

* Remove needless overflow handling in receive_until

* Delay stat aggregation as it's possible now

* Use Cow to avoid needless heap allocs

* Properly consume metrics action as soon as hold

* Trace UnprocessedTransactionStorage::len() instead

* Loosen joining api over type safety for replaystage

* Introduce hash event to override these when simulating

* Use serde_with/serde_as instead of hacky workaround

* Update another Cargo.lock...

* Add detailed comment for Packet::buffer serialize

* Rename sender_overhead_minimized_receiver_loop()

* Use type inference for TraceError

* Another minor rename

* Retire now useless ForEach to simplify code

* Use type alias as much as possible

* Properly translate and propagate tracing errors

* Clarify --enable-banking-trace with better naming

* Consider unclean (signal-based) node restarts..

* Tweak logging and cli

* Remove Bank events as it's not needed anymore

* Make tpu own banking tracer thread

* Reduce diff a bit..

* Use latest serde_with

* Finally use the published rolling-file crate

* Make test code change more consistent

* Revive dead and non-terminating test code path...

* Dispose batches early now that possible

* Split off thread handle very early at ::new()

* Tweak message for TooSmallDirByteLimit

* Remove excessive indirection

* Remove needless pub from ::channel()

* Clarify test comments

* Avoid needless event creation if tracer is disabled

* Write tests around file rotation and spill-over

* Remove unneeded PathBuf::clone()s...

* Introduce inner struct instead of tuple...

* Remove unused enum BankStatus...

* Avoid .unwrap() for the case of disabled tracer...
2023-01-25 21:54:38 +09:00
steviez ac65343f01
Remove duplicate bank frozen log from ReplayStage (#29821)
Bank emits a similar log with more information shortly afterward, so
this log line is redundant and was being emitted for every slot.
2023-01-24 20:29:14 -06:00
Trent Nelson c4e43f1de4
vote: encapsulate `Lockout` (#29753) 2023-01-18 19:28:28 -07:00
Lijun Wang 1e8a8e07b6
Stream the executed transaction count in the block notification (#29272)
Problem

The plugins need to know when all transactions for a block have been notified in order to serve getBlock requests correctly. Because block and transaction notifications are sent asynchronously with respect to each other, this is difficult to determine.

Summary of Changes

Include the executed transaction count in the block notification, which plugins can use to check whether all transactions have been notified.
2023-01-05 09:36:19 -08:00
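On the plugin side, the executed transaction count makes a simple completeness check possible even though block and transaction notifications arrive in any order. The tracker below is a hypothetical sketch; `BlockCompleteness` and its methods are illustrative names, not part of the geyser plugin interface.

```rust
use std::collections::HashMap;

/// Hypothetical plugin-side tracker. Notifications can arrive in any
/// order, so both sides are recorded per slot.
#[derive(Default)]
struct BlockCompleteness {
    notified_txs: HashMap<u64, u64>,
    executed_counts: HashMap<u64, u64>,
}

impl BlockCompleteness {
    fn on_transaction(&mut self, slot: u64) {
        *self.notified_txs.entry(slot).or_insert(0) += 1;
    }

    /// `executed_transaction_count` is the value carried by the block notification.
    fn on_block(&mut self, slot: u64, executed_transaction_count: u64) {
        self.executed_counts.insert(slot, executed_transaction_count);
    }

    /// A slot can be served for getBlock once its block notification has
    /// arrived and every transaction it counts has been seen.
    fn is_complete(&self, slot: u64) -> bool {
        match self.executed_counts.get(&slot) {
            Some(&expected) => self.notified_txs.get(&slot).copied().unwrap_or(0) == expected,
            None => false,
        }
    }
}
```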
Ashwin Sekar f2ba16ee87
Plumb dumps from replay_stage to repair (#29058)
* Plumb dumps from replay_stage to repair

When dumping a slot from replay_stage as a result of duplicate or
ancestor hashes, properly update repair subtrees to keep weighting and
forks view accurate.

* add test

* pr comments
2022-12-25 09:58:30 -07:00
Jason Davis 8f24ceffbd Removed Arcs from PohConfig parameters and pass the struct by reference only 2022-12-07 10:52:07 -06:00
behzad nouri df7fd8ae5f
patches rust code formatting in core/src/replay_stage.rs (#29123) 2022-12-06 22:09:57 +00:00
behzad nouri 9524c9dbff patches errors from clippy::uninlined_format_args
https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
2022-12-06 19:32:15 +00:00
behzad nouri 9433c06745 patches errors from clippy::unchecked_duration_subtraction
https://rust-lang.github.io/rust-clippy/master/index.html#unchecked_duration_subtraction
2022-12-06 19:32:15 +00:00
Ashwin Sekar edacd3c411
Add dump_node to update stake for heaviest subtrees (#28827)
* Add dump_node to update stake for heaviest subtrees

Additionally refactor subtrees to store children as a hashset

* Add a more complicated forks test

* chose -> choose

* remove is_dumped flag and reuse latest_invalid_ancestor instead
2022-11-30 09:26:13 -08:00
Maximilian Schneider c8b0c3ede9
Update cost model to use requested_cu instead of estimated cu #27608 (#28281)
* Update cost model to use requested_cu instead of estimated cu #27608

* remove CostUpdate and CostModel from replay/tvu

* revive cost update service to send cost tracker stats

* CostModel is now static

* remove unused package

Co-authored-by: Tao Zhu <tao@solana.com>
2022-11-22 11:55:56 -06:00
Ashwin Sekar ddf4ff2d26
Repair service documentation (#28592)
* repair doc update

* tree_root rename

* remove extra todo
2022-11-16 02:38:07 +00:00
Brooks Prumo d1ba42180d
clippy for rust 1.65.0 (#28765) 2022-11-09 19:39:38 +00:00
Ashwin Sekar ae557a9eb5
Exit when stuck in an unrecoverable repair/purge loop (#28596)
* Exit when stuck in an unrecoverable repair/purge loop

* add tests
2022-10-27 20:06:06 -07:00
steviez 39fa297bf6
Report total_transactions in replay-slot-stats (#28382)
We have transactions counted in replay-slot-end-to-end-stats, but that
metric is broken down to report things per thread.

So, report total_transactions for the entire slot (all threads) in
replay-slot-stats.
2022-10-15 14:07:03 +01:00
carllin 14a415ccf3
Consensus Logging (#28176) 2022-10-03 20:45:55 -05:00
Justin Starry c2bb2b8e60
Allow validators to reset to the slot which matches their last voted slot (#28172)
* Add failing test

* Allow resetting to duplicate was confirmed

* feedback

* feedback

* bump

* simplify change

* Revert "simplify change"

This reverts commit 72e5de3e5bdac595f71dc7fc01650ca3bc7da98e.

* update comment

* Update core/src/replay_stage.rs
2022-10-03 16:49:47 +08:00
behzad nouri 9ee53e594d
patches clippy errors from new rust nightly release (#28028) 2022-09-23 20:57:27 +00:00