Commit Graph

3994 Commits

Author SHA1 Message Date
Trent Nelson a2579d484e
remove raptor coding experiments (#255) 2024-03-14 22:01:15 -06:00
Wen 5591db7801
Wen_restart: check block full using blockstore (#250)
* Switch to blockstore.is_full() check because replay thread isn't active.

* Use make_chaining_slot_entries and add first_parent to the method.
Small style fixes.

* Switch to blockstore.is_full() check because replay thread isn't active.
2024-03-14 20:45:03 -07:00
steviez 7a144e2b9f
Make ReplayStage own the threadpool for tx replay (#190)
The threadpool used to replay multiple transactions in parallel is
currently global state via a lazy_static definition. Making this pool
owned by ReplayStage will enable subsequent work to make the pool
size configurable on the CLI.

This makes `ReplayStage` create and hold the threadpool which is passed
down to blockstore_processor::confirm_slot().

blockstore_processor::process_blockstore_from_root() now creates its
own threadpool as well; however, this pool is only alive for the
scope of that function and does not persist for the lifetime of the
process.
2024-03-12 13:21:11 -05:00
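
Note on #190: the ownership change described in that commit can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the commit. `ReplayPool`, its `map` method, the `u64` "transactions", and every signature here are hypothetical stand-ins; only the names `ReplayStage`, `confirm_slot`, and `process_blockstore_from_root` come from the commit message.

```rust
use std::thread;

/// Hypothetical stand-in for the transaction replay thread pool.
/// It only remembers a size and fans work out over scoped threads.
pub struct ReplayPool {
    num_threads: usize,
}

impl ReplayPool {
    pub fn new(num_threads: usize) -> Self {
        assert!(num_threads > 0, "pool needs at least one thread");
        Self { num_threads }
    }

    /// Applies `f` to every item in parallel chunks, keeping input order.
    pub fn map<T, R, F>(&self, items: &[T], f: F) -> Vec<R>
    where
        T: Sync,
        R: Send,
        F: Fn(&T) -> R + Sync,
    {
        if items.is_empty() {
            return Vec::new();
        }
        // Ceiling division so that at most `num_threads` chunks are created.
        let chunk_len = (items.len() + self.num_threads - 1) / self.num_threads;
        let f = &f;
        thread::scope(|scope| {
            // Spawn all workers first, then join them in order.
            let handles: Vec<_> = items
                .chunks(chunk_len)
                .map(|chunk| scope.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
                .collect();
            handles
                .into_iter()
                .flat_map(|handle| handle.join().expect("worker panicked"))
                .collect()
        })
    }
}

/// The stage owns the pool instead of reaching for global state, so the
/// size can later be chosen by the caller (for example from a CLI flag).
pub struct ReplayStage {
    replay_tx_pool: ReplayPool,
}

impl ReplayStage {
    pub fn new(num_threads: usize) -> Self {
        Self { replay_tx_pool: ReplayPool::new(num_threads) }
    }

    pub fn replay_slot(&self, txs: &[u64]) -> Vec<u64> {
        // The pool is passed down explicitly to the processing function.
        confirm_slot(txs, &self.replay_tx_pool)
    }
}

/// Hypothetical signature: "replaying" a transaction just doubles it.
pub fn confirm_slot(txs: &[u64], pool: &ReplayPool) -> Vec<u64> {
    pool.map(txs, |tx| tx * 2)
}

/// Creates a short-lived pool that is dropped when the function returns.
pub fn process_blockstore_from_root(txs: &[u64]) -> Vec<u64> {
    let pool = ReplayPool::new(2);
    confirm_slot(txs, &pool)
}
```

Passing the pool by reference keeps `confirm_slot` independent of who created it, which is what lets the long-lived stage and the startup-only path each bring their own pool.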
Brooks 88f6a7a459
Removes holding storages in AccountsHashVerifier for fastboot (#120) 2024-03-11 17:09:26 -04:00
steviez bf0a3684eb
Make ReplayStage create the parallel fork replay threadpool (#137)
ReplayStage owning the pool allows for subsequent work to configure
the size of the pool; configuring the size of the pool inside of the
lazy_static would have been a little messy
2024-03-08 12:52:35 -06:00
Lucas Steuernagel e027a8bd63
Gather recording booleans in a data structure (#134) 2024-03-08 09:28:04 -03:00
Dmitri Makarov ba43f74dcf
[SVM] Move RuntimeConfig to program-runtime (#96)
RuntimeConfig doesn't use anything SVM specific and logically belongs
in program runtime rather than SVM.  This change moves the definition
of RuntimeConfig struct from the SVM crate to program-runtime and
adjusts `use` statements accordingly.
2024-03-07 10:16:16 -08:00
Tao Zhu 8f3f06cc7f
Combine builtin and BPF compute cost in cost model (#29)
* Combine builtin and BPF execution cost into programs_execution_cost since VM has started to consume CUs uniformly

* update tests

* apply suggestions from code review
2024-03-07 09:23:49 -06:00
steviez 1110fc93d7
Give SigVerify and ShredFetch threads unique names (#98)
- solTvuFetchPmod ==> solTvuPktMod + solTvuRepPktMod
- solSigVerifier ==> solSigVerTpu + solSigVerTpuVot
2024-03-05 22:02:04 -06:00
steviez ce34f3f014
Rename and uniquify QUIC thread names (#28)
When viewing in various tools such as gdb and perf, it is not easy to
distinguish which threads are serving which function (TPU or TPU FWD)
2024-03-05 12:09:17 -06:00
Brooks 93f5b514fa
Adds StartingSnapshotStorages to AccountsHashVerifier (#58) 2024-03-04 16:32:51 -05:00
Wen bfe44d95f4
Wen restart aggregate last voted fork slots (#33892)
* Push and aggregate RestartLastVotedForkSlots.

* Fix API and lint errors.

* Reduce clutter.

* Put my own LastVotedForkSlots into the aggregate.

* Write LastVotedForkSlots aggregate progress into local file.

* Fix typo and name constants.

* Fix flaky test.

* Clarify the comments.

* - Use constant for wait_for_supermajority
- Avoid waiting after first shred when repair is in wen_restart

* Fix delay_after_first_shred and remove loop in wen_restart.

* Read wen_restart slots inside the loop instead.

* Discard turbine shreds while in wen_restart in windows insert rather than
shred_fetch_stage.

* Use the new Gossip API.

* Rename slots_to_repair_for_wen_restart and a few others.

* Rename a few more and list all states.

* Pipe exit down to aggregate loop so we can exit early.

* Fix import of RestartLastVotedForkSlots.

* Use the new method to generate test bank.

* Make linter happy.

* Use new bank constructor for tests.

* Fix a bad merge.

* - add new const for wen_restart
- fix the test to cover more cases
- add generate_repairs_for_slot_not_throtted_by_tick and
  generate_repairs_for_slot_throtted_by_tick to make it readable

* Add initialize and put the main logic into a loop.

* Change aggregate interface and other fixes.

* Add failure tests and tests for state transition.

* Add more tests and add ability to recover from written records in
last_voted_fork_slots_aggregate.

* Various name changes.

* We don't really care what type of error is returned.

* Wait on expected progress message in proto file instead of sleep.

* Code reorganization and cleanup.

* Make linter happy.

* Add WenRestartError.

* Split WenRestartErrors into separate errors per state.

* Revert "Split WenRestartErrors into separate errors per state."

This reverts commit 4c920cb8f8d492707560441912351cca779129f6.

* Use individual functions when testing for failures.

* Move initialization errors into initialize().

* Use anyhow instead of thiserror to generate backtrace for error.

* Add missing Cargo.lock.

* Add error log when last_vote is missing in the tower storage.

* Change error log info.

* Change test to match exact error.
2024-03-01 18:52:47 -08:00
steviez 7d6f1d5911
Give streamer::receiver() threads unique names (#35369)
The name was previously hard-coded to solReceiver. The use of the same
name makes it hard to figure out which thread is which when these
threads are handling many services (Gossip, Tvu, etc).
2024-03-01 13:36:08 -06:00
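
Note on #35369: giving each receiver thread its own name can be sketched with the standard library's `std::thread::Builder`. The function `spawn_receiver` and the example name `solRcvrGossip` are hypothetical; the commit message states only that the name was previously hard-coded to solReceiver.

```rust
use std::thread::{self, JoinHandle};

/// Spawns a thread with a caller-supplied name instead of a hard-coded one.
/// The thread body here only reports its own name; a real receiver would
/// read packets from a socket.
pub fn spawn_receiver(thread_name: String) -> JoinHandle<Option<String>> {
    thread::Builder::new()
        .name(thread_name)
        .spawn(|| thread::current().name().map(str::to_owned))
        .expect("failed to spawn receiver thread")
}
```

A caller would pass a distinct name per service, for example `spawn_receiver("solRcvrGossip".to_string())`, so that tools listing threads by name can tell the services apart.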
Andrew Fitzgerald ede9163633
Comments clarifying non-emptiness of threadset (#35388) 2024-03-01 11:18:42 -08:00
steviez 7c878973e2
Cleanup ReplayStage loop timing struct (#35361)
- Track loop_count in the struct
- Rename ReplayTiming ==> ReplayLoopTiming
- Make all metrics consistent to end with "_elapsed_us"
2024-03-01 12:30:50 -06:00
Pankaj Garg 990ca1d0b8
Add limit to looping in banking-stage (#35342) 2024-02-28 17:36:45 -08:00
Brooks 6aaaf858c9
Adds more info to panic message in AccountsHashVerifier (#35353) 2024-02-28 15:55:05 -05:00
behzad nouri a7a41e7631
adds Merkle shred variant with retransmitter's signature (#35293)
Moving towards locking down Turbine propagation path, the commit
reserves a buffer within shred payload for retransmitter's signature.
2024-02-28 20:31:40 +00:00
steviez 140818221c
Rename SamplePerformanceService thread for consistency (#35332)
- Rename thread
- Add uniform service start/stop logs
- Misc cleanup with variables / constants / exit flag check
2024-02-28 13:47:27 -06:00
Andrew Fitzgerald 9f581113bd
Scheduler: Leader-Slot metrics for Scheduler (#35087) 2024-02-23 17:06:22 -08:00
enjoyoor c02f47a6fb
fix: cleanup (#35298) 2024-02-23 19:59:52 +00:00
Tao Zhu 139b9c8c25
Add fee_details to fee calculation (#35021)
* add fee_details to fee calculation

* fix - no need to round after summing u64

* feature gate on removing unwanted rounding
2024-02-23 08:58:48 -06:00
Andrew Fitzgerald 367f489f63
scheduler inner metrics (#35271) 2024-02-22 15:01:08 -08:00
Ashwin Sekar 07955e79ad
replay: gracefully exit if tower load fails (#35269) 2024-02-21 18:51:30 -08:00
Ryo Onodera 024d6ecc4f
Add --unified-scheduler-handler-threads (#35195)
* Add --unified-scheduler-handler-threads

* Adjust value name

* Warn if the flag was ignored

* Tweak message a bit
2024-02-22 09:05:17 +09:00
DimAn 531793b4be
validator: ignore too old tower error (#35229)
* validator: ignore too old tower error

* Update core/src/replay_stage.rs

Co-authored-by: Ashwin Sekar <ashwin@solana.com>

* remove redundant references

---------

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
2024-02-21 13:23:23 -05:00
steviez 4905076fb6
Remove channel that sends roots to BlockstoreCleanupService (#35211)
Currently, ReplayStage sends new roots to BlockstoreCleanupService, and
BlockstoreCleanupService decides when to clean based on advancement of
the latest root. This is totally unnecessary as the latest root is
cached by the Blockstore, and this value can simply be fetched.

This change removes the channel completely, and instead just fetches
the latest root from Blockstore directly. Moreover, some logic is added
to check the latest root less frequently, based on the set purge
interval.

All in all, we went from sending > 100 slots/min across a crossbeam
channel to reading an atomic roughly 3 times/min, while also removing
the need for an additional thread that read from the channel.
2024-02-21 10:16:16 -06:00
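
Note on #35211: the idea of replacing a channel of roots with a periodic read of a cached atomic can be sketched as below. This is an assumption-laden illustration, not the code from the commit. The `Blockstore` here is a toy struct, and `CleanupState`, `check`, `set_root`, `max_root`, and the purge rule are hypothetical names and logic chosen for the example.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Toy blockstore that caches the latest root in an atomic.
#[derive(Default)]
pub struct Blockstore {
    max_root: AtomicU64,
}

impl Blockstore {
    /// Records a new root; the cached value never moves backwards.
    pub fn set_root(&self, slot: u64) {
        self.max_root.fetch_max(slot, Ordering::Relaxed);
    }

    /// Cheap read of the cached latest root.
    pub fn max_root(&self) -> u64 {
        self.max_root.load(Ordering::Relaxed)
    }
}

/// Hypothetical cleanup bookkeeping: no channel, only the last purged root.
pub struct CleanupState {
    last_purge_root: u64,
    purge_interval: u64,
}

impl CleanupState {
    pub fn new(purge_interval: u64) -> Self {
        Self { last_purge_root: 0, purge_interval }
    }

    /// Returns the root to purge up to once the root has advanced by more
    /// than `purge_interval` slots since the last purge, otherwise `None`.
    pub fn check(&mut self, blockstore: &Blockstore) -> Option<u64> {
        let root = blockstore.max_root();
        if root.saturating_sub(self.last_purge_root) > self.purge_interval {
            self.last_purge_root = root;
            Some(root)
        } else {
            None
        }
    }
}
```

The cleanup loop would call `check` on its own schedule, so the producer of roots no longer needs to know the cleanup service exists.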
steviez 5c04a9731c
Obtain BankForks read lock once to get ancestors and descendants (#35273)
No need to get the read lock twice; instead, hold it and get both items
2024-02-21 10:07:57 -06:00
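
Note on #35273: taking the read lock once and fetching both items under the same guard can be sketched with `std::sync::RwLock`. The `BankForks` below is a toy model; its fields, the `insert` helper, and `snapshot_forks` are hypothetical and exist only to make the example self-contained.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::RwLock;

type Slot = u64;
type SlotMap = HashMap<Slot, HashSet<Slot>>;

/// Toy fork tracker holding ancestor and descendant sets per slot.
#[derive(Default)]
pub struct BankForks {
    ancestors: SlotMap,
    descendants: SlotMap,
}

impl BankForks {
    /// Registers `child` as a direct child of `parent`.
    pub fn insert(&mut self, parent: Slot, child: Slot) {
        // The child's ancestors are the parent's ancestors plus the parent.
        let mut child_ancestors = self.ancestors.get(&parent).cloned().unwrap_or_default();
        child_ancestors.insert(parent);
        // Every ancestor gains the child as a descendant.
        for ancestor in &child_ancestors {
            self.descendants.entry(*ancestor).or_default().insert(child);
        }
        self.ancestors.insert(child, child_ancestors);
    }

    pub fn ancestors(&self) -> SlotMap {
        self.ancestors.clone()
    }

    pub fn descendants(&self) -> SlotMap {
        self.descendants.clone()
    }
}

/// Acquires the read lock a single time and reads both maps under it, so
/// the two results are consistent with each other.
pub fn snapshot_forks(bank_forks: &RwLock<BankForks>) -> (SlotMap, SlotMap) {
    let guard = bank_forks.read().expect("bank forks lock poisoned");
    (guard.ancestors(), guard.descendants())
}
```

Besides saving one lock acquisition, holding a single guard means no writer can slip in between the two reads.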
Andrew Fitzgerald cd4cf814fc
Scheduler: Separate scheduler metrics module (#35216) 2024-02-20 19:39:00 -08:00
Ashwin Sekar b0134ab04d
validator: include waited_for_supermajority in startup metric (#35137) 2024-02-20 16:13:57 -08:00
Dmitri Makarov 0acee67891
SVM: move transaction_results from accounts-db to SVM (#35183)
SVM: Remove accounts-db deps in accounts_loader tests
2024-02-20 12:54:56 -08:00
Ashwin Sekar befe8b9d98
replay: reload tower if set-identity during startup (#35173)
* replay: reload tower if set-identity during startup

* pr feedback: add unit tests

* pr feedback: use tower.node_pubkey, more descriptive names
2024-02-20 09:30:46 -08:00
HaoranYi ebf60359f4
clean up dev-context-only attribute (#35201)
Co-authored-by: HaoranYi <haoran.yi@solana.com>
2024-02-19 07:56:27 -06:00
sakridge e21251090f
Remove spammy banking-stage retryable tx metric which is not needed (#35207)
Already covered by other metrics like the filtered retryable and the
number filtered.
2024-02-16 18:29:42 +01:00
steviez 897adb2711
Update the directory naming for incorrect shred version backup (#35158)
The directory is currently named with the expected_shred_version;
however, the backup contains shreds that do NOT match the
expected_shred_version. So, use the found (incorrect) shred version in
the name instead.
2024-02-13 09:42:05 -07:00
Andrew Fitzgerald 1517d22ecc
Scheduler - prioritization fees/cost (#34888) 2024-02-09 08:51:21 -08:00
Dmitri Makarov 245d1c4087
SVM: Move TransactionCheckResult definition from accounts-db to SVM (#35153) 2024-02-08 21:13:00 -05:00
Dmitri Makarov 2c0001b530
SVM: Move RewardInfo from accounts-db to Solana SDK (#35120) 2024-02-07 10:55:39 -08:00
Pankaj Garg 46b9586630
SVM: Move SVM code to its own crate folder (#35119) 2024-02-06 16:06:32 -08:00
Pankaj Garg 10defb161f
SVM: Move TransactionErrorMetrics to SVM folder (#35112) 2024-02-06 11:15:48 -08:00
Ashwin Sekar 3e24b410fb
replay: votes made before restart are eligible for refresh (#34737)
* replay: votes made before restart are eligible for refresh

* pr feedback: rename to mark

* pr feedback: limit scope to non voting validators
2024-02-06 11:09:59 -08:00
Andrew Fitzgerald 9dca15a5b7
Rename priority to compute_unit_price (#35062)
* rename several priorities to compute_unit_price

* TransactionPriorityDetails -> ComputeBudgetDetails

* prioritization_fee_cache: fix comment

* transaction_state: fix comments and variable names

* immutable_deserialized_packet: fix comment
2024-02-05 16:41:01 -08:00
Ashwin Sekar 0e4e81a44c
banking stage: remove spammy packet conversion metric (#35014) 2024-02-05 14:46:32 -08:00
Pankaj Garg 3cf5dd2afb
SVM: Move RuntimeConfig to svm folder (#35085) 2024-02-05 13:49:36 -08:00
Brooks f62293918d
Moves the async deleter code to accounts-db (#35040) 2024-02-02 09:21:26 -05:00
galactus 35f900b03b
Metrics prioritization fees (#34653)
* Adding metrics for prioritization fees min/max per thread

* Adding scheduled transaction prioritization fees to the metrics

* Changes after Andrew's comments

* Fixing Tao's comments

* Adding metrics to the new scheduler

* Fixing getting of min max for TransactionStateContainer

* Fix clippy CI Issue

* Changes after Andrew's comments about min/max for new scheduler

* Creating a new structure to store prio fee metrics

* Reporting with prio fee stats banking_stage_scheduler_counts

* merging prioritization stats into SchedulerCountMetrics

* Minor changes after Andrew's review
2024-02-01 15:06:45 -06:00
Brooks daa2449ad4
Removes RwLock on AccountsDb::shrink_paths (#35027) 2024-02-01 09:35:34 -05:00
Lijun Wang 8fde8d26c7
don't sign X.509 certs (#34896)
This gets rid of the third-party component rcgen in the path of private key access to make the code more secure.
2024-01-28 16:17:46 -08:00
behzad nouri 79bbe4381a
adds chained_merkle_root to shredder arguments (#34952)
Working towards chaining Merkle root of erasure batches, the commit adds
chained_merkle_root to shredder arguments.
2024-01-27 15:04:31 +00:00
behzad nouri d4fdcd940a
adds feature to enable chained Merkle shreds (#34916)
During a cluster upgrade when only half of the cluster can ingest the new shred
variant, sending shreds of the new variant can cause nodes to diverge.
The commit adds a feature to enable chained Merkle shreds explicitly.
2024-01-27 15:03:16 +00:00