Commit Graph

3693 Commits

Author SHA1 Message Date
Xiang Zhu f3e94ca73c
AHV processes the snapshot dirs in place (#30978)
* AHV processes the snapshot dirs in place

Let account pacakge use the snapshot dir, so AHV computes the accounts hash and turns the pre snapshot dir into a post snapshot dir

* fix status cache path to maintain the archive layout for the in-place snapshot dir archiving

* fix test_package_snapshots

* Fix test_concurrent_snapshot_packaging

* Remove debug change.

* Fix snapshot_links path

* change to borrow for bank_snapshots_dir

* Reverted changes in create_and_verify_snapshot

* Fix param errors

* Fix rebase errors

* Remove NOTE 1

* Remove unwrap

* Remove the variables to make it apparent taht snapshot_links is the bank_snapshots_dir

* Use soft link instead of hard link for snapshot and status cache

* After switching to soft symlinking, the src path should be absolute
2023-04-26 11:48:48 -07:00
Andrew Fitzgerald 2d91c52373
derive Debug for multi_iterator_scanner::ProcessingDecision (#31312) 2023-04-26 08:53:31 -07:00
Brooks 32fb15655e
Upgrades fs_extra to v1.3.0 (#31341) 2023-04-25 18:29:19 -04:00
steviez 758bc1ca75
Make ReplayStage panic before dumping repeated-repair-attempt slots (#31333)
When ReplayStage repeatedly fails to compute the correct for a block
after purging and repairing, it panics on the assumption that something
is very wrong and will require human intervention.

If this is the case, there is typically something to be debugged, and
having the slot available locally is valuable. This change does the
retry check that will panic before purging the failure slot.
2023-04-25 11:50:47 -05:00
Xiang Zhu 16f3dcd5d2
Update fn create_and_verify_snapshot (#31245)
* only 1 snapshot per archive in create_and_verify_snapshot

* Update create_and_verify_snapshot with the newer funtion calls

* Fix test_package_snapshots

* Remove account path access change

* Rename slot to num_snapshots

* Remove unncessary purge_old_bank_snapshots in test

* Update non-deterministic format comment

* Cleanup unnecessary hash calls

* Use get_accounts_hash

* Remove extra declaration

* Remove rehash

* Remove clean_accounts

* Revert "Cleanup unnecessary hash calls"

This reverts commit 06b1457462cf6d7acf62e0e5531633caf5d9fc58.

Removing unncessary hash calls should be done for create_and_verify_snapshot,
not bank_to_full_snapshot_archive

* Fix typo appenedvecs

* Remove bank_snapshots_dir after rebasing
2023-04-24 18:52:50 -07:00
behzad nouri 1b08d01a80
removes shred_version from LegacyContactInfo public interface (#31304)
Working towards LegacyContactInfo => ContactInfo migration, the commit
adds more api parity between the two.
2023-04-24 15:19:33 +00:00
behzad nouri a88024e295
removes wallclock from LegacyContactInfo public interface (#31303) 2023-04-22 20:18:39 +00:00
Jeff Biseda 3cdd59e55f
check for prior discard in shred_fetch_stage (#31293) 2023-04-21 11:43:10 -07:00
behzad nouri cb65a785bc
makes sockets in LegacyContactInfo private (#31248)
Working towards LegacyContactInfo => ContactInfo migration, the commit
hides some implementation details of LegacyContactInfo and expands API
parity with the new ContactInfo.
2023-04-21 15:39:16 +00:00
Andrew Fitzgerald 7a393e479d
Scheduler Messages (#30976) 2023-04-19 15:14:47 -07:00
Andrew Fitzgerald 10d637d2e6
PohRecorder take Arc not &Arc for blockstore (#31234) 2023-04-19 11:41:18 -07:00
steviez 377ba53a31
Fix bug where ReplayStage holds an Arc<Bank> for process lifetime (#31267)
* Fix bug where ReplayStage holds an Arc<Bank> for process lifetime

When ReplayStage::new() kicks off, it needs to do some setup with the
working bank prior to entering the main processing loop. This setup is
done before entering the main processing loop; however, a bug made it
such that an Arc<Bank> remained in scope after the processing loop had
been entered. The processing loop is only exited when the process exits,
so this means that Bank was being held for the lifetime of the process.
This is a waste of resources and prevents background cleanup.

* clippy
2023-04-19 18:12:34 +00:00
Xiang Zhu a5275f8839
Remove bank_snapshots_dir param (#31249)
* Remove bank_snapshots_dir param

* Remove outdated comment

* Revert "Remove outdated comment"

This reverts commit e4441432bec57edb0dc22c4bacf4d48ce26ed818.

* Handle parent() error

* Fix format error
2023-04-19 09:37:46 -07:00
Andrew Fitzgerald 748220c9d3
Forwarder: Add common setup for tests (#31232) 2023-04-19 09:08:13 -07:00
Brooks ca1bde3591
Use Arc instead of &Arc in AccountsBackgroundService::new (#31268) 2023-04-19 11:10:41 -04:00
Brooks 80b27f3cd9
Use Arc instead of &Arc in AccountsHashVerifier::new (#31269) 2023-04-19 11:10:08 -04:00
Brooks 1d14156832
Use Arc instead of &Arc in SnapshotPackagerService::new (#31270) 2023-04-19 11:09:49 -04:00
bji a45710838d
Add new vote state version that replaces Lockout with LandedVote to a… (#30831)
Add new vote state version that replaces Lockout with LandedVote to allow vote latency to be tracked in a future change.

Includes a feature to be enabled which will when enabled cause the vote state to be written in the new form.
2023-04-18 20:27:38 -07:00
Trent Nelson f34a6bcfce
runtime: transpose `VoteAccount::vote_state()` return to improve ergonomics (#31256) 2023-04-18 14:48:52 -06:00
Brennan 2164a50d00
Move BankIncrementalSnapshotPersistence (#31236)
* Move BankIncrementalSnapshotPersistence

* Update bank serialize ABI digest
2023-04-18 11:18:17 -07:00
Xiang Zhu e74bc4e2e7
Add a filter_by_type param for purge_old_bank_snapshots (#31191)
* Add a type_select param for purge_old_bank_snapshots

* Use flags to make the function calls more readable

* Remove the extra purge calls

* replace select_type with filter_by_type

* Add test

* Use matches

* Fix CI test on reference

* use match and call do_purge once

* Let bank_snapshots_dir be TempDir

* remove account_paths in the test

* replace bank with _bank

* Remove create_snapshot_dirs_for_tests, will take the lastest from master

* Fix merge errors
2023-04-17 16:16:41 -07:00
Xiang Zhu 5747290d51
Move reference-holding last_snapshot_storages from ABS to AHV (#31175)
* Let AHV hold and update last_snapshot_storages

* Clean up comment

* Move cloning after enqueued_time

* Minor positon change

* Remove  type last_snapshot_storages annotation
2023-04-14 14:38:44 -07:00
Andrew Fitzgerald b657004141
store slot on BlockBatchUpdate (#31190) 2023-04-14 13:15:31 -07:00
Brooks d43e19bb03
Refactors the Full/Incremental SnapshotHash types (#31186) 2023-04-13 18:01:27 -04:00
Brooks 1f67591e21
Removes `base` from `IncrementalSnapshotHash` (#31185) 2023-04-13 17:35:35 +00:00
Brooks e05957d8fa
Push starting snapshot hashes in SnapshotGossipManager::new() (#31173) 2023-04-13 11:49:17 -04:00
Andrew Fitzgerald c847236147
decision maker perf (#30618) 2023-04-12 21:40:59 -07:00
Andrew Fitzgerald 01659edd16
Forwarder: forward_packets w/o metrics (#30925) 2023-04-12 14:09:24 -07:00
Brooks 602297e29f
Add comment in SolanaGossipMananger::update_latest_full_snapshot_hash() (#31171) 2023-04-12 16:52:23 +00:00
Brooks 1761c0947b
Removes unused arg from SnapshotGossipManager::new() (#31169) 2023-04-12 16:45:31 +00:00
Brooks 944b9d574a
Only push latest snapshot hashes in SnapshotGossipManager (#31154) 2023-04-12 11:00:26 -04:00
Brooks 965dd37924
Moves SnapshotGossipManager to its own file (#31147) 2023-04-11 15:51:52 -04:00
Brooks 453f272698
Rename IncrementalSnapshotHashes to SnapshotHashes (#31136) 2023-04-11 10:30:29 -04:00
Brooks f3083ad2e0
Rename SnapshotHashes to LegacySnapshotHashes (#31086) 2023-04-10 17:52:20 -04:00
behzad nouri ce21a58b65
reworks streamer::StakedNodes (#31082)
{min,max}_stake are computed but never assigned:
https://github.com/solana-labs/solana/blob/4564bcdc1/core/src/staked_nodes_updater_service.rs#L54-L57

The updater code is also inefficient and verbose.
2023-04-10 17:07:40 +00:00
Brooks f9276d1748
Uses MAX_ACCOUNTS_HASHES instead of MAX_SNAPSHOT_HASHES in accounts_hash_verifier.rs (#31114) 2023-04-10 10:44:40 -04:00
HaoranYi fcd1fe0959
Refactor fault hash injection into lambda (#31093)
* refactor out fault hash inject output AccountsHashVerifier

* refactor faught injector out of AccountHashVerifier

* use type alias

* Apply suggestions from code review

Co-authored-by: Brooks <brooks@prumo.org>

* move type alias

* rename

---------

Co-authored-by: Brooks <brooks@prumo.org>
2023-04-07 17:50:21 -05:00
Tao Zhu 9a7b6abc91
update benches after removing packet.sender_stake (#31110) 2023-04-07 14:27:29 -05:00
Andrew Fitzgerald 926bb0c794
MultiIteratorScanner::finalize returns (payload, already_processed) (#31054)
* finalize() returns (payload, already_processed)

* Additional testing around already_handled

* Return type wrapper and comment update
2023-04-07 11:17:36 -07:00
behzad nouri 466a9a2449
removes ip_stake_map field from streamer::StakedNodes (#31078) 2023-04-07 13:27:29 +00:00
Ryo Onodera f0432ec50f
Avoid overflow in ThreadSet::any() and nits (#31098)
Avoid overflow in ThreadSet::any and etc
2023-04-07 12:45:29 +09:00
behzad nouri 4d0abebe0e
removes Packet Meta.sender_stake and find_packet_sender_stake_stage (#31077)
Packet Meta.sender_stake is unused since
https://github.com/solana-labs/solana/pull/26512
removed sender_stake from banking-stage buffer prioritization.
2023-04-06 21:33:43 +00:00
Andrew Fitzgerald 00250819b8
ThreadAwareAccountLocks (#30422) 2023-04-06 10:12:03 -07:00
Ashwin Sekar 85dbd3d94d
Add stake breakdown to metrics for HeaviestForkFailures (#31067) 2023-04-05 20:35:12 -06:00
Brennan 60c4a718a5
enhance replay partition metrics (#31010)
* enhance replay partition metrics
2023-04-04 19:57:09 -07:00
Ashwin Sekar 9cbaf0e234
Rename DeadSlotAncestorRequestStatus -> AncestorRequestStatus (#31050) 2023-04-04 20:48:45 +00:00
Tyera 3442f184f7
Remove unneeded `clippy::new_ret_no_self` allows (#31035)
Remove unneeded allows
2023-04-03 20:35:20 -06:00
behzad nouri ff9a42a354
uses Duration type instead of untyped ..._ms: u64 (#30971) 2023-03-31 15:42:49 +00:00
steviez cc8e531a5d
Enforce a minimum of 1 on full and incremental snapshot retention (#30968) 2023-03-30 10:16:36 -05:00
Illia Bobyr fe5ae7733b
ledger: confirm_slot_entries(): confirmation_elapsed metric (#30807)
Measure total time spent inside `confirm_slot_entries()`.  Useful metric
in addition to `replay_elapsed` and
`poh_verify_elapsed`/`transaction_verify_elapsed`, as it shows how PoH
and transaction verification interact with the replay process.
2023-03-29 13:11:29 -07:00
Andrew Fitzgerald 8e910b494f
Forwarder: separate get_leader_and_addr (#30922) 2023-03-28 16:34:36 -07:00
Illia Bobyr 564f8c9b17
ledger: Extract `BatchExecutionTiming` (#30806)
Extracted time metrics related to transaction execution into a separate
structure.  This allows me to call `process_entries_with_callback()`
without locking the whole instance of `ConfirmationTiming`, passing just
the `BatchExecutionTiming` part.

I want to add a new metric that starts at the beginning of the
`confirm_slot_entries()` call and ends until the very end.  In order to
use a `scopeguard::defer`, I need to be able to have an excursive
reference to it for the whole body of `confirm_slot_entries()`.

Plus a few minor renamings to clarify which verifications and results
variables actually store.  And corrected a few messages, that
incorrectly stated PoH verification, while they were actually issued
for transaction verification failures.
2023-03-28 15:37:34 -07:00
Andrew Fitzgerald b72be0f086
Forwarder: clean up packet_vec filter (#30921) 2023-03-28 14:54:04 -07:00
Andrew Fitzgerald 2dfc46c71e
Forwarder: separate update_data_budget (#30920) 2023-03-28 12:58:39 -07:00
behzad nouri 75abfc79a6
removes unused dependencies (#30917) 2023-03-28 17:25:44 +00:00
behzad nouri 29f776c676
reports cluster-nodes metrics by stake (#30912) 2023-03-28 12:45:09 +00:00
Brooks d7ae05c3fd
Unifies logging of start/stop for background services (#30916) 2023-03-28 08:32:18 -04:00
behzad nouri 49b8ea771e
simplifies ServeRepair::run_orphan (#30908) 2023-03-27 23:19:55 +00:00
Andrew Fitzgerald a575ea2ee0
LeaderBankNotifier (#30395) 2023-03-27 08:17:17 -07:00
Brooks bf7fa02214
Add units to incremental accounts hash datapoint (#30894) 2023-03-24 21:39:20 +00:00
Andrew Fitzgerald f226a34f48
Only need bank reference for update (#30879) 2023-03-24 09:48:04 -07:00
Tao Zhu 52e63e2ffa
Allow banking_stage to update prioritization_fee_cache (#30853)
* Allow banking_stage to update prioritization_fee_cache

* Update core/src/banking_stage/committer.rs

Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>

* move use to top

---------

Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
2023-03-24 00:05:54 +00:00
behzad nouri a36e8f559c
removes hash-map lookups when sorting vote-account keys (#30878) 2023-03-23 22:29:35 +00:00
Ryo Onodera 6c444df9e0
Add --block-{verification,production}-method flags (noop atm) (#30746)
* Add --{replaying,banking}-backend flags (noop atm)

* Greatly simplify enums with strum

* Update programs/sbf/Cargo.lock...

* Rely on Display, removing Debug

* constify cli_names()

* Don't allow omitting bankend value

* Rename to --block-{verification,production}-method

* Use more specific name

* Actually support missing value....

* Remove strictly-unnecessary flags

* Use lazy_static! instead of abusing DefaultArgs...
2023-03-23 12:57:28 +09:00
Jeff Biseda 04f0311aa1
check data budget before accessing blockstore (#30809) 2023-03-22 15:56:06 -07:00
Jeff Biseda 94b27d8f96
add metric for repair request blockstore misses (#30810) 2023-03-22 15:47:11 -07:00
Illia Bobyr 809041b151
poh_verify => run_verification: Rename to be more accurate (#30811)
`poh_verify` actually disables transaction signature, tick count and
built in program argument verifications as well.  It is somewhat
confusing to call it `poh_verify`.
2023-03-22 11:03:30 -07:00
Brooks 35437b8dad
Makes AccountsHashVerifier aware of Incremental Accounts Hash (#30820) 2023-03-22 10:20:16 -04:00
behzad nouri 25b7811869
moves shreds deduper to shred-sigverify stage (#30786)
Shreds arriving at tvu/tvu_forward/repair sockets are each processed in
a separate thread, and since each thread has its own deduper, the
duplicates across these sockets are not filtered out.
Using a common deduper across these threads will require an RwLock
wrapper and may introduce lock contention.
The commit instead moves the shred-deduper to shred-sigverify-stage
where all these shreds arrive through the same channel.
2023-03-22 13:19:16 +00:00
Andrew Fitzgerald ac8c31b5d6
PohRecorder::recorder -> new_recorder (#30838) 2023-03-22 13:19:20 +09:00
Brooks b64d0de771
Makes snapshot_utils aware of Incremental Accounts Hash (#30804) 2023-03-21 16:34:30 +00:00
steviez 5344a789d7
Revert "Add new vote state version that replaces Lockout with LandedV… (#30817)
Revert "Add new vote state version that replaces Lockout with LandedVote to a… (#29524)"

This reverts commit d77f0a22c7.
2023-03-21 22:54:13 +08:00
behzad nouri e66edeb180
moves turbine-disabled check to shred-fetch-stage (#30799)
If turbine_disabled is true, the commit discards turbine packets
earlier in the pipeline so that they won't interfere with the deduper
and the packets can get through once turbine is enabled again.

This is a prerequisite of:
https://github.com/solana-labs/solana/pull/30786
so that local-cluster tests pass.
2023-03-20 20:34:41 +00:00
behzad nouri c6e7aaf96c
removes lazy-static thread-pool from sigverify-shreds (#30787)
Instead the thread-pool is passed explicitly from higher in the call
stack so that
https://github.com/solana-labs/solana/pull/30786
can use the same thread-pool for shred deduplication.
2023-03-20 20:33:22 +00:00
behzad nouri 5d9aba5548
increases retransmit-stage deduper capacity and reset-cycle (#30758)
For duplicate block detection, for each (slot, shred-index, shred-type)
we need to allow 2 different shreds to be retransmitted.
The commit implements this using two bloom-filter dedupers:
* Shreds are deduplicated using the 1st deduper.
* If a shred is not a duplicate, then we check if:
      (slot, shred-index, shred-type, k)
  is not a duplicate for either k = 0  or k = 1 using the 2nd deduper,
  and if so then the shred is retransmitted.

This allows to achieve larger capactiy compared to current LRU-cache.
2023-03-20 20:32:23 +00:00
behzad nouri 9ad77485ce
generalizes deduper to work with any hashable type (#30753)
generalizes Deduper to work with any hashable type

Current Deduper is hard-coded only for Packet type. In order to use
Deduper in retransmit-stage, we need to dedup types other than Packet.
The commit generalizes Deduper to any hashable type.
2023-03-20 18:04:46 +00:00
behzad nouri 4de59881b7
filters out merkle shreds until feature activation (#30769)
In order to maintain backward compatibility, the commit reworks merkle
shreds feature gate to off by default until the feature activation.
2023-03-20 15:44:00 +00:00
behzad nouri 4b595ebaaf
adds metrics tracking deduper saturations (#30779) 2023-03-20 15:33:36 +00:00
bji d77f0a22c7
Add new vote state version that replaces Lockout with LandedVote to a… (#29524)
* Add new vote state version that replaces Lockout with LandedVote to allow vote latency to be tracked in a future change. Includes a feature to be enabled which will when enabled cause the vote state to be written in the new form.

* Update feature set key to one owned by ashwin

---------

Co-authored-by: Ashwin Sekar <ashwin@solana.com>
2023-03-20 08:31:46 -06:00
behzad nouri 46614c0e9f
removes false_positive_rate field from Deduper (#30788)
removes the false_positive_rate field from the Deduper

Deduper.false_positive_rate field is misleading because it is not
enforced until maybe_reset is called. But then maybe_reset can be
invoked with an explicit argument.
2023-03-20 13:16:52 +00:00
Brooks cd7fe76744
Removes writing BankIncrementalSnapshotPersistence in AccountsHashVerifier (#30792) 2023-03-19 21:45:13 -04:00
Xiang Zhu 8e3a30c22c
Clean orphaned account snapshot dirs (#30645)
* Clean up orphaned account snapshot hardlink dirs

* fix compilation issues

* debugged, now working.  seeing the orphaned directories deleted

* change back to eprintln + exit for account_path error

* changed eprintln to panic for now

* add test_clean_orphaned_account_snapshot_dirs for codecov check

* address a few comments and nit isseus

* directly unzip, skipped the intermediate array of tuples

* let set_up_account_run_and_snapshot_paths return Result

* 'proper' typo, and comment on return

* use map_err

* use for loop in clean_orphaned_account_snapshot_dirs, removed panic

* add test_set_up_account_run_and_snapshot_paths

* minor, replace .for_each with .all

* rename set_up_account_run_and_snapshot_paths to create_all_accounts_run_and_snapshot_dirs

* remove unnecessary closure return type

* change to for loop

* change match to unwrap_or_else

* remove create_dir_all(&account_path) in create_all

* minor comment cleanup
2023-03-17 15:22:10 -07:00
Brooks af367db6f0
Calculates accounts hash from storages in snapshot tests (#30778) 2023-03-17 15:22:02 -04:00
behzad nouri 93f696dac7
increases shred-fetch-stage deduper capacity and reset-cycle (#30690) 2023-03-17 00:05:29 +00:00
cavemanloverboy 10f49d4e26
Geyser Runtime Reload (#30352)
Support dynamic geyser plugin load, unload, and listing through admin RPC.
2023-03-16 17:03:00 -07:00
behzad nouri 7a7b020580
dedups packets using an atomic bloom filter (#30726)
Current Deduper implementation uses many bits per entry:
https://github.com/solana-labs/solana/blob/65cd55261/perf/src/deduper.rs#L70-L73
and may be saturated quickly. It also lacks api to specify desired false
positive rate.

The commit instead uses an atomic bloom filter with K hash functions.
The false positive rate is obtained by tracking popcount of bits.
2023-03-16 16:45:42 +00:00
ryleung-solana 0ed9f62602
Quic server batching (#30330) 2023-03-16 21:50:57 +08:00
Brooks 6bdbd2dfec
Removes unnecessary AccountsHashVerifier from snapshot tests (#30738) 2023-03-16 09:17:42 -04:00
Brooks 423fd6010e
Removes extraneous accounts hash calculations in snapshot tests (#30737) 2023-03-16 09:17:03 -04:00
Andrew Fitzgerald b7e76c752f
Separate stats updates from decision_maker (#30481)
* Separate stats updates from decision_maker

* BufferedPacketsDecision::bank_start

* BufferedPacketsDecision: bank_start() doc-comment

* remove unnecessary clone
2023-03-15 19:39:48 -07:00
Ryo Onodera 74970a0b5d
Remove unused ProcessOptions::entry_callback (#30600)
* Confine entry_callback under cfg(test) for clarity

* Fix ci

* Actually remove entry_callback altogether

* fix clippy
2023-03-16 09:33:18 +09:00
Brooks a5f86a8212
Verifies accounts hash in snapshot tests (#30724) 2023-03-15 12:23:44 -04:00
Tyera b389d509a8
Track max_complete_rewards_slot for use in rpc, bigtable (#30698)
* Add RewardsMessage enum

* Cache and update max_complete_rewards_slot

* Plumb max_complete_rewards_slot into JsonRpcRequestProcesseor

* Use max_complete_rewards_slot to check get_block requests

* Use max_complete_rewards_slot to limit Bigtable uploads

* Plumb max_complete_rewards_slot into RpcSubscriptions

* Use max_complete_rewards_slot to limit block subscriptions

* Nit: fix test
2023-03-14 12:08:48 -06:00
Brooks 93c43610ac
AccountsHashVerifier stores IncrementalAccountsHash in AccountsDb (#30696) 2023-03-14 12:41:44 -04:00
Brennan 9b587bf073
Create gossip vote iterator sorted by stake weight (#30697) 2023-03-14 08:43:01 -07:00
Brennan 11c942ab40
`test_verified_vote_packets_validator_gossip_votes_iterator_correct_fork` dynamic num validators (#30695)
Gossip vote test dynamic number of validators cleanup
2023-03-14 08:35:26 -07:00
Brooks 560ec08d5e
AccountsHashVerifier writes BankIncrementalSnapshotPersistence (#30587) 2023-03-13 17:44:34 -04:00
Brooks 346021a48c
Refactors common accounts hash calculation config in AccountsHashVerifier (#30677)
* Refactors common accounts hash calculation config in AccountsHashVerifier

* pr: config var
2023-03-13 19:39:28 +00:00
Brooks 6e5615e32d
Revert "AccountsHashVerifier remembers last full snapshot info (#30582)" (#30660) 2023-03-13 14:48:16 -04:00
Brooks 505e3ff5c7
AccountsHashVerifier updates AccountsDb after calculating accounts hash (#30658) 2023-03-13 16:41:24 +00:00