Commit Graph

1393 Commits

Author SHA1 Message Date
Wen 312f725f1e
wen_restart: Find the bank hash of the heaviest fork, replay if necessary. (#420)
* Find the bank hash of the heaviest fork, replay if necessary.

* Make it more explicit how heaviest fork slot is selected.

* Use process_single_slot instead of process_blockstore_from_root, the latter
may re-insert banks already frozen.

* Put BlockstoreProcessError into the error message.

* Check that all existing blocks link to correct parent before replay.

* Use the default number of threads instead.

* Check whether block is full and other small fixes.

* Fix root_bank and move comments to function level.

* Remove the extra parent link check.
2024-04-07 16:17:52 -07:00
Joe C 03ef611f0c
program-runtime: hoist `RuntimeConfig` up to SVM (#630)
program-runtime: hoist `RuntimeConfig` out into SVM
2024-04-07 10:45:57 -05:00
Brooks 8822aaa67e
Do not purge all bank snapshots after fastboot (#345) 2024-03-28 15:10:50 -04:00
Brooks 182d27f718
Checks if bank snapshot is loadable before fastbooting (#343) 2024-03-28 11:14:23 -04:00
steviez 10d06773cd
Share the threadpool for tx execution and entry verification (#216)
Previously, entry verification had a dedicated threadpool used to verify
PoH hashes as well as some basic transaction verification via
Bank::verify_transaction(). It should also be noted that the entry
verification code provides logic to offload to a GPU if one is present.

Regardless of whether a GPU is present or not, some of the verification
must be done on a CPU. Moreover, the CPU verification of entries and
transaction execution are serial operations; entry verification finishes
first before moving onto transaction execution.

So, tx execution and entry verification are not competing for CPU cycles
at the same time and can use the same pool.

One exception to the above statement is that if someone is using the
feature to replay forks in parallel, then hypothetically, different
forks may end up competing for the same resources at the same time.
However, that is already true given that we had pools that were shared
between replay of multiple forks. So, this change doesn't really change
much for that case, but will reduce overhead in the single fork case
which is the vast majority of the time.
2024-03-27 16:33:21 -05:00
Ashwin Sekar cfd5b71b28
shred: expose chained merkle root (#435)
* shred: expose chained merkle root

* pr feedback: macro, pub(super), _=> none
2024-03-27 10:06:43 -07:00
Pankaj Garg cc3afa5588
Remove public visibility of program cache from bank (#279) 2024-03-17 15:29:20 -07:00
Wen 5591db7801
Wen_restart: check block full using blockstore (#250)
* Switch to blockstore.is_full() check because replay thread isn't active.

* Use make_chaining_slot_entries and add first_parent to the method.
Small style fixes.
2024-03-14 20:45:03 -07:00
steviez 7a144e2b9f
Make ReplayStage own the threadpool for tx replay (#190)
The threadpool used to replay multiple transactions in parallel is
currently global state via a lazy_static definition. Making this pool
owned by ReplayStage will enable subsequent work to make the pool
size configurable on the CLI.

This makes `ReplayStage` create and hold the threadpool which is passed
down to blockstore_processor::confirm_slot().

blockstore_processor::process_blockstore_from_root() now creates its
own threadpool as well; however, this pool is only alive for the scope
of that function and does not persist for the lifetime of the
process.
2024-03-12 13:21:11 -05:00
Brooks 88f6a7a459
Removes holding storages in AccountsHashVerifier for fastboot (#120) 2024-03-11 17:09:26 -04:00
Lucas Steuernagel e027a8bd63
Gather recording booleans in a data structure (#134) 2024-03-08 09:28:04 -03:00
steviez 26692e6664
blockstore: Remove unnecessary function and threadpool (#122)
In a previous change, we removed the threadpool used to fetch entries
in parallel in favor of combining all fetches into a single rocksdb
multi_get() call.

This change does the same thing, except for a threadpool that was used
to fetch entries when we needed them to purge the transaction status
and address signatures columns.
2024-03-07 16:06:31 -06:00
Dmitri Makarov ba43f74dcf
[SVM] Move RuntimeConfig to program-runtime (#96)
RuntimeConfig doesn't use anything SVM specific and logically belongs
in program runtime rather than SVM.  This change moves the definition
of RuntimeConfig struct from the SVM crate to program-runtime and
adjusts `use` statements accordingly.
2024-03-07 10:16:16 -08:00
Ashwin Sekar 6263537bf0
blockstore_purge: fix inspect -> inspect_err (#66) 2024-03-05 01:16:31 +00:00
Brooks 93f5b514fa
Adds StartingSnapshotStorages to AccountsHashVerifier (#58) 2024-03-04 16:32:51 -05:00
Yihau Chen 3f9a7a52ea [anza migration] rename crates (#10)
* rename geyser-plugin-interface

* rename cargo registry

* rename watchtower

* rename ledger tool

* rename validator

* rename install

* rename geyser plugin interface when patch
2024-03-03 12:31:24 +08:00
Ashwin Sekar cc4072bce8
blockstore: atomize slot clearing, relax parent slot meta check (#35124)
* blockstore: atomize slot clearing, relax parent slot meta check

clear_unconfirmed_slot can leave blockstore in an irrecoverable state
if it panics in the middle. perform the clearing in a single write
batch, so that any errors can be recovered from after restart.

additionally relax the constraint that the parent slot meta must exist,
as it could have been cleaned up if outdated.

* pr feedback: use PurgeType, don't pass slot_meta

* pr feedback: add unit test

* pr feedback: refactor into separate function

* pr feedback: add special columns to helper, err msg, comments

* pr feedback: reword comments and write batch error message

* pr feedback: bubble write_batch error to caller

* pr feedback: reword comments

---------

Co-authored-by: steviez <stevecz@umich.edu>
2024-03-02 23:23:55 -05:00
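
The write-batch approach described in #35124 can be sketched roughly as follows. This is a minimal illustration, not the blockstore's actual code: `Store`, `WriteBatch`, `Op`, the column names, and the error condition are all hypothetical, and the in-memory map only imitates the staging pattern. In the real system, atomicity comes from the database's own write batch.

```rust
use std::collections::BTreeMap;

/// Hypothetical column names; the real blockstore has many more.
const COLUMNS: [&str; 2] = ["slot_meta", "data_shred"];

/// A staged mutation. Nothing touches the store until the batch is written.
enum Op {
    Delete { column: &'static str, slot: u64 },
}

#[derive(Default)]
pub struct WriteBatch {
    ops: Vec<Op>,
}

impl WriteBatch {
    pub fn delete(&mut self, column: &'static str, slot: u64) {
        self.ops.push(Op::Delete { column, slot });
    }
}

/// Toy in-memory store keyed by (column, slot).
#[derive(Default)]
pub struct Store {
    data: BTreeMap<(&'static str, u64), Vec<u8>>,
}

impl Store {
    pub fn put(&mut self, column: &'static str, slot: u64, value: Vec<u8>) {
        self.data.insert((column, slot), value);
    }

    pub fn contains(&self, column: &'static str, slot: u64) -> bool {
        self.data.contains_key(&(column, slot))
    }

    /// Applies every staged operation in one step.
    pub fn write(&mut self, batch: WriteBatch) {
        for op in batch.ops {
            match op {
                Op::Delete { column, slot } => {
                    self.data.remove(&(column, slot));
                }
            }
        }
    }

    /// Stages the deletion of `slot` from every column, then commits once.
    /// If staging fails (a hypothetical check here), the store is untouched.
    pub fn clear_slot(&mut self, slot: u64) -> Result<(), String> {
        if !self.contains("slot_meta", slot) {
            return Err(format!("slot {slot} has no slot meta"));
        }
        let mut batch = WriteBatch::default();
        for column in COLUMNS {
            batch.delete(column, slot);
        }
        self.write(batch);
        Ok(())
    }
}
```

The point of the pattern is that every failure path sits before the single `write` call, so a slot is either fully cleared or not cleared at all.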
Brooks 245530b28e
Uses purge_all_bank_snapshots() (#35380) 2024-03-01 07:11:38 -05:00
Sean Young 9bb59aa30f
ledger-tool: verify: add --record-slots and --verify-slots (#34246)
ledger-tool: verify: add --verify-slots and --verify-slots-details

This adds:

    --record-slots <FILENAME>
        Write the slot hashes to this file.

    --record-slots-config hash-only|accounts
        Store the bank (=accounts) json file, or not.

    --verify-slots <FILENAME>
        Verify slot hashes against this file.

The first case can be used to dump a list of (slot, hash) to a json file
during a replay. The second case can be used to check slot hashes against
previously recorded values.

This is useful for debugging consensus failures, e.g.:

    # on good commit/branch
    ledger-tool verify --record-slots good.json --record-slots-config=accounts

    # on bad commit or potentially consensus breaking branch
    ledger-tool verify --verify-slots good.json

On a hash mismatch an error will be logged with the expected hash vs the
computed hash.
2024-03-01 08:39:30 +00:00
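
The verify step described in #34246 amounts to comparing each computed (slot, hash) pair against a previously recorded one. A minimal sketch, assuming hashes are plain strings and that slots absent from the recording are skipped; `Mismatch` and `verify_slot_hashes` are hypothetical names, not ledger-tool's API, and the JSON file handling is left out:

```rust
use std::collections::BTreeMap;

/// One disagreement between a recorded hash and a freshly computed one.
#[derive(Debug, PartialEq, Eq)]
pub struct Mismatch {
    pub slot: u64,
    pub expected: String,
    pub computed: String,
}

/// Compares computed (slot, hash) pairs against the recorded map.
/// Slots missing from `recorded` are ignored (an assumption of this sketch).
pub fn verify_slot_hashes(
    recorded: &BTreeMap<u64, String>,
    computed: &[(u64, String)],
) -> Vec<Mismatch> {
    computed
        .iter()
        .filter_map(|(slot, hash)| {
            let expected = recorded.get(slot)?;
            (expected != hash).then(|| Mismatch {
                slot: *slot,
                expected: expected.clone(),
                computed: hash.clone(),
            })
        })
        .collect()
}
```

Each returned `Mismatch` carries both hashes, mirroring the "expected hash vs the computed hash" error described in the commit message.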
Ashwin Sekar e8c87e86ef
local-cluster: fix flaky optimistic_confirmation tests (#35356)
* local-cluster: fix flaky optimistic_confirmation tests

* pr feedback: latest_vote -> newest_vote, reword some comments
2024-02-29 12:05:20 -08:00
Brooks bdc5cceb18
Purges all bank snapshots after fastboot (#35350) 2024-02-29 14:31:13 -05:00
behzad nouri a7a41e7631
adds Merkle shred variant with retransmitter's signature (#35293)
Moving towards locking down Turbine propagation path, the commit
reserves a buffer within shred payload for retransmitter's signature.
2024-02-28 20:31:40 +00:00
steviez 09925a11eb
Remove the Blockstore thread pool used for fetching Entries (#34768)
There are several cases for fetching entries from the Blockstore:
- Fetching entries for block replay
- Fetching entries for CompletedDataSetService
- Fetching entries to service RPC getBlock requests

All of these operations occur in a different calling thread. However,
the current implementation utilizes a shared thread-pool within the
Blockstore function. There are several problems with this:
- The thread pool is shared between all of the listed cases, despite
  block replay being the most critical. These other services shouldn't
  be able to interfere with block replay
- The thread pool is overprovisioned for the average use; thread
  utilization on both regular validators and RPC nodes shows that many
  of the threads see very little activity. But, the existence of these
  threads introduces "accounting" overhead
- rocksdb exposes an API to fetch multiple items at once, potentially
  with some parallelization under the hood. Using parallelization in
  our API and the underlying rocksdb is overkill and we're doing more
  damage than good.

This change removes that threadpool completely, and instead fetches
all of the desired entries in a single call. This has been observed
to have a minor degradation on the time spent within the Blockstore
get_slot_entries_with_shred_info() function. Namely, some buffer
copying and deserialization that previously occurred in parallel now
occur serially.

However, the metric that tracks the amount of time spent replaying
blocks (inclusive of fetch) is unchanged. Thus, despite spending
marginally more time to fetch/copy/deserialize with only a single
thread, the gains from not thrashing everything else with the pool
keep us at parity.
2024-02-26 20:27:03 -06:00
behzad nouri 0ab425b43b
splits test_shred_variant_compat into separate test-cases (#35306) 2024-02-26 17:32:47 +00:00
behzad nouri c8ee4f59ad
uses struct instead of tuple for Merkle shreds variant (#35303)
Working towards adding a new Merkle shred variant with retransmitter's
signature, the commit uses struct instead of tuple to describe Merkle shred
variant.
2024-02-26 15:58:40 +00:00
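
The tuple-to-struct change in #35303 can be illustrated with a simplified sketch. The enums below are hypothetical stand-ins, and the `proof_size` and `chained` field names are assumptions made for illustration; the real shred variant has more cases.

```rust
/// Before: positional fields, whose meaning must be remembered at each use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TupleVariant {
    MerkleCode(u8, bool),
    MerkleData(u8, bool),
}

/// After: named fields document themselves and can be extended later.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StructVariant {
    MerkleCode { proof_size: u8, chained: bool },
    MerkleData { proof_size: u8, chained: bool },
}

impl From<TupleVariant> for StructVariant {
    fn from(variant: TupleVariant) -> Self {
        match variant {
            TupleVariant::MerkleCode(proof_size, chained) => {
                Self::MerkleCode { proof_size, chained }
            }
            TupleVariant::MerkleData(proof_size, chained) => {
                Self::MerkleData { proof_size, chained }
            }
        }
    }
}

/// With named fields, callers can ignore fields they do not need via `..`.
pub fn proof_size(variant: StructVariant) -> u8 {
    match variant {
        StructVariant::MerkleCode { proof_size, .. }
        | StructVariant::MerkleData { proof_size, .. } => proof_size,
    }
}
```

Matching with `..` means that adding another field later does not force every existing match arm to change.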
Brooks 7da8d82aa1
Adds snapshot_utils::purge_all_bank_snapshots() (#35291) 2024-02-23 11:15:10 -05:00
steviez 4905076fb6
Remove channel that sends roots to BlockstoreCleanupService (#35211)
Currently, ReplayStage sends new roots to BlockstoreCleanupService, and
BlockstoreCleanupService decides when to clean based on advancement of
the latest root. This is totally unnecessary as the latest root is
cached by the Blockstore, and this value can simply be fetched.

This change removes the channel completely, and instead just fetches
the latest root from Blockstore directly. Moreover, some logic is added
to check the latest root less frequently, based on the set purge
interval.

All in all, we went from sending > 100 slots/min across a crossbeam
channel to reading an atomic roughly 3 times/min, while also removing
the need for an additional thread that read from the channel.
2024-02-21 10:16:16 -06:00
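
The idea in #35211, reading a cached atomic instead of receiving every root over a channel, can be sketched as follows. `RootCache` and `root_to_purge` are hypothetical names, and the interval check is an assumed simplification of the cleanup service's real logic.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the latest root cached by the Blockstore.
pub struct RootCache {
    max_root: AtomicU64,
}

impl RootCache {
    pub fn new(root: u64) -> Self {
        Self { max_root: AtomicU64::new(root) }
    }

    /// Called by the writer whenever a new root is set.
    /// `fetch_max` keeps the cached value monotonically non-decreasing.
    pub fn set_root(&self, root: u64) {
        self.max_root.fetch_max(root, Ordering::Relaxed);
    }

    /// Cheap read that any thread can perform without a channel.
    pub fn max_root(&self) -> u64 {
        self.max_root.load(Ordering::Relaxed)
    }
}

/// Returns the current root if it has advanced at least `purge_interval`
/// slots past `last_purge_root`; otherwise there is nothing to clean yet.
pub fn root_to_purge(
    cache: &RootCache,
    last_purge_root: u64,
    purge_interval: u64,
) -> Option<u64> {
    let root = cache.max_root();
    (root.saturating_sub(last_purge_root) >= purge_interval).then_some(root)
}
```

A cleanup loop would call `root_to_purge` on a timer, so the reader's cost no longer scales with how often roots are produced.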
Dmitri Makarov 0acee67891
SVM: move transaction_results from accounts-db to SVM (#35183)
SVM: Remove accounts-db deps in accounts_loader tests
2024-02-20 12:54:56 -08:00
sakridge e4064023bf
Set COPYFILE_DISABLE for mac os so it doesn't generate ._ files (#35213) 2024-02-16 21:58:06 +01:00
behzad nouri 0cfb06f745
adds rollout path for chained Merkle shreds (#35076)
The commit adds should_chain_merkle_shreds to incrementally roll out
chained Merkle shreds to clusters.
2024-02-08 23:06:00 +00:00
Dmitri Makarov b9ee3b475b
SVM: Move RentDebits from accounts-db to Solana SDK (#35135) 2024-02-07 15:10:17 -08:00
Pankaj Garg 46b9586630
SVM: Move SVM code to its own crate folder (#35119) 2024-02-06 16:06:32 -08:00
steviez fddfc8431e
Reorder fields in shred_insert_is_full datapoint (#35117)
Put the slot as the first field to make grep'ing for datapoints for a
specific slot in logs easier. This does not affect the datapoint's
submission / presentation in the metrics database.
2024-02-06 16:38:05 -06:00
behzad nouri 8d0ca9db78
chains Merkle shreds in broadcast fake shreds (#35061)
The commit migrates
    turbine/src/broadcast_stage/broadcast_fake_shreds_run.rs
to use chained Merkle shreds variant.
2024-02-06 20:02:38 +00:00
Pankaj Garg 3cf5dd2afb
SVM: Move RuntimeConfig to svm folder (#35085) 2024-02-05 13:49:36 -08:00
Brooks daa2449ad4
Removes RwLock on AccountsDb::shrink_paths (#35027) 2024-02-01 09:35:34 -05:00
behzad nouri 79bbe4381a
adds chained_merkle_root to shredder arguments (#34952)
Working towards chaining Merkle root of erasure batches, the commit adds
chained_merkle_root to shredder arguments.
2024-01-27 15:04:31 +00:00
behzad nouri d4fdcd940a
adds feature to enable chained Merkle shreds (#34916)
During a cluster upgrade when only half of the cluster can ingest the new shred
variant, sending shreds of the new variant can cause nodes to diverge.
The commit adds a feature to enable chained Merkle shreds explicitly.
2024-01-27 15:03:16 +00:00
steviez 9122193e17
blockstore: Make is_orphan() a method of SlotMeta (#34889)
The old function's only input is a SlotMeta, so it makes sense to make
it a member function of SlotMeta.
2024-01-22 19:14:51 -06:00
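
The refactor in #34889 follows a common pattern: a free function whose only input is a struct becomes a method on that struct. A minimal sketch with a heavily simplified, hypothetical `SlotMeta`; treating "no known parent" as the orphan condition is an assumption of this sketch.

```rust
/// Simplified stand-in; the real SlotMeta carries many more fields.
pub struct SlotMeta {
    pub slot: u64,
    pub parent_slot: Option<u64>,
}

impl SlotMeta {
    /// Assumed definition: a slot is an orphan when its parent is unknown.
    pub fn is_orphan(&self) -> bool {
        self.parent_slot.is_none()
    }
}
```

Call sites then read `meta.is_orphan()` rather than passing the meta to a separate helper.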
behzad nouri 9a520fd5b4
adds chained merkle shreds variant (#34787)
With the new chained variants, each Merkle shred will also embed the Merkle
root of the previous erasure batch.
2024-01-20 16:08:16 +00:00
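
The chaining described in #34787 can be illustrated with a toy model in which each erasure batch records the root of the batch before it. This is only a sketch: `Batch`, `chain_batches`, and `verify_chain` are hypothetical, the standard library's `DefaultHasher` stands in for a real Merkle tree over cryptographic hashes, and folding the chained root into the batch's own root is an assumption.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One erasure batch: its own root plus the root of the previous batch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Batch {
    pub chained_root: u64,
    pub root: u64,
}

/// Toy "Merkle root": a non-cryptographic hash over the previous root
/// and the shred payloads. A real root comes from a Merkle tree.
fn toy_root(shreds: &[Vec<u8>], chained_root: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    chained_root.hash(&mut hasher);
    shreds.hash(&mut hasher);
    hasher.finish()
}

/// Builds batches so that each one embeds the root of its predecessor.
pub fn chain_batches(batches: &[Vec<Vec<u8>>], initial_root: u64) -> Vec<Batch> {
    let mut previous_root = initial_root;
    batches
        .iter()
        .map(|shreds| {
            let batch = Batch {
                chained_root: previous_root,
                root: toy_root(shreds, previous_root),
            };
            previous_root = batch.root;
            batch
        })
        .collect()
}

/// Checks that every batch points at the root of the batch before it.
pub fn verify_chain(batches: &[Batch], initial_root: u64) -> bool {
    let mut expected = initial_root;
    batches.iter().all(|batch| {
        let links = batch.chained_root == expected;
        expected = batch.root;
        links
    })
}
```

Because each batch commits to its predecessor, changing one link breaks verification for the chain that follows it.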
steviez 3bccdaff7f
blockstore: Adjust the error message for missing shreds (#34833)
The log statement is currently a bit misleading, and could be
interpreted as saying this routine deleted a shred.

Adjust the log statement to state that this routine is looking for the
shred but couldn't find it. Also, elevate the log to error level as
inconsistent state across columns should not be happening.
2024-01-18 12:17:49 -06:00
behzad nouri 586c794c8a
adds get_proof_offset for Merkle shreds (#34798)
In preparation for adding the chained Merkle shreds variant, the commit
reworks the API for proof-offset within the shred binary.
2024-01-17 20:53:56 +00:00
Andrew Fitzgerald 257ba2f0b1
Add benchmark for execute_batch (#34717) 2024-01-13 09:09:04 -08:00
Justin Starry 5f74fc4f16
Update genesis processing to have a fallback collector id for tests (#34135)
* Update genesis processing to have a fallback collector id for tests

* DCOU-ify the collector id for tests parameter (#1902)

* wrap test_collector_id in DCOU

* rename param to collector_id_for_tests

* fix program test

* fix dcou

---------

Co-authored-by: Brooks <brooks@prumo.org>
2024-01-10 08:34:41 +08:00
Brooks abe699b7b4
Adds newline to fastboot's CLI help (#34712) 2024-01-09 15:28:39 -05:00
steviez dce3ce3734
Adjust blockstore open logs to say blockstore instead of database (#34672)
Additionally, make the log before/after the open more similar so it is
more clear while skimming logs that they correspond to each other.
2024-01-05 21:23:39 -06:00
Ashwin Sekar 19088411ff
blockstore: populate duplicate shred proofs for merkle root conflicts (#34270)
* blockstore: populate duplicate shred proofs for merkle root conflicts

* pr feedback: check test case

* pr feedback: comment

* pr feedback: match statement, shred_id, comment

* add feature flag

* pr feedback: rename ff var and perform_merkle_check

* pr feedback: move panic to callers in get_shred_from_just_inserted_or_db

* avoid unnecessary write if proof is already present
2024-01-03 12:15:52 -05:00
Nick Frostbutter fc2a8794be
[docs] updated readme and fix links (#34565)
* feat: updated readme

* fix: updated links

* fix: proposal links

* fix: more links

* fix: json-rpc links

* fix: more links

* fix: zk links

* fix: managing forks

* fix: links for deprecated methods
2024-01-03 09:06:06 -05:00
Ashwin Sekar cc584a0c19
blockstore: write only dirty erasure meta and merkle root metas (#34269)
* blockstore: write only dirty erasure meta and merkle root metas

* pr feedback: use enum to distinguish clean/dirty

* pr feedback: comments, rename

* pr feedback: use AsRef
2023-12-22 16:26:50 -05:00
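
The clean/dirty distinction mentioned in #34269 can be sketched with an enum wrapper that implements `AsRef`, so readers do not care about the state while the flush step writes only dirty entries. `WorkingEntry` and `flush_dirty` are illustrative names, and the `BTreeMap` store is a hypothetical stand-in for the database column.

```rust
use std::collections::BTreeMap;

/// An in-memory entry that is either modified (Dirty) or unchanged (Clean).
pub enum WorkingEntry<T> {
    Dirty(T),
    Clean(T),
}

impl<T> WorkingEntry<T> {
    pub fn is_dirty(&self) -> bool {
        matches!(self, Self::Dirty(_))
    }
}

/// Lets callers read the inner value without matching on the state.
impl<T> AsRef<T> for WorkingEntry<T> {
    fn as_ref(&self) -> &T {
        match self {
            Self::Dirty(value) | Self::Clean(value) => value,
        }
    }
}

/// Writes only the dirty entries to `store` and returns how many were written.
pub fn flush_dirty<K: Ord + Copy, T: Clone>(
    working: &BTreeMap<K, WorkingEntry<T>>,
    store: &mut BTreeMap<K, T>,
) -> usize {
    let mut written = 0;
    for (key, entry) in working {
        if entry.is_dirty() {
            let value: &T = entry.as_ref();
            store.insert(*key, value.clone());
            written += 1;
        }
    }
    written
}
```

Entries loaded from the store would start as `Clean` and be replaced by `Dirty` on modification, so unchanged metas are never rewritten.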
GoodDaisy 03386cc7b9
Fix typos (#34459)
* Fix typos

* Fix typos

* fix typo
2023-12-21 13:06:00 -07:00