Commit Graph

1380 Commits

Author SHA1 Message Date
Ashwin Sekar 6263537bf0
blockstore_purge: fix inspect -> inspect_err (#66) 2024-03-05 01:16:31 +00:00
Brooks 93f5b514fa
Adds StartingSnapshotStorages to AccountsHashVerifier (#58) 2024-03-04 16:32:51 -05:00
Yihau Chen 3f9a7a52ea [anza migration] rename crates (#10)
* rename geyser-plugin-interface

* rename cargo registry

* rename watchtower

* rename ledger tool

* rename validator

* rename install

* rename geyser plugin interface when patch
2024-03-03 12:31:24 +08:00
Ashwin Sekar cc4072bce8
blockstore: atomize slot clearing, relax parent slot meta check (#35124)
* blockstore: atomize slot clearing, relax parent slot meta check

clear_unconfirmed_slot can leave the blockstore in an unrecoverable
state if it panics midway. Use a write batch in this function so that
any errors can be recovered from after a restart.

additionally relax the constraint that the parent slot meta must exist,
as it could have been cleaned up if outdated.

* pr feedback: use PurgeType, don't pass slot_meta

* pr feedback: add unit test

* pr feedback: refactor into separate function

* pr feedback: add special columns to helper, err msg, comments

* pr feedback: reword comments and write batch error message

* pr feedback: bubble write_batch error to caller

* pr feedback: reword comments

Co-authored-by: steviez <stevecz@umich.edu>

---------

Co-authored-by: steviez <stevecz@umich.edu>
2024-03-02 23:23:55 -05:00
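The write-batch idea in the commit above can be illustrated with a minimal sketch. Everything here is hypothetical (the `Store`, `WriteBatch`, and `clear_slot` names, the column names, and the in-memory map standing in for the database); it only shows why staging all operations and committing them in one step avoids a half-cleared slot.

```rust
use std::collections::BTreeMap;

/// Key is (column name, slot). Both the columns and this layout are hypothetical.
type Key = (&'static str, u64);

enum Op {
    Put(Key, Vec<u8>),
    Delete(Key),
}

/// Stages operations without touching the store.
#[derive(Default)]
struct WriteBatch {
    ops: Vec<Op>,
}

impl WriteBatch {
    fn put(&mut self, key: Key, value: Vec<u8>) {
        self.ops.push(Op::Put(key, value));
    }

    fn delete(&mut self, key: Key) {
        self.ops.push(Op::Delete(key));
    }
}

/// In-memory stand-in for the database (an assumption for illustration).
#[derive(Default)]
struct Store {
    rows: BTreeMap<Key, Vec<u8>>,
}

impl Store {
    /// Applies every staged operation in one step.
    fn write(&mut self, batch: WriteBatch) {
        for op in batch.ops {
            match op {
                Op::Put(key, value) => {
                    self.rows.insert(key, value);
                }
                Op::Delete(key) => {
                    self.rows.remove(&key);
                }
            }
        }
    }
}

/// Clears a slot's shreds and resets its meta.
/// If staging fails partway, the store has not been modified at all.
fn clear_slot(store: &mut Store, slot: u64, fail_midway: bool) -> Result<(), String> {
    let mut batch = WriteBatch::default();
    batch.delete(("data_shred", slot));
    if fail_midway {
        // Simulated failure: the batch is dropped and nothing was written.
        return Err("simulated failure before commit".to_string());
    }
    batch.put(("slot_meta", slot), b"fresh".to_vec());
    store.write(batch);
    Ok(())
}
```

A real database write batch provides the same all-or-nothing property on disk; the in-memory version here only models the staging step.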
Brooks 245530b28e
Uses purge_all_bank_snapshots() (#35380) 2024-03-01 07:11:38 -05:00
Sean Young 9bb59aa30f
ledger-tool: verify: add --record-slots and --verify-slots (#34246)
ledger-tool: verify: add --record-slots and --verify-slots

This adds:

    --record-slots <FILENAME>
        Write the slot hashes to this file.

    --record-slots-config hash-only|accounts
        Store the bank (=accounts) json file, or not.

    --verify-slots <FILENAME>
        Verify slot hashes against this file.

The first case can be used to dump a list of (slot, hash) to a json file
during a replay. The second case can be used to check slot hashes against
previously recorded values.

This is useful for debugging consensus failures, e.g.:

    # on good commit/branch
    ledger-tool verify --record-slots good.json --record-slots-config=accounts

    # on bad commit or potentially consensus breaking branch
    ledger-tool verify --verify-slots good.json

On a hash mismatch an error will be logged with the expected hash vs the
computed hash.
2024-03-01 08:39:30 +00:00
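The record-then-verify workflow described in the commit above amounts to comparing a recorded (slot, hash) list against freshly computed hashes. The sketch below is not ledger-tool's implementation: the `Mismatch` type, the `verify_slots` function, and the use of in-memory maps instead of the JSON file are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// One disagreement between the recorded and the computed hash (hypothetical type).
#[derive(Debug, PartialEq)]
struct Mismatch {
    slot: u64,
    expected: String,
    computed: String,
}

/// Compares computed slot hashes against previously recorded ones.
/// Slots missing from the recording are skipped rather than reported.
fn verify_slots(
    recorded: &BTreeMap<u64, String>,
    computed: &BTreeMap<u64, String>,
) -> Vec<Mismatch> {
    computed
        .iter()
        .filter_map(|(slot, hash)| match recorded.get(slot) {
            Some(expected) if expected != hash => Some(Mismatch {
                slot: *slot,
                expected: expected.clone(),
                computed: hash.clone(),
            }),
            _ => None,
        })
        .collect()
}
```

Each returned `Mismatch` carries the expected and computed hash, which mirrors the error that the commit message says is logged on a hash mismatch.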
Ashwin Sekar e8c87e86ef
local-cluster: fix flaky optimistic_confirmation tests (#35356)
* local-cluster: fix flaky optimistic_confirmation tests

* pr feedback: latest_vote -> newest_vote, reword some comments
2024-02-29 12:05:20 -08:00
Brooks bdc5cceb18
Purges all bank snapshots after fastboot (#35350) 2024-02-29 14:31:13 -05:00
behzad nouri a7a41e7631
adds Merkle shred variant with retransmitter's signature (#35293)
Moving towards locking down Turbine propagation path, the commit
reserves a buffer within shred payload for retransmitter's signature.
2024-02-28 20:31:40 +00:00
steviez 09925a11eb
Remove the Blockstore thread pool used for fetching Entries (#34768)
There are several cases for fetching entries from the Blockstore:
- Fetching entries for block replay
- Fetching entries for CompletedDataSetService
- Fetching entries to service RPC getBlock requests

Each of these operations occurs in a different calling thread. However,
the current implementation uses a shared thread pool within the
Blockstore function. There are several problems with this:
- The thread pool is shared between all of the listed cases, despite
  block replay being the most critical. These other services shouldn't
  be able to interfere with block replay
- The thread pool is overprovisioned for the average use; thread
  utilization on both regular validators and RPC nodes shows that many
  of the threads see very little activity. But the mere existence of
  these threads introduces "accounting" overhead
- rocksdb exposes an API to fetch multiple items at once, potentially
  with some parallelization under the hood. Using parallelization in
  our API and the underlying rocksdb is overkill and we're doing more
  damage than good.

This change removes that thread pool completely, and instead fetches
all of the desired entries in a single call. This has been observed
to have a minor degradation on the time spent within the Blockstore
get_slot_entries_with_shred_info() function. Namely, some buffer
copying and deserialization that previously occurred in parallel now
occur serially.

However, the metric that tracks the amount of time spent replaying
blocks (inclusive of fetch) is unchanged. Thus, despite spending
marginally more time to fetch/copy/deserialize with only a single
thread, the gains from not thrashing everything else with the pool
keep us at parity.
2024-02-26 20:27:03 -06:00
behzad nouri 0ab425b43b
splits test_shred_variant_compat into separate test-cases (#35306) 2024-02-26 17:32:47 +00:00
behzad nouri c8ee4f59ad
uses struct instead of tuple for Merkle shreds variant (#35303)
Working towards adding a new Merkle shred variant with retransmitter's
signature, the commit uses struct instead of tuple to describe Merkle shred
variant.
2024-02-26 15:58:40 +00:00
Brooks 7da8d82aa1
Adds snapshot_utils::purge_all_bank_snapshots() (#35291) 2024-02-23 11:15:10 -05:00
steviez 4905076fb6
Remove channel that sends roots to BlockstoreCleanupService (#35211)
Currently, ReplayStage sends new roots to BlockstoreCleanupService, and
BlockstoreCleanupService decides when to clean based on advancement of
the latest root. This is totally unnecessary as the latest root is
cached by the Blockstore, and this value can simply be fetched.

This change removes the channel completely, and instead just fetches
the latest root from Blockstore directly. Moreover, some logic is added
to check the latest root less frequently, based on the set purge
interval.

All in all, we went from sending > 100 slots/min across a crossbeam
channel to reading an atomic roughly 3 times/min, while also removing
the need for an additional thread that read from the channel.
2024-02-21 10:16:16 -06:00
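The change above replaces a channel of root updates with an occasional read of a cached atomic value. A minimal sketch of that pattern follows; `RootCache`, `root_to_purge`, and the interval handling are hypothetical names and simplifications, not the actual Blockstore or BlockstoreCleanupService code.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the latest root cached by the Blockstore.
struct RootCache {
    latest_root: AtomicU64,
}

impl RootCache {
    fn new(root: u64) -> Self {
        Self {
            latest_root: AtomicU64::new(root),
        }
    }

    /// Called by the writer whenever a new root is set; the root never moves backwards.
    fn set_root(&self, slot: u64) {
        self.latest_root.fetch_max(slot, Ordering::Relaxed);
    }

    /// Cheap read used by the cleanup side instead of draining a channel.
    fn latest_root(&self) -> u64 {
        self.latest_root.load(Ordering::Relaxed)
    }
}

/// Returns the root to purge up to, but only once the root has advanced
/// at least `purge_interval` slots past the last purge.
fn root_to_purge(cache: &RootCache, last_purge_root: u64, purge_interval: u64) -> Option<u64> {
    let root = cache.latest_root();
    (root.saturating_sub(last_purge_root) >= purge_interval).then_some(root)
}
```

The cleanup loop can call `root_to_purge` on a timer, so no dedicated receiver thread or per-root message is needed.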
Dmitri Makarov 0acee67891
SVM: move transaction_results from accounts-db to SVM (#35183)
SVM: Remove accounts-db deps in accounts_loader tests
2024-02-20 12:54:56 -08:00
sakridge e4064023bf
Set COPYFILE_DISABLE for mac os so it doesn't generate ._ files (#35213) 2024-02-16 21:58:06 +01:00
behzad nouri 0cfb06f745
adds rollout path for chained Merkle shreds (#35076)
The commit adds should_chain_merkle_shreds to incrementally roll out
chained Merkle shreds to clusters.
2024-02-08 23:06:00 +00:00
Dmitri Makarov b9ee3b475b
SVM: Move RentDebits from accounts-db to Solana SDK (#35135) 2024-02-07 15:10:17 -08:00
Pankaj Garg 46b9586630
SVM: Move SVM code to its own crate folder (#35119) 2024-02-06 16:06:32 -08:00
steviez fddfc8431e
Reorder fields in shred_insert_is_full datapoint (#35117)
Put the slot as the first field to make grep'ing for datapoints for a
specific slot in logs easier. This does not affect datapoint
submission or presentation in the metrics database.
2024-02-06 16:38:05 -06:00
behzad nouri 8d0ca9db78
chains Merkle shreds in broadcast fake shreds (#35061)
The commit migrates
    turbine/src/broadcast_stage/broadcast_fake_shreds_run.rs
to use chained Merkle shreds variant.
2024-02-06 20:02:38 +00:00
Pankaj Garg 3cf5dd2afb
SVM: Move RuntimeConfig to svm folder (#35085) 2024-02-05 13:49:36 -08:00
Brooks daa2449ad4
Removes RwLock on AccountsDb::shrink_paths (#35027) 2024-02-01 09:35:34 -05:00
behzad nouri 79bbe4381a
adds chained_merkle_root to shredder arguments (#34952)
Working towards chaining Merkle root of erasure batches, the commit adds
chained_merkle_root to shredder arguments.
2024-01-27 15:04:31 +00:00
behzad nouri d4fdcd940a
adds feature to enable chained Merkle shreds (#34916)
During a cluster upgrade when only half of the cluster can ingest the new shred
variant, sending shreds of the new variant can cause nodes to diverge.
The commit adds a feature to enable chained Merkle shreds explicitly.
2024-01-27 15:03:16 +00:00
steviez 9122193e17
blockstore: Make is_orphan() a method of SlotMeta (#34889)
The old function's only input is a SlotMeta, so it makes sense to make
it a member function of SlotMeta.
2024-01-22 19:14:51 -06:00
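The refactor described above moves a free function whose only input is a SlotMeta onto the type itself. The sketch below uses a stripped-down, hypothetical `SlotMeta` with only the fields needed for illustration, and it assumes that "orphan" means "has no known parent slot".

```rust
/// Simplified stand-in for SlotMeta; the real struct has more fields.
struct SlotMeta {
    slot: u64,
    parent_slot: Option<u64>,
}

/// Before: a free function taking the meta as its only argument.
fn is_orphan_free_fn(meta: &SlotMeta) -> bool {
    meta.parent_slot.is_none()
}

impl SlotMeta {
    /// After: the same check as a method, so call sites read `meta.is_orphan()`.
    fn is_orphan(&self) -> bool {
        self.parent_slot.is_none()
    }
}
```

Both forms return the same result; the method keeps the check next to the data it inspects.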
behzad nouri 9a520fd5b4
adds chained merkle shreds variant (#34787)
With the new chained variants, each Merkle shred will also embed the Merkle
root of the previous erasure batch.
2024-01-20 16:08:16 +00:00
steviez 3bccdaff7f
blockstore: Adjust the error message for missing shreds (#34833)
The log statement is currently a bit misleading, and could be
interpreted as saying this routine deleted a shred.

Adjust the log statement to state that this routine is looking for the
shred but couldn't find it. Also, elevate the log to error level as
inconsistent state across columns should not be happening.
2024-01-18 12:17:49 -06:00
behzad nouri 586c794c8a
adds get_proof_offset for Merkle shreds (#34798)
In preparation for adding the chained Merkle shreds variant, the commit
reworks the API for proof-offset within the shred binary.
2024-01-17 20:53:56 +00:00
Andrew Fitzgerald 257ba2f0b1
Add benchmark for execute_batch (#34717) 2024-01-13 09:09:04 -08:00
Justin Starry 5f74fc4f16
Update genesis processing to have a fallback collector id for tests (#34135)
* Update genesis processing to have a fallback collector id for tests

* DCOU-ify the collector id for tests parameter (#1902)

* wrap test_collector_id in DCOU

* rename param to collector_id_for_tests

* fix program test

* fix dcou

---------

Co-authored-by: Brooks <brooks@prumo.org>
2024-01-10 08:34:41 +08:00
Brooks abe699b7b4
Adds newline to fastboot's CLI help (#34712) 2024-01-09 15:28:39 -05:00
steviez dce3ce3734
Adjust blockstore open logs to say blockstore instead of database (#34672)
Additionally, make the logs before and after the open more similar so
that it is clearer when skimming logs that they correspond to each other.
2024-01-05 21:23:39 -06:00
Ashwin Sekar 19088411ff
blockstore: populate duplicate shred proofs for merkle root conflicts (#34270)
* blockstore: populate duplicate shred proofs for merkle root conflicts

* pr feedback: check test case

* pr feedback: comment

* pr feedback: match statement, shred_id, comment

* add feature flag

* pr feedback: rename ff var and perform_merkle_check

* pr feedback: move panic to callers in get_shred_from_just_inserted_or_db

* avoid unnecessary write if proof is already present
2024-01-03 12:15:52 -05:00
Nick Frostbutter fc2a8794be
[docs] updated readme and fix links (#34565)
* feat: updated readme

* fix: updated links

* fix: proposal links

* fix: more links

* fix: json-rpc links

* fix: more links

* fix: zk links

* fix: managing forks

* fix: links for deprecated methods
2024-01-03 09:06:06 -05:00
Ashwin Sekar cc584a0c19
blockstore: write only dirty erasure meta and merkle root metas (#34269)
* blockstore: write only dirty erasure meta and merkle root metas

* pr feedback: use enum to distinguish clean/dirty

* pr feedback: comments, rename

* pr feedback: use AsRef
2023-12-22 16:26:50 -05:00
GoodDaisy 03386cc7b9
Fix typos (#34459)
* Fix typos

* Fix typos

* fix typo
2023-12-21 13:06:00 -07:00
Ryo Onodera d2b5afc410
Finish unified scheduler plumbing with min impl (#34300)
* Finalize unified scheduler plumbing with min impl

* Fix comment

* Rename leftover type name...

* Make logging text less ambiguous

* Make PhantomData simpler without already used S

* Make TaskHandler stateless again

* Introduce HandlerContext to simplify TaskHandler

* Add comment for coexistence of Pool::{new,new_dyn}

* Fix grammar

* Remove confusing const for upcoming changes

* Demote InstalledScheduler::context() into dcou

* Delay drop of context up to return_to_pool()-ing

* Revert "Demote InstalledScheduler::context() into dcou"

This reverts commit 049a126c905df0ba8ad975c5cb1007ae90a21050.

* Revert "Delay drop of context up to return_to_pool()-ing"

This reverts commit 60b1bd2511a714690b0b2331e49bc3d0c72e3475.

* Make context handling really type-safe

* Update comment

* Fix grammar...

* Refine type aliases for boxed traits

* Swap the tuple order for readability & semantics

* Simplify PooledScheduler::result_with_timings type

* Restore .in_sequence()

* Use where for aesthetics

* Simplify if...

* Fix typo...

* Polish ::schedule_execution() a bit

* Fix rebase conflicts..

* Make test more readable

* Fix test failures after rebase...
2023-12-19 09:50:41 +09:00
behzad nouri 750023530c
makes last erasure batch size >= 64 shreds (#34330) 2023-12-13 06:48:00 +00:00
steviez 70cab76495
Remove deletion of TransactionStatusIndex entries (#34023)
These entries are legacy code at this point; however, older release
branches require these entries to be present. Also, while it would be
nice to clean up these entries immediately, they only occupy a small
amount of space so having them linger a little longer isn't a big deal.
2023-12-12 22:53:51 -06:00
behzad nouri d5eee01950
adds feature gated code to drop legacy shreds (#34328) 2023-12-06 22:47:46 +00:00
Lucas Steuernagel 1877fdb273
Use BankForks on tests - Part 4 (#34271)
* Use BankForks on tests - Part 4

* Ensure the correct slot is set
2023-12-06 13:32:04 -03:00
Andrew Fitzgerald 2294801954
Do not derive Copy for EpochSchedule and Rent (#32767) 2023-12-01 07:57:25 -08:00
Ashwin Sekar d84dcd37bc
blockstore: use u32 for fec_set_index in erasure set index store key (#34268)
* blockstore: use u32 for fec_set_index in erasure set index store key

* pr feedback u64::from
2023-11-30 19:17:49 -05:00
steviez 479b7ee9f2
Bubble up errors in bank_fork_utils instead of exiting process (#34277)
There are operations in bank_fork_utils that may fail; we explicitly
call std::process::exit() on several of these. Granted, we may still end
up exiting the process higher up the call stack, but bubbling the errors
up allows a caller that can handle the error to do so.
2023-11-30 16:35:59 -06:00
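The pattern in the commit above, returning an error instead of calling std::process::exit() deep in a helper, can be sketched as follows. The `LoadError` type and both functions are hypothetical and are not the actual bank_fork_utils API.

```rust
use std::fmt;

/// Hypothetical error type for a failed load.
#[derive(Debug, PartialEq)]
enum LoadError {
    MissingSnapshot(String),
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadError::MissingSnapshot(dir) => write!(f, "no snapshot found in {}", dir),
        }
    }
}

impl std::error::Error for LoadError {}

/// Returns the starting slot, or an error for the caller to handle.
/// Previously, a helper like this might have logged and exited the process.
fn load_start_slot(snapshot_slot: Option<u64>, dir: &str) -> Result<u64, LoadError> {
    snapshot_slot.ok_or_else(|| LoadError::MissingSnapshot(dir.to_string()))
}

/// One possible caller: recover by starting from genesis (slot 0).
/// Another caller could still choose to exit; the decision now lives here.
fn start_slot_or_genesis(snapshot_slot: Option<u64>, dir: &str) -> u64 {
    match load_start_slot(snapshot_slot, dir) {
        Ok(slot) => slot,
        Err(err) => {
            eprintln!("falling back to genesis: {}", err);
            0
        }
    }
}
```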
steviez 71c1782c74
Allow Blockstore to open unknown columns (#34174)
As we develop new features or modifications, we occasionally need to
introduce new columns to the Blockstore. Adding a new column introduces
a compatibility break given that opening the database in Primary mode
(R/W access) requires opening all columns. Reverting to an old software
version that is unaware of the new column is obviously problematic.

In the past, we have addressed this by backporting minimal "stub" PRs
to older versions. This is annoying, and only allows compatibility for
the single version or two that we backport to.

This PR adds a change to automatically detect all columns, and create
default column descriptors for columns we were unaware of. As a result,
older software versions can open a Blockstore that was modified by a
newer software version, even if that new version added columns that the
old version is unaware of.
2023-11-30 13:24:56 -06:00
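The column-detection approach described above can be sketched as a merge of the columns the software knows about with the columns found on disk. The `Descriptor` enum and `descriptors_to_open` function are hypothetical; the real change works with the database's column family descriptors, which are not modeled here.

```rust
use std::collections::BTreeSet;

/// How a column would be opened (hypothetical simplification).
#[derive(Debug, PartialEq)]
enum Descriptor {
    /// A column this software version knows and has tuned options for.
    Tuned(String),
    /// A column found on disk that this version is unaware of.
    Default(String),
}

/// Builds the list of columns to open: every known column, plus a default
/// descriptor for any on-disk column that is not in the known set.
fn descriptors_to_open(known: &[&str], on_disk: &[&str]) -> Vec<Descriptor> {
    let known_set: BTreeSet<&str> = known.iter().copied().collect();
    let mut descriptors: Vec<Descriptor> = known
        .iter()
        .map(|name| Descriptor::Tuned((*name).to_string()))
        .collect();
    for name in on_disk {
        if !known_set.contains(name) {
            descriptors.push(Descriptor::Default((*name).to_string()));
        }
    }
    descriptors
}
```

With this shape, an older version opening a database written by a newer one simply gets extra `Default` entries instead of failing to open.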
Ashwin Sekar e1165aaf00
blockstore: populate merkle root metas column (#34097) 2023-11-29 11:14:24 -05:00
Tyera 573ec81fbb
storage-bigtable: Upload entries (#34099)
* Add entries table to bt init

* Add entries to storage-proto

* Use new Blockstore method in bigtable_upload

* Add LedgerStorage::upload_confirmed_block_with_entries and use in bigtable_upload

* Upload entries to bigtable
2023-11-28 11:47:22 -07:00
Brooks 5c7ab5dc08
ledger-tool does *not* fastboot by default (#34228) 2023-11-27 13:48:28 -05:00
steviez 9a7b681f0c
Remove key_size() method from Column trait (#34021)
This helper simply called std::mem::size_of::<Self::Index>(). However, all
of the underlying functions that create keys manually copy fields into a
byte array. The fields are copied in end-to-end whereas size_of() might
include alignment bytes.

For example, a (u64, u32) only has 12 bytes of "data", but it would
have size 16 due to the 4 alignment padding bytes that would be
added to get the u32 (size 4) aligned with the u64 (size 8).
2023-11-19 23:05:32 -06:00
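The mismatch described above between the packed key length and size_of() can be checked directly. In this sketch, `pack_key` and `key_len_vs_size_of` are hypothetical helpers, and the big-endian field encoding is an assumption made for illustration.

```rust
use std::mem::size_of;

/// Packs a (u64, u32) index into a key by copying the fields end to end,
/// with no padding between or after them.
fn pack_key(slot: u64, index: u32) -> [u8; 12] {
    let mut key = [0u8; 12];
    key[..8].copy_from_slice(&slot.to_be_bytes());
    key[8..].copy_from_slice(&index.to_be_bytes());
    key
}

/// Returns (packed key length, in-memory size of the tuple type).
/// On common 64-bit targets the second value is 16 because of alignment padding.
fn key_len_vs_size_of() -> (usize, usize) {
    (pack_key(0, 0).len(), size_of::<(u64, u32)>())
}
```

Because the two numbers differ, a key buffer sized with size_of() would be 4 bytes longer than the data actually written into it.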