Commit Graph

351 Commits

Pankaj Garg f4287d70bb
Move accounts-db code to its own crate (#32766) 2023-08-09 13:03:36 -07:00
carllin d5faa6e8aa
Local Cluster Duplicate Switch Test (#32614)
* Add test for broken behavior in same batch

* tests

* redo test

* Important fixes to not immediately duplicate confirm by adding extra node

* Fixup merge

* PR comments

* Redo stakes

* clippy

* fixes

* Resolve conflicts

* add thread logging

* Fixup merge

* Fixup bugs

* Revert "add thread logging"

This reverts commit 9dc22401054b8f91f2b2aa3033e482996913febb.

* Hide scope

* Fixes

* Cleanup test_faulty_node

* More fixes

* Fixes

* Error logging

* Fix duplicate confirmed

* done

* PR comments

* Revert "Error logging"

This reverts commit 18953c36a5e865ecdd38bbf49b8d0502448087d2.

* PR comments

* nit
2023-08-08 19:29:39 -04:00
steviez cd39a6afd3
Make use of Blockstore's LedgerColumn objects (#32506)
* Move functions that take a &Database under impl Blockstore {...}
* Replace &Database with &self in those functions and update callers
* Use LedgerColumn's from Blockstore instead of re-fetching
* Add missing roots LedgerColumn and have it report metrics like others
* Remove several redundant comments
2023-07-25 10:25:12 -05:00
behzad nouri 7f2f0136bd
enables merkle shreds for devnet and development clusters (#32533) 2023-07-20 12:47:28 +00:00
carllin b6927db6a8
Detect duplicates in the same insert batch (#32528) 2023-07-19 21:17:59 -04:00
steviez c2ae30eb4d
Simplify signature for Blockstore::is_shred_duplicate() (#32537)
The existing signature unpacked elements from a Shred and took an owned
Vec<u8>, forcing a .clone() from the caller. The Shred can be passed in
directly to simplify the argument list and avoid the clone.
2023-07-19 13:54:29 -05:00
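A minimal sketch of the shape of this change, using stub types rather than the upstream Shred/Blockstore definitions:

```rust
// Stub types standing in for the real Shred/Blockstore; this only sketches
// the signature change, not the upstream code.
#[derive(Clone, Copy)]
pub enum ShredType { Data, Code }

pub struct Shred {
    pub slot: u64,
    pub index: u32,
    pub shred_type: ShredType,
    pub payload: Vec<u8>,
}

pub struct Blockstore;

impl Blockstore {
    // Before: callers unpacked the shred and cloned its payload into an owned Vec.
    pub fn is_shred_duplicate_old(
        &self,
        _slot: u64,
        _index: u32,
        _payload: Vec<u8>,
        _shred_type: ShredType,
    ) -> Option<Vec<u8>> {
        None
    }

    // After: borrow the whole shred; no clone required at the call site.
    pub fn is_shred_duplicate(&self, shred: &Shred) -> Option<Vec<u8>> {
        let _ = (shred.slot, shred.index, &shred.payload);
        None
    }
}
```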
behzad nouri cfb028819a
deprecates Signature::new in favor of Signature::{try_,}from (#32481) 2023-07-14 22:51:12 +00:00
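A hedged sketch of the replacement pattern this deprecation encourages (the crate path is solana_sdk; error details elided):

```rust
use solana_sdk::signature::Signature;

// Signature::new(bytes) panicked on wrong-length input, while the TryFrom
// conversions surface a recoverable error instead.
fn parse_signature(bytes: &[u8]) -> Option<Signature> {
    Signature::try_from(bytes).ok()
}
```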
steviez b1fd0e8e18
Cleanup Blockstore::find_missing_indexes() (#32449)
Add doc comments and simplify logic inside the function.
2023-07-14 10:47:08 -05:00
steviez 04fab2b48f
Make Blockstore::get_data_shreds_for_slot() return type consistent (#32460)
We have several other functions that return data shreds; however, these
other functions map shred::Error to BlockstoreError. Make this function
consistent with those and map the error.
2023-07-11 21:25:29 -05:00
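A stub sketch of the error-mapping pattern described above; the type names here are placeholders, not the upstream definitions:

```rust
#[derive(Debug)]
pub struct ShredError;

#[derive(Debug)]
pub enum BlockstoreError {
    InvalidShredData(ShredError),
}

fn deserialize_shred(payload: Vec<u8>) -> Result<Vec<u8>, ShredError> {
    Ok(payload)
}

// Map the shred-level error into BlockstoreError so the return type matches
// the other data-shred getters; collect() short-circuits on the first error.
fn get_data_shreds_for_slot(payloads: Vec<Vec<u8>>) -> Result<Vec<Vec<u8>>, BlockstoreError> {
    payloads
        .into_iter()
        .map(|p| deserialize_shred(p).map_err(BlockstoreError::InvalidShredData))
        .collect()
}
```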
steviez 0a1e641057
Fix flaky find_missing_data_indexes_timeout() test (#32450) 2023-07-11 10:10:25 -05:00
behzad nouri d54b6204be
removes instances of clippy::manual_let_else (#32417) 2023-07-09 21:41:36 +00:00
steviez 5feebd2dc8
ledger-tool: Manually walk optimistic slots ancestors (#32362)
If a slot is marked as optimistically confirmed, it is probable but not
guaranteed that its ancestors will also be marked as optimistically
confirmed in the Blockstore. Given the importance of examining
optimistically confirmed slots around cluster restarts, manually walk
an AncestorIterator to avoid the chance of a slot improperly being
ignored in cluster restart scenarios.
2023-07-05 10:18:15 -04:00
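A sketch of the manual walk, assuming the AncestorIterator API in solana_ledger (the commit names the iterator; the wrapper function here is illustrative):

```rust
use solana_ledger::{ancestor_iterator::AncestorIterator, blockstore::Blockstore};
use solana_sdk::clock::Slot;

// Collect the confirmed slot plus every ancestor reachable in the Blockstore,
// rather than trusting that each ancestor also carries the
// optimistic-confirmation marker.
fn optimistic_chain(blockstore: &Blockstore, confirmed_slot: Slot) -> Vec<Slot> {
    std::iter::once(confirmed_slot)
        .chain(AncestorIterator::new(confirmed_slot, blockstore))
        .collect()
}
```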
steviez d5ad29d837
Make Blockstore::scan_and_fix_roots() take optional start/stop slots (#32289)
The optional args allow reuse by the ledger-tool repair roots command.
Also, hold the cleanup lock for the duration of
Blockstore::scan_and_fix_roots().

This prevents a scenario where scan_and_fix_roots() could identify a
slot as needing to be marked root, LedgerCleanupService could then
purge that slot, and scan_and_fix_roots() would then mark the
now-purged slot as root.
2023-06-28 22:32:03 -05:00
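A minimal sketch of the locking fix, using a hypothetical structure rather than the real Blockstore:

```rust
use std::sync::RwLock;

struct Blockstore {
    lowest_cleanup_slot: RwLock<u64>,
}

impl Blockstore {
    // Hold the cleanup lock across the whole scan so LedgerCleanupService
    // cannot purge a slot between finding it and marking it root.
    fn scan_and_fix_roots(&self, start: Option<u64>, stop: Option<u64>) {
        let _cleanup_guard = self.lowest_cleanup_slot.read().unwrap();
        let (start, stop) = (start.unwrap_or(0), stop.unwrap_or(u64::MAX));
        // ... while the guard is held, find unrooted slots in [start, stop]
        // and mark them as roots ...
        let _ = (start, stop);
    }
}
```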
steviez 6b013f46eb
Add comments and unit test for Blockstore::scan_and_fix_roots() (#32283) 2023-06-27 10:39:23 -05:00
Alexander Meißner ee2c2ef6c7
Cleanup - require_static_program_ids_in_transaction (#31767)
require_static_program_ids_in_transaction
2023-06-07 17:12:41 +02:00
Ashwin Sekar 3e8f5bad81
refactor: highest_cluster_confirmed_root -> highest_super_majority_root (#31619) 2023-05-14 00:42:03 -07:00
Ashwin Sekar ef75f1cb4e
Add ancestor hashes to state machine (#31627)
* Notify replay of pruned duplicate confirmed slots

* Ingest replay signal and run ancestor hashes for pruned

* Forward PDC to ancestor hashes and ingest pruned dumps from ancestor hashes service

* Add local-cluster test
2023-05-13 02:05:44 -07:00
steviez 0eec1ad57f
chore: Update Blockstore::get_slots_since() doc comments (#31134)
Additionally, change the function definition to use Slot instead of u64.
Slot is defined as an alias of u64 so these are functionally equivalent,
but using Slot over u64 is more expressive of the intent.
2023-04-11 02:39:31 -05:00
steviez 814de50f2a
chore: Variable rename `height` ==> `slot` in blockstore function (#31132)
Slots refer to time windows where a block could be produced whereas
height refers to how many blocks are actually in a fork. This function
is operating on a list of slots, so the use of "height" is incorrect and
misleading.
2023-04-11 02:38:20 -05:00
Illia Bobyr a1149ecafe
make_slot_entries(): Use unique hashes (#30627) 2023-03-30 19:36:30 -07:00
steviez 8db35eb7e5
ledger-tool: Add flag to find non-vote optimistic slots (#30580)
In cluster restart scenarios, it is desirable to know the latest
optimistic slot(s), which the subcommand latest-optimistic-slots will
return. However, it is also useful to know whether slots contain only
votes or if they contain votes and user transactions.

This PR adds an extra column of output to show whether an optimistically
confirmed slot is vote-only (contains zero non-vote transactions).
Additionally, a flag has been added to enable filtering output to
exclude vote-only slots.
2023-03-23 14:13:41 -07:00
steviez bc933c63ce
Fix SlotMeta connected tracking (#28069)
Fix SlotMeta is_connected tracking

The tracking of connected status was previously based upon an assumption
that would be practically false for all validators. The connected status
of slots played into whether Blockstore would signal ReplayStage that it
had new shreds ready to be replayed. Prior to the change, we would never
signal and ReplayStage would always wait the entire duration of a 100ms
timeout before restarting its main processing loop.

This commit introduces a change where we mark snapshot slots as
connected. A validator may not have a path all the way back to genesis
itself; however, snapshots are taken at known roots so we extend the
connected status to these slots. Once a node has been bootstrapped such
that a slot is marked connected, the logic persists in Blockstore so all
children on the main fork also get their connected status updated
properly.
2023-03-21 20:17:58 +08:00
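A stub sketch of the bootstrapping rule described above (not the upstream code): the snapshot slot is force-marked connected even without a path back to genesis, and the status then flows to each child on the main fork.

```rust
use std::collections::HashMap;

fn bootstrap_connected(connected: &mut HashMap<u64, bool>, snapshot_slot: u64, main_fork: &[u64]) {
    connected.insert(snapshot_slot, true); // snapshots are taken at known roots
    let mut parent = snapshot_slot;
    for &child in main_fork {
        // A child inherits connected status from its connected parent.
        if connected.get(&parent).copied().unwrap_or(false) {
            connected.insert(child, true);
        }
        parent = child;
    }
}
```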
behzad nouri 5d9aba5548
increases retransmit-stage deduper capacity and reset-cycle (#30758)
For duplicate block detection, for each (slot, shred-index, shred-type)
we need to allow 2 different shreds to be retransmitted.
The commit implements this using two bloom-filter dedupers:
* Shreds are deduplicated using the 1st deduper.
* If a shred is not a duplicate, then we check if:
      (slot, shred-index, shred-type, k)
  is not a duplicate for either k = 0 or k = 1 using the 2nd deduper,
  and if so then the shred is retransmitted.

This achieves a larger capacity than the current LRU cache.
2023-03-20 20:32:23 +00:00
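A sketch of the two-deduper scheme; Deduper here is a stand-in for the bloom-filter deduper in the retransmit stage, not its real API:

```rust
pub struct Deduper;

impl Deduper {
    // Returns true if the key was seen before (and records it).
    pub fn dedup(&self, _key: &[u8]) -> bool {
        false
    }
}

// Retransmit at most 2 distinct shreds per (slot, shred-index, shred-type).
pub fn should_retransmit(shred_payload: &[u8], shred_id: &[u8], d1: &Deduper, d2: &Deduper) -> bool {
    if d1.dedup(shred_payload) {
        return false; // this exact shred was already retransmitted
    }
    // A distinct shred for this id: retransmit if a slot (k = 0 or 1) is free.
    (0u8..2).any(|k| {
        let mut key = shred_id.to_vec();
        key.push(k);
        !d2.dedup(&key)
    })
}
```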
Jeff Biseda 20614fa746
restore timestamp() in find_missing_indexes (#30345) 2023-02-15 16:12:36 -08:00
steviez 328b674edc
Remove recursive read lock that could deadlock Blockstore (#30203)
This deadlock could only occur on nodes that call
Blockstore::get_rooted_block(). Regular validators don't call this
function; RPC nodes and nodes that have BigTableUploadService enabled
do.

Blockstore::get_rooted_block() grabs a read lock on lowest_cleanup_slot
right at the start to check if the block has been cleaned up, and to
ensure it doesn't get cleaned up during execution. As part of the
callstack of get_rooted_block(), Blockstore::get_completed_ranges() will
get called, which also grabs a read lock on lowest_cleanup_slot.

If LedgerCleanupService attempts to grab a write lock between the read
lock calls, we could hit a deadlock if priority is given to the write
lock request in this scenario.

This change removes the call to get the read lock in
get_completed_ranges(). The lock was only held for the scope of this
function, which performs a single rocksdb read, so it is not needed. This does
mean that a different error will be returned if the requested slot was
below lowest_cleanup_slot. Previously, a BlockstoreError::SlotCleanedUp
would have been thrown; the rocksdb error will be bubbled up now. Note
that callers of get_rooted_block() will still get the SlotCleanedUp
error when appropriate because get_rooted_block() grabs the lock. If the
slot is unavailable, it will return immediately. If the slot is
available, get_rooted_block() holding the lock means the slot will
remain available.
2023-02-13 17:33:24 -06:00
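A minimal sketch of the hazard: re-acquiring a read lock on the same RwLock can deadlock if a writer queues in between and the lock favors writers (std::sync::RwLock's fairness is platform-dependent). Function names mirror the ones above but the bodies are illustrative:

```rust
use std::sync::RwLock;

fn get_rooted_block(lock: &RwLock<u64>) {
    let _outer = lock.read().unwrap(); // held for the whole call
    // ... LedgerCleanupService calls lock.write() here and blocks ...
    get_completed_ranges(lock); // second read may now queue behind the writer
}

fn get_completed_ranges(lock: &RwLock<u64>) {
    let _inner = lock.read().unwrap(); // potential deadlock
}
```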
Ryo Onodera 3e6162e69e
Add address lookup tables to minimized snapshot (#30158)
* Add address lookup tables to minimized snapshot

* Add comment for future posterity

* Add reference to the issue

* Adjust comment a bit

Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>

---------

Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
2023-02-10 14:46:02 +09:00
Jeff Biseda 180273b97d
defer HighestShred repairs during shred propagation threshold (#30142) 2023-02-09 14:57:55 -08:00
steviez d3dab24bbe
chore: Use `i` over `ix` variable name when naming worker threads (#30206) 2023-02-09 01:24:57 +00:00
Illia Bobyr 6bc566d802
doc: ledger: Document `CompletedDataSetInfo` (#29998) 2023-02-07 21:19:07 -08:00
Ryo Onodera 40bbf99c74
Add fully-reproducible online tracer for banking (#29196)
* Add fully-reproducible online tracer for banking

* Don't use eprintln!()...

* Update programs/sbf/Cargo.lock...

* Remove meaningless assert_eq

* Group test-only code under aptly named mod

* Remove needless overflow handling in receive_until

* Delay stat aggregation as it's possible now

* Use Cow to avoid needless heap allocs

* Properly consume metrics action as soon as hold

* Trace UnprocessedTransactionStorage::len() instead

* Loosen joining api over type safety for replaystage

* Introduce hash event to override these when simulating

* Use serde_with/serde_as instead of hacky workaround

* Update another Cargo.lock...

* Add detailed comment for Packet::buffer serialize

* Rename sender_overhead_minimized_receiver_loop()

* Use type inference for TraceError

* Another minor rename

* Retire now useless ForEach to simplify code

* Use type alias as much as possible

* Properly translate and propagate tracing errors

* Clarify --enable-banking-trace with better naming

* Consider unclean (signal-based) node restarts..

* Tweak logging and cli

* Remove Bank events as it's not needed anymore

* Make tpu own banking tracer thread

* Reduce diff a bit..

* Use latest serde_with

* Finally use the published rolling-file crate

* Make test code change more consistent

* Revive dead and non-terminating test code path...

* Dispose of batches early now that it's possible

* Split off thread handle very early at ::new()

* Tweak message for TooSmallDirByteLimit

* Remove excessive indirection

* Remove needless pub from ::channel()

* Clarify test comments

* Avoid needless event creation if tracer is disabled

* Write tests around file rotation and spill-over

* Remove unneeded PathBuf::clone()s...

* Introduce inner struct instead of tuple...

* Remove unused enum BankStatus...

* Avoid .unwrap() for the case of disabled tracer...
2023-01-25 21:54:38 +09:00
behzad nouri 272e667cb2
deprecates Pubkey::new in favor of Pubkey::{,try_}from (#29805)
The commit deprecates Pubkey::new which lacks type-safety and instead
implements TryFrom<&[u8]> and TryFrom<Vec<u8>> for Pubkey.
2023-01-21 18:06:27 +00:00
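The commit states that Pubkey now implements TryFrom<&[u8]> and TryFrom<Vec<u8>>; a small usage sketch:

```rust
use solana_sdk::pubkey::Pubkey;

// Pubkey::new panicked on wrong-length input; the TryFrom conversion turns
// malformed input into a recoverable error instead.
fn parse_pubkey(bytes: &[u8]) -> Option<Pubkey> {
    Pubkey::try_from(bytes).ok()
}
```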
Brennan Watt aa40c2b712
Increase turbine propagation const (#29742)
* Increase turbine propagation const

Value is used as a delay threshold for issuing shred repairs, and analysis shows we are overly aggressive in requesting repairs. Shreds show up via turbine before the repair completes the vast majority of the time.

* Use Duration type for MAX_TURBINE_PROPAGATION
2023-01-17 15:01:00 -08:00
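A sketch of the Duration-typed constant; the 100ms value here is a placeholder, not the tuned constant from the commit:

```rust
use std::time::Duration;

const MAX_TURBINE_PROPAGATION: Duration = Duration::from_millis(100);

// Defer a repair request until turbine has had time to deliver the shred.
fn should_request_repair(since_first_shred: Duration) -> bool {
    since_first_shred > MAX_TURBINE_PROPAGATION
}
```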
steviez 2b88401ef7
chore: Cleanup and document a Blockstore chaining test (#29705) 2023-01-14 03:04:53 -06:00
Illia Bobyr 59fde130d6
ledger/blockstore: PerfSampleV2: num_non_vote_transactions (#29404)
Store non-vote transaction counts that are now recorded by the banks
into the `blockstore`.

`SamplePerformanceService` now populates `PerfSampleV2` with counts from
the banks.
2023-01-12 19:14:04 -08:00
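An abridged sketch of the new sample shape; only num_non_vote_transactions is named by the commit, and the remaining fields are assumed from the V1 sample:

```rust
#[derive(Debug, Default)]
pub struct PerfSampleV2 {
    pub num_slots: u64,
    pub num_transactions: u64,
    pub num_non_vote_transactions: u64, // newly recorded by the banks
    pub sample_period_secs: u16,
}
```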
behzad nouri 283a2b1540
removes #[allow(clippy::same_item_push)] (#29543) 2023-01-06 17:32:26 +00:00
behzad nouri d87128e02c
fixes errors from clippy::needless_borrow (#29535)
https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
2023-01-05 18:21:56 +00:00
behzad nouri 5c9beef498
fixes errors from clippy::useless_conversion (#29534)
https://rust-lang.github.io/rust-clippy/master/index.html#useless_conversion
2023-01-05 18:05:32 +00:00
steviez ff8bb5362c
Remove repetitive logic in SlotMeta first insert detection logic (#29153) 2022-12-15 17:38:27 -06:00
behzad nouri 9524c9dbff
patches errors from clippy::uninlined_format_args
https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
2022-12-06 19:32:15 +00:00
steviez 01cd55a27a
Change SlotMeta is_connected bool to bitflags (#29001)
We currently use the is_connected field to be able to signal to
ReplayStage that a slot has replayable updates. It was discovered that
this functionality is effectively broken, and that is_connected is never
true. In order to convey this information to ReplayStage more
effectively, we need extra state information so this PR changes the
existing bool to bitflags with two bits.

From a compatibility standpoint, the is_connected bool was already
occupying one byte in the serialized SlotMeta in blockstore. Thus, the
change from a bool to bitflags still "fits" in that one byte allotment.

In consideration of a case where a client may wish to downgrade software
and use the same ledger, deserializing the bitflags into a bool could
fail if the new bit is set. As such, this PR introduces the second bit
field, but does not set it anywhere. Once clusters have mass adopted a
software version with this PR, a subsequent change to actually set and
use the new field can be introduced.
2022-12-01 14:42:35 -06:00
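A sketch of the two-bit layout, assuming the bitflags crate; the flag names are illustrative:

```rust
use bitflags::bitflags;

bitflags! {
    // The flags occupy the same single byte the old bool used, keeping
    // SlotMeta's serialized size unchanged.
    pub struct ConnectedFlags: u8 {
        const CONNECTED        = 0b01; // the old is_connected meaning
        const PARENT_CONNECTED = 0b10; // introduced now, set in a later release
    }
}

// An older release deserializing this byte as a bool only understands 0 or 1,
// which is why the new bit is defined here but not yet set anywhere.
fn downgrade_compatible(byte: u8) -> bool {
    byte & !ConnectedFlags::CONNECTED.bits() == 0
}
```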
steviez 3c42c87098
Remove obsoleted return value from Blockstore insert shred method (#28992) 2022-12-01 11:17:46 -06:00
steviez b6dce6cf3b
Move BlockstoreInsertionMetrics field update to blockstore.rs (#28991)
The num_repair field is the only blockstore insertion metric being
updated outside of the Blockstore::insert() call chain; move the update
into insert() with the rest of the fields in the
BlockstoreInsertionMetrics struct.
2022-11-30 11:46:35 -06:00
Brennan Watt 9a6ab5e7fe
Distinguish turbine vs repair insertion metrics (#28980) 2022-11-30 09:03:53 -08:00
Brooks Prumo d1ba42180d
clippy for rust 1.65.0 (#28765) 2022-11-09 19:39:38 +00:00
steviez 2272fd807e
Remove Blockstore manual compaction code (#28409)
The manual Blockstore compaction that was being initiated from
LedgerCleanupService has been disabled for quite some time in favor of
several optimizations.

Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
2022-10-28 10:39:00 +02:00
Justin Starry 2d8665d307
Record inner instruction stack height (#28430)
* Record inner instruction stack height

* fix sbf tests

* feedback
2022-10-26 10:37:44 +08:00
steviez 60f6e24b76
Make Blockstore::get_entries_in_data_block() use multi_get() (#28245) 2022-10-09 15:34:03 -04:00
steviez 49dbae7e53
Use VecDeque as a queue instead of Vec (#28190) 2022-10-03 22:55:59 -05:00
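A small illustration of the design choice: for FIFO work, VecDeque::pop_front() is O(1), whereas Vec::remove(0) shifts every remaining element on each dequeue.

```rust
use std::collections::VecDeque;

fn drain_in_order(mut queue: VecDeque<u64>) -> u64 {
    let mut total = 0;
    // pop_front consumes items in insertion order without shifting elements.
    while let Some(item) = queue.pop_front() {
        total += item;
    }
    total
}
```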
Yueh-Hsuan Chiang 6b17bee5a8
Remove the const default for RocksFifo (#27965)
#### Summary of Changes
Removes the constant default for ShredStorageType::RocksFifo
as the shred storage size is either user-specified or derived
from --limit-ledger-size in #27459.
2022-10-01 15:10:54 -07:00
steviez f38ed1c266
Use more descriptive variable names in blockstore chaining tests (#28131) 2022-09-29 10:24:09 -05:00