Commit Graph

337 Commits

Author SHA1 Message Date
Alexander Meißner ee2c2ef6c7
Cleanup - require_static_program_ids_in_transaction (#31767)
require_static_program_ids_in_transaction
2023-06-07 17:12:41 +02:00
Ashwin Sekar 3e8f5bad81
refactor: highest_cluster_confirmed_root -> highest_super_majority_root (#31619) 2023-05-14 00:42:03 -07:00
Ashwin Sekar ef75f1cb4e
Add ancestor hashes to state machine (#31627)
* Notify replay of pruned duplicate confirmed slots

* Ingest replay signal and run ancestor hashes for pruned

* Forward PDC to ancestor hashes and ingest pruned dumps from ancestor hashes service

* Add local-cluster test
2023-05-13 02:05:44 -07:00
steviez 0eec1ad57f
chore: Update Blockstore::get_slots_since() doc comments (#31134)
Additionally, change the function definition to use Slot instead of u64.
Slot is defined as an alias of u64 so these are functionally equivalent,
but using Slot over u64 is more expressive of the intent.
2023-04-11 02:39:31 -05:00
steviez 814de50f2a
chore: Variable rename `height` ==> `slot` in blockstore function (#31132)
Slots refer to time windows where a block could be produced whereas
height refers to how many blocks are actually in a fork. This function
is operating on a list of slots, so the use of "height" is incorrect and
misleading.
2023-04-11 02:38:20 -05:00
Illia Bobyr a1149ecafe
make_slot_entries(): Use unique hashes (#30627) 2023-03-30 19:36:30 -07:00
steviez 8db35eb7e5
ledger-tool: Add flag to find non-vote optimistic slots (#30580)
In cluster restart scenarios, it is desirable to know the latest
optimistic slot(s), which the subcommand latest-optimistic-slots will
return. However, it is also useful to know whether slots contain only
votes or if they contain votes and user transactions.

This PR adds an extra column of output to show whether an optimistically
confirmed slot is vote only (contains zero non-vote transactions).
Additionally, a flag has been added to enable filtering output to
exclude vote only slots.
2023-03-23 14:13:41 -07:00
steviez bc933c63ce
Fix SlotMeta connected tracking (#28069)
Fix SlotMeta is_connected tracking

The tracking of connected status was previously based upon an assumption
that would be practically false for all validators. The connected status
of slots played into whether Blockstore would signal ReplayStage that it
had new shreds ready to be replayed. Prior to the change, we would never
signal and ReplayStage would always wait the entire duration of a 100ms
timeout before restarting its' main processing loop.

This commit introduces a change where we mark snapshot slots as
connected. A validator may not have a path all the way back to genesis
itself; however, snapshots are taken at known roots so we extend the
connected status to these slots. Once a node has been bootstrapped once
to have is connected, the logic persists in Blockstore such that all
children on the main fork also get their connected status updated
properly.
2023-03-21 20:17:58 +08:00
behzad nouri 5d9aba5548
increases retransmit-stage deduper capacity and reset-cycle (#30758)
For duplicate block detection, for each (slot, shred-index, shred-type)
we need to allow 2 different shreds to be retransmitted.
The commit implements this using two bloom-filter dedupers:
* Shreds are deduplicated using the 1st deduper.
* If a shred is not a duplicate, then we check if:
      (slot, shred-index, shred-type, k)
  is not a duplicate for either k = 0  or k = 1 using the 2nd deduper,
  and if so then the shred is retransmitted.

This allows to achieve larger capactiy compared to current LRU-cache.
2023-03-20 20:32:23 +00:00
Jeff Biseda 20614fa746
restore timestamp() in find_missing_indexes (#30345) 2023-02-15 16:12:36 -08:00
steviez 328b674edc
Remove recursive read lock that could deadlock Blockstore (#30203)
This deadlock could only occur on nodes that call
Blockstore::get_rooted_block(). Regular validators don't call this
function, RPC nodes and nodes that have BigTableUploadService enabled
do.

Blockstore::get_rooted_block() grabs a read lock on lowest_cleanup_slot
right at the start to check if the block has been cleaned up, and to
ensure it doesn't get cleaned up during execution. As part of the
callstack of get_rooted_block(), Blockstore::get_completed_ranges() will
get called, which also grabs a read lock on lowest_cleanup_slot.

If LedgerCleanupService attempts to grab a write lock between the read
lock calls, we could hit a deadlock if priority is given to the write
lock request in this scenario.

This change removes the call to get the read lock in
get_completed_ranges(). The lock is only held for the scope of this
function, which is a single rocksdb read and thus not needed. This does
mean that a different error will be returned if the requested slot was
below lowest_cleanup_slot. Previously, a BlockstoreError::SlotCleanedUp
would have been thrown; the rocksdb error will be bubbled up now. Note
that callers of get_rooted_block() will still get the SlotCleanedUp
error when appropriate because get_rooted_block() grabs the lock. If the
slot is unavailable, it will return immediately. If the slot is
available, get_rooted_block() holding the lock means the slot will
remain available.
2023-02-13 17:33:24 -06:00
Ryo Onodera 3e6162e69e
Add address lookup tables to minimized snapshot (#30158)
* Add address lookup tables to minimized snapshot

* Add comment for future posterity

* Add reference to the issue

* Adjust comment a bit

Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>

---------

Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
2023-02-10 14:46:02 +09:00
Jeff Biseda 180273b97d
defer HighestShred repairs during shred propagation threshold (#30142) 2023-02-09 14:57:55 -08:00
steviez d3dab24bbe
chore: Use `i` over `ix` variable name when naming worker threads (#30206) 2023-02-09 01:24:57 +00:00
Illia Bobyr 6bc566d802
doc: ledger: Document `CompletedDataSetInfo` (#29998) 2023-02-07 21:19:07 -08:00
Ryo Onodera 40bbf99c74
Add fully-reproducible online tracer for banking (#29196)
* Add fully-reproducible online tracer for banking

* Don't use eprintln!()...

* Update programs/sbf/Cargo.lock...

* Remove meaningless assert_eq

* Group test-only code under aptly named mod

* Remove needless overflow handling in receive_until

* Delay stat aggregation as it's possible now

* Use Cow to avoid needless heap allocs

* Properly consume metrics action as soon as hold

* Trace UnprocessedTransactionStorage::len() instead

* Loosen joining api over type safety for replaystage

* Introce hash event to override these when simulating

* Use serde_with/serde_as instead of hacky workaround

* Update another Cargo.lock...

* Add detailed comment for Packet::buffer serialize

* Rename sender_overhead_minimized_receiver_loop()

* Use type interference for TraceError

* Another minor rename

* Retire now useless ForEach to simplify code

* Use type alias as much as possible

* Properly translate and propagate tracing errors

* Clarify --enable-banking-trace with better naming

* Consider unclean (signal-based) node restarts..

* Tweak logging and cli

* Remove Bank events as it's not needed anymore

* Make tpu own banking tracer thread

* Reduce diff a bit..

* Use latest serde_with

* Finally use the published rolling-file crate

* Make test code change more consistent

* Revive dead and non-terminating test code path...

* Dispose batches early now that possible

* Split off thread handle very early at ::new()

* Tweak message for TooSmallDirByteLimitl

* Remove too much of indirection

* Remove needless pub from ::channel()

* Clarify test comments

* Avoid needless event creation if tracer is disabled

* Write tests around file rotation and spill-over

* Remove unneeded PathBuf::clone()s...

* Introduce inner struct instead of tuple...

* Remove unused enum BankStatus...

* Avoid .unwrap() for the case of disabled tracer...
2023-01-25 21:54:38 +09:00
behzad nouri 272e667cb2
deprecates Pubkey::new in favor of Pubkey::{,try_}from (#29805)
The commit deprecates Pubkey::new which lacks type-safety and instead
implements TryFrom<&[u8]> and TryFrom<Vec<u8>> for Pubkey.
2023-01-21 18:06:27 +00:00
Brennan Watt aa40c2b712
Increase turbine propagation const (#29742)
* Increase turbine propagation const

Value is used as a delay threshold for issuing shred repairs and analysis is showing we are overly aggressive in requesting repairs. Shreds show up via turbine before the repair completes the vast majority of the time

* Use Duration type for MAX_TURBINE_PROPAGATION
2023-01-17 15:01:00 -08:00
steviez 2b88401ef7
chore: Cleanup and document a Blockstore chaining test (#29705) 2023-01-14 03:04:53 -06:00
Illia Bobyr 59fde130d6
ledger/blockstore: PerfSampleV2: num_non_vote_transactions (#29404)
Store non-vote transaction counts that are now recorded by the banks
into the `blockstore`.

`SamplePerformanceService` now populates `PerfSampleV2` with counts from
the banks.
2023-01-12 19:14:04 -08:00
behzad nouri 283a2b1540
removes #[allow(clippy::same_item_push)] (#29543) 2023-01-06 17:32:26 +00:00
behzad nouri d87128e02c
fixes errors from clippy::needless_borrow (#29535)
https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
2023-01-05 18:21:56 +00:00
behzad nouri 5c9beef498
fixes errors from clippy::useless_conversion (#29534)
https://rust-lang.github.io/rust-clippy/master/index.html#useless_conversion
2023-01-05 18:05:32 +00:00
steviez ff8bb5362c
Remove repetitive logic in SlotMeta first insert detection logic (#29153) 2022-12-15 17:38:27 -06:00
behzad nouri 9524c9dbff patches errors from clippy::uninlined_format_args
https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
2022-12-06 19:32:15 +00:00
steviez 01cd55a27a
Change SlotMeta is_connected bool to bitflags (#29001)
We currently use the is_connected field to be able to signal to
ReplayStage that a slot has replayable updates. It was discovered that
this functionality is effectively broken, and that is_connected is never
true. In order to convey this information to ReplayStage more
effectively, we need extra state information so this PR changes the
existing bool to bitflags with two bits.

From a compatibility standpoint, the is_connected bool was already
occupying one byte in the serialized SlotMeta in blockstore. Thus, the
change from a bool to bitflags still "fits" in that one byte allotment.

In consideration of a case where a client may wish to downgrade software
and use the same ledger, deserializing the bitflags into a bool could
fail if the new bit is set. As such, this PR introduces the second bit
field, but does not set it anywhere. Once clusters have mass adopted a
software version with this PR, a subsequent change to actually set and
use the new field can be introduced.
2022-12-01 14:42:35 -06:00
steviez 3c42c87098
Remove obsoleted return value from Blockstore insert shred method (#28992) 2022-12-01 11:17:46 -06:00
steviez b6dce6cf3b
Move BlockstoreInsertionMetrics field update to blockstore.rs (#28991)
The num_repair field is only blockstore insertion metric being updated
outside of Blockstore::insert() call chain; move the update to insert()
with the rest of the fields in BlockstoreInsertionMetrics struct.
2022-11-30 11:46:35 -06:00
Brennan Watt 9a6ab5e7fe
Distinguish turbine vs repair insertion metrics (#28980) 2022-11-30 09:03:53 -08:00
Brooks Prumo d1ba42180d
clippy for rust 1.65.0 (#28765) 2022-11-09 19:39:38 +00:00
steviez 2272fd807e
Remove Blockstore manual compaction code (#28409)
The manual Blockstore compaction that was being initiated from
LedgerCleanupService has been disabled for quite some time in favor of
several optimizations.

Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
2022-10-28 10:39:00 +02:00
Justin Starry 2d8665d307
Record inner instruction stack height (#28430)
* Record inner instruction stack height

* fix sbf tests

* feedback
2022-10-26 10:37:44 +08:00
steviez 60f6e24b76
Make Blockstore::get_entries_in_data_block() use multi_get() (#28245) 2022-10-09 15:34:03 -04:00
steviez 49dbae7e53
Use VecDeque as a queue instead of Vec (#28190) 2022-10-03 22:55:59 -05:00
Yueh-Hsuan Chiang 6b17bee5a8
Remove the const default for RocksFifo (#27965)
#### Summary of Changes
Removes the constant default for ShredStorageType::RocksFifo
as the shred storage size is either user-specified or derived
from --limit-ledger-size in #27459.
2022-10-01 15:10:54 -07:00
steviez f38ed1c266
Use more descriptive variable names in blockstore chaining tests (#28131) 2022-09-29 10:24:09 -05:00
behzad nouri 72537e7e07
bypasses rayon thread-pool for single entry batches (#28077)
With no parallelization, thread-pool only adds overhead.
2022-09-26 21:32:58 +00:00
behzad nouri f49beb0cbc
caches reed-solomon encoder/decoder instance (#27510)
ReedSolomon::new(...) initializes a matrix and a data-decode-matrix cache:
https://github.com/rust-rse/reed-solomon-erasure/blob/273ebbced/src/core.rs#L460-L466

In order to cache this computation, this commit caches the reed-solomon
encoder/decoder instance for each (data_shards, parity_shards) pair.
2022-09-25 18:09:47 +00:00
behzad nouri 45e26574f3
removes redundant shred.sanitize() from blockstore (#28016)
Shreds received from other nodes over the socket are sanitized when the
payload is deserialized:
https://github.com/solana-labs/solana/blob/315707504/ledger/src/shred/legacy.rs#L137
https://github.com/solana-labs/solana/blob/315707504/ledger/src/shred/legacy.rs#L77
https://github.com/solana-labs/solana/blob/315707504/ledger/src/shred/merkle.rs#L355
https://github.com/solana-labs/solana/blob/315707504/ledger/src/shred/merkle.rs#L439

Similarly, shreds recovered from erasure codes are also sanitized at
deserialization:
https://github.com/solana-labs/solana/blob/f02fe9c7e/ledger/src/shredder.rs#L330
or explicitly so for Merkle shreds:
https://github.com/solana-labs/solana/blob/f02fe9c7e/ledger/src/shred/merkle.rs#L753

Shreds generated locally by the node itself during its leader slots do
not need to be sanitized.

So sanitizing shreds in blockstore is redundant and wasteful. In
particular this becomes more wasteful with Merkle shreds because
sanitizing shreds would require verifying Merkle proof.
As such the commit removes redundant shred.sanitize() from blockstore.
2022-09-24 16:31:50 +00:00
behzad nouri 97c9af4c6b plumbs through flag to generate merkle variant of shreds 2022-09-23 16:45:18 +00:00
steviez e4affb9fea
Add Blockstore::highest_slot() method (#27981) 2022-09-23 04:53:43 -05:00
steviez eaa4787201
Cleanup blockstore test (#27999) 2022-09-22 23:37:47 +00:00
behzad nouri 9a57c64f21
patches clippy errors from new rust nightly release (#27996) 2022-09-22 22:23:03 +00:00
Yueh-Hsuan Chiang cccade42b3
Optimize get_slots_since() using the batched version of multi_get() (#27686)
#### Problem
The current implementation of get_slots_since() invokes multiple rocksdb::get().
As a result, each get() operation may end up requiring one disk read.  This leads
to poor performance of get_slots_since described in #24878.

#### Summary of Changes
This PR makes get_slots_since() use the batched version of multi_get() instead,
which allows multiple get operations to be processed in batch so that they can
be answered with fewer disk reads.
2022-09-19 21:52:13 -07:00
Yueh-Hsuan Chiang ba3d9cd325
Add LedgerColumn::multi_get() (#26354)
#### Problem
Blockstore operations such as get_slots_since() issues multiple rocksdb::get()
at once which is not optimal for performance.

#### Summary of Changes
This PR adds LedgerColumn::multi_get() based on rocksdb::batched_multi_get(),
the optimized version of multi_get() where get requests are processed in batch
to minimize read I/O.
2022-09-12 15:01:22 -07:00
Yueh-Hsuan Chiang ed00365101
Add ledger tool command print-file-metadata (#26790)
Add ledger-tool command print-file-metadata

#### Summary of Changes
This PR adds a ledger tool subcommand print-file-metadata.
```
USAGE:
    solana-ledger-tool print-file-metadata [FLAGS] [OPTIONS] [SST_FILE_NAME]

    Prints the metadata of the specified ledger-store file.
    If no file name is unspecified, then it will print the metadata of all ledger files
```
2022-09-06 21:46:35 -07:00
Yueh-Hsuan Chiang c62aef6e02
Add code comments for lowest_cleanup_slot functions (#27497)
#### Summary of Changes
Add code comments for lowest_cleanup_slot related functions to improve
the code readability for the consistency between blockstore purge logic
and the read side.
2022-09-06 16:03:42 -07:00
Yueh-Hsuan Chiang 6d070cee08
Fix the boundary inconsistency between delete_file_in_range and delete_range (#27201)
#### Problem
RocksDB's delete_range applies to [from, to) while delete_file_in_range
applies to [from, to] by default, and the rust-rocksdb api does not include
the option to make delete_file_in_range apply to [from, to).  Such inconsistency
might cause `blockstore::run_purge` to produce an inconsistent result as it
invokes both delete_range and delete_file_in_range.

#### Summary of Changes
This PR makes all our purge / delete related functions to be inclusive
on both starting and ending slots.
2022-08-30 21:38:54 -07:00
Brennan Watt e4a7d01e10
Rust v1.63 (#27303)
* Upgrade to Rust v1.63.0

* Add nightly_clippy_allows

* Resolve some new clippy nightly lints

* Increase QUIC packets completion timeout

* Update quinn-udp crate

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-08-22 18:01:03 -07:00
Michael Vines 3f4731b37f Standardize thread names
Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use Camel case. It's more character dense than Snake or Kebab case
2022-08-20 07:49:39 -07:00