Commit Graph

317 Commits

Author SHA1 Message Date
behzad nouri 283a2b1540
removes #[allow(clippy::same_item_push)] (#29543) 2023-01-06 17:32:26 +00:00
behzad nouri d87128e02c
fixes errors from clippy::needless_borrow (#29535)
https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
2023-01-05 18:21:56 +00:00
behzad nouri 5c9beef498
fixes errors from clippy::useless_conversion (#29534)
https://rust-lang.github.io/rust-clippy/master/index.html#useless_conversion
2023-01-05 18:05:32 +00:00
steviez ff8bb5362c
Remove repetitive logic in SlotMeta first insert detection logic (#29153) 2022-12-15 17:38:27 -06:00
behzad nouri 9524c9dbff patches errors from clippy::uninlined_format_args
https://rust-lang.github.io/rust-clippy/master/index.html#uninlined_format_args
2022-12-06 19:32:15 +00:00
steviez 01cd55a27a
Change SlotMeta is_connected bool to bitflags (#29001)
We currently use the is_connected field to be able to signal to
ReplayStage that a slot has replayable updates. It was discovered that
this functionality is effectively broken, and that is_connected is never
true. In order to convey this information to ReplayStage more
effectively, we need extra state information so this PR changes the
existing bool to bitflags with two bits.

From a compatibility standpoint, the is_connected bool was already
occupying one byte in the serialized SlotMeta in blockstore. Thus, the
change from a bool to bitflags still "fits" in that one byte allotment.

In consideration of a case where a client may wish to downgrade software
and use the same ledger, deserializing the bitflags into a bool could
fail if the new bit is set. As such, this PR introduces the second bit
field, but does not set it anywhere. Once clusters have mass adopted a
software version with this PR, a subsequent change to actually set and
use the new field can be introduced.
2022-12-01 14:42:35 -06:00
steviez 3c42c87098
Remove obsoleted return value from Blockstore insert shred method (#28992) 2022-12-01 11:17:46 -06:00
steviez b6dce6cf3b
Move BlockstoreInsertionMetrics field update to blockstore.rs (#28991)
The num_repair field is only blockstore insertion metric being updated
outside of Blockstore::insert() call chain; move the update to insert()
with the rest of the fields in BlockstoreInsertionMetrics struct.
2022-11-30 11:46:35 -06:00
Brennan Watt 9a6ab5e7fe
Distinguish turbine vs repair insertion metrics (#28980) 2022-11-30 09:03:53 -08:00
Brooks Prumo d1ba42180d
clippy for rust 1.65.0 (#28765) 2022-11-09 19:39:38 +00:00
steviez 2272fd807e
Remove Blockstore manual compaction code (#28409)
The manual Blockstore compaction that was being initiated from
LedgerCleanupService has been disabled for quite some time in favor of
several optimizations.

Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
2022-10-28 10:39:00 +02:00
Justin Starry 2d8665d307
Record inner instruction stack height (#28430)
* Record inner instruction stack height

* fix sbf tests

* feedback
2022-10-26 10:37:44 +08:00
steviez 60f6e24b76
Make Blockstore::get_entries_in_data_block() use multi_get() (#28245) 2022-10-09 15:34:03 -04:00
steviez 49dbae7e53
Use VecDeque as a queue instead of Vec (#28190) 2022-10-03 22:55:59 -05:00
Yueh-Hsuan Chiang 6b17bee5a8
Remove the const default for RocksFifo (#27965)
#### Summary of Changes
Removes the constant default for ShredStorageType::RocksFifo
as the shred storage size is either user-specified or derived
from --limit-ledger-size in #27459.
2022-10-01 15:10:54 -07:00
steviez f38ed1c266
Use more descriptive variable names in blockstore chaining tests (#28131) 2022-09-29 10:24:09 -05:00
behzad nouri 72537e7e07
bypasses rayon thread-pool for single entry batches (#28077)
With no parallelization, thread-pool only adds overhead.
2022-09-26 21:32:58 +00:00
behzad nouri f49beb0cbc
caches reed-solomon encoder/decoder instance (#27510)
ReedSolomon::new(...) initializes a matrix and a data-decode-matrix cache:
https://github.com/rust-rse/reed-solomon-erasure/blob/273ebbced/src/core.rs#L460-L466

In order to cache this computation, this commit caches the reed-solomon
encoder/decoder instance for each (data_shards, parity_shards) pair.
2022-09-25 18:09:47 +00:00
behzad nouri 45e26574f3
removes redundant shred.sanitize() from blockstore (#28016)
Shreds received from other nodes over the socket are sanitized when the
payload is deserialized:
https://github.com/solana-labs/solana/blob/315707504/ledger/src/shred/legacy.rs#L137
https://github.com/solana-labs/solana/blob/315707504/ledger/src/shred/legacy.rs#L77
https://github.com/solana-labs/solana/blob/315707504/ledger/src/shred/merkle.rs#L355
https://github.com/solana-labs/solana/blob/315707504/ledger/src/shred/merkle.rs#L439

Similarly, shreds recovered from erasure codes are also sanitized at
deserialization:
https://github.com/solana-labs/solana/blob/f02fe9c7e/ledger/src/shredder.rs#L330
or explicitly so for Merkle shreds:
https://github.com/solana-labs/solana/blob/f02fe9c7e/ledger/src/shred/merkle.rs#L753

Shreds generated locally by the node itself during its leader slots do
not need to be sanitized.

So sanitizing shreds in blockstore is redundant and wasteful. In
particular this becomes more wasteful with Merkle shreds because
sanitizing shreds would require verifying Merkle proof.
As such the commit removes redundant shred.sanitize() from blockstore.
2022-09-24 16:31:50 +00:00
behzad nouri 97c9af4c6b plumbs through flag to generate merkle variant of shreds 2022-09-23 16:45:18 +00:00
steviez e4affb9fea
Add Blockstore::highest_slot() method (#27981) 2022-09-23 04:53:43 -05:00
steviez eaa4787201
Cleanup blockstore test (#27999) 2022-09-22 23:37:47 +00:00
behzad nouri 9a57c64f21
patches clippy errors from new rust nightly release (#27996) 2022-09-22 22:23:03 +00:00
Yueh-Hsuan Chiang cccade42b3
Optimize get_slots_since() using the batched version of multi_get() (#27686)
#### Problem
The current implementation of get_slots_since() invokes multiple rocksdb::get().
As a result, each get() operation may end up requiring one disk read.  This leads
to poor performance of get_slots_since described in #24878.

#### Summary of Changes
This PR makes get_slots_since() use the batched version of multi_get() instead,
which allows multiple get operations to be processed in batch so that they can
be answered with fewer disk reads.
2022-09-19 21:52:13 -07:00
Yueh-Hsuan Chiang ba3d9cd325
Add LedgerColumn::multi_get() (#26354)
#### Problem
Blockstore operations such as get_slots_since() issues multiple rocksdb::get()
at once which is not optimal for performance.

#### Summary of Changes
This PR adds LedgerColumn::multi_get() based on rocksdb::batched_multi_get(),
the optimized version of multi_get() where get requests are processed in batch
to minimize read I/O.
2022-09-12 15:01:22 -07:00
Yueh-Hsuan Chiang ed00365101
Add ledger tool command print-file-metadata (#26790)
Add ledger-tool command print-file-metadata

#### Summary of Changes
This PR adds a ledger tool subcommand print-file-metadata.
```
USAGE:
    solana-ledger-tool print-file-metadata [FLAGS] [OPTIONS] [SST_FILE_NAME]

    Prints the metadata of the specified ledger-store file.
    If no file name is unspecified, then it will print the metadata of all ledger files
```
2022-09-06 21:46:35 -07:00
Yueh-Hsuan Chiang c62aef6e02
Add code comments for lowest_cleanup_slot functions (#27497)
#### Summary of Changes
Add code comments for lowest_cleanup_slot related functions to improve
the code readability for the consistency between blockstore purge logic
and the read side.
2022-09-06 16:03:42 -07:00
Yueh-Hsuan Chiang 6d070cee08
Fix the boundary inconsistency between delete_file_in_range and delete_range (#27201)
#### Problem
RocksDB's delete_range applies to [from, to) while delete_file_in_range
applies to [from, to] by default, and the rust-rocksdb api does not include
the option to make delete_file_in_range apply to [from, to).  Such inconsistency
might cause `blockstore::run_purge` to produce an inconsistent result as it
invokes both delete_range and delete_file_in_range.

#### Summary of Changes
This PR makes all our purge / delete related functions to be inclusive
on both starting and ending slots.
2022-08-30 21:38:54 -07:00
Brennan Watt e4a7d01e10
Rust v1.63 (#27303)
* Upgrade to Rust v1.63.0

* Add nightly_clippy_allows

* Resolve some new clippy nightly lints

* Increase QUIC packets completion timeout

* Update quinn-udp crate

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-08-22 18:01:03 -07:00
Michael Vines 3f4731b37f Standardize thread names
Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use Camel case. It's more character dense than Snake or Kebab case
2022-08-20 07:49:39 -07:00
behzad nouri c0b63351ae
recovers merkle shreds from erasure codes (#27136)
The commit
* Identifies Merkle shreds when recovering from erasure codes and
  dispatches specialized code to reconstruct shreds.
* Coding shred headers are added to recovered erasure shards.
* Merkle tree is reconstructed for the erasure batch and added to
  recovered shreds.
* The common signature (for the root of Merkle tree) is attached to all
  recovered shreds.
2022-08-19 21:07:32 +00:00
apfitzge 40b9f2f2be
slots_connected: check if the range is connected (>= ending_slot) (#27152) 2022-08-19 09:33:50 -05:00
Brennan Watt 7573000d87
Revert "Rust v1.63.0 (#27148)" (#27245)
This reverts commit a2e7bdf50a.
2022-08-19 09:19:44 +01:00
HaoranYi 4634fb944c
Use from_secs api to create duration (#27222)
use from_secs api to create duration
2022-08-18 11:06:52 -05:00
Yueh-Hsuan Chiang 6d12bb6ec3
Fix a corner-case panic in get_entries_in_data_block() (#27195)
#### Problem
get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle.  Such
panic was reported in #26980 when the validator serves a high load of RPC calls.

#### Summary of Changes
This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot older than lowest_cleanup_slot.
2022-08-18 02:37:19 -07:00
Brennan Watt a2e7bdf50a
Rust v1.63.0 (#27148)
* Upgrade to Rust v1.63.0

* Add nightly_clippy_allows

* Resolve some new clippy nightly lints

* Increase QUIC packets completion timeout

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-08-17 15:48:33 -07:00
behzad nouri ac91cdab74
removes buffering when generating coding shreds in broadcast (#25807)
Given the 32:32 erasure recovery schema, current implementation requires
exactly 32 data shreds to generate coding shreds for the batch (except
for the final erasure batch in each slot).
As a result, when serializing ledger entries to data shreds, if the
number of data shreds is not a multiple of 32, the coding shreds for the
last batch cannot be generated until there are more data shreds to
complete the batch to 32 data shreds. This adds latency in generating
and broadcasting coding shreds.

In addition, with Merkle variants for shreds, data shreds cannot be
signed and broadcasted until coding shreds are also generated. As a
result *both* code and data shreds will be delayed before broadcast if
we still require exactly 32 data shreds for each batch.

This commit instead always generates and broadcast coding shreds as soon
as there any number of data shreds available. When serializing entries
to shreds:
* if the number of resulting data shreds is less than 32, then more
  coding shreds will be generated so that the resulting erasure batch
  has the same recovery probabilities as a 32:32 batch.
* if the number of data shreds is more than 32, then the data shreds are
  split uniformly into erasure batches with _at least_ 32 data shreds in
  each batch. Each erasure batch will have the same number of code and
  data shreds.

For example:
* If there are 19 data shreds, 27 coding shreds are generated. The
  resulting 19(data):27(code) erasure batch has the same recovery
  probabilities as a 32:32 batch.
* If there are 107 data shreds, they are split into 3 batches of 36:36,
  36:36 and 35:35 data:code shreds each.

A consequence of this change is that code and data shreds indices will
no longer align as there will be more coding shreds than data shreds
(not only in the last batch in each slot but also in the intermediate
ones);
2022-08-11 12:44:27 +00:00
Richard Patel 270315a7f6
transaction-status, storage-proto: add compute_units_consumed (#26528)
* transaction-status, storage-proto: add compute_units_consumed

* fix bpf test

Co-authored-by: Justin Starry <justin@solana.com>
2022-08-06 17:14:31 +00:00
Yueh-Hsuan Chiang dcbda2c6c5
Allow ledger tool to automatically detect the shred compaction style (#26182)
#### Problem
Ledger-tool doesn't support shred-compaction-type other than the default rocksdb level compaction.

#### Summary of Changes
This PR enables ledger-tool to automatically detect the shred-compaction-type of the specified ledger.

#### Test Plan
New ledger-tool tests are added for both level and fifo compactions.
2022-07-19 01:19:17 +08:00
Yueh-Hsuan Chiang 493108f026
Move Blockstore::blockstore_directory() to ShredStorageType (#26179)
#### Summary of Changes
Blockstore::blockstore_directory() function takes a ShredStorageType and returns a path.
As it's a ShredStorageType related function, moving it under ShredStorageType as a
member function is a better fit.

In addition, as we start doing automatic shred compaction type selection under ledger-tool,
it becomes more convenient to reuse the function and const under blockstore_options
namespace instead of blockstore.
2022-07-13 00:05:54 +08:00
apfitzge 1c2f6a4c41
Bugfix: slots_connected snapshot_slot not full (#26506)
* slots_connected check used by ledger-tool should not require a full slot for snapshot slot

* Cleaner Result<Option<>> unwrap/default

* return false if no meta for starting slot

* Add clarifying comments
2022-07-11 13:45:12 -05:00
behzad nouri ba785cf8ab
removes erroneous uses of std::mem::swap (#26536)
All instances should be replace by std::mem::{replace,take},
or just plain assignment.
2022-07-11 11:33:15 +00:00
Jeff Washington (jwash) 1f2e830391
helpful error message when mmap limit can't change (#26450) 2022-07-06 17:31:10 -05:00
steviez 54cd31e9e2
Reduce repeated discard_shreds checks (#26329) 2022-06-30 14:45:10 -05:00
steviez d964774fde
Removing pub keyword from blockstore unit tests (#26334) 2022-06-30 14:41:04 -05:00
behzad nouri 88599fd760
skips shreds deserialization before retransmit (#26230)
Fully deserializing shreds in window-service before sending them to
retransmit stage adds latency to shreds propagation.
This commit instead channels through the payload and relies on only
partial deserialization of a few required fields: slot, shred-index,
shred-type.
2022-06-30 12:13:00 +00:00
behzad nouri 348fe9ebe2
verifies shred slot and parent in fetch stage (#26225)
Shred slot and parent are not verified until window-service where
resources are already wasted to sig-verify and deserialize shreds.
This commit moves above verification to earlier in the pipeline in fetch
stage.
2022-06-28 12:45:50 +00:00
Ryo Onodera cd2878acf9
Avoid to miss to root for local slots before the hard fork (#19912)
* Make sure to root local slots even with hard fork

* Address review comments

* Cleanup a bit

* Further clean up

* Further clean up a bit

* Add comment

* Tweak hard fork reconciliation code placement
2022-06-26 15:14:17 +09:00
Jeff Washington (jwash) bf97a99dca
increase mmap file limit to 1M to match published instructions (#26203) 2022-06-25 19:04:54 -05:00
behzad nouri f534b8981b
maps number of data shreds to erasure batch size (#25917)
In prepration of
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
  erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
  replace and remove entries_to_data_shreds from the public interface.
2022-06-23 13:27:54 +00:00