Packets are at the boundary of the system where, the vast majority of the
time, they are received from an untrusted source. Raw indexing into the
data buffer can open attack vectors if the offsets are invalid, and
validating offsets beforehand is verbose and error prone.
The commit updates the Packet::data() API to take a SliceIndex and to
always return an Option. Call-sites are thereby forced to explicitly
handle the case where the offsets are invalid.
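A minimal sketch of the revised accessor, assuming a fixed-size internal
buffer and a meta.size field tracking the valid prefix (the struct layout
here is illustrative):

use std::slice::SliceIndex;

const PACKET_DATA_SIZE: usize = 1232;

pub struct Meta {
    pub size: usize, // number of valid bytes in the buffer
}

pub struct Packet {
    buffer: [u8; PACKET_DATA_SIZE], // private: no raw indexing from outside
    pub meta: Meta,
}

impl Packet {
    // Returns a reference into the valid region of the buffer, or None
    // if the requested offsets fall outside Packet.meta.size.
    pub fn data<I>(&self, index: I) -> Option<&I::Output>
    where
        I: SliceIndex<[u8]>,
    {
        self.buffer.get(..self.meta.size)?.get(index)
    }
}

For example, packet.data(..4) yields the first four bytes only if at least
four bytes are valid; otherwise the caller gets None instead of a panic or
a read of stale bytes.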
In preparation of
https://github.com/solana-labs/solana/pull/25237
which adds a new shred variant with merkle tree branches, the commit
embeds versioning into the shred binary by encoding a new ShredVariant
type at byte 65 of the payload, replacing the ShredType previously at
this offset.
enum ShredVariant {
    LegacyCode, // 0b0101_1010
    LegacyData, // 0b1010_0101
}
* 0b0101_1010 indicates a legacy coding shred, which is also equal to
ShredType::Code for backward compatibility.
* 0b1010_0101 indicates a legacy data shred, which is also equal to
ShredType::Data for backward compatibility.
Following commits will add merkle variants to this type:
enum ShredVariant {
    LegacyCode, // 0b0101_1010
    LegacyData, // 0b1010_0101
    MerkleCode(/*proof_size:*/ u8), // 0b0100_????
    MerkleData(/*proof_size:*/ u8), // 0b1000_????
}
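A sketch of how the variant could map to and from the single tag byte,
consistent with the bit patterns above (the masks and error handling here
are illustrative):

impl From<ShredVariant> for u8 {
    fn from(shred_variant: ShredVariant) -> u8 {
        match shred_variant {
            ShredVariant::LegacyCode => 0b0101_1010,
            ShredVariant::LegacyData => 0b1010_0101,
            // Low 4 bits carry the merkle proof size.
            ShredVariant::MerkleCode(proof_size) => 0b0100_0000 | (proof_size & 0x0F),
            ShredVariant::MerkleData(proof_size) => 0b1000_0000 | (proof_size & 0x0F),
        }
    }
}

impl TryFrom<u8> for ShredVariant {
    type Error = ();
    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            0b0101_1010 => Ok(ShredVariant::LegacyCode),
            0b1010_0101 => Ok(ShredVariant::LegacyData),
            _ => match byte & 0xF0 {
                0b0100_0000 => Ok(ShredVariant::MerkleCode(byte & 0x0F)),
                0b1000_0000 => Ok(ShredVariant::MerkleData(byte & 0x0F)),
                _ => Err(()),
            },
        }
    }
}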
Fix the pre-check of blockstore slots during load_bank_forks. It now iterates from starting_slot to halt_slot via slot_meta.next_slots to confirm that they are connected.
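Roughly, the connectivity walk could look like this (a hedged sketch;
Blockstore::meta and SlotMeta::next_slots follow the blockstore API, while
the function name and traversal details are illustrative):

use std::collections::{HashSet, VecDeque};

// Confirm halt_slot is reachable from starting_slot by following
// SlotMeta::next_slots links through the blockstore.
fn slots_connected(blockstore: &Blockstore, starting_slot: Slot, halt_slot: Slot) -> bool {
    let mut visited = HashSet::from([starting_slot]);
    let mut queue = VecDeque::from([starting_slot]);
    while let Some(slot) = queue.pop_front() {
        if slot == halt_slot {
            return true;
        }
        if let Ok(Some(meta)) = blockstore.meta(slot) {
            for next in meta.next_slots {
                if next <= halt_slot && visited.insert(next) {
                    queue.push_back(next);
                }
            }
        }
    }
    false
}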
Data shreds are batched into MAX_DATA_SHREDS_PER_FEC_BLOCK shreds for
each erasure batch. If there are residual shreds not making a full
batch, then we cannot generate coding shreds and need to buffer shreds
until there is a full batch; this may add latency to coding shred
generation and broadcast.
In order to evaluate upcoming changes removing this buffering logic,
this commit adds metrics tracking the residual number of data shreds
which do not make a full batch.
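The residual count could be reported with the metrics datapoint macro, for
example (the metric and field names here are illustrative):

use solana_metrics::datapoint_info;

const MAX_DATA_SHREDS_PER_FEC_BLOCK: usize = 32; // illustrative value

// num_data_shreds: data shreds generated for the current batch of entries.
fn report_residual_shreds(num_data_shreds: usize) {
    let residual = num_data_shreds % MAX_DATA_SHREDS_PER_FEC_BLOCK;
    datapoint_info!(
        "shredder-residual-data-shreds",
        ("count", residual as i64, i64)
    );
}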
#### Summary of Changes
Use the new datapoint macro that supports group-by for RocksDB column family metrics.
By using the new macro, we can also remove large chunks of boilerplate code that worked around the previous datapoint macro's lack of group-by support.
#### Problem
blockstore_db.rs and blockstore_metrics.rs have a mutual dependency.
#### Summary of Changes
This PR removes the mutual dependency by moving the option-related code
out of blockstore_db.rs into its new home, blockstore_options.rs.
This also makes the code cleaner.
Indices for code and data shreds of the same slot overlap, so they
have the same random number generator seed when shuffling cluster
nodes for turbine broadcast.
This results in the same propagation path for code and data shreds of
the same index, and effectively a smaller sample size of re-transmitter
nodes. For example, a 32:32 batch (32 code + 32 data shreds) is
retransmitted through _at most_ 32 unique nodes, whereas ideally we want
~64 unique re-transmitters.
This commit adds shred-type to the seed function so that code and data
shreds of the same (slot, index) will (most likely) have different
propagation paths.
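A hedged sketch of a seed that mixes in the shred type (the function name
is illustrative; the exact inputs and their order are an assumption):

use solana_sdk::{hash::hashv, pubkey::Pubkey};

// Derive the 32-byte seed used to shuffle turbine peers. Mixing in
// shred_type separates the propagation paths of code vs data shreds
// that share the same (slot, index).
fn retransmit_seed(slot: u64, index: u32, shred_type: u8, leader: &Pubkey) -> [u8; 32] {
    hashv(&[
        &slot.to_le_bytes(),
        &[shred_type],
        &index.to_le_bytes(),
        leader.as_ref(),
    ])
    .to_bytes()
}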
Bytes past Packet.meta.size are not valid to read from.
The commit makes the buffer field private and instead provides two
methods:
* Packet::data() which returns an immutable reference to the underlying
buffer up to Packet.meta.size. The rest of the buffer is not valid to
read from.
* Packet::buffer_mut() which returns a mutable reference to the entirety
of the underlying buffer to write into. The caller is responsible for
updating Packet.meta.size after writing to the buffer, as sketched below.
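An earlier shape of the two accessors, matching this description and using
the same illustrative Packet layout as above (the SliceIndex-based data()
is the later revision):

impl Packet {
    // Immutable access, restricted to the initialized prefix; bytes past
    // meta.size are never exposed for reading.
    pub fn data(&self) -> &[u8] {
        &self.buffer[..self.meta.size]
    }

    // Mutable access to the whole buffer for writing; the caller must set
    // meta.size afterwards to mark how many bytes are valid.
    pub fn buffer_mut(&mut self) -> &mut [u8] {
        &mut self.buffer[..]
    }
}

A typical call-site (hypothetical; recv stands for any routine that fills
the buffer and returns the number of bytes written):

let size = socket.recv(packet.buffer_mut())?;
packet.meta.size = size;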
Upcoming changes to PacketBatch to support variable-sized packets will
modify the internals of PacketBatch. So, this change removes usage of
the internal packet struct and instead uses accessors (which are
currently just thin wrappers around Vec functions but will change down
the road).
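Under the hood the accessors might initially look like thin Vec wrappers
(an illustrative subset, not the full surface):

pub struct PacketBatch {
    packets: Vec<Packet>, // internals will change for variable-sized packets
}

impl PacketBatch {
    pub fn len(&self) -> usize {
        self.packets.len()
    }

    pub fn is_empty(&self) -> bool {
        self.packets.is_empty()
    }

    pub fn iter(&self) -> std::slice::Iter<'_, Packet> {
        self.packets.iter()
    }

    pub fn push(&mut self, packet: Packet) {
        self.packets.push(packet);
    }
}

Call-sites then say batch.iter() or batch.push(packet) rather than touching
the packets vector directly, so its representation can later be swapped out
without churn.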
#### Problem
The current RocksDB read/write perf metrics do not include the total operation nanos,
so we have to report every field that might contribute to the total operation time.
#### Summary of Changes
This PR includes the total operation nanos in RocksDB's read/write perf metrics and
reduces the number of reported fields.
#### Problem
When the number of RocksDB read/write operations spikes, the reported metrics payload
might exceed the limit (413 Payload Too Large).
#### Summary of Changes
This PR rate-limits the perf sampling of RocksDB read/write operations to at most
once per second, in addition to the existing sampling that is configurable via the
hidden validator argument --rocksdb-perf-sample-interval.
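A hedged sketch of the one-second limit layered on top of the count-based
sampling (names and the exact synchronization scheme are illustrative):

use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{SystemTime, UNIX_EPOCH};

const MIN_REPORT_INTERVAL_MS: u64 = 1000;

// Returns true at most once per MIN_REPORT_INTERVAL_MS across threads.
fn should_report(last_report_ms: &AtomicU64) -> bool {
    let now_ms = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_millis() as u64;
    let last = last_report_ms.load(Ordering::Relaxed);
    now_ms.saturating_sub(last) >= MIN_REPORT_INTERVAL_MS
        && last_report_ms
            .compare_exchange(last, now_ms, Ordering::Relaxed, Ordering::Relaxed)
            .is_ok()
}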
Working towards revising the shred struct to embed versioning so that a
new variant can contain merkle tree hashes of the erasure batch. To ease
the migration, the commit adds more type-safety by distinguishing data
vs code shreds at the type level.
Additionally, having both data and coding headers in each shred is
redundant, as only one is relevant for each shred. The revised shred
type in this commit has only one type-specific header.
https://github.com/solana-labs/solana/blob/c785f1ffc/ledger/src/shred.rs#L198-L203
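The split might look roughly like this (a sketch; the header types follow
the linked code but the field layouts are abridged):

pub enum Shred {
    ShredCode(ShredCode),
    ShredData(ShredData),
}

// Each concrete shred carries the common header plus exactly one
// type-specific header, instead of both.
pub struct ShredCode {
    common_header: ShredCommonHeader,
    coding_header: CodingShredHeader,
    payload: Vec<u8>,
}

pub struct ShredData {
    common_header: ShredCommonHeader,
    data_header: DataShredHeader,
    payload: Vec<u8>,
}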
Adding const_assert_eq (see the example after this list):
* Documents explicitly what the constants are equal to.
* Prevents bugs from the constants silently changing as the code is
updated.
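For example, with the static_assertions crate (the constant names and
asserted values below are illustrative):

use static_assertions::const_assert_eq;

const SIZE_OF_COMMON_SHRED_HEADER: usize = 83;
const SIZE_OF_DATA_SHRED_HEADERS: usize = 88;

// Fails to compile if a later edit silently changes either constant.
const_assert_eq!(
    SIZE_OF_DATA_SHRED_HEADERS,
    SIZE_OF_COMMON_SHRED_HEADER + 5
);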
Added multiple blockstore read threads.
Run the bigtable upload in tokio::spawn context.
Run bigtable tx and tx-by-addr uploads in tokio::spawn context.
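A hedged sketch of dispatching the two uploads as concurrent tasks (the
upload functions are placeholders for the bigtable upload routines, and the
snippet assumes an async fn whose error type absorbs tokio's JoinError):

// Spawn each upload on the tokio runtime so the two proceed
// concurrently instead of serially blocking the caller.
let tx_task = tokio::spawn(upload_confirmed_txs(tx_batch));
let tx_by_addr_task = tokio::spawn(upload_tx_by_addr(tx_by_addr_batch));
// Join both tasks and surface the first error, if any.
let (tx_res, tx_by_addr_res) = tokio::join!(tx_task, tx_by_addr_task);
tx_res??;
tx_by_addr_res??;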
#### Problem
After #25042, each LedgerColumn has its own BlockstoreRocksDbWritePerfMetrics
and BlockstoreRocksDbReadPerfMetrics instance. Since each LedgerColumn has sole
ownership of them, the member field does not need to be wrapped in an Arc.
#### Summary of Changes
Change perf_samples_counter from Arc<AtomicUsize> to AtomicUsize
under BlockstoreRocksDbWritePerfMetrics and BlockstoreRocksDbReadPerfMetrics.
* Add ConfirmedBlockUploadConfig, no behavior changes
* Add comment
* A little DRY cleanup
* Add configurable limit to number of blocks to check in Blockstore and Bigtable before uploading
* Limit blockstore and bigtable look-ahead
* Exit iterator early when reaching ending_slot
* Use rooted_slot_iterator instead of slot_meta_iterator
* Only check blocks in the ledger
* Added additional costs to the block capacity computation, and pushed the alloc of CostModel all the way to the top of the call chain, instead of reallocating
* Fix two compiler errors
* Update block processing to propagate computed costs, rather than re-computing deeper in the call stack
* Clippy fix
* Reformatting fix after merge
* Add CostModel::sum_without_bpf
#### Problem
LedgerColumnOptions contains two fields, perf_read_counter and perf_write_counter,
that are not really options but internal counters.
#### Summary of Changes
This PR introduces BlockstoreRocksDbPerfSamplingStatus, a struct that holds internal
status for RocksDB perf sampling, and moves perf_read_counter and perf_write_counter
out of LedgerColumnOptions.
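The new struct might look roughly like this (the field types are assumed
from the surrounding commits):

use std::sync::atomic::AtomicUsize;

#[derive(Default)]
pub struct BlockstoreRocksDbPerfSamplingStatus {
    // Internal counters driving count-based perf sampling, moved out of
    // LedgerColumnOptions so that struct carries only true options.
    perf_read_counter: AtomicUsize,
    perf_write_counter: AtomicUsize,
}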
A VoteAccount may only wrap an account if the account owner is
solana_vote_program::id, or equivalently this check returns true:
solana_vote_program::check_id(account.owner())
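One way to enforce the invariant is a fallible constructor (a sketch; the
actual wrapper and error types differ):

use solana_sdk::account::{AccountSharedData, ReadableAccount};

pub struct InvalidOwner; // illustrative error type

pub struct VoteAccount(AccountSharedData);

impl TryFrom<AccountSharedData> for VoteAccount {
    type Error = InvalidOwner;

    fn try_from(account: AccountSharedData) -> Result<Self, Self::Error> {
        // Reject accounts not owned by the vote program, so every
        // VoteAccount upholds the owner invariant by construction.
        if !solana_vote_program::check_id(account.owner()) {
            return Err(InvalidOwner);
        }
        Ok(VoteAccount(account))
    }
}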
In addition to the thread_local -> lazy_static change, a number of thread
pools are initialized with get_max_thread_count to achieve parity with the
older code in terms of the number of validator threads.
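The pattern looks roughly like this (the pool name and thread-name prefix
are illustrative):

use lazy_static::lazy_static;
use rayon::ThreadPool;
use solana_rayon_threadlimit::get_max_thread_count;

lazy_static! {
    // One shared pool replacing a pool-per-thread via thread_local;
    // sized with get_max_thread_count() for parity with the older code.
    static ref PAR_THREAD_POOL: ThreadPool = rayon::ThreadPoolBuilder::new()
        .num_threads(get_max_thread_count())
        .thread_name(|i| format!("solBlkstore{i:02}"))
        .build()
        .unwrap();
}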
#### Problem
blockstore_db.rs is getting too big.
#### Summary of Changes
Move BlockstoreRocksDbColumnFamilyMetrics out of blockstore_db.rs into blockstore_metric.rs.
#### Problem
blockstore_db.rs is getting too big.
#### Summary of Changes
Move the ColumnMetrics trait and the metric macros out of blockstore_db.rs into blockstore_metric.rs.