In prepration of
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
replace and remove entries_to_data_shreds from the public interface.
The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921
However with the upcoming changes to erasure schema, there will be more
erasure coding shreds than data shreds and we can no longer infer coding
shreds indices from data shreds.
The commit adds constructs to track coding shreds indices explicitly.
next-shred-index is already readily available from returned data shreds.
The commit simplifies the api for upcoming changes to erasure coding
schema which will require explicit tracking of indices for coding shreds
as well as data shreds.
bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).
At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.
To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.
* Current caching mechanism does not update cluster-nodes when the epoch
(and so epoch staked nodes) changes:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344
* Additionally, the cache update has a concurrency bug in which the
thread which does compare_and_swap may be blocked when it tries to
obtain the write-lock on cache, while other threads will keep running
ahead with the outdated cache (since the atomic timestamp is already
updated).
In the new ClusterNodesCache, entries are keyed by epoch, and so if
epoch changes cluster-nodes will be recalculated. The time-to-live
eviction policy is also encapsulated and rigidly enforced.
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.
Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.
This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
Number of parity coding shreds is always less than the number of data
shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719
Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714
However the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.
This commit expands the number of coding shreds in the last FEC block in
slots to: 64 - number of data shreds; so that FEC blocks are always 64
data and parity coding shreds each.
As a consequence of this, the last FEC block has more parity coding
shreds than data shreds. So for some shred indices we will have a coding
shred but no data shreds. This should not cause any kind of overlapping
FEC blocks as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot will reset the shred index.
* Remove the name "blob" from archivers
* Remove the name "blob" from broadcast
* Remove the name "blob" from Cluset Info
* Remove the name "blob" from Repair
* Remove the name "blob" from a bunch more places
* Remove the name "blob" from tests and book
* Implement new Index Column
* Correct slicing of blobs
* Mark coding blobs as coding when they're recovered
* Prevent broadcast stages from mixing coding and data blobs in blocktree
* Mark recovered blobs as present in the index
* Fix indexing error in recovery
* Fix broken tests, and some bug fixes
* increase min stack size for coverage runs
* Refactor BroadcastStage to support custom implementations, add FailEntryVerificationBroadcastRun implementation
* Plumb switch on broadcast type through validator
* Add test for validator generating non-verifiable entries to local_cluster
* Fix bad initializers
* Refactor broadcast run code into utils