Commit Graph

56 Commits

Author SHA1 Message Date
behzad nouri 7f173ce7c7
feature gates merkle shreds on all clusters (#29957) 2023-01-27 21:02:51 +00:00
behzad nouri 80a39bd6a5
adds feature to (temporarily) drop merkle shreds from testnet (#29711) 2023-01-15 15:41:58 +00:00
behzad nouri 5b5a3ebce8
adds metrics for num merkle shreds on the receiving end (#29710) 2023-01-14 23:07:42 +00:00
Jon Cinque b1340d77a2
sdk: Make Packet::meta private, use accessor functions (#29092)
sdk: Make packet meta private
2022-12-06 12:54:49 +01:00
behzad nouri f49beb0cbc
caches reed-solomon encoder/decoder instance (#27510)
ReedSolomon::new(...) initializes a matrix and a data-decode-matrix cache:
https://github.com/rust-rse/reed-solomon-erasure/blob/273ebbced/src/core.rs#L460-L466

In order to cache this computation, this commit caches the reed-solomon
encoder/decoder instance for each (data_shards, parity_shards) pair.
2022-09-25 18:09:47 +00:00
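
A minimal sketch of the caching pattern, assuming the reed-solomon-erasure crate; the ReedSolomonCache type and its unbounded HashMap here are illustrative, not the actual Solana implementation:

    use reed_solomon_erasure::galois_8::ReedSolomon;
    use std::collections::HashMap;
    use std::sync::{Arc, Mutex};

    #[derive(Default)]
    struct ReedSolomonCache {
        // Keyed by (data_shards, parity_shards) so each configuration's
        // matrix initialization is paid only once.
        cache: Mutex<HashMap<(usize, usize), Arc<ReedSolomon>>>,
    }

    impl ReedSolomonCache {
        fn get(
            &self,
            data_shards: usize,
            parity_shards: usize,
        ) -> Result<Arc<ReedSolomon>, reed_solomon_erasure::Error> {
            let key = (data_shards, parity_shards);
            let mut cache = self.cache.lock().unwrap();
            if let Some(entry) = cache.get(&key) {
                return Ok(Arc::clone(entry));
            }
            let entry = Arc::new(ReedSolomon::new(data_shards, parity_shards)?);
            cache.insert(key, Arc::clone(&entry));
            Ok(entry)
        }
    }
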
Michael Vines 3f4731b37f Standardize thread names
Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use Camel case. It's more character-dense than Snake or Kebab case
2022-08-20 07:49:39 -07:00
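
A minimal illustration of the tenets using std::thread::Builder; the thread name "solShredFetch" is a hypothetical example:

    use std::thread;

    fn spawn_worker() -> std::io::Result<thread::JoinHandle<()>> {
        thread::Builder::new()
            // 13 characters, "sol" prefix, camel case, per the tenets above.
            .name("solShredFetch".to_string())
            .spawn(|| {
                // thread body
            })
    }
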
behzad nouri ac91cdab74
removes buffering when generating coding shreds in broadcast (#25807)
Given the 32:32 erasure recovery schema, the current implementation
requires exactly 32 data shreds to generate coding shreds for the batch
(except for the final erasure batch in each slot).
As a result, when serializing ledger entries to data shreds, if the
number of data shreds is not a multiple of 32, the coding shreds for the
last batch cannot be generated until there are more data shreds to
complete the batch to 32 data shreds. This adds latency in generating
and broadcasting coding shreds.

In addition, with Merkle variants for shreds, data shreds cannot be
signed and broadcasted until coding shreds are also generated. As a
result *both* code and data shreds will be delayed before broadcast if
we still require exactly 32 data shreds for each batch.

This commit instead always generates and broadcasts coding shreds as soon
as any number of data shreds are available. When serializing entries
to shreds:
* if the number of resulting data shreds is less than 32, then more
  coding shreds will be generated so that the resulting erasure batch
  has the same recovery probabilities as a 32:32 batch.
* if the number of data shreds is more than 32, then the data shreds are
  split uniformly into erasure batches with _at least_ 32 data shreds in
  each batch. Each erasure batch will have the same number of code and
  data shreds.

For example:
* If there are 19 data shreds, 27 coding shreds are generated. The
  resulting 19(data):27(code) erasure batch has the same recovery
  probabilities as a 32:32 batch.
* If there are 107 data shreds, they are split into 3 batches of 36:36,
  36:36 and 35:35 data:code shreds each.

A consequence of this change is that coding and data shred indices will
no longer align, as there will be more coding shreds than data shreds
(not only in the last batch in each slot but also in the intermediate
ones).
2022-08-11 12:44:27 +00:00
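
A sketch of the uniform split for the more-than-32 case described above; plan_batches is a hypothetical helper, and the under-32 case (padding with extra coding shreds to match 32:32 recovery probabilities) is omitted:

    const DATA_SHREDS_PER_FEC_BLOCK: usize = 32;

    /// Splits data shreds uniformly into erasure batches of at least 32
    /// data shreds each; each batch gets an equal number of coding shreds.
    fn plan_batches(num_data_shreds: usize) -> Vec<(usize, usize)> {
        assert!(num_data_shreds >= DATA_SHREDS_PER_FEC_BLOCK);
        let num_batches = num_data_shreds / DATA_SHREDS_PER_FEC_BLOCK;
        let base = num_data_shreds / num_batches;
        let rem = num_data_shreds % num_batches;
        (0..num_batches)
            .map(|i| {
                let data = base + usize::from(i < rem);
                (data, data) // (data shreds, coding shreds)
            })
            .collect()
    }

    // 107 data shreds -> [(36, 36), (36, 36), (35, 35)], matching the
    // example in the commit message.
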
Jeff Biseda 857be1e237
sign repair requests (#26833) 2022-07-31 15:48:51 -07:00
behzad nouri 6f4838719b
decouples shreds sig-verify from tpu vote and transaction packets (#26300)
Shreds have a different workload and traffic pattern from TPU vote and
transaction packets. Some recent changes to SigVerifyStage are not
suitable, or at least not optimal, for shred sig-verify; e.g. random
discard, dedup with false positives, discarding excess by IP address, ...

SigVerifier trait is meant to abstract out the distinctions between the
two pipelines, but in practice it has led to more verbose and convoluted
code.

This commit discards SigVerifier implementation for shreds sig-verify
and instead provides a standalone stage for verifying shreds signatures.
2022-07-07 11:13:13 +00:00
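
A minimal sketch of what a dedicated stage can look like, assuming crossbeam channels; the generic verify predicate and all names here are illustrative rather than the actual Solana code:

    use crossbeam_channel::{Receiver, Sender};
    use std::thread;

    fn spawn_shred_sigverify<T: Send + 'static>(
        receiver: Receiver<Vec<T>>,
        sender: Sender<Vec<T>>,
        verify: impl Fn(&T) -> bool + Send + 'static,
    ) -> thread::JoinHandle<()> {
        thread::Builder::new()
            .name("solShredVerifr".to_string()) // hypothetical name
            .spawn(move || {
                // Retain only shreds whose signature verifies; none of the
                // random-discard or IP-based policies from SigVerifyStage.
                for mut batch in receiver {
                    batch.retain(|shred| verify(shred));
                    if sender.send(batch).is_err() {
                        break;
                    }
                }
            })
            .unwrap()
    }
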
behzad nouri d3a14f5b30
simplifies packet/shred sanity checks (#26356) 2022-07-05 21:41:19 +00:00
behzad nouri 348fe9ebe2
verifies shred slot and parent in fetch stage (#26225)
Shred slot and parent are not verified until window-service, by which
point resources have already been wasted sig-verifying and deserializing
the shreds. This commit moves that verification earlier in the pipeline,
into the fetch stage.
2022-06-28 12:45:50 +00:00
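
A hedged sketch of the kind of cheap check that can run in the fetch stage before sig-verify; the exact bounds used here are illustrative:

    type Slot = u64;

    /// Returns false if the shred's slot/parent cannot be valid, so the
    /// packet can be dropped before sig-verify and deserialization.
    fn verify_shred_slots(slot: Slot, parent: Slot, root: Slot) -> bool {
        // The parent must precede the shred's slot and must not be older
        // than the local root.
        root <= parent && parent < slot
    }

    // e.g. slot 10 with parent 9 is plausible under root 5:
    // verify_shred_slots(10, 9, 5) == true
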
behzad nouri faa6c32162 removes packet modifier from shred_fetch_stage
... in favor of just passing packet flags.
2022-06-22 12:17:37 +00:00
behzad nouri 1f0f5dc03e verifies shred-version in fetch stage
Shred versions are not verified until window-service, by which point
resources have already been wasted sig-verifying and deserializing the
shreds. The commit verifies shred-version earlier in the pipeline, in
the fetch stage.
2022-06-22 12:17:37 +00:00
steviez ec7ca411dd
Make PacketBatch packets vector non-public (#25413)
Upcoming changes to PacketBatch to support variable-sized packets will
modify the internals of PacketBatch. So, this change removes direct
usage of the internal packets vector and instead uses accessors (which
are currently just thin wrappers around Vec functions but will change
down the road).
2022-05-23 15:30:15 -05:00
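
A sketch of the encapsulation pattern described in the message; the method set shown is illustrative:

    pub struct Packet {
        // internal representation, free to change later
    }

    pub struct PacketBatch {
        packets: Vec<Packet>, // no longer public; callers use accessors
    }

    impl PacketBatch {
        pub fn len(&self) -> usize {
            self.packets.len()
        }

        pub fn is_empty(&self) -> bool {
            self.packets.is_empty()
        }

        pub fn iter(&self) -> std::slice::Iter<'_, Packet> {
            self.packets.iter()
        }

        pub fn push(&mut self, packet: Packet) {
            // Internals can later support variable-sized packets without
            // touching call sites.
            self.packets.push(packet);
        }
    }
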
sakridge 3d96a1ab76
Block packets in vote-only mode (#24906) 2022-05-14 17:53:37 +02:00
Justin Starry 7100f1c94b
Collect stats in streamer receiver and report fetch stage metrics (#25010) 2022-05-06 02:56:18 +08:00
behzad nouri eff59193db
enforces that LAST_SHRED_IN_SLOT is also DATA_COMPLETE_SHRED (#24892)
A data shred cannot be LAST_SHRED_IN_SLOT if not also DATA_COMPLETE_SHRED.
So LAST_SHRED_IN_SLOT should also imply DATA_COMPLETE_SHRED:
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shredder.rs#L116-L117
https://github.com/solana-labs/solana/blob/74b586ae7/core/src/broadcast_stage/standard_broadcast_run.rs#L80-L81

However current shred constructs allow specifying a shred which is
LAST_SHRED_IN_SLOT but not DATA_COMPLETE_SHRED:
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L117-L118
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L272-L273

The commit updates ShredFlags so that if a shred is not
DATA_COMPLETE_SHRED it cannot be LAST_SHRED_IN_SLOT either.
2022-05-02 23:33:53 +00:00
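
The implication can be encoded directly in the bit patterns, as in this sketch using the bitflags crate; the bit values here illustrate the idea, while the actual definitions live in ledger/src/shred.rs:

    use bitflags::bitflags;

    bitflags! {
        struct ShredFlags: u8 {
            const SHRED_TICK_REFERENCE_MASK = 0b0011_1111;
            const DATA_COMPLETE_SHRED       = 0b0100_0000;
            // Includes the DATA_COMPLETE_SHRED bit, so a shred marked
            // LAST_SHRED_IN_SLOT is necessarily DATA_COMPLETE_SHRED too.
            const LAST_SHRED_IN_SLOT        = 0b1100_0000;
        }
    }

    fn main() {
        let flags = ShredFlags::LAST_SHRED_IN_SLOT;
        assert!(flags.contains(ShredFlags::DATA_COMPLETE_SHRED));
    }
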
behzad nouri 0f60665100
replaces Shred::new_empty_coding with Shred::new_from_parity_shard (#24749)
Removes implementation details of shreds and payload offsets from the
shredder, so that the shredder does not need to mutate the payload:
https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L968-L977

Also, Shred::new_from_data can simply take a slice as opposed to an
Option<&[u8]>:
https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L268-L278
2022-04-27 18:04:10 +00:00
behzad nouri 3bbfaae7b6
moves shred stats to a separate file (#24484) 2022-04-19 18:25:09 +00:00
Stephen Akridge 976b138e76 Add tx weighting stage 2022-03-17 19:31:28 -05:00
Jeff Biseda 8b66625c95
convert std::sync::mpsc to crossbeam_channel (#22264) 2022-01-11 02:44:46 -08:00
behzad nouri 01a096adc8 adds bitflags to Packet.Meta
Instead of a separate bool field for each flag, all the flags can be
encoded type-safely in a single u8 using bitflags:
https://github.com/solana-labs/solana/blob/d6ec103be/sdk/src/packet.rs#L19-L31
2022-01-04 13:53:40 +00:00
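
A sketch of the pattern with the bitflags crate; the flag names follow the linked packet.rs but are reproduced here only as illustration:

    use bitflags::bitflags;

    bitflags! {
        // One u8 replaces several separate bool fields on Packet.Meta.
        struct PacketFlags: u8 {
            const DISCARD        = 0b0000_0001;
            const FORWARDED      = 0b0000_0010;
            const REPAIR         = 0b0000_0100;
            const SIMPLE_VOTE_TX = 0b0000_1000;
        }
    }

    fn main() {
        let mut flags = PacketFlags::empty();
        flags.set(PacketFlags::REPAIR, true);
        assert!(flags.contains(PacketFlags::REPAIR));
        assert_eq!(std::mem::size_of::<PacketFlags>(), 1);
    }
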
Justin Starry b1d9a2e60e
Don't forward packets received from TPU forwards port (#22078)
* Don't forward packets received from TPU forwards port

* Add banking stage test
2021-12-29 19:34:31 +01:00
behzad nouri 65d59f4ef0
tracks erasure coding shreds' indices explicitly (#21822)
The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921

However, with the upcoming changes to the erasure schema, there will be
more erasure coding shreds than data shreds, and coding shred indices
can no longer be inferred from data shred indices.

The commit adds constructs to track coding shreds indices explicitly.
2021-12-19 22:37:55 +00:00
Justin Starry 254ef3e7b6
Rename Packets to PacketBatch (#21794) 2021-12-11 09:44:15 -05:00
Michael Vines b8837c04ec Reformat imports to a consistent style
rustfmt.toml configuration:
  imports_granularity = "One"
  group_imports = "One"
2021-12-03 09:19:13 -08:00
Michael Vines 7027d56064 Resolve nightly-2021-10-05 clippy complaints 2021-10-06 10:37:58 -07:00
Alexander Meißner 6514096a67 chore: cargo +nightly clippy --fix -Z unstable-options 2021-06-18 10:42:46 -07:00
sakridge eeee75c5be
Don't use pinned memory when unnecessary (#17832)
There have been reports of excessive GPU memory usage and errors from
cudaHostRegister. There are some cases where pinning is not required.
2021-06-14 16:10:04 +02:00
behzad nouri 37b8587d4e
expands number of erasure coding shreds in the last batch in slots (#16484)
The number of parity coding shreds is always less than the number of
data shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719

Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714

However the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.

This commit expands the number of coding shreds in the last FEC block of
a slot to 64 minus the number of data shreds, so that each FEC block
always totals 64 data and parity coding shreds.

As a consequence of this, the last FEC block has more parity coding
shreds than data shreds. So for some shred indices we will have a coding
shred but no data shreds. This should not cause any kind of overlapping
FEC blocks as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot will reset the shred index.
2021-04-21 12:47:50 +00:00
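
The arithmetic amounts to the following sketch; the helper name is hypothetical:

    /// Coding shreds for the last FEC block of a slot, per the rule
    /// above: pad with parity shreds so that data + coding == 64.
    fn last_fec_block_num_coding_shreds(num_data_shreds: usize) -> usize {
        assert!((1..=32).contains(&num_data_shreds));
        64 - num_data_shreds
    }

    // e.g. a final batch of 10 data shreds gets 54 coding shreds.
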
behzad nouri e405747409 Revert "Add limit and shrink policy for recycler (#15320)"
This reverts commit c2e8814dce.
2021-04-18 19:29:24 +00:00
behzad nouri 4f82b897bc
buffers data shreds to make larger erasure coded sets (#15849)
Broadcast stage batches up to 8 entries:
https://github.com/solana-labs/solana/blob/79280b304/core/src/broadcast_stage/broadcast_utils.rs#L26-L29
which will be serialized into some number of shreds and chunked into FEC
sets of at most 32 shreds each:
https://github.com/solana-labs/solana/blob/79280b304/ledger/src/shred.rs#L576-L597
So, depending on the size of the entries, FEC sets can be small, which
may aggravate the loss rate. For example, 16 FEC sets of 2:2 data:code
shreds each have a higher loss rate than one 32:32 set.

This commit broadcasts data shreds immediately, but also buffers them
until it has a batch of 32 data shreds, at which point 32 coding shreds
are generated and broadcasted.
2021-03-23 14:52:38 +00:00
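
A hedged sketch of the buffering described above; make_coding_shreds stands in for the real erasure-coding call and Shred is left abstract:

    const DATA_SHREDS_PER_BATCH: usize = 32;

    struct CodingBuffer<Shred> {
        data: Vec<Shred>,
    }

    impl<Shred> CodingBuffer<Shred> {
        /// The data shred itself is broadcast immediately by the caller;
        /// coding shreds are only emitted once 32 data shreds accumulate.
        fn push(
            &mut self,
            shred: Shred,
            make_coding_shreds: impl Fn(&[Shred]) -> Vec<Shred>,
        ) -> Vec<Shred> {
            self.data.push(shred);
            if self.data.len() < DATA_SHREDS_PER_BATCH {
                return Vec::new();
            }
            let coding = make_coding_shreds(&self.data);
            self.data.clear();
            coding
        }
    }
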
sakridge 05409e51ce
Increase tpu coalescing and add parameter (#15536)
Should create larger entries on average
2021-02-26 09:15:45 -08:00
carllin c2e8814dce
Add limit and shrink policy for recycler (#15320) 2021-02-24 00:15:58 -08:00
sakridge d4a174fb7c
Partial shred deserialize cleanup and shred type differentiation (#14094)
* Partial shred deserialize cleanup and shred type differentiation in retransmit

* consolidate packet hashing logic
2020-12-15 16:50:40 -08:00
sakridge 5c95d8e963
Shred filter (#14030) 2020-12-10 07:54:15 -08:00
sakridge c5fe076432
Better dupe detection (#13992) 2020-12-09 23:14:31 -08:00
sakridge f6600810d7
Use LRU cache and blake3 hash of shreds to filter duplicates (#13976) 2020-12-07 16:42:39 -08:00
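
A minimal sketch of the filter, assuming the lru and blake3 crates; the cache capacity is an arbitrary illustrative choice:

    use lru::LruCache;
    use std::num::NonZeroUsize;

    struct ShredDedup {
        // Keyed by the blake3 hash of the shred payload.
        seen: LruCache<[u8; 32], ()>,
    }

    impl ShredDedup {
        fn new(capacity: usize) -> Self {
            Self {
                seen: LruCache::new(NonZeroUsize::new(capacity).unwrap()),
            }
        }

        /// Returns true if this payload was already seen recently.
        fn is_duplicate(&mut self, payload: &[u8]) -> bool {
            let hash: [u8; 32] = *blake3::hash(payload).as_bytes();
            self.seen.put(hash, ()).is_some()
        }
    }
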
Greg Fitzgerald 6ee222363e
Move BankForks to solana_runtime (#10637)
* Move BankForks to solana_runtime

* Update imports
2020-06-17 15:27:03 +00:00
carllin 439fd30840
Fix erasure (#10095)
* Fix bad FEC blocks

* Add test

Co-authored-by: Carl <carl@solana.com>
2020-05-19 16:13:12 -07:00
sakridge f562ed4cc8 Distinguish between shred type in shred fetch stage duplicate filter (#10068)
* Shred type check

* Test
2020-05-15 13:23:56 -07:00
sakridge 1a47b1cd86
Retransmit and shred fetch metrics (#9965)
* Retransmit stats

* Shred fetch stats
2020-05-10 21:37:05 -07:00
sakridge 909321928c
Shred fetch comment and debug message tweak (#8980)
automerge
2020-03-20 11:00:48 -07:00
sakridge 453f5ce8f2
Shred filter (#8975)
Thread bank_forks into shred fetch
2020-03-20 07:49:48 -07:00
anatoly yakovenko 9cedeb0a8d
Pull streamer out into its own module. (#8917)
automerge
2020-03-17 23:30:23 -07:00
Greg Fitzgerald b5dba77056 Rename blocktree to blockstore (#7757)
automerge
2020-01-13 13:13:52 -08:00
Pankaj Garg 87b2525e03
Limit maximum number of shreds in a slot to 32K (#7584)
* Limit maximum number of shreds in a slot to 32K

* mark dead slot replay as fatal error
2019-12-30 07:42:09 -08:00
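
The cap amounts to a bound check like this sketch; the constant name is illustrative:

    const MAX_SHREDS_PER_SLOT: u32 = 32_768; // the 32K cap from the title

    /// Shreds whose index exceeds the per-slot cap are rejected; replaying
    /// a slot that violates the cap is treated as a fatal error.
    fn shred_index_in_bounds(index: u32) -> bool {
        index < MAX_SHREDS_PER_SLOT
    }
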
Rob Walker a7040896f0
Update to rust 1.40.0 (#7572)
* Update to rust 1.40.0

* fixups
2019-12-19 23:27:54 -08:00
Greg Fitzgerald a3a830e1ab
Delete Service trait (#6921) 2019-11-13 11:12:09 -07:00
sakridge 8e81bc1b49
Fix pinning (#6604)
Remove Deref implementations and add more pass-throughs to the PinnedVec
wrapper.
Warm recyclers
set_pinnable
2019-11-07 19:48:33 -08:00