Commit Graph

49 Commits

Author SHA1 Message Date
behzad nouri ba785cf8ab
removes erroneous uses of std::mem::swap (#26536)
All instances should be replace by std::mem::{replace,take},
or just plain assignment.
2022-07-11 11:33:15 +00:00
behzad nouri f534b8981b
maps number of data shreds to erasure batch size (#25917)
In prepration of
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
  erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
  replace and remove entries_to_data_shreds from the public interface.
2022-06-23 13:27:54 +00:00
behzad nouri fe3c1d3d49
removes erroneous uses of &Arc<...> from broadcast-stage (#25962) 2022-06-15 13:44:24 +00:00
behzad nouri 6c9f2eac78
removes fec_set_offset from UnfinishedSlotInfo (#25815)
If the blockstore has shreds for a slot, it should not recreate the
slot:
https://github.com/solana-labs/solana/blob/ff68bf6c2/ledger/src/leader_schedule_cache.rs#L142-L146
https://github.com/solana-labs/solana/pull/15849/files#r596657314

Therefore in broadcast stage if UnfinishedSlotInfo is None, then
fec_set_offset will be zero:
https://github.com/solana-labs/solana/blob/ff68bf6c2/core/src/broadcast_stage/standard_broadcast_run.rs#L111-L120

As a result fec_set_offset will always be zero, and is so redundant and
can be removed.
2022-06-07 22:17:37 +00:00
behzad nouri 65d59f4ef0
tracks erasure coding shreds' indices explicitly (#21822)
The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921

However with the upcoming changes to erasure schema, there will be more
erasure coding shreds than data shreds and we can no longer infer coding
shreds indices from data shreds.

The commit adds constructs to track coding shreds indices explicitly.
2021-12-19 22:37:55 +00:00
behzad nouri 89d66c3210
removes next_shred_index from return value of entries to shreds api (#21961)
next-shred-index is already readily available from returned data shreds.
The commit simplifies the api for upcoming changes to erasure coding
schema which will require explicit tracking of indices for coding shreds
as well as data shreds.
2021-12-17 15:01:55 +00:00
behzad nouri 1deb4add81
removes Slot from TransmitShreds (#19327)
An earlier version of the code was funneling through stakes along with
shreds to broadcast:
https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127

This was changed to only slots as stakes computation was pushed further
down the pipeline in:
https://github.com/solana-labs/solana/pull/18971

However shreds themselves embody which slot they belong to. So pairing
them with slot is redundant and adds rooms for bugs should they become
inconsistent.
2021-08-20 13:48:33 +00:00
behzad nouri e4be00fece falls back on working-bank if root-bank::epoch-staked-nodes is none
bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).

At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.

To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.
2021-08-05 21:47:33 +00:00
behzad nouri aa32738dd5 uses cluster-nodes cache in broadcast-stage
* Current caching mechanism does not update cluster-nodes when the epoch
  (and so epoch staked nodes) changes:
  https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344

* Additionally, the cache update has a concurrency bug in which the
  thread which does compare_and_swap may be blocked when it tries to
  obtain the write-lock on cache, while other threads will keep running
  ahead with the outdated cache (since the atomic timestamp is already
  updated).

In the new ClusterNodesCache, entries are keyed by epoch, and so if
epoch changes cluster-nodes will be recalculated. The time-to-live
eviction policy is also encapsulated and rigidly enforced.
2021-08-05 21:47:33 +00:00
behzad nouri 44b11154ca sends slots (instead of stakes) through broadcast flow
Current broadcast code is computing stakes for each slot before sending
them down the channel:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

Since the stakes are a function of epoch the slot belongs to (and so
does not necessarily change from one slot to another), forwarding the
slot itself would allow better caching downstream.

In addition we need to invalidate the cache if the epoch changes (which
the current code does not do), and that requires to know which slot (and
so epoch) current broadcasted shreds belong to:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344
2021-08-05 21:47:33 +00:00
sakridge 84e78316b1
Write helper for multithread update (#18808) 2021-07-29 03:16:36 +02:00
behzad nouri d2d5f36a3c
adds validator flag to allow private ip addresses (#18850) 2021-07-23 15:25:03 +00:00
jbiseda a86ced0bac
generate deterministic seeds for shreds (#17950)
* generate shred seed from leader pubkey

* clippy

* clippy

* review

* review 2

* fmt

* review

* check

* review

* cleanup

* fmt
2021-07-07 08:21:12 -07:00
behzad nouri 04787be8b1
encapsulates turbine peers computations of broadcast & retransmit stages (#18238)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.

Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
2021-07-07 00:35:25 +00:00
Michael Vines 84b9de8c18 Shredder no longer holds a keypair 2021-06-21 21:29:52 -07:00
behzad nouri 37b8587d4e
expands number of erasure coding shreds in the last batch in slots (#16484)
Number of parity coding shreds is always less than the number of data
shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719

Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714

However the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.

This commit expands the number of coding shreds in the last FEC block in
slots to: 64 - number of data shreds; so that FEC blocks are always 64
data and parity coding shreds each.

As a consequence of this, the last FEC block has more parity coding
shreds than data shreds. So for some shred indices we will have a coding
shred but no data shreds. This should not cause any kind of overlapping
FEC blocks as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot will reset the shred index.
2021-04-21 12:47:50 +00:00
behzad nouri e1021d9f83
removes redundant epoch stakes cache in retransmit (#14781)
Following d6d76219b, staked nodes computed from vote accounts are
already cached in runtime::Stakes, so the caching in retransmit_stage is
redundant.
2021-01-24 21:15:09 +00:00
behzad nouri d6d76219b6
caches staked nodes computed from vote-accounts (#13929) 2020-12-17 21:22:50 +00:00
sakridge 2cf719ac2c
Cache tvu peers for broadcast (#10373) 2020-06-03 08:24:05 -07:00
carllin 3442f36f8a
Repair alternate versions of dead slots (#9805)
Co-authored-by: Carl <carl@solana.com>
2020-05-05 14:07:21 -07:00
carllin bab3502260
Push down cluster_info lock (#9594)
* Push down cluster_info lock

* Rework budget decrement

Co-authored-by: Carl <carl@solana.com>
2020-04-21 12:54:45 -07:00
carllin 7aa4d401f7
Fix broadcast metrics (#9461)
* Rework broadcast metrics to support multiple threads

* Update dashboards

Co-authored-by: Carl <carl@solana.com>
2020-04-15 15:22:16 -07:00
sakridge 69f1e487b3
Reduce cluster-info metrics. (#9465) 2020-04-14 21:21:58 -07:00
sakridge 4677cdb4c2
Optimize broadcast cluster_info critical section (#9327) 2020-04-06 17:36:22 -07:00
carllin 7b68628e6c
Remove write lock (#9311)
* Remove write lock

Co-authored-by: Carl <carl@solana.com>
2020-04-05 15:18:45 -07:00
Pankaj Garg aa80f69171
Promote some datapoints to `info` to fix dashboard (#8381)
automerge
2020-02-21 13:41:49 -08:00
Greg Fitzgerald b5dba77056 Rename blocktree to blockstore (#7757)
automerge
2020-01-13 13:13:52 -08:00
anatoly yakovenko 97589f77f8 Pipeline broadcast socket transmit and blocktree record (#7481)
automerge
2019-12-16 17:11:18 -08:00
Sagar Dhawan 6bfe0fca1f
Add a version field to shreds (#7023)
* Add a version field to shreds

* Clippy

* Fix Chacha Golden

* Fix shredder bench compile

* Fix blocktree bench compile
2019-11-18 18:05:02 -08:00
Sagar Dhawan 79d7090867
Remove obsolete references to Blob (#6957)
* Remove the name "blob" from archivers

* Remove the name "blob" from broadcast

* Remove the name "blob" from Cluset Info

* Remove the name "blob" from Repair

* Remove the name "blob" from a bunch more places

* Remove the name "blob" from tests and book
2019-11-14 11:49:31 -08:00
carllin 43e2301e2c
Fix roots overrunning broadcast (#6884)
* Add trusted pathway for insert_shreds to avoid checks
2019-11-14 00:32:07 -08:00
Pankaj Garg 0ace79939b
Add reference tick to data shreds (#6772)
* Add reference tick to data shreds

* fix tests
2019-11-06 13:27:58 -08:00
Greg Fitzgerald 5468be2ef9 Add solana-ledger crate (#6415)
automerge
2019-10-18 09:28:51 -07:00
Justin Starry 7e6e7e8406
Remove special handling of first ledger tick (#6263)
* Remove special handling of first ledger tick

* Fix subtraction overflow

* @garious feedback

* Back to height

* More tick_height name changes

* Fix off-by-one

* Fix leader tick error

* Fix merge conflict

* Fix recently added test
2019-10-16 15:53:11 -04:00
Pankaj Garg 364781366a
Use sendmmsg for broadcasting shreds (#6325)
* Replace packet with slice of data in sendmmsg

* fixes

* fix bench
2019-10-10 19:38:48 -07:00
Rob Walker 7cf90766a3
add epoch_schedule sysvar (#6256)
* add epoch_schedule sysvar

* book sheesh!
2019-10-08 22:34:26 -07:00
carllin ac2374e9a1
Shred entries in parallel (#6180)
* Make shredding more parallel

* Fix erasure tests

* Fix replicator test

* Remove UnfinishedSlotInfo
2019-10-08 00:42:51 -07:00
Pankaj Garg 774e9df2e5
Finish unfininished slot before processing new slots (#6197) 2019-10-01 11:46:14 -07:00
Pankaj Garg 783e8672e7
Removed Shred enum (#5963)
* Remove shred enum and it's references

* rename ShredInfo to Shred

* clippy
2019-09-18 16:24:30 -07:00
Pankaj Garg 6c4e656795
Remove obsoleted code from shred (#5954)
* Remove obsoleted code from shred

* fix broken test
2019-09-18 13:56:44 -07:00
Rob Walker 0d4a2c5eb0
simplify poh recorder => broadcast channel (#5940)
* simplify poh recorder broadcast channel

* fixup

* fixup
2019-09-18 12:16:22 -07:00
Pankaj Garg ff608992ee
Replace Shred usage with ShredInfo (#5939)
* Replace Shred usage with ShredInfo

* Fix tests

* fix clippy
2019-09-17 18:22:46 -07:00
Pankaj Garg 7459eb15c3
A new data-structure in shreds for partial deserialization (#5915)
* A new datastructure in shreds for partial deserialization

* fix chacha golden hash

* fix clippy and address review comments
2019-09-16 20:28:54 -07:00
Pankaj Garg 3d3b03a123
Verify signature of recovered shred before adding them to blocktree (#5811)
* Verify signature of recovered shred before adding them to blocktree

* fix failing tests, and review comments
2019-09-05 18:20:30 -07:00
Pankaj Garg 3b0d48e3b8
Remove blocktree blob references (#5691)
* Remove blocktree blob references

* fixes and cleanup

* replace uninitialized() call with MaybeUninit

* fix bench
2019-09-03 21:32:51 -07:00
Mark E. Sinclair a383ea532f Implement new Index Column (#4827)
* Implement new Index Column

* Correct slicing of blobs

* Mark coding blobs as coding when they're recovered

* Prevent broadcast stages from mixing coding and data blobs in blocktree

* Mark recovered blobs as present in the index

* Fix indexing error in recovery

* Fix broken tests, and some bug fixes

* increase min stack size for coverage runs
2019-07-10 11:08:17 -07:00
Michael Vines 36c9e22e3d Revert "Dynamic erasure (#4653)"
This reverts commit ada4d16c4c.
2019-06-20 20:53:03 -07:00
Mark E. Sinclair ada4d16c4c
Dynamic erasure (#4653)
Remove erasure-related constants

Remove unneeded `Iterator::collect` call

Document erasure module

Randomize coding blobs used for repair
2019-06-20 20:27:41 -05:00
carllin 46bb79df29
Support for custom BroadcastStage in local cluster tests (#4716)
* Refactor BroadcastStage to support custom implementations, add FailEntryVerificationBroadcastRun implementation

* Plumb switch on broadcast type through validator

* Add test for validator generating non-verifiable entries to local_cluster

* Fix bad initializers

* Refactor broadcast run code into utils
2019-06-19 00:13:19 -07:00