Shred slot and parent are not verified until window-service where
resources are already wasted to sig-verify and deserialize shreds.
This commit moves above verification to earlier in the pipeline in fetch
stage.
Shreds are dropped in window-service if the slot leader is the node
itself:
https://github.com/solana-labs/solana/blob/cd2878acf/core/src/window_service.rs#L181-L185
However this is done after wasting resources verifying signature on
these shreds, and requires a redundant 2nd lookup of the slot leader.
This commit instead discards such shreds in sigverify stage where we
already know the leader for the slot.
Following commits will skip shreds deserializaton before retransmit, and
so we will only have a ShredId and not a fully deserialized shred to
obtain the shuffling seed from.
In order to preserve current behavior, the threshold is set to the
current value of the argument to IndexedParallelIterator::with_min_len.
Follow up commits will recalibrate this threshold to optimize
performance on mainnet-beta.
Shreds arriving at a node for retransmit tend to belong to the same slot
(or a just a couple of different slots). Slot leader and cluster nodes
are common for the shreds of the same slot, and so the common work to
look up these values can be factored out.
This commit first group-bys shreds by slot to factor out that common
lookup work.
Shred versions are not verified until window-service where resources are
already wasted to sig-verify and deserialize shreds.
The commit verifies shred-version earlier in the pipeline in fetch stage.
RetransmitSlotStats can already be utilized to track when the first
shred for a slot was received; therefore
first_shreds_received: &Mutex<BTreeSet<Slot>>
is redundant. Sending update notifications after shreds retransmit will
also bypass the need for a mutex.
test_skip_repair in retransmit-stage is no longer relevant because
following: https://github.com/solana-labs/solana/pull/19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
https://github.com/solana-labs/solana/pull/24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.
https://github.com/solana-labs/solana/pull/17004
removed position field from coding-shred-header because as it stands the
field is redundant and unused.
However, with the upcoming changes to erasure coding schema this field
will no longer be redundant and needs to be populated.
Turbine randomly shuffles cluster nodes on a broadcast tree for each
shred. This requires knowing the stakes and nodes' contact-infos (from
gossip).
However gossip is subject to partitioning and propogation delays.
Additionally unstaked nodes may join and leave the cluster at any
moment, changing the cluster view from one node to another.
This commit:
* Always arranges the unstaked nodes at the bottom of turbine broadcast
tree.
* Staked nodes are always included regardless of if their contact-info
is available in gossip or not.
* Uses the unbiased WeightedShuffle construct for shuffling nodes.
Summary of Changes
Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows
Data stores of connection strings/credentials to be configured,
Accounts with patterns to be streamed
PostgreSQL implementation of the streaming for different destination stores to be plugged in.
The code comprises 4 major parts:
accountsdb-plugin-intf: defines the plugin interface which concrete plugin should implement.
accountsdb-plugin-manager: manages the load/unload of plugins and provide interfaces which the validator can notify of accounts update to plugins.
accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL
The validator integrations: updated streamed right after snapshot restore and after account update from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.
This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.
Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.
Following commit will add these metrics back to window-service, earlier
in the pipeline.
bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).
At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.
To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.
The new cluster-nodes cache will:
* ensure cluster-nodes are recalculated if the epoch (and so the epoch
staked nodes) changes.
* encapsulate time-to-live eviction policy.