https://github.com/solana-labs/solana/pull/29445
makes it unnecessary to embed merkle roots into shreds binary. This
commit removes the merkle root from shreds binary.
This adds 20 bytes to shreds capacity to store more data.
Additionally since we no longer need to truncate the merkle root, the
signature would be on the full 32 bytes of hash as opposed to the
truncated one.
Also signature verification would now effectively verify merkle proof as
well, so we no longer need to verify merkle proof in the sanitize
implementation.
{verify,sign}_shreds_gpu need to point to offsets within the packets for
the signed data. For merkle shreds this signed data is the merkle root
of the erasure batch and this would necessitate embedding the merkle
roots in the shreds payload.
However this is wasteful and reduces shreds capacity to store data
because the merkle root can already be recovered from the encoded merkle
proof.
Instead of pointing to offsets within the shreds payload, this commit
recovers merkle roots from the merkle proofs and stores them in an
allocated buffer. {verify,sign}_shreds_gpu would then point to offsets
within this new buffer for the respective signed data.
This would unblock us from removing merkle roots from shreds payload
which would save capacity to send more data with each shred.
The commit adds an associated SignedData type to Shred trait so that
merkle and legacy shreds can return different types for signed_data
method.
This would allow legacy shreds to point to a section of the shred
payload, whereas merkle shreds would compute and return the merkle root.
Ultimately this would allow to remove the merkle root from the shreds
binary.
Merkle shreds within the same erasure batch have the same merkle root.
The root of the merkle tree is signed. So either the signatures match
or one fails sigverify, and the comparison of merkle roots is redundant.
If data is empty, make_shreds_from_data will now return one data shred
with empty data. This preserves invariants verified in tests regardless
of data size.
These methods are only used in tests but invoked on a merkle shred they
will always invalidate the shred because the merkle proof will no longer
verify. As a result the shred will not sanitize and blockstore will
avoid inserting them. Their use in tests will result in spurious test
coverage because the shreds will not be ingested.
The commit removes implementation of these methods for merkle shreds.
Follow up commits will entirely remove these methods from shreds api.
The commit
* Identifies Merkle shreds when recovering from erasure codes and
dispatches specialized code to reconstruct shreds.
* Coding shred headers are added to recovered erasure shards.
* Merkle tree is reconstructed for the erasure batch and added to
recovered shreds.
* The common signature (for the root of Merkle tree) is attached to all
recovered shreds.
Coding shreds can only be signed once erasure codings are already
generated. Therefore coding shreds recovered from erasure codings lack
slot leader's signature and so cannot be retransmitted to the rest of
the cluster.
shred/merkle.rs implements a new shred variant where we generate merkle
tree for each erasure encoded batch and each shred includes:
* root of the merkle tree (Hash truncated to 20 bytes).
* slot leader's signature of the root of the merkle tree.
* merkle tree nodes along the branch the shred belongs to, where hashes
are trimmed to 20 bytes during tree construction.
This schema results in the same signature for all shreds within an
erasure batch.
When recovering shreds from erasure codes, we can reconstruct merkle
tree for the batch and for each recovered shred also recover respective
merkle tree branch; then snap the slot leader's signature from any of
the shreds received from turbine and retransmit all recovered code or
data shreds.
Backward compatibility is achieved by encoding shred variant at byte 65
of payload (previously shred-type at this position):
* 0b0101_1010 indicates a legacy coding shred, which is also equal to
ShredType::Code for backward compatibility.
* 0b1010_0101 indicates a legacy data shred, which is also equal to
ShredType::Data for backward compatibility.
* 0b0100_???? indicates a merkle coding shred with merkle branch size
indicated by the last 4 bits.
* 0b1000_???? indicates a merkle data shred with merkle branch size
indicated by the last 4 bits.
Merkle root and branch are encoded at the end of the shred payload.