Update ledger replication chapter (#2029)
* ledger block -> ledger segment

  The book already defines a *block* to be a slight variation of how
  block-based chains define it. It's the thing the cluster confirms should be
  the next set of transactions on the ledger.

* Boot storage description from the book
This commit is contained in:
parent 3441d3399b
commit b5a80d3d49
@@ -1,4 +1,4 @@
-# Fullnode
+# Anatomy of a Fullnode

<img alt="Fullnode block diagrams" src="img/fullnode.svg" class="center"/>
@@ -1,114 +1,2 @@
# Ledger Replication

## Background

At full capacity on a 1gbps network Solana would generate 4 petabytes of data
per year. If each fullnode were required to store the full ledger, the cost of
storage would discourage fullnode participation, thus centralizing the network
around those that could afford it. Solana aims to keep the cost of a fullnode
below $5,000 USD to maximize participation. To achieve that, the network needs
to minimize redundant storage while at the same time ensuring the validity and
availability of each copy.

To trust storage of ledger segments, Solana has *replicators* periodically
submit proofs to the network that the data was replicated. Each proof is called
a Proof of Replication. The basic idea is to encrypt a dataset with a public
symmetric key and then hash the encrypted dataset. Solana uses [CBC
encryption](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_Block_Chaining_(CBC)).
To prevent a malicious replicator from deleting the data as soon as it's
hashed, a replicator is required to hash random segments of the dataset.
Alternatively, Solana could require hashing the reverse of the encrypted data,
but random sampling is sufficient and much faster. Either solution ensures
that all the data is present during the generation of the proof, and also
requires the validator to have the entirety of the encrypted data present for
verification of every proof of every identity. The space required to validate
is:

```
number_of_proofs * data_size
```
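As a back-of-the-envelope sketch, the naive scheme's validation cost grows with both proof count and dataset size; the figures below are illustrative, not values the chapter specifies:

```python
# Naive PoRep: validating requires the full dataset per proof.
def naive_validation_space(number_of_proofs: int, data_size: int) -> int:
    return number_of_proofs * data_size

TB = 1024 ** 4
# Illustrative: 1,000 proofs over a 1 TB dataset.
space = naive_validation_space(1_000, TB)
print(space // TB, "TB")  # 1000 TB
```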

## Optimization with PoH

Solana is not the only distributed systems project using Proof of Replication,
but it might be the most efficient implementation because of its ability to
synchronize nodes with its Proof of History. With PoH, Solana is able to record
a hash of the PoRep samples in the ledger. Thus the blocks stay in the exact
same order for every PoRep, and verification can stream the data and verify all
the proofs in a single batch. This way Solana can verify multiple proofs
concurrently, each one on its own GPU core. With the current generation of
graphics cards, our network can support up to 14,000 replication identities or
symmetric keys. The total space required for verification is:

```
2 CBC_blocks * number_of_identities
```

with core count equal to the number of identities. A CBC block is expected to
be 1MB in size.
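Plugging in the chapter's own figures (1MB CBC blocks, 14,000 identities), the verification footprint is a fixed amount, independent of total ledger size:

```python
# PoH-optimized PoRep: 2 CBC blocks per identity, independent of data size.
CBC_BLOCK_SIZE = 1024 * 1024        # 1 MB, per the chapter
NUMBER_OF_IDENTITIES = 14_000       # the cited GPU-core limit

verification_space = 2 * CBC_BLOCK_SIZE * NUMBER_OF_IDENTITIES
print(verification_space / 1024 ** 3)  # 27.34375 (GB)
```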

## Network

Validators for PoRep are the same validators that verify transactions. They
have put up some stake as collateral to ensure that their work is honest. If a
validator can be proven to have verified a fake PoRep, its stake is slashed.

Replicators are specialized light clients. They download a part of the ledger,
store it, and provide proofs that they are storing it. For each verified proof,
replicators are rewarded tokens from the mining pool.

## Constraints

Solana's PoRep protocol introduces the following constraints:

* At most 14,000 replication identities can be used, because that is how many
  GPU cores are currently available to a computer costing under $5,000 USD.
* Verification requires generating the CBC blocks. That requires space of 2
  blocks per identity, and 1 GPU core per identity for the same dataset. As
  many identities as possible are batched at once, and the proofs for those
  identities are verified concurrently for the same dataset.

## Validation and Replication Protocol

1. The network sets a replication target number, let's say 1k. 1k PoRep
   identities are created from signatures of a PoH hash. They are tied to a
   specific PoH hash. It doesn't matter who creates them; they could simply be
   the last 1k validation signatures seen for the ledger at that count. This is
   maybe just the initial batch of identities, because we want to stagger
   identity rotation.
2. Any client can use any of these identities to create PoRep proofs.
   Replicator identities are the CBC encryption keys.
3. Periodically at a specific PoH count, a replicator that wants to create
   PoRep proofs signs the PoH hash at that count. That signature is the seed
   used to pick the block and identity to replicate. A block is 1TB of ledger.
4. Periodically at a specific PoH count, a replicator submits PoRep proofs for
   its selected block. A signature of the PoH hash at that count is the seed
   used to sample the 1TB encrypted block and hash it. This is done faster than
   it takes to encrypt the 1TB block with the original identity.
5. Replicators must submit some number of fake proofs, which they can prove to
   be fake by providing the seed for the hash result.
6. Periodically at a specific PoH count, validators sign the hash and use the
   signature to select the 1TB block that they need to validate. They batch all
   the identities and proofs and submit approval for all the verified ones.
7. After step 6, replicator clients submit the proofs of their fake proofs.
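The seed mechanics in steps 3 and 4 can be sketched as follows. This is a sketch only: `sha256` of a secret stands in for a real Ed25519 signature, and the block and identity counts are made up:

```python
import hashlib

NUM_BLOCKS = 1_000       # hypothetical ledger length, in 1TB blocks
NUM_IDENTITIES = 1_000   # the replication target from step 1

def sign_poh_hash(secret: bytes, poh_hash: bytes) -> bytes:
    # Stand-in for an Ed25519 signature over the PoH hash.
    return hashlib.sha256(secret + poh_hash).digest()

def pick_assignment(signature: bytes) -> tuple[int, int]:
    # The signature is the seed that picks both the block to replicate
    # and the PoRep identity, so neither can be chosen freely.
    n = int.from_bytes(signature, "big")
    return n % NUM_BLOCKS, n % NUM_IDENTITIES

sig = sign_poh_hash(b"replicator-secret", b"poh-hash-at-count")
block, identity = pick_assignment(sig)
```

Because every node signs the same PoH hash at the same count, the assignment is deterministic per keypair and cannot be ground for a preferred value.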

For any random seed, Solana requires everyone to use a signature that is
derived from a PoH hash. Every node uses the same count so that the same PoH
hash is signed by every participant. The signatures are then each
cryptographically tied to the keypair, which prevents a leader from grinding on
the resulting value for more than 1 identity.

Key rotation is *staggered*. Once going, the next identity is generated by
hashing itself with a PoH hash.

Since there are many more client identities than encryption identities, the
reward is split among multiple clients to prevent Sybil attacks from generating
many clients to acquire the same block of data. To remain BFT, the network
needs to prevent a single human entity from storing all the replications of a
single chunk of the ledger.

Solana's solution to this is to require clients to continue using the same
identity. If the first round is used to acquire the same block for many client
identities, the second round for the same client identities will require a
redistribution of the signatures, and therefore of the PoRep identities and
blocks. Thus, clients are not rewarded for storage of the first block. The
network rewards long-lived client identities more than new ones.
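The staggered key rotation described above amounts to a hash chain; in this sketch `sha256` is only a stand-in for whatever hash the cluster actually uses, and the PoH hash values are made up:

```python
import hashlib

def next_identity(identity: bytes, poh_hash: bytes) -> bytes:
    # The next identity is the current one hashed together with a PoH
    # hash, so rotation is deterministic yet unknowable before that
    # PoH hash exists.
    return hashlib.sha256(identity + poh_hash).digest()

identity = bytes(32)  # initial identity from the first batch
for poh_hash in (b"poh-1", b"poh-2", b"poh-3"):
    identity = next_identity(identity, poh_hash)
```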

@@ -145,8 +145,9 @@ The public key of a [keypair](#keypair).

#### replicator

-A type of [client](#client) that stores copies of segments of the
-[ledger](#ledger).
+A type of [client](#client) that stores [ledger](#ledger) segments and
+periodically submits storage proofs to the cluster; not a
+[fullnode](#fullnode).

#### secret key

@@ -154,8 +155,8 @@ The private key of a [keypair](#keypair).

#### slot

-The time (i.e. number of [blocks](#block)) for which a [leader](#leader) ingests
-transactions and produces [entries](#entry).
+The time (i.e. number of [blocks](#block)) for which a [leader](#leader)
+ingests transactions and produces [entries](#entry).

#### sol

@@ -215,13 +216,29 @@ for potential future use.

A fraction of a [block](#block); the smallest unit sent between
[fullnodes](#fullnode).

+#### CBC block
+
+Smallest encrypted chunk of ledger; an encrypted ledger segment is made of
+many CBC blocks, `ledger_segment_size / cbc_block_size` to be exact.
+
#### curio

A scarce, non-fungible member of a set of curios.

#### epoch

-The time, i.e. number of [slots](#slot), for which a [leader schedule](#leader-schedule) is valid.
+The time, i.e. number of [slots](#slot), for which a [leader
+schedule](#leader-schedule) is valid.

+#### fake storage proof
+
+A proof which has the same format as a storage proof, but whose sha state
+comes from hashing a known ledger value, which the storage client can reveal
+and which is also easily verifiable by the network on-chain.
+
+#### ledger segment
+
+A sequence of [blocks](#block).
+
#### light client

@@ -237,6 +254,37 @@ Millions of [instructions](#instruction) per second.

The component of a [fullnode](#fullnode) responsible for [program](#program)
execution.

+#### storage proof
+
+A set of SHA hash states constructed by sampling the encrypted version of the
+stored [ledger segment](#ledger-segment) at certain offsets.
+
+#### storage proof challenge
+
+A [transaction](#transaction) from a [replicator](#replicator) that verifiably
+proves that a [validator](#validator) [confirmed](#storage-proof-confirmation)
+a [fake proof](#fake-storage-proof).
+
+#### storage proof claim
+
+A [transaction](#transaction) from a [validator](#validator), submitted after
+the timeout period following a [storage proof
+confirmation](#storage-proof-confirmation) during which no successful
+[challenges](#storage-proof-challenge) have been observed, which rewards the
+parties of the [storage proofs](#storage-proof) and confirmations.
+
+#### storage proof confirmation
+
+A [transaction](#transaction) from a [validator](#validator) which indicates
+the set of [real](#storage-proof) and [fake proofs](#fake-storage-proof)
+submitted by a [replicator](#replicator). The transaction contains a list of
+proof hash values and a bit saying whether each hash is valid or fake.
+
+#### storage validation capacity
+
+The number of keys and samples that a [validator](#validator) can verify each
+storage epoch.
+
#### thin client

A type of [client](#client) that trusts it is communicating with a valid
@@ -1,11 +1,19 @@
-# Storage
+# Ledger Replication

-The goal of this RFC is to define a protocol for storing a very large ledger
-over a p2p network that is verified by solana validators. At full capacity on
-a 1gbps network solana will generate 4 petabytes of data per year. To prevent
-the network from centralizing around full nodes that have to store the full
-data set this protocol proposes a way for mining nodes to provide storage
-capacity for pieces of the network.
+At full capacity on a 1gbps network solana will generate 4 petabytes of data
+per year. To prevent the network from centralizing around full nodes that have
+to store the full data set, this protocol proposes a way for mining nodes to
+provide storage capacity for pieces of the network.
+
+The basic idea of Proof of Replication is to encrypt a dataset with a public
+symmetric key using CBC encryption, then hash the encrypted dataset. The main
+problem with the naive approach is that a dishonest storage node can stream the
+encryption and delete the data as it's hashed. The simple solution is to force
+the hash to be done on the reverse of the encryption, or perhaps with a random
+order. This ensures that all the data is present during the generation of the
+proof, and it also requires the validator to have the entirety of the encrypted
+data present for verification of every proof of every identity. So the space
+required to validate is `number_of_proofs * data_size`.

## Definitions

@@ -14,20 +22,20 @@ capacity for pieces of the network.

Storage mining client; stores some part of the ledger enumerated in blocks and
submits storage proofs to the chain. Not a full-node.

-#### ledger block
+#### ledger segment

Portion of the ledger which is downloaded by the replicator and from which
storage proof data is derived.

#### CBC block

-Smallest encrypted chunk of ledger, an encrypted ledger block would be made of
-many CBC blocks. `(size of ledger block) / (size of cbc block)` to be exact.
+Smallest encrypted chunk of ledger, an encrypted ledger segment would be made
+of many CBC blocks. `ledger_segment_size / cbc_block_size` to be exact.

#### storage proof

A set of sha hash state which is constructed by sampling the encrypted version
-of the stored ledger block at certain offsets.
+of the stored ledger segment at certain offsets.

#### fake storage proof
@@ -56,28 +64,16 @@ observed which rewards the parties of the storage proofs and confirmations.

The number of keys and samples that a validator can verify each storage epoch.

-## Background
-
-The basic idea to Proof of Replication is encrypting a dataset with a public
-symmetric key using CBC encryption, then hash the encrypted dataset. The main
-problem with the naive approach is that a dishonest storage node can stream the
-encryption and delete the data as its hashed. The simple solution is to force
-the hash to be done on the reverse of the encryption, or perhaps with a random
-order. This ensures that all the data is present during the generation of the
-proof and it also requires the validator to have the entirety of the encrypted
-data present for verification of every proof of every identity. So the space
-required to validate is `(Number of Proofs)*(data size)`
-
## Optimization with PoH

-Our improvement on this approach is to randomly sample the encrypted blocks
+Our improvement on this approach is to randomly sample the encrypted segments
faster than it takes to encrypt, and record the hash of those samples into the
-PoH ledger. Thus the blocks stay in the exact same order for every PoRep and
+PoH ledger. Thus the segments stay in the exact same order for every PoRep and
verification can stream the data and verify all the proofs in a single batch.
This way we can verify multiple proofs concurrently, each one on its own CUDA
-core. The total space required for verification is `(1 ledger block) + (2 CBC
-blocks) * (Number of Identities)`, with core count of equal to (Number of
-Identities). We use a 64-byte chacha CBC block size.
+core. The total space required for verification is `1_ledger_segment +
+2_cbc_blocks * number_of_identities` with core count equal to
+`number_of_identities`. We use a 64-byte chacha CBC block size.

## Network

@@ -106,8 +102,8 @@ changes to determine what rate it can validate storage proofs.

### Constants

-1. NUM\_STORAGE\_ENTRIES: Number of entries in a block of ledger data. The unit
-of storage for a replicator.
+1. NUM\_STORAGE\_ENTRIES: Number of entries in a segment of ledger data. The
+unit of storage for a replicator.
2. NUM\_KEY\_ROTATION\_TICKS: Number of ticks to save a PoH value and cause a
key generation for the section of ledger just generated and the rotation of
another key in the set.

@@ -167,19 +163,19 @@ is:

2. A replicator obtains the PoH hash corresponding to the last key rotation
along with its entry\_height.
3. The replicator signs the PoH hash with its keypair. That signature is the
-seed used to pick the block to replicate and also the encryption key. The
-replicator mods the signature with the entry\_height to get which block to
+seed used to pick the segment to replicate and also the encryption key. The
+replicator mods the signature with the entry\_height to get which segment to
replicate.
4. The replicator retrieves the ledger by asking peer validators and
replicators. See 6.5.
-5. The replicator then encrypts that block with the key with chacha algorithm
+5. The replicator then encrypts that segment with the key with the chacha algorithm
in CBC mode with NUM\_CHACHA\_ROUNDS of encryption.
6. The replicator initializes a chacha rng with the signature from step 2 as
the seed.
7. The replicator generates NUM\_STORAGE\_SAMPLES samples in the range of the
-entry size and samples the encrypted block with sha256 for 32-bytes at each
+entry size and samples the encrypted segment with sha256 for 32-bytes at each
offset value. Sampling the state should be faster than generating the encrypted
-block.
+segment.
8. The replicator sends a PoRep proof transaction which contains its sha state
at the end of the sampling operation, its seed, and the samples it used to the
current leader, and it is put onto the ledger.
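Steps 6 and 7 boil down to seeding an rng from the signature and folding fixed-size reads of the encrypted segment into one running sha256 state. A sketch, with Python's `random` standing in for the chacha rng and an arbitrary byte buffer standing in for the chacha-encrypted segment:

```python
import hashlib
import random

NUM_STORAGE_SAMPLES = 10  # illustrative; the RFC leaves this as a constant
SAMPLE_SIZE = 32

def sample_segment(encrypted_segment: bytes, signature: bytes) -> bytes:
    rng = random.Random(signature)      # stand-in for the chacha rng
    state = hashlib.sha256()
    for _ in range(NUM_STORAGE_SAMPLES):
        offset = rng.randrange(len(encrypted_segment) - SAMPLE_SIZE)
        state.update(encrypted_segment[offset:offset + SAMPLE_SIZE])
    return state.digest()               # the sha state sent in step 8

proof = sample_segment(bytes(1024 * 1024), b"replicator-signature")
```

Because the rng is seeded from the signature, a validator given the same seed and the same encrypted segment reproduces the identical sequence of offsets and thus the identical final sha state.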
@@ -198,9 +194,9 @@ frozen.

### Finding who has a given block of ledger

1. Validators monitor the transaction stream for storage mining proofs, and
-keep a mapping of ledger blocks by entry\_height to public keys. When it sees a
-storage mining proof it updates this mapping and provides an RPC interface
-which takes an entry\_height and hands back a list of public keys. The client
+keep a mapping of ledger segments by entry\_height to public keys. When it sees
+a storage mining proof it updates this mapping and provides an RPC interface
+which takes an entry\_height and hands back a list of public keys. The client
then looks up in their cluster\_info table to see which network address that
corresponds to and sends a repair request to retrieve the necessary blocks of
ledger.
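The bookkeeping in step 1 is essentially a multimap from entry_height to the public keys that have proven storage at that height. A minimal sketch; the function names are hypothetical, not Solana APIs:

```python
from collections import defaultdict

# entry_height -> public keys of replicators that proved storage there
segment_holders: defaultdict[int, set[str]] = defaultdict(set)

def record_storage_proof(entry_height: int, pubkey: str) -> None:
    # Called whenever a storage mining proof appears in the
    # transaction stream; keeps the mapping current.
    segment_holders[entry_height].add(pubkey)

def holders(entry_height: int) -> list[str]:
    # The RPC interface: who claims to hold the segment at this height?
    return sorted(segment_holders[entry_height])

record_storage_proof(42, "replicator-A")
record_storage_proof(42, "replicator-B")
print(holders(42))  # ['replicator-A', 'replicator-B']
```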