Update ledger replication chapter (#2029)

* ledger block -> ledger segment

The book already defines a *block* to be a slight variation of
how block-based chains define it. It's the thing the cluster
confirms should be the next set of transactions on the ledger.

* Boot storage description from the book
This commit is contained in:
Greg Fitzgerald 2018-12-07 16:52:36 -07:00 committed by GitHub
parent 3441d3399b
commit b5a80d3d49
4 changed files with 88 additions and 156 deletions

View File

@ -1,4 +1,4 @@
# Fullnode
# Anatomy of a Fullnode
<img alt="Fullnode block diagrams" src="img/fullnode.svg" class="center"/>

View File

@ -1,114 +1,2 @@
# Ledger Replication
## Background
At full capacity on a 1gbps network Solana would generate 4 petabytes of data
per year. If each fullnode was required to store the full ledger, the cost of
storage would discourage fullnode participation, thus centralizing the network
around those that could afford it. Solana aims to keep the cost of a fullnode
below $5,000 USD to maximize participation. To achieve that, the network needs
to minimize redundant storage while at the same time ensuring the validity and
availability of each copy.
To trust storage of ledger segments, Solana has *replicators* periodically
submit proofs to the network that the data was replicated. Each proof is called
a Proof of Replication. The basic idea of it is to encrypt a dataset with a
public symmetric key and then hash the encrypted dataset. Solana uses [CBC
encryption](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation#Cipher_Block_Chaining_(CBC)).
To prevent a malicious replicator from deleting the data as soon as it's
hashed, a replicator is required to hash random segments of the dataset.
Alternatively, Solana could require hashing the reverse of the encrypted data,
but random sampling is sufficient and much faster. Either solution ensures
that all the data is present during the generation of the proof and also
requires the validator to have the entirety of the encrypted data present for
verification of every proof of every identity. The space required to validate
is:
``` number_of_proofs * data_size ```
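As a rough sketch of what proof generation looks like under that
random-sampling approach, the following Rust fragment hashes randomly chosen
windows of an already-encrypted segment; the crates (`rand`, `sha2`), the
32-byte window size, and the seeding scheme are illustrative assumptions, not
the protocol's actual parameters.

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};
use sha2::{Digest, Sha256};

/// Hash `num_samples` randomly chosen 32-byte windows of an encrypted ledger
/// segment. Because the offsets are only known once the seed is fixed, the
/// replicator has to keep the whole encrypted segment around to produce a
/// valid proof.
fn sample_proof(encrypted_segment: &[u8], seed: u64, num_samples: usize) -> [u8; 32] {
    assert!(encrypted_segment.len() > 32);
    let mut rng = StdRng::seed_from_u64(seed);
    let mut hasher = Sha256::new();
    for _ in 0..num_samples {
        let offset = rng.gen_range(0..encrypted_segment.len() - 32);
        hasher.update(&encrypted_segment[offset..offset + 32]);
    }
    hasher.finalize().into()
}
```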
## Optimization with PoH
Solana is not the only distributed systems project using Proof of Replication,
but it might be the most efficient implementation because of its ability to
synchronize nodes with its Proof of History. With PoH, Solana is able to record
a hash of the PoRep samples in the ledger. Thus the blocks stay in the exact
same order for every PoRep and verification can stream the data and verify all
the proofs in a single batch. This way Solana can verify multiple proofs
concurrently, each one on its own GPU core. With the current generation of
graphics cards our network can support up to 14,000 replication identities or
symmetric keys. The total space required for verification is:
``` 2 CBC_blocks * number_of_identities ```
with a core count equal to the number of identities. A CBC block is expected to
be 1MB in size.
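As a rough worked example under those assumptions, 14,000 identities at 2 CBC
blocks of 1MB each comes to about 28GB of verification space, regardless of
how large the sampled ledger data itself is.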
## Network
Validators for PoRep are the same validators that are verifying transactions.
They have some stake that they have put up as collateral that ensures that
their work is honest. If you can prove that a validator verified a fake PoRep,
then the validator's stake is slashed.
Replicators are specialized light clients. They download a part of the ledger
and store it and provide proofs of storing the ledger. For each verified proof,
replicators are rewarded tokens from the mining pool.
## Constraints
Solana's PoRep protocol introduces the following constraints:
* At most 14,000 replication identities can be used, because that is how many GPU
cores are currently available to a computer costing under $5,000 USD.
* Verification requires generating the CBC blocks. That requires space of 2
blocks per identity, and 1 GPU core per identity for the same dataset. As
many identities as possible are batched at once, and all proofs for those
identities are verified concurrently against the same dataset.
## Validation and Replication Protocol
1. The network sets a replication target number, let's say 1k. 1k PoRep
identities are created from signatures of a PoH hash. They are tied to a
specific PoH hash. It doesn't matter who creates them; they could simply be
the last 1k validation signatures seen for the ledger at that count. This may
be just the initial batch of identities, since identity rotation is staggered.
2. Any client can use any of these identities to create PoRep proofs.
Replicator identities are the CBC encryption keys.
3. Periodically at a specific PoH count, a replicator that wants to create
PoRep proofs signs the PoH hash at that count. That signature is the seed
used to pick the block and identity to replicate. A block is 1TB of ledger.
4. Periodically at a specific PoH count, a replicator submits PoRep proofs for
their selected block. A signature of the PoH hash at that count is the seed
used to sample the 1TB encrypted block, and hash it. This is done faster than
it takes to encrypt the 1TB block with the original identity.
5. Replicators must submit some number of fake proofs, which they can prove to
be fake by providing the seed for the hash result.
6. Periodically at a specific PoH count, validators sign the hash and use the
signature to select the 1TB block that they need to validate. They batch all
the identities and proofs and submit approval for all the verified ones.
7. After step 6, replicator clients submit the proofs of fake proofs.
For any random seed, Solana requires everyone to use a signature that is
derived from a PoH hash. Every node uses the same count so that the same PoH
hash is signed by every participant. The signatures are then each
cryptographically tied to the keypair, which prevents a leader from grinding on
the resulting value for more than 1 identity.
Key rotation is *staggered*. Once rotation is underway, the next identity is
generated by hashing the current identity with a PoH hash.
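A hedged sketch of these signature-derived choices, using only sha256; the
helper names, the 64-byte signature input, and the exact derivation steps are
illustrative assumptions rather than the actual implementation.

```rust
use sha2::{Digest, Sha256};

/// Illustrative output of deriving a replication assignment from a
/// replicator's signature over the agreed PoH hash.
struct PoRepDerivation {
    segment_index: u64,
    encryption_key: [u8; 32],
}

/// `signature` is the replicator's 64-byte signature of the PoH hash at the
/// agreed count; how that signature is produced is outside this sketch.
fn derive(signature: &[u8; 64], num_segments: u64) -> PoRepDerivation {
    // Interpret the leading signature bytes as an integer and mod by the
    // number of available segments to pick which segment to replicate.
    let mut prefix = [0u8; 8];
    prefix.copy_from_slice(&signature[..8]);
    let segment_index = u64::from_le_bytes(prefix) % num_segments;

    // Use a hash of the signature as the symmetric (CBC) encryption key.
    let encryption_key: [u8; 32] = Sha256::digest(signature).into();

    PoRepDerivation { segment_index, encryption_key }
}

/// Staggered key rotation: the next identity is derived by hashing the
/// current identity together with a later PoH hash.
fn next_identity(current: &[u8; 32], poh_hash: &[u8; 32]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(current);
    hasher.update(poh_hash);
    hasher.finalize().into()
}
```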
Since there are many more client identities than encryption identities, the
reward is split among multiple clients to prevent Sybil attacks from generating
many clients to acquire the same block of data. To remain BFT, the network
needs to prevent a single human entity from storing all the replications of a
single chunk of the ledger.
Solana's solution to this is to require clients to continue using the same
identity. If the first round is used to acquire the same block for many client
identities, the second round for the same client identities will require a
redistribution of the signatures, and therefore of PoRep identities and blocks.
Thus clients are not rewarded for storage of the first block; the network
rewards long-lived client identities more than new ones.

View File

@ -145,8 +145,9 @@ The public key of a [keypair](#keypair).
#### replicator
A type of [client](#client) that stores copies of segments of the
[ledger](#ledger).
A type of [client](#client) that stores [ledger](#ledger) segments and
periodically submits storage proofs to the cluster; not a
[fullnode](#fullnode).
#### secret key
@ -154,8 +155,8 @@ The private key of a [keypair](#keypair).
#### slot
The time (i.e. number of [blocks](#block)) for which a [leader](#leader) ingests
transactions and produces [entries](#entry).
The time (i.e. number of [blocks](#block)) for which a [leader](#leader)
ingests transactions and produces [entries](#entry).
#### sol
@ -215,13 +216,29 @@ for potential future use.
A fraction of a [block](#block); the smallest unit sent between
[fullnodes](#fullnode).
#### CBC block
Smallest encrypted chunk of ledger; an encrypted ledger segment would be made
of many CBC blocks, `ledger_segment_size / cbc_block_size` to be exact.
#### curio
A scarce, non-fungible member of a set of curios.
#### epoch
The time, i.e. number of [slots](#slot), for which a [leader schedule](#leader-schedule) is valid.
The time, i.e. number of [slots](#slot), for which a [leader
schedule](#leader-schedule) is valid.
#### fake storage proof
A proof which has the same format as a storage proof, but whose sha state
comes from hashing a known ledger value which the storage client can reveal,
and which is also easily verifiable by the network on-chain.
#### ledger segment
A sequence of [blocks](#block).
#### light client
@ -237,6 +254,37 @@ Millions of [instructions](#instruction) per second.
The component of a [fullnode](#fullnode) responsible for [program](#program)
execution.
#### storage proof
A set of SHA hash states which is constructed by sampling the encrypted version
of the stored [ledger segment](#ledger-segment) at certain offsets.
#### storage proof challenge
A [transaction](#transaction) from a [replicator](#replicator) that verifiably
proves that a [validator](#validator) [confirmed](#storage-proof-confirmation)
a [fake proof](#fake-storage-proof).
#### storage proof claim
A [transaction](#transaction) from a [validator](#validator), submitted after
the timeout period following the [storage proof
confirmation](#storage-proof-confirmation) during which no successful
[challenges](#storage-proof-challenge) were observed, which rewards the
parties of the [storage proofs](#storage-proof) and confirmations.
#### storage proof confirmation
A [transaction](#transaction) from a [validator](#validator) which indicates
the set of [real](#storage-proof) and [fake proofs](#fake-storage-proof)
submitted by a [replicator](#replicator). The transaction would contain a list
of proof hash values and, for each, a bit which says whether that hash is
valid or fake.
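A minimal sketch of what such a confirmation payload might carry, with
illustrative Rust field names rather than the actual on-chain format:

```rust
/// Illustrative storage proof confirmation payload: one entry per submitted
/// proof, pairing the proof's hash value with a bit marking it real or fake.
struct StorageProofConfirmation {
    /// Public key of the replicator whose proofs are being confirmed.
    replicator: [u8; 32],
    /// (proof hash value, is_valid) pairs.
    proofs: Vec<([u8; 32], bool)>,
}
```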
#### storage validation capacity
The number of keys and samples that a [validator](#validator) can verify each
storage epoch.
#### thin client
A type of [client](#client) that trusts it is communicating with a valid

View File

@ -1,11 +1,19 @@
# Storage
# Ledger Replication
The goal of this RFC is to define a protocol for storing a very large ledger
over a p2p network that is verified by solana validators. At full capacity on
a 1gbps network solana will generate 4 petabytes of data per year. To prevent
the network from centralizing around full nodes that have to store the full
data set this protocol proposes a way for mining nodes to provide storage
capacity for pieces of the network.
At full capacity on a 1gbps network solana will generate 4 petabytes of data
per year. To prevent the network from centralizing around full nodes that have
to store the full data set, this protocol proposes a way for mining nodes to
provide storage capacity for pieces of the network.
The basic idea of Proof of Replication is to encrypt a dataset with a public
symmetric key using CBC encryption, then hash the encrypted dataset. The main
problem with the naive approach is that a dishonest storage node can stream the
encryption and delete the data as it's hashed. The simple solution is to force
the hash to be done on the reverse of the encryption, or perhaps with a random
order. This ensures that all the data is present during the generation of the
proof and it also requires the validator to have the entirety of the encrypted
data present for verification of every proof of every identity. So the space
required to validate is `number_of_proofs * data_size`
## Definitions
@ -14,20 +22,20 @@ capacity for pieces of the network.
Storage mining client, stores some part of the ledger enumerated in blocks and
submits storage proofs to the chain. Not a full-node.
#### ledger block
#### ledger segment
Portion of the ledger which is downloaded by the replicator, from which storage
proof data is derived.
#### CBC block
Smallest encrypted chunk of ledger, an encrypted ledger block would be made of
many CBC blocks. `(size of ledger block) / (size of cbc block)` to be exact.
Smallest encrypted chunk of ledger; an encrypted ledger segment would be made of
many CBC blocks. `ledger_segment_size / cbc_block_size` to be exact.
#### storage proof
A set of sha hash state which is constructed by sampling the encrypted version
of the stored ledger block at certain offsets.
of the stored ledger segment at certain offsets.
#### fake storage proof
@ -56,28 +64,16 @@ observed which rewards the parties of the storage proofs and confirmations.
The number of keys and samples that a validator can verify each storage epoch.
## Background
The basic idea of Proof of Replication is to encrypt a dataset with a public
symmetric key using CBC encryption, then hash the encrypted dataset. The main
problem with the naive approach is that a dishonest storage node can stream the
encryption and delete the data as it's hashed. The simple solution is to force
the hash to be done on the reverse of the encryption, or perhaps with a random
order. This ensures that all the data is present during the generation of the
proof and it also requires the validator to have the entirety of the encrypted
data present for verification of every proof of every identity. So the space
required to validate is `(Number of Proofs)*(data size)`
## Optimization with PoH
Our improvement on this approach is to randomly sample the encrypted blocks
Our improvement on this approach is to randomly sample the encrypted segments
faster than it takes to encrypt, and record the hash of those samples into the
PoH ledger. Thus the blocks stay in the exact same order for every PoRep and
PoH ledger. Thus the segments stay in the exact same order for every PoRep and
verification can stream the data and verify all the proofs in a single batch.
This way we can verify multiple proofs concurrently, each one on its own CUDA
core. The total space required for verification is `(1 ledger block) + (2 CBC
blocks) * (Number of Identities)`, with core count of equal to (Number of
Identities). We use a 64-byte chacha CBC block size.
core. The total space required for verification is `1_ledger_segment +
2_cbc_blocks * number_of_identities` with a core count equal to
`number_of_identities`. We use a 64-byte chacha CBC block size.
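As a rough illustration, with 1,000 identities and 64-byte CBC blocks this
comes to one ledger segment plus 2 * 64 bytes * 1,000 = 128,000 bytes (about
128KB) of CBC block state.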
## Network
@ -106,8 +102,8 @@ changes to determine what rate it can validate storage proofs.
### Constants
1. NUM\_STORAGE\_ENTRIES: Number of entries in a block of ledger data. The unit
of storage for a replicator.
1. NUM\_STORAGE\_ENTRIES: Number of entries in a segment of ledger data. The
unit of storage for a replicator.
2. NUM\_KEY\_ROTATION\_TICKS: Number of ticks to save a PoH value and cause a
key generation for the section of ledger just generated and the rotation of
another key in the set.
@ -167,19 +163,19 @@ is:
2. A replicator obtains the PoH hash corresponding to the last key rotation
along with its entry\_height.
3. The replicator signs the PoH hash with its keypair. That signature is the
seed used to pick the block to replicate and also the encryption key. The
replicator mods the signature with the entry\_height to get which block to
seed used to pick the segment to replicate and also the encryption key. The
replicator mods the signature with the entry\_height to get which segment to
replicate.
4. The replicator retrieves the ledger by asking peer validators and
replicators. See 6.5.
5. The replicator then encrypts that block with the key with chacha algorithm
5. The replicator then encrypts that segment with the key using the chacha algorithm
in CBC mode with NUM\_CHACHA\_ROUNDS of encryption.
6. The replicator initializes a chacha rng with the signature from step 2 as
the seed.
7. The replicator generates NUM\_STORAGE\_SAMPLES samples in the range of the
entry size and samples the encrypted block with sha256 for 32-bytes at each
entry size and samples the encrypted segment with sha256 for 32-bytes at each
offset value. Sampling the state should be faster than generating the encrypted
block.
segment.
8. The replicator sends the current leader a PoRep proof transaction containing
its sha state at the end of the sampling operation, its seed, and the samples
it used; the transaction is put onto the ledger.
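A minimal sketch of the payload such a PoRep proof transaction (step 8) might
carry; the field names and types are illustrative assumptions, not the actual
transaction format.

```rust
/// Illustrative PoRep proof payload submitted to the current leader in step 8.
/// Validators can regenerate the same samples from the seed and check the
/// resulting sha state against their copy of the encrypted segment.
struct PoRepProof {
    /// sha256 state at the end of the sampling operation.
    sha_state: [u8; 32],
    /// Seed derived from the replicator's signature of the PoH hash.
    seed: [u8; 32],
    /// Offsets into the encrypted segment that were sampled.
    sample_offsets: Vec<u64>,
    /// entry_height identifying which ledger segment was stored.
    entry_height: u64,
}
```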
@ -198,9 +194,9 @@ frozen.
### Finding who has a given block of ledger
1. Validators monitor the transaction stream for storage mining proofs, and
keep a mapping of ledger blocks by entry\_height to public keys. When it sees a
storage mining proof it updates this mapping and provides an RPC interface
which takes an entry\_height and hands back a list of public keys. The client
keep a mapping of ledger segments by entry\_height to public keys. When it sees
a storage mining proof, it updates this mapping and provides an RPC interface
which takes an entry\_height and hands back a list of public keys. The client
then looks up in their cluster\_info table to see which network address that
corresponds to and sends a repair request to retrieve the necessary blocks of
ledger.
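A rough sketch of the index a validator might keep to back that lookup; the
type and method names are illustrative, not the actual RPC interface.

```rust
use std::collections::HashMap;

/// Illustrative 32-byte public key type.
type Pubkey = [u8; 32];

/// Index from entry_height of a ledger segment to the replicators that have
/// submitted storage mining proofs for it.
#[derive(Default)]
struct SegmentIndex {
    replicators_by_height: HashMap<u64, Vec<Pubkey>>,
}

impl SegmentIndex {
    /// Called when a storage mining proof is observed in the transaction
    /// stream.
    fn record_proof(&mut self, entry_height: u64, replicator: Pubkey) {
        self.replicators_by_height
            .entry(entry_height)
            .or_default()
            .push(replicator);
    }

    /// Backs the RPC call that takes an entry_height and hands back the
    /// public keys known to store that segment.
    fn holders(&self, entry_height: u64) -> &[Pubkey] {
        self.replicators_by_height
            .get(&entry_height)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```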