# A Solana Cluster

A Solana cluster is a set of fullnodes working together to serve client
transactions and maintain the integrity of the ledger. Many clusters may
coexist. When two clusters share a common genesis block, they attempt to
converge. Otherwise, they simply ignore each other's existence. Transactions
sent to the wrong cluster are quietly rejected. In this chapter, we'll discuss
how a cluster is created, how nodes join the cluster, how they share the
ledger, how they ensure the ledger is replicated, and how they cope with buggy
and malicious nodes.

## Creating a Cluster

Before starting any fullnodes, one first needs to create a *genesis block*.
The block contains entries referencing two public keys: a *mint* and a
*bootstrap leader*. The fullnode holding the bootstrap leader's secret key is
responsible for appending the first entries to the ledger. It initializes its
internal state with the mint's account, which holds the number of native
tokens defined by the genesis block. The second fullnode then contacts the
bootstrap leader to register as a *validator* or *replicator*. Additional
fullnodes then register with any registered member of the cluster.
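
The bootstrap sequence can be illustrated with a minimal Python sketch. The `GenesisBlock` and `Fullnode` classes, their fields, and the string keys are hypothetical and are not Solana's actual data structures. Real keys would be Ed25519 public keys. Comparing public keys here stands in for holding the bootstrap leader's secret key.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class GenesisBlock:
    mint: str              # public key of the mint (hypothetical string form)
    bootstrap_leader: str  # public key of the bootstrap leader
    tokens: int            # native tokens credited to the mint's account


@dataclass
class Fullnode:
    pubkey: str
    accounts: Dict[str, int] = field(default_factory=dict)
    peers: Set[str] = field(default_factory=set)

    def boot(self, genesis: GenesisBlock) -> bool:
        """Load genesis state; return True if this node appends the first entries."""
        # Initial state: the mint's account holds every native token.
        self.accounts = {genesis.mint: genesis.tokens}
        # Matching public keys stands in for holding the leader's secret key.
        return self.pubkey == genesis.bootstrap_leader

    def register_with(self, member: "Fullnode") -> None:
        """Register with an existing member; both sides record each other."""
        self.peers.add(member.pubkey)
        member.peers.add(self.pubkey)


genesis = GenesisBlock(mint="mint-key", bootstrap_leader="leader-key", tokens=1_000)
leader = Fullnode("leader-key")
is_leader = leader.boot(genesis)   # True: this node starts the ledger
second = Fullnode("validator-key")
second.boot(genesis)
second.register_with(leader)       # the second node contacts the bootstrap leader
third = Fullnode("replicator-key")
third.boot(genesis)
third.register_with(second)        # later nodes may register with any member
```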

A validator receives all entries from the leader and submits votes confirming
those entries are valid. After voting, the validator is expected to store
those entries until replicator nodes submit proofs that they have stored
copies of them. Once the validator observes that a sufficient number of copies
exist, it deletes its copy.
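
The validator's retention rule can be sketched as follows. The sketch makes two assumptions. First, `required_copies` is a fixed, hypothetical threshold, since the text says only "a sufficient number". Second, it does not verify the replicators' proofs and only counts distinct replicators. `ValidatorStore` and its methods are illustrative names.

```python
from typing import Dict, Set


class ValidatorStore:
    """Hypothetical sketch: keep voted-on entries until enough replicas exist."""

    def __init__(self, required_copies: int) -> None:
        self.required_copies = required_copies
        self.entries: Dict[int, bytes] = {}    # entry id -> entry data
        self.proofs: Dict[int, Set[str]] = {}  # entry id -> replicators with proofs

    def store_after_vote(self, entry_id: int, data: bytes) -> None:
        self.entries[entry_id] = data
        self.proofs.setdefault(entry_id, set())

    def record_storage_proof(self, entry_id: int, replicator_id: str) -> None:
        # Proof verification is omitted; only distinct replicators are counted.
        if entry_id not in self.entries:
            return
        self.proofs[entry_id].add(replicator_id)
        if len(self.proofs[entry_id]) >= self.required_copies:
            # Enough copies exist elsewhere, so the validator drops its own.
            del self.entries[entry_id]
            del self.proofs[entry_id]
```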

## Joining a Cluster

Fullnodes and replicators enter the cluster via registration messages sent to
the cluster's *control plane*. The control plane is implemented using a
*gossip* protocol, meaning that a node may register with any existing node and
expect its registration to propagate to all nodes in the cluster. The time it
takes for all nodes to synchronize is proportional to the square of the number
of nodes participating in the cluster. Algorithmically, that's considered very
slow, but in exchange for that time, a node is assured that it eventually has
all the same information as every other node, and that that information cannot
be censored by any one node.
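
The quadratic cost is easiest to see with naive flooding, which is a simpler scheme than a production gossip protocol. In this sketch, every node knows every other node. Each node forwards a registration it has not seen before to all of its peers except the sender. The sketch counts total messages rather than elapsed time. For a cluster of n nodes it sends (n - 1)² messages, so traffic grows with the square of the cluster size. `flood` and `full_mesh` are hypothetical helpers.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood(peers: Dict[str, List[str]], entry_node: str) -> Tuple[Set[str], int]:
    """Naively flood one registration; return (nodes reached, messages sent)."""
    known: Set[str] = set()
    messages = 0
    queue = deque([(entry_node, None)])  # (recipient, sender)
    while queue:
        node, sender = queue.popleft()
        if node in known:
            continue  # duplicate delivery: already seen, so drop it
        known.add(node)
        for peer in peers[node]:
            if peer != sender:
                messages += 1
                queue.append((peer, node))
    return known, messages


def full_mesh(n: int) -> Dict[str, List[str]]:
    """Every node lists every other node as a peer."""
    names = [f"node{i}" for i in range(n)]
    return {a: [b for b in names if b != a] for a in names}


for n in (10, 100):
    _, sent = flood(full_mesh(n), "node0")
    print(n, sent)
```

It prints `10 81` and then `100 9801`. A tenfold increase in nodes costs roughly a hundredfold increase in messages.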

## Sending Transactions to a Cluster

Clients send transactions to any fullnode's Transaction Processing Unit (TPU)
port. If the node is in the validator role, it forwards the transaction to the
designated leader. If in the leader role, the node bundles incoming
transactions, timestamps them to create an *entry*, and pushes the entry onto
the cluster's *data plane*. Once on the data plane, the transactions are
validated by validator nodes and replicated by replicator nodes, effectively
appending them to the ledger.
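
The routing decision at the TPU port can be sketched as below. `TpuNode`, `receive_on_tpu`, and `produce_entry` are hypothetical names. A plain counter stands in for the entry timestamp, and appending to a list stands in for broadcasting on the data plane.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    tick: int                # stand-in timestamp (a counter, not a VDF sample)
    transactions: List[str]


@dataclass
class TpuNode:
    name: str
    leader: Optional["TpuNode"] = None  # None means this node is the leader
    pending: List[str] = field(default_factory=list)
    data_plane: List[Entry] = field(default_factory=list)
    tick: int = 0

    def receive_on_tpu(self, tx: str) -> None:
        if self.leader is not None:
            # Validator role: forward the transaction to the designated leader.
            self.leader.receive_on_tpu(tx)
        else:
            # Leader role: buffer it until the next entry is produced.
            self.pending.append(tx)

    def produce_entry(self) -> Optional[Entry]:
        """Leader only: bundle pending transactions into a timestamped entry."""
        if not self.pending:
            return None
        self.tick += 1
        entry = Entry(self.tick, self.pending)
        self.pending = []
        self.data_plane.append(entry)  # stand-in for pushing onto the data plane
        return entry
```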

## Confirming Transactions

A Solana cluster is capable of subsecond *confirmation* for up to 150 nodes,
with plans to scale up to hundreds of thousands of nodes. Once fully
implemented, confirmation times are expected to increase only with the
logarithm of the number of validators, where the logarithm's base is very high.
If the base is one thousand, for example, then for the first thousand nodes,
confirmation will be the duration of three network hops plus the time it takes
the slowest validator of a supermajority to vote. Growing the cluster to a
million nodes (one thousand squared) adds only one more network hop.
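
The hop arithmetic above can be checked with a few lines of Python. The formula is an assumption distilled from the text's example: three hops while one fanout layer of `base` nodes suffices, plus one hop per additional layer. It ignores the time validators spend voting. `confirmation_hops` is a hypothetical helper.

```python
def confirmation_hops(num_nodes: int, base: int = 1000) -> int:
    """Network hops per confirmation under the illustrative log-base model."""
    # Count fanout layers until base ** layers covers num_nodes (at least one).
    layers, reach = 1, base
    while reach < num_nodes:
        layers += 1
        reach *= base
    # The first layer costs three hops; each additional layer adds one.
    return 3 + (layers - 1)


for n in (150, 1_000, 1_000_000, 100_000_000):
    print(f"{n:,} nodes -> {confirmation_hops(n)} hops")
```

With the default base of 1,000, the loop reports 3, 3, 4, and 5 hops, respectively.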

Solana defines confirmation as the duration of time from when the leader
timestamps a new entry to the moment when it recognizes a supermajority of
ledger votes.
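
Under this definition, confirmation time can be computed from the moments votes reach the leader. The sketch assumes that a supermajority means more than two thirds of validators, each counted once with equal weight. `confirmation_time` and its arguments are hypothetical, and times are plain integers (for example, milliseconds).

```python
from typing import Dict, Optional


def confirmation_time(entry_timestamp: int,
                      vote_arrivals: Dict[str, int],
                      num_validators: int) -> Optional[int]:
    """Elapsed time until the leader holds a supermajority of votes, else None.

    vote_arrivals maps validator id -> time its vote reached the leader,
    so each validator is counted at most once.
    """
    # Smallest vote count strictly greater than two thirds of the validators.
    needed = 2 * num_validators // 3 + 1
    arrivals = sorted(vote_arrivals.values())
    if len(arrivals) < needed:
        return None
    # The needed-th earliest vote is when the supermajority is first recognized.
    return arrivals[needed - 1] - entry_timestamp
```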

A gossip network is much too slow to achieve subsecond confirmation once the
network grows beyond a certain size. The time it takes to send messages to all
nodes is proportional to the square of the number of nodes. If a blockchain
wants to achieve low confirmation times and attempts to do so using a gossip
network, it will be forced to centralize to just a handful of nodes.

Scalable confirmation can be achieved using the following combination of
techniques:

1. Timestamp transactions with a verifiable delay function (VDF) sample and
   sign the timestamp.
2. Split the transactions into batches, send each batch to a separate node,
   and have each node share its batch with its peers.
3. Repeat the previous step recursively until all nodes have all batches.

Solana rotates leaders at fixed intervals, called *slots*. Each leader may only
produce entries during its allotted slot. The leader therefore timestamps
transactions so that validators may look up the public key of the designated
leader. The leader then signs the timestamp so that a validator may verify the
signature, proving the signer is the owner of the designated leader's public
key.
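
A validator's check can be sketched as a lookup that maps a timestamp to a slot and the slot to a leader. `LeaderSchedule`, `ticks_per_slot`, and the round-robin rotation are assumptions for illustration. The `signature_ok` flag stands in for a real signature check, which this sketch does not implement.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LeaderSchedule:
    ticks_per_slot: int       # hypothetical number of VDF ticks per slot
    leaders: Tuple[str, ...]  # leader public keys, rotating round-robin

    def slot_for(self, tick_height: int) -> int:
        return tick_height // self.ticks_per_slot

    def leader_for(self, tick_height: int) -> str:
        return self.leaders[self.slot_for(tick_height) % len(self.leaders)]


def accept_entry(schedule: LeaderSchedule, tick_height: int,
                 signer_pubkey: str, signature_ok: bool) -> bool:
    # signature_ok stands in for verifying the signature against signer_pubkey.
    # The entry is accepted only if the signer is the slot's designated leader.
    return signature_ok and signer_pubkey == schedule.leader_for(tick_height)
```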

Next, transactions are broken into batches so that a node can send
transactions to multiple parties without making multiple copies. If, for
example, the leader needed to send 60 transactions to 6 nodes, it would break
that collection of 60 into batches of 10 transactions and send one to each
node. This allows the leader to put 60 transactions on the wire, not 60
transactions for each node (360 in total). Each node then shares its batch
with its peers. Once a node has collected all 6 batches, it reconstructs the
original set of 60 transactions.
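
The 60-transaction example can be replayed in Python. `split_into_batches` and `distribute` are hypothetical helpers. The sketch models only which node holds which batch, not packets, headers, or failures.

```python
from typing import List, Tuple


def split_into_batches(transactions: List[int], num_nodes: int) -> List[List[int]]:
    """Split into num_nodes contiguous batches whose sizes differ by at most one."""
    q, r = divmod(len(transactions), num_nodes)
    batches, start = [], 0
    for i in range(num_nodes):
        end = start + q + (1 if i < r else 0)
        batches.append(transactions[start:end])
        start = end
    return batches


def distribute(transactions: List[int], num_nodes: int) -> Tuple[int, List[List[int]]]:
    """Return (transactions the leader puts on the wire, each node's rebuilt list)."""
    batches = split_into_batches(transactions, num_nodes)
    # The leader sends batch i to node i exactly once.
    leader_sent = sum(len(b) for b in batches)
    # Node j starts with its own batch, then receives every peer's batch.
    held = [{j: batches[j]} for j in range(num_nodes)]
    for i, batch in enumerate(batches):
        for j in range(num_nodes):
            if j != i:
                held[j][i] = batch
    # Concatenate in batch order to rebuild the original list.
    rebuilt = [[tx for i in sorted(h) for tx in h[i]] for h in held]
    return leader_sent, rebuilt


txs = list(range(60))
sent, rebuilt = distribute(txs, 6)
print(sent, len(txs) * 6)  # prints: 60 360 (leader traffic vs. a full copy per node)
```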

A batch of transactions can only be split so many times before it becomes so
small that header information is the primary consumer of network bandwidth. At
the time of this writing, the approach is scaling well up to about 150
validators. To scale up to hundreds of thousands of validators, each node can
apply the same technique as the leader node to another set of nodes of equal
size. We call this technique *data plane fanout*, but it is not yet
implemented.
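
One way to lay out such a fanout is as a tree in which each node forwards to its own set of `fanout` nodes, the same size as the leader's set. The indexing below is an f-ary heap layout with the leader as an implicit root. It is an assumption for illustration, not the unimplemented design the text refers to. It shows only the tree shape, not which batches travel along each edge.

```python
from typing import List, Optional


def retransmit_children(node: int, num_nodes: int, fanout: int) -> List[int]:
    """Nodes that `node` forwards to (node indices exclude the leader)."""
    first = fanout * (node + 1)
    return list(range(first, min(first + fanout, num_nodes)))


def retransmit_parent(node: int, fanout: int) -> Optional[int]:
    """Node that forwards to `node`, or None if it hears directly from the leader."""
    parent = node // fanout - 1
    return None if parent < 0 else parent


# With fanout 3, the leader feeds nodes 0-2, each of which feeds 3 more, and so on.
for n in range(4):
    print(n, retransmit_children(n, 20, 3))
```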