# A Solana Cluster

A Solana cluster is a set of fullnodes working together to serve client
transactions and maintain the integrity of the ledger. Many clusters may
coexist. When two clusters share a common genesis block, they attempt to
converge. Otherwise, they simply ignore the existence of the other.
Transactions sent to the wrong cluster are quietly rejected. In this chapter,
we'll discuss how a cluster is created, how nodes join the cluster, how they
share the ledger, how they ensure the ledger is replicated, and how they cope
with buggy and malicious nodes.

## Creating a Cluster

Before starting any fullnodes, one first needs to create a *genesis block*.
The block contains entries referencing two public keys, a *mint* and a
*bootstrap leader*. The fullnode holding the bootstrap leader's secret key is
responsible for appending the first entries to the ledger. It initializes its
internal state with the mint's account. That account will hold the number of
native tokens defined by the genesis block. The second fullnode then contacts
the bootstrap leader to register as a *validator* or *replicator*. Additional
fullnodes then register with any registered member of the cluster.
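
To make the genesis contents concrete, a minimal sketch follows; the field
names and types are assumptions for illustration, not the actual Solana
genesis layout:

```rust
// Illustrative sketch only: field names and types are assumptions, not the
// actual Solana genesis data layout.
type Pubkey = [u8; 32];

struct GenesisBlock {
    /// Account that will hold the cluster's native tokens.
    mint_pubkey: Pubkey,
    /// Fullnode entitled to append the first entries to the ledger.
    bootstrap_leader_pubkey: Pubkey,
    /// Number of native tokens credited to the mint's account.
    native_tokens: u64,
}

fn main() {
    // In practice both keys come from freshly generated keypairs.
    let genesis = GenesisBlock {
        mint_pubkey: [1u8; 32],
        bootstrap_leader_pubkey: [2u8; 32],
        native_tokens: 1_000_000_000,
    };
    println!(
        "mint {:?}... holds {} tokens; bootstrap leader {:?}...",
        &genesis.mint_pubkey[..4],
        genesis.native_tokens,
        &genesis.bootstrap_leader_pubkey[..4],
    );
}
```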
A validator receives all entries from the leader and submits votes confirming
those entries are valid. After voting, the validator is expected to store those
entries until replicator nodes submit proofs that they have stored copies of
them. Once the validator observes that a sufficient number of copies exist, it
deletes its copy.
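
That hand-off can be pictured as simple bookkeeping. The sketch below is an
illustrative model of the rule, not the real fullnode storage code:

```rust
use std::collections::HashMap;

// Illustrative only: tracks how many storage proofs have been observed for
// each entry and drops the local copy once enough replicators hold it.
struct EntryStore {
    copies_required: usize,
    // entry id -> (entry bytes, number of replicator proofs observed)
    entries: HashMap<u64, (Vec<u8>, usize)>,
}

impl EntryStore {
    fn record_proof(&mut self, entry_id: u64) {
        if let Some((_, proofs)) = self.entries.get_mut(&entry_id) {
            *proofs += 1;
        }
        // Once a sufficient number of copies exist, delete the local copy.
        if self
            .entries
            .get(&entry_id)
            .map_or(false, |(_, proofs)| *proofs >= self.copies_required)
        {
            self.entries.remove(&entry_id);
        }
    }
}

fn main() {
    let mut store = EntryStore {
        copies_required: 2,
        entries: HashMap::from([(42, (vec![0u8; 64], 0))]),
    };
    store.record_proof(42);
    store.record_proof(42);
    assert!(!store.entries.contains_key(&42));
}
```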
## Joining a Cluster

Fullnodes and replicators enter the cluster via registration messages sent to
its *control plane*. The control plane is implemented using a *gossip*
protocol, meaning that a node may register with any existing node, and expect
its registration to propagate to all nodes in the cluster. The time it takes
for all nodes to synchronize is proportional to the square of the number of
nodes participating in the cluster. Algorithmically, that's considered very
slow, but in exchange for that time, a node is assured that it eventually has
all the same information as every other node, and that the information cannot
be censored by any one node.
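
As a back-of-the-envelope reading of that quadratic claim, every node's
registration must eventually reach every other node, which is on the order of
n × (n − 1) message deliveries:

```rust
// Rough arithmetic for the claim above: every node's registration must reach
// every other node, which is on the order of n * (n - 1) message deliveries.
fn deliveries_to_synchronize(n: u64) -> u64 {
    n * (n - 1)
}

fn main() {
    for n in [100u64, 1_000, 10_000] {
        println!("{:>6} nodes -> ~{} deliveries", n, deliveries_to_synchronize(n));
    }
}
```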
## Sending Transactions to a Cluster

Clients send transactions to any fullnode's Transaction Processing Unit (TPU)
port. If the node is in the validator role, it forwards the transaction to the
designated leader. If in the leader role, the node bundles incoming
transactions, timestamps them, creating an *entry*, and pushes them onto the
cluster's *data plane*. Once on the data plane, the transactions are validated
by validator nodes and replicated by replicator nodes, effectively appending
them to the ledger.
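
A condensed sketch of that validator-or-leader branch, using hypothetical
types in place of the real TPU plumbing:

```rust
// Hypothetical types and roles; the real TPU pipeline is considerably more involved.
type Transaction = Vec<u8>;

struct Entry {
    tick_height: u64,               // the timestamp the leader attaches
    transactions: Vec<Transaction>, // the bundled client transactions
}

enum Role {
    Validator { leader_addr: String },
    Leader,
}

fn handle_incoming(role: &Role, batch: Vec<Transaction>, tick_height: u64) {
    match role {
        // A validator does not process the transactions itself; it relays them.
        Role::Validator { leader_addr } => {
            println!("forwarding {} transactions to {}", batch.len(), leader_addr);
        }
        // The leader timestamps the batch, creating an entry for the data plane.
        Role::Leader => {
            let entry = Entry { tick_height, transactions: batch };
            println!(
                "broadcasting entry at tick {} with {} transactions",
                entry.tick_height,
                entry.transactions.len()
            );
        }
    }
}

fn main() {
    let validator = Role::Validator { leader_addr: "203.0.113.1:8000".to_string() };
    handle_incoming(&validator, vec![vec![0u8; 32]], 41);
    handle_incoming(&Role::Leader, vec![vec![0u8; 32], vec![1u8; 32]], 42);
}
```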
## Confirming Transactions

A Solana cluster is capable of subsecond *confirmation* for up to 150 nodes,
with plans to scale up to hundreds of thousands of nodes. Once fully
implemented, confirmation times are expected to increase only with the
logarithm of the number of validators, where the logarithm's base is very high.
If the base is one thousand, for example, then for the first thousand nodes
confirmation will be the duration of three network hops plus the time it takes
the slowest validator of a supermajority to vote. For the next million nodes,
confirmation increases by only one network hop.
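
One way to reproduce those figures, assuming a fanout of 1,000 and treating
the three-hop baseline as two hops of fixed overhead plus one hop per fanout
layer (an interpretation of the numbers above, not a specification):

```rust
// Reproduces the hop counts quoted above for a fanout of 1,000: three hops
// for the first thousand validators, one more hop per factor of one thousand.
fn network_hops(num_validators: u64, fanout: u64) -> u32 {
    let mut layers: u32 = 1;
    let mut reach = fanout;
    while reach < num_validators {
        reach *= fanout;
        layers += 1;
    }
    2 + layers // assumed fixed overhead of two hops, plus one hop per layer
}

fn main() {
    assert_eq!(network_hops(1_000, 1_000), 3);
    assert_eq!(network_hops(1_000_000, 1_000), 4);
    println!("200,000 validators -> {} hops", network_hops(200_000, 1_000));
}
```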
Solana defines confirmation as the duration of time from when the leader
timestamps a new entry to the moment when it recognizes a supermajority of
ledger votes.

A gossip network is much too slow to achieve subsecond confirmation once the
network grows beyond a certain size. The time it takes to send messages to all
nodes is proportional to the square of the number of nodes. If a blockchain
wants to achieve low confirmation times and attempts to do it using a gossip
network, it will be forced to centralize to just a handful of nodes.

Scalable confirmation can be achieved using the following combination of
techniques:

1. Timestamp transactions with a VDF sample and sign the timestamp.
2. Split the transactions into batches, send each to separate nodes and have
   each node share its batch with its peers.
3. Repeat the previous step recursively until all nodes have all batches.

Solana rotates leaders at fixed intervals, called *slots*. Each leader may only
produce entries during its allotted slot. The leader therefore timestamps
transactions so that validators may look up the public key of the designated
leader. The leader then signs the timestamp so that a validator may verify the
signature, proving the signer is the owner of the designated leader's public
key.
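
In outline, the check a validator performs might look like the sketch below.
The leader schedule and the signature scheme here are stand-ins: a real
validator would consult the cluster's leader schedule and verify an ed25519
signature rather than this toy hash:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Toy stand-in for a real signature: a hash over (key, message). A real
// validator would verify an ed25519 signature against the leader's public key.
fn toy_sign(key: u64, message: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    message.hash(&mut hasher);
    hasher.finish()
}

struct SignedEntry {
    slot: u64,            // the leader rotation interval this entry belongs to
    vdf_sample: [u8; 32], // the timestamp the leader produced for the entry
    signature: u64,       // the leader's signature over that timestamp
}

// Hypothetical schedule: which key may produce entries in each slot. (In this
// toy the same key signs and verifies; real public-key crypto would not.)
fn verify_entry(entry: &SignedEntry, leader_schedule: &HashMap<u64, u64>) -> bool {
    let Some(leader_key) = leader_schedule.get(&entry.slot) else {
        return false; // no designated leader known for this slot
    };
    toy_sign(*leader_key, &entry.vdf_sample) == entry.signature
}

fn main() {
    let leader_schedule = HashMap::from([(7u64, 1234u64)]);
    let entry = SignedEntry {
        slot: 7,
        vdf_sample: [9u8; 32],
        signature: toy_sign(1234, &[9u8; 32]),
    };
    assert!(verify_entry(&entry, &leader_schedule));
}
```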
Next, transactions are broken into batches so that a node can send transactions
to multiple parties without making multiple copies. If, for example, the leader
needed to send 60 transactions to 6 nodes, it would break that collection of 60
into batches of 10 transactions and send one to each node. This allows the
leader to put 60 transactions on the wire, not 60 transactions for each node.
Each node then shares its batch with its peers. Once the node has collected all
6 batches, it reconstructs the original set of 60 transactions.
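
A toy version of that split-and-reassemble, with transactions represented by
plain integers and the wire format ignored:

```rust
fn main() {
    // 60 transactions, represented here by their indices.
    let transactions: Vec<u32> = (0..60).collect();
    let num_nodes: usize = 6;
    let batch_size = transactions.len() / num_nodes; // 10 transactions per batch

    // The leader puts 60 transactions on the wire: one batch of 10 per node.
    let batches: Vec<Vec<u32>> = transactions
        .chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect();
    assert_eq!(batches.len(), num_nodes);

    // Each node shares its batch with its peers; once a node holds all 6
    // batches, it reconstructs the original set of 60 transactions.
    let reconstructed: Vec<u32> = batches.into_iter().flatten().collect();
    assert_eq!(reconstructed, (0..60).collect::<Vec<u32>>());
}
```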
A batch of transactions can only be split so many times before it is so small
that header information becomes the primary consumer of network bandwidth. At
the time of this writing, the approach is scaling well up to about 150
validators. To scale up to hundreds of thousands of validators, each node can
apply the same technique as the leader node to another set of nodes of equal
size. We call the technique *data plane fanout*; learn more in the [data plane
fanout](data-plane-fanout.md) section.