2018-12-12 14:37:46 -08:00
|
|
|
# Gossip Service
|
2018-11-06 17:00:58 -08:00
|
|
|
|
2018-12-06 12:52:47 -08:00
|
|
|
The Gossip Service acts as a gateway to nodes in the control plane. Fullnodes
|
|
|
|
use the service to ensure information is available to all other nodes in a cluster.
|
|
|
|
The service broadcasts information using a gossip protocol.
|
2018-12-05 15:54:21 -08:00
|
|
|
|
|
|
|
## Gossip Overview
|
|
|
|
|
2018-12-05 19:21:57 -08:00
|
|
|
Nodes continuously share signed data objects among themselves in order to
|
|
|
|
manage a cluster. For example, they share their contact information, ledger
|
|
|
|
height, and votes.
|
2018-12-05 15:54:21 -08:00
|
|
|
|
2018-12-05 19:21:57 -08:00
|
|
|
Every tenth of a second, each node sends a "push" message and/or a "pull"
|
|
|
|
message. Push and pull messages may elicit responses, and push messages may be
|
|
|
|
forwarded on to others in the cluster.
|
2018-12-05 15:54:21 -08:00
|
|
|
|
|
|
|
Gossip runs on a well-known UDP/IP port or a port in a well-known range. Once
|
2018-12-05 19:21:57 -08:00
|
|
|
a cluster is bootstrapped, nodes advertise to each other where to find their
|
2018-12-05 15:54:21 -08:00
|
|
|
gossip endpoint (a socket address).
|
|
|
|
|
|
|
|
## Gossip Records
|
|
|
|
|
|
|
|
Records shared over gossip are arbitrary, but signed and versioned (with a
|
|
|
|
timestamp) as needed to make sense to the node receiving them. If a node
|
|
|
|
recieves two records from the same source, it it updates its own copy with the
|
|
|
|
record with the most recent timestamp.
|
|
|
|
|
2018-12-06 12:52:47 -08:00
|
|
|
## Gossip Service Interface
|
2018-12-05 15:54:21 -08:00
|
|
|
|
|
|
|
### Push Message
|
|
|
|
|
2018-12-05 19:21:57 -08:00
|
|
|
A node sends a push message to tells the cluster it has information to share.
|
2018-12-05 15:54:21 -08:00
|
|
|
Nodes send push messages to `PUSH_FANOUT` push peers.
|
|
|
|
|
|
|
|
Upon receiving a push message, a node examines the message for:
|
|
|
|
|
|
|
|
1. Duplication: if the message has been seen before, the node responds with
|
|
|
|
`PushMessagePrune` and drops the message
|
|
|
|
|
2018-12-12 10:01:54 -08:00
|
|
|
2. New data: if the message is new to the node
|
|
|
|
* Stores the new information with an updated version in its cluster info and
|
|
|
|
purges any previous older value
|
|
|
|
* Stores the message in `pushed_once` (used for detecting duplicates,
|
|
|
|
purged after `PUSH_MSG_TIMEOUT * 5` ms)
|
|
|
|
* Retransmits the messages to its own push peers
|
2018-12-05 15:54:21 -08:00
|
|
|
|
|
|
|
3. Expiration: nodes drop push messages that are older than `PUSH_MSG_TIMEOUT`
|
|
|
|
|
|
|
|
### Push Peers, Prune Message
|
|
|
|
|
|
|
|
A nodes selects its push peers at random from the active set of known peers.
|
|
|
|
The node keeps this selection for a relatively long time. When a prune message
|
|
|
|
is received, the node drops the push peer that sent the prune. Prune is an
|
|
|
|
indication that there is another, faster path to that node than direct push.
|
|
|
|
|
|
|
|
The set of push peers is kept fresh by rotating a new node into the set every
|
|
|
|
`PUSH_MSG_TIMEOUT/2` milliseconds.
|
|
|
|
|
|
|
|
### Pull Message
|
|
|
|
|
2018-12-05 19:21:57 -08:00
|
|
|
A node sends a pull message to ask the cluster if there is any new information.
|
2018-12-05 15:54:21 -08:00
|
|
|
A pull message is sent to a single peer at random and comprises a Bloom filter
|
|
|
|
that represents things it already has. A node receiving a pull message
|
|
|
|
iterates over its values and constructs a pull response of things that miss the
|
|
|
|
filter and would fit in a message.
|
|
|
|
|
2018-12-12 10:01:54 -08:00
|
|
|
A node constructs the pull Bloom filter by iterating over current values and
|
|
|
|
recently purged values.
|
|
|
|
|
|
|
|
A node handles items in a pull response the same way it handles new data in a
|
|
|
|
push message.
|
|
|
|
|
|
|
|
|
|
|
|
## Purging
|
2018-12-05 15:54:21 -08:00
|
|
|
|
2018-12-12 10:01:54 -08:00
|
|
|
Nodes retain prior versions of values (those updated by a pull or push) and
|
|
|
|
expired values (those older than `GOSSIP_PULL_CRDS_TIMEOUT_MS`) in
|
|
|
|
`purged_values` (things I recently had). Nodes purge `purged_values` that are
|
|
|
|
older than `5 * GOSSIP_PULL_CRDS_TIMEOUT_MS`.
|