Add documentation for pipelining

Greg Fitzgerald 2018-11-07 16:49:19 -07:00
parent 30697f63f1
commit f96563c3f2
1 changed files with 41 additions and 1 deletions


@@ -4,4 +4,44 @@
## Pipelining
## Pipeline Stages
The fullnodes make extensive use of an optimization common in CPU design,
called *pipelining*. Pipelining is the right tool for the job when there's a
stream of input data that needs to be processed by a sequence of steps, and
different hardware is responsible for each step. The quintessential example is
using a washer and dryer to wash/dry/fold several loads of laundry. Washing
must occur before drying and drying before folding, but each of the three
operations is performed by a separate unit. To maximize efficiency, one creates
a pipeline of *stages*. We'll call the washer one stage, the dryer another, and
the folding process a third. To run the pipeline, one adds a second load of
laundry to the washer just after the first load is added to the dryer.
Likewise, the third load is added to the washer after the second is in the
dryer and the first is being folded. In this way, one can make progress on
three loads of laundry simultaneously. Given infinite loads, the pipeline will
consistently complete a load at the rate of the slowest stage in the pipeline.
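
The laundry example can be checked with a little arithmetic. The following is
a minimal sketch, not code from the fullnode: the function name
`pipeline_finish_times` and the stage durations are assumptions made up for
illustration. It assumes every load takes the same time in a given stage and
that a stage handles one load at a time.

```rust
/// Returns the time at which each load leaves the last stage.
/// `stage_minutes[s]` is the (hypothetical) duration of stage `s`.
pub fn pipeline_finish_times(stage_minutes: &[u64], loads: usize) -> Vec<u64> {
    // stage_free_at[s] is when stage `s` finishes the load it last accepted.
    let mut stage_free_at = vec![0u64; stage_minutes.len()];
    let mut finish_times = Vec::with_capacity(loads);

    for _ in 0..loads {
        // Time at which the current load is ready to enter the next stage.
        let mut ready_at = 0u64;
        for (s, &duration) in stage_minutes.iter().enumerate() {
            // A load starts a stage once it has left the previous stage
            // and the stage has finished the previous load.
            let start = ready_at.max(stage_free_at[s]);
            ready_at = start + duration;
            stage_free_at[s] = ready_at;
        }
        finish_times.push(ready_at);
    }
    finish_times
}
```

With an assumed 30-minute wash, 60-minute dry, and 15-minute fold,
`pipeline_finish_times(&[30, 60, 15], 3)` yields finish times of 105, 165, and
225 minutes. After the first load, one load completes every 60 minutes, which
is the duration of the slowest stage. Running the same three loads one after
another without pipelining would take 3 * 105 = 315 minutes.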
## Pipelining in the fullnode
The fullnode contains two pipelined processes: one used in leader mode, called
the Tpu, and one used in validator mode, called the Tvu. In both cases, the
hardware being pipelined is the same: the network input, the GPU cards, the CPU
cores, and the network output. What each does with that hardware is different.
The Tpu exists to create ledger entries whereas the Tvu exists to validate
them.
## Pipeline stages in Rust
The approach to creating a pipeline stage in Rust may be unique to Solana. We
haven't seen the same technique used in other Rust projects and there may be
better ways to do it. The Solana approach defines a stage as an object that
communicates with its previous stage and its next stage using channels. By
convention, each stage accepts a *receiver* for input and creates a second
channel for output. The second channel is used to pass data to the next stage, and
so its sender is moved into the stage's thread and the receiver is returned
from its constructor.
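
The convention above can be sketched with the standard library's `mpsc`
channels. This is a minimal illustration, not code from the Solana
repository: `DoubleStage`, its `join()` method, and the `u64` payload are
hypothetical names chosen for the example.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread::{self, JoinHandle};

/// Hypothetical stage that doubles every number it receives.
pub struct DoubleStage {
    thread_hdl: JoinHandle<()>,
}

impl DoubleStage {
    /// Accepts the input receiver and returns the stage together with the
    /// receiver of the output channel it created.
    pub fn new(input: Receiver<u64>) -> (Self, Receiver<u64>) {
        let (sender, output) = channel();

        // Both `input` and `sender` are moved into the stage's thread.
        let thread_hdl = thread::spawn(move || {
            // The loop ends when the previous stage drops its sender.
            for n in input {
                // Stop if the next stage has dropped its receiver.
                if sender.send(n * 2).is_err() {
                    break;
                }
            }
        });

        (DoubleStage { thread_hdl }, output)
    }

    /// Waits for the stage's thread to exit.
    pub fn join(self) -> thread::Result<()> {
        self.thread_hdl.join()
    }
}
```

Because the constructor returns a receiver, stages chain naturally: the
receiver returned by one stage's `new()` is passed as the input to the next
stage's `new()`.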
A well-written stage should create a thread and call a short `run()` method.
The method should read input from its input channel, call a function from
another module that processes it, and then send the output to the output
channel. The functionality in that other module will likely not use threads or
channels.
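
As a rough sketch of that separation, the example below keeps the processing
in a plain module and limits the stage to thread and channel plumbing. The
names `word_count`, `count_words`, and `WordCountStage` are hypothetical and
exist only for this illustration.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical processing module: plain functions, no threads or channels.
mod word_count {
    pub fn count_words(line: &str) -> usize {
        line.split_whitespace().count()
    }
}

/// Hypothetical stage that wraps `word_count` in a thread.
pub struct WordCountStage {
    thread_hdl: JoinHandle<()>,
}

impl WordCountStage {
    pub fn new(input: Receiver<String>) -> (Self, Receiver<usize>) {
        let (sender, output) = channel();
        // The thread does nothing except call the short `run()` method.
        let thread_hdl = thread::spawn(move || Self::run(&input, &sender));
        (WordCountStage { thread_hdl }, output)
    }

    /// Reads input, delegates the work, and forwards the result.
    fn run(input: &Receiver<String>, sender: &Sender<usize>) {
        // `recv()` returns an error once the previous stage hangs up.
        while let Ok(line) = input.recv() {
            let count = word_count::count_words(&line);
            // Stop if the next stage has dropped its receiver.
            if sender.send(count).is_err() {
                break;
            }
        }
    }

    /// Waits for the stage's thread to exit.
    pub fn join(self) -> thread::Result<()> {
        self.thread_hdl.join()
    }
}
```

Keeping `count_words` free of threads and channels means it can be unit
tested by calling it directly, while the stage itself needs only a small test
that sends a few inputs and reads the outputs.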