The `peer::Client` translates `Request`s into `ClientRequest`s, which
it sends to a background task. If the send is `Ok(())`, it will assume
that it is safe to unconditionally poll the `Receiver` tied to the
`Sender` used to create the `ClientRequest`.
We enforce this invariant via the type system, by converting
`ClientRequest`s to `InProgressClientRequest`s when they are received by
the background task. These conversions are implemented by
`ClientRequestReceiver`.
Changes:
* Revert `ClientRequest` so it uses a `oneshot::Sender`
* Add `InProgressClientRequest`, which is the same as `ClientRequest`,
but has a `MustUseOneshotSender`
* `impl From<ClientRequest> for InProgressClientRequest`
* Add a new `ClientRequestReceiver` type that wraps a
`mpsc::Receiver<ClientRequest>`
* `impl Stream<InProgressClientRequest> for ClientRequestReceiver`,
converting the successful result of `inner.poll_next_unpin` into an
`InProgressClientRequest`
* Replace `client_rx: mpsc::Receiver<ClientRequest>` in `Connection`
with the new `ClientRequestReceiver` type
* `impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver`
This fix also changes heartbeat behaviour in the following ways:
* if the queue is full, the connection is closed. Previously, the sender
would wait until the queue had emptied
* if the queue flush fails, Zebra panics, because it can't send an error
on the ClientRequest sender, so the invariant is broken
Add a MustUseOneshotSender, which panics if its inner sender is unused.
Callers must call `send()` on the MustUseOneshotSender, or ensure that
the sender is canceled.
Replaces an unreliable panic in `Client::call()` with a reliable panic
when a must-use sender is dropped.
Previously, tx would be dropped before send if:
- the success case would have used tx to wait for further messages,
- but the response was actually an error.
Instead, send the error on `tx` and call `fail_with()` using the same
error.
To support this change, allow `fail_with()` to take a `PeerError` or a
`SharedPeerError`.
* Rewrite GetData handling to match the zcashd implementation
`zcashd` silently ignores missing blocks, but sends found transactions
followed by a `NotFound` message:
e7b425298f/src/main.cpp (L5497)
This is significantly different to the behaviour expected by the old
Zebra connection state machine, which expected `NotFound` for blocks.
Also change Zebra's GetData responses to peer request so they ignore
missing blocks.
* Stop hanging on incomplete transaction or block responses
Instead, if the peer sends an unexpected block, unexpected transaction,
or NotFound message:
1. end the request, and return a partial response containing any items
that were successfully received
2. if none of the expected blocks or transactions were received, return
an error, and close the connection
The cancellation implementation changes made to the connection state machine
mean that if a response oneshot is dropped, the connection will avoid
cancelling the request. So the heartbeat task does have to wait on the response.
This makes the span data more compact (e.g., `msg_as_req{msg=block}`) and
restores the Debug impl for Message to show all of the data contained in the
message. The full message is added as a single event at trace level in the
span to preserve the previous full-inspectability.
As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs.
To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs.
Co-authored-by: teor <teor@riseup.net>
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware. This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method. See
ddc64e8d4d
for more context on the Tower changes. To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
These messages might be unsolicited, or they might be a response to a
request we already canceled. So don't fail the whole connection, just
drop the message and move on.
We handle request cancellation in two places: before we transition into
the AwaitingResponse state, and while we are in AwaitingResponse. We
need both places, or else if we started processing a request, we
wouldn't process the cancellation until the timeout elapsed.
The first is a check that the oneshot is not already canceled.
For the second, we wait on a cancellation, either from a timeout or from
the tx channel closing.
This cleans up the response processing logic a little bit along the way,
but the overall division of responsibility should be better documented
in a future commit.
This lets us distinguish between cases where the message was unsupported
(e.g., BIP11 messages), and cases where the message was uninterpretable
in context (e.g., unsolicited messages).
This commit makes several related changes to the network code:
- adds a `TransactionsByHash(HashSet<transaction::Hash>)` request and
`Transactions(Vec<Arc<Transaction>>)` response pair that allows
fetching transactions from a remote peer;
- adds a `PushTransaction(Arc<Transaction>)` request that pushes an
unsolicited transaction to a remote peer;
- adds an `AdvertiseTransactions(HashSet<transaction::Hash>)` request
that advertises transactions by hash to a remote peer;
- adds an `AdvertiseBlock(block::Hash)` request that advertises a block
by hash to a remote peer;
Then, it modifies the connection state machine so that outbound
requests to remote peers are handled properly:
- `TransactionsByHash` generates a `getdata` message and collects the
results, like the existing `BlocksByHash` request.
- `PushTransaction` generates a `tx` message, and returns `Nil` immediately.
- `AdvertiseTransactions` and `AdvertiseBlock` generate an `inv`
message, and return `Nil` immediately.
Next, it modifies the connection state machine so that messages
from remote peers generate requests to the inbound service:
- `getdata` messages generate `BlocksByHash` or `TransactionsByHash`
requests, depending on the content of the message;
- `tx` messages generate `PushTransaction` requests;
- `inv` messages generate `AdvertiseBlock` or `AdvertiseTransactions`
requests.
Finally, it refactors the request routing logic for the peer set to
handle advertisement messages, providing three routing methods:
- `route_p2c`, which uses p2c as normal (default);
- `route_inv`, which uses the inventory registry and falls back to p2c
(used for `BlocksByHash` or `TransactionsByHash`);
- `route_all`, which broadcasts a request to all ready peers (used for
`AdvertiseBlock` and `AdvertiseTransactions`).
This is the first in a sequence of changes that change the block:: items
to not include Block as a prefix in their name, in accordance with the
Rust API guidelines.
* add bytes read and written metrics
* Apply suggestions from code review
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* store address as string
* Apply suggestions from code review
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
* change addr to label
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
* remove newline
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
Closes#536.
This removes:
- the user-agent (we can add a mechanism to specify extra BIP14 components later, if any users ask us for that feature);
- the EWMA parameters (these were put in the config just to avoid making a choice);
- the peer connection timeout (we can change the default value if anyone ever has a problem with it);
- the peer set request buffer size (setting this too low can make the application deadlock);
The new peer interval is left in.
When the connection sees the client_rx channel close it knows it will never get
any more requests, and it should terminate. But instead of terminating, it
errored itself, and the method to error itself tries to pull all the
outstanding client requests from the channel in order to fail them before it
shuts down. This results in reading from a closed channel, causing a panic.
Instead we return cleanly rather than failing (since we know there are no
outstanding requests, as the channel is closed).
This fixes a bug introduced when we added heartbeat support. Recall that we
handle the Bitcoin connection state machine on a per-peer basis. Each
connection has a task created from the `Connection` struct, and a `Client:
tower::Service` "frontend" that passes requests to it via a channel. In the
`Connection` event loop, the connection checks whether the request channel has
been closed, indicating no further requests from the `Client`, in which case it
shuts itself down and cleans up resources. This occurs when all of the senders
have been dropped.
However, this behavior broke when we introduced heartbeat support, because we
spawned an additional task to send heartbeat messages along the request
channel. This meant that instead of having a single sender, dropped by the
`Client`, we have two senders, the `Client` and the "shadow client" task that
generates heartbeat messages. This means that when the `Client` is dropped, we
still have a live sender and the connection is not closed. To fix this, the
`Client` now uses a `oneshot` to shut down its corresponding heartbeat task.
This closes all senders.
- Add a total peers metric to prevent races between measurements of
ready/unready peers (which can cause the sum to be wrong).
- Add an outbound request counter.
Previously, we relied on the owner of the handshake future to drive it to
completion. This meant that there were cases where handshakes might never be
completed, just because nothing was actively polling them.
The previous outbound peer connection logic got requests to connect to new
peers and processed them one at a time, making single connection attempts
and retrying if the connection attempt failed. This was quite slow, because
many connections fail, and we have to wait for timeouts. Instead, this logic
connects to new peers concurrently (up to 50 at a time).
Bitcoin does this either with `getblocks` (returns up to 500 following block
hashes) or `getheaders` (returns up to 2000 following block headers, not
just hashes). However, Bitcoin headers are much smaller than Zcash
headers, which contain a giant Equihash solution block, and many Zcash
blocks don't have many transactions in them, so the block header is
often similarly sized to the block itself. Because we're
aiming to have a highly parallel network layer, it seems better to use
`getblocks` to implement `FindBlocks` (which is necessarily sequential)
and parallelize the processing of the block downloads.
PushPeers is more complicated to thread into the rest of our
architecture (we would need to establish a data path connecting our
service handling inbound requests to the network layer's auto-crawler),
and since we crawl the network automatically anyways, we don't actually
need to accept them in order to get updated address information.
The only possible problem with this approach is that zcashd refuses to
answer multiple address requests from the same connection, ostensibly
for fingerprinting prevention (although it's totally happy to give
exactly the same information, as long as you hang up and reconnect
first, lol). It's unclear how this will interact with our design -- on
the one hand, it could mean that we don't get new addr information when
we ask, but on the other hand, we may have enough churn in our
connection pool that this isn't a problem anyways.
Attempting to implement requests for block data revealed a problem with
the previous connection logic. Block data is requested by sending a
`getdata` message with hashes of the requested blocks; the peer responds
with a sequence of `block` messages with the blocks themselves.
However, this wasn't possible to handle with the previous connection
logic, which could only convert a single Bitcoin message into a
Response. Instead, we factor out the message handling logic into a
Handler, which can statefully accumulate arbitrary data into a Response
and signal completion. This is still pretty ugly but it does work.
As a side effect, the HeartbeatNonceMismatch error is removed; because
the Handler now tries to process messages until it comes to a Response,
it just ignores mismatched nonces (and will eventually time out).
The previous Mempool and Transaction requests were removed but could be
re-added in a different form later. Also, the `Get` prefixes are
removed from `Request` to tidy the name.
This means that all sub-modules of `peer` can import everything they need from
the `peer` module itself, without having to be aware of the internal structure
of their sibling modules.
With a 'Transactions' response that gets turned into an 'Inv(Vec<InventoryHash::Tx>)' message.
We don't yet handle a response from our peer for a 'mempool', which will have to be
a more generic 'Inv' type because we might receive transaction hashes we don't know about yet.
Pertains to #26
Moved SeedService out of the command closure Command currently spawns
a tokio task to DOS the seed service with `Request::GetPeers` every
second.
Pertains to #54
It's only responsible for doing the handshakes, so it should be named that way,
and then we can have a Connector responsible for actually opening the TCP
connection.