Commit Graph

349 Commits

Author SHA1 Message Date
Henry de Valence 00c4f4f0e6 network: record cause of handshake failure 2020-12-01 19:16:41 -08:00
Henry de Valence 5ccd1905fc network: avoid putting null bytes in trace output 2020-12-01 19:16:41 -08:00
Henry de Valence f93deb1cac network: fix missing {0} in PeerError::Serialization 2020-12-01 19:16:41 -08:00
Henry de Valence 18cf5e0249 network: use short Display for Message in spans
This makes the span data more compact (e.g., `msg_as_req{msg=block}`) and
restores the Debug impl for Message to show all of the data contained in the
message.  The full message is added as a single event at trace level in the
span to preserve the previous full-inspectability.
2020-12-01 19:16:41 -08:00
Jane Lusby a91d0f0bb6
Include short sha in log messages and error urls (#1410)
As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs.

To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs.

Co-authored-by: teor <teor@riseup.net>
2020-12-01 12:13:20 -08:00
teor 4d5ea4897c Log peer set ready and unready peers
* warn: if there are no peers at all
* info: if there are no ready peers
* trace: the number of ready and unready peers for every request

Log at most one warn or info log per minute, to avoid flooding the
terminal with log lines. Suppress warn and info logs for the first
minute, while the peer set is starting up.
2020-12-01 11:00:21 -05:00
teor 92eb92d1dd
Disable the nightly clippy unnecessary_wraps lint (#1403)
It seems to be a bit broken - some of our functions return `Result` for
consistency with similar functions. But the lint picks them up anyway.
2020-12-01 12:20:57 +10:00
Alfredo Garcia 4544463059
Inbound `FindBlocks` and `FindHeaders` (#1347)
* implement inbound `FindBlocks`
* Handle inbound peer FindHeaders requests
* handle request before having any chain tip
* Split `find_chain_hashes` into smaller functions

Add a `max_len` argument to support `FindHeaders` requests.

Rewrite the hash collection code to use heights, so we can handle the
`stop` hash and "no intersection" cases correctly.

* Split state height functions into "any chain" and "best chain"
* Rename the best chain block method to `best_block`
* Move fmt utilities to zebra_chain::fmt
* Summarise Debug for some Message variants

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-12-01 07:30:37 +10:00
Alfredo Garcia 7d42c63790 fix comment 2020-11-25 10:55:44 -08:00
teor 8d6ac8eece Placate clippy 2020-11-24 20:03:21 +10:00
Henry de Valence d90e709ce1 network: tidy peer set implementation
- rename functions more descriptively
- create a common `take_ready_service` function
- organize poll_ functions separately
2020-11-24 20:03:21 +10:00
Henry de Valence f36a4800b2 network: fix invariant violation in peer set
Closes #1183.

The peer set maintains a preselected ready service that it can use to
perform power-of-two-choices (p2c) routing of requests.  Ready services
are stored by key (socket address) in an `IndexMap`, and the preselected
service is represented by an `Option<usize>` indexing that map.  This
means that whenever the set of ready services changes (e.g., a service
is removed from the peer set, or a service is taken to be used to
process a request), the preselected index is invalidated.  The original
P2C-only implementation maintained this invariant but did not document
it.

The change to inventory-based routing introduced a bug by failing to
maintain this invariant and appropriately invalidate the preselected
index.  However, this was only noticeable approximately 1/N of the time
on the next request after an inventory-directed request, so the bug
occurred infrequently.  Luckily, the use of `.expect` caused the bug to
be an immediate panic, making it possible to identify by inspecting all
uses of the ready service map.
2020-11-24 20:03:21 +10:00
teor 6387dfe1d0 Fix individual crate compilation failures
Some Zebra crates don't compile individually due to missing features in
their dependencies. Add those features to each crate's dependency list.
2020-11-23 23:56:28 -08:00
Henry de Valence add94c1c45 deps: move to tokio 0.3, tower 0.4
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware.  This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method.  See

ddc64e8d4d

for more context on the Tower changes.  To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
2020-11-20 10:08:16 -08:00
Henry de Valence 06dd39df54
network: bump network version for Canopy (#1333)
Per https://zips.z.cash/zip-0251, nodes compatible with Canopy
activation on mainnet MUST advertise protocol version 170013 or later.

Once Canopy activates on testnet or mainnet, Canopy nodes SHOULD reject
new connections from pre-Canopy nodes, so this also increases the
minimum version.
2020-11-20 09:50:05 +10:00
Henry de Valence a3ab589d89 consensus,state: document cancellation contracts for services
This change explicitly documents cancellation contracts for our Tower services,
and tries to correct a bug in the implementation of the CheckpointVerifier,
which duplicates information from the state service but did not ensure that it
would be kept in sync.
2020-11-17 14:56:27 -08:00
teor ca4e792f47 Put messages in request/response order
And fix a comment typo
2020-11-17 07:52:53 +10:00
Alfredo Garcia 128643d81e
Call `zebra_test::init` where needed. (#1227)
* Add missing `zebra_test::init()` to zebra-chain
* Add missing `zebra_test::init()` to zebra-consensus
* Add missing `zebra_test::init()` to zebra-network
* Add missing `zebra_test::init()` to zebra-state
* Add missing `zebra_test::init()` to zebra-test
* Add missing `zebra_test::init()` to zebrad
2020-11-10 10:29:25 +10:00
Henry de Valence 8e709bfa88 network: don't fail on unsolicited messages
These messages might be unsolicited, or they might be a response to a
request we already canceled.  So don't fail the whole connection, just
drop the message and move on.
2020-10-26 12:05:35 -07:00
Henry de Valence 13daefa729 network: handle request cancellation in Connection
We handle request cancellation in two places: before we transition into
the AwaitingResponse state, and while we are in AwaitingResponse.  We
need both places, or else if we started processing a request, we
wouldn't process the cancellation until the timeout elapsed.

The first is a check that the oneshot is not already canceled.

For the second, we wait on a cancellation, either from a timeout or from
the tx channel closing.
2020-10-26 12:05:35 -07:00
teor 1e97691fc8 Fix some "needless lifetime" clippy lints
These lints seem to be new in clippy nightly.
2020-10-12 08:54:23 +10:00
Dimitris Apostolou 36279621f0 Fix typos 2020-10-06 12:16:41 +10:00
Henry de Valence 6dd7318d3b deps: use Tower 0.4 from git instead of 0.3.1.
This addresses at least three pain points:

- we were affected by bugs that were already fixed in git, but not in
  the released crate;
- we can use service combinators to transform requests and responses;
- we can use the hedge middleware.

The version in git is still marked as 0.3.1 but these changes will be
part of tower 0.4: https://github.com/tower-rs/tower/issues/431
2020-09-21 14:16:56 -07:00
Deirdre Connolly 33afeb37cb Add a comment about the short looo 2020-09-21 09:26:39 -07:00
Henry de Valence 6f3288814c network: avoid GetPeers timeout to accelerate init
The GetPeers requests sent while crawling the network are randomly
load-balanced over available peers.  But at the very beginning, they may
be both routed to the same peer, causing network initialization to be
delayed while the second one times out (since zcashd only ever responds
to the first addr message).

Only sending one GetPeers request per candidate set update means we
crawl the network a little more slowly, but avoids hanging on start.
2020-09-21 09:26:39 -07:00
Henry de Valence b72c249b96 network: add a metric+warning when shedding load 2020-09-21 09:26:39 -07:00
Henry de Valence 4df5632752 network: handle Message::NotFound as a response
This cleans up the response processing logic a little bit along the way,
but the overall division of responsibility should be better documented
in a future commit.
2020-09-20 10:21:18 -07:00
Henry de Valence 64905563d1 network: remove glob import in message-handling
This clarifies which parts are the handler state and which parts are the
incoming message.
2020-09-20 10:21:18 -07:00
Henry de Valence 9c021025a7 network: fill in remaining request/response pairs 2020-09-20 10:21:18 -07:00
Henry de Valence b289cb9164 network: clean up GetHeaders, GetBlocks modeling 2020-09-20 10:21:18 -07:00
Henry de Valence 3c993f33b1 network: add PeerError::WrongMessage
This lets us distinguish between cases where the message was unsupported
(e.g., BIP11 messages), and cases where the message was uninterpretable
in context (e.g., unsolicited messages).
2020-09-20 10:21:18 -07:00
Henry de Valence 430176dd0d network: clean up message-as-request translation 2020-09-20 10:21:18 -07:00
Henry de Valence 170f588ffb network: document load-shedding behavior
This was part of the original design and is described in the Connection
internals, but we never documented it externally.
2020-09-18 18:34:25 -07:00
Henry de Valence 1d3892e1dc network: rename alias to BoxError
This is shorter and consistent with Tower (which is why we use it in the
first place).
2020-09-18 18:34:25 -07:00
Henry de Valence 95f2463188 Try workaround for generator autotrait bug
> Added a test that the handshake's version message matches specified fields, but the test does not compile, because rustc doesn't believe that the Box<dyn std::error::Error + Send + Sync + 'static> is 'static, and therefore isn't a Box<dyn std::error::Error + Send + Sync + 'static>. This manifests as being unable to spawn the connect_isolated task. From digging through Tokio issues I believe that this is an instance of rust-lang/rust#64552 .

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-09-17 12:02:20 -07:00
Henry de Valence 81e8195f68 network: add connect_isolated distinguisher test
This is currently broken due to a rustc bug.
2020-09-17 12:02:20 -07:00
Henry de Valence b7472de43f network: add a zebra_network::connect_isolated() method.
The peer set provides an automatically managed connection pool, abstracting
away all the details of handling individual peer connections.  However, it's
also useful to be able to create completely isolated and
minimally-distinguishable connections to individual peers, in order to be able
to send specific messages over Tor, or to implement some custom network crawler
logic.
2020-09-17 12:02:20 -07:00
teor 66265dc11a Adjust the EWMA decay for the latest sync timeout 2020-09-09 15:35:09 -07:00
teor 1f7af0a779 Update the inv message processing comment
Cleanup after PR #1028.
2020-09-09 15:29:38 -07:00
teor 2a68ef5acb Update the peerset buffer size and sync timeout
Also add a bunch of comments and documentation for network-constrained
nodes, and for testnet.
2020-09-08 12:44:33 -07:00
teor e6e859dce2 Tweak sync timeouts
* increase the EWMA default and decay
* increase the block download retries
* increase the request and block download timeouts
* increase the sync timeout
2020-09-08 12:44:33 -07:00
Jane Lusby 1b17691dda improve logging 2020-09-08 12:37:34 -07:00
Jane Lusby 81a3ad3a0d filter inventory advertisements correctly 2020-09-08 12:37:34 -07:00
Henry de Valence 3f150eb16e
network: implement transaction request handling. (#1016)
This commit makes several related changes to the network code:

- adds a `TransactionsByHash(HashSet<transaction::Hash>)` request and
  `Transactions(Vec<Arc<Transaction>>)` response pair that allows
  fetching transactions from a remote peer;

- adds a `PushTransaction(Arc<Transaction>)` request that pushes an
  unsolicited transaction to a remote peer;

- adds an `AdvertiseTransactions(HashSet<transaction::Hash>)` request
  that advertises transactions by hash to a remote peer;

- adds an `AdvertiseBlock(block::Hash)` request that advertises a block
  by hash to a remote peer;

Then, it modifies the connection state machine so that outbound
requests to remote peers are handled properly:

- `TransactionsByHash` generates a `getdata` message and collects the
  results, like the existing `BlocksByHash` request.

- `PushTransaction` generates a `tx` message, and returns `Nil` immediately.

- `AdvertiseTransactions` and `AdvertiseBlock` generate an `inv`
  message, and return `Nil` immediately.

Next, it modifies the connection state machine so that messages
from remote peers generate requests to the inbound service:

- `getdata` messages generate `BlocksByHash` or `TransactionsByHash`
  requests, depending on the content of the message;

- `tx` messages generate `PushTransaction` requests;

- `inv` messages generate `AdvertiseBlock` or `AdvertiseTransactions`
  requests.

Finally, it refactors the request routing logic for the peer set to
handle advertisement messages, providing three routing methods:

- `route_p2c`, which uses p2c as normal (default);
- `route_inv`, which uses the inventory registry and falls back to p2c
  (used for `BlocksByHash` or `TransactionsByHash`);
- `route_all`, which broadcasts a request to all ready peers (used for
  `AdvertiseBlock` and `AdvertiseTransactions`).
2020-09-08 10:16:29 -07:00
Henry de Valence cad38415b2
network: fix bug in inventory advertisement handling (#1022)
* network: fix bug in inventory advertisement handling

The RFC https://zebra.zfnd.org/dev/rfcs/0003-inventory-tracking.html described
the use of a `broadcast` channel in place of an `mpsc` channel to get
ring-buffer behavior, keeping a bound on the size of the channel but dropping
old entries when the channel is full.

However, it didn't explicitly describe how this works (the `broadcast` channel
returns a `RecvError::Lagged(u64)` to inform receivers that they lost
messages), so the lag-handling wasn't implemented and I didn't notice in
review.

Instead, the ? operator bubbled the lag error all the way up from
`InventoryRegistry::poll_inventory` through `<PeerSet as Service>::poll_ready`
through various Tower wrappers to users of the peer set.  The error propagation
is bad enough, because it caused client errors that shouldn't have happened,
but there's a worse interaction.

The `Service` contract distinguishes between request errors (from
`Service::call`, scoped to the request) and service errors (from
`Service::poll_ready`, scoped to the service).  The `Service` contract
specifies that once a service returns an error from `poll_ready`, the service
can be assumed to be failed permanently.

I believe (but haven't tested or carefully worked through the details) that
this caused various tower middleware to report the entire peer set service as
permanently failed due to a transient inventory "error" (more of an indicator),
and I suspect that this is the cause of #1003, where all of the sync
component's requests end up failing because the peer set reported that it
failed permanently.  I am able to reproduce #1003 locally before this change
and unable to reproduce it locally after this change, though I have not tested
exhaustively.

* network: add metric for dropped inventory advertisements

Co-authored-by: teor <teor@riseup.net>

Co-authored-by: teor <teor@riseup.net>
2020-09-07 21:24:31 -07:00
Henry de Valence 9682d452ee network: add AddressBook::potentially_connected_peers(). 2020-09-07 11:13:15 -07:00
dependabot[bot] 142226ad57 build(deps): bump indexmap from 1.5.2 to 1.6.0
Bumps [indexmap](https://github.com/bluss/indexmap) from 1.5.2 to 1.6.0.
- [Release notes](https://github.com/bluss/indexmap/releases)
- [Commits](https://github.com/bluss/indexmap/compare/1.5.2...1.6.0)

Signed-off-by: dependabot[bot] <support@github.com>
2020-09-07 07:56:39 -04:00
Alfredo Garcia 454e75e7c0
Rename old references to BlockHeaderHash and BlockHeight (#1002)
* rename some references

* Apply suggestions from code review

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Co-authored-by: teor <teor@riseup.net>

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Co-authored-by: teor <teor@riseup.net>
2020-09-04 15:40:48 -07:00
teor b5c653ed93
Use ok_or for constants, rather than a redudant closure
* Use ok_or for constants in zebra-network
* Use ok_or for constants in zebra-consensus
2020-09-02 14:26:26 +10:00
Jane Lusby 88557ddd0a address more comments 2020-09-01 21:01:38 -04:00