Commit Graph

328 Commits

Author SHA1 Message Date
Dimitris Apostolou 36279621f0 Fix typos 2020-10-06 12:16:41 +10:00
Henry de Valence 6dd7318d3b deps: use Tower 0.4 from git instead of 0.3.1.
This addresses at least three pain points:

- we were affected by bugs that were already fixed in git, but not in
  the released crate;
- we can use service combinators to transform requests and responses;
- we can use the hedge middleware.

The version in git is still marked as 0.3.1 but these changes will be
part of tower 0.4: https://github.com/tower-rs/tower/issues/431
2020-09-21 14:16:56 -07:00
Deirdre Connolly 33afeb37cb Add a comment about the short looo 2020-09-21 09:26:39 -07:00
Henry de Valence 6f3288814c network: avoid GetPeers timeout to accelerate init
The GetPeers requests sent while crawling the network are randomly
load-balanced over available peers.  But at the very beginning, they may
be both routed to the same peer, causing network initialization to be
delayed while the second one times out (since zcashd only ever responds
to the first addr message).

Only sending one GetPeers request per candidate set update means we
crawl the network a little more slowly, but avoids hanging on start.
2020-09-21 09:26:39 -07:00
Henry de Valence b72c249b96 network: add a metric+warning when shedding load 2020-09-21 09:26:39 -07:00
Henry de Valence 4df5632752 network: handle Message::NotFound as a response
This cleans up the response processing logic a little bit along the way,
but the overall division of responsibility should be better documented
in a future commit.
2020-09-20 10:21:18 -07:00
Henry de Valence 64905563d1 network: remove glob import in message-handling
This clarifies which parts are the handler state and which parts are the
incoming message.
2020-09-20 10:21:18 -07:00
Henry de Valence 9c021025a7 network: fill in remaining request/response pairs 2020-09-20 10:21:18 -07:00
Henry de Valence b289cb9164 network: clean up GetHeaders, GetBlocks modeling 2020-09-20 10:21:18 -07:00
Henry de Valence 3c993f33b1 network: add PeerError::WrongMessage
This lets us distinguish between cases where the message was unsupported
(e.g., BIP11 messages), and cases where the message was uninterpretable
in context (e.g., unsolicited messages).
2020-09-20 10:21:18 -07:00
Henry de Valence 430176dd0d network: clean up message-as-request translation 2020-09-20 10:21:18 -07:00
Henry de Valence 170f588ffb network: document load-shedding behavior
This was part of the original design and is described in the Connection
internals, but we never documented it externally.
2020-09-18 18:34:25 -07:00
Henry de Valence 1d3892e1dc network: rename alias to BoxError
This is shorter and consistent with Tower (which is why we use it in the
first place).
2020-09-18 18:34:25 -07:00
Henry de Valence 95f2463188 Try workaround for generator autotrait bug
> Added a test that the handshake's version message matches specified fields, but the test does not compile, because rustc doesn't believe that the Box<dyn std::error::Error + Send + Sync + 'static> is 'static, and therefore isn't a Box<dyn std::error::Error + Send + Sync + 'static>. This manifests as being unable to spawn the connect_isolated task. From digging through Tokio issues I believe that this is an instance of rust-lang/rust#64552 .

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-09-17 12:02:20 -07:00
Henry de Valence 81e8195f68 network: add connect_isolated distinguisher test
This is currently broken due to a rustc bug.
2020-09-17 12:02:20 -07:00
Henry de Valence b7472de43f network: add a zebra_network::connect_isolated() method.
The peer set provides an automatically managed connection pool, abstracting
away all the details of handling individual peer connections.  However, it's
also useful to be able to create completely isolated and
minimally-distinguishable connections to individual peers, in order to be able
to send specific messages over Tor, or to implement some custom network crawler
logic.
2020-09-17 12:02:20 -07:00
teor 66265dc11a Adjust the EWMA decay for the latest sync timeout 2020-09-09 15:35:09 -07:00
teor 1f7af0a779 Update the inv message processing comment
Cleanup after PR #1028.
2020-09-09 15:29:38 -07:00
teor 2a68ef5acb Update the peerset buffer size and sync timeout
Also add a bunch of comments and documentation for network-constrained
nodes, and for testnet.
2020-09-08 12:44:33 -07:00
teor e6e859dce2 Tweak sync timeouts
* increase the EWMA default and decay
* increase the block download retries
* increase the request and block download timeouts
* increase the sync timeout
2020-09-08 12:44:33 -07:00
Jane Lusby 1b17691dda improve logging 2020-09-08 12:37:34 -07:00
Jane Lusby 81a3ad3a0d filter inventory advertisements correctly 2020-09-08 12:37:34 -07:00
Henry de Valence 3f150eb16e
network: implement transaction request handling. (#1016)
This commit makes several related changes to the network code:

- adds a `TransactionsByHash(HashSet<transaction::Hash>)` request and
  `Transactions(Vec<Arc<Transaction>>)` response pair that allows
  fetching transactions from a remote peer;

- adds a `PushTransaction(Arc<Transaction>)` request that pushes an
  unsolicited transaction to a remote peer;

- adds an `AdvertiseTransactions(HashSet<transaction::Hash>)` request
  that advertises transactions by hash to a remote peer;

- adds an `AdvertiseBlock(block::Hash)` request that advertises a block
  by hash to a remote peer;

Then, it modifies the connection state machine so that outbound
requests to remote peers are handled properly:

- `TransactionsByHash` generates a `getdata` message and collects the
  results, like the existing `BlocksByHash` request.

- `PushTransaction` generates a `tx` message, and returns `Nil` immediately.

- `AdvertiseTransactions` and `AdvertiseBlock` generate an `inv`
  message, and return `Nil` immediately.

Next, it modifies the connection state machine so that messages
from remote peers generate requests to the inbound service:

- `getdata` messages generate `BlocksByHash` or `TransactionsByHash`
  requests, depending on the content of the message;

- `tx` messages generate `PushTransaction` requests;

- `inv` messages generate `AdvertiseBlock` or `AdvertiseTransactions`
  requests.

Finally, it refactors the request routing logic for the peer set to
handle advertisement messages, providing three routing methods:

- `route_p2c`, which uses p2c as normal (default);
- `route_inv`, which uses the inventory registry and falls back to p2c
  (used for `BlocksByHash` or `TransactionsByHash`);
- `route_all`, which broadcasts a request to all ready peers (used for
  `AdvertiseBlock` and `AdvertiseTransactions`).
2020-09-08 10:16:29 -07:00
Henry de Valence cad38415b2
network: fix bug in inventory advertisement handling (#1022)
* network: fix bug in inventory advertisement handling

The RFC https://zebra.zfnd.org/dev/rfcs/0003-inventory-tracking.html described
the use of a `broadcast` channel in place of an `mpsc` channel to get
ring-buffer behavior, keeping a bound on the size of the channel but dropping
old entries when the channel is full.

However, it didn't explicitly describe how this works (the `broadcast` channel
returns a `RecvError::Lagged(u64)` to inform receivers that they lost
messages), so the lag-handling wasn't implemented and I didn't notice in
review.

Instead, the ? operator bubbled the lag error all the way up from
`InventoryRegistry::poll_inventory` through `<PeerSet as Service>::poll_ready`
through various Tower wrappers to users of the peer set.  The error propagation
is bad enough, because it caused client errors that shouldn't have happened,
but there's a worse interaction.

The `Service` contract distinguishes between request errors (from
`Service::call`, scoped to the request) and service errors (from
`Service::poll_ready`, scoped to the service).  The `Service` contract
specifies that once a service returns an error from `poll_ready`, the service
can be assumed to be failed permanently.

I believe (but haven't tested or carefully worked through the details) that
this caused various tower middleware to report the entire peer set service as
permanently failed due to a transient inventory "error" (more of an indicator),
and I suspect that this is the cause of #1003, where all of the sync
component's requests end up failing because the peer set reported that it
failed permanently.  I am able to reproduce #1003 locally before this change
and unable to reproduce it locally after this change, though I have not tested
exhaustively.

* network: add metric for dropped inventory advertisements

Co-authored-by: teor <teor@riseup.net>

Co-authored-by: teor <teor@riseup.net>
2020-09-07 21:24:31 -07:00
Henry de Valence 9682d452ee network: add AddressBook::potentially_connected_peers(). 2020-09-07 11:13:15 -07:00
dependabot[bot] 142226ad57 build(deps): bump indexmap from 1.5.2 to 1.6.0
Bumps [indexmap](https://github.com/bluss/indexmap) from 1.5.2 to 1.6.0.
- [Release notes](https://github.com/bluss/indexmap/releases)
- [Commits](https://github.com/bluss/indexmap/compare/1.5.2...1.6.0)

Signed-off-by: dependabot[bot] <support@github.com>
2020-09-07 07:56:39 -04:00
Alfredo Garcia 454e75e7c0
Rename old references to BlockHeaderHash and BlockHeight (#1002)
* rename some references

* Apply suggestions from code review

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Co-authored-by: teor <teor@riseup.net>

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Co-authored-by: teor <teor@riseup.net>
2020-09-04 15:40:48 -07:00
teor b5c653ed93
Use ok_or for constants, rather than a redudant closure
* Use ok_or for constants in zebra-network
* Use ok_or for constants in zebra-consensus
2020-09-02 14:26:26 +10:00
Jane Lusby 88557ddd0a address more comments 2020-09-01 21:01:38 -04:00
Jane Lusby d933abeebf fix typo 2020-09-01 21:01:38 -04:00
Jane Lusby 96c8809348
Implement Inventory Tracking RFC (#963)
* Add .cargo to the gitignore file

* Implement Inventory Tracking RFC

* checkpoint

* wire together the inventory registry

* add comment documenting condition

* make inventory registry optional
2020-09-01 14:28:54 -07:00
Henry de Valence f91b91b6d8 network: clarify comment on Default for handshake::Builder
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-09-01 13:56:00 -07:00
Henry de Valence fddba7a336 network: remove handshake::Builder::with_addr
Use the listen_addr field already specified in the config.

Also, derive Clone for Handshake<S>.

Co-authored-by: Jane Lusby <jane@zfnd.org>
2020-09-01 13:56:00 -07:00
Henry de Valence a5b6f39850 network: don't leak our exact time skew in handshakes. 2020-09-01 13:56:00 -07:00
Henry de Valence 1b5a824584 network: fix bug in BIP37 relay flag handling.
The relay flag in the version message is used in conjunction with BIP37 to
receive bloom-filtered transactions.  When it is set to false, transactions are
not relayed until a bloom filter is set.  Since we don't implement BIP37 (it's
not useful for shielded transactions), this means we'll never receive
transactions.
2020-09-01 13:56:00 -07:00
Henry de Valence 60a0b8c382 network: change Handshake::new to a Builder.
This allows more detailed control over the handshake parameters.
2020-09-01 13:56:00 -07:00
teor d7e32b68e5 fix: Split a clippy allow, so its comment is clearer 2020-09-01 11:40:18 -04:00
teor 5afa24588a fix: Remove unused dependencies 2020-08-20 14:49:17 -04:00
Henry de Valence ebdceb5197 chain: rename TransactionHash to transaction::Hash 2020-08-17 11:46:34 -07:00
Henry de Valence 2712c4b72a chain: rename BlockHeader to block::Header 2020-08-17 11:46:34 -07:00
Henry de Valence 103b663c40 chain: rename BlockHeight to block::Height 2020-08-17 11:46:34 -07:00
Henry de Valence 61dea90e2f chain: rename BlockHeaderHash to block::Hash
This is the first in a sequence of changes that change the block:: items
to not include Block as a prefix in their name, in accordance with the
Rust API guidelines.
2020-08-17 11:46:34 -07:00
Henry de Valence 948b067808 chain: move Network, NetworkUpgrade to parameters
Also, avoid using star-imports of the enum variants, which pollutes the
namespace.
2020-08-17 11:46:34 -07:00
Henry de Valence dad6340cd3 chain: move BlockHeight into block 2020-08-17 11:46:34 -07:00
Henry de Valence b36fe8f937 chain: move sha256d to serialization module.
This extracts the SHA256d code from being split across two modules and puts it
in one module, under serialization.

The code is unchanged except for three deleted tests:

* `sha256d_flush` in `sha256d_writer` (not a meaningful test);
* `transactionhash_debug` (constructs an invalid transaction hash, and the
  behavior is tested in the next test);
* `decode_state_debug` (we do not need to test the Debug output of
  DecodeState);
2020-08-17 11:46:34 -07:00
Alfredo Garcia b41e33e066
Bytes read and bytes written metrics (#901)
* add bytes read and written metrics

* Apply suggestions from code review

Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* store address as string

* Apply suggestions from code review

Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>

* change addr to label

Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>

* remove newline

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
2020-08-14 15:50:26 -07:00
Henry de Valence a79ce97957
Fix sync algorithm. (#887)
* checkpoint: reject older of duplicate verification requests.

If we get a duplicate block verification request, we should drop the older one
in favor of the newer one, because the older request is likely to have been
canceled.  Previously, this code would accept up to four duplicate verification
requests, then fail all subsequent ones.

* sync: add a timeout layer to block requests.

Note that if this timeout is too short, we'll bring down the peer set in a
retry storm.

* sync: restart syncing on error

Restart the syncing process when an error occurs, rather than ignoring it.
Restarting means we discard all tips and start over with a new block locator,
so we can have another chance to "unstuck" ourselves.

* sync: additional debug info

* sync: handle lookahead limit correctly.

Instead of extracting all the completed task results, the previous code pulled
results out until there were fewer tasks than the lookahead limit, then
stopped.  This meant that completed tasks could be left until the limit was
exceeded again.  Instead, extract all completed results, and use the number of
pending tasks to decide whether to extend the tip or wait for blocks to finish.

* network: add debug instrumentation to retry policy

* sync: instrument the spawned task

* sync: streamline ObtainTips/ExtendTips logic & tracing

This change does three things:

1.  It aligns the implementation of ObtainTips and ExtendTips so that they use
the same deduplication method.  This means that when debugging we only have one
deduplication algorithm to focus on.

2.  It streamlines the tracing output to not include information already
included in spans. Both obtain_tips and extend_tips have their own spans
attached to the events, so it's not necessary to add Scope: prefixes in
messages.

3.  It changes the messages to be focused on reporting the actual
events rather than the interpretation of the events (e.g., "got genesis hash in
response" rather than "peer could not extend tip").  The motivation for this
change is that when debugging, the interpretation of events is already known to
be incorrect, in the sense that the mental model of the code (no bug) does not
match its behavior (has bug), so presenting minimally-interpreted events forces
interpretation relative to the actual code.

* sync: hack to work around zcashd behavior

* sync: localize debug statement in extend_tips

* sync: change algorithm to define tips as pairs of hashes.

This is different enough from the existing description that its comments no
longer apply, so I removed them.  A further chunk of work is to change the sync
RFC to document this algorithm.

* sync: reduce block timeout

* state: add resource limits for sled

Closes #888

* sync: add a restart timeout constant

* sync: de-pub constants
2020-08-12 16:48:01 -07:00
teor 109666cc48
fix: Tweak the the network listener log (#886) 2020-08-12 14:22:54 -07:00
Henry de Valence 299afe13df
zebra-network tweaks. (#877)
* network: move gossiped peer selection logic into address book.

* network: return BoxService from init.

* zebrad: add note on why we truncate thegossiped peer list

Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* Remove unused .rustfmt.toml

Many of these options are never actually loaded by our CI because of a channel
mismatch, where they're not applied on stable but only on nightly (see the logs
from a rustfmt job).  This means that we can get different settings when
running `cargo fmt` on the nightly and stable channels, which was causing a CI
failure on this PR.  Reverting back to the default rustfmt settings avoids this
problem and keeps us in line with upstream rustfmt.  There's no loss to us
since we were using the defaults anyways.

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-08-11 13:07:44 -07:00
Alfredo Garcia 9c387521bd
Print endpoint addresses at startup (#867)
* print tracing and metrics endpoints in startup

* print network address in startup
2020-08-10 12:47:26 -07:00