Commit Graph

25 Commits

Author SHA1 Message Date
teor 2685fc746e
Remove CandidateSet state and add last seen time limit to candidate_set::validate_addrs (#2177) 2021-05-21 02:21:13 +00:00
teor 752358d236
Fix some candidate set and meta addr doc links (#2174)
Suggested by jvff.
2021-05-21 11:40:14 +10:00
teor c7ea1395e7 Security: Fix CandidateSet timeout and fanout
* Refactor: Split CandidateSet::update into separate functions
* Security: Apply a timeout to the entire CandidateSet::update
* Security: Stop using very large fanout limits during initialization

Previously, Zebra used the number of resolved peer addresses.
So it was possible for all peers to fail, and for Zebra to hang on the
first update.

And Zebra could send a fanout for each initial peer, regardless
of whether their connection was successful.

Also:
- wait for at least one successful peer before trying an update
- warn if there are no successful initial peers
2021-05-21 06:51:34 +10:00
teor 458c26f1e3 Limit initial candidate set fanout to the number of initial peers
If there is a small number of initial peers, and they are slow, the
initial candidate set update can appear to hang. To avoid this issue,
limit the initial candidate set fanout to the number of initial peers.

Once the initial peers have sent us more peer addresses, there is no need
to limit the fanouts for future updates.

Reported by Niklas Long of Equilibrium.
2021-05-18 07:54:03 +10:00
teor 0203d1475a Refactor and document correctness for std::sync::Mutex<AddressBook> 2021-04-21 17:14:47 -04:00
teor 2ed8bb00cf Clarify CandidateSet state diagram
We get inbound connections on the listener port,
but the important part is the inbound connection
itself.
2021-04-21 01:37:43 -04:00
teor 375c8d8700
Fix a deadlock between the crawler and dialer, and other hangs (#1950)
* Stop ignoring inbound message errors and handshake timeouts

To avoid hangs, Zebra needs to maintain the following invariants in the
handshake and heartbeat code:
- each handshake should run in a separate spawned task
  (not yet implemented)
- every message, error, timeout, and shutdown must update the peer address state
- every await that depends on the network must have a timeout

Once the Connection is created, it should handle timeouts.
But we need to handle timeouts during handshake setup.

* Avoid hangs by adding a timeout to the candidate set update

Also increase the fanout from 1 to 2, to increase address diversity.

But only return permanent errors from `CandidateSet::update`, because
the crawler task exits if `update` returns an error.

Also log Peers response errors in the CandidateSet.

* Use the select macro in the crawler to reduce hangs

The `select` function is biased towards its first argument, risking
starvation.

As a side-benefit, this change also makes the code a lot easier to read
and maintain.

* Split CrawlerAction::Demand into separate actions

This refactor makes the code a bit easier to read, at the cost of
sometimes blocking the crawler on `candidates.next()`.

That's ok, because `next` only has a short (< 100 ms) delay. And we're
just about to spawn a separate task for each handshake.

* Spawn a separate task for each handshake

This change avoids deadlocks by letting each handshake make progress
independently.

* Move the dial task into a separate function

This refactor improves readability.

* Fix buggy future::select function usage

And document the correctness of the new code.
2021-04-07 10:25:10 -03:00
teor 1a159dfcb6 Add more methods for creating MetaAddrs
This refactor lets us remove `MetaAddr::update_last_seen()`.
2021-03-26 07:23:49 +10:00
teor 6fe81d8992 Make MetaAddr.last_seen into a private field 2021-03-26 07:23:49 +10:00
teor e50692bd51 CandidateSet: Add Listener Port Connections
Inbound connections on the Zcash protocol listener port
perform a handshake. If the handshake is successful, it
adds the peer to the AddressBook.
2021-03-09 23:05:18 -05:00
Jane Lusby 03aa6f671f
Implement outbound connection rate limiting - includes config rename with alias (#1855)
* Implement outbound connection rate limiting
* fix breaking change on config

Co-authored-by: teor <teor@riseup.net>
2021-03-10 01:36:05 +00:00
teor 5424e1d8ba
Fix candidate set address state handling (#1709)
Design:
- Add a `PeerAddrState` to each `MetaAddr`
- Use a single peer set for all peers, regardless of state
- Implement time-based liveness as an `AddressBook` method, rather than
  a `PeerAddrState` variant
- Delete `AddressBook.by_state`

Implementation:
- Simplify `AddressBook` changes using `update` and `take` modifier
  methods
- Simplify the `AddressBook` iterator implementation, replacing it with
  methods that are more obviously correct
- Consistently collect peer set metrics

Documentation:
- Expand and update the peer set documentation

We can optimise later, but for now we want simple code that is more
obviously correct.
2021-02-18 11:18:32 +10:00
Jane Lusby 15698245e1
Deduplicate metrics dependencies (#1561)
## Motivation

This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`.

## Solution

This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up.

## Related Issues

closes https://github.com/ZcashFoundation/zebra/issues/1349

## Follow Up Work

I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible.

- https://github.com/ZcashFoundation/zebra/issues/1582
Co-authored-by: teor <teor@riseup.net>
2021-01-12 12:28:56 +10:00
teor 8e2f08221f
Add peer set tracing and unreachable panics (#1468)
Add some extra tracing and panics to double-check our
assumptions about the peer set state machine.
2020-12-14 11:00:39 +10:00
Deirdre Connolly 33afeb37cb Add a comment about the short looo 2020-09-21 09:26:39 -07:00
Henry de Valence 6f3288814c network: avoid GetPeers timeout to accelerate init
The GetPeers requests sent while crawling the network are randomly
load-balanced over available peers.  But at the very beginning, they may
be both routed to the same peer, causing network initialization to be
delayed while the second one times out (since zcashd only ever responds
to the first addr message).

Only sending one GetPeers request per candidate set update means we
crawl the network a little more slowly, but avoids hanging on start.
2020-09-21 09:26:39 -07:00
Henry de Valence 1d3892e1dc network: rename alias to BoxError
This is shorter and consistent with Tower (which is why we use it in the
first place).
2020-09-18 18:34:25 -07:00
Jane Lusby b6b35364f3 cleanup warnings throughout codebase 2020-05-27 15:42:29 -04:00
Henry de Valence 00edcae0c2 Add metrics for the crawler and candidate set. 2020-02-14 20:14:05 -05:00
Henry de Valence 2c0f48b587 Refactor connection logic and try a block request.
Attempting to implement requests for block data revealed a problem with
the previous connection logic.  Block data is requested by sending a
`getdata` message with hashes of the requested blocks; the peer responds
with a sequence of `block` messages with the blocks themselves.

However, this wasn't possible to handle with the previous connection
logic, which could only convert a single Bitcoin message into a
Response.  Instead, we factor out the message handling logic into a
Handler, which can statefully accumulate arbitrary data into a Response
and signal completion.  This is still pretty ugly but it does work.

As a side effect, the HeartbeatNonceMismatch error is removed; because
the Handler now tries to process messages until it comes to a Response,
it just ignores mismatched nonces (and will eventually time out).

The previous Mempool and Transaction requests were removed but could be
re-added in a different form later.  Also, the `Get` prefixes are
removed from `Request` to tidy the name.
2020-02-10 09:03:56 -08:00
Henry de Valence f04f4f0b98 Apply clippy fixes 2020-02-05 12:42:32 -08:00
Henry de Valence da78603d3a Rename `PeerClient` to `peer::Client`. 2019-11-27 23:53:36 -05:00
Henry de Valence c3ec235a5b Suppress unused import warnings. 2019-10-22 19:06:08 -07:00
Henry de Valence e1a35490af Move the CandidateSet to its own file.
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2019-10-22 19:06:08 -07:00
Henry de Valence b1832ce593 Initial work to add a crawl-and-dial task.
This responds to peerset demand by connecting to additional peers.

Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2019-10-22 19:06:08 -07:00