Commit Graph

412 Commits

Author SHA1 Message Date
teor d4f2f27218
Add global span to spawned network tasks (#1761)
Closes #1575
2021-02-20 08:36:50 +10:00
ebfull b7fddbde94
Compute the expected body length to reduce heap allocations (#1773)
* Compute the expected body length to reduce heap allocations
2021-02-19 22:18:57 +00:00
Jane Lusby 736092abb8 Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
2021-02-19 14:11:35 -08:00
Jane Lusby 6f205a1812 make sure peer/error.s comments are up to date 2021-02-19 14:11:35 -08:00
Jane Lusby 2266886a53 ensure peer/client.rs comments are up to date 2021-02-19 14:11:35 -08:00
Jane Lusby c3724031df remove unnecessary Option around request timeout 2021-02-19 14:11:35 -08:00
Jane Lusby 651d352ce1 update comments throughout connection.rs 2021-02-19 14:11:35 -08:00
Jane Lusby 2adee7b31a deduplicate match arms in handle_client_request 2021-02-19 14:11:35 -08:00
Jane Lusby cfc4717b98 rename transitions from Exit to Close 2021-02-19 14:11:35 -08:00
teor 5e4bf804aa Remove remaining references to fail_with 2021-02-19 14:11:35 -08:00
teor e06705ed81 Only reject pending client requests when the peer has errored
- Add an `ExitClient` transition, used when the internal client channel
  is closed or dropped, and there are no more pending requests
- Ignore pending requests after an `ExitClient` transition
- Reject pending requests when the peer has caused an error
  (the `Exit` and `ExitRequest` transitions)
- Remove `PeerError::ConnectionDropped`, because it is now handled by
  `ExitClient`. (Which is an internal error, not a peer error.)
2021-02-19 14:11:35 -08:00
teor 9d9734ea81 rustfmt 2021-02-19 14:11:35 -08:00
Jane Lusby 5ec8d09e0d accidental drop on mustusesender 2021-02-19 14:11:35 -08:00
Jane Lusby 6906f87ead introduce Transition enum 2021-02-19 14:11:35 -08:00
Jane Lusby e6cb20e13f leverage return value for propagating errors 2021-02-19 14:11:35 -08:00
teor e61b5e50a2
Diagnostics for CI port conflict failures (#1766)
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.

Also report the configured and actual ports where possible, for better
diagnostics.
2021-02-18 12:15:09 -03:00
teor 5424e1d8ba
Fix candidate set address state handling (#1709)
Design:
- Add a `PeerAddrState` to each `MetaAddr`
- Use a single peer set for all peers, regardless of state
- Implement time-based liveness as an `AddressBook` method, rather than
  a `PeerAddrState` variant
- Delete `AddressBook.by_state`

Implementation:
- Simplify `AddressBook` changes using `update` and `take` modifier
  methods
- Simplify the `AddressBook` iterator implementation, replacing it with
  methods that are more obviously correct
- Consistently collect peer set metrics

Documentation:
- Expand and update the peer set documentation

We can optimise later, but for now we want simple code that is more
obviously correct.
2021-02-18 11:18:32 +10:00
teor 579bd4a368
Retry DNS resolution on failure (#1762)
Otherwise, a transient DNS failure makes the node hang.
2021-02-18 07:09:02 +10:00
teor 86169f6412
Update PeerSet metrics after every change (#1727) 2021-02-18 07:06:59 +10:00
teor 8d1c498234 Log initial peer connection failures
And standardise another log message
2021-02-17 09:21:53 -05:00
teor e85441c914 Add a correctness comment to justify the revert 2021-02-16 05:52:54 +10:00
teor a02a00a3f5 Revert "Stop using CallAllUnordered in peer_set::add_initial_peers (#1705)"
This reverts commit 241c7ad849.
2021-02-16 05:52:54 +10:00
teor e7176b86da Clarify the Response::Nil documentation 2021-02-11 09:45:42 -05:00
Deirdre Connolly 0c5daa8410 Bump versions for zebrad 1.0.0-alpha.2
Including tower-batch bump to 0.2.0, tower-fallback to 0.2.0, zebra-script to 1.0.0-alpha.3
2021-02-09 16:14:29 -05:00
Alfredo Garcia 241c7ad849
Stop using CallAllUnordered in peer_set::add_initial_peers (#1705)
* use ServiceExt::oneshot and FuturesUnordered

Co-authored-by: teor <teor@riseup.net>
2021-02-09 08:16:02 +10:00
teor 1e156a5d60 Document that connect_isolated only works on mainnet
Document that connect_isolated only works on mainnet.

See #1687.
2021-02-04 17:32:00 -05:00
Alfredo Garcia d7c40af2a8
Fix shutdown panics (#1637)
* add a shutdown flag in zebra_chain::shutdown
* fix network panic on shutdown
* fix checkpoint panic on shutdown
2021-02-03 19:03:28 +10:00
Alfredo Garcia 221512c733
Async DNS seeder lookups (#1662)
* replace to_socket_addrs
* refactor `resolve()` into `resolve_host()`
* use `resolve_host()` to resolve config peers
* add DNS_LOOKUP_TIMEOUT constant
* don't block the main thread in initialize
2021-02-03 12:20:26 +10:00
teor 983e94f9e4 Add a TODO for inbound error handling cleanup 2021-02-03 08:32:10 +10:00
Alfredo Garcia 4b34482264
Add hints to port conflict and lock file panics (#1535)
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflics
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests

* Add a full set of stderr acceptance test functions

This change makes the stdout and stderr acceptance test interfaces
identical.

* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path

* Use Display for state path logs

Avoids weird escaping on Windows when using Debug

* Add Windows conflict error messages

* Turn PORT_IN_USE_ERROR into a regex

And add another alternative Windows-specific port error

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>
2021-01-29 22:36:33 +10:00
Deirdre Connolly 1b09538277
Bump versions for zebrad 1.0.0-alpha.1 (#1646)
* Bump versions where appropriate

Tested with cargo install --locked --path etc

* Remove fixed panics from 'Known Issues'

* Change to alpha release series in the README

Co-authored-by: teor <teor@riseup.net>
2021-01-27 20:31:39 -05:00
teor b551d81f8d Explain why we stay connected on Inbound errors
We might be syncing using this peer, so it's ok to just ignore
any internal errors in their Inbound requests, and drop the
request.
2021-01-27 12:08:49 -08:00
teor 258789ed9b Use the rustc unknown lints attribute
The clippy unknown lints attribute was deprecated in
nightly in rust-lang/rust#80524. The old lint name now produces a
warning.

Since we're using `allow(unknown_lints)` to suppress warnings, we need to
add the canonical name, so we can continue to build without warnings on
nightly.

But we also need to keep the old name, so we can continue to build
without warnings on stable.

And therefore, we also need to disable the "removed lints" warning,
otherwise we'll get warnings about the old name on nightly.

We'll need to keep this transitional clippy config until rustc 1.51 is
stable.
2021-01-19 11:02:20 -05:00
teor 05fff8e6f7 Revert "Stop panicking when fail_with is called twice on a connection"
But keep the extra error information.
2021-01-18 00:23:36 -05:00
teor 4fe81da953 Improve logging for connection state errors 2021-01-18 00:23:36 -05:00
teor a6c1cd3c35 Stop panicking when fail_with is called twice on a connection
We can't rule out the connection state changing between the state checks
and any eventual failures, particularly in the presence of async code.

So we turn this panic into a warning.
2021-01-18 00:23:36 -05:00
teor 44c8fafc29 Stop processing the request after failing an overloaded connection
zebra-network's Connection expects that `fail_with` is only called once
per connection, but the overload handling code continues to process the
current request after an overload error, potentially leading to further
failures.

Closes #1599
2021-01-18 00:23:36 -05:00
teor 0f0fb93b5c Update some comments in zebra-network
Add ticket numbers, and update based on design decisions and new code.
2021-01-15 09:02:10 -05:00
teor 730910cd99 Upgrade to tokio 0.3.6 from crates.io
And remove the tokio git dependency patch
2021-01-12 15:37:27 -05:00
Jane Lusby 15698245e1
Deduplicate metrics dependencies (#1561)
## Motivation

This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`.

## Solution

This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up.

## Related Issues

closes https://github.com/ZcashFoundation/zebra/issues/1349

## Follow Up Work

I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible.

- https://github.com/ZcashFoundation/zebra/issues/1582
Co-authored-by: teor <teor@riseup.net>
2021-01-12 12:28:56 +10:00
dependabot[bot] 38ac869f57 build(deps): bump byteorder from 1.3.4 to 1.4.2
Bumps [byteorder](https://github.com/BurntSushi/byteorder) from 1.3.4 to 1.4.2.
- [Release notes](https://github.com/BurntSushi/byteorder/releases)
- [Changelog](https://github.com/BurntSushi/byteorder/blob/master/CHANGELOG.md)
- [Commits](https://github.com/BurntSushi/byteorder/compare/1.3.4...1.4.2)

Signed-off-by: dependabot[bot] <support@github.com>
2021-01-11 18:45:49 -05:00
teor b7d0a40ee1 Revert unused instrument macros
Reverts most of "Instrument some functions to try to locate the panic"
2021-01-06 13:07:23 -08:00
teor 6d3aa0002c Ensure received client request oneshots are used via the type system
The `peer::Client` translates `Request`s into `ClientRequest`s, which
it sends to a background task. If the send is `Ok(())`, it will assume
that it is safe to unconditionally poll the `Receiver` tied to the
`Sender` used to create the `ClientRequest`.

We enforce this invariant via the type system, by converting
`ClientRequest`s to `InProgressClientRequest`s when they are received by
the background task. These conversions are implemented by
`ClientRequestReceiver`.

Changes:
* Revert `ClientRequest` so it uses a `oneshot::Sender`
* Add `InProgressClientRequest`, which is the same as `ClientRequest`,
  but has a `MustUseOneshotSender`
* `impl From<ClientRequest> for InProgressClientRequest`

* Add a new `ClientRequestReceiver` type that wraps a
  `mpsc::Receiver<ClientRequest>`
* `impl Stream<InProgressClientRequest> for ClientRequestReceiver`,
  converting the successful result of `inner.poll_next_unpin` into an
  `InProgressClientRequest`

* Replace `client_rx: mpsc::Receiver<ClientRequest>` in `Connection`
  with the new `ClientRequestReceiver` type
* `impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver`
2021-01-06 13:07:23 -08:00
teor df1b0c8d58 Defer a timeout fix until later 2021-01-06 13:07:23 -08:00
teor d5cfd5ad5f Clarify the ClientRequest invariant
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2021-01-06 13:07:23 -08:00
teor f8ff2e9c0b Add more sends before dropping ClientRequests
This fix also changes heartbeat behaviour in the following ways:
* if the queue is full, the connection is closed. Previously, the sender
  would wait until the queue had emptied
* if the queue flush fails, Zebra panics, because it can't send an error
  on the ClientRequest sender, so the invariant is broken
2021-01-06 13:07:23 -08:00
teor 3e711ccc8a Instrument some functions to try to locate the panic 2021-01-06 13:07:23 -08:00
teor fa29fca917 Panic when must-use senders are dropped before use
Add a MustUseOneshotSender, which panics if its inner sender is unused.
Callers must call `send()` on the MustUseOneshotSender, or ensure that
the sender is canceled.

Replaces an unreliable panic in `Client::call()` with a reliable panic
when a must-use sender is dropped.
2021-01-06 13:07:23 -08:00
teor b03809ebe3 Add the invalid state to an unreachable panic message 2021-01-06 13:07:23 -08:00
teor 86136c7b5c Stop ignoring errors when the new state is AwaitingRequest
The previous code would send a Nil message on the Sender, even if the
result was actually an error.
2021-01-06 13:07:23 -08:00