Commit Graph

549 Commits

Author SHA1 Message Date
teor e9cdc224a2 Rewrite MetaAddr::sanitize so it's harder to misuse
`sanitize` could be misused in two ways:
* accidentally modifying the addresses in the address book itself
* forgetting to sanitize new fields added to `MetaAddr`

This change prevents accidental modification by taking `&self`, and
explicitly creates a new sanitized `MetaAddr` with all fields listed.
2021-03-26 07:23:49 +10:00
Deirdre Connolly c5bad9fac2
Rename NU5 to Nu5 to appease newly stable clippy::upper-case-acronyms (#1945) 2021-03-26 07:22:50 +10:00
Deirdre Connolly 7efc700aca
Merge pull request #1713 from ZcashFoundation/use-groth16-batch-math
Use batch optimizations, load params in groth16::Verifier, verify Spend & Output descriptions in transaction verifier
2021-03-24 12:28:25 -04:00
Deirdre Connolly ca1d2de87d
Bump versions for v1.0.0-alpha.5 (#1932)
Zebra's latest alpha checkpoints on Canopy activation, continues our work on NU5, and fixes a security issue.

Some notable changes include:

## Added
- Log address book metrics when PeerSet or CandidateSet don't have many peers (#1906)
- Document test coverage workflow (#1919)
- Add a final job to CI, so we can easily require all the CI jobs to pass (#1927)

## Changed
- Zebra has moved its mandatory checkpoint from Sapling to Canopy (#1898, #1926)
  - This is a breaking change for users that depend on the exact height of the mandatory checkpoint.

## Fixed
- tower-batch: wake waiting workers on close to avoid hangs (#1908)
- Assert that pre-Canopy blocks use checkpointing (#1909)
- Fix CI disk space usage by disabling incremental compilation in coverage builds (#1923)

## Security
- Stop relying on unchecked length fields when preallocating vectors (#1925)
2021-03-22 22:05:01 -04:00
Alfredo Garcia c5b1d0deee move consts to start of the function 2021-03-22 11:54:31 -04:00
teor b623acc945 Add memory DoS prevention comments 2021-03-22 11:54:31 -04:00
teor 8e18c99cdc Avoid risky use of Read::take with untrusted lengths
Zebra already uses `Read::take` to enforce message, body, and block
maximum sizes.

So using `Read::take` on untrusted sizes can result in short reads,
without a corresponding `UnexpectedEof` error. (The old code was
correct, but copying it elsewhere would have been risky.)
2021-03-22 11:54:31 -04:00
teor 609d70ae53 Stop untrusted preallocation during string deserialization
This is an easy memory denial of service attack.
2021-03-22 11:54:31 -04:00
teor 4f923b90ea Log address book metrics when peers aren't responding 2021-03-17 10:47:04 +10:00
teor 5a30268d7a Log address metrics when the peer set has no ready peers 2021-03-17 10:47:04 +10:00
teor 6a342e93ca Refactor AddressBook metrics into their own struct
And provide an accessor function for address book metrics.
2021-03-17 10:47:04 +10:00
Alfredo Garcia d49eaab68e
Bump versions for zebrad 1.0.0-alpha.4 (#1913)
* Bump versions for zebrad 1.0.0-alpha.4

* add Cargo.lock
2021-03-16 21:12:37 -03:00
Jack Grigg 7a8cae9321 Tag message metrics by type 2021-03-17 09:38:07 +10:00
Jack Grigg e51f33a4b9 Use interoperable names for common metrics
These names match the equivalent metrics in zcashd, enabling common
metrics to be collected across both node types.
2021-03-17 09:38:07 +10:00
teor 8fabbce037
Document and log trailing message bytes (#1888)
* Rename a variable for consistency
* Log extra trailing message bytes at debug level
2021-03-15 08:25:27 +10:00
teor 976ec912db
Document that the listed address is also advertised to peers (#1891)
Documents a potential privacy leak, and a missing feature.
2021-03-15 08:25:07 +10:00
teor e50692bd51 CandidateSet: Add Listener Port Connections
Inbound connections on the Zcash protocol listener port
perform a handshake. If the handshake is successful, it
adds the peer to the AddressBook.
2021-03-09 23:05:18 -05:00
Jane Lusby 03aa6f671f
Implement outbound connection rate limiting - includes config rename with alias (#1855)
* Implement outbound connection rate limiting
* fix breaking change on config

Co-authored-by: teor <teor@riseup.net>
2021-03-10 01:36:05 +00:00
Jane Lusby e541746a50
Add initial support for NU5 to zebra (#1823)
* Add NU5 variant to NetworkUpgrade
* Add consensus branch ID for NU5
* Add network protocol versions for NU5
* Add NU5 to the protocol::version_consistent test
* Make unimplemented panic messages more specific
* Block target spacing doesn't change in NU5
* add comments for future updates for NU5

Co-authored-by: teor <teor@riseup.net>
2021-03-03 06:22:11 +10:00
teor 895bb43ead Clippy: Fix inconsistent struct member orders lint 2021-03-01 23:31:18 -05:00
teor 2587a4e272
Fix a peer DNS resolution edge case (#1796)
* Retry each peer DNS a few times individually

We retry each peer individually, as well as retrying if there are no
peers in the combined list.

DNS failures are correlated, so all peers can fail DNS, leaving Zebra
with a small list of custom-configured IP address peers.

Individual retries avoid this issue.

* Rename parse_peers to resolve_peers

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
2021-02-26 09:06:27 +10:00
teor 9c3f236075 Stop sending blocks and transactions on error 2021-02-25 08:44:57 -08:00
teor 78f162733d Revert "leverage return value for propagating errors"
This reverts commit e6cb20e13f.
2021-02-24 13:07:31 -08:00
teor 72e2e83828 Revert "introduce Transition enum"
This reverts commit 6906f87ead.
2021-02-24 13:07:31 -08:00
teor a5e89f4f2b Revert "accidental drop on mustusesender"
This reverts commit 5ec8d09e0d.
2021-02-24 13:07:31 -08:00
teor d60226a3cf Revert "rustfmt"
This reverts commit 9d9734ea81.
2021-02-24 13:07:31 -08:00
teor 359015b2be Revert "Only reject pending client requests when the peer has errored"
This reverts commit e06705ed81.
2021-02-24 13:07:31 -08:00
teor 663ed6c842 Revert "Remove remaining references to fail_with"
This reverts commit 5e4bf804aa.
2021-02-24 13:07:31 -08:00
teor 3c225550ee Revert "rename transitions from Exit to Close"
This reverts commit cfc4717b98.
2021-02-24 13:07:31 -08:00
teor 86dc66dfa9 Revert "deduplicate match arms in handle_client_request"
This reverts commit 2adee7b31a.
2021-02-24 13:07:31 -08:00
teor 292a4391e2 Revert "update comments throughout connection.rs"
This reverts commit 651d352ce1.
2021-02-24 13:07:31 -08:00
teor fc44a97925 Revert "remove unnecessary Option around request timeout"
This reverts commit c3724031df.
2021-02-24 13:07:31 -08:00
teor e06120cd36 Revert "ensure peer/client.rs comments are up to date"
This reverts commit 2266886a53.
2021-02-24 13:07:31 -08:00
teor 1a70d807b6 Revert "make sure peer/error.s comments are up to date"
This reverts commit 6f205a1812.
2021-02-24 13:07:31 -08:00
teor 3b2077fcfd Revert "Apply suggestions from code review"
This reverts commit 736092abb8.
2021-02-24 13:07:31 -08:00
teor 7558f74c78 Bump versions for zebrad 1.0.0-alpha.3 2021-02-23 10:39:13 -05:00
dependabot[bot] b578d1ff2e build(deps): bump proptest-derive from 0.2.0 to 0.3.0
Bumps [proptest-derive](https://github.com/AltSysrq/proptest) from 0.2.0 to 0.3.0.
- [Release notes](https://github.com/AltSysrq/proptest/releases)
- [Changelog](https://github.com/AltSysrq/proptest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/AltSysrq/proptest/compare/proptest-derive-0.2.0...proptest-derive-0.3.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-22 01:33:54 -05:00
teor d4f2f27218
Add global span to spawned network tasks (#1761)
Closes #1575
2021-02-20 08:36:50 +10:00
ebfull b7fddbde94
Compute the expected body length to reduce heap allocations (#1773)
* Compute the expected body length to reduce heap allocations
2021-02-19 22:18:57 +00:00
Jane Lusby 736092abb8 Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
2021-02-19 14:11:35 -08:00
Jane Lusby 6f205a1812 make sure peer/error.s comments are up to date 2021-02-19 14:11:35 -08:00
Jane Lusby 2266886a53 ensure peer/client.rs comments are up to date 2021-02-19 14:11:35 -08:00
Jane Lusby c3724031df remove unnecessary Option around request timeout 2021-02-19 14:11:35 -08:00
Jane Lusby 651d352ce1 update comments throughout connection.rs 2021-02-19 14:11:35 -08:00
Jane Lusby 2adee7b31a deduplicate match arms in handle_client_request 2021-02-19 14:11:35 -08:00
Jane Lusby cfc4717b98 rename transitions from Exit to Close 2021-02-19 14:11:35 -08:00
teor 5e4bf804aa Remove remaining references to fail_with 2021-02-19 14:11:35 -08:00
teor e06705ed81 Only reject pending client requests when the peer has errored
- Add an `ExitClient` transition, used when the internal client channel
  is closed or dropped, and there are no more pending requests
- Ignore pending requests after an `ExitClient` transition
- Reject pending requests when the peer has caused an error
  (the `Exit` and `ExitRequest` transitions)
- Remove `PeerError::ConnectionDropped`, because it is now handled by
  `ExitClient`. (Which is an internal error, not a peer error.)
2021-02-19 14:11:35 -08:00
teor 9d9734ea81 rustfmt 2021-02-19 14:11:35 -08:00
Jane Lusby 5ec8d09e0d accidental drop on mustusesender 2021-02-19 14:11:35 -08:00
Jane Lusby 6906f87ead introduce Transition enum 2021-02-19 14:11:35 -08:00
Jane Lusby e6cb20e13f leverage return value for propagating errors 2021-02-19 14:11:35 -08:00
teor e61b5e50a2
Diagnostics for CI port conflict failures (#1766)
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.

Also report the configured and actual ports where possible, for better
diagnostics.
2021-02-18 12:15:09 -03:00
teor 5424e1d8ba
Fix candidate set address state handling (#1709)
Design:
- Add a `PeerAddrState` to each `MetaAddr`
- Use a single peer set for all peers, regardless of state
- Implement time-based liveness as an `AddressBook` method, rather than
  a `PeerAddrState` variant
- Delete `AddressBook.by_state`

Implementation:
- Simplify `AddressBook` changes using `update` and `take` modifier
  methods
- Simplify the `AddressBook` iterator implementation, replacing it with
  methods that are more obviously correct
- Consistently collect peer set metrics

Documentation:
- Expand and update the peer set documentation

We can optimise later, but for now we want simple code that is more
obviously correct.
2021-02-18 11:18:32 +10:00
teor 579bd4a368
Retry DNS resolution on failure (#1762)
Otherwise, a transient DNS failure makes the node hang.
2021-02-18 07:09:02 +10:00
teor 86169f6412
Update PeerSet metrics after every change (#1727) 2021-02-18 07:06:59 +10:00
teor 8d1c498234 Log initial peer connection failures
And standardise another log message
2021-02-17 09:21:53 -05:00
teor e85441c914 Add a correctness comment to justify the revert 2021-02-16 05:52:54 +10:00
teor a02a00a3f5 Revert "Stop using CallAllUnordered in peer_set::add_initial_peers (#1705)"
This reverts commit 241c7ad849.
2021-02-16 05:52:54 +10:00
teor e7176b86da Clarify the Response::Nil documentation 2021-02-11 09:45:42 -05:00
Deirdre Connolly 0c5daa8410 Bump versions for zebrad 1.0.0-alpha.2
Including tower-batch bump to 0.2.0, tower-fallback to 0.2.0, zebra-script to 1.0.0-alpha.3
2021-02-09 16:14:29 -05:00
Alfredo Garcia 241c7ad849
Stop using CallAllUnordered in peer_set::add_initial_peers (#1705)
* use ServiceExt::oneshot and FuturesUnordered

Co-authored-by: teor <teor@riseup.net>
2021-02-09 08:16:02 +10:00
teor 1e156a5d60 Document that connect_isolated only works on mainnet
Document that connect_isolated only works on mainnet.

See #1687.
2021-02-04 17:32:00 -05:00
Alfredo Garcia d7c40af2a8
Fix shutdown panics (#1637)
* add a shutdown flag in zebra_chain::shutdown
* fix network panic on shutdown
* fix checkpoint panic on shutdown
2021-02-03 19:03:28 +10:00
Alfredo Garcia 221512c733
Async DNS seeder lookups (#1662)
* replace to_socket_addrs
* refactor `resolve()` into `resolve_host()`
* use `resolve_host()` to resolve config peers
* add DNS_LOOKUP_TIMEOUT constant
* don't block the main thread in initialize
2021-02-03 12:20:26 +10:00
teor 983e94f9e4 Add a TODO for inbound error handling cleanup 2021-02-03 08:32:10 +10:00
Alfredo Garcia 4b34482264
Add hints to port conflict and lock file panics (#1535)
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflics
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests

* Add a full set of stderr acceptance test functions

This change makes the stdout and stderr acceptance test interfaces
identical.

* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path

* Use Display for state path logs

Avoids weird escaping on Windows when using Debug

* Add Windows conflict error messages

* Turn PORT_IN_USE_ERROR into a regex

And add another alternative Windows-specific port error

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>
2021-01-29 22:36:33 +10:00
Deirdre Connolly 1b09538277
Bump versions for zebrad 1.0.0-alpha.1 (#1646)
* Bump versions where appropriate

Tested with cargo install --locked --path etc

* Remove fixed panics from 'Known Issues'

* Change to alpha release series in the README

Co-authored-by: teor <teor@riseup.net>
2021-01-27 20:31:39 -05:00
teor b551d81f8d Explain why we stay connected on Inbound errors
We might be syncing using this peer, so it's ok to just ignore
any internal errors in their Inbound requests, and drop the
request.
2021-01-27 12:08:49 -08:00
teor 258789ed9b Use the rustc unknown lints attribute
The clippy unknown lints attribute was deprecated in
nightly in rust-lang/rust#80524. The old lint name now produces a
warning.

Since we're using `allow(unknown_lints)` to suppress warnings, we need to
add the canonical name, so we can continue to build without warnings on
nightly.

But we also need to keep the old name, so we can continue to build
without warnings on stable.

And therefore, we also need to disable the "removed lints" warning,
otherwise we'll get warnings about the old name on nightly.

We'll need to keep this transitional clippy config until rustc 1.51 is
stable.
2021-01-19 11:02:20 -05:00
teor 05fff8e6f7 Revert "Stop panicking when fail_with is called twice on a connection"
But keep the extra error information.
2021-01-18 00:23:36 -05:00
teor 4fe81da953 Improve logging for connection state errors 2021-01-18 00:23:36 -05:00
teor a6c1cd3c35 Stop panicking when fail_with is called twice on a connection
We can't rule out the connection state changing between the state checks
and any eventual failures, particularly in the presence of async code.

So we turn this panic into a warning.
2021-01-18 00:23:36 -05:00
teor 44c8fafc29 Stop processing the request after failing an overloaded connection
zebra-network's Connection expects that `fail_with` is only called once
per connection, but the overload handling code continues to process the
current request after an overload error, potentially leading to further
failures.

Closes #1599
2021-01-18 00:23:36 -05:00
teor 0f0fb93b5c Update some comments in zebra-network
Add ticket numbers, and update based on design decisions and new code.
2021-01-15 09:02:10 -05:00
teor 730910cd99 Upgrade to tokio 0.3.6 from crates.io
And remove the tokio git dependency patch
2021-01-12 15:37:27 -05:00
Jane Lusby 15698245e1
Deduplicate metrics dependencies (#1561)
## Motivation

This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`.

## Solution

This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up.

## Related Issues

closes https://github.com/ZcashFoundation/zebra/issues/1349

## Follow Up Work

I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible.

- https://github.com/ZcashFoundation/zebra/issues/1582
Co-authored-by: teor <teor@riseup.net>
2021-01-12 12:28:56 +10:00
dependabot[bot] 38ac869f57 build(deps): bump byteorder from 1.3.4 to 1.4.2
Bumps [byteorder](https://github.com/BurntSushi/byteorder) from 1.3.4 to 1.4.2.
- [Release notes](https://github.com/BurntSushi/byteorder/releases)
- [Changelog](https://github.com/BurntSushi/byteorder/blob/master/CHANGELOG.md)
- [Commits](https://github.com/BurntSushi/byteorder/compare/1.3.4...1.4.2)

Signed-off-by: dependabot[bot] <support@github.com>
2021-01-11 18:45:49 -05:00
teor b7d0a40ee1 Revert unused instrument macros
Reverts most of "Instrument some functions to try to locate the panic"
2021-01-06 13:07:23 -08:00
teor 6d3aa0002c Ensure received client request oneshots are used via the type system
The `peer::Client` translates `Request`s into `ClientRequest`s, which
it sends to a background task. If the send is `Ok(())`, it will assume
that it is safe to unconditionally poll the `Receiver` tied to the
`Sender` used to create the `ClientRequest`.

We enforce this invariant via the type system, by converting
`ClientRequest`s to `InProgressClientRequest`s when they are received by
the background task. These conversions are implemented by
`ClientRequestReceiver`.

Changes:
* Revert `ClientRequest` so it uses a `oneshot::Sender`
* Add `InProgressClientRequest`, which is the same as `ClientRequest`,
  but has a `MustUseOneshotSender`
* `impl From<ClientRequest> for InProgressClientRequest`

* Add a new `ClientRequestReceiver` type that wraps a
  `mpsc::Receiver<ClientRequest>`
* `impl Stream<InProgressClientRequest> for ClientRequestReceiver`,
  converting the successful result of `inner.poll_next_unpin` into an
  `InProgressClientRequest`

* Replace `client_rx: mpsc::Receiver<ClientRequest>` in `Connection`
  with the new `ClientRequestReceiver` type
* `impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver`
2021-01-06 13:07:23 -08:00
teor df1b0c8d58 Defer a timeout fix until later 2021-01-06 13:07:23 -08:00
teor d5cfd5ad5f Clarify the ClientRequest invariant
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2021-01-06 13:07:23 -08:00
teor f8ff2e9c0b Add more sends before dropping ClientRequests
This fix also changes heartbeat behaviour in the following ways:
* if the queue is full, the connection is closed. Previously, the sender
  would wait until the queue had emptied
* if the queue flush fails, Zebra panics, because it can't send an error
  on the ClientRequest sender, so the invariant is broken
2021-01-06 13:07:23 -08:00
teor 3e711ccc8a Instrument some functions to try to locate the panic 2021-01-06 13:07:23 -08:00
teor fa29fca917 Panic when must-use senders are dropped before use
Add a MustUseOneshotSender, which panics if its inner sender is unused.
Callers must call `send()` on the MustUseOneshotSender, or ensure that
the sender is canceled.

Replaces an unreliable panic in `Client::call()` with a reliable panic
when a must-use sender is dropped.
2021-01-06 13:07:23 -08:00
teor b03809ebe3 Add the invalid state to an unreachable panic message 2021-01-06 13:07:23 -08:00
teor 86136c7b5c Stop ignoring errors when the new state is AwaitingRequest
The previous code would send a Nil message on the Sender, even if the
result was actually an error.
2021-01-06 13:07:23 -08:00
teor da5084a10a Split the 3-level match using a temporary 2021-01-06 13:07:23 -08:00
teor fd23c46726 Remove a redundant fmt::Display bound 2021-01-06 13:07:23 -08:00
teor 3892894ffa Call ClientRequest.tx.send() even if there is an error
Previously, tx would be dropped before send if:
- the success case would have used tx to wait for further messages,
- but the response was actually an error.

Instead, send the error on `tx` and call `fail_with()` using the same
error.

To support this change, allow `fail_with()` to take a `PeerError` or a
`SharedPeerError`.
2021-01-06 13:07:23 -08:00
teor 28f3186182 Mark ClientRequest and State::AwaitingResponse as must_use 2021-01-06 13:07:23 -08:00
teor b1f14f47c6
Rewrite GetData handling to match the zcashd implementation (#1518)
* Rewrite GetData handling to match the zcashd implementation

`zcashd` silently ignores missing blocks, but sends found transactions
followed by a `NotFound` message:
e7b425298f/src/main.cpp (L5497)

This is significantly different to the behaviour expected by the old
Zebra connection state machine, which expected `NotFound` for blocks.

Also change Zebra's GetData responses to peer request so they ignore
missing blocks.

* Stop hanging on incomplete transaction or block responses

Instead, if the peer sends an unexpected block, unexpected transaction,
or NotFound message:
1. end the request, and return a partial response containing any items
   that were successfully received
2. if none of the expected blocks or transactions were received, return
   an error, and close the connection
2021-01-04 13:25:35 +10:00
teor d482900e7f Remove a redundant pattern match
Identified by clippy's redundant_pattern_match lint.
2020-12-13 22:10:05 -05:00
teor 8e2f08221f
Add peer set tracing and unreachable panics (#1468)
Add some extra tracing and panics to double-check our
assumptions about the peer set state machine.
2020-12-14 11:00:39 +10:00
Henry de Valence 0842eb2dab
zebra: move to 1.x-based versioning. (#1476)
Previously we set the crate versions to 3.x, so that the major version was
aligned with the NU version.  But we want to be able to make API changes
independently of the NU schedule.
2020-12-08 08:53:07 +10:00
teor b4a50fd99f
Downgrade tokio to 0.3.4 to avoid a time wheel panic (#1453)
See tokio-rs/tokio#2789 for details. We were seeing this panic during
normal operation, not just at shutdown.
2020-12-04 13:52:37 +10:00
Henry de Valence b449fe93b2 network: correct data modeling for headers messages
We modeled a Bitcoin `headers` message as being a list of block headers.
However, the actual data structure is slightly different: it's a list of (block
header, transaction count) pairs.  This caused zcashd to reject our headers
messages.

To fix this, introduce a new `CountedHeader` struct with a `block::Header` and
transaction count `usize`, then thread it through the inbound service and the
state.

I tested this locally by running Zebra with these changes and inspecting a
trace-level log of the span of a peer connection that requested a nontrivial
headers packet from us, and verified that it did not reject our message.
2020-12-02 10:24:31 -08:00
Henry de Valence bfbc737b6c network: don't cancel heartbeat requests
The cancellation implementation changes made to the connection state machine
mean that if a response oneshot is dropped, the connection will avoid
cancelling the request.  So the heartbeat task does have to wait on the response.
2020-12-02 02:18:13 -05:00
Henry de Valence 69ba5584f3 network: correct parsing of reject messages
Not all reject messages include a data field.  This change partially addresses
a problem that could lead to a depleted peer set:

1. We send a response to a `getheaders` message;
2. The remote peer `reject`s our `headers` message for some reason;
3. We fail to parse their `reject` message and close the connection;
4. Repeating this process, we have no more peers.

This commit fixes (3) but does not address (2).
2020-12-02 02:12:29 -05:00
teor 34518525a5 Improve peer set logging hints
Delete hints about configuring peers.
Delete hint for typical "no ready peers" behaviour.
2020-12-01 21:37:15 -08:00
Henry de Valence 00c4f4f0e6 network: record cause of handshake failure 2020-12-01 19:16:41 -08:00
Henry de Valence 5ccd1905fc network: avoid putting null bytes in trace output 2020-12-01 19:16:41 -08:00
Henry de Valence f93deb1cac network: fix missing {0} in PeerError::Serialization 2020-12-01 19:16:41 -08:00
Henry de Valence 18cf5e0249 network: use short Display for Message in spans
This makes the span data more compact (e.g., `msg_as_req{msg=block}`) and
restores the Debug impl for Message to show all of the data contained in the
message.  The full message is added as a single event at trace level in the
span to preserve the previous full-inspectability.
2020-12-01 19:16:41 -08:00
Jane Lusby a91d0f0bb6
Include short sha in log messages and error urls (#1410)
As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs.

To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs.

Co-authored-by: teor <teor@riseup.net>
2020-12-01 12:13:20 -08:00
teor 4d5ea4897c Log peer set ready and unready peers
* warn: if there are no peers at all
* info: if there are no ready peers
* trace: the number of ready and unready peers for every request

Log at most one warn or info log per minute, to avoid flooding the
terminal with log lines. Suppress warn and info logs for the first
minute, while the peer set is starting up.
2020-12-01 11:00:21 -05:00
teor 92eb92d1dd
Disable the nightly clippy unnecessary_wraps lint (#1403)
It seems to be a bit broken - some of our functions return `Result` for
consistency with similar functions. But the lint picks them up anyway.
2020-12-01 12:20:57 +10:00
Alfredo Garcia 4544463059
Inbound `FindBlocks` and `FindHeaders` (#1347)
* implement inbound `FindBlocks`
* Handle inbound peer FindHeaders requests
* handle request before having any chain tip
* Split `find_chain_hashes` into smaller functions

Add a `max_len` argument to support `FindHeaders` requests.

Rewrite the hash collection code to use heights, so we can handle the
`stop` hash and "no intersection" cases correctly.

* Split state height functions into "any chain" and "best chain"
* Rename the best chain block method to `best_block`
* Move fmt utilities to zebra_chain::fmt
* Summarise Debug for some Message variants

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-12-01 07:30:37 +10:00
Alfredo Garcia 7d42c63790 fix comment 2020-11-25 10:55:44 -08:00
teor 8d6ac8eece Placate clippy 2020-11-24 20:03:21 +10:00
Henry de Valence d90e709ce1 network: tidy peer set implementation
- rename functions more descriptively
- create a common `take_ready_service` function
- organize poll_ functions separately
2020-11-24 20:03:21 +10:00
Henry de Valence f36a4800b2 network: fix invariant violation in peer set
Closes #1183.

The peer set maintains a preselected ready service that it can use to
perform power-of-two-choices (p2c) routing of requests.  Ready services
are stored by key (socket address) in an `IndexMap`, and the preselected
service is represented by an `Option<usize>` indexing that map.  This
means that whenever the set of ready services changes (e.g., a service
is removed from the peer set, or a service is taken to be used to
process a request), the preselected index is invalidated.  The original
P2C-only implementation maintained this invariant but did not document
it.

The change to inventory-based routing introduced a bug by failing to
maintain this invariant and appropriately invalidate the preselected
index.  However, this was only noticeable approximately 1/N of the time
on the next request after an inventory-directed request, so the bug
occurred infrequently.  Luckily, the use of `.expect` caused the bug to
be an immediate panic, making it possible to identify by inspecting all
uses of the ready service map.
2020-11-24 20:03:21 +10:00
teor 6387dfe1d0 Fix individual crate compilation failures
Some Zebra crates don't compile individually due to missing features in
their dependencies. Add those features to each crate's dependency list.
2020-11-23 23:56:28 -08:00
Henry de Valence add94c1c45 deps: move to tokio 0.3, tower 0.4
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware.  This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method.  See

ddc64e8d4d

for more context on the Tower changes.  To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
2020-11-20 10:08:16 -08:00
Henry de Valence 06dd39df54
network: bump network version for Canopy (#1333)
Per https://zips.z.cash/zip-0251, nodes compatible with Canopy
activation on mainnet MUST advertise protocol version 170013 or later.

Once Canopy activates on testnet or mainnet, Canopy nodes SHOULD reject
new connections from pre-Canopy nodes, so this also increases the
minimum version.
2020-11-20 09:50:05 +10:00
Henry de Valence a3ab589d89 consensus,state: document cancellation contracts for services
This change explicitly documents cancellation contracts for our Tower services,
and tries to correct a bug in the implementation of the CheckpointVerifier,
which duplicates information from the state service but did not ensure that it
would be kept in sync.
2020-11-17 14:56:27 -08:00
teor ca4e792f47 Put messages in request/response order
And fix a comment typo
2020-11-17 07:52:53 +10:00
Alfredo Garcia 128643d81e
Call `zebra_test::init` where needed. (#1227)
* Add missing `zebra_test::init()` to zebra-chain
* Add missing `zebra_test::init()` to zebra-consensus
* Add missing `zebra_test::init()` to zebra-network
* Add missing `zebra_test::init()` to zebra-state
* Add missing `zebra_test::init()` to zebra-test
* Add missing `zebra_test::init()` to zebrad
2020-11-10 10:29:25 +10:00
Henry de Valence 8e709bfa88 network: don't fail on unsolicited messages
These messages might be unsolicited, or they might be a response to a
request we already canceled.  So don't fail the whole connection, just
drop the message and move on.
2020-10-26 12:05:35 -07:00
Henry de Valence 13daefa729 network: handle request cancellation in Connection
We handle request cancellation in two places: before we transition into
the AwaitingResponse state, and while we are in AwaitingResponse.  We
need both places, or else if we started processing a request, we
wouldn't process the cancellation until the timeout elapsed.

The first is a check that the oneshot is not already canceled.

For the second, we wait on a cancellation, either from a timeout or from
the tx channel closing.
2020-10-26 12:05:35 -07:00
teor 1e97691fc8 Fix some "needless lifetime" clippy lints
These lints seem to be new in clippy nightly.
2020-10-12 08:54:23 +10:00
Dimitris Apostolou 36279621f0 Fix typos 2020-10-06 12:16:41 +10:00
Henry de Valence 6dd7318d3b deps: use Tower 0.4 from git instead of 0.3.1.
This addresses at least three pain points:

- we were affected by bugs that were already fixed in git, but not in
  the released crate;
- we can use service combinators to transform requests and responses;
- we can use the hedge middleware.

The version in git is still marked as 0.3.1 but these changes will be
part of tower 0.4: https://github.com/tower-rs/tower/issues/431
2020-09-21 14:16:56 -07:00
Deirdre Connolly 33afeb37cb Add a comment about the short looo 2020-09-21 09:26:39 -07:00
Henry de Valence 6f3288814c network: avoid GetPeers timeout to accelerate init
The GetPeers requests sent while crawling the network are randomly
load-balanced over available peers.  But at the very beginning, they may
be both routed to the same peer, causing network initialization to be
delayed while the second one times out (since zcashd only ever responds
to the first addr message).

Only sending one GetPeers request per candidate set update means we
crawl the network a little more slowly, but avoids hanging on start.
2020-09-21 09:26:39 -07:00
Henry de Valence b72c249b96 network: add a metric+warning when shedding load 2020-09-21 09:26:39 -07:00
Henry de Valence 4df5632752 network: handle Message::NotFound as a response
This cleans up the response processing logic a little bit along the way,
but the overall division of responsibility should be better documented
in a future commit.
2020-09-20 10:21:18 -07:00
Henry de Valence 64905563d1 network: remove glob import in message-handling
This clarifies which parts are the handler state and which parts are the
incoming message.
2020-09-20 10:21:18 -07:00
Henry de Valence 9c021025a7 network: fill in remaining request/response pairs 2020-09-20 10:21:18 -07:00
Henry de Valence b289cb9164 network: clean up GetHeaders, GetBlocks modeling 2020-09-20 10:21:18 -07:00
Henry de Valence 3c993f33b1 network: add PeerError::WrongMessage
This lets us distinguish between cases where the message was unsupported
(e.g., BIP11 messages), and cases where the message was uninterpretable
in context (e.g., unsolicited messages).
2020-09-20 10:21:18 -07:00
Henry de Valence 430176dd0d network: clean up message-as-request translation 2020-09-20 10:21:18 -07:00
Henry de Valence 170f588ffb network: document load-shedding behavior
This was part of the original design and is described in the Connection
internals, but we never documented it externally.
2020-09-18 18:34:25 -07:00
Henry de Valence 1d3892e1dc network: rename alias to BoxError
This is shorter and consistent with Tower (which is why we use it in the
first place).
2020-09-18 18:34:25 -07:00
Henry de Valence 95f2463188 Try workaround for generator autotrait bug
> Added a test that the handshake's version message matches specified fields, but the test does not compile, because rustc doesn't believe that the Box<dyn std::error::Error + Send + Sync + 'static> is 'static, and therefore isn't a Box<dyn std::error::Error + Send + Sync + 'static>. This manifests as being unable to spawn the connect_isolated task. From digging through Tokio issues I believe that this is an instance of rust-lang/rust#64552 .

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-09-17 12:02:20 -07:00
Henry de Valence 81e8195f68 network: add connect_isolated distinguisher test
This is currently broken due to a rustc bug.
2020-09-17 12:02:20 -07:00
Henry de Valence b7472de43f network: add a zebra_network::connect_isolated() method.
The peer set provides an automatically managed connection pool, abstracting
away all the details of handling individual peer connections.  However, it's
also useful to be able to create completely isolated and
minimally-distinguishable connections to individual peers, in order to be able
to send specific messages over Tor, or to implement some custom network crawler
logic.
2020-09-17 12:02:20 -07:00
teor 66265dc11a Adjust the EWMA decay for the latest sync timeout 2020-09-09 15:35:09 -07:00
teor 1f7af0a779 Update the inv message processing comment
Cleanup after PR #1028.
2020-09-09 15:29:38 -07:00
teor 2a68ef5acb Update the peerset buffer size and sync timeout
Also add a bunch of comments and documentation for network-constrained
nodes, and for testnet.
2020-09-08 12:44:33 -07:00
teor e6e859dce2 Tweak sync timeouts
* increase the EWMA default and decay
* increase the block download retries
* increase the request and block download timeouts
* increase the sync timeout
2020-09-08 12:44:33 -07:00
Jane Lusby 1b17691dda improve logging 2020-09-08 12:37:34 -07:00
Jane Lusby 81a3ad3a0d filter inventory advertisements correctly 2020-09-08 12:37:34 -07:00
Henry de Valence 3f150eb16e
network: implement transaction request handling. (#1016)
This commit makes several related changes to the network code:

- adds a `TransactionsByHash(HashSet<transaction::Hash>)` request and
  `Transactions(Vec<Arc<Transaction>>)` response pair that allows
  fetching transactions from a remote peer;

- adds a `PushTransaction(Arc<Transaction>)` request that pushes an
  unsolicited transaction to a remote peer;

- adds an `AdvertiseTransactions(HashSet<transaction::Hash>)` request
  that advertises transactions by hash to a remote peer;

- adds an `AdvertiseBlock(block::Hash)` request that advertises a block
  by hash to a remote peer;

Then, it modifies the connection state machine so that outbound
requests to remote peers are handled properly:

- `TransactionsByHash` generates a `getdata` message and collects the
  results, like the existing `BlocksByHash` request.

- `PushTransaction` generates a `tx` message, and returns `Nil` immediately.

- `AdvertiseTransactions` and `AdvertiseBlock` generate an `inv`
  message, and return `Nil` immediately.

Next, it modifies the connection state machine so that messages
from remote peers generate requests to the inbound service:

- `getdata` messages generate `BlocksByHash` or `TransactionsByHash`
  requests, depending on the content of the message;

- `tx` messages generate `PushTransaction` requests;

- `inv` messages generate `AdvertiseBlock` or `AdvertiseTransactions`
  requests.

Finally, it refactors the request routing logic for the peer set to
handle advertisement messages, providing three routing methods:

- `route_p2c`, which uses p2c as normal (default);
- `route_inv`, which uses the inventory registry and falls back to p2c
  (used for `BlocksByHash` or `TransactionsByHash`);
- `route_all`, which broadcasts a request to all ready peers (used for
  `AdvertiseBlock` and `AdvertiseTransactions`).
2020-09-08 10:16:29 -07:00
Henry de Valence cad38415b2
network: fix bug in inventory advertisement handling (#1022)
* network: fix bug in inventory advertisement handling

The RFC https://zebra.zfnd.org/dev/rfcs/0003-inventory-tracking.html described
the use of a `broadcast` channel in place of an `mpsc` channel to get
ring-buffer behavior, keeping a bound on the size of the channel but dropping
old entries when the channel is full.

However, it didn't explicitly describe how this works (the `broadcast` channel
returns a `RecvError::Lagged(u64)` to inform receivers that they lost
messages), so the lag-handling wasn't implemented and I didn't notice in
review.

Instead, the ? operator bubbled the lag error all the way up from
`InventoryRegistry::poll_inventory` through `<PeerSet as Service>::poll_ready`
through various Tower wrappers to users of the peer set.  The error propagation
is bad enough, because it caused client errors that shouldn't have happened,
but there's a worse interaction.

The `Service` contract distinguishes between request errors (from
`Service::call`, scoped to the request) and service errors (from
`Service::poll_ready`, scoped to the service).  The `Service` contract
specifies that once a service returns an error from `poll_ready`, the service
can be assumed to be failed permanently.

I believe (but haven't tested or carefully worked through the details) that
this caused various tower middleware to report the entire peer set service as
permanently failed due to a transient inventory "error" (more of an indicator),
and I suspect that this is the cause of #1003, where all of the sync
component's requests end up failing because the peer set reported that it
failed permanently.  I am able to reproduce #1003 locally before this change
and unable to reproduce it locally after this change, though I have not tested
exhaustively.

* network: add metric for dropped inventory advertisements

Co-authored-by: teor <teor@riseup.net>

Co-authored-by: teor <teor@riseup.net>
2020-09-07 21:24:31 -07:00
Henry de Valence 9682d452ee network: add AddressBook::potentially_connected_peers(). 2020-09-07 11:13:15 -07:00
dependabot[bot] 142226ad57 build(deps): bump indexmap from 1.5.2 to 1.6.0
Bumps [indexmap](https://github.com/bluss/indexmap) from 1.5.2 to 1.6.0.
- [Release notes](https://github.com/bluss/indexmap/releases)
- [Commits](https://github.com/bluss/indexmap/compare/1.5.2...1.6.0)

Signed-off-by: dependabot[bot] <support@github.com>
2020-09-07 07:56:39 -04:00
Alfredo Garcia 454e75e7c0
Rename old references to BlockHeaderHash and BlockHeight (#1002)
* rename some references

* Apply suggestions from code review

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Co-authored-by: teor <teor@riseup.net>

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Co-authored-by: teor <teor@riseup.net>
2020-09-04 15:40:48 -07:00
teor b5c653ed93
Use ok_or for constants, rather than a redudant closure
* Use ok_or for constants in zebra-network
* Use ok_or for constants in zebra-consensus
2020-09-02 14:26:26 +10:00
Jane Lusby 88557ddd0a address more comments 2020-09-01 21:01:38 -04:00