Commit Graph

67 Commits

Author SHA1 Message Date
teor a8a0d6450c Security: stop gossiping temporary inbound remote addresses to peers
- stop putting inbound addresses in the address book
- drop address book entries that can't be used for outbound connections
  - distinguish between temporary inbound and permanent outbound peer
    addresses
  - also create variants to handle proxy connections
    (but don't use them yet)
  - avoid tracking connection state for isolated connections
- document security constraints for the address book and peer set
2021-05-14 23:45:42 +10:00
teor 3f45735f3f Use futures:🔒:Mutex for the nonce set 2021-04-21 01:39:49 -04:00
teor ad272f2bee Make sure handshake version negotiation always has a timeout
As part of this change, refactor handshake version negotiation into its
own function.
2021-04-19 18:31:28 -04:00
teor 2cecd52a10 Fix comment typo 2021-04-19 10:11:22 -04:00
teor 8fb12f07a1 Fix outdated comment 2021-04-19 10:11:22 -04:00
teor eabadb8301 Make heartbeats wait for the connection queue to empty, with a timeout
Also cleanup the heartbeat code, so each heartbeat request/response runs
in a future with a single timeout.
2021-04-19 10:11:22 -04:00
teor 375c8d8700
Fix a deadlock between the crawler and dialer, and other hangs (#1950)
* Stop ignoring inbound message errors and handshake timeouts

To avoid hangs, Zebra needs to maintain the following invariants in the
handshake and heartbeat code:
- each handshake should run in a separate spawned task
  (not yet implemented)
- every message, error, timeout, and shutdown must update the peer address state
- every await that depends on the network must have a timeout

Once the Connection is created, it should handle timeouts.
But we need to handle timeouts during handshake setup.

* Avoid hangs by adding a timeout to the candidate set update

Also increase the fanout from 1 to 2, to increase address diversity.

But only return permanent errors from `CandidateSet::update`, because
the crawler task exits if `update` returns an error.

Also log Peers response errors in the CandidateSet.

* Use the select macro in the crawler to reduce hangs

The `select` function is biased towards its first argument, risking
starvation.

As a side-benefit, this change also makes the code a lot easier to read
and maintain.

* Split CrawlerAction::Demand into separate actions

This refactor makes the code a bit easier to read, at the cost of
sometimes blocking the crawler on `candidates.next()`.

That's ok, because `next` only has a short (< 100 ms) delay. And we're
just about to spawn a separate task for each handshake.

* Spawn a separate task for each handshake

This change avoids deadlocks by letting each handshake make progress
independently.

* Move the dial task into a separate function

This refactor improves readability.

* Fix buggy future::select function usage

And document the correctness of the new code.
2021-04-07 10:25:10 -03:00
teor b329892665 Add a comment about a zcashd inv message bug 2021-03-26 11:26:59 -04:00
teor 6fe81d8992 Make MetaAddr.last_seen into a private field 2021-03-26 07:23:49 +10:00
Jack Grigg 7a8cae9321 Tag message metrics by type 2021-03-17 09:38:07 +10:00
Jack Grigg e51f33a4b9 Use interoperable names for common metrics
These names match the equivalent metrics in zcashd, enabling common
metrics to be collected across both node types.
2021-03-17 09:38:07 +10:00
teor 72e2e83828 Revert "introduce Transition enum"
This reverts commit 6906f87ead.
2021-02-24 13:07:31 -08:00
teor fc44a97925 Revert "remove unnecessary Option around request timeout"
This reverts commit c3724031df.
2021-02-24 13:07:31 -08:00
Jane Lusby c3724031df remove unnecessary Option around request timeout 2021-02-19 14:11:35 -08:00
Jane Lusby 6906f87ead introduce Transition enum 2021-02-19 14:11:35 -08:00
teor 5424e1d8ba
Fix candidate set address state handling (#1709)
Design:
- Add a `PeerAddrState` to each `MetaAddr`
- Use a single peer set for all peers, regardless of state
- Implement time-based liveness as an `AddressBook` method, rather than
  a `PeerAddrState` variant
- Delete `AddressBook.by_state`

Implementation:
- Simplify `AddressBook` changes using `update` and `take` modifier
  methods
- Simplify the `AddressBook` iterator implementation, replacing it with
  methods that are more obviously correct
- Consistently collect peer set metrics

Documentation:
- Expand and update the peer set documentation

We can optimise later, but for now we want simple code that is more
obviously correct.
2021-02-18 11:18:32 +10:00
teor 0f0fb93b5c Update some comments in zebra-network
Add ticket numbers, and update based on design decisions and new code.
2021-01-15 09:02:10 -05:00
teor b7d0a40ee1 Revert unused instrument macros
Reverts most of "Instrument some functions to try to locate the panic"
2021-01-06 13:07:23 -08:00
teor 6d3aa0002c Ensure received client request oneshots are used via the type system
The `peer::Client` translates `Request`s into `ClientRequest`s, which
it sends to a background task. If the send is `Ok(())`, it will assume
that it is safe to unconditionally poll the `Receiver` tied to the
`Sender` used to create the `ClientRequest`.

We enforce this invariant via the type system, by converting
`ClientRequest`s to `InProgressClientRequest`s when they are received by
the background task. These conversions are implemented by
`ClientRequestReceiver`.

Changes:
* Revert `ClientRequest` so it uses a `oneshot::Sender`
* Add `InProgressClientRequest`, which is the same as `ClientRequest`,
  but has a `MustUseOneshotSender`
* `impl From<ClientRequest> for InProgressClientRequest`

* Add a new `ClientRequestReceiver` type that wraps a
  `mpsc::Receiver<ClientRequest>`
* `impl Stream<InProgressClientRequest> for ClientRequestReceiver`,
  converting the successful result of `inner.poll_next_unpin` into an
  `InProgressClientRequest`

* Replace `client_rx: mpsc::Receiver<ClientRequest>` in `Connection`
  with the new `ClientRequestReceiver` type
* `impl From<mpsc::Receiver<ClientRequest>> for ClientRequestReceiver`
2021-01-06 13:07:23 -08:00
teor df1b0c8d58 Defer a timeout fix until later 2021-01-06 13:07:23 -08:00
teor f8ff2e9c0b Add more sends before dropping ClientRequests
This fix also changes heartbeat behaviour in the following ways:
* if the queue is full, the connection is closed. Previously, the sender
  would wait until the queue had emptied
* if the queue flush fails, Zebra panics, because it can't send an error
  on the ClientRequest sender, so the invariant is broken
2021-01-06 13:07:23 -08:00
teor 3e711ccc8a Instrument some functions to try to locate the panic 2021-01-06 13:07:23 -08:00
teor fa29fca917 Panic when must-use senders are dropped before use
Add a MustUseOneshotSender, which panics if its inner sender is unused.
Callers must call `send()` on the MustUseOneshotSender, or ensure that
the sender is canceled.

Replaces an unreliable panic in `Client::call()` with a reliable panic
when a must-use sender is dropped.
2021-01-06 13:07:23 -08:00
Henry de Valence bfbc737b6c network: don't cancel heartbeat requests
The cancellation implementation changes made to the connection state machine
mean that if a response oneshot is dropped, the connection will avoid
cancelling the request.  So the heartbeat task does have to wait on the response.
2020-12-02 02:18:13 -05:00
Jane Lusby a91d0f0bb6
Include short sha in log messages and error urls (#1410)
As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs.

To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs.

Co-authored-by: teor <teor@riseup.net>
2020-12-01 12:13:20 -08:00
Henry de Valence 1d3892e1dc network: rename alias to BoxError
This is shorter and consistent with Tower (which is why we use it in the
first place).
2020-09-18 18:34:25 -07:00
teor 1f7af0a779 Update the inv message processing comment
Cleanup after PR #1028.
2020-09-09 15:29:38 -07:00
Jane Lusby 1b17691dda improve logging 2020-09-08 12:37:34 -07:00
Jane Lusby 81a3ad3a0d filter inventory advertisements correctly 2020-09-08 12:37:34 -07:00
teor b5c653ed93
Use ok_or for constants, rather than a redudant closure
* Use ok_or for constants in zebra-network
* Use ok_or for constants in zebra-consensus
2020-09-02 14:26:26 +10:00
Jane Lusby 88557ddd0a address more comments 2020-09-01 21:01:38 -04:00
Jane Lusby d933abeebf fix typo 2020-09-01 21:01:38 -04:00
Jane Lusby 96c8809348
Implement Inventory Tracking RFC (#963)
* Add .cargo to the gitignore file

* Implement Inventory Tracking RFC

* checkpoint

* wire together the inventory registry

* add comment documenting condition

* make inventory registry optional
2020-09-01 14:28:54 -07:00
Henry de Valence f91b91b6d8 network: clarify comment on Default for handshake::Builder
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-09-01 13:56:00 -07:00
Henry de Valence fddba7a336 network: remove handshake::Builder::with_addr
Use the listen_addr field already specified in the config.

Also, derive Clone for Handshake<S>.

Co-authored-by: Jane Lusby <jane@zfnd.org>
2020-09-01 13:56:00 -07:00
Henry de Valence a5b6f39850 network: don't leak our exact time skew in handshakes. 2020-09-01 13:56:00 -07:00
Henry de Valence 60a0b8c382 network: change Handshake::new to a Builder.
This allows more detailed control over the handshake parameters.
2020-09-01 13:56:00 -07:00
Henry de Valence 103b663c40 chain: rename BlockHeight to block::Height 2020-08-17 11:46:34 -07:00
Henry de Valence dad6340cd3 chain: move BlockHeight into block 2020-08-17 11:46:34 -07:00
Alfredo Garcia b41e33e066
Bytes read and bytes written metrics (#901)
* add bytes read and written metrics

* Apply suggestions from code review

Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* store address as string

* Apply suggestions from code review

Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>

* change addr to label

Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>

* remove newline

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
2020-08-14 15:50:26 -07:00
Henry de Valence 3d46ab746a
Clean up options in network config section. (#839)
Closes #536.

This removes:

- the user-agent (we can add a mechanism to specify extra BIP14 components later, if any users ask us for that feature);
- the EWMA parameters (these were put in the config just to avoid making a choice);
- the peer connection timeout (we can change the default value if anyone ever has a problem with it);
- the peer set request buffer size (setting this too low can make the application deadlock);

The new peer interval is left in.
2020-08-06 11:29:00 -07:00
teor da09965a5f feature: Get the current minimum protocol version 2020-07-23 15:52:18 +10:00
teor c9ee85c3b5 feature: Add network upgrade activation heights 2020-07-23 15:52:18 +10:00
Henry de Valence 0dc2d92ad8 network: ensure dropping a Client closes the connection.
This fixes a bug introduced when we added heartbeat support.  Recall that we
handle the Bitcoin connection state machine on a per-peer basis.  Each
connection has a task created from the `Connection` struct, and a `Client:
tower::Service` "frontend" that passes requests to it via a channel.  In the
`Connection` event loop, the connection checks whether the request channel has
been closed, indicating no further requests from the `Client`, in which case it
shuts itself down and cleans up resources.  This occurs when all of the senders
have been dropped.

However, this behavior broke when we introduced heartbeat support, because we
spawned an additional task to send heartbeat messages along the request
channel.  This meant that instead of having a single sender, dropped by the
`Client`, we have two senders, the `Client` and the "shadow client" task that
generates heartbeat messages.  This means that when the `Client` is dropped, we
still have a live sender and the connection is not closed.  To fix this, the
`Client` now uses a `oneshot` to shut down its corresponding heartbeat task.
This closes all senders.
2020-07-21 15:43:31 -07:00
teor b0cd920fad feature: Use the Heartwood protocol version in zebra-network 2020-07-21 10:46:07 -07:00
teor ab6d1f5ec8
fix: Use the default Zcash port in version messages (#661)
We don't provide our address yet, so the port should be ignored.

But let's use the correct port, to avoid carrying this bug forward into
working code.
2020-07-15 11:43:28 -07:00
Henry de Valence 217c25ef07 network: propagate tracing Spans through peer connection 2020-07-09 11:15:06 -07:00
Deirdre Connolly 05316dee21 Listen on 0.0.0.0, not 127.0.0.1
Turns out when your node faces the internet directly, it has to listen
to those addresses directly.
2020-06-19 03:46:09 -04:00
Jane Lusby 4a2d2a359c
add cargo fmt to ci (#403)
* add cargo fmt to ci

* rebase on main

* switch to stable

Co-authored-by: Jane Lusby <jane@zfnd.org>
2020-05-27 19:12:25 -07:00
George Tankersley df79fa75e0
Implement minimal version handshaking (#295)
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>
2020-04-13 18:33:15 -04:00