Commit Graph

56 Commits

Author SHA1 Message Date
Pili Guerra ec2e9ca276
Delete outdated `TODOs` refering to closed issues (#6732)
* ZIPs were updated to remove ambiguity, this was tracked in #1267.

* #2105 was fixed by #3039 and #2379 was closed by #3069

* #2230 was a duplicate of #2231 which was closed by #2511

* #3235 was obsoleted by #2156 which was fixed by #3505

* #1850 was fixed by #2944, #1851 was fixed by #2961 and #2902 was fixed by #2969

* We migrated to Rust 2021 edition in Jan 2022 with #3332

* #1631 was closed as not needed

* #338 was fixed by #3040 and #1162 was fixed by #3067

* #2079 was fixed by #2445

* #4794 was fixed by #6122

* #1678 stopped being an issue

* #3151 was fixed by #3934

* #3204 was closed as not needed

* #1213 was fixed by #4586

* #1774 was closed as not needed

* #4633 was closed as not needed

* Clarify behaviour of difficulty spacing

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour

Co-authored-by: teor <teor@riseup.net>

* Update comment to reflect implemented behaviour when retrying block downloads

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify when we might want to fix

Co-authored-by: teor <teor@riseup.net>

* Update `TODO` to remove closed issue and clarify what we might want to change in future

Co-authored-by: teor <teor@riseup.net>

* Clarify benefits of how we do block verification

Co-authored-by: teor <teor@riseup.net>

* Fix rustfmt errors

---------

Co-authored-by: teor <teor@riseup.net>
2023-05-23 03:33:14 +00:00
teor 3f13072c46
cleanup(rust): Simplify code using closure capture in Rust 2021 edition (#6737)
* Simplify code using closure capture in Rust 2021 edition

* clippy: manual_next_back and unit_arg

* cargo fmt --all
2023-05-22 19:22:13 +00:00
teor b0d9471214
fix(log): Stop logging peer IP addresses, to protect user privacy (#6662)
* Add a PeerSocketAddr type which hides its IP address, but shows the port

* Manually replace SocketAddr with PeerSocketAddr where needed

```sh
fastmod SocketAddr PeerSocketAddr zebra-network
```

* Add missing imports

* Make converting into PeerSocketAddr easier

* Fix some unused imports

* Add a canonical_peer_addr() function

* Fix connection handling for PeerSocketAddr

* Fix serialization for PeerSocketAddr

* Fix tests for PeerSocketAddr

* Remove some unused imports

* Fix address book listener handling

* Remove redundant imports and conversions

* Update outdated IPv4-mapped IPv6 address code

* Make addresses canonical when deserializing

* Stop logging peer addresses in RPC code

* Update zebrad tests with new PeerSocketAddr type

* Update zebra-rpc tests with new PeerSocketAddr type

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2023-05-14 15:06:07 +00:00
teor 09836d2800
fix(clippy): Put Rust format variables inline (#5783)
* cargo clippy --fix --all-features --all-targets

With rustc 1.67.0-nightly (234151769 2022-12-03)

* cargo fmt --all
2022-12-08 01:05:57 +00:00
teor 806dd0f24c
feat(net): return peer metadata from `connect_isolated` functions (#4870)
* Move version into a ConnectionInfo struct

* Add negotiated version to ConnectionInfo

Part of this change was generated using:
```
fastmod --fixed-strings ".version(" ".remote_version(" zebra-network
```

* Add the peer address to ConnectionInfo, add ConnectionInfo to Connection

* Return a Client instance from connect_isolated_* functions

This allows library users to access client ConnectionInfo.

* Add and improve debug formatting

* Add peer services and user agent to ConnectionInfo

* Export the Client type, and fix up a zebrad test

* Export types used by the public API

* Split VersionMessage into its own struct

* Use VersionMessage in ConnectionInfo

* Add a public API test for ConnectionInfo

* Wrap ConnectionInfo in an Arc

* Fix some doc links
2022-09-14 15:00:25 +00:00
Alfredo Garcia 97fb85dca9
lint(clippy): add `unwrap_in_result` lint (#4667)
* `unwrap_in_result` in zebra-chain crate

* `unwrap_in_result` in zebra-script crate

* `unwrap_in_result` in zebra-state crate

* `unwrap_in_result` in zebra-consensus crate

* `unwrap_in_result` in zebra-test crate

* `unwrap_in_result` in zebra-network crate

* `unwrap_in_result` in zebra-rpc crate

* `unwrap_in_result` in zebrad crate

* rustfmt

* revert `?` and add exceptions

* explain some panics better

* move some lint positions

* replace a panic with error

* Fix rustfmt?

Co-authored-by: teor <teor@riseup.net>
2022-06-28 06:22:07 +00:00
teor e5c92084ba
Group PeerSet fields into themes (#4618)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-06-16 23:12:35 +00:00
Marek cc75c3f5f9
fix(doc): Fix various doc warnings, part 3 (#4611)
* Fix the syntax of links in comments

* Fix a mistake in the docs

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>

* Remove unnecessary angle brackets from a link

* Revert the changes for links that serve as references

* Revert "Revert the changes for links that serve as references"

This reverts commit 8b091aa9fa.

* Remove `<` `>` from links that serve as references

This reverts commit 046ef25620.

* Don't use `<` `>` in normal comments

* Don't use `<` `>` for normal comments

* Revert changes for comments starting with `//`

* Fix some warnings produced by `cargo doc`

* Fix some rustdoc warnings

* Fix some warnings

* Refactor some changes

* Fix some rustdoc warnings

* Fix some rustdoc warnings

* Resolve various TODOs

Co-authored-by: teor <teor@riseup.net>

* Fix some unresolved links

* Allow links to private items

* Fix some unresolved links

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-06-15 03:57:19 +00:00
teor a4dd3b7396
4. Avoid repeated requests to peers after partial responses or errors (#3505)
* fix(network): split synthetic NotFoundRegistry from message NotFoundResponse

* docs(network): Improve `notfound` message documentation

* refactor(network): Rename MustUseOneshotSender to MustUseClientResponseSender

```
fastmod MustUseOneshotSender MustUseClientResponseSender zebra*
```

* docs(network): fix a comment typo

* refactor(network): remove generics from MustUseClientResponseSender

* refactor(network): add an inventory collector to Client, but don't use it yet

* feat(network): register missing peer responses as missing inventory

We register this missing inventory based on peer responses,
or connection errors or timeouts.

Inbound message inventory tracking requires peers to send `notfound` messages.
But `zcashd` skips `notfound` for blocks, so we can't rely on peer messages.
This missing inventory tracking works regardless of peer `notfound` messages.

* refactor(network): rename ResponseStatus to InventoryResponse

```sh
fastmod ResponseStatus InventoryResponse zebra*
```

* refactor(network): rename InventoryStatus::inner() to to_inner()

* fix(network): remove a redundant runtime.enter() in a test

* doc(network): the exact time used to filter outbound peers doesn't matter

* fix(network): handle block requests slightly more efficiently

* doc(network): fix a typo

* fmt(network): `cargo fmt` after rename ResponseStatus to InventoryResponse

* doc(test): clarify some test comments

* test(network): test synthetic notfound from connection errors and peer inventory routing

* test(network): improve inbound test diagnostics

* feat(network): add a proptest-impl feature to zebra-network

* feat(network): add a test-only connect_isolated_with_inbound function

* test(network): allow a response on the isolated peer test connection

* test(network): fix failures in test synthetic notfound

* test(network): Simplify SharedPeerError test assertions

* test(network): test synthetic notfound from partially successful requests

* test(network): MissingInventoryCollector ignores local NotFoundRegistry errors

* fix(network): decrease the inventory rotation interval

This stops us waiting 3-4 sync resets (4 minutes) before we retry a missing block.

Now we wait 1-2 sync resets (2 minutes), which is still a reasonable rate limit.
This should speed up syncing near the tip, and on testnet.

* fmt(network): cargo fmt --all

* cleanup(network): remove unnecessary allow(dead_code)

* cleanup(network): stop importing the whole sync module into tests

* doc(network): clarify syncer inventory retry constraint

* doc(network): add a TODO for a fix to ensure API behaviour remains consistent

* doc(network): fix a function doc typo

* doc(network): clarify how we handle peers that don't send `notfound`

* docs(network): clarify a test comment

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-02-15 01:44:33 +00:00
teor 9be13a4fb7
2. Route peer requests based on missing inventory (#3465)
* feat(network): send notfound messages to the inventory registry

* refactor(network): move the inventory filter into an async function

* feat(network): avoid routing requests to peers that are missing inventory

* test(network): advertised routing is independent of numeric address value

* test(network): peer set routes requests to peers not missing that inventory

* test(network): peer set fails requests if all ready peers are missing that inventory

* fix(clippy): needless-borrow in the peer set

* fix(lint): remove redundant trailing commas in macro calls

There is no clippy lint for this, maybe because some macros
are sensitive to trailing commas.
(But not the ones changed in this commit.)

* test(network): check the exact number of inventory peers

* doc(network): explain why we ignore inventory send failures

* docs(network): explain why a channel error is ignored
2022-02-08 01:16:41 +00:00
teor 98502d6181
1. Create an API for a missing inventory registry, but don't register any missing inventory yet (#3255)
* feat(network): create an API for registering missing inventory, but don't use it yet

* feat(constraint): implement AtLeastOne::iter_mut()

* refactor(network): add InventoryStatus::marker() method to remove associated data

* fix(network): prefer current inventory, and missing inventory statuses

* fix(network): if an inventory rotation is missed, delay future rotations

* fix(network): don't immediately rotate a new empty inventory registry

* fix(network): assert that only expected inventory variants are stored in the registry

* test(network): add a basic empty inventory registry test

Also adds an inventory registry update future,
which makes it easier to call from an async context.

* refactor(network): add a convenience API for new InventoryChanges

* feat(network): improve inventory registry logging and metrics

* test(network): make sure advertised and missing inventory is correctly registered

* test(network): check that missing inventory is preferred over advertised

* test(network): check that current inventory is preferred over previous

* test(network): check peer set routes inv requests to advertised peers

* refactor(network): make the InventoryChange API more flexible

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-02-06 23:05:52 +00:00
teor 469fa6b917
1. Fix some address crawler timing issues (#3293)
* Stop holding completed messages until the next inbound message

* Add more info to network message block download debug logs

* Simplify address metrics logs

* Try handling inbound messages as responses, then try as a new request

* Improve address book logging

* Fix a race between the first heartbeat and getaddr requests

* Temporarily reduce the getaddr fanout to 1

* Update metrics when exiting the Connection run loop

* Downgrade some debug logs to trace
2022-01-04 18:43:30 -05:00
Alfredo Garcia 5f772ebf65
Test for message broadcast to the right number of peers (#3284)
* create a test for message broadcast to peers

* remove dbg comment

* use `prop_assert_eq`

* use a random number of peers with constant version

* fix some docs

* check total peers is equal to active peers

* increase coverage

Co-authored-by: teor <teor@riseup.net>
2021-12-31 00:26:50 +00:00
teor 6814525a7a
Update async correctness docs and the async in Zebra RFC (#3243)
* Justify that the ErrorSlot Mutex is deadlock-safe

* Document cancellation safety in the async RFC

* Document task starvation in the async RFC

Co-authored-by: Marek <mail@marek.onl>
2021-12-21 07:10:15 +00:00
teor d0e6de8040
Avoid deadlocks in the address book mutex (#3244)
* Tweak crawler timings so peers are more likely to be available

* Tweak min peer connection interval so we try all peers

* Let other tasks run between fanouts, so we're more likely to choose different peers

* Let other tasks run between retries, so we're more likely to choose different peers

* Let other tasks run after peer crawler DemandDrop

This makes it more likely that peers will become ready.

* Spawn the address book updater on a blocking thread

* Spawn CandidateSet address book operations on blocking threads

* Replace the PeerSet address book with a metrics watch channel

* Fix comment

* Await spawned address book tasks

* Run the address book update tasks concurrently (except for the mutex)

* Explain an internal-only method better

* Fix a typo

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-12-20 00:44:43 +00:00
teor f176bb59a2
Stop ignoring some connection errors that could make the peer set hang (#3200)
* Drop peer services if their cancel handles are dropped

* Exit the client task if the heartbeat task exits

* Allow multiple errors on a connection without panicking

* Explain why we don't need to send an error when the request is cancelled

* Document connection fields

* Make sure connections don't hang due to spurious timer or channel usage

* Actually shut down the client when the heartbeat task exits

* Add tests for unready services

* Close all senders to peer when `Client` is dropped

* Return a Client error if the error slot has an error

* Add tests for peer Client service errors

* Make Client drop and error cleanups consistent

* Use a ClientDropped error when the Client struct is dropped

* Test channel and error state in peer Client tests

* Move all Connection cleanup into a single method

* Add tests for Connection

* fix typo in comment

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-12-15 14:52:44 +00:00
Janito Vaqueiro Ferreira Filho 7bc2f0ac27
Describe `PeerSet`s behavior at a network upgrade (#3181)
ZIP-201 describes how a Zcash node should behave when it reaches a
network upgrade activation height. Zebra doesn't implement all the
details specified there, so we need to document what it does implement
and what it doesn't and why.
2021-12-13 11:17:20 +01:00
Janito Vaqueiro Ferreira Filho 0ad89f2f41
Disconnect from outdated peers on network upgrade (#3108)
* Replace usage of `discover::Change` with a tuple

Remove the assumption that a `Remove` variant would never be created
with type changes that allow the compiler to guarantee that assumption.

* Add a `version` field to the `Client` type

Keep track of the peer's reported protocol version.

* Create `LoadTrackedClient` type

A `peer::Client` type wrapper that implements `Load`. This helps with
the creation of a client service that has extra peer information to be
accessed without having to send requests.

* Use `LoadTrackedClient` in `initialize`

Ensure that `PeerSet` receives `LoadTrackedClient`s so that it will be
able to query the peer's protocol version later on.

* Require `LoadTrackedClient` in `PeerSet`

Replace the generic type with a concrete `LoadTrackedClient` so that we
can query its version.

* Create `MinimumPeerVersion` helper type

A type to track the current minimum protocol version for connected
peers based on the current block height.

* Use `MinimumPeerVersion` in handshakes

Keep the code to obtain the current minimum peer protocol version in a
central place.

* Add a `MinimumPeerVersion` instance to `PeerSet`

Prepare it to be able to disconnect from outdated peers based on the
current minimum supported peer protocol version.

* Disconnect from ready services for outdated peers

When the minimum peer protocol version is detected to have changed
(because of a network upgrade), remove all ready services of peers that
became outdated.

* Cancel added unready services of outdated peers

Only add an unready service if it's for a peer that has a supported
protocol version. Otherwise, add it but drop the cancel handle so that
the `UnreadyService` can execute and detect that it was cancelled.

* Avoid adding ready services for outdated peers

If a service becomes ready but it's for a connection to an outdated
peer, drop it.

* Improve comment inside `crawl_and_dial`

Describe an edge case that is also handled but was not explicit.

Co-authored-by: teor <teor@riseup.net>

* Test if calculated minimum peer version is correct

Given an arbitrary best chain tip height, check that the calculated
minimum peer protocol version is the expected value.

* Test if minimum version changes with chain tip

Apply an arbitrary list of chain tip height updates and check that for
each update the minimum peer version is calculated correctly.

* Test minimum peer version changed reports

Simulate a series of best chain tip height updates, and check for
minimum peer version updates at least once between them. Changes should
only be reported once.

* Create a `MockedClientHandle` helper type

Used to create and then track a mock `Client` instance.

* Add `MinimumPeerVersion::with_mock_chain_tip`

An extension method useful for tests, that contains some shared
boilerplate code.

* Bias arbitrary `Version`s to be in valid range

Give a 50% chance for an arbitrary `Version` to be in the range of
previously used values the Zcash network.

* Create a `PeerVersions` helper type

Helps with the creation of mocked client services with arbitrary
protocol versions.

* Create a `PeerSetGuard` helper type

An auxiliary type to a `PeerSet` instance created for testing. It keeps
track of any dummy endpoints of channels created and passed to the
`PeerSet` instance.

* Create a `PeerSetBuilder` helper type

Helps to reduce the code when preparing a `PeerSet` test instance.

* Test if outdated peers are rejected by `PeerSet`

Simulate a set of discovered peers being sent to the `PeerSet`. Ensure
that only up-to-date peers are kept by the `PeerSet` and that outdated
peers are dropped.

* Create `BlockHeightPairAcrossNetworkUpgrades` type

A helper type that allows the creation of arbitrary block height pairs,
where one value is before and the other is at or after the activation
height of an arbitrary network upgrade.

* Test if peers are dropped as they become outdated

Simulate a network upgrade, and check that peers that become outdated
are dropped by the `PeerSet`.

* Remove dbg! macros

Co-authored-by: teor <teor@riseup.net>
2021-12-09 02:54:29 +00:00
teor 4d608d3224
Stop doing thousands of time checks each time we connect to a peer (#3106)
* Stop checking the entire AddressBook for each connection attempt

* Stop redundant peer time checks within the address book

* Stop calling `Instant::now` 3 times for each address book update

* Only get the time once each time an address book method is called

* Update outdated comment

* Use an OrderedMap to efficiently store address book peers

* Add address book order tests
2021-12-03 15:09:43 -03:00
teor c85ea18b43
Fix slow Zebra startup times, to reduce CI failures (#3104)
* Tweak a log message

* Only retry failed DNS once, then use the other DNS responses

* Limit broadcasts to half the peers

* Use a longer minimum interval for GetAddr requests

* Reduce the syncer and mempool crawler fanouts

* Stop resetting the mempool twice when it starts up

This spawns two crawlers, which send two fanouts,
so it can use up a lot of peers.

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-11-30 21:04:32 +00:00
teor f6abb15778
Security: Stop routing inventory requests by peer address (#3090)
* Rewrite PeerSet comments to split long sentences

* Replace peer set integer indexes with address-based indexes

Also improve documentation and logging.

* Security: Stop using peer addresses to choose inventory routing order

* Minor doc and code cleanups

* Stop re-using a drained HashSet

* Replace used `_cancel` with `cancel`

* Reword a comment

* Replace cloned with copied
2021-11-24 10:31:42 +10:00
teor b39f4ca5aa
Shut down channels and tasks on PeerSet Drop (#3078)
* Shut down channels and tasks on PeerSet Drop

* Document all the PeerSet fields

* Close the peer set background task handle on shutdown

* Receive background tasks during shutdown

Also, split receiving and polling background tasks into separate methods.
2021-11-22 22:29:34 -03:00
teor 3fc049e2eb
Implement graceful shutdown for the peer set (#3071)
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-11-18 13:28:25 +00:00
teor f26a60b801
Limit the number of inbound peer connections (#2961)
* Limit open inbound connections based on the config

* Log inbound connection errors at debug level

* Test inbound connection limits

* Use clone directly in function call argument lists

* Remove an outdated comment

* Update tests to use an unbounded channel rather than mem::forget

And rename some variables.

* Use a lower limit in a slow test and require that it is exceeded
2021-10-28 01:49:31 +00:00
teor 424edfa4d9
Improve documentation and types in the PeerSet (#2925)
* Replace some unit tuples with named unit structs

This helps distinguish generic channels and make them type-safe.

Also tidy imports and documentation in `peer_set::set`.

* Link to the tower balance crate from docs

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-10-22 01:26:04 +00:00
teor c608260256
Support witnessed transaction IDs in zebra-network requests and responses (#2638)
* Rename internal network requests for wide transaction IDs

fastmod TransactionsByHash TransactionsById zebra*
fastmod AdvertiseTransactions AdvertiseTransactionIds zebra*
fastmod MempoolTransactions MempoolTransactionIds zebra*
fastmod TransactionHashes TransactionIds zebra*

* Update network transaction request/response comments

* Rename a transaction hash method for wide transaction IDs

fastmod transaction_hashes transaction_ids zebra-network

* Add UnminedTxId methods and conversions for InventoryHash

* Map WtxIds to unmined transaction network messages

Also, use UnminedTxId and UnminedTx in:
* Zebra's internal request and response format, and
* external Zcash network protocol messages.

* Enable WtxId mempool inventory tracking for peers

* Further clarify transaction IDs

* Use Witnessed rather than Wide for transaction IDs

And rename narrow to legacy when it only applies to v1-v4 transactions.
Otherwise, rename it to mined ID.

* Rename a missed binding
* Remove an incorrectly named binding

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
2021-08-18 22:55:24 +00:00
teor a8a0d6450c Security: stop gossiping temporary inbound remote addresses to peers
- stop putting inbound addresses in the address book
- drop address book entries that can't be used for outbound connections
  - distinguish between temporary inbound and permanent outbound peer
    addresses
  - also create variants to handle proxy connections
    (but don't use them yet)
  - avoid tracking connection state for isolated connections
- document security constraints for the address book and peer set
2021-05-14 23:45:42 +10:00
teor 0203d1475a Refactor and document correctness for std::sync::Mutex<AddressBook> 2021-04-21 17:14:47 -04:00
teor 83b88f5b7a
Merge pull request #1972 from ZcashFoundation/peer-set-demand-deadlock-doc
Document peer set deadlock resistance
2021-04-01 22:50:17 -04:00
teor 306fa88214 Document the correctness of Poll::Pending wakeups 2021-03-27 08:55:49 -04:00
teor 5a30268d7a Log address metrics when the peer set has no ready peers 2021-03-17 10:47:04 +10:00
Jack Grigg e51f33a4b9 Use interoperable names for common metrics
These names match the equivalent metrics in zcashd, enabling common
metrics to be collected across both node types.
2021-03-17 09:38:07 +10:00
teor 86169f6412
Update PeerSet metrics after every change (#1727) 2021-02-18 07:06:59 +10:00
Jane Lusby 15698245e1
Deduplicate metrics dependencies (#1561)
## Motivation

This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`.

## Solution

This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up.

## Related Issues

closes https://github.com/ZcashFoundation/zebra/issues/1349

## Follow Up Work

I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible.

- https://github.com/ZcashFoundation/zebra/issues/1582
Co-authored-by: teor <teor@riseup.net>
2021-01-12 12:28:56 +10:00
teor 34518525a5 Improve peer set logging hints
Delete hints about configuring peers.
Delete hint for typical "no ready peers" behaviour.
2020-12-01 21:37:15 -08:00
teor 4d5ea4897c Log peer set ready and unready peers
* warn: if there are no peers at all
* info: if there are no ready peers
* trace: the number of ready and unready peers for every request

Log at most one warn or info log per minute, to avoid flooding the
terminal with log lines. Suppress warn and info logs for the first
minute, while the peer set is starting up.
2020-12-01 11:00:21 -05:00
teor 8d6ac8eece Placate clippy 2020-11-24 20:03:21 +10:00
Henry de Valence d90e709ce1 network: tidy peer set implementation
- rename functions more descriptively
- create a common `take_ready_service` function
- organize poll_ functions separately
2020-11-24 20:03:21 +10:00
Henry de Valence f36a4800b2 network: fix invariant violation in peer set
Closes #1183.

The peer set maintains a preselected ready service that it can use to
perform power-of-two-choices (p2c) routing of requests.  Ready services
are stored by key (socket address) in an `IndexMap`, and the preselected
service is represented by an `Option<usize>` indexing that map.  This
means that whenever the set of ready services changes (e.g., a service
is removed from the peer set, or a service is taken to be used to
process a request), the preselected index is invalidated.  The original
P2C-only implementation maintained this invariant but did not document
it.

The change to inventory-based routing introduced a bug by failing to
maintain this invariant and appropriately invalidate the preselected
index.  However, this was only noticeable approximately 1/N of the time
on the next request after an inventory-directed request, so the bug
occurred infrequently.  Luckily, the use of `.expect` caused the bug to
be an immediate panic, making it possible to identify by inspecting all
uses of the ready service map.
2020-11-24 20:03:21 +10:00
Henry de Valence 6dd7318d3b deps: use Tower 0.4 from git instead of 0.3.1.
This addresses at least three pain points:

- we were affected by bugs that were already fixed in git, but not in
  the released crate;
- we can use service combinators to transform requests and responses;
- we can use the hedge middleware.

The version in git is still marked as 0.3.1 but these changes will be
part of tower 0.4: https://github.com/tower-rs/tower/issues/431
2020-09-21 14:16:56 -07:00
Henry de Valence 1d3892e1dc network: rename alias to BoxError
This is shorter and consistent with Tower (which is why we use it in the
first place).
2020-09-18 18:34:25 -07:00
Henry de Valence 3f150eb16e
network: implement transaction request handling. (#1016)
This commit makes several related changes to the network code:

- adds a `TransactionsByHash(HashSet<transaction::Hash>)` request and
  `Transactions(Vec<Arc<Transaction>>)` response pair that allows
  fetching transactions from a remote peer;

- adds a `PushTransaction(Arc<Transaction>)` request that pushes an
  unsolicited transaction to a remote peer;

- adds an `AdvertiseTransactions(HashSet<transaction::Hash>)` request
  that advertises transactions by hash to a remote peer;

- adds an `AdvertiseBlock(block::Hash)` request that advertises a block
  by hash to a remote peer;

Then, it modifies the connection state machine so that outbound
requests to remote peers are handled properly:

- `TransactionsByHash` generates a `getdata` message and collects the
  results, like the existing `BlocksByHash` request.

- `PushTransaction` generates a `tx` message, and returns `Nil` immediately.

- `AdvertiseTransactions` and `AdvertiseBlock` generate an `inv`
  message, and return `Nil` immediately.

Next, it modifies the connection state machine so that messages
from remote peers generate requests to the inbound service:

- `getdata` messages generate `BlocksByHash` or `TransactionsByHash`
  requests, depending on the content of the message;

- `tx` messages generate `PushTransaction` requests;

- `inv` messages generate `AdvertiseBlock` or `AdvertiseTransactions`
  requests.

Finally, it refactors the request routing logic for the peer set to
handle advertisement messages, providing three routing methods:

- `route_p2c`, which uses p2c as normal (default);
- `route_inv`, which uses the inventory registry and falls back to p2c
  (used for `BlocksByHash` or `TransactionsByHash`);
- `route_all`, which broadcasts a request to all ready peers (used for
  `AdvertiseBlock` and `AdvertiseTransactions`).
2020-09-08 10:16:29 -07:00
Jane Lusby 96c8809348
Implement Inventory Tracking RFC (#963)
* Add .cargo to the gitignore file

* Implement Inventory Tracking RFC

* checkpoint

* wire together the inventory registry

* add comment documenting condition

* make inventory registry optional
2020-09-01 14:28:54 -07:00
Jane Lusby 685bdaf2df don't require absense of cancel handles
Prior to this change, we required that services that are canceled do not
have a cancel handle in the `cancel_handles` list, based on the
assumption that the handle must have been removed in the process of
canceling this service.

This doesn't holding up though, because it is currently possible for us
to have the same peer connect to us multiple times, the second connect
removes the cancel handle of the original connect and inserts it's own
cancel handle in its place. In this scenario, when the first service is
polled for readiness it will see that it has been canceled and go to
clean itself up, but when it asserts that it doesn't have a cancel
handle it will see the cancel handle of the second connect event, which
uses the same key as the first connect, and fail its debug assertion.

This change removes that debug assert on the assumption that it is okay
for a peer to connect multiple times consecutively, and that the correct
behavior in that case is to just cancel the first connection and
continue as normal.
2020-06-16 13:42:31 -07:00
Jane Lusby 431f194c0f
propagate errors out of zebra_network::init (#435)
Prior to this change, the service returned by `zebra_network::init` would spawn background tasks that could silently fail, causing unexpected errors in the zebra_network service.

This change modifies the `PeerSet` that backs `zebra_network::init` to store all of the `JoinHandle`s for each background task it depends on. The `PeerSet` then checks this set of futures to see if any of them have exited with an error or a panic, and if they have it returns the error as part of `poll_ready`.
2020-06-09 12:24:28 -07:00
Jane Lusby 8c178c3ee4
fix panic in seed subcommand (#401)
Co-authored-by: Jane Lusby <jane@zfnd.org>

Prior to this change, the seed subcommand would consistently encounter a panic in one of the background tasks, but would continue running after the panic. This is indicative of two bugs. 

First, zebrad was not configured to treat panics as non recoverable and instead defaulted to the tokio defaults, which are to catch panics in tasks and return them via the join handle if available, or to print them if the join handle has been discarded. This is likely a poor fit for zebrad as an application, we do not need to maximize uptime or minimize the extent of an outage should one of our tasks / services start encountering panics. Ignoring a panic increases our risk of observing invalid state, causing all sorts of wild and bad bugs. To deal with this we've switched the default panic behavior from `unwind` to `abort`. This makes panics fail immediately and take down the entire application, regardless of where they occur, which is consistent with our treatment of misbehaving connections.

The second bug is the panic itself. This was triggered by a duplicate entry in the initial_peers set. To fix this we've switched the storage for the peers from a `Vec` to a `HashSet`, which has similar properties but guarantees uniqueness of its keys.
2020-05-27 17:40:12 -07:00
Henry de Valence 3ed75cb626 Tweak peer set metrics.
- Add a total peers metric to prevent races between measurements of
  ready/unready peers (which can cause the sum to be wrong).
- Add an outbound request counter.
2020-02-21 06:48:25 -05:00
Henry de Valence 75d3d44fb3 Metrics MVP: add two metrics and export them to Prometheus.
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2020-02-14 20:14:05 -05:00
Henry de Valence 8d58dd804f Note that tracing causes clippy false positives
Thanks @hawkw for pointing this out.
2020-02-05 12:42:32 -08:00
Henry de Valence f04f4f0b98 Apply clippy fixes 2020-02-05 12:42:32 -08:00