zebra

Commit Graph

Author	SHA1	Message	Date
teor	6814525a7a	Update async correctness docs and the async in Zebra RFC (#3243) * Justify that the ErrorSlot Mutex is deadlock-safe * Document cancellation safety in the async RFC * Document task starvation in the async RFC Co-authored-by: Marek <mail@marek.onl>	2021-12-21 07:10:15 +00:00
teor	d0e6de8040	Avoid deadlocks in the address book mutex (#3244) * Tweak crawler timings so peers are more likely to be available * Tweak min peer connection interval so we try all peers * Let other tasks run between fanouts, so we're more likely to choose different peers * Let other tasks run between retries, so we're more likely to choose different peers * Let other tasks run after peer crawler DemandDrop This makes it more likely that peers will become ready. * Spawn the address book updater on a blocking thread * Spawn CandidateSet address book operations on blocking threads * Replace the PeerSet address book with a metrics watch channel * Fix comment * Await spawned address book tasks * Run the address book update tasks concurrently (except for the mutex) * Explain an internal-only method better * Fix a typo Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com> Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-12-20 00:44:43 +00:00
teor	f176bb59a2	Stop ignoring some connection errors that could make the peer set hang (#3200) * Drop peer services if their cancel handles are dropped * Exit the client task if the heartbeat task exits * Allow multiple errors on a connection without panicking * Explain why we don't need to send an error when the request is cancelled * Document connection fields * Make sure connections don't hang due to spurious timer or channel usage * Actually shut down the client when the heartbeat task exits * Add tests for unready services * Close all senders to peer when `Client` is dropped * Return a Client error if the error slot has an error * Add tests for peer Client service errors * Make Client drop and error cleanups consistent * Use a ClientDropped error when the Client struct is dropped * Test channel and error state in peer Client tests * Move all Connection cleanup into a single method * Add tests for Connection * fix typo in comment Co-authored-by: Conrado Gouvea <conrado@zfnd.org> Co-authored-by: Conrado Gouvea <conrado@zfnd.org> Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-12-15 14:52:44 +00:00
Janito Vaqueiro Ferreira Filho	7bc2f0ac27	Describe `PeerSet`s behavior at a network upgrade (#3181) ZIP-201 describes how a Zcash node should behave when it reaches a network upgrade activation height. Zebra doesn't implement all the details specified there, so we need to document what it does implement and what it doesn't and why.	2021-12-13 11:17:20 +01:00
Janito Vaqueiro Ferreira Filho	0ad89f2f41	Disconnect from outdated peers on network upgrade (#3108) * Replace usage of `discover::Change` with a tuple Remove the assumption that a `Remove` variant would never be created with type changes that allow the compiler to guarantee that assumption. * Add a `version` field to the `Client` type Keep track of the peer's reported protocol version. * Create `LoadTrackedClient` type A `peer::Client` type wrapper that implements `Load`. This helps with the creation of a client service that has extra peer information to be accessed without having to send requests. * Use `LoadTrackedClient` in `initialize` Ensure that `PeerSet` receives `LoadTrackedClient`s so that it will be able to query the peer's protocol version later on. * Require `LoadTrackedClient` in `PeerSet` Replace the generic type with a concrete `LoadTrackedClient` so that we can query its version. * Create `MinimumPeerVersion` helper type A type to track the current minimum protocol version for connected peers based on the current block height. * Use `MinimumPeerVersion` in handshakes Keep the code to obtain the current minimum peer protocol version in a central place. * Add a `MinimumPeerVersion` instance to `PeerSet` Prepare it to be able to disconnect from outdated peers based on the current minimum supported peer protocol version. * Disconnect from ready services for outdated peers When the minimum peer protocol version is detected to have changed (because of a network upgrade), remove all ready services of peers that became outdated. * Cancel added unready services of outdated peers Only add an unready service if it's for a peer that has a supported protocol version. Otherwise, add it but drop the cancel handle so that the `UnreadyService` can execute and detect that it was cancelled. * Avoid adding ready services for outdated peers If a service becomes ready but it's for a connection to an outdated peer, drop it. * Improve comment inside `crawl_and_dial` Describe an edge case that is also handled but was not explicit. Co-authored-by: teor <teor@riseup.net> * Test if calculated minimum peer version is correct Given an arbitrary best chain tip height, check that the calculated minimum peer protocol version is the expected value. * Test if minimum version changes with chain tip Apply an arbitrary list of chain tip height updates and check that for each update the minimum peer version is calculated correctly. * Test minimum peer version changed reports Simulate a series of best chain tip height updates, and check for minimum peer version updates at least once between them. Changes should only be reported once. * Create a `MockedClientHandle` helper type Used to create and then track a mock `Client` instance. * Add `MinimumPeerVersion::with_mock_chain_tip` An extension method useful for tests, that contains some shared boilerplate code. * Bias arbitrary `Version`s to be in valid range Give a 50% chance for an arbitrary `Version` to be in the range of previously used values the Zcash network. * Create a `PeerVersions` helper type Helps with the creation of mocked client services with arbitrary protocol versions. * Create a `PeerSetGuard` helper type An auxiliary type to a `PeerSet` instance created for testing. It keeps track of any dummy endpoints of channels created and passed to the `PeerSet` instance. * Create a `PeerSetBuilder` helper type Helps to reduce the code when preparing a `PeerSet` test instance. * Test if outdated peers are rejected by `PeerSet` Simulate a set of discovered peers being sent to the `PeerSet`. Ensure that only up-to-date peers are kept by the `PeerSet` and that outdated peers are dropped. * Create `BlockHeightPairAcrossNetworkUpgrades` type A helper type that allows the creation of arbitrary block height pairs, where one value is before and the other is at or after the activation height of an arbitrary network upgrade. * Test if peers are dropped as they become outdated Simulate a network upgrade, and check that peers that become outdated are dropped by the `PeerSet`. * Remove dbg! macros Co-authored-by: teor <teor@riseup.net>	2021-12-09 02:54:29 +00:00
teor	4d608d3224	Stop doing thousands of time checks each time we connect to a peer (#3106) * Stop checking the entire AddressBook for each connection attempt * Stop redundant peer time checks within the address book * Stop calling `Instant::now` 3 times for each address book update * Only get the time once each time an address book method is called * Update outdated comment * Use an OrderedMap to efficiently store address book peers * Add address book order tests	2021-12-03 15:09:43 -03:00
teor	c85ea18b43	Fix slow Zebra startup times, to reduce CI failures (#3104) * Tweak a log message * Only retry failed DNS once, then use the other DNS responses * Limit broadcasts to half the peers * Use a longer minimum interval for GetAddr requests * Reduce the syncer and mempool crawler fanouts * Stop resetting the mempool twice when it starts up This spawns two crawlers, which send two fanouts, so it can use up a lot of peers. Co-authored-by: Conrado Gouvea <conrado@zfnd.org> Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-11-30 21:04:32 +00:00
teor	f6abb15778	Security: Stop routing inventory requests by peer address (#3090) * Rewrite PeerSet comments to split long sentences * Replace peer set integer indexes with address-based indexes Also improve documentation and logging. * Security: Stop using peer addresses to choose inventory routing order * Minor doc and code cleanups * Stop re-using a drained HashSet * Replace used `_cancel` with `cancel` * Reword a comment * Replace cloned with copied	2021-11-24 10:31:42 +10:00
teor	b39f4ca5aa	Shut down channels and tasks on PeerSet Drop (#3078) * Shut down channels and tasks on PeerSet Drop * Document all the PeerSet fields * Close the peer set background task handle on shutdown * Receive background tasks during shutdown Also, split receiving and polling background tasks into separate methods.	2021-11-22 22:29:34 -03:00
teor	3fc049e2eb	Implement graceful shutdown for the peer set (#3071) Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-11-18 13:28:25 +00:00
teor	f26a60b801	Limit the number of inbound peer connections (#2961) * Limit open inbound connections based on the config * Log inbound connection errors at debug level * Test inbound connection limits * Use clone directly in function call argument lists * Remove an outdated comment * Update tests to use an unbounded channel rather than mem::forget And rename some variables. * Use a lower limit in a slow test and require that it is exceeded	2021-10-28 01:49:31 +00:00
teor	424edfa4d9	Improve documentation and types in the PeerSet (#2925) * Replace some unit tuples with named unit structs This helps distinguish generic channels and make them type-safe. Also tidy imports and documentation in `peer_set::set`. * Link to the tower balance crate from docs Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com> Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-10-22 01:26:04 +00:00
teor	c608260256	Support witnessed transaction IDs in zebra-network requests and responses (#2638) * Rename internal network requests for wide transaction IDs fastmod TransactionsByHash TransactionsById zebra* fastmod AdvertiseTransactions AdvertiseTransactionIds zebra* fastmod MempoolTransactions MempoolTransactionIds zebra* fastmod TransactionHashes TransactionIds zebra* * Update network transaction request/response comments * Rename a transaction hash method for wide transaction IDs fastmod transaction_hashes transaction_ids zebra-network * Add UnminedTxId methods and conversions for InventoryHash * Map WtxIds to unmined transaction network messages Also, use UnminedTxId and UnminedTx in: * Zebra's internal request and response format, and * external Zcash network protocol messages. * Enable WtxId mempool inventory tracking for peers * Further clarify transaction IDs * Use Witnessed rather than Wide for transaction IDs And rename narrow to legacy when it only applies to v1-v4 transactions. Otherwise, rename it to mined ID. * Rename a missed binding * Remove an incorrectly named binding Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>	2021-08-18 22:55:24 +00:00
teor	a8a0d6450c	Security: stop gossiping temporary inbound remote addresses to peers - stop putting inbound addresses in the address book - drop address book entries that can't be used for outbound connections - distinguish between temporary inbound and permanent outbound peer addresses - also create variants to handle proxy connections (but don't use them yet) - avoid tracking connection state for isolated connections - document security constraints for the address book and peer set	2021-05-14 23:45:42 +10:00
teor	0203d1475a	Refactor and document correctness for std::sync::Mutex<AddressBook>	2021-04-21 17:14:47 -04:00
teor	83b88f5b7a	Merge pull request #1972 from ZcashFoundation/peer-set-demand-deadlock-doc Document peer set deadlock resistance	2021-04-01 22:50:17 -04:00
teor	306fa88214	Document the correctness of Poll::Pending wakeups	2021-03-27 08:55:49 -04:00
teor	5a30268d7a	Log address metrics when the peer set has no ready peers	2021-03-17 10:47:04 +10:00
Jack Grigg	e51f33a4b9	Use interoperable names for common metrics These names match the equivalent metrics in zcashd, enabling common metrics to be collected across both node types.	2021-03-17 09:38:07 +10:00
teor	86169f6412	Update PeerSet metrics after every change (#1727)	2021-02-18 07:06:59 +10:00
Jane Lusby	15698245e1	Deduplicate metrics dependencies (#1561) ## Motivation This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`. ## Solution This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up. ## Related Issues closes https://github.com/ZcashFoundation/zebra/issues/1349 ## Follow Up Work I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible. - https://github.com/ZcashFoundation/zebra/issues/1582 Co-authored-by: teor <teor@riseup.net>	2021-01-12 12:28:56 +10:00
teor	34518525a5	Improve peer set logging hints Delete hints about configuring peers. Delete hint for typical "no ready peers" behaviour.	2020-12-01 21:37:15 -08:00
teor	4d5ea4897c	Log peer set ready and unready peers * warn: if there are no peers at all * info: if there are no ready peers * trace: the number of ready and unready peers for every request Log at most one warn or info log per minute, to avoid flooding the terminal with log lines. Suppress warn and info logs for the first minute, while the peer set is starting up.	2020-12-01 11:00:21 -05:00
teor	8d6ac8eece	Placate clippy	2020-11-24 20:03:21 +10:00
Henry de Valence	d90e709ce1	network: tidy peer set implementation - rename functions more descriptively - create a common `take_ready_service` function - organize poll_ functions separately	2020-11-24 20:03:21 +10:00
Henry de Valence	f36a4800b2	network: fix invariant violation in peer set Closes #1183. The peer set maintains a preselected ready service that it can use to perform power-of-two-choices (p2c) routing of requests. Ready services are stored by key (socket address) in an `IndexMap`, and the preselected service is represented by an `Option<usize>` indexing that map. This means that whenever the set of ready services changes (e.g., a service is removed from the peer set, or a service is taken to be used to process a request), the preselected index is invalidated. The original P2C-only implementation maintained this invariant but did not document it. The change to inventory-based routing introduced a bug by failing to maintain this invariant and appropriately invalidate the preselected index. However, this was only noticeable approximately 1/N of the time on the next request after an inventory-directed request, so the bug occurred infrequently. Luckily, the use of `.expect` caused the bug to be an immediate panic, making it possible to identify by inspecting all uses of the ready service map.	2020-11-24 20:03:21 +10:00
Henry de Valence	6dd7318d3b	deps: use Tower 0.4 from git instead of 0.3.1. This addresses at least three pain points: - we were affected by bugs that were already fixed in git, but not in the released crate; - we can use service combinators to transform requests and responses; - we can use the hedge middleware. The version in git is still marked as 0.3.1 but these changes will be part of tower 0.4: https://github.com/tower-rs/tower/issues/431	2020-09-21 14:16:56 -07:00
Henry de Valence	1d3892e1dc	network: rename alias to BoxError This is shorter and consistent with Tower (which is why we use it in the first place).	2020-09-18 18:34:25 -07:00
Henry de Valence	3f150eb16e	network: implement transaction request handling. (#1016) This commit makes several related changes to the network code: - adds a `TransactionsByHash(HashSet<transaction::Hash>)` request and `Transactions(Vec<Arc<Transaction>>)` response pair that allows fetching transactions from a remote peer; - adds a `PushTransaction(Arc<Transaction>)` request that pushes an unsolicited transaction to a remote peer; - adds an `AdvertiseTransactions(HashSet<transaction::Hash>)` request that advertises transactions by hash to a remote peer; - adds an `AdvertiseBlock(block::Hash)` request that advertises a block by hash to a remote peer; Then, it modifies the connection state machine so that outbound requests to remote peers are handled properly: - `TransactionsByHash` generates a `getdata` message and collects the results, like the existing `BlocksByHash` request. - `PushTransaction` generates a `tx` message, and returns `Nil` immediately. - `AdvertiseTransactions` and `AdvertiseBlock` generate an `inv` message, and return `Nil` immediately. Next, it modifies the connection state machine so that messages from remote peers generate requests to the inbound service: - `getdata` messages generate `BlocksByHash` or `TransactionsByHash` requests, depending on the content of the message; - `tx` messages generate `PushTransaction` requests; - `inv` messages generate `AdvertiseBlock` or `AdvertiseTransactions` requests. Finally, it refactors the request routing logic for the peer set to handle advertisement messages, providing three routing methods: - `route_p2c`, which uses p2c as normal (default); - `route_inv`, which uses the inventory registry and falls back to p2c (used for `BlocksByHash` or `TransactionsByHash`); - `route_all`, which broadcasts a request to all ready peers (used for `AdvertiseBlock` and `AdvertiseTransactions`).	2020-09-08 10:16:29 -07:00
Jane Lusby	96c8809348	Implement Inventory Tracking RFC (#963) * Add .cargo to the gitignore file * Implement Inventory Tracking RFC * checkpoint * wire together the inventory registry * add comment documenting condition * make inventory registry optional	2020-09-01 14:28:54 -07:00
Jane Lusby	685bdaf2df	don't require absence of cancel handles Prior to this change, we required that services that are canceled do not have a cancel handle in the `cancel_handles` list, based on the assumption that the handle must have been removed in the process of canceling this service. This doesn't hold up though, because it is currently possible for us to have the same peer connect to us multiple times, the second connect removes the cancel handle of the original connect and inserts its own cancel handle in its place. In this scenario, when the first service is polled for readiness it will see that it has been canceled and go to clean itself up, but when it asserts that it doesn't have a cancel handle it will see the cancel handle of the second connect event, which uses the same key as the first connect, and fail its debug assertion. This change removes that debug assert on the assumption that it is okay for a peer to connect multiple times consecutively, and that the correct behavior in that case is to just cancel the first connection and continue as normal.	2020-06-16 13:42:31 -07:00
Jane Lusby	431f194c0f	propagate errors out of zebra_network::init (#435) Prior to this change, the service returned by `zebra_network::init` would spawn background tasks that could silently fail, causing unexpected errors in the zebra_network service. This change modifies the `PeerSet` that backs `zebra_network::init` to store all of the `JoinHandle`s for each background task it depends on. The `PeerSet` then checks this set of futures to see if any of them have exited with an error or a panic, and if they have it returns the error as part of `poll_ready`.	2020-06-09 12:24:28 -07:00
Jane Lusby	8c178c3ee4	fix panic in seed subcommand (#401) Co-authored-by: Jane Lusby <jane@zfnd.org> Prior to this change, the seed subcommand would consistently encounter a panic in one of the background tasks, but would continue running after the panic. This is indicative of two bugs. First, zebrad was not configured to treat panics as non recoverable and instead defaulted to the tokio defaults, which are to catch panics in tasks and return them via the join handle if available, or to print them if the join handle has been discarded. This is likely a poor fit for zebrad as an application, we do not need to maximize uptime or minimize the extent of an outage should one of our tasks / services start encountering panics. Ignoring a panic increases our risk of observing invalid state, causing all sorts of wild and bad bugs. To deal with this we've switched the default panic behavior from `unwind` to `abort`. This makes panics fail immediately and take down the entire application, regardless of where they occur, which is consistent with our treatment of misbehaving connections. The second bug is the panic itself. This was triggered by a duplicate entry in the initial_peers set. To fix this we've switched the storage for the peers from a `Vec` to a `HashSet`, which has similar properties but guarantees uniqueness of its keys.	2020-05-27 17:40:12 -07:00
Henry de Valence	3ed75cb626	Tweak peer set metrics. - Add a total peers metric to prevent races between measurements of ready/unready peers (which can cause the sum to be wrong). - Add an outbound request counter.	2020-02-21 06:48:25 -05:00
Henry de Valence	75d3d44fb3	Metrics MVP: add two metrics and export them to Prometheus. Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>	2020-02-14 20:14:05 -05:00
Henry de Valence	8d58dd804f	Note that tracing causes clippy false positives Thanks @hawkw for pointing this out.	2020-02-05 12:42:32 -08:00
Henry de Valence	f04f4f0b98	Apply clippy fixes	2020-02-05 12:42:32 -08:00
Henry de Valence	2965187b91	Upgrade tokio, futures, hyper to released versions.	2019-12-13 17:42:15 -05:00
Henry de Valence	c3ec235a5b	Suppress unused import warnings.	2019-10-22 19:06:08 -07:00
Henry de Valence	ed2ee9d42f	Add a PeerConnector wrapper around PeerHandshake	2019-10-22 19:06:08 -07:00
Henry de Valence	b1832ce593	Initial work to add a crawl-and-dial task. This responds to peerset demand by connecting to additional peers. Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>	2019-10-22 19:06:08 -07:00
Henry de Valence	5847b490da	Move PeerSet setup logic into a peer_set::init()	2019-10-18 16:11:01 -07:00
Henry de Valence	ae1a164ff8	Beginning of peerset implementation. (#62) * Don't expose submodules of zebra_network::peer. * PeerSet, PeerDiscover stubs. Co-authored-by: Deirdre Connolly <deirdre@zfnd.org> * Initial work on PeerSet. This is adapted from the MIT-licensed tower-balance implementation. * Use PeerSet in the connect stub.	2019-10-10 18:15:24 -07:00

43 Commits