zebra/zebra-network/src/constants.rs

//! Definitions of Zebra network constants, including:
//! - network protocol versions,
//! - network protocol user agents,
//! - peer address limits,
//! - peer connection limits, and
//! - peer connection timeouts.
use std::{collections::HashMap, time::Duration};

use lazy_static::lazy_static;
use regex::Regex;

// XXX should these constants be split into protocol also?
use crate::protocol::external::types::*;
use zebra_chain::{
    parameters::{
        Network::{self, *},
        NetworkUpgrade::*,
    },
    serialization::Duration32,
};
/// A multiplier used to calculate the inbound connection limit for the peer set.
///
/// When it starts up, Zebra opens [`Config.peerset_initial_target_size`]
/// outbound connections.
///
/// Then it opens additional outbound connections as needed for network requests,
/// and accepts inbound connections initiated by other peers.
///
/// The inbound and outbound connection limits are calculated as follows.
///
/// The inbound limit is:
/// `Config.peerset_initial_target_size * INBOUND_PEER_LIMIT_MULTIPLIER`.
/// (This is similar to `zcashd`'s default inbound limit.)
///
/// The outbound limit is:
/// `Config.peerset_initial_target_size * OUTBOUND_PEER_LIMIT_MULTIPLIER`.
/// (This is a bit larger than `zcashd`'s default outbound limit.)
///
/// # Security
///
/// Each connection requires one inbound slot and one outbound slot, on two different peers.
/// But some peers only make outbound connections, because they are behind a firewall,
/// or their listener port address is misconfigured.
///
/// Zebra allows extra inbound connection slots,
/// to prevent accidental connection slot exhaustion.
/// (`zcashd` also allows a large number of extra inbound slots.)
///
/// ## Security Tradeoff
///
/// Since the inbound peer limit is higher than the outbound peer limit,
/// Zebra can be connected to a majority of peers
/// that it has *not* chosen from its [`crate::AddressBook`].
///
/// Inbound peer connections are initiated by the remote peer,
/// so inbound peer selection is not controlled by the local node.
/// This means that an attacker can easily become a majority of a node's peers.
///
/// However, avoiding connection slot exhaustion is a higher priority.
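///
/// # Example
///
/// A minimal, illustrative sketch of the limit calculation, assuming a
/// hypothetical `peerset_initial_target_size` of 25 peers:
///
/// ```
/// let peerset_initial_target_size = 25;
///
/// // inbound limit: target size times INBOUND_PEER_LIMIT_MULTIPLIER (currently 5)
/// let inbound_connection_limit = peerset_initial_target_size * 5;
/// // outbound limit: target size times OUTBOUND_PEER_LIMIT_MULTIPLIER (currently 3)
/// let outbound_connection_limit = peerset_initial_target_size * 3;
///
/// assert_eq!(inbound_connection_limit, 125);
/// assert_eq!(outbound_connection_limit, 75);
/// ```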
pub const INBOUND_PEER_LIMIT_MULTIPLIER: usize = 5;
/// A multiplier used to calculate the outbound connection limit for the peer set.
///
/// See [`INBOUND_PEER_LIMIT_MULTIPLIER`] for details.
pub const OUTBOUND_PEER_LIMIT_MULTIPLIER: usize = 3;
/// The buffer size for the peer set.
///
/// This should be greater than 1 to avoid sender contention, but also reasonably
/// small, to avoid queueing too many in-flight block downloads. (A large queue
/// of in-flight block downloads can choke a constrained local network
/// connection, or a small peer set on testnet.)
///
/// We assume that Zebra nodes have at least 10 Mbps bandwidth. Therefore, a
/// maximum-sized block can take up to 2 seconds to download. So the peer set
/// buffer adds up to 6 seconds' worth of blocks to the queue.
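///
/// # Example
///
/// A rough, illustrative sketch of the arithmetic above, assuming a maximum
/// block size of about 2 MB and the 10 Mbps (1.25 MB/s) bandwidth assumption:
///
/// ```
/// let max_block_bytes = 2_000_000_f64;
/// let bandwidth_bytes_per_second = 10_000_000_f64 / 8.0;
///
/// // up to ~2 seconds to download one maximum-sized block
/// let seconds_per_block = max_block_bytes / bandwidth_bytes_per_second;
/// // a buffer of 3 queues up to ~6 seconds' worth of blocks
/// let buffered_seconds = seconds_per_block * 3.0;
///
/// assert!(seconds_per_block <= 2.0);
/// assert!(buffered_seconds <= 6.0);
/// ```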
pub const PEERSET_BUFFER_SIZE: usize = 3;
/// The timeout for sending a message to a remote peer,
/// and receiving a response from a remote peer.
pub const REQUEST_TIMEOUT: Duration = Duration::from_secs(20);
/// The timeout for handshakes when connecting to new peers.
///
/// This timeout should remain small, because it helps stop slow peers getting
/// into the peer set. This is particularly important for network-constrained
/// nodes, and on testnet.
pub const HANDSHAKE_TIMEOUT: Duration = Duration::from_secs(3);
/// We expect to receive a message from a live peer at least once in this time duration.
///
/// This is the sum of:
/// - the interval between connection heartbeats
/// - the timeout of a possible pending (already-sent) request
/// - the timeout for a possible queued request
/// - the timeout for the heartbeat request itself
///
/// This avoids explicit synchronization, but relies on the peer
/// connector actually setting up channels and these heartbeats in a
/// specific manner that matches up with this math.
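///
/// # Example
///
/// An illustrative expansion of the sum below, using the current
/// [`HEARTBEAT_INTERVAL`] (59 seconds) and [`REQUEST_TIMEOUT`] (20 seconds):
///
/// ```
/// // heartbeat interval + pending request + queued request + heartbeat request
/// let min_peer_reconnection_delay_secs = 59 + 20 + 20 + 20;
/// assert_eq!(min_peer_reconnection_delay_secs, 119);
/// ```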
pub const MIN_PEER_RECONNECTION_DELAY: Duration = Duration::from_secs(59 + 20 + 20 + 20);
/// Zebra rotates its peer inventory registry every time this interval elapses.
///
/// After 2 of these intervals, Zebra's local available and missing inventory entries expire.
pub const INVENTORY_ROTATION_INTERVAL: Duration = Duration::from_secs(53);
/// The default peer address crawler interval.
///
/// This should be at least [`HANDSHAKE_TIMEOUT`] lower than all other crawler
/// intervals.
///
/// This makes the following sequence of events more likely:
/// 1. a peer address crawl,
/// 2. new peer connections,
/// 3. peer requests from other crawlers.
///
/// Using a prime number makes sure that peer address crawls
/// don't synchronise with other crawls.
pub const DEFAULT_CRAWL_NEW_PEER_INTERVAL: Duration = Duration::from_secs(61);
/// The maximum duration since a peer was last seen to consider it reachable.
///
/// This is used to prevent Zebra from gossiping addresses that are likely unreachable. Peers that
/// have last been seen more than this duration ago will not be gossiped.
///
/// This is determined as a tradeoff between network health and network view leakage. From the
/// [Bitcoin protocol documentation](https://en.bitcoin.it/wiki/Protocol_documentation#getaddr):
///
/// "The typical presumption is that a node is likely to be active if it has been sending a message
/// within the last three hours."
pub const MAX_PEER_ACTIVE_FOR_GOSSIP: Duration32 = Duration32::from_hours(3);
/// The maximum duration since a peer was last seen to consider reconnecting to it.
///
/// Peers that haven't been seen for more than three days and whose last connection attempt
/// failed are considered to be offline, and Zebra will stop trying to connect to them.
///
/// This ensures that Zebra doesn't cause a denial of service for itself by constantly and
/// uselessly retrying connections to a large number of offline peers.
pub const MAX_RECENT_PEER_AGE: Duration32 = Duration32::from_days(3);
/// Regular interval for sending keepalive `Ping` messages to each
/// connected peer.
///
/// Using a prime number makes sure that heartbeats don't synchronise with crawls.
pub const HEARTBEAT_INTERVAL: Duration = Duration::from_secs(59);
/// The minimum time between successive calls to
/// [`CandidateSet::next`][crate::peer_set::CandidateSet::next].
///
/// ## Security
///
/// Zebra resists distributed denial of service attacks by making sure that new peer connections
/// are initiated at least [`MIN_PEER_CONNECTION_INTERVAL`] apart.
pub const MIN_PEER_CONNECTION_INTERVAL: Duration = Duration::from_millis(25);
/// The minimum time between successive calls to
/// [`CandidateSet::update`][crate::peer_set::CandidateSet::update].
///
/// Using a prime number makes sure that peer address crawls don't synchronise with other crawls.
///
/// ## Security
///
/// Zebra resists distributed denial of service attacks by making sure that requests for more
/// peer addresses are sent at least [`MIN_PEER_GET_ADDR_INTERVAL`] apart.
pub const MIN_PEER_GET_ADDR_INTERVAL: Duration = Duration::from_secs(31);
/// The combined timeout for all the requests in
/// [`CandidateSet::update`][crate::peer_set::CandidateSet::update].
///
/// `zcashd` doesn't respond to most `getaddr` requests,
/// so this timeout needs to be short.
pub const PEER_GET_ADDR_TIMEOUT: Duration = Duration::from_secs(8);
/// The number of GetAddr requests sent when crawling for new peers.
///
/// # Security
///
/// The fanout should be greater than 2, so that Zebra avoids getting a majority
/// of its initial address book entries from a single peer.
///
/// Zebra regularly crawls for new peers, initiating a new crawl every
/// [`crawl_new_peer_interval`](crate::config::Config.crawl_new_peer_interval).
///
/// TODO: Restore the fanout to 3, once fanouts are limited to the number of ready peers (#2214)
///
/// In #3110, we changed the fanout to 1, to make sure we actually use cached address responses.
/// With a fanout of 3, we were dropping a lot of responses, because the overall crawl timed out.
pub const GET_ADDR_FANOUT: usize = 1;
/// The maximum number of addresses allowed in an `addr` or `addrv2` message.
///
/// `addr`:
/// > The number of IP address entries up to a maximum of 1,000.
///
/// <https://developer.bitcoin.org/reference/p2p_networking.html#addr>
///
/// `addrv2`:
/// > One message can contain up to 1,000 addresses.
/// > Clients MUST reject messages with more addresses.
///
/// <https://zips.z.cash/zip-0155#specification>
pub const MAX_ADDRS_IN_MESSAGE: usize = 1000;
/// The fraction of addresses Zebra sends in response to a `Peers` request.
///
/// Each response contains approximately:
/// `address_book.len() / ADDR_RESPONSE_LIMIT_DENOMINATOR`
/// addresses, selected at random from the address book.
///
/// # Security
///
/// This limit makes sure that Zebra does not reveal its entire address book
/// in a single `Peers` response.
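///
/// # Example
///
/// An illustrative sketch of the response size, assuming an address book that
/// has grown to its [`MAX_ADDRS_IN_ADDRESS_BOOK`] limit of 4,000 entries:
///
/// ```
/// let address_book_len = 4_000;
/// // each `Peers` response contains roughly a third of the address book
/// let addrs_per_response = address_book_len / 3;
/// assert_eq!(addrs_per_response, 1_333);
/// ```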
pub const ADDR_RESPONSE_LIMIT_DENOMINATOR: usize = 3;
/// The maximum number of addresses Zebra will keep in its address book.
///
/// This is a tradeoff between:
/// - revealing the whole address book in a few requests,
/// - sending the maximum number of peer addresses, and
/// - making sure the limit code actually gets run.
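///
/// # Example
///
/// An illustrative expansion of the formula below, using the current
/// [`MAX_ADDRS_IN_MESSAGE`] (1,000) and [`ADDR_RESPONSE_LIMIT_DENOMINATOR`] (3):
///
/// ```
/// let max_addrs_in_address_book = 1_000 * (3 + 1);
/// assert_eq!(max_addrs_in_address_book, 4_000);
/// ```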
pub const MAX_ADDRS_IN_ADDRESS_BOOK: usize =
    MAX_ADDRS_IN_MESSAGE * (ADDR_RESPONSE_LIMIT_DENOMINATOR + 1);
/// Truncate timestamps in outbound address messages to this time interval.
///
/// ## SECURITY
///
/// Timestamp truncation prevents a peer from learning exactly when we received
/// messages from each of our peers.
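///
/// ## Example
///
/// A minimal sketch of truncating a timestamp to this interval. The truncation
/// itself is applied where outbound address messages are sanitized, so the
/// values below are purely illustrative:
///
/// ```
/// const TIMESTAMP_TRUNCATION_SECONDS: u32 = 30 * 60;
///
/// let last_seen_secs: u32 = 1_656_000_123;
/// // round down to the start of the current 30 minute interval
/// let truncated_secs = last_seen_secs - (last_seen_secs % TIMESTAMP_TRUNCATION_SECONDS);
///
/// assert_eq!(truncated_secs % TIMESTAMP_TRUNCATION_SECONDS, 0);
/// assert!(last_seen_secs - truncated_secs < TIMESTAMP_TRUNCATION_SECONDS);
/// ```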
pub const TIMESTAMP_TRUNCATION_SECONDS: u32 = 30 * 60;
/// The User-Agent string provided by the node.
///
/// This must be a valid [BIP 14] user agent.
///
/// [BIP 14]: https://github.com/bitcoin/bips/blob/master/bip-0014.mediawiki
//
// TODO: generate this from crate metadata (#2375)
pub const USER_AGENT: &str = "/Zebra:1.0.0-rc.4/";
/// The Zcash network protocol version implemented by this crate, and advertised
/// during connection setup.
///
/// The current protocol version is checked by our peers. If it is too old,
/// newer peers will disconnect from us.
///
/// The current protocol version typically changes before Mainnet and Testnet
/// network upgrades.
pub const CURRENT_NETWORK_PROTOCOL_VERSION: Version = Version(170_100);
/// The default RTT estimate for peer responses.
///
/// We choose a high value for the default RTT, so that new peers must prove they
/// are fast, before we prefer them to other peers. This is particularly
/// important on testnet, which has a small number of peers, which are often
/// slow.
///
/// Make the default RTT slightly higher than the request timeout.
pub const EWMA_DEFAULT_RTT: Duration = Duration::from_secs(REQUEST_TIMEOUT.as_secs() + 1);
/// The decay time for the EWMA response time metric used for load balancing.
///
/// This should be much larger than the `SYNC_RESTART_TIMEOUT`, so we choose
/// better peers when we restart the sync.
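///
/// # Example
///
/// A minimal, illustrative sketch of an exponentially weighted moving average
/// with this decay time. The real load metric is maintained by the peer set,
/// so the variable names and sample values below are hypothetical:
///
/// ```
/// let ewma_decay_time_nanos = 200.0 * 1_000_000_000.0;
///
/// let mut ewma_rtt_nanos = 21.0 * 1_000_000_000.0; // the default RTT estimate
/// let sample_rtt_nanos = 5.0 * 1_000_000_000.0; // a fast response from this peer
/// let elapsed_nanos = 10.0 * 1_000_000_000.0; // time since the previous sample
///
/// // standard exponential decay towards the new sample
/// let alpha = 1.0 - (-elapsed_nanos / ewma_decay_time_nanos).exp();
/// ewma_rtt_nanos += alpha * (sample_rtt_nanos - ewma_rtt_nanos);
///
/// // the estimate moves towards the fast sample, but only slowly,
/// // because the decay time is much longer than the elapsed time
/// assert!(ewma_rtt_nanos < 21.0 * 1_000_000_000.0);
/// assert!(ewma_rtt_nanos > 20.0 * 1_000_000_000.0);
/// ```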
pub const EWMA_DECAY_TIME_NANOS: f64 = 200.0 * NANOS_PER_SECOND;
/// The number of nanoseconds in one second.
const NANOS_PER_SECOND: f64 = 1_000_000_000.0;
lazy_static! {
    /// The minimum network protocol version accepted by this crate for each network,
    /// represented as a network upgrade.
    ///
    /// The minimum protocol version is used to check the protocol versions of our
    /// peers during the initial block download. After the initial block download,
    /// we use the current block height to select the minimum network protocol
    /// version.
    ///
    /// If peer versions are too old, we will disconnect from them.
    ///
    /// The minimum network protocol version typically changes after Mainnet and
    /// Testnet network upgrades.
    pub static ref INITIAL_MIN_NETWORK_PROTOCOL_VERSION: HashMap<Network, Version> = {
        let mut hash_map = HashMap::new();
        hash_map.insert(Mainnet, Version::min_specified_for_upgrade(Mainnet, Nu5));
        hash_map.insert(Testnet, Version::min_specified_for_upgrade(Testnet, Nu5));
        hash_map
    };
    /// The OS-specific error returned when the port we are trying to open is already in use.
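    ///
    /// # Example
    ///
    /// A minimal sketch of matching a Unix bind error against the same pattern;
    /// the error string below is hypothetical:
    ///
    /// ```
    /// use regex::Regex;
    ///
    /// let port_in_use = Regex::new(&regex::escape("already in use")).expect("regex is valid");
    /// assert!(port_in_use.is_match("Address already in use (os error 98)"));
    /// ```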
    pub static ref PORT_IN_USE_ERROR: Regex = if cfg!(unix) {
        #[allow(clippy::trivial_regex)]
        Regex::new(&regex::escape("already in use"))
    } else {
        Regex::new("(access a socket in a way forbidden by its access permissions)|(Only one usage of each socket address)")
    }.expect("regex is valid");
}
/// The timeout for DNS lookups.
///
/// [6.1.3.3 Efficient Resource Usage] from [RFC 1123: Requirements for Internet Hosts]
/// suggests no less than 5 seconds for the resolving timeout.
///
/// [RFC 1123: Requirements for Internet Hosts]: https://tools.ietf.org/rfcmarkup?doc=1123
/// [6.1.3.3 Efficient Resource Usage]: https://tools.ietf.org/rfcmarkup?doc=1123#page-77
pub const DNS_LOOKUP_TIMEOUT: Duration = Duration::from_secs(5);
/// Magic numbers used to identify different Zcash networks.
pub mod magics {
    use super::*;

    /// The production mainnet.
    pub const MAINNET: Magic = Magic([0x24, 0xe9, 0x27, 0x64]);

    /// The testnet.
    pub const TESTNET: Magic = Magic([0xfa, 0x1a, 0xf9, 0xbf]);
}
#[cfg(test)]
mod tests {
    use std::convert::TryFrom;
    use zebra_chain::parameters::POST_BLOSSOM_POW_TARGET_SPACING;

    use super::*;
    /// This ensures that the `Duration` value we are computing for
    /// [`MIN_PEER_RECONNECTION_DELAY`] actually matches the other const values
    /// it relies on.
    #[test]
    fn ensure_live_peer_duration_value_matches_others() {
        let _init_guard = zebra_test::init();

        let constructed_live_peer_duration =
            HEARTBEAT_INTERVAL + REQUEST_TIMEOUT + REQUEST_TIMEOUT + REQUEST_TIMEOUT;
        assert_eq!(MIN_PEER_RECONNECTION_DELAY, constructed_live_peer_duration);
    }
    /// Make sure that the timeout values are consistent with each other.
    #[test]
    fn ensure_timeouts_consistent() {
        let _init_guard = zebra_test::init();

        assert!(
            HANDSHAKE_TIMEOUT <= REQUEST_TIMEOUT,
            "Handshakes are requests, so the handshake timeout can't be longer than the timeout for all requests."
        );

        // This check is particularly important on testnet, which has a small
        // number of peers, which are often slow.
        assert!(
            EWMA_DEFAULT_RTT > REQUEST_TIMEOUT,
            "The default EWMA RTT should be higher than the request timeout, so new peers are required to prove they are fast, before we prefer them to other peers."
        );
        let request_timeout_nanos = REQUEST_TIMEOUT.as_secs_f64() * NANOS_PER_SECOND
            + f64::from(REQUEST_TIMEOUT.subsec_nanos());

        assert!(
            EWMA_DECAY_TIME_NANOS > request_timeout_nanos,
            "The EWMA decay time should be higher than the request timeout, so timed out peers are penalised by the EWMA."
        );
        assert!(
            u32::try_from(MAX_ADDRS_IN_ADDRESS_BOOK).expect("fits in u32")
                * MIN_PEER_CONNECTION_INTERVAL
                < MIN_PEER_RECONNECTION_DELAY,
            "each peer should get at least one connection attempt in each connection interval",
        );

        assert!(
            MIN_PEER_RECONNECTION_DELAY.as_secs()
                / (u32::try_from(MAX_ADDRS_IN_ADDRESS_BOOK).expect("fits in u32")
                    * MIN_PEER_CONNECTION_INTERVAL)
                    .as_secs()
                <= 2,
            "each peer should only have a few connection attempts in each connection interval",
        );
    }
    /// Make sure that peer age limits are consistent with each other.
    #[test]
    fn ensure_peer_age_limits_consistent() {
        let _init_guard = zebra_test::init();

        assert!(
            MAX_PEER_ACTIVE_FOR_GOSSIP <= MAX_RECENT_PEER_AGE,
            "we should only gossip peers we are actually willing to try ourselves"
        );
    }

    /// Make sure the address limits are consistent with each other.
    #[test]
    #[allow(clippy::assertions_on_constants)]
    fn ensure_address_limits_consistent() {
        // Zebra 1.0.0-beta.2 address book metrics in December 2021.
        const TYPICAL_MAINNET_ADDRESS_BOOK_SIZE: usize = 4_500;

        let _init_guard = zebra_test::init();

        assert!(
            MAX_ADDRS_IN_ADDRESS_BOOK >= GET_ADDR_FANOUT * MAX_ADDRS_IN_MESSAGE,
            "the address book should hold at least a fanout's worth of addresses"
        );

        assert!(
            MAX_ADDRS_IN_ADDRESS_BOOK / ADDR_RESPONSE_LIMIT_DENOMINATOR > MAX_ADDRS_IN_MESSAGE,
            "the address book should hold enough addresses for a full response"
        );

        assert!(
            MAX_ADDRS_IN_ADDRESS_BOOK < TYPICAL_MAINNET_ADDRESS_BOOK_SIZE,
            "the address book limit should actually be used"
        );
    }
    /// Make sure inventory registry rotation is consistent with the target block interval.
    #[test]
    fn ensure_inventory_rotation_consistent() {
        let _init_guard = zebra_test::init();
        assert!(
            INVENTORY_ROTATION_INTERVAL
                < Duration::from_secs(
                    POST_BLOSSOM_POW_TARGET_SPACING
                        .try_into()
                        .expect("non-negative"),
                ),
            "we should expire inventory every time 1-2 new blocks get generated"
        );
    }
}