zebra

Commit Graph

Author	SHA1	Message	Date
teor	0918663e3e	fix(net): Rate-limit MetaAddrChange::Responded from peers (#6738 ) * Rate-limit MetaAddrChange::Responded from peers * Document rate-limits on the address book updater channel	2023-05-23 20:50:29 +00:00
teor	b0d9471214	fix(log): Stop logging peer IP addresses, to protect user privacy (#6662 ) * Add a PeerSocketAddr type which hides its IP address, but shows the port * Manually replace SocketAddr with PeerSocketAddr where needed ```sh fastmod SocketAddr PeerSocketAddr zebra-network ``` * Add missing imports * Make converting into PeerSocketAddr easier * Fix some unused imports * Add a canonical_peer_addr() function * Fix connection handling for PeerSocketAddr * Fix serialization for PeerSocketAddr * Fix tests for PeerSocketAddr * Remove some unused imports * Fix address book listener handling * Remove redundant imports and conversions * Update outdated IPv4-mapped IPv6 address code * Make addresses canonical when deserializing * Stop logging peer addresses in RPC code * Update zebrad tests with new PeerSocketAddr type * Update zebra-rpc tests with new PeerSocketAddr type --------- Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2023-05-14 15:06:07 +00:00
teor	0d50d973d2	fix(net): Limit the number of leftover nonces in the self-connection nonce set (#6534 ) * Use a stricter connection rate limit for successful inbound peer connections * Limit the number of nonces in the self-connection nonce set * Rate-limit failed inbound connections as well * Justify the sleep and the yield_now * Use the configured connection limit rather than a constant * Tests that the number of nonces is limited (#37) * Tests that the number of nonces is limited * removes unused constant * test that it reaches the nonce limit --------- Co-authored-by: Arya <aryasolhi@gmail.com>	2023-04-18 08:13:19 +00:00
teor	6aea0fd9e8	Add some missing tracing spans (#4660 ) Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2022-06-23 07:46:02 +00:00
Marek	cc75c3f5f9	fix(doc): Fix various doc warnings, part 3 (#4611 ) * Fix the syntax of links in comments * Fix a mistake in the docs Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com> * Remove unnecessary angle brackets from a link * Revert the changes for links that serve as references * Revert "Revert the changes for links that serve as references" This reverts commit `8b091aa9fa`. * Remove `<` `>` from links that serve as references This reverts commit `046ef25620`. * Don't use `<` `>` in normal comments * Don't use `<` `>` for normal comments * Revert changes for comments starting with `//` * Fix some warnings produced by `cargo doc` * Fix some rustdoc warnings * Fix some warnings * Refactor some changes * Fix some rustdoc warnings * Fix some rustdoc warnings * Resolve various TODOs Co-authored-by: teor <teor@riseup.net> * Fix some unresolved links * Allow links to private items * Fix some unresolved links Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com> Co-authored-by: teor <teor@riseup.net> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2022-06-15 03:57:19 +00:00
teor	d0e6de8040	Avoid deadlocks in the address book mutex (#3244 ) * Tweak crawler timings so peers are more likely to be available * Tweak min peer connection interval so we try all peers * Let other tasks run between fanouts, so we're more likely to choose different peers * Let other tasks run between retries, so we're more likely to choose different peers * Let other tasks run after peer crawler DemandDrop This makes it more likely that peers will become ready. * Spawn the address book updater on a blocking thread * Spawn CandidateSet address book operations on blocking threads * Replace the PeerSet address book with a metrics watch channel * Fix comment * Await spawned address book tasks * Run the address book update tasks concurrently (except for the mutex) * Explain an internal-only method better * Fix a typo Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com> Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-12-20 00:44:43 +00:00
teor	6cbd7dce43	Fix task handling bugs, so peers are more likely to be available (#3191 ) * Tweak crawler timings so peers are more likely to be available * Tweak min peer connection interval so we try all peers * Let other tasks run between fanouts, so we're more likely to choose different peers * Let other tasks run between retries, so we're more likely to choose different peers * Let other tasks run after peer crawler DemandDrop This makes it more likely that peers will become ready.	2021-12-20 09:02:31 +10:00
teor	37808eaadb	Security: When there are no new peers, stop crawler using CPU and writing logs (#3177 ) * Stop useless crawler attempts when there are no peers and no crawl responses * Disable GitHub bug report URLs when the disk is full * Add help text for the `zebrad start` tracing filter option	2021-12-10 00:19:52 +00:00
teor	4d608d3224	Stop doing thousands of time checks each time we connect to a peer (#3106 ) * Stop checking the entire AddressBook for each connection attempt * Stop redundant peer time checks within the address book * Stop calling `Instant::now` 3 times for each address book update * Only get the time once each time an address book method is called * Update outdated comment * Use an OrderedMap to efficiently store address book peers * Add address book order tests	2021-12-03 15:09:43 -03:00
teor	c85ea18b43	Fix slow Zebra startup times, to reduce CI failures (#3104 ) * Tweak a log message * Only retry failed DNS once, then use the other DNS responses * Limit broadcasts to half the peers * Use a longer minimum interval for GetAddr requests * Reduce the syncer and mempool crawler fanouts * Stop resetting the mempool twice when it starts up This spawns two crawlers, which send two fanouts, so it can use up a lot of peers. Co-authored-by: Conrado Gouvea <conrado@zfnd.org> Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>	2021-11-30 21:04:32 +00:00
Janito Vaqueiro Ferreira Filho	0960e4fb0b	Update to Tokio 1.13.0 (#2994 ) * Update `tower` to version `0.4.9` Update to latest version to add support for Tokio version 1. * Replace usage of `ServiceExt::ready_and` It was deprecated in favor of `ServiceExt::ready`. * Update Tokio dependency to version `1.13.0` This will break the build because the code isn't ready for the update, but future commits will fix the issues. * Replace import of `tokio::stream::StreamExt` Use `futures::stream::StreamExt` instead, because newer versions of Tokio don't have the `stream` feature. * Use `IntervalStream` in `zebra-network` In newer versions of Tokio `Interval` doesn't implement `Stream`, so the wrapper types from `tokio-stream` have to be used instead. * Use `IntervalStream` in `inventory_registry` In newer versions of Tokio the `Interval` type doesn't implement `Stream`, so `tokio_stream::wrappers::IntervalStream` has to be used instead. * Use `BroadcastStream` in `inventory_registry` In newer versions of Tokio `broadcast::Receiver` doesn't implement `Stream`, so `tokio_stream::wrappers::BroadcastStream` instead. This also requires changing the error type that is used. * Handle `Semaphore::acquire` error in `tower-batch` Newer versions of Tokio can return an error if the semaphore is closed. This shouldn't happen in `tower-batch` because the semaphore is never closed. * Handle `Semaphore::acquire` error in `zebrad` test On newer versions of Tokio `Semaphore::acquire` can return an error if the semaphore is closed. This shouldn't happen in the test because the semaphore is never closed. * Update some `zebra-network` dependencies Use versions compatible with Tokio version 1. * Upgrade Hyper to version 0.14 Use a version that supports Tokio version 1. * Update `metrics` dependency to version 0.17 And also update the `metrics-exporter-prometheus` to version 0.6.1. These updates are to make sure Tokio 1 is supported. 
* Use `f64` as the histogram data type `u64` isn't supported as the histogram data type in newer versions of `metrics`. * Update the initialization of the metrics component Make it compatible with the new version of `metrics`. * Simplify build version counter Remove all constants and use the new `metrics::incement_counter!` macro. * Change metrics output line to match on The snapshot string isn't included in the newer version of `metrics-exporter-prometheus`. * Update `sentry` to version 0.23.0 Use a version compatible with Tokio version 1. * Remove usage of `TracingIntegration` This seems to not be available from `sentry-tracing` anymore, so it needs to be replaced. * Add sentry layer to tracing initialization This seems like the replacement for `TracingIntegration`. * Remove unnecessary conversion Suggested by a Clippy lint. * Update Cargo lock file Apply all of the updates to dependencies. * Ban duplicate tokio dependencies Also ban git sources for tokio dependencies. * Stop allowing sentry-tracing git repository in `deny.toml` * Allow remaining duplicates after the tokio upgrade * Use C: drive for CI build output on Windows GitHub Actions uses a Windows image with two disk drives, and the default D: drive is smaller than the C: drive. Zebra currently uses a lot of space to build, so it has to use the C: drive to avoid CI build failures because of insufficient space. Co-authored-by: teor <teor@riseup.net>	2021-11-02 18:46:57 +00:00
Janito Vaqueiro Ferreira Filho	a9f1c189d9	Make `services` field in `MetaAddr` optional (#2976 ) * Use `prop_assert` instead of `assert` Otherwise the test input isn't minimized. * Split long string into a multi-line string And add some newlines to try to improve readability. * Fix referenced issue number They had a typo in their number. * Make peer services optional It is unknown for initial peers. * Fix `preserve_initial_untrusted_values` test Now that it's optional, the services field can be written to if it was previously empty. * Fix formatting of property tests Run rustfmt on them. * Restore `TODO` comment Make it easy to find planned improvements in the code. Co-authored-by: teor <teor@riseup.net> * Comment on how ordering is affected Make it clear that missing services causes the peer to be chosen last. Co-authored-by: teor <teor@riseup.net> * Don't expect `services` to be available Avoid a panic by using the compiler to help enforce the handling of the case correctly. * Panic if received gossiped address has no services All received gossiped addresses have services. The only addresses that don't have services configured are the initial seed addresses. Co-authored-by: teor <teor@riseup.net>	2021-11-02 02:45:35 +00:00
Janito Vaqueiro Ferreira Filho	192a45ccf1	Refactor rate limiting to not store `Sleep` type (#2915 ) In newer Tokio versions the `Sleep` type doesn't implement `Unpin`, so it's a little more complicated to use it. In this case it was easier to refactor the code to not store the `Sleep` type instead of wrapping it in a `Pin` type.	2021-10-21 11:47:04 +00:00
teor	4b8b65a627	Avoid spurious acceptance test failures by decreasing the peer crawler timeout (#2905 ) * Improve logging for initial peer connections * Decrease the initial peer crawl timeout to make tests more reliable Co-authored-by: Conrado Gouvea <conrado@zfnd.org>	2021-10-19 15:29:03 +00:00
Janito Vaqueiro Ferreira Filho	b68202c68a	Security: Zebra should stop gossiping unreachable addresses to other nodes, Action: re-deploy all nodes (#2392 ) * Rename some methods and constants for clarity Using the following commands: ``` fastmod '\bis_ready_for_attempt\b' is_ready_for_connection_attempt # One instance required a tweak, because of the ASCII diagram. fastmod '\bwas_recently_live\b' has_connection_recently_responded fastmod '\bwas_recently_attempted\b' was_connection_recently_attempted fastmod '\bwas_recently_failed\b' has_connection_recently_failed fastmod '\bLIVE_PEER_DURATION\b' MIN_PEER_RECONNECTION_DELAY ``` * Use `Instant::elapsed` for conciseness Instead of `Instant::now().saturating_duration_since`. They're both equivalent, and `elapsed` only panics if the `Instant` is somehow synthetically generated. * Allow `Duration32` to be created in other crates Export the `Duration32` from the `zebra_chain::serialization` module. * Add some new `Duration32` constructors Create some helper `const` constructors to make it easy to create constant durations. Add methods to create a `Duration32` from seconds, minutes and hours. * Avoid gossiping unreachable peers When sanitizing the list of peers to gossip, remove those that we haven't seen in more than three hours. * Test if unreachable addresses aren't gossiped Create a property test with random addresses inserted into an `AddressBook`, and verify that the sanitized list of addresses does not contain any addresses considered unreachable. * Test if new alternate address isn't gossipable Create a new alternate peer, because that type of `MetaAddr` does not have `last_response` or `untrusted_last_seen` times. Verify that the peer is not considered gossipable. * Test if local listener is gossipable The `MetaAddr` representing the local peer's listening address should always be considered gossipable. 
* Test if gossiped peer recently seen is gossipable Create a `MetaAddr` representing a gossiped peer that was reported to be seen recently. Check that the peer is considered gossipable. * Test peer reportedly last seen in the future Create a `MetaAddr` representing a peer gossiped and reported to have been last seen in a time that's in the future. Check that the peer is considered gossipable, to check that the fallback calculation is working as intended. * Test gossiped peer reportedly seen long ago Create a `MetaAddr` representing a gossiped peer that was reported to last have been seen a long time ago. Check that the peer is not considered gossipable. * Test if just responded peer is gossipable Create a `MetaAddr` representing a peer that has just responded and check that it is considered gossipable. * Test if recently responded peer is gossipable Create a `MetaAddr` representing a peer that last responded within the duration a peer is considered reachable. Verify that the peer is considered gossipable. * Test peer that responded long ago isn't gossipable Create a `MetaAddr` representing a peer that last responded outside the duration a peer is considered reachable. Verify that the peer is not considered gossipable.	2021-06-29 05:12:27 +00:00
teor	1a57023eac	Security: Use canonical SocketAddrs to avoid duplicate peer connections, Feature: Send local listener to peers (#2276 ) * Always send our local listener with the latest time Previously, whenever there was an inbound request for peers, we would clone the address book and update it with the local listener. This had two impacts: - the listener could conflict with an existing entry, rather than unconditionally replacing it, and - the listener was briefly included in the address book metrics. As a side-effect, this change also makes sanitization slightly faster, because it avoids some useless peer filtering and sorting. * Skip listeners that are not valid for outbound connections * Filter sanitized addresses Zebra based on address state This fix correctly prevents Zebra gossiping client addresses to peers, but still keeps the client in the address book to avoid reconnections. * Add a full set of DateTime32 and Duration32 calculation methods * Refactor sanitize to use the new DateTime32/Duration32 methods * Security: Use canonical SocketAddrs to avoid duplicate connections If we allow multiple variants for each peer address, we can make multiple connections to that peer. Also make sure sanitized MetaAddrs are valid for outbound connections. * Test that address books contain the local listener address Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>	2021-06-22 02:16:59 +00:00
teor	4d22a0bae9	Security: Limit reconnection rate to individual peers (#2275 ) * Security: Limit reconnection rate to individual peers Reconnection Rate Limit the reconnection rate to each individual peer by applying the liveness cutoff to the attempt, responded, and failure time fields. If any field is recent, the peer is skipped. The new liveness cutoff skips any peers that have recently been attempted or failed. (Previously, the liveness check was only applied if the peer was in the `Responded` state, which could lead to repeated retries of `Failed` peers, particularly in small address books.) Reconnection Order Zebra prefers more useful peer states, then the earliest attempted, failed, and responded times, then the most recent gossiped last seen times. Before this change, Zebra took the most recent time in all the peer time fields, and used that time for liveness and ordering. This led to confusion between trusted and untrusted data, and success and failure times. Unlike the previous order, the new order: - tries all peers in each state, before re-trying any peer in that state, and - only checks the gossiped untrusted last seen time if all other times are equal. * Preserve the later time if changes arrive out of order * Update CandidateSet::next documentation * Update CandidateSet state diagram * Fix variant names in comments * Explain why timestamps can be left out of MetaAddrChanges * Add a simple test for the individual peer retry limit * Only generate valid Arbitrary PeerServices values * Add an individual peer retry limit AddressBook and CandidateSet test * Stop deleting recently live addresses from the address book If we delete recently live addresses from the address book, we can get a new entry for them, and reconnect too rapidly. 
* Rename functions to match similar tokio API * Fix docs for service sorting * Clarify a comment * Cleanup a variable and comments * Remove blank lines in the CandidateSet state diagram * Add a multi-peer proptest that checks outbound attempt fairness * Fix a comment typo Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com> * Simplify time maths in MetaAddr * Create a Duration32 type to simplify calculations and comparisons * Rename variables for clarity * Split a string constant into multiple lines * Make constants match rustdoc order Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>	2021-06-18 09:30:44 -03:00
teor	3f7410d073	Security: stop gossiping failure and attempt times as last_seen times (#2273 ) * Security: stop gossiping failure and attempt times as last_seen times Previously, Zebra had a single time field for peer addresses, which was updated every time a peer was attempted, sent a message, or failed. This is a security issue, because the `last_seen` time should be "the last time [a peer] connected to that node", so that "nodes can use the time field to avoid relaying old 'addr' messages". So Zebra was sending incorrect peer information to other nodes. As part of this change, we split the `last_seen` time into the following fields: - untrusted_last_seen: gossiped from other peers - last_response: time we got a response from a directly connected peer - last_attempt: time we attempted to connect to a peer - last_failure: time a connection with a peer failed * Implement Arbitrary and strategies for MetaAddrChange Also replace the MetaAddr Arbitrary impl with a derive. * Write proptests for MetaAddr and MetaAddrChange MetaAddr: - the only times that get included in serialized MetaAddrs are the untrusted last seen and responded times MetaAddrChange: - the untrusted last seen time is never updated - the services are only updated if there has been a handshake	2021-06-15 13:31:16 +10:00
teor	86f23f7960	Security: only apply the outbound connection rate-limit to actual connections (#2278 ) * Only advance the outbound connection timer when it returns an address Previously, we were advancing the timer even when we returned `None`. This created large wait times when there were no eligible peers. * Refactor to avoid overlapping sleep timers * Add a maximum next peer delay test Also refactor peer numbers into constants. * Make the number of proptests overridable by the standard env var Also cleanup the test constants. * Test that skipping peer connections also skips their rate limits * Allow an extra second after each sleep on loaded machines macOS VMs seem to need this extra time to pass their tests. * Restart test time bounds from the current time This change avoids test failures due to cumulative errors. Also use a single call to `Instant::now` for each test round. And print the times when the tests fail. * Stop generating invalid outbound peers in proptests The candidate set proptests will fail if enough generated peers are invalid for outbound connections.	2021-06-15 08:29:17 +10:00
Janito Vaqueiro Ferreira Filho	e8d5f6978d	Rate limit `GetAddr` messages to any peer, Credit: Equilibrium (#2254 ) * Rename field to `wait_next_handshake` Make the name a bit more clear regarding to the field's purpose. * Move `MIN_PEER_CONNECTION_INTERVAL` to `constants` Move it to the `constants` module so that it is placed closer to other constants for consistency and to make it easier to see any relationships when changing them. * Rate limit calls to `CandidateSet::update()` This effectively rate limits requests asking for more peer addresses sent to the same peer. A new `min_next_crawl` field was added to `CandidateSet`, and `update` only sends requests for more peer addresses if the call happens after the instant specified by that field. After sending the requests, the field value is updated so that there is a `MIN_PEER_GET_ADDR_INTERVAL` wait time until the next `update` call sends requests again. * Include `update_initial` in rate limiting Move the rate limiting code from `update` to `update_timeout`, so that both `update` and `update_initial` get rate limited. * Test `CandidateSet::update` rate limiting Create a `CandidateSet` that uses a mocked `PeerService`. The mocked service always returns an empty list of peers, but it also checks that the requests only happen after expected instants, determined by the fanout amount and the rate limiting interval. * Refactor to create a `mock_peer_service` helper Move the code from the test to a utility function so that another test will be able to use it as well. * Check number of times service was called Use an `AtomicUsize` shared between the service and the test body that the service increments on every call. The test can then verify if the service was called the number of times it expected. * Test calling `update` after `update_initial` The call to `update` should be skipped because the call to `update_initial` should also be considered in the rate limiting. 
* Mention that call to `update` may be skipped Make it clearer that in this case the rate limiting causes calls to be skipped, and not that there's an internal sleep that happens. Also remove "to the same peers", because it's more general than that. Co-authored-by: teor <teor@riseup.net>	2021-06-09 09:42:45 +10:00
Janito Vaqueiro Ferreira Filho	aaef94c2bf	Prevent burst of reconnection attempts (#2251 ) * Rate-limit new outbound peer connections Set the rate-limiting sleep timer to use a delay added to the maximum between the next peer connection instant and now. This ensures that the timer always sleeps at least the time used for the delay. This change fixes rate-limiting new outbound peer connections, since before there could be a burst of attempts until the deadline progressed to the current instant. Fixes #2216 * Create `MetaAddr::alternate_node_strategy` helper Creates arbitrary `MetaAddr`s as if they were network nodes that sent their listening address. * Test outbound peer connection rate limiting Tests if connections are rate limited to 10 per second, and also tests that sleeping before continuing with the attempts still respects the rate limit and does not result in a burst of reconnection attempts.	2021-06-07 14:13:46 +10:00
teor	ce45198c17	Fix comment typo: overflow -> underflow	2021-06-01 16:44:45 +10:00
Janito Vaqueiro Ferreira Filho	3c9c920bbd	Test if validation offsets times in the future Use some mock gossiped peers that all have `last_seen` times in the future and check that they all have a specific offset applied to them.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	82452621e0	Remove empty list of peers check The `limit_last_seen_times` can now safely handle an empty list.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	966430d400	Update security note to be broader Focus on what can go wrong, and not on the specific causes. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	f3419b7baf	Handle overflow when applying offset If an overflow occurs, the reported `last_seen` times are either very wrong or malicious, so reject all addresses gossiped by that peer.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	5b8f33390c	Add comment to describe purpose Make it clear why all peers have the time offset applied to them. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	9eac43a8bb	Apply offset to all times received from a peer If any of the times gossiped by a peer are in the future, apply the necessary offset to all the times gossiped by that peer. This ensures that all gossiped peers from a malicious peer are moved further back in the queue. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	fa35c9b4f1	Only apply offset to times in the future Times in the past don't have any security implications, so there's no point in trying to apply the offset to them as well.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	876d515dd6	Improve documentation - Make the security impact clearer and in a separate section. - Instead of listing an assumption as almost a side-note, describe it clearly inside a `Panics` section. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	54809a1b89	Don't trust reported peer `last_seen` times Due to clock skew, the peers could end up at the front of the reconnection queue or far at the back. The solution to this is to offset the reported times by the difference between the most recent reported sight (in the remote clock) and the current time (in the local clock).	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	14ecc79f01	Use `DateTime32` in `validate_addrs`	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	b891a96a6d	Improve ergonomics by returning `impl Iterator` Returning `impl IntoIterator` means that the caller will always be forced to call `.into_iter()`, and returning `impl Iterator` still allows them to call `.into_iter()` because it becomes the identity function.	2021-06-01 03:42:08 -03:00
teor	2685fc746e	Remove CandidateSet state and add last seen time limit to candidate_set::validate_addrs (#2177 )	2021-05-21 02:21:13 +00:00
teor	752358d236	Fix some candidate set and meta addr doc links (#2174 ) Suggested by jvff.	2021-05-21 11:40:14 +10:00
teor	c7ea1395e7	Security: Fix CandidateSet timeout and fanout * Refactor: Split CandidateSet::update into separate functions * Security: Apply a timeout to the entire CandidateSet::update * Security: Stop using very large fanout limits during initialization Previously, Zebra used the number of resolved peer addresses. So it was possible for all peers to fail, and for Zebra to hang on the first update. And Zebra could send a fanout for each initial peer, regardless of whether their connection was successful. Also: - wait for at least one successful peer before trying an update - warn if there are no successful initial peers	2021-05-21 06:51:34 +10:00
teor	458c26f1e3	Limit initial candidate set fanout to the number of initial peers If there is a small number of initial peers, and they are slow, the initial candidate set update can appear to hang. To avoid this issue, limit the initial candidate set fanout to the number of initial peers. Once the initial peers have sent us more peer addresses, there is no need to limit the fanouts for future updates. Reported by Niklas Long of Equilibrium.	2021-05-18 07:54:03 +10:00
teor	0203d1475a	Refactor and document correctness for std::sync::Mutex<AddressBook>	2021-04-21 17:14:47 -04:00
teor	2ed8bb00cf	Clarify CandidateSet state diagram We get inbound connections on the listener port, but the important part is the inbound connection itself.	2021-04-21 01:37:43 -04:00
teor	375c8d8700	Fix a deadlock between the crawler and dialer, and other hangs (#1950 ) * Stop ignoring inbound message errors and handshake timeouts To avoid hangs, Zebra needs to maintain the following invariants in the handshake and heartbeat code: - each handshake should run in a separate spawned task (not yet implemented) - every message, error, timeout, and shutdown must update the peer address state - every await that depends on the network must have a timeout Once the Connection is created, it should handle timeouts. But we need to handle timeouts during handshake setup. * Avoid hangs by adding a timeout to the candidate set update Also increase the fanout from 1 to 2, to increase address diversity. But only return permanent errors from `CandidateSet::update`, because the crawler task exits if `update` returns an error. Also log Peers response errors in the CandidateSet. * Use the select macro in the crawler to reduce hangs The `select` function is biased towards its first argument, risking starvation. As a side-benefit, this change also makes the code a lot easier to read and maintain. * Split CrawlerAction::Demand into separate actions This refactor makes the code a bit easier to read, at the cost of sometimes blocking the crawler on `candidates.next()`. That's ok, because `next` only has a short (< 100 ms) delay. And we're just about to spawn a separate task for each handshake. * Spawn a separate task for each handshake This change avoids deadlocks by letting each handshake make progress independently. * Move the dial task into a separate function This refactor improves readability. * Fix buggy future::select function usage And document the correctness of the new code.	2021-04-07 10:25:10 -03:00
teor	1a159dfcb6	Add more methods for creating MetaAddrs This refactor lets us remove `MetaAddr::update_last_seen()`.	2021-03-26 07:23:49 +10:00
teor	6fe81d8992	Make MetaAddr.last_seen into a private field	2021-03-26 07:23:49 +10:00
teor	e50692bd51	CandidateSet: Add Listener Port Connections Inbound connections on the Zcash protocol listener port perform a handshake. If the handshake is successful, it adds the peer to the AddressBook.	2021-03-09 23:05:18 -05:00
Jane Lusby	03aa6f671f	Implement outbound connection rate limiting - includes config rename with alias (#1855 ) * Implement outbound connection rate limiting * fix breaking change on config Co-authored-by: teor <teor@riseup.net>	2021-03-10 01:36:05 +00:00
teor	5424e1d8ba	Fix candidate set address state handling (#1709 ) Design: - Add a `PeerAddrState` to each `MetaAddr` - Use a single peer set for all peers, regardless of state - Implement time-based liveness as an `AddressBook` method, rather than a `PeerAddrState` variant - Delete `AddressBook.by_state` Implementation: - Simplify `AddressBook` changes using `update` and `take` modifier methods - Simplify the `AddressBook` iterator implementation, replacing it with methods that are more obviously correct - Consistently collect peer set metrics Documentation: - Expand and update the peer set documentation We can optimise later, but for now we want simple code that is more obviously correct.	2021-02-18 11:18:32 +10:00
Jane Lusby	15698245e1	Deduplicate metrics dependencies (#1561 ) ## Motivation This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`. ## Solution This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up. ## Related Issues closes https://github.com/ZcashFoundation/zebra/issues/1349 ## Follow Up Work I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible. - https://github.com/ZcashFoundation/zebra/issues/1582 Co-authored-by: teor <teor@riseup.net>	2021-01-12 12:28:56 +10:00
teor	8e2f08221f	Add peer set tracing and unreachable panics (#1468 ) Add some extra tracing and panics to double-check our assumptions about the peer set state machine.	2020-12-14 11:00:39 +10:00
Deirdre Connolly	33afeb37cb	Add a comment about the short looo	2020-09-21 09:26:39 -07:00
Henry de Valence	6f3288814c	network: avoid GetPeers timeout to accelerate init The GetPeers requests sent while crawling the network are randomly load-balanced over available peers. But at the very beginning, they may be both routed to the same peer, causing network initialization to be delayed while the second one times out (since zcashd only ever responds to the first addr message). Only sending one GetPeers request per candidate set update means we crawl the network a little more slowly, but avoids hanging on start.	2020-09-21 09:26:39 -07:00
Henry de Valence	1d3892e1dc	network: rename alias to BoxError This is shorter and consistent with Tower (which is why we use it in the first place).	2020-09-18 18:34:25 -07:00

58 Commits