zebra

Commit Graph

Author	SHA1	Message	Date
Janito Vaqueiro Ferreira Filho	e8d5f6978d	Rate limit `GetAddr` messages to any peer, Credit: Equilibrium (#2254 ) * Rename field to `wait_next_handshake` Make the name a bit more clear regarding to the field's purpose. * Move `MIN_PEER_CONNECTION_INTERVAL` to `constants` Move it to the `constants` module so that it is placed closer to other constants for consistency and to make it easier to see any relationships when changing them. * Rate limit calls to `CandidateSet::update()` This effectively rate limits requests asking for more peer addresses sent to the same peer. A new `min_next_crawl` field was added to `CandidateSet`, and `update` only sends requests for more peer addresses if the call happens after the instant specified by that field. After sending the requests, the field value is updated so that there is a `MIN_PEER_GET_ADDR_INTERVAL` wait time until the next `update` call sends requests again. * Include `update_initial` in rate limiting Move the rate limiting code from `update` to `update_timeout`, so that both `update` and `update_initial` get rate limited. * Test `CandidateSet::update` rate limiting Create a `CandidateSet` that uses a mocked `PeerService`. The mocked service always returns an empty list of peers, but it also checks that the requests only happen after expected instants, determined by the fanout amount and the rate limiting interval. * Refactor to create a `mock_peer_service` helper Move the code from the test to a utility function so that another test will be able to use it as well. * Check number of times service was called Use an `AtomicUsize` shared between the service and the test body that the service increments on every call. The test can then verify if the service was called the number of times it expected. * Test calling `update` after `update_initial` The call to `update` should be skipped because the call to `update_initial` should also be considered in the rate limiting. * Mention that call to `update` may be skipped Make it clearer that in this case the rate limiting causes calls to be skipped, and not that there's an internal sleep that happens. Also remove "to the same peers", because it's more general than that. Co-authored-by: teor <teor@riseup.net>	2021-06-09 09:42:45 +10:00
Janito Vaqueiro Ferreira Filho	aaef94c2bf	Prevent burst of reconnection attempts (#2251 ) * Rate-limit new outbound peer connections Set the rate-limiting sleep timer to use a delay added to the maximum between the next peer connection instant and now. This ensures that the timer always sleeps at least the time used for the delay. This change fixes rate-limiting new outbound peer connections, since before there could be a burst of attempts until the deadline progressed to the current instant. Fixes #2216 * Create `MetaAddr::alternate_node_strategy` helper Creates arbitrary `MetaAddr`s as if they were network nodes that sent their listening address. * Test outbound peer connection rate limiting Tests if connections are rate limited to 10 per second, and also tests that sleeping before continuing with the attempts still respets the rate limit and does not result in a burst of reconnection attempts.	2021-06-07 14:13:46 +10:00
teor	ce45198c17	Fix comment typo: overflow -> underflow	2021-06-01 16:44:45 +10:00
Janito Vaqueiro Ferreira Filho	3c9c920bbd	Test if validation offsets times in the future Use some mock gossiped peers that all have `last_seen` times in the future and check that they all have a specific offset applied to them.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	82452621e0	Remove empty list of peers check The `limit_last_seen_times` can now safely handle an empty list.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	966430d400	Update security note to be broader Focus on what can go wrong, and not on the specific causes. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	f3419b7baf	Handle overflow when applying offset If an overflow occurs, the reported `last_seen` times are either very wrong or malicious, so reject all addresses gossiped by that peer.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	5b8f33390c	Add comment to describe purpose Make it clear why all peers have the time offset applied to them. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	9eac43a8bb	Apply offset to all times received from a peer If any of the times gossiped by a peer are in the future, apply the necessary offset to all the times gossiped by that peer. This ensures that all gossiped peers from a malicious peer are moved further back in the queue. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	fa35c9b4f1	Only apply offset to times in the future Times in the past don't have any security implications, so there's no point in trying to apply the offset to them as well.	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	876d515dd6	Improve documentation - Make the security impact clearer and in a separate section. - Instead of listing an assumption as almost a side-note, describe it clearly inside a `Panics` section. Co-authored-by: teor <teor@riseup.net>	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	54809a1b89	Don't trust reported peer `last_seen` times Due to clock skew, the peers could end up at the front of the reconnection queue or far at the back. The solution to this is to offset the reported times by the difference between the most recent reported sight (in the remote clock) and the current time (in the local clock).	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	14ecc79f01	Use `DateTime32` in `validate_addrs`	2021-06-01 03:42:08 -03:00
Janito Vaqueiro Ferreira Filho	b891a96a6d	Improve ergonomics by returning `impl Iterator` Returning `impl IntoIterator` means that the caller will always be forced to call `.into_iter()`, and returning `impl Iterator` still allows them to call `.into_iter()` because it becomes the identity function.	2021-06-01 03:42:08 -03:00
teor	2685fc746e	Remove CandidateSet state and add last seen time limit to candidate_set::validate_addrs (#2177 )	2021-05-21 02:21:13 +00:00
teor	752358d236	Fix some candidate set and meta addr doc links (#2174 ) Suggested by jvff.	2021-05-21 11:40:14 +10:00
teor	c7ea1395e7	Security: Fix CandidateSet timeout and fanout * Refactor: Split CandidateSet::update into separate functions * Security: Apply a timeout to the entire CandidateSet::update * Security: Stop using very large fanout limits during initialization Previously, Zebra used the number of resolved peer addresses. So it was possible for all peers to fail, and for Zebra to hang on the first update. And Zebra could send a fanout for each initial peer, regardless of whether their connection was successful. Also: - wait for at least one successful peer before trying an update - warn if there are no successful initial peers	2021-05-21 06:51:34 +10:00
teor	458c26f1e3	Limit initial candidate set fanout to the number of initial peers If there is a small number of initial peers, and they are slow, the initial candidate set update can appear to hang. To avoid this issue, limit the initial candidate set fanout to the number of initial peers. Once the initial peers have sent us more peer addresses, there is no need to limit the fanouts for future updates. Reported by Niklas Long of Equilibrium.	2021-05-18 07:54:03 +10:00
teor	0203d1475a	Refactor and document correctness for std::sync::Mutex<AddressBook>	2021-04-21 17:14:47 -04:00
teor	2ed8bb00cf	Clarify CandidateSet state diagram We get inbound connections on the listener port, but the important part is the inbound connection itself.	2021-04-21 01:37:43 -04:00
teor	375c8d8700	Fix a deadlock between the crawler and dialer, and other hangs (#1950 ) * Stop ignoring inbound message errors and handshake timeouts To avoid hangs, Zebra needs to maintain the following invariants in the handshake and heartbeat code: - each handshake should run in a separate spawned task (not yet implemented) - every message, error, timeout, and shutdown must update the peer address state - every await that depends on the network must have a timeout Once the Connection is created, it should handle timeouts. But we need to handle timeouts during handshake setup. * Avoid hangs by adding a timeout to the candidate set update Also increase the fanout from 1 to 2, to increase address diversity. But only return permanent errors from `CandidateSet::update`, because the crawler task exits if `update` returns an error. Also log Peers response errors in the CandidateSet. * Use the select macro in the crawler to reduce hangs The `select` function is biased towards its first argument, risking starvation. As a side-benefit, this change also makes the code a lot easier to read and maintain. * Split CrawlerAction::Demand into separate actions This refactor makes the code a bit easier to read, at the cost of sometimes blocking the crawler on `candidates.next()`. That's ok, because `next` only has a short (< 100 ms) delay. And we're just about to spawn a separate task for each handshake. * Spawn a separate task for each handshake This change avoids deadlocks by letting each handshake make progress independently. * Move the dial task into a separate function This refactor improves readability. * Fix buggy future::select function usage And document the correctness of the new code.	2021-04-07 10:25:10 -03:00
teor	1a159dfcb6	Add more methods for creating MetaAddrs This refactor lets us remove `MetaAddr::update_last_seen()`.	2021-03-26 07:23:49 +10:00
teor	6fe81d8992	Make MetaAddr.last_seen into a private field	2021-03-26 07:23:49 +10:00
teor	e50692bd51	CandidateSet: Add Listener Port Connections Inbound connections on the Zcash protocol listener port perform a handshake. If the handshake is successful, it adds the peer to the AddressBook.	2021-03-09 23:05:18 -05:00
Jane Lusby	03aa6f671f	Implement outbound connection rate limiting - includes config rename with alias (#1855 ) * Implement outbound connection rate limiting * fix breaking change on config Co-authored-by: teor <teor@riseup.net>	2021-03-10 01:36:05 +00:00
teor	5424e1d8ba	Fix candidate set address state handling (#1709 ) Design: - Add a `PeerAddrState` to each `MetaAddr` - Use a single peer set for all peers, regardless of state - Implement time-based liveness as an `AddressBook` method, rather than a `PeerAddrState` variant - Delete `AddressBook.by_state` Implementation: - Simplify `AddressBook` changes using `update` and `take` modifier methods - Simplify the `AddressBook` iterator implementation, replacing it with methods that are more obviously correct - Consistently collect peer set metrics Documentation: - Expand and update the peer set documentation We can optimise later, but for now we want simple code that is more obviously correct.	2021-02-18 11:18:32 +10:00
Jane Lusby	15698245e1	Deduplicate metrics dependencies (#1561 ) ## Motivation This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`. ## Solution This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up. ## Related Issues closes https://github.com/ZcashFoundation/zebra/issues/1349 ## Follow Up Work I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible. - https://github.com/ZcashFoundation/zebra/issues/1582 Co-authored-by: teor <teor@riseup.net>	2021-01-12 12:28:56 +10:00
teor	8e2f08221f	Add peer set tracing and unreachable panics (#1468 ) Add some extra tracing and panics to double-check our assumptions about the peer set state machine.	2020-12-14 11:00:39 +10:00
Deirdre Connolly	33afeb37cb	Add a comment about the short looo	2020-09-21 09:26:39 -07:00
Henry de Valence	6f3288814c	network: avoid GetPeers timeout to accelerate init The GetPeers requests sent while crawling the network are randomly load-balanced over available peers. But at the very beginning, they may be both routed to the same peer, causing network initialization to be delayed while the second one times out (since zcashd only ever responds to the first addr message). Only sending one GetPeers request per candidate set update means we crawl the network a little more slowly, but avoids hanging on start.	2020-09-21 09:26:39 -07:00
Henry de Valence	1d3892e1dc	network: rename alias to BoxError This is shorter and consistent with Tower (which is why we use it in the first place).	2020-09-18 18:34:25 -07:00
Jane Lusby	b6b35364f3	cleanup warnings throughout codebase	2020-05-27 15:42:29 -04:00
Henry de Valence	00edcae0c2	Add metrics for the crawler and candidate set.	2020-02-14 20:14:05 -05:00
Henry de Valence	2c0f48b587	Refactor connection logic and try a block request. Attempting to implement requests for block data revealed a problem with the previous connection logic. Block data is requested by sending a `getdata` message with hashes of the requested blocks; the peer responds with a sequence of `block` messages with the blocks themselves. However, this wasn't possible to handle with the previous connection logic, which could only convert a single Bitcoin message into a Response. Instead, we factor out the message handling logic into a Handler, which can statefully accumulate arbitrary data into a Response and signal completion. This is still pretty ugly but it does work. As a side effect, the HeartbeatNonceMismatch error is removed; because the Handler now tries to process messages until it comes to a Response, it just ignores mismatched nonces (and will eventually time out). The previous Mempool and Transaction requests were removed but could be re-added in a different form later. Also, the `Get` prefixes are removed from `Request` to tidy the name.	2020-02-10 09:03:56 -08:00
Henry de Valence	f04f4f0b98	Apply clippy fixes	2020-02-05 12:42:32 -08:00
Henry de Valence	da78603d3a	Rename `PeerClient` to `peer::Client`.	2019-11-27 23:53:36 -05:00
Henry de Valence	c3ec235a5b	Suppress unused import warnings.	2019-10-22 19:06:08 -07:00
Henry de Valence	e1a35490af	Move the CandidateSet to its own file. Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>	2019-10-22 19:06:08 -07:00
Henry de Valence	b1832ce593	Initial work to add a crawl-and-dial task. This responds to peerset demand by connecting to additional peers. Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>	2019-10-22 19:06:08 -07:00

39 Commits