- Add a custom semver match for `zebrad` versions
- Prefer "line contains string" matches, so tests ignore minor changes
- Escape regex meta-characters when a literal string match is intended
- Rename test functions so they are more precise
- Rewrite match internals to remove duplicate code and enable custom matches
- Document match functions
Rust atomics have an API that's very easy to use incorrectly, leading to
hard to find bugs. For that reason, it's best to avoid it unless there's
a good reason not to.
* Rename field to `wait_next_handshake`
Make the name a bit more clear regarding to the field's purpose.
* Move `MIN_PEER_CONNECTION_INTERVAL` to `constants`
Move it to the `constants` module so that it is placed closer to other
constants for consistency and to make it easier to see any relationships
when changing them.
* Rate limit calls to `CandidateSet::update()`
This effectively rate limits requests asking for more peer addresses
sent to the same peer. A new `min_next_crawl` field was added to
`CandidateSet`, and `update` only sends requests for more peer addresses
if the call happens after the instant specified by that field. After
sending the requests, the field value is updated so that there is a
`MIN_PEER_GET_ADDR_INTERVAL` wait time until the next `update` call
sends requests again.
* Include `update_initial` in rate limiting
Move the rate limiting code from `update` to `update_timeout`, so that
both `update` and `update_initial` get rate limited.
* Test `CandidateSet::update` rate limiting
Create a `CandidateSet` that uses a mocked `PeerService`. The mocked
service always returns an empty list of peers, but it also checks that
the requests only happen after expected instants, determined by the
fanout amount and the rate limiting interval.
* Refactor to create a `mock_peer_service` helper
Move the code from the test to a utility function so that another test
will be able to use it as well.
* Check number of times service was called
Use an `AtomicUsize` shared between the service and the test body that
the service increments on every call. The test can then verify if the
service was called the number of times it expected.
* Test calling `update` after `update_initial`
The call to `update` should be skipped because the call to
`update_initial` should also be considered in the rate limiting.
* Mention that call to `update` may be skipped
Make it clearer that in this case the rate limiting causes calls to be
skipped, and not that there's an internal sleep that happens.
Also remove "to the same peers", because it's more general than that.
Co-authored-by: teor <teor@riseup.net>
* Rate-limit new outbound peer connections
Set the rate-limiting sleep timer to use a delay added to the maximum
between the next peer connection instant and now. This ensures that the
timer always sleeps at least the time used for the delay.
This change fixes rate-limiting new outbound peer connections, since
before there could be a burst of attempts until the deadline progressed
to the current instant.
Fixes#2216
* Create `MetaAddr::alternate_node_strategy` helper
Creates arbitrary `MetaAddr`s as if they were network nodes that sent
their listening address.
* Test outbound peer connection rate limiting
Tests if connections are rate limited to 10 per second, and also tests
that sleeping before continuing with the attempts still respets the rate
limit and does not result in a burst of reconnection attempts.
* Standardise lints across Zebra crates, and add missing docs
The only remaining module with missing docs is `zebra_test::command`
* Todo -> TODO
* Clarify what a transcript ErrorChecker does
Also change `Error` -> `BoxError`
* TransError -> ExpectedTranscriptError
* Output Descriptions -> Output descriptions
Given a generated list of gossiped peers, ensure that after running the
`validate_addrs` function none of the resulting peers have a `last_seen`
time that's after the specified limit.
If the calculation to apply the compensation offset overflows or
underflows, the reported times are too distant apart, and could be sent
on purpose by a malicious peer, so all addresses from that peer should
be rejected.
Use some mock gossiped peers where some have `last_seen` times in the
past and some have times in the future. Check that all the peers have
an offset applied to them by the `validate_addrs` function.
This tests if the offset is applied to all peers that a malicious peer
gossiped to us.
Use some mock gossiped peers that all have `last_seen` times in the
past and check that they don't have any changes to the `last_seen` times
applied by the `validate_addrs` function.
If any of the times gossiped by a peer are in the future, apply the
necessary offset to all the times gossiped by that peer. This ensures
that all gossiped peers from a malicious peer are moved further back in
the queue.
Co-authored-by: teor <teor@riseup.net>
- Make the security impact clearer and in a separate section.
- Instead of listing an assumption as almost a side-note, describe it
clearly inside a `Panics` section.
Co-authored-by: teor <teor@riseup.net>
Due to clock skew, the peers could end up at the front of the
reconnection queue or far at the back. The solution to this is to offset
the reported times by the difference between the most recent reported
sight (in the remote clock) and the current time (in the local clock).
Returning `impl IntoIterator` means that the caller will always be
forced to call `.into_iter()`, and returning `impl Iterator` still
allows them to call `.into_iter()` because it becomes the identity
function.
Zebra assumes that deserialized times are always able to be serialized.
But this assumption is wrong because:
- sanitization can modify times
- gossiped `MetaAddr` validation can modify times
* Refactor: Split CandidateSet::update into separate functions
* Security: Apply a timeout to the entire CandidateSet::update
* Security: Stop using very large fanout limits during initialization
Previously, Zebra used the number of resolved peer addresses.
So it was possible for all peers to fail, and for Zebra to hang on the
first update.
And Zebra could send a fanout for each initial peer, regardless
of whether their connection was successful.
Also:
- wait for at least one successful peer before trying an update
- warn if there are no successful initial peers
When peers ask for peer addresses, add our local listener address to the
set of addresses, sanitize, then truncate. Sanitize shuffles addresses,
so if there are lots of addresses in the address book, our address will
only be sent to some peers.
Add canonical addresses from inbound connections to the address book,
so that Zebra can use them for reconnection attempts.
Use the newly added `NeverAttemptedAlternate` state for these addresses,
so we try gossiped addresses first, then canonical addresses. This avoids
duplicate connections to inbound peers.
If there is a small number of initial peers, and they are slow, the
initial candidate set update can appear to hang. To avoid this issue,
limit the initial candidate set fanout to the number of initial peers.
Once the initial peers have sent us more peer addresses, there is no need
to limit the fanouts for future updates.
Reported by Niklas Long of Equilibrium.
* Security: panic if an internally generated time is out of range
If Zebra has a bug where it generates blocks, transactions, or meta
addresses with bad times, panic. This avoids sending bad data onto the
network.
(Previously, Zebra would truncate some of these times, silently
corrupting the underlying data.)
Make it clear that deserialization of these objects is infalliable.
* Instrument the crawl task
When we created the crawl task, we forgot to instrument it with the
global span. This fix makes sure that the git and network span appears on
crawl logs.
* Instrument the connector
* Improve handshake instrumentation
Make some spans debug, so there are not too many spans.
* Add the address to initial peer connection errors