* Add tracing and metrics for seed peer DNS resolution
* Add a grafana dashboard for seed peers
Currently this just shows the initial peer count from each seed.
* Add tracing and metrics for peer network protocol versions
* Update peers dashboard with network protocol versions
* Show peer network protocol versions for each seeder in dashboard
* Add per-seed filter to dashboard
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* README: update known issues
* Add ticket numbers
* Add network ports to README
* Make heading a bit clearer
* Update zebra listener address docs
Explain how Zebra currently uses listener addresses,
after recent changes.
* Gossip dynamically allocated listener ports to peers
Previously, Zebra would either gossip port `0`, which is invalid, or skip
gossiping its own dynamically allocated listener port.
* Improve "no configured peers" warning
And downgrade from error to warning, because inbound-only nodes are a
valid use case.
* Move random_known_port to zebra-test
* Add tests for dynamic local listener ports and the AddressBook
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Always send our local listener with the latest time
Previously, whenever there was an inbound request for peers, we would
clone the address book and update it with the local listener.
This had two impacts:
- the listener could conflict with an existing entry,
rather than unconditionally replacing it, and
- the listener was briefly included in the address book metrics.
As a side-effect, this change also makes sanitization slightly faster,
because it avoids some useless peer filtering and sorting.
* Skip listeners that are not valid for outbound connections
* Filter sanitized addresses Zebra based on address state
This fix correctly prevents Zebra gossiping client addresses to peers,
but still keeps the client in the address book to avoid reconnections.
* Add a full set of DateTime32 and Duration32 calculation methods
* Refactor sanitize to use the new DateTime32/Duration32 methods
* Security: Use canonical SocketAddrs to avoid duplicate connections
If we allow multiple variants for each peer address, we can make multiple
connections to that peer.
Also make sure sanitized MetaAddrs are valid for outbound connections.
* Test that address books contain the local listener address
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Allow use listen address in config without port
* update comments
* remove not used alias
* use Network::default_port
* Move tests and use toml instead json
* change error message
* Make match more readable
Co-authored-by: teor <teor@riseup.net>
Zebra avoids having a majority of addresses from a single peer by asking
3 peers for new addresses.
Also update a bunch of security comments and related documentation.
* Retry each peer DNS a few times individually
We retry each peer individually, as well as retrying if there are no
peers in the combined list.
DNS failures are correlated, so all peers can fail DNS, leaving Zebra
with a small list of custom-configured IP address peers.
Individual retries avoid this issue.
* Rename parse_peers to resolve_peers
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
* replace to_socket_addrs
* refactor `resolve()` into `resolve_host()`
* use `resolve_host()` to resolve config peers
* add DNS_LOOKUP_TIMEOUT constant
* don't block the main thread in initialize
Closes#536.
This removes:
- the user-agent (we can add a mechanism to specify extra BIP14 components later, if any users ask us for that feature);
- the EWMA parameters (these were put in the config just to avoid making a choice);
- the peer connection timeout (we can change the default value if anyone ever has a problem with it);
- the peer set request buffer size (setting this too low can make the application deadlock);
The new peer interval is left in.
Each subsection has to have `serde(default)` to get the behaviour we want
(delete all fields except the ones that have been changed); otherwise, we can
delete only entire sections.
Co-authored-by: Jane Lusby <jane@zfnd.org>
Prior to this change, the seed subcommand would consistently encounter a panic in one of the background tasks, but would continue running after the panic. This is indicative of two bugs.
First, zebrad was not configured to treat panics as non recoverable and instead defaulted to the tokio defaults, which are to catch panics in tasks and return them via the join handle if available, or to print them if the join handle has been discarded. This is likely a poor fit for zebrad as an application, we do not need to maximize uptime or minimize the extent of an outage should one of our tasks / services start encountering panics. Ignoring a panic increases our risk of observing invalid state, causing all sorts of wild and bad bugs. To deal with this we've switched the default panic behavior from `unwind` to `abort`. This makes panics fail immediately and take down the entire application, regardless of where they occur, which is consistent with our treatment of misbehaving connections.
The second bug is the panic itself. This was triggered by a duplicate entry in the initial_peers set. To fix this we've switched the storage for the peers from a `Vec` to a `HashSet`, which has similar properties but guarantees uniqueness of its keys.
tower-buffer uses tokio's mpsc channels, not the futures-rs mpsc channels.
Unlike futures-rs mpsc channels, which have capacity n+m, where n is the buffer
size and m is the number of senders, tokio channels always have buffer size n.
This means that the buffer size is shared across all peer set handles.
Thanks to @hawkw for sharing details of the Tokio internals!
The previous outbound peer connection logic got requests to connect to new
peers and processed them one at a time, making single connection attempts
and retrying if the connection attempt failed. This was quite slow, because
many connections fail, and we have to wait for timeouts. Instead, this logic
connects to new peers concurrently (up to 50 at a time).
The toml serializer function we are using -- maybe because of to_string_pretty
(?) barfs on structs that mix ordering of simple values and "tables", so just
keep all the Durations to the end.