* Rewrite some state cache docs to clarify
* Add a zebra_network::Config.cache_dir for peer address caches
* Add new config test files and fix config test failure message
* Create some zebra-chain and zebra-network convenience functions
* Add methods for reading and writing the peer address cache
* Add cached disk peers to the initial peers list
* Add metrics and logging for loading and storing the peer cache
* Replace log of useless redacted peer IP addresses
* Limit the peer cache minimum and maximum size, don't write empty caches
* Add a cacheable_peers() method to the address book
* Add a peer disk cache updater task to the peer set tasks
* Document that the peer cache is shared by multiple instances unless configured otherwise
* Disable peer cache read/write in disconnected tests
* Make initial peer cache updater sleep shorter for tests
* Add unit tests for reading and writing the peer cache
* Update the task list in the start command docs
* Modify the existing persistent acceptance test to check for peer caches
* Update the peer cache directory when writing test configs
* Add a CacheDir type so the default config can be enabled, but tests can disable it
* Update tests to use the CacheDir config type
* Rename some CacheDir internals
* Add config file test cases for each kind of CacheDir config
* Panic if the config contains invalid socket addresses, rather than continuing
* Add a network directory to state cache directory contents tests
* Add new network.cache_dir config to the config parsing tests
* ZIPs were updated to remove ambiguity, this was tracked in #1267.
* #2105 was fixed by #3039 and #2379 was closed by #3069
* #2230 was a duplicate of #2231 which was closed by #2511
* #3235 was obsoleted by #2156 which was fixed by #3505
* #1850 was fixed by #2944, #1851 was fixed by #2961 and #2902 was fixed by #2969
* We migrated to Rust 2021 edition in Jan 2022 with #3332
* #1631 was closed as not needed
* #338 was fixed by #3040 and #1162 was fixed by #3067
* #2079 was fixed by #2445
* #4794 was fixed by #6122
* #1678 stopped being an issue
* #3151 was fixed by #3934
* #3204 was closed as not needed
* #1213 was fixed by #4586
* #1774 was closed as not needed
* #4633 was closed as not needed
* Clarify behaviour of difficulty spacing
Co-authored-by: teor <teor@riseup.net>
* Update comment to reflect implemented behaviour
Co-authored-by: teor <teor@riseup.net>
* Update comment to reflect implemented behaviour when retrying block downloads
Co-authored-by: teor <teor@riseup.net>
* Update `TODO` to remove closed issue and clarify when we might want to fix
Co-authored-by: teor <teor@riseup.net>
* Update `TODO` to remove closed issue and clarify what we might want to change in future
Co-authored-by: teor <teor@riseup.net>
* Clarify benefits of how we do block verification
Co-authored-by: teor <teor@riseup.net>
* Fix rustfmt errors
---------
Co-authored-by: teor <teor@riseup.net>
* Add a PeerSocketAddr type which hides its IP address, but shows the port
* Manually replace SocketAddr with PeerSocketAddr where needed
```sh
fastmod SocketAddr PeerSocketAddr zebra-network
```
* Add missing imports
* Make converting into PeerSocketAddr easier
* Fix some unused imports
* Add a canonical_peer_addr() function
* Fix connection handling for PeerSocketAddr
* Fix serialization for PeerSocketAddr
* Fix tests for PeerSocketAddr
* Remove some unused imports
* Fix address book listener handling
* Remove redundant imports and conversions
* Update outdated IPv4-mapped IPv6 address code
* Make addresses canonical when deserializing
* Stop logging peer addresses in RPC code
* Update zebrad tests with new PeerSocketAddr type
* Update zebra-rpc tests with new PeerSocketAddr type
---------
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Remove verbose continuous_blockchain test logs
* Downgrade verbose zebra-network logs to debug
* Downgrade some state logs to debug during tests
* Mark were we would add always-on log filters, if we needed to
* Reduce the number of mempool property tests, to reduce logging
* change `initial_mainnet_peers` and `initial_testnet_peers` type to `IndexSet`
* add tests for zebra config files
* add serde feature to indexmap
* remove async
* update config
* fix `stored_config_path()`
* skip tests if config is not found
* improve error
* use CARGO_MANIFEST_DIR
* remove `stored_config_is_newest` test
* move `stored_config_works` test to the end of `valid_generated_config_test`
* space
* use `humantime_serde` for config durations
* move debug config option to the bottom
* fix deserialization
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Fix the syntax of links in comments
* Fix a mistake in the docs
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Remove unnecessary angle brackets from a link
* Revert the changes for links that serve as references
* Revert "Revert the changes for links that serve as references"
This reverts commit 8b091aa9fa.
* Remove `<` `>` from links that serve as references
This reverts commit 046ef25620.
* Don't use `<` `>` in normal comments
* Don't use `<` `>` for normal comments
* Revert changes for comments starting with `//`
* Fix some warnings produced by `cargo doc`
* Fix some rustdoc warnings
* Fix some warnings
* Refactor some changes
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* fix(network): allow more inbound than outbound connections
* refactor(network): access constants using consistent paths
* fixup! fix(network): allow more inbound than outbound connections
* fixup! fixup! fix(network): allow more inbound than outbound connections
* refactor(network): convert to standard test module layout
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Add arti as a zebra-network dependency
* Add a method for isolated anonymised Tor connections to a specific hostname
* Add tests for isolated tor connections
* Use a shared tor client instance for all isolated connections
* Silence a spurious tor warning in tests
* Make tor support optional, activate it via a new "tor" feature
* Extra Cargo.lock changes
* fastmod AsyncReadWrite PeerTransport zebra*
* Remove unnecessary PeerTransport generics
* Refactor common test code into a function
* Don't drop the stream until the end of the test
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Tweak crawler timings so peers are more likely to be available
* Tweak min peer connection interval so we try all peers
* Let other tasks run between fanouts, so we're more likely to choose different peers
* Let other tasks run between retries, so we're more likely to choose different peers
* Let other tasks run after peer crawler DemandDrop
This makes it more likely that peers will become ready.
* Tweak a log message
* Only retry failed DNS once, then use the other DNS responses
* Limit broadcasts to half the peers
* Use a longer minimum interval for GetAddr requests
* Reduce the syncer and mempool crawler fanouts
* Stop resetting the mempool twice when it starts up
This spawns two crawlers, which send two fanouts,
so it can use up a lot of peers.
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Implement addr v1 serialization using a separate AddrV1 type
* Remove commented-out code
* Split the address serialization code into modules
* Reorder v1 and in_version fields in serialization order
* Fix a missed search-and-replace
* Explain conversion to MetaAddr
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Limit the number of outbound connections in the crawler
* Make zebra-network channel bounds depend on config.peerset_initial_target_size
* Bias Zebra towards outbound connections
And turn connection limits into `Config` methods.
* Downgrade some connection logs to debug
* Remove verbose or outdated fields in tracing logs
* Clarify connection limits
Includes:
- `fastmod OUTBOUND_PEER_BIAS_FRACTION OUTBOUND_PEER_BIAS_DENOMINATOR zebra*`
- clarify connection limit documentation
* Clarify inventory channel capacity
* Add zebra_network::initialize tests with limited numbers of peers
* Avoid cooperative async task starvation in the peer crawler and listener
If we don't yield in these loops, they can run for a long time before
tokio forces them to yield.
* Test the crawler with small connection limits
And use the multi-threaded runtime to avoid long hangs.
* Stop using the multi-threaded executor in tests where it's not needed
* Avoid starvation for every connection
Adds yields after inbound successes and initial peer connections.
* Add a crawler peer connection success test
* Add outbound connection limit tests
* Improve outbound tests
* Update some comments
* Add a mempool debug_enable_at_height config
* Rename a field in the mempool crawler
* Propagate syncer channel errors through the crawler
We don't want to ignore these errors, because they might indicate a shutdown.
(Or a bug that we should fix.)
* Use debug_enable_at_height in the mempool crawler
* Log when the mempool is activated or deactivated
* Deny unknown fields and apply defaults for all configs
* Move Duration last, as required for TOML tables
* Add a basic mempool acceptance test
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
* Add tracing and metrics for seed peer DNS resolution
* Add a grafana dashboard for seed peers
Currently this just shows the initial peer count from each seed.
* Add tracing and metrics for peer network protocol versions
* Update peers dashboard with network protocol versions
* Show peer network protocol versions for each seeder in dashboard
* Add per-seed filter to dashboard
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* README: update known issues
* Add ticket numbers
* Add network ports to README
* Make heading a bit clearer
* Update zebra listener address docs
Explain how Zebra currently uses listener addresses,
after recent changes.
* Gossip dynamically allocated listener ports to peers
Previously, Zebra would either gossip port `0`, which is invalid, or skip
gossiping its own dynamically allocated listener port.
* Improve "no configured peers" warning
And downgrade from error to warning, because inbound-only nodes are a
valid use case.
* Move random_known_port to zebra-test
* Add tests for dynamic local listener ports and the AddressBook
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Always send our local listener with the latest time
Previously, whenever there was an inbound request for peers, we would
clone the address book and update it with the local listener.
This had two impacts:
- the listener could conflict with an existing entry,
rather than unconditionally replacing it, and
- the listener was briefly included in the address book metrics.
As a side-effect, this change also makes sanitization slightly faster,
because it avoids some useless peer filtering and sorting.
* Skip listeners that are not valid for outbound connections
* Filter sanitized addresses Zebra based on address state
This fix correctly prevents Zebra gossiping client addresses to peers,
but still keeps the client in the address book to avoid reconnections.
* Add a full set of DateTime32 and Duration32 calculation methods
* Refactor sanitize to use the new DateTime32/Duration32 methods
* Security: Use canonical SocketAddrs to avoid duplicate connections
If we allow multiple variants for each peer address, we can make multiple
connections to that peer.
Also make sure sanitized MetaAddrs are valid for outbound connections.
* Test that address books contain the local listener address
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Allow use listen address in config without port
* update comments
* remove not used alias
* use Network::default_port
* Move tests and use toml instead json
* change error message
* Make match more readable
Co-authored-by: teor <teor@riseup.net>
Zebra avoids having a majority of addresses from a single peer by asking
3 peers for new addresses.
Also update a bunch of security comments and related documentation.
* Retry each peer DNS a few times individually
We retry each peer individually, as well as retrying if there are no
peers in the combined list.
DNS failures are correlated, so all peers can fail DNS, leaving Zebra
with a small list of custom-configured IP address peers.
Individual retries avoid this issue.
* Rename parse_peers to resolve_peers
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
* replace to_socket_addrs
* refactor `resolve()` into `resolve_host()`
* use `resolve_host()` to resolve config peers
* add DNS_LOOKUP_TIMEOUT constant
* don't block the main thread in initialize
Closes#536.
This removes:
- the user-agent (we can add a mechanism to specify extra BIP14 components later, if any users ask us for that feature);
- the EWMA parameters (these were put in the config just to avoid making a choice);
- the peer connection timeout (we can change the default value if anyone ever has a problem with it);
- the peer set request buffer size (setting this too low can make the application deadlock);
The new peer interval is left in.
Each subsection has to have `serde(default)` to get the behaviour we want
(delete all fields except the ones that have been changed); otherwise, we can
delete only entire sections.
Co-authored-by: Jane Lusby <jane@zfnd.org>
Prior to this change, the seed subcommand would consistently encounter a panic in one of the background tasks, but would continue running after the panic. This is indicative of two bugs.
First, zebrad was not configured to treat panics as non recoverable and instead defaulted to the tokio defaults, which are to catch panics in tasks and return them via the join handle if available, or to print them if the join handle has been discarded. This is likely a poor fit for zebrad as an application, we do not need to maximize uptime or minimize the extent of an outage should one of our tasks / services start encountering panics. Ignoring a panic increases our risk of observing invalid state, causing all sorts of wild and bad bugs. To deal with this we've switched the default panic behavior from `unwind` to `abort`. This makes panics fail immediately and take down the entire application, regardless of where they occur, which is consistent with our treatment of misbehaving connections.
The second bug is the panic itself. This was triggered by a duplicate entry in the initial_peers set. To fix this we've switched the storage for the peers from a `Vec` to a `HashSet`, which has similar properties but guarantees uniqueness of its keys.
tower-buffer uses tokio's mpsc channels, not the futures-rs mpsc channels.
Unlike futures-rs mpsc channels, which have capacity n+m, where n is the buffer
size and m is the number of senders, tokio channels always have buffer size n.
This means that the buffer size is shared across all peer set handles.
Thanks to @hawkw for sharing details of the Tokio internals!
The previous outbound peer connection logic got requests to connect to new
peers and processed them one at a time, making single connection attempts
and retrying if the connection attempt failed. This was quite slow, because
many connections fail, and we have to wait for timeouts. Instead, this logic
connects to new peers concurrently (up to 50 at a time).
The toml serializer function we are using -- maybe because of to_string_pretty
(?) barfs on structs that mix ordering of simple values and "tables", so just
keep all the Durations to the end.