* Implement addr v1 serialization using a separate AddrV1 type
* Remove commented-out code
* Split the address serialization code into modules
* Reorder v1 and in_version fields in serialization order
* Fix a missed search-and-replace
* Explain conversion to MetaAddr
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Add unused seed peers to the AddressBook
* Document a new `await`
We added an extra await on the AddressBook thread mutex.
Co-authored-by: teor <teor@riseup.net>
* Fix a typo
* Refactor names
* Return early from `limit_initial_peers`
* Add `proptest`s regressions
* Return `MetaAddr` instead of `None`
* Test if `zebra_network::init()` deadlocks
* Remove unneeded regressions
* Rename `TimestampCollector` to `AddressBookUpdater` (#2992)
* Rename `TimestampCollector` to `AddressBookUpdater`
* Update comments
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Move `all_peers` instead of copying them
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Make `Duration` a const
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Use a timeout instead of measuring the elapsed time
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Copy `initial_peers` instead of moving them
* Refactor the position of `NewInitial` and `new_initial`
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Update `tower` to version `0.4.9`
Update to latest version to add support for Tokio version 1.
* Replace usage of `ServiceExt::ready_and`
It was deprecated in favor of `ServiceExt::ready`.
* Update Tokio dependency to version `1.13.0`
This will break the build because the code isn't ready for the update,
but future commits will fix the issues.
* Replace import of `tokio::stream::StreamExt`
Use `futures::stream::StreamExt` instead, because newer versions of
Tokio don't have the `stream` feature.
* Use `IntervalStream` in `zebra-network`
In newer versions of Tokio `Interval` doesn't implement `Stream`, so the
wrapper types from `tokio-stream` have to be used instead.
* Use `IntervalStream` in `inventory_registry`
In newer versions of Tokio the `Interval` type doesn't implement
`Stream`, so `tokio_stream::wrappers::IntervalStream` has to be used
instead.
* Use `BroadcastStream` in `inventory_registry`
In newer versions of Tokio `broadcast::Receiver` doesn't implement
`Stream`, so `tokio_stream::wrappers::BroadcastStream` instead. This
also requires changing the error type that is used.
* Handle `Semaphore::acquire` error in `tower-batch`
Newer versions of Tokio can return an error if the semaphore is closed.
This shouldn't happen in `tower-batch` because the semaphore is never
closed.
* Handle `Semaphore::acquire` error in `zebrad` test
On newer versions of Tokio `Semaphore::acquire` can return an error if
the semaphore is closed. This shouldn't happen in the test because the
semaphore is never closed.
* Update some `zebra-network` dependencies
Use versions compatible with Tokio version 1.
* Upgrade Hyper to version 0.14
Use a version that supports Tokio version 1.
* Update `metrics` dependency to version 0.17
And also update the `metrics-exporter-prometheus` to version 0.6.1.
These updates are to make sure Tokio 1 is supported.
* Use `f64` as the histogram data type
`u64` isn't supported as the histogram data type in newer versions of
`metrics`.
* Update the initialization of the metrics component
Make it compatible with the new version of `metrics`.
* Simplify build version counter
Remove all constants and use the new `metrics::incement_counter!` macro.
* Change metrics output line to match on
The snapshot string isn't included in the newer version of
`metrics-exporter-prometheus`.
* Update `sentry` to version 0.23.0
Use a version compatible with Tokio version 1.
* Remove usage of `TracingIntegration`
This seems to not be available from `sentry-tracing` anymore, so it
needs to be replaced.
* Add sentry layer to tracing initialization
This seems like the replacement for `TracingIntegration`.
* Remove unnecessary conversion
Suggested by a Clippy lint.
* Update Cargo lock file
Apply all of the updates to dependencies.
* Ban duplicate tokio dependencies
Also ban git sources for tokio dependencies.
* Stop allowing sentry-tracing git repository in `deny.toml`
* Allow remaining duplicates after the tokio upgrade
* Use C: drive for CI build output on Windows
GitHub Actions uses a Windows image with two disk drives, and the
default D: drive is smaller than the C: drive. Zebra currently uses a
lot of space to build, so it has to use the C: drive to avoid CI build
failures because of insufficient space.
Co-authored-by: teor <teor@riseup.net>
* Limit the number of outbound connections in the crawler
* Make zebra-network channel bounds depend on config.peerset_initial_target_size
* Bias Zebra towards outbound connections
And turn connection limits into `Config` methods.
* Downgrade some connection logs to debug
* Remove verbose or outdated fields in tracing logs
* Clarify connection limits
Includes:
- `fastmod OUTBOUND_PEER_BIAS_FRACTION OUTBOUND_PEER_BIAS_DENOMINATOR zebra*`
- clarify connection limit documentation
* Clarify inventory channel capacity
* Add zebra_network::initialize tests with limited numbers of peers
* Avoid cooperative async task starvation in the peer crawler and listener
If we don't yield in these loops, they can run for a long time before
tokio forces them to yield.
* Test the crawler with small connection limits
And use the multi-threaded runtime to avoid long hangs.
* Stop using the multi-threaded executor in tests where it's not needed
* Avoid starvation for every connection
Adds yields after inbound successes and initial peer connections.
* Add a crawler peer connection success test
* Add outbound connection limit tests
* Improve outbound tests
* Wrap `Sleep` timer in a `Pin<Box<_>>`
The `Sleep` type doesn't implement `Unpin` in newer versions of Tokio.
* Wrap `Sleep` type in a `Pin<Box<_>>`
In newer Tokio versions the `Sleep` type doesn't implement `Unpin`, so
it needs to be manually pinned.
* Count the number of active inbound and outbound peer connections
And reduce the count when each connection fails.
* Fix a comment typo
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
* Add metrics gauges for the most recent peer network protocol version
This gague lets us join the initial seeds to the network protocol versions,
even if the peer upgrades and reconnects with a different version.
* Ensure dashboard peer network versions are unique
Otherwise, prometheus returns an error,
and the dashboard shows no data.
* Make seeder labels more readable
- put labels to the right of the graph
- remove default ports
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* Add tracing and metrics for seed peer DNS resolution
* Add a grafana dashboard for seed peers
Currently this just shows the initial peer count from each seed.
* Add tracing and metrics for peer network protocol versions
* Update peers dashboard with network protocol versions
* Show peer network protocol versions for each seeder in dashboard
* Add per-seed filter to dashboard
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
* Rename ChainTipReceiver to CurrentChainTip
`fastmod ChainTipReceiver CurrentChainTip zebra*`
* Update chain tip documentation and variable names
* Basic chain tip change implementation, without resets
Also includes the following name changes:
```
fastmod CurrentChainTip LatestChainTip zebra*
fastmod chain_tip_receiver latest_chain_tip zebra*
```
* Clarify the difference between `LatestChainTip` and `ChainTipChange`
* Rename BestTipHeight so it can be generalised to ChainTipSender
`fastmod BestTipHeight ChainTipSender zebra*`
For senders:
`fastmod best_tip_height chain_tip_sender zebra*`
For receivers:
`fastmod best_tip_height chain_tip_receiver zebra*`
* Rename best_tip_height module to chain_tip
* Wrap the chain tip watch channel in a ChainTipReceiver type
* Create a ChainTip trait to avoid tricky crate dependencies
And add convenience impls for optional and empty chain tips.
* Use the ChainTip trait in zebra-network
* Replace `Option<ChainTip>` with `NoChainTip`
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
`Message::Inv(TxId+)` is a transaction advertisement,
so it should be converted into `Request::AdvertiseTransactionIds`.
This is a copy-paste mistake from the original zebra-network
implementation.
* Rename internal network requests for wide transaction IDs
fastmod TransactionsByHash TransactionsById zebra*
fastmod AdvertiseTransactions AdvertiseTransactionIds zebra*
fastmod MempoolTransactions MempoolTransactionIds zebra*
fastmod TransactionHashes TransactionIds zebra*
* Update network transaction request/response comments
* Rename a transaction hash method for wide transaction IDs
fastmod transaction_hashes transaction_ids zebra-network
* Add UnminedTxId methods and conversions for InventoryHash
* Map WtxIds to unmined transaction network messages
Also, use UnminedTxId and UnminedTx in:
* Zebra's internal request and response format, and
* external Zcash network protocol messages.
* Enable WtxId mempool inventory tracking for peers
* Further clarify transaction IDs
* Use Witnessed rather than Wide for transaction IDs
And rename narrow to legacy when it only applies to v1-v4 transactions.
Otherwise, rename it to mined ID.
* Rename a missed binding
* Remove an incorrectly named binding
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Simplify state service initialization in test
Use the test helper function to remove redundant code.
* Create `BestTipHeight` helper type
This type abstracts away the calculation of the best tip height based on
the finalized block height and the best non-finalized chain's tip.
* Add `best_tip_height` field to `StateService`
The receiver endpoint is currently ignored.
* Return receiver endpoint from service constructor
Make it available so that the best tip height can be watched.
* Update finalized height after finalizing blocks
After blocks from the queue are finalized and committed to disk, update
the finalized block height.
* Update best non-finalized height after validation
Update the value of the best non-finalized chain tip block height after
a new block is committed to the non-finalized state.
* Update finalized height after loading from disk
When `FinalizedState` is first created, it loads the state from
persistent storage, and the finalized tip height is updated. Therefore,
the `best_tip_height` must be notified of the initial value.
* Update the finalized height on checkpoint commit
When a checkpointed block is commited, it bypasses the non-finalized
state, so there's an extra place where the finalized height has to be
updated.
* Add `best_tip_height` to `Handshake` service
It can be configured using the `Builder::with_best_tip_height`. It's
currently not used, but it will be used to determine if a connection to
a remote peer should be rejected or not based on that peer's protocol
version.
* Require best tip height to init. `zebra_network`
Without it the handshake service can't properly enforce the minimum
network protocol version from peers. Zebrad obtains the best tip height
endpoint from `zebra_state`, and the test vectors simply use a dummy
endpoint that's fixed at the genesis height.
* Pass `best_tip_height` to proto. ver. negotiation
The protocol version negotiation code will reject connections to peers
if they are using an old protocol version. An old version is determined
based on the current known best chain tip height.
* Handle an optional height in `Version`
Fallback to the genesis height in `None` is specified.
* Reject connections to peers on old proto. versions
Avoid connecting to peers that are on protocol versions that don't
recognize a network update.
* Document why peers on old versions are rejected
Describe why it's a security issue above the check.
* Test if `BestTipHeight` starts with `None`
Check if initially there is no best tip height.
* Test if best tip height is max. of latest values
After applying a list of random updates where each one either sets the
finalized height or the non-finalized height, check that the best tip
height is the maximum of the most recently set finalized height and the
most recently set non-finalized height.
* Add `queue_and_commit_finalized` method
A small refactor to make testing easier. The handling of requests for
committing non-finalized and finalized blocks is now more consistent.
* Add `assert_block_can_be_validated` helper
Refactor to move into a separate method some assertions that are done
before a block is validated. This is to allow moving these assertions
more easily to simplify testing.
* Remove redundant PoW block assertion
It's also checked in
`zebra_state::service::check::block_is_contextually_valid`, and it was
getting in the way of tests that received a gossiped block before
finalizing enough blocks.
* Create a test strategy for test vector chain
Splits a chain loaded from the test vectors in two parts, containing the
blocks to finalize and the blocks to keep in the non-finalized state.
* Test committing blocks update best tip height
Create a mock blockchain state, with a chain of finalized blocks and a
chain of non-finalized blocks. Commit all the blocks appropriately, and
verify that the best tip height is updated.
Co-authored-by: teor <teor@riseup.net>
* Support a min protocol version during initial block download
But don't actually use the state height yet.
Also rename some functions and constants.
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Security: stop gossiping failure and attempt times as last_seen times
Previously, Zebra had a single time field for peer addresses, which was
updated every time a peer was attempted, sent a message, or failed.
This is a security issue, because the `last_seen` time should be
"the last time [a peer] connected to that node", so that
"nodes can use the time field to avoid relaying old 'addr' messages".
So Zebra was sending incorrect peer information to other nodes.
As part of this change, we split the `last_seen` time into the
following fields:
- untrusted_last_seen: gossiped from other peers
- last_response: time we got a response from a directly connected peer
- last_attempt: time we attempted to connect to a peer
- last_failure: time a connection with a peer failed
* Implement Arbitrary and strategies for MetaAddrChange
Also replace the MetaAddr Arbitrary impl with a derive.
* Write proptests for MetaAddr and MetaAddrChange
MetaAddr:
- the only times that get included in serialized MetaAddrs are
the untrusted last seen and responded times
MetaAddrChange:
- the untrusted last seen time is never updated
- the services are only updated if there has been a handshake
Add canonical addresses from inbound connections to the address book,
so that Zebra can use them for reconnection attempts.
Use the newly added `NeverAttemptedAlternate` state for these addresses,
so we try gossiped addresses first, then canonical addresses. This avoids
duplicate connections to inbound peers.
* Instrument the crawl task
When we created the crawl task, we forgot to instrument it with the
global span. This fix makes sure that the git and network span appears on
crawl logs.
* Instrument the connector
* Improve handshake instrumentation
Make some spans debug, so there are not too many spans.
* Add the address to initial peer connection errors
- stop putting inbound addresses in the address book
- drop address book entries that can't be used for outbound connections
- distinguish between temporary inbound and permanent outbound peer
addresses
- also create variants to handle proxy connections
(but don't use them yet)
- avoid tracking connection state for isolated connections
- document security constraints for the address book and peer set
* Stop ignoring inbound message errors and handshake timeouts
To avoid hangs, Zebra needs to maintain the following invariants in the
handshake and heartbeat code:
- each handshake should run in a separate spawned task
(not yet implemented)
- every message, error, timeout, and shutdown must update the peer address state
- every await that depends on the network must have a timeout
Once the Connection is created, it should handle timeouts.
But we need to handle timeouts during handshake setup.
* Avoid hangs by adding a timeout to the candidate set update
Also increase the fanout from 1 to 2, to increase address diversity.
But only return permanent errors from `CandidateSet::update`, because
the crawler task exits if `update` returns an error.
Also log Peers response errors in the CandidateSet.
* Use the select macro in the crawler to reduce hangs
The `select` function is biased towards its first argument, risking
starvation.
As a side-benefit, this change also makes the code a lot easier to read
and maintain.
* Split CrawlerAction::Demand into separate actions
This refactor makes the code a bit easier to read, at the cost of
sometimes blocking the crawler on `candidates.next()`.
That's ok, because `next` only has a short (< 100 ms) delay. And we're
just about to spawn a separate task for each handshake.
* Spawn a separate task for each handshake
This change avoids deadlocks by letting each handshake make progress
independently.
* Move the dial task into a separate function
This refactor improves readability.
* Fix buggy future::select function usage
And document the correctness of the new code.