* build(deps): bump vergen from 3.2.0 to 5.1.1
* fix hardcoded version for Tracing struct
* add additional metadata
* remove extra allocations for metadata
* Remove zebrad code version from release checklist
The zebrad code automatically uses the crate version now.
* Sort panic metadata into rough categories
Co-authored-by: teor <teor@riseup.net>
Zebra's latest alpha checkpoints on Canopy activation, continues our work on NU5, and fixes a security issue.
Some notable changes include:
## Added
- Log address book metrics when PeerSet or CandidateSet don't have many peers (#1906)
- Document test coverage workflow (#1919)
- Add a final job to CI, so we can easily require all the CI jobs to pass (#1927)
## Changed
- Zebra has moved its mandatory checkpoint from Sapling to Canopy (#1898, #1926)
- This is a breaking change for users that depend on the exact height of the mandatory checkpoint.
## Fixed
- tower-batch: wake waiting workers on close to avoid hangs (#1908)
- Assert that pre-Canopy blocks use checkpointing (#1909)
- Fix CI disk space usage by disabling incremental compilation in coverage builds (#1923)
## Security
- Stop relying on unchecked length fields when preallocating vectors (#1925)
`node2.is_running()` can return `true` on Windows, even though `node2`
has logged a panic. This cleanup code only runs if `node2` fails to panic
and exit as expected. So it's ok for us to skip it.
See #1781 for details.
On Windows, if a process is killed after it is dead, it returns `true`
for `was_killed`. Instead, check if the process is running before killing
it.
Also make the section where processes are running as short as possible,
and include context for both processes in every error.
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.
Also report the configured and actual ports where possible, for better
diagnostics.
* Increase the conflict acceptance test launch delay
Also rename the tests - the listener is for the Zcash protocol,
but the state, metrics, and tracing are Zebra-specific.
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
This change encodes a bunch of invariants in the type system,
and adds explicit failure states for:
* a closed oneshot,
* bugs in the initialization code.
Use `ServiceExt::oneshot` to perform state requests.
Explain that `ServiceExt::call_all` calls `poll_ready` internally.
Document a state service invariant imposed by `ServiceExt::call_all`.
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflics
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests
* Add a full set of stderr acceptance test functions
This change makes the stdout and stderr acceptance test interfaces
identical.
* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path
* Use Display for state path logs
Avoids weird escaping on Windows when using Debug
* Add Windows conflict error messages
* Turn PORT_IN_USE_ERROR into a regex
And add another alternative Windows-specific port error
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>
* Bump versions where appropriate
Tested with cargo install --locked --path etc
* Remove fixed panics from 'Known Issues'
* Change to alpha release series in the README
Co-authored-by: teor <teor@riseup.net>
The clippy unknown lints attribute was deprecated in
nightly in rust-lang/rust#80524. The old lint name now produces a
warning.
Since we're using `allow(unknown_lints)` to suppress warnings, we need to
add the canonical name, so we can continue to build without warnings on
nightly.
But we also need to keep the old name, so we can continue to build
without warnings on stable.
And therefore, we also need to disable the "removed lints" warning,
otherwise we'll get warnings about the old name on nightly.
We'll need to keep this transitional clippy config until rustc 1.51 is
stable.
* Stop failing acceptance tests if their directories already exist
* Add an immutable config writing helper
and use it in the cached sapling acceptance tests.
Also:
* consistently create missing config and state directories
* refactor the common config writing code into a separate function
* only ignore NotFound errors in replace_config
* enforce config immutability using the type system
This timeout stops the sync service hanging when it is missing required
blocks, but the lookahead queue is full of dependent verify tasks, so the
missing blocks never get downloaded.
Check misconfigured ephemeral doesn't create a state dir
Add extra misconfigured `zebrad` ephemeral mode checks:
* doesn't create a state directory
* doesn't create unexpected files or directories in the working directory
Check ephemeral doesn't delete an existing state directory
Refactor all the ephemeral configs and checks into a single test
function.
Also:
* cleanup acceptance tests using utility functions
* make some checks consistent between tests
* make error messages consistent
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Add the configured network to error reports
* Log the configured network at error level
* Create the global span immediately after activating tracing
And leak the span guard, so the span is always active.
* Include panic metadata in the report and URL
* Use `Main` and `Test` in the global span
`net=Mainnet` is a bit redundant
When `cargo run` is run in the workspace directory, it can see two
executables:
- `zebrad`
- `zebra_checkpoints`
Adding `default-run = "zebrad"` to `zebrad/Cargo.toml` makes the
workspace run `zebrad` by default. (Even though it's redundant for the
`zebrad` crate itself.)
* Rewrite GetData handling to match the zcashd implementation
`zcashd` silently ignores missing blocks, but sends found transactions
followed by a `NotFound` message:
e7b425298f/src/main.cpp (L5497)
This is significantly different to the behaviour expected by the old
Zebra connection state machine, which expected `NotFound` for blocks.
Also change Zebra's GetData responses to peer request so they ignore
missing blocks.
* Stop hanging on incomplete transaction or block responses
Instead, if the peer sends an unexpected block, unexpected transaction,
or NotFound message:
1. end the request, and return a partial response containing any items
that were successfully received
2. if none of the expected blocks or transactions were received, return
an error, and close the connection
In our README, we tell users to ignore these errors, so we should also
disable the issue URL.
Also include the hash in the error. (We don't want the span active for
all messages, we just want the hash in the error.)
This change avoids errors when tests are cancelled and re-run within a
short period of time, for example, using `cargo watch`.
It introduces a slight risk of port conflicts between the endpoint tests,
and with (ephemeral) ports used by other services. The risk of conflicts
across 2 tests is very low, and tests should be run in an isolated
environment on busy servers.
vergen's implementation of REBUILD_ON_HEAD_CHANGE assumes that the .git
directory is in the crate root, but Zebra uses a workspace.
Temporary fix for rustyhorde/vergen#21.
Because the new version of the prometheus exporter launches its own
single-threaded runtime on a dedicated worker thread, there's no need
for the tokio and hyper versions it uses internally to align with the
versions used in other crates. So we don't need to use our fork with
tokio 0.3, and can just use the published alpha. Advancing to a later
alpha may fix the missing-metrics issues.
As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs.
To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs.
Co-authored-by: teor <teor@riseup.net>
* implement inbound `FindBlocks`
* Handle inbound peer FindHeaders requests
* handle request before having any chain tip
* Split `find_chain_hashes` into smaller functions
Add a `max_len` argument to support `FindHeaders` requests.
Rewrite the hash collection code to use heights, so we can handle the
`stop` hash and "no intersection" cases correctly.
* Split state height functions into "any chain" and "best chain"
* Rename the best chain block method to `best_block`
* Move fmt utilities to zebra_chain::fmt
* Summarise Debug for some Message variants
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
This provides useful and not too noisy output at INFO level. We do an
info-level message on every block commit instead of trying to do one
message every N blocks, because this is useful both for initial block
sync as well as continuous state updates on new blocks.
The metrics code becomes much simpler because the current version of the
metrics crate builds its own single-threaded runtime on a dedicated worker
thread, so no dependency on the main Zebra Tokio runtime is required.
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware. This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method. See
ddc64e8d4d
for more context on the Tower changes. To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
## Motivation
Prior to this PR we've been using `sled` as our database for storing persistent chain data on the disk between boots. We picked sled over rocksdb to minimize our c++ dependencies despite it being a less mature codebase. The theory was if it worked well enough we'd prefer to have a pure rust codebase, but if we ever ran into problems we knew we could easily swap it out with rocksdb.
Well, we ran into problems. Sled's memory usage was particularly high, and it seemed to be leaking memory. On top of all that, the performance for writes was pretty poor, causing us to become bottle-necked on sled instead of the network.
## Solution
This PR replaces `sled` with `rocksdb`. We've seen a 10x improvement in memory usage out of the box, no more leaking, and much better write performance. With this change writing chain data to disk is no longer a limiting factor in how quickly we can sync the chain.
The code in this pull request has:
- [x] Documentation Comments
- [x] Unit Tests and Property Tests
## Review
@hdevalence