Commit Graph

475 Commits

Author SHA1 Message Date
teor 74cc30c307 Change the cached sync tests to canopy
This change requires a cached state rebuild. The rebuilt state will be
significantly larger.
2021-03-18 10:13:47 +10:00
Alfredo Garcia d49eaab68e
Bump versions for zebrad 1.0.0-alpha.4 (#1913)
* Bump versions for zebrad 1.0.0-alpha.4

* add Cargo.lock
2021-03-16 21:12:37 -03:00
Jack Grigg bae9a7ecd5 Expose binary data in metrics
This enables slicing and aggregating metrics based on zebrad version:
https://www.robustperception.io/exposing-the-software-version-to-prometheus
2021-03-17 09:38:07 +10:00
dependabot[bot] b618f5b522 build(deps): bump tracing-subscriber from 0.2.16 to 0.2.17
Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.16 to 0.2.17.
- [Release notes](https://github.com/tokio-rs/tracing/releases)
- [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.16...tracing-subscriber-0.2.17)

Signed-off-by: dependabot[bot] <support@github.com>
2021-03-14 19:37:24 -04:00
teor 1e1859f5a3
Merge pull request #1887 from ZcashFoundation/revert-1877-revert-1789-large-sync-testnet-disable
Revert "Revert "Disable unreliable `sync_large_checkpoints_testnet`""

This disables the failing large testnet sync test.
2021-03-12 12:31:19 +10:00
teor d494af1e90 Document how the syncer resists memory DoS 2021-03-11 06:24:46 -05:00
teor c6358b157c Reduce inbound concurrency to limit memory usage
Inbound malicious blocks can use a large amount of RAM when
deserialized. Limit inbound concurrency, so that the total amount
of RAM remains small.
2021-03-11 06:24:46 -05:00
teor 475deaf655
Adjust the crawl interval and acceptance test timeout (#1878) 2021-03-11 07:53:37 +10:00
teor ac4611ffc4 Revert "Disable unreliable `sync_large_checkpoints_testnet` (#1789)"
This reverts commit bae49e54df.
2021-03-10 02:14:09 -05:00
dependabot[bot] 70327dc9f5 build(deps): bump once_cell from 1.6.0 to 1.7.0
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.6.0 to 1.7.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.6.0...v1.7.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-25 15:32:03 -05:00
teor fb6acfaff7
Update the acceptance test port range (#1812)
Windows can reserve or use ports up to 53500.

Windows and macOS sequentially allocate ephemeral ports,
starting at 41952.
2021-02-25 09:27:56 +10:00
dependabot[bot] dab65b33eb build(deps): bump once_cell from 1.5.2 to 1.6.0
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.5.2 to 1.6.0.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.5.2...v1.6.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-23 10:59:40 -05:00
teor 7558f74c78 Bump versions for zebrad 1.0.0-alpha.3 2021-02-23 10:39:13 -05:00
dependabot[bot] 304d7682f5 build(deps): bump tracing-subscriber from 0.2.15 to 0.2.16
Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.15 to 0.2.16.
- [Release notes](https://github.com/tokio-rs/tracing/releases)
- [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.15...tracing-subscriber-0.2.16)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-22 01:38:42 -05:00
Alfredo Garcia bae49e54df
Disable unreliable `sync_large_checkpoints_testnet` (#1789)
* delete `sync_large_checkpoints_testnet`
2021-02-19 21:40:01 +00:00
teor b0bc4a79c9 Disable conflict failure cleanup on macOS 2021-02-20 06:22:11 +10:00
teor af12f20732 Re-enable macOS conflict tests
We disabled these tests pending #1613. But the comment incorrectly said
we were waiting for #1631.
2021-02-20 06:22:11 +10:00
teor c51fd688ee Skip node2.is_running() on Windows
`node2.is_running()` can return `true` on Windows, even though `node2`
has logged a panic. This cleanup code only runs if `node2` fails to panic
and exit as expected. So it's ok for us to skip it.

See #1781 for details.
2021-02-19 18:03:07 +10:00
teor 631fe22422 Fix conflict test node termination
On Windows, if a process is killed after it is dead, it returns `true`
for `was_killed`. Instead, check if the process is running before killing
it.

Also make the section where processes are running as short as possible,
and include context for both processes in every error.
2021-02-19 18:03:07 +10:00
teor bcb9cead88
Kill running acceptance test nodes on error (#1770)
And show the output from those nodes.

These changes help us diagnose errors that happen while one or more
acceptance test nodes are running.
2021-02-19 06:46:27 +10:00
teor e61b5e50a2
Diagnostics for CI port conflict failures (#1766)
Log a "Trying..." message before each listener opens, to see if the
delay is inside Zebra, or in the test harness or OS.

Also report the configured and actual ports where possible, for better
diagnostics.
2021-02-18 12:15:09 -03:00
teor 972103d797 Fix tracing macro syntax 2021-02-17 11:09:22 -05:00
teor 253d1c02b3 Make sync logging a bit less verbose
And tweak some log content
2021-02-17 11:09:22 -05:00
teor cc7d5bd2ad
Update comments for the inbound service (#1740) 2021-02-16 06:14:40 +10:00
teor 372a432179
Update the call_all comment in Inbound (#1737) 2021-02-16 06:14:16 +10:00
teor e8d8a49f41
Increase the conflict acceptance test launch delay (#1736)
* Increase the conflict acceptance test launch delay

Also rename the tests - the listener is for the Zcash protocol,
but the state, metrics, and tracing are Zebra-specific.

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-02-16 05:52:30 +10:00
dependabot[bot] af11cc4815 build(deps): bump vergen from 3.1.0 to 3.2.0
Bumps [vergen](https://github.com/rustyhorde/vergen) from 3.1.0 to 3.2.0.
- [Release notes](https://github.com/rustyhorde/vergen/releases)
- [Commits](https://github.com/rustyhorde/vergen/compare/v3.1.0...v3.2.0)

Signed-off-by: dependabot[bot] <support@github.com>
2021-02-10 21:36:03 -05:00
teor 0b76352468
Document a state_contains bug (#1715)
* Document a state_contains bug in the syncer and Inbound
2021-02-10 09:05:14 +10:00
Deirdre Connolly 0c5daa8410 Bump versions for zebrad 1.0.0-alpha.2
Including tower-batch bump to 0.2.0, tower-fallback to 0.2.0, zebra-script to 1.0.0-alpha.3
2021-02-09 16:14:29 -05:00
teor dce11358d7
Log when the syncer awaits peer readiness (#1714) 2021-02-10 07:09:27 +10:00
Alfredo Garcia d7c40af2a8
Fix shutdown panics (#1637)
* add a shutdown flag in zebra_chain::shutdown
* fix network panic on shutdown
* fix checkpoint panic on shutdown
2021-02-03 19:03:28 +10:00
teor 6679a124e3 Require Inbound setup handlers to provide a result
Rather than having them default to `Ok(())`, which is incorrect
for some error handlers.
2021-02-03 08:32:10 +10:00
teor 09c8c89462 Make sure FailedInit never escapes Inbound::poll_ready 2021-02-03 08:32:10 +10:00
teor 134a5e78bd Consistently use `network_setup` for the Inbound Setup 2021-02-03 08:32:10 +10:00
teor 1c8362fe01 Remove unused imports 2021-02-03 08:32:10 +10:00
Jane Lusby 4cf331562c combine network setup into an exhaustive match 2021-02-03 08:32:10 +10:00
Jane Lusby 4d6ef89248 avoid using async blocks to avoid lifetime bug with generators 2021-02-03 08:32:10 +10:00
Jane Lusby 685a592399 Add clonable wrapper around TryRecvError 2021-02-03 08:32:10 +10:00
teor 6ffeb670ed Log the failed response in an unreachable panic 2021-02-03 08:32:10 +10:00
teor eac4fd181a Add a Setup enum to manage Inbound network setup internal state
This change encodes a bunch of invariants in the type system,
and adds explicit failure states for:
* a closed oneshot,
* bugs in the initialization code.
2021-02-03 08:32:10 +10:00
teor 32b032204a Consistently return Response::Nil during setup
And log an info-level message as a diagnostic, in case setup takes a
long time.
2021-02-03 08:32:10 +10:00
teor 94eb91305b Stop using ServiceExt::call_all due to buffer bugs
ServiceExt::call_all leaks Tower::Buffer reservations, so we can't use
it in Zebra.

Instead, use a loop in the returned future.

See #1593 for details.
2021-02-03 08:32:10 +10:00
teor 64bc45cd2e Fix state readiness hangs for Inbound
Use `ServiceExt::oneshot` to perform state requests.

Explain that `ServiceExt::call_all` calls `poll_ready` internally.
Document a state service invariant imposed by `ServiceExt::call_all`.
2021-02-03 08:32:10 +10:00
teor 4d1a2fd02e Make the Inbound invariant clearer 2021-02-03 08:32:10 +10:00
teor 2a25b9ee72 Remove services that are never `call`ed from Inbound
Uses the `ServiceExt::oneshot` design pattern from #1593.
2021-02-03 08:32:10 +10:00
Alfredo Garcia 4b34482264
Add hints to port conflict and lock file panics (#1535)
* add hint for port error
* add issue filter for port panic
* add lock file hint
* add metrics endpoint port conflict hint
* add hint for tracing endpoint port conflict
* add acceptance test for resource conflics
* Split out common conflict test code into a function
* Add state, metrics, and tracing conflict tests

* Add a full set of stderr acceptance test functions

This change makes the stdout and stderr acceptance test interfaces
identical.

* move Zcash listener opening
* add todo about hint for disk full
* add constant for lock file
* match path in state cache
* don't match windows cache path

* Use Display for state path logs

Avoids weird escaping on Windows when using Debug

* Add Windows conflict error messages

* Turn PORT_IN_USE_ERROR into a regex

And add another alternative Windows-specific port error

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jane@zfnd.org>
2021-01-29 22:36:33 +10:00
teor 24f1b9bad1
Document the Inbound service in the start module (#1653) 2021-01-29 22:19:06 +10:00
teor 21b0360114 Limit concurrent inbound gossipped block requests
Uses the "load shed directly" design pattern from #1618.
2021-01-29 11:02:26 +10:00
teor 3d9888f736 Rewrite a sync comment 2021-01-29 11:02:26 +10:00
Deirdre Connolly 1b09538277
Bump versions for zebrad 1.0.0-alpha.1 (#1646)
* Bump versions where appropriate

Tested with cargo install --locked --path etc

* Remove fixed panics from 'Known Issues'

* Change to alpha release series in the README

Co-authored-by: teor <teor@riseup.net>
2021-01-27 20:31:39 -05:00
teor 391c53aa60 Move BoxError to zebrad's lib.rs
For consistency with other crates.
2021-01-27 12:14:27 -08:00
teor 494a5130c1 Fix clippy "unnecessary Vec::push" lints 2021-01-22 11:51:30 -08:00
teor 00c3ad5d8e Fix clippy "unnecessary Ok" lints 2021-01-22 11:51:30 -08:00
teor eadbea1423 Construct structs with ..default() to avoid a lint 2021-01-19 11:02:20 -05:00
teor b1d28b73fd Stop disabling lints that no longer cause warnings on nightly 2021-01-19 11:02:20 -05:00
teor 258789ed9b Use the rustc unknown lints attribute
The clippy unknown lints attribute was deprecated in
nightly in rust-lang/rust#80524. The old lint name now produces a
warning.

Since we're using `allow(unknown_lints)` to suppress warnings, we need to
add the canonical name, so we can continue to build without warnings on
nightly.

But we also need to keep the old name, so we can continue to build
without warnings on stable.

And therefore, we also need to disable the "removed lints" warning,
otherwise we'll get warnings about the old name on nightly.

We'll need to keep this transitional clippy config until rustc 1.51 is
stable.
2021-01-19 11:02:20 -05:00
teor 8a7b0232f0
Stop failing acceptance tests if their directories already exist (#1588)
* Stop failing acceptance tests if their directories already exist
* Add an immutable config writing helper
  and use it in the cached sapling acceptance tests.

Also:
  * consistently create missing config and state directories
  * refactor the common config writing code into a separate function
  * only ignore NotFound errors in replace_config
  * enforce config immutability using the type system
2021-01-14 16:59:06 +10:00
teor 9cdf41f5f4
Panic if the lookahead limit is misconfigured (#1589) 2021-01-14 14:06:30 +10:00
teor 92d95d4be5 Refactor inbound members into a consistent order
And add download comments
2021-01-13 20:46:25 -05:00
teor fb76eb2e6b Add download and verify timeouts to the inbound service 2021-01-13 20:46:25 -05:00
teor 973aec8ccc Refactor sync members into a consistent order
And add comments about correctness and usage.
2021-01-13 20:46:25 -05:00
teor c2893dce51 Warn when the user's configured lookahead limit is ignored 2021-01-13 20:46:25 -05:00
teor 3699bbdae6 Add some additional sync correctness constraints
And adjust the sync restart delay as a consequence.
2021-01-13 20:46:25 -05:00
teor cef0a492d8 Add a timeout to sync service block verification
This timeout stops the sync service hanging when it is missing required
blocks, but the lookahead queue is full of dependent verify tasks, so the
missing blocks never get downloaded.
2021-01-13 20:46:25 -05:00
teor 730910cd99 Upgrade to tokio 0.3.6 from crates.io
And remove the tokio git dependency patch
2021-01-12 15:37:27 -05:00
dependabot[bot] e3e9990315 build(deps): bump inferno from 0.10.2 to 0.10.3
Bumps [inferno](https://github.com/jonhoo/inferno) from 0.10.2 to 0.10.3.
- [Release notes](https://github.com/jonhoo/inferno/releases)
- [Changelog](https://github.com/jonhoo/inferno/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jonhoo/inferno/compare/v0.10.2...v0.10.3)

Signed-off-by: dependabot[bot] <support@github.com>
2021-01-12 00:53:49 -05:00
teor caca450904
zebrad acceptance test cleanup (#1560)
Check misconfigured ephemeral doesn't create a state dir

Add extra misconfigured `zebrad` ephemeral mode checks:
* doesn't create a state directory
* doesn't create unexpected files or directories in the working directory

Check ephemeral doesn't delete an existing state directory

Refactor all the ephemeral configs and checks into a single test
function.

Also:
* cleanup acceptance tests using utility functions
* make some checks consistent between tests
* make error messages consistent

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2021-01-12 12:26:08 +10:00
teor c75cbdea79
Log configured network in every log message (#1568)
* Add the configured network to error reports
* Log the configured network at error level
* Create the global span immediately after activating tracing
And leak the span guard, so the span is always active.

* Include panic metadata in the report and URL
* Use `Main` and `Test` in the global span
`net=Mainnet` is a bit redundant
2021-01-12 07:46:56 +10:00
teor 40c427cf7c Make `cargo run` use `zebrad` by default
When `cargo run` is run in the workspace directory, it can see two
executables:
- `zebrad`
- `zebra_checkpoints`

Adding `default-run = "zebrad"` to `zebrad/Cargo.toml` makes the
workspace run `zebrad` by default. (Even though it's redundant for the
`zebrad` crate itself.)
2021-01-08 15:14:08 -08:00
teor b1f14f47c6
Rewrite GetData handling to match the zcashd implementation (#1518)
* Rewrite GetData handling to match the zcashd implementation

`zcashd` silently ignores missing blocks, but sends found transactions
followed by a `NotFound` message:
e7b425298f/src/main.cpp (L5497)

This is significantly different to the behaviour expected by the old
Zebra connection state machine, which expected `NotFound` for blocks.

Also change Zebra's GetData responses to peer request so they ignore
missing blocks.

* Stop hanging on incomplete transaction or block responses

Instead, if the peer sends an unexpected block, unexpected transaction,
or NotFound message:
1. end the request, and return a partial response containing any items
   that were successfully received
2. if none of the expected blocks or transactions were received, return
   an error, and close the connection
2021-01-04 13:25:35 +10:00
Deirdre Connolly f5b3412a50 Get PathBuf even if /zebrad-cache exists for create_cached_database_height() 2020-12-30 21:32:55 -05:00
teor 6cd7f255d4
Cleanup acceptance tests using utility functions (#1537)
And add a missing network check.
2020-12-20 09:06:18 +10:00
teor 3355be4c41
Improve the random port function docs (#1536)
Also:
* rename the function
* create an alternative function for the common case.
2020-12-18 11:32:33 +10:00
teor 69fcf64d6c
Disable issue URLs for "duplicate hash" errors (#1517)
In our README, we tell users to ignore these errors, so we should also
disable the issue URL.

Also include the hash in the error. (We don't want the span active for
all messages, we just want the hash in the error.)
2020-12-16 08:14:42 +10:00
teor 008577561c Use a sleep future in the async acceptance tests
And wait slightly longer for `zebrad` to launch.

These fixes should reduce the failure rate of the acceptance tests on
busy machines.
2020-12-16 08:09:48 +10:00
teor 1b6bf7f105 Use random ports in the acceptance tests
This change avoids errors when tests are cancelled and re-run within a
short period of time, for example, using `cargo watch`.

It introduces a slight risk of port conflicts between the endpoint tests,
and with (ephemeral) ports used by other services. The risk of conflicts
across 2 tests is very low, and tests should be run in an isolated
environment on busy servers.
2020-12-16 08:09:48 +10:00
Alfredo Garcia 41833340c1
downgrade remaining version strings to 1.0.0-alpha.0 (#1488) 2020-12-15 11:21:00 +10:00
Deirdre Connolly 2d1698a120 Comment out Sentry stacktraces for now
While panic = abort, Sentry collects the same one-line stack trace for all panics,
making it incorrectly dedupe different errors into one.
2020-12-12 13:26:52 -05:00
Deirdre Connolly cff28f7ac8 Use the commit sha as the sentry release 2020-12-09 13:06:18 -05:00
Jane Lusby 400213e2b3 integrate sentry with our existing panic reporting logic 2020-12-09 13:06:18 -05:00
Deirdre Connolly f1ec1d626d Tidy for now 2020-12-09 13:06:18 -05:00
Deirdre Connolly 44e1051dee Debug 2020-12-09 13:06:18 -05:00
Deirdre Connolly 8b268e3f71 Don't keep guard around 2020-12-09 13:06:18 -05:00
Deirdre Connolly 25f6fd25b3 Test catching panic 2020-12-09 13:06:18 -05:00
Deirdre Connolly 6a17549945 Try sentry-tracing integration 2020-12-09 13:06:18 -05:00
Deirdre Connolly c03a3a2606 Pull DSN from runtime env, enable Sentry debug mode with RUST_LOG=debug 2020-12-09 13:06:18 -05:00
Deirdre Connolly 27e42f4ed5 Set up Sentry error collection via a feature flag 2020-12-09 13:06:18 -05:00
Deirdre Connolly 47d78d4cf4 Try sentry::init() 2020-12-09 13:06:18 -05:00
teor 16ffb1dbbf
Disable issue URLs on all timeouts (#1470)
This change helps prevent spurious bug reports.
2020-12-08 07:47:01 +10:00
teor 531a33f03b
Update the zebrad commit whenever any Zebra crate changes (#1455)
vergen's implementation of REBUILD_ON_HEAD_CHANGE assumes that the .git
directory is in the crate root, but Zebra uses a workspace.

Temporary fix for rustyhorde/vergen#21.
2020-12-05 07:23:05 +10:00
teor b4a50fd99f
Downgrade tokio to 0.3.4 to avoid a time wheel panic (#1453)
See tokio-rs/tokio#2789 for details. We were seeing this panic during
normal operation, not just at shutdown.
2020-12-04 13:52:37 +10:00
Jane Lusby ef7e91c3c7
disable color-eyre colors if not connected to a tty (#1443)
* disable color-eyre colors if not connected to a tty
* check if color is disabled
2020-12-04 11:05:25 +10:00
dependabot[bot] 8c052cc39a build(deps): bump color-eyre from 0.5.9 to 0.5.10
Bumps [color-eyre](https://github.com/yaahc/color-eyre) from 0.5.9 to 0.5.10.
- [Release notes](https://github.com/yaahc/color-eyre/releases)
- [Changelog](https://github.com/yaahc/color-eyre/blob/v0.5.10/CHANGELOG.md)
- [Commits](https://github.com/yaahc/color-eyre/compare/v0.5.9...v0.5.10)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-03 10:55:16 -05:00
Jane Lusby 90f944709b
fix git commit logic to work on gcloud (#1442) 2020-12-03 15:18:55 +10:00
teor c0bbac89b3 `cargo fmt --all` 2020-12-02 19:45:27 -05:00
teor e4525d8ee2 Improve metrics acceptance test failure messages 2020-12-02 19:45:27 -05:00
Deirdre Connolly 9221dddb7b Revert "zebrad: remove git pin on metrics dependency"
It broke our metrics endpoint.

This reverts commit dc77163524.
2020-12-02 17:28:49 -05:00
Jane Lusby d7bef1c155
bump color-eyre version to avoid a panic when printing spantraces (#1438) 2020-12-02 14:16:18 -08:00
teor 0e42d8b6c1 Always enable color_eyre, even when color is disabled
We want to automatically disable colors upstream in color_eyre,
and add a config that allows users to always turn off color.
2020-12-02 10:25:44 -08:00
teor bed34168c1 Automatically disable abscissa colors and color_eyre when writing to a file 2020-12-02 10:25:44 -08:00