Commit Graph

427 Commits

Author SHA1 Message Date
teor 3d9888f736 Rewrite a sync comment 2021-01-29 11:02:26 +10:00
Deirdre Connolly 1b09538277
Bump versions for zebrad 1.0.0-alpha.1 (#1646)
* Bump versions where appropriate

Tested with cargo install --locked --path etc

* Remove fixed panics from 'Known Issues'

* Change to alpha release series in the README

Co-authored-by: teor <teor@riseup.net>
2021-01-27 20:31:39 -05:00
teor 391c53aa60 Move BoxError to zebrad's lib.rs
For consistency with other crates.
2021-01-27 12:14:27 -08:00
teor 494a5130c1 Fix clippy "unnecessary Vec::push" lints 2021-01-22 11:51:30 -08:00
teor 00c3ad5d8e Fix clippy "unnecessary Ok" lints 2021-01-22 11:51:30 -08:00
teor eadbea1423 Construct structs with ..default() to avoid a lint 2021-01-19 11:02:20 -05:00
teor b1d28b73fd Stop disabling lints that no longer cause warnings on nightly 2021-01-19 11:02:20 -05:00
teor 258789ed9b Use the rustc unknown lints attribute
The clippy unknown lints attribute was deprecated in
nightly in rust-lang/rust#80524. The old lint name now produces a
warning.

Since we're using `allow(unknown_lints)` to suppress warnings, we need to
add the canonical name, so we can continue to build without warnings on
nightly.

But we also need to keep the old name, so we can continue to build
without warnings on stable.

And therefore, we also need to disable the "removed lints" warning,
otherwise we'll get warnings about the old name on nightly.

We'll need to keep this transitional clippy config until rustc 1.51 is
stable.
2021-01-19 11:02:20 -05:00
teor 8a7b0232f0
Stop failing acceptance tests if their directories already exist (#1588)
* Stop failing acceptance tests if their directories already exist
* Add an immutable config writing helper
  and use it in the cached sapling acceptance tests.

Also:
  * consistently create missing config and state directories
  * refactor the common config writing code into a separate function
  * only ignore NotFound errors in replace_config
  * enforce config immutability using the type system
2021-01-14 16:59:06 +10:00
teor 9cdf41f5f4
Panic if the lookahead limit is misconfigured (#1589) 2021-01-14 14:06:30 +10:00
teor 92d95d4be5 Refactor inbound members into a consistent order
And add download comments
2021-01-13 20:46:25 -05:00
teor fb76eb2e6b Add download and verify timeouts to the inbound service 2021-01-13 20:46:25 -05:00
teor 973aec8ccc Refactor sync members into a consistent order
And add comments about correctness and usage.
2021-01-13 20:46:25 -05:00
teor c2893dce51 Warn when the user's configured lookahead limit is ignored 2021-01-13 20:46:25 -05:00
teor 3699bbdae6 Add some additional sync correctness constraints
And adjust the sync restart delay as a consequence.
2021-01-13 20:46:25 -05:00
teor cef0a492d8 Add a timeout to sync service block verification
This timeout stops the sync service hanging when it is missing required
blocks, but the lookahead queue is full of dependent verify tasks, so the
missing blocks never get downloaded.
2021-01-13 20:46:25 -05:00
teor 730910cd99 Upgrade to tokio 0.3.6 from crates.io
And remove the tokio git dependency patch
2021-01-12 15:37:27 -05:00
dependabot[bot] e3e9990315 build(deps): bump inferno from 0.10.2 to 0.10.3
Bumps [inferno](https://github.com/jonhoo/inferno) from 0.10.2 to 0.10.3.
- [Release notes](https://github.com/jonhoo/inferno/releases)
- [Changelog](https://github.com/jonhoo/inferno/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jonhoo/inferno/compare/v0.10.2...v0.10.3)

Signed-off-by: dependabot[bot] <support@github.com>
2021-01-12 00:53:49 -05:00
teor caca450904
zebrad acceptance test cleanup (#1560)
Check misconfigured ephemeral doesn't create a state dir

Add extra misconfigured `zebrad` ephemeral mode checks:
* doesn't create a state directory
* doesn't create unexpected files or directories in the working directory

Check ephemeral doesn't delete an existing state directory

Refactor all the ephemeral configs and checks into a single test
function.

Also:
* cleanup acceptance tests using utility functions
* make some checks consistent between tests
* make error messages consistent

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2021-01-12 12:26:08 +10:00
teor c75cbdea79
Log configured network in every log message (#1568)
* Add the configured network to error reports
* Log the configured network at error level
* Create the global span immediately after activating tracing
And leak the span guard, so the span is always active.

* Include panic metadata in the report and URL
* Use `Main` and `Test` in the global span
`net=Mainnet` is a bit redundant
2021-01-12 07:46:56 +10:00
teor 40c427cf7c Make `cargo run` use `zebrad` by default
When `cargo run` is run in the workspace directory, it can see two
executables:
- `zebrad`
- `zebra_checkpoints`

Adding `default-run = "zebrad"` to `zebrad/Cargo.toml` makes the
workspace run `zebrad` by default. (Even though it's redundant for the
`zebrad` crate itself.)
2021-01-08 15:14:08 -08:00
teor b1f14f47c6
Rewrite GetData handling to match the zcashd implementation (#1518)
* Rewrite GetData handling to match the zcashd implementation

`zcashd` silently ignores missing blocks, but sends found transactions
followed by a `NotFound` message:
e7b425298f/src/main.cpp (L5497)

This is significantly different to the behaviour expected by the old
Zebra connection state machine, which expected `NotFound` for blocks.

Also change Zebra's GetData responses to peer request so they ignore
missing blocks.

* Stop hanging on incomplete transaction or block responses

Instead, if the peer sends an unexpected block, unexpected transaction,
or NotFound message:
1. end the request, and return a partial response containing any items
   that were successfully received
2. if none of the expected blocks or transactions were received, return
   an error, and close the connection
2021-01-04 13:25:35 +10:00
Deirdre Connolly f5b3412a50 Get PathBuf even if /zebrad-cache exists for create_cached_database_height() 2020-12-30 21:32:55 -05:00
teor 6cd7f255d4
Cleanup acceptance tests using utility functions (#1537)
And add a missing network check.
2020-12-20 09:06:18 +10:00
teor 3355be4c41
Improve the random port function docs (#1536)
Also:
* rename the function
* create an alternative function for the common case.
2020-12-18 11:32:33 +10:00
teor 69fcf64d6c
Disable issue URLs for "duplicate hash" errors (#1517)
In our README, we tell users to ignore these errors, so we should also
disable the issue URL.

Also include the hash in the error. (We don't want the span active for
all messages, we just want the hash in the error.)
2020-12-16 08:14:42 +10:00
teor 008577561c Use a sleep future in the async acceptance tests
And wait slightly longer for `zebrad` to launch.

These fixes should reduce the failure rate of the acceptance tests on
busy machines.
2020-12-16 08:09:48 +10:00
teor 1b6bf7f105 Use random ports in the acceptance tests
This change avoids errors when tests are cancelled and re-run within a
short period of time, for example, using `cargo watch`.

It introduces a slight risk of port conflicts between the endpoint tests,
and with (ephemeral) ports used by other services. The risk of conflicts
across 2 tests is very low, and tests should be run in an isolated
environment on busy servers.
2020-12-16 08:09:48 +10:00
Alfredo Garcia 41833340c1
downgrade remaining version strings to 1.0.0-alpha.0 (#1488) 2020-12-15 11:21:00 +10:00
Deirdre Connolly 2d1698a120 Comment out Sentry stacktraces for now
While panic = abort, Sentry collects the same one-line stack trace for all panics,
making it incorrectly dedupe different errors into one.
2020-12-12 13:26:52 -05:00
Deirdre Connolly cff28f7ac8 Use the commit sha as the sentry release 2020-12-09 13:06:18 -05:00
Jane Lusby 400213e2b3 integrate sentry with our existing panic reporting logic 2020-12-09 13:06:18 -05:00
Deirdre Connolly f1ec1d626d Tidy for now 2020-12-09 13:06:18 -05:00
Deirdre Connolly 44e1051dee Debug 2020-12-09 13:06:18 -05:00
Deirdre Connolly 8b268e3f71 Don't keep guard around 2020-12-09 13:06:18 -05:00
Deirdre Connolly 25f6fd25b3 Test catching panic 2020-12-09 13:06:18 -05:00
Deirdre Connolly 6a17549945 Try sentry-tracing integration 2020-12-09 13:06:18 -05:00
Deirdre Connolly c03a3a2606 Pull DSN from runtime env, enable Sentry debug mode with RUST_LOG=debug 2020-12-09 13:06:18 -05:00
Deirdre Connolly 27e42f4ed5 Set up Sentry error collection via a feature flag 2020-12-09 13:06:18 -05:00
Deirdre Connolly 47d78d4cf4 Try sentry::init() 2020-12-09 13:06:18 -05:00
teor 16ffb1dbbf
Disable issue URLs on all timeouts (#1470)
This change helps prevent spurious bug reports.
2020-12-08 07:47:01 +10:00
teor 531a33f03b
Update the zebrad commit whenever any Zebra crate changes (#1455)
vergen's implementation of REBUILD_ON_HEAD_CHANGE assumes that the .git
directory is in the crate root, but Zebra uses a workspace.

Temporary fix for rustyhorde/vergen#21.
2020-12-05 07:23:05 +10:00
teor b4a50fd99f
Downgrade tokio to 0.3.4 to avoid a time wheel panic (#1453)
See tokio-rs/tokio#2789 for details. We were seeing this panic during
normal operation, not just at shutdown.
2020-12-04 13:52:37 +10:00
Jane Lusby ef7e91c3c7
disable color-eyre colors if not connected to a tty (#1443)
* disable color-eyre colors if not connected to a tty
* check if color is disabled
2020-12-04 11:05:25 +10:00
dependabot[bot] 8c052cc39a build(deps): bump color-eyre from 0.5.9 to 0.5.10
Bumps [color-eyre](https://github.com/yaahc/color-eyre) from 0.5.9 to 0.5.10.
- [Release notes](https://github.com/yaahc/color-eyre/releases)
- [Changelog](https://github.com/yaahc/color-eyre/blob/v0.5.10/CHANGELOG.md)
- [Commits](https://github.com/yaahc/color-eyre/compare/v0.5.9...v0.5.10)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-03 10:55:16 -05:00
Jane Lusby 90f944709b
fix git commit logic to work on gcloud (#1442) 2020-12-03 15:18:55 +10:00
teor c0bbac89b3 `cargo fmt --all` 2020-12-02 19:45:27 -05:00
teor e4525d8ee2 Improve metrics acceptance test failure messages 2020-12-02 19:45:27 -05:00
Deirdre Connolly 9221dddb7b Revert "zebrad: remove git pin on metrics dependency"
It broke our metrics endpoint.

This reverts commit dc77163524.
2020-12-02 17:28:49 -05:00
Jane Lusby d7bef1c155
bump color-eyre version to avoid a panic when printing spantraces (#1438) 2020-12-02 14:16:18 -08:00
teor 0e42d8b6c1 Always enable color_eyre, even when color is disabled
We want to automatically disable colors upstream in color_eyre,
and add a config that allows users to always turn off color.
2020-12-02 10:25:44 -08:00
teor bed34168c1 Automatically disable abscissa colors and color_eyre when writing to a file 2020-12-02 10:25:44 -08:00
teor 97d1a81b7c Automatically disable colors when tracing to a file 2020-12-02 10:25:44 -08:00
Henry de Valence dc77163524 zebrad: remove git pin on metrics dependency
Because the new version of the prometheus exporter launches its own
single-threaded runtime on a dedicated worker thread, there's no need
for the tokio and hyper versions it uses internally to align with the
versions used in other crates.  So we don't need to use our fork with
tokio 0.3, and can just use the published alpha.  Advancing to a later
alpha may fix the missing-metrics issues.
2020-12-02 10:23:59 -08:00
Henry de Valence f0db75e712 cargo fmt 2020-12-01 19:16:41 -08:00
Jane Lusby a91d0f0bb6
Include short sha in log messages and error urls (#1410)
As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs.

To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs.

Co-authored-by: teor <teor@riseup.net>
2020-12-01 12:13:20 -08:00
Jane Lusby de34c47cc2 enable tracing acceptance test 2020-12-01 11:03:13 -05:00
Jane Lusby fceef849cf remove unused mutability to defuse deadlock 2020-12-01 11:03:13 -05:00
dependabot[bot] 61d0f02c57 build(deps): bump inferno from 0.10.1 to 0.10.2
Bumps [inferno](https://github.com/jonhoo/inferno) from 0.10.1 to 0.10.2.
- [Release notes](https://github.com/jonhoo/inferno/releases)
- [Changelog](https://github.com/jonhoo/inferno/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jonhoo/inferno/compare/v0.10.1...v0.10.2)

Signed-off-by: dependabot[bot] <support@github.com>
2020-12-01 10:35:14 -05:00
teor 92eb92d1dd
Disable the nightly clippy unnecessary_wraps lint (#1403)
It seems to be a bit broken - some of our functions return `Result` for
consistency with similar functions. But the lint picks them up anyway.
2020-12-01 12:20:57 +10:00
Henry de Valence 1df9284444 zebrad: add a use_color option to the tracing config.
This is useful for creating searchable logs without having to filter color codes after the fact.
2020-11-30 15:25:50 -08:00
Henry de Valence e8c16b172f zebrad: pass TracingSection to Tracing component 2020-11-30 15:25:50 -08:00
Alfredo Garcia 4544463059
Inbound `FindBlocks` and `FindHeaders` (#1347)
* implement inbound `FindBlocks`
* Handle inbound peer FindHeaders requests
* handle request before having any chain tip
* Split `find_chain_hashes` into smaller functions

Add a `max_len` argument to support `FindHeaders` requests.

Rewrite the hash collection code to use heights, so we can handle the
`stop` hash and "no intersection" cases correctly.

* Split state height functions into "any chain" and "best chain"
* Rename the best chain block method to `best_block`
* Move fmt utilities to zebra_chain::fmt
* Summarise Debug for some Message variants

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-12-01 07:30:37 +10:00
Henry de Valence fa02b266ca clippy 2020-11-25 10:55:44 -08:00
Henry de Valence de8415dcb1 tidy spans 2020-11-25 10:55:44 -08:00
Henry de Valence 05837797b1 tidy imports 2020-11-25 10:55:44 -08:00
Henry de Valence 77bf327b07 fix errors (2) 2020-11-25 10:55:44 -08:00
Henry de Valence 527f4d39ed fix errors 2020-11-25 10:55:44 -08:00
Henry de Valence e645e3bf0c remove async 2020-11-25 10:55:44 -08:00
Henry de Valence 6569977549 test compile change 2020-11-25 10:55:44 -08:00
Alfredo Garcia 486e55104a create Downloads for Inbound 2020-11-25 10:55:44 -08:00
Deirdre Connolly 6a0a6f6d37 allow(dead_code) not allow(clippy::dead_code) 2020-11-24 11:04:30 -05:00
Deirdre Connolly 4a67e0e7bb Enable stateful/long sync tests by features, mount rocksdb-based state at Sapling activation for sync_past_sapling_mainnet test 2020-11-24 11:04:30 -05:00
Deirdre Connolly d813603bac Remove defunct memory_cache_bytes from test config 2020-11-24 11:04:30 -05:00
Jane Lusby c2a57d7e49 slight comment tweek 2020-11-24 11:04:30 -05:00
Jane Lusby 99c5acc94f rename test fn 2020-11-24 11:04:30 -05:00
Jane Lusby 602d8c4898 document tests 2020-11-24 11:04:30 -05:00
Jane Lusby 17fdbe941b fix stdout issue with test framework for cached data tests 2020-11-24 11:04:30 -05:00
Jane Lusby 0f51891359 revert unnecessary change in sync_until 2020-11-24 11:04:30 -05:00
Jane Lusby 4bfe747f34 update acceptance tests 2020-11-24 11:04:30 -05:00
Jane Lusby d093b4e528 Add network integration test for quick post sapling sync testing 2020-11-24 11:04:30 -05:00
dependabot[bot] a4af90c2b0 build(deps): bump color-eyre from 0.5.7 to 0.5.8
Bumps [color-eyre](https://github.com/yaahc/color-eyre) from 0.5.7 to 0.5.8.
- [Release notes](https://github.com/yaahc/color-eyre/releases)
- [Changelog](https://github.com/yaahc/color-eyre/blob/master/CHANGELOG.md)
- [Commits](https://github.com/yaahc/color-eyre/compare/v0.5.7...v0.5.8)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-24 09:59:22 -05:00
Henry de Valence 2a4a89c002 state,zebrad: tidy span levels for good INFO output
This provides useful and not too noisy output at INFO level.  We do an
info-level message on every block commit instead of trying to do one
message every N blocks, because this is useful both for initial block
sync as well as continuous state updates on new blocks.
2020-11-23 14:16:39 +10:00
Henry de Valence f0810b028d state,consensus,sync: shorten span lengths
These changes help reduce the size of the resulting spans, making the
output more compact.  Together they save about 30-40 characters.
2020-11-23 14:16:39 +10:00
teor d4da9609ee Update the max_concurrent_block_requests docs
In #1298, we decreased `max_concurrent_block_requests`,
but forgot to update the docs.
2020-11-20 10:08:57 -08:00
Henry de Valence ba3c19142c deps: update hyper, metrics to tokio 0.3
The metrics code becomes much simpler because the current version of the
metrics crate builds its own single-threaded runtime on a dedicated worker
thread, so no dependency on the main Zebra Tokio runtime is required.
2020-11-20 10:08:16 -08:00
Henry de Valence add94c1c45 deps: move to tokio 0.3, tower 0.4
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware.  This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method.  See

ddc64e8d4d

for more context on the Tower changes.  To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
2020-11-20 10:08:16 -08:00
Jane Lusby 4c9bb87df2
zebra-state: replace sled with rocksdb (#1325)
## Motivation

Prior to this PR we've been using `sled` as our database for storing persistent chain data on the disk between boots. We picked sled over rocksdb to minimize our c++ dependencies despite it being a less mature codebase. The theory was if it worked well enough we'd prefer to have a pure rust codebase, but if we ever ran into problems we knew we could easily swap it out with rocksdb.

Well, we ran into problems. Sled's memory usage was particularly high, and it seemed to be leaking memory. On top of all that, the performance for writes was pretty poor, causing us to become bottle-necked on sled instead of the network.

## Solution

This PR replaces `sled` with `rocksdb`. We've seen a 10x improvement in memory usage out of the box, no more leaking, and much better write performance. With this change writing chain data to disk is no longer a limiting factor in how quickly we can sync the chain.

The code in this pull request has:
  - [x] Documentation Comments
  - [x] Unit Tests and Property Tests

## Review

@hdevalence
2020-11-18 18:05:06 -08:00
Henry de Valence 4953f21670 fixup! zebrad: hack to skip alreadyverified errors 2020-11-18 03:09:06 -05:00
Henry de Valence d2fc01755b zebrad: more reasonable concurrent block limit
This helps prevent overloading the network with too many concurrent
block requests.  On a fast network, we're likely to still have enough
room to saturate our bandwidth.  In the worst case, with 2MB blocks,
downloading 50 blocks concurrently is 100MB of queued downloads.  If we
need to download this in 20 seconds to avoid peer connection timeouts,
the implied worst-case minimum speed is 5MB/s.  In practice, this
minimum speed will likely be much lower.
2020-11-17 14:56:27 -08:00
Henry de Valence aa7538ab15 zebrad: hack to skip alreadyverified errors 2020-11-17 14:56:27 -08:00
Henry de Valence e55392b61e zebrad: explicitly select the threaded scheduler. 2020-11-17 14:56:27 -08:00
Henry de Valence 6de824bd99 zebrad: remove block verification timeout
Because we set the lookahead limit to be at least twice the size of a checkpoint, we don't have a risk of timeouts.
2020-11-17 14:56:27 -08:00
Henry de Valence e9c847bbd7 zebrad: avoid a borrow in the ChainSync future 2020-11-17 14:56:27 -08:00
Henry de Valence b632a24436 zebrad: add diagnostics on cancelled download tasks 2020-11-17 14:56:27 -08:00
Henry de Valence ec411574ee zebrad: improve sync diagnostics 2020-11-17 14:56:27 -08:00
teor 54cb9277ef Allow some new clippy nightly lints 2020-11-17 10:07:37 +10:00
dependabot[bot] 8c5f6d0177 build(deps): bump once_cell from 1.5.1 to 1.5.2
Bumps [once_cell](https://github.com/matklad/once_cell) from 1.5.1 to 1.5.2.
- [Release notes](https://github.com/matklad/once_cell/releases)
- [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md)
- [Commits](https://github.com/matklad/once_cell/compare/v1.5.1...v1.5.2)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-13 14:48:11 -05:00
Jane Lusby 7c0275ac0b
reorganize stop check (#1288)
* reorganize stop check
* remove unused enum
* move out and make it unique
Co-authored-by: teor <teor@riseup.net>
2020-11-13 11:37:52 +10:00
Henry de Valence e0c92167bc Revert "Hedge every syncer block download request"
This reverts commit 656bd24ba7.

The Hedge middleware keeps a pair of histograms, writing into one in the
current time interval and reading from the previous time interval's
data.  This means that the reverted change resulted in doubling all
block downloads until after at least the second measurement interval
(which means that the time measurements are also incorrect, as they're
operating under double the network load...)
2020-11-12 16:45:47 -05:00