1930 Commits

Author SHA1 Message Date
Henry de Valence 36cd76d590 state: tidy process_queued tracing
Previously, this function was instrumented with a span containing the
parent hash that was the entry to the function.  But it doesn't make
sense to consider the work done by the function as happening in the
context of the supplied parent hash (as distinct from the context of the
hash of the newly arrived block, which is already contained in an outer
span), so this adds noise without conveying extra context.

Instead, use events that occur within the context of the existing spans.
2020-11-23 14:16:39 +10:00
Henry de Valence f0810b028d state,consensus,sync: shorten span lengths
These changes help reduce the size of the resulting spans, making the
output more compact.  Together they save about 30-40 characters.
2020-11-23 14:16:39 +10:00
Henry de Valence 77b60f3a30 state: add traces for utxo scanning 2020-11-23 14:16:39 +10:00
Henry de Valence aa45bf2b58 consensus: add traces to block verifier 2020-11-23 14:16:39 +10:00
teor 196dc6369c Delete outdated transaction comments 2020-11-22 23:11:00 -05:00
Deirdre Connolly a877da2157 Enable RUST_BACKTRACE=full for test and build/deploy images 2020-11-22 23:10:20 -05:00
Henry de Valence 2eceff421f consensus: remove incorrect check
This consensus rule is supposed to apply to transactions whose
transparent inputs are the *outputs* of previous coinbase
transactions, not to transactions with coinbase inputs.  Because that
logic is different enough from this logic, and requires different data
flow, it's cleaner to just remove this check for now.
2020-11-21 14:09:15 -05:00
Henry de Valence ace1103462 consensus: fix bug in tx input/output presence check
Making this check's match statement exhaustive revealed a bug similar to
the previous commit.  The logic in the spec is written in terms of
numbers, but our data is internally represented in terms of enums
(ADTs).  This kind of cross-representation rule translation is a bug
surface, which we can avoid by converting to counts and summing up.  (We
should use one style at a time).
2020-11-21 14:09:15 -05:00
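The counting approach described in the commit above can be sketched as follows. This is a minimal illustration, not Zebra's actual code: the `Transaction`, `Shielded`, and `PresenceError` types, their fields, and the function names are all hypothetical. It also assumes that joinsplits count toward both inputs and outputs. The idea is to convert the enum representation into counts once, then state the rule in terms of sums, the way a spec written in numbers would.

```rust
/// Hypothetical shielded component: only the counts matter for this check.
struct Shielded {
    spends: usize,
    outputs: usize,
}

/// Hypothetical, simplified transaction enum (not Zebra's actual type).
enum Transaction {
    V1 { inputs: usize, outputs: usize },
    V4 {
        inputs: usize,
        outputs: usize,
        joinsplits: Option<usize>,
        shielded: Option<Shielded>,
    },
}

#[derive(Debug, PartialEq, Eq)]
enum PresenceError {
    NoInputs,
    NoOutputs,
}

impl Transaction {
    /// Total number of things that act as inputs, as a plain count.
    fn input_total(&self) -> usize {
        match self {
            Transaction::V1 { inputs, .. } => *inputs,
            Transaction::V4 { inputs, joinsplits, shielded, .. } => {
                *inputs + joinsplits.unwrap_or(0) + shielded.as_ref().map_or(0, |s| s.spends)
            }
        }
    }

    /// Total number of things that act as outputs, as a plain count.
    fn output_total(&self) -> usize {
        match self {
            Transaction::V1 { outputs, .. } => *outputs,
            Transaction::V4 { outputs, joinsplits, shielded, .. } => {
                *outputs + joinsplits.unwrap_or(0) + shielded.as_ref().map_or(0, |s| s.outputs)
            }
        }
    }
}

/// The rule itself is written only in terms of the summed counts.
fn has_inputs_and_outputs(tx: &Transaction) -> Result<(), PresenceError> {
    if tx.input_total() == 0 {
        Err(PresenceError::NoInputs)
    } else if tx.output_total() == 0 {
        Err(PresenceError::NoOutputs)
    } else {
        Ok(())
    }
}
```

Both conversion matches are exhaustive with no wildcard arm, so adding a new transaction variant forces the counts to be revisited.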
Henry de Valence 96ee32e5d2 consensus: fix bug in coinbase joinsplit/spend check
This function caused spurious "WrongVersion" errors, because the match
pattern in the first arm was non-exhaustive, but the fallthrough match
arm was present and assumed it would only be reached if the version was
incorrect.

This commit cleans up the implementation, splits out the error variants,
and renames the check to be more precise.

To avoid this kind of bug in the future, two guidelines are useful:

1. Avoid fallthrough cases that circumvent non-exhaustive match checks;
2. Avoid nested conditionals, preferring a "straight-line" sequence of
   match arm => result pairs rather than nested matches or matches with
   conditionals inside.
2020-11-21 14:09:15 -05:00
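The two guidelines in the commit above can be illustrated with a small sketch. Everything here is hypothetical and much simpler than Zebra's real types: the `Transaction` variants, the boolean fields, the error names, and the assumption that only `V4` is an accepted version. Each match arm maps one fully spelled-out case to one result, and there is no `_` arm, so the compiler rejects the match if any case is missing.

```rust
/// Hypothetical, simplified transaction enum (not Zebra's actual type).
enum Transaction {
    V3,
    V4 {
        is_coinbase: bool,
        has_joinsplit: bool,
        has_spend: bool,
    },
}

#[derive(Debug, PartialEq, Eq)]
enum CheckError {
    CoinbaseHasJoinSplit,
    CoinbaseHasSpend,
    WrongVersion,
}

/// Straight-line sequence of `pattern => result` pairs, with no wildcard arm
/// and no conditionals nested inside the arms.
fn coinbase_has_no_joinsplit_or_spend(tx: &Transaction) -> Result<(), CheckError> {
    match tx {
        // Non-coinbase transactions are not affected by this rule.
        Transaction::V4 { is_coinbase: false, .. } => Ok(()),
        Transaction::V4 { is_coinbase: true, has_joinsplit: true, .. } => {
            Err(CheckError::CoinbaseHasJoinSplit)
        }
        Transaction::V4 { is_coinbase: true, has_joinsplit: false, has_spend: true } => {
            Err(CheckError::CoinbaseHasSpend)
        }
        Transaction::V4 { is_coinbase: true, has_joinsplit: false, has_spend: false } => Ok(()),
        // The version error is reached only by a pattern that names the version.
        Transaction::V3 => Err(CheckError::WrongVersion),
    }
}
```

With a `_ => Err(CheckError::WrongVersion)` arm in place of the last one, deleting the first arm would still compile, and non-coinbase `V4` transactions would be reported as having the wrong version. That is the kind of spurious error the commit describes.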
Henry de Valence b116cfcd76 consensus: add debug event on wrong version check
Adding this check reveals that the WrongVersion errors aren't coming
from the correct WrongVersion check.
2020-11-21 14:09:15 -05:00
Henry de Valence 25fd52be51 chain: tidy Debug for Amount
This avoids printing a bunch of PhantomData.
2020-11-21 14:09:15 -05:00
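A hand-written `Debug` impl that skips a `PhantomData` field might look roughly like this. The `Amount` layout, the `NonNegative` marker, and the `new` constructor are assumptions made for illustration; the real type may differ.

```rust
use std::fmt;
use std::marker::PhantomData;

/// Hypothetical marker type for a constraint on the amount.
struct NonNegative;

/// Hypothetical stand-in: a value plus a zero-sized constraint marker.
struct Amount<C>(i64, PhantomData<C>);

impl<C> Amount<C> {
    fn new(value: i64) -> Self {
        Amount(value, PhantomData)
    }
}

impl<C> fmt::Debug for Amount<C> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print only the value; the marker field carries no runtime data.
        f.debug_tuple("Amount").field(&self.0).finish()
    }
}
```

A side effect of writing the impl by hand is that it does not require `C: Debug`, which a derived impl would.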
Henry de Valence b5515123eb chain: add custom Debug for CoinbaseData
The derived Debug impl just shows u8s as numbers, which isn't what we
want.  There are basically two reasonable options here:

1. Hex-encoded bytes
2. Escaped ASCII

I picked (2) because a lot of coinbase data has ASCII text in it.
2020-11-21 14:09:15 -05:00
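Option (2) from the commit above can be sketched with the standard library's `std::ascii::escape_default`, which leaves printable ASCII as-is and turns other bytes into escapes such as `\x00`. The `CoinbaseData` wrapper below is a hypothetical stand-in holding raw bytes, and the exact output format is an assumption.

```rust
use std::fmt;

/// Hypothetical stand-in for the coinbase data type: a wrapper around raw bytes.
struct CoinbaseData(Vec<u8>);

impl fmt::Debug for CoinbaseData {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Escape each byte: printable ASCII is kept, everything else becomes
        // a backslash escape. The escaped bytes are always ASCII.
        let escaped: String = self
            .0
            .iter()
            .flat_map(|byte| std::ascii::escape_default(*byte))
            .map(char::from)
            .collect();
        write!(f, "CoinbaseData(\"{}\")", escaped)
    }
}
```

For example, the bytes `b"hi\x00"` are displayed as `CoinbaseData("hi\x00")` rather than as a list of numbers.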
Henry de Valence d1ee7f263a consensus: add debug span to TransactionVerifier 2020-11-21 14:09:15 -05:00
Henry de Valence 2e4f4d8e87 consensus: fix span handling in BlockVerifier
The BlockVerifier constructed a tracing span and manually entered it
inside of an async block.  Manually entering spans inside async blocks
can cause problems where the span might not be entered and exited
correctly as the resulting future is polled.  Instead, using
`.instrument` creates a wrapper future that handles the bookkeeping.

I changed the span name and contents to be consistent with the spans in
the checkpoint verifier.
2020-11-21 14:09:15 -05:00
Deirdre Connolly 0b6a61c9e8 gcloud build is still the only required check for PR merge, run tests in release profile 2020-11-21 05:40:25 -05:00
Deirdre Connolly 902c6f6b29 Remove test attributes and allow(dead_code) for command timeout tests that exercise currently broken properties 2020-11-21 05:40:25 -05:00
Deirdre Connolly 558661a531 Remove test attributes and allow(dead_code) for test code that tests currently unimplemented functionality 2020-11-21 05:40:25 -05:00
Deirdre Connolly 036abd50ac Back to stable for test image 2020-11-21 05:40:25 -05:00
Deirdre Connolly 52296b96c7 Bump test job timeout to 45 minutes because Windows debug builds are taking a while 2020-11-21 05:40:25 -05:00
Deirdre Connolly 706c42de3e Filter broken command tests while including ignored otherwise 2020-11-21 05:40:25 -05:00
Henry de Valence 7dfea510d5 state: remove state_trace span
This turns out not to give much additional information when stacked with
child spans.
2020-11-20 15:28:46 -08:00
Henry de Valence bbd7a62b20 state: add service request count metrics
These are all one metric, with the type as an attribute, so that we can
display total requests, filter by a particular type, etc.
2020-11-20 17:38:21 -05:00
Henry de Valence 3bfe63e38f state: add span to state service
Here the span is added to the body of the `Service::call`
implementation, not to the futures it returns, because the state service
does all of the work synchronously in `call` rather than in the futures
it returns.

The service is skipped as a span field.  We could either include or
exclude the request itself.  It would be useful, but the request body
can be very large.  Instead, we make two spans, one at info level and
one at trace level, and filter that way.
2020-11-20 17:38:21 -05:00
Henry de Valence 04acc9da6c consensus: instrument script verification 2020-11-20 17:38:21 -05:00
teor d4da9609ee Update the max_concurrent_block_requests docs
In #1298, we decreased `max_concurrent_block_requests`,
but forgot to update the docs.
2020-11-20 10:08:57 -08:00
Henry de Valence faa9cbcade deps: bump tower to pick up auto-resize in Hedge
Picks up https://github.com/tower-rs/tower/pull/484
2020-11-20 10:08:16 -08:00
Henry de Valence ba3c19142c deps: update hyper, metrics to tokio 0.3
The metrics code becomes much simpler because the current version of the
metrics crate builds its own single-threaded runtime on a dedicated worker
thread, so no dependency on the main Zebra Tokio runtime is required.
2020-11-20 10:08:16 -08:00
Henry de Valence add94c1c45 deps: move to tokio 0.3, tower 0.4
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware.  This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method.  See

ddc64e8d4d

for more context on the Tower changes.  To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
2020-11-20 10:08:16 -08:00
teor ec00ee4cf0
Stop using /dev/shm on Linux (#1338)
Some systems have a very small /dev/shm, for example, see:
https://github.com/docker-library/postgres/issues/416

So we should just use the temporary directory on all operating systems.

Also:
* use TempDir to generate the temporary path
* delete the code that we copied from sled
* prefix the temporary path with the state version and network
2020-11-20 13:01:19 +10:00
Deirdre Connolly af5f3c1395 Bump down cores, running into default quotas 2020-11-19 19:47:38 -05:00
Deirdre Connolly 2b9819a190 Remove defunct memory_cache_bytes
It left with sled
2020-11-19 19:47:38 -05:00
Deirdre Connolly f6dc92a256 Correctly grep for instance group & region 2020-11-19 18:55:19 -05:00
Henry de Valence 06dd39df54
network: bump network version for Canopy (#1333)
Per https://zips.z.cash/zip-0251, nodes compatible with Canopy
activation on mainnet MUST advertise protocol version 170013 or later.

Once Canopy activates on testnet or mainnet, Canopy nodes SHOULD reject
new connections from pre-Canopy nodes, so this also increases the
minimum version.
2020-11-20 09:50:05 +10:00
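The version rule in the commit above amounts to a simple comparison. The sketch below is a simplification under stated assumptions: the constant name, the function, and the `canopy_active` flag are hypothetical, and the behaviour before activation (accept any version) is a placeholder, not the real pre-Canopy minimum. Only the value 170013 comes from the commit message.

```rust
/// Protocol version that Canopy-compatible mainnet nodes must advertise,
/// as stated in the commit message (per ZIP 251).
const CANOPY_MIN_VERSION: u32 = 170_013;

/// Hypothetical helper: decide whether to accept a new peer connection.
/// Before activation this sketch accepts every version (a simplification);
/// after activation, peers advertising a pre-Canopy version are rejected.
fn accepts_peer(peer_version: u32, canopy_active: bool) -> bool {
    !canopy_active || peer_version >= CANOPY_MIN_VERSION
}
```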
Deirdre Connolly fb66c7ecdf Supply --image-project, return to N2 not N2D 2020-11-19 18:10:04 -05:00
Deirdre Connolly 623949bbaa Remove vestigial 'needs' 2020-11-19 16:50:08 -05:00
Deirdre Connolly e325775bf3 Specify region not just zone 2020-11-19 16:50:08 -05:00
Deirdre Connolly a317cc11c6 Install clang to build rocksdb dep 2020-11-19 16:20:36 -05:00
Deirdre Connolly 53d63d0514 Build this branch 2020-11-19 16:20:36 -05:00
Deirdre Connolly 938b6d6fdd Make the full test suite command explicit 2020-11-19 16:20:36 -05:00
Deirdre Connolly 44970af929 Split up big test job into its own workflow 2020-11-19 16:20:36 -05:00
Deirdre Connolly 2445d23dd8 Shell form CMD 2020-11-19 16:20:36 -05:00
Deirdre Connolly 1c49e57eba Escape single quotes passed as CMD args to cargo 2020-11-19 16:20:36 -05:00
Deirdre Connolly a23de13af9 Break up Dockerfile into (additional) test and build images 2020-11-19 16:20:36 -05:00
Jane Lusby 4c9bb87df2
zebra-state: replace sled with rocksdb (#1325)
## Motivation

Prior to this PR we've been using `sled` as our database for storing persistent chain data on the disk between boots. We picked sled over rocksdb to minimize our C++ dependencies despite it being a less mature codebase. The theory was that if it worked well enough we'd prefer to have a pure Rust codebase, but if we ever ran into problems we knew we could easily swap it out for rocksdb.

Well, we ran into problems. Sled's memory usage was particularly high, and it seemed to be leaking memory. On top of all that, the performance for writes was pretty poor, causing us to become bottlenecked on sled instead of the network.

## Solution

This PR replaces `sled` with `rocksdb`. We've seen a 10x improvement in memory usage out of the box, no more leaking, and much better write performance. With this change, writing chain data to disk is no longer a limiting factor in how quickly we can sync the chain.

The code in this pull request has:
  - [x] Documentation Comments
  - [x] Unit Tests and Property Tests

## Review

@hdevalence
2020-11-18 18:05:06 -08:00
Jane Lusby 65a605520f remove references to sled from service.rs 2020-11-18 15:09:43 -05:00
Jane Lusby 5a6a9fd51e remove some references to sled in serialization definition module 2020-11-18 15:09:43 -05:00
Jane Lusby a122a547be reorganize modules for consistency 2020-11-18 15:09:43 -05:00
Henry de Valence 4953f21670 fixup! zebrad: hack to skip alreadyverified errors 2020-11-18 03:09:06 -05:00
dependabot[bot] 3edc1f7db4 build(deps): bump codecov/codecov-action from v1.0.14 to v1.0.15
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from v1.0.14 to v1.0.15.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Commits](https://github.com/codecov/codecov-action/compare/v1.0.14...239febf655bba88b16ff5dea1d3135ea8663a1f9)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-18 03:07:14 -05:00
Henry de Valence 608b3953af deps: cargo update 2020-11-17 14:56:27 -08:00