Commit Graph

1714 Commits

Author SHA1 Message Date
Deirdre Connolly 558661a531 Remove test attributes and allow(dead_code) for test code that tests currently unimplemented functionality 2020-11-21 05:40:25 -05:00
Deirdre Connolly 036abd50ac Back to stable for test image 2020-11-21 05:40:25 -05:00
Deirdre Connolly 52296b96c7 Bump test job timeout to 45 minutes because Windows debug builds are taking a while 2020-11-21 05:40:25 -05:00
Deirdre Connolly 706c42de3e Filter broken command tests while including ignored otherwise 2020-11-21 05:40:25 -05:00
Henry de Valence 7dfea510d5 state: remove state_trace span
This turns out not to give much additional information when stacked with
child spans.
2020-11-20 15:28:46 -08:00
Henry de Valence bbd7a62b20 state: add service request count metrics
These are all one metric, with the type as an attribute, so that we can
display total requests, filter by a particular type, etc.
2020-11-20 17:38:21 -05:00
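
The "one metric, with the type as an attribute" idea can be illustrated with a minimal standard-library sketch. It does not use the `metrics` crate or Zebra's actual code: the `RequestCounter` type and the request type names (`commit_block`, `depth`) are hypothetical, chosen only to show how a single labelled counter supports both a total and a per-type view.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a single labelled counter metric:
/// one metric name, with the request type stored as the label.
#[derive(Default)]
struct RequestCounter {
    by_type: HashMap<&'static str, u64>,
}

impl RequestCounter {
    /// Count one request of the given type.
    fn record(&mut self, request_type: &'static str) {
        *self.by_type.entry(request_type).or_insert(0) += 1;
    }

    /// Total requests across all types (sum over every label value).
    fn total(&self) -> u64 {
        self.by_type.values().sum()
    }

    /// Requests of one particular type (filter by label).
    fn count_for(&self, request_type: &str) -> u64 {
        self.by_type.get(request_type).copied().unwrap_or(0)
    }
}

fn main() {
    let mut counter = RequestCounter::default();
    // The request type names below are illustrative assumptions.
    counter.record("commit_block");
    counter.record("commit_block");
    counter.record("depth");

    println!("total requests: {}", counter.total());
    println!("commit_block requests: {}", counter.count_for("commit_block"));
}
```

Because every request type shares one counter, the total is just the sum over labels, and no separate "total" metric has to be kept in sync.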
Henry de Valence 3bfe63e38f state: add span to state service
Here the span is added to the body of the `Service::call`
implementation, not to the futures it returns, because the state service
does all of the work synchronously in `call` rather than in the futures
it returns.

The service is skipped as a span field.  We could either include or
exclude the request itself.  It would be useful, but the request body
can be very large.  Instead, we make two spans, one at info level and
one at trace level, and filter that way.
2020-11-20 17:38:21 -05:00
Henry de Valence 04acc9da6c consensus: instrument script verification 2020-11-20 17:38:21 -05:00
teor d4da9609ee Update the max_concurrent_block_requests docs
In #1298, we decreased `max_concurrent_block_requests`,
but forgot to update the docs.
2020-11-20 10:08:57 -08:00
Henry de Valence faa9cbcade deps: bump tower to pick up auto-resize in Hedge
Picks up https://github.com/tower-rs/tower/pull/484
2020-11-20 10:08:16 -08:00
Henry de Valence ba3c19142c deps: update hyper, metrics to tokio 0.3
The metrics code becomes much simpler because the current version of the
metrics crate builds its own single-threaded runtime on a dedicated worker
thread, so no dependency on the main Zebra Tokio runtime is required.
2020-11-20 10:08:16 -08:00
Henry de Valence add94c1c45 deps: move to tokio 0.3, tower 0.4
This change is mostly mechanical, with the exception of the changes to the
`tower-batch` middleware.  This middleware was adapted from `tower::buffer`,
and the `tower::buffer` code was changed to implement its own bounded queue,
because Tokio 0.3 removed the `mpsc::Sender::poll_send` method.  See ddc64e8d4d
for more context on the Tower changes.  To match Tower as closely as possible
in order to be able to upstream `tower-batch`, those changes are copied from
`tower::Buffer` to `tower-batch`.
2020-11-20 10:08:16 -08:00
teor ec00ee4cf0 Stop using /dev/shm on Linux (#1338)
Some systems have a very small /dev/shm, for example, see:
https://github.com/docker-library/postgres/issues/416

So we should just use the temporary directory on all operating systems.

Also:
* use TempDir to generate the temporary path
* delete the code that we copied from sled
* prefix the temporary path with the state version and network
2020-11-20 13:01:19 +10:00
Deirdre Connolly af5f3c1395 Bump down cores, running into default quotas 2020-11-19 19:47:38 -05:00
Deirdre Connolly 2b9819a190 Remove defunct memory_cache_bytes
It left with sled
2020-11-19 19:47:38 -05:00
Deirdre Connolly f6dc92a256 Correctly grep for instance group & region 2020-11-19 18:55:19 -05:00
Henry de Valence 06dd39df54 network: bump network version for Canopy (#1333)
Per https://zips.z.cash/zip-0251, nodes compatible with Canopy
activation on mainnet MUST advertise protocol version 170013 or later.

Once Canopy activates on testnet or mainnet, Canopy nodes SHOULD reject
new connections from pre-Canopy nodes, so this also increases the
minimum version.
2020-11-20 09:50:05 +10:00
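
The version rule described in this commit can be sketched as a simple comparison. This is a simplified illustration, not Zebra's implementation: the constant name, the `accept_peer` function, and the `canopy_active` flag are assumptions. Only the value 170013 comes from the commit message.

```rust
/// Minimum protocol version a Canopy-compatible node must advertise,
/// per the commit message (which cites ZIP-251). The constant name is hypothetical.
const MIN_CANOPY_PROTOCOL_VERSION: u32 = 170_013;

/// Hypothetical connection check: once Canopy is active, reject peers
/// that advertise a pre-Canopy protocol version.
fn accept_peer(advertised_version: u32, canopy_active: bool) -> bool {
    !canopy_active || advertised_version >= MIN_CANOPY_PROTOCOL_VERSION
}

fn main() {
    // 170_012 is used here only as an example of an older version number.
    for version in [170_012, 170_013] {
        println!(
            "version {} accepted after activation: {}",
            version,
            accept_peer(version, true)
        );
    }
}
```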
Deirdre Connolly fb66c7ecdf Supply --image-project, return to N2 not N2D 2020-11-19 18:10:04 -05:00
Deirdre Connolly 623949bbaa Remove vestigial 'needs' 2020-11-19 16:50:08 -05:00
Deirdre Connolly e325775bf3 Specify region not just zone 2020-11-19 16:50:08 -05:00
Deirdre Connolly a317cc11c6 Install clang to build rocksdb dep 2020-11-19 16:20:36 -05:00
Deirdre Connolly 53d63d0514 Build this branch 2020-11-19 16:20:36 -05:00
Deirdre Connolly 938b6d6fdd Make the full test suite command explicit 2020-11-19 16:20:36 -05:00
Deirdre Connolly 44970af929 Split up big test job into its own workflow 2020-11-19 16:20:36 -05:00
Deirdre Connolly 2445d23dd8 Shell form CMD 2020-11-19 16:20:36 -05:00
Deirdre Connolly 1c49e57eba Escape single quotes passed as CMD args to cargo 2020-11-19 16:20:36 -05:00
Deirdre Connolly a23de13af9 Break up Dockerfile into (additional) test and build images 2020-11-19 16:20:36 -05:00
Jane Lusby 4c9bb87df2 zebra-state: replace sled with rocksdb (#1325)
## Motivation

Prior to this PR we've been using `sled` as our database for storing persistent chain data on the disk between boots. We picked sled over rocksdb to minimize our C++ dependencies despite it being a less mature codebase. The theory was that if it worked well enough we'd prefer to have a pure Rust codebase, but if we ever ran into problems we knew we could easily swap it out with rocksdb.

Well, we ran into problems. Sled's memory usage was particularly high, and it seemed to be leaking memory. On top of all that, the performance for writes was pretty poor, causing us to become bottlenecked on sled instead of the network.

## Solution

This PR replaces `sled` with `rocksdb`. We've seen a 10x improvement in memory usage out of the box, no more leaking, and much better write performance. With this change, writing chain data to disk is no longer a limiting factor in how quickly we can sync the chain.

The code in this pull request has:
  - [x] Documentation Comments
  - [x] Unit Tests and Property Tests

## Review

@hdevalence
2020-11-18 18:05:06 -08:00
Jane Lusby 65a605520f remove references to sled from service.rs 2020-11-18 15:09:43 -05:00
Jane Lusby 5a6a9fd51e remove some references to sled in serialization definition module 2020-11-18 15:09:43 -05:00
Jane Lusby a122a547be reorganize modules for consistency 2020-11-18 15:09:43 -05:00
Henry de Valence 4953f21670 fixup! zebrad: hack to skip alreadyverified errors 2020-11-18 03:09:06 -05:00
dependabot[bot] 3edc1f7db4 build(deps): bump codecov/codecov-action from v1.0.14 to v1.0.15
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from v1.0.14 to v1.0.15.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Commits](https://github.com/codecov/codecov-action/compare/v1.0.14...239febf655bba88b16ff5dea1d3135ea8663a1f9)

Signed-off-by: dependabot[bot] <support@github.com>
2020-11-18 03:07:14 -05:00
Henry de Valence 608b3953af deps: cargo update 2020-11-17 14:56:27 -08:00
Henry de Valence d2fc01755b zebrad: more reasonable concurrent block limit
This helps prevent overloading the network with too many concurrent
block requests.  On a fast network, we're likely to still have enough
room to saturate our bandwidth.  In the worst case, with 2MB blocks,
downloading 50 blocks concurrently is 100MB of queued downloads.  If we
need to download this in 20 seconds to avoid peer connection timeouts,
the implied worst-case minimum speed is 5MB/s.  In practice, this
minimum speed will likely be much lower.
2020-11-17 14:56:27 -08:00
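
The worst-case arithmetic in this commit message can be checked with a short sketch. The function names are hypothetical. The inputs (50 concurrent blocks, 2MB blocks, a 20 second window) are the figures quoted in the message.

```rust
/// Worst-case size of the queued downloads, in megabytes.
fn queued_download_mb(concurrent_blocks: u64, max_block_mb: u64) -> u64 {
    concurrent_blocks * max_block_mb
}

/// Minimum sustained speed (MB/s) needed to finish the queue
/// before the given timeout elapses.
fn min_speed_mb_per_s(queued_mb: u64, timeout_secs: u64) -> f64 {
    queued_mb as f64 / timeout_secs as f64
}

fn main() {
    // Figures from the commit message: 50 blocks, 2MB each, 20 seconds.
    let queued = queued_download_mb(50, 2);
    let speed = min_speed_mb_per_s(queued, 20);
    println!("queued: {} MB, minimum speed: {} MB/s", queued, speed);
}
```

This reproduces the 100MB and 5MB/s figures. As the message notes, real blocks are usually much smaller than the 2MB worst case, so the practical minimum speed is lower.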
Henry de Valence aa7538ab15 zebrad: hack to skip alreadyverified errors 2020-11-17 14:56:27 -08:00
Henry de Valence e55392b61e zebrad: explicitly select the threaded scheduler. 2020-11-17 14:56:27 -08:00
Henry de Valence 6de824bd99 zebrad: remove block verification timeout
Because we set the lookahead limit to be at least twice the size of a checkpoint, we don't have a risk of timeouts.
2020-11-17 14:56:27 -08:00
Henry de Valence e9c847bbd7 zebrad: avoid a borrow in the ChainSync future 2020-11-17 14:56:27 -08:00
Henry de Valence b632a24436 zebrad: add diagnostics on cancelled download tasks 2020-11-17 14:56:27 -08:00
Henry de Valence ec411574ee zebrad: improve sync diagnostics 2020-11-17 14:56:27 -08:00
Henry de Valence e0b2af7123 state: add sled tree precommit metrics on tracked objects 2020-11-17 14:56:27 -08:00
Henry de Valence aa8d95bd23 consensus: improve checkpoint request replacement diagnostics 2020-11-17 14:56:27 -08:00
Henry de Valence a3ab589d89 consensus,state: document cancellation contracts for services
This change explicitly documents cancellation contracts for our Tower services,
and tries to correct a bug in the implementation of the CheckpointVerifier,
which duplicates information from the state service but did not ensure that it
would be kept in sync.
2020-11-17 14:56:27 -08:00
Henry de Valence d5d17a9a71 consensus: remove incorrect comment
The ZcashDeserialize implementation for Block doesn't check that blocks
have a coinbase height.
2020-11-17 14:56:27 -08:00
teor 2f53ff44f7 Move chain order assertions to commit_finalized_direct
And remove a duplicate assert in the contextual verification function.
2020-11-17 13:16:31 +10:00
Deirdre Connolly 40b012acef Add mdbook stuff to path using environment files/variables instead of workflow commands
Fixes #1309
2020-11-16 21:18:19 -05:00
teor d7d15984eb Move all contextual validation code into its own function
This change has two benefits:
* reduces conflicts with the sled refactor and any replacement
* allows the function to be called independently for testing
2020-11-17 11:46:57 +10:00
Alfredo Garcia c8e6f5843f Update RFC template (#1278)
* update rfc template
* change pull to issues
2020-11-17 11:10:21 +10:00
teor cfe779db69 Add an info-level span to check_contextual_validity 2020-11-17 10:07:37 +10:00