Commit Graph

1542 Commits

Author SHA1 Message Date
Henry de Valence 6f8f8a56d4 state: perform sled reads synchronously
We already use an actor model for the state service, so we get an
ordered sequence of state queries by message-passing.  Instead of
performing reads in the futures we return, this commit performs them
synchronously.  This means that all sled access is done from the same
task, which

(1) might reduce contention
(2) allows us to avoid using sled transactions when writing to the
state.

Co-authored-by: Jane Lusby <jane@zfnd.org>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
2020-10-26 12:05:35 -07:00
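A minimal sketch of the idea in the commit above, not the Zebra implementation: one task owns the store and answers queries in the order they arrive, so reads need no locking or transactions. The `Request` enum, `spawn_state_actor`, and `get` are hypothetical names, and a `HashMap` stands in for sled.

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

/// Hypothetical messages understood by the state actor.
enum Request {
    Put { key: u32, value: String },
    Get { key: u32, reply: mpsc::Sender<Option<String>> },
}

/// Spawns a thread that owns the store (a HashMap standing in for sled).
/// All reads and writes happen on that one thread, in message order.
fn spawn_state_actor() -> mpsc::Sender<Request> {
    let (tx, rx) = mpsc::channel::<Request>();
    thread::spawn(move || {
        let mut store: HashMap<u32, String> = HashMap::new();
        for request in rx {
            match request {
                Request::Put { key, value } => {
                    store.insert(key, value);
                }
                Request::Get { key, reply } => {
                    // The read is performed here, synchronously, rather than
                    // in a future handed back to the caller.
                    let _ = reply.send(store.get(&key).cloned());
                }
            }
        }
    });
    tx
}

/// Sends a read request and waits for the actor's answer.
fn get(actor: &mpsc::Sender<Request>, key: u32) -> Option<String> {
    let (reply, response) = mpsc::channel();
    actor
        .send(Request::Get { key, reply })
        .expect("actor is running");
    response.recv().expect("actor replies")
}
```

Because a `Put` sent before a `Get` is always processed first, the caller sees a consistent ordering without any explicit synchronization.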
Henry de Valence 253bab042e sync: add a concurrency limit for block downloads 2020-10-26 12:05:35 -07:00
Henry de Valence 0a405c737d zebrad: check state in obtaintips, not extendtips.
The original sync algorithm split the sync process into two phases, one
that obtained prospective chain tips, and another that attempted to
extend those chain tips as far as possible until encountering an error
(at which point the prospective state is discarded and the process
restarts).

Because a previous implementation of this algorithm didn't properly
enforce linkage between segments of the chain while extending tips,
sometimes it would get confused and fail to discard responses that did
not extend a tip.  To mitigate this, a check against the state was
added.  However, this check can cause stalls while checkpointing,
because when a checkpoint is reached we may suddenly need to commit
thousands of blocks to the state.  Because the sync algorithm now has a
`CheckedTip` structure that ensures that a new segment of hashes
actually extends an existing one, we don't need to check against the
state while extending a tip, because we don't get confused while
interpreting responses.

This change results in significantly smoother progress on mainnet.
2020-10-26 12:05:35 -07:00
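The linkage check described in the commit above might look roughly like the following sketch. Only the `CheckedTip` name comes from the commit message; the field names, the shortened `Hash` type, and the `extend_tip` function are assumptions made for illustration.

```rust
/// Shortened hash type, for illustration only.
type Hash = [u8; 4];

/// A tip together with the hash we expect the next segment to start with.
/// (Field names are assumptions, not taken from the commit.)
#[derive(Debug, Clone, PartialEq)]
struct CheckedTip {
    tip: Hash,
    expected_next: Hash,
}

/// Accepts a response only if it starts at the expected hash.
/// Returns the hashes to download and the new checked tip, or `None`
/// if the response does not extend `current`.
fn extend_tip(current: &CheckedTip, response: &[Hash]) -> Option<(Vec<Hash>, CheckedTip)> {
    if response.len() < 2 || response[0] != current.expected_next {
        return None;
    }
    let last = response.len() - 1;
    let new_tip = CheckedTip {
        tip: response[last - 1],
        expected_next: response[last],
    };
    // The final hash is held back: it becomes the link to the next segment.
    Some((response[..last].to_vec(), new_tip))
}
```

Since unrelated responses are rejected by comparing hashes alone, no state query is needed while extending a tip.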
Henry de Valence 65e0c22fbe state: don't pre-buffer the service
There's no reason to return a pre-Buffer'd service (there's no need for
internal access to the state service, as in zebra-network), but wrapping
it internally removes control of the buffer size from the caller.
2020-10-26 12:05:35 -07:00
Henry de Valence ce2ac3336f zebrad: add debug message before state check
This reveals that there may be contention in access to the state, as
this takes a long time.
2020-10-26 12:05:35 -07:00
Henry de Valence a1a3e4db5a consensus: simplify block verify tracing output
The previous debug output printed a message that the chain verifier had
received a block.  But this provides no additional information compared
to printing no message in chain::Verifier and a message in whichever
verifier the block was sent to, since the resulting spans indicate where
the block was dispatched.

This commit also removes the "unexpected high block" detection; this was
an artefact of the original sync algorithm failing to handle block
advertisements, but we don't have that problem any more, so we can
simplify the code by eliminating that logic.
2020-10-26 12:05:35 -07:00
Henry de Valence 91469faf3c zebrad: eliminate duplicate span in sync 2020-10-26 12:05:35 -07:00
Henry de Valence b5a43f4516 zebrad: remove implementation details from docs
The timeout behavior in zebra-network is an implementation detail, not a
feature of the public API.  So it shouldn't be mentioned in the doc
comments -- if we want timeout behavior, we have to layer it ourselves.
2020-10-26 12:05:35 -07:00
Henry de Valence 1d7309afe2 zebrad: correctly handle duplicates in DownloadSet
Using the cancel_handles, we can deduplicate requests.  This is
important to do, because otherwise when we insert the second cancel
handle, we'd drop the first one, cancelling an existing task for no
reason.
2020-10-26 12:05:35 -07:00
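A rough sketch of the deduplication described in the commit above. The `cancel_handles` name comes from the commit message; the `Downloads` struct layout, the `queue` method, and the `u64` hash are assumptions, and a standard-library channel stands in for the oneshot cancel handle (dropping the sender signals cancellation).

```rust
use std::collections::HashMap;
use std::sync::mpsc;

/// Hypothetical download set keyed by block hash (a u64 here for brevity).
struct Downloads {
    cancel_handles: HashMap<u64, mpsc::Sender<()>>,
}

impl Downloads {
    fn new() -> Self {
        Downloads { cancel_handles: HashMap::new() }
    }

    /// Queues a download unless one is already in flight for `hash`.
    /// Returns the cancellation receiver for the new task, or `None`
    /// for a duplicate request.
    fn queue(&mut self, hash: u64) -> Option<mpsc::Receiver<()>> {
        if self.cancel_handles.contains_key(&hash) {
            // Inserting again would drop the first handle and cancel
            // the existing task for no reason.
            return None;
        }
        let (cancel_tx, cancel_rx) = mpsc::channel();
        self.cancel_handles.insert(hash, cancel_tx);
        Some(cancel_rx)
    }
}
```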
Henry de Valence 56fe4f4379 zebrad: unify sync restart logic
This lets us keep the main loop simple and just write `continue 'sync;`
to keep going.
2020-10-26 12:05:35 -07:00
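The labeled-loop pattern mentioned in the commit above can be sketched as follows. Only the `'sync` label comes from the commit message; the `sync` function and its input, a list of attempts whose steps either succeed or fail, are hypothetical stand-ins for the real obtain and extend phases.

```rust
/// Runs sync attempts until one completes without error.
/// Returns how many times the loop restarted.
fn sync(attempts: &[Vec<Result<(), &str>>]) -> usize {
    let mut restarts = 0;
    let mut remaining = attempts.iter();
    'sync: loop {
        let steps = match remaining.next() {
            Some(steps) => steps,
            None => break 'sync,
        };
        for step in steps {
            if step.is_err() {
                // Any failure discards the prospective state and restarts
                // from the top of the outer loop.
                restarts += 1;
                continue 'sync;
            }
        }
        // Every step in this attempt succeeded.
        break 'sync;
    }
    restarts
}
```

Every error path funnels into the same `continue 'sync;`, so the restart logic lives in one place.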
Henry de Valence 12d25159c6 zebrad: use hedged requests in sync
The hedge middleware implements hedged requests, as described in _The
Tail At Scale_. The idea is that we auto-tune our retry logic according
to the actual network conditions, pre-emptively retrying requests that
exceed some latency percentile. This would hopefully solve the problem
where our timeouts are too long on mainnet and too short on testnet.
2020-10-26 12:05:35 -07:00
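To illustrate the decision rule behind hedged requests, here is a small sketch that is independent of the hedge middleware used in the commit above. The function names and the integer-percent parameter are assumptions: a second, pre-emptive request is issued once the first has been outstanding longer than a chosen percentile of observed latencies.

```rust
use std::time::Duration;

/// Returns the latency at the given percentile (0..=100) of the samples,
/// using the nearest-rank method, or `None` if there are no samples yet.
fn latency_percentile(samples: &[Duration], percent: usize) -> Option<Duration> {
    if samples.is_empty() {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest rank: ceil(percent * n / 100), clamped to a valid index.
    let rank = ((percent * sorted.len() + 99) / 100).clamp(1, sorted.len());
    Some(sorted[rank - 1])
}

/// Decides whether a request that has been outstanding for `elapsed`
/// should be retried pre-emptively.
fn should_hedge(elapsed: Duration, samples: &[Duration], percent: usize) -> bool {
    match latency_percentile(samples, percent) {
        Some(threshold) => elapsed > threshold,
        // Without latency data there is nothing to tune against.
        None => false,
    }
}
```

Because the threshold is derived from measured latencies, it adapts to each network instead of relying on a single fixed timeout.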
Henry de Valence 5f229d1475 zebrad: use Downloads in sync
Try to use the better cancellation logic to revert to the previous sync
algorithm.  As designed, the sync algorithm is supposed to proceed by
downloading state prospectively and handle errors by flushing the
pipeline and starting over.  This hasn't worked well, because we didn't
previously cancel tasks properly.  Now that we can, try to use something
in the spirit of the original sync algorithm.
2020-10-26 12:05:35 -07:00
Henry de Valence b90581a3d7 zebrad: create a Downloads Stream for syncing.
This makes two changes relative to the existing download code:

1.  It uses a oneshot to attempt to cancel the download task after it
    has started;

2.  It encapsulates the download creation and cancellation logic into a
    Downloads struct.
2020-10-26 12:05:35 -07:00
Henry de Valence 8e709bfa88 network: don't fail on unsolicited messages
These messages might be unsolicited, or they might be a response to a
request we already canceled.  So don't fail the whole connection, just
drop the message and move on.
2020-10-26 12:05:35 -07:00
Henry de Valence 13daefa729 network: handle request cancellation in Connection
We handle request cancellation in two places: before we transition into
the AwaitingResponse state, and while we are in AwaitingResponse.  We
need both places, or else if we started processing a request, we
wouldn't process the cancellation until the timeout elapsed.

The first is a check that the oneshot is not already canceled.

For the second, we wait on a cancellation, either from a timeout or from
the tx channel closing.
2020-10-26 12:05:35 -07:00
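The two cancellation points in the commit above can be sketched with standard-library channels. This is an analogy, not the Connection code: `Outcome` and `handle_request` are hypothetical, the arrival of a peer response is omitted, and a dropped `Sender` plays the role of the closed oneshot channel.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, TryRecvError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum Outcome {
    CanceledBeforeSend,
    CanceledWhileWaiting,
    TimedOut,
}

/// Handles one request whose caller may cancel by dropping the sender
/// paired with `cancel`.
fn handle_request(cancel: &mpsc::Receiver<()>, timeout: Duration) -> Outcome {
    // First check: is the request already canceled before we start?
    if let Err(TryRecvError::Disconnected) = cancel.try_recv() {
        return Outcome::CanceledBeforeSend;
    }

    // (The request would be sent to the peer here.)

    // Second check: while awaiting the response, wake up on either the
    // timeout or the cancellation, whichever comes first.
    match cancel.recv_timeout(timeout) {
        Err(RecvTimeoutError::Timeout) => Outcome::TimedOut,
        // Sender dropped (or an explicit cancel message was sent).
        _ => Outcome::CanceledWhileWaiting,
    }
}
```

Without the second check, a request canceled mid-flight would occupy the connection until the full timeout elapsed.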
Henry de Valence b636660d6a zebrad: rename sync::Error alias to BoxError. 2020-10-26 12:05:35 -07:00
teor a141c336ab Actually fix whitespace 2020-10-26 13:49:48 -04:00
teor bbe4aa47ea Fix whitespace for rustfmt 2020-10-26 13:49:48 -04:00
teor 2fa3d8a8f4 Add a comment explaining why block metrics follow validation 2020-10-26 13:49:48 -04:00
teor f5a53d9dae Update block metrics after async transaction verification 2020-10-26 13:49:48 -04:00
teor fb079c2ca1 Replace BlockHeaderHash with block::Hash 2020-10-26 22:27:57 +10:00
teor a9102e8d6d Fix State RFC rendering ambiguities 2020-10-26 22:02:45 +10:00
teor 0935b3305a Fix more state RFC function heading sizes 2020-10-26 21:14:14 +10:00
teor 7bf2fdd6d7 Fix a header level in the state RFC 2020-10-26 21:11:26 +10:00
teor 60322c3d48 Test that the checkpoint list gap is correct
If we change the gap, but don't rebuild the lists, `zebrad` hangs with
weird errors.
2020-10-26 20:59:40 +10:00
teor f9dc481934 Rebuild the checkpoint lists with smaller checkpoints 2020-10-26 20:59:40 +10:00
teor 20dfd04463 Reduce maximum checkpoint size in the Zebra code
The new limits are 400 blocks and 32 MB.
2020-10-26 20:59:40 +10:00
teor 672b39a847 Use MAX_BLOCK_REORG_HEIGHT in zebra-checkpoints
MAX_BLOCK_REORG_HEIGHT is 1 less than the constant it replaces. The new
calculation is correct: the 100th block is finalized.
2020-10-26 20:59:40 +10:00
teor 90e755472c Add data source instructions to the metrics help 2020-10-23 15:06:37 -04:00
teor b492cabeee Bind grafana to localhost in metrics instructions
Binding grafana to localhost makes it inaccessible from the wider internet,
which is a secure default.

Since we run docker with host networking, docker containers have access to D-Bus and other
security-related services on localhost. So it's risky to also expose them to the wider internet.
2020-10-23 15:06:37 -04:00
dependabot[bot] ff51c2e0c0 build(deps): bump tracing-subscriber from 0.2.13 to 0.2.14
Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.13 to 0.2.14.
- [Release notes](https://github.com/tokio-rs/tracing/releases)
- [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.13...tracing-subscriber-0.2.14)

Signed-off-by: dependabot[bot] <support@github.com>
2020-10-23 15:02:02 -04:00
Henry de Valence cab96aa1a8 zebrad: clarify config help text (#1194) 2020-10-22 15:03:01 +10:00
teor 6dc95b1d6d Revise the checkpoint verifier metrics (#1195)
* update continuous and processing.next metrics correctly
* remove duplicate metrics
* rename ambiguous metrics
2020-10-21 20:06:26 -07:00
teor d745d2b47c Stop assuming Mainnet in Address From impls (#1191) 2020-10-22 07:58:52 +10:00
Alfredo Garcia 21ad6ffc47 Reverse displayed endianness of transaction and block hashes (#1171)
* Reverse displayed endianness of transaction and block hashes
* fix zebra-checkpoints utility for new hash order
* Stop using "zebrad revhex" in zebrad-hash-lookup
* Rebuild checkpoint lists in new hash order
This change also adds additional checkpoints to the end of each list.

* Replace TransactionHash with transaction::Hash
This change should have been made in #905, but we missed Debug impls
and some docs.

Co-authored-by: Ramana Venkata <vramana@users.noreply.github.com>
Co-authored-by: teor <teor@riseup.net>
2020-10-22 07:54:02 +10:00
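As an illustration of the display change in the commit above, the sketch below prints a hash in the reverse of its internal byte order. The `Hash` newtype here is a hypothetical stand-in, not the `block::Hash` or `transaction::Hash` type from the codebase.

```rust
use std::fmt;

/// Hypothetical 32-byte hash, stored in internal byte order.
struct Hash([u8; 32]);

impl fmt::Display for Hash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the bytes in reverse, so the displayed hex string has the
        // opposite endianness from the stored bytes.
        for byte in self.0.iter().rev() {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```

Only the textual representation changes; the stored bytes are untouched.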
teor a5a86622d4 Add team approval to the RFC pull request template (#1178) 2020-10-21 11:36:23 -07:00
teor e52a1c07a3 Ignore longer sync tests by default 2020-10-21 21:08:04 +10:00
teor 0d121833af Add sync tests that download 2000 blocks 2020-10-21 21:08:04 +10:00
teor b4f92adc40 Disable sync tests on Windows CI 2020-10-21 00:58:08 -04:00
teor 6fe3cc56dd Refactor sync test to be more flexible
And add documentation
2020-10-21 00:58:08 -04:00
teor 17a3612b36 Remove a redundant condition in expect_stdout
When the loop exits, either the process has stopped running,
or the deadline has passed.

If the process is still running, we want to kill it.
2020-10-21 00:58:08 -04:00
teor 0343e28d3a Disable sync test on ubuntu CI runners
They don't seem to have DNS or network configured during the tests.

Also make capitalisation of step names consistent.
2020-10-21 00:58:08 -04:00
teor 1d35c5a0b9 Enable the zebrad sync tests by default
If your test environment does not have DNS or network access, set the
ZEBRA_SKIP_NETWORK_TESTS environmental variable to disable these tests.
2020-10-21 00:58:08 -04:00
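A test could honor the variable from the commit above along these lines. The `ZEBRA_SKIP_NETWORK_TESTS` name comes from the commit message; the helper functions are hypothetical, and treating any set value as "skip" is an assumption.

```rust
use std::env;
use std::ffi::OsString;

/// Decides whether to skip, given the variable's value (if any).
/// Assumption: any value, even an empty one, disables the tests.
fn should_skip(value: Option<OsString>) -> bool {
    value.is_some()
}

/// Reads the environment and reports whether network tests should be skipped.
fn skip_network_tests() -> bool {
    should_skip(env::var_os("ZEBRA_SKIP_NETWORK_TESTS"))
}

/// Example shape of a network test that returns early when skipping.
/// Returns true if the test body actually ran.
fn sync_one_checkpoint_example() -> bool {
    if skip_network_tests() {
        return false;
    }
    // ... the real test would launch zebrad and wait for sync output here ...
    true
}
```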
Deirdre Connolly 9549e180c0 Allow dead_code on parameters for now 2020-10-20 11:16:22 -04:00
Deirdre Connolly a7ef6f6a40 Allow dead_code for checkpoint::Verifier for now 2020-10-20 11:16:22 -04:00
Deirdre Connolly e796132057 Allow dead_code for the transaction::Request for now (mempool) 2020-10-20 11:16:22 -04:00
Henry de Valence eb43893de0 consensus: minimize API, clean docs
This reduces the API surface to the minimum required for functionality,
and cleans up module documentation.  The stub mempool module is deleted
entirely, since it will need to be redone later anyways.
2020-10-20 11:16:22 -04:00
Henry de Valence d4ce3eb054 consensus: improve docs
- remove no longer accurate documentation about transaction verifier;
- add description of the role of the crate.
2020-10-20 11:16:22 -04:00
Jane Lusby c0aa1b477e consensus: add #[source] attributes to chain errors 2020-10-20 11:16:22 -04:00
Jane Lusby 8a64c056fb consensus: integrate block, transaction Verifiers 2020-10-20 11:16:22 -04:00