## Motivation
This PR is motivated by the regression identified in https://github.com/ZcashFoundation/zebra/issues/1349. That PR notes that the metrics stopped working for most of the crates other than `zebrad`.
## Solution
This PR resolves the regression by deduplicating the `metrics` crate dependency. During a recent change we upgraded the metrics version in `zebrad` and a couple other of our crates, but we never updated the dependencies in `zebra-state`, `zebra-consensus`, or `zebra-network`. This caused the metrics macros to attempt to retrieve the current metrics exporter through the wrong function. We would install the metrics exporter in `0.13`, but then attempt to look it up through the `0.12` crate, which contains a different instance of the metrics exporter static variable which is unset. Doing this causes the metrics macros to return `None` for the current exporter after which they just silently give up.
## Related Issues
closes https://github.com/ZcashFoundation/zebra/issues/1349
## Follow Up Work
I noticed we have quite a few duplicate dependencies in our tree. We might be able to save some compilation time by auditing those and deduplicating them as much as possible.
- https://github.com/ZcashFoundation/zebra/issues/1582
Co-authored-by: teor <teor@riseup.net>
As a side effect of computing Merkle roots, we build a list of
transaction hashes. Instead of discarding these, add them to
PreparedBlock and FinalizedBlock so that they can be reused rather than
recomputed.
This commit adds Merkle root validation to:
1. the block verifier;
2. the checkpoint verifier.
In the first case, Bitcoin Merkle tree malleability has no effect,
because only a single Merkle tree in each malleablity set is valid (the
others have duplicate transactions).
In the second case, we need to check that the Merkle tree does not contain any
duplicate transactions.
Closes#1385Closes#906
This change introduces two new types:
- `PreparedBlock`, representing a block which has undergone semantic
validation and has been prepared for contextual validation;
- `FinalizedBlock`, representing a block which is ready to be finalized
immediately;
and changes the `Request::CommitBlock`,`Request::CommitFinalizedBlock`
variants to use these types instead of their previous fields.
This change solves the problem of passing data between semantic
validation and contextual validation, and cleans up the state code by
allowing it to pass around a bundle of data. Previously, the state code
just passed around an `Arc<Block>`, which forced it to needlessly
recompute block hashes and other data, and was incompatible with the
already-known but not-yet-implemented data transfer requirements, namely
passing in the Sprout and Sapling anchors computed during contextual
validation.
This commit propagates the `PreparedBlock` and `FinalizedBlock` types
through the state code but only uses their data opportunistically, e.g.,
changing .hash() computations to use the precomputed hash. In the
future, these structures can be extended to pass data through the
verification pipeline for reuse as appropriate. For instance, these
changes allow the sprout and sapling anchors to be propagated through
the state.
This change explicitly documents cancellation contracts for our Tower services,
and tries to correct a bug in the implementation of the CheckpointVerifier,
which duplicates information from the state service but did not ensure that it
would be kept in sync.
The previous debug output printed a message that the chain verifier had
recieved a block. But this provides no additional information compared
to printing no message in chain::Verifier and a message in whichever
verifier the block was sent to, since the resulting spans indicate where
the block was dispatched.
This commit also removes the "unexpected high block" detection; this was
an artefact of the original sync algorithm failing to handle block
advertisements, but we don't have that problem any more, so we can
simplify the code by eliminating that logic.
This reduces the API surface to the minimum required for functionality,
and cleans up module documentation. The stub mempool module is deleted
entirely, since it will need to be redone later anyways.
* refactor block and tx validation errors
* rename errors module to error
* move NoTransactions to BlockError
* clarify some errors, use dbg format for hash in error
* mnake is_coinbase_first return BlockError
* add new error types for each consensus Service
Co-authored-by: Jane Lusby <jane@zfnd.org>
This makes the component verifiers both always return `poll_ready`,
because they do not exert backpressure and cannot fail.
The checkpoint verifier now immediately rejects any blocks that arrive
after it finishes checkpointing, instead of marking the service itself
as failed.
The chain verifier is agnostic to the readiness behavior of its
components, and reports readiness when they are both ready.
This is a really nice function but there might be a bug in its future
implementation: https://github.com/tower-rs/tower/issues/469
This bug may have already been fixed for the 0.4.0 release, so we could change
back then.
This disables one test that can't be easily fixed at the moment, because
it tests the wrong thing: the checkpoint and block verifiers will
produce different transcripts.
It also disables the initial_tip logic for now, pending simplification
of the ChainVerifier logic.
* stop committing to the state in the ChainVerifier
* commit to the state in the BlockVerifier
* commit to the state in the CheckpointVerifier
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
This is the first in a sequence of changes that change the block:: items
to not include Block as a prefix in their name, in accordance with the
Rust API guidelines.
* checkpoint: reject older of duplicate verification requests.
If we get a duplicate block verification request, we should drop the older one
in favor of the newer one, because the older request is likely to have been
canceled. Previously, this code would accept up to four duplicate verification
requests, then fail all subsequent ones.
* sync: add a timeout layer to block requests.
Note that if this timeout is too short, we'll bring down the peer set in a
retry storm.
* sync: restart syncing on error
Restart the syncing process when an error occurs, rather than ignoring it.
Restarting means we discard all tips and start over with a new block locator,
so we can have another chance to "unstuck" ourselves.
* sync: additional debug info
* sync: handle lookahead limit correctly.
Instead of extracting all the completed task results, the previous code pulled
results out until there were fewer tasks than the lookahead limit, then
stopped. This meant that completed tasks could be left until the limit was
exceeded again. Instead, extract all completed results, and use the number of
pending tasks to decide whether to extend the tip or wait for blocks to finish.
* network: add debug instrumentation to retry policy
* sync: instrument the spawned task
* sync: streamline ObtainTips/ExtendTips logic & tracing
This change does three things:
1. It aligns the implementation of ObtainTips and ExtendTips so that they use
the same deduplication method. This means that when debugging we only have one
deduplication algorithm to focus on.
2. It streamlines the tracing output to not include information already
included in spans. Both obtain_tips and extend_tips have their own spans
attached to the events, so it's not necessary to add Scope: prefixes in
messages.
3. It changes the messages to be focused on reporting the actual
events rather than the interpretation of the events (e.g., "got genesis hash in
response" rather than "peer could not extend tip"). The motivation for this
change is that when debugging, the interpretation of events is already known to
be incorrect, in the sense that the mental model of the code (no bug) does not
match its behavior (has bug), so presenting minimally-interpreted events forces
interpretation relative to the actual code.
* sync: hack to work around zcashd behavior
* sync: localize debug statement in extend_tips
* sync: change algorithm to define tips as pairs of hashes.
This is different enough from the existing description that its comments no
longer apply, so I removed them. A further chunk of work is to change the sync
RFC to document this algorithm.
* sync: reduce block timeout
* state: add resource limits for sled
Closes#888
* sync: add a restart timeout constant
* sync: de-pub constants
Log a message with the height when we get duplicate blocks.
Downgrade some verifier errors and warnings to info and debug, because
some peers on mainnet consistently provide bad blocks.
* make zebra-checkpoints
* fix LOOKAHEAD_LIMIT scope
* add a default cli path
* change doc usage text
* add tracing
* move MAX_CHECKPOINT_HEIGHT_GAP to zebra-consensus
* do byte_reverse_hex in a map
* Add checkpoint list generation scripts
* Limit the checkpoint block data size
* Limit the checkpoint height gap
* Add Mainnet and Testnet checkpoint lists
* Parse hard-coded checkpoint lists
The lists were generated using the following limits:
- 256 MB spacing, based on block byte size, and
- 2000 blocks.