This addresses at least three pain points:
- we were affected by bugs that were already fixed in git, but not in
the released crate;
- we can use service combinators to transform requests and responses;
- we can use the hedge middleware.
The version in git is still marked as 0.3.1 but these changes will be
part of tower 0.4: https://github.com/tower-rs/tower/issues/431
* refactor block and tx validation errors
* rename errors module to error
* move NoTransactions to BlockError
* clarify some errors, use dbg format for hash in error
* mnake is_coinbase_first return BlockError
* add new error types for each consensus Service
Co-authored-by: Jane Lusby <jane@zfnd.org>
Using a Buffer with size 1 is a footgun because it allows only one
sender to call poll_ready at a time. This is usually undesirable
because it means that a task or service that calls poll_ready but only
makes a service call later (potentially much later) will block all other
callers.
This makes the component verifiers both always return `poll_ready`,
because they do not exert backpressure and cannot fail.
The checkpoint verifier now immediately rejects any blocks that arrive
after it finishes checkpointing, instead of marking the service itself
as failed.
The chain verifier is agnostic to the readiness behavior of its
components, and reports readiness when they are both ready.
This is a really nice function but there might be a bug in its future
implementation: https://github.com/tower-rs/tower/issues/469
This bug may have already been fixed for the 0.4.0 release, so we could change
back then.
This test aimed to exercise both the checkpoint and block verifiers by
making a checkpoint list of length 1. However, the block verifier can't
work on any blocks below Sapling activation.
Instead of conditionally parsing the hardcoded checkpoint list and
optionally making a CheckpointVerifier, make one unconditionally, and
use the config settings to decide whether to route responses to it.
Then, fix up all of the places needed to make it compile and remove all
of the dead code.
This disables one test that can't be easily fixed at the moment, because
it tests the wrong thing: the checkpoint and block verifiers will
produce different transcripts.
It also disables the initial_tip logic for now, pending simplification
of the ChainVerifier logic.
* stop committing to the state in the ChainVerifier
* commit to the state in the BlockVerifier
* commit to the state in the CheckpointVerifier
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Remove in-memory state service
* make the config compatible with toml again
* checkpoint commit to see how much I still have to revert
* back to the starting point...
* remove unused dependency
* reorganize error handling a bit
* need to make a new color-eyre release now
* reorder again because I have problems
* remove unnecessary helpers
* revert changes to config loading
* add back missing space
* Switch to released color-eyre version
* add back missing newline again...
* improve error message on unix when terminated by signal
* add context to last few asserts in acceptance tests
* instrument some of the helpers
* remove accidental extra space
* try to make this compile on windows
* reorg platform specific code
* hide on_disk module and fix broken link
* add CheckpointList::new_up_to(limit: NetworkUpgrade)
* if checkpoint_sync is false, limit checkpoints to Sapling
* update tests for CheckpointList and chain::init
This is the first in a sequence of changes that change the block:: items
to not include Block as a prefix in their name, in accordance with the
Rust API guidelines.
* checkpoint: reject older of duplicate verification requests.
If we get a duplicate block verification request, we should drop the older one
in favor of the newer one, because the older request is likely to have been
canceled. Previously, this code would accept up to four duplicate verification
requests, then fail all subsequent ones.
* sync: add a timeout layer to block requests.
Note that if this timeout is too short, we'll bring down the peer set in a
retry storm.
* sync: restart syncing on error
Restart the syncing process when an error occurs, rather than ignoring it.
Restarting means we discard all tips and start over with a new block locator,
so we can have another chance to "unstuck" ourselves.
* sync: additional debug info
* sync: handle lookahead limit correctly.
Instead of extracting all the completed task results, the previous code pulled
results out until there were fewer tasks than the lookahead limit, then
stopped. This meant that completed tasks could be left until the limit was
exceeded again. Instead, extract all completed results, and use the number of
pending tasks to decide whether to extend the tip or wait for blocks to finish.
* network: add debug instrumentation to retry policy
* sync: instrument the spawned task
* sync: streamline ObtainTips/ExtendTips logic & tracing
This change does three things:
1. It aligns the implementation of ObtainTips and ExtendTips so that they use
the same deduplication method. This means that when debugging we only have one
deduplication algorithm to focus on.
2. It streamlines the tracing output to not include information already
included in spans. Both obtain_tips and extend_tips have their own spans
attached to the events, so it's not necessary to add Scope: prefixes in
messages.
3. It changes the messages to be focused on reporting the actual
events rather than the interpretation of the events (e.g., "got genesis hash in
response" rather than "peer could not extend tip"). The motivation for this
change is that when debugging, the interpretation of events is already known to
be incorrect, in the sense that the mental model of the code (no bug) does not
match its behavior (has bug), so presenting minimally-interpreted events forces
interpretation relative to the actual code.
* sync: hack to work around zcashd behavior
* sync: localize debug statement in extend_tips
* sync: change algorithm to define tips as pairs of hashes.
This is different enough from the existing description that its comments no
longer apply, so I removed them. A further chunk of work is to change the sync
RFC to document this algorithm.
* sync: reduce block timeout
* state: add resource limits for sled
Closes#888
* sync: add a restart timeout constant
* sync: de-pub constants
* change several tests to transcript in consensus chain tests
* rename transcripts
* rename state transcript
* fix spandocs
* add timeout layer to tests
* run transcripts on the wrapped timeout service, remove ready calls
Log a message with the height when we get duplicate blocks.
Downgrade some verifier errors and warnings to info and debug, because
some peers on mainnet consistently provide bad blocks.
Reduce the amount of time that the block verifier waits for the previous
block, before polling again.
(Waiting for 2 seconds resulted in some apparent block verifier hangs.)
This is a temporary fix, until the state layer handles context checks.
We can use this network upgrade to implement different consensus rules
and chain context handling for genesis blocks.
Part of the chain state design in #682.
* make zebra-checkpoints
* fix LOOKAHEAD_LIMIT scope
* add a default cli path
* change doc usage text
* add tracing
* move MAX_CHECKPOINT_HEIGHT_GAP to zebra-consensus
* do byte_reverse_hex in a map
We had a brief discussion on discord and it seemed like we had consensus on the
following versioning policy:
* zebrad: match major version to NU version, so we will start by releasing
zebrad 3.0.0;
* zebra-* libraries: start by matching zebrad's version, then increment major
versions of each library as we need to make breaking changes (potentially
faster than the zebrad version, always respecting semver but making no
guarantees about the longevity of major releases).
This commit sets all of the crate versions to 3.0.0-alpha.0 -- the -alpha.0
marks it as a prerelease not subject to perfect adherence to compatibility
guarantees.
* Add checkpoint list generation scripts
* Limit the checkpoint block data size
* Limit the checkpoint height gap
* Add Mainnet and Testnet checkpoint lists
* Parse hard-coded checkpoint lists
The lists were generated using the following limits:
- 256 MB spacing, based on block byte size, and
- 2000 blocks.
* Add MIN and MAX for BlockHeight and LockTime
* Remove duplicate test cases
* fix a comment about the minimum lock time
The minimum LockTime::Time is 5 November 1985 00:53:20 UTC, so the first
day that only contains valid times is 6 November 1985 (in all timezones).
Similarly, the maximum LockTime::Time is 7 February 2106 06::28::15 UTC,
so the last day that only contains valid times in all time zones is
5 February 2106.
* fix: Reject checkpoint lists with bad hashes or heights
Reject the all-zeroes hash, because it is the parent hash of the genesis
block, and should never appear in a checkpoint list.
Reject checkpoint heights that are greater than the maximum block
height.
* fix: Resist CheckpointVerifier memory DoS attacks
Allow a maximum of 2 queued blocks at each height, as a tradeoff between
efficient bad block rejection, and memory usage.
Closes#628.
* fix: Make max queued blocks at height equal to fanout
* fix: Just allocate all the capacity upfront
* fix: Use with_capacity(1) and reserve_exact(1)
* Flatten consensus::verify::* to consensus::*
* Move consensus::*::tests into their own files
* Move CheckpointList into its own file
* Move Progress and Target into a types module
QueuedBlock and QueuedBlockList can stay in checkpoint.rs, because
they are tightly coupled to CheckpointVerifier.
Using tower-batch-based async pattern.
Now the Verifier is agnostic of redjubjub SigTypes. Updated tests to
generate sigs of both types and batch verifies the whole batch.
Resolves#407
* Return Poll::Ready(Err(_)) when verification has finished
* Turn checkpoint::init() into CheckpointVerifier::new()
* Accept IntoIterator<...> for CheckpointVerifier::new()
* Add a CheckpointList type
* Replace the state service with oneshot channels.
* Reject redundant checkpoint blocks
* impl Drop for CheckpointVerifier
* Add fields for caching blocks, and managing verify chains.
* Add current checkpoint functions
* Use a checkpoint range
* Get full backtraces with Err::Try
* Add enums for verification progress and target block heights.
* Replace install_tracing() with zebra_test::init()
* Add a test that mixes good and bad blocks
* Add timeouts to the checkpoint test futures
* Use spandoc correctly
* Refactor consensus test error handling
* Delete a checkpoint test that will soon be obsolete
* Only initialise tracing once for the block tests
* Use tracing in the checkpoint tests
* Move BlockVerifier and tests into block.rs
* Update a BlockVerifier comment
* Tweak some TODO comments
node_time_check() is a small function, so we inline it into its callers.
(And then rename node_time_check_helper() to node_time_check().)
Part of #477.
We don't want to call the state's AddBlock until we know the block is
valid. This avoids subtle bugs if the state is modified in call().
(in_memory currently modifies the state in call(), on_disk modifies the
state in the async block.)
Part of #477.
consensus: Add a checkpoint verifier stub
This stub only verifies blocks whose hashes are in the checkpoint
list.
It doesn't have tests, chain child verifies to their ancestors, or
support checkpoint maximum height queries.
Part of #429.
Move block header hashing from zebra-consensus to zebra-state.
Handle zebra-state AddBlock errors in zebra-consensus BlockVerifier.
Add unit tests for BlockVerifier state error handling.
Part of #428.
Placing bounds on the service's future is less ideal, because the future is
already constrained by the `Service` trait, so the bounds can be expressed more
directly and simply by bounding the service itself.
If the verification service already has to have a generic parameter for the
future (the `ZSF`), it could instead be generic over `S`, the storage service.
This has the upside that it's no longer required for the verification service
to box the storage service, so we don't add any extra layers of indirection,
and the where bounds become more straightforward, since they're centered on the
requirements for the storage service itself, not the future it returns.
Finally, we can simplify the bounds by using the request / response types
directly rather than defining wrapper types.
The reason the test failed is that the future returned by `call` on the state
service was immediately dropped, rather than being driven to completion.
Instead, we link the state update future with the verification future by
.awaiting it in an async block.
Note that the state update future is constructed outside of the async block
returned by the verification service. Why? Because calling
`self.state_service.call` requires mutable access to `state_service`, which we
only have in the body of the verification service's `call` method, and not
during execution of the async block it returns (which could happen at some
later point, or never).
In Tower's model, the `call` method itself only does the work necessary to
construct a future that represents completion of the service call, and the rest
of the work is done in the future itself. In particular, the fact that
`Service::call` takes `&mut self` means two things:
1. the service's state can be mutated while setting up the future, but not
during the future's subsequent execution,
2. any nested service calls made *by* the service *to* sub-services (e.g., the
verification service calling the state service) must either be made upfront,
while constructing the response future, or must be made to a clone of the
sub-service owned by the the response future.
We want to allow different state service implementations, and wrapped
state services. So we make verify::init() take a state_service, and
store that service in the BlockVerifier state_service field.
Part of #428.
* Fix authorship, license information.
I *thought* I had done a sed pass over the Cargo defaults when doing
repository initialization, but I guess I missed it or something.
Anyways, fixed now.