* Add review guidelines to the default PR template
* Apply suggestions
Co-authored-by: teor <teor@riseup.net>
* Add a Follow Up Work section to the PR template
* Mention design RFCs in the PR template
* Put key PR review questions in bold
* Tweak PR review "skip task" process
* Update .github/pull_request_template.md
Co-authored-by: teor <teor@riseup.net>
* Shorter alternative pull request template
Add Review and Follow Up sections
Add a checklist for documentation and tests
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Implement Expanded to Compact Difficulty
* Implement Arbitrary for CompactDifficulty
Remove the derive, and generate values from random block
hashes.
* Implement Arbitrary for ExpandedDifficulty and Work
* Use Arbitrary for CompactDifficulty in Arbitrary for Block
* Test difficulty on all block test vectors
And cleanup some duplicate test code
* Round-trip tests for compact test cases
* Round-trip tests for compact difficulty in block test vectors
* Make Add for Work return PartialCumulativeWork
Remove AddAssign for Work
Rewrite a proptest using Sub for PartialCumulativeWork
Use Arbitrary for Work
* Add roundtrip work sum tests
* Add roundtrip comparison difficulty tests
* Add failing proptest cases due to test bugs
* Use Some(_) rather than _.into()
* Reduce visibility of difficulty type inner values
* Split work and other difficulty proptests
This change makes sure that rejected work values don't disable property
tests on other types.
Closes#1026
Because of the way that sled uses this parameter, the actual in-memory
size may be much larger. Dialing this down should help avoid high
memory usage.
Remove the minimum data points from the syncer hedge configuragtion.
When there are no data points, hedge sends the second request
immediately.
Where there are less than 1/(1-latency_percentile) data points (20),
hedge delays the second request by the highest recent download time.
This change should improve genesis and post-restart sync latency.
* Run large checkpoint sync tests in CI
* Improve test child output match error context
* Add a debug_stop_at_height config
* Use stop at height in acceptance tests
And add some restart acceptance tests, to make sure the stop at
height feature works correctly.
## Motivation
The zebra-state service needs to be able to handle duplicate blocks.
## Solution
This implements changes already outlined by [The State
RFC](https://zebra.zfnd.org/dev/rfcs/0005-state-updates.html). We check for
successfully committed blocks first, since interacting with the queued blocks
struct at this point just complicates the implimentation. If the block has not
already been committed we then check if the block has already been queued, if
not we handle the block normally (normally here being the bit we already had
implemented).
## Documentation Changes
- [x] Update the state RFC to match the ways this fix departs from the design
- the main thing is that I switched the order of checking for duplicates
- [x] ~~Add newly added functions to the state rfc~~ Decided not to do this because they're minor getters that don't influence the rest of the design and aren't exposed as part of the API
- [x] Document newly added functions inline
## Testing
## Related Issues
- fixes https://github.com/ZcashFoundation/zebra/issues/1182
- tracking issue https://github.com/ZcashFoundation/zebra/issues/1049
Co-authored-by: teor <teor@riseup.net>
We should error if we notice that we're attempting to download the same
blocks multiple times, because that indicates that peers reported bad
information to us, or we got confused trying to interpret their
responses.
We already use an actor model for the state service, so we get an
ordered sequence of state queries by message-passing. Instead of
performing reads in the futures we return, this commit performs them
synchronously. This means that all sled access is done from the same
task, which
(1) might reduce contention
(2) allows us to avoid using sled transactions when writing to the
state.
Co-authored-by: Jane Lusby <jane@zfnd.org>
Co-authored-by: Jane Lusby <jane@zfnd.org>
Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>
The original sync algorithm split the sync process into two phases, one
that obtained prospective chain tips, and another that attempted to
extend those chain tips as far as possible until encountering an error
(at which point the prospective state is discarded and the process
restarts).
Because a previous implementation of this algorithm didn't properly
enforce linkage between segments of the chain while extending tips,
sometimes it would get confused and fail to discard responses that did
not extend a tip. To mitigate this, a check against the state was
added. However, this check can cause stalls while checkpointing,
because when a checkpoint is reached we may suddenly need to commit
thousands of blocks to the state. Because the sync algorithm now has a
a `CheckedTip` structure that ensures that a new segment of hashes
actually extends an existing one, we don't need to check against the
state while extending a tip, because we don't get confused while
interpreting responses.
This change results in significantly smoother progress on mainnet.
There's no reason to return a pre-Buffer'd service (there's no need for
internal access to the state service, as in zebra-network), but wrapping
it internally removes control of the buffer size from the caller.
The previous debug output printed a message that the chain verifier had
recieved a block. But this provides no additional information compared
to printing no message in chain::Verifier and a message in whichever
verifier the block was sent to, since the resulting spans indicate where
the block was dispatched.
This commit also removes the "unexpected high block" detection; this was
an artefact of the original sync algorithm failing to handle block
advertisements, but we don't have that problem any more, so we can
simplify the code by eliminating that logic.
The timeout behavior in zebra-network is an implementation detail, not a
feature of the public API. So it shouldn't be mentioned in the doc
comments -- if we want timeout behavior, we have to layer it ourselves.
Using the cancel_handles, we can deduplicate requests. This is
important to do, because otherwise when we insert the second cancel
handle, we'd drop the first one, cancelling an existing task for no
reason.