27 Commits

Author SHA1 Message Date
Henry de Valence 948b067808 chain: move Network, NetworkUpgrade to parameters
Also, avoid using star-imports of the enum variants, which pollutes the
namespace.
2020-08-17 11:46:34 -07:00
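A minimal sketch of the star-import point in 948b067808. The `Network` enum and `label` function here are stand-ins for illustration, not Zebra's actual definitions.

```rust
/// Stand-in for a network enum; not Zebra's actual definition.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Network {
    Mainnet,
    Testnet,
}

/// Variants are written as `Network::Mainnet` rather than brought into scope
/// with `use Network::*;`. Besides keeping bare names like `Mainnet` out of
/// the module namespace, a qualified pattern fails to compile if the variant
/// is renamed, whereas a bare pattern naming a variant that no longer exists
/// becomes a catch-all variable binding (with only lint warnings).
pub fn label(network: Network) -> &'static str {
    match network {
        Network::Mainnet => "mainnet",
        Network::Testnet => "testnet",
    }
}
```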
Henry de Valence a79ce97957
Fix sync algorithm. (#887)
* checkpoint: reject the older of duplicate verification requests.

If we get a duplicate block verification request, we should drop the older one
in favor of the newer one, because the older request is likely to have been
canceled.  Previously, this code would accept up to four duplicate verification
requests, then fail all subsequent ones.

* sync: add a timeout layer to block requests.

Note that if this timeout is too short, we'll bring down the peer set in a
retry storm.

* sync: restart syncing on error

Restart the syncing process when an error occurs, rather than ignoring it.
Restarting means we discard all tips and start over with a new block locator,
so we have another chance to get ourselves unstuck.

* sync: additional debug info

* sync: handle lookahead limit correctly.

Instead of extracting all the completed task results, the previous code pulled
results out until there were fewer tasks than the lookahead limit, then
stopped.  This meant that completed tasks could be left until the limit was
exceeded again.  Instead, extract all completed results, and use the number of
pending tasks to decide whether to extend the tip or wait for blocks to finish.

* network: add debug instrumentation to retry policy

* sync: instrument the spawned task

* sync: streamline ObtainTips/ExtendTips logic & tracing

This change does three things:

1.  It aligns the implementation of ObtainTips and ExtendTips so that they use
the same deduplication method.  This means that when debugging we only have one
deduplication algorithm to focus on.

2.  It streamlines the tracing output to omit information already included in
spans. Both obtain_tips and extend_tips have their own spans attached to the
events, so it's not necessary to add Scope: prefixes to messages.

3.  It changes the messages to be focused on reporting the actual
events rather than the interpretation of the events (e.g., "got genesis hash in
response" rather than "peer could not extend tip").  The motivation for this
change is that when debugging, the interpretation of events is already known to
be incorrect, in the sense that the mental model of the code (no bug) does not
match its behavior (has bug), so presenting minimally interpreted events forces
interpretation relative to the actual code.

* sync: hack to work around zcashd behavior

* sync: localize debug statement in extend_tips

* sync: change algorithm to define tips as pairs of hashes.

This is different enough from the existing description that its comments no
longer apply, so I removed them.  A further chunk of work is to change the sync
RFC to document this algorithm.

* sync: reduce block timeout

* state: add resource limits for sled

Closes #888

* sync: add a restart timeout constant

* sync: de-pub constants
2020-08-12 16:48:01 -07:00
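The lookahead fix in a79ce97957 can be sketched with only the standard library: drain every completed result that is ready, then compare the pending count against the limit. The channel stands in for Zebra's spawned download-and-verify tasks; the value of `LOOKAHEAD_LIMIT` and the `Next` enum are hypothetical.

```rust
use std::sync::mpsc::Receiver;

/// Hypothetical lookahead limit; the real constant's value may differ.
pub const LOOKAHEAD_LIMIT: usize = 4;

/// What the sync loop should do next (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
pub enum Next {
    ExtendTips,
    WaitForBlocks,
}

/// Pulls out *every* result that is already available, decrementing the
/// pending-task count for each one, instead of stopping as soon as the count
/// drops below the limit.
pub fn drain_completed(results: &Receiver<u32>, pending: &mut usize) -> Vec<u32> {
    let mut completed = Vec::new();
    // `try_recv` never blocks: it returns an error once no result is ready.
    while let Ok(height) = results.try_recv() {
        *pending = pending.saturating_sub(1);
        completed.push(height);
    }
    completed
}

/// Decides the next step from the number of tasks still pending.
pub fn next_step(pending: usize) -> Next {
    if pending < LOOKAHEAD_LIMIT {
        Next::ExtendTips
    } else {
        Next::WaitForBlocks
    }
}
```

With 6 tasks pending, 5 results ready and the hypothetical limit of 4, the old approach would have stopped after pulling 3 results, leaving 2 completed results unread.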
teor 2550c44d48
Make sync ignore known hashes (#853)
* fix: Handle known ObtainTips correctly

enumerate never returns a value beyond the end of the vector.

* fix: Ignore known tips in ExtendTips

Some peers send us known tips when we try to extend.

* fix: Ignore known hashes when downloading

Despite all our other checks, we still end up downloading some hashes
multiple times.

* fix: Increase the number of retries

The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.

Instead, just retry the fetches that fail.

* fix: Tweak comments

Co-authored-by: Jane Lusby <jlusby42@gmail.com>

* fix: Cleanup the state_contains interface in Sync

* Fix brackets

Oops

Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-08-10 16:17:50 -07:00
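A minimal sketch of the known-hash filtering in 2550c44d48, assuming hashes are plain 32-byte arrays and that the state lookup can be modeled as a predicate. `hashes_to_download` is a hypothetical helper, not Zebra's state_contains interface.

```rust
use std::collections::HashSet;

/// Stand-in for a 32-byte block hash.
pub type BlockHash = [u8; 32];

/// Returns the hashes that still need downloading, skipping any the state
/// already contains (checked via a caller-supplied predicate) and any that
/// were already requested, including duplicates within `candidates` itself.
pub fn hashes_to_download<F>(
    candidates: &[BlockHash],
    mut state_contains: F,
    requested: &mut HashSet<BlockHash>,
) -> Vec<BlockHash>
where
    F: FnMut(&BlockHash) -> bool,
{
    candidates
        .iter()
        .copied()
        .filter(|hash| !state_contains(hash))
        // `insert` returns false if the hash was already requested.
        .filter(|hash| requested.insert(*hash))
        .collect()
}
```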
teor e95358dbe3 fix: Increase the number of retries
The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.

Instead, just retry the fetches that fail.
2020-08-10 18:58:21 +10:00
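"Just retry the fetches that fail" amounts to a bounded retry loop. This is a hypothetical std-only helper, not the syncer's actual retry policy; it has no backoff, and the attempt limit is supplied by the caller.

```rust
/// Calls `fetch` until it succeeds or `max_attempts` attempts have been made,
/// returning the last error if every attempt fails. The attempt index is
/// passed to `fetch` so callers can log or vary behavior per attempt.
pub fn fetch_with_retries<T, E, F>(max_attempts: usize, mut fetch: F) -> Result<T, E>
where
    F: FnMut(usize) -> Result<T, E>,
{
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 0;
    loop {
        match fetch(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the final error.
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```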
teor faac50697c feature: Add a verified blocks metrics counter
We have a counter for pending "download and verify" futures. But these
futures are spawned, so they can complete in any order. They can also
complete before we receive their results.
2020-08-10 15:12:08 +10:00
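The reasoning in faac50697c (spawned work completes in any order, possibly before its result is collected) suggests counting at completion time rather than at collection time. In this sketch, threads stand in for spawned futures and a std atomic stands in for a metrics counter; `verify_all` is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawns one worker per block. Each worker bumps the shared counter as soon
/// as its (simulated) verification finishes, so the count reflects completed
/// verifications even before the spawner collects the results.
pub fn verify_all(heights: Vec<u32>) -> (usize, Vec<u32>) {
    let verified = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = heights
        .into_iter()
        .map(|height| {
            let verified = Arc::clone(&verified);
            thread::spawn(move || {
                // Stand-in for real block verification work.
                verified.fetch_add(1, Ordering::Relaxed);
                height
            })
        })
        .collect();
    // Results are collected afterwards; workers may already have finished,
    // in any order. Joining makes every increment visible to this thread.
    let results: Vec<u32> = handles
        .into_iter()
        .map(|handle| handle.join().expect("worker panicked"))
        .collect();
    (verified.load(Ordering::Relaxed), results)
}
```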
teor 6aeefcee8b fix: Improve sync diagnostics 2020-08-10 15:12:08 +10:00
Alfredo Garcia 5b3c6e4c6c
Port bash checkpoint scripts to zebra-checkpoints single rust binary (#740)
* make zebra-checkpoints
* fix LOOKAHEAD_LIMIT scope
* add a default cli path
* change doc usage text
* add tracing
* move MAX_CHECKPOINT_HEIGHT_GAP to zebra-consensus
* do byte_reverse_hex in a map
2020-07-25 17:53:00 +10:00
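The byte_reverse_hex step named in 5b3c6e4c6c reverses the byte order of a hex string, as when converting a hash between internal and display order. The name comes from the commit; the signature and body below are an assumption that maps over 2-character chunks.

```rust
/// Reverses the byte order of a hex-encoded string. Returns `None` if the
/// input has odd length or contains non-hex characters.
pub fn byte_reverse_hex(input: &str) -> Option<String> {
    if input.len() % 2 != 0 || !input.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let reversed: Vec<&str> = input
        .as_bytes()
        .chunks(2) // one byte = two hex digits
        .rev()
        .map(|pair| std::str::from_utf8(pair).expect("ASCII hex digits are valid UTF-8"))
        .collect();
    Some(reversed.concat())
}
```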
Henry de Valence b59cfc49b7 sync: create requests sequentially to respect backpressure.
This seems like a better design in principle but also appears to give a much
nicer sawtooth pattern of queued blocks in the checkpointer and a much smoother
pattern of block requests.
2020-07-24 18:36:00 -04:00
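The backpressure idea in b59cfc49b7 can be shown by analogy with a bounded std channel: requests are created one at a time, and each send waits until the consumer has made room. This thread-based sketch is only an analogy for the async syncer; `request_sequentially` is hypothetical.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Issues block requests one at a time through a bounded channel. `send`
/// blocks whenever `capacity` requests are already queued, so the producer
/// never runs further ahead of the consumer than the bound allows.
pub fn request_sequentially(heights: Vec<u32>, capacity: usize) -> Vec<u32> {
    let (tx, rx) = sync_channel::<u32>(capacity);
    let consumer = thread::spawn(move || rx.iter().collect::<Vec<u32>>());
    for height in heights {
        // Waits here (backpressure) until the consumer has made room.
        tx.send(height).expect("consumer hung up");
    }
    drop(tx); // close the channel so the consumer's iterator ends
    consumer.join().expect("consumer panicked")
}
```

A capacity of 0 also works: `sync_channel(0)` is a rendezvous channel, so each send waits for a matching receive.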
teor 77a1fefa1e
Download genesis (#731)
* feature: Add more CheckpointVerifier tracing

* fix: Download the genesis block
2020-07-23 10:56:52 -07:00
teor c95c825707 fix: Look up the genesis hash based on the network 2020-07-23 03:46:24 -04:00
Henry de Valence 4a98b8fa0d Add basic metrics to the syncer. 2020-07-22 21:59:00 -07:00
Henry de Valence c2c2a28e8b Improve tracing output in chain verifier 2020-07-22 21:59:00 -07:00
Jane Lusby 7d4e717182
Add block locator request to state layer (#712)
* Add block locator request to state layer

* pass genesis in request

* Update zebrad/src/commands/start/sync.rs

* fix errors
2020-07-22 18:01:31 -07:00
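7d4e717182 adds a block locator request, and 1047d2f690 lists constructing a correct block locator as open work. One common Bitcoin-style way to choose locator heights is dense near the tip, then exponentially sparser, always ending at genesis. This sketch is not necessarily the scheme Zebra adopted; the 10-entry dense prefix and the doubling step are assumptions.

```rust
/// Computes the heights to include in a block locator, newest first: the tip
/// and the blocks just below it, then exponentially sparser heights, always
/// ending with genesis (height 0).
pub fn locator_heights(tip: u32) -> Vec<u32> {
    let mut heights = Vec::new();
    let mut height = tip as i64;
    let mut step: i64 = 1;
    while height > 0 {
        heights.push(height as u32);
        // After the first 10 entries, double the gap each time.
        if heights.len() >= 10 {
            step *= 2;
        }
        height -= step;
    }
    heights.push(0); // always include genesis
    heights
}
```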
Henry de Valence 49aa41544d sync: try to ignore spurious inv messages.
Closes #697.

Per https://github.com/ZcashFoundation/zebra/issues/697#issuecomment-662742971

The response to a getblocks message is an inv message with the hashes of the
following blocks. However, inv messages are also sent unsolicited to gossip new
blocks across the network. Normally, this wouldn't be a problem, because for
every other request we filter only for the messages that are relevant to us.
But because the response to a getblocks message is an inv, the network layer
doesn't (and can't) distinguish between the response inv and the unsolicited
inv.

But there is a mitigation we can do. In our sync algorithm we have two phases:
(1) "ObtainTips" to get a set of tips to chase down, (2) repeatedly call
"ExtendTips" to extend those as far as possible. The unsolicited inv messages
have length 1, but when extending tips we expect to get more than one hash. So
we could reject responses in ExtendTips that have length 1 in order to ignore
these messages. This way we automatically ignore gossip messages during initial
block sync (while we're extending a tip) but we don't ignore length-1 responses
while trying to obtain tips (while querying the network for new tips).
2020-07-22 17:55:52 -07:00
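The mitigation in 49aa41544d drops length-1 inv responses, but only while extending tips. `Phase` and `accept_inv` are hypothetical names for illustration; the hash type is left generic.

```rust
/// Which phase of the sync algorithm issued the `getblocks` request.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Phase {
    ObtainTips,
    ExtendTips,
}

/// Returns the hashes from an `inv` response that should be used, or `None`
/// if the response is treated as unsolicited gossip. A reply to an
/// ExtendTips request is expected to carry more than one hash, so length-1
/// responses are ignored only in that phase; ObtainTips accepts them.
pub fn accept_inv<H>(phase: Phase, hashes: Vec<H>) -> Option<Vec<H>> {
    match phase {
        Phase::ExtendTips if hashes.len() == 1 => None,
        _ => Some(hashes),
    }
}
```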
Henry de Valence 928b0beb5d sync: unindent fetch task 2020-07-21 20:16:23 -07:00
Henry de Valence b722818e02 sync: remove redundant tracing specifier
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
2020-07-21 20:16:23 -07:00
Henry de Valence 1047d2f690 sync: add backpressure to syncer
Closes #617.
Closes #698.

The remaining work on the syncer is alluded to in a new comment:

1. Correctly constructing a block locator object
2. Detecting when we've stopped making progress syncing and restarting obtain_tips.
2020-07-21 20:16:23 -07:00
teor e5bb96715f fix: Reduce sync error logs to info or warn
Network issues are very common.
2020-07-21 10:13:03 -07:00
teor 851afad01f
fix: Resist CheckpointVerifier memory DoS attacks (#635)
* fix: Resist CheckpointVerifier memory DoS attacks

Allow a maximum of 2 queued blocks at each height, as a tradeoff between
efficient bad block rejection and memory usage.

Closes #628.

* fix: Make max queued blocks at height equal to fanout

* fix: Just allocate all the capacity upfront

* fix: Use with_capacity(1) and reserve_exact(1)
2020-07-15 13:27:10 -07:00
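A minimal sketch of the per-height cap in 851afad01f, including the with_capacity(1) and reserve_exact(1) allocation pattern it mentions. Rejecting blocks beyond the cap is an assumption about how excess blocks are handled, and `QueuedBlocks` is a hypothetical type.

```rust
use std::collections::BTreeMap;

/// Cap per height; the commit starts at 2, then ties the cap to the fanout.
pub const MAX_QUEUED_BLOCKS_PER_HEIGHT: usize = 2;

#[derive(Default)]
pub struct QueuedBlocks {
    by_height: BTreeMap<u32, Vec<[u8; 32]>>,
}

impl QueuedBlocks {
    /// Queues `hash` at `height`, returning `false` (and dropping the block)
    /// if that height already holds the maximum number of queued blocks.
    pub fn queue(&mut self, height: u32, hash: [u8; 32]) -> bool {
        let queue = self
            .by_height
            .entry(height)
            .or_insert_with(|| Vec::with_capacity(1));
        if queue.len() >= MAX_QUEUED_BLOCKS_PER_HEIGHT {
            return false;
        }
        // Grow one slot at a time so memory tracks the blocks actually held.
        queue.reserve_exact(1);
        queue.push(hash);
        true
    }

    /// Number of blocks currently queued at `height`.
    pub fn len_at(&self, height: u32) -> usize {
        self.by_height.get(&height).map_or(0, Vec::len)
    }
}
```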
Henry de Valence ff4e722cd7 sync: touch up tracing output. 2020-07-09 11:15:06 -07:00
Jane Lusby 51f6ce86ff
Implement retry policy for syncer (#551) 2020-07-01 13:35:01 -07:00
Jane Lusby 7245d91fe9
fix block downloading to be parallelized and committed via the verifier (#540) 2020-06-30 09:42:09 -07:00
Henry de Valence 21bf913b48 Revert "correctly trim and download tips (#531)"
This reverts commit e102bd5e34.
2020-06-24 12:24:37 -07:00
Jane Lusby e102bd5e34
correctly trim and download tips (#531)
* also download tips and filter tips

* dispatch all block downloads together

* tweak to match Henry's changes

* switch to more intuitive match

Co-authored-by: Jane Lusby <jane@zfnd.org>
2020-06-24 15:19:34 -04:00
Henry de Valence a453edd91c Put type definitions back at the bottom of the file. 2020-06-23 10:16:27 -07:00
Henry de Valence 18eb212d8e Set the new tips to be the last, not first, hash. 2020-06-23 10:16:27 -07:00
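The change in 18eb212d8e comes down to which end of the response becomes the new tip. `new_tip` is hypothetical, and the sketch assumes response hashes are ordered from lowest to highest height.

```rust
/// Returns the new tip after extending: with the response ordered from lowest
/// to highest height, the new tip is the last hash, not the first.
/// Returns `None` for an empty response.
pub fn new_tip<H: Copy>(response: &[H]) -> Option<H> {
    response.last().copied()
}
```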
Jane Lusby 1c42b66a4f
Implement sync component for start subcommand (#506) 2020-06-22 19:24:53 -07:00