zebra

Commit Graph

Author	SHA1	Message	Date
teor	e6e859dce2	Tweak sync timeouts * increase the EWMA default and decay * increase the block download retries * increase the request and block download timeouts * increase the sync timeout	2020-09-08 12:44:33 -07:00
teor	ce12d4dadc	Add timeouts for tip responses and block verify tasks	2020-09-08 12:44:33 -07:00
teor	379ce5c1b8	Retry obtain and extend tips on failure	2020-09-08 12:44:33 -07:00
teor	48497d4857	Ignore sync errors when the block is already verified (#980) * Ignore sync errors when the block is already verified If we get an error for a block that is already in our state, we don't need to restart the sync. It was probably a duplicate download. Also: Process any ready tasks before reset, so the logs and metrics are up to date. (But ignore the errors, because we're about to reset.) Improve sync logging and metrics during the download and verify task. * Remove duplicate hashes in logs Co-authored-by: Jane Lusby <jlusby42@gmail.com> * Log the sync hash span at warn level Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-09-04 08:13:00 +10:00
teor	437549d8e9	Always drop the final hash in peer responses (#991) To workaround a zcashd bug that squashes responses together.	2020-09-04 08:09:34 +10:00
teor	c770daa51f	If the first ExtendTips hash is bad, discard it and re-check (#992)	2020-09-04 08:08:19 +10:00
teor	3fdfcb3179	fix: remove old tips that are behind new tips This change makes sync less reliant on the exact order of ObtainTips and ExtendTips responses.	2020-09-01 11:42:48 -04:00
teor	b8e8d4f548	fix: Remove some deeply-nested instrument spans Closes #923.	2020-08-20 14:52:39 -04:00
Henry de Valence	103b663c40	chain: rename BlockHeight to block::Height	2020-08-17 11:46:34 -07:00
Henry de Valence	61dea90e2f	chain: rename BlockHeaderHash to block::Hash This is the first in a sequence of changes that change the block:: items to not include Block as a prefix in their name, in accordance with the Rust API guidelines.	2020-08-17 11:46:34 -07:00
Henry de Valence	948b067808	chain: move Network, NetworkUpgrade to parameters Also, avoid using star-imports of the enum variants, which pollutes the namespace.	2020-08-17 11:46:34 -07:00
Henry de Valence	a79ce97957	Fix sync algorithm. (#887) * checkpoint: reject older of duplicate verification requests. If we get a duplicate block verification request, we should drop the older one in favor of the newer one, because the older request is likely to have been canceled. Previously, this code would accept up to four duplicate verification requests, then fail all subsequent ones. * sync: add a timeout layer to block requests. Note that if this timeout is too short, we'll bring down the peer set in a retry storm. * sync: restart syncing on error Restart the syncing process when an error occurs, rather than ignoring it. Restarting means we discard all tips and start over with a new block locator, so we can have another chance to "unstuck" ourselves. * sync: additional debug info * sync: handle lookahead limit correctly. Instead of extracting all the completed task results, the previous code pulled results out until there were fewer tasks than the lookahead limit, then stopped. This meant that completed tasks could be left until the limit was exceeded again. Instead, extract all completed results, and use the number of pending tasks to decide whether to extend the tip or wait for blocks to finish. * network: add debug instrumentation to retry policy * sync: instrument the spawned task * sync: streamline ObtainTips/ExtendTips logic & tracing This change does three things: 1. It aligns the implementation of ObtainTips and ExtendTips so that they use the same deduplication method. This means that when debugging we only have one deduplication algorithm to focus on. 2. It streamlines the tracing output to not include information already included in spans. Both obtain_tips and extend_tips have their own spans attached to the events, so it's not necessary to add Scope: prefixes in messages. 3. It changes the messages to be focused on reporting the actual events rather than the interpretation of the events (e.g., "got genesis hash in response" rather than "peer could not extend tip"). The motivation for this change is that when debugging, the interpretation of events is already known to be incorrect, in the sense that the mental model of the code (no bug) does not match its behavior (has bug), so presenting minimally-interpreted events forces interpretation relative to the actual code. * sync: hack to work around zcashd behavior * sync: localize debug statement in extend_tips * sync: change algorithm to define tips as pairs of hashes. This is different enough from the existing description that its comments no longer apply, so I removed them. A further chunk of work is to change the sync RFC to document this algorithm. * sync: reduce block timeout * state: add resource limits for sled Closes #888 * sync: add a restart timeout constant * sync: de-pub constants	2020-08-12 16:48:01 -07:00
teor	2550c44d48	Make sync ignore known hashes (#853) * fix: Handle known ObtainTips correctly enumerate never returns a value beyond the end of the vector. * fix: Ignore known tips in ExtendTips Some peers send us known tips when we try to extend. * fix: Ignore known hashes when downloading Despite all our other checks, we still end up downloading some hashes multiple times. * fix: Increase the number of retries The old sync code relied on duplicate block fetches to make progress, but the last few commits have removed some of those duplicates. Instead, just retry the fetches that fail. * fix: Tweak comments Co-authored-by: Jane Lusby <jlusby42@gmail.com> * fix: Cleanup the state_contains interface in Sync * Fix brackets Oops Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-08-10 16:17:50 -07:00
teor	e95358dbe3	fix: Increase the number of retries The old sync code relied on duplicate block fetches to make progress, but the last few commits have removed some of those duplicates. Instead, just retry the fetches that fail.	2020-08-10 18:58:21 +10:00
teor	faac50697c	feature: Add a verified blocks metrics counter We have a counter for pending "download and verify" futures. But these futures are spawned, so they can complete in any order. They can also complete before we receive their results.	2020-08-10 15:12:08 +10:00
teor	6aeefcee8b	fix: Improve sync diagnostics	2020-08-10 15:12:08 +10:00
Alfredo Garcia	5b3c6e4c6c	Port bash checkpoint scripts to zebra-checkpoints single rust binary (#740) * make zebra-checkpoints * fix LOOKAHEAD_LIMIT scope * add a default cli path * change doc usage text * add tracing * move MAX_CHECKPOINT_HEIGHT_GAP to zebra-consensus * do byte_reverse_hex in a map	2020-07-25 17:53:00 +10:00
Henry de Valence	b59cfc49b7	sync: create requests sequentially to respect backpressure. This seems like a better design on principle but also appears to give a much nicer sawtooth pattern of queued blocks in the checkpointer and a much smoother pattern of block requests.	2020-07-24 18:36:00 -04:00
teor	77a1fefa1e	Download genesis (#731) * feature: Add more CheckpointVerifier tracing * fix: Download the genesis block	2020-07-23 10:56:52 -07:00
teor	c95c825707	fix: Lookup the genesis hash based on the network	2020-07-23 03:46:24 -04:00
Henry de Valence	4a98b8fa0d	Add basic metrics to the syncer.	2020-07-22 21:59:00 -07:00
Henry de Valence	c2c2a28e8b	Improve tracing output in chain verifier	2020-07-22 21:59:00 -07:00
Jane Lusby	7d4e717182	Add block locator request to state layer (#712) * Add block locator request to state layer * pass genesis in request * Update zebrad/src/commands/start/sync.rs * fix errors	2020-07-22 18:01:31 -07:00
Henry de Valence	49aa41544d	sync: try to ignore spurious inv messages. Closes #697. per https://github.com/ZcashFoundation/zebra/issues/697#issuecomment-662742971 The response to a getblocks message is an inv message with the hashes of the following blocks. However, inv messages are also sent unsolicited to gossip new blocks across the network. Normally, this wouldn't be a problem, because for every other request we filter only for the messages that are relevant to us. But because the response to a getblocks message is an inv, the network layer doesn't (and can't) distinguish between the response inv and the unsolicited inv. But there is a mitigation we can do. In our sync algorithm we have two phases: (1) "ObtainTips" to get a set of tips to chase down, (2) repeatedly call "ExtendTips" to extend those as far as possible. The unsolicited inv messages have length 1, but when extending tips we expect to get more than one hash. So we could reject responses in ExtendTips that have length 1 in order to ignore these messages. This way we automatically ignore gossip messages during initial block sync (while we're extending a tip) but we don't ignore length-1 responses while trying to obtain tips (while querying the network for new tips).	2020-07-22 17:55:52 -07:00
Henry de Valence	928b0beb5d	sync: unindent fetch task	2020-07-21 20:16:23 -07:00
Henry de Valence	b722818e02	sync: remove redundant tracing specifier Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-07-21 20:16:23 -07:00
Henry de Valence	1047d2f690	sync: add backpressure to syncer Closes #617. Closes #698. The remaining work on the syncer is alluded to in a new comment: 1. Correctly constructing a block locator object 2. Detecting when we've stopped making progress syncing and restarting obtain_tips.	2020-07-21 20:16:23 -07:00
teor	e5bb96715f	fix: Reduce sync error logs to info or warn Network issues are very common.	2020-07-21 10:13:03 -07:00
teor	851afad01f	fix: Resist CheckpointVerifier memory DoS attacks (#635) * fix: Resist CheckpointVerifier memory DoS attacks Allow a maximum of 2 queued blocks at each height, as a tradeoff between efficient bad block rejection, and memory usage. Closes #628. * fix: Make max queued blocks at height equal to fanout * fix: Just allocate all the capacity upfront * fix: Use with_capacity(1) and reserve_exact(1)	2020-07-15 13:27:10 -07:00
Henry de Valence	ff4e722cd7	sync: touch up tracing output.	2020-07-09 11:15:06 -07:00
Jane Lusby	51f6ce86ff	Implement retry policy for syncer (#551)	2020-07-01 13:35:01 -07:00
Jane Lusby	7245d91fe9	fix block downloading to be parallelized and commited via the verifier (#540)	2020-06-30 09:42:09 -07:00
Henry de Valence	21bf913b48	Revert "correctly trim and download tips (#531)" This reverts commit `e102bd5e34`.	2020-06-24 12:24:37 -07:00
Jane Lusby	e102bd5e34	correctly trim and download tips (#531) * also download tips and filter tips * dispatch all block downloads together * tweek to match henry's changes * switch to more intuitive match Co-authored-by: Jane Lusby <jane@zfnd.org>	2020-06-24 15:19:34 -04:00
Henry de Valence	a453edd91c	Put type definitions back at the bottom of the file.	2020-06-23 10:16:27 -07:00
Henry de Valence	18eb212d8e	Set the new tips to be the last, not first, hash.	2020-06-23 10:16:27 -07:00
Jane Lusby	1c42b66a4f	Implement sync component for start subcommand (#506)	2020-06-22 19:24:53 -07:00

37 Commits