zebra

Commit Graph

Author	SHA1	Message	Date
teor	628b3e39af	fix(net): Add outer timeouts for critical network operations to avoid hangs (#7869 ) * Refactor out try_to_sync_once() * Add outer timeouts for obtaining and extending tips * Refactor out request_genesis_once() * Wrap genesis download once in a timeout * Increase the genesis timeout to avoid denial of service from old nodes * Add an outer timeout to mempool crawls * Add an outer timeout to mempool download/verify * Remove threaded mutex blocking from the inbound service * Explain why inbound readiness never hangs * Fix whitespace that cargo fmt doesn't * Avoid hangs by always resetting the past lookahead limit flag * Document block-specific and syncer-wide errors * Update zebrad/src/components/sync.rs Co-authored-by: Marek <mail@marek.onl> * Use correct condition for log messages Co-authored-by: Marek <mail@marek.onl> * Keep lookahead reset metric --------- Co-authored-by: Arya <aryasolhi@gmail.com> Co-authored-by: Marek <mail@marek.onl>	2023-11-02 15:00:18 +00:00
Alfredo Garcia	eb07bb31d6	rename(state): Rename state verifiers and related code (#6762 ) * rename verifiers * rename `PreparedBlock` to `SemanticallyVerifiedBlock` * rename `CommitBlock` to `SemanticallyVerifiedBlock` * rename `FinalizedBlock` to `CheckpointVerifiedBlock` * rename `CommitFinalizedBlock` to `CommitCheckpointVerifiedBlock` * rename `FinalizedWithTrees` to `ContextuallyVerifiedBlockWithTrees` * rename `ContextuallyValidBlock` to `ContextuallyVerifiedBlock` * change some `finalized` variables or function arguments to `checkpoint_verified` * fix docs * document the difference between `CheckpointVerifiedBlock` and `ContextuallyVerifiedBlock` * fix doc links * apply suggestions to request Co-authored-by: Marek <mail@marek.onl> * apply suggestions to service Co-authored-by: Marek <mail@marek.onl> * apply suggestions to finalized_state.rs and write.rs Co-authored-by: Marek <mail@marek.onl> * fmt * change some more variable names * change a few missing generics * fix checkpoint log issue * rename more `prepared` vars `semantically_verified` * fix test regex * fix test regex 2 --------- Co-authored-by: Marek <mail@marek.onl>	2023-06-01 12:29:03 +00:00
Marek	2a48d4cf25	change(chain): Refactor the handling of height differences (#6330 ) * Unify the `impl`s of `Sub` and `Add` for `Height` * Adjust tests for `Height` subtraction * Use `Height` instead of `i32` * Use `block:Height` in RPC tests * Use `let .. else` statement Co-authored-by: Arya <aryasolhi@gmail.com> * Update zebra-consensus/src/block/subsidy/general.rs * Refactor the handling of height differences * Remove a redundant comment * Update zebrad/src/components/sync/progress.rs Co-authored-by: Arya <aryasolhi@gmail.com> * Update progress.rs * impl TryFrom<u32> for Height * Make some test assertions clearer * Refactor estimate_up_to() * Restore a comment that was accidentally removed * Document when estimate_distance_to_network_chain_tip() returns None * Change HeightDiff to i64 and make Height.sub(Height) return HeightDiff (no Option) * Update chain tip estimates for HeightDiff i64 * Update subsidy for HeightDiff i64 * Fix some height calculation test edge cases * Fix the funding stream interval calculation --------- Co-authored-by: Arya <aryasolhi@gmail.com> Co-authored-by: teor <teor@riseup.net>	2023-03-29 23:06:31 +00:00
Arya	3cbee9465a	change(rpc): Add proposal capability to getblocktemplate (#5870 ) * adds ValidateBlock request to state * adds `Request` enum in block verifier skips solution check for BlockProposal requests calls CheckBlockValidity instead of Commit block for BlockProposal requests * uses new Request in references to chain verifier * adds getblocktemplate proposal mode response type * makes getblocktemplate-rpcs feature in zebra-consensus select getblocktemplate-rpcs in zebra-state * Adds PR review revisions * adds info log in CheckBlockProposalValidity * Reverts replacement of match statement * adds `GetBlockTemplate::capabilities` fn * conditions calling checkpoint verifier on !request.is_proposal * updates references to validate_and_commit_non_finalized * adds snapshot test, updates test vectors * adds `should_count_metrics` to NonFinalizedState * Returns an error from chain verifier for block proposal requests below checkpoint height adds feature flags * adds "proposal" to GET_BLOCK_TEMPLATE_CAPABILITIES_FIELD * adds back block::Request to zebra-consensus lib * updates snapshots * Removes unnecessary network arg * skips req in tracing intstrument for read state * Moves out block proposal validation to its own fn * corrects `difficulty_threshold_is_valid` docs adds/fixes some comments, adds TODOs general cleanup from a self-review. * Update zebra-state/src/service.rs * Apply suggestions from code review Co-authored-by: teor <teor@riseup.net> * Update zebra-rpc/src/methods/get_block_template_rpcs.rs Co-authored-by: teor <teor@riseup.net> * check best chain tip * Update zebra-state/src/service.rs Co-authored-by: teor <teor@riseup.net> * Applies cleanup suggestions from code review Co-authored-by: teor <teor@riseup.net> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2023-01-11 23:39:51 +00:00
teor	c4fad29824	fix(sync): Pause new downloads when Zebra reaches the lookahead limit (#5561 ) * Use correct release for getblocktemplate config * Include at least 2 full checkpoints in the lookahead limit * Increase full sync timeout to 36 hours * Only log "synced block height too far ahead of the tip" once * Replace AboveLookaheadHeightLimit error with pausing the syncer * Use AboveLookaheadHeightLimit for blocks a very long way from the tip * Also add the getblocktemplate config, and fix the test message * Remove an outdated TODO comment * Allow syncing again when a small number of blocks are in the queue * Allow some dead code	2022-11-09 04:42:04 +00:00
teor	7b47aac370	Allow extra lookahead in the verifier, state, and block commit queues (#5490 )	2022-10-27 20:16:42 +00:00
teor	c812f880cf	cleanup(clippy): Use inline format strings (#5489 ) * Inline format strings using an automated clippy fix ```sh cargo clippy --fix --all-features --all-targets -- -A clippy::all -W clippy::uninlined_format_args cargo fmt --all ``` * Remove unused & and &mut using an automated clippy fix ```sh cargo clippy --fix --all-features --all-targets -- -A clippy::all -W clippy::uninlined_format_args ```	2022-10-27 13:25:18 +00:00
teor	737fbac3fc	Allow extra lookahead blocks in the verifier, state, and block commit task queues (#5465 )	2022-10-24 14:55:57 +00:00
Arya	a28350e742	change(state): Write non-finalized blocks to the state in a separate thread, to avoid network and RPC hangs (#5257 ) * Add a new block commit task and channels, that don't do anything yet * Add last_block_hash_sent to the state service, to avoid database accesses * Update last_block_hash_sent regardless of commit errors * Rename a field to StateService.max_queued_finalized_height * Commit finalized blocks to the state in a separate task * Check for panics in the block write task * Wait for the block commit task in tests, and check for errors * Always run a proptest that sleeps once * Add extra debugging to state shutdowns * Work around a RocksDB shutdown bug * Close the finalized block channel when we're finished with it * Only reset state queue once per error * Update some TODOs * Add a module doc comment * Drop channels and check for closed channels in the block commit task * Close state channels and tasks on drop * Remove some duplicate fields across StateService and ReadStateService * Try tweaking the shutdown steps * Update and clarify some comments * Clarify another comment * Don't try to cancel RocksDB background work on drop * Fix up some comments * Remove some duplicate code * Remove redundant workarounds for shutdown issues * Remode a redundant channel close in the block commit task * Remove a mistaken `!force` shutdown condition * Remove duplicate force-shutdown code and explain it better * Improve RPC error logging * Wait for chain tip updates in the RPC tests * Wait 2 seconds for chain tip updates before skipping them * Remove an unnecessary block_in_place() * Fix some test error messages that were changed by earlier fixes * Expand some comments, fix typos Co-authored-by: Marek <mail@marek.onl> * Actually drop children of failed blocks * Explain why we drop descendants of failed blocks * Clarify a comment * Wait for chain tip updates in a failing test on macOS * Clean duplicate finalized blocks when the non-finalized state activates * Send an error when receiving a duplicate finalized block * Update checkpoint block behaviour, document its consensus rule * Wait for chain tip changes in inbound_block_height_lookahead_limit test * Wait for the genesis block to commit in the fake peer set mempool tests * Disable unreliable mempool verification check in the send transaction test * Appease rustfmt * Use clear_finalized_block_queue() everywhere that blocks are dropped * Document how Finalized and NonFinalized clones are different * sends non-finalized blocks to the block write task * passes ZebraDb to commit_new_chain, commit_block, and no_duplicates_in_finalized_chain instead of FinalizedState * Update zebra-state/src/service/write.rs Co-authored-by: teor <teor@riseup.net> * updates comments, renames send_process_queued, other minor cleanup * update assert_block_can_be_validated comment * removes `mem` field from StateService * removes `disk` field from StateService and updates block_iter to use `ZebraDb` instead of the finalized state * updates tests that use the disk to use read_service.db instead * moves best_tip to a read fn and returns finalized & non-finalized states from setup instead of the state service * changes `contextual_validity` to get the network from the finalized_state instead of another param * swaps out StateService with FinalizedState and NonFinalizedState in tests * adds NotReadyToBeCommitted error and returns it from validate_and_commit when a blocks parent hash is not in any chain * removes NonFinalizedWriteCmd and calls, moves update_latest_channels above rsp_tx.send * makes parent_errors_map an indexmap * clears non-finalized block queue when the receiver is dropped and when the StateService is being dropped * sends non-finalized blocks to the block write task * passes ZebraDb to commit_new_chain, commit_block, and no_duplicates_in_finalized_chain instead of FinalizedState * updates comments, renames send_process_queued, other minor cleanup * Update zebra-state/src/service/write.rs Co-authored-by: teor <teor@riseup.net> * update assert_block_can_be_validated comment * removes `mem` field from StateService * removes `disk` field from StateService and updates block_iter to use `ZebraDb` instead of the finalized state * updates tests that use the disk to use read_service.db instead * moves best_tip to a read fn and returns finalized & non-finalized states from setup instead of the state service * changes `contextual_validity` to get the network from the finalized_state instead of another param * swaps out StateService with FinalizedState and NonFinalizedState in tests * adds NotReadyToBeCommitted error and returns it from validate_and_commit when a blocks parent hash is not in any chain * removes NonFinalizedWriteCmd and calls, moves update_latest_channels above rsp_tx.send * makes parent_errors_map an indexmap * clears non-finalized block queue when the receiver is dropped and when the StateService is being dropped * removes duplicate field definitions on StateService that were a result of a bad merge * update NotReadyToBeCommitted error message * Appear rustfmt * Fix doc links * Rename a function to initial_contextual_validity() * Do error tasks on Err, and success tasks on Ok * Simplify parent_error_map truncation * Rewrite best_tip() to use tip() * Rename latest_mem() to latest_non_finalized_state() ```sh fastmod latest_mem latest_non_finalized_state zebra* cargo fmt --all ``` * Simplify latest_non_finalized_state() using a new WatchReceiver API * Expand some error messages * Send the result after updating the channels, and document why * wait for chain_tip_update before cancelling download in mempool_cancel_mined * adds `sent_non_finalized_block_hashes` field to StateService * adds batched sent_hash insertions and checks sent hashes in queue_and_commit_non_finalized before adding a block to the queue * check that the `curr_buf` in SentHashes is not empty before pushing it to the `sent_bufs` * Apply suggestions from code review Co-authored-by: teor <teor@riseup.net> * Fix rustfmt * Check for finalized block heights using zs_contains() * adds known_utxos field to SentHashes * updates comment on SentHashes.add method * Apply suggestions from code review Co-authored-by: teor <teor@riseup.net> * return early when there's a duplicate hash in QueuedBlocks.queue instead of panicking * Make finalized UTXOs near the final checkpoint available for full block verification * Replace a checkpoint height literal with the actual config * Update mainnet and testnet checkpoints - 7 October 2022 * Fix some state service init arguments * Allow more lookahead in the downloader, but less lookahead in the syncer * Add the latest config to the tests, and fix the latest config check * Increase the number of finalized blocks checked for non-finalized block UTXO spends * fix(log): reduce verbose logs for block commits (#5348) * Remove some verbose block write channel logs * Only warn about tracing endpoint if the address is actually set * Use CloneError instead of formatting a non-cloneable error Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com> * Increase block verify timeout * Work around a known block timeout bug by using a shorter timeout Co-authored-by: teor <teor@riseup.net> Co-authored-by: Marek <mail@marek.onl> Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2022-10-11 19:25:45 +00:00
teor	87f4308caf	fix(sync): Temporarily set full verification concurrency to 30 blocks (#4726 ) * Return the maximum checkpoint height from the chain verifier * Return the verified block height from the sync downloader * Track the verified height in the syncer * Use a lower concurrency limit during full verification * Get the tip from the state before the first verified block * Limit the number of submitted download and verify blocks in a batch * Adjust lookahead limits when transitioning to full verification * Keep unused extra hashes and submit them to the downloader later * Remove redundant verified_height and state_tip() * Split the checkpoint and full verify concurrency configs * Decrease full verification concurrency to 5 blocks 10 concurrent blocks causes 3 minute stalls on some blocks on my machine. (And it has about 4x as many cores as a standard machine.) * cargo +stable fmt --all * Remove a log that's verbose with smaller lookahead limits * Apply the full verify concurrency limit to the inbound service * Add a summary of the config changes to the CHANGELOG * Increase the default full verify concurrency limit to 30	2022-07-06 10:13:57 -04:00
teor	c75a68e655	fix(sync): change default sync config to improve reliability (#4670 ) * Decrease the default lookahead limit to 400 * Increase the block verification timeout to 10 minutes * Halve the default concurrent downloads config * Try to run the spawned download task before queueing the next download * Allow verification to be cancelled if the verifier is busy	2022-06-22 18:17:21 +00:00
teor	fee10ae014	Add height and hash info to syncer errors (#4287 )	2022-05-11 06:51:06 +00:00
teor	56f766f9b8	fix(sync): fix testnet syncer loop on large Orchard blocks (#4286 ) * Return BlockDownloadVerifyError from download_and_verify * Check block requests and genesis for temporary errors * Ignore DuplicateBlockQueuedForDownload as a temporary error * Propagate error info to the syncer main loop * Sleep after temporary genesis download and verify errors	2022-05-04 22:04:34 +00:00
teor	9f2028feff	3. Send notfound when Zebra doesn't have a block or transaction (#3466 ) * refactor(network): rename Advertised to Available ```sh fastmod Advertised Available zebra* fastmod advertised available zebra* ``` * refactor(network): allow different available and missing types inside an InventoryStatus And rename it to ResponseStatus. Split the methods between ResponseStatus and an InventoryStatus alias. * refactor(network): add a block_hash convenience method to InventoryHash * test(network): improve failure logs for connection tests * fix(inbound): move address sanitization into the response future * feat(network): send notfound when Zebra doesn't have a block or transaction * doc(network): move module docs to the top of each module This makes them more likely to get updated when the module changes. * fix(network): stop sending unsupported missing inventory types to the registry * test(network): inbound messages are forwarded to the registry * test(inbound): test Peers requests to the inbound service, directly and via TCP * test(network): notfound block responses are sent by the inbound service * test(network): notfound tx responses are sent by the inbound service * test(network): increase sync test mock service timeout The code that these tests use hasn't actually changed much, and they are only failing on some platforms (coverage, macOS). So it seems like the extra concurrent inbound tests have pushed them past their time limit. (Perhaps due to TCP system calls, or extra serialization work.) * doc(network): fix typo Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com> * test(network): remove unnecessary multi-threaded runtime from tests This prevents `MockService<zebra_state>` timeouts in the `sync_block_too_high_extend_tips` test, at the cost of reducing coverage of different execution orders. Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>	2022-02-14 01:51:34 +00:00
teor	fa071562fd	fix(network): increase state concurrency and syncer lookahead (#3455 ) * fix(state): set state concurrency based on other services' concurrency * fix(sync): increase the sync downloader lookahead limit It seems like the recent tokio upgrade made this code even more efficient, so on testnet we can have around 6000 blocks in flight. Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>	2022-02-02 22:44:15 +00:00
Alfredo Garcia	3c1ba59001	Reduce log level of components (#3418 ) * reduce log level of components * revert some log downgrades * dedupe log	2022-01-28 14:24:53 -03:00
teor	d076b999f3	Fix syncer download order and add sync tests (#3168 ) * Refactor so that RetryLimit::Future is std::marker::Sync * Make the syncer future std::marker::Send by spawning tips futures * Download synced blocks in chain order, not HashSet order * Improve MockService failure messages * Add closure-based responses to the MockService API * Move MockChainTip to zebra-chain * Add a MockChainTipSender type alias * Support MockChainTip in ChainSync and its downloader * Add syncer tests for obtain tips, extend tips, and wrong block hashes * Add block too high tests for obtain tips and extend tips * Add syncer tests for duplicate FindBlocks response hashes * Allow longer request delays for mocked services in syncer tests	2022-01-11 14:11:35 -03:00
teor	a4d1a1801c	Security: Drop blocks that are a long way ahead of the tip (#3167 ) * Document the chain verifier * Drop gossiped blocks that are too far ahead of the tip * Add extra gossiped block metrics * Allow extra gossiped blocks, now we have a stricter limit * Fix a comment * Check the exact number of blocks in a downloaded block response * Drop synced blocks that are too far ahead of the tip * Add extra synced block metrics * Test dropping gossiped blocks that are too far ahead of the tip * Allow an extra checkpoint's worth of blocks in the verifier queues * Actually let's try two extra checkpoints * Scale extra height limit with lookahead limit * Also drop blocks that are behind the finalized tip * Downgrade a noisy log * Use a debug log for already verified gossiped blocks * Use debug logs for already verified synced blocks	2021-12-17 13:31:51 -03:00
teor	1835ec2c8d	Add diagnostics for peer set hangs (#3203 ) * Use a named CancelHeartbeatTask unit struct for the channel type * Prefer cancel handles in selects, if both are ready * Fix message metrics to just show the command name * Add metrics for internal requests and responses * Add internal requests and responses to the messages dashboard * Add a canceled metric, and peer addresses to request and response metrics * Add a canceled messages graph * Add connection state metrics for currently open connections * Fix the connection state graph with new metrics * Always send an error before dropping pending responses * Move error detail logging into `fail_with` * Delete an unused timer future * Make error strings in metrics less verbose * Downgrade some error logs to info * Remove a redundant expect * Avoid unnecessary allocations for connection state metrics * Fix missed updates to mempool and block gossip metrics	2021-12-14 21:11:03 +00:00
Janito Vaqueiro Ferreira Filho	0960e4fb0b	Update to Tokio 1.13.0 (#2994 ) * Update `tower` to version `0.4.9` Update to latest version to add support for Tokio version 1. * Replace usage of `ServiceExt::ready_and` It was deprecated in favor of `ServiceExt::ready`. * Update Tokio dependency to version `1.13.0` This will break the build because the code isn't ready for the update, but future commits will fix the issues. * Replace import of `tokio::stream::StreamExt` Use `futures::stream::StreamExt` instead, because newer versions of Tokio don't have the `stream` feature. * Use `IntervalStream` in `zebra-network` In newer versions of Tokio `Interval` doesn't implement `Stream`, so the wrapper types from `tokio-stream` have to be used instead. * Use `IntervalStream` in `inventory_registry` In newer versions of Tokio the `Interval` type doesn't implement `Stream`, so `tokio_stream::wrappers::IntervalStream` has to be used instead. * Use `BroadcastStream` in `inventory_registry` In newer versions of Tokio `broadcast::Receiver` doesn't implement `Stream`, so `tokio_stream::wrappers::BroadcastStream` instead. This also requires changing the error type that is used. * Handle `Semaphore::acquire` error in `tower-batch` Newer versions of Tokio can return an error if the semaphore is closed. This shouldn't happen in `tower-batch` because the semaphore is never closed. * Handle `Semaphore::acquire` error in `zebrad` test On newer versions of Tokio `Semaphore::acquire` can return an error if the semaphore is closed. This shouldn't happen in the test because the semaphore is never closed. * Update some `zebra-network` dependencies Use versions compatible with Tokio version 1. * Upgrade Hyper to version 0.14 Use a version that supports Tokio version 1. * Update `metrics` dependency to version 0.17 And also update the `metrics-exporter-prometheus` to version 0.6.1. These updates are to make sure Tokio 1 is supported. * Use `f64` as the histogram data type `u64` isn't supported as the histogram data type in newer versions of `metrics`. * Update the initialization of the metrics component Make it compatible with the new version of `metrics`. * Simplify build version counter Remove all constants and use the new `metrics::incement_counter!` macro. * Change metrics output line to match on The snapshot string isn't included in the newer version of `metrics-exporter-prometheus`. * Update `sentry` to version 0.23.0 Use a version compatible with Tokio version 1. * Remove usage of `TracingIntegration` This seems to not be available from `sentry-tracing` anymore, so it needs to be replaced. * Add sentry layer to tracing initialization This seems like the replacement for `TracingIntegration`. * Remove unnecessary conversion Suggested by a Clippy lint. * Update Cargo lock file Apply all of the updates to dependencies. * Ban duplicate tokio dependencies Also ban git sources for tokio dependencies. * Stop allowing sentry-tracing git repository in `deny.toml` * Allow remaining duplicates after the tokio upgrade * Use C: drive for CI build output on Windows GitHub Actions uses a Windows image with two disk drives, and the default D: drive is smaller than the C: drive. Zebra currently uses a lot of space to build, so it has to use the C: drive to avoid CI build failures because of insufficient space. Co-authored-by: teor <teor@riseup.net>	2021-11-02 18:46:57 +00:00
Conrado Gouvea	84f2c07fbc	Ignore AlreadyInChain error in the syncer (#2890 ) * Ignore AlreadyInChain error in the syncer * Split Cancelled errors; add them to should_restart_sync exceptions * Also filter 'block is already comitted'; try to detect a wrong downcast	2021-10-20 11:07:19 +10:00
Conrado Gouvea	1ccb2de7c7	Add transaction downloader and verifier (#2679 ) * Add transaction downloader * Changed mempool downloader to be like inbound * Verifier working (logs result) * Apply suggestions from code review Co-authored-by: teor <teor@riseup.net> * Apply suggestions from code review Co-authored-by: teor <teor@riseup.net> * Fix coinbase check for mempool, improve is_coinbase() docs * Change other downloads.rs docs to reflect the mempool downloads.rs changes * Change TIMEOUTs to downloads.rs; add docs * Renamed is_coinbase() to has_valid_coinbase_transaction_inputs() and contains_coinbase_input() to has_any_coinbase_inputs(); reorder checks * Validate network upgrade for V4 transactions; check before computing sighash (for V5 too) * Add block_ prefix to downloads and verifier * Update zebra-consensus/src/transaction.rs Co-authored-by: teor <teor@riseup.net> * Add consensus doc; add more Block prefixes Co-authored-by: teor <teor@riseup.net>	2021-09-02 00:06:20 +00:00
teor	306fa88214	Document the correctness of Poll::Pending wakeups	2021-03-27 08:55:49 -04:00
teor	829a6f11c5	Document the behaviour of the `select!` macro	2021-03-27 08:55:49 -04:00
teor	92d95d4be5	Refactor inbound members into a consistent order And add download comments	2021-01-13 20:46:25 -05:00
teor	69fcf64d6c	Disable issue URLs for "duplicate hash" errors (#1517 ) In our README, we tell users to ignore these errors, so we should also disable the issue URL. Also include the hash in the error. (We don't want the span active for all messages, we just want the hash in the error.)	2020-12-16 08:14:42 +10:00
Henry de Valence	2a4a89c002	state,zebrad: tidy span levels for good INFO output This provides useful and not too noisy output at INFO level. We do an info-level message on every block commit instead of trying to do one message every N blocks, because this is useful both for initial block sync as well as continuous state updates on new blocks.	2020-11-23 14:16:39 +10:00
Henry de Valence	f0810b028d	state,consensus,sync: shorten span lengths These changes help reduce the size of the resulting spans, making the output more compact. Together they save about 30-40 characters.	2020-11-23 14:16:39 +10:00
Henry de Valence	b632a24436	zebrad: add diagnostics on cancelled download tasks	2020-11-17 14:56:27 -08:00
Henry de Valence	4c960c4e6d	zebrad: treat duplicate downloads as an error We should error if we notice that we're attempting to download the same blocks multiple times, because that indicates that peers reported bad information to us, or we got confused trying to interpret their responses.	2020-10-26 12:05:35 -07:00
Henry de Valence	91469faf3c	zebrad: eliminate duplicate span in sync	2020-10-26 12:05:35 -07:00
Henry de Valence	1d7309afe2	zebrad: correctly handle duplicates in DownloadSet Using the cancel_handles, we can deduplicate requests. This is important to do, because otherwise when we insert the second cancel handle, we'd drop the first one, cancelling an existing task for no reason.	2020-10-26 12:05:35 -07:00
Henry de Valence	12d25159c6	zebrad: use hedged requests in sync The hedge middleware implements hedged requests, as described in _The Tail At Scale_. The idea is that we auto-tune our retry logic according to the actual network conditions, pre-emptively retrying requests that exceed some latency percentile. This would hopefully solve the problem where our timeouts are too long on mainnet and too slow on testnet.	2020-10-26 12:05:35 -07:00
Henry de Valence	5f229d1475	zebrad: use Downloads in sync Try to use the better cancellation logic to revert to previous sync algorithm. As designed, the sync algorithm is supposed to proceed by downloading state prospectively and handle errors by flushing the pipeline and starting over. This hasn't worked well, because we didn't previously cancel tasks properly. Now that we can, try to use something in the spirit of the original sync algorithm.	2020-10-26 12:05:35 -07:00
Henry de Valence	b90581a3d7	zebrad: create a Downloads Stream for syncing. This makes two changes relative to the existing download code: 1. It uses a oneshot to attempt to cancel the download task after it has started; 2. It encapsulates the download creation and cancellation logic into a Downloads struct.	2020-10-26 12:05:35 -07:00

35 Commits