Commit Graph

35 Commits

Author SHA1 Message Date
teor 628b3e39af
fix(net): Add outer timeouts for critical network operations to avoid hangs (#7869)
* Refactor out try_to_sync_once()

* Add outer timeouts for obtaining and extending tips

* Refactor out request_genesis_once()

* Wrap genesis download once in a timeout

* Increase the genesis timeout to avoid denial of service from old nodes

* Add an outer timeout to mempool crawls

* Add an outer timeout to mempool download/verify

* Remove threaded mutex blocking from the inbound service

* Explain why inbound readiness never hangs

* Fix whitespace that cargo fmt doesn't

* Avoid hangs by always resetting the past lookahead limit flag

* Document block-specific and syncer-wide errors

* Update zebrad/src/components/sync.rs

Co-authored-by: Marek <mail@marek.onl>

* Use correct condition for log messages

Co-authored-by: Marek <mail@marek.onl>

* Keep lookahead reset metric

---------

Co-authored-by: Arya <aryasolhi@gmail.com>
Co-authored-by: Marek <mail@marek.onl>
2023-11-02 15:00:18 +00:00
Alfredo Garcia eb07bb31d6
rename(state): Rename state verifiers and related code (#6762)
* rename verifiers

* rename `PreparedBlock` to `SemanticallyVerifiedBlock`

* rename `CommitBlock` to `SemanticallyVerifiedBlock`

* rename `FinalizedBlock` to `CheckpointVerifiedBlock`

* rename `CommitFinalizedBlock` to `CommitCheckpointVerifiedBlock`

* rename `FinalizedWithTrees` to `ContextuallyVerifiedBlockWithTrees`

* rename `ContextuallyValidBlock` to `ContextuallyVerifiedBlock`

* change some `finalized` variables or function arguments to `checkpoint_verified`

* fix docs

* document the difference between `CheckpointVerifiedBlock` and `ContextuallyVerifiedBlock`

* fix doc links

* apply suggestions to request

Co-authored-by: Marek <mail@marek.onl>

* apply suggestions to service

Co-authored-by: Marek <mail@marek.onl>

* apply suggestions to finalized_state.rs and write.rs

Co-authored-by: Marek <mail@marek.onl>

* fmt

* change some more variable names

* change a few missing generics

* fix checkpoint log issue

* rename more `prepared` vars `semantically_verified`

* fix test regex

* fix test regex 2

---------

Co-authored-by: Marek <mail@marek.onl>
2023-06-01 12:29:03 +00:00
Marek 2a48d4cf25
change(chain): Refactor the handling of height differences (#6330)
* Unify the `impl`s of `Sub` and `Add` for `Height`

* Adjust tests for `Height` subtraction

* Use `Height` instead of `i32`

* Use `block:Height` in RPC tests

* Use `let .. else` statement

Co-authored-by: Arya <aryasolhi@gmail.com>

* Update zebra-consensus/src/block/subsidy/general.rs

* Refactor the handling of height differences

* Remove a redundant comment

* Update zebrad/src/components/sync/progress.rs

Co-authored-by: Arya <aryasolhi@gmail.com>

* Update progress.rs

* impl TryFrom<u32> for Height

* Make some test assertions clearer

* Refactor estimate_up_to()

* Restore a comment that was accidentally removed

* Document when estimate_distance_to_network_chain_tip() returns None

* Change HeightDiff to i64 and make Height.sub(Height) return HeightDiff (no Option)

* Update chain tip estimates for HeightDiff i64

* Update subsidy for HeightDiff i64

* Fix some height calculation test edge cases

* Fix the funding stream interval calculation

---------

Co-authored-by: Arya <aryasolhi@gmail.com>
Co-authored-by: teor <teor@riseup.net>
2023-03-29 23:06:31 +00:00
Arya 3cbee9465a
change(rpc): Add proposal capability to getblocktemplate (#5870)
* adds ValidateBlock request to state

* adds `Request` enum in block verifier

skips solution check for BlockProposal requests

calls CheckBlockValidity instead of Commit block for BlockProposal requests

* uses new Request in references to chain verifier

* adds getblocktemplate proposal mode response type

* makes getblocktemplate-rpcs feature in zebra-consensus select getblocktemplate-rpcs in zebra-state

* Adds PR review revisions

* adds info log in CheckBlockProposalValidity

* Reverts replacement of match statement

* adds `GetBlockTemplate::capabilities` fn

* conditions calling checkpoint verifier on !request.is_proposal

* updates references to validate_and_commit_non_finalized

* adds snapshot test, updates test vectors

* adds `should_count_metrics` to NonFinalizedState

* Returns an error from chain verifier for block proposal requests below checkpoint height

adds feature flags

* adds "proposal" to GET_BLOCK_TEMPLATE_CAPABILITIES_FIELD

* adds back block::Request to zebra-consensus lib

* updates snapshots

* Removes unnecessary network arg

* skips req in tracing intstrument for read state

* Moves out block proposal validation to its own fn

* corrects `difficulty_threshold_is_valid` docs

adds/fixes some comments, adds TODOs

general cleanup from a self-review.

* Update zebra-state/src/service.rs

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Update zebra-rpc/src/methods/get_block_template_rpcs.rs

Co-authored-by: teor <teor@riseup.net>

* check best chain tip

* Update zebra-state/src/service.rs

Co-authored-by: teor <teor@riseup.net>

* Applies cleanup suggestions from code review

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2023-01-11 23:39:51 +00:00
teor c4fad29824
fix(sync): Pause new downloads when Zebra reaches the lookahead limit (#5561)
* Use correct release for getblocktemplate config

* Include at least 2 full checkpoints in the lookahead limit

* Increase full sync timeout to 36 hours

* Only log "synced block height too far ahead of the tip" once

* Replace AboveLookaheadHeightLimit error with pausing the syncer

* Use AboveLookaheadHeightLimit for blocks a very long way from the tip

* Also add the getblocktemplate config, and fix the test message

* Remove an outdated TODO comment

* Allow syncing again when a small number of blocks are in the queue

* Allow some dead code
2022-11-09 04:42:04 +00:00
teor 7b47aac370
Allow extra lookahead in the verifier, state, and block commit queues (#5490) 2022-10-27 20:16:42 +00:00
teor c812f880cf
cleanup(clippy): Use inline format strings (#5489)
* Inline format strings using an automated clippy fix

```sh
cargo clippy --fix --all-features --all-targets -- -A clippy::all -W clippy::uninlined_format_args
cargo fmt --all
```

* Remove unused & and &mut using an automated clippy fix

```sh
cargo clippy --fix --all-features --all-targets -- -A clippy::all -W clippy::uninlined_format_args
```
2022-10-27 13:25:18 +00:00
teor 737fbac3fc
Allow extra lookahead blocks in the verifier, state, and block commit task queues (#5465) 2022-10-24 14:55:57 +00:00
Arya a28350e742
change(state): Write non-finalized blocks to the state in a separate thread, to avoid network and RPC hangs (#5257)
* Add a new block commit task and channels, that don't do anything yet

* Add last_block_hash_sent to the state service, to avoid database accesses

* Update last_block_hash_sent regardless of commit errors

* Rename a field to StateService.max_queued_finalized_height

* Commit finalized blocks to the state in a separate task

* Check for panics in the block write task

* Wait for the block commit task in tests, and check for errors

* Always run a proptest that sleeps once

* Add extra debugging to state shutdowns

* Work around a RocksDB shutdown bug

* Close the finalized block channel when we're finished with it

* Only reset state queue once per error

* Update some TODOs

* Add a module doc comment

* Drop channels and check for closed channels in the block commit task

* Close state channels and tasks on drop

* Remove some duplicate fields across StateService and ReadStateService

* Try tweaking the shutdown steps

* Update and clarify some comments

* Clarify another comment

* Don't try to cancel RocksDB background work on drop

* Fix up some comments

* Remove some duplicate code

* Remove redundant workarounds for shutdown issues

* Remode a redundant channel close in the block commit task

* Remove a mistaken `!force` shutdown condition

* Remove duplicate force-shutdown code and explain it better

* Improve RPC error logging

* Wait for chain tip updates in the RPC tests

* Wait 2 seconds for chain tip updates before skipping them

* Remove an unnecessary block_in_place()

* Fix some test error messages that were changed by earlier fixes

* Expand some comments, fix typos

Co-authored-by: Marek <mail@marek.onl>

* Actually drop children of failed blocks

* Explain why we drop descendants of failed blocks

* Clarify a comment

* Wait for chain tip updates in a failing test on macOS

* Clean duplicate finalized blocks when the non-finalized state activates

* Send an error when receiving a duplicate finalized block

* Update checkpoint block behaviour, document its consensus rule

* Wait for chain tip changes in inbound_block_height_lookahead_limit test

* Wait for the genesis block to commit in the fake peer set mempool tests

* Disable unreliable mempool verification check in the send transaction test

* Appease rustfmt

* Use clear_finalized_block_queue() everywhere that blocks are dropped

* Document how Finalized and NonFinalized clones are different

* sends non-finalized blocks to the block write task

* passes ZebraDb to commit_new_chain, commit_block, and no_duplicates_in_finalized_chain instead of FinalizedState

* Update zebra-state/src/service/write.rs

Co-authored-by: teor <teor@riseup.net>

* updates comments, renames send_process_queued, other minor cleanup

* update assert_block_can_be_validated comment

* removes `mem` field from StateService

* removes `disk` field from StateService and updates block_iter to use `ZebraDb` instead of the finalized state

* updates tests that use the disk to use read_service.db instead

* moves best_tip to a read fn and returns finalized & non-finalized states from setup instead of the state service

* changes `contextual_validity` to get the network from the finalized_state instead of another param

* swaps out StateService with FinalizedState and NonFinalizedState in tests

* adds NotReadyToBeCommitted error and returns it from validate_and_commit when a blocks parent hash is not in any chain

* removes NonFinalizedWriteCmd and calls, moves update_latest_channels above rsp_tx.send

* makes parent_errors_map an indexmap

* clears non-finalized block queue when the receiver is dropped and when the StateService is being dropped

* sends non-finalized blocks to the block write task

* passes ZebraDb to commit_new_chain, commit_block, and no_duplicates_in_finalized_chain instead of FinalizedState

* updates comments, renames send_process_queued, other minor cleanup

* Update zebra-state/src/service/write.rs

Co-authored-by: teor <teor@riseup.net>

* update assert_block_can_be_validated comment

* removes `mem` field from StateService

* removes `disk` field from StateService and updates block_iter to use `ZebraDb` instead of the finalized state

* updates tests that use the disk to use read_service.db instead

* moves best_tip to a read fn and returns finalized & non-finalized states from setup instead of the state service

* changes `contextual_validity` to get the network from the finalized_state instead of another param

* swaps out StateService with FinalizedState and NonFinalizedState in tests

* adds NotReadyToBeCommitted error and returns it from validate_and_commit when a blocks parent hash is not in any chain

* removes NonFinalizedWriteCmd and calls, moves update_latest_channels above rsp_tx.send

* makes parent_errors_map an indexmap

* clears non-finalized block queue when the receiver is dropped and when the StateService is being dropped

* removes duplicate field definitions on StateService that were a result of a bad merge

* update NotReadyToBeCommitted error message

* Appear rustfmt

* Fix doc links

* Rename a function to initial_contextual_validity()

* Do error tasks on Err, and success tasks on Ok

* Simplify parent_error_map truncation

* Rewrite best_tip() to use tip()

* Rename latest_mem() to latest_non_finalized_state()

```sh
fastmod latest_mem latest_non_finalized_state zebra*
cargo fmt --all
```

* Simplify latest_non_finalized_state() using a new WatchReceiver API

* Expand some error messages

* Send the result after updating the channels, and document why

* wait for chain_tip_update before cancelling download in mempool_cancel_mined

* adds `sent_non_finalized_block_hashes` field to StateService

* adds batched sent_hash insertions and checks sent hashes in queue_and_commit_non_finalized before adding a block to the queue

* check that the `curr_buf` in SentHashes is not empty before pushing it to the `sent_bufs`

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Fix rustfmt

* Check for finalized block heights using zs_contains()

* adds known_utxos field to SentHashes

* updates comment on SentHashes.add method

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* return early when there's a duplicate hash in QueuedBlocks.queue instead of panicking

* Make finalized UTXOs near the final checkpoint available for full block verification

* Replace a checkpoint height literal with the actual config

* Update mainnet and testnet checkpoints - 7 October 2022

* Fix some state service init arguments

* Allow more lookahead in the downloader, but less lookahead in the syncer

* Add the latest config to the tests, and fix the latest config check

* Increase the number of finalized blocks checked for non-finalized block UTXO spends

* fix(log): reduce verbose logs for block commits (#5348)

* Remove some verbose block write channel logs

* Only warn about tracing endpoint if the address is actually set

* Use CloneError instead of formatting a non-cloneable error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Increase block verify timeout

* Work around a known block timeout bug by using a shorter timeout

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Marek <mail@marek.onl>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-10-11 19:25:45 +00:00
teor 87f4308caf
fix(sync): Temporarily set full verification concurrency to 30 blocks (#4726)
* Return the maximum checkpoint height from the chain verifier

* Return the verified block height from the sync downloader

* Track the verified height in the syncer

* Use a lower concurrency limit during full verification

* Get the tip from the state before the first verified block

* Limit the number of submitted download and verify blocks in a batch

* Adjust lookahead limits when transitioning to full verification

* Keep unused extra hashes and submit them to the downloader later

* Remove redundant verified_height and state_tip()

* Split the checkpoint and full verify concurrency configs

* Decrease full verification concurrency to 5 blocks

10 concurrent blocks causes 3 minute stalls on some blocks on my machine.
(And it has about 4x as many cores as a standard machine.)

* cargo +stable fmt --all

* Remove a log that's verbose with smaller lookahead limits

* Apply the full verify concurrency limit to the inbound service

* Add a summary of the config changes to the CHANGELOG

* Increase the default full verify concurrency limit to 30
2022-07-06 10:13:57 -04:00
teor c75a68e655
fix(sync): change default sync config to improve reliability (#4670)
* Decrease the default lookahead limit to 400

* Increase the block verification timeout to 10 minutes

* Halve the default concurrent downloads config

* Try to run the spawned download task before queueing the next download

* Allow verification to be cancelled if the verifier is busy
2022-06-22 18:17:21 +00:00
teor fee10ae014
Add height and hash info to syncer errors (#4287) 2022-05-11 06:51:06 +00:00
teor 56f766f9b8
fix(sync): fix testnet syncer loop on large Orchard blocks (#4286)
* Return BlockDownloadVerifyError from download_and_verify

* Check block requests and genesis for temporary errors

* Ignore DuplicateBlockQueuedForDownload as a temporary error

* Propagate error info to the syncer main loop

* Sleep after temporary genesis download and verify errors
2022-05-04 22:04:34 +00:00
teor 9f2028feff
3. Send notfound when Zebra doesn't have a block or transaction (#3466)
* refactor(network): rename Advertised to Available

```sh
fastmod Advertised Available zebra*
fastmod advertised available zebra*
```

* refactor(network): allow different available and missing types inside an InventoryStatus

And rename it to ResponseStatus.

Split the methods between ResponseStatus and an InventoryStatus alias.

* refactor(network): add a block_hash convenience method to InventoryHash

* test(network): improve failure logs for connection tests

* fix(inbound): move address sanitization into the response future

* feat(network): send notfound when Zebra doesn't have a block or transaction

* doc(network): move module docs to the top of each module

This makes them more likely to get updated when the module changes.

* fix(network): stop sending unsupported missing inventory types to the registry

* test(network): inbound messages are forwarded to the registry

* test(inbound): test Peers requests to the inbound service, directly and via TCP

* test(network): notfound block responses are sent by the inbound service

* test(network): notfound tx responses are sent by the inbound service

* test(network): increase sync test mock service timeout

The code that these tests use hasn't actually changed much,
and they are only failing on some platforms (coverage, macOS).

So it seems like the extra concurrent inbound tests have pushed them
past their time limit.
(Perhaps due to TCP system calls, or extra serialization work.)

* doc(network): fix typo

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>

* test(network): remove unnecessary multi-threaded runtime from tests

This prevents `MockService<zebra_state>` timeouts
in the `sync_block_too_high_extend_tips` test,
at the cost of reducing coverage of different execution orders.

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
2022-02-14 01:51:34 +00:00
teor fa071562fd
fix(network): increase state concurrency and syncer lookahead (#3455)
* fix(state): set state concurrency based on other services' concurrency

* fix(sync): increase the sync downloader lookahead limit

It seems like the recent tokio upgrade made this code even more efficient,
so on testnet we can have around 6000 blocks in flight.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-02-02 22:44:15 +00:00
Alfredo Garcia 3c1ba59001
Reduce log level of components (#3418)
* reduce log level of components

* revert some log downgrades

* dedupe log
2022-01-28 14:24:53 -03:00
teor d076b999f3
Fix syncer download order and add sync tests (#3168)
* Refactor so that RetryLimit::Future is std::marker::Sync

* Make the syncer future std::marker::Send by spawning tips futures

* Download synced blocks in chain order, not HashSet order

* Improve MockService failure messages

* Add closure-based responses to the MockService API

* Move MockChainTip to zebra-chain

* Add a MockChainTipSender type alias

* Support MockChainTip in ChainSync and its downloader

* Add syncer tests for obtain tips, extend tips, and wrong block hashes

* Add block too high tests for obtain tips and extend tips

* Add syncer tests for duplicate FindBlocks response hashes

* Allow longer request delays for mocked services in syncer tests
2022-01-11 14:11:35 -03:00
teor a4d1a1801c
Security: Drop blocks that are a long way ahead of the tip (#3167)
* Document the chain verifier

* Drop gossiped blocks that are too far ahead of the tip

* Add extra gossiped block metrics

* Allow extra gossiped blocks, now we have a stricter limit

* Fix a comment

* Check the exact number of blocks in a downloaded block response

* Drop synced blocks that are too far ahead of the tip

* Add extra synced block metrics

* Test dropping gossiped blocks that are too far ahead of the tip

* Allow an extra checkpoint's worth of blocks in the verifier queues

* Actually let's try two extra checkpoints

* Scale extra height limit with lookahead limit

* Also drop blocks that are behind the finalized tip

* Downgrade a noisy log

* Use a debug log for already verified gossiped blocks

* Use debug logs for already verified synced blocks
2021-12-17 13:31:51 -03:00
teor 1835ec2c8d
Add diagnostics for peer set hangs (#3203)
* Use a named CancelHeartbeatTask unit struct for the channel type

* Prefer cancel handles in selects, if both are ready

* Fix message metrics to just show the command name

* Add metrics for internal requests and responses

* Add internal requests and responses to the messages dashboard

* Add a canceled metric, and peer addresses to request and response metrics

* Add a canceled messages graph

* Add connection state metrics for currently open connections

* Fix the connection state graph with new metrics

* Always send an error before dropping pending responses

* Move error detail logging into `fail_with`

* Delete an unused timer future

* Make error strings in metrics less verbose

* Downgrade some error logs to info

* Remove a redundant expect

* Avoid unnecessary allocations for connection state metrics

* Fix missed updates to mempool and block gossip metrics
2021-12-14 21:11:03 +00:00
Janito Vaqueiro Ferreira Filho 0960e4fb0b
Update to Tokio 1.13.0 (#2994)
* Update `tower` to version `0.4.9`

Update to latest version to add support for Tokio version 1.

* Replace usage of `ServiceExt::ready_and`

It was deprecated in favor of `ServiceExt::ready`.

* Update Tokio dependency to version `1.13.0`

This will break the build because the code isn't ready for the update,
but future commits will fix the issues.

* Replace import of `tokio::stream::StreamExt`

Use `futures::stream::StreamExt` instead, because newer versions of
Tokio don't have the `stream` feature.

* Use `IntervalStream` in `zebra-network`

In newer versions of Tokio `Interval` doesn't implement `Stream`, so the
wrapper types from `tokio-stream` have to be used instead.

* Use `IntervalStream` in `inventory_registry`

In newer versions of Tokio the `Interval` type doesn't implement
`Stream`, so `tokio_stream::wrappers::IntervalStream` has to be used
instead.

* Use `BroadcastStream` in `inventory_registry`

In newer versions of Tokio `broadcast::Receiver` doesn't implement
`Stream`, so `tokio_stream::wrappers::BroadcastStream` instead. This
also requires changing the error type that is used.

* Handle `Semaphore::acquire` error in `tower-batch`

Newer versions of Tokio can return an error if the semaphore is closed.
This shouldn't happen in `tower-batch` because the semaphore is never
closed.

* Handle `Semaphore::acquire` error in `zebrad` test

On newer versions of Tokio `Semaphore::acquire` can return an error if
the semaphore is closed. This shouldn't happen in the test because the
semaphore is never closed.

* Update some `zebra-network` dependencies

Use versions compatible with Tokio version 1.

* Upgrade Hyper to version 0.14

Use a version that supports Tokio version 1.

* Update `metrics` dependency to version 0.17

And also update the `metrics-exporter-prometheus` to version 0.6.1.
These updates are to make sure Tokio 1 is supported.

* Use `f64` as the histogram data type

`u64` isn't supported as the histogram data type in newer versions of
`metrics`.

* Update the initialization of the metrics component

Make it compatible with the new version of `metrics`.

* Simplify build version counter

Remove all constants and use the new `metrics::incement_counter!` macro.

* Change metrics output line to match on

The snapshot string isn't included in the newer version of
`metrics-exporter-prometheus`.

* Update `sentry` to version 0.23.0

Use a version compatible with Tokio version 1.

* Remove usage of `TracingIntegration`

This seems to not be available from `sentry-tracing` anymore, so it
needs to be replaced.

* Add sentry layer to tracing initialization

This seems like the replacement for `TracingIntegration`.

* Remove unnecessary conversion

Suggested by a Clippy lint.

* Update Cargo lock file

Apply all of the updates to dependencies.

* Ban duplicate tokio dependencies

Also ban git sources for tokio dependencies.

* Stop allowing sentry-tracing git repository in `deny.toml`

* Allow remaining duplicates after the tokio upgrade

* Use C: drive for CI build output on Windows

GitHub Actions uses a Windows image with two disk drives, and the
default D: drive is smaller than the C: drive. Zebra currently uses a
lot of space to build, so it has to use the C: drive to avoid CI build
failures because of insufficient space.

Co-authored-by: teor <teor@riseup.net>
2021-11-02 18:46:57 +00:00
Conrado Gouvea 84f2c07fbc
Ignore AlreadyInChain error in the syncer (#2890)
* Ignore AlreadyInChain error in the syncer

* Split Cancelled errors; add them to should_restart_sync exceptions

* Also filter 'block is already comitted'; try to detect a wrong downcast
2021-10-20 11:07:19 +10:00
Conrado Gouvea 1ccb2de7c7
Add transaction downloader and verifier (#2679)
* Add transaction downloader

* Changed mempool downloader to be like inbound

* Verifier working (logs result)

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Fix coinbase check for mempool, improve is_coinbase() docs

* Change other downloads.rs docs to reflect the mempool downloads.rs changes

* Change TIMEOUTs to downloads.rs; add docs

* Renamed is_coinbase() to has_valid_coinbase_transaction_inputs() and contains_coinbase_input() to has_any_coinbase_inputs(); reorder checks

* Validate network upgrade for V4 transactions; check before computing sighash (for V5 too)

* Add block_ prefix to downloads and verifier

* Update zebra-consensus/src/transaction.rs

Co-authored-by: teor <teor@riseup.net>

* Add consensus doc; add more Block prefixes

Co-authored-by: teor <teor@riseup.net>
2021-09-02 00:06:20 +00:00
teor 306fa88214 Document the correctness of Poll::Pending wakeups 2021-03-27 08:55:49 -04:00
teor 829a6f11c5 Document the behaviour of the `select!` macro 2021-03-27 08:55:49 -04:00
teor 92d95d4be5 Refactor inbound members into a consistent order
And add download comments
2021-01-13 20:46:25 -05:00
teor 69fcf64d6c
Disable issue URLs for "duplicate hash" errors (#1517)
In our README, we tell users to ignore these errors, so we should also
disable the issue URL.

Also include the hash in the error. (We don't want the span active for
all messages, we just want the hash in the error.)
2020-12-16 08:14:42 +10:00
Henry de Valence 2a4a89c002 state,zebrad: tidy span levels for good INFO output
This provides useful and not too noisy output at INFO level.  We do an
info-level message on every block commit instead of trying to do one
message every N blocks, because this is useful both for initial block
sync as well as continuous state updates on new blocks.
2020-11-23 14:16:39 +10:00
Henry de Valence f0810b028d state,consensus,sync: shorten span lengths
These changes help reduce the size of the resulting spans, making the
output more compact.  Together they save about 30-40 characters.
2020-11-23 14:16:39 +10:00
Henry de Valence b632a24436 zebrad: add diagnostics on cancelled download tasks 2020-11-17 14:56:27 -08:00
Henry de Valence 4c960c4e6d zebrad: treat duplicate downloads as an error
We should error if we notice that we're attempting to download the same
blocks multiple times, because that indicates that peers reported bad
information to us, or we got confused trying to interpret their
responses.
2020-10-26 12:05:35 -07:00
Henry de Valence 91469faf3c zebrad: eliminate duplicate span in sync 2020-10-26 12:05:35 -07:00
Henry de Valence 1d7309afe2 zebrad: correctly handle duplicates in DownloadSet
Using the cancel_handles, we can deduplicate requests.  This is
important to do, because otherwise when we insert the second cancel
handle, we'd drop the first one, cancelling an existing task for no
reason.
2020-10-26 12:05:35 -07:00
Henry de Valence 12d25159c6 zebrad: use hedged requests in sync
The hedge middleware implements hedged requests, as described in _The
Tail At Scale_. The idea is that we auto-tune our retry logic according
to the actual network conditions, pre-emptively retrying requests that
exceed some latency percentile. This would hopefully solve the problem
where our timeouts are too long on mainnet and too slow on testnet.
2020-10-26 12:05:35 -07:00
Henry de Valence 5f229d1475 zebrad: use Downloads in sync
Try to use the better cancellation logic to revert to previous sync
algorithm.  As designed, the sync algorithm is supposed to proceed by
downloading state prospectively and handle errors by flushing the
pipeline and starting over.  This hasn't worked well, because we didn't
previously cancel tasks properly.  Now that we can, try to use something
in the spirit of the original sync algorithm.
2020-10-26 12:05:35 -07:00
Henry de Valence b90581a3d7 zebrad: create a Downloads Stream for syncing.
This makes two changes relative to the existing download code:

1.  It uses a oneshot to attempt to cancel the download task after it
    has started;

2.  It encapsulates the download creation and cancellation logic into a
    Downloads struct.
2020-10-26 12:05:35 -07:00