Commit Graph

99 Commits

Author SHA1 Message Date
Za Wilcox 3bef54b764
change(consensus): Refactor production code for network consensus rules to `Network` methods (#8340)
* begin refactor suggested as "step 2": https://github.com/ZcashFoundation/zebra/issues/7968#issue-2003245309
Squashed from multiple commits to enable partial rebase

* break out more little traits

* add activation implementation leveraging From<Network> for lrz::cons::

* for transfer of ownership I cannot return a type that's owned by the method

* hrp_sapling_extended_full_viewing_key

* complete implementation of interface of Parameters on Network reuse Parameters on zcash Network where possible

* move doc-comments to trait declarations (from impls)

* Simplify/complete Parameters impl for Network

* Add checkpoint_list method, move documentation, etc

* move last match network to inside network method

* add back comment

* use zcash_address for parameter types in zebra-chain

* use inherent methods instead of big parameters passthrough

* revert to implementation of From on zcash_primitives::..::Network vs &zcash_prim...

* move match

* add test to block maximum time rule

* update changelog

* finish porting target_difficutly_limit

* remove obscelete code comment

* revert non-functional change

* finish migrating target_difficulty_limit, checkpoint_list

* update changelog

---------

Co-authored-by: Hazel OHearn <gygaxis@zingolabs.org>
2024-03-12 21:41:44 +00:00
teor 2ac6921d60
feat(mine): Add an internal Zcash miner to Zebra (#8136)
* Patch equihash to use the solver branch

* Add an internal-miner feature and set up its dependencies

* Remove 'Experimental' from mining RPC docs

* Fix a nightly clippy::question_mark lint

* Move a byte array utility function to zebra-chain

* fixup! Add an internal-miner feature and set up its dependencies

* Add an equihash::Solution::solve() method with difficulty checks

* Check solution is valid before returning it

* Add a TODO to check for peers before mining

* Move config validation into GetBlockTemplateRpcImpl::new()

* fixup! fixup! Add an internal-miner feature and set up its dependencies

* Use the same generic constraints for GetBlockTemplateRpcImpl struct and impls

* Start adding an internal miner component

* Add the miner task to the start command

* Add basic miner code

* Split out a method to mine one block

* Spawn to a blocking thread

* Wait until a valid template is available

* Handle shutdown

* Run mining on low priority threads

* Ignore some invalid solutions

* Use a difference nonce for each solver thread

* Update TODOs

* Change the patch into a renamed dependency to simplify crate releases

* Clean up instrumentation and TODOs

* Make RPC instances cloneable and clean up generics

* Make LongPollId Copy so it's easier to use

* Add API to restart mining if there's a new block template

* Actually restart mining if there's a new block template

* Tidy instrumentation

* fixup! Move config validation into GetBlockTemplateRpcImpl::new()

* fixup! Make RPC instances cloneable and clean up generics

* Run the template generator and one miner concurrently

* Reduce logging

* Fix a bug in getblocktemplate RPC tip change detection

* Work around some watch channel change bugs

* Rate-limit template changes in the receiver

* Run one mining solver per available core

* Use updated C code with double-free protection

* Update to the latest solver branch

* Return and submit all valid solutions

* Document what INPUT_LENGTH means

* Fix watch channel change detection

* Don't return early when a mining task fails

* Spawn async miner tasks to avoid cooperative blocking, deadlocks, and improve shutdown responsiveness

* Make existing parallelism docs and configs consistent

* Add a mining parallelism config

* Use the minimum of the configured or available threads for mining

* Ignore optional feature fields in tests

* Downgrade some frequent logs to debug

* Document new zebrad features and tasks

* Describe the internal-miner feature in the CHANGELOG

* Update dependency to de-duplicate equihash solutions

* Use futures::StreamExt instead of TryStreamExt

* Fix a panic message typo
2024-01-11 14:41:01 +00:00
dependabot[bot] f21d7c6934
build(deps): bump the prod group with 6 updates (#8125)
* build(deps): bump the prod group with 6 updates

Bumps the prod group with 6 updates:

| Package | From | To |
| --- | --- | --- |
| [futures](https://github.com/rust-lang/futures-rs) | `0.3.29` | `0.3.30` |
| [tokio](https://github.com/tokio-rs/tokio) | `1.35.0` | `1.35.1` |
| [metrics](https://github.com/metrics-rs/metrics) | `0.21.1` | `0.22.0` |
| [metrics-exporter-prometheus](https://github.com/metrics-rs/metrics) | `0.12.2` | `0.13.0` |
| [reqwest](https://github.com/seanmonstar/reqwest) | `0.11.22` | `0.11.23` |
| [owo-colors](https://github.com/jam1garner/owo-colors) | `3.5.0` | `4.0.0` |


Updates `futures` from 0.3.29 to 0.3.30
- [Release notes](https://github.com/rust-lang/futures-rs/releases)
- [Changelog](https://github.com/rust-lang/futures-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/futures-rs/compare/0.3.29...0.3.30)

Updates `tokio` from 1.35.0 to 1.35.1
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.35.0...tokio-1.35.1)

Updates `metrics` from 0.21.1 to 0.22.0
- [Changelog](https://github.com/metrics-rs/metrics/blob/main/release.toml)
- [Commits](https://github.com/metrics-rs/metrics/compare/metrics-v0.21.1...metrics-v0.22.0)

Updates `metrics-exporter-prometheus` from 0.12.2 to 0.13.0
- [Changelog](https://github.com/metrics-rs/metrics/blob/main/release.toml)
- [Commits](https://github.com/metrics-rs/metrics/compare/metrics-exporter-prometheus-v0.12.2...metrics-exporter-prometheus-v0.13.0)

Updates `reqwest` from 0.11.22 to 0.11.23
- [Release notes](https://github.com/seanmonstar/reqwest/releases)
- [Changelog](https://github.com/seanmonstar/reqwest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/seanmonstar/reqwest/compare/v0.11.22...v0.11.23)

Updates `owo-colors` from 3.5.0 to 4.0.0
- [Commits](https://github.com/jam1garner/owo-colors/compare/v3.5.0...v4.0.0)

---
updated-dependencies:
- dependency-name: futures
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod
- dependency-name: metrics
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod
- dependency-name: metrics-exporter-prometheus
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: prod
- dependency-name: reqwest
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod
- dependency-name: owo-colors
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: prod
...

Signed-off-by: dependabot[bot] <support@github.com>

* update all metric macros

* fix deprecated function

* fix duplicated deps

* Fix an incorrect gauge method call

* Expand documentation and error messages for best chain length

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
Co-authored-by: teor <teor@riseup.net>
2024-01-02 01:26:54 +00:00
teor 8c717c92dd
change(scan): Create a scanner storage database, but don't use it yet (#8031)
* Create an empty storage/db module

* Use ephemeral storage in tests

* Populate storage inside new() method

* Move scanner setup into an init() method

* Pass the network to scanner init

* Create a database but don't actually use it

* Skip shutdown format checks when skipping format upgrades

* Allow the scanner to skip launching format upgrades in production

* Refactor skipping format upgrades so it is consistent

* Allow checking configs for equality

* Restore Network import
2023-11-30 12:59:15 +00:00
teor 628b3e39af
fix(net): Add outer timeouts for critical network operations to avoid hangs (#7869)
* Refactor out try_to_sync_once()

* Add outer timeouts for obtaining and extending tips

* Refactor out request_genesis_once()

* Wrap genesis download once in a timeout

* Increase the genesis timeout to avoid denial of service from old nodes

* Add an outer timeout to mempool crawls

* Add an outer timeout to mempool download/verify

* Remove threaded mutex blocking from the inbound service

* Explain why inbound readiness never hangs

* Fix whitespace that cargo fmt doesn't

* Avoid hangs by always resetting the past lookahead limit flag

* Document block-specific and syncer-wide errors

* Update zebrad/src/components/sync.rs

Co-authored-by: Marek <mail@marek.onl>

* Use correct condition for log messages

Co-authored-by: Marek <mail@marek.onl>

* Keep lookahead reset metric

---------

Co-authored-by: Arya <aryasolhi@gmail.com>
Co-authored-by: Marek <mail@marek.onl>
2023-11-02 15:00:18 +00:00
Alfredo Garcia 0cc48a322a
fix(docs): docs after new rust version (#7375)
* fix docs build

* fix docs build errors in sapling trees

* fix docs build in sprout joinsplits

* fix doc build in handshake

* fix docs build in zebra-state

* fix docs build in zebrad

* new line fix
2023-08-24 11:31:10 +00:00
teor e6c3b87872
Stop panicking on shutdown in the syncer and network init (#7104) 2023-07-02 20:08:11 +00:00
teor f6641eaaee
cleanup(gossip): Use a separate named constant for the gossip interval (#6704)
* Use a named consttant for the gossip interval

* Update tests

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2023-05-22 19:21:53 +00:00
Alfredo Garcia 58bd898f5b
feat(zebrad): Refuse to run zebrad when release is too old (#6351)
* refuse to run Zebra if it is too old

* update the release checklist to consider the constants

* bring newline back

* apply new end of support code

* attempt to add tests (not working yet)

* move eos to progress task

* move tests

* add acceptance test (not working)

* fix tests

* change to block height checks (ugly code)

* change warn days

* refactor estimated blocks per day, etc

* move end of support code to its own task

* change test

* fix some docs

* move constants

* remove uneeded conversions

* downgrade tracing

* reduce end of support time, fix ci changing debugs to info again

* update instructions

* add failure messages

* cargo lock update

* unify releaase name constant

* change info msg

* clippy fixes

* add a block explorer

* ignore testnet in end of support task

* change panic to 16 weeks

* add some documentation about end of support

* Tweak docs wording

---------

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2023-04-28 14:13:21 +00:00
Marek 2a48d4cf25
change(chain): Refactor the handling of height differences (#6330)
* Unify the `impl`s of `Sub` and `Add` for `Height`

* Adjust tests for `Height` subtraction

* Use `Height` instead of `i32`

* Use `block:Height` in RPC tests

* Use `let .. else` statement

Co-authored-by: Arya <aryasolhi@gmail.com>

* Update zebra-consensus/src/block/subsidy/general.rs

* Refactor the handling of height differences

* Remove a redundant comment

* Update zebrad/src/components/sync/progress.rs

Co-authored-by: Arya <aryasolhi@gmail.com>

* Update progress.rs

* impl TryFrom<u32> for Height

* Make some test assertions clearer

* Refactor estimate_up_to()

* Restore a comment that was accidentally removed

* Document when estimate_distance_to_network_chain_tip() returns None

* Change HeightDiff to i64 and make Height.sub(Height) return HeightDiff (no Option)

* Update chain tip estimates for HeightDiff i64

* Update subsidy for HeightDiff i64

* Fix some height calculation test edge cases

* Fix the funding stream interval calculation

---------

Co-authored-by: Arya <aryasolhi@gmail.com>
Co-authored-by: teor <teor@riseup.net>
2023-03-29 23:06:31 +00:00
Arya 571cbfba7a
change(state): Stop re-downloading blocks that are in non-finalized side chains (#6335)
* Adds 'Contains' request in state, and:

- adds finalized block hashes to sent_blocks

- replaces Depth call with Contains in sync, inbound, and block verifier

* removes unnecessary From impl

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* updates references to Request::Contains

* Renames zs::response::BlockLocation to KnownBlocks

* Updates AlreadyInChain error

* update docs for sent_hashes.add_finalized

* Update zebra-consensus/src/block.rs

Co-authored-by: teor <teor@riseup.net>

* Update comment for `sent_blocks` field in state service

* update KnownBlock request to check the non-finalized state before responding that a block is in the queue

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* update references to renamed method

* Clear sent_blocks when there's a reset

* Move self.finalized_block_write_sender.is_none() to can_fork_chain_at

* revert changes related to checking queue

---------

Co-authored-by: teor <teor@riseup.net>
2023-03-24 07:10:07 +00:00
Arya 3cbee9465a
change(rpc): Add proposal capability to getblocktemplate (#5870)
* adds ValidateBlock request to state

* adds `Request` enum in block verifier

skips solution check for BlockProposal requests

calls CheckBlockValidity instead of Commit block for BlockProposal requests

* uses new Request in references to chain verifier

* adds getblocktemplate proposal mode response type

* makes getblocktemplate-rpcs feature in zebra-consensus select getblocktemplate-rpcs in zebra-state

* Adds PR review revisions

* adds info log in CheckBlockProposalValidity

* Reverts replacement of match statement

* adds `GetBlockTemplate::capabilities` fn

* conditions calling checkpoint verifier on !request.is_proposal

* updates references to validate_and_commit_non_finalized

* adds snapshot test, updates test vectors

* adds `should_count_metrics` to NonFinalizedState

* Returns an error from chain verifier for block proposal requests below checkpoint height

adds feature flags

* adds "proposal" to GET_BLOCK_TEMPLATE_CAPABILITIES_FIELD

* adds back block::Request to zebra-consensus lib

* updates snapshots

* Removes unnecessary network arg

* skips req in tracing intstrument for read state

* Moves out block proposal validation to its own fn

* corrects `difficulty_threshold_is_valid` docs

adds/fixes some comments, adds TODOs

general cleanup from a self-review.

* Update zebra-state/src/service.rs

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Update zebra-rpc/src/methods/get_block_template_rpcs.rs

Co-authored-by: teor <teor@riseup.net>

* check best chain tip

* Update zebra-state/src/service.rs

Co-authored-by: teor <teor@riseup.net>

* Applies cleanup suggestions from code review

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2023-01-11 23:39:51 +00:00
teor 4078e244d3
fix(lint): Box large error types to resolve the clippy large result err variant lint (#5759)
* Box errors to deal with large error warnings, add accessor methods for error properties

* Remove or explain some large enum variant lints

* Turn some tickets into TODOs

* Allow missing docs on single-field error enum variants

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-12-08 06:11:33 +00:00
teor c4fad29824
fix(sync): Pause new downloads when Zebra reaches the lookahead limit (#5561)
* Use correct release for getblocktemplate config

* Include at least 2 full checkpoints in the lookahead limit

* Increase full sync timeout to 36 hours

* Only log "synced block height too far ahead of the tip" once

* Replace AboveLookaheadHeightLimit error with pausing the syncer

* Use AboveLookaheadHeightLimit for blocks a very long way from the tip

* Also add the getblocktemplate config, and fix the test message

* Remove an outdated TODO comment

* Allow syncing again when a small number of blocks are in the queue

* Allow some dead code
2022-11-09 04:42:04 +00:00
teor 4a8349fa8a
fix(sync): Make the syncer ignore some new block verification errors (#5537)
* Fix error text for state service for syncer

* Fix error handling in syncer

* cargo fmt --all

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-11-04 06:57:45 +00:00
teor 34313b8857
fix(build): Disable new Rust 1.65 lints and fix some test errors (#5549)
* allow(clippy::result_large_err)

* Increase the async executor delay expected by tests

* Split getblocktemplate-rpcs OS tests into a separate matrix job

* Add new patch jobs

* allow(unknown_lints)
2022-11-04 02:00:56 +00:00
teor c812f880cf
cleanup(clippy): Use inline format strings (#5489)
* Inline format strings using an automated clippy fix

```sh
cargo clippy --fix --all-features --all-targets -- -A clippy::all -W clippy::uninlined_format_args
cargo fmt --all
```

* Remove unused & and &mut using an automated clippy fix

```sh
cargo clippy --fix --all-features --all-targets -- -A clippy::all -W clippy::uninlined_format_args
```
2022-10-27 13:25:18 +00:00
Alfredo Garcia 868ba1325e
refactor(config): Move configs to components (#5460)
* move configs to componenets

* remove trailing slash

* remove link from function behind feature

* fix config call in metrics
2022-10-24 23:39:00 +00:00
teor fd517ee4a7
Make timeouts shorter to reduce full sync times (#5397) 2022-10-14 20:20:24 +00:00
Arya a28350e742
change(state): Write non-finalized blocks to the state in a separate thread, to avoid network and RPC hangs (#5257)
* Add a new block commit task and channels, that don't do anything yet

* Add last_block_hash_sent to the state service, to avoid database accesses

* Update last_block_hash_sent regardless of commit errors

* Rename a field to StateService.max_queued_finalized_height

* Commit finalized blocks to the state in a separate task

* Check for panics in the block write task

* Wait for the block commit task in tests, and check for errors

* Always run a proptest that sleeps once

* Add extra debugging to state shutdowns

* Work around a RocksDB shutdown bug

* Close the finalized block channel when we're finished with it

* Only reset state queue once per error

* Update some TODOs

* Add a module doc comment

* Drop channels and check for closed channels in the block commit task

* Close state channels and tasks on drop

* Remove some duplicate fields across StateService and ReadStateService

* Try tweaking the shutdown steps

* Update and clarify some comments

* Clarify another comment

* Don't try to cancel RocksDB background work on drop

* Fix up some comments

* Remove some duplicate code

* Remove redundant workarounds for shutdown issues

* Remode a redundant channel close in the block commit task

* Remove a mistaken `!force` shutdown condition

* Remove duplicate force-shutdown code and explain it better

* Improve RPC error logging

* Wait for chain tip updates in the RPC tests

* Wait 2 seconds for chain tip updates before skipping them

* Remove an unnecessary block_in_place()

* Fix some test error messages that were changed by earlier fixes

* Expand some comments, fix typos

Co-authored-by: Marek <mail@marek.onl>

* Actually drop children of failed blocks

* Explain why we drop descendants of failed blocks

* Clarify a comment

* Wait for chain tip updates in a failing test on macOS

* Clean duplicate finalized blocks when the non-finalized state activates

* Send an error when receiving a duplicate finalized block

* Update checkpoint block behaviour, document its consensus rule

* Wait for chain tip changes in inbound_block_height_lookahead_limit test

* Wait for the genesis block to commit in the fake peer set mempool tests

* Disable unreliable mempool verification check in the send transaction test

* Appease rustfmt

* Use clear_finalized_block_queue() everywhere that blocks are dropped

* Document how Finalized and NonFinalized clones are different

* sends non-finalized blocks to the block write task

* passes ZebraDb to commit_new_chain, commit_block, and no_duplicates_in_finalized_chain instead of FinalizedState

* Update zebra-state/src/service/write.rs

Co-authored-by: teor <teor@riseup.net>

* updates comments, renames send_process_queued, other minor cleanup

* update assert_block_can_be_validated comment

* removes `mem` field from StateService

* removes `disk` field from StateService and updates block_iter to use `ZebraDb` instead of the finalized state

* updates tests that use the disk to use read_service.db instead

* moves best_tip to a read fn and returns finalized & non-finalized states from setup instead of the state service

* changes `contextual_validity` to get the network from the finalized_state instead of another param

* swaps out StateService with FinalizedState and NonFinalizedState in tests

* adds NotReadyToBeCommitted error and returns it from validate_and_commit when a blocks parent hash is not in any chain

* removes NonFinalizedWriteCmd and calls, moves update_latest_channels above rsp_tx.send

* makes parent_errors_map an indexmap

* clears non-finalized block queue when the receiver is dropped and when the StateService is being dropped

* sends non-finalized blocks to the block write task

* passes ZebraDb to commit_new_chain, commit_block, and no_duplicates_in_finalized_chain instead of FinalizedState

* updates comments, renames send_process_queued, other minor cleanup

* Update zebra-state/src/service/write.rs

Co-authored-by: teor <teor@riseup.net>

* update assert_block_can_be_validated comment

* removes `mem` field from StateService

* removes `disk` field from StateService and updates block_iter to use `ZebraDb` instead of the finalized state

* updates tests that use the disk to use read_service.db instead

* moves best_tip to a read fn and returns finalized & non-finalized states from setup instead of the state service

* changes `contextual_validity` to get the network from the finalized_state instead of another param

* swaps out StateService with FinalizedState and NonFinalizedState in tests

* adds NotReadyToBeCommitted error and returns it from validate_and_commit when a blocks parent hash is not in any chain

* removes NonFinalizedWriteCmd and calls, moves update_latest_channels above rsp_tx.send

* makes parent_errors_map an indexmap

* clears non-finalized block queue when the receiver is dropped and when the StateService is being dropped

* removes duplicate field definitions on StateService that were a result of a bad merge

* update NotReadyToBeCommitted error message

* Appear rustfmt

* Fix doc links

* Rename a function to initial_contextual_validity()

* Do error tasks on Err, and success tasks on Ok

* Simplify parent_error_map truncation

* Rewrite best_tip() to use tip()

* Rename latest_mem() to latest_non_finalized_state()

```sh
fastmod latest_mem latest_non_finalized_state zebra*
cargo fmt --all
```

* Simplify latest_non_finalized_state() using a new WatchReceiver API

* Expand some error messages

* Send the result after updating the channels, and document why

* wait for chain_tip_update before cancelling download in mempool_cancel_mined

* adds `sent_non_finalized_block_hashes` field to StateService

* adds batched sent_hash insertions and checks sent hashes in queue_and_commit_non_finalized before adding a block to the queue

* check that the `curr_buf` in SentHashes is not empty before pushing it to the `sent_bufs`

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* Fix rustfmt

* Check for finalized block heights using zs_contains()

* adds known_utxos field to SentHashes

* updates comment on SentHashes.add method

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

* return early when there's a duplicate hash in QueuedBlocks.queue instead of panicking

* Make finalized UTXOs near the final checkpoint available for full block verification

* Replace a checkpoint height literal with the actual config

* Update mainnet and testnet checkpoints - 7 October 2022

* Fix some state service init arguments

* Allow more lookahead in the downloader, but less lookahead in the syncer

* Add the latest config to the tests, and fix the latest config check

* Increase the number of finalized blocks checked for non-finalized block UTXO spends

* fix(log): reduce verbose logs for block commits (#5348)

* Remove some verbose block write channel logs

* Only warn about tracing endpoint if the address is actually set

* Use CloneError instead of formatting a non-cloneable error

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>

* Increase block verify timeout

* Work around a known block timeout bug by using a shorter timeout

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Marek <mail@marek.onl>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-10-11 19:25:45 +00:00
teor 87f4308caf
fix(sync): Temporarily set full verification concurrency to 30 blocks (#4726)
* Return the maximum checkpoint height from the chain verifier

* Return the verified block height from the sync downloader

* Track the verified height in the syncer

* Use a lower concurrency limit during full verification

* Get the tip from the state before the first verified block

* Limit the number of submitted download and verify blocks in a batch

* Adjust lookahead limits when transitioning to full verification

* Keep unused extra hashes and submit them to the downloader later

* Remove redundant verified_height and state_tip()

* Split the checkpoint and full verify concurrency configs

* Decrease full verification concurrency to 5 blocks

10 concurrent blocks causes 3 minute stalls on some blocks on my machine.
(And it has about 4x as many cores as a standard machine.)

* cargo +stable fmt --all

* Remove a log that's verbose with smaller lookahead limits

* Apply the full verify concurrency limit to the inbound service

* Add a summary of the config changes to the CHANGELOG

* Increase the default full verify concurrency limit to 30
2022-07-06 10:13:57 -04:00
teor d4b9353d67
feat(log): Show the current network upgrade in progress logs (#4694)
* Improve time logging using humantime

* Only log full seconds, ignore the fractional part

* Move humantime_seconds to tracing::fmt

* Move the progress task to its own module

* Add missing humantime dependency

* Log the network upgrade in progress logs

* Log when Zebra verifies the final checkpoint
2022-06-28 02:51:41 +00:00
teor 49cda21f37
Slightly increase some syncer defaults (#4679)
This is a partial revert of PR #4670.
2022-06-24 00:58:49 +00:00
teor c75a68e655
fix(sync): change default sync config to improve reliability (#4670)
* Decrease the default lookahead limit to 400

* Increase the block verification timeout to 10 minutes

* Halve the default concurrent downloads config

* Try to run the spawned download task before queueing the next download

* Allow verification to be cancelled if the verifier is busy
2022-06-22 18:17:21 +00:00
teor 3a7c2c8926
Replace the lookahead limit panic with a warning (#4662)
And decrease the minimum to 400 blocks
2022-06-21 11:07:32 +00:00
teor ca0520b2e8
change(deps): Upgrade tracing-subscriber and related dependencies (#4517)
* Upgrade tracing and related dependencies

```sh
cargo upgrade --workspace
tracing-error
tracing-subscrber

color-eyre

tracing-flame
tracing-journald

sentry
sentry-tracing

metrics
metrics-exporter-prometheus
reqwest
```

* Update duplicate dependency checks

* Enable the tracing/env-filter feature

* Fix type inference for metrics

Manual changes, plus:
```sh
fastmod "as _" "as f64"
```

* Tidy up some unrelated test code

* Update metrics-exporter-prometheus API

And make unused dependencies optional.

* Adjust test regexes to new tracing format

Also fix some regex bugs, and refactor to simplify.

* Disable color-eyre span traces and track caller in release builds

* Add a feature that enables extra debugging in release builds

* Clean up some redundant features

* Increase a test timeout
2022-06-01 13:53:51 +10:00
teor fee10ae014
Add height and hash info to syncer errors (#4287) 2022-05-11 06:51:06 +00:00
Janito Vaqueiro Ferreira Filho 79d58285fb
Increase block validation timeouts (#4156)
* Increase UTXO lookup timeout

Avoid block validation failures because UTXOs aren't available on time.

* Increase the block verification timeout

Attempt to reduce the synchronization restarts and consequently improve
performance.

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-05-05 00:01:12 +00:00
teor 56f766f9b8
fix(sync): fix testnet syncer loop on large Orchard blocks (#4286)
* Return BlockDownloadVerifyError from download_and_verify

* Check block requests and genesis for temporary errors

* Ignore DuplicateBlockQueuedForDownload as a temporary error

* Propagate error info to the syncer main loop

* Sleep after temporary genesis download and verify errors
2022-05-04 22:04:34 +00:00
teor d476c18339
fix(sync): Add an extra block retry, to speed up the initial sync (#4185)
* Increase the block download retry limit to 3

* Remove an obsolete timing check from the tests

We cancel all block downloads for each sync restart,
so we don't need to wait for the maximum hedged download time.
2022-04-26 16:28:09 +00:00
Dimitris Apostolou a0c65181cb
Fix typos (#3956)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-03-28 10:13:37 +10:00
teor 9d1702e93e
Make sync error logs more user-friendly (#3944)
- use info level, there is nothing the user needs to do,
  particularly for a single error
- explain that the errors are temporary
- hide backtraces, because they look like crashes
2022-03-26 02:28:38 +00:00
Janito Vaqueiro Ferreira Filho 78080d88d4
fix(sync): prevent synchronizer loop when very close to tip (#3854)
* Refactor to split `ChainSync::sync` method in two

Replace the use of loop labels and `continue` for control flow, and use
early return from a separate method instead. This also allows removing
the `started_once` flag.

* Refactor to create `handle_block_response` helper

Reduce duplicate code and make the main synchronization methods a little
more concise to improve readability.

* Only cancel downloads in case of error

Leave active downloads running if the tips have been exhausted, because
it could have reached the chain tip.
2022-03-18 00:31:12 +00:00
teor a4dd3b7396
4. Avoid repeated requests to peers after partial responses or errors (#3505)
* fix(network): split synthetic NotFoundRegistry from message NotFoundResponse

* docs(network): Improve `notfound` message documentation

* refactor(network): Rename MustUseOneshotSender to MustUseClientResponseSender

```
fastmod MustUseOneshotSender MustUseClientResponseSender zebra*
```

* docs(network): fix a comment typo

* refactor(network): remove generics from MustUseClientResponseSender

* refactor(network): add an inventory collector to Client, but don't use it yet

* feat(network): register missing peer responses as missing inventory

We register this missing inventory based on peer responses,
or connection errors or timeouts.

Inbound message inventory tracking requires peers to send `notfound` messages.
But `zcashd` skips `notfound` for blocks, so we can't rely on peer messages.
This missing inventory tracking works regardless of peer `notfound` messages.

* refactor(network): rename ResponseStatus to InventoryResponse

```sh
fastmod ResponseStatus InventoryResponse zebra*
```

* refactor(network): rename InventoryStatus::inner() to to_inner()

* fix(network): remove a redundant runtime.enter() in a test

* doc(network): the exact time used to filter outbound peers doesn't matter

* fix(network): handle block requests slightly more efficiently

* doc(network): fix a typo

* fmt(network): `cargo fmt` after rename ResponseStatus to InventoryResponse

* doc(test): clarify some test comments

* test(network): test synthetic notfound from connection errors and peer inventory routing

* test(network): improve inbound test diagnostics

* feat(network): add a proptest-impl feature to zebra-network

* feat(network): add a test-only connect_isolated_with_inbound function

* test(network): allow a response on the isolated peer test connection

* test(network): fix failures in test synthetic notfound

* test(network): Simplify SharedPeerError test assertions

* test(network): test synthetic notfound from partially successful requests

* test(network): MissingInventoryCollector ignores local NotFoundRegistry errors

* fix(network): decrease the inventory rotation interval

This stops us waiting 3-4 sync resets (4 minutes) before we retry a missing block.

Now we wait 1-2 sync resets (2 minutes), which is still a reasonable rate limit.
This should speed up syncing near the tip, and on testnet.

* fmt(network): cargo fmt --all

* cleanup(network): remove unnecessary allow(dead_code)

* cleanup(network): stop importing the whole sync module into tests

* doc(network): clarify syncer inventory retry constraint

* doc(network): add a TODO for a fix to ensure API behaviour remains consistent

* doc(network): fix a function doc typo

* doc(network): clarify how we handle peers that don't send `notfound`

* docs(network): clarify a test comment

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-02-15 01:44:33 +00:00
Alfredo Garcia e5b5ea5889
feat(log): log the state tip height as part of sync progress logs (#3437)
* feat(log): log the state tip height as part of sync progress logs

* fix(log): downgrade some verbose state logs to debug

* feat(log): log successful gossiped block verification at info level

These logs help us diagnose slow progress near the tip.

There won't be very many of these logs,
because they only happen near the tip.

* fix(log): spawn top-level tasks within the global Zebra tracing span

* fix(log): spawn blocking top-level tasks within the global Zebra tracing span

Co-authored-by: teor <teor@riseup.net>
2022-01-28 19:12:19 -03:00
Alfredo Garcia 3c1ba59001
Reduce log level of components (#3418)
* reduce log level of components

* revert some log downgrades

* dedupe log
2022-01-28 14:24:53 -03:00
teor d076b999f3
Fix syncer download order and add sync tests (#3168)
* Refactor so that RetryLimit::Future is std::marker::Sync

* Make the syncer future std::marker::Send by spawning tips futures

* Download synced blocks in chain order, not HashSet order

* Improve MockService failure messages

* Add closure-based responses to the MockService API

* Move MockChainTip to zebra-chain

* Add a MockChainTipSender type alias

* Support MockChainTip in ChainSync and its downloader

* Add syncer tests for obtain tips, extend tips, and wrong block hashes

* Add block too high tests for obtain tips and extend tips

* Add syncer tests for duplicate FindBlocks response hashes

* Allow longer request delays for mocked services in syncer tests
2022-01-11 14:11:35 -03:00
Janito Vaqueiro Ferreira Filho b71833292d
Use `MockedClientHandle` in other tests (#3241)
* Move `MockedClientHandle` to `peer` module

It's more closely related to a `Client` than the `PeerSet`, and this
prepares it to be used by other tests.

* Rename `MockedClientHandle` to `ClientTestHarness`

Reduce confusion, and clarify that the client is not mocked.

Co-authored-by: teor <teor@riseup.net>

* Add clarification to `mock_peers` documentation

Explicitly say how the generated data is returned.

* Rename method to `wants_connection_heartbeats`

The `Client` service only represents one direction of a connection, so
`is_connected` is not the exact term.

Co-authored-by: teor <teor@riseup.net>

* Mock `Client` instead of `LoadTrackedClient`

Move where the conversion from mocked `Client` to mocked
`LoadTrackedClient` in order to make the test helper more easily used by
other tests.

* Use `ClientTestHarness` in `initialize` tests

Replace the boilerplate code to create a fake `Client` instance with
usages of the `ClientTestHarness` constructor.

* Allow receiving requests from `Client` instance

Create a helper type to wrap the result, to make it easier to assert on
specific events after trying to receive a request.

* Allow inspecting the current error in the slot

Share the `ErrorSlot` between the `Client` and the handle, so that the
handle can be used to inspect the contents of the `ErrorSlot`.

* Allow placing an error into the `ErrorSlot`

Assuming it is initially empty. If it already has an error, the code
will panic.

* Allow gracefully closing the request receiver

Close the endpoint with the appropriate call to the `close()` method.

* Allow dropping the request receiver endpoint

Forcefully closes the endpoint.

* Rename field to `client_request_receiver`

Also rename the related methods to include
`outbound_client_request_receiver` to make it more precise.

Co-authored-by: teor <teor@riseup.net>

* Allow dropping the heartbeat shutdown receiver

Allows the `Client` to detect that the channel has been closed.

* Rename fn. to `drop_heartbeat_shutdown_receiver`

Make it clear that it affects the heartbeat task.

Co-authored-by: teor <teor@riseup.net>

* Move `NowOrLater` into a new `now-or-later` crate

Make it easily accessible to other crates.

* Add `IsReady` extension trait for `Service`

Simplifies checking if a service is immediately ready to be called.

* Add extension method to check for readiness error

Checks if the `Service` isn't immediately ready because a call to
`ready` immediately returns an error.

* Rename method to `is_failed`

Avoid negated method names.

Co-authored-by: teor <teor@riseup.net>

* Add a `IsReady::is_pending` extension method

Checks if a `Service` is not ready to be called.

* Use `ClientTestHarness` in `Client` test vectors

Reduce repeated code and try to improve readability.

* Create a new `ClientTestHarnessBuilder` type

A builder to create test `Client` instances using mock data which can be
tracked and manipulated through a `ClientTestHarness`.

* Allow configuring the `Client`'s mocked version

Add a `with_version` builder method.

* Use `ClientTestHarnessBuilder` in `PeerVersions`

Use the builder to set the peer version, so that the `version` parameter
can be removed from the constructor later.

* Use a default mock version where possible

Reduce noise when setting up the harness for tests that don't really
care about the remote peer version.

* Remove `Version` parameter from the `build` method

The `with_version` builder method should be used instead.

* Fix some typos and outdated info in the release checklist

* Add extra client tests for zero and multiple readiness checks (#3273)

And document existing tests.

* Replace `NowOrLater` with `futures::poll!` (#3272)

* Replace NowOrLater with the futures::poll! macro in zebrad

* Replace NowOrLater with the futures::poll! macro in zebra-test

* Remove the now-or-later crate

* remove unused imports

* rustfmt

Co-authored-by: teor <teor@riseup.net>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-12-22 06:13:26 +10:00
teor 6cbd7dce43
Fix task handling bugs, so peers are more likely to be available (#3191)
* Tweak crawler timings so peers are more likely to be available

* Tweak min peer connection interval so we try all peers

* Let other tasks run between fanouts, so we're more likely to choose different peers

* Let other tasks run between retries, so we're more likely to choose different peers

* Let other tasks run after peer crawler DemandDrop

This makes it more likely that peers will become ready.
2021-12-20 09:02:31 +10:00
teor a4d1a1801c
Security: Drop blocks that are a long way ahead of the tip (#3167)
* Document the chain verifier

* Drop gossiped blocks that are too far ahead of the tip

* Add extra gossiped block metrics

* Allow extra gossiped blocks, now we have a stricter limit

* Fix a comment

* Check the exact number of blocks in a downloaded block response

* Drop synced blocks that are too far ahead of the tip

* Add extra synced block metrics

* Test dropping gossiped blocks that are too far ahead of the tip

* Allow an extra checkpoint's worth of blocks in the verifier queues

* Actually let's try two extra checkpoints

* Scale extra height limit with lookahead limit

* Also drop blocks that are behind the finalized tip

* Downgrade a noisy log

* Use a debug log for already verified gossiped blocks

* Use debug logs for already verified synced blocks
2021-12-17 13:31:51 -03:00
teor a92c431c03
Ignore NotFound errors in the syncer (#3131) 2021-12-02 11:28:20 -03:00
teor c85ea18b43
Fix slow Zebra startup times, to reduce CI failures (#3104)
* Tweak a log message

* Only retry failed DNS once, then use the other DNS responses

* Limit broadcasts to half the peers

* Use a longer minimum interval for GetAddr requests

* Reduce the syncer and mempool crawler fanouts

* Stop resetting the mempool twice when it starts up

This spawns two crawlers, which send two fanouts,
so it can use up a lot of peers.

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-11-30 21:04:32 +00:00
teor 375a997d2f
Stop downloading unnecessary blocks in Zebra acceptance tests (#3072)
* Implement graceful shutdown for the peer set

* Use the minimum lookahead limit in acceptance tests

* Enable a doctest that compiles with newly public modules
2021-11-19 01:55:38 +00:00
Janito Vaqueiro Ferreira Filho 0960e4fb0b
Update to Tokio 1.13.0 (#2994)
* Update `tower` to version `0.4.9`

Update to latest version to add support for Tokio version 1.

* Replace usage of `ServiceExt::ready_and`

It was deprecated in favor of `ServiceExt::ready`.

* Update Tokio dependency to version `1.13.0`

This will break the build because the code isn't ready for the update,
but future commits will fix the issues.

* Replace import of `tokio::stream::StreamExt`

Use `futures::stream::StreamExt` instead, because newer versions of
Tokio don't have the `stream` feature.

* Use `IntervalStream` in `zebra-network`

In newer versions of Tokio `Interval` doesn't implement `Stream`, so the
wrapper types from `tokio-stream` have to be used instead.

* Use `IntervalStream` in `inventory_registry`

In newer versions of Tokio the `Interval` type doesn't implement
`Stream`, so `tokio_stream::wrappers::IntervalStream` has to be used
instead.

* Use `BroadcastStream` in `inventory_registry`

In newer versions of Tokio `broadcast::Receiver` doesn't implement
`Stream`, so `tokio_stream::wrappers::BroadcastStream` instead. This
also requires changing the error type that is used.

* Handle `Semaphore::acquire` error in `tower-batch`

Newer versions of Tokio can return an error if the semaphore is closed.
This shouldn't happen in `tower-batch` because the semaphore is never
closed.

* Handle `Semaphore::acquire` error in `zebrad` test

On newer versions of Tokio `Semaphore::acquire` can return an error if
the semaphore is closed. This shouldn't happen in the test because the
semaphore is never closed.

* Update some `zebra-network` dependencies

Use versions compatible with Tokio version 1.

* Upgrade Hyper to version 0.14

Use a version that supports Tokio version 1.

* Update `metrics` dependency to version 0.17

And also update the `metrics-exporter-prometheus` to version 0.6.1.
These updates are to make sure Tokio 1 is supported.

* Use `f64` as the histogram data type

`u64` isn't supported as the histogram data type in newer versions of
`metrics`.

* Update the initialization of the metrics component

Make it compatible with the new version of `metrics`.

* Simplify build version counter

Remove all constants and use the new `metrics::incement_counter!` macro.

* Change metrics output line to match on

The snapshot string isn't included in the newer version of
`metrics-exporter-prometheus`.

* Update `sentry` to version 0.23.0

Use a version compatible with Tokio version 1.

* Remove usage of `TracingIntegration`

This seems to not be available from `sentry-tracing` anymore, so it
needs to be replaced.

* Add sentry layer to tracing initialization

This seems like the replacement for `TracingIntegration`.

* Remove unnecessary conversion

Suggested by a Clippy lint.

* Update Cargo lock file

Apply all of the updates to dependencies.

* Ban duplicate tokio dependencies

Also ban git sources for tokio dependencies.

* Stop allowing sentry-tracing git repository in `deny.toml`

* Allow remaining duplicates after the tokio upgrade

* Use C: drive for CI build output on Windows

GitHub Actions uses a Windows image with two disk drives, and the
default D: drive is smaller than the C: drive. Zebra currently uses a
lot of space to build, so it has to use the C: drive to avoid CI build
failures because of insufficient space.

Co-authored-by: teor <teor@riseup.net>
2021-11-02 18:46:57 +00:00
Janito Vaqueiro Ferreira Filho 595d75d5fb
Fix synchronization delay issue (#2921)
* Create a `NowOrLater` helper type

A replacement for `FutureExt::now_or_never` that ensures that the task
is scheduled for waking up later when the inner future is ready.

* Use `NowOrLater` to fix possible delay bug

Previous usage of `now_or_never` meant that the underlying task wasn't
being scheduled to awake when the `Downloads` stream produced a new
item. Using `NowOrLater` instead fixes that issue.
2021-10-21 10:34:12 +10:00
Conrado Gouvea 84f2c07fbc
Ignore AlreadyInChain error in the syncer (#2890)
* Ignore AlreadyInChain error in the syncer

* Split Cancelled errors; add them to should_restart_sync exceptions

* Also filter 'block is already comitted'; try to detect a wrong downcast
2021-10-20 11:07:19 +10:00
Alfredo Garcia 724967d488
Send `AdvertiseTransactionIds` to peers (#2823)
* bradcast transactions to peers after they get inserted into mempool

* remove network argument from mempool init

* remove dbg left

* remove return value in mempool enable call

* rename channel sender and receiver vars

* change unwrap() to expect()

* change the channel to a hashset

* fix build

* fix tests

* rustfmt

* fix tiny space issue inside macro

Co-authored-by: teor <teor@riseup.net>

* check errors/panics in transaction gossip tests

* fix build of newly added tests

* Stop dropping the inbound service and mempool in a test

Keeping the mempool around avoids a transaction broadcast task error,
so we can test that there are no other errors in the task.

* Tweak variable names and add comments

* Avoid unexpected drops by returning a mempool guard in tests

* Use BoxError to simplify service types in tests

* Make all returned service types consistent in tests

We want to be able to change the setup without changing the tests.

Co-authored-by: teor <teor@riseup.net>
2021-10-08 08:59:46 -03:00
teor 04d2cfb3d0
Gossip recently verified block hashes to peers (#2729)
* Implement a task that gossips verified block hashes

* Log an info message for block broadcasts

* Simplify the gossip task

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>

* Re-use the old tip change if there is no new tip change

Also improve the comments.

* Add an assertion message

* Rename task join handles and futures in start method

* Add a dedicated BlockGossipError type

This type helps distinguish between syncer and state errors.

* Test that committed blocks are gossiped to peers

Also do a minor type cleanup on the existing test code,
replacing `Option<Vec<_>>` with `Vec<_>`.

* Formatting

* Remove excess newlines

Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>

* Clear the initial gossiped blocks during test setup

Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: Alfredo Garcia <oxarbitrage@gmail.com>
2021-10-07 07:46:37 -03:00
Conrado Gouvea c6878d9b63
Cancel download and verify tasks when the mempool is deactivated (#2764)
* Cancel download and verify tasks when the mempool is deactivated

* Refactor enable/disable logic to use a state enum

* Add helper test functions to enable/disable the mempool

* Add documentation about errors on service calls

* Improvements from review

* Improve documentation

* Fix bug in test

* Apply suggestions from code review

Co-authored-by: teor <teor@riseup.net>

Co-authored-by: teor <teor@riseup.net>
2021-09-29 09:06:40 +10:00
Janito Vaqueiro Ferreira Filho 83a2e30e33
Create a `SyncStatus` helper type (#2685)
* Create a `SyncStatus` helper type

Keeps track if the synchronizer is close to the chain tip or not.

* Refactor `ChainSync` ctor. to return `SyncStatus`

Change the constructor API so that it returns a higher level construct.

* Test if `SyncStatus` waits for the chain tip

Test if waiting for the chain tip to be reached correctly finishes when
the chain tip is reached. This is done by sending recent sync lengths to
the `SyncStatus` instance, and checking that every time a separate
`SyncStatus` instance determines it has reached the tip the original
instance wakes up.

* Add a temporary attribute to allow dead code

The code added isn't used yet, so we'll add a temporary waiver until
another PR is merged to use them.
2021-08-30 10:01:33 +10:00