zebra


Author	SHA1	Message	Date
Janito Vaqueiro Ferreira Filho	4c4dbfe7cd	Reject connections from outdated peers (#2519) * Simplify state service initialization in test Use the test helper function to remove redundant code. * Create `BestTipHeight` helper type This type abstracts away the calculation of the best tip height based on the finalized block height and the best non-finalized chain's tip. * Add `best_tip_height` field to `StateService` The receiver endpoint is currently ignored. * Return receiver endpoint from service constructor Make it available so that the best tip height can be watched. * Update finalized height after finalizing blocks After blocks from the queue are finalized and committed to disk, update the finalized block height. * Update best non-finalized height after validation Update the value of the best non-finalized chain tip block height after a new block is committed to the non-finalized state. * Update finalized height after loading from disk When `FinalizedState` is first created, it loads the state from persistent storage, and the finalized tip height is updated. Therefore, the `best_tip_height` must be notified of the initial value. * Update the finalized height on checkpoint commit When a checkpointed block is committed, it bypasses the non-finalized state, so there's an extra place where the finalized height has to be updated. * Add `best_tip_height` to `Handshake` service It can be configured using the `Builder::with_best_tip_height`. It's currently not used, but it will be used to determine if a connection to a remote peer should be rejected or not based on that peer's protocol version. * Require best tip height to init. `zebra_network` Without it the handshake service can't properly enforce the minimum network protocol version from peers. Zebrad obtains the best tip height endpoint from `zebra_state`, and the test vectors simply use a dummy endpoint that's fixed at the genesis height. * Pass `best_tip_height` to proto. ver. negotiation The protocol version negotiation code will reject connections to peers if they are using an old protocol version. An old version is determined based on the current known best chain tip height. * Handle an optional height in `Version` Fall back to the genesis height if `None` is specified. * Reject connections to peers on old proto. versions Avoid connecting to peers that are on protocol versions that don't recognize a network update. * Document why peers on old versions are rejected Describe why it's a security issue above the check. * Test if `BestTipHeight` starts with `None` Check if initially there is no best tip height. * Test if best tip height is max. of latest values After applying a list of random updates where each one either sets the finalized height or the non-finalized height, check that the best tip height is the maximum of the most recently set finalized height and the most recently set non-finalized height. * Add `queue_and_commit_finalized` method A small refactor to make testing easier. The handling of requests for committing non-finalized and finalized blocks is now more consistent. * Add `assert_block_can_be_validated` helper Refactor to move into a separate method some assertions that are done before a block is validated. This is to allow moving these assertions more easily to simplify testing. * Remove redundant PoW block assertion It's also checked in `zebra_state::service::check::block_is_contextually_valid`, and it was getting in the way of tests that received a gossiped block before finalizing enough blocks. * Create a test strategy for test vector chain Splits a chain loaded from the test vectors into two parts, containing the blocks to finalize and the blocks to keep in the non-finalized state. * Test committing blocks update best tip height Create a mock blockchain state, with a chain of finalized blocks and a chain of non-finalized blocks. Commit all the blocks appropriately, and verify that the best tip height is updated. Co-authored-by: teor <teor@riseup.net>	2021-08-08 23:52:52 +00:00
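The `BestTipHeight` rule described in the commit above can be sketched in a few lines. The best tip is the maximum of the latest finalized height and the latest non-finalized height, and it starts as `None`. This is a simplified, synchronous stand-in: the field and method names and the `u32` heights are assumptions for illustration, and the sketch does not model the watch-channel plumbing the commit mentions.

```rust
/// Hypothetical, simplified stand-in for the `BestTipHeight` helper type.
/// It only models the "maximum of the latest values" rule.
#[derive(Debug, Default)]
struct BestTipHeight {
    finalized: Option<u32>,
    non_finalized: Option<u32>,
}

impl BestTipHeight {
    /// Record the most recently finalized block height.
    fn set_finalized_height(&mut self, height: u32) {
        self.finalized = Some(height);
    }

    /// Record the tip height of the best non-finalized chain (`None` if empty).
    fn set_best_non_finalized_height(&mut self, height: Option<u32>) {
        self.non_finalized = height;
    }

    /// The best tip height is the larger of the two latest values.
    /// `Option`'s ordering puts `None` below every `Some`, so this is
    /// `None` only if neither height has been set.
    fn get(&self) -> Option<u32> {
        self.finalized.max(self.non_finalized)
    }
}

fn main() {
    let mut best_tip = BestTipHeight::default();
    assert_eq!(best_tip.get(), None);

    best_tip.set_finalized_height(100);
    best_tip.set_best_non_finalized_height(Some(105));
    assert_eq!(best_tip.get(), Some(105));

    // An empty non-finalized chain leaves the finalized height as the tip.
    best_tip.set_best_non_finalized_height(None);
    assert_eq!(best_tip.get(), Some(100));

    // Version negotiation falls back to genesis (height 0) when there is no tip.
    let negotiation_height = BestTipHeight::default().get().unwrap_or(0);
    assert_eq!(negotiation_height, 0);
}
```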
teor	1a57023eac	Security: Use canonical SocketAddrs to avoid duplicate peer connections, Feature: Send local listener to peers (#2276) * Always send our local listener with the latest time Previously, whenever there was an inbound request for peers, we would clone the address book and update it with the local listener. This had two impacts: - the listener could conflict with an existing entry, rather than unconditionally replacing it, and - the listener was briefly included in the address book metrics. As a side-effect, this change also makes sanitization slightly faster, because it avoids some useless peer filtering and sorting. * Skip listeners that are not valid for outbound connections * Filter sanitized addresses Zebra based on address state This fix correctly prevents Zebra gossiping client addresses to peers, but still keeps the client in the address book to avoid reconnections. * Add a full set of DateTime32 and Duration32 calculation methods * Refactor sanitize to use the new DateTime32/Duration32 methods * Security: Use canonical SocketAddrs to avoid duplicate connections If we allow multiple variants for each peer address, we can make multiple connections to that peer. Also make sure sanitized MetaAddrs are valid for outbound connections. * Test that address books contain the local listener address Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>	2021-06-22 02:16:59 +00:00
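One common source of "multiple variants for each peer address" is an IPv4 peer that also appears as an IPv4-mapped IPv6 address (`::ffff:a.b.c.d`). The sketch below assumes that canonicalization means mapping such addresses back to plain IPv4. `canonical_socket_addr` is a hypothetical name. The sketch uses the standard library's `Ipv6Addr::to_ipv4_mapped`, which requires Rust 1.63 or later.

```rust
use std::collections::HashSet;
use std::net::{IpAddr, SocketAddr};

/// Hypothetical helper: convert IPv4-mapped IPv6 addresses (`::ffff:a.b.c.d`)
/// to plain IPv4, so each peer has exactly one address representation.
fn canonical_socket_addr(addr: SocketAddr) -> SocketAddr {
    match addr.ip() {
        IpAddr::V6(v6) => match v6.to_ipv4_mapped() {
            Some(v4) => SocketAddr::new(IpAddr::V4(v4), addr.port()),
            None => addr,
        },
        IpAddr::V4(_) => addr,
    }
}

fn main() {
    let mapped: SocketAddr = "[::ffff:127.0.0.1]:8233".parse().unwrap();
    let plain: SocketAddr = "127.0.0.1:8233".parse().unwrap();

    // Without canonicalization, one peer occupies two distinct set entries.
    let raw: HashSet<SocketAddr> = [mapped, plain].iter().copied().collect();
    assert_eq!(raw.len(), 2);

    // After canonicalization, both variants collapse to a single entry.
    let canonical: HashSet<SocketAddr> = [mapped, plain]
        .iter()
        .copied()
        .map(canonical_socket_addr)
        .collect();
    assert_eq!(canonical.len(), 1);
}
```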
Alfredo Garcia	96a1b661f0	Rate limit initial genesis block download retries, Credit: Equilibrium (#2255) * implement and test a rate limit in `request_genesis()` * add `request_genesis_is_rate_limited` test to sync * add ensure_timeouts constraint for GENESIS_TIMEOUT_RETRY * Suppress expected warning logs in zebrad tests Co-authored-by: teor <teor@riseup.net>	2021-06-09 23:39:51 +00:00
teor	b18c32f30f	Add the database format to the panic metadata (#2249) Seems like it might be useful as we add more stuff to the state.	2021-06-04 14:42:15 +10:00
teor	2f0f379a9e	Standardise clippy lints and require docs (#2238) * Standardise lints across Zebra crates, and add missing docs The only remaining module with missing docs is `zebra_test::command` * Todo -> TODO * Clarify what a transcript ErrorChecker does Also change `Error` -> `BoxError` * TransError -> ExpectedTranscriptError * Output Descriptions -> Output descriptions	2021-06-04 08:48:40 +10:00
teor	52dcaa2544	Stop ignoring lightweight git tags in panic metadata Unfortunately, Zebra's first alpha release is an annotated tag, but GitHub defaults to lightweight tags. (At least for pre-releases.)	2021-05-20 09:00:56 +10:00
teor	bcc59d11c3	Refactor metadata so git vars must be optional We don't test non-git builds, but we can use the type system to make sure they are always optional.	2021-05-20 09:00:56 +10:00
teor	b6c5ef8041	Add VERGEN_CARGO_PROFILE to the panic env vars Some panics should only happen on debug profiles.	2021-05-20 09:00:56 +10:00
teor	62f053de9e	Enable cargo env vars when there is no .git But still disable git env vars. This change requires vergen 5.1.4 or later.	2021-05-20 09:00:56 +10:00
teor	92828bbb29	Reliability: send local listener address to peers When peers ask for peer addresses, add our local listener address to the set of addresses, sanitize, then truncate. Sanitize shuffles addresses, so if there are lots of addresses in the address book, our address will only be sent to some peers.	2021-05-18 14:02:19 +10:00
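The "add, sanitize, then truncate" order means the local listener survives truncation only when the shuffle happens to place it near the front. The sketch below models only the shuffle step of sanitization. `addrs_for_peer` is a hypothetical name, and the tiny xorshift PRNG is used only to keep the example dependency-free; real code would use a proper random number generator.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

/// Tiny xorshift64 PRNG so the sketch has no dependencies.
/// Illustration only: not suitable for real use.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Hypothetical helper: add our listener, shuffle, then truncate to `max_addrs`.
fn addrs_for_peer(
    mut addrs: Vec<SocketAddr>,
    local_listener: SocketAddr,
    max_addrs: usize,
    rng: &mut XorShift64,
) -> Vec<SocketAddr> {
    addrs.push(local_listener);
    // Fisher-Yates shuffle: swap each position with a random earlier-or-equal one.
    for i in (1..addrs.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        addrs.swap(i, j);
    }
    addrs.truncate(max_addrs);
    addrs
}

fn main() {
    let book: Vec<SocketAddr> = (1..=10u8)
        .map(|i| SocketAddr::new(IpAddr::V4(Ipv4Addr::new(10, 0, 0, i)), 8233))
        .collect();
    let listener = SocketAddr::new(IpAddr::V4(Ipv4Addr::new(192, 168, 0, 1)), 8233);
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);

    let mut listener_sent = 0;
    for _ in 0..100 {
        let sent = addrs_for_peer(book.clone(), listener, 3, &mut rng);
        assert_eq!(sent.len(), 3);
        if sent.contains(&listener) {
            listener_sent += 1;
        }
    }
    // Our address reaches only some peers (about 3 in 11 responses on average).
    println!("listener sent in {listener_sent} of 100 responses");
}
```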
teor	74e155ff9f	Spelling: gossipped -> gossiped (#2119)	2021-05-07 13:01:11 +02:00
teor	7e2c3a2fc7	Clarify a duplicate log message	2021-04-21 23:59:29 -04:00
Kirill Fomichev	5b2f1cdfd5	Add journald support through tracing-journald (#2034) * Add journald support through tracing-journald * change journald to use_journald * more fixes	2021-04-22 09:31:06 +10:00
teor	96b3c94dbc	Add the new commit count and git hash to the version (#2038) * Use the git version + new commit count + hash for the app version This helps diagnose bugs in versions of Zebra built from git branches, rather than git version tags. * Fill in assert * Also log semver string * Fix syntax * Handle vergen using the cargo package version or raw git tag * s/Semver/SemVer/ Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>	2021-04-21 22:14:36 +00:00
teor	0203d1475a	Refactor and document correctness for std::sync::Mutex<AddressBook>	2021-04-21 17:14:47 -04:00
teor	79c0c4ec57	Stop assuming there will always be a git commit Enable builds where: * there is no google cloud git commit env var, and * there is no `.git` directory. By making all `vergen` env vars optional, and skipping any env vars that don't exist.	2021-04-20 13:48:31 -04:00
Kirill Fomichev	43e792b9a4	Update to vergen 5, add branch, commit time, and build target to the panic metadata, automatically update app version from crate version (#2029) * build(deps): bump vergen from 3.2.0 to 5.1.1 * fix hardcoded version for Tracing struct * add additional metadata * remove extra allocations for metadata * Remove zebrad code version from release checklist The zebrad code automatically uses the crate version now. * Sort panic metadata into rough categories Co-authored-by: teor <teor@riseup.net>	2021-04-20 06:48:14 +10:00
teor	1e4e5924ca	clippy: factor common code out of an if-else block	2021-04-14 23:16:45 -04:00
teor	a417c7c8c7	Use meaningful names for select! variables	2021-04-13 23:56:16 -04:00
Alfredo Garcia	5ec05e91e1	update version strings for v1.0.0-alpha.6	2021-04-08 18:48:34 -04:00
teor	306fa88214	Document the correctness of Poll::Pending wakeups	2021-03-27 08:55:49 -04:00
teor	829a6f11c5	Document the behaviour of the `select!` macro	2021-03-27 08:55:49 -04:00
Deirdre Connolly	ca1d2de87d	Bump versions for v1.0.0-alpha.5 (#1932) Zebra's latest alpha checkpoints on Canopy activation, continues our work on NU5, and fixes a security issue. Some notable changes include: ## Added - Log address book metrics when PeerSet or CandidateSet don't have many peers (#1906) - Document test coverage workflow (#1919) - Add a final job to CI, so we can easily require all the CI jobs to pass (#1927) ## Changed - Zebra has moved its mandatory checkpoint from Sapling to Canopy (#1898, #1926) - This is a breaking change for users that depend on the exact height of the mandatory checkpoint. ## Fixed - tower-batch: wake waiting workers on close to avoid hangs (#1908) - Assert that pre-Canopy blocks use checkpointing (#1909) - Fix CI disk space usage by disabling incremental compilation in coverage builds (#1923) ## Security - Stop relying on unchecked length fields when preallocating vectors (#1925)	2021-03-22 22:05:01 -04:00
Alfredo Garcia	d49eaab68e	Bump versions for zebrad 1.0.0-alpha.4 (#1913) * Bump versions for zebrad 1.0.0-alpha.4 * add Cargo.lock	2021-03-16 21:12:37 -03:00
Jack Grigg	bae9a7ecd5	Expose binary data in metrics This enables slicing and aggregating metrics based on zebrad version: https://www.robustperception.io/exposing-the-software-version-to-prometheus	2021-03-17 09:38:07 +10:00
teor	d494af1e90	Document how the syncer resists memory DoS	2021-03-11 06:24:46 -05:00
teor	c6358b157c	Reduce inbound concurrency to limit memory usage Inbound malicious blocks can use a large amount of RAM when deserialized. Limit inbound concurrency, so that the total amount of RAM remains small.	2021-03-11 06:24:46 -05:00
teor	7558f74c78	Bump versions for zebrad 1.0.0-alpha.3	2021-02-23 10:39:13 -05:00
teor	e61b5e50a2	Diagnostics for CI port conflict failures (#1766) Log a "Trying..." message before each listener opens, to see if the delay is inside Zebra, or in the test harness or OS. Also report the configured and actual ports where possible, for better diagnostics.	2021-02-18 12:15:09 -03:00
teor	972103d797	Fix tracing macro syntax	2021-02-17 11:09:22 -05:00
teor	253d1c02b3	Make sync logging a bit less verbose And tweak some log content	2021-02-17 11:09:22 -05:00
teor	cc7d5bd2ad	Update comments for the inbound service (#1740)	2021-02-16 06:14:40 +10:00
teor	372a432179	Update the call_all comment in Inbound (#1737)	2021-02-16 06:14:16 +10:00
teor	0b76352468	Document a state_contains bug (#1715) * Document a state_contains bug in the syncer and Inbound	2021-02-10 09:05:14 +10:00
Deirdre Connolly	0c5daa8410	Bump versions for zebrad 1.0.0-alpha.2 Including tower-batch bump to 0.2.0, tower-fallback to 0.2.0, zebra-script to 1.0.0-alpha.3	2021-02-09 16:14:29 -05:00
teor	dce11358d7	Log when the syncer awaits peer readiness (#1714)	2021-02-10 07:09:27 +10:00
Alfredo Garcia	d7c40af2a8	Fix shutdown panics (#1637) * add a shutdown flag in zebra_chain::shutdown * fix network panic on shutdown * fix checkpoint panic on shutdown	2021-02-03 19:03:28 +10:00
teor	6679a124e3	Require Inbound setup handlers to provide a result Rather than having them default to `Ok(())`, which is incorrect for some error handlers.	2021-02-03 08:32:10 +10:00
teor	09c8c89462	Make sure FailedInit never escapes Inbound::poll_ready	2021-02-03 08:32:10 +10:00
teor	134a5e78bd	Consistently use `network_setup` for the Inbound Setup	2021-02-03 08:32:10 +10:00
teor	1c8362fe01	Remove unused imports	2021-02-03 08:32:10 +10:00
Jane Lusby	4cf331562c	combine network setup into an exhaustive match	2021-02-03 08:32:10 +10:00
Jane Lusby	4d6ef89248	avoid using async blocks to avoid lifetime bug with generators	2021-02-03 08:32:10 +10:00
Jane Lusby	685a592399	Add clonable wrapper around TryRecvError	2021-02-03 08:32:10 +10:00
teor	6ffeb670ed	Log the failed response in an unreachable panic	2021-02-03 08:32:10 +10:00
teor	eac4fd181a	Add a Setup enum to manage Inbound network setup internal state This change encodes a bunch of invariants in the type system, and adds explicit failure states for: * a closed oneshot, * bugs in the initialization code.	2021-02-03 08:32:10 +10:00
teor	32b032204a	Consistently return Response::Nil during setup And log an info-level message as a diagnostic, in case setup takes a long time.	2021-02-03 08:32:10 +10:00
teor	94eb91305b	Stop using ServiceExt::call_all due to buffer bugs ServiceExt::call_all leaks Tower::Buffer reservations, so we can't use it in Zebra. Instead, use a loop in the returned future. See #1593 for details.	2021-02-03 08:32:10 +10:00
teor	64bc45cd2e	Fix state readiness hangs for Inbound Use `ServiceExt::oneshot` to perform state requests. Explain that `ServiceExt::call_all` calls `poll_ready` internally. Document a state service invariant imposed by `ServiceExt::call_all`.	2021-02-03 08:32:10 +10:00
teor	4d1a2fd02e	Make the Inbound invariant clearer	2021-02-03 08:32:10 +10:00
teor	2a25b9ee72	Remove services that are never `call`ed from Inbound Uses the `ServiceExt::oneshot` design pattern from #1593.	2021-02-03 08:32:10 +10:00
Alfredo Garcia	4b34482264	Add hints to port conflict and lock file panics (#1535) * add hint for port error * add issue filter for port panic * add lock file hint * add metrics endpoint port conflict hint * add hint for tracing endpoint port conflict * add acceptance test for resource conflicts * Split out common conflict test code into a function * Add state, metrics, and tracing conflict tests * Add a full set of stderr acceptance test functions This change makes the stdout and stderr acceptance test interfaces identical. * move Zcash listener opening * add todo about hint for disk full * add constant for lock file * match path in state cache * don't match windows cache path * Use Display for state path logs Avoids weird escaping on Windows when using Debug * Add Windows conflict error messages * Turn PORT_IN_USE_ERROR into a regex And add another alternative Windows-specific port error Co-authored-by: teor <teor@riseup.net> Co-authored-by: Jane Lusby <jane@zfnd.org>	2021-01-29 22:36:33 +10:00
teor	24f1b9bad1	Document the Inbound service in the start module (#1653)	2021-01-29 22:19:06 +10:00
teor	21b0360114	Limit concurrent inbound gossipped block requests Uses the "load shed directly" design pattern from #1618.	2021-01-29 11:02:26 +10:00
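The referenced design pattern is not spelled out here. Under the assumption that "load shed directly" means refusing new work the moment the in-flight limit is reached, rather than queueing it, a minimal sketch looks like this. `InboundDownloads`, `Outcome`, and the `u64` stand-in for block hashes are all hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical bounded set of in-flight gossiped block downloads that
/// refuses new work immediately when full, instead of buffering it.
struct InboundDownloads {
    max_in_flight: usize,
    in_flight: HashSet<u64>, // `u64` stands in for a block hash
}

#[derive(Debug, PartialEq)]
enum Outcome {
    Queued,
    Shed,
}

impl InboundDownloads {
    fn new(max_in_flight: usize) -> Self {
        InboundDownloads { max_in_flight, in_flight: HashSet::new() }
    }

    /// Start downloading `hash` if there is capacity, otherwise drop the request.
    fn download_and_verify(&mut self, hash: u64) -> Outcome {
        if self.in_flight.len() >= self.max_in_flight {
            return Outcome::Shed;
        }
        self.in_flight.insert(hash);
        Outcome::Queued
    }

    /// Called when a download finishes (successfully or not), freeing a slot.
    fn finished(&mut self, hash: u64) {
        self.in_flight.remove(&hash);
    }
}

fn main() {
    let mut downloads = InboundDownloads::new(2);
    assert_eq!(downloads.download_and_verify(1), Outcome::Queued);
    assert_eq!(downloads.download_and_verify(2), Outcome::Queued);
    // At the limit: the gossiped block is dropped rather than buffered.
    assert_eq!(downloads.download_and_verify(3), Outcome::Shed);

    downloads.finished(1);
    assert_eq!(downloads.download_and_verify(3), Outcome::Queued);
}
```

Shedding keeps memory use bounded by `max_in_flight`, at the cost of occasionally ignoring a gossiped block that will likely be advertised again.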
teor	3d9888f736	Rewrite a sync comment	2021-01-29 11:02:26 +10:00
Deirdre Connolly	1b09538277	Bump versions for zebrad 1.0.0-alpha.1 (#1646) * Bump versions where appropriate Tested with cargo install --locked --path etc * Remove fixed panics from 'Known Issues' * Change to alpha release series in the README Co-authored-by: teor <teor@riseup.net>	2021-01-27 20:31:39 -05:00
teor	391c53aa60	Move BoxError to zebrad's lib.rs For consistency with other crates.	2021-01-27 12:14:27 -08:00
teor	9cdf41f5f4	Panic if the lookahead limit is misconfigured (#1589)	2021-01-14 14:06:30 +10:00
teor	92d95d4be5	Refactor inbound members into a consistent order And add download comments	2021-01-13 20:46:25 -05:00
teor	fb76eb2e6b	Add download and verify timeouts to the inbound service	2021-01-13 20:46:25 -05:00
teor	973aec8ccc	Refactor sync members into a consistent order And add comments about correctness and usage.	2021-01-13 20:46:25 -05:00
teor	c2893dce51	Warn when the user's configured lookahead limit is ignored	2021-01-13 20:46:25 -05:00
teor	3699bbdae6	Add some additional sync correctness constraints And adjust the sync restart delay as a consequence.	2021-01-13 20:46:25 -05:00
teor	cef0a492d8	Add a timeout to sync service block verification This timeout stops the sync service hanging when it is missing required blocks, but the lookahead queue is full of dependent verify tasks, so the missing blocks never get downloaded.	2021-01-13 20:46:25 -05:00
teor	c75cbdea79	Log configured network in every log message (#1568) * Add the configured network to error reports * Log the configured network at error level * Create the global span immediately after activating tracing And leak the span guard, so the span is always active. * Include panic metadata in the report and URL * Use `Main` and `Test` in the global span `net=Mainnet` is a bit redundant	2021-01-12 07:46:56 +10:00
teor	b1f14f47c6	Rewrite GetData handling to match the zcashd implementation (#1518) * Rewrite GetData handling to match the zcashd implementation `zcashd` silently ignores missing blocks, but sends found transactions followed by a `NotFound` message: `e7b425298f/src/main.cpp (L5497)` This is significantly different to the behaviour expected by the old Zebra connection state machine, which expected `NotFound` for blocks. Also change Zebra's GetData responses to peer request so they ignore missing blocks. * Stop hanging on incomplete transaction or block responses Instead, if the peer sends an unexpected block, unexpected transaction, or NotFound message: 1. end the request, and return a partial response containing any items that were successfully received 2. if none of the expected blocks or transactions were received, return an error, and close the connection	2021-01-04 13:25:35 +10:00
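The responder-side rule described above can be sketched as follows: send every item we have, skip missing blocks silently, and list missing transactions in one trailing `NotFound`. The `Inv` and `Reply` enums and the `u32` ids are simplified stand-ins, not Zebra's message types.

```rust
use std::collections::HashSet;

/// Simplified inventory item; `u32` ids stand in for real hashes.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Inv {
    Block(u32),
    Tx(u32),
}

/// Simplified reply messages.
#[derive(Debug, PartialEq)]
enum Reply {
    Block(u32),
    Tx(u32),
    NotFound(Vec<Inv>),
}

/// Hypothetical helper mirroring the behaviour the commit attributes to zcashd.
fn respond_to_getdata(request: &[Inv], available: &HashSet<Inv>) -> Vec<Reply> {
    let mut replies = Vec::new();
    let mut missing_txs = Vec::new();
    for &item in request {
        match (item, available.contains(&item)) {
            (Inv::Block(id), true) => replies.push(Reply::Block(id)),
            (Inv::Block(_), false) => {} // missing blocks: no response at all
            (Inv::Tx(id), true) => replies.push(Reply::Tx(id)),
            (Inv::Tx(_), false) => missing_txs.push(item),
        }
    }
    // Missing transactions are reported once, after all found items.
    if !missing_txs.is_empty() {
        replies.push(Reply::NotFound(missing_txs));
    }
    replies
}

fn main() {
    let available: HashSet<Inv> = [Inv::Block(1), Inv::Tx(10)].iter().copied().collect();
    let request = [Inv::Block(1), Inv::Block(2), Inv::Tx(10), Inv::Tx(11)];
    let replies = respond_to_getdata(&request, &available);
    assert_eq!(
        replies,
        vec![Reply::Block(1), Reply::Tx(10), Reply::NotFound(vec![Inv::Tx(11)])]
    );
}
```

On the requesting side, this is why a missing block never produces a `NotFound`, and why the commit ends such requests early with a partial response instead of waiting.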
teor	69fcf64d6c	Disable issue URLs for "duplicate hash" errors (#1517) In our README, we tell users to ignore these errors, so we should also disable the issue URL. Also include the hash in the error. (We don't want the span active for all messages, we just want the hash in the error.)	2020-12-16 08:14:42 +10:00
Alfredo Garcia	41833340c1	downgrade remaining version strings to 1.0.0-alpha.0 (#1488)	2020-12-15 11:21:00 +10:00
Deirdre Connolly	2d1698a120	Comment out Sentry stacktraces for now While panic = abort, Sentry collects the same one-line stack trace for all panics, making it incorrectly dedupe different errors into one.	2020-12-12 13:26:52 -05:00
Deirdre Connolly	cff28f7ac8	Use the commit sha as the sentry release	2020-12-09 13:06:18 -05:00
Jane Lusby	400213e2b3	integrate sentry with our existing panic reporting logic	2020-12-09 13:06:18 -05:00
Deirdre Connolly	f1ec1d626d	Tidy for now	2020-12-09 13:06:18 -05:00
Deirdre Connolly	44e1051dee	Debug	2020-12-09 13:06:18 -05:00
Deirdre Connolly	8b268e3f71	Don't keep guard around	2020-12-09 13:06:18 -05:00
Deirdre Connolly	25f6fd25b3	Test catching panic	2020-12-09 13:06:18 -05:00
Deirdre Connolly	6a17549945	Try sentry-tracing integration	2020-12-09 13:06:18 -05:00
Deirdre Connolly	c03a3a2606	Pull DSN from runtime env, enable Sentry debug mode with RUST_LOG=debug	2020-12-09 13:06:18 -05:00
Deirdre Connolly	27e42f4ed5	Set up Sentry error collection via a feature flag	2020-12-09 13:06:18 -05:00
Deirdre Connolly	47d78d4cf4	Try sentry::init()	2020-12-09 13:06:18 -05:00
teor	16ffb1dbbf	Disable issue URLs on all timeouts (#1470) This change helps prevent spurious bug reports.	2020-12-08 07:47:01 +10:00
Jane Lusby	ef7e91c3c7	disable color-eyre colors if not connected to a tty (#1443) * disable color-eyre colors if not connected to a tty * check if color is disabled	2020-12-04 11:05:25 +10:00
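The "only color when connected to a tty" decision reduces to a small predicate. This sketch uses the standard library's `std::io::IsTerminal` (Rust 1.70+), which postdates this commit and is not necessarily what Zebra used. `should_use_color` is a hypothetical name.

```rust
use std::io::{self, IsTerminal};

/// Hypothetical helper: colors are used only if the config allows them
/// and the output stream is an interactive terminal (not a file or pipe).
fn should_use_color(config_use_color: bool, output_is_tty: bool) -> bool {
    config_use_color && output_is_tty
}

fn main() {
    let use_color = should_use_color(true, io::stdout().is_terminal());
    println!("use_color = {use_color}");
}
```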
Jane Lusby	90f944709b	fix git commit logic to work on gcloud (#1442)	2020-12-03 15:18:55 +10:00
teor	0e42d8b6c1	Always enable color_eyre, even when color is disabled We want to automatically disable colors upstream in color_eyre, and add a config that allows users to always turn off color.	2020-12-02 10:25:44 -08:00
teor	bed34168c1	Automatically disable abscissa colors and color_eyre when writing to a file	2020-12-02 10:25:44 -08:00
teor	97d1a81b7c	Automatically disable colors when tracing to a file	2020-12-02 10:25:44 -08:00
Henry de Valence	f0db75e712	cargo fmt	2020-12-01 19:16:41 -08:00
Jane Lusby	a91d0f0bb6	Include short sha in log messages and error urls (#1410) As we approach our alpha release we've decided we want to plan ahead for the user bug reports we will eventually receive. One of the bigger issues we foresee is determining exactly what version of the software users are running, and particularly how easy it may or may not be for users to accidentally discard this information when reporting bugs. To defend against this, we've decided to include the exact git sha for any given build in the compiled artifact. This information will then be re-exported as a span early in the application startup process, so that all logs and error messages should include the sha as their very first span. We've also added this sha as issue metadata for `color-eyre`'s github issue url auto generation feature, which should make sure that the sha is easily available in bug reports we receive, even in the absence of logs. Co-authored-by: teor <teor@riseup.net>	2020-12-01 12:13:20 -08:00
Jane Lusby	fceef849cf	remove unused mutability to defuse deadlock	2020-12-01 11:03:13 -05:00
Henry de Valence	1df9284444	zebrad: add a use_color option to the tracing config. This is useful for creating searchable logs without having to filter color codes after the fact.	2020-11-30 15:25:50 -08:00
Henry de Valence	e8c16b172f	zebrad: pass TracingSection to Tracing component	2020-11-30 15:25:50 -08:00
Alfredo Garcia	4544463059	Inbound `FindBlocks` and `FindHeaders` (#1347) * implement inbound `FindBlocks` * Handle inbound peer FindHeaders requests * handle request before having any chain tip * Split `find_chain_hashes` into smaller functions Add a `max_len` argument to support `FindHeaders` requests. Rewrite the hash collection code to use heights, so we can handle the `stop` hash and "no intersection" cases correctly. * Split state height functions into "any chain" and "best chain" * Rename the best chain block method to `best_block` * Move fmt utilities to zebra_chain::fmt * Summarise Debug for some Message variants Co-authored-by: teor <teor@riseup.net> Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-12-01 07:30:37 +10:00
Henry de Valence	fa02b266ca	clippy	2020-11-25 10:55:44 -08:00
Henry de Valence	de8415dcb1	tidy spans	2020-11-25 10:55:44 -08:00
Henry de Valence	05837797b1	tidy imports	2020-11-25 10:55:44 -08:00
Henry de Valence	77bf327b07	fix errors (2)	2020-11-25 10:55:44 -08:00
Henry de Valence	527f4d39ed	fix errors	2020-11-25 10:55:44 -08:00
Henry de Valence	e645e3bf0c	remove async	2020-11-25 10:55:44 -08:00
Henry de Valence	6569977549	test compile change	2020-11-25 10:55:44 -08:00
Alfredo Garcia	486e55104a	create Downloads for Inbound	2020-11-25 10:55:44 -08:00
Henry de Valence	2a4a89c002	state,zebrad: tidy span levels for good INFO output This provides useful and not too noisy output at INFO level. We do an info-level message on every block commit instead of trying to do one message every N blocks, because this is useful both for initial block sync as well as continuous state updates on new blocks.	2020-11-23 14:16:39 +10:00
Henry de Valence	f0810b028d	state,consensus,sync: shorten span lengths These changes help reduce the size of the resulting spans, making the output more compact. Together they save about 30-40 characters.	2020-11-23 14:16:39 +10:00
teor	d4da9609ee	Update the max_concurrent_block_requests docs In #1298, we decreased `max_concurrent_block_requests`, but forgot to update the docs.	2020-11-20 10:08:57 -08:00
Henry de Valence	ba3c19142c	deps: update hyper, metrics to tokio 0.3 The metrics code becomes much simpler because the current version of the metrics crate builds its own single-threaded runtime on a dedicated worker thread, so no dependency on the main Zebra Tokio runtime is required.	2020-11-20 10:08:16 -08:00
Henry de Valence	add94c1c45	deps: move to tokio 0.3, tower 0.4 This change is mostly mechanical, with the exception of the changes to the `tower-batch` middleware. This middleware was adapted from `tower::buffer`, and the `tower::buffer` code was changed to implement its own bounded queue, because Tokio 0.3 removed the `mpsc::Sender::poll_send` method. See `ddc64e8d4d` for more context on the Tower changes. To match Tower as closely as possible in order to be able to upstream `tower-batch`, those changes are copied from `tower::Buffer` to `tower-batch`.	2020-11-20 10:08:16 -08:00
Henry de Valence	4953f21670	fixup! zebrad: hack to skip alreadyverified errors	2020-11-18 03:09:06 -05:00
Henry de Valence	d2fc01755b	zebrad: more reasonable concurrent block limit This helps prevent overloading the network with too many concurrent block requests. On a fast network, we're likely to still have enough room to saturate our bandwidth. In the worst case, with 2MB blocks, downloading 50 blocks concurrently is 100MB of queued downloads. If we need to download this in 20 seconds to avoid peer connection timeouts, the implied worst-case minimum speed is 5MB/s. In practice, this minimum speed will likely be much lower.	2020-11-17 14:56:27 -08:00
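The worst-case bandwidth estimate in this commit is a short calculation: 2 MB × 50 blocks = 100 MB queued, and 100 MB ÷ 20 s = 5 MB/s. The helper below reproduces it with the commit's numbers. The function name is hypothetical, and integer division rounds down for inputs that do not divide evenly.

```rust
/// Worst-case arithmetic from the commit message, in whole MB and seconds.
/// Returns (queued MB, minimum MB/s needed to finish before the deadline).
fn worst_case_min_speed(block_mb: u32, concurrent_blocks: u32, deadline_secs: u32) -> (u32, u32) {
    let queued_mb = block_mb * concurrent_blocks;
    (queued_mb, queued_mb / deadline_secs)
}

fn main() {
    let (queued_mb, min_speed) = worst_case_min_speed(2, 50, 20);
    // prints "100 MB queued; needs at least 5 MB/s"
    println!("{queued_mb} MB queued; needs at least {min_speed} MB/s");
}
```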
Henry de Valence	aa7538ab15	zebrad: hack to skip alreadyverified errors	2020-11-17 14:56:27 -08:00
Henry de Valence	e55392b61e	zebrad: explicitly select the threaded scheduler.	2020-11-17 14:56:27 -08:00
Henry de Valence	6de824bd99	zebrad: remove block verification timeout Because we set the lookahead limit to be at least twice the size of a checkpoint, we don't have a risk of timeouts.	2020-11-17 14:56:27 -08:00
Henry de Valence	e9c847bbd7	zebrad: avoid a borrow in the ChainSync future	2020-11-17 14:56:27 -08:00
Henry de Valence	b632a24436	zebrad: add diagnostics on cancelled download tasks	2020-11-17 14:56:27 -08:00
Henry de Valence	ec411574ee	zebrad: improve sync diagnostics	2020-11-17 14:56:27 -08:00
Henry de Valence	e0c92167bc	Revert "Hedge every syncer block download request" This reverts commit `656bd24ba7`. The Hedge middleware keeps a pair of histograms, writing into one in the current time interval and reading from the previous time interval's data. This means that the reverted change resulted in doubling all block downloads until after at least the second measurement interval (which means that the time measurements are also incorrect, as they're operating under double the network load...)	2020-11-12 16:45:47 -05:00
Alfredo Garcia	128643d81e	Call `zebra_test::init` where needed. (#1227) * Add missing `zebra_test::init()` to zebra-chain * Add missing `zebra_test::init()` to zebra-consensus * Add missing `zebra_test::init()` to zebra-network * Add missing `zebra_test::init()` to zebra-state * Add missing `zebra_test::init()` to zebra-test * Add missing `zebra_test::init()` to zebrad	2020-11-10 10:29:25 +10:00
Henry de Valence	0ad648fb6a	zebrad: make lookahead limit configurable. Sets the default value to the previous lookahead limit. My testing on mainnet suggested that the newly lower value (changed when the checkpoint frequency was decreased) is low enough to cause stalls, even when using hedged requests.	2020-11-01 10:47:46 -08:00
teor	92c623eddf	Log each genesis download This change helps us diagnose sync hangs.	2020-10-28 11:31:04 -04:00
teor	656bd24ba7	Hedge every syncer block download request Remove the minimum data points from the syncer hedge configuration. When there are no data points, hedge sends the second request immediately. When there are fewer than 1/(1-latency_percentile) data points (20), hedge delays the second request by the highest recent download time. This change should improve genesis and post-restart sync latency.	2020-10-28 11:31:04 -04:00
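The delay rule described in this commit (which was later reverted) can be written as a small decision function. A `latency_percentile` of 0.95 is assumed because it is the value that gives the "(20)" in the message. The third case, waiting for the percentile once enough samples exist, follows the general hedged-request idea. The enum and function names are hypothetical, not the `hedge` middleware's API.

```rust
/// Delay choices for the second (hedged) request, as the commit describes them.
#[derive(Debug, PartialEq)]
enum HedgeDelay {
    /// No latency samples yet: send the second request immediately.
    Immediate,
    /// Too few samples for a percentile: wait for the highest recent latency.
    HighestRecent,
    /// Enough samples: wait for the configured latency percentile.
    Percentile,
}

/// Samples needed before the percentile is used: 1 / (1 - p).
/// `round` absorbs floating-point error (1.0 - 0.95 is not exactly 0.05).
fn min_data_points(latency_percentile: f64) -> u64 {
    (1.0 / (1.0 - latency_percentile)).round() as u64
}

fn hedge_delay(samples: u64, latency_percentile: f64) -> HedgeDelay {
    if samples == 0 {
        HedgeDelay::Immediate
    } else if samples < min_data_points(latency_percentile) {
        HedgeDelay::HighestRecent
    } else {
        HedgeDelay::Percentile
    }
}

fn main() {
    assert_eq!(min_data_points(0.95), 20);
    assert_eq!(hedge_delay(0, 0.95), HedgeDelay::Immediate);
    assert_eq!(hedge_delay(19, 0.95), HedgeDelay::HighestRecent);
    assert_eq!(hedge_delay(20, 0.95), HedgeDelay::Percentile);
}
```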
Henry de Valence	4c960c4e6d	zebrad: treat duplicate downloads as an error We should error if we notice that we're attempting to download the same blocks multiple times, because that indicates that peers reported bad information to us, or we got confused trying to interpret their responses.	2020-10-26 12:05:35 -07:00
Henry de Valence	4127d086ea	zebrad: clarify hedge layering motivation Co-authored-by: teor <teor@riseup.net>	2020-10-26 12:05:35 -07:00
Henry de Valence	253bab042e	sync: add a concurrency limit for block downloads	2020-10-26 12:05:35 -07:00
Henry de Valence	0a405c737d	zebrad: check state in obtaintips, not extendtips. The original sync algorithm split the sync process into two phases, one that obtained prospective chain tips, and another that attempted to extend those chain tips as far as possible until encountering an error (at which point the prospective state is discarded and the process restarts). Because a previous implementation of this algorithm didn't properly enforce linkage between segments of the chain while extending tips, sometimes it would get confused and fail to discard responses that did not extend a tip. To mitigate this, a check against the state was added. However, this check can cause stalls while checkpointing, because when a checkpoint is reached we may suddenly need to commit thousands of blocks to the state. Because the sync algorithm now has a `CheckedTip` structure that ensures that a new segment of hashes actually extends an existing one, we don't need to check against the state while extending a tip, because we don't get confused while interpreting responses. This change results in significantly smoother progress on mainnet.	2020-10-26 12:05:35 -07:00
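The linkage check that makes the state lookup unnecessary can be illustrated with a minimal sketch. Assumptions: a checked tip records the hash expected to come next; a peer's segment is accepted only if it starts with that hash; and the segment's last two hashes become the new tip. The field names and the `u64` hash stand-ins are hypothetical and may not match Zebra's `CheckedTip`.

```rust
/// Hypothetical prospective chain tip that remembers which hash must come next.
/// `u64` values stand in for block hashes.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CheckedTip {
    tip: u64,
    expected_next: u64,
}

/// Accept a segment of hashes from a peer only if it links onto `tip`.
fn extend(tip: CheckedTip, segment: &[u64]) -> Option<CheckedTip> {
    // A response that does not start with the expected hash does not extend
    // this tip, so it is discarded instead of being misinterpreted.
    if segment.first() != Some(&tip.expected_next) {
        return None;
    }
    match segment {
        [.., new_tip, new_next] => Some(CheckedTip { tip: *new_tip, expected_next: *new_next }),
        _ => None, // too short to form a new (tip, expected_next) pair
    }
}

fn main() {
    let tip = CheckedTip { tip: 1, expected_next: 2 };
    assert_eq!(extend(tip, &[2, 3, 4]), Some(CheckedTip { tip: 3, expected_next: 4 }));
    assert_eq!(extend(tip, &[7, 8, 9]), None); // unrelated segment
    assert_eq!(extend(tip, &[2]), None);
}
```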
Henry de Valence	65e0c22fbe	state: don't pre-buffer the service There's no reason to return a pre-Buffer'd service (there's no need for internal access to the state service, as in zebra-network), but wrapping it internally removes control of the buffer size from the caller.	2020-10-26 12:05:35 -07:00
Henry de Valence	ce2ac3336f	zebrad: add debug message before state check This reveals that there may be contention in access to the state, as this takes a long time.	2020-10-26 12:05:35 -07:00
Henry de Valence	91469faf3c	zebrad: eliminate duplicate span in sync	2020-10-26 12:05:35 -07:00
Henry de Valence	b5a43f4516	zebrad: remove implementation details from docs The timeout behavior in zebra-network is an implementation detail, not a feature of the public API. So it shouldn't be mentioned in the doc comments -- if we want timeout behavior, we have to layer it ourselves.	2020-10-26 12:05:35 -07:00
Henry de Valence	1d7309afe2	zebrad: correctly handle duplicates in DownloadSet Using the cancel_handles, we can deduplicate requests. This is important to do, because otherwise when we insert the second cancel handle, we'd drop the first one, cancelling an existing task for no reason.	2020-10-26 12:05:35 -07:00
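The hazard described here comes from map semantics: inserting a second value under an existing key returns the old value, and dropping that value cancels its task. The sketch makes the cancellation observable with a drop counter. `Downloads`, `CancelHandle`, and `queue_download` are hypothetical stand-ins, not the real types.

```rust
use std::cell::Cell;
use std::collections::HashMap;
use std::rc::Rc;

/// Stand-in for a task cancel handle: dropping it cancels the task.
/// Here a drop just increments a shared counter so cancellations are visible.
struct CancelHandle {
    cancellations: Rc<Cell<usize>>,
}

impl Drop for CancelHandle {
    fn drop(&mut self) {
        self.cancellations.set(self.cancellations.get() + 1);
    }
}

/// Hypothetical download set keyed by block hash (`u64` stands in for a hash).
struct Downloads {
    cancel_handles: HashMap<u64, CancelHandle>,
    cancellations: Rc<Cell<usize>>,
}

impl Downloads {
    fn new() -> Self {
        Downloads { cancel_handles: HashMap::new(), cancellations: Rc::new(Cell::new(0)) }
    }

    fn handle(&self) -> CancelHandle {
        CancelHandle { cancellations: Rc::clone(&self.cancellations) }
    }

    /// Queue a download, refusing duplicates. A plain `insert` would return the
    /// old handle and drop it, cancelling the in-flight task for no reason.
    fn queue_download(&mut self, hash: u64) -> Result<(), String> {
        if self.cancel_handles.contains_key(&hash) {
            return Err(format!("duplicate download request for {hash}"));
        }
        let handle = self.handle();
        self.cancel_handles.insert(hash, handle);
        Ok(())
    }
}

fn main() {
    let mut downloads = Downloads::new();
    assert!(downloads.queue_download(7).is_ok());
    assert!(downloads.queue_download(7).is_err());
    assert_eq!(downloads.cancellations.get(), 0); // existing task still running

    // For contrast: overwriting the entry drops the old handle immediately.
    let handle = downloads.handle();
    let _ = downloads.cancel_handles.insert(7, handle);
    assert_eq!(downloads.cancellations.get(), 1);
}
```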
Henry de Valence	56fe4f4379	zebrad: unify sync restart logic This lets us keep the main loop simple and just write `continue 'sync;` to keep going.	2020-10-26 12:05:35 -07:00
Henry de Valence	12d25159c6	zebrad: use hedged requests in sync The hedge middleware implements hedged requests, as described in _The Tail At Scale_. The idea is that we auto-tune our retry logic according to the actual network conditions, pre-emptively retrying requests that exceed some latency percentile. This would hopefully solve the problem where our timeouts are too long on mainnet and too slow on testnet.	2020-10-26 12:05:35 -07:00
Henry de Valence	5f229d1475	zebrad: use Downloads in sync Try to use the better cancellation logic to revert to previous sync algorithm. As designed, the sync algorithm is supposed to proceed by downloading state prospectively and handle errors by flushing the pipeline and starting over. This hasn't worked well, because we didn't previously cancel tasks properly. Now that we can, try to use something in the spirit of the original sync algorithm.	2020-10-26 12:05:35 -07:00
Henry de Valence	b90581a3d7	zebrad: create a Downloads Stream for syncing. This makes two changes relative to the existing download code: 1. It uses a oneshot to attempt to cancel the download task after it has started; 2. It encapsulates the download creation and cancellation logic into a Downloads struct.	2020-10-26 12:05:35 -07:00
Henry de Valence	b636660d6a	zebrad: rename sync::Error alias to BoxError.	2020-10-26 12:05:35 -07:00
Henry de Valence	cab96aa1a8	zebrad: clarify config help text (#1194 )	2020-10-22 15:03:01 +10:00
Alfredo Garcia	21ad6ffc47	Reverse displayed endianness of transaction and block hashes (#1171 ) * Reverse displayed endianness of transaction and block hashes * fix zebra-checkpoints utility for new hash order * Stop using "zebrad revhex" in zebrad-hash-lookup * Rebuild checkpoint lists in new hash order This change also adds additional checkpoints to the end of each list. * Replace TransactionHash with transaction::Hash This change should have been made in #905, but we missed Debug impls and some docs. Co-authored-by: Ramana Venkata <vramana@users.noreply.github.com> Co-authored-by: teor <teor@riseup.net>	2020-10-22 07:54:02 +10:00
Henry de Valence	eb43893de0	consensus: minimize API, clean docs This reduces the API surface to the minimum required for functionality, and cleans up module documentation. The stub mempool module is deleted entirely, since it will need to be redone later anyways.	2020-10-20 11:16:22 -04:00
Alfredo Garcia	c0a14ecc8c	move genesis parameters to zebra-chain (#1151 )	2020-10-12 14:08:23 -07:00
Jane Lusby	855f9b5bcb	Implement MVP of NonFinalizedState and integrate it with the state service (#1101 ) * implement most of the chain functions * implement fork * fix outpoint handling in Chain struct * update expect for work * split utxo into two sets * update the Chain definition * remove allow attribute in zebra-state/lib.rs * merge ChainSet type into MemoryState * Add error messages to asserts * export proptest impls for use in downstream crates * add testjob for disabled feature in zebra-chain * try to fix github actions syntax * add module doc comment * update RFC for utxos * add missing header * working proptest for Chain * propagate back results over channel * Start updating RFC to match changes * implement queued block pruning * and now it syncs wooo! * remove empty modules * setup config for proptests * re-enable missing_docs lint * update RFC to match changes in impl * add documentation * use more explicit variable names	2020-10-08 13:07:32 +10:00
Jane Lusby	40e22808c7	disable reporting url for timeout errors (#1087 ) * disable reporting url for timeout errors * revert newline removal * switch to released color-eyre version	2020-09-21 16:15:09 -07:00
Henry de Valence	fe61090a64	zebrad: make Inbound Poll::Ready before setup. The Inbound service only needs the network setup for some requests, but it can service other requests without it. Making it return Poll::Pending until the network setup finishes means that initial network connections may view the Inbound service as overloaded and attempt to load-shed.	2020-09-21 09:26:39 -07:00
Henry de Valence	9c021025a7	network: fill in remaining request/response pairs	2020-09-20 10:21:18 -07:00
Henry de Valence	4b35fea492	zebrad: document Inbound, ChainSync responsibilities	2020-09-18 18:34:25 -07:00
Henry de Valence	65877cb4b1	zebrad: make Inbound propagate backpressure	2020-09-18 18:34:25 -07:00
Henry de Valence	55f46967b2	zebrad: serve blocks from Inbound service The original version of this commit ran into https://github.com/rust-lang/rust/issues/64552 again. Thanks to @yaahc for suggesting a workaround (using futures combinators to avoid writing an async block).	2020-09-18 18:34:25 -07:00
Henry de Valence	170f588ffb	network: document load-shedding behavior This was part of the original design and is described in the Connection internals, but we never documented it externally.	2020-09-18 18:34:25 -07:00
Henry de Valence	1d0ebf89c6	zebrad: move seed command into inbound component Remove the seed command entirely, and make the behavior it provided (responding to `Request::Peers`) part of the ordinary functioning of the start command. The new `Inbound` service should be expanded to handle all request types.	2020-09-18 18:34:25 -07:00
Henry de Valence	1d3892e1dc	network: rename alias to BoxError This is shorter and consistent with Tower (which is why we use it in the first place).	2020-09-18 18:34:25 -07:00
Jane Lusby	ca648ff27c	Enable issue-url feature in color-eyre (#1072 ) * Enable issue-url feature in color-eyre * get version automatically * and the url!	2020-09-17 15:09:18 -07:00
Henry de Valence	3133214e4f	zebrad: use new state API	2020-09-11 13:37:49 -07:00
teor	b1e1291f45	Log inbound peer requests at debug Logging at info was a bit too verbose. Also add a short log message.	2020-09-10 09:46:53 -07:00
Henry de Valence	24de90c900	zebrad: tidy sync imports	2020-09-10 09:45:52 -07:00
Henry de Valence	9b6e66c1b9	zebrad: rename Syncer to ChainSync This name clarifies what is being synced and avoids an agent-noun construction.	2020-09-10 09:45:52 -07:00
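Commit 0a405c737d relies on a `CheckedTip` structure to ensure that a new segment of hashes actually extends an existing one. The following is a minimal sketch of that linkage check, not Zebra's code: `Hash` is simplified to `u64`, and the rule that the new tip is taken from the last two hashes of the segment is an assumption made for illustration.

```rust
/// Hypothetical simplified hash type, standing in for a real block hash.
type Hash = u64;

/// A prospective chain tip: the last hash we know, and the hash we expect next.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CheckedTip {
    tip: Hash,
    expected_next: Hash,
}

impl CheckedTip {
    /// Try to extend this tip with a segment of hashes returned by a peer.
    /// Returns the new tip only if the segment links onto this one.
    fn extend(&self, segment: &[Hash]) -> Option<CheckedTip> {
        // A peer may repeat our tip at the start of its response; skip it.
        let segment = match segment.first() {
            Some(&first) if first == self.tip => &segment[1..],
            _ => segment,
        };
        // Reject segments that don't start with the hash we expected next,
        // and segments too short to form a new (tip, expected_next) pair.
        if segment.first() != Some(&self.expected_next) || segment.len() < 2 {
            return None;
        }
        let n = segment.len();
        Some(CheckedTip {
            tip: segment[n - 2],
            expected_next: segment[n - 1],
        })
    }
}
```

A response that doesn't link onto the tip is simply discarded, so no state lookup is needed to detect it.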
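Commit 1d7309afe2 deduplicates requests using the cancel handles, because inserting a second handle for the same hash would drop the first and cancel a running task. A minimal sketch, assuming hypothetical `Downloads` and `Hash` types, and using a `std::sync::mpsc` sender as the cancel handle instead of the oneshot channel described in commit b90581a3d7:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical simplified hash type.
type Hash = u64;

/// Hypothetical download tracker: one cancel handle per in-flight hash.
struct Downloads {
    cancel_handles: HashMap<Hash, Sender<()>>,
}

impl Downloads {
    fn new() -> Self {
        Downloads { cancel_handles: HashMap::new() }
    }

    /// Queue a download unless one is already in flight for `hash`.
    /// Returns the receiver the download task would watch for cancellation,
    /// or `None` if the request is a duplicate.
    fn download(&mut self, hash: Hash) -> Option<Receiver<()>> {
        if self.cancel_handles.contains_key(&hash) {
            // Overwriting the entry would drop the existing sender, which the
            // running task would observe as a disconnect, i.e. a cancellation.
            return None;
        }
        let (cancel_tx, cancel_rx) = channel();
        self.cancel_handles.insert(hash, cancel_tx);
        Some(cancel_rx)
    }

    /// Cancel every in-flight download, e.g. before restarting the sync.
    fn cancel_all(&mut self) {
        for (_hash, cancel_tx) in self.cancel_handles.drain() {
            // Ignore errors: the task may already have finished.
            let _ = cancel_tx.send(());
        }
    }
}
```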
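Commit 12d25159c6 uses Tower's hedge middleware, which pre-emptively retries requests that exceed some latency percentile. The sketch below shows only that decision rule, using the standard library. `HedgePolicy` and its nearest-rank percentile are hypothetical and do not reproduce Tower's internals.

```rust
use std::time::Duration;

/// Hypothetical policy deciding when to send a duplicate (hedge) request.
struct HedgePolicy {
    /// Recently observed request latencies.
    samples: Vec<Duration>,
    /// Latency percentile (0.0..=1.0) beyond which a request is hedged.
    percentile: f64,
    /// Don't hedge until at least this many samples have been recorded.
    min_samples: usize,
}

impl HedgePolicy {
    fn record(&mut self, latency: Duration) {
        self.samples.push(latency);
    }

    /// Latency at the configured percentile (nearest-rank method),
    /// or `None` if too few samples have been collected.
    fn threshold(&self) -> Option<Duration> {
        if self.samples.is_empty() || self.samples.len() < self.min_samples {
            return None;
        }
        let mut sorted = self.samples.clone();
        sorted.sort();
        let n = sorted.len();
        let p = self.percentile.clamp(0.0, 1.0);
        let rank = ((p * n as f64).ceil() as usize).clamp(1, n);
        Some(sorted[rank - 1])
    }

    /// Should a request outstanding for `elapsed` be retried pre-emptively?
    fn should_hedge(&self, elapsed: Duration) -> bool {
        match self.threshold() {
            Some(t) => elapsed > t,
            None => false,
        }
    }
}
```

Because the threshold is derived from observed latencies, it adapts to the network instead of relying on a fixed timeout.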
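Commit 21ad6ffc47 reverses the displayed endianness of transaction and block hashes. A minimal sketch, assuming a hypothetical 32-byte `Hash` newtype: the stored bytes stay in their original order, and only the `Display` output walks them in reverse.

```rust
use std::fmt;

/// Hypothetical stand-in for a 32-byte block or transaction hash.
#[derive(Clone, Copy, PartialEq, Eq)]
struct Hash([u8; 32]);

impl fmt::Display for Hash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Internal byte order is unchanged; only the hex string shown
        // to users lists the bytes from last to first.
        for byte in self.0.iter().rev() {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```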
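Commit fe61090a64 makes the `Inbound` service report readiness before network setup finishes, so that early connections don't treat it as overloaded and load-shed. A simplified sketch, assuming hypothetical `Inbound`, `Request`, and `Response` types and a plain `poll_ready` method rather than Tower's `Service` trait: readiness is unconditional, and only requests that need the network fail until setup completes.

```rust
use std::task::Poll;

/// Hypothetical requests: one needs network setup, one doesn't.
enum Request {
    Peers,
    Other,
}

#[derive(Debug, PartialEq)]
enum Response {
    Peers(Vec<String>),
    Nil,
}

/// Hypothetical inbound handler whose network setup arrives later.
struct Inbound {
    /// Stand-in for the address book, filled in once setup completes.
    peers: Option<Vec<String>>,
}

impl Inbound {
    /// Always ready: returning `Pending` until setup finished would make
    /// callers see the service as overloaded.
    fn poll_ready(&mut self) -> Poll<Result<(), String>> {
        Poll::Ready(Ok(()))
    }

    fn call(&mut self, req: Request) -> Result<Response, String> {
        match req {
            // Only requests that depend on the network setup can fail early.
            Request::Peers => match &self.peers {
                Some(peers) => Ok(Response::Peers(peers.clone())),
                None => Err("network setup not finished".to_string()),
            },
            Request::Other => Ok(Response::Nil),
        }
    }
}
```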


427 Commits