zebra

Commit Graph

Author	SHA1	Message	Date
Henry de Valence	d2fc01755b	zebrad: more reasonable concurrent block limit This helps prevent overloading the network with too many concurrent block requests. On a fast network, we're likely to still have enough room to saturate our bandwidth. In the worst case, with 2MB blocks, downloading 50 blocks concurrently is 100MB of queued downloads. If we need to download this in 20 seconds to avoid peer connection timeouts, the implied worst-case minimum speed is 5MB/s. In practice, this minimum speed will likely be much lower.	2020-11-17 14:56:27 -08:00
Henry de Valence	aa7538ab15	zebrad: hack to skip alreadyverified errors	2020-11-17 14:56:27 -08:00
Henry de Valence	e55392b61e	zebrad: explicitly select the threaded scheduler.	2020-11-17 14:56:27 -08:00
Henry de Valence	6de824bd99	zebrad: remove block verification timeout Because we set the lookahead limit to be at least twice the size of a checkpoint, we don't have a risk of timeouts.	2020-11-17 14:56:27 -08:00
Henry de Valence	e9c847bbd7	zebrad: avoid a borrow in the ChainSync future	2020-11-17 14:56:27 -08:00
Henry de Valence	b632a24436	zebrad: add diagnostics on cancelled download tasks	2020-11-17 14:56:27 -08:00
Henry de Valence	ec411574ee	zebrad: improve sync diagnostics	2020-11-17 14:56:27 -08:00
teor	54cb9277ef	Allow some new clippy nightly lints	2020-11-17 10:07:37 +10:00
dependabot[bot]	8c5f6d0177	build(deps): bump once_cell from 1.5.1 to 1.5.2 Bumps [once_cell](https://github.com/matklad/once_cell) from 1.5.1 to 1.5.2. - [Release notes](https://github.com/matklad/once_cell/releases) - [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md) - [Commits](https://github.com/matklad/once_cell/compare/v1.5.1...v1.5.2) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-13 14:48:11 -05:00
Jane Lusby	7c0275ac0b	reorganize stop check (#1288) * reorganize stop check * remove unused enum * move out and make it unique Co-authored-by: teor <teor@riseup.net>	2020-11-13 11:37:52 +10:00
Henry de Valence	e0c92167bc	Revert "Hedge every syncer block download request" This reverts commit `656bd24ba7`. The Hedge middleware keeps a pair of histograms, writing into one in the current time interval and reading from the previous time interval's data. This means that the reverted change resulted in doubling all block downloads until after at least the second measurement interval (which means that the time measurements are also incorrect, as they're operating under double the network load...)	2020-11-12 16:45:47 -05:00
Alfredo Garcia	128643d81e	Call `zebra_test::init` where needed. (#1227) * Add missing `zebra_test::init()` to zebra-chain * Add missing `zebra_test::init()` to zebra-consensus * Add missing `zebra_test::init()` to zebra-network * Add missing `zebra_test::init()` to zebra-state * Add missing `zebra_test::init()` to zebra-test * Add missing `zebra_test::init()` to zebrad	2020-11-10 10:29:25 +10:00
teor	efef2a2bd7	Reduce acceptance test sled memory usage (#1236) * Use the default memory limit in the acceptance tests PR #1233 changed the default `memory_cache_bytes`, but left the acceptance tests with their old value.	2020-11-10 07:42:30 +10:00
dependabot[bot]	a58299a0f0	build(deps): bump color-eyre from 0.5.6 to 0.5.7 Bumps [color-eyre](https://github.com/yaahc/color-eyre) from 0.5.6 to 0.5.7. - [Release notes](https://github.com/yaahc/color-eyre/releases) - [Changelog](https://github.com/yaahc/color-eyre/blob/master/CHANGELOG.md) - [Commits](https://github.com/yaahc/color-eyre/compare/v0.5.6...v0.5.7) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-09 08:40:55 -05:00
dependabot[bot]	1e3cf6dc5c	build(deps): bump tracing-subscriber from 0.2.14 to 0.2.15 Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.14 to 0.2.15. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.14...tracing-subscriber-0.2.15) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-04 20:37:40 -05:00
dependabot[bot]	785fc30481	build(deps): bump hyper from 0.13.8 to 0.13.9 Bumps [hyper](https://github.com/hyperium/hyper) from 0.13.8 to 0.13.9. - [Release notes](https://github.com/hyperium/hyper/releases) - [Changelog](https://github.com/hyperium/hyper/blob/master/CHANGELOG.md) - [Commits](https://github.com/hyperium/hyper/compare/v0.13.8...v0.13.9) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-04 20:07:18 -05:00
Henry de Valence	0ad648fb6a	zebrad: make lookahead limit configurable. Sets the default value to the previous lookahead limit. My testing on mainnet suggested that the newly lower value (changed when the checkpoint frequency was decreased) is low enough to cause stalls, even when using hedged requests.	2020-11-01 10:47:46 -08:00
teor	92c623eddf	Log each genesis download This change helps us diagnose sync hangs.	2020-10-28 11:31:04 -04:00
teor	656bd24ba7	Hedge every syncer block download request Remove the minimum data points from the syncer hedge configuration. When there are no data points, hedge sends the second request immediately. When there are fewer than 1/(1-latency_percentile) data points (20), hedge delays the second request by the highest recent download time. This change should improve genesis and post-restart sync latency.	2020-10-28 11:31:04 -04:00
teor	ea510b7d41	Run a block sync in CI with 2 large checkpoints (#1193) * Run large checkpoint sync tests in CI * Improve test child output match error context * Add a debug_stop_at_height config * Use stop at height in acceptance tests And add some restart acceptance tests, to make sure the stop at height feature works correctly.	2020-10-27 19:25:29 +10:00
Henry de Valence	4c960c4e6d	zebrad: treat duplicate downloads as an error We should error if we notice that we're attempting to download the same blocks multiple times, because that indicates that peers reported bad information to us, or we got confused trying to interpret their responses.	2020-10-26 12:05:35 -07:00
Henry de Valence	4127d086ea	zebrad: clarify hedge layering motivation Co-authored-by: teor <teor@riseup.net>	2020-10-26 12:05:35 -07:00
Henry de Valence	253bab042e	sync: add a concurrency limit for block downloads	2020-10-26 12:05:35 -07:00
Henry de Valence	0a405c737d	zebrad: check state in obtaintips, not extendtips. The original sync algorithm split the sync process into two phases, one that obtained prospective chain tips, and another that attempted to extend those chain tips as far as possible until encountering an error (at which point the prospective state is discarded and the process restarts). Because a previous implementation of this algorithm didn't properly enforce linkage between segments of the chain while extending tips, sometimes it would get confused and fail to discard responses that did not extend a tip. To mitigate this, a check against the state was added. However, this check can cause stalls while checkpointing, because when a checkpoint is reached we may suddenly need to commit thousands of blocks to the state. Because the sync algorithm now has a `CheckedTip` structure that ensures that a new segment of hashes actually extends an existing one, we don't need to check against the state while extending a tip, because we don't get confused while interpreting responses. This change results in significantly smoother progress on mainnet.	2020-10-26 12:05:35 -07:00
Henry de Valence	65e0c22fbe	state: don't pre-buffer the service There's no reason to return a pre-Buffer'd service (there's no need for internal access to the state service, as in zebra-network), but wrapping it internally removes control of the buffer size from the caller.	2020-10-26 12:05:35 -07:00
Henry de Valence	ce2ac3336f	zebrad: add debug message before state check This reveals that there may be contention in access to the state, as this takes a long time.	2020-10-26 12:05:35 -07:00
Henry de Valence	91469faf3c	zebrad: eliminate duplicate span in sync	2020-10-26 12:05:35 -07:00
Henry de Valence	b5a43f4516	zebrad: remove implementation details from docs The timeout behavior in zebra-network is an implementation detail, not a feature of the public API. So it shouldn't be mentioned in the doc comments -- if we want timeout behavior, we have to layer it ourselves.	2020-10-26 12:05:35 -07:00
Henry de Valence	1d7309afe2	zebrad: correctly handle duplicates in DownloadSet Using the cancel_handles, we can deduplicate requests. This is important to do, because otherwise when we insert the second cancel handle, we'd drop the first one, cancelling an existing task for no reason.	2020-10-26 12:05:35 -07:00
Henry de Valence	56fe4f4379	zebrad: unify sync restart logic This lets us keep the main loop simple and just write `continue 'sync;` to keep going.	2020-10-26 12:05:35 -07:00
Henry de Valence	12d25159c6	zebrad: use hedged requests in sync The hedge middleware implements hedged requests, as described in _The Tail At Scale_. The idea is that we auto-tune our retry logic according to the actual network conditions, pre-emptively retrying requests that exceed some latency percentile. This would hopefully solve the problem where our timeouts are too long on mainnet and too slow on testnet.	2020-10-26 12:05:35 -07:00
Henry de Valence	5f229d1475	zebrad: use Downloads in sync Try to use the better cancellation logic to revert to previous sync algorithm. As designed, the sync algorithm is supposed to proceed by downloading state prospectively and handle errors by flushing the pipeline and starting over. This hasn't worked well, because we didn't previously cancel tasks properly. Now that we can, try to use something in the spirit of the original sync algorithm.	2020-10-26 12:05:35 -07:00
Henry de Valence	b90581a3d7	zebrad: create a Downloads Stream for syncing. This makes two changes relative to the existing download code: 1. It uses a oneshot to attempt to cancel the download task after it has started; 2. It encapsulates the download creation and cancellation logic into a Downloads struct.	2020-10-26 12:05:35 -07:00
Henry de Valence	b636660d6a	zebrad: rename sync::Error alias to BoxError.	2020-10-26 12:05:35 -07:00
dependabot[bot]	ff51c2e0c0	build(deps): bump tracing-subscriber from 0.2.13 to 0.2.14 Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.13 to 0.2.14. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.13...tracing-subscriber-0.2.14) Signed-off-by: dependabot[bot] <support@github.com>	2020-10-23 15:02:02 -04:00
Henry de Valence	cab96aa1a8	zebrad: clarify config help text (#1194)	2020-10-22 15:03:01 +10:00
Alfredo Garcia	21ad6ffc47	Reverse displayed endianness of transaction and block hashes (#1171) * Reverse displayed endianness of transaction and block hashes * fix zebra-checkpoints utility for new hash order * Stop using "zebrad revhex" in zebrad-hash-lookup * Rebuild checkpoint lists in new hash order This change also adds additional checkpoints to the end of each list. * Replace TransactionHash with transaction::Hash This change should have been made in #905, but we missed Debug impls and some docs. Co-authored-by: Ramana Venkata <vramana@users.noreply.github.com> Co-authored-by: teor <teor@riseup.net>	2020-10-22 07:54:02 +10:00
teor	e52a1c07a3	Ignore longer sync tests by default	2020-10-21 21:08:04 +10:00
teor	0d121833af	Add sync tests that download 2000 blocks	2020-10-21 21:08:04 +10:00
teor	6fe3cc56dd	Refactor sync test to be more flexible And add documentation	2020-10-21 00:58:08 -04:00
teor	1d35c5a0b9	Enable the zebrad sync tests by default If your test environment does not have DNS or network access, set the ZEBRA_SKIP_NETWORK_TESTS environmental variable to disable these tests.	2020-10-21 00:58:08 -04:00
Henry de Valence	eb43893de0	consensus: minimize API, clean docs This reduces the API surface to the minimum required for functionality, and cleans up module documentation. The stub mempool module is deleted entirely, since it will need to be redone later anyways.	2020-10-20 11:16:22 -04:00
teor	d9fbba8a55	Skip the sync tests when ZEBRA_SKIP_NETWORK_TESTS is set	2020-10-16 15:21:01 -04:00
teor	04ce907dbf	Remove duplicate code in zebra_test::command	2020-10-15 19:54:00 -04:00
teor	32bbc19c6b	Fix a timeout bug in zebra_test::command And add tests for the command functionality. Also document some remaining bugs (see #1140).	2020-10-15 19:54:00 -04:00
teor	92f0c934cf	Add a sync acceptance test for the Testnet	2020-10-15 19:54:00 -04:00
Alfredo Garcia	2d3c3bcc23	add systemd service file	2020-10-14 15:33:00 -04:00
Alfredo Garcia	c0a14ecc8c	move genesis parameters to zebra-chain (#1151)	2020-10-12 14:08:23 -07:00
dependabot[bot]	76e7e3d714	build(deps): bump tracing-subscriber from 0.2.12 to 0.2.13 Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.12 to 0.2.13. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.12...tracing-subscriber-0.2.13) Signed-off-by: dependabot[bot] <support@github.com>	2020-10-08 15:09:32 -04:00
Jane Lusby	855f9b5bcb	Implement MVP of NonFinalizedState and integrate it with the state service (#1101) * implement most of the chain functions * implement fork * fix outpoint handling in Chain struct * update expect for work * split utxo into two sets * update the Chain definition * remove allow attribute in zebra-state/lib.rs * merge ChainSet type into MemoryState * Add error messages to asserts * export proptest impls for use in downstream crates * add testjob for disabled feature in zebra-chain * try to fix github actions syntax * add module doc comment * update RFC for utxos * add missing header * working proptest for Chain * propagate back results over channel * Start updating RFC to match changes * implement queued block pruning * and now it syncs wooo! * remove empty modules * setup config for proptests * re-enable missing_docs lint * update RFC to match changes in impl * add documentation * use more explicit variable names	2020-10-08 13:07:32 +10:00
dependabot[bot]	23a62a2d87	build(deps): bump inferno from 0.10.0 to 0.10.1 Bumps [inferno](https://github.com/jonhoo/inferno) from 0.10.0 to 0.10.1. - [Release notes](https://github.com/jonhoo/inferno/releases) - [Changelog](https://github.com/jonhoo/inferno/blob/master/CHANGELOG.md) - [Commits](https://github.com/jonhoo/inferno/compare/v0.10.0...v0.10.1) Signed-off-by: dependabot[bot] <support@github.com>	2020-10-06 05:31:01 -04:00
dependabot[bot]	d769f62a73	build(deps): bump color-eyre from 0.5.5 to 0.5.6 Bumps [color-eyre](https://github.com/yaahc/color-eyre) from 0.5.5 to 0.5.6. - [Release notes](https://github.com/yaahc/color-eyre/releases) - [Changelog](https://github.com/yaahc/color-eyre/blob/master/CHANGELOG.md) - [Commits](https://github.com/yaahc/color-eyre/compare/v0.5.5...v0.5.6) Signed-off-by: dependabot[bot] <support@github.com>	2020-10-05 11:26:23 -04:00
Jane Lusby	40e22808c7	disable reporting url for timeout errors (#1087) * disable reporting url for timeout errors * revert newline removal * switch to released color-eyre version	2020-09-21 16:15:09 -07:00
Henry de Valence	fe61090a64	zebrad: make Inbound Poll::Ready before setup. The Inbound service only needs the network setup for some requests, but it can service other requests without it. Making it return Poll::Pending until the network setup finishes means that initial network connections may view the Inbound service as overloaded and attempt to load-shed.	2020-09-21 09:26:39 -07:00
dependabot[bot]	85241a49d6	build(deps): bump hyper from 0.13.7 to 0.13.8 Bumps [hyper](https://github.com/hyperium/hyper) from 0.13.7 to 0.13.8. - [Release notes](https://github.com/hyperium/hyper/releases) - [Changelog](https://github.com/hyperium/hyper/blob/master/CHANGELOG.md) - [Commits](https://github.com/hyperium/hyper/compare/v0.13.7...v0.13.8) Signed-off-by: dependabot[bot] <support@github.com>	2020-09-21 11:58:31 -04:00
Henry de Valence	9c021025a7	network: fill in remaining request/response pairs	2020-09-20 10:21:18 -07:00
Henry de Valence	4b35fea492	zebrad: document Inbound, ChainSync responsibilities	2020-09-18 18:34:25 -07:00
Henry de Valence	65877cb4b1	zebrad: make Inbound propagate backpressure	2020-09-18 18:34:25 -07:00
Henry de Valence	55f46967b2	zebrad: serve blocks from Inbound service The original version of this commit ran into https://github.com/rust-lang/rust/issues/64552 again. Thanks to @yaahc for suggesting a workaround (using futures combinators to avoid writing an async block).	2020-09-18 18:34:25 -07:00
Henry de Valence	170f588ffb	network: document load-shedding behavior This was part of the original design and is described in the Connection internals, but we never documented it externally.	2020-09-18 18:34:25 -07:00
Henry de Valence	1d0ebf89c6	zebrad: move seed command into inbound component Remove the seed command entirely, and make the behavior it provided (responding to `Request::Peers`) part of the ordinary functioning of the start command. The new `Inbound` service should be expanded to handle all request types.	2020-09-18 18:34:25 -07:00
Henry de Valence	1d3892e1dc	network: rename alias to BoxError This is shorter and consistent with Tower (which is why we use it in the first place).	2020-09-18 18:34:25 -07:00
Jane Lusby	ca648ff27c	Enable issue-url feature in color-eyre (#1072) * Enable issue-url feature in color-eyre * get version automatically * and the url!	2020-09-17 15:09:18 -07:00
dependabot[bot]	ba32d27f6e	build(deps): bump tracing-subscriber from 0.2.11 to 0.2.12 (#1059) Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.11 to 0.2.12. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.11...tracing-subscriber-0.2.12) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2020-09-14 13:49:07 -07:00
Jane Lusby	a7b418bfe5	Add test for first checkpoint verification (#1018) * add test for first checkpoint sync Prior to this change we've not had any tests that verify our sync / network logic is well behaved. This PR cleans up the test helper code to make error reports more consistent and uses this cleaned up API to implement a checkpoint sync test which runs zebrad until it reads the first checkpoint event from stdout. Co-authored-by: teor <teor@riseup.net> * move include out of unix cfg Co-authored-by: teor <teor@riseup.net>	2020-09-11 13:39:39 -07:00
Henry de Valence	3133214e4f	zebrad: use new state API	2020-09-11 13:37:49 -07:00
teor	b1e1291f45	Log inbound peer requests at debug Logging at info was a bit too verbose. Also add a short log message.	2020-09-10 09:46:53 -07:00
Henry de Valence	24de90c900	zebrad: tidy sync imports	2020-09-10 09:45:52 -07:00
Henry de Valence	9b6e66c1b9	zebrad: rename Syncer to ChainSync This name clarifies what is being synced and avoids an agent-noun construction.	2020-09-10 09:45:52 -07:00
Henry de Valence	0bc79686b8	zebrad: move sync into components module. Part of #1030.	2020-09-10 09:45:52 -07:00
teor	adafe1d189	Restart sync after the first failed ObtainTips The ObtainTips retry was redundant. The timeout wasn't much shorter, but it made the code and sync logic more complicated.	2020-09-09 15:35:09 -07:00
teor	2a68ef5acb	Update the peerset buffer size and sync timeout Also add a bunch of comments and documentation for network-constrained nodes, and for testnet.	2020-09-08 12:44:33 -07:00
teor	b062a682b0	Refactor "waiting for pending blocks" log	2020-09-08 12:44:33 -07:00
teor	e6e859dce2	Tweak sync timeouts * increase the EWMA default and decay * increase the block download retries * increase the request and block download timeouts * increase the sync timeout	2020-09-08 12:44:33 -07:00
teor	ce12d4dadc	Add timeouts for tip responses and block verify tasks	2020-09-08 12:44:33 -07:00
teor	379ce5c1b8	Retry obtain and extend tips on failure	2020-09-08 12:44:33 -07:00
Alfredo Garcia	ca1a451895	Add test for metrics and tracing endpoints (#1000) * add metrics and tracking endpoint tests * test endpoints more * add change filter test for tracing * add await to post * separate metrics and tracing tests * Apply suggestions from code review Co-authored-by: teor <teor@riseup.net> Co-authored-by: teor <teor@riseup.net>	2020-09-07 17:05:23 -07:00
Alfredo Garcia	454e75e7c0	Rename old references to BlockHeaderHash and BlockHeight (#1002) * rename some references * Apply suggestions from code review Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com> Co-authored-by: teor <teor@riseup.net> Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com> Co-authored-by: teor <teor@riseup.net>	2020-09-04 15:40:48 -07:00
teor	48497d4857	Ignore sync errors when the block is already verified (#980) * Ignore sync errors when the block is already verified If we get an error for a block that is already in our state, we don't need to restart the sync. It was probably a duplicate download. Also: Process any ready tasks before reset, so the logs and metrics are up to date. (But ignore the errors, because we're about to reset.) Improve sync logging and metrics during the download and verify task. * Remove duplicate hashes in logs Co-authored-by: Jane Lusby <jlusby42@gmail.com> * Log the sync hash span at warn level Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-09-04 08:13:00 +10:00
teor	437549d8e9	Always drop the final hash in peer responses (#991) To work around a zcashd bug that squashes responses together.	2020-09-04 08:09:34 +10:00
teor	c770daa51f	If the first ExtendTips hash is bad, discard it and re-check (#992)	2020-09-04 08:08:19 +10:00
Alfredo Garcia	5485f4429a	Add config path to acceptance tests (#946) * add and apply config mode to get_child * remove option to read config from current directory * remove argument from get_child	2020-09-03 13:13:23 -07:00
Jane Lusby	ffdec0cb23	Remove in-memory state service (#974) * Remove in-memory state service * make the config compatible with toml again * checkpoint commit to see how much I still have to revert * back to the starting point... * remove unused dependency * reorganize error handling a bit * need to make a new color-eyre release now * reorder again because I have problems * remove unnecessary helpers * revert changes to config loading * add back missing space * Switch to released color-eyre version * add back missing newline again... * improve error message on unix when terminated by signal * add context to last few asserts in acceptance tests * instrument some of the helpers * remove accidental extra space * try to make this compile on windows * reorg platform specific code * hide on_disk module and fix broken link	2020-09-01 12:39:04 -07:00
teor	3fdfcb3179	fix: remove old tips that are behind new tips This change makes sync less reliant on the exact order of ObtainTips and ExtendTips responses.	2020-09-01 11:42:48 -04:00
teor	a6d6e65940	fix: fix the flamegraph module comment	2020-09-01 11:40:18 -04:00
Ramana Venkata	448250f901	Deduplicate test config defaults (#971) Fixes #967	2020-08-31 12:43:43 -07:00
Ramana Venkata	ad0001f7f7	zebra-state: Add support for temporary sled databases (#939) * Test config with persistent sled database * Test ephemeral config * Add misconfigured ephemeral test	2020-08-31 18:32:55 +10:00
teor	fa04072298	Make the checkpoint limit test more readable (#941) * fix: Pass zebra_consensus::Config in a test * fix: Remove a redundant import	2020-08-24 11:34:10 -07:00
teor	78201b456d	feature: Implement checkpoint_sync for checkpoint verification * add CheckpointList::new_up_to(limit: NetworkUpgrade) * if checkpoint_sync is false, limit checkpoints to Sapling * update tests for CheckpointList and chain::init	2020-08-24 15:34:46 +10:00
teor	06f4a59664	feature: Add a checkpoint_sync config option (The option doesn't do anything yet.)	2020-08-24 15:34:46 +10:00
Ramana Venkata	991c70723a	zebrad: Create zebrad.toml in acceptance tests from ZebradConfig Fixes #929	2020-08-23 21:24:19 -04:00
teor	3400b72699	fix: Make the start acceptance tests stricter	2020-08-21 07:22:53 +10:00
teor	02e6027c57	refactor: Remove duplicate acceptance test code	2020-08-21 07:22:53 +10:00
teor	1e0e4914a0	fix: Improve an acceptance test failure message If the tests conflict with a local zebrad, zcashd, or other tests, they need to be run with a custom config, or in an isolated environment.	2020-08-21 07:22:53 +10:00
teor	b8e8d4f548	fix: Remove some deeply-nested instrument spans Closes #923.	2020-08-20 14:52:39 -04:00
Alfredo Garcia	d349f2bbc2	Refactor acceptance serialized_tests (#920) * add network listening address to default config	2020-08-20 07:48:22 +10:00
Henry de Valence	103b663c40	chain: rename BlockHeight to block::Height	2020-08-17 11:46:34 -07:00
Henry de Valence	61dea90e2f	chain: rename BlockHeaderHash to block::Hash This is the first in a sequence of changes that change the block:: items to not include Block as a prefix in their name, in accordance with the Rust API guidelines.	2020-08-17 11:46:34 -07:00
Henry de Valence	948b067808	chain: move Network, NetworkUpgrade to parameters Also, avoid using star-imports of the enum variants, which pollutes the namespace.	2020-08-17 11:46:34 -07:00
Henry de Valence	0d1f56ad2f	chain: remove utils module A catch-all utils module can really easily slip into being a place to stash miscellaneous functions that don't really belong anywhere in particular.	2020-08-17 11:46:34 -07:00
Deirdre Connolly	27ed2288b5	Remove redundant clones for PathBufs	2020-08-14 20:15:24 -04:00
Alfredo Garcia	e73f976194	Valid generated config acceptance test (#859) * add valid generated config test * change to pathbuf * use -c to make sure we are using the generated file * add and use a ZebraTestDir type * change approach to generate tempdir in top of each test * pass tempdir to test_cmd and set current dir to it * add and use a `generated_config_path` variable in tests	2020-08-13 13:31:13 -07:00
Henry de Valence	a79ce97957	Fix sync algorithm. (#887) * checkpoint: reject older of duplicate verification requests. If we get a duplicate block verification request, we should drop the older one in favor of the newer one, because the older request is likely to have been canceled. Previously, this code would accept up to four duplicate verification requests, then fail all subsequent ones. * sync: add a timeout layer to block requests. Note that if this timeout is too short, we'll bring down the peer set in a retry storm. * sync: restart syncing on error Restart the syncing process when an error occurs, rather than ignoring it. Restarting means we discard all tips and start over with a new block locator, so we can have another chance to "unstuck" ourselves. * sync: additional debug info * sync: handle lookahead limit correctly. Instead of extracting all the completed task results, the previous code pulled results out until there were fewer tasks than the lookahead limit, then stopped. This meant that completed tasks could be left until the limit was exceeded again. Instead, extract all completed results, and use the number of pending tasks to decide whether to extend the tip or wait for blocks to finish. * network: add debug instrumentation to retry policy * sync: instrument the spawned task * sync: streamline ObtainTips/ExtendTips logic & tracing This change does three things: 1. It aligns the implementation of ObtainTips and ExtendTips so that they use the same deduplication method. This means that when debugging we only have one deduplication algorithm to focus on. 2. It streamlines the tracing output to not include information already included in spans. Both obtain_tips and extend_tips have their own spans attached to the events, so it's not necessary to add Scope: prefixes in messages. 3. It changes the messages to be focused on reporting the actual events rather than the interpretation of the events (e.g., "got genesis hash in response" rather than "peer could not extend tip"). The motivation for this change is that when debugging, the interpretation of events is already known to be incorrect, in the sense that the mental model of the code (no bug) does not match its behavior (has bug), so presenting minimally-interpreted events forces interpretation relative to the actual code. * sync: hack to work around zcashd behavior * sync: localize debug statement in extend_tips * sync: change algorithm to define tips as pairs of hashes. This is different enough from the existing description that its comments no longer apply, so I removed them. A further chunk of work is to change the sync RFC to document this algorithm. * sync: reduce block timeout * state: add resource limits for sled Closes #888 * sync: add a restart timeout constant * sync: de-pub constants	2020-08-12 16:48:01 -07:00
Henry de Valence	299afe13df	zebra-network tweaks. (#877) * network: move gossiped peer selection logic into address book. * network: return BoxService from init. * zebrad: add note on why we truncate the gossiped peer list Co-authored-by: Jane Lusby <jlusby42@gmail.com> * Remove unused .rustfmt.toml Many of these options are never actually loaded by our CI because of a channel mismatch, where they're not applied on stable but only on nightly (see the logs from a rustfmt job). This means that we can get different settings when running `cargo fmt` on the nightly and stable channels, which was causing a CI failure on this PR. Reverting back to the default rustfmt settings avoids this problem and keeps us in line with upstream rustfmt. There's no loss to us since we were using the defaults anyways. Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-08-11 13:07:44 -07:00
dependabot[bot]	945b019739	build(deps): bump tracing-subscriber from 0.2.10 to 0.2.11 (#873) Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.10 to 0.2.11. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.10...tracing-subscriber-0.2.11) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2020-08-11 10:30:50 -07:00
teor	2550c44d48	Make sync ignore known hashes (#853) * fix: Handle known ObtainTips correctly enumerate never returns a value beyond the end of the vector. * fix: Ignore known tips in ExtendTips Some peers send us known tips when we try to extend. * fix: Ignore known hashes when downloading Despite all our other checks, we still end up downloading some hashes multiple times. * fix: Increase the number of retries The old sync code relied on duplicate block fetches to make progress, but the last few commits have removed some of those duplicates. Instead, just retry the fetches that fail. * fix: Tweak comments Co-authored-by: Jane Lusby <jlusby42@gmail.com> * fix: Cleanup the state_contains interface in Sync * Fix brackets Oops Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-08-10 16:17:50 -07:00
Alfredo Garcia	c9093e4d59	Make more checks in non server acceptance tests (#860) * make sure no info is printed in non server tests * check exact full output for validity instead of log msgs * add end of output character to version regex * use coercions, use equality operator Co-authored-by: Jane Lusby <jlusby42@gmail.com> Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-08-10 12:50:48 -07:00
Alfredo Garcia	9c387521bd	Print endpoint addresses at startup (#867) * print tracing and metrics endpoints in startup * print network address in startup	2020-08-10 12:47:26 -07:00
teor	e95358dbe3	fix: Increase the number of retries The old sync code relied on duplicate block fetches to make progress, but the last few commits have removed some of those duplicates. Instead, just retry the fetches that fail.	2020-08-10 18:58:21 +10:00
teor	faac50697c	feature: Add a verified blocks metrics counter We have a counter for pending "download and verify" futures. But these futures are spawned, so they can complete in any order. They can also complete before we receive their results.	2020-08-10 15:12:08 +10:00
teor	6aeefcee8b	fix: Improve sync diagnostics	2020-08-10 15:12:08 +10:00
Henry de Valence	6d1a4b2218	Load config after initializing the Terminal (#848 )	2020-08-06 17:22:40 -07:00
Alfredo Garcia	c52481c041	fix logs	2020-08-07 09:21:57 +10:00
Jane Lusby	3e9c6f054b	fix log level default for server commands (#840 ) * fix log level default for server commands * remove dbg	2020-08-06 11:23:00 -07:00
Henry de Valence	a77328ad7c	Refactor tracing components (#834 ) * Split tracing component code into modules. * Repatriate Tracing and simplify config handling. We upstreamed our Tracing component, expecting not to have to exert fine control over the tracing settings. But this turned out not to be the case, and now that we want to do other things (flamegraphs, journalctl, opentelemetry, etc), we end up with really awkward code (as in the current flamegraph handling). This also makes use of the changes to `init()` to load the config early to pass configuration data into the components, which avoids the need for the refactoring in #775. Finally, we restore support for the `-v` flag when the filter is unset. Closes #831. * Disable tracing and metrics endpoints by default. Closes #660. * Switch back to upstream Abscissa. * Integrate flamegraph support into the new Tracing component. * Pass -v in acceptance tests to get info-level output. * Clean up acceptance test code.	2020-08-06 10:29:31 -07:00
Jane Lusby	867dd0b475	Setup tracing-flame for use profiling zebrad (#436 ) * Setup tracing-flame for use profiling zebrad * start work on conditional flamegraph generation * review time! * update comments * Update Cargo.toml * disable default features for inferno * reorganize * missing one trait * Apply suggestions from code review * graceful shutdown! * remove special case handling on ctrlc for cleanup * rename signal fn to better represent its responsibility * remove unused global hook for flushing flamegraph * move tracing logic to the right file * just copy linkerd's signal handling logic * update book * make zebrad app drop on shutdown normally * Update zebrad/src/components/tokio.rs Co-authored-by: teor <teor@riseup.net> * Update zebrad/src/application.rs Co-authored-by: teor <teor@riseup.net> * Apply suggestions from code review Co-authored-by: teor <teor@riseup.net> * cleanup a little * ooh yea there's an API for that * setup env-filter for backup subscriber * document env filter * document return codes * forgot to save * Update book/src/applications/zebrad.md Co-authored-by: teor <teor@riseup.net> Co-authored-by: teor <teor@riseup.net>	2020-08-05 16:35:56 -07:00
Henry de Valence	4a03d76a41	Remove environment variables in favor of documented config options. (#827 ) * Load tracing filter only from config and simplify logic. * Configure the state storage in the config, not an environment variable. This also changes the config so that the path is always set rather than being optional, because Zebra always needs a place to store its config.	2020-08-05 11:48:08 -07:00
Henry de Valence	82da4a5326	Remove connect command.	2020-08-04 23:34:45 -07:00
Alfredo Garcia	e037466e26	Acceptance tests - check kill signal (#814 ) * check kill signal exit code * change names and add docs * change exit_status() to was_killed() * change assert calls	2020-08-04 13:38:39 -07:00
dependabot[bot]	8e268150a7	build(deps): bump tracing-subscriber from 0.2.9 to 0.2.10 Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.9 to 0.2.10. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.9...tracing-subscriber-0.2.10) Signed-off-by: dependabot[bot] <support@github.com>	2020-08-03 21:11:50 -04:00
Alfredo Garcia	5f23970377	move env variable creation to test_cmd	2020-08-03 15:50:48 -04:00
Alfredo Garcia	2dacd0a62b	change default state path	2020-08-03 15:50:48 -04:00
Alfredo Garcia	f2d7bb3177	Command execution tests (#690 ) * add zebrad acceptance tests * add custom command test helpers that work with kill * add and use info event for start and seed commands * combine conflicting tests into one test case Co-authored-by: Jane Lusby <jane@zfnd.org>	2020-08-01 16:15:26 +10:00
Alfredo Garcia	617f1d80ef	move docs to zebra book	2020-07-29 19:44:21 -07:00
Alfredo Garcia	6297a7cd19	document zebrad environment variables	2020-07-29 19:44:21 -07:00
teor	050c46388f	fix: Open the endpoints after the config is loaded We get the injected TokioComponent dependency before the config is loaded, so we can't use it to open the endpoints. And we can't define after_config, because we use derive(Component). So we work around these issues by opening the endpoints manually, from the application's after_config.	2020-07-29 16:03:52 +10:00
teor	e7437cc551	feature: Get endpoint addresses from config	2020-07-29 16:03:52 +10:00
teor	11090dbf91	feature: Separate Mainnet and Testnet state	2020-07-29 01:45:19 -04:00
Alfredo Garcia	5b3c6e4c6c	Port bash checkpoint scripts to zebra-checkpoints single rust binary (#740 ) * make zebra-checkpoints * fix LOOKAHEAD_LIMIT scope * add a default cli path * change doc usage text * add tracing * move MAX_CHECKPOINT_HEIGHT_GAP to zebra-consensus * do byte_reverse_hex in a map	2020-07-25 17:53:00 +10:00
Henry de Valence	b59cfc49b7	sync: create requests sequentially to respect backpressure. This seems like a better design on principle but also appears to give a much nicer sawtooth pattern of queued blocks in the checkpointer and a much smoother pattern of block requests.	2020-07-24 18:36:00 -04:00
Henry de Valence	4aa00ad216	Align crate versions and user-agent with NU numbers. We had a brief discussion on discord and it seemed like we had consensus on the following versioning policy: * zebrad: match major version to NU version, so we will start by releasing zebrad 3.0.0; * zebra-* libraries: start by matching zebrad's version, then increment major versions of each library as we need to make breaking changes (potentially faster than the zebrad version, always respecting semver but making no guarantees about the longevity of major releases). This commit sets all of the crate versions to 3.0.0-alpha.0 -- the -alpha.0 marks it as a prerelease not subject to perfect adherence to compatibility guarantees.	2020-07-24 11:46:37 -07:00
dependabot[bot]	f7c59c99b5	build(deps): bump tracing-subscriber from 0.2.8 to 0.2.9 Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.8 to 0.2.9. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.8...tracing-subscriber-0.2.9) Signed-off-by: dependabot[bot] <support@github.com>	2020-07-24 14:31:44 -04:00
teor	2acfcf3a90	Make the CheckpointVerifier handle partial restarts (#736 ) Also put generic bounds on the BlockVerifier struct, so we get better compilation errors.	2020-07-24 11:47:48 +10:00
teor	77a1fefa1e	Download genesis (#731 ) * feature: Add more CheckpointVerifier tracing * fix: Download the genesis block	2020-07-23 10:56:52 -07:00
Jane Lusby	c1a1493159	use dirs crate for default location of state and config (#714 ) * use dirs crate for default location of state and config * panic if a path isn't specified for zebra-state	2020-07-23 21:12:20 +10:00
teor	c95c825707	fix: Lookup the genesis hash based on the network	2020-07-23 03:46:24 -04:00
Henry de Valence	4a98b8fa0d	Add basic metrics to the syncer.	2020-07-22 21:59:00 -07:00
Henry de Valence	c2c2a28e8b	Improve tracing output in chain verifier	2020-07-22 21:59:00 -07:00
Jane Lusby	7d4e717182	Add block locator request to state layer (#712 ) * Add block locator request to state layer * pass genesis in request * Update zebrad/src/commands/start/sync.rs * fix errors	2020-07-22 18:01:31 -07:00
Henry de Valence	49aa41544d	sync: try to ignore spurious inv messages. Closes #697. per https://github.com/ZcashFoundation/zebra/issues/697#issuecomment-662742971 The response to a getblocks message is an inv message with the hashes of the following blocks. However, inv messages are also sent unsolicited to gossip new blocks across the network. Normally, this wouldn't be a problem, because for every other request we filter only for the messages that are relevant to us. But because the response to a getblocks message is an inv, the network layer doesn't (and can't) distinguish between the response inv and the unsolicited inv. But there is a mitigation we can do. In our sync algorithm we have two phases: (1) "ObtainTips" to get a set of tips to chase down, (2) repeatedly call "ExtendTips" to extend those as far as possible. The unsolicited inv messages have length 1, but when extending tips we expect to get more than one hash. So we could reject responses in ExtendTips that have length 1 in order to ignore these messages. This way we automatically ignore gossip messages during initial block sync (while we're extending a tip) but we don't ignore length-1 responses while trying to obtain tips (while querying the network for new tips).	2020-07-22 17:55:52 -07:00
teor	9b97ebbd61	feature: Choose checkpoints based on the config	2020-07-23 10:26:25 +10:00
teor	3d721a96a5	feature: Add the state config to the config file	2020-07-23 10:26:25 +10:00
teor	89ac2793d6	feature: Use ChainVerifier in the sync service	2020-07-23 10:26:25 +10:00
Jane Lusby	a722cf33f7	enable new tracing instrumentation in tokio	2020-07-22 14:39:54 -04:00
Henry de Valence	928b0beb5d	sync: unindent fetch task	2020-07-21 20:16:23 -07:00
Henry de Valence	b722818e02	sync: remove redundant tracing specifier Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-07-21 20:16:23 -07:00
Henry de Valence	1047d2f690	sync: add backpressure to syncer Closes #617. Closes #698. The remaining work on the syncer is alluded to in a new comment: 1. Correctly constructing a block locator object 2. Detecting when we've stopped making progress syncing and restarting obtain_tips.	2020-07-21 20:16:23 -07:00
Alfredo Garcia	db2eb80b3e	Create consensus utils and move byte_reverse_hex function to it (#705 ) * move byte_reverse_hex function	2020-07-22 12:29:14 +10:00
teor	e5bb96715f	fix: Reduce sync error logs to info or warn Network issues are very common.	2020-07-21 10:13:03 -07:00
teor	a0dbe85acd	fix: Rewrite the config usage comment	2020-07-21 12:58:55 -04:00
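
The message for commit 49aa41544d above describes a length-based heuristic for ignoring unsolicited `inv` gossip during sync. The following is a minimal sketch of that rule only, not the code from the commit. The names `SyncPhase` and `accept_inv_response` are hypothetical, hashes are modeled as plain 32-byte arrays, and rejecting an empty response during ObtainTips is an added assumption that the commit message does not state.

```rust
/// The two sync phases named in the commit message.
/// (Hypothetical type, introduced only for this illustration.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SyncPhase {
    /// Querying the network for a set of tips to chase down.
    ObtainTips,
    /// Repeatedly extending the tips found earlier.
    ExtendTips,
}

/// Decide whether the hashes from an `inv` message should be treated as a
/// response to our `getblocks` request. (Hypothetical helper.)
///
/// Unsolicited gossip `inv` messages carry a single hash, while a useful
/// ExtendTips response is expected to carry more than one, so length-1
/// responses are rejected in that phase only.
pub fn accept_inv_response(phase: SyncPhase, hashes: &[[u8; 32]]) -> bool {
    match phase {
        // Length-1 responses are still allowed while obtaining tips.
        // Assumption: an empty response is never useful, so reject it.
        SyncPhase::ObtainTips => !hashes.is_empty(),
        // A single hash while extending is likely gossip, so ignore it.
        SyncPhase::ExtendTips => hashes.len() > 1,
    }
}
```

Under this sketch, a one-hash `inv` is accepted in `SyncPhase::ObtainTips` and ignored in `SyncPhase::ExtendTips`. The tradeoff is that a genuine ExtendTips response containing exactly one hash is also dropped.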

388 Commits