zebra

Commit Graph

Author	SHA1	Message	Date
Deirdre Connolly	4a67e0e7bb	Enable stateful/long sync tests by features, mount rocksdb-based state at Sapling activation for sync_past_sapling_mainnet test	2020-11-24 11:04:30 -05:00
Deirdre Connolly	d813603bac	Remove defunct memory_cache_bytes from test config	2020-11-24 11:04:30 -05:00
Jane Lusby	c2a57d7e49	slight comment tweak	2020-11-24 11:04:30 -05:00
Jane Lusby	99c5acc94f	rename test fn	2020-11-24 11:04:30 -05:00
Jane Lusby	602d8c4898	document tests	2020-11-24 11:04:30 -05:00
Jane Lusby	17fdbe941b	fix stdout issue with test framework for cached data tests	2020-11-24 11:04:30 -05:00
Jane Lusby	0f51891359	revert unnecessary change in sync_until	2020-11-24 11:04:30 -05:00
Jane Lusby	4bfe747f34	update acceptance tests	2020-11-24 11:04:30 -05:00
Jane Lusby	d093b4e528	Add network integration test for quick post sapling sync testing	2020-11-24 11:04:30 -05:00
dependabot[bot]	a4af90c2b0	build(deps): bump color-eyre from 0.5.7 to 0.5.8 Bumps [color-eyre](https://github.com/yaahc/color-eyre) from 0.5.7 to 0.5.8. - [Release notes](https://github.com/yaahc/color-eyre/releases) - [Changelog](https://github.com/yaahc/color-eyre/blob/master/CHANGELOG.md) - [Commits](https://github.com/yaahc/color-eyre/compare/v0.5.7...v0.5.8) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-24 09:59:22 -05:00
Henry de Valence	2a4a89c002	state,zebrad: tidy span levels for good INFO output This provides useful and not too noisy output at INFO level. We do an info-level message on every block commit instead of trying to do one message every N blocks, because this is useful both for initial block sync as well as continuous state updates on new blocks.	2020-11-23 14:16:39 +10:00
Henry de Valence	f0810b028d	state,consensus,sync: shorten span lengths These changes help reduce the size of the resulting spans, making the output more compact. Together they save about 30-40 characters.	2020-11-23 14:16:39 +10:00
teor	d4da9609ee	Update the max_concurrent_block_requests docs In #1298, we decreased `max_concurrent_block_requests`, but forgot to update the docs.	2020-11-20 10:08:57 -08:00
Henry de Valence	ba3c19142c	deps: update hyper, metrics to tokio 0.3 The metrics code becomes much simpler because the current version of the metrics crate builds its own single-threaded runtime on a dedicated worker thread, so no dependency on the main Zebra Tokio runtime is required.	2020-11-20 10:08:16 -08:00
Henry de Valence	add94c1c45	deps: move to tokio 0.3, tower 0.4 This change is mostly mechanical, with the exception of the changes to the `tower-batch` middleware. This middleware was adapted from `tower::buffer`, and the `tower::buffer` code was changed to implement its own bounded queue, because Tokio 0.3 removed the `mpsc::Sender::poll_send` method. See `ddc64e8d4d` for more context on the Tower changes. To match Tower as closely as possible in order to be able to upstream `tower-batch`, those changes are copied from `tower::Buffer` to `tower-batch`.	2020-11-20 10:08:16 -08:00
Jane Lusby	4c9bb87df2	zebra-state: replace sled with rocksdb (#1325) ## Motivation Prior to this PR we've been using `sled` as our database for storing persistent chain data on the disk between boots. We picked sled over rocksdb to minimize our C++ dependencies despite it being a less mature codebase. The theory was if it worked well enough we'd prefer to have a pure Rust codebase, but if we ever ran into problems we knew we could easily swap it out with rocksdb. Well, we ran into problems. Sled's memory usage was particularly high, and it seemed to be leaking memory. On top of all that, the performance for writes was pretty poor, causing us to become bottlenecked on sled instead of the network. ## Solution This PR replaces `sled` with `rocksdb`. We've seen a 10x improvement in memory usage out of the box, no more leaking, and much better write performance. With this change writing chain data to disk is no longer a limiting factor in how quickly we can sync the chain. The code in this pull request has: - [x] Documentation Comments - [x] Unit Tests and Property Tests ## Review @hdevalence	2020-11-18 18:05:06 -08:00
Henry de Valence	4953f21670	fixup! zebrad: hack to skip alreadyverified errors	2020-11-18 03:09:06 -05:00
Henry de Valence	d2fc01755b	zebrad: more reasonable concurrent block limit This helps prevent overloading the network with too many concurrent block requests. On a fast network, we're likely to still have enough room to saturate our bandwidth. In the worst case, with 2MB blocks, downloading 50 blocks concurrently is 100MB of queued downloads. If we need to download this in 20 seconds to avoid peer connection timeouts, the implied worst-case minimum speed is 5MB/s. In practice, this minimum speed will likely be much lower.	2020-11-17 14:56:27 -08:00
Henry de Valence	aa7538ab15	zebrad: hack to skip alreadyverified errors	2020-11-17 14:56:27 -08:00
Henry de Valence	e55392b61e	zebrad: explicitly select the threaded scheduler.	2020-11-17 14:56:27 -08:00
Henry de Valence	6de824bd99	zebrad: remove block verification timeout Because we set the lookahead limit to be at least twice the size of a checkpoint, we don't have a risk of timeouts.	2020-11-17 14:56:27 -08:00
Henry de Valence	e9c847bbd7	zebrad: avoid a borrow in the ChainSync future	2020-11-17 14:56:27 -08:00
Henry de Valence	b632a24436	zebrad: add diagnostics on cancelled download tasks	2020-11-17 14:56:27 -08:00
Henry de Valence	ec411574ee	zebrad: improve sync diagnostics	2020-11-17 14:56:27 -08:00
teor	54cb9277ef	Allow some new clippy nightly lints	2020-11-17 10:07:37 +10:00
dependabot[bot]	8c5f6d0177	build(deps): bump once_cell from 1.5.1 to 1.5.2 Bumps [once_cell](https://github.com/matklad/once_cell) from 1.5.1 to 1.5.2. - [Release notes](https://github.com/matklad/once_cell/releases) - [Changelog](https://github.com/matklad/once_cell/blob/master/CHANGELOG.md) - [Commits](https://github.com/matklad/once_cell/compare/v1.5.1...v1.5.2) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-13 14:48:11 -05:00
Jane Lusby	7c0275ac0b	reorganize stop check (#1288) * reorganize stop check * remove unused enum * move out and make it unique Co-authored-by: teor <teor@riseup.net>	2020-11-13 11:37:52 +10:00
Henry de Valence	e0c92167bc	Revert "Hedge every syncer block download request" This reverts commit `656bd24ba7`. The Hedge middleware keeps a pair of histograms, writing into one in the current time interval and reading from the previous time interval's data. This means that the reverted change resulted in doubling all block downloads until after at least the second measurement interval (which means that the time measurements are also incorrect, as they're operating under double the network load...)	2020-11-12 16:45:47 -05:00
Alfredo Garcia	128643d81e	Call `zebra_test::init` where needed. (#1227) * Add missing `zebra_test::init()` to zebra-chain * Add missing `zebra_test::init()` to zebra-consensus * Add missing `zebra_test::init()` to zebra-network * Add missing `zebra_test::init()` to zebra-state * Add missing `zebra_test::init()` to zebra-test * Add missing `zebra_test::init()` to zebrad	2020-11-10 10:29:25 +10:00
teor	efef2a2bd7	Reduce acceptance test sled memory usage (#1236) * Use the default memory limit in the acceptance tests PR #1233 changed the default `memory_cache_bytes`, but left the acceptance tests with their old value.	2020-11-10 07:42:30 +10:00
dependabot[bot]	a58299a0f0	build(deps): bump color-eyre from 0.5.6 to 0.5.7 Bumps [color-eyre](https://github.com/yaahc/color-eyre) from 0.5.6 to 0.5.7. - [Release notes](https://github.com/yaahc/color-eyre/releases) - [Changelog](https://github.com/yaahc/color-eyre/blob/master/CHANGELOG.md) - [Commits](https://github.com/yaahc/color-eyre/compare/v0.5.6...v0.5.7) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-09 08:40:55 -05:00
dependabot[bot]	1e3cf6dc5c	build(deps): bump tracing-subscriber from 0.2.14 to 0.2.15 Bumps [tracing-subscriber](https://github.com/tokio-rs/tracing) from 0.2.14 to 0.2.15. - [Release notes](https://github.com/tokio-rs/tracing/releases) - [Commits](https://github.com/tokio-rs/tracing/compare/tracing-subscriber-0.2.14...tracing-subscriber-0.2.15) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-04 20:37:40 -05:00
dependabot[bot]	785fc30481	build(deps): bump hyper from 0.13.8 to 0.13.9 Bumps [hyper](https://github.com/hyperium/hyper) from 0.13.8 to 0.13.9. - [Release notes](https://github.com/hyperium/hyper/releases) - [Changelog](https://github.com/hyperium/hyper/blob/master/CHANGELOG.md) - [Commits](https://github.com/hyperium/hyper/compare/v0.13.8...v0.13.9) Signed-off-by: dependabot[bot] <support@github.com>	2020-11-04 20:07:18 -05:00
Henry de Valence	0ad648fb6a	zebrad: make lookahead limit configurable. Sets the default value to the previous lookahead limit. My testing on mainnet suggested that the newly lower value (changed when the checkpoint frequency was decreased) is low enough to cause stalls, even when using hedged requests.	2020-11-01 10:47:46 -08:00
teor	92c623eddf	Log each genesis download This change helps us diagnose sync hangs.	2020-10-28 11:31:04 -04:00
teor	656bd24ba7	Hedge every syncer block download request Remove the minimum data points from the syncer hedge configuration. When there are no data points, hedge sends the second request immediately. When there are fewer than 1/(1-latency_percentile) data points (20), hedge delays the second request by the highest recent download time. This change should improve genesis and post-restart sync latency.	2020-10-28 11:31:04 -04:00
teor	ea510b7d41	Run a block sync in CI with 2 large checkpoints (#1193) * Run large checkpoint sync tests in CI * Improve test child output match error context * Add a debug_stop_at_height config * Use stop at height in acceptance tests And add some restart acceptance tests, to make sure the stop at height feature works correctly.	2020-10-27 19:25:29 +10:00
Henry de Valence	4c960c4e6d	zebrad: treat duplicate downloads as an error We should error if we notice that we're attempting to download the same blocks multiple times, because that indicates that peers reported bad information to us, or we got confused trying to interpret their responses.	2020-10-26 12:05:35 -07:00
Henry de Valence	4127d086ea	zebrad: clarify hedge layering motivation Co-authored-by: teor <teor@riseup.net>	2020-10-26 12:05:35 -07:00
Henry de Valence	253bab042e	sync: add a concurrency limit for block downloads	2020-10-26 12:05:35 -07:00
Henry de Valence	0a405c737d	zebrad: check state in obtaintips, not extendtips. The original sync algorithm split the sync process into two phases, one that obtained prospective chain tips, and another that attempted to extend those chain tips as far as possible until encountering an error (at which point the prospective state is discarded and the process restarts). Because a previous implementation of this algorithm didn't properly enforce linkage between segments of the chain while extending tips, sometimes it would get confused and fail to discard responses that did not extend a tip. To mitigate this, a check against the state was added. However, this check can cause stalls while checkpointing, because when a checkpoint is reached we may suddenly need to commit thousands of blocks to the state. Because the sync algorithm now has a `CheckedTip` structure that ensures that a new segment of hashes actually extends an existing one, we don't need to check against the state while extending a tip, because we don't get confused while interpreting responses. This change results in significantly smoother progress on mainnet.	2020-10-26 12:05:35 -07:00
Henry de Valence	65e0c22fbe	state: don't pre-buffer the service There's no reason to return a pre-Buffer'd service (there's no need for internal access to the state service, as in zebra-network), but wrapping it internally removes control of the buffer size from the caller.	2020-10-26 12:05:35 -07:00
Henry de Valence	ce2ac3336f	zebrad: add debug message before state check This reveals that there may be contention in access to the state, as this takes a long time.	2020-10-26 12:05:35 -07:00
Henry de Valence	91469faf3c	zebrad: eliminate duplicate span in sync	2020-10-26 12:05:35 -07:00
Henry de Valence	b5a43f4516	zebrad: remove implementation details from docs The timeout behavior in zebra-network is an implementation detail, not a feature of the public API. So it shouldn't be mentioned in the doc comments -- if we want timeout behavior, we have to layer it ourselves.	2020-10-26 12:05:35 -07:00
Henry de Valence	1d7309afe2	zebrad: correctly handle duplicates in DownloadSet Using the cancel_handles, we can deduplicate requests. This is important to do, because otherwise when we insert the second cancel handle, we'd drop the first one, cancelling an existing task for no reason.	2020-10-26 12:05:35 -07:00
Henry de Valence	56fe4f4379	zebrad: unify sync restart logic This lets us keep the main loop simple and just write `continue 'sync;` to keep going.	2020-10-26 12:05:35 -07:00
Henry de Valence	12d25159c6	zebrad: use hedged requests in sync The hedge middleware implements hedged requests, as described in _The Tail At Scale_. The idea is that we auto-tune our retry logic according to the actual network conditions, pre-emptively retrying requests that exceed some latency percentile. This would hopefully solve the problem where our timeouts are too long on mainnet and too slow on testnet.	2020-10-26 12:05:35 -07:00
Henry de Valence	5f229d1475	zebrad: use Downloads in sync Try to use the better cancellation logic to revert to previous sync algorithm. As designed, the sync algorithm is supposed to proceed by downloading state prospectively and handle errors by flushing the pipeline and starting over. This hasn't worked well, because we didn't previously cancel tasks properly. Now that we can, try to use something in the spirit of the original sync algorithm.	2020-10-26 12:05:35 -07:00
Henry de Valence	b90581a3d7	zebrad: create a Downloads Stream for syncing. This makes two changes relative to the existing download code: 1. It uses a oneshot to attempt to cancel the download task after it has started; 2. It encapsulates the download creation and cancellation logic into a Downloads struct.	2020-10-26 12:05:35 -07:00
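
Two of the commit messages above carry small calculations: the worst-case download budget in `d2fc01755b` (50 concurrent blocks of 2MB, 20-second timeout) and the hedge data-point threshold `1/(1-latency_percentile)` in `656bd24ba7`. The following is a minimal sketch that reproduces that arithmetic only. The function names are hypothetical and do not come from the Zebra codebase, decimal megabytes are assumed, and the `0.95` percentile is inferred from the "(20)" stated in the message rather than taken from the actual configuration.

```rust
/// Worst-case bytes queued when `concurrent_blocks` downloads are in flight.
/// (Hypothetical helper, not part of Zebra.)
fn queued_bytes(concurrent_blocks: u64, max_block_bytes: u64) -> u64 {
    concurrent_blocks * max_block_bytes
}

/// Minimum sustained speed, in bytes per second, needed to finish the
/// queued downloads before `timeout_secs` elapses.
fn min_speed_bytes_per_sec(queued: u64, timeout_secs: u64) -> u64 {
    queued / timeout_secs
}

/// Threshold from the commit message: 1 / (1 - latency_percentile),
/// rounded to the nearest whole number of data points.
fn hedge_min_data_points(latency_percentile: f64) -> u64 {
    (1.0 / (1.0 - latency_percentile)).round() as u64
}

fn main() {
    // Assumption: 1 MB = 1_000_000 bytes.
    const MB: u64 = 1_000_000;

    let queued = queued_bytes(50, 2 * MB);
    let speed = min_speed_bytes_per_sec(queued, 20);
    println!("queued: {} MB, minimum speed: {} MB/s", queued / MB, speed / MB);

    // Assumption: a 0.95 latency percentile, inferred from the "(20)" above.
    println!("hedge threshold: {} data points", hedge_min_data_points(0.95));
}
```

With the figures from the commit message, this gives 100MB queued and a 5MB/s worst-case minimum speed, matching the numbers stated there.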

305 Commits