zebra

Commit Graph

Author	SHA1	Message	Date
Deirdre Connolly	1b09538277	Bump versions for zebrad 1.0.0-alpha.1 (#1646 ) * Bump versions where appropriate Tested with cargo install --locked --path etc * Remove fixed panics from 'Known Issues' * Change to alpha release series in the README Co-authored-by: teor <teor@riseup.net>	2021-01-27 20:31:39 -05:00
teor	391c53aa60	Move BoxError to zebrad's lib.rs For consistency with other crates.	2021-01-27 12:14:27 -08:00
teor	9cdf41f5f4	Panic if the lookahead limit is misconfigured (#1589 )	2021-01-14 14:06:30 +10:00
teor	92d95d4be5	Refactor inbound members into a consistent order And add download comments	2021-01-13 20:46:25 -05:00
teor	fb76eb2e6b	Add download and verify timeouts to the inbound service	2021-01-13 20:46:25 -05:00
teor	973aec8ccc	Refactor sync members into a consistent order And add comments about correctness and usage.	2021-01-13 20:46:25 -05:00
teor	c2893dce51	Warn when the user's configured lookahead limit is ignored	2021-01-13 20:46:25 -05:00
teor	3699bbdae6	Add some additional sync correctness constraints And adjust the sync restart delay as a consequence.	2021-01-13 20:46:25 -05:00
teor	cef0a492d8	Add a timeout to sync service block verification This timeout stops the sync service hanging when it is missing required blocks, but the lookahead queue is full of dependent verify tasks, so the missing blocks never get downloaded.	2021-01-13 20:46:25 -05:00
teor	b1f14f47c6	Rewrite GetData handling to match the zcashd implementation (#1518 ) * Rewrite GetData handling to match the zcashd implementation `zcashd` silently ignores missing blocks, but sends found transactions followed by a `NotFound` message: `e7b425298f/src/main.cpp (L5497)` This is significantly different to the behaviour expected by the old Zebra connection state machine, which expected `NotFound` for blocks. Also change Zebra's GetData responses to peer request so they ignore missing blocks. * Stop hanging on incomplete transaction or block responses Instead, if the peer sends an unexpected block, unexpected transaction, or NotFound message: 1. end the request, and return a partial response containing any items that were successfully received 2. if none of the expected blocks or transactions were received, return an error, and close the connection	2021-01-04 13:25:35 +10:00
teor	69fcf64d6c	Disable issue URLs for "duplicate hash" errors (#1517 ) In our README, we tell users to ignore these errors, so we should also disable the issue URL. Also include the hash in the error. (We don't want the span active for all messages, we just want the hash in the error.)	2020-12-16 08:14:42 +10:00
Alfredo Garcia	41833340c1	downgrade remaining version strings to 1.0.0-alpha.0 (#1488 )	2020-12-15 11:21:00 +10:00
Deirdre Connolly	44e1051dee	Debug	2020-12-09 13:06:18 -05:00
Deirdre Connolly	25f6fd25b3	Test catching panic	2020-12-09 13:06:18 -05:00
teor	97d1a81b7c	Automatically disable colors when tracing to a file	2020-12-02 10:25:44 -08:00
Jane Lusby	fceef849cf	remove unused mutability to defuse deadlock	2020-12-01 11:03:13 -05:00
Henry de Valence	1df9284444	zebrad: add a use_color option to the tracing config. This is useful for creating searchable logs without having to filter color codes after the fact.	2020-11-30 15:25:50 -08:00
Henry de Valence	e8c16b172f	zebrad: pass TracingSection to Tracing component	2020-11-30 15:25:50 -08:00
Alfredo Garcia	4544463059	Inbound `FindBlocks` and `FindHeaders` (#1347 ) * implement inbound `FindBlocks` * Handle inbound peer FindHeaders requests * handle request before having any chain tip * Split `find_chain_hashes` into smaller functions Add a `max_len` argument to support `FindHeaders` requests. Rewrite the hash collection code to use heights, so we can handle the `stop` hash and "no intersection" cases correctly. * Split state height functions into "any chain" and "best chain" * Rename the best chain block method to `best_block` * Move fmt utilities to zebra_chain::fmt * Summarise Debug for some Message variants Co-authored-by: teor <teor@riseup.net> Co-authored-by: Jane Lusby <jlusby42@gmail.com>	2020-12-01 07:30:37 +10:00
Henry de Valence	fa02b266ca	clippy	2020-11-25 10:55:44 -08:00
Henry de Valence	de8415dcb1	tidy spans	2020-11-25 10:55:44 -08:00
Henry de Valence	05837797b1	tidy imports	2020-11-25 10:55:44 -08:00
Henry de Valence	77bf327b07	fix errors (2)	2020-11-25 10:55:44 -08:00
Henry de Valence	527f4d39ed	fix errors	2020-11-25 10:55:44 -08:00
Henry de Valence	e645e3bf0c	remove async	2020-11-25 10:55:44 -08:00
Henry de Valence	6569977549	test compile change	2020-11-25 10:55:44 -08:00
Alfredo Garcia	486e55104a	create Downloads for Inbound	2020-11-25 10:55:44 -08:00
Henry de Valence	2a4a89c002	state,zebrad: tidy span levels for good INFO output This provides useful and not too noisy output at INFO level. We do an info-level message on every block commit instead of trying to do one message every N blocks, because this is useful both for initial block sync as well as continuous state updates on new blocks.	2020-11-23 14:16:39 +10:00
Henry de Valence	f0810b028d	state,consensus,sync: shorten span lengths These changes help reduce the size of the resulting spans, making the output more compact. Together they save about 30-40 characters.	2020-11-23 14:16:39 +10:00
Henry de Valence	ba3c19142c	deps: update hyper, metrics to tokio 0.3 The metrics code becomes much simpler because the current version of the metrics crate builds its own single-threaded runtime on a dedicated worker thread, so no dependency on the main Zebra Tokio runtime is required.	2020-11-20 10:08:16 -08:00
Henry de Valence	add94c1c45	deps: move to tokio 0.3, tower 0.4 This change is mostly mechanical, with the exception of the changes to the `tower-batch` middleware. This middleware was adapted from `tower::buffer`, and the `tower::buffer` code was changed to implement its own bounded queue, because Tokio 0.3 removed the `mpsc::Sender::poll_send` method. See `ddc64e8d4d` for more context on the Tower changes. To match Tower as closely as possible in order to be able to upstream `tower-batch`, those changes are copied from `tower::Buffer` to `tower-batch`.	2020-11-20 10:08:16 -08:00
Henry de Valence	4953f21670	fixup! zebrad: hack to skip alreadyverified errors	2020-11-18 03:09:06 -05:00
Henry de Valence	aa7538ab15	zebrad: hack to skip alreadyverified errors	2020-11-17 14:56:27 -08:00
Henry de Valence	e55392b61e	zebrad: explicitly select the threaded scheduler.	2020-11-17 14:56:27 -08:00
Henry de Valence	6de824bd99	zebrad: remove block verification timeout Because we set the lookahead limit to be at least twice the size of a checkpoint, we don't have a risk of timeouts.	2020-11-17 14:56:27 -08:00
Henry de Valence	e9c847bbd7	zebrad: avoid a borrow in the ChainSync future	2020-11-17 14:56:27 -08:00
Henry de Valence	b632a24436	zebrad: add diagnostics on cancelled download tasks	2020-11-17 14:56:27 -08:00
Henry de Valence	ec411574ee	zebrad: improve sync diagnostics	2020-11-17 14:56:27 -08:00
Henry de Valence	e0c92167bc	Revert "Hedge every syncer block download request" This reverts commit `656bd24ba7`. The Hedge middleware keeps a pair of histograms, writing into one in the current time interval and reading from the previous time interval's data. This means that the reverted change resulted in doubling all block downloads until after at least the second measurement interval (which means that the time measurements are also incorrect, as they're operating under double the network load...)	2020-11-12 16:45:47 -05:00
Alfredo Garcia	128643d81e	Call `zebra_test::init` where needed. (#1227 ) * Add missing `zebra_test::init()` to zebra-chain * Add missing `zebra_test::init()` to zebra-consensus * Add missing `zebra_test::init()` to zebra-network * Add missing `zebra_test::init()` to zebra-state * Add missing `zebra_test::init()` to zebra-test * Add missing `zebra_test::init()` to zebrad	2020-11-10 10:29:25 +10:00
Henry de Valence	0ad648fb6a	zebrad: make lookahead limit configurable. Sets the default value to the previous lookahead limit. My testing on mainnet suggested that the newly lower value (changed when the checkpoint frequency was decreased) is low enough to cause stalls, even when using hedged requests.	2020-11-01 10:47:46 -08:00
teor	92c623eddf	Log each genesis download This change helps us diagnose sync hangs.	2020-10-28 11:31:04 -04:00
teor	656bd24ba7	Hedge every syncer block download request Remove the minimum data points from the syncer hedge configuragtion. When there are no data points, hedge sends the second request immediately. Where there are less than 1/(1-latency_percentile) data points (20), hedge delays the second request by the highest recent download time. This change should improve genesis and post-restart sync latency.	2020-10-28 11:31:04 -04:00
Henry de Valence	4c960c4e6d	zebrad: treat duplicate downloads as an error We should error if we notice that we're attempting to download the same blocks multiple times, because that indicates that peers reported bad information to us, or we got confused trying to interpret their responses.	2020-10-26 12:05:35 -07:00
Henry de Valence	4127d086ea	zebrad: clarify hedge layering motivation Co-authored-by: teor <teor@riseup.net>	2020-10-26 12:05:35 -07:00
Henry de Valence	253bab042e	sync: add a concurrency limit for block downloads	2020-10-26 12:05:35 -07:00
Henry de Valence	0a405c737d	zebrad: check state in obtaintips, not extendtips. The original sync algorithm split the sync process into two phases, one that obtained prospective chain tips, and another that attempted to extend those chain tips as far as possible until encountering an error (at which point the prospective state is discarded and the process restarts). Because a previous implementation of this algorithm didn't properly enforce linkage between segments of the chain while extending tips, sometimes it would get confused and fail to discard responses that did not extend a tip. To mitigate this, a check against the state was added. However, this check can cause stalls while checkpointing, because when a checkpoint is reached we may suddenly need to commit thousands of blocks to the state. Because the sync algorithm now has a a `CheckedTip` structure that ensures that a new segment of hashes actually extends an existing one, we don't need to check against the state while extending a tip, because we don't get confused while interpreting responses. This change results in significantly smoother progress on mainnet.	2020-10-26 12:05:35 -07:00
Henry de Valence	ce2ac3336f	zebrad: add debug message before state check This reveals that there may be contention in access to the state, as this takes a long time.	2020-10-26 12:05:35 -07:00
Henry de Valence	91469faf3c	zebrad: eliminate duplicate span in sync	2020-10-26 12:05:35 -07:00
Henry de Valence	b5a43f4516	zebrad: remove implementation details from docs The timeout behavior in zebra-network is an implementation detail, not a feature of the public API. So it shouldn't be mentioned in the doc comments -- if we want timeout behavior, we have to layer it ourselves.	2020-10-26 12:05:35 -07:00
Henry de Valence	1d7309afe2	zebrad: correctly handle duplicates in DownloadSet Using the cancel_handles, we can deduplicate requests. This is important to do, because otherwise when we insert the second cancel handle, we'd drop the first one, cancelling an existing task for no reason.	2020-10-26 12:05:35 -07:00
Henry de Valence	56fe4f4379	zebrad: unify sync restart logic This lets us keep the main loop simple and just write `continue 'sync;` to keep going.	2020-10-26 12:05:35 -07:00
Henry de Valence	12d25159c6	zebrad: use hedged requests in sync The hedge middleware implements hedged requests, as described in _The Tail At Scale_. The idea is that we auto-tune our retry logic according to the actual network conditions, pre-emptively retrying requests that exceed some latency percentile. This would hopefully solve the problem where our timeouts are too long on mainnet and too slow on testnet.	2020-10-26 12:05:35 -07:00
Henry de Valence	5f229d1475	zebrad: use Downloads in sync Try to use the better cancellation logic to revert to previous sync algorithm. As designed, the sync algorithm is supposed to proceed by downloading state prospectively and handle errors by flushing the pipeline and starting over. This hasn't worked well, because we didn't previously cancel tasks properly. Now that we can, try to use something in the spirit of the original sync algorithm.	2020-10-26 12:05:35 -07:00
Henry de Valence	b90581a3d7	zebrad: create a Downloads Stream for syncing. This makes two changes relative to the existing download code: 1. It uses a oneshot to attempt to cancel the download task after it has started; 2. It encapsulates the download creation and cancellation logic into a Downloads struct.	2020-10-26 12:05:35 -07:00
Henry de Valence	b636660d6a	zebrad: rename sync::Error alias to BoxError.	2020-10-26 12:05:35 -07:00
Henry de Valence	eb43893de0	consensus: minimize API, clean docs This reduces the API surface to the minimum required for functionality, and cleans up module documentation. The stub mempool module is deleted entirely, since it will need to be redone later anyways.	2020-10-20 11:16:22 -04:00
Alfredo Garcia	c0a14ecc8c	move genesis parameters to zebra-chain (#1151 )	2020-10-12 14:08:23 -07:00
Jane Lusby	855f9b5bcb	Implement MVP of NonFinalizedState and integrate it with the state service (#1101 ) * implement most of the chain functions * implement fork * fix outpoint handling in Chain struct * update expect for work * split utxo into two sets * update the Chain definition * remove allow attribute in zebra-state/lib.rs * merge ChainSet type into MemoryState * Add error messages to asserts * export proptest impls for use in downstream crates * add testjob for disabled feature in zebra-chain * try to fix github actions syntax * add module doc comment * update RFC for utxos * add missing header * working proptest for Chain * propagate back results over channel * Start updating RFC to match changes * implement queued block pruning * and now it syncs wooo! * remove empty modules * setup config for proptests * re-enable missing_docs lint * update RFC to match changes in impl * add documentation * use more explicit variable names	2020-10-08 13:07:32 +10:00
Henry de Valence	fe61090a64	zebrad: make Inbound Poll::Ready before setup. The Inbound service only needs the network setup for some requests, but it can service other requests without it. Making it return Poll::Pending until the network setup finishes means that initial network connections may view the Inbound service as overloaded and attempt to load-shed.	2020-09-21 09:26:39 -07:00
Henry de Valence	9c021025a7	network: fill in remaining request/response pairs	2020-09-20 10:21:18 -07:00
Henry de Valence	4b35fea492	zebrad: document Inbound, ChainSync responsibilities	2020-09-18 18:34:25 -07:00
Henry de Valence	65877cb4b1	zebrad: make Inbound propagate backpressure	2020-09-18 18:34:25 -07:00
Henry de Valence	55f46967b2	zebrad: serve blocks from Inbound service The original version of this commit ran into https://github.com/rust-lang/rust/issues/64552 again. Thanks to @yaahc for suggesting a workaround (using futures combinators to avoid writing an async block).	2020-09-18 18:34:25 -07:00
Henry de Valence	1d0ebf89c6	zebrad: move seed command into inbound component Remove the seed command entirely, and make the behavior it provided (responding to `Request::Peers`) part of the ordinary functioning of the start command. The new `Inbound` service should be expanded to handle all request types.	2020-09-18 18:34:25 -07:00
Jane Lusby	ca648ff27c	Enable issue-url feature in color-eyre (#1072 ) * Enable issue-url feature in color-eyre * get version automatically * and the url!	2020-09-17 15:09:18 -07:00
Henry de Valence	3133214e4f	zebrad: use new state API	2020-09-11 13:37:49 -07:00
Henry de Valence	24de90c900	zebrad: tidy sync imports	2020-09-10 09:45:52 -07:00
Henry de Valence	9b6e66c1b9	zebrad: rename Syncer to ChainSync This name clarifies what is being synced and avoids an agent-noun construction.	2020-09-10 09:45:52 -07:00
Henry de Valence	0bc79686b8	zebrad: move sync into components module. Part of #1030.	2020-09-10 09:45:52 -07:00
teor	a6d6e65940	fix: fix the flamegraph module comment	2020-09-01 11:40:18 -04:00
Alfredo Garcia	9c387521bd	Print endpoint addresses at startup (#867 ) * print tracing and metrics endpoints in startup * print network address in startup	2020-08-10 12:47:26 -07:00
Henry de Valence	a77328ad7c	Refactor tracing components (#834 ) * Split tracing component code into modules. * Repatriate Tracing and simplify config handling. We upstreamed our Tracing component, expecting not to have to exert fine control over the tracing settings. But this turned out not to be the case, and now that we want to do other things (flamegraphs, journalctl, opentelemetry, etc), we end up with really awkward code (as in the current flamegraph handling). This also makes use of the changes to `init()` to load the config early to pass configuration data into the components, which avoids the need for the refactoring in #775. Finally, we restore support for the `-v` flag when the filter is unset. Closes #831. * Disable tracing and metrics endpoints by default. Closes #660. * Switch back to upstream Abscissa. * Integrate flamegraph support into the new Tracing component. * Pass -v in acceptance tests to get info-level output. * Clean up acceptance test code.	2020-08-06 10:29:31 -07:00
Jane Lusby	867dd0b475	Setup tracing-flame for use profiling zebrad (#436 ) * Setup tracing-flame for use profiling zebrad * start work on conditional flamegraph generation * review time! * update comments * Update Cargo.toml * disable default features for inferno * reorganize * missing one trait * Apply suggestions from code review * graceful shutdown! * remove special case handling on ctrlc for cleanup * rename signal fn to better represent its responsibility * remove unused global hook for flushing flamegraph * move tracing logic to the right file * just copy linkerd's signal handling logic * update book * make zebrad app drop on shutdown normally * Update zebrad/src/components/tokio.rs Co-authored-by: teor <teor@riseup.net> * Update zebrad/src/application.rs Co-authored-by: teor <teor@riseup.net> * Apply suggestions from code review Co-authored-by: teor <teor@riseup.net> * cleanup a little * ooh yea there's an API for that * setup env-filter for backup subscriber * document env filter * document return codes * forgot to save * Update book/src/applications/zebrad.md Co-authored-by: teor <teor@riseup.net> Co-authored-by: teor <teor@riseup.net>	2020-08-05 16:35:56 -07:00
teor	050c46388f	fix: Open the endpoints after the config is loaded We get the injected TokioComponent dependency before the config is loaded, so we can't use it to open the endpoints. And we can't define after_config, because we use derive(Component). So we work around these issues by opening the endpoints manually, from the application's after_config.	2020-07-29 16:03:52 +10:00
teor	e7437cc551	feature: Get endpoint addresses from config	2020-07-29 16:03:52 +10:00
Deirdre Connolly	05316dee21	Listen on 0.0.0.0, not 127.0.0.1 Turns out when your node faces the internet directly, it has to listen to those addresses directly.	2020-06-19 03:46:09 -04:00
Henry de Valence	f98cda40f9	Remove unused import.	2020-02-21 06:48:25 -05:00
Henry de Valence	75d3d44fb3	Metrics MVP: add two metrics and export them to Prometheus. Co-authored-by: Deirdre Connolly <deirdre@zfnd.org>	2020-02-14 20:14:05 -05:00
Henry de Valence	4fcb550aa6	Fix a deadlock in TokioComponent. The components are accessed by a lock on application state. When some command calls block_on to enter an async context, it obtained a write lock on the entire application state. This meant that if the application state were accessed later in an async context, a deadlock would occur. Instead the TokioComponent holds an Option<Runtime> now, so that before calling block_on, the caller can .take() the runtime and release the lock. Since we only ever enter an async context once, it's not a problem that the component is then missing its runtime, as once we are inside of a task we can access the runtime.	2020-01-15 12:06:31 -08:00
Henry de Valence	ab3db201ee	Change TracingEndpoint to forward to the Abscissa Tracing component.	2020-01-15 12:06:31 -08:00
Tony Arcieri	45eb81a204	Upgrade to Abscissa v0.5	2020-01-15 12:06:31 -08:00
Henry de Valence	2965187b91	Upgrade tokio, futures, hyper to released versions.	2019-12-13 17:42:15 -05:00
Henry de Valence	79c36a979c	Use try_bind when building tracing endpoint. Prior to this commit, the tracing endpoint would attempt to bind the given address or panic; now, if it is unable to bind the given address it displays an error but continues running the rest of the application. This means that we can spin up multiple Zebra instances for load testing.	2019-09-30 21:30:36 -04:00
Henry de Valence	c8a3d47b56	Use tracing::instrument and monitor for messages.	2019-09-23 22:17:12 -04:00
Henry de Valence	df7801d623	Temporarily change hyper to git version. This avoids some crate selection conflicts, but makes some futures extension traits fall out of order? This seems to be an issue with `pin-project` resolved in the git branch of `hyper` (but not yet released).	2019-09-22 17:27:08 -04:00
Henry de Valence	a64a051276	Clean tracing_subscriber deprecation warnings.	2019-09-20 16:02:55 -04:00
Henry de Valence	e0cd099487	Fix type with updated tracing-subscriber An updated tracing-subscriber version changed one of the public types; because we hardcode the type instead of being generic over S: Subscriber, this was actually a breaking change. As noted in the comment adjacent to this line, we would rather be generic over S, but this requires fixing a bug in abscissa's proc-macros, so in the meantime we hardcode the type.	2019-09-18 17:32:06 -04:00
Deirdre Connolly	162b37fe8d	Tracing endpoint (#3 ) * Add a TracingConfig and some components Co-authored-by: Deirdre Connolly <deirdre@zfnd.org> * Restructure, use dependency injection, initialize tracing * Start a placeholder loop in start command * Add hyper alpha.1, bump tokio to alpha.4 * Hello world endpoint using async/await from hyper 0.13 alpha Also cleaned up some linter messages. Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca> * Update to tracing_subscriber 0.1 * fmt * add rust-toolchain * Remove hyper::Version import * wip: start filter_handler impl * Add .rustfmt.toml * rustfmt * Tidy up .rustfmt.toml * Add filter reloading handling. * bump toolchain * Remove generated hello world acceptance tests. These test the behaviour of the autogenerated binary and work as examples of how to test the behaviour of abscissa binaries. Since we don't print "Hello World" any more, they fail, but we don't yet have replacement behaviour to add tests for, so they're removed for now. * Clean up config file handling with Option::and_then.	2019-09-09 13:05:42 -07:00

1 2 3

139 Commits