* add valid generated config test
* change to pathbuf
* use -c to make sure we are using the generated file
* add and use a ZebraTestDir type
* change approach to generate tempdir in top of each test
* pass tempdir to test_cmd and set current dir to it
* add and use a `generated_config_path` variable in tests
* checkpoint: reject older of duplicate verification requests.
If we get a duplicate block verification request, we should drop the older one
in favor of the newer one, because the older request is likely to have been
canceled. Previously, this code would accept up to four duplicate verification
requests, then fail all subsequent ones.
* sync: add a timeout layer to block requests.
Note that if this timeout is too short, we'll bring down the peer set in a
retry storm.
* sync: restart syncing on error
Restart the syncing process when an error occurs, rather than ignoring it.
Restarting means we discard all tips and start over with a new block locator,
so we can have another chance to "unstuck" ourselves.
* sync: additional debug info
* sync: handle lookahead limit correctly.
Instead of extracting all the completed task results, the previous code pulled
results out until there were fewer tasks than the lookahead limit, then
stopped. This meant that completed tasks could be left until the limit was
exceeded again. Instead, extract all completed results, and use the number of
pending tasks to decide whether to extend the tip or wait for blocks to finish.
* network: add debug instrumentation to retry policy
* sync: instrument the spawned task
* sync: streamline ObtainTips/ExtendTips logic & tracing
This change does three things:
1. It aligns the implementation of ObtainTips and ExtendTips so that they use
the same deduplication method. This means that when debugging we only have one
deduplication algorithm to focus on.
2. It streamlines the tracing output to not include information already
included in spans. Both obtain_tips and extend_tips have their own spans
attached to the events, so it's not necessary to add Scope: prefixes in
messages.
3. It changes the messages to be focused on reporting the actual
events rather than the interpretation of the events (e.g., "got genesis hash in
response" rather than "peer could not extend tip"). The motivation for this
change is that when debugging, the interpretation of events is already known to
be incorrect, in the sense that the mental model of the code (no bug) does not
match its behavior (has bug), so presenting minimally-interpreted events forces
interpretation relative to the actual code.
* sync: hack to work around zcashd behavior
* sync: localize debug statement in extend_tips
* sync: change algorithm to define tips as pairs of hashes.
This is different enough from the existing description that its comments no
longer apply, so I removed them. A further chunk of work is to change the sync
RFC to document this algorithm.
* sync: reduce block timeout
* state: add resource limits for sled
Closes#888
* sync: add a restart timeout constant
* sync: de-pub constants
* change several tests to transcript in consensus chain tests
* rename transcripts
* rename state transcript
* fix spandocs
* add timeout layer to tests
* run transcripts on the wrapped timeout service, remove ready calls
This changes the `light_client_root_hash` field to `light_client_root_bytes` to
hint that it's unparsed, and makes it public to match the rest of the
`BlockHeader` fields. The `LightClientRootHash` serialization methods are
hidden from the public API, so that the `LightClientRootHash` has to be
constructed by the method on the `Block`.
* network: move gossiped peer selection logic into address book.
* network: return BoxService from init.
* zebrad: add note on why we truncate thegossiped peer list
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* Remove unused .rustfmt.toml
Many of these options are never actually loaded by our CI because of a channel
mismatch, where they're not applied on stable but only on nightly (see the logs
from a rustfmt job). This means that we can get different settings when
running `cargo fmt` on the nightly and stable channels, which was causing a CI
failure on this PR. Reverting back to the default rustfmt settings avoids this
problem and keeps us in line with upstream rustfmt. There's no loss to us
since we were using the defaults anyways.
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* fix: Handle known ObtainTips correctly
enumerate never returns a value beyond the end of the vector.
* fix: Ignore known tips in ExtendTips
Some peers send us known tips when we try to extend.
* fix: Ignore known hashes when downloading
Despite all our other checks, we still end up downloading some hashes
multiple times.
* fix: Increase the number of retries
The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.
Instead, just retry the fetches that fail.
* fix: Tweak comments
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* fix: Cleanup the state_contains interface in Sync
* Fix brackets
Oops
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
* make sure no info is printed in non server tests
* check exact full output for validity instead of log msgs
* add end of output character to version regex
* use coercions, use equality operator
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
The old sync code relied on duplicate block fetches to make progress,
but the last few commits have removed some of those duplicates.
Instead, just retry the fetches that fail.
Instead of creating a block locator all the way back to the genesis
block, only ask for blocks within the reorg limit (99 blocks).
Use the reorg limit as the final locator. (Or if the chain is less
than 99 blocks, use the genesis block.)
Fixes some instances of #818 at very small block heights.
The state service was providing block locators starting at the parent of
the current tip. Instead, include the current tip in the block locator.
Also handle an edge case where we could include the genesis block twice,
if the current tip height was a power of two.
Fixes an instance of #818 where we re-download the current tip.
* use the right variant in LightClientRootHash::from_bytes()
* make block.header.light_client_root_hash pub(super)
* add tests for LightClientRootHash and block.light_client_root_hash
We have a counter for pending "download and verify" futures. But these
futures are spawned, so they can complete in any order. They can also
complete before we receive their results.
Log a message with the height when we get duplicate blocks.
Downgrade some verifier errors and warnings to info and debug, because
some peers on mainnet consistently provide bad blocks.
Reduce the amount of time that the block verifier waits for the previous
block, before polling again.
(Waiting for 2 seconds resulted in some apparent block verifier hangs.)
This is a temporary fix, until the state layer handles context checks.
* Reorganize the book.
This PR has one unfortunate change, which is that the README.md and
CONTRIBUTING.md files in the book are symlinks to files in the parent
directory. The motivation for this is to ensure that we don't maintain two
copies of the same data, and that the landing page of the website matches the
landing page of the Github repo, etc. However, I'm not sure whether these
symlinks will work correctly on Windows.
The alternatives are:
- Duplicate the contents of the files and expect that people will know to keep
them in sync;
- Use relative links `../../README.md` in the `SUMMARY.md`. This seemed like
it caused mdbook to dump the rendered files into the repository root rather
than keeping them in the `book` directory.
- Use a symlink (chosen option). This may not work on Windows but I think that
the worst outcome would be that the book would be unbuildable unless someone
used WSL or something. This seems like the least bad option.
* Remove symlinks in favor of #include
Turns out the symlinks aren't required!
Closes#536.
This removes:
- the user-agent (we can add a mechanism to specify extra BIP14 components later, if any users ask us for that feature);
- the EWMA parameters (these were put in the config just to avoid making a choice);
- the peer connection timeout (we can change the default value if anyone ever has a problem with it);
- the peer set request buffer size (setting this too low can make the application deadlock);
The new peer interval is left in.
* Split tracing component code into modules.
* Repatriate Tracing and simplify config handling.
We upstreamed our Tracing component, expecting not to have to exert fine
control over the tracing settings. But this turned out not to be the case, and
now that we want to do other things (flamegraphs, journalctl, opentelemetry,
etc), we end up with really awkward code (as in the current flamegraph
handling).
This also makes use of the changes to `init()` to load the config early to pass
configuration data into the components, which avoids the need for the
refactoring in #775.
Finally, we restore support for the `-v` flag when the filter is unset. Closes#831.
* Disable tracing and metrics endpoints by default.
Closes#660.
* Switch back to upstream Abscissa.
* Integrate flamegraph support into the new Tracing component.
* Pass -v in acceptance tests to get info-level output.
* Clean up acceptance test code.
* Setup tracing-flame for use profiling zebrad
* start work on conditional flamegraph generation
* review time!
* update comments
* Update Cargo.toml
* disable default features for inferno
* reorganize
* missing one trait
* Apply suggestions from code review
* graceful shutdown!
* remove special case handling on ctrlc for cleanup
* rename signal fn to better represent its responsibility
* remove unused global hook for flushing flamegraph
* move tracing logic to the right file
* just copy linkerd's signal handling logic
* update book
* make zebrad app drop on shutdown normally
* Update zebrad/src/components/tokio.rs
Co-authored-by: teor <teor@riseup.net>
* Update zebrad/src/application.rs
Co-authored-by: teor <teor@riseup.net>
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* cleanup a little
* ooh yea there's an API for that
* setup env-filter for backup subscriber
* document env filter
* document return codes
* forgot to save
* Update book/src/applications/zebrad.md
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Load tracing filter only from config and simplify logic.
* Configure the state storage in the config, not an environment variable.
This also changes the config so that the path is always set rather than being
optional, because Zebra always needs a place to store its config.