* refactor(state): split the disk_format module
* refactor(ci): add the new disk_db file to the state CI list
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* change `anchorSapling` type
* implement PartialEq manually for clippy
* use `unique_by` in place of `sorted`
* replace panic with new error
* improve some serialize/deserialize calls for sapling anchors
* fix arbitrary for sapling::tree::Root
* remove dedup()
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* doc(README): remove completed Zebra goals
* doc(README): docker now uses bullseye
* doc(README): clarify and expand disk requirements
* doc(README): add network latency requirement
Also note extra network usage after database format changes.
* doc(run): de-duplicate README info
* doc(run): speed up Zebra's performance
* Allow forcing colored output in `zebrad`
Add a configuration item that allows forcing Zebra to output in color
mode even if the output device is not a terminal.
* Allow enabling colored output from Zebra in tests
Force Zebrad instances to use colored output if the
`ZEBRA_FORCE_USE_COLOR` environment variable is set.
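The color decision described in the two commits above can be sketched as follows. This is a minimal illustration, not Zebra's actual code: the function names and the `force_use_color` parameter are hypothetical, and only the `ZEBRA_FORCE_USE_COLOR` variable name comes from the commit message. Treating any value (even an empty one) as "set" is also an assumption.

```rust
use std::env;
use std::io::{self, IsTerminal};

/// Decide whether to emit colored output.
///
/// `force_use_color` stands in for the new configuration item (hypothetical name);
/// `is_terminal` says whether the output device is a terminal.
pub fn should_use_color(force_use_color: bool, is_terminal: bool) -> bool {
    force_use_color || is_terminal
}

/// Test side: force color when `ZEBRA_FORCE_USE_COLOR` is set.
/// Assumption: any value, even an empty one, counts as "set".
pub fn force_color_from_env() -> bool {
    env::var_os("ZEBRA_FORCE_USE_COLOR").is_some()
}

/// Combine the environment override with terminal detection on stdout.
pub fn color_for_stdout() -> bool {
    should_use_color(force_color_from_env(), io::stdout().is_terminal())
}
```

Keeping the decision in a pure function (`should_use_color`) makes it testable without a real terminal.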
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* fix(ci): clarify ignored test name
`--include-ignored` runs all tests, including tests
that would normally be ignored.
`-Zunstable-options` enables all unstable options,
but it doesn't do anything by itself.
There is a lot of overlap with "test-all" in this job,
which we might want to fix in a future PR.
* fix(ci): remove unused -Zunstable-options
`--include-ignored` is now stable, so `unstable-options` is not needed.
* fix(test): delete a redundant test
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* fix(test): use the short SHA from actual run if valid
* fix(test): if condition must evaluate to a single false
* fix(test): do not run logs and upload if not needed
* imp(test): allow testing stateful sync after disk regeneration
This is fast enough, so it shouldn't do any harm if run just after a ~3 hour test
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Dependabot creates branches with versions using dot notation, and some tests fail because of this
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* refactor (cd): overall pipeline improvement
- Use a more ENV configurable Dockerfile
- Remove cloudbuild dependency
- Use compute optimized machine types
- Use SSD instead of normal hard drives
- Move Sentry endpoint to secrets
- Use a single yml for auto & manual deploy
- Migrate to Google Artifact Registry
* refactor (cd): use newer google auth action
* fix (cd): use newer secret as gcp credential
* fix (docker): do not create extra directories
* fix (docker): ignore .github for caching purposes
* fix (docker): use latest rust
* fix (cd): bump build timeout
* fix: use a better name for manual deployment
* refactor (docker): use standard directories for executable
* fix (cd): most systems expect a "latest" tag
Caching from the latest image is one of the main reasons to add this extra tag. Before this commit, the inline cache was not being used.
* fix (cd): push the build image and the cache separately
The inline cache exporter only supports `min` cache mode. To enable `max` cache mode, push the image and the cache separately by using the registry cache exporter.
This also allows for smaller release images.
* fix (cd): remove unused GHA cache
We're leveraging the registry to cache the actions, instead of using the 10 GB limit of GitHub Actions cache storage
* refactor (cd): use cargo-chef for caching rust deps
* fix: move build system deps before cargo chef cook
* fix (release): use newer debian to reduce vulnerabilities
* fix (cd): use same zone, region and service accounts
* fix (cd): use same disk size and type for all deployments
* refactor (cd): activate interactive shells
Use interactive shells for manual and test deployments. This allows greater flexibility if troubleshooting is needed inside the machines
* refactor (test): use docker artifact from registry
Instead of using a VM to SSH into to build and test, build in GHA (to have the logs available), run the workspace tests in GHA, and just run the sync tests in GCP.
Use a container VM with Zebra's image directly on it, and pass the needed parameters to run the sync past the mandatory checkpoint.
* tmp (cd): bump timeout for building from scratch
* tmp (test): bump build time
* fix (cd, test): bump build time-out to 210 minutes
* fix (docker): do not build with different settings
Compiling might be slow because different steps are compiling the same code 2-4 times because of the variations
* revert (docker): do not fix the rust version
* fix (docker): build on the root directory
* refactor(docker): Use base image commands and tools
* fix (cd): use correct variables & values, add build concurrency
* fix(cd): use Mainnet instead of mainnet
* imp: remove checkout as Buildkit uses the git context
* fix (docker): only BuildKit uses a .dockerignore in a path
* imp (cd): just use needed variables in the right place
* imp (cd): do not checkout if not needed
* test: run on push
* refactor(docker): reduce build changes
* fix(cd): not checking out was limiting some variables
* refactor(test): add a multistage build stage exclusively for testing
* fix(cd): remove tests as a runtime dependency
* fix(cd): use default service account with cloud-platform scope
* fix(cd): revert checkout actions
* fix: use GA c2 instead of Preview c2d machine types
* fix(actions): remove workflow_dispatch from patched actions
This confuses GitHub, as it can't determine which of the actions using workflow_dispatch is the right one
* fix(actions): remove patches from push actions
* test: validate changes on each push
* fix(test): wrong file syntax on test job
* fix(test): add missing env parameters
* fix(docker): Do not rebuild to download params and run tests
* fix(test): set up gcloud and log into artifact registry only when needed
Try not to rebuild the tests
* fix(test): use GCP container to sync past mandatory checkpoint
* fix(test): missing separators
* test
* fix(test): mount the available disk
* push
* refactor(test): merge disk regeneration into test.yml
* fix(cd): minor typo fixes
* fix(docker): rebuild on .github changes
* fix(cd): keep compatibility with gcr.io
To prevent conflicts between registries, and migrate when the time is right, we'll keep pushing to both registries and use the GitHub Actions cache to prevent conflicts between artifacts.
* fix(cd): typo and scope
* fix(cd): typos everywhere
* refactor(test): use smarter docker wait and keep old registry
* fix(cd): do not constrain the CPUs for bigger machines
* revert(cd): reduce PR diff as there's a separate one for tests
* fix(docker): add .github as it has no impact on caching
* fix(test): run command correctly
* fix(test): wait and create image if previous step succeeded
* force rebuild
* fix(test): do not restrict interdependent steps based on event
* force push
* feat(docker): add google OS Config agent
Use a separate step to have better flexibility in case a better approach is available
* fix(test): remove all hardcoded values and increase disks
* fix(test): use correct commands on deploy
* fix(test): use args as required by google
* fix(docker): try not to invalidate zebrad download cache
* fix(test): minor typo
* refactor(test): decouple jobs for better modularity
This also allows faster tests as testing Zunstable won't be a dependency and it can't stop already started jobs if it fails.
* fix(test): Do not try to execute ss and commands in one line
* fix(test): do not show unneeded information in the terminal
* fix(test): sleep before/after machine creation/deletion
* fix(docker): do not download zcash params twice
* feat(docker): add google OS Config agent
Use a separate step to have better flexibility in case a better approach is available
* merge: docker-actions-refactor into docker-test-refactor
* test docker wait scenarios
* fix(docker): $HOME variable is not being expanded
* fix(test): allow docker wait to work correctly
* fix(docker): do not use variables while using COPY
* fix(docker): allow using zebrad as a command
* fix(cd): use test .yml from main
* fix(cd): Do not duplicate network values
The Dockerfile has an ARG with a default value of 'Mainnet'. If this value is changed, it will be done manually on a workflow_dispatch, which makes the ENV option an unneeded duplicate in this workflow
* fix(test): use bigger machine type for compute intensive tasks
* refactor(test): add tests in CI file
* fix(test): remove duplicated tests
* fix(test): typo
* test: build on .github changes temporarily
* fix(test): bigger machines have no effect on sync times
* feat: add an image to inherit from with zcash params
* fix(cd): use the right image name and allow push to test
* fix(cd): use the right docker target and remove extra builds
* refactor(docker): use cached zcash params from previous build
* fix(cd): finalize for merging
* imp(cd): add double safety measure for production
* fix(cd): use specific SHA for containers
* fix(cd): use latest gcloud action version
* fix(test): use the network as Mainnet and remove the uppercase from tests
* fix(test): run disk regeneration on specific file change
Only run this regeneration when changing the following files:
- https://github.com/ZcashFoundation/zebra/blob/main/zebra-state/src/service/finalized_state/disk_format.rs
- https://github.com/ZcashFoundation/zebra/blob/main/zebra-state/src/service/finalized_state.rs
- https://github.com/ZcashFoundation/zebra/blob/main/zebra-state/src/constants.rs
* refactor(test): segregate disk regeneration from tests
Allow regenerating disks without running tests, and running tests from a previous disk regeneration.
Disks will be regenerated only if specific files were changed, or if regeneration is triggered manually.
Tests will run only if a disk regeneration was not manually triggered.
* fix(test): gcp disks require lower case conventions
* fix(test): validate logs being emitted by docker
GHA is somehow also transforming the variable to lowercase, so we're changing it to adapt to that
* test
* fix(test): force tty terminal
* fix(test): use a one line command to test terminal output
* fix(test): always delete test instance
* fix(test): use short SHA from the PR head
Using the SHA from the base creates confusion, and it doesn't match the SHA being shown and used on GitHub.
We have to keep both, as manual runs with `workflow_dispatch` do not have a PR SHA
* fix(ci): do not trigger CI on docker changes
There's no impact on this workflow when a change is made to the Dockerfile
* Instead of running cargo test when the instance gets created, run these commands afterwards in a different step.
As the GHA TTY is not working as expected, and the workarounds do not play nicely with `gcloud compute ssh` (see actions/runner#241), we decided to get the container name from the logs, log directly into the container, and run the cargo command from there.
* doc(test): document reasoning for new steps
* fix(test): increase machine type and ssh timeout
* fix(test): run tests on creation and follow container logs
This allows following the logs in the GitHub Actions terminal while the GCP container is still running.
Only delete the instance when following the logs ends, whether it succeeds or fails
* finalize(test): do not rebuild image when changing actions
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Create a `MempoolBehavior` helper type
Prepare to replace the `Option<Height>` with a custom type that is more
flexible and provides more meaning when used.
* Use `MempoolBehavior` instead of `Option<Height>`
Clarify what the argument is, and prepare the code so that a new variant
can be added to the `MempoolBehavior` later.
* Allow the mempool to be automatically activated
Add a new variant to `MempoolBehavior` that indicates that the mempool
should become active during the test without needing a forced activation
height.
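The move from `Option<Height>` to a dedicated type can be sketched as below. This is an illustration under stated assumptions, not the code from these commits: the variant names and helper methods are hypothetical, `Height` is a simplified stand-in, and the mapping of the old `None` to "not expected to activate" is a guess at the earlier meaning.

```rust
/// Simplified stand-in for Zebra's block height type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Height(pub u32);

/// How the mempool should behave during a test run.
/// The variant names are hypothetical; the commits only name the type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum MempoolBehavior {
    /// Replaces `Some(height)`: force activation at the given height.
    ForceActivationAt(Height),
    /// Replaces `None` (assumed meaning): the mempool is not expected to activate.
    ShouldNotActivate,
    /// The later addition: activate during the test without a forced height.
    ShouldAutomaticallyActivate,
}

impl MempoolBehavior {
    /// The forced activation height, if this behavior has one.
    pub fn forced_activation_height(self) -> Option<Height> {
        match self {
            MempoolBehavior::ForceActivationAt(height) => Some(height),
            MempoolBehavior::ShouldNotActivate
            | MempoolBehavior::ShouldAutomaticallyActivate => None,
        }
    }

    /// Whether the test should wait for the mempool to become active.
    pub fn requires_activation(self) -> bool {
        !matches!(self, MempoolBehavior::ShouldNotActivate)
    }
}
```

A named enum lets call sites say what they mean, and a third case can be added without overloading `None`.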
* Test full synchronization
Run `zebrad` and wait until full synchronization completes. Check if the
mempool is automatically activated when it reaches the tip.
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* fix(clippy): for loop with only one item
* fix(clippy): manual Range::contains
Also clarified the surrounding code because it was unclear.
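For context, the rewrite that clippy's `manual_range_contains` lint suggests looks roughly like this. The function names and the half-open range are illustrative assumptions, not the code that was changed.

```rust
/// Manual bounds check: the form that clippy flags.
pub fn in_range_manual(value: u32, start: u32, end: u32) -> bool {
    value >= start && value < end
}

/// Equivalent check using `Range::contains`, which states the bounds in one place.
pub fn in_range(value: u32, start: u32, end: u32) -> bool {
    (start..end).contains(&value)
}
```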
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Previously, the test would wait 10 seconds for the process to launch.
Now it waits until the process has used the conflicting resource.
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* fix(network): split synthetic NotFoundRegistry from message NotFoundResponse
* docs(network): Improve `notfound` message documentation
* refactor(network): Rename MustUseOneshotSender to MustUseClientResponseSender
```sh
fastmod MustUseOneshotSender MustUseClientResponseSender zebra*
```
* docs(network): fix a comment typo
* refactor(network): remove generics from MustUseClientResponseSender
* refactor(network): add an inventory collector to Client, but don't use it yet
* feat(network): register missing peer responses as missing inventory
We register this missing inventory based on peer responses,
or connection errors or timeouts.
Inbound message inventory tracking requires peers to send `notfound` messages.
But `zcashd` skips `notfound` for blocks, so we can't rely on peer messages.
This missing inventory tracking works regardless of peer `notfound` messages.
* refactor(network): rename ResponseStatus to InventoryResponse
```sh
fastmod ResponseStatus InventoryResponse zebra*
```
* refactor(network): rename InventoryStatus::inner() to to_inner()
* fix(network): remove a redundant runtime.enter() in a test
* doc(network): the exact time used to filter outbound peers doesn't matter
* fix(network): handle block requests slightly more efficiently
* doc(network): fix a typo
* fmt(network): `cargo fmt` after rename ResponseStatus to InventoryResponse
* doc(test): clarify some test comments
* test(network): test synthetic notfound from connection errors and peer inventory routing
* test(network): improve inbound test diagnostics
* feat(network): add a proptest-impl feature to zebra-network
* feat(network): add a test-only connect_isolated_with_inbound function
* test(network): allow a response on the isolated peer test connection
* test(network): fix failures in test synthetic notfound
* test(network): Simplify SharedPeerError test assertions
* test(network): test synthetic notfound from partially successful requests
* test(network): MissingInventoryCollector ignores local NotFoundRegistry errors
* fix(network): decrease the inventory rotation interval
This stops us waiting 3-4 sync resets (4 minutes) before we retry a missing block.
Now we wait 1-2 sync resets (2 minutes), which is still a reasonable rate limit.
This should speed up syncing near the tip, and on testnet.
* fmt(network): cargo fmt --all
* cleanup(network): remove unnecessary allow(dead_code)
* cleanup(network): stop importing the whole sync module into tests
* doc(network): clarify syncer inventory retry constraint
* doc(network): add a TODO for a fix to ensure API behaviour remains consistent
* doc(network): fix a function doc typo
* doc(network): clarify how we handle peers that don't send `notfound`
* docs(network): clarify a test comment
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* fix(network): allow more inbound than outbound connections
* refactor(network): access constants using consistent paths
* fixup! fix(network): allow more inbound than outbound connections
* fixup! fixup! fix(network): allow more inbound than outbound connections
* refactor(network): convert to standard test module layout
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* refactor(network): rename Advertised to Available
```sh
fastmod Advertised Available zebra*
fastmod advertised available zebra*
```
* refactor(network): allow different available and missing types inside an InventoryStatus
And rename it to ResponseStatus.
Split the methods between ResponseStatus and an InventoryStatus alias.
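The shape described in this commit can be sketched as a two-parameter enum plus a same-type alias. Only the names `ResponseStatus` and `InventoryStatus` come from the commit message; the variant names and methods are assumptions made for illustration.

```rust
/// A response that is either available or missing, with independent payload types.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ResponseStatus<A, M> {
    Available(A),
    Missing(M),
}

/// The original single-type form, kept as an alias.
pub type InventoryStatus<T> = ResponseStatus<T, T>;

/// Methods that make sense for any pair of types.
impl<A, M> ResponseStatus<A, M> {
    pub fn is_available(&self) -> bool {
        matches!(self, ResponseStatus::Available(_))
    }

    pub fn available(self) -> Option<A> {
        match self {
            ResponseStatus::Available(item) => Some(item),
            ResponseStatus::Missing(_) => None,
        }
    }

    pub fn missing(self) -> Option<M> {
        match self {
            ResponseStatus::Available(_) => None,
            ResponseStatus::Missing(item) => Some(item),
        }
    }
}

/// Methods that only make sense when both types match (the `InventoryStatus` case).
impl<T> ResponseStatus<T, T> {
    /// Return the payload regardless of its status.
    pub fn inner(self) -> T {
        match self {
            ResponseStatus::Available(item) | ResponseStatus::Missing(item) => item,
        }
    }
}
```

Splitting the `impl` blocks this way keeps `inner` unavailable when the two payload types differ, because there would be no single type to return.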
* refactor(network): add a block_hash convenience method to InventoryHash
* test(network): improve failure logs for connection tests
* fix(inbound): move address sanitization into the response future
* feat(network): send notfound when Zebra doesn't have a block or transaction
* doc(network): move module docs to the top of each module
This makes them more likely to get updated when the module changes.
* fix(network): stop sending unsupported missing inventory types to the registry
* test(network): inbound messages are forwarded to the registry
* test(inbound): test Peers requests to the inbound service, directly and via TCP
* test(network): notfound block responses are sent by the inbound service
* test(network): notfound tx responses are sent by the inbound service
* test(network): increase sync test mock service timeout
The code that these tests use hasn't actually changed much,
and they are only failing on some platforms (coverage, macOS).
So it seems like the extra concurrent inbound tests have pushed them
past their time limit.
(Perhaps due to TCP system calls, or extra serialization work.)
* doc(network): fix typo
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* test(network): remove unnecessary multi-threaded runtime from tests
This prevents `MockService<zebra_state>` timeouts
in the `sync_block_too_high_extend_tips` test,
at the cost of reducing coverage of different execution orders.
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* Log chain progress while Zebra is syncing
This helps test if the chain tip estimate is accurate,
and helps diagnose problems during full sync tests.
* Update to the latest chain tip estimate API
The keyword is `paths`, but the actions were using `path`.
That's why most actions have been running, with no time savings.
* Remove redundant documentation
The documentation was exactly the same as the documentation from the
trait.
* Calculate a mock time block delta for tests
Simulate a block being added to the chain with a random block time based
on the previous block time and the target spacing time.
* Add a `time` field to `ChainTipBlock`
Store the block time so that it's ready for a future chain that allows
obtaining the chain tip's block time.
* Add `ChainTip::best_tip_block_time` method
Allow obtaining the best chain tip's block time.
* Add method to obtain both height and block time
Prevent any data races by returning both values so that they refer to
the same chain tip.
* Add `NetworkUpgrade::all_target_spacings` method
Returns all the target spacings defined for a network.
* Create a `NetworkChainTipEstimator` helper type
Isolate the code to calculate the height estimation in a new type, so
that it's easier to understand and doesn't decrease the readability of
the `chain_tip.rs` file.
* Add `ChainTip::estimate_network_chain_tip_height`
This is more of an extension method than a trait method. It uses the
`NetworkChainTipHeightEstimator` to actually perform the estimation, but
obtains the initial information from the current best chain tip.
* Fix typo in documentation
There was an extra closing bracket in the summary line.
* Refactor `MockChainTipSender` into a separate type
Prepare to allow mocking the block time of the best tip as well as the
block height.
* Allow sending mock best tip block times
Add a separate `watch` channel to send the best tip block times from a
`MockChainTipSender` to a `MockChainTip`.
The `best_tip_height_and_block_time` implementation will only return a
value if there's a height and a block time value for the best tip.
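The "both or nothing" rule above can be expressed with `Option::zip`. In this sketch the struct, its plain fields, and the integer types are simplifying assumptions: the commit describes `watch` channels and Zebra's own height and time types, which are not modeled here.

```rust
/// Simplified stand-in for a mock chain tip.
/// Plain fields replace the `watch` channels described in the commit.
#[derive(Clone, Copy, Debug, Default)]
pub struct MockChainTip {
    /// Latest mocked best tip height, if one was sent.
    pub best_tip_height: Option<u32>,
    /// Latest mocked best tip block time as a Unix timestamp, if one was sent.
    pub best_tip_block_time: Option<i64>,
}

impl MockChainTip {
    /// Return the height and block time together, or `None` if either is missing.
    pub fn best_tip_height_and_block_time(&self) -> Option<(u32, i64)> {
        self.best_tip_height.zip(self.best_tip_block_time)
    }
}
```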
* Fix off-by-one height estimation error
Use Euclidean division to force the division result to round down
instead of rounding towards zero. This fixes an off-by-one error when
estimating a height that is lower than the current height, because the
fractional result was being discarded, and it should have forced the
height to go one block back.
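The difference between the two kinds of division is easy to see with a negative time difference. The functions below are a hypothetical sketch using plain `i64` values, not the estimator's real signature; the 75 second spacing in the usage note is only an example value.

```rust
/// Estimate using `/`, which rounds towards zero (the behavior being fixed).
pub fn estimate_truncating(current_height: i64, seconds_from_tip: i64, target_spacing: i64) -> i64 {
    current_height + seconds_from_tip / target_spacing
}

/// Estimate using Euclidean division, which rounds down for a positive divisor.
pub fn estimate_euclidean(current_height: i64, seconds_from_tip: i64, target_spacing: i64) -> i64 {
    current_height + seconds_from_tip.div_euclid(target_spacing)
}
```

With a current height of 100, a spacing of 75 seconds, and a time 76 seconds before the tip, truncation gives 99 while Euclidean division gives 98, because the partial block pushes the estimate one more block back. For non-negative time differences the two functions agree.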
* Fix panics on local times very far in the past
Detect situations that might cause the block height estimate to
underflow, and return the genesis height instead.
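One way to avoid the underflow is to do the arithmetic in a wider signed type and clamp the result. This is a sketch under assumptions: the function name, the `u32` height, and the upper clamp to `u32::MAX` are all hypothetical, and the commit may detect the condition differently.

```rust
/// Height of the genesis block.
pub const GENESIS_HEIGHT: u32 = 0;

/// Apply a signed block delta to a height, returning the genesis height
/// instead of underflowing when the delta points before the start of the chain.
pub fn apply_height_delta(current_height: u32, delta_blocks: i64) -> u32 {
    // Widen to i64 so the subtraction itself cannot wrap a u32.
    let estimate = i64::from(current_height).saturating_add(delta_blocks);

    // Clamp into the valid range; the upper bound is an assumption of this sketch.
    estimate.clamp(i64::from(GENESIS_HEIGHT), i64::from(u32::MAX)) as u32
}
```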
* Fix another off-by-one height estimation error
The implementation of `chrono::Duration::num_seconds` adds one to the
number of seconds if it's negative. This breaks the division
calculation, so it has to be compensated for.
* Test network chain tip height estimation
Generate pairs of block heights and check that it's possible to estimate
the larger height from the smaller height and a displaced time
difference.
* Support large block heights
* Document consensus rules referring to expiry heights
* Refactor the docs
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* Fix the formatting of an error message
* refactor: Simplify coinbase expiry code so the consensus rule is clear (#3408)
* Fix some outdated TODO comments
* refactor(coinbase expiry): Simplify the code so consensus rule is clear
* Fix the formatting of an error message
* Remove a redundant comment
Co-authored-by: Marek <mail@marek.onl>
* Check the max expiry height at parse time
* Test that 2^31 - 1 is the last valid height
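A boundary check of this kind can be sketched as a fallible constructor. The type, error, and function names here are hypothetical; only the 2^31 - 1 limit comes from the commit title, and this sketch assumes the limit is enforced when the raw `u32` is parsed.

```rust
/// Largest valid height: 2^31 - 1.
pub const MAX_HEIGHT: u32 = (1 << 31) - 1;

/// A height that is known to be within the valid range.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Height(pub u32);

/// Error returned when a raw value is above `MAX_HEIGHT`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct HeightOutOfRange(pub u32);

/// Validate a raw `u32` as it is parsed, rejecting values above the maximum.
pub fn parse_height(raw: u32) -> Result<Height, HeightOutOfRange> {
    if raw <= MAX_HEIGHT {
        Ok(Height(raw))
    } else {
        Err(HeightOutOfRange(raw))
    }
}
```

Rejecting the value at parse time means later code never has to handle an out-of-range height.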
* Add tests for nExpiryHeight
* Add tests for expiry heights of V4 transactions
* Add tests for V5 transactions
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* fix(zcash-params): Do not update parameters image on PR
A direct dependency of our Docker image should not be updatable by a PR from anywhere (a local branch or a fork branch) before that change has been approved by a human and merged to main
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
* refactor (cd): overall pipeline improvement
- Use a more ENV configurable Dockerfile
- Remove cloudbuild dependency
- Use compute optimized machine types
- Use SSD instead of normal hard drives
- Move Sentry endpoint to secrets
- Use a single yml for auto & manual deploy
- Migrate to Google Artifact Registry
* refactor (cd): use newer google auth action
* fix (cd): use newer secret as gcp credential
* fix (docker): do not create extra directories
* fix (docker): ignore .github for caching purposes
* fix (docker): use latest rust
* fix: use a better name for manual deployment
* refactor (docker): use standard directories for executable
* fix (cd): most systems expect a "latest" tag
Caching from the latest image is one of the main reasons to add this extra tag. Before this commit, the inline cache was not being used.
* fix (cd): push the build image and the cache separately
The inline cache exporter only supports `min` cache mode. To enable `max` cache mode, push the image and the cache separately by using the registry cache exporter.
This also allows for smaller release images.
* fix (cd): remove unused GHA cache
We're leveraging the registry to cache the actions, instead of using the 10 GB limit of GitHub Actions cache storage
* refactor (cd): use cargo-chef for caching rust deps
* fix (release): use newer debian to reduce vulnerabilities
* fix (cd): use same zone, region and service accounts
* fix (cd): use same disk size and type for all deployments
* refactor (cd): activate interactive shells
Use interactive shells for manual and test deployments. This allows greater flexibility if troubleshooting is needed inside the machines
* fix (docker): do not build with different settings
Compiling might be slow because different steps are compiling the same code 2-4 times because of the variations
* fix(cd): use Mainnet instead of mainnet
* fix(docker): remove tests as a runtime dependency
* fix(cd): use default service account with cloud-platform scope
* fix(cd): keep compatibility with gcr.io
To prevent conflicts between registries, and migrate when the time is right, we'll keep pushing to both registries and use the GitHub Actions cache to prevent conflicts between artifacts.
* fix(docker): do not download zcash params twice
* feat(docker): add google OS Config agent
Use a separate step to have better flexibility in case a better approach is available
* fix(docker): allow using zebrad as a command
* feat: add an image to inherit from with zcash params
* refactor(docker): use cached zcash params from previous build
* imp(cd): add double safety measure for production
* document the `header` field
* document the `nVersionGroupId` field
* document the `nConsensusBranchId` field
* document the `lock_time` field
* document the `nExpiryHeight` field (and some missing `lock_time`)
* add missing note to `header` field in serialization
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* add a test for peerset always broadcast while there are available peers
* fix minors from review
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
* split the test into two
* simplify some code
Co-authored-by: Janito Vaqueiro Ferreira Filho <janito.vff@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>