* add cargo update steps to task list
* Fix a formatting issue in the checklist
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
---------
Co-authored-by: teor <teor@riseup.net>
* Stop running multiple full syncs on different branches
* Fully fix concurrency, and require found cache or generated cache
* Use correct syntax and job dependencies
* ci(build): unpin specific `buildkit` version
We previously had an issue with the following error: `cannot reuse body, request must be retried`
This was usually a misleading error, caused by a containerd issue which has been tracked and solved here: https://github.com/docker/build-push-action/issues/761#issuecomment-1406261692
We're still seeing errors when building, and they might be caused by an underlying error which containerd is not surfacing correctly.
* ci(deploy): Use specific subnetworks on GCP VMs
After migrating to our new GCP project, some references were not being applied correctly, as this workflow was not referencing the new resources the right way, and it was outdated in some specific parts.
* add user agent as argument, use git to auto build zebra user agent
* try to fix test
* fix typo
* change expect text
* remove newline
* fix some docs
Co-authored-by: Marek <mail@marek.onl>
---------
Co-authored-by: Marek <mail@marek.onl>
* Split checking for cached state disks into its own workflow
* Fix workflow field order
* Run the top-level workflow when the reusable workflow changes
* And run dependent workflows for pull requests as well
* Remove redundant output names
* Document the existing and new workflow jobs
* Add the network to the "no disk found" message
* Tweak existing docs and descriptions
* Generate Zebra checkpoints on testnet
* Add a full sync testnet entrypoint, and simplify mainnet env vars
* Only run the full testnet sync on the main branch
* Deduplicate and update the zebra-checkpoints docs
* Add instructions for automatic checkpoint generation
* Hide some details in the release checklist
* Update release checkpoint instructions to use CI
* Only update the cache in one job on mainnet
* refuse to run Zebra if it is too old
* update the release checklist to consider the constants
* bring newline back
* apply new end of support code
* attempt to add tests (not working yet)
* move eos to progress task
* move tests
* add acceptance test (not working)
* fix tests
* change to block height checks (ugly code)
* change warn days
* refactor estimated blocks per day, etc
* move end of support code to its own task
* change test
* fix some docs
* move constants
* remove unneeded conversions
* downgrade tracing
* reduce end of support time, fix ci changing debugs to info again
* update instructions
* add failure messages
* cargo lock update
* unify release name constant
* change info msg
* clippy fixes
* add a block explorer
* ignore testnet in end of support task
* change panic to 16 weeks
* add some documentation about end of support
* Tweak docs wording
---------
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Add extra test type modes to support zebra-checkpoints
* Add Mainnet and Testnet zebra-checkpoints test harnesses
* Add zebra-checkpoints to test docker images
* Add zebra-checkpoints test entrypoints
* Add Mainnet CI workflow for zebra-checkpoints
* Enable zebra-checkpoints feature in the test image
* Use the same features for (almost) all the docker tests
* Make workflow features match Docker features
* Add a feature note
* Add a zebra-checkpoints test feature to zebrad
* Remove the "no cached state" testnet code
* Log a startup message to standard error when launching zebra-checkpoints
* Rename tests to avoid partial name conflicts
* Fix log formatting
* Add sentry feature to experimental docker image build
* Explain what ENTRYPOINT_FEATURES is used for
* Use the correct zebra-checkpoints path
* Silence zebrad logs while generating checkpoints
* Fix zebra-checkpoints log handling
* Re-enable waiting for zebrad to fully sync
* Add documentation for how to run these tests individually
* Start generating checkpoints from the last compiled-in checkpoint
* Fix clippy lints
* Revert changes to TestType
* Wait for all the checkpoints before finishing
* Add more stderr debugging to zebra-checkpoints
* Fix an outdated module comment
* Add a workaround for zebra-checkpoints launch/run issues
* Use temp dir and log what it is
* Log extra metadata about the zebra-checkpoints binary
* Add note about unstable feature -Z bindeps
* Temporarily make the test run faster and with debug info
* Log the original test command name when showing stdout and stderr
* Try zebra-checkpoints in the system path first, then the cargo path
* Fix slow thread close bug in dual process test harness
* If the logs are shown, don't say they are hidden
* Run `zebra-checkpoints --help` to work out what's going on in CI
* Build `zebra-utils` binaries for `zebrad` integration tests
* Revert temporary debugging changes
* Revert changes that were moved to another PR
* refactor(ci): use GitHub secrets and variables
We've been using values that vary across multiple workflows, and those
could only be changed by modifying the workflows themselves, but we should
be able to change the values without committing new code. For this purpose
we're now using GitHub Variables, and also moving non-sensitive information
into variables instead of secrets. This allows more flexibility and makes
other scenarios easier to manage, like deploying to Mainnet or Testnet.
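As a minimal sketch (the variable and project names here are hypothetical, not the ones defined in our repository), a workflow `run` step can read these values from the `vars` context instead of `secrets`:
```sh
# Illustrative only: read non-sensitive configuration from GitHub Variables.
# `NETWORK` and `GCP_PROJECT` are placeholder names for this sketch.
echo "Deploying a ${{ vars.NETWORK }} node"
gcloud config set project "${{ vars.GCP_PROJECT }}"
```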
* refactor(ci): use new GitHub variables for GCP auth
* fix(ci): typo
* fix(ci): do not use multiple variables for the same value
* fix(ci): typo in variable
* fix(vars): use different variables for machine types
* fix(vars): missing substitution
* fix: typo
* fix: make the input CI network override the default network
* Use the correct network variable for creating disks
---------
Co-authored-by: teor <teor@riseup.net>
* Duplicates Dockerfile
* updates mining-testnet Dockerfile with getblocktemplate-rpcs feature, Testnet by default, and an RPC port
* renames mining-testnet.Dockerfile and adds workflow for publishing images on release
* replaces space-separated features with commas
* Adds .experimental tag suffix, removes new dockerfile, makes lightwalletd tests conditional
* updates build-args to pass on features directly
* adds "lightwalletd-grpc-tests" as default test_features in build-docker-image
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* adds tag suffix to cache keys
---------
Co-authored-by: teor <teor@riseup.net>
* Remove unused dependencies
* Check for newly unused dependencies in CI
* Use the correct grep command
* Always show cargo machete output
* Ignore cargo machete exit status, use grep instead
* Use if instead of && and subshells
* Invert if logic
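A rough sketch of the check described in the items above, assuming the grepped phrase and the step layout (the real workflow may differ):
```sh
# Sketch only: always show cargo machete's output, ignore its exit status,
# and fail the job only if the output mentions unused dependencies.
output=$(cargo machete 2>&1) || true
echo "${output}"
if echo "${output}" | grep -q "unused"; then
  echo "New unused dependencies were found" >&2
  exit 1
fi
```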
Docker does not allow multiple tags for cache images in its
`cache-from` and `cache-to` options, which makes some images lose their
previous tags if they are the same.
This causes automated deletion mechanisms to fail, as those cache images
were dependent on the main image, and deleting the cache alone raised
an error if the main image was not deleted first.
A workaround is to use a separate repository which holds these cache
images, which also reduces clutter in the main registries.
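A minimal sketch of the workaround with raw `buildx` flags (the registry and repository names are placeholders; the real workflow goes through `docker/build-push-action`):
```sh
# Sketch only: write the build cache to a dedicated cache repository, so cache
# images never share tags with (or block deletion of) the main images.
docker buildx build \
  --cache-from "type=registry,ref=us-docker.pkg.dev/my-project/zebra-caching/zebrad:main" \
  --cache-to "type=registry,ref=us-docker.pkg.dev/my-project/zebra-caching/zebrad:main,mode=max" \
  --tag "us-docker.pkg.dev/my-project/zebra/zebrad:main" \
  .
```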
* ci(lwd): run the send transactions test on each PR update
The send transactions test was moved to the main branch in #5480 because
it was very slow.
It's much faster (~30m) with #5015 and now it can be run for every PR
update again.
* fix(actions): remove references to the workflow_dispatch
* ci: add a test to validate Zebra's config file and path
* fix: use `ZEBRA_CONF_PATH` as single variable locating the conf
* fix: do not remove the containers
* fix: use extended regex
* fix: use different steps to validate the conf tests
* fix: do not specify a default CMD for running Docker in test builds
* fix: use actual starting commands for entrypoint
* fix: do not add cargo twice if cargo is in $1
* fix: allow to run `zebrad` in the `tests` stage of Dockerfile
* fix: new entrypoint does not allow an empty CMD
* fix: do not duplicate the `zebrad` command
* fix: segregate configuration jobs
* refactor(entrypoint): handle better parameters conditions
* fix: make `zebrad` an executable command in `tests` stage
* Show the commands that are being executed in the new docker test
* Show full logs without tee or grep
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* fix: use the actual path inside docker
* fix: use `grep` with exit code
If the container is logging to stderr, piping works only for stdout, so we're adding `2>&1`
* fix: use `grep -q` to get an exit code
* fix: fail if any error is detected
* fix: fail if this test takes more than 5 minutes
* fix: update patch workflows
* feat: test Dockerfile `runtime` config
* fix: depend on the configuration test to continue
Co-authored-by: teor <teor@riseup.net>
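A rough sketch of the config-validation check described in the items above; the image name, `ZEBRA_CONF_PATH` value, and grepped pattern are illustrative, not the exact ones used in CI:
```sh
# Sketch only: run the test image, capture stdout and stderr (2>&1), and use
# grep's exit status to decide whether the expected config path shows up in the logs.
docker run --detach --name zebrad-conf-test \
  --env ZEBRA_CONF_PATH=/etc/zebrad/zebrad.toml zebrad-test-image
sleep 30
if docker logs zebrad-conf-test 2>&1 | grep -qE "zebrad.toml"; then
  echo "config file path found in the logs"
else
  echo "config file path not found in the logs" >&2
  exit 1
fi
```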
* feat(ci): delete unused artifacts in registries
Previous behavior:
Docker artifacts are costing us a good part of our infrastructure budget,
and we needed a way to remove unused artifacts.
Expected behavior:
Delete unused (not just old) Docker artifacts in GAR (Google Artifact Registry),
preferably using a generic solution in case this needs to be expanded to other
Docker registries.
Solution:
Implement GCR Cleaner https://github.com/GoogleCloudPlatform/gcr-cleaner,
as this tool provides integration with `docker/login-action` to interact
with multiple Docker v2 registries.
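As an illustration only (the repository path is a placeholder, and the flag names are recalled from the gcr-cleaner README, so treat them as assumptions to verify), the cleaner is pointed at a repository with a grace period expressed in hours:
```sh
# Hypothetical gcr-cleaner-cli invocation; check the flags against the
# gcr-cleaner documentation before relying on them.
gcr-cleaner-cli \
  -repo "us-docker.pkg.dev/my-project/zebra/zebrad" \
  -grace "720h" \
  -keep 2
```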
* fix(action): use hours instead of days
* chore: add TODO
* Update .github/workflows/delete-gcp-resources.yml
Co-authored-by: teor <teor@riseup.net>
* fix: allow the action to fail if some images can't be deleted
Co-authored-by: teor <teor@riseup.net>
* ci: reduce the amount of API calls made by `arduino/setup-protoc@v1`
If the `protoc` compiler version is not available locally, the action will look for the most recent version. We use a fixed version to reduce the number of API calls made across all workflows, but mainly in `build-crates-individually.yml`.
* fix: decrease `protoc` version until the action is fixed
* fix(ci): remove warnings caused by missing `actions/checkout`
* fix: typo in arguments
* fix: add the whole disk name as this is a single instance
* fix: add disk name to mount
* Changelog with trivial entries
* Delete trivial entries
* Summarise known issues in README, but don't change the list yet
* Add block timeouts to known issues
* Update the release template to add missing version files
* Bump crate versions
* Add the required Rust version to the release checklist
* Update the Rust version requirement to 1.65, Zebra now uses `let ... else ...`
* Update checkpoints
* Add checkpoints to the CHANGELOG
* Breaking Rust compiler version change
* Clarify the latest stable supported rust version
* Update release-checklist.md
* Add complex code or requirements to the PR template
* Add complex code and testing sections to the ticket template
Remove the design section, because we're not using it
* Merge freeze and merge PRs
* Remove a redundant sprout full sync job
* Add two new full sync jobs
* Allow the full sync test to run for 48 hours (estimated current time 40-45 hours)
Previous behavior:
Our Docker Hub is missing the documentation we use in the Zebra repository
Expected behavior:
Each time we change our README.md, or on demand, update the documentation
on Docker Hub with it. Also update the short description using our repository
description.
Solution:
Implement https://github.com/peter-evans/dockerhub-description
* Re-apply: add acceptance test for getblocktemplate method in CI (#5653)
Revert "Revert "change(tests): add acceptance test for getblocktemplate method in CI (#5653)" (#5672)"
This reverts commit 6446e0ec1b.
* Fix incorrect MAX_CONTEXT_BLOCKS assertion in state
* Actually negate the miner fee for the RPC output
* Try the RPC again after waiting for transactions to verify
* Log before the test waits for the mempool to verify transactions
* Use the new ssh key secrets in CI
* chore: add Network as a label
* Fix network parameter in continous-delivery.yml
* Standardise network usage in zcashd-manual-deploy
* Use lowercase network labels
* Fix some shellcheck errors
* Hard-code a Mainnet default to support contexts where env is not available
* Fix string syntax
* Fix more shellcheck errors
* Update .github/workflows/zcashd-manual-deploy.yml
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Co-authored-by: Arya <aryasolhi@gmail.com>
* feat(gcp): add label to instances for cost and logs grouping
Previous behavior:
We couldn't search GCP logs using the instance name if that instance was
already deleted. And if we want to know how we're spending our budget, it's
also difficult to know if specific tests or types of instances are the ones
responsible for a certain share of the costs.
Fixes #5153
Fixes #5543
Expected behavior:
Be able to search logs using the test ID, or at least the GitHub reference,
and be able to group GCP costs by labels
Solution:
- Add labels to instances
* chore: add Network as a label
* Revert "chore: add Network as a label"
This reverts commit 146f747d50.
* Update .github/workflows/zcashd-manual-deploy.yml
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Fail CI if there are doc warnings
* Try a different syntax
* Use an env var
* Fix spacing, add reference
* Allow intra-doc links
* Note that rustdoc settings need to stay in sync
* fix(cd): allow deploying instance templates without disk errors
Motivation:
PR #5670 failed in `main` because it was tested with `gcloud compute instances create-with-container`;
even the manual deployment uses `instances`, and it works.
But the job that failed uses `gcloud compute instance-templates create-with-container`,
and it complains with: `When attaching or creating a disk that is also being mounted to a container, must specify the disk name`
Based on the documentation, the name is optional when using `create-with-container`,
for both `instances` and `instance-templates`.
Source: https://cloud.google.com/sdk/gcloud/reference/compute/instance-templates/create-with-container#--container-mount-disk
Solution:
Revert this specific job to how it was, and do not scale the instances
above 1, as this would cause the following error:
`Instance template specifies a disk with a custom name. This will cause instance group not to scale beyond 1 instance per zone.`
* chore: reduce diff
* adds test for getblocktemplate rpc method
* adds the new test to CI
* adds a couple logs
* Adds example for running the test in acceptance.rs
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* ci(compute): use debian public image on the VM, not the container
Previous behavior:
We were pulling the Debian image the wrong way: it was being used
as a container image, but it was meant to be the VM image.
The image being pulled to create the internal container has been causing
crashes, as these images do not exist in Google's container repositories.
Expected behavior:
Use a public image such as debian-11 to get multiple benefits, like being
able to use machine images (#5615) and automatic disk resizing (which
is now possible as we're using COS images, but those are more restrictive)
Solution:
Add `--image-project=debian-cloud` and `--image-family=debian-11` as
stated in the official documentation: https://cloud.google.com/sdk/gcloud/reference/compute/instances/create-with-container#--image-project
More info: https://cloud.google.com/compute/docs/images/os-details#import
* fix: use a public image with docker on the host
* fix(logs): missing sudo before docker command
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
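A minimal sketch of the relevant flags, with placeholder instance and image names (the real command in the workflow passes many more arguments):
```sh
# Sketch only: boot the VM from Google's public Debian 11 image, so the host OS
# (not the container) provides the tooling we rely on.
gcloud compute instances create-with-container zebrad-test-instance \
  --image-project=debian-cloud \
  --image-family=debian-11 \
  --container-image="us-docker.pkg.dev/my-project/zebra/zebrad:main"
```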
* feat(ssh): enable OS Login for GCP test instances
* fix(ssh): force service account impersonation for OS Login
* debug: show actual user trying to impersonate SA
* fix(gcloud): configure gcloud before running commands
* fix(ssh): add VM zone to ssh command
* fix(auth): bringing changes from #5614
* fix(auth): impersonation is working as expected now
* fix(gcloud): setup the GCP CLI after authenticating (#5606)
Previous behavior:
`gcloud` commands have been running without appropriate authentication:
the `auth` action was successfully executed, but the actual gcloud
CLI used in further jobs was not using the correct configuration
or credentials.
Expected behavior:
All `gcloud` commands should be properly configured and authenticated.
Solution:
Add the `google-github-actions/setup-gcloud` action after each
`google-github-actions/auth` invocation, and before running any `gcloud`
command.
Remove the need for an OAuth access token when it is not required by the
following steps.
* fix(auth): revert to latest version
* fix: wrong replace
* fix(ci): use a specific debian image for VM containers
* fix(ssh): delete generated SSH keys by CI after 30 seconds
* debug: remove debug commands
* fix(compute): use a lightweight container image
* fix(ci): add missing sudo to docker command
* Update .github/workflows/deploy-gcp-tests.yml
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
* fix(ssh): delete ssh-keys for the specific GHA service account
Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
* updates mod docs for tests that use future blocks
* updates submitblock test to use TestType methods
* prunes redundant code
* adds check_sync_logs_until
* adds assertion for needs cached state & rpc server
* updates get_raw_future_blocks fn with rpc calls
* updates to get_raw_future_blocks fn and submit_block test
* Rename LightwalletdTestType to TestType
* moves TestType and random_known_rpc_port_config to test_type.rs and config.rs
* moves get_raw_future_blocks to cached_state.rs
* updates ci workflows to include submit block test
* adds get_future_blocks fn and uses it in load_transactions_from_future_blocks
* updates CI docker
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* Applies suggestions from code review
* Updates misnamed closure param
* updates mod docs for test_type.rs
Co-authored-by: teor <teor@riseup.net>
* Use correct release for getblocktemplate config
* Include at least 2 full checkpoints in the lookahead limit
* Increase full sync timeout to 36 hours
* Only log "synced block height too far ahead of the tip" once
* Replace AboveLookaheadHeightLimit error with pausing the syncer
* Use AboveLookaheadHeightLimit for blocks a very long way from the tip
* Also add the getblocktemplate config, and fix the test message
* Remove an outdated TODO comment
* Allow syncing again when a small number of blocks are in the queue
* Allow some dead code
* Only run multiple test jobs if they are needed for a long test
* Remove unused job steps
* Remove trailing whitespace
* Follow logs in the Run step
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Run CI jobs on dependent PRs
* Change job names to be unique
* Fix outdated workflow name
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Make "test all" log output shorter
* Use different docker instance names
* Spell out command-line arguments
* Fix option names
* Use nocapture on basic tests but not ignored tests
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* allow(clippy::result_large_err)
* Increase the async executor delay expected by tests
* Split getblocktemplate-rpcs OS tests into a separate matrix job
* Add new patch jobs
* allow(unknown_lints)
* Fix the branch name in the release template
* Use a docker command with colour and Ctrl-C support
* Make branch name example more readable
* Fix a link typo
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* docs: add user documentation on how to use Zebra with docker
Motivation:
We don't have user documentation on how to use/deploy Zebra using
the Dockerfile available in our repository, or where the images are being
hosted.
Solution:
Add user documentation explaining how to pull the image from Docker
Hub or how to build it locally, with extra information on which images
we're hosting and where we're hosting them.
* docs(docker): use existing getting started header
* Update book/src/user/docker.md
Co-authored-by: Arya <aryasolhi@gmail.com>
* docs(docker): add build alternative instructions from Docker
* docs: add docker documentation to Rust doc sidebar
* docs: update checklist with docker user documentation
* Update README.md
Co-authored-by: teor <teor@riseup.net>
* Update new refs to rc.1
Co-authored-by: Arya <aryasolhi@gmail.com>
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* only run the send transaction test on the main branch
* adds patch job
* Add concurrency rule to the send transactions test
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* Allow send tx test to be triggered manually
Co-authored-by: teor <teor@riseup.net>
* test all features
* increase task timeout
* add a new task for feature tests
* add `getblocktemplate-rpcs` tests to docker integration
* run the getblocktemplate-rpcs feature as a separated step in docker
* move getblocktemplate-rpcs in docker as a separated task
* ci(sync): only run the `lightwalletd` full sync on the `main` branch
Previous behavior:
In PR #5164, we made lightwalletd sync all the way to the tip in its full
sync test.
This increases that test's time from 1 hour to 4 hours, which makes the CI
we run on each PR change increase from 3 hours to 6 hours.
Expected behavior:
Run the lightwalletd full sync only on `main`, or if a state disk for the
current version is not found.
Solution:
Add the `github.event_name == 'push' && github.ref_name == 'main'` condition
to the `lightwalletd-full-sync` test.
Fixes #5316
* Allow lwd full syncs to be triggered manually (#5400)
* Limit checkpoint and lwd full sync concurrency
* Add a patch job for lightwalletd-full-sync
Co-authored-by: teor <teor@riseup.net>
* Revert "ci(ssh): connect using `ssh-compute` action by Google (#5330)"
This reverts commit b366d6e7bb.
* ci(ssh): use sudo for docker commands if user is not root
* ci(ssh): specify the service account to connect with
* ci(ssh): increase the Google Cloud instance sshd connection limit
* chore: add a new line at the end of the script
* chore: update our VM image to bullseye
* chore: fix `tj-actions/changed-files` file comparison
* ci(disk): use an official image on CI VMs for disk auto-resizing
Previous behavior:
We've had issues in the past with resizing because the device is busy,
for example:
```
e2fsck: Cannot continue, aborting.
/dev/sdb is in use.
```
Expected behavior:
We've been manually resizing the disk because this task was not being done
automatically, but using an official public image from GCP makes
this easier (automatic), and it also integrates better with other GCP
services.
Configuration differences: https://cloud.google.com/compute/docs/images/os-details#notable-difference-debian
Solution:
- Use `debian-11` from the official public images https://cloud.google.com/compute/docs/images/os-details#debian
- Remove the manual disk resizing from the pipeline
* ci: increase VM disk size to fit future cached states sizes
Some GCP disk images are 160 GB, which means they could get to the current
200 GB size soon.
* Increment Zebra versions
* Initial draft changelog
* Add blog post to the release checklist
* Say "user testing"
Co-authored-by: teor <teor@riseup.net>
* Revert "ci(ssh): connect using `ssh-compute` action by Google (#5330)"
This reverts commit b366d6e7bb.
* ci(ssh): use sudo for docker commands if user is not root
* ci(ssh): specify the service account to connect with
* ci(ssh): increase the Google Cloud instance sshd connection limit
* chore: add a new line at the end of the script
* chore: update our VM image to bullseye
* chore: fix `tj-actions/changed-files` file comparison
* Add latest and edge tags to Docker images
* Document how latest tag actually works
* Try a different syntax for is_default_branch
* Try again
* One last try
* Revert changes that don't work
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Add a Docker run command to the README
* Update the README with some user-relevant release candidate goals
* Update the release template for the release candidate
* Fix beta crate explanation
* Be more specific about what "this PR" means
* Update docker command for latest tag changes
* Update README Docker command based on tag changes
* Make Zebra release versions more vague in README.md
Co-authored-by: Pili Guerra <mpguerra@users.noreply.github.com>
* Move build instructions to build section
Co-authored-by: Pili Guerra <mpguerra@users.noreply.github.com>
* Add newlines to separate heading and paragraphs
* Remove extra newline
* Add a note for a future command update
* Remove manual build check, it doesn't have tier 1 support
Co-authored-by: Pili Guerra <mpguerra@users.noreply.github.com>
* refactor(ssh): connect using `ssh-compute` action by Google
Previous behavior:
From time to time SSH connections to deployed VMs fail with the following
error: `kex_exchange_identification: Connection closed by remote host`
This was still happening after implementing https://github.com/ZcashFoundation/zebra/pull/5292
Expected behavior:
Ensure we're not creating SSH key pairs on the fly, to improve our connection
guarantees.
Solution:
- Enable the Cloud Identity-Aware Proxy API in GCP
- Create a firewall rule to enable connections from IAP
- Grant the required IAM permissions to enable IAP TCP forwarding
- Generate an SSH key pair and set the private key as an input param
- Set the GitHub Action SA to have authorized ssh connections to the VMs
- Implement the `google-github-actions/ssh-compute` action to connect
* fix(ssh): id `compute-ssh` cannot be used more than once within the same scope
* fix(ci): try to enclose commands to override parsing issues
* tmp: remove ssh_args
* fix(action): secrets must be inherited to be used
* tmp: validate command enclosing fixes execution
* fix(ssh): ssh_args are not implemented correctly
* fix(ssh): login with the root user
* fix(privilege): use sudo with docker commands
* tmp: add sudo
* fix(ssh): use sudo for all docker commands
* fix(ssh): add missing `sudo` commands
* fix(ssh): get sync height from ssh stdout
* fix(height): get the height correctly
Previous behavior:
The following error was causing an exit 1 in GitHub Actions when pushing
to the `main` branch:
```
Error: Similar commit hashes detected: previous sha is equivalent to the
current sha
```
Expected behavior:
Allow the linter to run successfully even if the previous SHA has no files
changed.
Solution:
Add `fetch-depth: 2` to retrieve the preceding commit
Previous behavior:
From time to time SSH connections to deployed VMs fail with the following
error: `kex_exchange_identification: Connection closed by remote host`
Expected behavior:
If the connection fails, attempt to reconnect once again (or multiple times)
Solution:
Add the `ConnectionAttempts` and `ConnectTimeout` options with values of 20 and 5
respectively, which attempts to reconnect up to 19 more times, waiting 5 seconds
for each attempt.
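A minimal sketch of these options as passed through `gcloud compute ssh` (the instance name and zone are placeholders):
```sh
# Sketch only: retry the SSH connection up to 20 times, with a 5 second timeout
# per attempt, instead of failing on the first
# "kex_exchange_identification: Connection closed by remote host" error.
gcloud compute ssh zebrad-test-instance \
  --zone us-central1-a \
  --ssh-flag="-o ConnectionAttempts=20" \
  --ssh-flag="-o ConnectTimeout=5" \
  --command="echo connected"
```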
* Explain how to use the release template
I always have to look this up every time.
Also delete a long description of semantic versioning.
* Delete extra info that is already in the template elsewhere
* Explain how `Cargo.lock` gets updated
* Use a branch name that Google Cloud will accept
* Update release instructions for Docker binaries
* Add extra release testing steps
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
* Combine high and medium queues into a batched queue
* Explain how to check config syntax
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Update release-drafter.yml
* Explain where we got the workflow from
* Automatically add "trivial" label to dependabot updates
* Add categories and auto-labels to release drafter
* Update release PR template for automatic release drafter versions
* Also strip PR series numbers and leading spaces from changelog entries
* Update release note version check
* Update label names
* Add missing ! in conventional commits regex
Co-authored-by: Marek <mail@marek.onl>
* Make versioning steps more specific
Co-authored-by: Marek <mail@marek.onl>
* Remove conflicting detailed versioning explanations
Co-authored-by: Marek <mail@marek.onl>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Use branch then main build caches
* Revert cache order to try branch cache first
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Try running coverage with Rust 1.63
* Run GitHub Actions tests with Rust 1.63
* Change from stable to 1.63 in the patch file
* Use Rust 1.63 to download Zcash parameters
* Use Rust 1.63 to build Docker zebrad images
* Make Rust 1.63 a supported platform, and make stable temporarily unsupported
* Delete test instances after 3 days
* Use correct delete command, improve shell quoting
* Use sed to provide the correct zone or region
* Fix quoting
* Fix IFS
* Fix IFS for multiple disks
* Document why we can't quote some shell variables
* Document that instances can get deleted
* Fix exact names in deletion docs
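A rough sketch of the cleanup described in the items above; the 3-day cutoff comes from the item, but the filter syntax and field handling are assumptions, and the real workflow also covers templates and disks:
```sh
# Sketch only: list test instances older than the cutoff and delete each one,
# continuing if a single deletion fails.
DELETE_BEFORE=$(date --date='3 days ago' '+%Y-%m-%d')
gcloud compute instances list \
  --filter="creationTimestamp < ${DELETE_BEFORE}" \
  --format='value(name,zone)' |
while IFS=$'\t' read -r name zone; do
  # the zone may be returned as a full URL, so keep only the last path component
  gcloud compute instances delete "${name}" --zone="${zone##*/}" --quiet ||
    echo "could not delete ${name}, skipping"
done
```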
* feat(release): create Docker hub binaries when tagging
* fix(release): add a release workflow for binaries
* fix(release): trigger on tag creation, not pushing to it
* fix(release): use the same conditions for logging into DockerHub
* fix(release): add missing parameter to access GH secrets
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* ci(release): just publish to DockerHub when a release is published
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* ci(release): filter prerelease event correctly
* ci(release): fix tags
* ci(release): use `zebra` and not `zebrad` as the repository
* ci(release): do not try to login to Docker if not a release
* Update .github/workflows/build-docker-image.yml
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Increase search range for sync height
* Update sync height regexes for zebrad and lwd cached states
* Add labels to cached state images
* Update deploy-gcp-tests.yml
* Don't create new cached states for lwd updates
* Add a missing line continuation
* Fix a comment
* Revert a mistaken comment change
* Clarify a TODO comment
* Partially revert to old docker height log handling
* Use an output for the cached disk name
* Disable fmt cache and create shared clippy cache
* Make Cargo.lock check use the shared clippy cache
* Add a TODO for Windows Rust cache path
* Fix quoting for Windows path
* Use correct sharedKey spelling
Previous behavior:
The `tj-actions/changed-files` action crashed when pushing to main, as no
fetch depth was defined on the preceding checkout action, which is now
required after b216561b5b
Expected behavior:
Do not fail with this new requirement
Solution:
Change the checkout action's `fetch-depth` to 2, allowing it to compare with
the previous commit
* bump prost, tonic and tonic-build
* add protoc as a dependency step in the CI
* bump console-subscriber
* add protoc to `build-crates-individually`
* add protoc to docs build
* install protoc in lint.yml
* change protoc installation location in lint.yml
* add protoc to `Check Cargo.lock is up to date`
* ci(build): keep protoc pinned to the same major version
* ci(build): avoid rate limiting with `arduino/setup-protoc@v1`
* cargo upgrade --workspace console-subscriber
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Co-authored-by: teor <teor@riseup.net>
* Fix delete GCP resources commands
* Don't create a GCP credentials file
* Keep the latest 2 images
* Explain time
* Show the names of disks that are being deleted
* Actually run the image delete steps
* Only delete commit-based instance templates
* Document automated deletion
Previous behavior:
Disk images have been accumulating in GCP for a few years, but this
generates unneeded costs, as we're not using images older than 1-2 weeks.
Expected behavior:
Delete unused images based on a timeframe.
Solution:
Delete images created on a pull request older than 30 days, from the
`main` branch if older than 60 days, and any other image older than 90
days.
A TODO is in place, as we'd like to keep at least the 2 latest images of
each type (zebra checkpoint, zebra tip, lwd tip). Once we've excluded
those images, we can delete any older images after 1 week.
Previous behavior:
Since we disabled beta Rust tests in PR #4930 because the parameter
downloads were unstable with beta Rust, we're no longer testing it.
Expected behavior:
Re-enable beta Rust tests on the CI OSes
Solution:
Remove the parameter excluding beta Rust
* ci(concurrency)!: run a single CI workflow as required
Previous behavior:
Multiple Mainnet full syncs were able to run on the main branch at the
same time, and pushing multiple commits to the same branch would run
multiple CI workflows, when only the run from the last commit was relevant.
Expected behavior:
Ensure that only a single CI workflow runs at the same time in PRs.
The latest commit should cancel any previous running workflows from the
same PR.
Solution:
Use GitHub actions concurrency feature https://docs.github.com/en/actions/using-jobs/using-concurrency
Fixes https://github.com/ZcashFoundation/zebra/issues/4977
Fixes https://github.com/ZcashFoundation/zebra/issues/4857
* docs: typo
* ci(concurrency): do not cancel running full syncs
Co-authored-by: teor <teor@riseup.net>
* fix(concurrency): explain the behavior better & add new ones
Co-authored-by: teor <teor@riseup.net>
Previous behavior:
When a push was detected in the `main` branch, the workflow would run the
`versioning` job and crash while trying to detect the version being deployed,
as there was none.
Expected behavior:
Do not fail the `versioning` job when pushing to `main`
Solution:
Limit the `versioning` job to only run when a release event is triggered
and allow the `deploy-nodes` job to run even if `versioning` is skipped
* Show the arguments of acceptance test functions in the logs
* Show all the logs in the "Run tests" jobs
* Document expected "broken pipe" error from `tee`
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* feat(build): deploy long running instances on release
Previous behavior:
Each time we merged to main, new nodes would be deployed. This is
expected behavior, as we need to ensure nodes get deployed and run
without issues, but it could also replace nodes very hastily.
Expected behavior:
We want instances which run for a longer time, to allow us to
troubleshoot issues or inspect the behavior of these instances over longer
periods of time (2+ weeks).
Applied solution:
Deploy a versioned managed instance group (MIG) using the major version
of the release semver. We just use the first part of the version to
replace old instances, and change it when a major version is released,
to keep a separation between new and old versions.
* ci(build): allow v0 as a major version tag
* fix(build): use rust conventions for versioning
* fix(deploy): improve documentation and trigger on release
* Update .github/workflows/continous-delivery.yml
Co-authored-by: teor <teor@riseup.net>
* fix(versioning): typo
* fix(deploy): use `zebrad-v1` as the instance name, with no SHA
* fix(deploy): create and update MiG must use the same name
* docs(deployments): add Continuous Delivery process
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Expand cached state disks before running tests
* Install partition management tool
* There isn't actually a partition on the cached state image
* Make e2fsck non-interactive
* Limit the length of image names to 63 characters
* Ignore possibly long branch names when matching images, just match the commit
* Increase full sync timeout to 24 hours
Expected sync time is ~21 hours as of August 2022.
* Split final checkpoint job into two smaller jobs to avoid timeouts
Also make regexes easier to read.
* Fix a job name typo
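For the disk-expansion items above, a minimal sketch of growing the cached-state filesystem without prompting; the device name is a placeholder, and the image holds a bare filesystem rather than a partition table:
```sh
# Sketch only: check the filesystem non-interactively, then grow it to fill the disk.
# -p ("preen") lets e2fsck fix what it safely can without asking questions.
sudo e2fsck -p -f /dev/sdb
sudo resize2fs /dev/sdb
```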
Previous behavior:
If warnings or errors are added in `.cargo/config.toml` or `clippy.toml`,
and those could generate CI failures, we wouldn't catch them, as the
pipelines are not run when these files are changed.
Expected behavior:
If warnings or errors are added in `.cargo/config.toml` or `clippy.toml`,
run all the build and test jobs which also track a `Cargo.toml`.
Solution:
Add `.cargo/config.toml` and `clippy.toml` as paths to all the required
jobs which need to be triggered when these files change.
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Save cached state on full syncs and updates
* Add an -update suffix to CI images created by updating cached state
* Make disk image names unique by adding a time suffix
* Use the latest image from any branch, but prefer the current commit if available
* Document Zebra's continuous integration tests
* Fix typos in environmental variable names
* Expand documentation
* Fix variable name typo
* Fix shell syntax
Previous behavior:
Sometimes Google Cloud authentication fails; this might happen before
IAM permissions are fully propagated.
Expected behavior:
If the authentication fails, retry at least 3 times before exiting with
a non-zero exit code.
Applied solution:
Google's GitHub Action for auth recently added a `retries` feature,
which is now used to work around this issue.
Note: 95a6bc2a27
Fixes https://github.com/ZcashFoundation/zebra/issues/4846
* update timeout
* update the doc comment
* Increase test timeouts for Zebra update syncs
* Stop failing the 1740k job if the cached state is after block 1740k
Co-authored-by: teor <teor@riseup.net>
* Apply the same Rust logging settings to all GitHub workflows
* Enable full optimisations in dev builds for downloading large parameter files
* Disable beta Rust tests in CI
* Make code execution time logs shorter
* Do ZK parameter preloads in the lightwalletd tests that need them
* Try to re-launch `lightwalletd` when it hangs during sync tests
* Increase full sync timeout
* Clear the `zebrad` logs during `lightwalletd` tests, to avoid logging deadlocks
* Actually clear more than one line of logs
* Check zebrad and lightwalletd output in parallel threads, while waiting for zebrad
* Check zebrad and lightwalletd output in parallel threads, while waiting for lightwalletd
* Improve test logging
* Fix a log typo
* Only wait for lightwalletd once, because its logs stop after the initial sync
* Look for cached state disks for this commit and branch first
* Only copy the state once in the send transactions test
* Wait longer for lightwalletd gRPC server startup
* Add some function docs
* cargo fmt --all
* Fix clippy::let_and_return
* Increase lightwalletd test timeouts for zebrad slowness
* Add a `zebrad_update_sync()` test, that update syncs Zebra without lightwalletd
* Run the zebrad-update-sync test in CI
* Add extra zebrad time to workaround lightwalletd bugs
* Initialize the rayon threadpool with a new config for CPU-bound threads
* Verify proofs and signatures on the rayon thread pool
* Only spawn one concurrent batch per verifier, for now
* Allow tower-batch to queue multiple batches
* Fix up a potentially incorrect comment
* Rename some variables for concurrent batches
* Spawn multiple batches concurrently, without any limits
* Simplify batch worker loop using OptionFuture
* Clear pending batches once they finish
* Stop accepting new items when we're at the concurrent batch limit
* Fail queued requests on drop
* Move pending_items and the batch timer into the worker struct
* Add worker fields to batch trace logs
* Run docker tests on PR series
* During full verification, process 20 blocks concurrently
* Remove an outdated comment about yielding to other tasks
* Make the release checklist shorter and hide some details
* Ignore any `fastmod` updates to previous release notes in `CHANGELOG.md`
* Use recent versions in examples
* Fix markdown that doesn't render correctly
* Fix some weird line breaks
* Use capital letters to start list items
* Clarify `fastmod` and `CHANGELOG.md`
* Clarify version format by changing highlighting
* Checkout zebra in each job to avoid warnings
But put TODOs where we might be able to skip checkouts
* Split log following into sprout checkpoints, sapling/orchard checkpoints, and full validation
* Make job IDs shorter
* Use /dev/stderr because docker doesn't have a tty
* remove pipefail
* Revert "remove pipefail"
This reverts commit a7ee37bebdc107a4215e7dd307b189d925969234.
* Make tee ignore errors writing to a grep pipe
* Avoid launching multiple docker instances for duplicate jobs
* Ignore broken pipe error messages and statuses
* fix(ci): docker wait not finding container
We had this issue before; I can't recall if it was a parsing error between GitHub Actions and gcloud `--command` parsing, but we had to split this into two pieces.
This implementation keeps it the way we did it before: 9b9578c999/.github/workflows/test.yml (L235-L243)
* docs: remove pending TODO
We can't remove `actions/checkout` and set `create_credentials_file` to `false`, as the next steps won't be able to authenticate to GCP.
We could remove `actions/checkout` and leave `create_credentials_file` as `true`, but this will raise a warning on each step, and there's no benefit in doing so.
* Show `docker wait` and `gcloud ssh` output
* If `docker wait` fails, get the exit code using `docker inspect`
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
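A condensed sketch of the launch/logs/wait flow and the exit-status fallback described above; the container name, image, and success pattern are placeholders:
```sh
# Sketch only: launch the test container, follow its logs until the success
# pattern appears, then recover the container's exit status.
docker run --detach --name zebrad-tests zebrad-test-image

# tee to /dev/stderr keeps the full log visible even though `grep -q` stops
# reading early; the resulting broken pipe from tee is expected and ignored.
(docker logs --follow zebrad-tests 2>&1 || true) \
  | tee /dev/stderr | grep -q "test result: ok" || true

# `docker wait` blocks until the container stops and prints its exit code;
# if it fails, fall back to `docker inspect`.
if ! EXIT_CODE=$(docker wait zebrad-tests); then
  EXIT_CODE=$(docker inspect --format='{{.State.ExitCode}}' zebrad-tests)
fi
exit "${EXIT_CODE}"
```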
* Put arguments to "docker run" on different lines
And update some comments.
* Split docker run into launch, logs, and wait
* Remove mistaken "needs state" condition on log and results job
* Exit the ssh and the job with the container test's exit status
* Split full sync into checkpoint and full validation
* Sort workflow variables into categories and add descriptions
* Split Create instance/volume and Run test into separate jobs
* Copy initial conditions to all jobs in the series
* Actually create a cached state image
* fix(state): use same disk naming convention for all test instances
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
* feat(ci): build each crate individually
* fix(ci): use valid names for each job
* feat(ci): builds and checks with and without all features
* refactor(ci): build job matrix dynamically
* fix: use a JSON_CRATES variable with resulting values
* test: check-matrix
* fix(ci): use "crate" in singular for reference
* imp(ci): use a matrix for feature build arguments
* fix(ci): use correct naming and includes
* fix(ci): implement most recommendations given in review
* fix(ci): use simpler shell script
* fix: typo
* fix: add string to file, not cmd
* fix: some shellchecks
* fix(ci): remove warnings and errors from shellcheck
* imp(ci): add patch file for `Build crates individually` workflow
* Remove unused configs in patch job
Co-authored-by: teor <teor@riseup.net>
* feat(actions): delete old GCP resources
* fix(ci): delete old instances templates
* fix(actions): use correct date arguments and conversion
* fix(actions): missing command in gcloud
* fix(gcp): if an instance can't be deleted, continue
* refactor(action): clean up and execute monthly
* increase lightwalletd timeout
* switch back to aditya's fork
* manually point to aditya's new lightwalletd image
* disable sync_one_checkpoint_testnet test
* disable restart_stop_at_height in testnet
* revert to 'latest' lightwalletd image
* Remove a duplicate lightwalletd error message
* Reactivate some error messages that have been fixed
* Fix confusing lightwalletd cached state path logs
* Add the gRPC tests to the lightwalletd test suite function
* Make test regexes compatible with zcash/lightwalletd
* Add logging to gRPC tests
* Switch to zcash/lightwalletd for testing
* Upgrade tracing and related dependencies
```sh
cargo upgrade --workspace
tracing-error
tracing-subscriber
color-eyre
tracing-flame
tracing-journald
sentry
sentry-tracing
metrics
metrics-exporter-prometheus
reqwest
```
* Update duplicate dependency checks
* Enable the tracing/env-filter feature
* Fix type inference for metrics
Manual changes, plus:
```sh
fastmod "as _" "as f64"
```
* Tidy up some unrelated test code
* Update metrics-exporter-prometheus API
And make unused dependencies optional.
* Adjust test regexes to new tracing format
Also fix some regex bugs, and refactor to simplify.
* Disable color-eyre span traces and track caller in release builds
* Add a feature that enables extra debugging in release builds
* Clean up some redundant features
* Increase a test timeout