* only run the send transaction test on the main branch
* adds patch job
* Add concurrency rule to the send transactions test
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* Allow send tx test to be triggered manually
Co-authored-by: teor <teor@riseup.net>
* test all features
* increase task timeout
* add a new task for feature tests
* add `getblocktemplate-rpcs` tests to docker integration
* run the getblocktemplate-rpcs feature as a separated step in docker
* move getblocktemplate-rpcs in docker as a separated task
* ci(sync): only run the `lightwalletd` full sync on the `main` branch
Previous behavior:
In PR #5164, we made lightwalletd sync all the way to the tip in its full
sync test.
This increases that test's time from 1 hour to 4 hours, which makes the CI
we run on each PR change increase from 3 hours to 6 hours.
Expected behavior:
Run the lightwalletd full sync just on `main` or if a state disk for the
actual version is not found.
Solution:
Add the `github.event_name == 'push' && github.ref_name == 'main'` condition
to the `lightwalletd-full-sync` test.
Fixes#5316
* Allow lwd full syncs to be triggered manually (#5400)
* Limit checkpoint and lwd full sync concurrency
* Add a patch job for lightwalletd-full-sync
Co-authored-by: teor <teor@riseup.net>
* Revert "ci(ssh): connect using `ssh-compute` action by Google (#5330)"
This reverts commit b366d6e7bb.
* ci(ssh): use sudo for docker commands if user is not root
* ci(ssh): specify the service account to connect with
* ci(ssh): increase the Google Cloud instance sshd connection limit
* chore: add a new line at the end of the script
* chore: update our VM image to bullseye
* chore: fix `tj-actions/changed-files` file comparison
* ci(disk): use an official image on CI VMs for disk auto-resizing
Previous behavior:
We've presented issues in the past with resizing as the device is busy,
for example:
```
e2fsck: Cannot continue, aborting.
/dev/sdb is in use.
```
Expected behavior:
We've been manually resizing the disk as this task was not being done
automatically, but having an official Public Image from GCP would make
this easier (automatic) and it also integrates better with other GCP
services
Configuration differences: https://cloud.google.com/compute/docs/images/os-details#notable-difference-debian
Solution:
- Use `debian-11` from the official public images https://cloud.google.com/compute/docs/images/os-details#debian
- Remove the manual disk resizing from the pipeline
* ci: increase VM disk size to fit future cached states sizes
Some GCP disk images are 160 GB, which means they could get to the current
200 GB size soon.
* Increment Zebra versions
* Initial draft changelog
* Add blog post to the release checklist
* Say "user testing"
Co-authored-by: teor <teor@riseup.net>
* Revert "ci(ssh): connect using `ssh-compute` action by Google (#5330)"
This reverts commit b366d6e7bb.
* ci(ssh): use sudo for docker commands if user is not root
* ci(ssh): specify the service account to connect with
* ci(ssh): increase the Google Cloud instance sshd connection limit
* chore: add a new line at the end of the script
* chore: update our VM image to bullseye
* chore: fix `tj-actions/changed-files` file comparison
* Add latest and edge tags to Docker images
* Document how latest tag actually works
* Try a different syntax for is_default_branch
* Try again
* One last try
* Revert changes that don't work
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Add a Docker run command to the README
* Update the README with some user-relevant release candidate goals
* Update the release template for the release candidate
* Fix beta crate explanation
* Be more specific about what "this PR" means
* Update docker command for latest tag changes
* Update README Docker command based on tag changes
* Make Zebra release versions more vague in README.md
Co-authored-by: Pili Guerra <mpguerra@users.noreply.github.com>
* Move build instructions to build section
Co-authored-by: Pili Guerra <mpguerra@users.noreply.github.com>
* Add newlines to separate heading and paragraphs
* Remove extra newline
* Add a note for a future command update
* Remove manual build check, it doesn't have tier 1 support
Co-authored-by: Pili Guerra <mpguerra@users.noreply.github.com>
* refactor(ssh): connect using `ssh-compute` action by Google
Previous behavior:
From time to time SSH connections to deployed VMs fails with the following
error: `kex_exchange_identification: Connection closed by remote host`
This was still happening after implementing https://github.com/ZcashFoundation/zebra/pull/5292
Excpected behavior:
Ensure we're not creating SSH key pairs on the fly to improve our connections
guarantees
Solution:
- Enable the Cloud Identity-Aware Proxy API in GCP
- Create a firewall rule to enable connections from IAP
- Grant the required IAM permissions to enable IAP TCP forwarding
- Generate an SSH keys pair and set a private key as an input param
- Set the GitHub Action SA to have authorized ssh connection to the VMs
- Implement the `google-github-actions/ssh-compute` action to connect
* fix(ssh): id `compute-ssh` cannot be used more than once within the same scope
* fix(ci): try to enclose commands to override parsing issues
* tmp: remove ssh_args
* fix(action): secrets must be inherited to be used
* tmp: validate command enclosing fixes executin
* fix(ssh): ssh_args are not implemented correctly
* fix(ssh): login with the root user
* fix(privelege): uso sudo with docker commands
* tmp: add sudo
* fix(ssh): use sudo for all docker commands
* fix(ssh): add missing `sudo` commands
* fix(ssh): get sync height from ssh stdout
* fix(height): get the height correctly
Previous behavior:
The following error was causing an exit 1 in GitHub Actions when a pushing
to the `main` branch
```
Error: Similar commit hashes detected: previous sha is equivalent to the
current sha
```
Expeceted behavior:
Allow the linter to run succesfully even if the previous SHA has no files
changed
Solution:
Add `fetch-depth: 2` to retrieve the preceding commit
Previous behavior
From time to time SSH connections to deployed VMs fails with the following
error: `kex_exchange_identification: Connection closed by remote host`
Expected behavior
If the connection fails, attempt to reconnect once again (or multiple times)
Solution
Add the `ConnectionAttempts` and `ConnectTimeout` with 20 and 5 values
respectively, which attempst to reconnect 19 more times every 5 seconds
* Explain how to use the release template
I always have to look this up every time.
Also delete a long description of semantic versioning.
* Delete extra info that is already in the template elsewhere
* Explain how `Cargo.lock` gets updated
* Use a branch name that Google Cloud will accept
* Update release instructions for Docker binaries
* Add extra release testing steps
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
* Combine high and medium queues into a batched queue
* Explain how to check config syntax
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Update release-drafter.yml
* Explain where we got the workflow from
* Automatically add "trivial" label to dependabot updates
* Add categories and auto-labels to release drafter
* Update release PR template for automatic release drafter versions
* Also strip PR series numbers and leading spaces from changelog entries
* Update release note version check
* Update label names
* Add missing ! in conventional commits regex
Co-authored-by: Marek <mail@marek.onl>
* Make versioning steps more specific
Co-authored-by: Marek <mail@marek.onl>
* Remove conflicting detailed versioning explanations
Co-authored-by: Marek <mail@marek.onl>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Use branch then main build caches
* Revert cache order to try branch cache first
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Try running coverage with Rust 1.63
* Run GitHub Actions tests with Rust 1.63
* Change from stable to 1.63 in the patch file
* Use Rust 1.63 to download Zcash parameters
* Use Rust 1.63 to build Docker zebrad images
* Make Rust 1.63 a supported platform, and make stable temporarily unsupported
* Delete test instances after 3 days
* Use correct delete command, improve shell quoting
* Use sed to provide the correct zone or region
* Fix quoting
* Fix IFS
* Fix IFS for multiple disks
* Document why we can't quote some shell variables
* Document that instances can get deleted
* Fix exact names in deletion docs
* feat(release): create Docker hub binaries when tagging
* fix(release): add a release workflow for binaries
* fix(release): trigger on tag creation, not pushing to it
* fix(release): use the same conditions for logging into DockerHub
* fix(release): add missing parameter to access GH secrets
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* ci(release): just publish to DockerHub when a release is published
* Apply suggestions from code review
Co-authored-by: teor <teor@riseup.net>
* ci(release): filter prerelease event correctly
* ci(release): fix tags
* ci(release): use `zebra` and not `zebrad` as the repository
* ci(release): do not try to login to Docker if not a release
* Update .github/workflows/build-docker-image.yml
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: teor <teor@riseup.net>
* Increase search range for sync height
* Update sync height regexes for zebrad and lwd cached states
* Add labels to cached state images
* Update deploy-gcp-tests.yml
* Don't create new cached states for lwd updates
* Add a missing line continuation
* Fix a comment
* Revert a mistaken comment change
* Clarify a TODO comment
* Partially revert to old docker height log handling
* Use an output for the cached disk name
* Disable fmt cache and create shared clippy cache
* Make Cargo.lock check use the shared clippy cache
* Add a TODO for Windows Rust cache path
* Fix quoting for Windows path
* Use correct sharedKey spelling
* Increase search range for sync height
* Update sync height regexes for zebrad and lwd cached states
* Add labels to cached state images
* Add a missing line continuation
Previous behavior:
The `tj-actions/changed-files` crashed when making pushes to main, as no
fetch depth was defined on the previous checkout action. Which is now r
required after b216561b5b
Expected behavior:
Do not fail with this new requirement
Solution:
Change the chekout action `fetch-depth` to 2, allowing to compare with
the previous commit
* bump prost, tonic and tonic-build
* add protoc as a dependency step in the CI
* bump console-subscriber
* add protoc to `build-crates-individually`
* add protoc to docs build
* install protoc in lint.yml
* change protoc installation location in lint.yml
* add protoc to `Check Cargo.lock is up to date`
* ci(build): keep protoc pinned to the same major version
* ci(build): avoid rate limiting with `arduino/setup-protoc@v1`
* cargo upgrade --workspace console-subscriber
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Co-authored-by: teor <teor@riseup.net>
* Fix delete GCP resources commands
* Don't create a GCP credentials file
* Keep the latest 2 images
* Explain time
* Show the names of disks that are being deleted
* Actually run the image delete steps
* Only delete commit-based instance templates
* Document automated deletion
Previous behavior:
Disk images are being accumulated in GCP for a few years, but this
generates unneeded costs as we're not using images older than 1-2 weeks.
Expected behavior:
Delete previously unused images based on a timefrime.
Solution:
Delete images created on a pull request older than 30 days, from the
`main` branch if older than 60 days, and any other image older than 90
days.
A TODO is on place as we'd like to keep at least the 2 latest images of
each type (zebra checkpoint, zebra tip, lwd tip). Once we've excluded
those images, we can delete any older images after 1 week.
Previous behavior:
As we disabled beta Rust tests in PR #4930, because the parameter
downloads were unstable with beta Rust, we're no longer testing it.
Expected behavior:
Re-enable beta rust tests in CI OSes
Solution:
Remove the parameter exluding beta Rust
* ci(concurrency)!: run a single CI workflow as required
Previous behavior:
Multiple Mainnet full syncs were able to run on the main branch at the
same time, and pushing multiple commits to the same branch would run
multiple CI workflows, when only the run from last commit was relevant
Expected behavior:
Ensure that only a single CI workflow runs at the same time in PRs.
The latest commit should cancel any previous running workflows from the
same PR.
Solution:
Use GitHub actions concurrency feature https://docs.github.com/en/actions/using-jobs/using-concurrency
Fixes https://github.com/ZcashFoundation/zebra/issues/4977
Fixes https://github.com/ZcashFoundation/zebra/issues/4857
* docs: typo
* ci(concurrency): do not cancel running full syncs
Co-authored-by: teor <teor@riseup.net>
* fix(concurrency): explain the behavior better & add new ones
Co-authored-by: teor <teor@riseup.net>
Previous behavior:
When a push was detected in the `main` branch, the workflow would run the
`versioning` job and crash trying to detect the version being deployed as
there was none.
Expected behavior:
Do not fail the `versioning` job when pushing to `main`
Solution:
Limit the `versioning` job to only run when a release event is triggered
and allow the `deploy-nodes` job to run even if `versioning` is skipped
* Show the arguments of acceptance test functions in the logs
* Show all the logs in the "Run tests" jobs
* Document expected "broken pipe" error from `tee`
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* feat(build): deploy long running instances on release
Previous behavior:
Each time we merged to main new nodes would be deployed, this is an
expected behavior as we need to ensure nodes get deployed and run
without issues, but this could also replace nodes very hastily.
Expected behavior:
We want instances which would run for a longer time, to allow us to
troubleshoot issues or inspect the behavior of this instances for longer
periods of time (2+ weeks)
Applied solution:
Deploy a versioned manage instance group (MiG) using the major version
of the release semver. We just use the first part of the version to
replace old instances, and change it when a major version is released
to keep a segregation between new and old versions.
* ci(build): allow v0 as a major version tag
* fix(build): use rust conventions for versioning
* fix(deploy): improve documentation and trigger on release
* Update .github/workflows/continous-delivery.yml
Co-authored-by: teor <teor@riseup.net>
* fix(versioning): typo
* fix(deploy): use `zebrad-v1` as the instance name, with no SHA
* fix(deploy): create and update MiG must use the same name
* docs(deployments): add Continuous Delivery process
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Expand cached state disks before running tests
* Install partition management tool
* There isn't actually a partition on the cached state image
* Make e2fsck non-interactive
* Limit the length of image names to 63 characters
* Ignore possibly long branch names when matching images, just match the commit
* Increase full sync timeout to 24 hours
Expected sync time is ~21 hours as of August 2022.
* Split final checkpoint job into two smaller jobs to avoid timeouts
Also make regexes easier to read.
* Fix a job name typo
Previous behavior:
If warnings or error are added in `.cargo/config.toml` or `clippy.toml`,
and those could generate CI failures, we wouldn't catch those new as the
pipelines are not run when this files are changed
Expected behavior:
If warnings or error are added in `.cargo/config.toml` or `clippy.toml`,
run all the builds and test jobs which also track a `Cargo.toml`.
Solution:
Add `.cargo/config.toml` and `clippy.toml` as paths to all the required
jobs which needs to be triggered when these files changes.
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Save cached state on full syncs and updates
* Add an -update suffix to CI images created by updating cached state
* Make disk image names unique by adding a time suffix
* Use the latest image from any branch, but prefer the current commit if available
* Document Zebra's continuous integration tests
* Fix typos in environmental variable names
* Expand documentation
* Fix variable name typo
* Fix shell syntax
Previous behavior:
Sometimes Google Cloud authentication fails, this might happen before
IAM permissions are fully propagated
Expected behavior:
If the authentication fails, retry at least 3 times before exiting with
a non zero exit code
Applied solution:
Google GitHub Actions for auth recently added this a `retries` feature
which is now implemented to workaround this issue.
Note: 95a6bc2a27
Fixes https://github.com/ZcashFoundation/zebra/issues/4846
* update timeout
* update the doc comment
* Increase test timeouts for Zebra update syncs
* Stop failing the 1740k job if the cached state is after block 1740k
Co-authored-by: teor <teor@riseup.net>
* Apply the same Rust logging settings to all GitHub workflows
* Enable full optimisations in dev builds for downloading large parameter files
* Disable beta Rust tests in CI