* Revert "ci(ssh): connect using `ssh-compute` action by Google (#5330)"
This reverts commit b366d6e7bb.
* ci(ssh): use sudo for docker commands if user is not root
* ci(ssh): specify the service account to connect with
* ci(ssh): increase the Google Cloud instance sshd connection limit
* chore: add a new line at the end of the script
* chore: update our VM image to bullseye
* chore: fix `tj-actions/changed-files` file comparison
* ci(disk): use an official image on CI VMs for disk auto-resizing
Previous behavior:
We've had issues in the past when resizing because the device was busy,
for example:
```
e2fsck: Cannot continue, aborting.
/dev/sdb is in use.
```
Expected behavior:
We've been resizing the disk manually because this was not being done
automatically. An official public image from GCP makes the resize
automatic, and it also integrates better with other GCP services.
Configuration differences: https://cloud.google.com/compute/docs/images/os-details#notable-difference-debian
Solution:
- Use `debian-11` from the official public images https://cloud.google.com/compute/docs/images/os-details#debian
- Remove the manual disk resizing from the pipeline
* ci: increase VM disk size to fit future cached states sizes
Some GCP disk images are already 160 GB, so they could soon reach the
current 200 GB disk size.
* ci(concurrency)!: run a single CI workflow as required
Previous behavior:
Multiple Mainnet full syncs were able to run on the main branch at the
same time, and pushing multiple commits to the same branch would run
multiple CI workflows, even though only the run from the last commit was
relevant.
Expected behavior:
Ensure that only a single CI workflow runs at the same time in PRs.
The latest commit should cancel any previous running workflows from the
same PR.
Solution:
Use the GitHub Actions concurrency feature: https://docs.github.com/en/actions/using-jobs/using-concurrency
Fixes https://github.com/ZcashFoundation/zebra/issues/4977
Fixes https://github.com/ZcashFoundation/zebra/issues/4857
* docs: typo
* ci(concurrency): do not cancel running full syncs
Co-authored-by: teor <teor@riseup.net>
* fix(concurrency): explain the behavior better & add new ones
Co-authored-by: teor <teor@riseup.net>
Previous behavior:
When a push was detected in the `main` branch, the workflow would run the
`versioning` job and crash trying to detect the version being deployed as
there was none.
Expected behavior:
Do not fail the `versioning` job when pushing to `main`
Solution:
Limit the `versioning` job to only run when a release event is triggered
and allow the `deploy-nodes` job to run even if `versioning` is skipped
* feat(build): deploy long running instances on release
Previous behavior:
Each time we merged to main, new nodes would be deployed. This is
expected, as we need to ensure nodes get deployed and run without
issues, but it could also replace nodes too quickly.
Expected behavior:
We want instances that run for a longer time, so we can troubleshoot
issues or inspect the behavior of these instances over longer periods
of time (2+ weeks).
Applied solution:
Deploy a versioned managed instance group (MiG) named after the major
version of the release semver. Only the first part of the version is
used, so releases within the same major version replace the old
instances, and a new major version gets a new group, keeping new and
old versions separate.
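As a rough illustration of the naming scheme described above (not the
workflow's actual implementation), this Python sketch derives a group
name from the major part of a release tag. The `zebrad-` prefix matches
the instance name used later in this change. The function names and the
`v1.2.3` tag format are assumptions made for the example.

```python
import re


def major_version_tag(release_tag: str) -> str:
    """Return the major part of a semver-style tag, e.g. 'v1' for 'v1.2.3'."""
    # Assumption: release tags look like 'v1.2.3' (the leading 'v' is optional).
    match = re.match(r"^v?(\d+)\.\d+\.\d+", release_tag)
    if match is None:
        raise ValueError(f"not a semver release tag: {release_tag!r}")
    return f"v{match.group(1)}"


def mig_name(release_tag: str, prefix: str = "zebrad") -> str:
    """Build a group name that only changes when the major version changes."""
    return f"{prefix}-{major_version_tag(release_tag)}"


print(mig_name("v1.0.0"))  # zebrad-v1
print(mig_name("v1.4.2"))  # zebrad-v1 (same group, instances are replaced)
print(mig_name("v2.0.0"))  # zebrad-v2 (new group for the new major version)
```

Because minor and patch releases map to the same name, they update the
existing group, while a major release produces a separate one.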
* ci(build): allow v0 as a major version tag
* fix(build): use rust conventions for versioning
* fix(deploy): improve documentation and trigger on release
* Update .github/workflows/continous-delivery.yml
Co-authored-by: teor <teor@riseup.net>
* fix(versioning): typo
* fix(deploy): use `zebrad-v1` as the instance name, with no SHA
* fix(deploy): create and update MiG must use the same name
* docs(deployments): add Continuous Delivery process
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Previous behavior:
Google Cloud authentication sometimes fails. This can happen before
IAM permissions are fully propagated.
Expected behavior:
If the authentication fails, retry at least 3 times before exiting with
a non-zero exit code.
Applied solution:
The Google GitHub Action for auth recently added a `retries` feature,
which is now used to work around this issue.
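The retry behavior described above can be pictured with a generic
Python sketch. This is not how the auth action implements `retries`.
The helper `with_retries` and the simulated `flaky_auth` step are
hypothetical, and the sketch assumes "retry 3 times" means up to three
further attempts after the first failure.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], retries: int = 3,
                 delay_seconds: float = 0.0) -> T:
    """Call `action`, retrying up to `retries` more times if it raises."""
    if retries < 0:
        raise ValueError("retries must be zero or greater")
    attempts = retries + 1  # the first attempt plus the retries
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")


# Hypothetical stand-in for an authentication step that fails until
# permissions have propagated.
calls = {"count": 0}


def flaky_auth() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("permissions not propagated yet")
    return "authenticated"


print(with_retries(flaky_auth, retries=3))  # authenticated
```

If every attempt fails, the last error is re-raised, which a calling
script can turn into a non-zero exit code.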
Note: 95a6bc2a27
Fixes https://github.com/ZcashFoundation/zebra/issues/4846