Commit Graph

60 Commits

Author SHA1 Message Date
dependabot[bot] 2117ee403a
build(deps): bump actions/checkout from 3.2.0 to 3.3.0 (#5918)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.2.0 to 3.3.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3.2.0...v3.3.0)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-05 18:08:18 +00:00
dependabot[bot] baf9784876
build(deps): bump shimataro/ssh-key-action from 2.4.0 to 2.5.0 (#5896)
Bumps [shimataro/ssh-key-action](https://github.com/shimataro/ssh-key-action) from 2.4.0 to 2.5.0.
- [Release notes](https://github.com/shimataro/ssh-key-action/releases)
- [Changelog](https://github.com/shimataro/ssh-key-action/blob/develop/CHANGELOG.md)
- [Commits](https://github.com/shimataro/ssh-key-action/compare/v2.4.0...v2.5.0)

---
updated-dependencies:
- dependency-name: shimataro/ssh-key-action
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-02 13:59:00 +00:00
dependabot[bot] 3e00426de4
build(deps): bump actions/checkout from 3.1.0 to 3.2.0 (#5855)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.1.0 to 3.2.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3.1.0...v3.2.0)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-13 21:24:49 +00:00
teor d8834c010e
fix(ci): Increase full sync jobs and timeout (#5781)
* Remove a redundant sprout full sync job

* Add two new full sync jobs

* Allow the full sync test to run for 48 hours (estimated current time 40-45 hours)
2022-12-06 11:36:05 +10:00
teor a763eec9f3
fix(ci): Fix network parameter in continous-delivery.yml, and add network labels to GCP jobs (#5710)
* chore: add Network as a label

* Fix network parameter in continous-delivery.yml

* Standardise network usage in zcashd-manual-deploy

* Use lowercase network labels

* Fix some shellcheck errors

* Hard-code a Mainnet default to support contexts where env is not available

* Fix string syntax

* Fix more shellcheck errors

* Update .github/workflows/zcashd-manual-deploy.yml

Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Co-authored-by: Arya <aryasolhi@gmail.com>
2022-11-25 21:11:22 +00:00
Gustavo Valverde c7745415b6
feat(gcp): add label to instances for cost and logs grouping (#5693)
* feat(gcp): add label to instances for cost and logs grouping

Previous behavior:
We couldn't search GCP logs by instance name if that instance had
already been deleted. It was also difficult to tell whether specific tests
or instance types were responsible for a given percentage of our costs.

Fixes #5153
Fixes #5543

Expected behavior:
Be able to search logs using the test ID or at least the github reference,
and be able to group GCP costs by labels

Solution:
- Add labels to instances

* chore: add Network as a label

* Revert "chore: add Network as a label"

This reverts commit 146f747d50.

* Update .github/workflows/zcashd-manual-deploy.yml

Co-authored-by: teor <teor@riseup.net>

Co-authored-by: teor <teor@riseup.net>
2022-11-24 03:34:31 +00:00
Gustavo Valverde e10b522bf8
ci: filter READY images before using them on test machines (#5696) 2022-11-23 03:53:49 +00:00
Gustavo Valverde 7353a9be5b
fix(ssh): add a fixed SSH key to use with `gcloud` (#5671)
* fix: use a fixed ssh key for `gcloud compute ssh`

* fix: typo

* fix: add missing SSH key installation steps
2022-11-21 18:18:26 +00:00
Gustavo Valverde 7b73aa0c84
ci: use Container-Optimized OS public image on the VM (#5617)
* ci(compute): use debian public image on the VM, not the container

Previous behavior:

We were pulling the Debian image the wrong way: it was being used
as a container image, but it was meant to be the VM image.

The image being pulled to create the internal container has been causing
crashes, as these images do not exist in Google's container repositories.

Expected behavior:

Use a public image such as debian-11 to get multiple benefits, such as being
able to use machine-images (#5615) and automatic disk resizing (which
is now possible as we're using COS images, but those are more restrictive)

Solution

Add `--image-project=debian-cloud` and `--image-family=debian-11` as
stated in the official documentation: https://cloud.google.com/sdk/gcloud/reference/compute/instances/create-with-container#--image-project

More info: https://cloud.google.com/compute/docs/images/os-details#import

* fix: use a public image with docker on the host

* fix(logs): missing sudo before docker command

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-11-16 23:08:28 +00:00
Gustavo Valverde 844ebf0dbd
feat(ssh): enable OS Login for GCP test instances (#5602)
* feat(ssh): enable OS Login for GCP test instances

* fix(ssh): force service account impersonation for OS Login

* debug: show actual user trying to impersonate SA

* fix(gcloud): configure gcloud before running commands

* fix(ssh): add VM zone to ssh command

* fix(auth): bringing changes from #5614

* fix(auth): impersonation is working as expected now

* fix(gcloud): setup the GCP CLI after authenticating (#5606)

Previous behavior:
`gcloud` commands have been running without appropriate authentication:
the `auth` action executed successfully, but the actual gcloud
CLI being used in further jobs was not using the correct configuration
or credentials.

Expected behavior:
All `gcloud` commands should be properly configured and authenticated.

Solution:
Add the `google-github-actions/setup-gcloud` action after each
`google-github-actions/auth` invocation, and before running any `gcloud`
command.

Remove the need for an OAuth access token when the following steps
do not require it.

* fix(auth): revert to latest version

* fix: wrong replace

* fix(ci): use a specific debian image for VM containers

* fix(ssh): delete generated SSH keys by CI after 30 seconds

* debug: remove debug commands

* fix(compute): use a lightweight container image

* fix(ci): add missing sudo to docker command

* Update .github/workflows/deploy-gcp-tests.yml

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>

* fix(ssh): delete ssh-keys for the specific GHA service account

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
2022-11-16 14:27:09 +00:00
dependabot[bot] cbc2e393ca
build(deps): bump google-github-actions/setup-gcloud from 1.0.0 to 1.0.1 (#5614)
Bumps [google-github-actions/setup-gcloud](https://github.com/google-github-actions/setup-gcloud) from 1.0.0 to 1.0.1.
- [Release notes](https://github.com/google-github-actions/setup-gcloud/releases)
- [Changelog](https://github.com/google-github-actions/setup-gcloud/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/setup-gcloud/compare/v1.0.0...v1.0.1)

---
updated-dependencies:
- dependency-name: google-github-actions/setup-gcloud
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-10 17:16:52 +00:00
Gustavo Valverde a815e9d252
fix(gcloud): setup the GCP CLI after authenticating (#5606)
Previous behavior:
`gcloud` commands have been running without appropriate authentication:
the `auth` action executed successfully, but the actual gcloud
CLI being used in further jobs was not using the correct configuration
or credentials.

Expected behavior:
All `gcloud` commands should be properly configured and authenticated.

Solution:
Add the `google-github-actions/setup-gcloud` action after each
`google-github-actions/auth` invocation, and before running any `gcloud`
command.

Remove the need for an OAuth access token when the following steps
do not require it.
2022-11-10 06:32:21 +00:00
dependabot[bot] be24a364da
build(deps): bump google-github-actions/auth from 0.8.3 to 1.0.0 (#5596)
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 0.8.3 to 1.0.0.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v0.8.3...v1.0.0)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-10 00:12:39 +00:00
teor f94231fe34
fix(ci): Stop using multiple jobs for quick Google Cloud tests (#5560)
* Only run multiple test jobs if they are needed for a long test

* Remove unused job steps

* Remove trailing whitespace

* Follow logs in the Run step

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-11-07 22:29:37 +00:00
Arya 2cadb7304b
ci(sync): increase the height of blocks for some full sync jobs (#5391)
* adds extra step to CI docker before logs-checkpoint

* replaces logs-1790k with logs-1800k

* Update .github/workflows/deploy-gcp-tests.yml

Co-authored-by: teor <teor@riseup.net>

* Update .github/workflows/deploy-gcp-tests.yml

Co-authored-by: teor <teor@riseup.net>

Co-authored-by: Arya <arya2@users.noreply.github.com>
Co-authored-by: teor <teor@riseup.net>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-10-21 06:01:06 +00:00
dependabot[bot] 3cbf8dafaf
build(deps): bump google-github-actions/auth from 0.8.2 to 0.8.3 (#5413)
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 0.8.2 to 0.8.3.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v0.8.2...v0.8.3)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-17 21:54:05 +00:00
dependabot[bot] 291b85d8ec
build(deps): bump google-github-actions/auth from 0.8.1 to 0.8.2 (#5404)
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 0.8.1 to 0.8.2.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v0.8.1...v0.8.2)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-16 12:34:36 +00:00
Gustavo Valverde 25b46ea0ec
ci(disk): use an official GCP image on CI VMs for disk auto-resizing, make CI & CD disks 300GB (#5371)
* Revert "ci(ssh): connect using `ssh-compute` action by Google (#5330)"

This reverts commit b366d6e7bb.

* ci(ssh): use sudo for docker commands if user is not root

* ci(ssh): specify the service account to connect with

* ci(ssh): increase the Google Cloud instance sshd connection limit

* chore: add a new line at the end of the script

* chore: update our VM image to bullseye

* chore: fix `tj-actions/changed-files` file comparison

* ci(disk): use an official image on CI VMs for disk auto-resizing

Previous behavior:
We've had issues in the past with resizing, as the device is busy,
for example:

```
e2fsck: Cannot continue, aborting.
/dev/sdb is in use.
```

Expected behavior:
We've been manually resizing the disk as this task was not being done
automatically, but having an official Public Image from GCP would make
this easier (automatic) and it also integrates better with other GCP
services

Configuration differences: https://cloud.google.com/compute/docs/images/os-details#notable-difference-debian

Solution:
- Use `debian-11` from the official public images https://cloud.google.com/compute/docs/images/os-details#debian
- Remove the manual disk resizing from the pipeline

* ci: increase VM disk size to fit future cached states sizes

Some GCP disk images are 160 GB, which means they could get to the current
200 GB size soon.
2022-10-16 08:01:59 -04:00
Gustavo Valverde 658fbd923a
ci(ssh): revert using `ssh-compute` action & increase sshd connection limit (#5367)
* Revert "ci(ssh): connect using `ssh-compute` action by Google (#5330)"

This reverts commit b366d6e7bb.

* ci(ssh): use sudo for docker commands if user is not root

* ci(ssh): specify the service account to connect with

* ci(ssh): increase the Google Cloud instance sshd connection limit

* chore: add a new line at the end of the script

* chore: update our VM image to bullseye

* chore: fix `tj-actions/changed-files` file comparison
2022-10-11 00:11:49 +00:00
dependabot[bot] 58b0ed1d85
build(deps): bump actions/checkout from 3.0.2 to 3.1.0 (#5329)
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.0.2 to 3.1.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v3.0.2...v3.1.0)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-10-05 21:45:06 +00:00
Gustavo Valverde b366d6e7bb
ci(ssh): connect using `ssh-compute` action by Google (#5330)
* refactor(ssh): connect using `ssh-compute` action by Google

Previous behavior:
From time to time SSH connections to deployed VMs fail with the following
error: `kex_exchange_identification: Connection closed by remote host`

This was still happening after implementing https://github.com/ZcashFoundation/zebra/pull/5292

Expected behavior:
Ensure we're not creating SSH key pairs on the fly, to improve our connection
guarantees

Solution:
- Enable the Cloud Identity-Aware Proxy API in GCP
- Create a firewall rule to enable connections from IAP
- Grant the required IAM permissions to enable IAP TCP forwarding
- Generate an SSH key pair and set the private key as an input param
- Set the GitHub Action SA to have authorized ssh connection to the VMs
- Implement the `google-github-actions/ssh-compute` action to connect

* fix(ssh): id `compute-ssh` cannot be used more than once within the same scope

* fix(ci): try to enclose commands to override parsing issues

* tmp: remove ssh_args

* fix(action): secrets must be inherited to be used

* tmp: validate command enclosing fixes execution

* fix(ssh): ssh_args are not implemented correctly

* fix(ssh): login with the root user

* fix(privilege): use sudo with docker commands

* tmp: add sudo

* fix(ssh): use sudo for all docker commands

* fix(ssh): add missing `sudo` commands

* fix(ssh): get sync height from ssh stdout

* fix(height): get the height correctly
2022-10-05 09:02:40 +00:00
Gustavo Valverde aaad60dec7
ci(deploy): retry ssh connections if it fails (#5292)
Previous behavior
From time to time SSH connections to deployed VMs fail with the following
error: `kex_exchange_identification: Connection closed by remote host`

Expected behavior
If the connection fails, attempt to reconnect once again (or multiple times)

Solution
Add the `ConnectionAttempts` and `ConnectTimeout` options with values of 20 and 5
respectively, which attempts to reconnect up to 19 more times, every 5 seconds
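
As a rough sketch of the solution above, the two options can be passed to a plain `ssh` invocation as shown here. The helper name `ssh_retry_opts` and the `user@example-host` target are hypothetical placeholders, and the command is only echoed rather than run, so this illustrates the option syntax under those assumptions and is not the workflow's actual code.

```sh
# Hypothetical helper: prints the ssh options named in the commit message.
# Defaults mirror the values from the message (20 attempts, 5 second timeout).
ssh_retry_opts() {
    attempts="${1:-20}"
    timeout="${2:-5}"
    printf '%s\n' "-o ConnectionAttempts=${attempts} -o ConnectTimeout=${timeout}"
}

# Placeholder target; echoed rather than executed so the sketch has no side effects.
# The unquoted substitution is intentional: each option must be a separate word.
echo ssh $(ssh_retry_opts) user@example-host
```

Keeping the values as function arguments makes it easy to try a shorter timeout or fewer attempts without editing the command itself.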
2022-09-28 21:45:31 +00:00
teor 01ca74d0fb
Create a new cached state image every 12 hours (#5191)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-09-28 00:33:15 +00:00
teor c57f129df4
Add a full sync job for 1790k blocks (#5166) 2022-09-15 06:18:10 +00:00
teor 2c0f906692
Fix checkpoint disk image names so they are short enough for Google Cloud (#5128) 2022-09-12 21:28:21 +00:00
teor a58b72c92b
fix(ci): Wait 1 day before creating cached state image updates (#5088)
* Increase search range for sync height

* Update sync height regexes for zebrad and lwd cached states

* Add labels to cached state images

* Update deploy-gcp-tests.yml

* Don't create new cached states for lwd updates

* Add a missing line continuation

* Fix a comment

* Revert a mistaken comment change

* Clarify a TODO comment

* Partially revert to old docker height log handling

* Use an output for the cached disk name
2022-09-08 20:25:00 +00:00
teor fb2a1e8595
1. fix(ci): Label lwd cached state images with their sync height (#5086)
* Increase search range for sync height

* Update sync height regexes for zebrad and lwd cached states

* Add labels to cached state images

* Add a missing line continuation
2022-09-07 05:06:34 +00:00
dependabot[bot] 594f4c41cf
build(deps): bump google-github-actions/auth from 0.8.0 to 0.8.1 (#5029)
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 0.8.0 to 0.8.1.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v0.8.0...v0.8.1)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-01 23:48:28 +00:00
teor e9597c0406
Split a long full sync job (#5001) 2022-08-30 13:42:17 +00:00
teor d692c604b7
Update sync workflow docs for edge cases (#4973) 2022-08-29 05:29:38 +00:00
teor 4cda4eef66
fix(ci): Improve Zebra acceptance test diagnostics (#4958)
* Show the arguments of acceptance test functions in the logs

* Show all the logs in the "Run tests" jobs

* Document expected "broken pipe" error from `tee`

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-08-28 17:08:43 +00:00
teor 6fd3cdb3da
fix(ci): Expand cached state disks before running tests (#4962)
* Expand cached state disks before running tests

* Install partition management tool

* There isn't actually a partition on the cached state image

* Make e2fsck non-interactive

* Limit the length of image names to 63 characters

* Ignore possibly long branch names when matching images, just match the commit
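
The 63-character limit from the list above could be applied with a small shell helper along these lines. The function name `short_image_name` and the sample name are made up for illustration. Stripping trailing hyphens is an added assumption, based on Google Cloud resource names having to end in a letter or digit; the original workflow may handle this differently.

```sh
# Hypothetical helper: shorten a disk image name to at most 63 characters.
# Trailing hyphens left behind by the cut are removed (assumed requirement).
short_image_name() {
    printf '%s\n' "$1" | cut -c1-63 | sed 's/-*$//'
}

# Usage sketch with a made-up, overly long branch-style name.
short_image_name "zebrad-cache-a-very-long-branch-name-that-keeps-going-and-going-0a39011-v25-1740000"
```

Because truncation can make two long names collide, matching on the commit hash rather than the branch name (as the last bullet describes) avoids depending on the part that gets cut off.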
2022-08-28 09:47:42 +00:00
teor 1d861b0d20
fix(ci): Increase full sync timeouts for longer syncs (#4961)
* Increase full sync timeout to 24 hours

Expected sync time is ~21 hours as of August 2022.

* Split final checkpoint job into two smaller jobs to avoid timeouts

Also make regexes easier to read.

* Fix a job name typo
2022-08-28 05:42:20 +10:00
teor aa3b0af15c
Fix a regular expression typo in a full sync job (#4950) 2022-08-26 13:31:10 +10:00
teor 0a39011b88
fix(ci): Write cached state images after update syncs, and use the latest image from any commit (#4949)
* Save cached state on full syncs and updates

* Add an -update suffix to CI images created by updating cached state

* Make disk image names unique by adding a time suffix

* Use the latest image from any branch, but prefer the current commit if available

* Document Zebra's continuous integration tests

* Fix typos in environmental variable names

* Expand documentation

* Fix variable name typo

* Fix shell syntax
2022-08-25 13:09:20 +00:00
teor 7fc3cdd2b2
Increase CI disk size to 200GB (#4945) 2022-08-25 16:41:45 +10:00
Gustavo Valverde bcc325d7f8
ci(auth): retry GCP authentication if fails (#4940)
Previous behavior:
Sometimes Google Cloud authentication fails. This might happen before
IAM permissions are fully propagated.

Expected behavior:
If the authentication fails, retry at least 3 times before exiting with
a non-zero exit code.

Applied solution:
The Google GitHub Action for auth recently added a `retries` feature,
which is now used to work around this issue.
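
The expected behavior can be illustrated with a generic shell retry loop. This is only an analogue of the idea, not the `retries` input of the auth action itself. The `retry` function and the `RETRY_DELAY` variable are hypothetical names introduced for this sketch, and `true` stands in for a real authentication command.

```sh
# Hypothetical helper: run a command up to max_attempts times.
# RETRY_DELAY (seconds between attempts) is an assumed knob, defaulting to 1.
retry() {
    max_attempts="$1"
    shift
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after ${attempt} attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 0
}

# Usage sketch: `true` is a stand-in for the command that may fail transiently.
retry 3 true
```

The function returns a non-zero status only after the last allowed attempt fails, which matches the "retry, then exit with a non-zero exit code" behavior described above.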

Note: 95a6bc2a27

Fixes https://github.com/ZcashFoundation/zebra/issues/4846
2022-08-24 03:49:55 +00:00
Alfredo Garcia 9fb87425b7
fix(tests): Update timeout for Zebra sync tests (#4918)
* update timeout

* update the doc comment

* Increase test timeouts for Zebra update syncs

* Stop failing the 1740k job if the cached state is after block 1740k

Co-authored-by: teor <teor@riseup.net>
2022-08-24 10:06:18 +10:00
teor dd273fec70
Make sure Rust tests actually ran in deploy-gcp-tests.yml (#4710)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-08-03 17:22:35 +00:00
teor 65b0a8b6fa
fix(ci): split NU5 sync into two GitHub actions jobs (#4840)
* Split the NU5 job at block 1,740,000

* Fix the split regex

* Fix the lightwalletd sync to tip regex
2022-07-29 00:43:47 +00:00
teor 1cad4c5218
fix(ci): split canopy sync into a separate GitHub actions job (#4838)
* Split Canopy and NU5 sync jobs

* Look for cached state disks for this commit and branch first
2022-07-29 07:07:29 +10:00
teor 89a0410e23
fix(ci): fix hangs in lightwalletd tests by checking concurrent process output in different threads (#4828)
* Make code execution time logs shorter

* Do ZK parameter preloads in the lightwalletd tests that need them

* Try to re-launch `lightwalletd` when it hangs during sync tests

* Increase full sync timeout

* Clear the `zebrad` logs during `lightwalletd` tests, to avoid logging deadlocks

* Actually clear more than one line of logs

* Check zebrad and lightwalletd output in parallel threads, while waiting for zebrad

* Check zebrad and lightwalletd output in parallel threads, while waiting for lightwalletd

* Improve test logging

* Fix a log typo

* Only wait for lightwalletd once, because its logs stop after the initial sync

* Look for cached state disks for this commit and branch first

* Only copy the state once in the send transactions test

* Wait longer for lightwalletd gRPC server startup

* Add some function docs

* cargo fmt --all
2022-07-29 07:06:18 +10:00
teor c27166013d
Split out Canopy logs into a separate job (#4730) 2022-07-06 22:46:26 +00:00
teor 67dc26fbb5
fix(ci): Split Docker logs into sprout, other checkpoints, and full validation (#4704)
* Checkout zebra in each job to avoid warnings

But put TODOs where we might be able to skip checkouts

* Split log following into sprout checkpoints, sapling/orchard checkpoints, and full validation

* Make job IDs shorter

* Use /dev/stderr because docker doesn't have a tty

* remove pipefail

* Revert "remove pipefail"

This reverts commit a7ee37bebdc107a4215e7dd307b189d925969234.

* Make tee ignore errors writing to a grep pipe

* Avoid launching multiple docker instances for duplicate jobs

* Ignore broken pipe error messages and statuses

* fix(ci): docker wait not finding container

We had this issue before. I can't recall if this was a parsing error between GitHub Actions and gcloud `--command` parsing, but we had to split this into two pieces.

This implementation keeps it how we did it before 9b9578c999/.github/workflows/test.yml (L235-L243)

* docs: remove pending TODO

We can't remove  `actions/checkout` nor set `create_credentials_file` to `false` as next steps won't be able to authenticate to GCP.

We can surely remove `actions/checkout` and leave `create_credentials_file` as `true`, but this will raise a warning on each step, and there's no benefit of doing so.

* Show `docker wait` and `gcloud ssh` output

* If `docker wait` fails, get the exit code using `docker inspect`
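
The fallback in the last bullet might look roughly like the following. The function name `container_exit_code` is hypothetical and the workflow's real step may differ; the sketch relies only on `docker wait` printing the exit code of a stopped container and on `docker inspect` exposing `.State.ExitCode`.

```sh
# Hypothetical helper: print a container's exit code.
# `docker wait` prints the exit code once the container stops; if that call
# fails, fall back to reading the recorded exit code with `docker inspect`.
container_exit_code() {
    container="$1"
    docker wait "$container" 2>/dev/null ||
        docker inspect --format='{{.State.ExitCode}}' "$container"
}
```

A caller could then propagate the test result with something like `exit "$(container_exit_code some-container)"`, where `some-container` is a placeholder name.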

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-06-30 10:33:01 +00:00
teor cbd703b3fc
refactor(ci): Split `docker run` into launch, `logs`, and `wait` (#4690)
* Put arguments to "docker run" on different lines

And update some comments.

* Split docker run into launch, logs, and wait

* Remove mistaken "needs state" condition on log and results job

* Exit the ssh and the job with the container test's exit status
2022-06-28 00:36:18 +00:00
teor b35ab67ef0
fix(ci): Split instance and volume creation out of the test job (#4675)
* Split full sync into checkpoint and full validation

* Sort workflow variables into categories and add descriptions

* Split Create instance/volume and Run test into separate jobs

* Copy initial conditions to all jobs in the series
2022-06-23 23:22:52 +00:00
teor 20850b4cb4
fix(ci): actually create a cached state image after running a sync (#4669)
* Actually create a cached state image

* fix(state): use same disk naming convention for all test instances

Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
2022-06-22 21:54:37 +00:00
teor ca0520b2e8
change(deps): Upgrade tracing-subscriber and related dependencies (#4517)
* Upgrade tracing and related dependencies

```sh
cargo upgrade --workspace
tracing-error
tracing-subscriber

color-eyre

tracing-flame
tracing-journald

sentry
sentry-tracing

metrics
metrics-exporter-prometheus
reqwest
```

* Update duplicate dependency checks

* Enable the tracing/env-filter feature

* Fix type inference for metrics

Manual changes, plus:
```sh
fastmod "as _" "as f64"
```

* Tidy up some unrelated test code

* Update metrics-exporter-prometheus API

And make unused dependencies optional.

* Adjust test regexes to new tracing format

Also fix some regex bugs, and refactor to simplify.

* Disable color-eyre span traces and track caller in release builds

* Add a feature that enables extra debugging in release builds

* Clean up some redundant features

* Increase a test timeout
2022-06-01 13:53:51 +10:00
Dimitris Apostolou b4eb7b9509
Fix typo (#4527) 2022-05-30 11:59:34 +10:00
Gustavo Valverde 374fb7b34f
refactor(ci): allow more time for tests to end gracefully (#4469)
* refactor(ci): keep tests jobs under the 6 hour timeout

When running a full sync or any other test that takes almost 5 hours, running other actions that might take several minutes in the same job also reduces the time available to that job_id.

We use a separate job for image creation and deletion to handle these cases.

* fix(ci): instance deletion can't run on non finished tests

* fix(ci): tests without a cached state might save to disk

* fix(ci): ignore failures when deleting an instance

* fix(ci): remove delete step `needs` redundancy

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-05-26 06:12:45 +00:00