Commit Graph

37 Commits

Author SHA1 Message Date
teor c57f129df4
Add a full sync job for 1790k blocks (#5166) 2022-09-15 06:18:10 +00:00
teor 2c0f906692
Fix checkpoint disk image names so they are short enough for Google Cloud (#5128) 2022-09-12 21:28:21 +00:00
teor a58b72c92b
fix(ci): Wait 1 day before creating cached state image updates (#5088)
* Increase search range for sync height

* Update sync height regexes for zebrad and lwd cached states

* Add labels to cached state images

* Update deploy-gcp-tests.yml

* Don't create new cached states for lwd updates

* Add a missing line continuation

* Fix a comment

* Revert a mistaken comment change

* Clarify a TODO comment

* Partially revert to old docker height log handling

* Use an output for the cached disk name
2022-09-08 20:25:00 +00:00
teor fb2a1e8595
1. fix(ci): Label lwd cached state images with their sync height (#5086)
* Increase search range for sync height

* Update sync height regexes for zebrad and lwd cached states

* Add labels to cached state images

* Add a missing line continuation
2022-09-07 05:06:34 +00:00
dependabot[bot] 594f4c41cf
build(deps): bump google-github-actions/auth from 0.8.0 to 0.8.1 (#5029)
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 0.8.0 to 0.8.1.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v0.8.0...v0.8.1)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-01 23:48:28 +00:00
teor e9597c0406
Split a long full sync job (#5001) 2022-08-30 13:42:17 +00:00
teor d692c604b7
Update sync workflow docs for edge cases (#4973) 2022-08-29 05:29:38 +00:00
teor 4cda4eef66
fix(ci): Improve Zebra acceptance test diagnostics (#4958)
* Show the arguments of acceptance test functions in the logs

* Show all the logs in the "Run tests" jobs

* Document expected "broken pipe" error from `tee`

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-08-28 17:08:43 +00:00
teor 6fd3cdb3da
fix(ci): Expand cached state disks before running tests (#4962)
* Expand cached state disks before running tests

* Install partition management tool

* There isn't actually a partition on the cached state image

* Make e2fsck non-interactive

* Limit the length of image names to 63 characters

* Ignore possibly long branch names when matching images, just match the commit
2022-08-28 09:47:42 +00:00
teor 1d861b0d20
fix(ci): Increase full sync timeouts for longer syncs (#4961)
* Increase full sync timeout to 24 hours

Expected sync time is ~21 hours as of August 2022.

* Split final checkpoint job into two smaller jobs to avoid timeouts

Also make regexes easier to read.

* Fix a job name typo
2022-08-28 05:42:20 +10:00
teor aa3b0af15c
Fix a regular expression typo in a full sync job (#4950) 2022-08-26 13:31:10 +10:00
teor 0a39011b88
fix(ci): Write cached state images after update syncs, and use the latest image from any commit (#4949)
* Save cached state on full syncs and updates

* Add an -update suffix to CI images created by updating cached state

* Make disk image names unique by adding a time suffix

* Use the latest image from any branch, but prefer the current commit if available

* Document Zebra's continuous integration tests

* Fix typos in environmental variable names

* Expand documentation

* Fix variable name typo

* Fix shell syntax
2022-08-25 13:09:20 +00:00
teor 7fc3cdd2b2
Increase CI disk size to 200GB (#4945) 2022-08-25 16:41:45 +10:00
Gustavo Valverde bcc325d7f8
ci(auth): retry GCP authentication if fails (#4940)
Previous behavior:
Sometimes Google Cloud authentication fails, this might happen before
IAM permissions are fully propagated

Expected behavior:
If the authentication fails, retry at least 3 times before exiting with
a non zero exit code

Applied solution:
Google GitHub Actions for auth recently added this a `retries` feature
which is now implemented to workaround this issue.

Note: 95a6bc2a27

Fixes https://github.com/ZcashFoundation/zebra/issues/4846
2022-08-24 03:49:55 +00:00
Alfredo Garcia 9fb87425b7
fix(tests): Update timeout for Zebra sync tests (#4918)
* update timeout

* update the doc comment

* Increase test timeouts for Zebra update syncs

* Stop failing the 1740k job if the cached state is after block 1740k

Co-authored-by: teor <teor@riseup.net>
2022-08-24 10:06:18 +10:00
teor dd273fec70
Make sure Rust tests actually ran in deploy-gcp-tests.yml (#4710)
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-08-03 17:22:35 +00:00
teor 65b0a8b6fa
fix(ci): split NU5 sync into two GitHub actions jobs (#4840)
* Split the NU5 job at block 1,740,000

* Fix the split regex

* Fix the lightwalletd sync to tip regex
2022-07-29 00:43:47 +00:00
teor 1cad4c5218
fix(ci): split canopy sync into a separate GitHub actions job (#4838)
* Split Canopy and NU5 sync jobs

* Look for cached state disks for this commit and branch first
2022-07-29 07:07:29 +10:00
teor 89a0410e23
fix(ci): fix hangs in lightwalletd tests by checking concurrent process output in different threads (#4828)
* Make code execution time logs shorter

* Do ZK parameter preloads in the lightwalletd tests that need them

* Try to re-launch `lightwalletd` when it hangs during sync tests

* Increase full sync timeout

* Clear the `zebrad` logs during `lightwalletd` tests, to avoid logging deadlocks

* Actually clear more than one line of logs

* Check zebrad and lightwalletd output in parallel threads, while waiting for zebrad

* Check zebrad and lightwalletd output in parallel threads, while waiting for lightwalletd

* Improve test logging

* Fix a log typo

* Only wait for lightwalletd once, because its logs stop after the initial sync

* Look for cached state disks for this commit and branch first

* Only copy the state once in the send transactions test

* Wait longer for lightwalletd gRPC server startup

* Add some function docs

* cargo fmt --all
2022-07-29 07:06:18 +10:00
teor c27166013d
Split out Canopy logs into a separate job (#4730) 2022-07-06 22:46:26 +00:00
teor 67dc26fbb5
fix(ci): Split Docker logs into sprout, other checkpoints, and full validation (#4704)
* Checkout zebra in each job to avoid warnings

But put TODOs where we might be able to skip checkouts

* Split log following into sprout checkpoints, sapling/orchard checkpoints, and full validation

* Make job IDs shorter

* Use /dev/stderr because docker doesn't have a tty

* remove pipefail

* Revert "remove pipefail"

This reverts commit a7ee37bebdc107a4215e7dd307b189d925969234.

* Make tee ignore errors writing to a grep pipe

* Avoid launching multiple docker instances for duplicate jobs

* Ignore broken pipe error messages and statuses

* fix(ci): docker wait not finding container

We had this issue before, I can't recall if this was a parsing error between GitHub Actions and gcloud `--command` parsing, but we had to change this into two pieces.

This implementation keeps it how we did it before 9b9578c999/.github/workflows/test.yml (L235-L243)

* docs: remove pending TODO

We can't remove  `actions/checkout` nor set `create_credentials_file` to `false` as next steps won't be able to authenticate to GCP.

We can surely remove `actions/checkout` and leave `create_credentials_file` as `true`, but this will raise a warning on each step, and there's no benefit of doing so.

* Show `docker wait` and `gcloud ssh` output

* If `docker wait` fails, get the exit code using `docker inspect`

Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-06-30 10:33:01 +00:00
teor cbd703b3fc
refactor(ci): Split `docker run` into launch, `logs`, and `wait` (#4690)
* Put arguments to "docker run" on different lines

And update some comments.

* Split docker run into launch, logs, and wait

* Remove mistaken "needs state" condition on log and results job

* Exit the ssh and the job with the container test's exit status
2022-06-28 00:36:18 +00:00
teor b35ab67ef0
fix(ci): Split instance and volume creation out of the test job (#4675)
* Split full sync into checkpoint and full validation

* Sort workflow variables into categories and add descriptions

* Split Create instance/volume and Run test into separate jobs

* Copy initial conditions to all jobs in the series
2022-06-23 23:22:52 +00:00
teor 20850b4cb4
fix(ci): actually create a cached state image after running a sync (#4669)
* Actually create a cached state image

* fix(state): use same disk naming convention for all test instances

Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
2022-06-22 21:54:37 +00:00
teor ca0520b2e8
change(deps): Upgrade tracing-subscriber and related dependencies (#4517)
* Upgrade tracing and related dependencies

```sh
cargo upgrade --workspace
tracing-error
tracing-subscrber

color-eyre

tracing-flame
tracing-journald

sentry
sentry-tracing

metrics
metrics-exporter-prometheus
reqwest
```

* Update duplicate dependency checks

* Enable the tracing/env-filter feature

* Fix type inference for metrics

Manual changes, plus:
```sh
fastmod "as _" "as f64"
```

* Tidy up some unrelated test code

* Update metrics-exporter-prometheus API

And make unused dependencies optional.

* Adjust test regexes to new tracing format

Also fix some regex bugs, and refactor to simplify.

* Disable color-eyre span traces and track caller in release builds

* Add a feature that enables extra debugging in release builds

* Clean up some redundant features

* Increase a test timeout
2022-06-01 13:53:51 +10:00
Dimitris Apostolou b4eb7b9509
Fix typo (#4527) 2022-05-30 11:59:34 +10:00
Gustavo Valverde 374fb7b34f
refactor(ci): allow more time for tests to end gracefully (#4469)
* refactor(ci): keep tests jobs under the 6 hour timeout

When running a full sync or any other test which takes almost 5 hours, having those jobs running with other actions that might take several minutes, also reduces the overall time from the job_id.

We use a separate job for image creation and deletion to handle this cases.

* fix(ci): instance deletion can't run on non finished tests

* fix(ci): tests without a cached state might save to disk

* fix(ci): ignore failures when deleting an instance

* fix(ci): remove delete step `needs` redundancy

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-05-26 06:12:45 +00:00
Gustavo Valverde abef3842ce
fix(ci): mount the lwd-cache dir to the `lightwalletd-full-sync` (#4486)
* fix(ci): allow for the lightwalletd-full-sync to mount the lwd-cache dir

* fix(ci): compare with a string

* imp(ci): run a lightwalletd tip if there's no lwd tip disk available

* docs(ci): add TODO explaining this is a  temporal condition
2022-05-25 11:39:03 -04:00
dependabot[bot] 703e621734
build(deps): bump google-github-actions/auth from 0.7.3 to 0.8.0 (#4478)
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 0.7.3 to 0.8.0.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v0.7.3...v0.8.0)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-24 19:16:18 +00:00
teor 712ef40438
3. Require network names in cached state disk names (#4392)
* Require a cached state rebuild if the state version changes

* Find cached state disks with the same state version

And prefer `main` to other branches.

* Tweak filters to make them more specific

* Try adding inner quotes

* Try brackets instead

* Try two filters, rather than three

* Use Mainnet as the default network, remove duplicate env var

* Match the exact disk name format in one regular expression

* Log the exact expected disk name, including the network

* Consistently use CACHED_DISK_NAME as the env var name

* Temporary allow missing $NETWORK in disk names

* Print the exact search string

* Debug log the search string

* Use a generic alphabetical pattern rather than a regex group

Google Cloud doesn't seem to support regex groups.

* Add network name to disk match docs

* Fix the logged network name

* Make jobs that use cached state wait for state rebuilds

* Run jobs that need cached state even if the rebuild was skipped

* Fix missing dependencies

And update a TODO

* Revert "Use a generic alphabetical pattern rather than a regex group"

This reverts commit 970afe7b17.

* Revert "Temporary allow missing $NETWORK in disk names"

This reverts commit f1f66500c3.

* Make jobs that use cached state wait for state rebuilds

* Run jobs that need cached state even if the rebuild was skipped

* Fix missing dependencies

And update a TODO

* refactor(ci): look for available disks instead of files changed

This ensure that if the constants.rs file was changed, we search for disks available in the whole repository with the same state.

If there's no disk available a rebuild is triggered depending the missing disk. And if there's a disk available, tests are run with this one.

* fix(ci): lwd syncs needs to wait for zebra disk rebuild

* docs(ci): use better comments on integration tests

* fix(ci): we must authenticate to GCP to find disks

* fix(ci): add needed permissions for google auth

* fix(ci): the output needs to be echoed

* imp(ci): reduce diff with main

* fix(ci): remove redundant dependency

Co-authored-by: teor <teor@riseup.net>

* fix(ci): also add `false` to the JSON object output

* fix(ci): hasty copy/paste

* fix(ci): standardize comments

* fix(ci): run disk rebuilds if no disk was found

* fix(ci): build on any event if a cached disk is not found

* fix(ci): reduce diff with main

* docs(ci): reduce main diff

* fix(ci): sync .patch file with changes on the workflow

* fix(ci): consider network changes in new get-available-disks

* force GHA trigger

Co-authored-by: Deirdre Connolly <durumcrustulum@gmail.com>
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
2022-05-20 00:44:11 +00:00
dependabot[bot] 2d7b7c2c5b
build(deps): bump google-github-actions/auth from 0.7.2 to 0.7.3 (#4419)
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 0.7.2 to 0.7.3.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v0.7.2...v0.7.3)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-18 21:38:39 +00:00
dependabot[bot] acccb9896c
build(deps): bump google-github-actions/auth from 0.7.1 to 0.7.2 (#4404)
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 0.7.1 to 0.7.2.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v0.7.1...v0.7.2)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-05-17 20:28:38 +00:00
Dimitris Apostolou 59ae77d04b
Fix typos (#4397) 2022-05-16 05:33:08 +00:00
Gustavo Valverde 77529a8cbd
feat(ci): add `lightwalletd_update_sync` test to CI (#4269)
* fix(ci): lwd state condition

* fix(ci): differentiate tests that need a lwd cached state

* fix(ci): use the right state and save name for each test

* docs(ci): minor comment fixes

* docs(ci): better input description

* fix(ci): end `if` condition correctly

* fix(images): pass the state version to following steps

* fix(ci): $needs_lwd_state condition was inverted

* fix(ci): reduce disk selection code

* docs(ci): better disk search conditional explanation

* fix(ci): end if condition correctly

* fix(ci): evaluate $needs_zebra_state correctly

* fix(ci): use nested condition for readability

* fix(ci): disk search was using the wrong variable
2022-05-13 18:02:05 -04:00
Gustavo Valverde e3a65d86e0
feat(ci): add `lightwalletd_full_sync` test to CI (#4268)
* Temporarily use an earlier lightwalletd version

This checks if commit
e146dbf5c2
contains a mempool refresh deadlock bug.

* Actually rebuild the lightwalletd image

* Delete an unfinished comment

* Remove duplicate test in entrypoint.sh

* Keep a recent change to make tests consistent

* fix(ci): remove not used variable `lwd_state_dir`

* fix(ci): state wast not being added to the image name

* fix(ci): mount a docker volume with lightwalletd dir

If the volume doesn't mount this lwd cached state dir, the content won't be saved to the mounted disk in the VM

* fix(ci): lwd state condition

* docs(ci): explain disk mounting logic

* docs(ci): explain disk mounting decision better

* docs(ci): add a description for confusing input names

Co-authored-by: teor <teor@riseup.net>
2022-05-13 15:20:17 +00:00
teor d0ef9b3dc0
0. fix(ci): only use cached state disks with the same state version (#4391)
* Require a cached state rebuild if the state version changes

* Find cached state disks with the same state version

And prefer `main` to other branches.

* Tweak filters to make them more specific

* Try adding inner quotes

* Try brackets instead

* Try two filters, rather than three

* Use Mainnet as the default network, remove duplicate env var

* Match the exact disk name format in one regular expression

* Log the exact expected disk name, including the network

* Consistently use CACHED_DISK_NAME as the env var name

* Temporary allow missing $NETWORK in disk names

* Print the exact search string

* Debug log the search string

* Use a generic alphabetical pattern rather than a regex group

Google Cloud doesn't seem to support regex groups.

* Add network name to disk match docs

* Fix the logged network name

* imp(ci): remove gcp verbose log

Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
2022-05-13 03:07:37 +00:00
Gustavo Valverde 228f16be50
refactor(actions): rename workflow files (#3941)
* refactor(actions): rename workflow files

* refactor(worflows): change files according new approach
2022-05-09 15:54:16 -04:00