Previous behavior:
Sometimes Google Cloud authentication fails, this might happen before
IAM permissions are fully propagated
Expected behavior:
If the authentication fails, retry at least 3 times before exiting with
a non zero exit code
Applied solution:
Google GitHub Actions for auth recently added this a `retries` feature
which is now implemented to workaround this issue.
Note: 95a6bc2a27
Fixes https://github.com/ZcashFoundation/zebra/issues/4846
* update timeout
* update the doc comment
* Increase test timeouts for Zebra update syncs
* Stop failing the 1740k job if the cached state is after block 1740k
Co-authored-by: teor <teor@riseup.net>
* Apply the same Rust logging settings to all GitHub workflows
* Enable full optimisations in dev builds for downloading large parameter files
* Disable beta Rust tests in CI
* Make code execution time logs shorter
* Do ZK parameter preloads in the lightwalletd tests that need them
* Try to re-launch `lightwalletd` when it hangs during sync tests
* Increase full sync timeout
* Clear the `zebrad` logs during `lightwalletd` tests, to avoid logging deadlocks
* Actually clear more than one line of logs
* Check zebrad and lightwalletd output in parallel threads, while waiting for zebrad
* Check zebrad and lightwalletd output in parallel threads, while waiting for lightwalletd
* Improve test logging
* Fix a log typo
* Only wait for lightwalletd once, because its logs stop after the initial sync
* Look for cached state disks for this commit and branch first
* Only copy the state once in the send transactions test
* Wait longer for lightwalletd gRPC server startup
* Add some function docs
* cargo fmt --all
* Fix clippy::let_and_return
* Increase lightwalletd test timeouts for zebrad slowness
* Add a `zebrad_update_sync()` test, that update syncs Zebra without lightwalletd
* Run the zebrad-update-sync test in CI
* Add extra zebrad time to workaround lightwalletd bugs
* Initialize the rayon threadpool with a new config for CPU-bound threads
* Verify proofs and signatures on the rayon thread pool
* Only spawn one concurrent batch per verifier, for now
* Allow tower-batch to queue multiple batches
* Fix up a potentially incorrect comment
* Rename some variables for concurrent batches
* Spawn multiple batches concurrently, without any limits
* Simplify batch worker loop using OptionFuture
* Clear pending batches once they finish
* Stop accepting new items when we're at the concurrent batch limit
* Fail queued requests on drop
* Move pending_items and the batch timer into the worker struct
* Add worker fields to batch trace logs
* Run docker tests on PR series
* During full verification, process 20 blocks concurrently
* Remove an outdated comment about yielding to other tasks
* Checkout zebra in each job to avoid warnings
But put TODOs where we might be able to skip checkouts
* Split log following into sprout checkpoints, sapling/orchard checkpoints, and full validation
* Make job IDs shorter
* Use /dev/stderr because docker doesn't have a tty
* remove pipefail
* Revert "remove pipefail"
This reverts commit a7ee37bebdc107a4215e7dd307b189d925969234.
* Make tee ignore errors writing to a grep pipe
* Avoid launching multiple docker instances for duplicate jobs
* Ignore broken pipe error messages and statuses
* fix(ci): docker wait not finding container
We had this issue before, I can't recall if this was a parsing error between GitHub Actions and gcloud `--command` parsing, but we had to change this into two pieces.
This implementation keeps it how we did it before 9b9578c999/.github/workflows/test.yml (L235-L243)
* docs: remove pending TODO
We can't remove `actions/checkout` nor set `create_credentials_file` to `false` as next steps won't be able to authenticate to GCP.
We can surely remove `actions/checkout` and leave `create_credentials_file` as `true`, but this will raise a warning on each step, and there's no benefit of doing so.
* Show `docker wait` and `gcloud ssh` output
* If `docker wait` fails, get the exit code using `docker inspect`
Co-authored-by: Conrado Gouvea <conrado@zfnd.org>
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* Put arguments to "docker run" on different lines
And update some comments.
* Split docker run into launch, logs, and wait
* Remove mistaken "needs state" condition on log and results job
* Exit the ssh and the job with the container test's exit status
* Split full sync into checkpoint and full validation
* Sort workflow variables into categories and add descriptions
* Split Create instance/volume and Run test into separate jobs
* Copy initial conditions to all jobs in the series
* Actually create a cached state image
* fix(state): use same disk naming convention for all test instances
Co-authored-by: Gustavo Valverde <gustavo@iterativo.do>
* feat(ci): build each crate individually
* fix(ci): use valid names for each job
* feat(ci): builds and checks with and without all features
* refactor(ci): build job matrix dinamically
* fix: use a JSON_CRATES variable with resulting values
* test: check-matrix
* fix(ci): use "crate" in singular for reference
* imp(ci): use a matrix for feature build arguments
* fix(ci): use correct naming and includes
* fix(ci): implement most recommendations given in review
* fix(ci): use simpler shell script
* fix: typo
* fix: add string to file, not cmd
* fix: some shellchecks
* fix(ci): remove warnings and errors from shellcheck
* imp(ci): add patch file for `Build crates individually` workflow
* Remove unused configs in patch job
Co-authored-by: teor <teor@riseup.net>
* feat(actions): delete old GCP resources
* fix(ci): delete old instances templates
* fix(actions): use correct date arguments and conversion
* fix(actions): missing command in gcloud
* fix(gcp): if an instance can't be deleted, continue
* refacor(action): cleanup and execute monthly
* increase lightwalletd timeout
* switch back to aditya's fork
* manually point to new aditya's lightwalletd image
* disable sync_one_checkpoint_testnet test
* disable restart_stop_at_height in testnet
* rever to 'latest' lightwalletd image
* Remove a duplicate lightwalletd error message
* Reactivate some error messages that have been fixed
* Fix confusing lightwalletd cached state path logs
* Add the gRPC tests to the lightwalletd test suite function
* Make test regexes compatible with zcash/lightwalletd
* Add logging to gRPC tests
* Switch to zcash/lightwalletd for testing
* Upgrade tracing and related dependencies
```sh
cargo upgrade --workspace
tracing-error
tracing-subscrber
color-eyre
tracing-flame
tracing-journald
sentry
sentry-tracing
metrics
metrics-exporter-prometheus
reqwest
```
* Update duplicate dependency checks
* Enable the tracing/env-filter feature
* Fix type inference for metrics
Manual changes, plus:
```sh
fastmod "as _" "as f64"
```
* Tidy up some unrelated test code
* Update metrics-exporter-prometheus API
And make unused dependencies optional.
* Adjust test regexes to new tracing format
Also fix some regex bugs, and refactor to simplify.
* Disable color-eyre span traces and track caller in release builds
* Add a feature that enables extra debugging in release builds
* Clean up some redundant features
* Increase a test timeout
* refactor(ci): keep tests jobs under the 6 hour timeout
When running a full sync or any other test which takes almost 5 hours, having those jobs running with other actions that might take several minutes, also reduces the overall time from the job_id.
We use a separate job for image creation and deletion to handle this cases.
* fix(ci): instance deletion can't run on non finished tests
* fix(ci): tests without a cached state might save to disk
* fix(ci): ignore failures when deleting an instance
* fix(ci): remove delete step `needs` redundancy
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* feat(ci): add a codespell linting action
* fix(ci): run this job if the lint workflow is changed
* ci(codespell): add configuration file
* ci(codespell): exclude mermaid.min.js
* fix: wrong mermaid.min.js location
* ci(codespell): Sur from "Big Sur" is being considered as misspelled
* ci(codespell): make warning the max level
This won't restrict PRs from merging
* ci(codespell): lint on every push
* test: create a misspelling
* Revert "test: create a misspelling"
This reverts commit a2c91cda1e.
* fix(ci): allow for the lightwalletd-full-sync to mount the lwd-cache dir
* fix(ci): compare with a string
* imp(ci): run a lightwalletd tip if there's no lwd tip disk available
* docs(ci): add TODO explaining this is a temporal condition