zebra/.github/workflows/test.yml


name: Test
on:
  workflow_dispatch:
    # refactor(test): dockerize tests and run sync in detached mode (#3459), 2022-02-15 16:54:16 -08:00
    inputs:
      network:
        default: 'Mainnet'
      regenerate-disks:
        type: boolean
        default: false
        description: Just update stateful disks
  pull_request:
    branches:
      - main
    paths:
      # code and tests
      - '**/*.rs'
      # hard-coded checkpoints
      - '**/*.txt'
      # test data snapshots
      - '**/*.snap'
      # dependencies
      - '**/Cargo.toml'
      - '**/Cargo.lock'
      # workflow definitions
      - 'docker/**'
      - '.github/workflows/test.yml'
  # feat(actions)!: add full sync test (#3582), 2022-03-02 06:15:24 -08:00
  pull_request_review:
    branches:
      - main
    paths:
      - '**/*.rs'
      - '**/*.txt'
      - '**/Cargo.toml'
      - '**/Cargo.lock'
      - 'docker/**'
      - '.github/workflows/test.yml'
    types: [submitted]

env:
  CARGO_INCREMENTAL: '1'
  ZEBRA_SKIP_IPV6_TESTS: "1"
  NETWORK: Mainnet
  PROJECT_ID: ${{ secrets.GCP_PROJECT_ID }}
  GAR_BASE: us-docker.pkg.dev/${{ secrets.GCP_PROJECT_ID }}/zebra
  GCR_BASE: gcr.io/${{ secrets.GCP_PROJECT_ID }}
  REGION: us-central1
  ZONE: us-central1-a
  MACHINE_TYPE: c2d-standard-16
  IMAGE_NAME: zebrad-test
jobs:
  build:
    name: Build images
    timeout-minutes: 210
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2.4.0
        with:
          persist-credentials: false
      - name: Inject slug/short variables
        uses: rlespinasse/github-slug-action@v4
      # Setup Docker Buildx to allow use of docker cache layers from GH
      - name: Set up Docker Buildx
        id: buildx
        uses: docker/setup-buildx-action@v1
      - name: Login to Google Artifact Registry
        uses: docker/login-action@v1.12.0
        with:
          registry: us-docker.pkg.dev
          username: _json_key
          password: ${{ secrets.GOOGLE_CREDENTIALS }}
      - name: Login to Google Container Registry
        uses: docker/login-action@v1.12.0
        with:
          registry: gcr.io
          username: _json_key
          password: ${{ secrets.GOOGLE_CREDENTIALS }}
      # Build and push image to Google Artifact Registry
      - name: Build & push
        id: docker_build
        uses: docker/build-push-action@v2.8.0
        with:
          target: tester
          context: .
          file: ./docker/Dockerfile
tags: |
${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:latest
${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}
${{ env.GCR_BASE }}/${{ env.GITHUB_REPOSITORY_SLUG_URL }}/${{ env.IMAGE_NAME }}:latest
${{ env.GCR_BASE }}/${{ env.GITHUB_REPOSITORY_SLUG_URL }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}
build-args: |
NETWORK=${{ github.event.inputs.network || env.NETWORK }}
SHORT_SHA=${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}
RUST_BACKTRACE=full
ZEBRA_SKIP_NETWORK_TESTS="1"
CHECKPOINT_SYNC=${{ github.event.inputs.checkpoint_sync || true }}
RUST_LOG=debug
SENTRY_DSN=${{ secrets.SENTRY_ENDPOINT }}
push: true
cache-from: type=gha
cache-to: type=gha,mode=max
# Run all the zebra tests, including tests that are ignored by default
test-all:
name: Test all
runs-on: ubuntu-latest
needs: build
if: ${{ github.event.inputs.regenerate-disks != 'true' }}
steps:
- uses: actions/checkout@v2.4.0
with:
persist-credentials: false
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
- name: Run all zebrad tests
run: |
docker pull ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}
docker run -e ZEBRA_SKIP_IPV6_TESTS --name zebrad-tests -t ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} cargo test --locked --release --features enable-sentry --workspace -- --include-ignored
  test-fake-activation-heights:
    name: Test with fake activation heights
    runs-on: ubuntu-latest
    needs: build
    if: ${{ github.event.inputs.regenerate-disks != 'true' }}
    steps:
      - uses: actions/checkout@v2.4.0
        with:
          persist-credentials: false
      - name: Inject slug/short variables
        uses: rlespinasse/github-slug-action@v4
      - name: Run tests with fake activation heights
        run: |
          docker pull ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}
          docker run -e ZEBRA_SKIP_IPV6_TESTS --name zebrad-tests -t ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} cargo test --locked --release --package zebra-state --lib -- with_fake_activation_heights
  # Test that Zebra syncs and checkpoints a few thousand blocks from an empty state
  test-empty-sync:
    name: Test checkpoint sync from empty state
    runs-on: ubuntu-latest
    needs: build
    if: ${{ github.event.inputs.regenerate-disks != 'true' }}
    steps:
      - uses: actions/checkout@v2.4.0
        with:
          persist-credentials: false
      - name: Inject slug/short variables
        uses: rlespinasse/github-slug-action@v4
      - name: Run zebrad large sync tests
        run: |
          docker pull ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}
          docker run -e ZEBRA_SKIP_IPV6_TESTS --name zebrad-tests -t ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} cargo test --locked --release --features enable-sentry --test acceptance sync_large_checkpoints_ -- --ignored
  test-lightwalletd-integration:
    name: Test integration with lightwalletd
    runs-on: ubuntu-latest
    needs: build
    if: ${{ github.event.inputs.regenerate-disks != 'true' }}
    steps:
      - uses: actions/checkout@v2.4.0
        with:
          persist-credentials: false
      - name: Inject slug/short variables
        uses: rlespinasse/github-slug-action@v4
      - name: Run tests with included lightwalletd binary
        run: |
          docker pull ${{ env.GAR_BASE }}/zebrad-test:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}
          docker run -e ZEBRA_SKIP_IPV6_TESTS -e ZEBRA_TEST_LIGHTWALLETD --name zebrad-tests -t ${{ env.GAR_BASE }}/zebrad-test:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} cargo test --locked --release --features enable-sentry --test acceptance -- lightwalletd_integration
        env:
          ZEBRA_TEST_LIGHTWALLETD: '1'
  regenerate-stateful-disks:
    name: Regenerate stateful disks
    runs-on: ubuntu-latest
    needs: build
    outputs:
      disk_short_sha: ${{ steps.disk-short-sha.outputs.disk_short_sha }}
    steps:
      - uses: actions/checkout@v2.4.0
        with:
          persist-credentials: false
fetch-depth: '2'
# only run this job if the database format might have changed
- name: Get specific changed files
id: changed-files-specific
uses: tj-actions/changed-files@v17.2
with:
files: |
/zebra-state/**/config.rs
/zebra-state/**/constants.rs
/zebra-state/**/finalized_state.rs
https://github.com/ZcashFoundation/zebra/blob/main/zebra-state/src/constants.rs * refactor(test): seggregate disks regeneration from tests Allow to regenerate disks without running tests, and to run tests from previous disk regeneration. Disk will be regenerated just if specific files were changed, or triggered manually. Tests will run just if a disk regeneration was not manually triggered. * fix(test): gcp disks require lower case conventions * fix(test): validate logs being emmited by docker GHA is transforming is somehow transforwing the variable to lowercase also, so we're changint it to adapt to it * test * fix(test): force tty terminal * fix(test): use a one line command to test terminal output * fix(test): always delete test instance * fix(test): use short SHA from the PR head Using the SHA from the base, creates confusion and it's not accurate with the SHA being shown and used on GitHub. We have to keep both as manual runs with `workflow_dispatch` does not have a PR SHA * fix(ci): do not trigger CI on docker changes There's no impact in this workflow when a change is done in the dockerfile * Instead of runing cargo test when the instance gets created, run this commands afterwards in a different step. As GHA TTY is not working as expected, and workarounds does not play nicely with `gcloud compute ssh` actions/runner#241 (comment) we decided to get the container name from the logs, log directly to the container and run the cargo command from there. * doc(test): document reasoning for new steps * fix(test): increase machine type and ssh timeout * fix(test): run tests on creation and follow container logs This allows to follow logs in Github Actions terminal, while the GCP container is still running. 
Just delete the instance when following the logs ends successfully or fails * finalize(test): do not rebuild image when changing actions * fix(test): run tests on creation and follow container logs This allows to follow logs in Github Actions terminal, while the GCP container is still running. Just delete the instance when following the logs ends successfully or fails Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-02-15 16:54:16 -08:00
/zebra-state/**/disk_format.rs
/zebra-state/**/disk_db.rs
/zebra-state/**/zebra_db.rs
/zebra-state/**/zebra_db/block.rs
/zebra-state/**/zebra_db/chain.rs
/zebra-state/**/zebra_db/shielded.rs
/zebra-state/**/zebra_db/transparent.rs
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
- name: Downcase network name for disks
run: |
# awk lowercases the whole line, so the variable is exported as lower_net_name, not LOWER_NET_NAME
echo LOWER_NET_NAME="${{ github.event.inputs.network || env.NETWORK }}" | awk '{print tolower($0)}' >> $GITHUB_ENV
# Setup gcloud CLI
- name: Authenticate to Google Cloud
id: auth
uses: google-github-actions/auth@v0.5.0
with:
credentials_json: ${{ secrets.GOOGLE_CREDENTIALS }}
- name: Create GCP compute instance
id: create-instance
if: ${{ steps.changed-files-specific.outputs.any_changed == 'true' || github.event.inputs.regenerate-disks == 'true' }}
2022-02-15 16:54:16 -08:00
run: |
gcloud compute instances create-with-container "zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}" \
--boot-disk-size 100GB \
--boot-disk-type pd-ssd \
--create-disk name="zebrad-cache-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-${{ env.lower_net_name }}-canopy",size=100GB,type=pd-ssd \
--container-mount-disk mount-path='/zebrad-cache',name="zebrad-cache-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-${{ env.lower_net_name }}-canopy" \
--container-image ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} \
--container-restart-policy=never \
--container-stdin \
--container-tty \
--container-command="cargo" \
--container-arg="test" \
--container-arg="--locked" \
--container-arg="--release" \
--container-arg="--features" \
--container-arg="enable-sentry,test_sync_to_mandatory_checkpoint_${{ env.lower_net_name }}" \
--container-arg="--manifest-path" \
--container-arg="zebrad/Cargo.toml" \
--container-arg="sync_to_mandatory_checkpoint_${{ env.lower_net_name }}" \
--container-env=ZEBRA_SKIP_IPV6_TESTS=1 \
--machine-type ${{ env.MACHINE_TYPE }} \
--scopes cloud-platform \
--metadata=google-monitoring-enabled=true,google-logging-enabled=true \
--tags zebrad \
--zone "${{ env.ZONE }}"
# TODO: this approach is very messy, but getting the name of the just-created container is error-prone, and GCP doesn't offer a way to do it without requiring a TTY
# This TODO relates to the following issues:
# https://github.com/actions/runner/issues/241
# https://www.googlecloudcommunity.com/gc/Infrastructure-Compute-Storage/SSH-into-Compute-Container-not-easily-possible/td-p/170915
- name: Get container name from logs
id: get-container-name
if: steps.create-instance.outcome == 'success'
run: |
INSTANCE_ID=$(gcloud compute instances describe zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} --zone ${{ env.ZONE }} --format='value(id)')
echo "Using instance: $INSTANCE_ID"
while [[ ${CONTAINER_NAME} != *"zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}"* ]]; do
CONTAINER_NAME=$(gcloud logging read 'log_name=projects/${{ env.PROJECT_ID }}/logs/cos_system AND jsonPayload.MESSAGE:zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}' --format='value(jsonPayload.MESSAGE)' --limit=1 | grep -o '...-zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-....' | tr -d "'.")
2022-02-15 16:54:16 -08:00
echo "Using container: ${CONTAINER_NAME} from instance: ${INSTANCE_ID}"
sleep 10
done
CONTAINER_NAME=$(gcloud logging read 'log_name=projects/${{ env.PROJECT_ID }}/logs/cos_system AND jsonPayload.MESSAGE:zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}' --format='value(jsonPayload.MESSAGE)' --limit=1 | grep -o '...-zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-....' | tr -d "'.")
echo "zebra_container=$CONTAINER_NAME" >> "$GITHUB_OUTPUT"
- name: Regenerate stateful disks logs
id: sync-to-checkpoint
if: steps.create-instance.outcome == 'success'
run: |
gcloud compute ssh \
zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command="docker logs --follow ${{ env.ZEBRA_CONTAINER }}"
env:
ZEBRA_CONTAINER: ${{ steps.get-container-name.outputs.zebra_container }}
# Create an image from the state disk, to be used by the sync past mandatory checkpoint test
# Force the image creation, as the disk is still attached even though it is not being used by the container
- name: Create image from state disk
# Only run if the earlier step succeeds
if: steps.sync-to-checkpoint.outcome == 'success'
run: |
gcloud compute images create zebrad-cache-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-${{ env.lower_net_name }}-canopy \
--force \
--source-disk=zebrad-cache-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-${{ env.lower_net_name }}-canopy \
--source-disk-zone=${{ env.ZONE }} \
--storage-location=us \
--description="Created from head branch ${{ env.GITHUB_HEAD_REF_SLUG_URL }} targeting ${{ env.GITHUB_BASE_REF_SLUG }} from PR ${{ env.GITHUB_REF_SLUG_URL }} with commit ${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA }}"
- name: Output and write the disk SHORT_SHA to a txt
id: disk-short-sha
if: steps.sync-to-checkpoint.outcome == 'success'
run: |
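# Prefer the short SHA of the PR head; manual `workflow_dispatch` runs have no PR SHA, so fall back to the commit short SHA
short_sha="${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}"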
echo "$short_sha" > latest-disk-state-sha.txt
echo "disk_short_sha=$short_sha" >> "$GITHUB_OUTPUT"
- name: Upload the disk state txt
if: steps.sync-to-checkpoint.outcome == 'success'
uses: actions/upload-artifact@v2.3.1
with:
name: latest-disk-state-sha
path: latest-disk-state-sha.txt
retention-days: 1095
- name: Delete test instance
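# Do not delete the instance if the sync times out in GitHub.
# Both checks must sit inside one expression: any text outside `${{ }}` turns
# the condition into a non-empty string, which always evaluates as true.
if: ${{ steps.sync-to-checkpoint.outcome == 'success' || steps.sync-to-checkpoint.outcome == 'failure' }}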
continue-on-error: true
run: |
gcloud compute instances delete "zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}" --delete-disks all --zone "${{ env.ZONE }}"
# Test that Zebra syncs and fully validates a few thousand blocks from a cached post-checkpoint state
test-stateful-sync:
name: Test full validation sync from cached state
* fix(test): Do not try to execute ss and commands in one line * fix(test): do not show undeeded information in the terminal * fix(test): sleep befor/after machine creation/deletion * fix(docker): do not download zcash params twice * feat(docker): add google OS Config agent Use a separate step to have better flexibility in case a better approach is available * merge: docker-actions-refactor into docker-test-refactor * test docker wait scenarios * fix(docker): $HOME variables is not being expanded * fix(test): allow docker wait to work correctly * fix(docker): do not use variables while using COPY * fix(docker): allow to use zebrad as a command * fix(cd): use test .yml from main * fix(cd): Do not duplicate network values The Dockerfile has an ARG with a default value of 'Mainnet', if this value is changed it will be done manually on a workflow_dispatch, making the ENV option a uneeded duplicate in this workflow * fix(test): use bigger machine type for compute intensive tasks * refactor(test): add tests in CI file * fix(test): remove duplicated tests * fix(test): typo * test: build on .github changes temporarily * fix(test): bigger machines have no effect on sync times * feat: add an image to inherit from with zcash params * fix(cd): use the right image name and allow push to test * fix(cd): use the right docker target and remove extra builds * refactor(docker): use cached zcash params from previous build * fix(cd): finalize for merging * imp(cd): add double safety measure for production * fix(cd): use specific SHA for containers * fix(cd): use latest gcloud action version * fix(test): use the network as Mainnet and remove the uppercase from tests * fix(test): run disk regeneration on specific file change Just run this regeneration when changing the following files: https://github.com/ZcashFoundation/zebra/blob/main/zebra-state/src/service/finalized_state/disk_format.rs https://github.com/ZcashFoundation/zebra/blob/main/zebra-state/src/service/finalized_state.rs 
https://github.com/ZcashFoundation/zebra/blob/main/zebra-state/src/constants.rs * refactor(test): seggregate disks regeneration from tests Allow to regenerate disks without running tests, and to run tests from previous disk regeneration. Disk will be regenerated just if specific files were changed, or triggered manually. Tests will run just if a disk regeneration was not manually triggered. * fix(test): gcp disks require lower case conventions * fix(test): validate logs being emmited by docker GHA is transforming is somehow transforwing the variable to lowercase also, so we're changint it to adapt to it * test * fix(test): force tty terminal * fix(test): use a one line command to test terminal output * fix(test): always delete test instance * fix(test): use short SHA from the PR head Using the SHA from the base, creates confusion and it's not accurate with the SHA being shown and used on GitHub. We have to keep both as manual runs with `workflow_dispatch` does not have a PR SHA * fix(ci): do not trigger CI on docker changes There's no impact in this workflow when a change is done in the dockerfile * Instead of runing cargo test when the instance gets created, run this commands afterwards in a different step. As GHA TTY is not working as expected, and workarounds does not play nicely with `gcloud compute ssh` actions/runner#241 (comment) we decided to get the container name from the logs, log directly to the container and run the cargo command from there. * doc(test): document reasoning for new steps * fix(test): increase machine type and ssh timeout * fix(test): run tests on creation and follow container logs This allows to follow logs in Github Actions terminal, while the GCP container is still running. 
Just delete the instance when following the logs ends successfully or fails * finalize(test): do not rebuild image when changing actions * fix(test): run tests on creation and follow container logs This allows to follow logs in Github Actions terminal, while the GCP container is still running. Just delete the instance when following the logs ends successfully or fails Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-02-15 16:54:16 -08:00
runs-on: ubuntu-latest
needs: [build, regenerate-stateful-disks]
steps:
- uses: actions/checkout@v2.4.0
with:
persist-credentials: false
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
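# GCP disk names must be lowercase. Note that awk lowercases the whole line,
# including the variable name, so later steps read it as env.lower_net_name
- name: Downcase network name for disks
run: |
echo LOWER_NET_NAME="${{ github.event.inputs.network || env.NETWORK }}" | awk '{print tolower($0)}' >> "$GITHUB_ENV"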
# Get the latest uploaded txt with the disk SHORT_SHA from this workflow
- name: Download latest disk state SHORT_SHA
uses: dawidd6/action-download-artifact@v2.17.0
with:
github_token: ${{ secrets.GITHUB_TOKEN }}
workflow: test.yml
workflow_conclusion: ''
name: latest-disk-state-sha
check_artifacts: true
- name: Get disk state SHA from txt
id: get-disk-sha
run: |
output=$(cat latest-disk-state-sha.txt)
echo "sha=$output" >> "$GITHUB_OUTPUT"
# Authenticate to Google Cloud before running gcloud commands
- name: Authenticate to Google Cloud
id: auth
uses: google-github-actions/auth@v0.5.0
with:
credentials_json: ${{ secrets.GOOGLE_CREDENTIALS }}
# Create a Compute Engine virtual machine instance with disks
- name: Create GCP compute instance
id: create-instance
run: |
gcloud compute instances create-with-container "zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}" \
--boot-disk-size 100GB \
--boot-disk-type pd-ssd \
--create-disk=image=zebrad-cache-${{ env.DISK_SHORT_SHA }}-${{ env.lower_net_name }}-canopy,name=zebrad-cache-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-${{ env.lower_net_name }}-canopy,size=100GB,type=pd-ssd \
--container-mount-disk=mount-path='/zebrad-cache',name=zebrad-cache-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-${{ env.lower_net_name }}-canopy \
https://github.com/ZcashFoundation/zebra/blob/main/zebra-state/src/constants.rs * refactor(test): seggregate disks regeneration from tests Allow to regenerate disks without running tests, and to run tests from previous disk regeneration. Disk will be regenerated just if specific files were changed, or triggered manually. Tests will run just if a disk regeneration was not manually triggered. * fix(test): gcp disks require lower case conventions * fix(test): validate logs being emmited by docker GHA is transforming is somehow transforwing the variable to lowercase also, so we're changint it to adapt to it * test * fix(test): force tty terminal * fix(test): use a one line command to test terminal output * fix(test): always delete test instance * fix(test): use short SHA from the PR head Using the SHA from the base, creates confusion and it's not accurate with the SHA being shown and used on GitHub. We have to keep both as manual runs with `workflow_dispatch` does not have a PR SHA * fix(ci): do not trigger CI on docker changes There's no impact in this workflow when a change is done in the dockerfile * Instead of runing cargo test when the instance gets created, run this commands afterwards in a different step. As GHA TTY is not working as expected, and workarounds does not play nicely with `gcloud compute ssh` actions/runner#241 (comment) we decided to get the container name from the logs, log directly to the container and run the cargo command from there. * doc(test): document reasoning for new steps * fix(test): increase machine type and ssh timeout * fix(test): run tests on creation and follow container logs This allows to follow logs in Github Actions terminal, while the GCP container is still running. 
Just delete the instance when following the logs ends successfully or fails * finalize(test): do not rebuild image when changing actions * fix(test): run tests on creation and follow container logs This allows to follow logs in Github Actions terminal, while the GCP container is still running. Just delete the instance when following the logs ends successfully or fails Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
2022-02-15 16:54:16 -08:00
--container-image ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} \
--container-restart-policy=never \
--container-stdin \
--container-tty \
--container-command="cargo" \
--container-arg="test" \
--container-arg="--locked" \
--container-arg="--release" \
--container-arg="--features" \
--container-arg="enable-sentry,test_sync_past_mandatory_checkpoint_${{ env.lower_net_name }}" \
--container-arg="--manifest-path" \
--container-arg="zebrad/Cargo.toml" \
--container-arg="sync_past_mandatory_checkpoint_${{ env.lower_net_name }}" \
--container-env=ZEBRA_SKIP_IPV6_TESTS=1 \
--machine-type ${{ env.MACHINE_TYPE }} \
--scopes cloud-platform \
--metadata=google-monitoring-enabled=true,google-logging-enabled=true \
--tags zebrad \
--zone "${{ env.ZONE }}"
env:
DISK_SHORT_SHA: ${{ needs.regenerate-stateful-disks.outputs.disk_short_sha || steps.get-disk-sha.outputs.sha }}
2022-02-15 16:54:16 -08:00
# TODO: this approach is very messy, but getting the just-created container name is very error-prone, and GCP doesn't have a workaround for this without requiring a TTY
# This TODO relates to the following issues:
# https://github.com/actions/runner/issues/241
# https://www.googlecloudcommunity.com/gc/Infrastructure-Compute-Storage/SSH-into-Compute-Container-not-easily-possible/td-p/170915
- name: Get container name from logs
id: get-container-name
if: steps.create-instance.outcome == 'success'
run: |
INSTANCE_ID=$(gcloud compute instances describe zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} --zone ${{ env.ZONE }} --format='value(id)')
echo "Using instance: $INSTANCE_ID"
while [[ ${CONTAINER_NAME} != *"zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}"* ]]; do
CONTAINER_NAME=$(gcloud logging read 'log_name=projects/${{ env.PROJECT_ID }}/logs/cos_system AND jsonPayload.MESSAGE:zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}' --format='value(jsonPayload.MESSAGE)' --limit=1 | grep -o '...-zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-....' | tr -d "'.")
echo "Using container: ${CONTAINER_NAME} from instance: ${INSTANCE_ID}"
sleep 10
done
CONTAINER_NAME=$(gcloud logging read 'log_name=projects/${{ env.PROJECT_ID }}/logs/cos_system AND jsonPayload.MESSAGE:zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}' --format='value(jsonPayload.MESSAGE)' --limit=1 | grep -o '...-zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-....' | tr -d "'.")
echo "::set-output name=zebra_container::$CONTAINER_NAME"
- name: Sync past mandatory checkpoint logs
id: sync-past-checkpoint
run: |
gcloud compute ssh \
zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command="docker logs --follow ${{ env.ZEBRA_CONTAINER }}"
env:
ZEBRA_CONTAINER: ${{ steps.get-container-name.outputs.zebra_container }}
- name: Delete test instance
# Do not delete the instance if the sync times out in GitHub.
# always() keeps this step from being skipped when the sync step fails.
if: ${{ always() && (steps.sync-past-checkpoint.outcome == 'success' || steps.sync-past-checkpoint.outcome == 'failure') }}
continue-on-error: true
run: |
gcloud compute instances delete "zebrad-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}" --delete-disks all --zone "${{ env.ZONE }}"
# Test that Zebra can run a full mainnet sync after a PR is approved
test-full-sync:
name: Test full Mainnet sync
runs-on: ubuntu-latest
needs: [build]
if: github.event.review.state == 'approved'
steps:
- uses: actions/checkout@v2.4.0
with:
persist-credentials: false
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
# Setup gcloud CLI
- name: Authenticate to Google Cloud
id: auth
uses: google-github-actions/auth@v0.5.0
with:
credentials_json: ${{ secrets.GOOGLE_CREDENTIALS }}
# Creates Compute Engine virtual machine instance w/ disks
- name: Create GCP compute instance
id: create-instance
run: |
gcloud compute instances create-with-container "sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}" \
--boot-disk-size 100GB \
--boot-disk-type pd-extreme \
--container-image ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} \
--container-restart-policy=never \
--container-stdin \
--container-tty \
--container-env=ZEBRA_SKIP_IPV6_TESTS=1,TEST_FULL_SYNC=1,ZEBRA_FORCE_USE_COLOR=1,FULL_SYNC_MAINNET_TIMEOUT_MINUTES=600 \
--machine-type ${{ env.MACHINE_TYPE }} \
--scopes cloud-platform \
--metadata=google-monitoring-enabled=true,google-logging-enabled=true \
--tags zebrad \
--zone "${{ env.ZONE }}"
# TODO: this approach is very messy, but getting the name of the newly created container is error-prone, and GCP doesn't offer a workaround that doesn't require a TTY
# This TODO relates to the following issues:
# https://github.com/actions/runner/issues/241
# https://www.googlecloudcommunity.com/gc/Infrastructure-Compute-Storage/SSH-into-Compute-Container-not-easily-possible/td-p/170915
- name: Get container name from logs
id: get-container-name
if: steps.create-instance.outcome == 'success'
run: |
INSTANCE_ID=$(gcloud compute instances describe sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} --zone ${{ env.ZONE }} --format='value(id)')
echo "Using instance: $INSTANCE_ID"
while [[ ${CONTAINER_NAME} != *"sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}"* ]]; do
CONTAINER_NAME=$(gcloud logging read 'log_name=projects/${{ env.PROJECT_ID }}/logs/cos_system AND jsonPayload.MESSAGE:sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}' --format='value(jsonPayload.MESSAGE)' --limit=1 | grep -o '...-sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-....' | tr -d "'.")
echo "Using container: ${CONTAINER_NAME} from instance: ${INSTANCE_ID}"
sleep 10
done
CONTAINER_NAME=$(gcloud logging read 'log_name=projects/${{ env.PROJECT_ID }}/logs/cos_system AND jsonPayload.MESSAGE:sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}' --format='value(jsonPayload.MESSAGE)' --limit=1 | grep -o '...-sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}-....' | tr -d "'.")
echo "::set-output name=zebra_container::$CONTAINER_NAME"
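# Sketch only, not part of the original workflow: the polling loop above has no
# upper bound, so it keeps running until the job-level timeout if the container
# name never shows up in the logs. Assuming a hypothetical MAX_ATTEMPTS limit,
# a bounded version of the same loop could look roughly like this:
#
#   attempts=0
#   until [[ "${CONTAINER_NAME:-}" == *"sync-tests-"* ]]; do
#     attempts=$((attempts + 1))
#     if (( attempts > ${MAX_ATTEMPTS:-60} )); then
#       echo "Container name not found after ${MAX_ATTEMPTS:-60} attempts" >&2
#       exit 1
#     fi
#     # ... same `gcloud logging read` lookup as in the loop above ...
#     sleep 10
#   done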
- name: Full sync logs
id: sync-past-checkpoint
run: |
gcloud compute ssh \
sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command="docker logs --follow ${{ env.ZEBRA_CONTAINER }}"
env:
ZEBRA_CONTAINER: ${{ steps.get-container-name.outputs.zebra_container }}
- name: Delete test instance
# Do not delete the instance if the sync times out in GitHub.
# always() keeps this step from being skipped when the sync step fails.
if: ${{ always() && (steps.sync-past-checkpoint.outcome == 'success' || steps.sync-past-checkpoint.outcome == 'failure') }}
continue-on-error: true
run: |
gcloud compute instances delete "sync-tests-${{ env.GITHUB_HEAD_REF_SLUG_URL || env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_EVENT_PULL_REQUEST_HEAD_SHA_SHORT || env.GITHUB_SHA_SHORT }}" --delete-disks all --zone "${{ env.ZONE }}"