zebra/.github/workflows/deploy-gcp-tests.yml

name: Deploy GCP tests
on:
  workflow_call:
    inputs:
      # Status and logging
      test_id:
        required: true
        type: string
        description: 'Unique identifier for the test'
      test_description:
        required: true
        type: string
        description: 'Explains what the test does'
      # Test selection and parameters
      test_variables:
        required: true
        type: string
        description: 'Environment variables used to select and configure the test'
      network:
        required: false
        type: string
        default: Mainnet
        description: 'Zcash network to test against'
      # Cached state
      #
      # TODO: find a better name
      root_state_path:
        required: false
        type: string
        default: '/zebrad-cache'
        description: 'Cached state base directory path'
      # TODO: find a better name
      zebra_state_dir:
        required: false
        type: string
        default: ''
        description: 'Zebra cached state directory and input image prefix to search in GCP'
      # TODO: find a better name
      lwd_state_dir:
        required: false
        type: string
        default: ''
        description: 'Lightwalletd cached state directory and input image prefix to search in GCP'
      disk_prefix:
        required: false
        type: string
        default: 'zebrad-cache'
        description: 'Image name prefix, and `zebra_state_dir` name for newly created cached states'
      disk_suffix:
        required: false
        type: string
        description: 'Image name suffix'
      needs_zebra_state:
        required: true
        type: boolean
        description: 'Does the test use Zebra cached state?'
      needs_lwd_state:
        required: false
        type: boolean
        description: 'Does the test use Lightwalletd and Zebra cached state?'
      saves_to_disk:
        required: true
        type: boolean
        description: 'Does the test create a new cached state disk?'
      # Metadata
      height_grep_text:
        required: false
        type: string
        description: 'Regular expression to find the tip height in test logs, and add it to newly created cached state image metadata'
      app_name:
        required: false
        type: string
        default: 'zebra'
        description: 'Application name for Google Cloud instance metadata'
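# Example (a sketch, not taken from this repository): a caller workflow could
# invoke this reusable workflow roughly as shown below. The job name and the
# values of `test_id`, `test_description`, `test_variables`, `disk_suffix`, and
# `height_grep_text` are hypothetical placeholders. Only the input names come
# from the definitions above.
#
# jobs:
#   example-sync:
#     uses: ./.github/workflows/deploy-gcp-tests.yml
#     with:
#       test_id: example-sync
#       test_description: Example test that builds a new cached state
#       test_variables: '-e EXAMPLE_VARIABLE=1'
#       needs_zebra_state: false
#       saves_to_disk: true
#       disk_suffix: checkpoint
#       height_grep_text: 'example height pattern'
#
# `test_variables` is passed verbatim to `docker run` in the launch job, so it
# is written here as `-e NAME=value` flags.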
env:
  IMAGE_NAME: zebrad-test
  GAR_BASE: us-docker.pkg.dev/zealous-zebra/zebra
  ZONE: us-central1-a
  MACHINE_TYPE: c2d-standard-16
jobs:
  # set up the test, if it doesn't use any cached state
  # each test runs one of the *-with/without-cached-state job series, and skips the other
  setup-without-cached-state:
    name: Setup ${{ inputs.test_id }} test
    if: ${{ !inputs.needs_zebra_state }}
    runs-on: ubuntu-latest
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - uses: actions/checkout@v3.0.2
        with:
          persist-credentials: false
      - name: Inject slug/short variables
        uses: rlespinasse/github-slug-action@v4
        with:
          short-length: 7
      - name: Downcase network name for disks
        run: |
          NETWORK_CAPS=${{ inputs.network }}
          echo "NETWORK=${NETWORK_CAPS,,}" >> $GITHUB_ENV
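      # Note: `${NETWORK_CAPS,,}` is a Bash (4.0+) parameter expansion that lowercases
      # the whole value, so the default `Mainnet` input becomes `mainnet` in $NETWORK.
      # A minimal sketch of the same expansion outside the workflow:
      #
      #   NETWORK_CAPS=Mainnet
      #   echo "${NETWORK_CAPS,,}"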
      # Setup gcloud CLI
      - name: Authenticate to Google Cloud
        id: auth
        uses: google-github-actions/auth@v0.8.0
        with:
          workload_identity_provider: 'projects/143793276228/locations/global/workloadIdentityPools/github-actions/providers/github-oidc'
          service_account: 'github-service-account@zealous-zebra.iam.gserviceaccount.com'
          token_format: 'access_token'
      # Create a Compute Engine virtual machine
      - name: Create ${{ inputs.test_id }} GCP compute instance
        id: create-instance
        run: |
          gcloud compute instances create-with-container "${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }}" \
          --boot-disk-size 100GB \
          --boot-disk-type pd-ssd \
          --create-disk name="${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }}",device-name="${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }}",size=100GB,type=pd-ssd \
          --container-image debian:buster \
          --container-restart-policy=never \
          --machine-type ${{ env.MACHINE_TYPE }} \
          --scopes cloud-platform \
          --metadata=google-monitoring-enabled=true,google-logging-enabled=true \
          --tags ${{ inputs.app_name }} \
          --zone ${{ env.ZONE }}
          sleep 60
      - name: Create ${{ inputs.test_id }} Docker volume
        run: |
          gcloud compute ssh \
          ${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} \
          --zone ${{ env.ZONE }} \
          --quiet \
          --ssh-flag="-o ServerAliveInterval=5" \
          --command \
          "\
          sudo mkfs.ext4 /dev/sdb \
          && \
          docker volume create --driver local --opt type=ext4 --opt device=/dev/sdb \
          ${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }} \
          "
  # launch the test, if it doesn't use any cached state
  launch-without-cached-state:
    name: Launch ${{ inputs.test_id }} test
    needs: [ setup-without-cached-state ]
    # If the previous job fails, we also want to run and fail this job,
    # so that the branch protection rule fails in Mergify and GitHub.
    if: ${{ !cancelled() && !inputs.needs_zebra_state }}
    runs-on: ubuntu-latest
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - uses: actions/checkout@v3.0.2
        with:
          persist-credentials: false
      - name: Inject slug/short variables
        uses: rlespinasse/github-slug-action@v4
        with:
          short-length: 7
      - name: Downcase network name for disks
        run: |
          NETWORK_CAPS=${{ inputs.network }}
          echo "NETWORK=${NETWORK_CAPS,,}" >> $GITHUB_ENV
      # Setup gcloud CLI
      - name: Authenticate to Google Cloud
        id: auth
        uses: google-github-actions/auth@v0.8.0
        with:
          workload_identity_provider: 'projects/143793276228/locations/global/workloadIdentityPools/github-actions/providers/github-oidc'
          service_account: 'github-service-account@zealous-zebra.iam.gserviceaccount.com'
          token_format: 'access_token'
      # Launch the test without any cached state
      - name: Launch ${{ inputs.test_id }} test
        run: |
          gcloud compute ssh \
          ${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} \
          --zone ${{ env.ZONE }} \
          --quiet \
          --ssh-flag="-o ServerAliveInterval=5" \
          --command \
          "\
          docker run \
          --name ${{ inputs.test_id }} \
          --tty \
          --detach \
          ${{ inputs.test_variables }} \
          --mount type=volume,src=${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }},dst=${{ inputs.root_state_path }}/${{ inputs.zebra_state_dir }} \
          ${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:sha-${{ env.GITHUB_SHA_SHORT }} \
          "
  # set up the test, if it uses cached state
  # each test runs one of the *-with/without-cached-state job series, and skips the other
  setup-with-cached-state:
    name: Setup ${{ inputs.test_id }} test
    if: ${{ inputs.needs_zebra_state }}
    runs-on: ubuntu-latest
    permissions:
      contents: 'read'
      id-token: 'write'
    steps:
      - uses: actions/checkout@v3.0.2
        with:
          persist-credentials: false
          fetch-depth: '2'
      - name: Inject slug/short variables
        uses: rlespinasse/github-slug-action@v4
        with:
          short-length: 7
      - name: Downcase network name for disks
        run: |
          NETWORK_CAPS=${{ inputs.network }}
          echo "NETWORK=${NETWORK_CAPS,,}" >> $GITHUB_ENV
      # Setup gcloud CLI
      - name: Authenticate to Google Cloud
        id: auth
        uses: google-github-actions/auth@v0.8.0
        with:
          workload_identity_provider: 'projects/143793276228/locations/global/workloadIdentityPools/github-actions/providers/github-oidc'
          service_account: 'github-service-account@zealous-zebra.iam.gserviceaccount.com'
          token_format: 'access_token'
      # Find a cached state disk for this job, matching all of:
      # - disk cached state (lwd_state_dir/zebra_state_dir or disk_prefix) - zebrad-cache or lwd-cache
      # - state version (from the source code) - v{N}
      # - network (network) - mainnet or testnet
      # - disk target height kind (disk_suffix) - checkpoint or tip
      #
      # The DISK_PREFIX variable depends on whether the test needs a lightwalletd state (needs_lwd_state):
      # - if it does, DISK_PREFIX is ${{ inputs.lwd_state_dir }}
      # - if it does not, DISK_PREFIX is ${{ inputs.zebra_state_dir || inputs.disk_prefix }}
      #
      # If there are multiple disks:
      # - prefer images generated from the `main` branch, then any other branch
      # - prefer newer images to older images
      #
      # Passes the disk name to subsequent steps using $CACHED_DISK_NAME env variable
      # Passes the state version to subsequent steps using $STATE_VERSION env variable
      - name: Find ${{ inputs.test_id }} cached state disk
        id: get-disk-name
        run: |
          LOCAL_STATE_VERSION=$(grep -oE "DATABASE_FORMAT_VERSION: .* [0-9]+" "$GITHUB_WORKSPACE/zebra-state/src/constants.rs" | grep -oE "[0-9]+" | tail -n1)
          echo "STATE_VERSION: $LOCAL_STATE_VERSION"
          if [[ "${{ inputs.needs_lwd_state }}" == "true" ]]; then
              DISK_PREFIX=${{ inputs.lwd_state_dir }}
          else
              DISK_PREFIX=${{ inputs.zebra_state_dir || inputs.disk_prefix }}
          fi
          # Try to find an image generated from the main branch
          # Fields are listed in the "Create image from state disk" step
          CACHED_DISK_NAME=$(gcloud compute images list --filter="name~${DISK_PREFIX}-main-[0-9a-f]+-v${LOCAL_STATE_VERSION}-${NETWORK}-${{ inputs.disk_suffix }}" --format="value(NAME)" --sort-by=~creationTimestamp --limit=1)
          echo "main Disk: $CACHED_DISK_NAME"
          if [[ -z "$CACHED_DISK_NAME" ]]; then
              # Try to find an image generated from any other branch
              CACHED_DISK_NAME=$(gcloud compute images list --filter="name~${DISK_PREFIX}-.+-[0-9a-f]+-v${LOCAL_STATE_VERSION}-${NETWORK}-${{ inputs.disk_suffix }}" --format="value(NAME)" --sort-by=~creationTimestamp --limit=1)
              echo "Disk: $CACHED_DISK_NAME"
          fi
          if [[ -z "$CACHED_DISK_NAME" ]]; then
              echo "No cached state disk available"
echo "Expected ${DISK_PREFIX}-(branch)-[0-9a-f]+-v${LOCAL_STATE_VERSION}-${NETWORK}-${{ inputs.disk_suffix }}"
echo "Cached state test jobs must depend on the cached state rebuild job"
exit 1
fi
echo "Description: $(gcloud compute images describe "$CACHED_DISK_NAME" --format='value(DESCRIPTION)')"
echo "STATE_VERSION=$LOCAL_STATE_VERSION" >> $GITHUB_ENV
echo "CACHED_DISK_NAME=$CACHED_DISK_NAME" >> $GITHUB_ENV
# Create a Compute Engine virtual machine and attach a cached state disk, using the image
# named by $CACHED_DISK_NAME as the source that populates the disk with cached state
- name: Create ${{ inputs.test_id }} GCP compute instance
id: create-instance
run: |
gcloud compute instances create-with-container "${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }}" \
--boot-disk-size 100GB \
--boot-disk-type pd-ssd \
--create-disk image=${{ env.CACHED_DISK_NAME }},name="${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }}",device-name="${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }}",size=100GB,type=pd-ssd \
--container-image debian:buster \
--container-restart-policy=never \
--machine-type ${{ env.MACHINE_TYPE }} \
--scopes cloud-platform \
--metadata=google-monitoring-enabled=true,google-logging-enabled=true \
--tags ${{ inputs.app_name }} \
--zone ${{ env.ZONE }}
sleep 60
# Create a docker volume with the selected cached state.
#
# SSH into the just created VM, and create a docker volume with the recently attached disk.
- name: Create ${{ inputs.test_id }} Docker volume
run: |
gcloud compute ssh \
${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command \
"\
docker volume create --driver local --opt type=ext4 --opt device=/dev/sdb \
${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }} \
"
# launch the test, if it uses cached state
launch-with-cached-state:
name: Launch ${{ inputs.test_id }} test
needs: [ setup-with-cached-state ]
# If the previous job fails, we also want to run and fail this job,
# so that the branch protection rule fails in Mergify and GitHub.
if: ${{ !cancelled() && inputs.needs_zebra_state }}
runs-on: ubuntu-latest
permissions:
contents: 'read'
id-token: 'write'
steps:
- uses: actions/checkout@v3.0.2
with:
persist-credentials: false
fetch-depth: '2'
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
with:
short-length: 7
- name: Downcase network name for disks
run: |
NETWORK_CAPS=${{ inputs.network }}
echo "NETWORK=${NETWORK_CAPS,,}" >> $GITHUB_ENV
# Setup gcloud CLI
- name: Authenticate to Google Cloud
id: auth
uses: google-github-actions/auth@v0.8.0
with:
workload_identity_provider: 'projects/143793276228/locations/global/workloadIdentityPools/github-actions/providers/github-oidc'
service_account: 'github-service-account@zealous-zebra.iam.gserviceaccount.com'
token_format: 'access_token'
# Launch the test with the previously created Zebra-only cached state.
# Each test runs one of the "Launch test" steps, and skips the other.
#
# SSH into the just created VM, and create a Docker container to run the incoming test
# from ${{ inputs.test_id }}, then mount the docker volume created in the previous job.
#
# The disk attached to the VM is located at /dev/sdb. We mount the root `/` of this disk into the docker
# container at one path:
# - /var/cache/zebrad-cache -> ${{ inputs.root_state_path }}/${{ inputs.zebra_state_dir }} -> $ZEBRA_CACHED_STATE_DIR
#
# This path must match the variable used by the Rust tests, which is also set in
# `continous-integration-docker.yml` so that these tests can run.
#
# Although we're mounting the disk root, Zebra will only respect the value of
# $ZEBRA_CACHED_STATE_DIR. Inputs like ${{ inputs.zebra_state_dir }} are only used
# to match that variable's path.
- name: Launch ${{ inputs.test_id }} test
# This step only runs for tests that just read or write a Zebra state.
#
# lightwalletd-full-sync reads Zebra and writes lwd, so it is handled specially.
# TODO: we should find better logic for these use cases
if: ${{ (inputs.needs_zebra_state && !inputs.needs_lwd_state) && inputs.test_id != 'lwd-full-sync' }}
run: |
gcloud compute ssh \
${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command \
"\
docker run \
--name ${{ inputs.test_id }} \
--tty \
--detach \
${{ inputs.test_variables }} \
--mount type=volume,src=${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }},dst=${{ inputs.root_state_path }}/${{ inputs.zebra_state_dir }} \
${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:sha-${{ env.GITHUB_SHA_SHORT }} \
"
# Launch the test with the previously created Lightwalletd and Zebra cached state.
# Each test runs one of the "Launch test" steps, and skips the other.
#
# SSH into the just created VM, and create a Docker container to run the incoming test
# from ${{ inputs.test_id }}, then mount the docker volume created in the previous job.
#
# In this step we're using the same disk for simplicity, as mounting multiple disks to the
# VM and to the container might require more steps in this workflow, and additional
# considerations.
#
# The disk attached to the VM is located at /dev/sdb. We mount the root `/` of this disk into the docker
# container at two different paths:
# - /var/cache/zebrad-cache -> ${{ inputs.root_state_path }}/${{ inputs.zebra_state_dir }} -> $ZEBRA_CACHED_STATE_DIR
# - /var/cache/lwd-cache -> ${{ inputs.root_state_path }}/${{ inputs.lwd_state_dir }} -> $LIGHTWALLETD_DATA_DIR
#
# This doesn't cause any path conflicts, because Zebra and lightwalletd create different
# subdirectories for their data. (But Zebra, lightwalletd, and the test harness must not
# delete the whole cache directory.)
#
# These paths must match the variables used by the Rust tests, which are also set in
# `continous-integration-docker.yml` so that these tests can run.
#
# Although we're mounting the disk root to both directories, Zebra and Lightwalletd
# will only respect the values of $ZEBRA_CACHED_STATE_DIR and $LIGHTWALLETD_DATA_DIR.
# Inputs like ${{ inputs.lwd_state_dir }} are only used to match those variables' paths.
- name: Launch ${{ inputs.test_id }} test
# This step only runs for tests that read or write Lightwalletd and Zebra states.
#
# lightwalletd-full-sync reads Zebra and writes lwd, so it is handled specially.
# TODO: we should find better logic for these use cases
if: ${{ (inputs.needs_zebra_state && inputs.needs_lwd_state) || inputs.test_id == 'lwd-full-sync' }}
run: |
gcloud compute ssh \
${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command \
"\
docker run \
--name ${{ inputs.test_id }} \
--tty \
--detach \
${{ inputs.test_variables }} \
--mount type=volume,src=${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }},dst=${{ inputs.root_state_path }}/${{ inputs.zebra_state_dir }} \
--mount type=volume,src=${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }},dst=${{ inputs.root_state_path }}/${{ inputs.lwd_state_dir }} \
${{ env.GAR_BASE }}/${{ env.IMAGE_NAME }}:sha-${{ env.GITHUB_SHA_SHORT }} \
"
# follow the logs of the test we just launched
follow-logs:
name: Show logs for ${{ inputs.test_id }} test
needs: [ launch-with-cached-state, launch-without-cached-state ]
# We run exactly one of without-cached-state or with-cached-state, and we always skip the other one.
# If the previous job fails, we also want to run and fail this job,
# so that the branch protection rule fails in Mergify and GitHub.
if: ${{ !cancelled() }}
runs-on: ubuntu-latest
permissions:
contents: 'read'
id-token: 'write'
steps:
- uses: actions/checkout@v3.0.2
with:
persist-credentials: false
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
with:
short-length: 7
- name: Downcase network name for disks
run: |
NETWORK_CAPS=${{ inputs.network }}
echo "NETWORK=${NETWORK_CAPS,,}" >> $GITHUB_ENV
# Setup gcloud CLI
- name: Authenticate to Google Cloud
id: auth
uses: google-github-actions/auth@v0.8.0
with:
workload_identity_provider: 'projects/143793276228/locations/global/workloadIdentityPools/github-actions/providers/github-oidc'
service_account: 'github-service-account@zealous-zebra.iam.gserviceaccount.com'
token_format: 'access_token'
# Show all the logs since the container launched
- name: Show logs for ${{ inputs.test_id }} test
run: |
gcloud compute ssh \
${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command \
"\
docker logs \
--tail all \
--follow \
${{ inputs.test_id }} \
"
# wait for the result of the test
test-result:
# TODO: update the job name here, and in the branch protection rules
name: Run ${{ inputs.test_id }} test
needs: [ follow-logs ]
# If the previous job fails, we also want to run and fail this job,
# so that the branch protection rule fails in Mergify and GitHub.
if: ${{ !cancelled() }}
runs-on: ubuntu-latest
permissions:
contents: 'read'
id-token: 'write'
steps:
- uses: actions/checkout@v3.0.2
with:
persist-credentials: false
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
with:
short-length: 7
- name: Downcase network name for disks
run: |
NETWORK_CAPS=${{ inputs.network }}
echo "NETWORK=${NETWORK_CAPS,,}" >> $GITHUB_ENV
# Setup gcloud CLI
- name: Authenticate to Google Cloud
id: auth
uses: google-github-actions/auth@v0.8.0
with:
workload_identity_provider: 'projects/143793276228/locations/global/workloadIdentityPools/github-actions/providers/github-oidc'
service_account: 'github-service-account@zealous-zebra.iam.gserviceaccount.com'
token_format: 'access_token'
# Wait for the container to finish, then exit with the test's exit status.
#
# `docker wait` prints the container exit status as a string, but we need to exit `ssh` with that status.
# `docker wait` can also wait for multiple containers, but we only ever wait for a single container.
- name: Result of ${{ inputs.test_id }} test
run: |
gcloud compute ssh \
${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command \
"\
exit \$(docker wait ${{ inputs.test_id }}) \
"
# create a state image from the instance's state disk, if requested by the caller
create-state-image:
name: Create ${{ inputs.test_id }} cached state image
runs-on: ubuntu-latest
needs: [ test-result ]
# We run exactly one of without-cached-state or with-cached-state, and we always skip the other one.
# Normally, if a job is skipped, all the jobs that depend on it are also skipped.
# So we need to override the default success() check to make this job run.
if: ${{ !cancelled() && !failure() && inputs.saves_to_disk }}
permissions:
contents: 'read'
id-token: 'write'
steps:
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
with:
short-length: 7
# Disk image names in GCP must be lowercase, but the Zcash network name
# is capitalized, so we need to downcase ${{ inputs.network }}
#
# Passes ${{ inputs.network }} to subsequent steps using $NETWORK env variable
- name: Downcase network name for disks
run: |
NETWORK_CAPS=${{ inputs.network }}
echo "NETWORK=${NETWORK_CAPS,,}" >> $GITHUB_ENV
# Setup gcloud CLI
- name: Authenticate to Google Cloud
id: auth
uses: google-github-actions/auth@v0.7.3
with:
workload_identity_provider: 'projects/143793276228/locations/global/workloadIdentityPools/github-actions/providers/github-oidc'
service_account: 'github-service-account@zealous-zebra.iam.gserviceaccount.com'
token_format: 'access_token'
# Get the state version from the local constants.rs file to be used in the image creation,
# as the state version is part of the disk image name.
#
# Passes the state version to subsequent steps using $STATE_VERSION env variable
- name: Get state version from constants.rs
run: |
LOCAL_STATE_VERSION=$(grep -oE "DATABASE_FORMAT_VERSION: .* [0-9]+" $GITHUB_WORKSPACE/zebra-state/src/constants.rs | grep -oE "[0-9]+" | tail -n1)
echo "STATE_VERSION: $LOCAL_STATE_VERSION"
echo "STATE_VERSION=$LOCAL_STATE_VERSION" >> $GITHUB_ENV
# Get the sync height from the test logs, which is later used as part of the
# disk description.
#
# The regex used to grep the sync height is provided by ${{ inputs.height_grep_text }}.
# This lets each caller match the height in the log format produced by its own test.
#
# Passes the sync height to subsequent steps using $SYNC_HEIGHT env variable
- name: Get sync height from logs
run: |
SYNC_HEIGHT=""
DOCKER_LOGS=$(\
gcloud compute ssh \
${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} \
--zone ${{ env.ZONE }} \
--quiet \
--ssh-flag="-o ServerAliveInterval=5" \
--command="docker logs ${{ inputs.test_id }} --tail 20")
SYNC_HEIGHT=$(echo "$DOCKER_LOGS" | grep -oE '${{ inputs.height_grep_text }}\([0-9]+\)' | grep -oE '[0-9]+' | tail -1 || [[ $? == 1 ]])
echo "SYNC_HEIGHT=$SYNC_HEIGHT" >> $GITHUB_ENV
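# A sketch of the height extraction in the step above, assuming a hypothetical
# height_grep_text of `current_height=Height` and a log line that contains
# `current_height=Height(1700000)` (both values are illustrative, not taken from a real run):
#   echo 'current_height=Height(1700000)' | grep -oE 'current_height=Height\([0-9]+\)' | grep -oE '[0-9]+' | tail -1
# This prints 1700000. If no log line matches, SYNC_HEIGHT is left empty.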
# Create an image from disk that will be used for following/other tests
# This image can contain:
# - Zebra cached state
# - Zebra + lightwalletd cached state
# Which cached state is being saved to the disk is defined by ${{ inputs.disk_prefix }}
#
# Force the image creation (--force), because the disk is still attached to the instance,
# even though it is not being used by the container
- name: Create image from state disk
run: |
gcloud compute images create ${{ inputs.disk_prefix }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }}-v${{ env.STATE_VERSION }}-${{ env.NETWORK }}-${{ inputs.disk_suffix }} \
--force \
--source-disk=${{ inputs.test_id }}-${{ env.GITHUB_SHA_SHORT }} \
--source-disk-zone=${{ env.ZONE }} \
--storage-location=us \
--description="Created from commit ${{ env.GITHUB_SHA_SHORT }} with height ${{ env.SYNC_HEIGHT }}"
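# For illustration only, with hypothetical values (disk_prefix `zebrad-cache`, branch slug `main`,
# short SHA `0a1b2c3`, state version `25`, network `mainnet`, disk_suffix `tip`), the image
# created by the step above would be named:
#   zebrad-cache-main-0a1b2c3-v25-mainnet-tip
# The cached disk lookup in the setup job expects names in this same format.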
# delete the Google Cloud instance for this test
delete-instance:
name: Delete ${{ inputs.test_id }} instance
runs-on: ubuntu-latest
needs: [ create-state-image ]
# If a disk generation step times out (after 6+ hours), the previous job (creating the image) will be skipped.
# Even if the instance continues running, no image will be created, so it's better to delete it.
if: always()
continue-on-error: true
permissions:
contents: 'read'
id-token: 'write'
steps:
- name: Inject slug/short variables
uses: rlespinasse/github-slug-action@v4
with:
short-length: 7
# Setup gcloud CLI
- name: Authenticate to Google Cloud
id: auth
uses: google-github-actions/auth@v0.7.3
with:
workload_identity_provider: 'projects/143793276228/locations/global/workloadIdentityPools/github-actions/providers/github-oidc'
service_account: 'github-service-account@zealous-zebra.iam.gserviceaccount.com'
token_format: 'access_token'
# Delete the instance that was deployed for the current commit after all previous jobs
# have run, regardless of their outcome.
- name: Delete test instance
continue-on-error: true
run: |
INSTANCE=$(gcloud compute instances list --filter=${{ inputs.test_id }}-${{ env.GITHUB_REF_SLUG_URL }}-${{ env.GITHUB_SHA_SHORT }} --format='value(NAME)')
if [ -z "${INSTANCE}" ]; then
echo "No instance to delete"
else
gcloud compute instances delete "${INSTANCE}" --zone "${{ env.ZONE }}" --delete-disks all --quiet
fi