solana/ci
Michael Vines b0405db5a9
Assign static IPs to {edge,beta}.testnet.solana.com
2018-11-07 20:11:00 -08:00
docker-rust Upgrade to rust 1.30 2018-10-25 17:13:41 -07:00
docker-rust-nightly Upload coverage HTML reports (#1421) 2018-10-05 10:17:35 -07:00
docker-snapcraft Keep Snap fullnode/drone logs out of syslog, we're too spammy 2018-07-17 15:08:35 -07:00
docker-solana Add first leader to genesis (#1681) 2018-11-02 14:32:05 -07:00
semver_bash Vendor https://github.com/cloudflare/semver_bash/tree/c1133faf0e 2018-08-17 23:15:48 -07:00
.gitignore Package solana as a snap 2018-06-18 17:36:03 -07:00
README.md Include hh:mm in image name 2018-07-09 23:07:07 -06:00
audit.sh Re-enable cargo audit 2018-09-28 17:53:41 -06:00
buildkite-secondary.yml Rename buildkite-snap to buildkite-secondary 2018-11-05 08:47:51 -08:00
buildkite.yml Rename buildkite-snap to buildkite-secondary 2018-11-05 08:47:51 -08:00
channel-info.sh Add script to fetch latest channel info 2018-08-17 23:15:48 -07:00
crate-version.sh Add utility to figure the current crate version 2018-10-29 12:54:57 -07:00
docker-run.sh Add --shell argument 2018-09-24 08:05:47 -07:00
hoover.sh s/whoami/id -un/ 2018-09-05 14:26:21 -07:00
is-pr.sh Skip snap build for PRs if nothing under snap/ is modified 2018-06-30 20:05:27 -07:00
localnet-sanity.sh bench-tps/net sanity: add ability to check for unexpected extra nodes 2018-09-12 15:38:57 -07:00
pr-snap.sh Skip snap build for PRs if nothing under snap/ is modified 2018-06-30 20:05:27 -07:00
publish-bpf-sdk.sh Create programs/bpf/c/sdk/ 2018-10-29 19:10:29 -07:00
publish-crate.sh ci: correct crates.io publishing order 2018-11-02 15:39:24 -07:00
publish-solana-tar.sh Install native programs in the correct location 2018-11-07 19:44:57 -08:00
run-local.sh Consolidate CI jobs 2018-06-24 22:28:24 -07:00
shellcheck.sh Exclude ci/semver_bash/; don't want to diverge from upstream 2018-08-17 23:15:48 -07:00
snap.sh Add version check and rustup 2018-10-24 19:48:58 -07:00
snapcraft.credentials.enc Add snapcraft login credentials 2018-06-18 17:36:03 -07:00
solana-testnet.yml Add testnet pipeline for prebuilt images (#1708) 2018-11-05 13:50:33 -08:00
test-bench.sh Upload bench output as build artifacts (#1478) 2018-10-12 15:13:10 -07:00
test-large-network.sh continue rendezvous refactor for gossip and repair 2018-08-31 23:21:07 +09:00
test-nightly.sh Upload coverage HTML reports (#1421) 2018-10-05 10:17:35 -07:00
test-stable-perf.sh run integration tests serially 2018-10-17 11:37:10 -07:00
test-stable.sh Integrate the markdown book into the codebase 2018-11-07 10:58:47 -07:00
testnet-automation-cleanup.sh Add support to trigger testnet from a PR (#1434) 2018-10-05 16:32:05 -07:00
testnet-automation-json-parser.py Change format of data for TPS/Finality metrics in testnet automation (#1446) 2018-10-09 10:35:01 -07:00
testnet-automation.sh Add testnet pipeline for prebuilt images (#1708) 2018-11-05 13:50:33 -08:00
testnet-deploy.sh Support local tarball deploys 2018-11-07 14:44:40 -08:00
testnet-manager.sh Assign static IPs to {edge,beta}.testnet.solana.com 2018-11-07 20:11:00 -08:00
testnet-sanity.sh Add AWS-based nets 2018-11-07 07:47:39 -08:00
upload_ci_artifact.sh Add Snap fullnode daemon 2018-06-26 12:32:33 -07:00
version-check.sh Upgrade to rust 1.30 2018-10-25 17:13:41 -07:00

README.md

Our CI infrastructure is built around Buildkite, with some additional GitHub integration provided by https://github.com/mvines/ci-gate.

Agent Queues

We define two Agent Queues: queue=default and queue=cuda. The default queue should be favored and runs on lower-cost CPU instances. The cuda queue is only necessary for running tests that depend on GPU (via CUDA) access. CUDA builds may still be run on the default queue, and the Buildkite artifact system can be used to transfer build products over to a GPU instance for testing.
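As a rough sketch of that hand-off, a build step on the default queue could upload its build products and a later step on the cuda queue could download them before testing. The artifact path `target/release/*` and the `BUILDKITE_AGENT` override variable are assumptions made for illustration; they are not taken from the pipeline files.

```shell
# Sketch only: ARTIFACT_GLOB and BUILDKITE_AGENT are illustrative assumptions.
ARTIFACT_GLOB="target/release/*"
# Allow the agent binary to be overridden (e.g. with `echo`) for a dry run.
BUILDKITE_AGENT="${BUILDKITE_AGENT:-buildkite-agent}"

# Call at the end of the build step running on queue=default.
upload_build_products() {
  "$BUILDKITE_AGENT" artifact upload "$ARTIFACT_GLOB"
}

# Call at the start of the test step running on queue=cuda.
download_build_products() {
  "$BUILDKITE_AGENT" artifact download "$ARTIFACT_GLOB" .
}
```

Setting `BUILDKITE_AGENT=echo` before calling either function prints the arguments instead of contacting Buildkite, which is handy for checking the paths.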

Buildkite Agent Management

Buildkite GCP Setup

CI runs on Google Cloud Platform via two Compute Engine Instance groups: ci-default and ci-cuda. Autoscaling is currently disabled and the number of VM Instances in each group is manually adjusted.
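The manual size adjustment can also be made from the command line. The following is a minimal sketch that assumes both groups are managed instance groups in zone us-east1-b (the zone used in the image commands in this document). `resize_cmd` is a hypothetical helper that only prints the command so it can be reviewed before it is run.

```shell
# Hypothetical helper: print (do not run) the gcloud command that sets the
# number of VM instances in a CI instance group.
# Assumes a managed instance group located in us-east1-b.
resize_cmd() {
  group="$1"   # ci-default or ci-cuda
  size="$2"    # desired number of VM instances
  case "$group" in
    ci-default|ci-cuda) ;;
    *) echo "unknown instance group: $group" >&2; return 1 ;;
  esac
  echo "gcloud compute instance-groups managed resize $group --size $size --zone us-east1-b"
}
```

For example, `resize_cmd ci-cuda 2` prints the resize command for the cuda group; pipe the output to `sh` once it looks right.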

Updating a CI Disk Image

Each Instance group has its own disk image, ci-default-vX and ci-cuda-vY, where X and Y are incremented each time the image is changed.

The process to update a disk image is as follows (TODO: make this less manual):

  1. Create a new VM Instance using the disk image to modify.
  2. Once the VM boots, ssh to it and modify the disk as desired.
  3. Stop the VM Instance running the modified disk. Note the name of the VM's disk; it replaces xxx in the commands that follow.
  4. From another machine, run gcloud auth login, then create a new Disk Image based off the modified VM Instance:
  $ gcloud compute images create ci-default-$(date +%Y%m%d%H%M) --source-disk xxx --source-disk-zone us-east1-b --family ci-default

or

  $ gcloud compute images create ci-cuda-$(date +%Y%m%d%H%M) --source-disk xxx --source-disk-zone us-east1-b --family ci-cuda
  5. Delete the new VM instance.
  6. Go to the Instance templates tab, find the existing template named ci-default-vX or ci-cuda-vY and select it. Use the "Copy" button to create a new Instance template called ci-default-vX+1 or ci-cuda-vY+1 with the newly created Disk image.
  7. Go to the Instance groups tab and find the applicable group, ci-default or ci-cuda. Edit the Instance group in two steps: (a) set the number of instances to 0 and wait for them all to terminate, (b) update the Instance template and restore the number of instances to the original value.
  8. Clean up the previous version by deleting it from Instance templates and Images.
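Two of the steps above involve names that are easy to mistype: the timestamped image name and the incremented template name. A small sketch of helpers for them follows. Both function names are hypothetical, and both only print text; nothing is created or modified.

```shell
# Hypothetical helpers for the manual steps above; both only print text.

# Print the image-creation command for a family (ci-default or ci-cuda)
# and the name of the stopped VM's disk.
image_create_cmd() {
  family="$1"
  disk="$2"
  echo "gcloud compute images create $family-$(date +%Y%m%d%H%M)" \
       "--source-disk $disk --source-disk-zone us-east1-b --family $family"
}

# Given an Instance template name such as ci-default-v7, print the next
# name in sequence (ci-default-v8).
next_template_name() {
  prefix="${1%-v*}"     # strip the trailing -vN
  version="${1##*-v}"   # keep only N
  echo "$prefix-v$((version + 1))"
}
```

For instance, `next_template_name ci-cuda-v12` prints `ci-cuda-v13`, the name to give the copied Instance template.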

Reference

Buildkite AWS CloudFormation Setup

AWS CloudFormation is currently inactive, although it may be restored in the future.

AWS CloudFormation can be used to scale machines up and down based on the current CI load. If no machine is currently running, it can take up to 60 seconds to spin up a new instance; please remain calm during this time.

AMI

We use a custom AWS AMI built via https://github.com/solana-labs/elastic-ci-stack-for-aws/tree/solana/cuda.

Use the following process to update this AMI as dependencies change:

$ export AWS_ACCESS_KEY_ID=my_access_key
$ export AWS_SECRET_ACCESS_KEY=my_secret_access_key
$ git clone https://github.com/solana-labs/elastic-ci-stack-for-aws.git -b solana/cuda
$ cd elastic-ci-stack-for-aws/
$ make build
$ make build-ami

Watch for the "amazon-ebs: AMI:" log message to extract the name of the new AMI. For example:

amazon-ebs: AMI: ami-07118545e8b4ce6dc
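If the build output is saved to a file, the ID can be pulled out mechanically rather than by eye. This is a minimal sketch: `extract_ami_id` is a hypothetical helper, and it assumes the ID always has the form ami- followed by hexadecimal digits, as in the example above.

```shell
# Hypothetical helper: read `make build-ami` output on stdin and print the
# AMI ID from the last "amazon-ebs: AMI:" line.
# Assumes IDs look like ami-<hex digits>.
extract_ami_id() {
  sed -n 's/.*amazon-ebs: AMI: \(ami-[0-9a-f]*\).*/\1/p' | tail -n 1
}
```

One way to use it is to run `make build-ami | tee build-ami.log` and then `extract_ami_id < build-ami.log`.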

The new AMI should also now be visible in your EC2 Dashboard. Go to the desired AWS CloudFormation stack, update the ImageId field to the new AMI ID, and apply the stack changes.