Fixup scripts to set up a new CI node (#9348)
* Clean up node setup scripts for new CI boxes
* Move files under ci directory
* Set CUDA env var to setup cuda drivers
* Fixup and add README
* shellcheck
* Apply review feedback, rename dir and setup files

Co-authored-by: publish-docs.sh <maintainers@solana.com>
This commit is contained in:
parent 41fec5bd5b
commit 3fbe7f0bb3

ci/README.md | 64
@@ -2,7 +2,7 @@
 Our CI infrastructure is built around [BuildKite](https://buildkite.com) with some
 additional GitHub integration provided by https://github.com/mvines/ci-gate
 
-## Agent Queues
+# Agent Queues
 
 We define two [Agent Queues](https://buildkite.com/docs/agent/v3/queues):
 `queue=default` and `queue=cuda`. The `default` queue should be favored and
@@ -12,9 +12,52 @@ be run on the `default` queue, and the [buildkite artifact
 system](https://buildkite.com/docs/builds/artifacts) used to transfer build
 products over to a GPU instance for testing.
 
-## Buildkite Agent Management
+# Buildkite Agent Management
 
-### Buildkite Azure Setup
+## Manual Node Setup for Colocated Hardware
+
+This section describes how to set up a new machine that does not have a
+pre-configured image with all the requirements installed. Used for custom-built
+hardware at a colocation or office facility. Also works for vanilla Ubuntu cloud
+instances.
+
+### Pre-Requisites
+
+- Install Ubuntu 18.04 LTS Server
+- Log in as a local or remote user with `sudo` privileges
+
+### Install Core Requirements
+
+##### Non-GPU enabled machines
+```bash
+sudo ./setup-new-buildkite-agent/setup-new-machine.sh
+```
+
+##### GPU-enabled machines
+- 1 or more NVIDIA GPUs should be installed in the machine (tested with 2080Ti)
+```bash
+sudo CUDA=1 ./setup-new-buildkite-agent/setup-new-machine.sh
+```
+
+### Configure Node for Buildkite-agent based CI
+
+- Install `buildkite-agent` and set up its user environment with:
+```bash
+sudo ./setup-new-buildkite-agent/setup-buildkite.sh
+```
+- Copy the pubkey contents from `~buildkite-agent/.ssh/id_ecdsa.pub` and
+add the pubkey as an authorized SSH key on github.
+- Edit `/etc/buildkite-agent/buildkite-agent.cfg` and/or `/etc/systemd/system/buildkite-agent@*` to the desired configuration of the agent(s)
+- Copy `ejson` keys from another CI node at `/opt/ejson/keys/`
+to the same location on the new node.
+- Start the new agent(s) with `sudo systemctl enable --now buildkite-agent`
+
+# Reference
+
+This section contains details regarding previous CI setups that have been used,
+and that we may return to one day.
+
+## Buildkite Azure Setup
 
 Create a new Azure-based "queue=default" agent by running the following command:
 ```
@@ -35,7 +78,7 @@ Creating a "queue=cuda" agent follows the same process but additionally:
 2. Edit the tags field in /etc/buildkite-agent/buildkite-agent.cfg to `tags="queue=cuda,queue=default"`
    and decrease the value of the priority field by one
 
-#### Updating the CI Disk Image
+### Updating the CI Disk Image
 
 1. Create a new VM Instance as described above
 1. Modify it as required
@@ -48,12 +91,7 @@ Creating a "queue=cuda" agent follows the same process but additionally:
 1. Goto the `ci` resource group in the Azure portal and remove all resources
    with the XYZ name in them
 
-## Reference
-
-This section contains details regarding previous CI setups that have been used,
-and that we may return to one day.
-
-### Buildkite AWS CloudFormation Setup
+## Buildkite AWS CloudFormation Setup
 
 **AWS CloudFormation is currently inactive, although it may be restored in the
 future**
@@ -62,7 +100,7 @@ AWS CloudFormation can be used to scale machines up and down based on the
 current CI load. If no machine is currently running it can take up to 60
 seconds to spin up a new instance, please remain calm during this time.
 
-#### AMI
+### AMI
 We use a custom AWS AMI built via https://github.com/solana-labs/elastic-ci-stack-for-aws/tree/solana/cuda.
 
 Use the following process to update this AMI as dependencies change:
@@ -84,13 +122,13 @@ The new AMI should also now be visible in your EC2 Dashboard. Go to the desired
 AWS CloudFormation stack, update the **ImageId** field to the new AMI id, and
 *apply* the stack changes.
 
-### Buildkite GCP Setup
+## Buildkite GCP Setup
 
 CI runs on Google Cloud Platform via two Compute Engine Instance groups:
 `ci-default` and `ci-cuda`. Autoscaling is currently disabled and the number of
 VM Instances in each group is manually adjusted.
 
-#### Updating a CI Disk Image
+### Updating a CI Disk Image
 
 Each Instance group has its own disk image, `ci-default-vX` and
 `ci-cuda-vY`, where *X* and *Y* are incremented each time the image is changed.
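The queue split described in the README hunk above (machines set up with `CUDA=1` serve GPU jobs, everything else serves the default queue) can be sketched with a small helper. `pick_queue` is hypothetical and deliberately simplified: real cuda agents are additionally tagged `queue=default` per the Azure instructions.

```shell
#!/usr/bin/env bash

# pick_queue: hypothetical helper mapping the CUDA env var used by
# setup-new-machine.sh to the Buildkite queue a node would serve.
# Simplified: actual cuda agents carry tags="queue=cuda,queue=default".
pick_queue() {
  if [ -n "$1" ]; then
    echo "queue=cuda"
  else
    echo "queue=default"
  fi
}

pick_queue "$CUDA"
```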
@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
net/datacenter-node-install/disable-nouveau.sh → ci/setup-new-buildkite-agent/disable-nouveau.sh | 2 | Normal file → Executable file

@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now buildkite-agent
net/datacenter-node-install/set-hostname.sh → ci/setup-new-buildkite-agent/set-hostname.sh | 2 | Normal file → Executable file

@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+
+HERE="$(dirname "$0")"
+
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
+source "$HERE"/utils.sh
+
+ensure_env || exit 1
+
+set -e
+
+# Install buildkite-agent
+echo "deb https://apt.buildkite.com/buildkite-agent stable main" | tee /etc/apt/sources.list.d/buildkite-agent.list
+apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 32A37959C2FA5C3C99EFBC32A79206696452D198
+apt-get update
+apt-get install -y buildkite-agent
+
+
+# Configure the installation
+echo "Go to https://buildkite.com/organizations/solana-labs/agents"
+echo "Click Reveal Agent Token"
+echo "Paste the Agent Token, then press Enter:"
+
+read -r agent_token
+sudo sed -i "s/xxx/$agent_token/g" /etc/buildkite-agent/buildkite-agent.cfg
+
+cat > /etc/buildkite-agent/hooks/environment <<EOF
+set -e
+
+export BUILDKITE_GIT_CLEAN_FLAGS="-ffdqx"
+
+# Hack for non-docker rust builds
+export PATH='$PATH':~buildkite-agent/.cargo/bin
+
+# Add path to snaps
+source /etc/profile.d/apps-bin-path.sh
+
+if [[ '$BUILDKITE_BRANCH' =~ pull/* ]]; then
+  export BUILDKITE_REFSPEC="+'$BUILDKITE_BRANCH':refs/remotes/origin/'$BUILDKITE_BRANCH'"
+fi
+EOF
+
+chown buildkite-agent:buildkite-agent /etc/buildkite-agent/hooks/environment
+
+# Create SSH key
+sudo -u buildkite-agent mkdir -p ~buildkite-agent/.ssh
+sudo -u buildkite-agent ssh-keygen -t ecdsa -q -N "" -f ~buildkite-agent/.ssh/id_ecdsa
+
+# Set buildkite-agent user's shell
+sudo usermod --shell /bin/bash buildkite-agent
+
+# Install Rust for buildkite-agent
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o /tmp/rustup-init.sh
+sudo -u buildkite-agent HOME=~buildkite-agent sh /tmp/rustup-init.sh -y
+
+# Add to docker and sudoers group
+addgroup buildkite-agent docker
+addgroup buildkite-agent sudo
+
+# Edit the systemd unit file to include LimitNOFILE
+cat > /lib/systemd/system/buildkite-agent.service <<EOF
+[Unit]
+Description=Buildkite Agent
+Documentation=https://buildkite.com/agent
+After=syslog.target
+After=network.target
+
+[Service]
+Type=simple
+User=buildkite-agent
+Environment=HOME=/var/lib/buildkite-agent
+ExecStart=/usr/bin/buildkite-agent start
+RestartSec=5
+Restart=on-failure
+RestartForceExitStatus=SIGPIPE
+TimeoutStartSec=10
+TimeoutStopSec=0
+KillMode=process
+LimitNOFILE=65536
+
+[Install]
+WantedBy=multi-user.target
+DefaultInstance=1
+EOF
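The environment hook written by the new setup script above rewrites the git refspec for GitHub pull-request branches. A standalone sketch of that mapping, using a POSIX `case` glob in place of the hook's bash regex (`refspec_for_branch` is a hypothetical name, not part of the repo):

```shell
#!/usr/bin/env bash

# refspec_for_branch: hypothetical stand-in for the hook's BUILDKITE_REFSPEC
# logic. Branches named pull/* (GitHub PR refs) get an explicit refspec that
# fetches them into refs/remotes/origin/<branch>; other branches need none.
refspec_for_branch() {
  case "$1" in
    pull/*) echo "+$1:refs/remotes/origin/$1" ;;
    *)      echo "" ;;
  esac
}

refspec_for_branch "pull/9348/head"
```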
net/datacenter-node-install/setup-cuda.sh → ci/setup-new-buildkite-agent/setup-cuda.sh | 18 | Normal file → Executable file

@@ -2,12 +2,13 @@
 
 # https://developer.nvidia.com/cuda-toolkit-archive
 VERSIONS=()
-VERSIONS+=("https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux")
-VERSIONS+=("https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.168_418.67_linux.run")
+#VERSIONS+=("https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux")
+#VERSIONS+=("https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.168_418.67_linux.run")
+VERSIONS+=("http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run")
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
@@ -51,3 +52,14 @@ done
 
 # Allow normal users to use CUDA profiler
 echo 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' > /etc/modprobe.d/nvidia-enable-user-profiling.conf
+
+# setup persistence mode across reboots
+TMPDIR="$(mktemp -d)"
+if pushd "$TMPDIR"; then
+  tar -xvf /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2
+  ./nvidia-persistenced-init/install.sh systemd
+  popd
+  rm -rf "$TMPDIR"
+fi
+
+nvidia-smi -pm ENABLED
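Note that the persistence-mode block added above uses `mktemp -d`, where a now-deleted script in this commit used plain `mktemp` (which creates a file, not a directory) followed by `mkdir -p`. A minimal sketch of the corrected temp-directory pattern, with a placeholder command standing in for the NVIDIA tarball extraction:

```shell
#!/usr/bin/env bash

# Temp-directory pattern from the hunk above: create, work inside, clean up.
workdir="$(mktemp -d)"          # -d makes a directory, not a file
(
  cd "$workdir" || exit 1
  echo demo > extracted.txt     # placeholder for: tar -xvf ...nvidia-persistenced-init.tar.bz2
)
rm -rf "$workdir"               # remove the scratch area when done
```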
net/datacenter-node-install/setup-limits.sh → ci/setup-new-buildkite-agent/setup-limits.sh | 2 | Normal file → Executable file

@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+
+HERE="$(dirname "$0")"
+SOLANA_ROOT="$HERE"/../..
+
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
+source "$HERE"/utils.sh
+
+ensure_env || exit 1
+
+set -ex
+
+apt update
+apt upgrade -y
+
+cat >/etc/apt/apt.conf.d/99-solana <<'EOF'
+// Set and persist extra caps on iftop binary
+Dpkg::Post-Invoke { "which iftop 2>&1 >/dev/null && setcap cap_net_raw=eip $(which iftop) || true"; };
+EOF
+
+apt install -y build-essential pkg-config clang cmake sysstat linux-tools-common \
+  linux-generic-hwe-18.04-edge linux-tools-generic-hwe-18.04-edge \
+  iftop heaptrack jq ruby python3-venv gcc-multilib libudev-dev
+
+gem install ejson ejson2env
+mkdir -p /opt/ejson/keys
+
+"$SOLANA_ROOT"/net/scripts/install-docker.sh
+usermod -aG docker "$SETUP_USER"
+"$SOLANA_ROOT"/net/scripts/install-certbot.sh
+"$HERE"/setup-sudoers.sh
+"$HERE"/setup-ssh.sh
+
+"$HERE"/disable-nouveau.sh
+"$HERE"/disable-networkd-wait.sh
+
+"$SOLANA_ROOT"/net/scripts/install-earlyoom.sh
+"$SOLANA_ROOT"/net/scripts/install-nodejs.sh
+"$SOLANA_ROOT"/net/scripts/localtime.sh
+"$SOLANA_ROOT"/net/scripts/install-redis.sh
+"$SOLANA_ROOT"/net/scripts/install-rsync.sh
+"$SOLANA_ROOT"/net/scripts/install-libssl-compatability.sh
+
+"$HERE"/setup-procfs-knobs.sh
+"$HERE"/setup-limits.sh
+
+[[ -n $CUDA ]] && "$HERE"/setup-cuda.sh
+
+exit 0
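A note on the trailing `exit 0` in the new-machine script above: with `set -e` in effect, ending on `[[ -n $CUDA ]] && "$HERE"/setup-cuda.sh` would make the whole run report status 1 whenever `CUDA` is unset, since the left side of the `&&` list fails. A small sketch of that guard (`maybe_gpu_setup` is a hypothetical stand-in for the CUDA gate):

```shell
#!/usr/bin/env bash
set -e

# maybe_gpu_setup: hypothetical illustration of the CUDA gate in
# setup-new-machine.sh. The explicit `return 0` mirrors the script's final
# `exit 0`, so skipping the GPU step is not reported as a failure.
maybe_gpu_setup() {
  [ -n "$1" ] && echo "running GPU setup"
  return 0
}

maybe_gpu_setup "$CUDA"
```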
net/datacenter-node-install/setup-partner-node.sh → ci/setup-new-buildkite-agent/setup-partner-node.sh | 4 | Normal file → Executable file

@@ -2,12 +2,12 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
 
-set -xe
+set -ex
 
 "$HERE"/disable-nouveau.sh
 "$HERE"/disable-networkd-wait.sh
net/datacenter-node-install/setup-procfs-knobs.sh → ci/setup-new-buildkite-agent/setup-procfs-knobs.sh | 2 | Normal file → Executable file

@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
net/datacenter-node-install/setup-ssh.sh → ci/setup-new-buildkite-agent/setup-ssh.sh | 2 | Normal file → Executable file

@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
net/datacenter-node-install/setup-sudoers.sh → ci/setup-new-buildkite-agent/setup-sudoers.sh | 2 | Normal file → Executable file

@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
net/datacenter-node-install/utils.sh → ci/setup-new-buildkite-agent/utils.sh | 0 | Normal file → Executable file
@@ -1,25 +0,0 @@
-# Introduction
-
-These scripts are intended to facilitate the preparation of dedicated Solana
-nodes. They have been tested as working from a clean installation of Ubuntu
-18.04 Server. Use elsewhere is unsupported.
-
-# Installation
-
-Both installation methods require that the NVIDIA proprietary driver installer
-programs be downloaded alongside [setup-cuda.sh](./setup-cuda.sh). If they do
-not exist at runtime, an attempt will be made to download them automatically. To
-avoid downloading the installers at runtime, they may be downloaded in advance
-and placed as siblings to [setup-cuda.sh](./setup-cuda.sh).
-
-For up-to-date NVIDIA driver version requirements, see [setup-cuda.sh](./setup-cuda.sh)
-
-## Datacenter Node
-
-1) `sudo ./setup-dc-node-1.sh`
-2) `sudo reboot`
-3) `sudo ./setup-dc-node-2.sh`
-
-## Partner Node
-
-1) `$ sudo ./setup-partner-node.sh`
@@ -1,61 +0,0 @@
-#!/usr/bin/env bash
-
-HERE="$(dirname "$0")"
-
-# shellcheck source=net/datacenter-node-install/utils.sh
-source "$HERE"/utils.sh
-
-ensure_env || exit 1
-
-if [[ -n "$1" ]]; then
-  PUBKEY_FILE="$1"
-else
-  cat <<EOF
-Usage: $0 [pubkey_file]
-
-The pubkey_file should be the pubkey that will be set up to allow the current user
-(assumed to be the machine admin) to log in via ssh
-EOF
-  exit 1
-fi
-
-set -xe
-
-apt update
-apt upgrade -y
-
-cat >/etc/apt/apt.conf.d/99-solana <<'EOF'
-// Set and persist extra caps on iftop binary
-Dpkg::Post-Invoke { "which iftop 2>&1 >/dev/null && setcap cap_net_raw=eip $(which iftop) || true"; };
-EOF
-
-apt install -y build-essential pkg-config clang cmake sysstat linux-tools-common \
-  linux-generic-hwe-18.04-edge linux-tools-generic-hwe-18.04-edge \
-  iftop heaptrack
-
-"$HERE"/../scripts/install-docker.sh
-usermod -aG docker "$SETUP_USER"
-"$HERE"/../scripts/install-certbot.sh
-"$HERE"/setup-sudoers.sh
-"$HERE"/setup-ssh.sh
-
-# Allow admin user to log in
-BASE_SSH_DIR="${SETUP_HOME}/.ssh"
-mkdir "$BASE_SSH_DIR"
-chown "$SETUP_USER:$SETUP_USER" "$BASE_SSH_DIR"
-cat "$PUBKEY_FILE" > "${BASE_SSH_DIR}/authorized_keys"
-chown "$SETUP_USER:$SETUP_USER" "${BASE_SSH_DIR}/.ssh/authorized_keys"
-
-"$HERE"/disable-nouveau.sh
-"$HERE"/disable-networkd-wait.sh
-"$HERE"/setup-grub.sh
-"$HERE"/../scripts/install-earlyoom.sh
-"$HERE"/../scripts/install-nodeljs.sh
-"$HERE"/../scripts/localtime.sh
-"$HERE"/../scripts/install-redis.sh
-"$HERE"/../scripts/install-rsync.sh
-"$HERE"/../scripts/install-libssl-compatability.sh
-"$HERE"/setup-procfs-knobs.sh
-"$HERE"/setup-limits.sh
-
-echo "Please reboot then run setup-dc-node-2.sh"
@@ -1,22 +0,0 @@
-#!/usr/bin/env bash
-
-HERE="$(dirname "$0")"
-
-# shellcheck source=net/datacenter-node-install/utils.sh
-source "$HERE"/utils.sh
-
-ensure_env || exit 1
-
-set -xe
-
-"$HERE"/setup-cuda.sh
-
-# setup persistence mode across reboots
-TMPDIR="$(mktemp)"
-mkdir -p "$TMPDIR"
-if pushd "$TMPDIR"; then
-  tar -xvf /usr/share/doc/NVIDIA_GLX-1.0/sample/nvidia-persistenced-init.tar.bz2
-  ./nvidia-persistenced-init/install.sh systemd
-  popd
-  rm -rf "$TMPDIR"
-fi
@@ -1,13 +0,0 @@
-#!/usr/bin/env bash
-
-HERE="$(dirname "$0")"
-
-# shellcheck source=net/datacenter-node-install/utils.sh
-source "$HERE"/utils.sh
-
-ensure_env || exit 1
-
-set -xe
-
-printf "GRUB_GFXPAYLOAD_LINUX=1280x1024x32\n\n" >> /etc/default/grub
-update-grub
@@ -18,9 +18,62 @@ add-apt-repository \
 
 apt-get update
 apt-get install -y docker-ce
-docker run hello-world
+
+cat > /lib/systemd/system/docker.service <<EOF
+[Unit]
+Description=Docker Application Container Engine
+Documentation=https://docs.docker.com
+BindsTo=containerd.service
+After=network-online.target firewalld.service
+Wants=network-online.target
+
+[Service]
+Type=notify
+# the default is not to use systemd for cgroups because the delegate issues still
+# exists and systemd currently does not support the cgroup feature set required
+# for containers run by docker
+ExecStart=/usr/bin/dockerd -H unix://
+ExecReload=/bin/kill -s HUP '$MAINPID'
+TimeoutSec=0
+RestartSec=2
+Restart=always
+
+# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
+# Both the old, and new location are accepted by systemd 229 and up, so using the old location
+# to make them work for either version of systemd.
+StartLimitBurst=3
+
+# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
+# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
+# this option work for either version of systemd.
+StartLimitInterval=60s
+
+# Having non-zero Limit*s causes performance problems due to accounting overhead
+# in the kernel. We recommend using cgroups to do container-local accounting.
+LimitNOFILE=infinity
+LimitNPROC=infinity
+LimitCORE=infinity
+
+# Comment TasksMax if your systemd version does not support it.
+# Only systemd 226 and above support this option.
+TasksMax=infinity
+
+# set delegate yes so that systemd does not reset the cgroups of docker containers
+Delegate=yes
+
+# kill only the docker process, not all processes in the cgroup
+KillMode=process
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+systemctl daemon-reload
+systemctl enable --now /lib/systemd/system/docker.service
+
 # Grant the solana user access to docker
 if id solana; then
   addgroup solana docker
 fi
+
+docker run hello-world