Fixup scripts to set up a new CI node (#9348)

* Clean up node setup scripts for new CI boxes
* Move files under ci directory
* Set CUDA env var to setup cuda drivers
* Fixup and add README
* shellcheck
* Apply review feedback, rename dir and setup files

Co-authored-by: publish-docs.sh <maintainers@solana.com>

parent 41fec5bd5b
commit 3fbe7f0bb3
ci/README.md (64 changes)
@@ -2,7 +2,7 @@
 Our CI infrastructure is built around [BuildKite](https://buildkite.com) with some
 additional GitHub integration provided by https://github.com/mvines/ci-gate
 
-## Agent Queues
+# Agent Queues
 
 We define two [Agent Queues](https://buildkite.com/docs/agent/v3/queues):
 `queue=default` and `queue=cuda`. The `default` queue should be favored and
@@ -12,9 +12,52 @@ be run on the `default` queue, and the [buildkite artifact
 system](https://buildkite.com/docs/builds/artifacts) used to transfer build
 products over to a GPU instance for testing.
 
-## Buildkite Agent Management
+# Buildkite Agent Management
 
-### Buildkite Azure Setup
+## Manual Node Setup for Colocated Hardware
+
+This section describes how to set up a new machine that does not have a
+pre-configured image with all the requirements installed. Used for custom-built
+hardware at a colocation or office facility. Also works for vanilla Ubuntu cloud
+instances.
+
+### Pre-Requisites
+
+- Install Ubuntu 18.04 LTS Server
+- Log in as a local or remote user with `sudo` privileges
+
+### Install Core Requirements
+
+##### Non-GPU enabled machines
+```bash
+sudo ./setup-new-buildkite-agent/setup-new-machine.sh
+```
+
+##### GPU-enabled machines
+- 1 or more NVIDIA GPUs should be installed in the machine (tested with 2080Ti)
+```bash
+sudo CUDA=1 ./setup-new-buildkite-agent/setup-new-machine.sh
+```
+
+### Configure Node for Buildkite-agent based CI
+
+- Install `buildkite-agent` and set up its user environment with:
+```bash
+sudo ./setup-new-buildkite-agent/setup-buildkite.sh
+```
+- Copy the pubkey contents from `~buildkite-agent/.ssh/id_ecdsa.pub` and
+add the pubkey as an authorized SSH key on GitHub.
+- Edit `/etc/buildkite-agent/buildkite-agent.cfg` and/or `/etc/systemd/system/buildkite-agent@*` to the desired configuration of the agent(s)
+- Copy `ejson` keys from another CI node at `/opt/ejson/keys/`
+to the same location on the new node.
+- Start the new agent(s) with `sudo systemctl enable --now buildkite-agent`
+
+# Reference
+
+This section contains details regarding previous CI setups that have been used,
+and that we may return to one day.
+
+## Buildkite Azure Setup
 
 Create a new Azure-based "queue=default" agent by running the following command:
 ```
@@ -35,7 +78,7 @@ Creating a "queue=cuda" agent follows the same process but additionally:
 2. Edit the tags field in /etc/buildkite-agent/buildkite-agent.cfg to `tags="queue=cuda,queue=default"`
    and decrease the value of the priority field by one
 
-#### Updating the CI Disk Image
+### Updating the CI Disk Image
 
 1. Create a new VM Instance as described above
 1. Modify it as required
@@ -48,12 +91,7 @@ Creating a "queue=cuda" agent follows the same process but additionally:
 1. Goto the `ci` resource group in the Azure portal and remove all resources
    with the XYZ name in them
 
-## Reference
-
-This section contains details regarding previous CI setups that have been used,
-and that we may return to one day.
-
-### Buildkite AWS CloudFormation Setup
+## Buildkite AWS CloudFormation Setup
 
 **AWS CloudFormation is currently inactive, although it may be restored in the
 future**
@@ -62,7 +100,7 @@ AWS CloudFormation can be used to scale machines up and down based on the
 current CI load. If no machine is currently running it can take up to 60
 seconds to spin up a new instance, please remain calm during this time.
 
-#### AMI
+### AMI
 We use a custom AWS AMI built via https://github.com/solana-labs/elastic-ci-stack-for-aws/tree/solana/cuda.
 
 Use the following process to update this AMI as dependencies change:
@@ -84,13 +122,13 @@ The new AMI should also now be visible in your EC2 Dashboard. Go to the desired
 AWS CloudFormation stack, update the **ImageId** field to the new AMI id, and
 *apply* the stack changes.
 
-### Buildkite GCP Setup
+## Buildkite GCP Setup
 
 CI runs on Google Cloud Platform via two Compute Engine Instance groups:
 `ci-default` and `ci-cuda`. Autoscaling is currently disabled and the number of
 VM Instances in each group is manually adjusted.
 
-#### Updating a CI Disk Image
+### Updating a CI Disk Image
 
 Each Instance group has its own disk image, `ci-default-vX` and
 `ci-cuda-vY`, where *X* and *Y* are incremented each time the image is changed.
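Of the configuration steps added above, only the `ejson` key copy needs
material from an existing node. A minimal sketch of that step, assuming a
hypothetical source host `existing-ci-node` reachable over SSH:

```bash
# Hypothetical hostname; run on the new node after setup-buildkite.sh.
# /opt/ejson/keys is created by setup-new-machine.sh on every CI node.
scp -r admin@existing-ci-node:/opt/ejson/keys /tmp/ejson-keys
sudo cp -r /tmp/ejson-keys/. /opt/ejson/keys/
rm -rf /tmp/ejson-keys
```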
net/datacenter-node-install/disable-networkd-wait.sh → ci/setup-new-buildkite-agent/disable-networkd-wait.sh (2 changes; Normal file → Executable file)

@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
net/datacenter-node-install/disable-nouveau.sh → ci/setup-new-buildkite-agent/disable-nouveau.sh (2 changes; Normal file → Executable file)
@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
New file (+4 lines)

@@ -0,0 +1,4 @@
+#!/usr/bin/env bash
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now buildkite-agent
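If a node runs multiple agents through a systemd template unit (the README's
mention of `/etc/systemd/system/buildkite-agent@*` suggests this layout), the
enable step would target instances instead; a hypothetical two-agent variant:

```bash
# Hypothetical: enable two templated agent instances instead of the single unit
sudo systemctl daemon-reload
sudo systemctl enable --now buildkite-agent@1 buildkite-agent@2
```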
net/datacenter-node-install/set-hostname.sh → ci/setup-new-buildkite-agent/set-hostname.sh (2 changes; Normal file → Executable file)
@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
New file: ci/setup-new-buildkite-agent/setup-buildkite.sh (+84 lines)

@@ -0,0 +1,84 @@
+#!/usr/bin/env bash
+
+HERE="$(dirname "$0")"
+
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
+source "$HERE"/utils.sh
+
+ensure_env || exit 1
+
+set -e
+
+# Install buildkite-agent
+echo "deb https://apt.buildkite.com/buildkite-agent stable main" | tee /etc/apt/sources.list.d/buildkite-agent.list
+apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 32A37959C2FA5C3C99EFBC32A79206696452D198
+apt-get update
+apt-get install -y buildkite-agent
+
+
+# Configure the installation
+echo "Go to https://buildkite.com/organizations/solana-labs/agents"
+echo "Click Reveal Agent Token"
+echo "Paste the Agent Token, then press Enter:"
+
+read -r agent_token
+sudo sed -i "s/xxx/$agent_token/g" /etc/buildkite-agent/buildkite-agent.cfg
+
+cat > /etc/buildkite-agent/hooks/environment <<EOF
+set -e
+
+export BUILDKITE_GIT_CLEAN_FLAGS="-ffdqx"
+
+# Hack for non-docker rust builds
+export PATH='$PATH':~buildkite-agent/.cargo/bin
+
+# Add path to snaps
+source /etc/profile.d/apps-bin-path.sh
+
+if [[ '$BUILDKITE_BRANCH' =~ pull/* ]]; then
+  export BUILDKITE_REFSPEC="+'$BUILDKITE_BRANCH':refs/remotes/origin/'$BUILDKITE_BRANCH'"
+fi
+EOF
+
+chown buildkite-agent:buildkite-agent /etc/buildkite-agent/hooks/environment
+
+# Create SSH key
+sudo -u buildkite-agent mkdir -p ~buildkite-agent/.ssh
+sudo -u buildkite-agent ssh-keygen -t ecdsa -q -N "" -f ~buildkite-agent/.ssh/id_ecdsa
+
+# Set buildkite-agent user's shell
+sudo usermod --shell /bin/bash buildkite-agent
+
+# Install Rust for buildkite-agent
+curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs -o /tmp/rustup-init.sh
+sudo -u buildkite-agent HOME=~buildkite-agent sh /tmp/rustup-init.sh -y
+
+# Add to docker and sudoers group
+addgroup buildkite-agent docker
+addgroup buildkite-agent sudo
+
+# Edit the systemd unit file to include LimitNOFILE
+cat > /lib/systemd/system/buildkite-agent.service <<EOF
+[Unit]
+Description=Buildkite Agent
+Documentation=https://buildkite.com/agent
+After=syslog.target
+After=network.target
+
+[Service]
+Type=simple
+User=buildkite-agent
+Environment=HOME=/var/lib/buildkite-agent
+ExecStart=/usr/bin/buildkite-agent start
+RestartSec=5
+Restart=on-failure
+RestartForceExitStatus=SIGPIPE
+TimeoutStartSec=10
+TimeoutStopSec=0
+KillMode=process
+LimitNOFILE=65536
+
+[Install]
+WantedBy=multi-user.target
+DefaultInstance=1
+EOF
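The `sed` above assumes the packaged config ships with the placeholder token
`xxx`. A quick post-run sanity check, using only paths the script itself
creates:

```bash
# The token should have replaced "xxx" in the agent config
grep '^token=' /etc/buildkite-agent/buildkite-agent.cfg

# This is the pubkey to authorize on GitHub, per the README step above
cat ~buildkite-agent/.ssh/id_ecdsa.pub

# After `systemctl enable --now buildkite-agent`, the agent should be running
sudo systemctl status buildkite-agent
```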
net/datacenter-node-install/setup-cuda.sh → ci/setup-new-buildkite-agent/setup-cuda.sh (18 changes; Normal file → Executable file)
@@ -2,12 +2,13 @@
 
 # https://developer.nvidia.com/cuda-toolkit-archive
 VERSIONS=()
-VERSIONS+=("https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux")
-VERSIONS+=("https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.168_418.67_linux.run")
+#VERSIONS+=("https://developer.nvidia.com/compute/cuda/10.0/Prod/local_installers/cuda_10.0.130_410.48_linux")
+#VERSIONS+=("https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.168_418.67_linux.run")
+VERSIONS+=("http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run")
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1

@@ -51,3 +52,14 @@ done
 
 # Allow normal users to use CUDA profiler
 echo 'options nvidia "NVreg_RestrictProfilingToAdminUsers=0"' > /etc/modprobe.d/nvidia-enable-user-profiling.conf
+
+# setup persistence mode across reboots
+TMPDIR="$(mktemp -d)"
+if pushd "$TMPDIR"; then
+  tar -xvf /usr/share/doc/NVIDIA_GLX-1.0/samples/nvidia-persistenced-init.tar.bz2
+  ./nvidia-persistenced-init/install.sh systemd
+  popd
+  rm -rf "$TMPDIR"
+fi
+
+nvidia-smi -pm ENABLED
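If the persistenced install succeeds, persistence mode should survive reboots.
A hedged way to confirm after the next boot, using standard `nvidia-smi` query
fields:

```bash
# Expect one "Enabled" line per installed GPU
nvidia-smi --query-gpu=persistence_mode --format=csv,noheader
```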
net/datacenter-node-install/setup-limits.sh → ci/setup-new-buildkite-agent/setup-limits.sh (2 changes; Normal file → Executable file)
@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
New file: ci/setup-new-buildkite-agent/setup-new-machine.sh (+49 lines)

@@ -0,0 +1,49 @@
+#!/usr/bin/env bash
+
+HERE="$(dirname "$0")"
+SOLANA_ROOT="$HERE"/../..
+
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
+source "$HERE"/utils.sh
+
+ensure_env || exit 1
+
+set -ex
+
+apt update
+apt upgrade -y
+
+cat >/etc/apt/apt.conf.d/99-solana <<'EOF'
+// Set and persist extra caps on iftop binary
+Dpkg::Post-Invoke { "which iftop 2>&1 >/dev/null && setcap cap_net_raw=eip $(which iftop) || true"; };
+EOF
+
+apt install -y build-essential pkg-config clang cmake sysstat linux-tools-common \
+  linux-generic-hwe-18.04-edge linux-tools-generic-hwe-18.04-edge \
+  iftop heaptrack jq ruby python3-venv gcc-multilib libudev-dev
+
+gem install ejson ejson2env
+mkdir -p /opt/ejson/keys
+
+"$SOLANA_ROOT"/net/scripts/install-docker.sh
+usermod -aG docker "$SETUP_USER"
+"$SOLANA_ROOT"/net/scripts/install-certbot.sh
+"$HERE"/setup-sudoers.sh
+"$HERE"/setup-ssh.sh
+
+"$HERE"/disable-nouveau.sh
+"$HERE"/disable-networkd-wait.sh
+
+"$SOLANA_ROOT"/net/scripts/install-earlyoom.sh
+"$SOLANA_ROOT"/net/scripts/install-nodejs.sh
+"$SOLANA_ROOT"/net/scripts/localtime.sh
+"$SOLANA_ROOT"/net/scripts/install-redis.sh
+"$SOLANA_ROOT"/net/scripts/install-rsync.sh
+"$SOLANA_ROOT"/net/scripts/install-libssl-compatability.sh
+
+"$HERE"/setup-procfs-knobs.sh
+"$HERE"/setup-limits.sh
+
+[[ -n $CUDA ]] && "$HERE"/setup-cuda.sh
+
+exit 0
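The `[[ -n $CUDA ]]` guard at the end is what the README's `CUDA=1` invocation
toggles; any non-empty value enables the CUDA step. Invocation sketch, assuming
the repository is checked out and the working directory is `ci/`:

```bash
# Non-GPU node
sudo ./setup-new-buildkite-agent/setup-new-machine.sh

# GPU node: a non-empty CUDA runs setup-cuda.sh as the final step
sudo CUDA=1 ./setup-new-buildkite-agent/setup-new-machine.sh
```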
net/datacenter-node-install/setup-partner-node.sh → ci/setup-new-buildkite-agent/setup-partner-node.sh (4 changes; Normal file → Executable file)
@@ -2,12 +2,12 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
 
-set -xe
+set -ex
 
 "$HERE"/disable-nouveau.sh
 "$HERE"/disable-networkd-wait.sh
net/datacenter-node-install/setup-procfs-knobs.sh → ci/setup-new-buildkite-agent/setup-procfs-knobs.sh (2 changes; Normal file → Executable file)
@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
net/datacenter-node-install/setup-ssh.sh → ci/setup-new-buildkite-agent/setup-ssh.sh (2 changes; Normal file → Executable file)
@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
net/datacenter-node-install/setup-sudoers.sh → ci/setup-new-buildkite-agent/setup-sudoers.sh (2 changes; Normal file → Executable file)
@@ -2,7 +2,7 @@
 
 HERE="$(dirname "$0")"
 
-# shellcheck source=net/datacenter-node-install/utils.sh
+# shellcheck source=ci/setup-new-buildkite-agent/utils.sh
 source "$HERE"/utils.sh
 
 ensure_env || exit 1
net/datacenter-node-install/utils.sh → ci/setup-new-buildkite-agent/utils.sh (0 changes; Normal file → Executable file)
Deleted file: net/datacenter-node-install/README.md (-25 lines)

@@ -1,25 +0,0 @@
-# Introduction
-
-These scripts are intended to facilitate the preparation of dedicated Solana
-nodes. They have been tested as working from a clean installation of Ubuntu
-18.04 Server. Use elsewhere is unsupported.
-
-# Installation
-
-Both installation methods require that the NVIDIA proprietary driver installer
-programs be downloaded alongside [setup-cuda.sh](./setup-cuda.sh). If they do
-not exist at runtime, an attempt will be made to download them automatically. To
-avoid downloading the installers at runtime, they may be downloaded in advance
-and placed as siblings to [setup-cuda.sh](./setup-cuda.sh).
-
-For up-to-date NVIDIA driver version requirements, see [setup-cuda.sh](./setup-cuda.sh)
-
-## Datacenter Node
-
-1) `sudo ./setup-dc-node-1.sh`
-2) `sudo reboot`
-3) `sudo ./setup-dc-node-2.sh`
-
-## Partner Node
-
-1) `$ sudo ./setup-partner-node.sh`
Deleted file: net/datacenter-node-install/setup-dc-node-1.sh (-61 lines)

@@ -1,61 +0,0 @@
-#!/usr/bin/env bash
-
-HERE="$(dirname "$0")"
-
-# shellcheck source=net/datacenter-node-install/utils.sh
-source "$HERE"/utils.sh
-
-ensure_env || exit 1
-
-if [[ -n "$1" ]]; then
-  PUBKEY_FILE="$1"
-else
-  cat <<EOF
-Usage: $0 [pubkey_file]
-
-The pubkey_file should be the pubkey that will be set up to allow the current user
-(assumed to be the machine admin) to log in via ssh
-EOF
-  exit 1
-fi
-
-set -xe
-
-apt update
-apt upgrade -y
-
-cat >/etc/apt/apt.conf.d/99-solana <<'EOF'
-// Set and persist extra caps on iftop binary
-Dpkg::Post-Invoke { "which iftop 2>&1 >/dev/null && setcap cap_net_raw=eip $(which iftop) || true"; };
-EOF
-
-apt install -y build-essential pkg-config clang cmake sysstat linux-tools-common \
-  linux-generic-hwe-18.04-edge linux-tools-generic-hwe-18.04-edge \
-  iftop heaptrack
-
-"$HERE"/../scripts/install-docker.sh
-usermod -aG docker "$SETUP_USER"
-"$HERE"/../scripts/install-certbot.sh
-"$HERE"/setup-sudoers.sh
-"$HERE"/setup-ssh.sh
-
-# Allow admin user to log in
-BASE_SSH_DIR="${SETUP_HOME}/.ssh"
-mkdir "$BASE_SSH_DIR"
-chown "$SETUP_USER:$SETUP_USER" "$BASE_SSH_DIR"
-cat "$PUBKEY_FILE" > "${BASE_SSH_DIR}/authorized_keys"
-chown "$SETUP_USER:$SETUP_USER" "${BASE_SSH_DIR}/.ssh/authorized_keys"
-
-"$HERE"/disable-nouveau.sh
-"$HERE"/disable-networkd-wait.sh
-"$HERE"/setup-grub.sh
-"$HERE"/../scripts/install-earlyoom.sh
-"$HERE"/../scripts/install-nodeljs.sh
-"$HERE"/../scripts/localtime.sh
-"$HERE"/../scripts/install-redis.sh
-"$HERE"/../scripts/install-rsync.sh
-"$HERE"/../scripts/install-libssl-compatability.sh
-"$HERE"/setup-procfs-knobs.sh
-"$HERE"/setup-limits.sh
-
-echo "Please reboot then run setup-dc-node-2.sh"
Deleted file: net/datacenter-node-install/setup-dc-node-2.sh (-22 lines)

@@ -1,22 +0,0 @@
-#!/usr/bin/env bash
-
-HERE="$(dirname "$0")"
-
-# shellcheck source=net/datacenter-node-install/utils.sh
-source "$HERE"/utils.sh
-
-ensure_env || exit 1
-
-set -xe
-
-"$HERE"/setup-cuda.sh
-
-# setup persistence mode across reboots
-TMPDIR="$(mktemp)"
-mkdir -p "$TMPDIR"
-if pushd "$TMPDIR"; then
-  tar -xvf /usr/share/doc/NVIDIA_GLX-1.0/sample/nvidia-persistenced-init.tar.bz2
-  ./nvidia-persistenced-init/install.sh systemd
-  popd
-  rm -rf "$TMPDIR"
-fi
Deleted file: net/datacenter-node-install/setup-grub.sh (-13 lines)

@@ -1,13 +0,0 @@
-#!/usr/bin/env bash
-
-HERE="$(dirname "$0")"
-
-# shellcheck source=net/datacenter-node-install/utils.sh
-source "$HERE"/utils.sh
-
-ensure_env || exit 1
-
-set -xe
-
-printf "GRUB_GFXPAYLOAD_LINUX=1280x1024x32\n\n" >> /etc/default/grub
-update-grub
Modified: net/scripts/install-docker.sh

@@ -18,9 +18,62 @@ add-apt-repository \
 
 apt-get update
 apt-get install -y docker-ce
-docker run hello-world
+
+cat > /lib/systemd/system/docker.service <<EOF
+[Unit]
+Description=Docker Application Container Engine
+Documentation=https://docs.docker.com
+BindsTo=containerd.service
+After=network-online.target firewalld.service
+Wants=network-online.target
+
+[Service]
+Type=notify
+# the default is not to use systemd for cgroups because the delegate issues still
+# exists and systemd currently does not support the cgroup feature set required
+# for containers run by docker
+ExecStart=/usr/bin/dockerd -H unix://
+ExecReload=/bin/kill -s HUP '$MAINPID'
+TimeoutSec=0
+RestartSec=2
+Restart=always
+
+# Note that StartLimit* options were moved from "Service" to "Unit" in systemd 229.
+# Both the old, and new location are accepted by systemd 229 and up, so using the old location
+# to make them work for either version of systemd.
+StartLimitBurst=3
+
+# Note that StartLimitInterval was renamed to StartLimitIntervalSec in systemd 230.
+# Both the old, and new name are accepted by systemd 230 and up, so using the old name to make
+# this option work for either version of systemd.
+StartLimitInterval=60s
+
+# Having non-zero Limit*s causes performance problems due to accounting overhead
+# in the kernel. We recommend using cgroups to do container-local accounting.
+LimitNOFILE=infinity
+LimitNPROC=infinity
+LimitCORE=infinity
+
+# Comment TasksMax if your systemd version does not support it.
+# Only systemd 226 and above support this option.
+TasksMax=infinity
+
+# set delegate yes so that systemd does not reset the cgroups of docker containers
+Delegate=yes
+
+# kill only the docker process, not all processes in the cgroup
+KillMode=process
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+systemctl daemon-reload
+systemctl enable --now /lib/systemd/system/docker.service
+
 # Grant the solana user access to docker
 if id solana; then
   addgroup solana docker
 fi
+
+docker run hello-world
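A hedged check that the rewritten unit took effect; `systemctl show` reports
the raised limits directly:

```bash
# Expect "infinity" for both properties
systemctl show docker --property=LimitNOFILE,LimitNPROC
sudo docker run --rm hello-world
```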