open source v2

Nathaniel Parke 2020-10-19 17:41:51 +08:00
parent dc7673ba74
commit 35a7a33346
34 changed files with 1002 additions and 1 deletion

.gitignore vendored

@ -127,3 +127,8 @@ dmypy.json
# Pyre type checker
.pyre/
deploy/id_rsa
*.retry
*.key
.idea/


@ -1 +1,55 @@
# Validators
## Motivation
This repository is meant to serve as an example of how to run a Solana validator.
It does not give specifics on the architecture of Solana and should not be used as a substitute for Solana's documentation.
We highly recommend reading [Solana's documentation](https://docs.solana.com/running-validator) on running a validator.
This repository should be used in conjunction with Solana's guide: it provides practical,
real-world examples of cluster setup and should act as a starting point for participating
in mainnet validation.

This repository gives two examples of potential validator setups. The first is a
single-node validator that can be used as an entry point for querying on-chain Solana data
or validating transactions.
The second is a cluster of Solana validators load balanced by an nginx server. Active health
checks are an nginx feature offered only in the premium version (NGINX Plus); a configuration
for active health checks is also included.
The end goal of this guide is to have a Solana validator cluster running in a cloud
environment.
## Overview of setups
- run a single validator
- run a cluster of validators
## Running a single validator
### Instance configuration
#### Choosing an instance type
Solana's documentation recommends choosing an instance type with the highest possible core count ([see here](https://docs.solana.com/running-validator/validator-reqs)).
Additionally, the Solana mainnet utilizes GPUs to increase network throughput; Solana's documentation
recommends Nvidia Turing or Volta family GPUs, which are available through most cloud providers.
This guide was tested on [Amazon AWS g4dn.16xlarge instances](https://aws.amazon.com/ec2/instance-types/g4/) running the
Ubuntu 18.04 Deep Learning AMI. These instances provide Nvidia T4 GPUs with a balance of high network
throughput and CPU resources.
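Once an instance is up, a quick sanity check is to confirm the driver sees the GPUs (`nvidia-smi` ships with the Deep Learning AMI):
```
# list the GPUs visible to the Nvidia driver
nvidia-smi -L
```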
#### Instance network configuration
After provisioning an instance, it is important to configure network whitelists to be compatible
with a validator's network usage. Solana nodes communicate via a gossip protocol that takes
place over a port range specified at validator startup. For this guide we will set that port range to
8000-8012; be sure to whitelist traffic on whichever port range you choose.
Validator RPC servers also bind to configurable ports. This guide sets RPC servers to use port 8899
for standard HTTP requests and port 8900 for websocket connections.
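On AWS this amounts to a few security-group rules. A minimal sketch using the AWS CLI, assuming a hypothetical security group `sg-0123456789abcdef0` (adjust the CIDRs to your own access policy):
```
SG=sg-0123456789abcdef0  # hypothetical security group ID

# gossip / dynamic port range, both TCP and UDP
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol tcp --port 8000-8012 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol udp --port 8000-8012 --cidr 0.0.0.0/0

# RPC (8899) and websocket (8900) ports; restrict the CIDR if the
# RPC should not be publicly reachable
aws ec2 authorize-security-group-ingress --group-id "$SG" \
  --protocol tcp --port 8899-8900 --cidr 0.0.0.0/0
```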
### Setting up a single validator
Once an instance has been deployed and is accessible over SSH, we can use Ansible to run some basic setup
scripts. Ansible works by inspecting the contents of a `hosts.yaml` file, which defines the inventory of servers to which one can deploy.
To make a server accessible to Ansible, add its network location to the `validators` block in `deploy/hosts.yaml`.
This indicates that the specified server is part of the `validators` group, which will contain our validator machines.
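You can verify that Ansible can reach the new host before running any playbooks (a quick sketch):
```
# run this from the /deploy directory
ansible -i hosts.yaml validators -m ping
```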
`deploy/setup.yaml` contains a set of common setup steps for configuring a server from the base OS image. You can run these
setup steps using
```
# run this from the /deploy directory
ansible-playbook -i hosts.yaml -l validators setup.yaml
```
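Once setup completes and supervisord has started the validator process, you can query the node's local RPC health endpoint, the same endpoint used by the health-check scripts and load balancer configurations in this repository:
```
# a healthy, caught-up validator responds with: ok
curl http://localhost:8899/health
```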
## Running a cluster of validators
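The cluster setup follows the same pattern as the single validator: add the load balancer to the `load_balancers` group and each validator to the `validators` group in `deploy/hosts.yaml` (the nginx load balancer configurations and a rolling-restart playbook for validators are included under `deploy/`), then run the setup playbook against all hosts. A sketch:
```
# run this from the /deploy directory
ansible-playbook -i hosts.yaml setup.yaml
```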

deploy/ansible.cfg Normal file

@ -0,0 +1,7 @@
[defaults]
inventory = ./hosts.yaml
forks = 100
interpreter_python = auto
[ssh_connection]
pipelining = True

deploy/check_slot_distance.sh Executable file

@ -0,0 +1,5 @@
#!/bin/bash -e
pssh=$(which parallel-ssh || which pssh)
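# Query each host's local RPC /health endpoint in parallel; a node that has
# fallen more than --health-check-slot-distance slots behind the trusted
# validators (see sol/api.sh) reports 'behind' instead of 'ok'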
$pssh -h <(ansible all --list-hosts -i hosts.yaml | tail -n+2) -i -l ubuntu -- 'curl http://localhost:8899/health'


@ -0,0 +1,5 @@
# Increase cache size
cache-size=4096
# Cache negative replies even if they do not have TTLs
neg-ttl=10


@ -0,0 +1,82 @@
user www-data;
worker_processes auto;
pid /run/nginx.pid;
include /etc/nginx/modules-enabled/*.conf;

worker_rlimit_nofile 80000;

events {
    worker_connections 50000;
    # multi_accept on;
}

http {
    ##
    # Basic Settings
    ##

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    types_hash_max_size 2048;
    # server_tokens off;

    server_names_hash_bucket_size 128;
    # server_name_in_redirect off;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    client_max_body_size 100m;
    proxy_busy_buffers_size 32k;
    proxy_buffers 128 8k;

    ##
    # SSL Settings
    ##

    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;
    ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384;
    ssl_session_timeout 100m;
    ssl_session_cache shared:SSL:10m;
    ssl_session_tickets off;
    ssl_dhparam /etc/nginx/ssl/dhparam.pem;

    ##
    # Logging Settings
    ##

    log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$host" sn="$server_name" '
                        'rt=$request_time '
                        'ua="$upstream_addr" us="$upstream_status" '
                        'ut="$upstream_response_time" ul="$upstream_response_length" '
                        'cs=$upstream_cache_status '
                        'msec=$msec '
                        'aid="$upstream_http_account_id"';

    access_log /var/log/nginx/access.log main_ext;
    error_log /var/log/nginx/error.log warn;

    ##
    # Gzip Settings
    ##

    gzip on;
    gzip_vary on;
    gzip_proxied any;
    gzip_comp_level 6;
    gzip_buffers 16 8k;
    gzip_http_version 1.1;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml application/xml+rss text/javascript;

    ##
    # Virtual Host Configs
    ##

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}


@ -0,0 +1,9 @@
server {
    listen 127.0.0.1:81;
    server_name 127.0.0.1;

    location /nginx_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}


@ -0,0 +1,5 @@
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains; preload";
add_header X-Frame-Options sameorigin;
add_header X-Content-Type-Options nosniff;
add_header X-XSS-Protection "1; mode=block";
add_header Content-Security-Policy block-all-mixed-content;


@ -0,0 +1,13 @@
-----BEGIN DH PARAMETERS-----
MIICCAKCAgEAwSMA4EB8xhOeTzV+UMAG7fGVvE75S7WqUMG83YC0hXuefpNY0w5b
wQkM5ffkNiIa/lv2W+SqR2WRoh0M6xI0HdUdVKVkNYyWqBKRW4fjh+hbYMar8FCM
TFibDHoNU+40Z9bwKWWeURZAQj9yCA0dbXCkv7nIuVrWTHBMHtNt9quMvqevZPoU
wL6N004E9pjlEogH4PX/H+o08xGicNtlJXsU0rd2Xev9URo/8IU92qocBjUiUvow
yRUJaufmqfT5IV+ezLUCV1yC2UOj0BA3sNVdFNS8MUIIJWWUfLspXHE0iQjNJuW6
HOmj9sMwVWjnuRjpMza6wNi+CAaKgzI8YrfABd/PtRl9bxztGRXTaLK+ecRlUbq3
l++SLu3mX7GfoACxHhAxQAoaDsZZMgqvsI23DP5FHCCMSQGw6r/dJuZ4q4b8qjWX
u6eOY+ZBg4FIYiMsHcgNcNPGKoLf/YQ3L3EAl9iRb2dXPza5QW9pLzoGLRC94EIT
Wq2hthOqJPsiEihc2gBaV5sdcbO+tqf4XhtbWLKMVDt91TSYzukdrlE5rnFpmvr5
0ze5saNI1tsAgpL8UmJkjpT19VUF6eTv7wpc2gAklel+kUTlJ1rjwja2uq+zNDI5
dzt6iXs1SHgY6wkn9orNPAmWFRoKkaLJgmWFeJHIqp14opS4ZESaSiMCAQI=
-----END DH PARAMETERS-----


@ -0,0 +1,12 @@
server {
    listen 30000;

    location /api {
        api;
    }

    location = /dashboard.html {
        root /usr/share/nginx/html;
    }

    # location /swagger-ui {
    #     root /usr/share/nginx/html;
    # }
}


@ -0,0 +1,73 @@
upstream validator_backend {
    zone validator_backend 512k;
    least_conn;
    keepalive 8192;
    server validator-1.test.net:8899 max_fails=20 fail_timeout=2;
    server validator-2.test.net:8899 max_fails=20 fail_timeout=2;
}

upstream validator_ws_backend {
    zone validator_ws_backend 512k;
    least_conn;
    server validator-1.test.net:8900 max_fails=20 fail_timeout=2;
    server validator-2.test.net:8900 max_fails=20 fail_timeout=2;
}

server {
    listen 80;
    server_name validator-lb.test.net;
    status_zone http_status_zone;

    location / {
        try_files /nonexistent @$http_upgrade;
    }

    location @websocket {
        proxy_pass http://validator_ws_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        health_check uri=/health port=9090;
    }

    location @ {
        proxy_pass http://validator_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_next_upstream error timeout non_idempotent;
        proxy_next_upstream_timeout 5;
        proxy_next_upstream_tries 5;
        health_check uri=/health port=9090;
    }
}

server {
    listen 443 ssl;
    server_name validator-lb.test.net;
    status_zone https_status_zone;

    ssl_certificate /etc/ssl/certs/test.net.pem;
    ssl_certificate_key /etc/ssl/private/test.net.key;
    ssl_client_certificate /etc/ssl/certs/cloudflare.pem;
    ssl_verify_client on;

    location / {
        try_files /nonexistent @$http_upgrade;
    }

    location @websocket {
        proxy_pass http://validator_ws_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        health_check uri=/health port=9090;
    }

    location @ {
        proxy_pass http://validator_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        health_check uri=/health port=9090;
    }
}


@ -0,0 +1,65 @@
upstream validator_backend {
    least_conn;
    keepalive 8192;
    server validator-1.test.net:8899 max_fails=20 fail_timeout=2;
    server validator-2.test.net:8899 max_fails=20 fail_timeout=2;
}

upstream validator_ws_backend {
    least_conn;
    server validator-1.test.net:8900 max_fails=20 fail_timeout=2;
    server validator-2.test.net:8900 max_fails=20 fail_timeout=2;
}

server {
    listen 80;
    server_name validator-lb.test.net;

    location / {
        try_files /nonexistent @$http_upgrade;
    }

    location @websocket {
        proxy_pass http://validator_ws_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location @ {
        proxy_pass http://validator_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_next_upstream error timeout non_idempotent;
        proxy_next_upstream_timeout 5;
        proxy_next_upstream_tries 5;
    }
}

server {
    listen 443 ssl;
    server_name validator-lb.test.net;

    ssl_certificate /etc/ssl/certs/test.net.pem;
    ssl_certificate_key /etc/ssl/private/test.net.key;
    ssl_client_certificate /etc/ssl/certs/cloudflare.pem;
    ssl_verify_client on;

    location / {
        try_files /nonexistent @$http_upgrade;
    }

    location @websocket {
        proxy_pass http://validator_ws_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location @ {
        proxy_pass http://validator_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}


@ -0,0 +1 @@
# insert ssl certificate here


@ -0,0 +1 @@
# insert ssl private key here


@ -0,0 +1,30 @@
upstream validator_backend {
    keepalive 8192;
    server localhost:8899 max_fails=20 fail_timeout=2;
}

upstream validator_ws_backend {
    least_conn;
    server localhost:8900 fail_timeout=2;
}

server {
    listen 80;

    location / {
        try_files /nonexistent @$http_upgrade;
    }

    location @websocket {
        proxy_pass http://validator_ws_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location @ {
        proxy_pass http://validator_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }
}


@ -0,0 +1,3 @@
# Increase memory mapped files limit
# https://docs.solana.com/running-validator/validator-start#manual
vm.max_map_count = 500000


@ -0,0 +1,6 @@
# Increase UDP buffer size
# https://docs.solana.com/running-validator/validator-start#manual
net.core.rmem_default = 134217728
net.core.rmem_max = 134217728
net.core.wmem_default = 134217728
net.core.wmem_max = 134217728

deploy/health-setup.yaml Normal file

@ -0,0 +1,10 @@
- name: Setup health check
  hosts: all
  remote_user: sol
  tasks:
    - name: install python dependencies
      pip:
        chdir: ~/sol/sol
        virtualenv: env
        virtualenv_python: python3.6
        requirements: "{{ requirements | default('requirements.txt') }}"

deploy/hosts.yaml Normal file

@ -0,0 +1,28 @@
all:
  vars:
    validator_user: sol
    solana_version: v1.2.32
    run_validator: true
    nginx_sites:
      - validator.conf

load_balancers:
  hosts:
    validator-lb.test.net:
      extra_packages:
        - nginx
  vars:
    is_watchtower: true
    etc_dir: lb
    run_validator: false

validators:
  hosts:
    validator-1.test.net:
    validator-2.test.net:
  vars:
    etc_dir: validator
    supervisord_conf_file: validator.conf
    local_disk: /dev/nvme0n1
    extra_packages:
      - nginx

deploy/id_rsa Normal file

@ -0,0 +1 @@
# add created private key here

deploy/id_rsa.pub Normal file

@ -0,0 +1 @@
# add public key here

deploy/nginx-setup.yaml Normal file

@ -0,0 +1,27 @@
- name: Setup nginx
  hosts: all
  remote_user: ubuntu
  become: yes
  tasks:
    - name: linking enabled sites
      file:
        src: /etc/nginx/sites-available/{{ item }}
        dest: /etc/nginx/sites-enabled/{{ item }}
        state: link
      with_items: "{{ nginx_sites }}"
      notify: reload nginx
    - find: file_type=link paths=/etc/nginx/sites-enabled
      register: sites
    - name: cleaning up others
      with_items: "{{ sites.files | map(attribute='path') | list }}"
      file: path={{ item }} state=absent
      when: "(item | basename) not in nginx_sites"
      notify: reload nginx
  handlers:
    - name: reload nginx
      service:
        name: nginx
        state: reloaded


@ -0,0 +1,72 @@
- hosts: validators
  remote_user: ubuntu
  become: yes
  max_fail_percentage: 30
  serial: 10
  tasks:
    - name: remove from load balancers
      replace:
        path: /etc/nginx/sites-available/validator.conf
        regexp: '^(\s*)(server {{ inventory_hostname }}:\d+ .*;)\s*$'
        replace: '\1# \2 # removed for restart'
      delegate_to: "{{ item }}"
      with_items: "{{ groups.load_balancers }}"
      throttle: 1
      register: result
      failed_when: result is not changed
    - name: reload nginx
      service:
        name: nginx
        state: reloaded
      delegate_to: "{{ item }}"
      with_items: "{{ groups.load_balancers }}"
      run_once: yes
    - name: wait for connections to close
      wait_for:
        timeout: 10
    - name: restart validator
      command: supervisorctl restart validator
    - name: wait for validator to start up
      wait_for:
        port: 8899
        delay: 60
    - name: wait for validator to catch up
      uri:
        url: http://localhost:8899/health
        return_content: yes
      register: result
      until: "result.content == 'ok'"
      retries: 200
      delay: 10
    - name: wait for validator to fully catch up
      wait_for:
        timeout: 120
    - name: add back to load balancers
      replace:
        path: /etc/nginx/sites-available/validator.conf
        regexp: '^(\s*)# (server {{ inventory_hostname }}:\d+ .*;) # removed for restart$'
        replace: '\1\2'
      delegate_to: "{{ item }}"
      with_items: "{{ groups.load_balancers }}"
      throttle: 1
      register: result
      failed_when: result is not changed

- hosts: validators
  remote_user: ubuntu
  become: yes
  tasks:
    - name: reload nginx one last time
      service:
        name: nginx
        state: reloaded
      delegate_to: "{{ item }}"
      with_items: "{{ groups.load_balancers }}"
      run_once: yes

deploy/setup.yaml Normal file

@ -0,0 +1,174 @@
- name: set up system
  hosts: all
  remote_user: ubuntu
  become: yes
  tasks:
    - name: set hostname
      hostname:
        name: "{{ inventory_hostname }}"
    - name: add self to /etc/hosts
      lineinfile:
        dest: /etc/hosts
        regexp: '^127\.0\.0\.1[ \t]+localhost'
        line: '127.0.0.1 localhost {{ inventory_hostname }}'
        state: present
    - group:
        name: "{{ validator_user }}"
    - user:
        name: "{{ validator_user }}"
        group: "{{ validator_user }}"
        shell: /bin/bash
    - file:
        path: "/home/{{ validator_user }}/.ssh"
        state: directory
        owner: "{{ validator_user }}"
        group: "{{ validator_user }}"
    - name: apt repos
      apt_repository:
        repo: ppa:deadsnakes/ppa # For python 3.6
    - name: update packages
      apt:
        update_cache: yes
        upgrade: 'yes'
    - name: install packages
      apt:
        update_cache: yes
        name:
          - cron
          - graphviz
          - iotop
          - dnsmasq
          - supervisor
          - iputils-ping
          - less
          - lsof
          - psmisc
          - screen
          - silversearcher-ag
          - software-properties-common
          - vim
          - zstd
          - python3.6
          - virtualenv
          - python3-virtualenv
    - name: install extra packages # Configured in hosts file
      apt:
        name: "{{ extra_packages }}"
      when: extra_packages is defined
    - name: create log directory
      file:
        path: /var/log/sol
        state: directory
        owner: "{{ validator_user }}"
        group: "{{ validator_user }}"
    - name: configure common /etc
      copy:
        src: etc/common/
        dest: /etc/
    - name: configure /etc overrides
      when: etc_dir is defined
      copy:
        src: "etc/{{ etc_dir }}/"
        dest: /etc/
    - name: evaluate sysctl overrides for udp buffers
      shell: sudo sysctl -p /etc/sysctl.d/20-solana-udp-buffers.conf
      when: run_validator
    - name: evaluate sysctl overrides for memory mapped files
      shell: sudo sysctl -p /etc/sysctl.d/20-solana-mmaps.conf
      when: run_validator
    - name: configure supervisord
      when: supervisord_conf_file is defined
      template:
        src: "supervisord/{{ supervisord_conf_file }}"
        dest: /etc/supervisor/conf.d/sol.conf
    - name: whitelist github ssh host
      lineinfile:
        regexp: "^github\\.com"
        dest: /etc/ssh/ssh_known_hosts
        create: yes
        state: present
        line: "github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ=="
    - name: format local disk
      filesystem:
        dev: "{{ local_disk }}"
        fstype: xfs
      when: local_disk is defined
    - name: mount local disk
      mount:
        path: /data
        src: "{{ local_disk }}"
        fstype: xfs
        opts: defaults,nofail
        state: mounted
      when: local_disk is defined
    - name: create sol directory on local disk
      file:
        path: /data/sol
        state: directory
        owner: "{{ validator_user }}"
        group: "{{ validator_user }}"
      when: local_disk is defined

- import_playbook: nginx-setup.yaml
  when: nginx_sites is defined

- name: update ssh keys
  hosts: all
  remote_user: "{{ validator_user }}"
  tags:
    - keys
  tasks:
    - name: install priv key
      copy:
        src: id_rsa
        dest: ~/.ssh/
        mode: '600'
    - name: install pub key
      copy:
        src: id_rsa.pub
        dest: ~/.ssh/
        mode: '644'

- name: update code
  hosts: all
  remote_user: "{{ validator_user }}"
  tags:
    - code
  tasks:
    - name: update git
      git:
        repo: git@github.com:wireless-table/validators.git
        dest: "~/{{ validator_user }}"
        version: "{{ commit | default('HEAD') }}"

- import_playbook: health-setup.yaml

- name: update cli
  hosts: all
  remote_user: ubuntu
  become: yes
  tags:
    - cli
  tasks:
    - name: install cli
      shell: sudo --login -u sol -- bash -c "curl -sSf https://raw.githubusercontent.com/solana-labs/solana/{{ solana_version }}/install/solana-install-init.sh | sh -s {{ solana_version }}"

- hosts: all
  remote_user: ubuntu
  become: yes
  tasks:
    - name: update supervisorctl
      command: supervisorctl update

deploy/supervisord/macros Normal file

@ -0,0 +1,18 @@
{% macro program(name, module) %}
[program:{{ name }}]
environment=PS={{ name }},TZ=UTC
directory=/home/{{ validator_user }}/{{ validator_user }}
startsecs=3
stopwaitsecs=30
user={{ validator_user }}
stopasgroup=true
startretries=100000
autorestart=true
redirect_stderr=true
stdout_logfile_maxbytes=2000000000
stdout_logfile_backups=3
{% for key, value in kwargs.items() %}
{{ key }}={{ value }}
{% endfor %}
{% endmacro %}


@ -0,0 +1,28 @@
{% from 'macros' import program with context %}
[supervisord]
minfds=600000
{% if run_validator %}
{{ program('validator', '') }}
command=/home/sol/sol/sol/api.sh
{% endif %}
{% if is_watchtower is defined %}
{{ program('watchtower', '') }}
command=/home/sol/sol/sol/watchtower.sh
{% endif %}
[program:health_check_server]
command=/home/sol/sol/sol/env/bin/python -m health.main
environment=PS=health_check_server,TZ=UTC
directory=/home/sol/sol/sol
startsecs=3
stopwaitsecs=30
user=sol
stopasgroup=true
startretries=100000
autorestart=true
redirect_stderr=true
stdout_logfile_maxbytes=2000000000
stdout_logfile_backups=3

deploy/tail_all_logs.sh Executable file

@ -0,0 +1,5 @@
#!/bin/bash -e
pssh=$(which parallel-ssh || which pssh)
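# Follow the supervisor-managed process logs on every host in the inventory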
$pssh -h <(ansible all --list-hosts -i hosts.yaml | tail -n+2) -l ubuntu -P -t0 'sudo tail -n0 -qF /var/log/supervisor/*.log'

sol/api.sh Executable file

@ -0,0 +1,73 @@
#!/usr/bin/env bash
set -ex

#shellcheck source=/dev/null
#. ~/service-env.sh

PATH=/home/sol/.local/share/solana/install/active_release/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

# Parameters from https://docs.solana.com/clusters#mainnet-beta
ENTRYPOINT=mainnet-beta.solana.com:8001
TRUSTED_VALIDATOR_PUBKEYS=(7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S)
EXPECTED_BANK_HASH=5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d
EXPECTED_GENESIS_HASH=5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d
EXPECTED_SHRED_VERSION=64864
# NOTE: Check if this is reasonable
RPC_HEALTH_CHECK_SLOT_DISTANCE=15

# Delete any zero-length snapshots that can cause validator startup to fail
find /data/sol/ledger/snapshot-* -size 0 -print -exec rm {} \; || true

identity_keypair=~/api-identity.json
if [[ -f $identity_keypair ]]; then
  echo 'identity_keypair exists'
else
  echo 'generating identity_keypair'
  solana-keygen new -o $identity_keypair --no-passphrase
fi

identity_pubkey=$(solana-keygen pubkey $identity_keypair)

trusted_validators=()
for tv in "${TRUSTED_VALIDATOR_PUBKEYS[@]}"; do
  [[ $tv = "$identity_pubkey" ]] || trusted_validators+=(--trusted-validator "$tv")
done

if [[ -n "$EXPECTED_BANK_HASH" ]]; then
  maybe_expected_bank_hash="--expected-bank-hash $EXPECTED_BANK_HASH"
fi

args=(
  --gossip-port 8001
  --dynamic-port-range 8002-8012
  --entrypoint "${ENTRYPOINT}"
  --ledger /data/sol/ledger
  --identity "$identity_keypair"
  --enable-rpc-transaction-history
  --limit-ledger-size 50000000
  --cuda
  --rpc-port 8899
  --private-rpc
  --expected-genesis-hash "$EXPECTED_GENESIS_HASH"
  --expected-shred-version "$EXPECTED_SHRED_VERSION"
  ${maybe_expected_bank_hash}
  "${trusted_validators[@]}"
  --no-untrusted-rpc
  --no-voting
  --log -
  --wal-recovery-mode skip_any_corrupted_record
)

if [[ -n "$RPC_HEALTH_CHECK_SLOT_DISTANCE" ]]; then
  args+=(--health-check-slot-distance "$RPC_HEALTH_CHECK_SLOT_DISTANCE")
fi

# Note: can get into a bad state that requires actually fetching a new snapshot. One such error that indicates this:
# "...processing for bank 0 must succeed: FailedToLoadEntries(InvalidShredData(Custom(\"could not reconstruct entries\")))"
if [[ -d /data/sol/ledger ]]; then
  args+=(--no-snapshot-fetch)
fi

exec solana-validator "${args[@]}"


@ -0,0 +1 @@
15

sol/health/__init__.py Normal file

@ -0,0 +1,3 @@
from gevent import monkey
monkey.patch_all()

sol/health/main.py Normal file

@ -0,0 +1,115 @@
import logging
import socket
import traceback
from functools import wraps
from pathlib import Path
from typing import Union, Tuple, Optional

import jsonpickle
import requests
from flask import Flask
from flask import jsonify
from gevent.pywsgi import WSGIServer

app = Flask('health.main')
logger = logging.getLogger('health.main')

PORT = 9090
TRUSTED_VALIDATOR_ENDPOINT = 'http://vip-api.mainnet-beta.solana.com'
LOCAL_VALIDATOR_ENDPOINT = 'http://localhost:8899'
UNHEALTHY_BLOCKHEIGHT_DIFF = 15
DATA_DIR = 'data'


def serve_flask_app(app: Flask, port: int, allow_remote_connections: bool = False,
                    allow_multiple_listeners: bool = False):
    listener: Union[socket.socket, Tuple[str, int]]
    hostname = '' if allow_remote_connections else 'localhost'
    listener = (hostname, port)
    if allow_multiple_listeners:
        listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        listener.bind((hostname, port))
        listener.listen()
    server = WSGIServer(listener, app)
    server.serve_forever()


def api_endpoint(f):
    @wraps(f)
    def wrapped(*args, **kwargs):
        try:
            result = f(*args, **kwargs)
            return jsonify({'status': 'OK',
                            'result': result})
        except Exception as e:
            logger.warning('Error in handler %s', f, exc_info=True)
            return jsonify({'status': 'Error',
                            'error': repr(e),
                            'pickled_exception': jsonpickle.encode(e),
                            'traceback': traceback.format_exc()}), 500
    return wrapped


@app.route('/')
@api_endpoint
def get_status():
    return f'Hello from {socket.gethostname()}.'


@app.route('/status')
@api_endpoint
def get_validator_status():
    local = get_epoch_info(LOCAL_VALIDATOR_ENDPOINT)['result']['blockHeight']
    trusted = get_epoch_info(TRUSTED_VALIDATOR_ENDPOINT)['result']['blockHeight']
    return {
        'local': local,
        'trusted': trusted
    }


@app.route('/health')
@api_endpoint
def get_health_status():
    local = get_epoch_info(LOCAL_VALIDATOR_ENDPOINT)['result']['blockHeight']
    trusted = get_epoch_info(TRUSTED_VALIDATOR_ENDPOINT)['result']['blockHeight']
    diff = trusted - local
    if diff < 0:
        logger.info(f'Local block height is greater than trusted validator. '
                    f'Current block height: {local}, '
                    f'Trusted block height: {trusted}')
    behind = max(0, diff)
    unhealthy_blockheight_diff = load_data_file_locally('unhealthy_block_threshold') or UNHEALTHY_BLOCKHEIGHT_DIFF
    if behind > int(unhealthy_blockheight_diff):
        raise Exception(f'Local validator is behind trusted validator by more than {unhealthy_blockheight_diff} blocks.')
    return {
        'local': local,
        'trusted': trusted
    }


def load_data_file_locally(filename: str, mode='r') -> Optional[str]:
    file_path = Path(DATA_DIR) / filename
    if file_path.exists():
        with file_path.open(mode=mode) as f:
            return f.read()
    return None


def get_epoch_info(url: str):
    res = requests.post(
        url,
        headers={
            'Content-Type': 'application/json'
        },
        json={"jsonrpc": "2.0", "id": 1, "method": "getEpochInfo", "params": []}
    )
    res.raise_for_status()
    return res.json()


if __name__ == '__main__':
    serve_flask_app(
        app, PORT, allow_remote_connections=True, allow_multiple_listeners=True
    )

sol/requirements.txt Normal file

@ -0,0 +1,33 @@
ansible==2.10.0
ansible-base==2.10.1
certifi==2020.6.20
cffi==1.14.3
chardet==3.0.4
click==7.1.2
cryptography==3.1.1
Flask==1.1.2
gevent==20.9.0
gevent-websocket==0.10.1
greenlet==0.4.17
gunicorn==20.0.4
idna==2.10
importlib-metadata==2.0.0
itsdangerous==1.1.0
Jinja2==2.11.2
jsonpickle==1.4.1
MarkupSafe==1.1.1
mypy==0.782
mypy-extensions==0.4.3
packaging==20.4
pycparser==2.20
pyparsing==2.4.7
PyYAML==5.3.1
requests==2.24.0
six==1.15.0
typed-ast==1.4.1
typing-extensions==3.7.4.3
urllib3==1.25.10
Werkzeug==1.0.1
zipp==3.2.0
zope.event==4.5.0
zope.interface==5.1.2

sol/solana-sys-tuner.sh Executable file

@ -0,0 +1,8 @@
#!/usr/bin/env bash
set -ex
#shellcheck source=/dev/null
#. /home/sol/service-env.sh
PATH=/home/sol/.local/share/solana/install/active_release/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
exec solana-sys-tuner --user sol

sol/watchtower.sh Executable file

@ -0,0 +1,28 @@
#!/usr/bin/env bash
set -ex

#shellcheck source=/dev/null
#. ~/service-env.sh

PATH=/home/sol/.local/share/solana/install/active_release/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin

TRUSTED_VALIDATOR_PUBKEYS=(7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S)
VALIDATOR_IDENTITIES=(HiMfCsAvNr5KDaAC4RxzbGtV6TcpeqeTjgNFjCeTHMSw EAqg3S1tHxCmQbwKXFLXBvsWx2Yvh2jyFCqFx5C1s7PM 75Mv8XfC4VxRV7XJ8Ev4DeiJfa2FdbKrAYNc6TUinvkR)
RPC_URL=http://localhost:8899/

args=(
  --url "$RPC_URL"
  --monitor-active-stake
  --no-duplicate-notifications
)

for tv in "${VALIDATOR_IDENTITIES[@]}"; do
  args+=(--validator-identity "$tv")
done

if [[ -n $TRANSACTION_NOTIFIER_SLACK_WEBHOOK ]]; then
  args+=(--notify-on-transactions)
fi

exec solana-watchtower "${args[@]}"