---
title: Solana Validator Operations Best Practices
sidebar_label: General Operations
pagination_label: "Best Practices: Validator Operations"
---

After you have successfully set up and started a
[validator on testnet](../setup-a-validator.md) (or another cluster
of your choice), you will want to become familiar with how to operate your
validator on a day-to-day basis. During daily operations, you will be
[monitoring your server](./monitoring.md), updating software regularly (both the
Solana validator software and operating system packages), and managing your vote
account and identity account.

All of these skills are critical to practice. Maximizing your validator uptime
is an important part of being a good operator.

## Educational Workshops

The Solana validator community holds regular educational workshops. You can
watch past workshops through the
[Solana validator educational workshops playlist](https://www.youtube.com/watch?v=86zySQ5vGW8&list=PLilwLeBwGuK6jKrmn7KOkxRxS9tvbRa5p).

## Help with the validator command line

With the Solana CLI installed, you can run the `agave-validator` command with
the `--help` flag to get a better understanding of the available flags and
subcommands.

```
agave-validator --help
```

## Restarting your validator

There are many operational reasons you may want to restart your validator. As a
best practice, you should avoid a restart during a leader slot. A
[leader slot](https://solana.com/docs/terminology#leader-schedule) is the time
when your validator is expected to produce blocks. For the health of the cluster
and also for your validator's ability to earn transaction fee rewards, you do
not want your validator to be offline during an opportunity to produce blocks.

To see the full leader schedule for an epoch, use the following command:

```
solana leader-schedule
```

Based on the current slot and the leader schedule, you can calculate open time
windows where your validator is not expected to produce blocks.

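As a rough illustration of that calculation, the sketch below lists the gaps
between consecutive leader slots for one identity. It assumes the schedule is
available as lines of `<slot> <leader-pubkey>`; that two-column layout is an
assumption about the `solana leader-schedule` output, and the function name
`leader_gaps` is hypothetical.

```shell
# Print idle windows between consecutive leader slots of one identity.
# stdin: lines of "<slot> <leader-pubkey>" (assumed layout, sorted by slot).
# $1: identity pubkey to look for
# $2: minimum gap, in slots, worth reporting
# stdout: "<first-idle-slot> <last-idle-slot> <idle-slot-count>" per window
leader_gaps() {
  awk -v id="$1" -v min="$2" '
    $2 == id {
      # Report the gap since the previous leader slot if it is large enough.
      if (seen && $1 - prev > min) {
        printf "%d %d %d\n", prev + 1, $1 - 1, $1 - prev - 1
      }
      prev = $1
      seen = 1
    }'
}
```

One possible use is `solana leader-schedule | leader_gaps <pubkey> 1000`, then
comparing the reported slot ranges against the current slot. The sketch only
looks at gaps inside the schedule it is given, so time before the first leader
slot and after the last one is not reported.
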

Assuming you are ready to restart, you may use the `agave-validator exit`
command. The command exits your validator process when an appropriate idle time
window is reached. Assuming that you have configured systemd to manage your
validator process, the validator should restart automatically after the exit.
See the help output of the command for details:

```
agave-validator exit --help
```

## Upgrading

There are many ways to upgrade the
[Solana CLI software](../../cli/install.md). As an operator, you
will need to upgrade often, so it is important to get comfortable with this
process.

> **Note** validator nodes do not need to be offline while the newest version is
> being downloaded or built from source. All methods below can be done before
> the validator process is restarted.

### Building From Source

It is a best practice to always build your Solana binaries from source. If you
build from source, you can be confident that the binary was produced from the
code you reviewed and was not tampered with before it was created. You may also
be able to optimize your `agave-validator` binary for your specific hardware.

If you build from source on the validator machine (or a machine with the same
CPU), you can target your specific architecture using the `-march` flag. Refer
to the following doc for
[instructions on building from source](../../cli/install.md#build-from-source).

### agave-install

If you are not comfortable building from source, or you need to quickly install
a new version to test something out, you could instead try using the
`agave-install` command.

Assuming you want to install Solana version `1.14.17`, you would execute the
following:

```
agave-install init 1.14.17
```

This command downloads the executable for `1.14.17` and installs it into a
`.local` directory. You can also look at `agave-install --help` for more
options.

> **Note** this command only works if you already have the Solana CLI installed.
> If you do not have the CLI installed, refer to
> [install solana cli tools](../../cli/install.md).

### Restart

For all install methods, the validator process will need to be restarted before
the newly installed version is in use. Use `agave-validator exit` to restart
your validator process.

### Verifying version

The best way to verify that your validator process has changed to the desired
version is to grep the logs after a restart. The following grep command should
show you the version that your validator restarted with:

```
grep -B1 'Starting validator with' <path/to/logfile>
```

## Snapshots

Validator operators who have not experienced significant downtime (multiple
hours of downtime) should avoid downloading snapshots. It is important for the
health of the cluster, as well as for your validator's history, to maintain the
local ledger. Therefore, you should not download a new snapshot every time your
validator goes offline or experiences an issue. Downloading a snapshot should be
reserved for occasions when you do not have local state. Prolonged downtime
or the first install of a new validator are examples of times when you may not
have state locally. In other cases, such as restarts for upgrades, a snapshot
download should be avoided.

To avoid downloading a snapshot on restart, add the following flag to the
`agave-validator` command:

```
--no-snapshot-fetch
```

If you use this flag with the `agave-validator` command, run
`solana catchup <pubkey>` after your validator starts to confirm that the
validator is catching up in a reasonable time. After some time (potentially a
few hours), if it appears that your validator continues to fall behind, then you
may have to download a new snapshot.

### Downloading Snapshots

If you are starting a validator for the first time, or your validator has fallen
too far behind after a restart, then you may have to download a snapshot.

To download a snapshot, you must **_NOT_** use the `--no-snapshot-fetch` flag.
Without the flag, your validator will automatically download a snapshot from
the known validators that you specified with the `--known-validator` flag.

If one of the known validators is serving the snapshot slowly, you can try
adding the `--minimal-snapshot-download-speed` flag to your validator. With this
flag, the validator switches to another known validator if the initial download
speed is below the threshold that you set.

### Manually Downloading Snapshots

If there are network troubles with one or more of your known validators, then
you may have to manually download the snapshot. To manually download a snapshot
from one of your known validators, first find the IP address of the validator
using the `solana gossip` command. In the example below,
`5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on` is the pubkey of one of the
known validators:

```
solana gossip | grep 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on
```

The IP address of the validator is `139.178.68.207` and the open port on this
validator is `80`. You can see the IP address and port in the fifth column of
the gossip output:

```
139.178.68.207 | 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on | 8001 | 8004 | 139.178.68.207:80 | 1.10.27 | 1425680972
```

Now that the IP and port are known, you can download a full snapshot or an
incremental snapshot:

```
wget --trust-server-names http://139.178.68.207:80/snapshot.tar.bz2
wget --trust-server-names http://139.178.68.207:80/incremental-snapshot.tar.bz2
```

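The two URLs above can also be derived from the gossip row instead of being
typed by hand. The following is a minimal sketch that assumes the
pipe-separated layout of the sample row above, with the IP address and port in
the fifth column; the function name `snapshot_urls` is hypothetical.

```shell
# Turn rows of `solana gossip` output into snapshot download URLs.
# stdin: gossip rows with "|"-separated columns; the fifth column is assumed
#        to hold "<ip>:<port>", as in the sample row shown earlier.
# stdout: one full-snapshot URL and one incremental-snapshot URL per row
snapshot_urls() {
  awk -F'|' '{
    gsub(/[ \t]/, "", $5)        # strip the padding around the address
    if ($5 ~ /:[0-9]+$/) {       # skip rows without an <ip>:<port> entry
      print "http://" $5 "/snapshot.tar.bz2"
      print "http://" $5 "/incremental-snapshot.tar.bz2"
    }
  }'
}
```

For example, `solana gossip | grep <pubkey> | snapshot_urls` prints the URLs
that can then be passed to `wget --trust-server-names`.
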

Now move those files into your snapshot directory. If you have not specified a
snapshot directory, then you should put the files in your ledger directory.

Once you have a local snapshot, you can restart your validator with the
`--no-snapshot-fetch` flag.

## Regularly Check Account Balances

It is important that you do not accidentally run out of funds in your identity
account, because your node will stop voting if you do. It is also important to
note that this account keypair is the most vulnerable of the three keypairs in a
vote account, because the keypair for the identity account is stored on your
validator when running the `agave-validator` software. How much SOL you should
store there is up to you. As a best practice, make sure to check the account
regularly and refill or deduct from it as needed. To check the account balance,
run:

```
solana balance validator-keypair.json
```

> **Note** `agave-watchtower` can monitor for a minimum validator identity
> balance. See [monitoring best practices](./monitoring.md) for details.

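If you prefer a scripted check, the sketch below compares a balance against a
minimum that you choose. It assumes the balance is reported as text such as
`1.5 SOL`; the function name `balance_below` and the threshold value are
hypothetical.

```shell
# Succeed (exit status 0) when a balance is below a chosen minimum.
# $1: balance text such as "1.5 SOL" (assumed output of `solana balance`)
# $2: minimum amount of SOL you want to keep in the account
balance_below() {
  # "${1%% *}" keeps only the number in front of the first space.
  awk -v bal="${1%% *}" -v min="$2" 'BEGIN { exit !(bal + 0 < min + 0) }'
}
```

A possible use, for example from a cron job, is
`balance_below "$(solana balance validator-keypair.json)" 1 && echo "identity balance is low"`.
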
## Withdrawing From The Vote Account

As a reminder, your withdrawer's keypair should **_NEVER_** be stored on your
server. It should be stored on a hardware wallet, paper wallet, or multisig,
which mitigates the risk of hacking and theft of funds.

To withdraw your funds from your vote account, you will need to run
`solana withdraw-from-vote-account` on a trusted computer. For example, on a
trusted computer, you could withdraw all of the funds from your vote account
(excluding the rent exempt minimum). The example below assumes you have a
separate keypair to store your funds called `person-keypair.json`.

```
solana withdraw-from-vote-account \
vote-account-keypair.json \
person-keypair.json ALL \
--authorized-withdrawer authorized-withdrawer-keypair.json
```

To get more information on the command, use
`solana withdraw-from-vote-account --help`.

For a more detailed explanation of the different keypairs and other related
operations, refer to
[vote account management](../guides/vote-accounts.md).