The `solana-watchtower` program monitors the health of a cluster. It
periodically polls the cluster over an RPC API to confirm that the transaction
count is advancing, new blockhashes are available, and no validators are
delinquent. Results are reported as InfluxDB metrics, with an optional push
notification on sanity failure.

If you only care about the health of one specific validator, use the
`--validator-identity` command-line argument to restrict failure
notifications to issues that affect only that validator.
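
As a minimal sketch, a single-validator invocation might be assembled as shown
here. It assumes the flag takes the validator's identity public key as its
value; `REPLACE_WITH_VALIDATOR_IDENTITY` is a hypothetical placeholder, and the
script only prints the command (a dry run) rather than starting the monitor.

```shell
#!/bin/sh
# Hypothetical placeholder: substitute your validator's identity public key.
VALIDATOR_IDENTITY="REPLACE_WITH_VALIDATOR_IDENTITY"

# Assemble the command line; --validator-identity is the only option used here.
WATCHTOWER_CMD="solana-watchtower --validator-identity $VALIDATOR_IDENTITY"

# Dry run: print the command instead of executing it.
echo "$WATCHTOWER_CMD"
```

Once the placeholder is replaced, running the printed command starts the monitor.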

### Metrics
#### `watchtower-sanity`
On every iteration, this data point is emitted with a boolean `ok` field that
indicates the overall result.

#### `watchtower-sanity-failure`
On failure, this data point contains details about the specific test that
failed, via the following fields:
* `test`: name of the sanity test that failed
* `err`: exact sanity failure message

### Sanity failure push notification
To receive a Slack, Discord, and/or Telegram notification on sanity failure,
define the corresponding environment variables before running `solana-watchtower`:
```
export SLACK_WEBHOOK=...
export DISCORD_WEBHOOK=...
```

Telegram requires the following two variables:
```
export TELEGRAM_BOT_TOKEN=...
export TELEGRAM_CHAT_ID=...
```
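
Before relying on these notifications, it can help to confirm that each
endpoint accepts a message. The sketch below is not part of
`solana-watchtower`. It is a hypothetical helper that assumes `curl` is
installed and uses the publicly documented request shapes: a JSON `text` field
for Slack incoming webhooks, a JSON `content` field for Discord webhooks, and
the Telegram Bot API `sendMessage` method. The function names are illustrative
only.

```shell
#!/bin/sh
# Hypothetical helpers for checking notification endpoints by hand.
# The message is embedded in JSON unescaped, so keep it free of double
# quotes and backslashes.

# JSON body for a Slack incoming webhook.
slack_payload() {
  printf '{"text":"%s"}' "$1"
}

# JSON body for a Discord webhook.
discord_payload() {
  printf '{"content":"%s"}' "$1"
}

# Telegram Bot API sendMessage URL for the given bot token.
telegram_url() {
  printf 'https://api.telegram.org/bot%s/sendMessage' "$1"
}

# Send one test message to every channel whose variables are set.
send_test_notifications() {
  msg="$1"
  if [ -n "${SLACK_WEBHOOK:-}" ]; then
    curl -fsS -X POST -H 'Content-Type: application/json' \
      -d "$(slack_payload "$msg")" "$SLACK_WEBHOOK"
  fi
  if [ -n "${DISCORD_WEBHOOK:-}" ]; then
    curl -fsS -X POST -H 'Content-Type: application/json' \
      -d "$(discord_payload "$msg")" "$DISCORD_WEBHOOK"
  fi
  # Telegram is only attempted when both variables are present.
  if [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
    curl -fsS "$(telegram_url "$TELEGRAM_BOT_TOKEN")" \
      --data-urlencode "chat_id=$TELEGRAM_CHAT_ID" \
      --data-urlencode "text=$msg"
  fi
}
```

With the variables above exported, calling
`send_test_notifications "watchtower test"` should post one message to each
configured channel. Channels whose variables are unset are skipped.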