The `solana-watchtower` program is used to monitor the health of a cluster. It
periodically polls the cluster over an RPC API to confirm that the transaction
count is advancing, new blockhashes are available, and no validators are
delinquent. Results are reported as InfluxDB metrics, with an optional push
notification on sanity failure.

If you only care about the health of one specific validator, the
`--validator-identity` command-line argument can be used to restrict failure
notifications to issues only affecting that validator.
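
The checks described above can be probed by hand with a few JSON-RPC queries. The following is a rough sketch of that idea, not a description of how `solana-watchtower` is implemented. `RPC_URL`, its default value, and the function names are assumptions made for illustration; `getTransactionCount` and `getVoteAccounts` are methods of the Solana JSON RPC API.

```shell
# Sketch only: RPC_URL and these function names are illustrative assumptions.
RPC_URL="${RPC_URL:-http://localhost:8899}"

# Build a parameterless JSON-RPC 2.0 request body for the given method.
rpc_payload() {
  printf '{"jsonrpc":"2.0","id":1,"method":"%s"}' "$1"
}

# POST the request to the cluster and print the raw JSON response.
rpc_call() {
  curl -sS -X POST -H 'Content-Type: application/json' \
    -d "$(rpc_payload "$1")" "$RPC_URL"
}

# Succeed only if the transaction count grew between two polls.
# $1: previous count, $2: current count
is_advancing() {
  [ "$2" -gt "$1" ]
}

# Manual usage against a reachable cluster:
#   rpc_call getTransactionCount   # the numeric "result" should grow between polls
#   rpc_call getVoteAccounts       # the "delinquent" array should be empty
```

Nothing here contacts the network until `rpc_call` is invoked, so the helpers can be sourced safely before pointing `RPC_URL` at a real cluster.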
### Metrics
#### `watchtower-sanity`
This data point is emitted on every iteration. Its boolean `ok` field reports the
overall result.
#### `watchtower-sanity-failure`
On failure, this data point reports which test failed via the following fields:
* `test`: name of the sanity test that failed
* `err`: exact sanity failure message
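
To make the shape of these two data points concrete, here is a minimal sketch that prints them in InfluxDB line protocol. It assumes the points carry only the fields listed above, with no tags or timestamps; the real points may include more. The helper names and the sample test name and message are hypothetical.

```shell
# Sketch only: these helpers are hypothetical and not part of solana-watchtower.

# Escape backslashes and double quotes, as line protocol requires
# inside string field values.
escape_field() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# $1: true or false
sanity_point() {
  printf 'watchtower-sanity ok=%s\n' "$1"
}

# $1: name of the failed test, $2: failure message
sanity_failure_point() {
  printf 'watchtower-sanity-failure test="%s",err="%s"\n' \
    "$(escape_field "$1")" "$(escape_field "$2")"
}

sanity_point true
# prints: watchtower-sanity ok=true
sanity_failure_point "transaction-count" "Transaction count is not advancing"
# prints: watchtower-sanity-failure test="transaction-count",err="Transaction count is not advancing"
```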
### Sanity failure push notification
To receive a Slack, Discord and/or Telegram notification on sanity failure,
define environment variables before running `solana-watchtower`:
```
export SLACK_WEBHOOK=...
export DISCORD_WEBHOOK=...
```
Telegram requires the following two variables:
```
export TELEGRAM_BOT_TOKEN=...
export TELEGRAM_CHAT_ID=...
```
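
Before starting `solana-watchtower`, it can help to confirm which notification channels the current environment would enable. The following is a small preflight sketch based only on the variables named above. `configured_channels` is a hypothetical helper, and it assumes a channel counts as configured when its variables are non-empty.

```shell
# Sketch only: configured_channels is a hypothetical helper,
# not part of solana-watchtower.
configured_channels() {
  channels=""
  if [ -n "${SLACK_WEBHOOK:-}" ]; then channels="$channels slack"; fi
  if [ -n "${DISCORD_WEBHOOK:-}" ]; then channels="$channels discord"; fi
  # Telegram needs both the bot token and the chat ID.
  if [ -n "${TELEGRAM_BOT_TOKEN:-}" ] && [ -n "${TELEGRAM_CHAT_ID:-}" ]; then
    channels="$channels telegram"
  fi
  # Drop the leading space and print the space-separated channel names.
  printf '%s\n' "${channels# }"
}

configured_channels
```

An empty line of output means no channel is configured, so sanity failures would be reported only through metrics.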