---
sidebar_position: 70
---
# Monitoring
## Metrics
The oracle uses OpenTelemetry to emit metrics.
At the time of writing, it supports Prometheus (along with any wire-compatible solution such as Cortex, Thanos, or VictoriaMetrics) and GCP Monitoring.
Choose the exporter with the `$METRICS_EXPORTER` environment variable, set to either `prometheus` or `gcp`.
If you use GCP Monitoring, the platform handles all data collection and visualization, but you should keep a close eye on billing.
If you are not running on GCP or want a more cost-effective option, manifests for Grafana and VictoriaMetrics (Prometheus-compatible scraping and storage) are provided in the kubernetes-manifests repo.
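
With the Prometheus exporter, a quick way to confirm the oracle is emitting data is to fetch its metrics endpoint and look for the series listed in the table on this page. The following is a minimal sketch, not official tooling: the oracle's listen address and path are not documented here, so the endpoint is an assumption you must fill in, and the parser only handles simple lines of the Prometheus text exposition format (`name{labels} value`), skipping `# HELP` and `# TYPE` comments. The function names `parse_metrics` and `fetch_metrics` are hypothetical.

```python
import re
import urllib.request

# Matches simple exposition lines such as:
#   switchboard_job_success_total 42
#   some_metric{label="value"} 1.5
# Optional trailing timestamps are ignored; escaped quotes or "}" inside
# label values are not handled by this simplified pattern.
_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)"
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text, prefix="switchboard_"):
    """Return (name, labels, value) tuples for metrics whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _LINE.match(line)
        if match is None or not match.group("name").startswith(prefix):
            continue
        labels = dict(_LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def fetch_metrics(url):
    """Fetch the raw exposition text from a metrics endpoint."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics("http://ORACLE_HOST:PORT/metrics"))` (a placeholder address) returns the `switchboard_` samples. `scheduler_node_balance` uses a different prefix, so pass `prefix="scheduler_"` (or `prefix=""` for everything) to include it.
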
<details open>
<summary>List of metrics</summary>
<table>
<tr>
<th>Metric Name</th>
<th>Description</th>
</tr>
<tr>
<td>switchboard_aggregator_variance</td>
<td>
Ratio of the largest to the smallest value among all aggregated responses
for a given job
</td>
</tr>
<tr>
<td>switchboard_heartbeat_failure_total</td>
<td>Counter incremented on every heartbeat failure</td>
</tr>
<tr>
<td>switchboard_job_failure_total</td>
<td>Counter incremented on every job failure</td>
</tr>
<tr>
<td>switchboard_job_success_total</td>
<td>
Counter incremented on every successfully completed job
</td>
</tr>
<tr>
<td>switchboard_job_posting_total</td>
<td>Counter incremented on every job posting</td>
</tr>
<tr>
<td>switchboard_log_age</td>
<td>Value recorder for the age of job logs at the time they are processed</td>
</tr>
<tr>
<td>switchboard_node_aggregation_insufficient_responses_total</td>
<td>
Counter incremented each time a job has too few responses to
aggregate
</td>
</tr>
<tr>
<td>switchboard_node_balance</td>
<td>
Balance reported by each oracle, labeled with the oracle's public
key
</td>
</tr>
<tr>
<td>switchboard_save_result_failure_total</td>
<td>
Counter of the number of times an oracle fails to commit a
transaction to the blockchain
</td>
</tr>
<tr>
<td>scheduler_node_balance</td>
<td>Most recent balance of the scheduler wallet</td>
</tr>
</table>
</details>
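
Several of these metrics are easiest to interpret with a little arithmetic. The sketch below reproduces the documented definition of `switchboard_aggregator_variance` (largest aggregated response divided by the smallest) and derives a job failure ratio from increases in `switchboard_job_failure_total` and `switchboard_job_success_total` over a time window. The function names are hypothetical, and the requirement that responses be positive is an assumption made here because the ratio is undefined otherwise; this page does not say how the oracle handles that case.

```python
def aggregator_variance(responses):
    """Ratio of the largest to the smallest aggregated response.

    Mirrors the description of switchboard_aggregator_variance.
    Assumes all responses are positive (an assumption of this sketch).
    """
    if not responses:
        raise ValueError("no responses to aggregate")
    smallest, largest = min(responses), max(responses)
    if smallest <= 0:
        raise ValueError("ratio is only meaningful for positive responses")
    return largest / smallest


def counter_increase(start, end):
    """Increase of a monotonic counter between two samples.

    If the value dropped, the counter was reset (for example, the oracle
    restarted), so the end value is the best available estimate.
    """
    return end - start if end >= start else end


def job_failure_ratio(failures, successes):
    """Fraction of finished jobs that failed, given counter increases."""
    total = failures + successes
    return failures / total if total else 0.0
```

A variance of 1.0 means every response agreed exactly; larger values indicate a wider spread between data sources. In Prometheus itself, the failure ratio can be computed directly with `increase()`, for example `sum(increase(switchboard_job_failure_total[1h])) / (sum(increase(switchboard_job_failure_total[1h])) + sum(increase(switchboard_job_success_total[1h])))`.
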
## Alerts
PagerDuty lets you receive real-time alerts about your oracle. To use it, sign up for a PagerDuty account and generate an Events API key.
`$PAGERDUTY_EVENT_KEY` is an optional environment variable that holds this key. If you are comfortable with your own monitoring solution, you can leave it unset.
- [PagerDuty - Generating API Keys](https://support.pagerduty.com/docs/generating-api-keys#section-events-api-keys)
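
To check that your key works independently of the oracle, you can send a test event yourself. This is a minimal sketch that targets the PagerDuty Events API v2 endpoint (`https://events.pagerduty.com/v2/enqueue`) and assumes `$PAGERDUTY_EVENT_KEY` holds an Events API v2 integration (routing) key; this page does not state which Events API version the oracle uses. The helper names and the `summary` and `source` values are illustrative.

```python
import json
import os
import urllib.request

EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"
SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(routing_key, summary, source, severity="error"):
    """Build an Events API v2 'trigger' payload."""
    if severity not in SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "payload": {"summary": summary, "source": source, "severity": severity},
    }


def send_event(event):
    """POST the event to PagerDuty and return the decoded JSON response."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


def send_test_alert():
    """Hypothetical helper: fire a test alert using $PAGERDUTY_EVENT_KEY."""
    key = os.environ["PAGERDUTY_EVENT_KEY"]
    event = build_trigger_event(key, "Test alert from oracle monitoring", "my-oracle")
    return send_event(event)
```

Running `send_test_alert()` with the variable set should open an incident on the PagerDuty service tied to that key, so remember to resolve it afterward.
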