solana-with-rpc-optimizations/metrics
Illia Bobyr 4f7e45bb24
metrics: Submit metrics when exiting. Refactor `MetricsAgent::run()`. (#718)
There are a few minor issues this change addresses:

1. When we send points to the `MetricsWriter` we call `Instant::now()`
   twice, using the first result in the metrics stats and the second for
   `last_write_time`.  Yet, on the next upload, we use `last_write_time`
   as a reference point.

   We upload metrics using a network call, so it is far from
   instantaneous.  This creates a minor discrepancy in our time
   reporting.

   The good news is that we do not really need to call `Instant::now()`
   twice at all, as we can use the same value for both the stats and
   `last_write_time` (see the sketch after this list).

2. We did not report metrics stats if we did not have any points
   accumulated.  It seems better to always report metrics stats,
   including when no points have been accumulated.  In practice, an
   empty interval does not happen for a validator, as validators always
   report something during the 10-second accumulation interval.

3. We did not upload any points when the metrics thread was exiting.
   This may cause a small number of metrics not to be reported.

4. `collect_points()` was always converting both `points` and `counters`
   into a vector of `DataPoint`, even if the final length was over the
   specified `max_points`.  On `mainnet-beta` we see up to 5m points
   lost, so it is a small optimization to drop them sooner.
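
Taken together, the fix amounts to roughly the following shape.  This is a
minimal sketch of the idea, not the actual `MetricsAgent` code: the names
and signatures (`Point`, `DataPoint`, the `collect_points()` and `run()`
signatures, the closure-based writer) are illustrative assumptions.  A
single `Instant::now()` serves both the stats and `last_write_time`, excess
points are dropped before conversion, stats are reported even for an empty
interval, and buffered points are flushed when the agent thread exits.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

// Illustrative stand-ins; these are not the crate's real types.
struct Point;
struct DataPoint;

impl From<Point> for DataPoint {
    fn from(_p: Point) -> Self {
        DataPoint
    }
}

// Points 1 and 4: take `Instant::now()` once and reuse it for both the
// reported stats and the new `last_write_time`, and stop converting
// accumulated points once `max_points` is reached instead of converting
// everything and then discarding the excess.
fn collect_points(
    points: Vec<Point>,
    max_points: usize,
    last_write_time: &mut Instant,
) -> (Vec<DataPoint>, Duration, usize) {
    let now = Instant::now();
    let since_last_write = now.duration_since(*last_write_time);

    // Drop the excess before conversion rather than after.
    let num_dropped = points.len().saturating_sub(max_points);
    let data_points = points
        .into_iter()
        .take(max_points)
        .map(DataPoint::from)
        .collect();

    // Reuse the same instant instead of calling `Instant::now()` a second time.
    *last_write_time = now;

    (data_points, since_last_write, num_dropped)
}

// Points 2 and 3: report stats every interval even when nothing was
// accumulated, and flush whatever is buffered when the channel disconnects
// and the agent thread exits.
fn run(
    receiver: Receiver<Point>,
    write: impl Fn(Vec<DataPoint>, Duration, usize),
    write_interval: Duration,
    max_points: usize,
) {
    let mut points = Vec::new();
    let mut last_write_time = Instant::now();
    loop {
        let timeout = write_interval.saturating_sub(last_write_time.elapsed());
        match receiver.recv_timeout(timeout) {
            // (A real agent also needs to write once the interval elapses
            // even if points keep arriving without a pause.)
            Ok(point) => points.push(point),
            Err(RecvTimeoutError::Timeout) => {
                // Upload (and report stats) even if `points` is empty.
                let (data_points, elapsed, dropped) = collect_points(
                    std::mem::take(&mut points),
                    max_points,
                    &mut last_write_time,
                );
                write(data_points, elapsed, dropped);
            }
            Err(RecvTimeoutError::Disconnected) => {
                // Flush the remaining points before the thread exits.
                let (data_points, elapsed, dropped) =
                    collect_points(points, max_points, &mut last_write_time);
                write(data_points, elapsed, dropped);
                return;
            }
        }
    }
}
```

Using `recv_timeout()` with the time remaining until the next write keeps the
upload cadence tied to `last_write_time` rather than to message arrival.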
2024-04-11 22:02:44 -07:00
benches fixes errors from clippy::useless_conversion (#29534) 2023-01-05 18:05:32 +00:00
scripts fix node count query (#28259) 2022-10-06 11:39:39 -05:00
src metrics: Submit metrics when exiting. Refactor `MetricsAgent::run()`. (#718) 2024-04-11 22:02:44 -07:00
.gitignore
Cargo.toml sanity check metrics configuration (#32799) 2023-08-11 14:38:33 -07:00
README.md chore(docs): proofreading (#35172) 2024-02-10 17:46:07 -07:00

README.md


Metrics

InfluxDB

In order to explore validator-specific metrics from mainnet-beta, testnet, or devnet you can use Chronograf:

For local cluster deployments you should use:

Public Grafana Dashboards

There are three main public dashboards for cluster-related metrics:

For local cluster deployments you should use:

Cluster Telemetry

The cluster telemetry dashboard shows the current state of the cluster:

  1. Cluster Stability
  2. Validator Streamer
  3. Tower Consensus
  4. IP Network
  5. Snapshots
  6. RPC Send Transaction Service

Fee Market

The fee market dashboard shows:

  1. Total Prioritization Fees
  2. Block Min Prioritization Fees
  3. Cost Tracker Stats

Ping Results

The ping results dashboard displays relevant information about the Ping API.
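
For context, the data behind these dashboards is emitted from validator code
through this crate's reporting macros.  A minimal sketch, assuming the
`datapoint_info!` macro with its `("field", value, type)` tuple form; the
measurement and field names below are made up for illustration:

```rust
use solana_metrics::datapoint_info;

fn report_example_stats(slot: u64, peer_count: usize) {
    // Each call queues one data point; the metrics agent batches them and
    // uploads to InfluxDB in the background when metrics are configured
    // (e.g. via the SOLANA_METRICS_CONFIG environment variable).
    datapoint_info!(
        "example-stats",                    // measurement name (hypothetical)
        ("slot", slot as i64, i64),         // (field, value, type)
        ("peer_count", peer_count as i64, i64)
    );
}
```

Points queued this way end up in InfluxDB, which is where Chronograf and the
Grafana dashboards above read them from.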