solana/metrics
HaoranYi ba770832d0
Poh timing service (#23736)
* initial work for poh timing report service

* add poh_timing_report_service to validator

* fix comments

* clippy

* improve test coverage

* delete record when complete

* rename shred full to slot full.

* debug logging

* fix slot full

* remove debug comments

* adding fmt trait

* derive default

* default for poh timing reporter

* better comments

* remove commented code

* fix test

* more test fixes

* delete timestamps for slots that are older than root_slot

* debug log

* record poh start end in bank reset

* report full to start time instead

* fix poh slot offset

* report poh start for normal ticks

* fix typo

* refactor out poh point report fn

* rename

* optimize delete - delete only when last_root changed

* change log level to trace

* convert if to match

* remove redundant check

* fix SlotPohTiming comments

* review feedback on poh timing reporter

* review feedback on poh_recorder

* add test case for out-of-order arrival of timing points and incomplete timing points

* refactor poh_timing_points into its own mod

* remove option for poh_timing_report service

* move poh_timing_point_sender to constructor

* clippy

* better comments

* more clippy

* more clippy

* add slot poh timing point macro

* clippy

* assert in test

* comments and display fmt

* fix check

* assert format

* revise comments

* refactor

* extract send fn

* revert reporting_poh_timing_point

* align logging

* small refactor

* move type declaration to the top of the module

* replace macro with constructor

* clippy: remove redundant closure

* review comments

* simplify poh timing point creation

Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
2022-03-30 09:04:49 -05:00
scripts change metrics to internal-metrics 2022-01-15 00:08:45 +05:30
src Poh timing service (#23736) 2022-03-30 09:04:49 -05:00
.gitignore
Cargo.toml Bump version to v1.11 (#23807) 2022-03-21 17:40:50 -05:00
README.md Rework cluster metrics dashboard to support the modern clusters 2020-03-11 14:14:56 -07:00
grafcli.conf
publish-metrics-dashboard.sh Rework cluster metrics dashboard to support the modern clusters 2020-03-11 14:14:56 -07:00

README.md

Metrics

Testnet Grafana Dashboard

There are three versions of the testnet dashboard, corresponding to the three release channels: edge, beta, and stable.

The dashboard for each channel is defined by the metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json source file in the git branch associated with that channel, and is deployed by automation running ci/publish-metrics-dashboard.sh.

A deploy can be triggered at any time via the New Build button of https://buildkite.com/solana-labs/publish-metrics-dashboard.

Modifying a Dashboard

Dashboard updates are made by modifying metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json. Manual edits made directly in Grafana will be overwritten.

  • Browse the metrics available to add in the data explorer at https://metrics.solana.com:8888/.
  • When editing a query for a dashboard graph, use the "Toggle Edit Mode" selection behind the hamburger button to switch to raw SQL, then copy the query into the text field. You may have to fix up the query with dashboard variables such as $testnet or $timeFilter; check other working fields in the dashboard for examples.
  1. Open the desired dashboard in Grafana
  2. Create a development copy of the dashboard by selecting Save As... in the Settings menu for the dashboard
  3. Edit the dashboard as desired
  4. Extract the JSON Model by selecting JSON Model in the Settings menu. Copy the JSON to the clipboard and paste it into metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json
  5. Delete your development dashboard: Settings => Delete
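
Before committing the pasted JSON Model, it can help to confirm that the file still parses, since a partial copy from the clipboard is easy to miss. The following is a minimal sketch, assuming python3 is available on the PATH; validate_dashboard_json is a hypothetical helper and is not part of this repository.

```shell
# Hypothetical helper (not part of this repository): checks that a dashboard
# file parses as JSON. Assumes python3 is installed.
validate_dashboard_json() {
  file="$1"
  if [ ! -f "$file" ]; then
    echo "error: $file not found" >&2
    return 2
  fi
  # python3 -m json.tool exits non-zero when the input is not valid JSON.
  if python3 -m json.tool "$file" > /dev/null 2>&1; then
    echo "ok: $file is valid JSON"
  else
    echo "error: $file is not valid JSON" >&2
    return 1
  fi
}
```

For example: validate_dashboard_json metrics/scripts/grafana-provisioning/dashboards/cluster-monitor.json. This only checks JSON syntax; it does not check that Grafana will accept the dashboard.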

Deploying a Dashboard Manually

If you need to immediately deploy a dashboard using the contents of cluster-monitor.json in your local workspace, run:

$ export GRAFANA_API_TOKEN="an API key from https://metrics.solana.com:3000/org/apikeys"
$ metrics/publish-metrics-dashboard.sh (edge|beta|stable)

Note that automation will eventually overwrite your manual deploy.
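
A manual deploy needs a valid channel argument and GRAFANA_API_TOKEN in the environment. The following is a minimal sketch of a pre-flight check for both; check_deploy_args is a hypothetical function, not part of this repository, and it assumes nothing about how publish-metrics-dashboard.sh behaves beyond the usage shown above.

```shell
# Hypothetical pre-flight check (not part of this repository).
# Verifies the channel name and that GRAFANA_API_TOKEN is set before
# publish-metrics-dashboard.sh is invoked.
check_deploy_args() {
  channel="$1"
  case "$channel" in
    edge|beta|stable) ;;
    *)
      echo "error: channel must be edge, beta, or stable (got '$channel')" >&2
      return 1
      ;;
  esac
  if [ -z "${GRAFANA_API_TOKEN:-}" ]; then
    echo "error: GRAFANA_API_TOKEN is not set" >&2
    return 1
  fi
  return 0
}
```

It could be chained in front of the deploy, for example: check_deploy_args beta && metrics/publish-metrics-dashboard.sh beta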