solana

Yihau Chen 021d2cdb71 fix: metrics deploy script (#32074) fix: cert path		2023-06-12 22:15:09 +08:00
README.md	update influx enterprise scripts (#31117)	2023-04-10 09:10:54 -05:00
alertmanager-discord.sh	increase docker mem allocation (#31197)	2023-04-14 03:06:23 -05:00
alertmanager.sh	increase docker mem allocation (#31197)	2023-04-14 03:06:23 -05:00
alertmanager.yml	…
chronograf.sh	increase docker mem allocation (#31197)	2023-04-14 03:06:23 -05:00
chronograf_8889.sh	increase docker mem allocation (#31197)	2023-04-14 03:06:23 -05:00
first_rules.yml	…
grafana-metrics.solana.com.ini	…
grafana.sh	increase docker mem allocation (#31197)	2023-04-14 03:06:23 -05:00
host.sh	…
kapacitor.conf	ci: update kapacitor config (#32069)	2023-06-12 20:23:44 +08:00
kapacitor.sh	ci: update metrics related deploying code (#32072)	2023-06-12 21:44:30 +08:00
prometheus.sh	increase docker mem allocation (#31197)	2023-04-14 03:06:23 -05:00
prometheus.yml	fix prometheus path reference (#32003)	2023-06-07 02:56:55 +00:00
start.sh	fix: metrics deploy script (#32074)	2023-06-12 22:15:09 +08:00
status.sh	ci: update metrics related deploying code (#32072)	2023-06-12 21:44:30 +08:00

README.md

Services:

Prometheus
AlertManager
Chronograf (on port 8888)
Chronograf_8889 (on port 8889)
Grafana (on port 3000)
AlertManager_Discord
Kapacitor
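
As a rough illustration of the ports listed above, the following sketch probes the three services that have a documented port. It is not part of the repository: the function names `port_for` and `check_ports`, the service keys, and the host `localhost` are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the repository.

# Map a service key to the port given in the service list.
# The keys themselves are illustrative names.
port_for() {
  case "$1" in
    chronograf)      echo 8888 ;;
    chronograf_8889) echo 8889 ;;
    grafana)         echo 3000 ;;
    *)               return 1 ;;  # no port documented for this service
  esac
}

# Probe each documented port over HTTP on the given host.
# The default host "localhost" is an assumption.
check_ports() {
  local host="${1:-localhost}" svc port
  for svc in chronograf chronograf_8889 grafana; do
    port="$(port_for "$svc")"
    if curl --silent --fail --max-time 5 --output /dev/null "http://$host:$port/"; then
      echo "$svc: reachable on port $port"
    else
      echo "$svc: NOT reachable on port $port"
    fi
  done
}
```

Running `check_ports` on the metrics-main server (or `check_ports <hostname>` from elsewhere) would print one line per service. Whether the services answer plain HTTP on these ports depends on how the containers are configured.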

To install all the services on the metrics-main server, run the start.sh script.

Install the Buildkite agent so that it can run the status.sh script periodically to check the status of the containers.

If any container is not in the running state, or is in the exited state, status.sh tries to redeploy it. If the redeployment fails, an alert is sent to Discord and PagerDuty.
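
The check-and-redeploy behaviour described above could look roughly like the sketch below. This is not the actual status.sh, whose contents are not shown here. The container names, the idea that each container is redeployed by a script named after it, and the `send_alert` placeholder are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a container status check; the real status.sh may differ.

# Container names are assumed to match the per-service scripts in this directory.
CONTAINERS="prometheus alertmanager chronograf chronograf_8889 grafana alertmanager-discord kapacitor"

# Succeed (return 0) when the given state means the container needs a redeploy,
# i.e. for any state other than "running" (exited, missing, and so on).
needs_redeploy() {
  case "$1" in
    running) return 1 ;;
    *)       return 0 ;;
  esac
}

# Print the Docker state of a container, or "missing" if inspect fails.
container_state() {
  local state
  if state="$(docker inspect --format '{{.State.Status}}' "$1" 2>/dev/null)"; then
    echo "$state"
  else
    echo "missing"
  fi
}

# Placeholder: a real deployment would notify Discord and PagerDuty here.
send_alert() {
  echo "ALERT: $1" >&2
}

# Check every container and redeploy any that are not running.
check_all() {
  local name state
  for name in $CONTAINERS; do
    state="$(container_state "$name")"
    if needs_redeploy "$state"; then
      echo "$name is '$state', redeploying"
      # Assumption: ./<name>.sh redeploys the container of the same name.
      "./$name.sh" || send_alert "failed to redeploy $name"
    fi
  done
}
```

Calling `check_all` from a scheduled Buildkite job would give the periodic check. Treating every state other than `running` as a redeploy trigger is one reading of the rule above, and it also covers containers that were removed manually.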

Note: If you manually delete or remove any of the containers, you need to run the start.sh script.