Initial commit

Co-authored-by: Leopold Schabel <leo@certus.one>
Hendrik Hofstadt 2018-09-25 22:12:17 +02:00 committed by Leo
commit c6e8970efe
37 changed files with 26500 additions and 0 deletions

2
.gitignore vendored Normal file
@@ -0,0 +1,2 @@
*.iml
TODO

102
README.md Normal file
@@ -0,0 +1,102 @@
# testnet_deploy
This repo deploys a full Cosmos SDK testnet plus monitoring on an
OpenShift Origin/okd.io Kubernetes cluster.
Requirements:
- CentOS >= 7.5
- OpenShift Origin == 3.9
## Introduction
We recorded this video to guide you through the (one-click) setup of your own fully monitored Cosmos network and explain how the snippets and monitoring systems can be used.
[Watch the video here](https://www.useloom.com/share/c281221bcfb04e4798659618eb15ac88)
Also don't forget our validator knowledge base, which contains important information about operations and monitoring.
[Knowledgebase](https://kb.certus.one/)
The `gaia_exporter`, `net_exporter` and alerting tools are built from the [chain_exporter](https://github.com/certusone/chain_exporter) repo.
Usage instructions can be found in the deployment scripts and their command-line output.
## Deploying an OpenShift Origin Cluster
Deploy an OpenShift Origin 3.9 cluster on CentOS 7:
yum -y install git docker tcpdump bridge-utils vim centos-release-openshift-origin39 epel-release
yum -y install origin origin-clients htop
cat <<EOF > /etc/sysconfig/docker
OPTIONS="--log-driver=journald --insecure-registry 172.30.0.0/16 --signature-verification=false"
EOF
systemctl enable docker
systemctl start docker
git clone https://github.com/openshift-evangelists/oc-cluster-wrapper
cat <<EOF >> ~/.bash_profile
export PATH=~/oc-cluster-wrapper:\$PATH
export OC_CLUSTER_PUBLIC_HOSTNAME=$(hostname -f)
export OC_CLUSTER_ROUTING_SUFFIX=apps.$(hostname --ip-address).nip.io
EOF
~/oc-cluster-wrapper/oc-cluster completion bash > /etc/bash_completion.d/oc-cluster.bash
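Note the escaping in the `.bash_profile` heredoc above: `$(hostname -f)` is expanded once, when the profile is written, while `\$PATH` is kept literal so it expands at every login. A minimal, standalone sketch of that difference (the sample values are made up):

```shell
# Unquoted heredoc: $(...) expands immediately, \$ defers expansion to later.
profile=$(cat <<EOF
expanded_now=$(echo myhost)
expanded_later=\$PATH
EOF
)
echo "$profile"
```

The first line of the output contains the already-substituted value, the second the literal `$PATH` string.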
Re-login once you're done to make the auto-completion work. This is a non-production deployment
of OpenShift and you can log in via admin/admin. If you're running this on
a publicly reachable host, make sure to properly configure your firewall to prevent
the infamous Kubernetes Bitcoin-mining botnets from assimilating your cluster:
Configure firewalld:
yum -y install firewalld
systemctl start firewalld
systemctl enable firewalld
firewall-cmd --permanent --new-zone admin
firewall-cmd --permanent --add-source=your_public_ip_to_whitelist/32 --zone=admin
firewall-cmd --permanent --add-port=8443/tcp --zone=admin
firewall-cmd --permanent --add-port=443/tcp --zone=admin
firewall-cmd --permanent --new-zone dockerc
firewall-cmd --permanent --zone dockerc --add-source 172.17.0.0/16
firewall-cmd --permanent --zone dockerc --add-port 8443/tcp
firewall-cmd --permanent --zone dockerc --add-port 53/udp
firewall-cmd --permanent --zone dockerc --add-port 8053/udp
firewall-cmd --permanent --add-masquerade --zone=public
firewall-cmd --reload
Finally, boot up your cluster:
oc-cluster up
You can now log into the web application using developer or admin/admin
(`https://<hostname>:8443`), or log in using the CLI:
oc login https://<hostname>:8443
(the admin user is a cluster administrator, whereas the developer user isn't)
## Deploy our testnet
For Sentry alerts to work, set the following variables:
`monitoring/exporter/alerter.yml`: Replace `<INSERT_RAVEN_DSN>` with the RAVEN_DSN URL of your (self-)hosted Sentry instance.
If you want alerts from your alertmanager:
`monitoring/prometheus/prometheus.yml`: Modify the alertmanager config according to [the Prometheus docs](https://prometheus.io/docs/alerting/configuration/)
This deploys our testnet:
./deploy_testnet.sh
This deploys everything, including our monitoring stack:
./deploy_all.sh

13
ansible/.s2i/bin/assemble Normal file
@@ -0,0 +1,13 @@
#!/bin/bash
set -e
shopt -s dotglob
# Copy source code to /opt/app-root/src
echo "---> Copy application source ..."
# Remove source code from previous s2i stage
rm -rf ./*
mv /tmp/src/* ./
#echo "---> Installing OpenShift Python client for Ansible"
# TODO: we're currently doing this in the Dockerfile to speed up repeated builds
#pip install --user openshift

4
ansible/.s2i/bin/run Normal file
@@ -0,0 +1,4 @@
#!/bin/bash
set -euo pipefail
# "-c local -i localhost," runs the playbook against localhost without an
# inventory file (the trailing comma marks an inline host list)
ansible-playbook -c local -i localhost, playbook.yaml

131
ansible/playbook.yaml Normal file
@@ -0,0 +1,131 @@
---
- hosts: localhost
vars:
gaia_namespace: "{{ ansible_env.OPENSHIFT_BUILD_NAMESPACE }}"
gaia_num_nodes: 4
state_dir: /opt/app-root/state
tasks:
- name: Delete existing testnet folder
file:
path: "{{ state_dir }}/testnet"
state: absent
- name: Create testnet files
command: >
/opt/app-root/go/bin/gaiad testnet
--v {{ gaia_num_nodes }}
--starting-ip-address 10.33.33.0
-o {{ state_dir }}/testnet
- name: Replace persistent peers
shell: >
sed -i 's/10\.33\.33\./gaia-node/g'
{{ state_dir }}/testnet/node*/gaiad/config/config.toml
args:
warn: no
- name: config.toml overrides
shell: >
sed -i 's/^{{ item.key }} = .*/{{ item.key }} = {{ item.value }}/'
{{ state_dir }}/testnet/node*/gaiad/config/config.toml
args:
warn: no
with_dict:
# produce blocks as fast as we can
skip_timeout_commit: 'true'
# accept private IP ranges
addr_book_strict: 'false'
# calculated max peers
max_num_peers: '{{ 50 + gaia_num_nodes }}'
# we can later auto-configure a Prometheus server via annotations
prometheus: 'true'
# Headless service - we need forward DNS!
- name: Create node services
k8s:
state: present
definition:
apiVersion: v1
kind: Service
metadata:
labels:
app: gaia
template: gaia-nodes
namespace: "{{ gaia_namespace }}"
name: gaia-node{{ item }}
spec:
clusterIP: None
ports:
- name: gaiad-peer
port: 26656
protocol: TCP
targetPort: 26656
- name: gaiad-rpc
port: 26657
protocol: TCP
targetPort: 26657
- name: gaiad-metrics
port: 26660
protocol: TCP
targetPort: 26660
selector:
name: gaia-node{{ item }}
sessionAffinity: None
type: ClusterIP
with_sequence: start=0 count={{ gaia_num_nodes }}
- name: Create RPC route for node1
k8s:
state: present
definition:
apiVersion: v1
kind: Route
metadata:
name: gaia-rpc
namespace: "{{ gaia_namespace }}"
spec:
port:
targetPort: gaiad-rpc
tls:
termination: edge
to:
kind: Service
name: gaia-node1
weight: 100
wildcardPolicy: None
- name: Start gaiad pods
k8s:
state: present
definition:
apiVersion: v1
kind: Pod
metadata:
labels:
app: gaia
template: gaia-nodes
name: gaia-node{{ item }}
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '26660'
namespace: "{{ gaia_namespace }}"
name: gaia-node{{ item }}
spec:
volumes:
- name: state
persistentVolumeClaim:
claimName: gaia-ansible-state
containers:
- name: gaia-node{{ item }}
image: gaiad:latest
volumeMounts:
- mountPath: /opt/app-root/state
name: state
command:
- /opt/app-root/go/bin/gaiad
- start
- --home
- /opt/app-root/state/testnet/node{{ item }}/gaiad
with_sequence: start=0 count={{ gaia_num_nodes }}
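The two `sed` tasks above can be illustrated standalone. The sample config lines below are made up; in the playbook they come from the `config.toml` files that `gaiad testnet` generates:

```shell
# 1) Peer rewrite: gaiad testnet emits peers as 10.33.33.N, which the
#    playbook maps onto the headless-service DNS names gaia-nodeN.
peers='persistent_peers = "abc@10.33.33.0:26656,def@10.33.33.1:26656"'
peers=$(echo "$peers" | sed 's/10\.33\.33\./gaia-node/g')
echo "$peers"

# 2) Key/value override, as applied for skip_timeout_commit and friends.
line='skip_timeout_commit = false'
line=$(echo "$line" | sed 's/^skip_timeout_commit = .*/skip_timeout_commit = true/')
echo "$line"
```

This works because the node index survives the substitution: `10.33.33.0` becomes `gaia-node0`, which is exactly the Service name created for pod 0.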

7
deploy_all.sh Executable file
@@ -0,0 +1,7 @@
#! /bin/sh
./deploy_testnet.sh
./monitoring/exporter/deploy.sh
./monitoring/prometheus/deploy.sh
./monitoring/grafana/deploy.sh

20
deploy_testnet.sh Executable file
@@ -0,0 +1,20 @@
#! /bin/sh
# Necessary so we can run as uid 1001 (would have to inject nss_wrapper
# everywhere and don't really feel like doing so)
oc adm policy add-scc-to-user anyuid -z gaia-ansible
# Build golang-s2i base image
oc process -f openshift/golang-s2i.yaml | oc apply -f -
# Build gaiad image
oc process -f openshift/gaiad.yaml | oc apply -f -
# Deploy
oc process -f openshift/gaia-ansible.yaml | oc apply -f -
oc start-build gaia-ansible -w
oc delete pod,service,route -l template=gaia-nodes; \
oc delete job gaia-deploy; \
oc apply -f openshift/deploy.yaml && \
while ! oc logs -f jobs/gaia-deploy; do sleep 1; done
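The final line of `deploy_testnet.sh` polls `oc logs` until the deploy job's pod exists. The same poll-until-success pattern in isolation, with a `probe` function standing in for `oc logs -f jobs/gaia-deploy` (it fails twice, then succeeds):

```shell
# Keep retrying a command until it succeeds.
tries=0
probe() { tries=$((tries + 1)); [ "$tries" -ge 3 ]; }
until probe; do
  : # the real script sleeps 1s between attempts
done
echo "took $tries attempts"
```

The loop exits on the first successful probe, so transient "pod not yet scheduled" errors are simply retried.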

@@ -0,0 +1,62 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia-alerter
template: gaia-alerter
metadata:
name: gaia-alerter
template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaia-alerter
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/chain_exporter
objects:
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- name: gaia-alerter
image: gaia-exporter:latest
command:
- /opt/app-root/go/bin/alerter
env:
- name: "DB_HOST"
value: "postgres-chain:5432"
- name: "DB_USER"
value: "postgres"
- name: "DB_PW"
value: "mypwd"
- name: "RAVEN_DSN"
value: "<INSERT_RAVEN_DSN>"
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: gaia-exporter:latest
type: ImageChange
- type: ConfigChange

8
monitoring/exporter/deploy.sh Executable file
@@ -0,0 +1,8 @@
#! /bin/sh
cd "$(dirname "$0")"
oc process -f postgres.yml | oc apply -f -
oc process -f lcd.yml | oc apply -f -
oc process -f exporter.yml | oc apply -f -
oc process -f alerter.yml | oc apply -f -
oc process -f net_exporter.yml | oc apply -f -

@@ -0,0 +1,102 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia-exporter
template: gaia-exporter
metadata:
name: gaia-exporter
template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaia-exporter
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/chain_exporter
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: ${NAME}
spec:
# this allows k8s objects to directly reference this image stream
lookupPolicy:
local: true
- apiVersion: v1
kind: BuildConfig
metadata:
name: ${NAME}
spec:
output:
to:
kind: ImageStreamTag
name: ${NAME}:latest
postCommit: {}
runPolicy: Serial
source:
git:
uri: ${GIT_REPO}
type: Git
strategy:
sourceStrategy:
from:
kind: ImageStreamTag
name: golang-s2i:1.10
env:
- name: S2I_GOPKG
value: github.com/certusone/chain_exporter
type: Source
triggers:
- imageChange: {}
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- name: gaia-exporter
image: gaia-exporter:latest
command:
- /opt/app-root/go/bin/chain_exporter
env:
- name: "GAIA_URL"
value: "http://gaia-node1:26657"
- name: "DB_HOST"
value: "postgres-chain:5432"
- name: "DB_USER"
value: "postgres"
- name: "DB_PW"
value: "mypwd"
- name: "LCD_URL"
value: "http://gaia-lcd:1317"
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: gaia-exporter:latest
type: ImageChange
- type: ConfigChange

@@ -0,0 +1,79 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia-lcd
template: gaia-lcd
metadata:
name: gaia-lcd
template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaia-lcd
objects:
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- name: gaia-lcd
image: gaiad:latest
command:
- /opt/app-root/go/bin/gaiacli
- rest-server
- "--node=tcp://gaia-node1:26657"
- "--trust-node"
- "--laddr=tcp://:1317"
ports:
- containerPort: 1317
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: gaiad:latest
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: Service
metadata:
name: ${NAME}
spec:
ports:
- name: lcd
port: 1317
selector:
name: ${NAME}
- apiVersion: v1
kind: Route
metadata:
name: gaia-lcd
spec:
port:
targetPort: lcd
tls:
termination: edge
to:
kind: Service
name: gaia-lcd
weight: 100
wildcardPolicy: None

@@ -0,0 +1,64 @@
---
apiVersion: v1
kind: Template
labels:
app: net-exporter
template: net-exporter
metadata:
name: net-exporter
template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: net-exporter
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/chain_exporter
objects:
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- name: net-exporter
image: gaia-exporter:latest
command:
- /opt/app-root/go/bin/net_exporter
env:
- name: "GAIA_URLS"
value: "http://gaia-node0:26657,http://gaia-node1:26657,http://gaia-node2:26657,http://gaia-node3:26657"
- name: "DB_HOST"
value: "postgres-chain:5432"
- name: "DB_USER"
value: "postgres"
- name: "DB_PW"
value: "mypwd"
- name: "PERIOD"
value: "10"
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: gaia-exporter:latest
type: ImageChange
- type: ConfigChange

@@ -0,0 +1,82 @@
---
apiVersion: v1
kind: Template
labels:
app: postgres-chain
template: postgres-chain
parameters:
- displayName: Name
name: NAME
required: true
value: postgres-chain
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: postgres
spec:
lookupPolicy:
local: true
tags:
- from:
kind: DockerImage
name: centos/postgresql-96-centos7:latest
name: latest
referencePolicy:
type: Source
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- image: ' '
imagePullPolicy: IfNotPresent
name: ${NAME}
ports:
- containerPort: 5432
volumeMounts:
- mountPath: /var/lib/pgsql/data
name: ${NAME}-data
env:
- name: POSTGRESQL_ADMIN_PASSWORD
value: "mypwd"
volumes:
- name: ${NAME}-data
emptyDir: {}
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: postgres:latest
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: Service
metadata:
name: ${NAME}
spec:
ports:
- name: postgres
port: 5432
selector:
name: ${NAME}

@@ -0,0 +1,11 @@
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 3 #how often Grafana will scan for changed dashboards
options:
path: /var/lib/grafana/dashboards

@@ -0,0 +1,652 @@
{
"__inputs": [
{
"name": "DS_MAIN",
"label": "main",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "5.3.0-pre1"
},
{
"type": "panel",
"id": "graph",
"name": "Graph",
"version": "5.0.0"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "5.0.0"
},
{
"type": "panel",
"id": "singlestat",
"name": "Singlestat",
"version": "5.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": 5345,
"graphTooltip": 0,
"id": null,
"iteration": 1537853985914,
"links": [],
"panels": [
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 15,
"panels": [],
"repeat": "targets",
"title": "$targets UP/DOWN Status",
"type": "row"
},
{
"cacheTimeout": null,
"colorBackground": true,
"colorValue": false,
"colors": [
"#d44a3a",
"rgba(237, 129, 40, 0.89)",
"#299c46"
],
"datasource": "main",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 2,
"w": 24,
"x": 0,
"y": 1
},
"id": 2,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"minSpan": 3,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"repeat": null,
"repeatDirection": "h",
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "probe_success{instance=~\"$targets\"}",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": "1,1",
"title": "$targets",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
},
{
"op": "=",
"text": "UP",
"value": "1"
},
{
"op": "=",
"text": "DOWN",
"value": "0"
}
],
"valueName": "current"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "main",
"fill": 1,
"gridPos": {
"h": 6,
"w": 12,
"x": 0,
"y": 3
},
"id": 17,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "probe_duration_seconds{instance=~\"$targets\"}",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"legendFormat": "seconds",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Probe Duration",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "main",
"fill": 1,
"gridPos": {
"h": 6,
"w": 12,
"x": 12,
"y": 3
},
"id": 21,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "probe_dns_lookup_time_seconds{instance=~\"$targets\"}",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"legendFormat": "seconds",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "DNS Lookup",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "main",
"format": "s",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 2,
"w": 12,
"x": 0,
"y": 9
},
"id": 23,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "avg(probe_duration_seconds{instance=~\"$targets\"})",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": "",
"title": "Average Probe Duration",
"type": "singlestat",
"valueFontSize": "50%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "main",
"format": "s",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 2,
"w": 12,
"x": 12,
"y": 9
},
"id": 24,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "avg(probe_dns_lookup_time_seconds{instance=~\"$targets\"})",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": "",
"title": "Average DNS Lookup",
"type": "singlestat",
"valueFontSize": "50%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"refresh": "1m",
"schemaVersion": 16,
"style": "dark",
"tags": [
"blackbox",
"prometheus"
],
"templating": {
"list": [
{
"auto": true,
"auto_count": 10,
"auto_min": "10s",
"current": {
"text": "auto",
"value": "$__auto_interval_interval"
},
"hide": 0,
"label": "Interval",
"name": "interval",
"options": [
{
"selected": true,
"text": "auto",
"value": "$__auto_interval_interval"
},
{
"selected": false,
"text": "5s",
"value": "5s"
},
{
"selected": false,
"text": "10s",
"value": "10s"
},
{
"selected": false,
"text": "30s",
"value": "30s"
},
{
"selected": false,
"text": "1m",
"value": "1m"
},
{
"selected": false,
"text": "10m",
"value": "10m"
},
{
"selected": false,
"text": "30m",
"value": "30m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
},
{
"selected": false,
"text": "6h",
"value": "6h"
},
{
"selected": false,
"text": "12h",
"value": "12h"
},
{
"selected": false,
"text": "1d",
"value": "1d"
},
{
"selected": false,
"text": "7d",
"value": "7d"
},
{
"selected": false,
"text": "14d",
"value": "14d"
},
{
"selected": false,
"text": "30d",
"value": "30d"
}
],
"query": "5s,10s,30s,1m,10m,30m,1h,6h,12h,1d,7d,14d,30d",
"refresh": 2,
"skipUrlSync": false,
"type": "interval"
},
{
"allValue": null,
"current": {},
"datasource": "main",
"hide": 0,
"includeAll": true,
"label": null,
"multi": true,
"name": "targets",
"options": [],
"query": "label_values(probe_success, instance)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Blackbox Exporter Overview",
"uid": "xtkCtBkiz",
"version": 2
}

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,489 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 5,
"iteration": 1537897365846,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "postgres",
"fill": 1,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "SELECT timestamp as time,COUNT(node) FROM peer_infos WHERE is_outbound = TRUE GROUP BY timestamp ORDER BY timestamp DESC",
"format": "time_series",
"group": [
{
"params": [
"$__interval",
"none"
],
"type": "time"
}
],
"intervalFactor": 1,
"metricColumn": "none",
"rawQuery": true,
"rawSql": "SELECT\n $__timeGroupAlias(\"timestamp\",$__interval),\n count(id) AS \"outbound\"\nFROM peer_infos\nWHERE\n $__timeFilter(\"timestamp\")\n and is_outbound = TRUE\n and node = '$node'\nGROUP BY timestamp\nORDER BY 1",
"refId": "A",
"select": [
[
{
"params": [
"id"
],
"type": "column"
},
{
"params": [
"count"
],
"type": "aggregate"
},
{
"params": [
"inbound"
],
"type": "alias"
}
]
],
"table": "peer_infos",
"timeColumn": "\"timestamp\"",
"timeColumnType": "timestamptz",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
},
{
"datatype": "bool",
"name": "",
"params": [
"is_outbound",
"=",
"FALSE"
],
"type": "expression"
},
{
"datatype": "text",
"name": "",
"params": [
"node",
"=",
"'$node'"
],
"type": "expression"
}
]
},
{
"format": "time_series",
"group": [],
"metricColumn": "none",
"rawQuery": true,
"rawSql": "SELECT\n $__timeGroupAlias(\"timestamp\",$__interval),\n count(id) AS \"inbound\"\nFROM peer_infos\nWHERE\n $__timeFilter(\"timestamp\")\n and is_outbound IS NULL\n and node = '$node'\nGROUP BY timestamp\nORDER BY 1",
"refId": "B",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Peer connections",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": 0,
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"columns": [],
"datasource": "postgres",
"fontSize": "100%",
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 0
},
"id": 4,
"links": [],
"pageSize": null,
"scroll": true,
"showHeader": true,
"sort": {
"col": 0,
"desc": true
},
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "date"
},
{
"alias": "Outbound",
"colorMode": null,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"mappingType": 1,
"pattern": "is_outbound",
"thresholds": [],
"type": "string",
"unit": "short",
"valueMaps": [
{
"text": "Yes",
"value": "1"
},
{
"text": "No",
"value": ""
}
]
},
{
"alias": "",
"colorMode": null,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"decimals": 2,
"pattern": "/.*/",
"thresholds": [],
"type": "number",
"unit": "short"
}
],
"targets": [
{
"expr": "",
"format": "table",
"group": [],
"intervalFactor": 1,
"metricColumn": "none",
"rawQuery": true,
"rawSql": "SELECT\n moniker,listen_addr as \"Address\",is_outbound,version\nFROM peer_infos\nWHERE\n timestamp = (select max(timestamp) from peer_infos as f where f.node = '$node')\n and node = '$node'\nGROUP BY is_outbound,listen_addr,version,moniker",
"refId": "A",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"title": "Current Peers",
"transform": "table",
"type": "table"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "postgres",
"fill": 1,
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 9
},
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "SELECT\n $__timeGroupAlias(\"timestamp\",$__interval),\n count(id) AS \"outbound\"\nFROM peer_infos\nWHERE\n $__timeFilter(\"timestamp\")\n and is_outbound = TRUE\n and node = '$node'\nGROUP BY timestamp\nORDER BY 1",
"format": "time_series",
"group": [],
"intervalFactor": 1,
"metricColumn": "none",
"rawQuery": true,
"rawSql": "SELECT\n $__timeGroupAlias(\"timestamp\",$__interval),\n listen_addr,\n (recv_data->>'CurRate')::int as recv,\n (send_data->>'CurRate')::int as send\nFROM peer_infos\nWHERE\n $__timeFilter(\"timestamp\")\n and node = '$node'\nGROUP BY timestamp,recv_data,send_data,listen_addr\nORDER BY 1",
"refId": "A",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Node Traffic",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"schemaVersion": 16,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": null,
"current": {
"tags": [],
"text": "gaia-node0:26657",
"value": "gaia-node0:26657"
},
"datasource": "postgres",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "node",
"options": [
{
"selected": false,
"text": "gaia-node2:26657",
"value": "gaia-node2:26657"
},
{
"selected": false,
"text": "gaia-node1:26657",
"value": "gaia-node1:26657"
},
{
"selected": false,
"text": "gaia-node3:26657",
"value": "gaia-node3:26657"
},
{
"selected": true,
"text": "gaia-node0:26657",
"value": "gaia-node0:26657"
}
],
"query": "SELECT DISTINCT node FROM peer_infos",
"refresh": 0,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-5m",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "P2P",
"uid": "t4voLF0mk",
"version": 4
}

@@ -0,0 +1,18 @@
# config file version
apiVersion: 1
# list of datasources to insert/update depending
# what's available in the database
datasources:
# <string, required> name of the datasource. Required
- name: main
# <string, required> datasource type. Required
type: prometheus
# <string, required> access mode. proxy or direct (Server or Browser in the UI). Required
access: proxy
# <int> org id. will default to orgId 1 if not specified
orgId: 1
# <string> url
url: http://prom:9090
isDefault: true
# <bool> allow users to edit datasources from the UI.
editable: true

@@ -0,0 +1,25 @@
# config file version
apiVersion: 1
# list of datasources to insert/update depending
# what's available in the database
datasources:
# <string, required> name of the datasource. Required
- name: postgres
# <string, required> datasource type. Required
type: postgres
# <string, required> access mode. proxy or direct (Server or Browser in the UI). Required
access: proxy
# <int> org id. will default to orgId 1 if not specified
orgId: 1
# <string> url
url: postgres-chain:5432
jsonData:
sslmode: "disable"
secureJsonData:
password: "mypwd"
user: postgres
database: postgres
version: 1
# <bool> allow users to edit datasources from the UI.
editable: true

monitoring/grafana/deploy.sh Executable file

@@ -0,0 +1,11 @@
#!/bin/sh
cd "$(dirname "$0")"
oc delete configmap grafana-datasources
oc delete configmap grafana-dashboards
oc delete configmap grafana-dashboards-prov
oc create configmap grafana-datasources --from-file=datastores/
oc create configmap grafana-dashboards --from-file=dashboards/
oc create configmap grafana-dashboards-prov --from-file=dashboards-prov/
oc process -f grafana.yml | oc apply -f -


@@ -0,0 +1,545 @@
---
kind: Template
apiVersion: v1
metadata:
name: grafana
annotations:
"openshift.io/display-name": Grafana
description: |
Grafana server with patched Prometheus datasource.
iconClass: fa fa-cogs
tags: "metrics,monitoring,grafana,prometheus"
parameters:
- description: The location of the grafana image
name: IMAGE_GRAFANA
value: docker.io/grafana/grafana:master
- description: The location of the proxy image
name: IMAGE_PROXY
value: openshift/oauth-proxy:v1.0.0
- description: External URL for the grafana route
name: ROUTE_URL
value: ""
- description: The session secret for the proxy
name: SESSION_SECRET
generate: expression
from: "[a-zA-Z0-9]{43}"
objects:
- apiVersion: v1
kind: ServiceAccount
metadata:
name: grafana
annotations:
serviceaccounts.openshift.io/oauth-redirectreference.primary: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"grafana"}}'
- apiVersion: authorization.openshift.io/v1
kind: ClusterRoleBinding
metadata:
name: grafana-cluster-reader
roleRef:
name: cluster-reader
subjects:
- kind: ServiceAccount
name: grafana
- apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: grafana
spec:
host: "${ROUTE_URL}"
to:
name: grafana
tls:
termination: Reencrypt
- apiVersion: v1
kind: Service
metadata:
name: grafana
annotations:
prometheus.io/scrape: "true"
prometheus.io/scheme: https
service.alpha.openshift.io/serving-cert-secret-name: grafana-tls
labels:
metrics-infra: grafana
name: grafana
spec:
ports:
- name: grafana
port: 443
protocol: TCP
targetPort: 8443
selector:
app: grafana
- apiVersion: v1
kind: Secret
metadata:
name: grafana-proxy
stringData:
session_secret: "${SESSION_SECRET}="
# Deploy Grafana behind an oauth proxy
- apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
app: grafana
name: grafana
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
name: grafana
spec:
serviceAccountName: grafana
containers:
- name: oauth-proxy
image: ${IMAGE_PROXY}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8443
name: web
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
args:
- -https-address=:8443
- -http-address=
- -email-domain=*
- -client-id=system:serviceaccount:$(NAMESPACE):grafana
- -upstream=http://localhost:3000
- -provider=openshift
# - '-openshift-delegate-urls={"/api/datasources": {"resource": "namespace", "verb": "get", "resourceName": "grafana", "namespace": "${NAMESPACE}"}}'
- '-openshift-sar={"namespace": "$(NAMESPACE)", "verb": "list", "resource": "services"}'
- -tls-cert=/etc/tls/private/tls.crt
- -tls-key=/etc/tls/private/tls.key
- -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
- -cookie-secret-file=/etc/proxy/secrets/session_secret
- -skip-auth-regex=^/metrics,/api/datasources,/api/dashboards
volumeMounts:
- mountPath: /etc/tls/private
name: grafana-tls
- mountPath: /etc/proxy/secrets
name: secrets
- name: grafana
image: ${IMAGE_GRAFANA}
ports:
- name: grafana-http
containerPort: 3000
volumeMounts:
- mountPath: "/root/go/src/github.com/grafana/grafana/data"
name: grafana-data
- mountPath: "/usr/share/grafana/conf"
name: grafanaconfig
- mountPath: "/usr/share/grafana/datasources"
name: grafanadatasources
- mountPath: "/usr/share/grafana/dashboards"
name: grafanadashboards-prov
- mountPath: "/var/lib/grafana/dashboards"
name: grafanadashboards
- mountPath: /etc/tls/private
name: grafana-tls
- mountPath: /etc/proxy/secrets
name: secrets
volumes:
- name: grafanaconfig
configMap:
name: grafana-config
- name: grafanadatasources
configMap:
name: grafana-datasources
- name: grafanadashboards
configMap:
name: grafana-dashboards
- name: grafanadashboards-prov
configMap:
name: grafana-dashboards-prov
- name: secrets
secret:
secretName: grafana-proxy
- name: grafana-tls
secret:
secretName: grafana-tls
- emptyDir: {}
name: grafana-data
- apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-config
data:
defaults.ini: |-
##################### Grafana Configuration Defaults #####################
#
# Do not modify this file in grafana installs
#
# possible values : production, development
app_mode = production
# instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
instance_name = ${HOSTNAME}
#################################### Paths ###############################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
#
data = data
#
# Directory where grafana can store logs
#
logs = data/log
#
# Directory where grafana will automatically scan and look for plugins
#
plugins = data/plugins
#################################### Server ##############################
[server]
# Protocol (http, https, socket)
protocol = http
# The ip address to bind to, empty will bind to all interfaces
http_addr =
# The http port to use
http_port = 3000
# The public facing domain name used to access grafana from a browser
domain = localhost
# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
enforce_domain = false
# The full public facing url
root_url = %(protocol)s://%(domain)s:%(http_port)s/
# Log web requests
router_logging = false
# the path relative working path
static_root_path = public
# enable gzip
enable_gzip = false
# https certs & key file
cert_file = /etc/tls/private/tls.crt
cert_key = /etc/tls/private/tls.key
# Unix socket path
socket = /tmp/grafana.sock
#################################### Database ############################
[database]
# You can configure the database connection by specifying type, host, name, user and password
# as separate properties or as one string using the url property.
# Either "mysql", "postgres" or "sqlite3", it's your choice
type = sqlite3
host = 127.0.0.1:3306
name = grafana
user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password =
# Use either URL or the previous fields to configure the database
# Example: mysql://user:secret@host:port/database
url =
# Max idle conn setting default is 2
max_idle_conn = 2
# Max conn setting default is 0 (means not set)
max_open_conn =
# For "postgres", use either "disable", "require" or "verify-full"
# For "mysql", use either "true", "false", or "skip-verify".
ssl_mode = disable
ca_cert_path =
client_key_path =
client_cert_path =
server_cert_name =
# For "sqlite3" only, path relative to data_path setting
path = grafana.db
#################################### Session #############################
[session]
# Either "memory", "file", "redis", "mysql", "postgres", "memcache", default is "file"
provider = file
# Provider config options
# memory: does not have any config yet
# file: session dir path, is relative to grafana data_path
# redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
# postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
# mysql: go-sql-driver/mysql dsn config string, examples:
# `user:password@tcp(127.0.0.1:3306)/database_name`
# `user:password@unix(/var/run/mysqld/mysqld.sock)/database_name`
# memcache: 127.0.0.1:11211
provider_config = sessions
# Session cookie name
cookie_name = grafana_sess
# If you use session in https only, default is false
cookie_secure = false
# Session life time, default is 86400
session_life_time = 86400
gc_interval_time = 86400
#################################### Data proxy ###########################
[dataproxy]
# This enables data proxy logging, default is false
logging = false
#################################### Analytics ###########################
[analytics]
# Server reporting, sends usage counters to stats.grafana.org every 24 hours.
# No ip addresses are being tracked, only simple counters to track
# running instances, dashboard and error counts. It is very helpful to us.
# Change this option to false to disable reporting.
reporting_enabled = true
# Set to false to disable all checks to https://grafana.com
# for new versions (grafana itself and plugins), check is used
# in some UI views to notify that grafana or plugin update exists
# This option does not cause any auto updates, nor send any information
# only a GET request to https://grafana.com to get latest versions
check_for_updates = true
# Google Analytics universal tracking code, only enabled if you specify an id here
google_analytics_ua_id =
# Google Tag Manager ID, only enabled if you specify an id here
google_tag_manager_id =
#################################### Security ############################
[security]
# default admin user, created on startup
admin_user = admin
# default admin password, can be changed before first start of grafana, or in profile settings
admin_password = admin
# used for signing
secret_key = SW2YcwTIb9zpOOhoPsMm
# Auto-login remember days
login_remember_days = 7
cookie_username = grafana_user
cookie_remember_name = grafana_remember
# disable gravatar profile images
disable_gravatar = false
# data source proxy whitelist (ip_or_domain:port separated by spaces)
data_source_proxy_whitelist =
[snapshots]
# snapshot sharing options
external_enabled = false
external_snapshot_url = https://snapshots-origin.raintank.io
external_snapshot_name = Publish to snapshot.raintank.io
# remove expired snapshot
snapshot_remove_expired = true
# remove snapshots after 90 days
snapshot_TTL_days = 90
#################################### Users ####################################
[users]
# disable user signup / registration
allow_sign_up = true
# Allow non admin users to create organizations
allow_org_create = true
# Set to true to automatically assign new users to the default organization (id 1)
auto_assign_org = true
# Default role new users will be automatically assigned (if auto_assign_org above is set to true)
auto_assign_org_role = Admin
# Require email validation before sign up completes
verify_email_enabled = false
# Background text for the user field on the login page
login_hint = email or username
# Default UI theme ("dark" or "light")
default_theme = dark
# External user management
external_manage_link_url =
external_manage_link_name =
external_manage_info =
[auth]
# Set to true to disable (hide) the login form, useful if you use OAuth
disable_login_form = true
# Set to true to disable the signout link in the side menu. useful if you use auth.proxy
disable_signout_menu = true
#################################### Anonymous Auth ######################
[auth.anonymous]
# enable anonymous access
enabled = true
# specify organization name that should be used for unauthenticated users
org_name = Main Org.
# specify role for unauthenticated users
org_role = Admin
#################################### Github Auth #########################
[auth.github]
enabled = false
allow_sign_up = true
client_id = some_id
client_secret = some_secret
scopes = user:email
auth_url = https://github.com/login/oauth/authorize
token_url = https://github.com/login/oauth/access_token
api_url = https://api.github.com/user
team_ids =
allowed_organizations =
#################################### Google Auth #########################
[auth.google]
enabled = false
allow_sign_up = true
client_id = some_client_id
client_secret = some_client_secret
scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
auth_url = https://accounts.google.com/o/oauth2/auth
token_url = https://accounts.google.com/o/oauth2/token
api_url = https://www.googleapis.com/oauth2/v1/userinfo
allowed_domains =
hosted_domain =
#################################### Grafana.com Auth ####################
# legacy key names (so they work in env variables)
[auth.grafananet]
enabled = false
allow_sign_up = true
client_id = some_id
client_secret = some_secret
scopes = user:email
allowed_organizations =
[auth.grafana_com]
enabled = false
allow_sign_up = true
client_id = some_id
client_secret = some_secret
scopes = user:email
allowed_organizations =
#################################### Generic OAuth #######################
[auth.generic_oauth]
name = OAuth
enabled = false
allow_sign_up = true
client_id = some_id
client_secret = some_secret
scopes = user:email
auth_url =
token_url =
api_url =
team_ids =
allowed_organizations =
#################################### Basic Auth ##########################
[auth.basic]
enabled = false
#################################### Auth Proxy ##########################
[auth.proxy]
enabled = true
header_name = X-WEBAUTH-USER
header_property = username
auto_sign_up = true
ldap_sync_ttl = 60
whitelist =
#################################### Auth LDAP ###########################
[auth.ldap]
enabled = false
config_file = /etc/grafana/ldap.toml
allow_sign_up = true
#################################### SMTP / Emailing #####################
[smtp]
enabled = false
host = localhost:25
user =
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password =
cert_file =
key_file =
skip_verify = false
from_address = admin@grafana.localhost
from_name = Grafana
ehlo_identity =
[emails]
welcome_email_on_sign_up = false
templates_pattern = emails/*.html
#################################### Logging ##########################
[log]
# Either "console", "file", "syslog". Default is console and file
# Use space to separate multiple modes, e.g. "console file"
mode = console file
# Either "debug", "info", "warn", "error", "critical", default is "info"
level = error
# optional settings to set different levels for specific loggers. Ex filters = sqlstore:debug
filters =
# For "console" mode only
[log.console]
level =
# log line format, valid options are text, console and json
format = console
# For "file" mode only
[log.file]
level =
# log line format, valid options are text, console and json
format = text
# Enable automated log rotation (controls the following options), default is true
log_rotate = true
# Max line number of single file, default is 1000000
max_lines = 1000000
# Max size shift of single file, default is 28 means 1 << 28, 256MB
max_size_shift = 28
# Segment log daily, default is true
daily_rotate = true
# Expired days of log file(delete after max days), default is 7
max_days = 7
[log.syslog]
level =
# log line format, valid options are text, console and json
format = text
# Syslog network type and address. This can be udp, tcp, or unix. If left blank, the default unix endpoints will be used.
network =
address =
# Syslog facility. user, daemon and local0 through local7 are valid.
facility =
# Syslog tag. By default, the process' argv[0] is used.
tag =
#################################### AMQP Event Publisher ################
[event_publisher]
enabled = false
rabbitmq_url = amqp://localhost/
exchange = grafana_events
#################################### Dashboard JSON files ################
[dashboards.json]
enabled = false
path = /var/lib/grafana/dashboards
#################################### Usage Quotas ########################
[quota]
enabled = false
#### set quotas to -1 to make unlimited. ####
# limit number of users per Org.
org_user = 10
# limit number of dashboards per Org.
org_dashboard = 100
# limit number of data_sources per Org.
org_data_source = 10
# limit number of api_keys per Org.
org_api_key = 10
# limit number of orgs a user can create.
user_org = 10
# Global limit of users.
global_user = -1
# global limit of orgs.
global_org = -1
# global limit of dashboards
global_dashboard = -1
# global limit of api_keys
global_api_key = -1
# global limit on number of logged in users.
global_session = -1
#################################### Alerting ############################
[alerting]
# Enable the alerting engine & UI features
enabled = true
# Makes it possible to turn off alert rule execution but alerting UI is visible
execute_alerts = true
#################################### Internal Grafana Metrics ############
# Metrics available at HTTP API Url /api/metrics
[metrics]
enabled = true
interval_seconds = 10
# Send internal Grafana metrics to graphite
[metrics.graphite]
# Enable by setting the address setting (ex localhost:2003)
address =
prefix = prod.grafana.%(instance_name)s.
[grafana_net]
url = https://grafana.com
[grafana_com]
url = https://grafana.com
#################################### Distributed tracing ############
[tracing.jaeger]
# jaeger destination (ex localhost:6831)
address =
# tag that will always be included when creating new spans. ex (tag1:value1,tag2:value2)
always_included_tag =
# Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
sampler_type = const
# jaeger samplerconfig param
# for "const" sampler, 0 or 1 for always false/true respectively
# for "probabilistic" sampler, a probability between 0 and 1
# for "rateLimiting" sampler, the number of spans per second
# for "remote" sampler, param is the same as for "probabilistic"
# and indicates the initial sampling rate before the actual one
# is received from the mothership
sampler_param = 1
#################################### External Image Storage ##############

monitoring/logging/deploy.sh Executable file

@@ -0,0 +1,16 @@
#!/bin/sh
cd "$(dirname "$0")"
oc process -f kafka.yml | oc apply -f -
oc apply -f fluentd.yml -n kube-system
oc adm policy add-scc-to-user privileged -z fluentd -n kube-system
oc patch ds fluentd -n kube-system -p "spec:
template:
spec:
containers:
- name: fluentd
securityContext:
privileged: true"
oc delete pod --namespace kube-system -l "k8s-app = fluentd-logging"
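The inline `oc patch` above applies a strategic-merge patch: Kubernetes merges entries in the `containers` list by their `name` key, so only the fluentd container gains the privileged securityContext and all other DaemonSet fields are left untouched. A minimal sketch of the same patch expressed as JSON:

```python
import json

# Strategic-merge patch equivalent to the inline YAML passed to `oc patch`.
# List entries under `containers` are merged by their `name` key, so only
# the fluentd container's securityContext is changed.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "fluentd", "securityContext": {"privileged": True}}
                ]
            }
        }
    }
}

patch_json = json.dumps(patch)
print(patch_json)
```

`oc patch` accepts this JSON form directly as the `-p` argument, since JSON is a subset of YAML.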


@@ -0,0 +1,86 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: fluentd
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: fluentd
namespace: kube-system
rules:
- apiGroups:
- ""
resources:
- pods
- namespaces
verbs:
- get
- list
- watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: fluentd
roleRef:
kind: ClusterRole
name: fluentd
apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
name: fluentd
namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: fluentd
namespace: kube-system
labels:
k8s-app: fluentd-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
template:
metadata:
labels:
k8s-app: fluentd-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
serviceAccount: fluentd
serviceAccountName: fluentd
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
containers:
- name: fluentd
image: fluent/fluentd-kubernetes-daemonset:v1.2.5-debian-kafka
env:
- name: FLUENT_KAFKA_BROKERS
value: "kafka.testmon.svc:9092"
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers


@@ -0,0 +1,143 @@
---
apiVersion: v1
kind: Template
labels:
app: kafka
template: kafka
parameters:
- displayName: Name
name: NAME
required: true
value: kafka
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: kafka
spec:
lookupPolicy:
local: true
tags:
- from:
kind: DockerImage
name: wurstmeister/kafka:latest
name: latest
referencePolicy:
type: Source
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- image: ' '
imagePullPolicy: IfNotPresent
name: ${NAME}
ports:
- containerPort: 9092
env:
- name: KAFKA_ADVERTISED_HOST_NAME
value: "kafka"
- name: KAFKA_CREATE_TOPICS
value: "topic:1:1"
- name: KAFKA_ZOOKEEPER_CONNECT
value: "${NAME}-zk:2181"
- name: KAFKA_PORT
value: "9092"
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: kafka:latest
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: Service
metadata:
name: ${NAME}
spec:
ports:
- name: kafka
port: 9092
selector:
name: ${NAME}
- apiVersion: v1
kind: ImageStream
metadata:
name: zookeeper
spec:
lookupPolicy:
local: true
tags:
- from:
kind: DockerImage
name: wurstmeister/zookeeper:latest
name: latest
referencePolicy:
type: Source
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}-zk
spec:
replicas: 1
selector:
name: ${NAME}-zk
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}-zk
spec:
containers:
- image: ' '
imagePullPolicy: IfNotPresent
name: ${NAME}-zk
ports:
- containerPort: 2181
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}-zk
from:
kind: ImageStreamTag
name: zookeeper:latest
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: Service
metadata:
name: ${NAME}-zk
spec:
ports:
- name: zookeeper
port: 2181
selector:
name: ${NAME}-zk


@@ -0,0 +1,65 @@
# blackbox-exporter is an optional component that runs network probes (ICMP, HTTP, TCP)
# from the nodes in the cluster. This group of resources requires the 'hostaccess' level of
# privilege, which should only be granted to namespaces that administrators can access.
apiVersion: v1
kind: List
items:
- apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus-blackbox-exporter
# You must grant hostaccess via: oc adm policy add-scc-to-user hostaccess -z prometheus-blackbox-exporter
# in order for the blackbox-exporter to access the host network for its probes
- apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: prometheus-blackbox-exporter
name: prometheus-blackbox-exporter
spec:
clusterIP: None
ports:
- name: scrape
port: 9115
protocol: TCP
targetPort: 9115
selector:
app: prometheus-blackbox-exporter
- apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: prometheus-blackbox-exporter
labels:
app: prometheus-blackbox-exporter
role: monitoring
spec:
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app: prometheus-blackbox-exporter
role: monitoring
name: prometheus-exporter
spec:
hostNetwork: true
serviceAccountName: prometheus-blackbox-exporter
containers:
- image: prom/blackbox-exporter:v0.12.0
name: blackbox-exporter
securityContext:
capabilities:
add:
- NET_RAW
ports:
- containerPort: 9115
name: scrape
resources:
requests:
memory: 30Mi
cpu: 100m
limits:
memory: 50Mi
cpu: 200m

monitoring/prometheus/deploy.sh Executable file

@@ -0,0 +1,12 @@
#!/bin/bash
cd "$(dirname "$0")"
# Start blackbox exporter
oc apply -f blackbox_exporter.yml -n kube-system
oc adm policy add-scc-to-user -z prometheus-blackbox-exporter -n kube-system privileged hostaccess
# Start node exporter
oc apply -f node_exporter.yml -n kube-system
oc adm policy add-scc-to-user -z prometheus-node-exporter -n kube-system hostaccess
oc process -f prometheus.yml | oc apply -f -


@@ -0,0 +1,79 @@
# node-exporter is an optional component that collects host level metrics from the nodes
# in the cluster. This group of resources will require the 'hostaccess' level of privilege, which
# should only be granted to namespaces that administrators can access.
apiVersion: v1
kind: List
items:
- apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus-node-exporter
# You must grant hostaccess via: oc adm policy add-scc-to-user hostaccess -z prometheus-node-exporter
# in order for the node-exporter to access the host network and mount /proc and /sys from the host
- apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: prometheus-node-exporter
name: prometheus-node-exporter
spec:
clusterIP: None
ports:
- name: scrape
port: 9100
protocol: TCP
targetPort: 9100
selector:
app: prometheus-node-exporter
- apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: prometheus-node-exporter
labels:
app: prometheus-node-exporter
role: monitoring
spec:
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app: prometheus-node-exporter
role: monitoring
name: prometheus-exporter
spec:
serviceAccountName: prometheus-node-exporter
hostNetwork: true
hostPID: true
containers:
- image: openshift/prometheus-node-exporter:v0.16.0
args:
- "--path.procfs=/host/proc"
- "--path.sysfs=/host/sys"
name: node-exporter
ports:
- containerPort: 9100
name: scrape
resources:
requests:
memory: 30Mi
cpu: 100m
limits:
memory: 50Mi
cpu: 200m
volumeMounts:
- name: proc
readOnly: true
mountPath: /host/proc
- name: sys
readOnly: true
mountPath: /host/sys
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys


@@ -0,0 +1,446 @@
apiVersion: template.openshift.io/v1
kind: Template
metadata:
name: prometheus
annotations:
"openshift.io/display-name": Prometheus
description: |
A Prometheus deployment that can be customized to monitor components and dispatch alerts. It is secure by default and can be used to monitor arbitrary clients.
iconClass: fa fa-cogs
tags: "monitoring,prometheus,alertmanager,time-series"
parameters:
- description: The location of the proxy image
name: IMAGE_PROXY
value: openshift/oauth-proxy:v1.0.0
- description: The location of the prometheus image
name: IMAGE_PROMETHEUS
value: openshift/prometheus:v2.3.2
- description: The location of the alertmanager image
name: IMAGE_ALERTMANAGER
value: openshift/prometheus-alertmanager:v0.15.1
- description: The location of alert-buffer image
name: IMAGE_ALERT_BUFFER
value: openshift/prometheus-alert-buffer:v0.0.2
- description: The session secret for the proxy
name: SESSION_SECRET
generate: expression
from: "[a-zA-Z0-9]{43}"
objects:
- apiVersion: v1
kind: ServiceAccount
metadata:
name: prom
annotations:
serviceaccounts.openshift.io/oauth-redirectreference.prom: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"prom"}}'
serviceaccounts.openshift.io/oauth-redirectreference.alerts: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"prom-alerts"}}'
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheus-data-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
# Create a fully end-to-end TLS connection to the prometheus proxy
- apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: prom
spec:
to:
name: prom
tls:
termination: Reencrypt
insecureEdgeTerminationPolicy: Redirect
- apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/scheme: https
service.alpha.openshift.io/serving-cert-secret-name: prom-tls
labels:
name: prom
name: prom
spec:
ports:
- name: prometheus
port: 443
protocol: TCP
targetPort: 8443
- name: prometheusapi
port: 9090
protocol: TCP
targetPort: 9090
selector:
app: prom
- apiVersion: v1
kind: Secret
metadata:
name: prom-proxy
stringData:
session_secret: "${SESSION_SECRET}="
- apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
labels:
app: prom
name: prom
spec:
updateStrategy:
type: RollingUpdate
podManagementPolicy: Parallel
selector:
matchLabels:
app: prom
template:
metadata:
labels:
app: prom
name: prom
spec:
serviceAccountName: prom
containers:
# Deploy Prometheus behind an oauth proxy
- name: prom-proxy
image: ${IMAGE_PROXY}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8443
name: web
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
args:
- -provider=openshift
- -https-address=:8443
- -http-address=
- -email-domain=*
- -upstream=http://localhost:9090
- -client-id=system:serviceaccount:$(NAMESPACE):prom
- -openshift-ca=/etc/pki/tls/cert.pem
- -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
- '-openshift-sar={"resource": "namespaces", "verb": "get", "resourceName": "$(NAMESPACE)", "namespace": "$(NAMESPACE)"}'
- -tls-cert=/etc/tls/private/tls.crt
- -tls-key=/etc/tls/private/tls.key
- -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
- -cookie-secret-file=/etc/proxy/secrets/session_secret
- -skip-auth-regex=^/metrics
volumeMounts:
- mountPath: /etc/tls/private
name: prometheus-tls
- mountPath: /etc/proxy/secrets
name: prometheus-secrets
- mountPath: /prometheus
name: prometheus-data
- name: prometheus
args:
- --storage.tsdb.retention=6h
- --config.file=/etc/prometheus/prometheus.yml
- --web.listen-address=:9090
image: ${IMAGE_PROMETHEUS}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9090
name: api
volumeMounts:
- mountPath: /etc/prometheus
name: prometheus-config
- mountPath: /prometheus
name: prometheus-data
# Deploy alertmanager behind an oauth proxy
# listens on https port 9443 (instead of 8443) to differ from prom-proxy
- name: alerts-proxy
image: ${IMAGE_PROXY}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9443
name: web
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
args:
- -provider=openshift
- -https-address=:9443
- -http-address=
- -email-domain=*
- -upstream=http://localhost:9093
- -client-id=system:serviceaccount:$(NAMESPACE):prom
- -openshift-ca=/etc/pki/tls/cert.pem
- -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
- '-openshift-sar={"resource": "namespaces", "verb": "get", "resourceName": "$(NAMESPACE)", "namespace": "$(NAMESPACE)"}'
- -tls-cert=/etc/tls/private/tls.crt
- -tls-key=/etc/tls/private/tls.key
- -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
- -cookie-secret-file=/etc/proxy/secrets/session_secret
volumeMounts:
- mountPath: /etc/tls/private
name: alerts-tls
- mountPath: /etc/proxy/secrets
name: alerts-secrets
- name: alertmanager
args:
- --config.file=/etc/alertmanager/alertmanager.yml
image: ${IMAGE_ALERTMANAGER}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9093
name: web
volumeMounts:
- mountPath: /etc/alertmanager
name: alertmanager-config
- mountPath: /alertmanager
name: alertmanager-data
restartPolicy: Always
volumes:
- name: prometheus-config
configMap:
defaultMode: 420
name: prometheus
- name: prometheus-secrets
secret:
secretName: prom-proxy
- name: prometheus-tls
secret:
secretName: prom-tls
- name: prometheus-data
persistentVolumeClaim:
claimName: prometheus-data-claim
- name: alertmanager-config
configMap:
defaultMode: 420
name: alertmanager
- name: alerts-secrets
secret:
secretName: prom-alerts-proxy
- name: alerts-tls
secret:
secretName: prom-alerts-tls
- name: alertmanager-data
emptyDir: {}
# Create a fully end-to-end TLS connection to the alert proxy
- apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: prom-alerts
spec:
to:
name: prom-alerts
tls:
termination: Reencrypt
insecureEdgeTerminationPolicy: Redirect
- apiVersion: v1
kind: Service
metadata:
annotations:
service.alpha.openshift.io/serving-cert-secret-name: prom-alerts-tls
labels:
name: prom-alerts
name: prom-alerts
spec:
ports:
- name: alerts
port: 443
protocol: TCP
targetPort: 9443
selector:
app: prom
- apiVersion: v1
kind: Secret
metadata:
name: prom-alerts-proxy
stringData:
session_secret: "${SESSION_SECRET}="
- apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
data:
alerting.rules: |
groups:
- name: node_rules
interval: 30s # defaults to global interval
rules:
- alert: high_cpu_usage_on_node
for: 5m
expr: sum(rate(process_cpu_seconds_total[5m])) by (instance) * 100 > 70
annotations:
summary: "HIGH CPU USAGE WARNING ON '{{ $labels.instance }}'"
severity: "HIGH"
message: "{{ $labels.instance }} is using a LOT of CPU. CPU usage is {{ humanize $value}}%."
- alert: high_memory_usage_on_node
for: 5m
expr: ((node_memory_MemTotal-node_memory_MemAvailable)/node_memory_MemTotal)*100 > 80
annotations:
summary: "HIGH MEMORY USAGE WARNING TASK ON '{{ $labels.instance }}'"
severity: "HIGH"
message: "{{ $labels.instance }} is using a LOT of MEMORY. MEMORY usage is over {{ humanize $value}}%."
- alert: node_running_out_of_disk_space
for: 5m
expr: (node_filesystem_size{mountpoint="/"} - node_filesystem_free{mountpoint="/"}) * 100/ node_filesystem_size{mountpoint="/"} > 70
annotations:
summary: "LOW DISK SPACE WARING: NODE '{{ $labels.instance }}'"
severity: "HIGH"
message: "More than 70% of disk used. Disk usage {{ humanize $value }}%."
- alert: disk_will_fill_in_8_hours
for: 5m
expr: predict_linear(node_filesystem_free{mountpoint="/"}[1h], 8*3600) < 0
annotations:
summary: "DISK SPACE FULL IN 8 HOURS: NODE '{{ $labels.instance }}'"
severity: "HIGH"
message: "{{ $labels.instance }} is writing a lot."
- alert: service_down
for: 1m
expr: up < 1
annotations:
summary: "SERVICE DOWN:'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
            message: "'{{ $labels.job }}' on {{ $labels.instance }} could not be reached by Prometheus for more than 1 minute."
- alert: ping_failed
for: 1m
expr: probe_success == 0
annotations:
summary: "PING FAILED:'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
            message: "{{ $labels.instance }} could not be pinged for more than 1 minute. Host offline?"
- alert: gaia_stuck
for: 1m
expr: changes(consensus_height[1m]) == 0
annotations:
summary: "NODE seems stuck'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
            message: "Height of '{{ $labels.job }}' on {{ $labels.instance }} did not change for more than 1 minute."
- alert: gaia_behind
for: 1m
expr: consensus_height - ignoring (instance) group_left max without (instance)(consensus_height) <= -5
annotations:
summary: "NODE seems behind'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
            message: "Height of '{{ $labels.job }}' on {{ $labels.instance }} is behind our best height by {{ $value }} for longer than 1 minute."
- alert: node_reboot
          expr: changes(node_boot_time[10m]) > 0
annotations:
summary: "NODE has rebooted'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
message: "It looks like {{ $labels.instance }}' has rebooted. Was that intentional ?"
- alert: high_load
for: 2m
expr: node_load1 / count(node_cpu{mode="system"}) WITHOUT (cpu, mode) > 0.8
annotations:
summary: '{{ $labels.instance }} of job {{ $labels.job }} is under high load.'
severity: "HIGH"
message: "{{ $labels.instance }} has a high load. Sysload is {{ humanize $value}}."
- alert: validator_connection
for: 30s
expr: -p2p_peers + ignoring (instance) group_left count without (instance)(consensus_height) -1 > 0
annotations:
summary: '{{ $labels.instance }} of job {{ $labels.job }} is not connected to all other nodes.'
severity: "HIGH"
message: "{{ $labels.instance }} has a peering problem with other private nodes. Nodes missing: {{$value}}."
- alert: network_errors
for: 30s
expr: sum(rate(node_network_receive_errs[5m])) > 1
annotations:
summary: '{{ $labels.instance }} of job {{ $labels.job }} has network receive errors.'
severity: "HIGH"
message: "{{ $labels.instance }} has networking issues. Package error rate: {{$value}}."
- alert: socket_opens
for: 5m
expr: delta(node_sockstat_TCP_alloc[1m]) > 200
annotations:
summary: '{{ $labels.instance }} allocates TCP sockets at a very high rate.'
severity: "HIGH"
message: "{{ $labels.instance }} allocates very many TCP sockets. TCP sockets / second: {{$value}}."
recording.rules: |
groups:
- name: aggregate_container_resources
rules:
- record: container_cpu_usage_rate
expr: sum without (cpu) (rate(container_cpu_usage_seconds_total[5m]))
- record: container_memory_rss_by_type
expr: container_memory_rss{id=~"/|/system.slice|/kubepods.slice"} > 0
- record: container_cpu_usage_percent_by_host
expr: sum(rate(container_cpu_usage_seconds_total{id="/"}[5m])) BY(kubernetes_io_hostname) / ON(kubernetes_io_hostname) machine_cpu_cores
- record: apiserver_request_count_rate_by_resources
expr: sum without (client,instance,contentType) (rate(apiserver_request_count[5m]))
prometheus.yml: |
rule_files:
- '*.rules'
# A scrape configuration for running Prometheus on a Kubernetes cluster.
# This uses separate scrape configs for cluster components (i.e. API server, node)
# and services to allow each to use different authentication configs.
#
# Kubernetes labels will be added as Prometheus labels on metrics via the
# `labelmap` relabeling action.
# Scrape config for API servers.
#
# Kubernetes exposes API servers as endpoints to the default/kubernetes
# service so this uses `endpoints` role and uses relabelling to only keep
# the endpoints associated with the default/kubernetes service using the
# default named port `https`. This works for single API server deployments as
# well as HA API server deployments.
scrape_configs:
- job_name: "node"
scrape_interval: 5s
static_configs:
- targets: ["5.83.163.203:9100"]
- job_name: "gaia"
scrape_interval: 2s
static_configs:
- targets: ["gaia-node0:26660","gaia-node1:26660","gaia-node2:26660","gaia-node3:26660"]
- job_name: 'blackbox'
metrics_path: /probe
params:
          module: [icmp] # Use the blackbox exporter's ICMP (ping) probe module.
static_configs:
- targets:
          - 5.83.163.203 # Target to probe with ICMP.
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: 5.83.163.203:9115 # The blackbox exporter's real hostname:port.
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "localhost:9093"
- apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager
data:
alertmanager.yml: |
global:
# The root route on which each incoming alert enters.
route:
# default route if none match
receiver: alert-buffer-wh
# The labels by which incoming alerts are grouped together. For example,
# multiple alerts coming in for cluster=A and alertname=LatencyHigh would
# be batched into a single group.
# TODO:
group_by: []
        # All the above attributes are inherited by all child routes and can be
        # overridden on each.
receivers:
- name: alert-buffer-wh
webhook_configs:
- url: http://localhost:9099/topics/alerts
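The `prom-alerts-proxy` Secret above is filled from the template's `SESSION_SECRET` parameter. A random value could be generated before processing the template like this (a sketch; the 43-character alphanumeric length mirrors what the OpenShift oauth-proxy examples use and is an assumption here):

```shell
# Generate a random alphanumeric session secret for the oauth proxy.
# 43 chars + the "=" appended in the template yields a 44-char value.
SESSION_SECRET="$(LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 43)"
echo "${#SESSION_SECRET}"  # → 43
```

The value would then be passed when instantiating the template, e.g. `oc process -p SESSION_SECRET="$SESSION_SECRET" -f <template> | oc apply -f -`.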

openshift/deploy.yaml Normal file
@ -0,0 +1,28 @@
---
apiVersion: batch/v1
kind: Job
metadata:
name: gaia-deploy
  labels:
    app: gaia
spec:
parallelism: 1
completions: 1
template:
metadata:
name: gaia-deploy
spec:
volumes:
- name: state
persistentVolumeClaim:
claimName: gaia-ansible-state
containers:
- name: gaia-deploy
image: gaia-ansible:latest
env:
- name: ANSIBLE_FORCE_COLOR
value: 'true'
volumeMounts:
- name: state
mountPath: /opt/app-root/state
restartPolicy: OnFailure
serviceAccount: gaia-ansible

@ -0,0 +1,88 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia
template: gaia-ansible
metadata:
name: gaia-ansible
  annotations:
    template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaia-ansible
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/testnet_deploy
- displayName: Git Reference
name: GIT_REF
required: true
value: master
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: ${NAME}
spec:
lookupPolicy:
local: true
- apiVersion: v1
kind: BuildConfig
metadata:
name: ${NAME}
spec:
output:
to:
kind: ImageStreamTag
name: ${NAME}:latest
postCommit: {}
runPolicy: Serial
source:
git:
ref: ${GIT_REF}
uri: ${GIT_REPO}
contextDir: ansible
type: Git
strategy:
sourceStrategy:
from:
kind: ImageStreamTag
name: gaiad:latest
type: Source
triggers:
- imageChange: {}
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: ServiceAccount
metadata:
name: ${NAME}
- apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ${NAME}-edit
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: edit
subjects:
- kind: ServiceAccount
name: ${NAME}
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ${NAME}-state
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi

openshift/gaiad.yaml Normal file
@ -0,0 +1,63 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia
template: gaiad
metadata:
name: gaia
  annotations:
    template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaiad
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/cosmos/cosmos-sdk
- displayName: Git Reference
name: GIT_REF
required: true
value: develop
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: ${NAME}
spec:
# this allows k8s objects to directly reference this image stream
lookupPolicy:
local: true
- apiVersion: v1
kind: BuildConfig
metadata:
name: ${NAME}
spec:
output:
to:
kind: ImageStreamTag
name: ${NAME}:latest
postCommit: {}
runPolicy: Serial
source:
git:
ref: ${GIT_REF}
uri: ${GIT_REPO}
type: Git
strategy:
sourceStrategy:
from:
kind: ImageStreamTag
name: golang-s2i:1.10
env:
- name: S2I_GOPKG
value: github.com/cosmos/cosmos-sdk
type: Source
triggers:
- imageChange: {}
type: ImageChange
- type: ConfigChange

openshift/golang-s2i.yaml Normal file
@ -0,0 +1,58 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia
template: golang-s2i
metadata:
name: golang-s2i
  annotations:
    template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: golang-s2i
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/testnet_deploy
- displayName: Git Reference
name: GIT_REF
required: true
value: master
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: ${NAME}
- apiVersion: v1
kind: BuildConfig
metadata:
labels:
build: ${NAME}
name: ${NAME}
spec:
output:
to:
kind: ImageStreamTag
name: ${NAME}:1.10
runPolicy: Serial
source:
contextDir: openshift/golang-s2i
git:
ref: ${GIT_REF}
uri: ${GIT_REPO}
type: Git
strategy:
dockerStrategy:
from:
kind: DockerImage
name: registry.fedoraproject.org/f28/s2i-base
type: Docker
triggers:
- imageChange: {}
type: ImageChange
- type: ConfigChange

@ -0,0 +1,24 @@
# Custom S2I image for Golang
FROM registry.fedoraproject.org/f28/s2i-base
RUN dnf -y install golang inotify-tools protobuf-compiler rsync nmap-ncat dep ansible origin-clients && dnf -y clean all
# Environment setup
COPY scl_enable /opt/app-root/etc/scl_enable
# Drop the root user and make the content of /opt/app-root owned by user 1001
RUN chown -R 1001:0 ${APP_ROOT} && chmod -R ug+rwx ${APP_ROOT} && \
rpm-file-permissions
# Copy S2I scripts
COPY s2i/ $STI_SCRIPTS_PATH
RUN chmod +x $STI_SCRIPTS_PATH/*
# OpenShift Ansible module
# TODO: don't even think about doing this in production
RUN pip install openshift
USER 1001
# Set the default CMD to print the usage of the language image
CMD $STI_SCRIPTS_PATH/usage

@ -0,0 +1,45 @@
#!/bin/bash
set -e
shopt -s dotglob
set -x
. /opt/app-root/etc/scl_enable
# Copy source code to /opt/app-root/src
echo "---> Copy application source ..."
mv /tmp/src/* ./
# Create user-owned GOPATH
echo "---> Initialize Go environment ..."
mkdir -p "${GOPATH}"
# If S2I_GOPKG is set, we link app-root to its proper place in the GOPATH
if [[ ! -z ${S2I_GOPKG} ]]; then
  PKG="${GOPATH}/src/${S2I_GOPKG}"
  mkdir -p "$(dirname "$PKG")"
  ln -s "$(pwd)" "$PKG"
  echo "Go package to be built: $PKG"
fi
if [[ ! -z ${S2I_GOCMD} ]]; then
  # Build the package
  echo "---> Building ..."
  go build -o ./app ${S2I_GOCMD}
fi
ls -lisa
if [[ -f Makefile ]]; then
  echo "---> Building using Makefile ..."
  if [[ ! -z ${S2I_GOPKG} ]]; then
    cd "${GOPATH}/src/${S2I_GOPKG}"
  fi
  # TODO: generalize
  LEDGER_ENABLED=false make get_vendor_deps install
fi
echo "---> Fixing permissions ..."
fix-permissions ./
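The `S2I_GOPKG` handling above links the source checkout into its conventional import path inside `GOPATH`, so Go tooling finds the package without copying it. The same trick can be exercised standalone (a sketch with hypothetical paths and package name):

```shell
# Recreate the assemble script's GOPATH-linking logic in a scratch dir.
WORK="$(mktemp -d)"
SRC="$WORK/app-root-src"            # stands in for /opt/app-root/src
GOPATH="$WORK/go"                   # stands in for ${APP_ROOT}/go
S2I_GOPKG="github.com/example/app"  # hypothetical import path
mkdir -p "$SRC" "$GOPATH"
PKG="${GOPATH}/src/${S2I_GOPKG}"
mkdir -p "$(dirname "$PKG")"
ln -s "$SRC" "$PKG"
# Files written through either path are the same files.
echo ok > "$PKG/marker"
cat "$SRC/marker"  # → ok
```

Because `$PKG` is a symlink rather than a copy, edits under the import path and under the original checkout stay in sync.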

@ -0,0 +1,9 @@
#!/bin/bash
# Colored output
export TERM=xterm
export GOPATH=${APP_ROOT}/go
export PATH=${PATH}:${GOPATH}/bin
unset BASH_ENV PROMPT_COMMAND ENV
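`scl_enable` appends `${GOPATH}/bin` to `PATH` so that binaries produced by `make ... install` (such as `gaiad`) resolve by name in later build steps. A small demonstration with a stub binary (hypothetical names):

```shell
# Show why PATH is extended with ${GOPATH}/bin: installed tools resolve by name.
GOPATH="$(mktemp -d)"
mkdir -p "${GOPATH}/bin"
printf '#!/bin/sh\necho gaiad-stub\n' > "${GOPATH}/bin/gaiad"
chmod +x "${GOPATH}/bin/gaiad"
export PATH="${PATH}:${GOPATH}/bin"
gaiad  # → gaiad-stub
```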