Initial commit

Co-authored-by: Leopold Schabel <leo@certus.one>
Hendrik Hofstadt 2018-09-25 22:12:17 +02:00 committed by Leo
commit c6e8970efe
37 changed files with 26500 additions and 0 deletions

2
.gitignore vendored Normal file
@@ -0,0 +1,2 @@
*.iml
TODO

102
README.md Normal file
@@ -0,0 +1,102 @@
# testnet_deploy
This repo deploys a full Cosmos SDK testnet plus monitoring on an
OpenShift Origin/okd.io Kubernetes cluster.
Requirements:
- CentOS >= 7.5
- OpenShift Origin == 3.9
## Introduction
We recorded this video to guide you through the (one-click) setup of your own fully monitored Cosmos network and explain how the snippets and monitoring systems can be used.
[Watch the video here](https://www.useloom.com/share/c281221bcfb04e4798659618eb15ac88)
Also don't forget our validator knowledge base, which contains important information about operations and monitoring.
[Knowledgebase](https://kb.certus.one/)
The `gaia_exporter`, `net_exporter` and alerting tools are built from the [chain_exporter](https://github.com/certusone/chain_exporter) repo.
Usage instructions can be found in the deployment scripts and their command-line output.
## Deploying an OpenShift Origin Cluster
Deploy an OpenShift Origin 3.9 cluster on CentOS 7:
yum -y install git docker tcpdump bridge-utils vim centos-release-openshift-origin39 epel-release
yum -y install origin origin-clients htop
cat <<EOF > /etc/sysconfig/docker
OPTIONS="--log-driver=journald --insecure-registry 172.30.0.0/16 --signature-verification=false"
EOF
systemctl enable docker
systemctl start docker
git clone https://github.com/openshift-evangelists/oc-cluster-wrapper
cat <<EOF >> ~/.bash_profile
export PATH=~/oc-cluster-wrapper:\$PATH
export OC_CLUSTER_PUBLIC_HOSTNAME=$(hostname -f)
export OC_CLUSTER_ROUTING_SUFFIX=apps.$(hostname --ip-address).nip.io
EOF
~/oc-cluster-wrapper/oc-cluster completion bash > /etc/bash_completion.d/oc-cluster.bash
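Note the escaping in the `.bash_profile` heredoc above: `$(hostname -f)` is expanded once, when the profile is written, while `\$PATH` is kept literal so it expands at every login. A minimal, standalone sketch of that difference (the sample values are made up):

```shell
# Unquoted heredoc: $(...) expands immediately, \$ defers expansion to later.
profile=$(cat <<EOF
expanded_now=$(echo myhost)
expanded_later=\$PATH
EOF
)
echo "$profile"
```

The first line of the output contains the already-substituted value, the second the literal `$PATH` string.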
Re-login once you're done to make the auto-completion work. This is a non-production deployment
of OpenShift and you can log in via admin/admin. If you're running this on
a publicly reachable host, make sure to properly configure your firewall to prevent
the infamous Kubernetes Bitcoin-mining botnets from assimilating your cluster:
Configure firewalld:
yum -y install firewalld
systemctl start firewalld
systemctl enable firewalld
firewall-cmd --permanent --new-zone admin
firewall-cmd --permanent --add-source=your_public_ip_to_whitelist/32 --zone=admin
firewall-cmd --permanent --add-port=8443/tcp --zone=admin
firewall-cmd --permanent --add-port=443/tcp --zone=admin
firewall-cmd --permanent --new-zone dockerc
firewall-cmd --permanent --zone dockerc --add-source 172.17.0.0/16
firewall-cmd --permanent --zone dockerc --add-port 8443/tcp
firewall-cmd --permanent --zone dockerc --add-port 53/udp
firewall-cmd --permanent --zone dockerc --add-port 8053/udp
firewall-cmd --permanent --add-masquerade --zone=public
firewall-cmd --reload
Finally, boot up your cluster:
oc-cluster up
You can now log into the web application using developer or admin/admin
(`https://<hostname>:8443`), or log in using the CLI:
oc login https://<hostname>:8443
(the admin user is a cluster administrator, whereas the developer user isn't)
## Deploy our testnet
For Sentry alerts to work, set the following variables:
`monitoring/exporter/alerter.yml`: Replace `<INSERT_RAVEN_DSN>` with the RAVEN_DSN URL of your (self-)hosted Sentry instance.
If you want alerts from your alertmanager:
`monitoring/prometheus/prometheus.yml`: Modify the alertmanager config according to [the Prometheus docs](https://prometheus.io/docs/alerting/configuration/)
This deploys our testnet:
./deploy_testnet.sh
This deploys everything, including our monitoring stack:
./deploy_all.sh

13
ansible/.s2i/bin/assemble Normal file
@@ -0,0 +1,13 @@
#!/bin/bash
set -e
shopt -s dotglob
# Copy source code to /opt/app-root/src
echo "---> Copy application source ..."
# Remove source code from previous s2i stage
rm -rf ./*
mv /tmp/src/* ./
#echo "---> Installing OpenShift Python client for Ansible"
# TODO: we're currently doing this in the Dockerfile to speed up repeated builds
#pip install --user openshift

4
ansible/.s2i/bin/run Normal file
@@ -0,0 +1,4 @@
#!/bin/bash
set -euo pipefail
# "-c local -i localhost," runs the playbook against localhost without an
# inventory file (the trailing comma marks an inline host list)
ansible-playbook -c local -i localhost, playbook.yaml

131
ansible/playbook.yaml Normal file
@@ -0,0 +1,131 @@
---
- hosts: localhost
vars:
gaia_namespace: "{{ ansible_env.OPENSHIFT_BUILD_NAMESPACE }}"
gaia_num_nodes: 4
state_dir: /opt/app-root/state
tasks:
- name: Delete existing testnet folder
file:
path: "{{ state_dir }}/testnet"
state: absent
- name: Create testnet files
command: >
/opt/app-root/go/bin/gaiad testnet
--v {{ gaia_num_nodes }}
--starting-ip-address 10.33.33.0
-o {{ state_dir }}/testnet
- name: Replace persistent peers
shell: >
sed -i 's/10\.33\.33\./gaia-node/g'
{{ state_dir }}/testnet/node*/gaiad/config/config.toml
args:
warn: no
- name: config.toml overrides
shell: >
sed -i 's/^{{ item.key }} = .*/{{ item.key }} = {{ item.value }}/'
{{ state_dir }}/testnet/node*/gaiad/config/config.toml
args:
warn: no
with_dict:
# produce blocks as fast as we can
skip_timeout_commit: 'true'
# accept private IP ranges
addr_book_strict: 'false'
# calculated max peers
max_num_peers: '{{ 50 + gaia_num_nodes }}'
# we can later auto-configure a Prometheus server via annotations
prometheus: 'true'
# Headless service - we need forward DNS!
- name: Create node services
k8s:
state: present
definition:
apiVersion: v1
kind: Service
metadata:
labels:
app: gaia
template: gaia-nodes
namespace: "{{ gaia_namespace }}"
name: gaia-node{{ item }}
spec:
clusterIP: None
ports:
- name: gaiad-peer
port: 26656
protocol: TCP
targetPort: 26656
- name: gaiad-rpc
port: 26657
protocol: TCP
targetPort: 26657
- name: gaiad-metrics
port: 26660
protocol: TCP
targetPort: 26660
selector:
name: gaia-node{{ item }}
sessionAffinity: None
type: ClusterIP
with_sequence: start=0 count={{ gaia_num_nodes }}
- name: Create RPC route for node1
k8s:
state: present
definition:
apiVersion: v1
kind: Route
metadata:
name: gaia-rpc
namespace: "{{ gaia_namespace }}"
spec:
port:
targetPort: gaiad-rpc
tls:
termination: edge
to:
kind: Service
name: gaia-node1
weight: 100
wildcardPolicy: None
- name: Start gaiad pods
k8s:
state: present
definition:
apiVersion: v1
kind: Pod
metadata:
labels:
app: gaia
template: gaia-nodes
name: gaia-node{{ item }}
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '26660'
namespace: "{{ gaia_namespace }}"
name: gaia-node{{ item }}
spec:
volumes:
- name: state
persistentVolumeClaim:
claimName: gaia-ansible-state
containers:
- name: gaia-node{{ item }}
image: gaiad:latest
volumeMounts:
- mountPath: /opt/app-root/state
name: state
command:
- /opt/app-root/go/bin/gaiad
- start
- --home
- /opt/app-root/state/testnet/node{{ item }}/gaiad
with_sequence: start=0 count={{ gaia_num_nodes }}
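The two `sed` tasks above can be illustrated standalone. The sample config lines below are made up; in the playbook they come from the `config.toml` files that `gaiad testnet` generates:

```shell
# 1) Peer rewrite: gaiad testnet emits peers as 10.33.33.N, which the
#    playbook maps onto the headless-service DNS names gaia-nodeN.
peers='persistent_peers = "abc@10.33.33.0:26656,def@10.33.33.1:26656"'
peers=$(echo "$peers" | sed 's/10\.33\.33\./gaia-node/g')
echo "$peers"

# 2) Key/value override, as applied for skip_timeout_commit and friends.
line='skip_timeout_commit = false'
line=$(echo "$line" | sed 's/^skip_timeout_commit = .*/skip_timeout_commit = true/')
echo "$line"
```

This works because the node index survives the substitution: `10.33.33.0` becomes `gaia-node0`, which is exactly the Service name created for pod 0.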

7
deploy_all.sh Executable file
@@ -0,0 +1,7 @@
#! /bin/sh
./deploy_testnet.sh
./monitoring/exporter/deploy.sh
./monitoring/prometheus/deploy.sh
./monitoring/grafana/deploy.sh

20
deploy_testnet.sh Executable file
@@ -0,0 +1,20 @@
#! /bin/sh
# Necessary so we can run as uid 1001 (would have to inject nss_wrapper
# everywhere and don't really feel like doing so)
oc adm policy add-scc-to-user anyuid -z gaia-ansible
# Build golang-s2i base image
oc process -f openshift/golang-s2i.yaml | oc apply -f -
# Build gaiad image
oc process -f openshift/gaiad.yaml | oc apply -f -
# Deploy
oc process -f openshift/gaia-ansible.yaml | oc apply -f -
oc start-build gaia-ansible -w
oc delete pod,service,route -l template=gaia-nodes; \
oc delete job gaia-deploy; \
oc apply -f openshift/deploy.yaml && \
while ! oc logs -f jobs/gaia-deploy; do sleep 1; done
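The final line of `deploy_testnet.sh` polls `oc logs` until the deploy job's pod exists. The same poll-until-success pattern in isolation, with a `probe` function standing in for `oc logs -f jobs/gaia-deploy` (it fails twice, then succeeds):

```shell
# Keep retrying a command until it succeeds.
tries=0
probe() { tries=$((tries + 1)); [ "$tries" -ge 3 ]; }
until probe; do
  : # the real script sleeps 1s between attempts
done
echo "took $tries attempts"
```

The loop exits on the first successful probe, so transient "pod not yet scheduled" errors are simply retried.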

@@ -0,0 +1,62 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia-alerter
template: gaia-alerter
metadata:
name: gaia-alerter
template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaia-alerter
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/chain_exporter
objects:
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- name: gaia-alerter
image: gaia-exporter:latest
command:
- /opt/app-root/go/bin/alerter
env:
- name: "DB_HOST"
value: "postgres-chain:5432"
- name: "DB_USER"
value: "postgres"
- name: "DB_PW"
value: "mypwd"
- name: "RAVEN_DSN"
value: "<INSERT_RAVEN_DSN>"
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: gaia-exporter:latest
type: ImageChange
- type: ConfigChange

8
monitoring/exporter/deploy.sh Executable file
@@ -0,0 +1,8 @@
#! /bin/sh
cd "$(dirname "$0")"
oc process -f postgres.yml | oc apply -f -
oc process -f lcd.yml | oc apply -f -
oc process -f exporter.yml | oc apply -f -
oc process -f alerter.yml | oc apply -f -
oc process -f net_exporter.yml | oc apply -f -

@@ -0,0 +1,102 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia-exporter
template: gaia-exporter
metadata:
name: gaia-exporter
template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaia-exporter
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/chain_exporter
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: ${NAME}
spec:
# this allows k8s objects to directly reference this image stream
lookupPolicy:
local: true
- apiVersion: v1
kind: BuildConfig
metadata:
name: ${NAME}
spec:
output:
to:
kind: ImageStreamTag
name: ${NAME}:latest
postCommit: {}
runPolicy: Serial
source:
git:
uri: ${GIT_REPO}
type: Git
strategy:
sourceStrategy:
from:
kind: ImageStreamTag
name: golang-s2i:1.10
env:
- name: S2I_GOPKG
value: github.com/certusone/chain_exporter
type: Source
triggers:
- imageChange: {}
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- name: gaia-exporter
image: gaia-exporter:latest
command:
- /opt/app-root/go/bin/chain_exporter
env:
- name: "GAIA_URL"
value: "http://gaia-node1:26657"
- name: "DB_HOST"
value: "postgres-chain:5432"
- name: "DB_USER"
value: "postgres"
- name: "DB_PW"
value: "mypwd"
- name: "LCD_URL"
value: "http://gaia-lcd:1317"
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: gaia-exporter:latest
type: ImageChange
- type: ConfigChange

@@ -0,0 +1,79 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia-lcd
template: gaia-lcd
metadata:
name: gaia-lcd
template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaia-lcd
objects:
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- name: gaia-lcd
image: gaiad:latest
command:
- /opt/app-root/go/bin/gaiacli
- rest-server
- "--node=tcp://gaia-node1:26657"
- "--trust-node"
- "--laddr=tcp://:1317"
ports:
- containerPort: 1317
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: gaiad:latest
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: Service
metadata:
name: ${NAME}
spec:
ports:
- name: lcd
port: 1317
selector:
name: ${NAME}
- apiVersion: v1
kind: Route
metadata:
name: gaia-lcd
spec:
port:
targetPort: lcd
tls:
termination: edge
to:
kind: Service
name: gaia-lcd
weight: 100
wildcardPolicy: None

@@ -0,0 +1,64 @@
---
apiVersion: v1
kind: Template
labels:
app: net-exporter
template: net-exporter
metadata:
name: net-exporter
template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: net-exporter
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/chain_exporter
objects:
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- name: net-exporter
image: gaia-exporter:latest
command:
- /opt/app-root/go/bin/net_exporter
env:
- name: "GAIA_URLS"
value: "http://gaia-node0:26657,http://gaia-node1:26657,http://gaia-node2:26657,http://gaia-node3:26657"
- name: "DB_HOST"
value: "postgres-chain:5432"
- name: "DB_USER"
value: "postgres"
- name: "DB_PW"
value: "mypwd"
- name: "PERIOD"
value: "10"
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: gaia-exporter:latest
type: ImageChange
- type: ConfigChange

@@ -0,0 +1,82 @@
---
apiVersion: v1
kind: Template
labels:
app: postgres-chain
template: postgres-chain
parameters:
- displayName: Name
name: NAME
required: true
value: postgres-chain
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: postgres
spec:
lookupPolicy:
local: true
tags:
- from:
kind: DockerImage
name: centos/postgresql-96-centos7:latest
name: latest
referencePolicy:
type: Source
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- image: ' '
imagePullPolicy: IfNotPresent
name: ${NAME}
ports:
- containerPort: 5432
volumeMounts:
- mountPath: /var/lib/pgsql/data
name: ${NAME}-data
env:
- name: POSTGRESQL_ADMIN_PASSWORD
value: "mypwd"
volumes:
- name: ${NAME}-data
emptyDir: {}
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: postgres:latest
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: Service
metadata:
name: ${NAME}
spec:
ports:
- name: postgres
port: 5432
selector:
name: ${NAME}

@@ -0,0 +1,11 @@
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 3 #how often Grafana will scan for changed dashboards
options:
path: /var/lib/grafana/dashboards

@@ -0,0 +1,652 @@
{
"__inputs": [
{
"name": "DS_MAIN",
"label": "main",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "5.3.0-pre1"
},
{
"type": "panel",
"id": "graph",
"name": "Graph",
"version": "5.0.0"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "5.0.0"
},
{
"type": "panel",
"id": "singlestat",
"name": "Singlestat",
"version": "5.0.0"
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": 5345,
"graphTooltip": 0,
"id": null,
"iteration": 1537853985914,
"links": [],
"panels": [
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 15,
"panels": [],
"repeat": "targets",
"title": "$targets UP/DOWN Status",
"type": "row"
},
{
"cacheTimeout": null,
"colorBackground": true,
"colorValue": false,
"colors": [
"#d44a3a",
"rgba(237, 129, 40, 0.89)",
"#299c46"
],
"datasource": "main",
"format": "none",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 2,
"w": 24,
"x": 0,
"y": 1
},
"id": 2,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"minSpan": 3,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"repeat": null,
"repeatDirection": "h",
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "probe_success{instance=~\"$targets\"}",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": "1,1",
"title": "$targets",
"type": "singlestat",
"valueFontSize": "80%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
},
{
"op": "=",
"text": "UP",
"value": "1"
},
{
"op": "=",
"text": "DOWN",
"value": "0"
}
],
"valueName": "current"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "main",
"fill": 1,
"gridPos": {
"h": 6,
"w": 12,
"x": 0,
"y": 3
},
"id": 17,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "probe_duration_seconds{instance=~\"$targets\"}",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"legendFormat": "seconds",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Probe Duration",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "main",
"fill": 1,
"gridPos": {
"h": 6,
"w": 12,
"x": 12,
"y": 3
},
"id": 21,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "probe_dns_lookup_time_seconds{instance=~\"$targets\"}",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"legendFormat": "seconds",
"refId": "A"
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "DNS Lookup",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "s",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "main",
"format": "s",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 2,
"w": 12,
"x": 0,
"y": 9
},
"id": 23,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "avg(probe_duration_seconds{instance=~\"$targets\"})",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": "",
"title": "Average Probe Duration",
"type": "singlestat",
"valueFontSize": "50%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
},
{
"cacheTimeout": null,
"colorBackground": false,
"colorValue": false,
"colors": [
"#299c46",
"rgba(237, 129, 40, 0.89)",
"#d44a3a"
],
"datasource": "main",
"format": "s",
"gauge": {
"maxValue": 100,
"minValue": 0,
"show": false,
"thresholdLabels": false,
"thresholdMarkers": true
},
"gridPos": {
"h": 2,
"w": 12,
"x": 12,
"y": 9
},
"id": 24,
"interval": null,
"links": [],
"mappingType": 1,
"mappingTypes": [
{
"name": "value to text",
"value": 1
},
{
"name": "range to text",
"value": 2
}
],
"maxDataPoints": 100,
"nullPointMode": "connected",
"nullText": null,
"postfix": "",
"postfixFontSize": "50%",
"prefix": "",
"prefixFontSize": "50%",
"rangeMaps": [
{
"from": "null",
"text": "N/A",
"to": "null"
}
],
"sparkline": {
"fillColor": "rgba(31, 118, 189, 0.18)",
"full": false,
"lineColor": "rgb(31, 120, 193)",
"show": false
},
"tableColumn": "",
"targets": [
{
"expr": "avg(probe_dns_lookup_time_seconds{instance=~\"$targets\"})",
"format": "time_series",
"interval": "$interval",
"intervalFactor": 1,
"refId": "A"
}
],
"thresholds": "",
"title": "Average DNS Lookup",
"type": "singlestat",
"valueFontSize": "50%",
"valueMaps": [
{
"op": "=",
"text": "N/A",
"value": "null"
}
],
"valueName": "current"
}
],
"refresh": "1m",
"schemaVersion": 16,
"style": "dark",
"tags": [
"blackbox",
"prometheus"
],
"templating": {
"list": [
{
"auto": true,
"auto_count": 10,
"auto_min": "10s",
"current": {
"text": "auto",
"value": "$__auto_interval_interval"
},
"hide": 0,
"label": "Interval",
"name": "interval",
"options": [
{
"selected": true,
"text": "auto",
"value": "$__auto_interval_interval"
},
{
"selected": false,
"text": "5s",
"value": "5s"
},
{
"selected": false,
"text": "10s",
"value": "10s"
},
{
"selected": false,
"text": "30s",
"value": "30s"
},
{
"selected": false,
"text": "1m",
"value": "1m"
},
{
"selected": false,
"text": "10m",
"value": "10m"
},
{
"selected": false,
"text": "30m",
"value": "30m"
},
{
"selected": false,
"text": "1h",
"value": "1h"
},
{
"selected": false,
"text": "6h",
"value": "6h"
},
{
"selected": false,
"text": "12h",
"value": "12h"
},
{
"selected": false,
"text": "1d",
"value": "1d"
},
{
"selected": false,
"text": "7d",
"value": "7d"
},
{
"selected": false,
"text": "14d",
"value": "14d"
},
{
"selected": false,
"text": "30d",
"value": "30d"
}
],
"query": "5s,10s,30s,1m,10m,30m,1h,6h,12h,1d,7d,14d,30d",
"refresh": 2,
"skipUrlSync": false,
"type": "interval"
},
{
"allValue": null,
"current": {},
"datasource": "main",
"hide": 0,
"includeAll": true,
"label": null,
"multi": true,
"name": "targets",
"options": [],
"query": "label_values(probe_success, instance)",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Blackbox Exporter Overview",
"uid": "xtkCtBkiz",
"version": 2
}

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,489 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": "-- Grafana --",
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"id": 5,
"iteration": 1537897365846,
"links": [],
"panels": [
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "postgres",
"fill": 1,
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 0
},
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "SELECT timestamp as time,COUNT(node) FROM peer_infos WHERE is_outbound = TRUE GROUP BY timestamp ORDER BY timestamp DESC",
"format": "time_series",
"group": [
{
"params": [
"$__interval",
"none"
],
"type": "time"
}
],
"intervalFactor": 1,
"metricColumn": "none",
"rawQuery": true,
"rawSql": "SELECT\n $__timeGroupAlias(\"timestamp\",$__interval),\n count(id) AS \"outbound\"\nFROM peer_infos\nWHERE\n $__timeFilter(\"timestamp\")\n and is_outbound = TRUE\n and node = '$node'\nGROUP BY timestamp\nORDER BY 1",
"refId": "A",
"select": [
[
{
"params": [
"id"
],
"type": "column"
},
{
"params": [
"count"
],
"type": "aggregate"
},
{
"params": [
"inbound"
],
"type": "alias"
}
]
],
"table": "peer_infos",
"timeColumn": "\"timestamp\"",
"timeColumnType": "timestamptz",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
},
{
"datatype": "bool",
"name": "",
"params": [
"is_outbound",
"=",
"FALSE"
],
"type": "expression"
},
{
"datatype": "text",
"name": "",
"params": [
"node",
"=",
"'$node'"
],
"type": "expression"
}
]
},
{
"format": "time_series",
"group": [],
"metricColumn": "none",
"rawQuery": true,
"rawSql": "SELECT\n $__timeGroupAlias(\"timestamp\",$__interval),\n count(id) AS \"inbound\"\nFROM peer_infos\nWHERE\n $__timeFilter(\"timestamp\")\n and is_outbound IS NULL\n and node = '$node'\nGROUP BY timestamp\nORDER BY 1",
"refId": "B",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Peer connections",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"decimals": 0,
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
},
{
"columns": [],
"datasource": "postgres",
"fontSize": "100%",
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 0
},
"id": 4,
"links": [],
"pageSize": null,
"scroll": true,
"showHeader": true,
"sort": {
"col": 0,
"desc": true
},
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "date"
},
{
"alias": "Outbound",
"colorMode": null,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"mappingType": 1,
"pattern": "is_outbound",
"thresholds": [],
"type": "string",
"unit": "short",
"valueMaps": [
{
"text": "Yes",
"value": "1"
},
{
"text": "No",
"value": ""
}
]
},
{
"alias": "",
"colorMode": null,
"colors": [
"rgba(245, 54, 54, 0.9)",
"rgba(237, 129, 40, 0.89)",
"rgba(50, 172, 45, 0.97)"
],
"decimals": 2,
"pattern": "/.*/",
"thresholds": [],
"type": "number",
"unit": "short"
}
],
"targets": [
{
"expr": "",
"format": "table",
"group": [],
"intervalFactor": 1,
"metricColumn": "none",
"rawQuery": true,
"rawSql": "SELECT\n moniker,listen_addr as \"Address\",is_outbound,version\nFROM peer_infos\nWHERE\n timestamp = (select max(timestamp) from peer_infos as f where f.node = '$node')\n and node = '$node'\nGROUP BY is_outbound,listen_addr,version,moniker",
"refId": "A",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"title": "Current Peers",
"transform": "table",
"type": "table"
},
{
"aliasColors": {},
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "postgres",
"fill": 1,
"gridPos": {
"h": 10,
"w": 24,
"x": 0,
"y": 9
},
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [],
"nullPointMode": "null",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [],
"spaceLength": 10,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "SELECT\n $__timeGroupAlias(\"timestamp\",$__interval),\n count(id) AS \"outbound\"\nFROM peer_infos\nWHERE\n $__timeFilter(\"timestamp\")\n and is_outbound = TRUE\n and node = '$node'\nGROUP BY timestamp\nORDER BY 1",
"format": "time_series",
"group": [],
"intervalFactor": 1,
"metricColumn": "none",
"rawQuery": true,
"rawSql": "SELECT\n $__timeGroupAlias(\"timestamp\",$__interval),\n listen_addr,\n (recv_data->>'CurRate')::int as recv,\n (send_data->>'CurRate')::int as send\nFROM peer_infos\nWHERE\n $__timeFilter(\"timestamp\")\n and node = '$node'\nGROUP BY timestamp,recv_data,send_data,listen_addr\nORDER BY 1",
"refId": "A",
"select": [
[
{
"params": [
"value"
],
"type": "column"
}
]
],
"timeColumn": "time",
"where": [
{
"name": "$__timeFilter",
"params": [],
"type": "macro"
}
]
}
],
"thresholds": [],
"timeFrom": null,
"timeShift": null,
"title": "Node Traffic",
"tooltip": {
"shared": true,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": []
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": true
}
],
"yaxis": {
"align": false,
"alignLevel": null
}
}
],
"schemaVersion": 16,
"style": "dark",
"tags": [],
"templating": {
"list": [
{
"allValue": null,
"current": {
"tags": [],
"text": "gaia-node0:26657",
"value": "gaia-node0:26657"
},
"datasource": "postgres",
"hide": 0,
"includeAll": false,
"label": null,
"multi": false,
"name": "node",
"options": [
{
"selected": false,
"text": "gaia-node2:26657",
"value": "gaia-node2:26657"
},
{
"selected": false,
"text": "gaia-node1:26657",
"value": "gaia-node1:26657"
},
{
"selected": false,
"text": "gaia-node3:26657",
"value": "gaia-node3:26657"
},
{
"selected": true,
"text": "gaia-node0:26657",
"value": "gaia-node0:26657"
}
],
"query": "SELECT DISTINCT node FROM peer_infos",
"refresh": 0,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"tagValuesQuery": "",
"tags": [],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-5m",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "P2P",
"uid": "t4voLF0mk",
"version": 4
}

@@ -0,0 +1,18 @@
# config file version
apiVersion: 1
# list of datasources to insert/update depending
# what's available in the database
datasources:
# <string, required> name of the datasource. Required
- name: main
# <string, required> datasource type. Required
type: prometheus
# <string, required> access mode. proxy or direct (Server or Browser in the UI). Required
access: proxy
# <int> org id. will default to orgId 1 if not specified
orgId: 1
# <string> url
url: http://prom:9090
isDefault: true
# <bool> allow users to edit datasources from the UI.
editable: true

@@ -0,0 +1,25 @@
# config file version
apiVersion: 1
# list of datasources to insert/update depending
# what's available in the database
datasources:
# <string, required> name of the datasource. Required
- name: postgres
# <string, required> datasource type. Required
type: postgres
# <string, required> access mode. proxy or direct (Server or Browser in the UI). Required
access: proxy
# <int> org id. will default to orgId 1 if not specified
orgId: 1
# <string> url
url: postgres-chain:5432
jsonData:
sslmode: "disable"
secureJsonData:
password: "mypwd"
user: postgres
database: postgres
version: 1
# <bool> allow users to edit datasources from the UI.
editable: true

monitoring/grafana/deploy.sh Executable file

@@ -0,0 +1,11 @@
#!/bin/sh
cd "$(dirname "$0")"
oc delete configmap grafana-datasources
oc delete configmap grafana-dashboards
oc delete configmap grafana-dashboards-prov
oc create configmap grafana-datasources --from-file=datastores/
oc create configmap grafana-dashboards --from-file=dashboards/
oc create configmap grafana-dashboards-prov --from-file=dashboards-prov/
oc process -f grafana.yml | oc apply -f -


@@ -0,0 +1,545 @@
---
kind: Template
apiVersion: v1
metadata:
name: grafana
annotations:
"openshift.io/display-name": Grafana
description: |
Grafana server with patched Prometheus datasource.
iconClass: fa fa-cogs
tags: "metrics,monitoring,grafana,prometheus"
parameters:
- description: The location of the grafana image
name: IMAGE_GRAFANA
value: docker.io/grafana/grafana:master
- description: The location of the proxy image
name: IMAGE_PROXY
value: openshift/oauth-proxy:v1.0.0
- description: External URL for the grafana route
name: ROUTE_URL
value: ""
- description: The session secret for the proxy
name: SESSION_SECRET
generate: expression
from: "[a-zA-Z0-9]{43}"
objects:
- apiVersion: v1
kind: ServiceAccount
metadata:
name: grafana
annotations:
serviceaccounts.openshift.io/oauth-redirectreference.primary: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"grafana"}}'
- apiVersion: authorization.openshift.io/v1
kind: ClusterRoleBinding
metadata:
name: grafana-cluster-reader
roleRef:
name: cluster-reader
subjects:
- kind: ServiceAccount
name: grafana
- apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: grafana
spec:
host: "${ROUTE_URL}"
to:
name: grafana
tls:
termination: Reencrypt
- apiVersion: v1
kind: Service
metadata:
name: grafana
annotations:
prometheus.io/scrape: "true"
prometheus.io/scheme: https
service.alpha.openshift.io/serving-cert-secret-name: grafana-tls
labels:
metrics-infra: grafana
name: grafana
spec:
ports:
- name: grafana
port: 443
protocol: TCP
targetPort: 8443
selector:
app: grafana
- apiVersion: v1
kind: Secret
metadata:
name: grafana-proxy
stringData:
session_secret: "${SESSION_SECRET}="
# Deploy Grafana behind an oauth proxy
- apiVersion: extensions/v1beta1
kind: Deployment
metadata:
labels:
app: grafana
name: grafana
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
name: grafana
spec:
serviceAccountName: grafana
containers:
- name: oauth-proxy
image: ${IMAGE_PROXY}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8443
name: web
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
args:
- -https-address=:8443
- -http-address=
- -email-domain=*
- -client-id=system:serviceaccount:$(NAMESPACE):grafana
- -upstream=http://localhost:3000
- -provider=openshift
# - '-openshift-delegate-urls={"/api/datasources": {"resource": "namespace", "verb": "get", "resourceName": "grafana", "namespace": "${NAMESPACE}"}}'
- '-openshift-sar={"namespace": "$(NAMESPACE)", "verb": "list", "resource": "services"}'
- -tls-cert=/etc/tls/private/tls.crt
- -tls-key=/etc/tls/private/tls.key
- -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
- -cookie-secret-file=/etc/proxy/secrets/session_secret
- -skip-auth-regex=^/metrics,/api/datasources,/api/dashboards
volumeMounts:
- mountPath: /etc/tls/private
name: grafana-tls
- mountPath: /etc/proxy/secrets
name: secrets
- name: grafana
image: ${IMAGE_GRAFANA}
ports:
- name: grafana-http
containerPort: 3000
volumeMounts:
- mountPath: "/root/go/src/github.com/grafana/grafana/data"
name: grafana-data
- mountPath: "/usr/share/grafana/conf"
name: grafanaconfig
- mountPath: "/usr/share/grafana/datasources"
name: grafanadatasources
- mountPath: "/usr/share/grafana/dashboards"
name: grafanadashboards-prov
- mountPath: "/var/lib/grafana/dashboards"
name: grafanadashboards
- mountPath: /etc/tls/private
name: grafana-tls
- mountPath: /etc/proxy/secrets
name: secrets
volumes:
- name: grafanaconfig
configMap:
name: grafana-config
- name: grafanadatasources
configMap:
name: grafana-datasources
- name: grafanadashboards
configMap:
name: grafana-dashboards
- name: grafanadashboards-prov
configMap:
name: grafana-dashboards-prov
- name: secrets
secret:
secretName: grafana-proxy
- name: grafana-tls
secret:
secretName: grafana-tls
- emptyDir: {}
name: grafana-data
- apiVersion: v1
kind: ConfigMap
metadata:
name: grafana-config
data:
defaults.ini: |-
##################### Grafana Configuration Defaults #####################
#
# Do not modify this file in grafana installs
#
# possible values : production, development
app_mode = production
# instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
instance_name = ${HOSTNAME}
#################################### Paths ###############################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
#
data = data
#
# Directory where grafana can store logs
#
logs = data/log
#
# Directory where grafana will automatically scan and look for plugins
#
plugins = data/plugins
#################################### Server ##############################
[server]
# Protocol (http, https, socket)
protocol = http
# The ip address to bind to, empty will bind to all interfaces
http_addr =
# The http port to use
http_port = 3000
# The public facing domain name used to access grafana from a browser
domain = localhost
# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
enforce_domain = false
# The full public facing url
root_url = %(protocol)s://%(domain)s:%(http_port)s/
# Log web requests
router_logging = false
# the path relative working path
static_root_path = public
# enable gzip
enable_gzip = false
# https certs & key file
cert_file = /etc/tls/private/tls.crt
cert_key = /etc/tls/private/tls.key
# Unix socket path
socket = /tmp/grafana.sock
#################################### Database ############################
[database]
# You can configure the database connection by specifying type, host, name, user and password
# as separate properties or as one string using the url property.
# Either "mysql", "postgres" or "sqlite3", it's your choice
type = sqlite3
host = 127.0.0.1:3306
name = grafana
user = root
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password =
# Use either URL or the previous fields to configure the database
# Example: mysql://user:secret@host:port/database
url =
# Max idle conn setting default is 2
max_idle_conn = 2
# Max conn setting default is 0 (means not set)
max_open_conn =
# For "postgres", use either "disable", "require" or "verify-full"
# For "mysql", use either "true", "false", or "skip-verify".
ssl_mode = disable
ca_cert_path =
client_key_path =
client_cert_path =
server_cert_name =
# For "sqlite3" only, path relative to data_path setting
path = grafana.db
#################################### Session #############################
[session]
# Either "memory", "file", "redis", "mysql", "postgres", "memcache", default is "file"
provider = file
# Provider config options
# memory: does not have any config yet
# file: session dir path, is relative to grafana data_path
# redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
# postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
# mysql: go-sql-driver/mysql dsn config string, examples:
# `user:password@tcp(127.0.0.1:3306)/database_name`
# `user:password@unix(/var/run/mysqld/mysqld.sock)/database_name`
# memcache: 127.0.0.1:11211
provider_config = sessions
# Session cookie name
cookie_name = grafana_sess
# If you use session in https only, default is false
cookie_secure = false
# Session life time, default is 86400
session_life_time = 86400
gc_interval_time = 86400
#################################### Data proxy ###########################
[dataproxy]
# This enables data proxy logging, default is false
logging = false
#################################### Analytics ###########################
[analytics]
# Server reporting, sends usage counters to stats.grafana.org every 24 hours.
# No ip addresses are being tracked, only simple counters to track
# running instances, dashboard and error counts. It is very helpful to us.
# Change this option to false to disable reporting.
reporting_enabled = true
# Set to false to disable all checks to https://grafana.com
# for new versions (grafana itself and plugins), check is used
# in some UI views to notify that grafana or plugin update exists
# This option does not cause any auto updates, nor send any information
# only a GET request to https://grafana.com to get latest versions
check_for_updates = true
# Google Analytics universal tracking code, only enabled if you specify an id here
google_analytics_ua_id =
# Google Tag Manager ID, only enabled if you specify an id here
google_tag_manager_id =
#################################### Security ############################
[security]
# default admin user, created on startup
admin_user = admin
# default admin password, can be changed before first start of grafana, or in profile settings
admin_password = admin
# used for signing
secret_key = SW2YcwTIb9zpOOhoPsMm
# Auto-login remember days
login_remember_days = 7
cookie_username = grafana_user
cookie_remember_name = grafana_remember
# disable gravatar profile images
disable_gravatar = false
# data source proxy whitelist (ip_or_domain:port separated by spaces)
data_source_proxy_whitelist =
[snapshots]
# snapshot sharing options
external_enabled = false
external_snapshot_url = https://snapshots-origin.raintank.io
external_snapshot_name = Publish to snapshot.raintank.io
# remove expired snapshot
snapshot_remove_expired = true
# remove snapshots after 90 days
snapshot_TTL_days = 90
#################################### Users ####################################
[users]
# disable user signup / registration
allow_sign_up = true
# Allow non admin users to create organizations
allow_org_create = true
# Set to true to automatically assign new users to the default organization (id 1)
auto_assign_org = true
# Default role new users will be automatically assigned (if auto_assign_org above is set to true)
auto_assign_org_role = Admin
# Require email validation before sign up completes
verify_email_enabled = false
# Background text for the user field on the login page
login_hint = email or username
# Default UI theme ("dark" or "light")
default_theme = dark
# External user management
external_manage_link_url =
external_manage_link_name =
external_manage_info =
[auth]
# Set to true to disable (hide) the login form, useful if you use OAuth
disable_login_form = true
# Set to true to disable the signout link in the side menu. useful if you use auth.proxy
disable_signout_menu = true
#################################### Anonymous Auth ######################
[auth.anonymous]
# enable anonymous access
enabled = true
# specify organization name that should be used for unauthenticated users
org_name = Main Org.
# specify role for unauthenticated users
org_role = Admin
#################################### Github Auth #########################
[auth.github]
enabled = false
allow_sign_up = true
client_id = some_id
client_secret = some_secret
scopes = user:email
auth_url = https://github.com/login/oauth/authorize
token_url = https://github.com/login/oauth/access_token
api_url = https://api.github.com/user
team_ids =
allowed_organizations =
#################################### Google Auth #########################
[auth.google]
enabled = false
allow_sign_up = true
client_id = some_client_id
client_secret = some_client_secret
scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
auth_url = https://accounts.google.com/o/oauth2/auth
token_url = https://accounts.google.com/o/oauth2/token
api_url = https://www.googleapis.com/oauth2/v1/userinfo
allowed_domains =
hosted_domain =
#################################### Grafana.com Auth ####################
# legacy key names (so they work in env variables)
[auth.grafananet]
enabled = false
allow_sign_up = true
client_id = some_id
client_secret = some_secret
scopes = user:email
allowed_organizations =
[auth.grafana_com]
enabled = false
allow_sign_up = true
client_id = some_id
client_secret = some_secret
scopes = user:email
allowed_organizations =
#################################### Generic OAuth #######################
[auth.generic_oauth]
name = OAuth
enabled = false
allow_sign_up = true
client_id = some_id
client_secret = some_secret
scopes = user:email
auth_url =
token_url =
api_url =
team_ids =
allowed_organizations =
#################################### Basic Auth ##########################
[auth.basic]
enabled = false
#################################### Auth Proxy ##########################
[auth.proxy]
enabled = true
header_name = X-WEBAUTH-USER
header_property = username
auto_sign_up = true
ldap_sync_ttl = 60
whitelist =
#################################### Auth LDAP ###########################
[auth.ldap]
enabled = false
config_file = /etc/grafana/ldap.toml
allow_sign_up = true
#################################### SMTP / Emailing #####################
[smtp]
enabled = false
host = localhost:25
user =
# If the password contains # or ; you have to wrap it with triple quotes. Ex """#password;"""
password =
cert_file =
key_file =
skip_verify = false
from_address = admin@grafana.localhost
from_name = Grafana
ehlo_identity =
[emails]
welcome_email_on_sign_up = false
templates_pattern = emails/*.html
#################################### Logging ##########################
[log]
# Either "console", "file", "syslog". Default is console and file
# Use space to separate multiple modes, e.g. "console file"
mode = console file
# Either "debug", "info", "warn", "error", "critical", default is "info"
level = error
# optional settings to set different levels for specific loggers. Ex filters = sqlstore:debug
filters =
# For "console" mode only
[log.console]
level =
# log line format, valid options are text, console and json
format = console
# For "file" mode only
[log.file]
level =
# log line format, valid options are text, console and json
format = text
# Enable automated log rotation (controls the following options), default is true
log_rotate = true
# Max line number of single file, default is 1000000
max_lines = 1000000
# Max size shift of single file, default is 28 means 1 << 28, 256MB
max_size_shift = 28
# Segment log daily, default is true
daily_rotate = true
# Expired days of log file(delete after max days), default is 7
max_days = 7
[log.syslog]
level =
# log line format, valid options are text, console and json
format = text
# Syslog network type and address. This can be udp, tcp, or unix. If left blank, the default unix endpoints will be used.
network =
address =
# Syslog facility. user, daemon and local0 through local7 are valid.
facility =
# Syslog tag. By default, the process' argv[0] is used.
tag =
#################################### AMQP Event Publisher ################
[event_publisher]
enabled = false
rabbitmq_url = amqp://localhost/
exchange = grafana_events
#################################### Dashboard JSON files ################
[dashboards.json]
enabled = false
path = /var/lib/grafana/dashboards
#################################### Usage Quotas ########################
[quota]
enabled = false
#### set quotas to -1 to make unlimited. ####
# limit number of users per Org.
org_user = 10
# limit number of dashboards per Org.
org_dashboard = 100
# limit number of data_sources per Org.
org_data_source = 10
# limit number of api_keys per Org.
org_api_key = 10
# limit number of orgs a user can create.
user_org = 10
# Global limit of users.
global_user = -1
# global limit of orgs.
global_org = -1
# global limit of dashboards
global_dashboard = -1
# global limit of api_keys
global_api_key = -1
# global limit on number of logged in users.
global_session = -1
#################################### Alerting ############################
[alerting]
# Enable the alerting engine & UI features
enabled = true
# Makes it possible to turn off alert rule execution but alerting UI is visible
execute_alerts = true
#################################### Internal Grafana Metrics ############
# Metrics available at HTTP API Url /api/metrics
[metrics]
enabled = true
interval_seconds = 10
# Send internal Grafana metrics to graphite
[metrics.graphite]
# Enable by setting the address setting (ex localhost:2003)
address =
prefix = prod.grafana.%(instance_name)s.
[grafana_net]
url = https://grafana.com
[grafana_com]
url = https://grafana.com
#################################### Distributed tracing ############
[tracing.jaeger]
# jaeger destination (ex localhost:6831)
address =
# tag that will always be included when creating new spans. ex (tag1:value1,tag2:value2)
always_included_tag =
# Type specifies the type of the sampler: const, probabilistic, rateLimiting, or remote
sampler_type = const
# jaeger samplerconfig param
# for "const" sampler, 0 or 1 for always false/true respectively
# for "probabilistic" sampler, a probability between 0 and 1
# for "rateLimiting" sampler, the number of spans per second
# for "remote" sampler, param is the same as for "probabilistic"
# and indicates the initial sampling rate before the actual one
# is received from the mothership
sampler_param = 1
#################################### External Image Storage ##############

monitoring/logging/deploy.sh Executable file

@@ -0,0 +1,16 @@
#!/bin/sh
cd "$(dirname "$0")"
oc process -f kafka.yml | oc apply -f -
oc apply -f fluentd.yml -n kube-system
oc adm policy add-scc-to-user privileged -z fluentd -n kube-system
oc patch ds fluentd -n kube-system -p "spec:
template:
spec:
containers:
- name: fluentd
securityContext:
privileged: true"
oc delete pod --namespace kube-system -l "k8s-app = fluentd-logging"
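The inline `oc patch` above applies a strategic-merge patch: Kubernetes merges entries in the `containers` list by their `name` key, so only the fluentd container gains the privileged securityContext and all other DaemonSet fields are left untouched. A minimal sketch of the same patch expressed as JSON:

```python
import json

# Strategic-merge patch equivalent to the inline YAML passed to `oc patch`.
# List entries under `containers` are merged by their `name` key, so only
# the fluentd container's securityContext is changed.
patch = {
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "fluentd", "securityContext": {"privileged": True}}
                ]
            }
        }
    }
}

patch_json = json.dumps(patch)
print(patch_json)
```

`oc patch` accepts this JSON form directly as the `-p` argument, since JSON is a subset of YAML.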


@@ -0,0 +1,86 @@
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: fluentd
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRole
metadata:
name: fluentd
namespace: kube-system
rules:
- apiGroups:
- ""
resources:
- pods
- namespaces
verbs:
- get
- list
- watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
name: fluentd
roleRef:
kind: ClusterRole
name: fluentd
apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
name: fluentd
namespace: kube-system
---
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: fluentd
namespace: kube-system
labels:
k8s-app: fluentd-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
template:
metadata:
labels:
k8s-app: fluentd-logging
version: v1
kubernetes.io/cluster-service: "true"
spec:
serviceAccount: fluentd
serviceAccountName: fluentd
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
containers:
- name: fluentd
image: fluent/fluentd-kubernetes-daemonset:v1.2.5-debian-kafka
env:
- name: FLUENT_KAFKA_BROKERS
value: "kafka.testmon.svc:9092"
resources:
limits:
memory: 200Mi
requests:
cpu: 100m
memory: 200Mi
volumeMounts:
- name: varlog
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers


@@ -0,0 +1,143 @@
---
apiVersion: v1
kind: Template
labels:
app: kafka
template: kafka
parameters:
- displayName: Name
name: NAME
required: true
value: kafka
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: kafka
spec:
lookupPolicy:
local: true
tags:
- from:
kind: DockerImage
name: wurstmeister/kafka:latest
name: latest
referencePolicy:
type: Source
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}
spec:
replicas: 1
selector:
name: ${NAME}
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}
spec:
containers:
- image: ' '
imagePullPolicy: IfNotPresent
name: ${NAME}
ports:
- containerPort: 9092
env:
- name: KAFKA_ADVERTISED_HOST_NAME
value: "kafka"
- name: KAFKA_CREATE_TOPICS
value: "topic:1:1"
- name: KAFKA_ZOOKEEPER_CONNECT
value: "${NAME}-zk:2181"
- name: KAFKA_PORT
value: "9092"
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}
from:
kind: ImageStreamTag
name: kafka:latest
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: Service
metadata:
name: ${NAME}
spec:
ports:
- name: kafka
port: 9092
selector:
name: ${NAME}
- apiVersion: v1
kind: ImageStream
metadata:
name: zookeeper
spec:
lookupPolicy:
local: true
tags:
- from:
kind: DockerImage
name: wurstmeister/zookeeper:latest
name: latest
referencePolicy:
type: Source
- apiVersion: v1
kind: DeploymentConfig
metadata:
annotations:
template.alpha.openshift.io/wait-for-ready: "true"
name: ${NAME}-zk
spec:
replicas: 1
selector:
name: ${NAME}-zk
strategy:
type: Recreate
template:
metadata:
labels:
name: ${NAME}-zk
spec:
containers:
- image: ' '
imagePullPolicy: IfNotPresent
name: ${NAME}-zk
ports:
- containerPort: 2181
triggers:
- imageChangeParams:
automatic: true
containerNames:
- ${NAME}-zk
from:
kind: ImageStreamTag
name: zookeeper:latest
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: Service
metadata:
name: ${NAME}-zk
spec:
ports:
- name: zookeeper
port: 2181
selector:
name: ${NAME}-zk


@@ -0,0 +1,65 @@
# blackbox-exporter is an optional component that runs network probes (ICMP, HTTP, TCP)
# from the nodes in the cluster. This group of resources requires the 'hostaccess' level of
# privilege, which should only be granted to namespaces that administrators can access.
apiVersion: v1
kind: List
items:
- apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus-blackbox-exporter
# You must grant hostaccess via: oc adm policy add-scc-to-user hostaccess -z prometheus-blackbox-exporter
# in order for the blackbox-exporter to access the host network for its probes
- apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: prometheus-blackbox-exporter
name: prometheus-blackbox-exporter
spec:
clusterIP: None
ports:
- name: scrape
port: 9115
protocol: TCP
targetPort: 9115
selector:
app: prometheus-blackbox-exporter
- apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: prometheus-blackbox-exporter
labels:
app: prometheus-blackbox-exporter
role: monitoring
spec:
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app: prometheus-blackbox-exporter
role: monitoring
name: prometheus-exporter
spec:
hostNetwork: true
serviceAccountName: prometheus-blackbox-exporter
containers:
- image: prom/blackbox-exporter:v0.12.0
name: blackbox-exporter
securityContext:
capabilities:
add:
- NET_RAW
ports:
- containerPort: 9115
name: scrape
resources:
requests:
memory: 30Mi
cpu: 100m
limits:
memory: 50Mi
cpu: 200m

monitoring/prometheus/deploy.sh Executable file

@@ -0,0 +1,12 @@
#!/bin/bash
cd "$(dirname "$0")"
# Start blackbox exporter
oc apply -f blackbox_exporter.yml -n kube-system
oc adm policy add-scc-to-user -z prometheus-blackbox-exporter -n kube-system privileged hostaccess
# Start node exporter
oc apply -f node_exporter.yml -n kube-system
oc adm policy add-scc-to-user -z prometheus-node-exporter -n kube-system hostaccess
oc process -f prometheus.yml | oc apply -f -


@@ -0,0 +1,79 @@
# node-exporter is an optional component that collects host level metrics from the nodes
# in the cluster. This group of resources will require the 'hostaccess' level of privilege, which
# should only be granted to namespaces that administrators can access.
apiVersion: v1
kind: List
items:
- apiVersion: v1
kind: ServiceAccount
metadata:
name: prometheus-node-exporter
# You must grant hostaccess via: oc adm policy add-scc-to-user hostaccess -z prometheus-node-exporter
# in order for the node-exporter to access the host network and mount /proc and /sys from the host
- apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true"
labels:
app: prometheus-node-exporter
name: prometheus-node-exporter
spec:
clusterIP: None
ports:
- name: scrape
port: 9100
protocol: TCP
targetPort: 9100
selector:
app: prometheus-node-exporter
- apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
name: prometheus-node-exporter
labels:
app: prometheus-node-exporter
role: monitoring
spec:
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app: prometheus-node-exporter
role: monitoring
name: prometheus-exporter
spec:
serviceAccountName: prometheus-node-exporter
hostNetwork: true
hostPID: true
containers:
- image: openshift/prometheus-node-exporter:v0.16.0
args:
- "--path.procfs=/host/proc"
- "--path.sysfs=/host/sys"
name: node-exporter
ports:
- containerPort: 9100
name: scrape
resources:
requests:
memory: 30Mi
cpu: 100m
limits:
memory: 50Mi
cpu: 200m
volumeMounts:
- name: proc
readOnly: true
mountPath: /host/proc
- name: sys
readOnly: true
mountPath: /host/sys
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys


@@ -0,0 +1,446 @@
apiVersion: template.openshift.io/v1
kind: Template
metadata:
name: prometheus
annotations:
"openshift.io/display-name": Prometheus
description: |
A Prometheus deployment that can be customized to monitor components and dispatch alerts. It is secure by default and can be used to monitor arbitrary clients.
iconClass: fa fa-cogs
tags: "monitoring,prometheus,alertmanager,time-series"
parameters:
- description: The location of the proxy image
name: IMAGE_PROXY
value: openshift/oauth-proxy:v1.0.0
- description: The location of the prometheus image
name: IMAGE_PROMETHEUS
value: openshift/prometheus:v2.3.2
- description: The location of the alertmanager image
name: IMAGE_ALERTMANAGER
value: openshift/prometheus-alertmanager:v0.15.1
- description: The location of alert-buffer image
name: IMAGE_ALERT_BUFFER
value: openshift/prometheus-alert-buffer:v0.0.2
- description: The session secret for the proxy
name: SESSION_SECRET
generate: expression
from: "[a-zA-Z0-9]{43}"
objects:
- apiVersion: v1
kind: ServiceAccount
metadata:
name: prom
annotations:
serviceaccounts.openshift.io/oauth-redirectreference.prom: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"prom"}}'
serviceaccounts.openshift.io/oauth-redirectreference.alerts: '{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"prom-alerts"}}'
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: prometheus-data-claim
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
# Create a fully end-to-end TLS connection to the prometheus proxy
- apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: prom
spec:
to:
name: prom
tls:
termination: Reencrypt
insecureEdgeTerminationPolicy: Redirect
- apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: "true"
prometheus.io/scheme: https
service.alpha.openshift.io/serving-cert-secret-name: prom-tls
labels:
name: prom
name: prom
spec:
ports:
- name: prometheus
port: 443
protocol: TCP
targetPort: 8443
- name: prometheusapi
port: 9090
protocol: TCP
targetPort: 9090
selector:
app: prom
- apiVersion: v1
kind: Secret
metadata:
name: prom-proxy
stringData:
session_secret: "${SESSION_SECRET}="
- apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
labels:
app: prom
name: prom
spec:
updateStrategy:
type: RollingUpdate
podManagementPolicy: Parallel
selector:
matchLabels:
app: prom
template:
metadata:
labels:
app: prom
name: prom
spec:
serviceAccountName: prom
containers:
# Deploy Prometheus behind an oauth proxy
- name: prom-proxy
image: ${IMAGE_PROXY}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8443
name: web
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
args:
- -provider=openshift
- -https-address=:8443
- -http-address=
- -email-domain=*
- -upstream=http://localhost:9090
- -client-id=system:serviceaccount:$(NAMESPACE):prom
- -openshift-ca=/etc/pki/tls/cert.pem
- -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
- '-openshift-sar={"resource": "namespaces", "verb": "get", "resourceName": "$(NAMESPACE)", "namespace": "$(NAMESPACE)"}'
- -tls-cert=/etc/tls/private/tls.crt
- -tls-key=/etc/tls/private/tls.key
- -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
- -cookie-secret-file=/etc/proxy/secrets/session_secret
- -skip-auth-regex=^/metrics
volumeMounts:
- mountPath: /etc/tls/private
name: prometheus-tls
- mountPath: /etc/proxy/secrets
name: prometheus-secrets
- mountPath: /prometheus
name: prometheus-data
- name: prometheus
args:
- --storage.tsdb.retention=6h
- --config.file=/etc/prometheus/prometheus.yml
- --web.listen-address=:9090
image: ${IMAGE_PROMETHEUS}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9090
name: api
volumeMounts:
- mountPath: /etc/prometheus
name: prometheus-config
- mountPath: /prometheus
name: prometheus-data
# Deploy alertmanager behind an oauth proxy
# listens on https port 9443 (instead of 8443) to differ from prom-proxy
- name: alerts-proxy
image: ${IMAGE_PROXY}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9443
name: web
env:
- name: NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
args:
- -provider=openshift
- -https-address=:9443
- -http-address=
- -email-domain=*
- -upstream=http://localhost:9093
- -client-id=system:serviceaccount:$(NAMESPACE):prom
- -openshift-ca=/etc/pki/tls/cert.pem
- -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
- '-openshift-sar={"resource": "namespaces", "verb": "get", "resourceName": "$(NAMESPACE)", "namespace": "$(NAMESPACE)"}'
- -tls-cert=/etc/tls/private/tls.crt
- -tls-key=/etc/tls/private/tls.key
- -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
- -cookie-secret-file=/etc/proxy/secrets/session_secret
volumeMounts:
- mountPath: /etc/tls/private
name: alerts-tls
- mountPath: /etc/proxy/secrets
name: alerts-secrets
- name: alertmanager
args:
- --config.file=/etc/alertmanager/alertmanager.yml
image: ${IMAGE_ALERTMANAGER}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 9093
name: web
volumeMounts:
- mountPath: /etc/alertmanager
name: alertmanager-config
- mountPath: /alertmanager
name: alertmanager-data
restartPolicy: Always
volumes:
- name: prometheus-config
configMap:
defaultMode: 420
name: prometheus
- name: prometheus-secrets
secret:
secretName: prom-proxy
- name: prometheus-tls
secret:
secretName: prom-tls
- name: prometheus-data
persistentVolumeClaim:
claimName: prometheus-data-claim
- name: alertmanager-config
configMap:
defaultMode: 420
name: alertmanager
- name: alerts-secrets
secret:
secretName: prom-alerts-proxy
- name: alerts-tls
secret:
secretName: prom-alerts-tls
- name: alertmanager-data
emptyDir: {}
# Create a fully end-to-end TLS connection to the alert proxy
- apiVersion: route.openshift.io/v1
kind: Route
metadata:
name: prom-alerts
spec:
to:
name: prom-alerts
tls:
termination: Reencrypt
insecureEdgeTerminationPolicy: Redirect
- apiVersion: v1
kind: Service
metadata:
annotations:
service.alpha.openshift.io/serving-cert-secret-name: prom-alerts-tls
labels:
name: prom-alerts
name: prom-alerts
spec:
ports:
- name: alerts
port: 443
protocol: TCP
targetPort: 9443
selector:
app: prom
- apiVersion: v1
kind: Secret
metadata:
name: prom-alerts-proxy
stringData:
session_secret: "${SESSION_SECRET}="
- apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus
data:
alerting.rules: |
groups:
- name: node_rules
interval: 30s # defaults to global interval
rules:
- alert: high_cpu_usage_on_node
for: 5m
expr: sum(rate(process_cpu_seconds_total[5m])) by (instance) * 100 > 70
annotations:
summary: "HIGH CPU USAGE WARNING ON '{{ $labels.instance }}'"
severity: "HIGH"
message: "{{ $labels.instance }} is using a LOT of CPU. CPU usage is {{ humanize $value}}%."
- alert: high_memory_usage_on_node
for: 5m
expr: ((node_memory_MemTotal-node_memory_MemAvailable)/node_memory_MemTotal)*100 > 80
annotations:
summary: "HIGH MEMORY USAGE WARNING TASK ON '{{ $labels.instance }}'"
severity: "HIGH"
message: "{{ $labels.instance }} is using a LOT of MEMORY. MEMORY usage is over {{ humanize $value}}%."
- alert: node_running_out_of_disk_space
for: 5m
expr: (node_filesystem_size{mountpoint="/"} - node_filesystem_free{mountpoint="/"}) * 100/ node_filesystem_size{mountpoint="/"} > 70
annotations:
summary: "LOW DISK SPACE WARING: NODE '{{ $labels.instance }}'"
severity: "HIGH"
message: "More than 70% of disk used. Disk usage {{ humanize $value }}%."
- alert: disk_will_fill_in_8_hours
for: 5m
expr: predict_linear(node_filesystem_free{mountpoint="/"}[1h], 8*3600) < 0
annotations:
summary: "DISK SPACE FULL IN 8 HOURS: NODE '{{ $labels.instance }}'"
severity: "HIGH"
message: "{{ $labels.instance }} is writing a lot."
- alert: service_down
for: 1m
expr: up < 1
annotations:
summary: "SERVICE DOWN:'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
            message: "'{{ $labels.job }}' on {{ $labels.instance }} could not be reached by Prometheus for more than 1 minute."
- alert: ping_failed
for: 1m
expr: probe_success == 0
annotations:
summary: "PING FAILED:'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
            message: "{{ $labels.instance }} could not be pinged for more than 1 minute. Host offline?"
- alert: gaia_stuck
for: 1m
expr: changes(consensus_height[1m]) == 0
annotations:
summary: "NODE seems stuck'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
            message: "Height of '{{ $labels.job }}' on {{ $labels.instance }} did not change for more than 1 minute."
- alert: gaia_behind
for: 1m
expr: consensus_height - ignoring (instance) group_left max without (instance)(consensus_height) <= -5
annotations:
summary: "NODE seems behind'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
            message: "Height of '{{ $labels.job }}' on {{ $labels.instance }} is behind our best height by {{ $value }} for longer than 1 minute."
- alert: node_reboot
          expr: changes(node_boot_time[10m]) > 0
annotations:
summary: "NODE has rebooted'{{ $labels.job }}' on '{{ $labels.instance }}'"
severity: "HIGH"
message: "It looks like {{ $labels.instance }}' has rebooted. Was that intentional ?"
- alert: high_load
for: 2m
expr: node_load1 / count(node_cpu{mode="system"}) WITHOUT (cpu, mode) > 0.8
annotations:
summary: '{{ $labels.instance }} of job {{ $labels.job }} is under high load.'
severity: "HIGH"
message: "{{ $labels.instance }} has a high load. Sysload is {{ humanize $value}}."
- alert: validator_connection
for: 30s
expr: -p2p_peers + ignoring (instance) group_left count without (instance)(consensus_height) -1 > 0
annotations:
summary: '{{ $labels.instance }} of job {{ $labels.job }} is not connected to all other nodes.'
severity: "HIGH"
message: "{{ $labels.instance }} has a peering problem with other private nodes. Nodes missing: {{$value}}."
- alert: network_errors
for: 30s
expr: sum(rate(node_network_receive_errs[5m])) > 1
annotations:
summary: '{{ $labels.instance }} of job {{ $labels.job }} has network receive errors.'
severity: "HIGH"
message: "{{ $labels.instance }} has networking issues. Package error rate: {{$value}}."
- alert: socket_opens
for: 5m
expr: delta(node_sockstat_TCP_alloc[1m]) > 200
annotations:
summary: '{{ $labels.instance }} allocates TCP sockets at a very high rate.'
severity: "HIGH"
message: "{{ $labels.instance }} allocates very many TCP sockets. TCP sockets / second: {{$value}}."
recording.rules: |
groups:
- name: aggregate_container_resources
rules:
- record: container_cpu_usage_rate
expr: sum without (cpu) (rate(container_cpu_usage_seconds_total[5m]))
- record: container_memory_rss_by_type
expr: container_memory_rss{id=~"/|/system.slice|/kubepods.slice"} > 0
- record: container_cpu_usage_percent_by_host
expr: sum(rate(container_cpu_usage_seconds_total{id="/"}[5m])) BY(kubernetes_io_hostname) / ON(kubernetes_io_hostname) machine_cpu_cores
- record: apiserver_request_count_rate_by_resources
expr: sum without (client,instance,contentType) (rate(apiserver_request_count[5m]))
prometheus.yml: |
rule_files:
- '*.rules'
# A scrape configuration for running Prometheus on a Kubernetes cluster.
# This uses separate scrape configs for cluster components (i.e. API server, node)
# and services to allow each to use different authentication configs.
#
# Kubernetes labels will be added as Prometheus labels on metrics via the
# `labelmap` relabeling action.
# Scrape config for API servers.
#
# Kubernetes exposes API servers as endpoints to the default/kubernetes
# service so this uses `endpoints` role and uses relabelling to only keep
# the endpoints associated with the default/kubernetes service using the
# default named port `https`. This works for single API server deployments as
# well as HA API server deployments.
scrape_configs:
- job_name: "node"
scrape_interval: 5s
static_configs:
- targets: ["5.83.163.203:9100"]
- job_name: "gaia"
scrape_interval: 2s
static_configs:
- targets: ["gaia-node0:26660","gaia-node1:26660","gaia-node2:26660","gaia-node3:26660"]
- job_name: 'blackbox'
metrics_path: /probe
params:
          module: [icmp] # Use the blackbox exporter's ICMP (ping) probe module.
static_configs:
- targets:
          - 5.83.163.203 # Target to probe with ICMP.
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: 5.83.163.203:9115 # The blackbox exporter's real hostname:port.
alerting:
alertmanagers:
- scheme: http
static_configs:
- targets:
- "localhost:9093"
- apiVersion: v1
kind: ConfigMap
metadata:
name: alertmanager
data:
alertmanager.yml: |
global:
# The root route on which each incoming alert enters.
route:
# default route if none match
receiver: alert-buffer-wh
# The labels by which incoming alerts are grouped together. For example,
# multiple alerts coming in for cluster=A and alertname=LatencyHigh would
# be batched into a single group.
# TODO:
group_by: []
        # All the above attributes are inherited by all child routes and can be
        # overridden on each.
receivers:
- name: alert-buffer-wh
webhook_configs:
- url: http://localhost:9099/topics/alerts
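The `prom-alerts-proxy` Secret above is filled from the template's `SESSION_SECRET` parameter. A random value could be generated before processing the template like this (a sketch; the 43-character alphanumeric length mirrors what the OpenShift oauth-proxy examples use and is an assumption here):

```shell
# Generate a random alphanumeric session secret for the oauth proxy.
# 43 chars + the "=" appended in the template yields a 44-char value.
SESSION_SECRET="$(LC_ALL=C tr -dc 'a-zA-Z0-9' < /dev/urandom | head -c 43)"
echo "${#SESSION_SECRET}"  # → 43
```

The value would then be passed when instantiating the template, e.g. `oc process -p SESSION_SECRET="$SESSION_SECRET" -f <template> | oc apply -f -`.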

openshift/deploy.yaml Normal file
@ -0,0 +1,28 @@
---
apiVersion: batch/v1
kind: Job
metadata:
name: gaia-deploy
  labels:
    app: gaia
spec:
parallelism: 1
completions: 1
template:
metadata:
name: gaia-deploy
spec:
volumes:
- name: state
persistentVolumeClaim:
claimName: gaia-ansible-state
containers:
- name: gaia-deploy
image: gaia-ansible:latest
env:
- name: ANSIBLE_FORCE_COLOR
value: 'true'
volumeMounts:
- name: state
mountPath: /opt/app-root/state
restartPolicy: OnFailure
serviceAccount: gaia-ansible

@ -0,0 +1,88 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia
template: gaia-ansible
metadata:
name: gaia-ansible
  annotations:
    template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaia-ansible
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/testnet_deploy
- displayName: Git Reference
name: GIT_REF
required: true
value: master
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: ${NAME}
spec:
lookupPolicy:
local: true
- apiVersion: v1
kind: BuildConfig
metadata:
name: ${NAME}
spec:
output:
to:
kind: ImageStreamTag
name: ${NAME}:latest
postCommit: {}
runPolicy: Serial
source:
git:
ref: ${GIT_REF}
uri: ${GIT_REPO}
contextDir: ansible
type: Git
strategy:
sourceStrategy:
from:
kind: ImageStreamTag
name: gaiad:latest
type: Source
triggers:
- imageChange: {}
type: ImageChange
- type: ConfigChange
- apiVersion: v1
kind: ServiceAccount
metadata:
name: ${NAME}
- apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: ${NAME}-edit
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: edit
subjects:
- kind: ServiceAccount
name: ${NAME}
- apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: ${NAME}-state
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi

openshift/gaiad.yaml Normal file
@ -0,0 +1,63 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia
template: gaiad
metadata:
name: gaia
  annotations:
    template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: gaiad
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/cosmos/cosmos-sdk
- displayName: Git Reference
name: GIT_REF
required: true
value: develop
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: ${NAME}
spec:
# this allows k8s objects to directly reference this image stream
lookupPolicy:
local: true
- apiVersion: v1
kind: BuildConfig
metadata:
name: ${NAME}
spec:
output:
to:
kind: ImageStreamTag
name: ${NAME}:latest
postCommit: {}
runPolicy: Serial
source:
git:
ref: ${GIT_REF}
uri: ${GIT_REPO}
type: Git
strategy:
sourceStrategy:
from:
kind: ImageStreamTag
name: golang-s2i:1.10
env:
- name: S2I_GOPKG
value: github.com/cosmos/cosmos-sdk
type: Source
triggers:
- imageChange: {}
type: ImageChange
- type: ConfigChange

openshift/golang-s2i.yaml Normal file
@ -0,0 +1,58 @@
---
apiVersion: v1
kind: Template
labels:
app: gaia
template: golang-s2i
metadata:
name: golang-s2i
  annotations:
    template.openshift.io/bindable: "false"
parameters:
- displayName: Name
name: NAME
required: true
value: golang-s2i
- displayName: Git Repository
name: GIT_REPO
required: true
value: https://github.com/certusone/testnet_deploy
- displayName: Git Reference
name: GIT_REF
required: true
value: master
objects:
- apiVersion: v1
kind: ImageStream
metadata:
name: ${NAME}
- apiVersion: v1
kind: BuildConfig
metadata:
labels:
build: ${NAME}
name: ${NAME}
spec:
output:
to:
kind: ImageStreamTag
name: ${NAME}:1.10
runPolicy: Serial
source:
contextDir: openshift/golang-s2i
git:
ref: ${GIT_REF}
uri: ${GIT_REPO}
type: Git
strategy:
dockerStrategy:
from:
kind: DockerImage
name: registry.fedoraproject.org/f28/s2i-base
type: Docker
triggers:
- imageChange: {}
type: ImageChange
- type: ConfigChange

@ -0,0 +1,24 @@
# Custom S2I image for Golang
FROM registry.fedoraproject.org/f28/s2i-base
RUN dnf -y install golang inotify-tools protobuf-compiler rsync nmap-ncat dep ansible origin-clients && dnf -y clean all
# Environment setup
COPY scl_enable /opt/app-root/etc/scl_enable
# Drop the root user and make the content of /opt/app-root owned by user 1001
RUN chown -R 1001:0 ${APP_ROOT} && chmod -R ug+rwx ${APP_ROOT} && \
rpm-file-permissions
# Copy S2I scripts
COPY s2i/ $STI_SCRIPTS_PATH
RUN chmod +x $STI_SCRIPTS_PATH/*
# OpenShift Ansible module
# TODO: don't even think about doing this in production
RUN pip install openshift
USER 1001
# Set the default CMD to print the usage of the language image
CMD $STI_SCRIPTS_PATH/usage

@ -0,0 +1,45 @@
#!/bin/bash
set -e
shopt -s dotglob
set -x
. /opt/app-root/etc/scl_enable
# Copy source code to /opt/app-root/src
echo "---> Copy application source ..."
mv /tmp/src/* ./
# Create user-owned GOPATH
echo "---> Initialize Go environment ..."
mkdir -p "${GOPATH}"
# If S2I_GOPKG is set, we link app-root to its proper place in the GOPATH
if [[ ! -z ${S2I_GOPKG} ]]; then
  PKG="${GOPATH}/src/${S2I_GOPKG}"
  mkdir -p "$(dirname "$PKG")"
  ln -s "$(pwd)" "$PKG"
  echo "Go package to be built: $PKG"
fi
if [[ ! -z ${S2I_GOCMD} ]]; then
  # Build the package
  echo "---> Building ..."
  go build -o ./app ${S2I_GOCMD}
fi
ls -lisa
if [[ -f Makefile ]]; then
  echo "---> Building using Makefile ..."
  if [[ ! -z ${S2I_GOPKG} ]]; then
    cd "${GOPATH}/src/${S2I_GOPKG}"
  fi
  # TODO: generalize
  LEDGER_ENABLED=false make get_vendor_deps install
fi
echo "---> Fixing permissions ..."
fix-permissions ./
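The `S2I_GOPKG` handling above links the source checkout into its conventional import path inside `GOPATH`, so Go tooling finds the package without copying it. The same trick can be exercised standalone (a sketch with hypothetical paths and package name):

```shell
# Recreate the assemble script's GOPATH-linking logic in a scratch dir.
WORK="$(mktemp -d)"
SRC="$WORK/app-root-src"            # stands in for /opt/app-root/src
GOPATH="$WORK/go"                   # stands in for ${APP_ROOT}/go
S2I_GOPKG="github.com/example/app"  # hypothetical import path
mkdir -p "$SRC" "$GOPATH"
PKG="${GOPATH}/src/${S2I_GOPKG}"
mkdir -p "$(dirname "$PKG")"
ln -s "$SRC" "$PKG"
# Files written through either path are the same files.
echo ok > "$PKG/marker"
cat "$SRC/marker"  # → ok
```

Because `$PKG` is a symlink rather than a copy, edits under the import path and under the original checkout stay in sync.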

@ -0,0 +1,9 @@
#!/bin/bash
# Colored output
export TERM=xterm
export GOPATH=${APP_ROOT}/go
export PATH=${PATH}:${GOPATH}/bin
unset BASH_ENV PROMPT_COMMAND ENV
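`scl_enable` appends `${GOPATH}/bin` to `PATH` so that binaries produced by `make ... install` (such as `gaiad`) resolve by name in later build steps. A small demonstration with a stub binary (hypothetical names):

```shell
# Show why PATH is extended with ${GOPATH}/bin: installed tools resolve by name.
GOPATH="$(mktemp -d)"
mkdir -p "${GOPATH}/bin"
printf '#!/bin/sh\necho gaiad-stub\n' > "${GOPATH}/bin/gaiad"
chmod +x "${GOPATH}/bin/gaiad"
export PATH="${PATH}:${GOPATH}/bin"
gaiad  # → gaiad-stub
```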