Quota monitor end to end example (#125)

* working example, README missing

* add missing boilerplate to outputs file

* README

* fix dynamic resources in IAM binding for_each

* add tests

* update input/output table in README

* add example to READMEs
Ludovico Magnocavallo 2020-08-29 11:29:46 +02:00 committed by GitHub
parent 86bee0ff70
commit 088a7c569f
17 changed files with 606 additions and 1 deletions


@ -19,7 +19,7 @@ Currently available examples:
- **foundations** - [single level hierarchy](./foundations/environments/) (environments), [multiple level hierarchy](./foundations/business-units/) (business units + environments)
- **networking** - [hub and spoke via peering](./networking/hub-and-spoke-peering/), [hub and spoke via VPN](./networking/hub-and-spoke-vpn/), [DNS and Google Private Access for on-premises](./networking/onprem-google-access-dns/), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [ILB as next hop](./networking/ilb-next-hop)
- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms/), [Cloud Storage to Bigquery with Cloud Dataflow](./data-solutions/gcs-to-bq-with-dataflow/)
- **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](.//cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam)
- **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](.//cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring)
For more information see the README files in the [foundations](./foundations/), [networking](./networking/), [data solutions](./data-solutions/) and [cloud operations](./cloud-operations/) folders.


@ -16,3 +16,8 @@ The example's feed tracks changes to Google Compute instances, and the Cloud Fun
<br clear="left">
## Compute Engine quota monitoring
<a href="./quota-monitoring" title="Compute Engine quota monitoring"><img src="./quota-monitoring/diagram.png" align="left" width="280px"></a> This [example](./quota-monitoring) shows a practical way of collecting and monitoring [Compute Engine resource quotas](https://cloud.google.com/compute/quotas) via Cloud Monitoring metrics as an alternative to the recently released [built-in quota metrics](https://cloud.google.com/monitoring/alerts/using-quota-metrics). A simple alert on quota thresholds is also part of the example.
<br clear="left">


@ -0,0 +1,42 @@
# Compute Engine quota monitoring
This example improves on the [GCE quota exporter tool](https://github.com/GoogleCloudPlatform/professional-services/tree/master/tools/gce-quota-sync) (by the same author as this example), and shows a practical way of collecting and monitoring [Compute Engine resource quotas](https://cloud.google.com/compute/quotas) via Cloud Monitoring metrics as an alternative to the recently released [built-in quota metrics](https://cloud.google.com/monitoring/alerts/using-quota-metrics).
Compared to the built-in metrics, it offers a simpler representation of quotas and quota ratios which is especially useful in charts, it allows filtering or combining quotas between different projects regardless of their monitoring workspace, and it creates a default alerting policy without the need to interact directly with the monitoring API.
Regardless of its specific purpose, this example is also useful in showing how to manipulate and write time series to Cloud Monitoring. The resources it creates are shown in the high-level diagram below:
<img src="diagram.png" width="640px" alt="GCP resource diagram">
The solution is designed so that the Cloud Function arguments that control function execution (e.g., which project quotas to monitor) are defined in the Cloud Scheduler payload carried in the PubSub message. A single function can therefore serve different configurations by creating more schedules.
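The payload round trip described above can be sketched in a few lines of Python. This is a minimal illustration, not the function's code: `encode_payload`, `decode_event`, and the two sample configurations are hypothetical names and values, and the event is assumed to carry the JSON arguments base64-encoded under a `data` key, as the function's entry point expects.

```python
import base64
import json


def encode_payload(config):
  """Encode a configuration dict as base64 JSON (hypothetical helper)."""
  return base64.b64encode(json.dumps(config).encode('utf-8'))


def decode_event(event):
  """Decode the keyword arguments carried in a PubSub-triggered event."""
  return json.loads(base64.b64decode(event['data']).decode('utf-8'))


# Two hypothetical schedules driving the same function with different args.
schedule_global = {
    'gce_project': ['project-a'],
    'gce_region': None,
    'keywords': ['CPUS'],
}
schedule_regional = {
    'gce_project': ['project-a', 'project-b'],
    'gce_region': ['europe-west1'],
    'keywords': None,
}

for config in (schedule_global, schedule_regional):
  event = {'data': encode_payload(config)}
  print(decode_event(event))
```

Each schedule only changes the dictionary it sends, so adding a configuration means adding a schedule rather than deploying another function.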
Quota time series are stored using a [custom metric](https://cloud.google.com/monitoring/custom-metrics) with the `custom.googleapis.com/quota/gce` type and [gauge kind](https://cloud.google.com/monitoring/api/v3/kinds-and-types#metric-kinds), tracking the ratio between quota usage and limit as a double to aid in visualization and alerting. Labels are set with the quota name, project id (which may differ from the monitoring workspace project), value, and limit. This is what they look like in the Metrics Explorer:
<img src="explorer.png" width="640px" alt="Quota metrics in the Metrics Explorer">
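As a rough sketch of how a single quota becomes a ratio plus labels, the snippet below mirrors the conversion without calling the monitoring API. The helper names `quota_ratio` and `quota_labels` and the sample quota are assumptions made for illustration; the quota dictionary is assumed to have the `metric`, `limit`, and `usage` keys returned by the Compute Engine API.

```python
def quota_ratio(quota):
  """Return usage divided by limit, or 0.0 when the limit is zero."""
  try:
    return quota['usage'] / float(quota['limit'])
  except ZeroDivisionError:
    return 0.0


def quota_labels(project, region, quota):
  """Build string labels from the quota fields, project and region."""
  labels = {k: str(v) for k, v in quota.items()}
  labels['project'] = project
  labels['region'] = region
  return labels


# Hypothetical quota object, shaped like a Compute Engine API quota entry.
sample = {'metric': 'CPUS', 'limit': 24.0, 'usage': 18.0}

print(quota_ratio(sample))
print(quota_labels('my-project', 'europe-west1', sample))
```

Storing the ratio as the point value keeps every quota on the same 0 to 1 scale, which is what makes a single chart or threshold usable across quotas with very different limits.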
The solution also creates a basic monitoring alert policy, to demonstrate how to raise alerts when any of the tracked quota ratios go over a predefined threshold.
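The threshold comparison can be illustrated locally with a short sketch. It is only an approximation of the idea, not of how Cloud Monitoring evaluates policies (which also involves alignment periods and durations); the `over_threshold` helper and the sample ratios are hypothetical, while the 0.75 value and the strictly-greater-than comparison match the policy defined in this example.

```python
THRESHOLD = 0.75  # same value as threshold_value in this example's policy


def over_threshold(ratios, threshold=THRESHOLD):
  """Return the names of quotas whose ratio is strictly above the threshold."""
  return sorted(name for name, ratio in ratios.items() if ratio > threshold)


# Hypothetical usage/limit ratios keyed by quota name.
ratios = {'CPUS': 0.5, 'DISKS_TOTAL_GB': 0.8, 'IN_USE_ADDRESSES': 0.75}

print(over_threshold(ratios))
```

A ratio exactly at the threshold does not trigger, since the comparison is greater-than rather than greater-or-equal.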
## Running the example
Clone this repository or [open it in cloud shell](https://ssh.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https%3A%2F%2Fgithub.com%2Fterraform-google-modules%2Fcloud-foundation-fabric&cloudshell_print=cloud-shell-readme.txt&cloudshell_working_dir=cloud-operations%2Fquota-monitoring), then go through the following steps to create resources:
- `terraform init`
- `terraform apply -var project_id=my-project-id`
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---: |:---:|:---:|
| project_id | Project id that references existing project. | <code title="">string</code> | ✓ | |
| *bundle_path* | Path used to write the intermediate Cloud Function code bundle. | <code title="">string</code> | | <code title="">./bundle.zip</code> |
| *name* | Arbitrary string used to name created resources. | <code title="">string</code> | | <code title="">quota-monitor</code> |
| *project_create* | Create project instead of using an existing one. | <code title="">bool</code> | | <code title="">false</code> |
| *quota_config* | Cloud function configuration. | <code title="object&#40;&#123;&#10;filters &#61; list&#40;string&#41;&#10;projects &#61; list&#40;string&#41;&#10;regions &#61; list&#40;string&#41;&#10;&#125;&#41;">object({...})</code> | | <code title="&#123;&#10;filters &#61; null&#10;projects &#61; null&#10;regions &#61; null&#10;&#125;">...</code> |
| *region* | Compute region used in the example. | <code title="">string</code> | | <code title="">europe-west1</code> |
| *schedule_config* | Schedule timer configuration in crontab format. | <code title="">string</code> | | <code title="">0 * * * *</code> |
## Outputs
<!-- END TFDOC -->


@ -0,0 +1,23 @@
# Copyright 2019 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# set a valid bucket below and rename this file to backend.tf

terraform {
  backend "gcs" {
    bucket = ""
    prefix = "fabric/operations/quota-monitoring"
  }
}


@ -0,0 +1,201 @@
#! /usr/bin/env python3
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Sync GCE quota usage to Stackdriver for multiple projects.

This tool fetches global and/or regional quotas from the GCE API for
multiple projects, and sends them to Stackdriver as custom metrics, where they
can be used to set alert policies or create charts.
"""

import base64
import datetime
import json
import logging
import os
import warnings

import click

from google.api_core.exceptions import GoogleAPIError
from google.cloud import monitoring_v3

import googleapiclient.discovery
import googleapiclient.errors


_BATCH_SIZE = 5
_METRIC_KIND = monitoring_v3.enums.MetricDescriptor.MetricKind.GAUGE
_METRIC_TYPE = 'custom.googleapis.com/quota/gce'


def _add_series(project_id, series, client=None):
  """Write metrics series to Stackdriver.

  Args:
    project_id: series will be written to this project id's account
    series: the time series to be written, as a list of
        monitoring_v3.types.TimeSeries instances
    client: optional monitoring_v3.MetricServiceClient will be used
        instead of obtaining a new one
  """
  client = client or monitoring_v3.MetricServiceClient()
  project_name = client.project_path(project_id)
  if isinstance(series, monitoring_v3.types.TimeSeries):
    series = [series]
  try:
    client.create_time_series(project_name, series)
  except GoogleAPIError as e:
    raise RuntimeError('Error from monitoring API: %s' % e)


def _configure_logging(verbose=True):
  """Basic logging configuration.

  Args:
    verbose: enable verbose logging
  """
  level = logging.DEBUG if verbose else logging.INFO
  logging.basicConfig(level=level)
  warnings.filterwarnings('ignore', r'.*end user credentials.*', UserWarning)


def _fetch_quotas(project, region='global', compute=None):
  """Fetch GCE per-project or per-region quotas from the API.

  Args:
    project: fetch global or regional quotas for this project id
    region: which quotas to fetch, 'global' or region name
    compute: optional instance of googleapiclient.discovery.build will be used
        instead of obtaining a new one
  """
  compute = compute or googleapiclient.discovery.build('compute', 'v1')
  try:
    if region != 'global':
      req = compute.regions().get(project=project, region=region)
    else:
      req = compute.projects().get(project=project)
    resp = req.execute()
    return resp['quotas']
  except (GoogleAPIError, googleapiclient.errors.HttpError) as e:
    logging.debug('API Error: %s', e, exc_info=True)
    raise RuntimeError('Error fetching quota (project: %s, region: %s)' %
                       (project, region))


def _get_series(metric_labels, value, metric_type=_METRIC_TYPE, dt=None):
  """Create a Stackdriver monitoring time series from value and labels.

  Args:
    metric_labels: dict with labels that will be used in the time series
    value: time series value
    metric_type: which metric is this series for
    dt: datetime.datetime instance used for the series end time
  """
  series = monitoring_v3.types.TimeSeries()
  series.metric.type = metric_type
  series.resource.type = 'global'
  for label in metric_labels:
    series.metric.labels[label] = metric_labels[label]
  point = series.points.add()
  point.value.double_value = value
  point.interval.end_time.FromDatetime(dt or datetime.datetime.utcnow())
  return series


def _quota_to_series(project, region, quota):
  """Convert API quota objects to Stackdriver monitoring time series.

  Args:
    project: set in converted time series labels
    region: set in converted time series labels
    quota: quota object received from the GCE API
  """
  labels = dict((k, str(v)) for k, v in quota.items())
  labels['project'] = project
  labels['region'] = region
  try:
    value = quota['usage'] / float(quota['limit'])
  except ZeroDivisionError:
    value = 0
  return _get_series(labels, value)


@click.command()
@click.option('--monitoring-project', required=True,
              help='monitoring project id')
@click.option('--gce-project', multiple=True,
              help='project ids (multiple), defaults to monitoring project')
@click.option('--gce-region', multiple=True,
              help='regions (multiple), defaults to "global"')
@click.option('--verbose', is_flag=True, help='Verbose output')
@click.argument('keywords', nargs=-1)
def main_cli(monitoring_project=None, gce_project=None, gce_region=None,
             verbose=False, keywords=None):
  """Fetch GCE quotas and write them as custom metrics to Stackdriver.

  If KEYWORDS are specified as arguments, only quotas matching one of the
  keywords will be stored in Stackdriver.
  """
  try:
    _main(monitoring_project, gce_project, gce_region, verbose, keywords)
  except RuntimeError:
    logging.exception('exception raised')


def main(event, context):
  """Cloud Function entry point."""
  try:
    data = json.loads(base64.b64decode(event['data']).decode('utf-8'))
    _main(os.environ.get('GCP_PROJECT'), **data)
  # uncomment once https://issuetracker.google.com/issues/155215191 is fixed
  # except RuntimeError:
  #  raise
  except Exception:
    logging.exception('exception in cloud function entry point')


def _main(monitoring_project, gce_project=None, gce_region=None, verbose=False,
          keywords=None):
  """Module entry point used by cli and cloud function wrappers."""
  _configure_logging(verbose=verbose)
  gce_projects = gce_project or [monitoring_project]
  gce_regions = gce_region or ['global']
  keywords = set(keywords or [])
  logging.debug('monitoring project %s', monitoring_project)
  logging.debug('projects %s regions %s', gce_projects, gce_regions)
  logging.debug('keywords %s', keywords)
  quotas = []
  compute = googleapiclient.discovery.build(
      'compute', 'v1', cache_discovery=False)
  for project in gce_projects:
    logging.debug('project %s', project)
    for region in gce_regions:
      logging.debug('region %s', region)
      for quota in _fetch_quotas(project, region, compute=compute):
        if keywords and not any(k in quota['metric'] for k in keywords):
          # logging.debug('skipping %s', quota)
          continue
        logging.debug('quota %s', quota)
        quotas.append((project, region, quota))
  client, i = monitoring_v3.MetricServiceClient(), 0
  while i < len(quotas):
    series = [_quota_to_series(*q) for q in quotas[i:i + _BATCH_SIZE]]
    _add_series(monitoring_project, series, client)
    i += _BATCH_SIZE


if __name__ == '__main__':
  main_cli()


@ -0,0 +1,3 @@
Click>=7.0
google-api-python-client>=1.10.1
google-cloud-monitoring>=1.1.0


@ -0,0 +1,9 @@
################################# Quickstart #################################
- terraform init
- terraform apply -var project_id=$GOOGLE_CLOUD_PROJECT
Refer to the README.md file for more info and testing flow.

Binary file not shown. Size: 47 KiB

Binary file not shown. Size: 52 KiB


@ -0,0 +1,142 @@
/**
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

locals {
  projects = (
    var.quota_config.projects == null
    ? [var.project_id]
    : var.quota_config.projects
  )
}

module "project" {
  source         = "../../modules/project"
  name           = var.project_id
  project_create = var.project_create
  services = [
    "compute.googleapis.com",
    "cloudfunctions.googleapis.com"
  ]
  service_config = {
    disable_on_destroy         = false,
    disable_dependent_services = false
  }
  iam_roles = [
    "roles/monitoring.metricWriter",
  ]
  iam_members = {
    "roles/monitoring.metricWriter" = [module.cf.service_account_iam_email]
  }
}

module "pubsub" {
  source     = "../../modules/pubsub"
  project_id = module.project.project_id
  name       = var.name
  subscriptions = {
    "${var.name}-default" = null
  }
  # the Cloud Scheduler robot service account already has pubsub.topics.publish
  # at the project level via roles/cloudscheduler.serviceAgent
}

module "cf" {
  source      = "../../modules/cloud-function"
  project_id  = module.project.project_id
  name        = var.name
  bucket_name = "${var.name}-${random_pet.random.id}"
  bucket_config = {
    location             = var.region
    lifecycle_delete_age = null
  }
  bundle_config = {
    source_dir  = "cf"
    output_path = var.bundle_path
  }
  service_account_create = true
  trigger_config = {
    event    = "google.pubsub.topic.publish"
    resource = module.pubsub.topic.id
    retry    = null
  }
}

resource "google_cloud_scheduler_job" "job" {
  project   = var.project_id
  region    = var.region
  name      = var.name
  schedule  = var.schedule_config
  time_zone = "UTC"

  pubsub_target {
    attributes = {}
    topic_name = module.pubsub.topic.id
    data = base64encode(jsonencode({
      gce_project = var.quota_config.projects
      gce_region  = var.quota_config.regions
      keywords    = var.quota_config.filters
    }))
  }
}

resource "google_project_iam_member" "network_viewer" {
  for_each = toset(local.projects)
  project  = each.key
  role     = "roles/compute.networkViewer"
  member   = module.cf.service_account_iam_email
}

resource "google_project_iam_member" "quota_viewer" {
  for_each = toset(local.projects)
  project  = each.key
  role     = "roles/servicemanagement.quotaViewer"
  member   = module.cf.service_account_iam_email
}

resource "google_monitoring_alert_policy" "alert_policy" {
  project      = module.project.project_id
  display_name = "Quota monitor"
  combiner     = "OR"
  conditions {
    display_name = "simple quota threshold"
    condition_threshold {
      filter          = "metric.type=\"custom.googleapis.com/quota/gce\" resource.type=\"global\""
      threshold_value = 0.75
      comparison      = "COMPARISON_GT"
      duration        = "0s"
      aggregations {
        alignment_period   = "60s"
        group_by_fields    = []
        per_series_aligner = "ALIGN_MEAN"
      }
      trigger {
        count   = 1
        percent = 0
      }
    }
  }
  enabled = false
  user_labels = {
    name = var.name
  }
  documentation {
    content = "GCE quota over threshold."
  }
}

resource "random_pet" "random" {
  length = 1
}


@ -0,0 +1,16 @@
/**
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/


@ -0,0 +1,64 @@
/**
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

variable "bundle_path" {
  description = "Path used to write the intermediate Cloud Function code bundle."
  type        = string
  default     = "./bundle.zip"
}

variable "name" {
  description = "Arbitrary string used to name created resources."
  type        = string
  default     = "quota-monitor"
}

variable "project_create" {
  description = "Create project instead of using an existing one."
  type        = bool
  default     = false
}

variable "project_id" {
  description = "Project id that references existing project."
  type        = string
}

variable "quota_config" {
  description = "Cloud function configuration."
  type = object({
    filters  = list(string)
    projects = list(string)
    regions  = list(string)
  })
  default = {
    filters  = null
    projects = null
    regions  = null
  }
}

variable "region" {
  description = "Compute region used in the example."
  type        = string
  default     = "europe-west1"
}

variable "schedule_config" {
  description = "Schedule timer configuration in crontab format."
  type        = string
  default     = "0 * * * *"
}


@ -0,0 +1,13 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


@ -0,0 +1,22 @@
/**
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

module "test" {
  source         = "../../../../cloud-operations/quota-monitoring"
  name           = var.name
  project_create = var.project_create
  project_id     = var.project_id
}


@ -0,0 +1,38 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

variable "name" {
  type    = string
  default = "dns-sd-test"
}

variable "project_create" {
  type    = bool
  default = true
}

variable "project_id" {
  type    = string
  default = "test"
}

variable "region" {
  type    = string
  default = "europe-west1"
}

variable "zone_domain" {
  type    = string
  default = "svc.example.org."
}


@ -0,0 +1,27 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import os
import pytest


FIXTURES_DIR = os.path.join(os.path.dirname(__file__), 'fixture')


def test_resources(e2e_plan_runner):
  "Test that plan works and the number of resources is as expected."
  modules, resources = e2e_plan_runner(FIXTURES_DIR)
  assert len(modules) == 3
  assert len(resources) == 10