diff --git a/README.md b/README.md index b541bfcf..811bc844 100644 --- a/README.md +++ b/README.md @@ -19,7 +19,7 @@ Currently available examples: - **foundations** - [single level hierarchy](./foundations/environments/) (environments), [multiple level hierarchy](./foundations/business-units/) (business units + environments) - **networking** - [hub and spoke via peering](./networking/hub-and-spoke-peering/), [hub and spoke via VPN](./networking/hub-and-spoke-vpn/), [DNS and Google Private Access for on-premises](./networking/onprem-google-access-dns/), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [ILB as next hop](./networking/ilb-next-hop) - **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms/), [Cloud Storage to Bigquery with Cloud Dataflow](./data-solutions/gcs-to-bq-with-dataflow/) -- **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](.//cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam) +- **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](./cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring) For more information see the README files in the [foundations](./foundations/), [networking](./networking/), [data solutions](./data-solutions/) and [cloud operations](./cloud-operations/) folders. diff --git a/cloud-operations/README.md b/cloud-operations/README.md index 016a5f86..e887b417 100644 --- a/cloud-operations/README.md +++ b/cloud-operations/README.md @@ -16,3 +16,8 @@ The example's feed tracks changes to Google Compute instances, and the Cloud Fun
+## Compute Engine quota monitoring + + This [example](./quota-monitoring) shows a practical way of collecting and monitoring [Compute Engine resource quotas](https://cloud.google.com/compute/quotas) via Cloud Monitoring metrics as an alternative to the recently released [built-in quota metrics](https://cloud.google.com/monitoring/alerts/using-quota-metrics). A simple alert on quota thresholds is also part of the example. + +
diff --git a/cloud-operations/quota-monitoring/README.md b/cloud-operations/quota-monitoring/README.md new file mode 100644 index 00000000..a92abf68 --- /dev/null +++ b/cloud-operations/quota-monitoring/README.md @@ -0,0 +1,42 @@ +# Compute Engine quota monitoring + +This example improves on the [GCE quota exporter tool](https://github.com/GoogleCloudPlatform/professional-services/tree/master/tools/gce-quota-sync) (by the same author as this example), and shows a practical way of collecting and monitoring [Compute Engine resource quotas](https://cloud.google.com/compute/quotas) via Cloud Monitoring metrics as an alternative to the recently released [built-in quota metrics](https://cloud.google.com/monitoring/alerts/using-quota-metrics). + +Compared to the built-in metrics, it offers a simpler representation of quotas and quota ratios, which is especially useful in charts; it allows filtering or combining quotas between different projects regardless of their monitoring workspace; and it creates a default alerting policy without the need to interact directly with the monitoring API. + +Regardless of its specific purpose, this example is also useful in showing how to manipulate and write time series to Cloud Monitoring. The resources it creates are shown in the high-level diagram below: + +GCP resource diagram + +The solution is designed so that the Cloud Function arguments that control function execution (e.g. to set which project quotas to monitor) are defined in the Cloud Scheduler payload set in the PubSub message, so that a single function can be used for different configurations by creating more schedules. + +Quota time series are stored using a [custom metric](https://cloud.google.com/monitoring/custom-metrics) with the `custom.googleapis.com/quota/gce` type and [gauge kind](https://cloud.google.com/monitoring/api/v3/kinds-and-types#metric-kinds), tracking the ratio between quota usage and limit as a double to aid in visualization and alerting. 
Labels are set with the quota name, project id (which may differ from the monitoring workspace projects), value, and limit. This is what they look like in the metrics explorer. + +GCP resource diagram + +The solution also creates a basic monitoring alert policy, to demonstrate how to raise alerts when any of the tracked quota ratios goes over a predefined threshold. + +## Running the example + +Clone this repository or [open it in cloud shell](https://ssh.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https%3A%2F%2Fgithub.com%2Fterraform-google-modules%2Fcloud-foundation-fabric&cloudshell_print=cloud-shell-readme.txt&cloudshell_working_dir=cloud-operations%2Fquota-monitoring), then go through the following steps to create resources: + +- `terraform init` +- `terraform apply -var project_id=my-project-id` + + +## Variables + +| name | description | type | required | default | +|---|---|:---: |:---:|:---:| +| project_id | Project id that references existing project. | string | ✓ | | +| *bundle_path* | Path used to write the intermediate Cloud Function code bundle. | string | | ./bundle.zip | +| *name* | Arbitrary string used to name created resources. | string | | quota-monitor | +| *project_create* | Create project instead of using an existing one. | bool | | false | +| *quota_config* | Cloud function configuration. | object({...}) | | ... | +| *region* | Compute region used in the example. | string | | europe-west1 | +| *schedule_config* | Schedule timer configuration in crontab format | string | | 0 * * * * | + +## Outputs + + + diff --git a/cloud-operations/quota-monitoring/backend.tf.sample b/cloud-operations/quota-monitoring/backend.tf.sample new file mode 100644 index 00000000..528c4eb2 --- /dev/null +++ b/cloud-operations/quota-monitoring/backend.tf.sample @@ -0,0 +1,23 @@ +# Copyright 2019 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# set a valid bucket below and rename this file to backend.tf + +terraform { + backend "gcs" { + bucket = "" + prefix = "fabric/operations/quota-monitoring" + } +} + diff --git a/cloud-operations/quota-monitoring/cf/main.py b/cloud-operations/quota-monitoring/cf/main.py new file mode 100755 index 00000000..a3ff4a09 --- /dev/null +++ b/cloud-operations/quota-monitoring/cf/main.py @@ -0,0 +1,201 @@ +#! /usr/bin/env python3 +# Copyright 2020 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +"""Sync GCE quota usage to Stackdriver for multiple projects. + +This tool fetches global and/or regional quotas from the GCE API for +multiple projects, and sends them to Stackdriver as custom metrics, where they +can be used to set alert policies or create charts. 
+""" + +import base64 +import datetime +import json +import logging +import os +import warnings + +import click + +from google.api_core.exceptions import GoogleAPIError +from google.cloud import monitoring_v3 + +import googleapiclient.discovery +import googleapiclient.errors + + +_BATCH_SIZE = 5 +_METRIC_KIND = monitoring_v3.enums.MetricDescriptor.MetricKind.GAUGE +_METRIC_TYPE = 'custom.googleapis.com/quota/gce' + + +def _add_series(project_id, series, client=None): + """Write metrics series to Stackdriver. + + Args: + project_id: series will be written to this project id's account + series: the time series to be written, as a list of + monitoring_v3.types.TimeSeries instances + client: optional monitoring_v3.MetricServiceClient will be used + instead of obtaining a new one + """ + client = client or monitoring_v3.MetricServiceClient() + project_name = client.project_path(project_id) + if isinstance(series, monitoring_v3.types.TimeSeries): + series = [series] + try: + client.create_time_series(project_name, series) + except GoogleAPIError as e: + raise RuntimeError('Error from monitoring API: %s' % e) + + +def _configure_logging(verbose=True): + """Basic logging configuration. + + Args: + verbose: enable verbose logging + """ + level = logging.DEBUG if verbose else logging.INFO + logging.basicConfig(level=level) + warnings.filterwarnings('ignore', r'.*end user credentials.*', UserWarning) + + +def _fetch_quotas(project, region='global', compute=None): + """Fetch GCE per - project or per - region quotas from the API. 
+ + Args: + project: fetch global or regional quotas for this project id + region: which quotas to fetch, 'global' or region name + compute: optional instance of googleapiclient.discovery.build will be used + instead of obtaining a new one + """ + compute = compute or googleapiclient.discovery.build('compute', 'v1') + try: + if region != 'global': + req = compute.regions().get(project=project, region=region) + else: + req = compute.projects().get(project=project) + resp = req.execute() + return resp['quotas'] + except (GoogleAPIError, googleapiclient.errors.HttpError) as e: + logging.debug('API Error: %s', e, exc_info=True) + raise RuntimeError('Error fetching quota (project: %s, region: %s)' % + (project, region)) + + +def _get_series(metric_labels, value, metric_type=_METRIC_TYPE, dt=None): + """Create a Stackdriver monitoring time series from value and labels. + + Args: + metric_labels: dict with labels that will be used in the time series + value: time series value + metric_type: which metric is this series for + dt: datetime.datetime instance used for the series end time + """ + series = monitoring_v3.types.TimeSeries() + series.metric.type = metric_type + series.resource.type = 'global' + for label in metric_labels: + series.metric.labels[label] = metric_labels[label] + point = series.points.add() + point.value.double_value = value + point.interval.end_time.FromDatetime(dt or datetime.datetime.utcnow()) + return series + + +def _quota_to_series(project, region, quota): + """Convert API quota objects to Stackdriver monitoring time series. 
+ + Args: + project: set in converted time series labels + region: set in converted time series labels + quota: quota object received from the GCE API + """ + labels = dict((k, str(v)) for k, v in quota.items()) + labels['project'] = project + labels['region'] = region + try: + value = quota['usage'] / float(quota['limit']) + except ZeroDivisionError: + value = 0 + return _get_series(labels, value) + + +@click.command() +@click.option('--monitoring-project', required=True, + help='monitoring project id') +@click.option('--gce-project', multiple=True, + help='project ids (multiple), defaults to monitoring project') +@click.option('--gce-region', multiple=True, + help='regions (multiple), defaults to "global"') +@click.option('--verbose', is_flag=True, help='Verbose output') +@click.argument('keywords', nargs=-1) +def main_cli(monitoring_project=None, gce_project=None, gce_region=None, + verbose=False, keywords=None): + """Fetch GCE quotas and write them as custom metrics to Stackdriver. + + If KEYWORDS are specified as arguments, only quotas matching one of the + keywords will be stored in Stackdriver. 
+ """ + try: + _main(monitoring_project, gce_project, gce_region, verbose, keywords) + except RuntimeError: + logging.exception('exception raised') + + +def main(event, context): + """Cloud Function entry point.""" + try: + data = json.loads(base64.b64decode(event['data']).decode('utf-8')) + _main(os.environ.get('GCP_PROJECT'), **data) + # uncomment once https://issuetracker.google.com/issues/155215191 is fixed + # except RuntimeError: + # raise + except Exception: + logging.exception('exception in cloud function entry point') + + +def _main(monitoring_project, gce_project=None, gce_region=None, verbose=False, + keywords=None): + """Module entry point used by cli and cloud function wrappers.""" + _configure_logging(verbose=verbose) + gce_projects = gce_project or [monitoring_project] + gce_regions = gce_region or ['global'] + keywords = set(keywords or []) + logging.debug('monitoring project %s', monitoring_project) + logging.debug('projects %s regions %s', gce_projects, gce_regions) + logging.debug('keywords %s', keywords) + quotas = [] + compute = googleapiclient.discovery.build( + 'compute', 'v1', cache_discovery=False) + for project in gce_projects: + logging.debug('project %s', project) + for region in gce_regions: + logging.debug('region %s', region) + for quota in _fetch_quotas(project, region, compute=compute): + if keywords and not any(k in quota['metric'] for k in keywords): + # logging.debug('skipping %s', quota) + continue + logging.debug('quota %s', quota) + quotas.append((project, region, quota)) + client, i = monitoring_v3.MetricServiceClient(), 0 + while i < len(quotas): + series = [_quota_to_series(*q) for q in quotas[i:i + _BATCH_SIZE]] + _add_series(monitoring_project, series, client) + i += _BATCH_SIZE + + +if __name__ == '__main__': + main_cli() diff --git a/cloud-operations/quota-monitoring/cf/requirements.txt b/cloud-operations/quota-monitoring/cf/requirements.txt new file mode 100644 index 00000000..b9c9e011 --- /dev/null +++ 
b/cloud-operations/quota-monitoring/cf/requirements.txt @@ -0,0 +1,3 @@ +Click>=7.0 +google-api-python-client>=1.10.1 +google-cloud-monitoring>=1.1.0 diff --git a/cloud-operations/quota-monitoring/cloud-shell-readme.txt b/cloud-operations/quota-monitoring/cloud-shell-readme.txt new file mode 100644 index 00000000..62e65af9 --- /dev/null +++ b/cloud-operations/quota-monitoring/cloud-shell-readme.txt @@ -0,0 +1,9 @@ + + +################################# Quickstart ################################# + +- terraform init +- terraform apply -var project_id=$GOOGLE_CLOUD_PROJECT + +Refer to the README.md file for more info and testing flow. + diff --git a/cloud-operations/quota-monitoring/diagram.png b/cloud-operations/quota-monitoring/diagram.png new file mode 100644 index 00000000..c68131f2 Binary files /dev/null and b/cloud-operations/quota-monitoring/diagram.png differ diff --git a/cloud-operations/quota-monitoring/explorer.png b/cloud-operations/quota-monitoring/explorer.png new file mode 100644 index 00000000..80f50254 Binary files /dev/null and b/cloud-operations/quota-monitoring/explorer.png differ diff --git a/cloud-operations/quota-monitoring/main.tf b/cloud-operations/quota-monitoring/main.tf new file mode 100644 index 00000000..ca27b933 --- /dev/null +++ b/cloud-operations/quota-monitoring/main.tf @@ -0,0 +1,142 @@ +/** + * Copyright 2020 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +locals { + projects = ( + var.quota_config.projects == null + ? [var.project_id] + : var.quota_config.projects + ) +} + +module "project" { + source = "../../modules/project" + name = var.project_id + project_create = var.project_create + services = [ + "compute.googleapis.com", + "cloudfunctions.googleapis.com" + ] + service_config = { + disable_on_destroy = false, + disable_dependent_services = false + } + iam_roles = [ + "roles/monitoring.metricWriter", + ] + iam_members = { + "roles/monitoring.metricWriter" = [module.cf.service_account_iam_email] + } +} + +module "pubsub" { + source = "../../modules/pubsub" + project_id = module.project.project_id + name = var.name + subscriptions = { + "${var.name}-default" = null + } + # the Cloud Scheduler robot service account already has pubsub.topics.publish + # at the project level via roles/cloudscheduler.serviceAgent +} + +module "cf" { + source = "../../modules/cloud-function" + project_id = module.project.project_id + name = var.name + bucket_name = "${var.name}-${random_pet.random.id}" + bucket_config = { + location = var.region + lifecycle_delete_age = null + } + bundle_config = { + source_dir = "cf" + output_path = var.bundle_path + } + service_account_create = true + trigger_config = { + event = "google.pubsub.topic.publish" + resource = module.pubsub.topic.id + retry = null + } +} + +resource "google_cloud_scheduler_job" "job" { + project = var.project_id + region = var.region + name = var.name + schedule = var.schedule_config + time_zone = "UTC" + + pubsub_target { + attributes = {} + topic_name = module.pubsub.topic.id + data = base64encode(jsonencode({ + gce_project = var.quota_config.projects + gce_region = var.quota_config.regions + keywords = var.quota_config.filters + })) + } +} + +resource "google_project_iam_member" "network_viewer" { + for_each = toset(local.projects) + project = each.key + role = "roles/compute.networkViewer" + member = module.cf.service_account_iam_email +} + +resource 
"google_project_iam_member" "quota_viewer" { + for_each = toset(local.projects) + project = each.key + role = "roles/servicemanagement.quotaViewer" + member = module.cf.service_account_iam_email +} + +resource "google_monitoring_alert_policy" "alert_policy" { + project = module.project.project_id + display_name = "Quota monitor" + combiner = "OR" + conditions { + display_name = "simple quota threshold" + condition_threshold { + filter = "metric.type=\"custom.googleapis.com/quota/gce\" resource.type=\"global\"" + threshold_value = 0.75 + comparison = "COMPARISON_GT" + duration = "0s" + aggregations { + alignment_period = "60s" + group_by_fields = [] + per_series_aligner = "ALIGN_MEAN" + } + trigger { + count = 1 + percent = 0 + } + } + } + enabled = false + user_labels = { + name = var.name + } + documentation { + content = "GCE quota over threshold." + } +} + +resource "random_pet" "random" { + length = 1 +} diff --git a/cloud-operations/quota-monitoring/outputs.tf b/cloud-operations/quota-monitoring/outputs.tf new file mode 100644 index 00000000..264a34bf --- /dev/null +++ b/cloud-operations/quota-monitoring/outputs.tf @@ -0,0 +1,16 @@ +/** + * Copyright 2020 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + diff --git a/cloud-operations/quota-monitoring/variables.tf b/cloud-operations/quota-monitoring/variables.tf new file mode 100644 index 00000000..c54a44bb --- /dev/null +++ b/cloud-operations/quota-monitoring/variables.tf @@ -0,0 +1,64 @@ +/** + * Copyright 2020 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +variable "bundle_path" { + description = "Path used to write the intermediate Cloud Function code bundle." + type = string + default = "./bundle.zip" +} + +variable "name" { + description = "Arbitrary string used to name created resources." + type = string + default = "quota-monitor" +} + +variable "project_create" { + description = "Create project instead of using an existing one." + type = bool + default = false +} + +variable "project_id" { + description = "Project id that references existing project." + type = string +} + +variable "quota_config" { + description = "Cloud function configuration." + type = object({ + filters = list(string) + projects = list(string) + regions = list(string) + }) + default = { + filters = null + projects = null + regions = null + } +} + +variable "region" { + description = "Compute region used in the example." 
+ type = string + default = "europe-west1" +} + +variable "schedule_config" { + description = "Schedule timer configuration in crontab format" + type = string + default = "0 * * * *" +} diff --git a/tests/cloud_operations/quota_monitoring/__init__.py b/tests/cloud_operations/quota_monitoring/__init__.py new file mode 100644 index 00000000..6913f02e --- /dev/null +++ b/tests/cloud_operations/quota_monitoring/__init__.py @@ -0,0 +1,13 @@ +# Copyright 2020 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. diff --git a/tests/cloud_operations/quota_monitoring/fixture/cf/README b/tests/cloud_operations/quota_monitoring/fixture/cf/README new file mode 100644 index 00000000..e69de29b diff --git a/tests/cloud_operations/quota_monitoring/fixture/main.tf b/tests/cloud_operations/quota_monitoring/fixture/main.tf new file mode 100644 index 00000000..3f2810ae --- /dev/null +++ b/tests/cloud_operations/quota_monitoring/fixture/main.tf @@ -0,0 +1,22 @@ +/** + * Copyright 2020 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +module "test" { + source = "../../../../cloud-operations/quota-monitoring" + name = var.name + project_create = var.project_create + project_id = var.project_id +} diff --git a/tests/cloud_operations/quota_monitoring/fixture/variables.tf b/tests/cloud_operations/quota_monitoring/fixture/variables.tf new file mode 100644 index 00000000..ce52e598 --- /dev/null +++ b/tests/cloud_operations/quota_monitoring/fixture/variables.tf @@ -0,0 +1,38 @@ +# Copyright 2020 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +variable "name" { + type = string + default = "dns-sd-test" +} + +variable "project_create" { + type = bool + default = true +} + +variable "project_id" { + type = string + default = "test" +} + +variable "region" { + type = string + default = "europe-west1" +} + +variable "zone_domain" { + type = string + default = "svc.example.org." +} diff --git a/tests/cloud_operations/quota_monitoring/test_plan.py b/tests/cloud_operations/quota_monitoring/test_plan.py new file mode 100644 index 00000000..7b195b5c --- /dev/null +++ b/tests/cloud_operations/quota_monitoring/test_plan.py @@ -0,0 +1,27 @@ +# Copyright 2020 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + + +import os +import pytest + + +FIXTURES_DIR = os.path.join(os.path.dirname(__file__), 'fixture') + + +def test_resources(e2e_plan_runner): + "Test that plan works and the number of resources is as expected." + modules, resources = e2e_plan_runner(FIXTURES_DIR) + assert len(modules) == 3 + assert len(resources) == 10
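
The data flow of the Cloud Function added in `cf/main.py` can be tried out locally without any Google client library. The sketch below is only an illustration, not part of the patch: it mirrors the payload decoding in `main`, the keyword filter and batching in `_main`, and the ratio computed in `_quota_to_series`. The helper names (`decode_payload`, `quota_ratio`, `select_quotas`, `batches`) and the sample quota values are hypothetical and were made up for this example.

```python
import base64
import json

BATCH_SIZE = 5  # same value as _BATCH_SIZE in cf/main.py


def decode_payload(event):
    """Decode the JSON arguments carried in a Pub/Sub event's data field."""
    return json.loads(base64.b64decode(event['data']).decode('utf-8'))


def quota_ratio(quota):
    """Return usage / limit, falling back to 0 when the limit is zero."""
    try:
        return quota['usage'] / float(quota['limit'])
    except ZeroDivisionError:
        return 0


def select_quotas(quotas, keywords=None):
    """Keep quotas whose metric name contains any keyword (all if none given)."""
    keywords = set(keywords or [])
    return [q for q in quotas
            if not keywords or any(k in q['metric'] for k in keywords)]


def batches(items, size=BATCH_SIZE):
    """Yield consecutive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


# Hypothetical payload, shaped like the one built by the scheduler job.
payload = {
    'gce_project': ['my-project'],
    'gce_region': ['europe-west1'],
    'keywords': ['CPUS'],
}
event = {'data': base64.b64encode(json.dumps(payload).encode('utf-8'))}
args = decode_payload(event)

# Hypothetical quota objects, using the keys read by the function.
sample_quotas = [
    {'metric': 'CPUS', 'usage': 6.0, 'limit': 24.0},
    {'metric': 'SSD_TOTAL_GB', 'usage': 0.0, 'limit': 0.0},
    {'metric': 'N2_CPUS', 'usage': 12.0, 'limit': 24.0},
]
selected = select_quotas(sample_quotas, args['keywords'])
ratios = {q['metric']: quota_ratio(q) for q in selected}
print(ratios)
```

Because keywords are matched as substrings of the quota metric name, a filter such as `CPUS` in this sketch also selects `N2_CPUS`; this is worth keeping in mind when setting `quota_config.filters`.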