cloud-foundation-fabric/blueprints/cloud-operations/compute-quota-monitoring
simonebruzzechesse d11c380aec
Format python files in blueprints (#2079)
* format python files in blueprints
* update check on blueprints python code
* update python linter in CI workflow
2024-02-15 09:37:49 +01:00

Compute Engine quota monitoring

This blueprint improves on the GCE quota exporter tool (by the same author as this blueprint), and shows a practical way of collecting and monitoring Compute Engine resource quotas via Cloud Monitoring metrics as an alternative to the built-in quota metrics.

Compared to the built-in metrics, it offers a simpler representation of quotas and quota ratios which is especially useful in charts, allows filtering or combining quotas between different projects regardless of their monitoring workspace, and optionally creates alerting policies without the need to interact directly with the monitoring API.

Regardless of its specific purpose, this blueprint is also useful in showing how to manipulate and write time series to Cloud Monitoring. The resources it creates are shown in the high-level diagram below:

GCP resource diagram

The Cloud Function arguments that control function execution (for example to set which project quotas to monitor) are defined in the Cloud Scheduler payload sent in the Pub/Sub message, so that a single function can serve different configurations simply by creating more schedules.
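As a rough illustration of the payload mechanism described above, the sketch below decodes a JSON payload from a Pub/Sub-triggered function event. The base64-encoded `data` field is how Pub/Sub delivers message bodies to background functions; the payload keys (`projects`, `regions`, `dry_run`, `verbose`) and the `parse_scheduler_payload` helper are assumptions made for this example, not the blueprint's actual code.

```python
import base64
import json


def parse_scheduler_payload(event):
    """Decode the JSON payload carried in a Pub/Sub-triggered function event."""
    # Pub/Sub delivers the message body base64-encoded under the "data" key.
    raw = base64.b64decode(event.get("data", "")).decode("utf-8")
    payload = json.loads(raw) if raw else {}
    # Hypothetical keys and defaults, chosen to mirror the options in this README.
    return {
        "projects": payload.get("projects", []),
        "regions": payload.get("regions", ["global"]),
        "dry_run": bool(payload.get("dry_run", False)),
        "verbose": bool(payload.get("verbose", False)),
    }


# Simulate the message that one schedule might send.
message = {"projects": ["my-project-id"], "regions": ["global", "europe-west1"]}
event = {"data": base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")}
config = parse_scheduler_payload(event)
```

A second schedule with a different payload would reuse the same function and simply produce a different `config`.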

Quota time series are stored using custom metrics with different metric types for usage, limit and utilization; metric types are based on a common prefix defaulting to quota and two tokens representing the quota name and type of data. This is an example:

  • custom.googleapis.com/quota/firewalls/usage
  • custom.googleapis.com/quota/firewalls/limit
  • custom.googleapis.com/quota/firewalls/ratio
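The naming scheme above can be sketched as a small helper. The function name `metric_types` is hypothetical, and lowercasing the quota name is an assumption made so that the output matches the example metric types listed here.

```python
def metric_types(quota_name, prefix="quota"):
    """Return the usage, limit and ratio metric types for one quota."""
    # Assumption: quota names are lowercased before being used as a token.
    base = f"custom.googleapis.com/{prefix}/{quota_name.lower()}"
    return {kind: f"{base}/{kind}" for kind in ("usage", "limit", "ratio")}


firewall_metrics = metric_types("FIREWALLS")
for kind, metric_type in firewall_metrics.items():
    print(kind, metric_type)
```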

All custom metrics are associated with the global resource type and use the gauge kind.

Metric labels contain:

  • project set to the project of the quota
  • location set to the region of the quota (or global for project-level quotas)
  • quota containing the string representation of usage / limit for the quota, to provide an immediate reference when checking ratios; this can be easily turned off in code if reducing cardinality is needed
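To make the metric shape concrete, here is a minimal sketch of a ratio time series expressed as a plain dictionary following the JSON layout of the Cloud Monitoring API v3 `TimeSeries` resource. The helper name `ratio_time_series`, its parameters, and the exact formatting of the `quota` label are assumptions for illustration; the blueprint's own code may build the request differently.

```python
from datetime import datetime, timezone


def ratio_time_series(project, location, metric_type, usage, limit, monitoring_project):
    """Build a gauge time series for a quota ratio on the global resource."""
    end_time = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Guard against a zero limit to avoid a division error.
    ratio = usage / limit if limit else 0.0
    return {
        "metric": {
            "type": metric_type,
            "labels": {
                "project": project,
                "location": location,
                # Assumed format for the human-readable usage / limit label.
                "quota": f"{usage} / {limit}",
            },
        },
        "resource": {"type": "global", "labels": {"project_id": monitoring_project}},
        "metricKind": "GAUGE",
        "valueType": "DOUBLE",
        "points": [{"interval": {"endTime": end_time}, "value": {"doubleValue": ratio}}],
    }


series = ratio_time_series(
    project="my-project-id",
    location="global",
    metric_type="custom.googleapis.com/quota/firewalls/ratio",
    usage=25,
    limit=100,
    monitoring_project="my-monitoring-project",
)
```

Dropping the `quota` label from the dictionary is all it would take to reduce cardinality in a sketch like this one.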

Labels are set with the project id (which may differ from the monitoring workspace project) and the region (quotas that are not region-specific are labelled global). This is how the ratio metric for a quota looks in Metrics Explorer:

GCP Metrics Explorer, usage, limit and utilization view sample

Configuring resources

The project where resources are created is also the one where metrics will be written, and is configured via the project_id variable. The project can optionally be created by configuring the project_create_config variable.

The region, location of the bundle used to deploy the function, and scheduling frequency can also be configured via the relevant variables.

Configuring Cloud Function parameters

The quota_config variable mirrors the arguments accepted by the Python program, and allows configuring several different aspects of its behaviour:

  • quota_config.discover_root organization or folder to be used to discover all underlying projects to track quotas for, in organizations/nnnnn or folders/nnnnn format
  • quota_config.exclude do not generate metrics for quotas matching prefixes listed here
  • quota_config.include only generate metrics for quotas matching prefixes listed here
  • quota_config.projects projects to track quotas for, defaults to the project where metrics are stored; if projects are automatically discovered, those in this list are appended
  • quota_config.regions regions to track quotas for, defaults to the global region for project-level quotas
  • dry_run do not write actual metrics
  • verbose increase logging verbosity
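The include and exclude options above amount to prefix filters on quota names. The following is a minimal sketch of how such filtering could work; the `keep_quota` helper, the sample quota names, and the rule that exclusions are applied after inclusions are assumptions for illustration rather than the program's confirmed behaviour.

```python
def keep_quota(name, include=None, exclude=None):
    """Decide whether metrics should be generated for a quota name."""
    # With an include list, only names matching one of its prefixes pass.
    if include and not any(name.startswith(p) for p in include):
        return False
    # Names matching an exclude prefix are always dropped.
    if exclude and any(name.startswith(p) for p in exclude):
        return False
    return True


# Illustrative quota names only.
quotas = ["cpus", "cpus_all_regions", "firewalls", "networks", "ssd_total_gb"]
selected = [
    q for q in quotas
    if keep_quota(q, include=["cpus", "firewalls"], exclude=["cpus_all"])
]
print(selected)
```

With no include or exclude list, every quota passes through unchanged.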

The solution can also create basic monitoring alert policies, to demonstrate how to raise alerts when quota utilization goes over a predefined threshold. To enable them, set the alert_create variable to true and reapply main.tf after main.py has run at least once and quota monitoring metrics have been created.

Running the blueprint

Clone this repository or open it in Cloud Shell, then go through the following steps to create resources:

  • terraform init
  • terraform apply -var project_id=my-project-id

Variables

| name | description | type | required | default |
|---|---|---|---|---|
| project_id | Project id that references existing project. | string | ✓ | |
| alert_configs | Configure creation of monitoring alerts for specific quotas. Keys match quota names. | map(object({…})) | | {} |
| bundle_path | Path used to write the intermediate Cloud Function code bundle. | string | | "./bundle.zip" |
| name | Arbitrary string used to name created resources. | string | | "quota-monitor" |
| project_create_config | Create project instead of using an existing one. | object({…}) | | null |
| quota_config | Cloud function configuration. | object({…}) | | {} |
| region | Compute region used in the example. | string | | "europe-west1" |
| schedule_config | Schedule timer configuration in crontab format. | string | | "0 * * * *" |

Test

module "test" {
  source     = "./fabric/blueprints/cloud-operations/compute-quota-monitoring"
  name       = "name"
  project_id = "test"
  project_create_config = {
    billing_account = "12345-ABCDE-12345"
  }
}
# tftest modules=4 resources=19