# GKE Autopilot cluster module

This module offers a way to create and manage Google Kubernetes Engine (GKE) [Autopilot clusters](https://cloud.google.com/kubernetes-engine/docs/concepts/autopilot-overview). With sensible defaults based on best practices and the authors' experience as Google Cloud practitioners, the module accommodates many common use cases out of the box, without requiring verbose configuration.

<!-- BEGIN TOC -->
- [Examples](#examples)
  - [GKE Autopilot cluster](#gke-autopilot-cluster)
  - [Cloud DNS](#cloud-dns)
  - [Logging configuration](#logging-configuration)
  - [Monitoring configuration](#monitoring-configuration)
  - [Backup for GKE](#backup-for-gke)
- [Variables](#variables)
- [Outputs](#outputs)
<!-- END TOC -->

## Examples

### GKE Autopilot cluster

This example shows how to [create a GKE cluster in Autopilot mode](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-an-autopilot-cluster).

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = "myproject"
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network    = var.vpc.self_link
    subnetwork = var.subnet.self_link
    secondary_range_names = {
      pods     = "pods"
      services = "services"
    }
    master_authorized_ranges = {
      internal-vms = "10.0.0.0/8"
    }
    master_ipv4_cidr_block = "192.168.0.0/28"
  }
  private_cluster_config = {
    enable_private_endpoint = true
    master_global_access    = false
  }
  labels = {
    environment = "dev"
  }
}
# tftest modules=1 resources=1 inventory=basic.yaml
```

### Cloud DNS

> [!WARNING]
> [Cloud DNS is the only DNS provider for Autopilot clusters](https://cloud.google.com/kubernetes-engine/docs/concepts/service-discovery#cloud_dns) running version `1.25.9-gke.400` and later, and version `1.26.4-gke.500` and later. It is [pre-configured](https://cloud.google.com/kubernetes-engine/docs/resources/autopilot-standard-feature-comparison#feature-comparison) for those clusters. The following example *only* applies to Autopilot clusters running *earlier* versions.

This example shows how to [use Cloud DNS as a Kubernetes DNS provider](https://cloud.google.com/kubernetes-engine/docs/how-to/cloud-dns).

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {} # use default names "pods" and "services"
  }
  enable_features = {
    dns = {
      provider = "CLOUD_DNS"
      scope    = "CLUSTER_SCOPE"
      domain   = "gke.local"
    }
  }
}
# tftest modules=1 resources=1 inventory=dns.yaml
```

### Logging configuration

> [!NOTE]
> System and workload logs collection is pre-configured for Autopilot clusters and cannot be disabled.

This example shows how to [collect logs for the Kubernetes control plane components](https://cloud.google.com/stackdriver/docs/solutions/gke/installing). The logs for these components are not collected by default.

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {} # use default names "pods" and "services"
  }
  logging_config = {
    enable_api_server_logs         = true
    enable_scheduler_logs          = true
    enable_controller_manager_logs = true
  }
}
# tftest modules=1 resources=1 inventory=logging-config.yaml
```

### Monitoring configuration

> [!NOTE]
> [System metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-system-metrics) collection is pre-configured for Autopilot clusters and cannot be disabled.

> [!WARNING]
> GKE **workload metrics** is deprecated and has been removed in GKE 1.24 and later. It is replaced by [Google Cloud Managed Service for Prometheus](https://cloud.google.com/stackdriver/docs/managed-prometheus), which is Google's recommended way to monitor Kubernetes applications with Cloud Monitoring.

This example shows how to [configure collection of Kubernetes control plane metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-control-plane-metrics). These metrics are optional and are not collected by default.

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {} # use default names "pods" and "services"
  }
  monitoring_config = {
    enable_api_server_metrics         = true
    enable_controller_manager_metrics = true
    enable_scheduler_metrics          = true
  }
}
# tftest modules=1 resources=1 inventory=monitoring-config-control-plane.yaml
```

The next example shows how to [configure collection of kube state metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-ksm). These metrics are optional and are not collected by default.

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {} # use default names "pods" and "services"
  }
  monitoring_config = {
    enable_daemonset_metrics   = true
    enable_deployment_metrics  = true
    enable_hpa_metrics         = true
    enable_pod_metrics         = true
    enable_statefulset_metrics = true
    enable_storage_metrics     = true
    # Kube state metrics collection requires Google Cloud Managed Service for Prometheus,
    # which is enabled by default.
    # enable_managed_prometheus = true
  }
}
# tftest modules=1 resources=1 inventory=monitoring-config-kube-state.yaml
```

The *control plane metrics* and *kube state metrics* collection can be configured in a single `monitoring_config` block.

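As a minimal sketch of such a combined configuration, the block below enables the control plane metrics together with a subset of the kube state metrics in one `monitoring_config`. It reuses the `var.project_id`, `var.vpc`, and `var.subnet` inputs assumed by the examples above. The particular selection of metrics is illustrative only, and no test inventory exists for this block.

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {} # use default names "pods" and "services"
  }
  monitoring_config = {
    # Control plane metrics.
    enable_api_server_metrics         = true
    enable_controller_manager_metrics = true
    enable_scheduler_metrics          = true
    # Kube state metrics (illustrative subset); these rely on Google Cloud
    # Managed Service for Prometheus, which is enabled by default.
    enable_deployment_metrics = true
    enable_pod_metrics        = true
  }
}
```

Any attribute left out of `monitoring_config` keeps its default from the variable definition, so only the metrics you want to add need to be listed.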
### Backup for GKE

> [!NOTE]
> Although Backup for GKE can be enabled as an add-on when configuring your GKE clusters, it is a separate service from GKE.

[Backup for GKE](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/concepts/backup-for-gke) is a service for backing up and restoring workloads in GKE clusters. It has two components:

* A [Google Cloud API](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/reference/rest) that serves as the control plane for the service.
* A GKE add-on (the [Backup for GKE agent](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/concepts/backup-for-gke#agent_overview)) that must be enabled in each cluster for which you wish to perform backup and restore operations.

Backup for GKE is supported in GKE Autopilot clusters with [some restrictions](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/concepts/about-autopilot).

This example shows how to [enable Backup for GKE on a new Autopilot cluster](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/install#enable_on_a_new_cluster_optional) and [plan a set of backups](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/backup-plan).

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {}
  }
  backup_configs = {
    enable_backup_agent = true
    backup_plans = {
      "backup-1" = {
        region   = "europe-west2"
        schedule = "0 9 * * 1"
      }
    }
  }
}
# tftest modules=1 resources=2 inventory=backup.yaml
```
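Backup plans accept further optional attributes, listed in the `backup_configs` type in the Variables table. The sketch below is illustrative rather than a tested configuration: the plan name, namespace names, schedule, and retention value are assumptions, chosen only to show how the attributes fit together.

```hcl
module "cluster-1" {
  source     = "./fabric/modules/gke-cluster-autopilot"
  project_id = var.project_id
  name       = "cluster-1"
  location   = "europe-west1"
  vpc_config = {
    network               = var.vpc.self_link
    subnetwork            = var.subnet.self_link
    secondary_range_names = {}
  }
  backup_configs = {
    enable_backup_agent = true
    backup_plans = {
      # Hypothetical plan name and values, for illustration only.
      "backup-apps" = {
        region                = "europe-west1"
        schedule              = "0 3 * * *"
        namespaces            = ["app-1", "app-2"]
        include_secrets       = false
        include_volume_data   = true
        retention_policy_days = "30" # declared as a string in the variable type
      }
    }
  }
}
```

Here `include_secrets` is set to `false` explicitly because it defaults to `true`, and `namespaces` lists the namespaces this plan is meant to cover.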
<!-- BEGIN TFDOC -->
## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [location](variables.tf#L112) | Autopilot clusters are always regional. | <code>string</code> | ✓ | |
| [name](variables.tf#L189) | Cluster name. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L225) | Cluster project ID. | <code>string</code> | ✓ | |
| [vpc_config](variables.tf#L241) | VPC-level configuration. | <code title="object({ network = string subnetwork = string master_ipv4_cidr_block = optional(string) secondary_range_blocks = optional(object({ pods = string services = string })) secondary_range_names = optional(object({ pods = optional(string, &#34;pods&#34;) services = optional(string, &#34;services&#34;) })) master_authorized_ranges = optional(map(string)) stack_type = optional(string) })">object({…})</code> | ✓ | |
| [backup_configs](variables.tf#L17) | Configuration for Backup for GKE. | <code title="object({ enable_backup_agent = optional(bool, false) backup_plans = optional(map(object({ encryption_key = optional(string) include_secrets = optional(bool, true) include_volume_data = optional(bool, true) namespaces = optional(list(string)) region = string schedule = string retention_policy_days = optional(string) retention_policy_lock = optional(bool, false) retention_policy_delete_lock_days = optional(string) })), {}) })">object({…})</code> | | <code>{}</code> |
| [deletion_protection](variables.tf#L37) | Whether or not to allow Terraform to destroy the cluster. Unless this field is set to false in Terraform state, a terraform destroy or terraform apply that would delete the cluster will fail. | <code>bool</code> | | <code>true</code> |
| [description](variables.tf#L44) | Cluster description. | <code>string</code> | | <code>null</code> |
| [enable_addons](variables.tf#L50) | Addons enabled in the cluster (true means enabled). | <code title="object({ cloudrun = optional(bool, false) config_connector = optional(bool, false) istio = optional(object({ enable_tls = bool })) kalm = optional(bool, false) })">object({…})</code> | | <code>{}</code> |
| [enable_features](variables.tf#L64) | Enable cluster-level features. Certain features allow configuration. | <code title="object({ beta_apis = optional(list(string)) binary_authorization = optional(bool, false) cost_management = optional(bool, false) dns = optional(object({ provider = optional(string) scope = optional(string) domain = optional(string) })) database_encryption = optional(object({ state = string key_name = string })) gateway_api = optional(bool, false) groups_for_rbac = optional(string) l4_ilb_subsetting = optional(bool, false) mesh_certificates = optional(bool) pod_security_policy = optional(bool, false) allow_net_admin = optional(bool, false) resource_usage_export = optional(object({ dataset = string enable_network_egress_metering = optional(bool) enable_resource_consumption_metering = optional(bool) })) service_external_ips = optional(bool, true) tpu = optional(bool, false) upgrade_notifications = optional(object({ topic_id = optional(string) })) vertical_pod_autoscaling = optional(bool, false) })">object({…})</code> | | <code>{}</code> |
| [issue_client_certificate](variables.tf#L100) | Enable issuing client certificate. | <code>bool</code> | | <code>false</code> |
| [labels](variables.tf#L106) | Cluster resource labels. | <code>map(string)</code> | | <code>null</code> |
| [logging_config](variables.tf#L117) | Logging configuration. | <code title="object({ enable_api_server_logs = optional(bool, false) enable_scheduler_logs = optional(bool, false) enable_controller_manager_logs = optional(bool, false) })">object({…})</code> | | <code>{}</code> |
| [maintenance_config](variables.tf#L128) | Maintenance window configuration. | <code title="object({ daily_window_start_time = optional(string) recurring_window = optional(object({ start_time = string end_time = string recurrence = string })) maintenance_exclusions = optional(list(object({ name = string start_time = string end_time = string scope = optional(string) }))) })">object({…})</code> | | <code title="{ daily_window_start_time = &#34;03:00&#34; recurring_window = null maintenance_exclusion = [] }">{…}</code> |
| [min_master_version](variables.tf#L151) | Minimum version of the master, defaults to the version of the most recent official release. | <code>string</code> | | <code>null</code> |
| [monitoring_config](variables.tf#L157) | Monitoring configuration. System metrics collection cannot be disabled. Control plane metrics are optional. Kube state metrics are optional. Google Cloud Managed Service for Prometheus is enabled by default. | <code title="object({ enable_api_server_metrics = optional(bool, false) enable_controller_manager_metrics = optional(bool, false) enable_scheduler_metrics = optional(bool, false) enable_daemonset_metrics = optional(bool, false) enable_deployment_metrics = optional(bool, false) enable_hpa_metrics = optional(bool, false) enable_pod_metrics = optional(bool, false) enable_statefulset_metrics = optional(bool, false) enable_storage_metrics = optional(bool, false) enable_managed_prometheus = optional(bool, true) })">object({…})</code> | | <code>{}</code> |
| [node_config](variables.tf#L194) | Configuration for nodes and nodepools. | <code title="object({ boot_disk_kms_key = optional(string) service_account = optional(string) tags = optional(list(string)) })">object({…})</code> | | <code>{}</code> |
| [node_locations](variables.tf#L204) | Zones in which the cluster's nodes are located. | <code>list(string)</code> | | <code>[]</code> |
| [private_cluster_config](variables.tf#L211) | Private cluster configuration. | <code title="object({ enable_private_endpoint = optional(bool) master_global_access = optional(bool) peering_config = optional(object({ export_routes = optional(bool) import_routes = optional(bool) project_id = optional(string) })) })">object({…})</code> | | <code>null</code> |
| [release_channel](variables.tf#L230) | Release channel for GKE upgrades. Clusters created in the Autopilot mode must use a release channel. Choose between \"RAPID\", \"REGULAR\", and \"STABLE\". | <code>string</code> | | <code>"REGULAR"</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [ca_certificate](outputs.tf#L17) | Public certificate of the cluster (base64-encoded). | ✓ |
| [cluster](outputs.tf#L23) | Cluster resource. | ✓ |
| [endpoint](outputs.tf#L29) | Cluster endpoint. | |
| [id](outputs.tf#L34) | Fully qualified cluster ID. | |
| [location](outputs.tf#L39) | Cluster location. | |
| [master_version](outputs.tf#L44) | Master version. | |
| [name](outputs.tf#L49) | Cluster name. | |
| [notifications](outputs.tf#L54) | GKE Pub/Sub notifications topic. | |
| [self_link](outputs.tf#L59) | Cluster self link. | ✓ |
| [workload_identity_pool](outputs.tf#L65) | Workload identity pool. | |
<!-- END TFDOC -->
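
The outputs can be passed on to other providers or modules. As a hedged sketch, the snippet below configures the `kubernetes` provider from the `endpoint` and `ca_certificate` outputs of a `cluster-1` module instance like the ones in the examples above. The `google_client_config` data source and the provider arguments come from the Google and Kubernetes Terraform providers, not from this module. The endpoint must also be reachable from wherever Terraform runs, for instance from inside the VPC when `enable_private_endpoint` is set.

```hcl
# Access token of the identity running Terraform (Google provider data source).
data "google_client_config" "default" {}

provider "kubernetes" {
  host  = "https://${module.cluster-1.endpoint}"
  token = data.google_client_config.default.access_token
  # The ca_certificate output is base64-encoded, so decode it first.
  cluster_ca_certificate = base64decode(module.cluster-1.ca_certificate)
}
```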