350 lines
24 KiB
Markdown
350 lines
24 KiB
Markdown
# GKE Standard cluster module
|
|
|
|
This module offers a way to create and manage Google Kubernetes Engine (GKE) [Standard clusters](https://cloud.google.com/kubernetes-engine/docs/concepts/choose-cluster-mode#why-standard). With its sensible default settings based on best practices and authors' experience as Google Cloud practitioners, the module accommodates for many common use cases out-of-the-box, without having to rely on verbose configuration.
|
|
|
|
> [!IMPORTANT]
|
|
> This module should be used together with the [`gke-nodepool`](../gke-nodepool/) module because the default node pool is deleted upon cluster creation and cannot be re-created.
|
|
|
|
<!-- BEGIN TOC -->
|
|
- [Example](#example)
|
|
- [GKE Standard cluster](#gke-standard-cluster)
|
|
- [Enable Dataplane V2](#enable-dataplane-v2)
|
|
- [Managing GKE logs](#managing-gke-logs)
|
|
- [Monitoring configuration](#monitoring-configuration)
|
|
- [Disable GKE logs or metrics collection](#disable-gke-logs-or-metrics-collection)
|
|
- [Cloud DNS](#cloud-dns)
|
|
- [Backup for GKE](#backup-for-gke)
|
|
- [Automatic creation of new secondary ranges](#automatic-creation-of-new-secondary-ranges)
|
|
- [Variables](#variables)
|
|
- [Outputs](#outputs)
|
|
<!-- END TOC -->
|
|
|
|
## Example
|
|
|
|
### GKE Standard cluster
|
|
|
|
This example shows how to [create a zonal GKE cluster in Standard mode](https://cloud.google.com/kubernetes-engine/docs/how-to/creating-a-zonal-cluster).
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = "myproject"
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {
|
|
pods = "pods"
|
|
services = "services"
|
|
}
|
|
master_authorized_ranges = {
|
|
internal-vms = "10.0.0.0/8"
|
|
}
|
|
master_ipv4_cidr_block = "192.168.0.0/28"
|
|
}
|
|
max_pods_per_node = 32
|
|
private_cluster_config = {
|
|
enable_private_endpoint = true
|
|
master_global_access = false
|
|
}
|
|
labels = {
|
|
environment = "dev"
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1 inventory=basic.yaml
|
|
```
|
|
|
|
### Enable Dataplane V2
|
|
|
|
This example shows how to [create a zonal GKE Cluster with Dataplane V2 enabled](https://cloud.google.com/kubernetes-engine/docs/how-to/dataplane-v2).
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = "myproject"
|
|
name = "cluster-dataplane-v2"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {} # use default names "pods" and "services"
|
|
master_authorized_ranges = {
|
|
internal-vms = "10.0.0.0/8"
|
|
}
|
|
master_ipv4_cidr_block = "192.168.0.0/28"
|
|
}
|
|
private_cluster_config = {
|
|
enable_private_endpoint = true
|
|
master_global_access = false
|
|
}
|
|
enable_features = {
|
|
dataplane_v2 = true
|
|
fqdn_network_policy = true
|
|
workload_identity = true
|
|
}
|
|
labels = {
|
|
environment = "dev"
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1 inventory=dataplane-v2.yaml
|
|
```
|
|
|
|
### Managing GKE logs
|
|
|
|
This example shows you how to [control which logs are sent from your GKE cluster to Cloud Logging](https://cloud.google.com/stackdriver/docs/solutions/gke/installing).
|
|
|
|
When you create a new GKE cluster, [Cloud Operations for GKE](https://cloud.google.com/stackdriver/docs/solutions/gke) integration with Cloud Logging is enabled by default and [System logs](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-logs#what_logs) are collected. You can enable collection of several other [types of logs](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-logs#what_logs). The following example enables collection of *all* optional logs.
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = "myproject"
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {}
|
|
}
|
|
logging_config = {
|
|
enable_workloads_logs = true
|
|
enable_api_server_logs = true
|
|
enable_scheduler_logs = true
|
|
enable_controller_manager_logs = true
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1 inventory=logging-config-enable-all.yaml
|
|
```
|
|
|
|
### Monitoring configuration
|
|
|
|
This example shows how to [configure collection of Kubernetes control plane metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-control-plane-metrics). These metrics are optional and are not collected by default.
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = "myproject"
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {} # use default names "pods" and "services"
|
|
}
|
|
monitoring_config = {
|
|
enable_api_server_metrics = true
|
|
enable_controller_manager_metrics = true
|
|
enable_scheduler_metrics = true
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1 inventory=monitoring-config-control-plane.yaml
|
|
```
|
|
|
|
The next example shows how to [configure collection of kube state metrics](https://cloud.google.com/stackdriver/docs/solutions/gke/managing-metrics#enable-ksm). These metrics are optional and are not collected by default.
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = "myproject"
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {} # use default names "pods" and "services"
|
|
}
|
|
monitoring_config = {
|
|
enable_daemonset_metrics = true
|
|
enable_deployment_metrics = true
|
|
enable_hpa_metrics = true
|
|
enable_pod_metrics = true
|
|
enable_statefulset_metrics = true
|
|
enable_storage_metrics = true
|
|
# Kube state metrics collection requires Google Cloud Managed Service for Prometheus,
|
|
# which is enabled by default.
|
|
# enable_managed_prometheus = true
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1 inventory=monitoring-config-kube-state.yaml
|
|
```
|
|
|
|
The *control plane metrics* and *kube state metrics* collection can be configured in a single `monitoring_config` block.
|
|
|
|
### Disable GKE logs or metrics collection
|
|
|
|
> [!WARNING]
|
|
> If you've disabled Cloud Logging or Cloud Monitoring, GKE customer support
|
|
> is offered on a best-effort basis and might require additional effort
|
|
> from your engineering team.
|
|
|
|
This example shows how to fully disable logs collection on a zonal GKE Standard cluster. This is not recommended.
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = "myproject"
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {}
|
|
}
|
|
logging_config = {
|
|
enable_system_logs = false
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1 inventory=logging-config-disable-all.yaml
|
|
```
|
|
|
|
The next example shows how to fully disable metrics collection on a zonal GKE Standard cluster. This is not recommended.
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = "myproject"
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {}
|
|
}
|
|
monitoring_config = {
|
|
enable_system_metrics = false
|
|
enable_managed_prometheus = false
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1 inventory=monitoring-config-disable-all.yaml
|
|
```
|
|
|
|
### Cloud DNS
|
|
|
|
This example shows how to [use Cloud DNS as a Kubernetes DNS provider](https://cloud.google.com/kubernetes-engine/docs/how-to/cloud-dns) for GKE Standard clusters.
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = var.project_id
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {}
|
|
}
|
|
enable_features = {
|
|
dns = {
|
|
provider = "CLOUD_DNS"
|
|
scope = "CLUSTER_SCOPE"
|
|
domain = "gke.local"
|
|
}
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1 inventory=dns.yaml
|
|
```
|
|
|
|
### Backup for GKE
|
|
|
|
> [!NOTE]
|
|
> Although Backup for GKE can be enabled as an add-on when configuring your GKE clusters, it is a separate service from GKE.
|
|
|
|
[Backup for GKE](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/concepts/backup-for-gke) is a service for backing up and restoring workloads in GKE clusters. It has two components:
|
|
|
|
* A [Google Cloud API](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/reference/rest) that serves as the control plane for the service.
|
|
* A GKE add-on (the [Backup for GKE agent](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/concepts/backup-for-gke#agent_overview)) that must be enabled in each cluster for which you wish to perform backup and restore operations.
|
|
|
|
This example shows how to [enable Backup for GKE on a new zonal GKE Standard cluster](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/install#enable_on_a_new_cluster_optional) and [plan a set of backups](https://cloud.google.com/kubernetes-engine/docs/add-on/backup-for-gke/how-to/backup-plan).
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = var.project_id
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_names = {}
|
|
}
|
|
backup_configs = {
|
|
enable_backup_agent = true
|
|
backup_plans = {
|
|
"backup-1" = {
|
|
region = "europe-west2"
|
|
schedule = "0 9 * * 1"
|
|
applications = {
|
|
namespace-1 = ["app-1", "app-2"]
|
|
}
|
|
}
|
|
}
|
|
}
|
|
}
|
|
# tftest modules=1 resources=2 inventory=backup.yaml
|
|
```
|
|
|
|
### Automatic creation of new secondary ranges
|
|
|
|
You can use `var.vpc_config.secondary_range_blocks` to let GKE create new secondary ranges for the cluster. The example below reserves an available /14 block for pods and a /20 for services.
|
|
|
|
```hcl
|
|
module "cluster-1" {
|
|
source = "./fabric/modules/gke-cluster-standard"
|
|
project_id = var.project_id
|
|
name = "cluster-1"
|
|
location = "europe-west1-b"
|
|
vpc_config = {
|
|
network = var.vpc.self_link
|
|
subnetwork = var.subnet.self_link
|
|
secondary_range_blocks = {
|
|
pods = ""
|
|
services = "/20" # can be an empty string as well
|
|
}
|
|
}
|
|
}
|
|
# tftest modules=1 resources=1
|
|
```
|
|
<!-- BEGIN TFDOC -->
|
|
## Variables
|
|
|
|
| name | description | type | required | default |
|
|
|---|---|:---:|:---:|:---:|
|
|
| [location](variables.tf#L211) | Cluster zone or region. | <code>string</code> | ✓ | |
|
|
| [name](variables.tf#L322) | Cluster name. | <code>string</code> | ✓ | |
|
|
| [project_id](variables.tf#L358) | Cluster project id. | <code>string</code> | ✓ | |
|
|
| [vpc_config](variables.tf#L369) | VPC-level configuration. | <code title="object({ network = string subnetwork = string master_ipv4_cidr_block = optional(string) secondary_range_blocks = optional(object({ pods = string services = string })) secondary_range_names = optional(object({ pods = optional(string, "pods") services = optional(string, "services") })) master_authorized_ranges = optional(map(string)) stack_type = optional(string) })">object({…})</code> | ✓ | |
|
|
| [backup_configs](variables.tf#L17) | Configuration for Backup for GKE. | <code title="object({ enable_backup_agent = optional(bool, false) backup_plans = optional(map(object({ region = string applications = optional(map(list(string))) encryption_key = optional(string) include_secrets = optional(bool, true) include_volume_data = optional(bool, true) namespaces = optional(list(string)) schedule = optional(string) retention_policy_days = optional(number) retention_policy_lock = optional(bool, false) retention_policy_delete_lock_days = optional(number) })), {}) })">object({…})</code> | | <code>{}</code> |
|
|
| [cluster_autoscaling](variables.tf#L38) | Enable and configure limits for Node Auto-Provisioning with Cluster Autoscaler. | <code title="object({ autoscaling_profile = optional(string, "BALANCED") auto_provisioning_defaults = optional(object({ boot_disk_kms_key = optional(string) disk_size = optional(number) disk_type = optional(string, "pd-standard") image_type = optional(string) oauth_scopes = optional(list(string)) service_account = optional(string) management = optional(object({ auto_repair = optional(bool, true) auto_upgrade = optional(bool, true) })) shielded_instance_config = optional(object({ integrity_monitoring = optional(bool, true) secure_boot = optional(bool, false) })) upgrade_settings = optional(object({ blue_green = optional(object({ node_pool_soak_duration = optional(string) standard_rollout_policy = optional(object({ batch_percentage = optional(number) batch_node_count = optional(number) batch_soak_duration = optional(string) })) })) surge = optional(object({ max = optional(number) unavailable = optional(number) })) })) })) cpu_limits = optional(object({ min = number max = number })) mem_limits = optional(object({ min = number max = number })) gpu_resources = optional(list(object({ resource_type = string min = number max = number }))) })">object({…})</code> | | <code>null</code> |
|
|
| [deletion_protection](variables.tf#L115) | Whether or not to allow Terraform to destroy the cluster. Unless this field is set to false in Terraform state, a terraform destroy or terraform apply that would delete the cluster will fail. | <code>bool</code> | | <code>true</code> |
|
|
| [description](variables.tf#L122) | Cluster description. | <code>string</code> | | <code>null</code> |
|
|
| [enable_addons](variables.tf#L128) | Addons enabled in the cluster (true means enabled). | <code title="object({ cloudrun = optional(bool, false) config_connector = optional(bool, false) dns_cache = optional(bool, false) gce_persistent_disk_csi_driver = optional(bool, false) gcp_filestore_csi_driver = optional(bool, false) gcs_fuse_csi_driver = optional(bool, false) horizontal_pod_autoscaling = optional(bool, false) http_load_balancing = optional(bool, false) istio = optional(object({ enable_tls = bool })) kalm = optional(bool, false) network_policy = optional(bool, false) })">object({…})</code> | | <code title="{ horizontal_pod_autoscaling = true http_load_balancing = true }">{…}</code> |
|
|
| [enable_features](variables.tf#L152) | Enable cluster-level features. Certain features allow configuration. | <code title="object({ binary_authorization = optional(bool, false) cost_management = optional(bool, false) dns = optional(object({ provider = optional(string) scope = optional(string) domain = optional(string) })) database_encryption = optional(object({ state = string key_name = string })) dataplane_v2 = optional(bool, false) fqdn_network_policy = optional(bool, false) gateway_api = optional(bool, false) groups_for_rbac = optional(string) image_streaming = optional(bool, false) intranode_visibility = optional(bool, false) l4_ilb_subsetting = optional(bool, false) mesh_certificates = optional(bool) pod_security_policy = optional(bool, false) resource_usage_export = optional(object({ dataset = string enable_network_egress_metering = optional(bool) enable_resource_consumption_metering = optional(bool) })) shielded_nodes = optional(bool, false) tpu = optional(bool, false) upgrade_notifications = optional(object({ topic_id = optional(string) })) vertical_pod_autoscaling = optional(bool, false) workload_identity = optional(bool, true) })">object({…})</code> | | <code title="{ workload_identity = true }">{…}</code> |
|
|
| [issue_client_certificate](variables.tf#L199) | Enable issuing client certificate. | <code>bool</code> | | <code>false</code> |
|
|
| [labels](variables.tf#L205) | Cluster resource labels. | <code>map(string)</code> | | <code>null</code> |
|
|
| [logging_config](variables.tf#L216) | Logging configuration. | <code title="object({ enable_system_logs = optional(bool, true) enable_workloads_logs = optional(bool, false) enable_api_server_logs = optional(bool, false) enable_scheduler_logs = optional(bool, false) enable_controller_manager_logs = optional(bool, false) })">object({…})</code> | | <code>{}</code> |
|
|
| [maintenance_config](variables.tf#L237) | Maintenance window configuration. | <code title="object({ daily_window_start_time = optional(string) recurring_window = optional(object({ start_time = string end_time = string recurrence = string })) maintenance_exclusions = optional(list(object({ name = string start_time = string end_time = string scope = optional(string) }))) })">object({…})</code> | | <code title="{ daily_window_start_time = "03:00" recurring_window = null maintenance_exclusion = [] }">{…}</code> |
|
|
| [max_pods_per_node](variables.tf#L260) | Maximum number of pods per node in this cluster. | <code>number</code> | | <code>110</code> |
|
|
| [min_master_version](variables.tf#L266) | Minimum version of the master, defaults to the version of the most recent official release. | <code>string</code> | | <code>null</code> |
|
|
| [monitoring_config](variables.tf#L272) | Monitoring configuration. Google Cloud Managed Service for Prometheus is enabled by default. | <code title="object({ enable_system_metrics = optional(bool, true) enable_api_server_metrics = optional(bool, false) enable_controller_manager_metrics = optional(bool, false) enable_scheduler_metrics = optional(bool, false) enable_daemonset_metrics = optional(bool, false) enable_deployment_metrics = optional(bool, false) enable_hpa_metrics = optional(bool, false) enable_pod_metrics = optional(bool, false) enable_statefulset_metrics = optional(bool, false) enable_storage_metrics = optional(bool, false) enable_managed_prometheus = optional(bool, true) })">object({…})</code> | | <code>{}</code> |
|
|
| [node_config](variables.tf#L327) | Node-level configuration. | <code title="object({ boot_disk_kms_key = optional(string) service_account = optional(string) tags = optional(list(string)) })">object({…})</code> | | <code>{}</code> |
|
|
| [node_locations](variables.tf#L337) | Zones in which the cluster's nodes are located. | <code>list(string)</code> | | <code>[]</code> |
|
|
| [private_cluster_config](variables.tf#L344) | Private cluster configuration. | <code title="object({ enable_private_endpoint = optional(bool) master_global_access = optional(bool) peering_config = optional(object({ export_routes = optional(bool) import_routes = optional(bool) project_id = optional(string) })) })">object({…})</code> | | <code>null</code> |
|
|
| [release_channel](variables.tf#L363) | Release channel for GKE upgrades. | <code>string</code> | | <code>null</code> |
|
|
|
|
## Outputs
|
|
|
|
| name | description | sensitive |
|
|
|---|---|:---:|
|
|
| [ca_certificate](outputs.tf#L17) | Public certificate of the cluster (base64-encoded). | ✓ |
|
|
| [cluster](outputs.tf#L23) | Cluster resource. | ✓ |
|
|
| [endpoint](outputs.tf#L29) | Cluster endpoint. | |
|
|
| [id](outputs.tf#L34) | FUlly qualified cluster id. | |
|
|
| [location](outputs.tf#L39) | Cluster location. | |
|
|
| [master_version](outputs.tf#L44) | Master version. | |
|
|
| [name](outputs.tf#L49) | Cluster name. | |
|
|
| [notifications](outputs.tf#L54) | GKE PubSub notifications topic. | |
|
|
| [self_link](outputs.tf#L59) | Cluster self link. | ✓ |
|
|
| [workload_identity_pool](outputs.tf#L65) | Workload identity pool. | |
|
|
<!-- END TFDOC -->
|