GKE stateful blueprints (#2059)

* var definitions

* skeleton, untested

* fix errors, test with existing cluster

* test vpc creation, todo notes

* initial variables for AR and image

* initial variables for AR and image

* Add support for remote repositories to artifact-registry

* Add support for virtual repositories to artifact-registry

* Add support for extra config options to artifact-registry

* artifact registry module: add validation and precondition, fix tests

* ar module id/name

* registry

* service account and roles

* fetch pods, remove image prefix

* small changes

* use additive IAM at project level

* use additive IAM at project level

* configmaps

* manifests

* fix statefulset manifest

* service manifest

* fix configmap mode

* add todo

* job (broken)

* job

* wait on manifest, endpoints datasource

* fix job

* Fix local

* sa

* Update README.md

* Restructure gke bp

* refactor tree and infra variables

* no create test

* simplify cluster SA

* test cluster and vpc creation

* project creation fixes

* use iam_members variable

* nits

* readme with examples

* readme with examples

* outputs

* variables, provider configuration

* variables, manifests

* start cluster job

* fix redis cluster creation

Co-authored-by: Julio Castillo <juliocc@users.noreply.github.com>

* Revert changes in autopilot cluster

* Default templates path, use namespace for node names

* Update readmes

* Fix IAM bindings

* Make STABLE the default release channel

* Use Cloud DNS as default DNS provider

* Allow optional Cloud NAT creation

* Allow backup agent and proxy only subnet

* Work around terraform not short-circuiting logical operators

* Rename create variables to be more consistent with other blueprints

* Add basic features

* Update variable names

* Initial kafka JS

* Move providers to a new file

* Kafka / Strimzi

* First possibly working version for MySQL (with a lot of todos left)

* Explicitly use proxy repo + some other fixes

* Strimzi draft

* Refactor variables, use ClusterIP as pointer for mysql-router for bootstrapping

* Validate number of replicas, autoscale required number of running nodes to n/2+1

* Use separate service for bootstrap, do not recreate all resources on change of replicas count as the config is preserved in PV

* Test dual chart kafka

* Update chart for kafka

* Expose basic kafka configuration options

* Remove unused manifest

* Added batch blueprint

* Added README

* switch to kubectl_manifest

* Add README and support for static IP address

* Move namespace creation to helm

* Interpolate kafka variables

* Rename kafka-strimzi to kafka

* Added TUTORIAL for cloudshell for batch blueprint

* deleted tutorial

* Remove commented replace trigger

* Move to helm chart

* WIP of Cloud Shell tutorial for MySQL

* Rename folders

* Fix rename

* Update paths

* Unify styles

* Update paths

* Add Readme links

* Update mysql tutorial

* Fix path according to self-link

* Use relative path to cwd

* Fix service_account variable location

* Fix tfvars creation

* Restore some fixes for helm deployment

* Add cluster deletion_prevention

* Fixes for tutorial

* Update cluster docs

* Fixes to batch tutorial

* Bare bones readme for batch

* Update batch readme

* README fixes

* Fix README title for redis

* Fix Typos

* Make it easy to pass variables from autopilot-cluster to other modules

* Add connectivity test and bastion host

* updates to readme, and gpu fix

* Add versions.tf and README updates

* Fix typo

* Kafka and Redis README updates

* Update versions.tf

* Fixes

* Add boilerplate

* Fix linting

* Move mysql to separate branch

* Update cloud shell links

* Fix broken link

---------

Co-authored-by: Ludo <ludomagno@google.com>
Co-authored-by: Daniel Marzini <44803752+danielmarzini@users.noreply.github.com>
Co-authored-by: Wiktor Niesiobędzki <wiktorn@google.com>
Co-authored-by: Miren Esnaola <mirene@google.com>
Julio Castillo 2024-02-08 19:28:41 +01:00 committed by GitHub
parent da11396e3a
commit c42c4c141f

.gitignore

@ -57,4 +57,5 @@ blueprints/gke/autopilot/ansible/gssh.sh
blueprints/gke/autopilot/ansible/vars/vars.yaml
blueprints/gke/autopilot/bundle/monitoring/kustomization.yaml
blueprints/gke/autopilot/bundle/locust/kustomization.yaml
blueprints/gke/autopilot/bundle.tar.gz
blueprints/gke/autopilot/bundle.tar.gz
blueprints/gke/patterns/batch/job-*.yaml


@ -0,0 +1,5 @@
# GKE Jumpstart Blueprints
This directory includes several blueprints related to Google Kubernetes Engine (GKE), following Google recommendations and best practices. The blueprints in this directory split the deployment process into two stages: an initial infrastructure stage that provisions the cluster, and additional workload stages that deploy specific types of applications/workloads.
As a design rule, all the blueprints in this directory provide sensible defaults for most variables, while still delivering an enterprise-grade deployment with secure defaults and the ability to use existing resources typically found in enterprise environments.


@ -0,0 +1,112 @@
# GKE Autopilot Cluster Pattern
This blueprint illustrates how to use GKE features to deploy a secure cluster that meets Google's best practices. The cluster deployed by this blueprint can be used to deploy other blueprints such as [Redis](../redis-cluster), [Kafka](../kafka), [Kueue](../batch).
<!-- BEGIN TOC -->
- [Design Decisions](#design-decisions)
- [GKE Onboarding Best Practices](#gke-onboarding-best-practices)
- [Environment setup](#environment-setup)
- [Cluster configuration](#cluster-configuration)
- [Security](#security)
- [Networking](#networking)
- [Multitenancy](#multitenancy)
- [Monitoring](#monitoring)
- [Maintenance](#maintenance)
- [Variables](#variables)
- [Outputs](#outputs)
<!-- END TOC -->
## Design Decisions
The main purpose of this blueprint is to showcase how to use GKE features to deploy a secure Kubernetes cluster according to Google best practices, including:
- **No public IP addresses**: both the control plane and the nodes use private IP addresses. To simplify the deployment of workloads, we enable [Connect Gateway](https://cloud.google.com/anthos/multicluster-management/gateway) to securely access the control plane even from outside the cluster's VPC. We also use [Remote Repositories](https://cloud.google.com/artifact-registry/docs/repositories/remote-overview) to let the cluster download container images without requiring Internet egress configured in the cluster's VPC.
- We provide **reasonable but secure defaults** that the user can override. For example, by default we avoid deploying a Cloud NAT gateway, but it is possible to enable it with just a few changes to the configuration.
- **Bring your own infrastructure**: larger organizations might have teams dedicated to the provisioning and management of centralized infrastructure. This blueprint can create any required infrastructure (GCP project, VPC, Artifact Registry, etc.), or you can leverage existing resources by setting the appropriate variables. A minimal configuration that relies on the blueprint defaults is shown below.
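As a quick illustration, the following `terraform.tfvars` (values are placeholders) relies on the blueprint defaults to create a cluster and its VPC in an existing project; it mirrors the configuration used in the accompanying tutorials:
```tfvars
project_id   = "my-project-id" # placeholder, use an existing project
cluster_name = "cluster"
region       = "europe-west1"
cluster_create = {
  deletion_protection = false
}
vpc_create = {
  enable_cloud_nat = true
}
```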
## GKE Onboarding Best Practices
This Terraform blueprint helps you quickly implement most of the [GKE onboarding best practices](https://cloud.google.com/kubernetes-engine/docs/best-practices/onboarding#set-up-terraform) outlined in the official GKE documentation. In this section we describe the relevant decisions this blueprint simplifies.
### Environment setup
- Set up Terraform: you'll need to install Terraform to use this blueprint. Instructions are [available here](https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/getting_started).
- Terraform state storage: this blueprint doesn't automate this step, but it can easily be done by specifying a [backend](https://developer.hashicorp.com/terraform/language/settings/backends/gcs); see the sample configuration below.
- Create a metrics scope using Terraform: if you're creating a new project with this blueprint, you can enable metrics scope using the `metrics_scope` variable in the `project` module. Otherwise, metrics scope configuration is handled outside this blueprint.
- Set up Artifact Registry: by default a remote repository is created to allow the cluster to download container images.
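As an example of state storage, a minimal GCS backend sketch could look like the following (the bucket name is a placeholder for an existing bucket you control):
```hcl
terraform {
  backend "gcs" {
    bucket = "my-tf-state-bucket"    # placeholder, must already exist
    prefix = "gke/autopilot-cluster" # path of the state within the bucket
  }
}
```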
### Cluster configuration
This blueprint by default deploys an Autopilot cluster with private nodes and a private control plane. By using Autopilot, Google automatically handles node configuration, scaling, and security.
- Choose a mode of operation: this blueprint uses Autopilot clusters
- Isolate your cluster: this blueprint deploys a private cluster, with private control plane
- Configure backup for GKE: not enabled by default, but it can easily be turned on through the `backup_configs` attribute of the `gke-cluster-autopilot` module, exposed by this blueprint through the `cluster_create` variable (see the example below).
- Use Container-Optimized OS node images: Autopilot clusters always use Container-Optimized OS.
- Enable node auto-provisioning: automatically managed by Autopilot
- Separate kube-system Pods: automatically managed by Autopilot
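For example, Backup for GKE can be enabled at creation time through the blueprint's `cluster_create` variable, which maps to the module's `backup_configs` (a sketch, leaving every other option at its default):
```tfvars
cluster_create = {
  options = {
    enable_backup_agent = true
  }
}
```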
### Security
- Use the security posture dashboard: enabled by default in new clusters
- Use group authentication: not needed by this blueprint but can be enabled through the `enable_features.groups_for_rbac` variable of the `gke-cluster-autopilot` module.
- Use RBAC to restrict access to cluster resources: this blueprint deploys the underlying infrastructure; RBAC configuration is out of scope.
- Enable Shielded GKE Nodes: automatically managed by Autopilot
- Enable Workload Identity: automatically managed by Autopilot
- Enable security bulletin notifications: out of scope for this blueprint
- Use least privilege Google service accounts: this blueprint creates a new service account for the cluster
- Restrict network access to the control plane and nodes: this blueprint deploys a private cluster
- Use namespaces to restrict access to cluster resources: this blueprint deploys the underlying infrastructure; namespace handling is left to applications.
### Networking
- Create a custom mode VPC: this blueprint can optionally deploy a new custom VPC with a single subnet. Otherwise, an existing VPC and subnet can be used.
- Create a proxy-only subnet: the `vpc_create` variable allows the creation of a proxy-only subnet, if needed.
- Configure Shared VPC: by default a new VPC is created within the project, but a Shared VPC can be used when the blueprint handles project creation.
- Connect the cluster's VPC network to an on-premises network: skipped, out of scope for this blueprint.
- Enable Cloud NAT: the `vpc_create` variable allows the creation of Cloud NAT, if needed (see the example below).
- Configure Cloud DNS for GKE: clusters created by this blueprint use Cloud DNS (cluster scope) as the DNS provider, configured through the `enable_features.dns` variable of the `gke-cluster-autopilot` module.
- Configure NodeLocal DNSCache: not needed by this blueprint.
- Create firewall rules: only the default rules created by GKE are used.
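For instance, Cloud NAT and a proxy-only subnet can be requested when the blueprint creates the VPC (the CIDR below is only an illustrative value):
```tfvars
vpc_create = {
  enable_cloud_nat  = true
  proxy_only_subnet = "10.0.4.0/24" # illustrative range
}
```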
### Multitenancy
For simplicity, multi-tenancy is not used in this blueprint.
### Monitoring
- Configure GKE alert policies: out of scope for this blueprint
- Enable Google Cloud Managed Service for Prometheus: automatically managed by Autopilot
- Configure control plane metrics: enabled by default
- Enable metrics packages: out of scope for this blueprint
### Maintenance
- Create environments: out of scope for this blueprint
- Subscribe to Pub/Sub events: out of scope for this blueprint
- Enroll in release channels: the REGULAR channel is used by default
- Configure maintenance windows: a daily maintenance window starting at 01:00 is configured by default and can be customized through the `maintenance_config` of the `gke-cluster-autopilot` module.
- Set Compute Engine quotas: out of scope for this blueprint
- Configure cost controls: TBD
- Configure billing alerts: out of scope for this blueprint
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [cluster_name](variables.tf#L42) | Name of new or existing cluster. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L70) | Project id of existing or created project. | <code>string</code> | ✓ | |
| [region](variables.tf#L75) | Region used for cluster and network resources. | <code>string</code> | ✓ | |
| [cluster_create](variables.tf#L17) | Cluster configuration for newly created cluster. Set to null to use existing cluster, or create using defaults in new project. | <code title="object&#40;&#123;&#10; deletion_protection &#61; optional&#40;bool, true&#41;&#10; labels &#61; optional&#40;map&#40;string&#41;&#41;&#10; master_authorized_ranges &#61; optional&#40;map&#40;string&#41;, &#123;&#10; rfc-1918-10-8 &#61; &#34;10.0.0.0&#47;8&#34;&#10; &#125;&#41;&#10; master_ipv4_cidr_block &#61; optional&#40;string, &#34;172.16.255.0&#47;28&#34;&#41;&#10; vpc &#61; optional&#40;object&#40;&#123;&#10; id &#61; string&#10; subnet_id &#61; string&#10; secondary_range_names &#61; optional&#40;object&#40;&#123;&#10; pods &#61; optional&#40;string, &#34;pods&#34;&#41;&#10; services &#61; optional&#40;string, &#34;services&#34;&#41;&#10; &#125;&#41;, &#123;&#125;&#41;&#10; &#125;&#41;&#41;&#10; options &#61; optional&#40;object&#40;&#123;&#10; release_channel &#61; optional&#40;string, &#34;REGULAR&#34;&#41;&#10; enable_backup_agent &#61; optional&#40;bool, false&#41;&#10; &#125;&#41;, &#123;&#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [fleet_project_id](variables.tf#L47) | GKE Fleet project id. If null cluster project will also be used for fleet. | <code>string</code> | | <code>null</code> |
| [prefix](variables.tf#L53) | Prefix used for resource names. | <code>string</code> | | <code>&#34;jump-0&#34;</code> |
| [project_create](variables.tf#L60) | Project configuration for newly created project. Leave null to use existing project. Project creation forces VPC and cluster creation. | <code title="object&#40;&#123;&#10; billing_account &#61; string&#10; parent &#61; optional&#40;string&#41;&#10; shared_vpc_host &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [registry_create](variables.tf#L80) | Create remote Docker Artifact Registry. | <code>bool</code> | | <code>true</code> |
| [vpc_create](variables.tf#L86) | Project configuration for newly created VPC. Leave null to use existing VPC, or defaults when project creation is required. | <code title="object&#40;&#123;&#10; name &#61; optional&#40;string&#41;&#10; subnet_name &#61; optional&#40;string&#41;&#10; primary_range_nodes &#61; optional&#40;string, &#34;10.0.0.0&#47;24&#34;&#41;&#10; secondary_range_pods &#61; optional&#40;string, &#34;10.16.0.0&#47;20&#34;&#41;&#10; secondary_range_services &#61; optional&#40;string, &#34;10.32.0.0&#47;24&#34;&#41;&#10; enable_cloud_nat &#61; optional&#40;bool, false&#41;&#10; proxy_only_subnet &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [created_resources](outputs.tf#L17) | IDs of the resources created, if any. | |
| [credentials_config](outputs.tf#L44) | Configure how Terraform authenticates to the cluster. | |
| [fleet_host](outputs.tf#L51) | Fleet Connect Gateway host that can be used to configure the GKE provider. | |
| [get_credentials](outputs.tf#L56) | Run one of these commands to get cluster credentials. Credentials via fleet allow reaching private clusters without direct connectivity. | |
| [region](outputs.tf#L70) | Region used for cluster and network resources. | |
<!-- END TFDOC -->


@ -0,0 +1,134 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
locals {
_cluster_sa = (
local.cluster_create
? module.cluster-service-account.0.email
: data.google_container_cluster.cluster.0.node_config.0.service_account
)
cluster_sa = (
local._cluster_sa == "default"
? module.project.service_accounts.default.compute
: local._cluster_sa
)
cluster_sa_roles = [
"roles/artifactregistry.reader",
"roles/logging.logWriter",
"roles/monitoring.metricWriter",
"roles/monitoring.viewer",
"roles/stackdriver.resourceMetadata.writer"
]
cluster_vpc = (
local.use_shared_vpc || !local.vpc_create
# cluster variable configures networking
? {
network = try(
var.cluster_create.vpc.id, null
)
secondary_range_names = try(
var.cluster_create.vpc.secondary_range_names, null
)
subnet = try(
var.cluster_create.vpc.subnet_id, null
)
}
# VPC creation configures networking
: {
network = module.vpc.0.id
secondary_range_names = { pods = "pods", services = "services" }
subnet = values(module.vpc.0.subnet_ids)[0]
}
)
}
data "google_container_cluster" "cluster" {
count = !local.cluster_create ? 1 : 0
project = var.project_id
location = var.region
name = var.cluster_name
}
module "cluster-service-account" {
source = "../../../../modules/iam-service-account"
count = local.cluster_create ? 1 : 0
project_id = module.project.project_id
name = var.prefix
}
module "cluster" {
source = "../../../../modules/gke-cluster-autopilot"
count = local.cluster_create ? 1 : 0
project_id = module.project.project_id
deletion_protection = var.cluster_create.deletion_protection
name = var.cluster_name
location = var.region
vpc_config = {
network = local.cluster_vpc.network
subnetwork = local.cluster_vpc.subnet
secondary_range_names = local.cluster_vpc.secondary_range_names
master_authorized_ranges = var.cluster_create.master_authorized_ranges
master_ipv4_cidr_block = var.cluster_create.master_ipv4_cidr_block
}
private_cluster_config = {
enable_private_endpoint = true
master_global_access = true
}
node_config = {
service_account = module.cluster-service-account.0.email
}
labels = var.cluster_create.labels
release_channel = var.cluster_create.options.release_channel
backup_configs = {
enable_backup_agent = var.cluster_create.options.enable_backup_agent
}
enable_features = {
dns = {
provider = "CLOUD_DNS"
scope = "CLUSTER_SCOPE"
domain = "cluster.local"
}
cost_management = true
gateway_api = true
}
monitoring_config = {
enable_api_server_metrics = true
enable_controller_manager_metrics = true
enable_scheduler_metrics = true
}
logging_config = {
enable_api_server_logs = true
enable_scheduler_logs = true
enable_controller_manager_logs = true
}
maintenance_config = {
daily_window_start_time = "01:00"
}
}
check "cluster_networking" {
assert {
condition = (
local.use_shared_vpc
? (
try(var.cluster_create.vpc.id, null) != null &&
try(var.cluster_create.vpc.subnet_id, null) != null
)
: true
)
error_message = "Cluster network and subnetwork are required in shared VPC mode."
}
}


@ -0,0 +1,168 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
locals {
cluster_create = var.cluster_create != null || local.vpc_create
create_nat = local.vpc_create && try(var.vpc_create.enable_cloud_nat, false) == true
vpc_create = (
!local.use_shared_vpc && (
var.vpc_create != null || var.project_create != null
)
)
fleet_host = join("", [
"https://connectgateway.googleapis.com/v1/",
"projects/${local.fleet_project.number}/",
"locations/global/gkeMemberships/${var.cluster_name}"
])
fleet_project = (
var.fleet_project_id == null
? {
project_id = var.project_id
number = module.project.number
}
: {
project_id = var.fleet_project_id
number = module.fleet-project.0.number
}
)
proxy_only_subnet = (local.vpc_create && try(var.vpc_create.proxy_only_subnet, null) != null) ? [
{
ip_cidr_range = var.vpc_create.proxy_only_subnet
name = "proxy"
region = var.region
active = true
}
] : null
use_shared_vpc = (
try(var.project_create.shared_vpc_host, null) != null
)
}
module "project" {
source = "../../../../modules/project"
parent = try(var.project_create.parent, null)
billing_account = try(var.project_create.billing_account, null)
name = var.project_id
project_create = var.project_create != null
services = compact([
"anthos.googleapis.com",
var.registry_create ? "artifactregistry.googleapis.com" : null,
"cloudresourcemanager.googleapis.com",
"connectgateway.googleapis.com",
"container.googleapis.com",
"gkeconnect.googleapis.com",
"gkehub.googleapis.com",
"stackdriver.googleapis.com"
])
shared_vpc_service_config = !local.use_shared_vpc ? null : {
attach = true
host_project = var.project_create.shared_vpc_host
# grant required roles on the host project to service identities
service_identity_iam = {
"roles/compute.networkUser" = [
"cloudservices", "container-engine"
]
"roles/container.hostServiceAgentUser" = [
"container-engine"
]
}
}
iam_bindings_additive = merge(
# allow GKE fleet service identity to manage clusters in this project
{
gkehub-robot = {
role = "roles/gkehub.serviceAgent"
member = (
var.fleet_project_id == null
? "serviceAccount:${module.project.service_accounts.robots.gkehub}"
: "serviceAccount:${module.fleet-project.0.service_accounts.robots.gkehub}"
)
}
},
# grant required roles to GKE node service account
{
for r in local.cluster_sa_roles : "gke-sa-${r}" => {
role = r
member = "serviceAccount:${local.cluster_sa}"
}
}
)
}
module "vpc" {
source = "../../../../modules/net-vpc"
count = local.vpc_create ? 1 : 0
project_id = module.project.project_id
name = coalesce(
try(var.vpc_create.name, null), var.prefix
)
subnets = [{
name = coalesce(
try(var.vpc_create.subnet_name, null), "${var.prefix}-default"
)
region = var.region
ip_cidr_range = try(
var.vpc_create.primary_range_nodes, "10.0.0.0/24"
)
secondary_ip_ranges = {
pods = try(
var.vpc_create.secondary_range_pods, "10.16.0.0/20"
)
services = try(
var.vpc_create.secondary_range_services, "10.32.0.0/24"
)
}
}]
subnets_proxy_only = local.proxy_only_subnet
}
module "fleet-project" {
source = "../../../../modules/project"
count = var.fleet_project_id == null ? 0 : 1
name = var.fleet_project_id
project_create = false
}
module "fleet" {
source = "../../../../modules/gke-hub"
project_id = local.fleet_project.project_id
clusters = {
(var.cluster_name) = (
var.cluster_create != null
? module.cluster.0.id
: "projects/${var.project_id}/locations/${var.region}/clusters/${var.cluster_name}"
)
}
}
module "registry" {
source = "../../../../modules/artifact-registry"
count = var.registry_create ? 1 : 0
project_id = module.project.project_id
location = var.region
name = var.prefix
format = { docker = {} }
mode = { remote = true }
}
module "nat" {
source = "../../../../modules/net-cloudnat"
count = local.create_nat ? 1 : 0
project_id = module.project.project_id
region = var.region
name = "default"
router_network = local.cluster_vpc.network
}


@ -0,0 +1,73 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
output "created_resources" {
description = "IDs of the resources created, if any."
value = merge(
var.project_create == null ? {} : {
project = module.project.project_id
},
!local.vpc_create ? {} : {
subnet_id = one(values(module.vpc.0.subnet_ids))
vpc_id = module.vpc.0.id
},
!var.registry_create ? {} : {
registry = module.registry.0.image_path
},
!local.cluster_create ? {} : {
cluster = module.cluster.0.id
node_service_account = module.cluster-service-account.0.email
},
!local.create_nat ? {} : {
router = module.nat.0.id
cloud_nat = module.nat.0.router.id
},
local.proxy_only_subnet == null ? {} : {
proxy_only_subnet = one(values(module.vpc.0.subnets_proxy_only)).id
},
)
}
output "credentials_config" {
description = "Configure how Terraform authenticates to the cluster."
value = {
fleet_host = local.fleet_host
}
}
output "fleet_host" {
description = "Fleet Connect Gateway host that can be used to configure the GKE provider."
value = local.fleet_host
}
output "get_credentials" {
description = "Run one of these commands to get cluster credentials. Credentials via fleet allow reaching private clusters without no direct connectivity."
value = {
direct = join("", [
"gcloud container clusters get-credentials ${var.cluster_name} ",
"--project ${var.project_id} --location ${var.region}"
])
fleet = join("", [
"gcloud container fleet memberships get-credentials ${var.cluster_name}",
" --project ${var.project_id}"
])
}
}
output "region" {
description = "Region used for cluster and network resources."
value = var.region
}


@ -0,0 +1,98 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
variable "cluster_create" {
description = "Cluster configuration for newly created cluster. Set to null to use existing cluster, or create using defaults in new project."
type = object({
deletion_protection = optional(bool, true)
labels = optional(map(string))
master_authorized_ranges = optional(map(string), {
rfc-1918-10-8 = "10.0.0.0/8"
})
master_ipv4_cidr_block = optional(string, "172.16.255.0/28")
vpc = optional(object({
id = string
subnet_id = string
secondary_range_names = optional(object({
pods = optional(string, "pods")
services = optional(string, "services")
}), {})
}))
options = optional(object({
release_channel = optional(string, "REGULAR")
enable_backup_agent = optional(bool, false)
}), {})
})
default = null
}
variable "cluster_name" {
description = "Name of new or existing cluster."
type = string
}
variable "fleet_project_id" {
description = "GKE Fleet project id. If null cluster project will also be used for fleet."
type = string
default = null
}
variable "prefix" {
description = "Prefix used for resource names."
type = string
nullable = false
default = "jump-0"
}
variable "project_create" {
description = "Project configuration for newly created project. Leave null to use existing project. Project creation forces VPC and cluster creation."
type = object({
billing_account = string
parent = optional(string)
shared_vpc_host = optional(string)
})
default = null
}
variable "project_id" {
description = "Project id of existing or created project."
type = string
}
variable "region" {
description = "Region used for cluster and network resources."
type = string
}
variable "registry_create" {
description = "Create remote Docker Artifact Registry."
type = bool
default = true
}
variable "vpc_create" {
description = "Project configuration for newly created VPC. Leave null to use existing VPC, or defaults when project creation is required."
type = object({
name = optional(string)
subnet_name = optional(string)
primary_range_nodes = optional(string, "10.0.0.0/24")
secondary_range_pods = optional(string, "10.16.0.0/20")
secondary_range_services = optional(string, "10.32.0.0/24")
enable_cloud_nat = optional(bool, false)
proxy_only_subnet = optional(string)
})
default = null
}


@ -0,0 +1,27 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
required_version = ">= 1.7.0"
required_providers {
google = {
source = "hashicorp/google"
version = ">= 5.11.0, < 6.0.0" # tftest
}
google-beta = {
source = "hashicorp/google-beta"
version = ">= 5.11.0, < 6.0.0" # tftest
}
}
}


@ -0,0 +1,62 @@
# Batch Processing on GKE with Kueue
<!-- BEGIN TOC -->
- [Introduction](#introduction)
- [Requirements](#requirements)
- [Cluster authentication](#cluster-authentication)
- [Kueue Configuration](#kueue-configuration)
- [Sample Configuration](#sample-configuration)
- [Variables](#variables)
<!-- END TOC -->
## Introduction
<a href="https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git&cloudshell_tutorial=batch/tutorial.md&cloudshell_git_branch=master&cloudshell_workspace=blueprints/gke/patterns&show=ide%2Cterminal">
<img width="200px" src="../../../../assets/images/cloud-shell-button.png">
</a>
This blueprint shows how to deploy a batch system using [Kueue](https://kueue.sigs.k8s.io/docs/overview/) to perform job queuing on Google Kubernetes Engine (GKE) using Terraform.
Kueue is a Cloud Native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.
## Requirements
This blueprint assumes the GKE cluster already exists. We recommend using the accompanying [Autopilot Cluster Pattern](../autopilot-cluster) to deploy a cluster according to Google's best practices. Once you have the cluster up-and-running, you can use this blueprint to deploy Kueue in it.
The Kueue manifests use container images hosted by registry.k8s.io, which means that the subnet where the GKE cluster is deployed needs to have Internet connectivity to download the images. If you're using the provided [Autopilot Cluster Pattern](../autopilot-cluster), you can set the `enable_cloud_nat` option of the `vpc_create` variable.
## Cluster authentication
Once you have a cluster with Internet connectivity, create a `terraform.tfvars` and set up the `credentials_config` variable. We recommend using Anthos Fleet to simplify accessing the control plane.
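For example, if the cluster was created with the accompanying autopilot-cluster blueprint, its `fleet_host` output can be used directly (the project number below is a placeholder):
```tfvars
credentials_config = {
  fleet_host = "https://connectgateway.googleapis.com/v1/projects/123456789012/locations/global/gkeMemberships/cluster"
}
```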
## Kueue Configuration
Only two variables are available to control Kueue's configuration:
- `team_namespaces` which controls the namespaces used by different teams to run jobs.
- `kueue_namespace` which controls the namespace where Kueue's own resources are deployed.
Any other configuration can be applied by directly modifying the YAML manifests under the [manifest-templates](manifest-templates) directory.
## Sample Configuration
The following template can be used as a starting point for your `terraform.tfvars`:
```tfvars
credentials_config = {
kubeconfig = {
path = "~/.kube/config"
}
}
team_namespaces = [
"team-a",
"team-b"
]
```
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [credentials_config](variables.tf#L17) | Configure how Terraform authenticates to the cluster. | <code title="object&#40;&#123;&#10; fleet_host &#61; optional&#40;string&#41;&#10; kubeconfig &#61; optional&#40;object&#40;&#123;&#10; context &#61; optional&#40;string&#41;&#10; path &#61; optional&#40;string, &#34;&#126;&#47;.kube&#47;config&#34;&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
| [kueue_namespace](variables.tf#L36) | Namespace used to deploy Kueue resources. | <code>string</code> | | <code>&#34;kueue-system&#34;</code> |
| [team_namespaces](variables.tf#L43) | Namespaces of the teams running jobs in the clusters. | <code>list&#40;string&#41;</code> | | <code title="&#91;&#10; &#34;team-a&#34;,&#10; &#34;team-b&#34;&#10;&#93;">&#91;&#8230;&#93;</code> |
| [templates_path](variables.tf#L53) | Path where manifest templates will be read from. Set to null to use the default manifests. | <code>string</code> | | <code>null</code> |
<!-- END TFDOC -->


@ -0,0 +1,26 @@
#!/bin/bash
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# All arguments except the last are manifest files to create on each iteration
FILE_POSITIONS=$(($# - 1))
# The last argument is the sleep interval in seconds between iterations
SLEEP_TIME=${!#}
while :
do
for i in $(seq 1 $FILE_POSITIONS); do
kubectl create -f ${!i}
done
sleep ${SLEEP_TIME:-10}
done


@ -0,0 +1,30 @@
# skip boilerplate check
apiVersion: batch/v1
kind: Job
metadata:
namespace: team-a # Job under team-a namespace
generateName: sample-job-
annotations:
kueue.x-k8s.io/queue-name: local-queue # Point to the LocalQueue
spec:
ttlSecondsAfterFinished: 60 # Job will be deleted after 60 seconds
parallelism: 3 # This Job will have 3 replicas running at the same time
completions: 3 # This Job requires 3 completions
suspend: true # Set to true to allow Kueue to control the Job when it starts
template:
spec:
containers:
- name: dummy-job
image: gcr.io/k8s-staging-perf-tests/sleep:latest
args: ["10s"] # Sleep for 10 seconds
resources:
requests:
cpu: "500m"
memory: "512Mi"
ephemeral-storage: "512Mi"
limits:
cpu: "500m"
memory: "512Mi"
ephemeral-storage: "512Mi"
restartPolicy: Never


@ -0,0 +1,30 @@
# skip boilerplate check
apiVersion: batch/v1
kind: Job
metadata:
namespace: team-b # Job under team-b namespace
generateName: sample-job-
annotations:
kueue.x-k8s.io/queue-name: local-queue # Point to the LocalQueue
spec:
ttlSecondsAfterFinished: 60 # Job will be deleted after 60 seconds
parallelism: 3 # This Job will have 3 replicas running at the same time
completions: 3 # This Job requires 3 completions
suspend: true # Set to true to allow Kueue to control the Job when it starts
template:
spec:
containers:
- name: dummy-job
image: gcr.io/k8s-staging-perf-tests/sleep:latest
args: ["10s"] # Sleep for 10 seconds
resources:
requests:
cpu: "500m"
memory: "512Mi"
ephemeral-storage: "512Mi"
limits:
cpu: "500m"
memory: "512Mi"
ephemeral-storage: "512Mi"
restartPolicy: Never


@ -0,0 +1,81 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
locals {
  wl_templates_path = (
    var.templates_path == null
    ? "${path.module}/manifest-templates"
    : pathexpand(var.templates_path)
  )
}
resource "kubectl_manifest" "kueue_namespace_manifest" {
yaml_body = <<EOT
apiVersion: v1
kind: Namespace
metadata:
labels:
control-plane: controller-manager
name: ${var.kueue_namespace}
EOT
}
data "kubectl_file_documents" "kueue_docs" {
content = file("${local.wl_templates_path}/kueue.yaml")
}
data "kubectl_path_documents" "cluster_resources_docs" {
pattern = "${local.wl_templates_path}/cluster-resources/*.yaml"
}
resource "kubectl_manifest" "kueue_manifest" {
for_each = data.kubectl_file_documents.kueue_docs.manifests
yaml_body = each.value
override_namespace = var.kueue_namespace
depends_on = [kubectl_manifest.kueue_namespace_manifest]
}
resource "kubectl_manifest" "cluster_resources_manifests" {
for_each = toset(data.kubectl_path_documents.cluster_resources_docs.documents)
yaml_body = each.value
depends_on = [kubectl_manifest.kueue_manifest]
}
resource "kubectl_manifest" "team_namespace_manifests" {
for_each = toset(var.team_namespaces)
yaml_body = <<EOT
apiVersion: v1
kind: Namespace
metadata:
name: ${each.value}
EOT
}
resource "kubectl_manifest" "local_queues_manifests" {
for_each = toset(var.team_namespaces)
yaml_body = file("${local.wl_templates_path}/team-resources/local-queue.yaml")
override_namespace = each.value
depends_on = [
kubectl_manifest.cluster_resources_manifests,
kubectl_manifest.team_namespace_manifests
]
}
resource "local_file" "job_manifest_files" {
for_each = toset(var.team_namespaces)
content = templatefile("${local.wl_templates_path}/team-resources/job.yaml", {
namespace = each.value
})
filename = "${path.module}/job-${each.value}.yaml"
}


@ -0,0 +1,32 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
name: cluster-queue
spec:
namespaceSelector: {} # Available to all namespaces
queueingStrategy: BestEffortFIFO # Default queueing strategy
resourceGroups:
- coveredResources: ["cpu", "memory", "ephemeral-storage"]
flavors:
- name: "default-flavor"
resources:
- name: "cpu"
nominalQuota: 10
- name: "memory"
nominalQuota: 10Gi
- name: "ephemeral-storage"
nominalQuota: 10Gi


@ -0,0 +1,20 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# [START gke_batch_kueue_intro_flavors]
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
name: default-flavor # This ResourceFlavor will be used for all the resources
# [END gke_batch_kueue_intro_flavors]

File diff suppressed because it is too large.


@ -0,0 +1,42 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: batch/v1
kind: Job
metadata:
namespace: ${namespace} # Job under the team namespace
generateName: sample-job-
annotations:
kueue.x-k8s.io/queue-name: local-queue # Point to the LocalQueue
spec:
ttlSecondsAfterFinished: 60 # Job will be deleted after 60 seconds
parallelism: 3 # This Job will have 3 replicas running at the same time
completions: 3 # This Job requires 3 completions
suspend: true # Set to true to allow Kueue to control the Job when it starts
template:
spec:
containers:
- name: dummy-job
image: gcr.io/k8s-staging-perf-tests/sleep:latest
args: ["10s"] # Sleep for 10 seconds
resources:
requests:
cpu: "500m"
memory: "512Mi"
ephemeral-storage: "512Mi"
limits:
cpu: "500m"
memory: "512Mi"
ephemeral-storage: "512Mi"
restartPolicy: Never


@ -0,0 +1,21 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
namespace: team # Placeholder, overridden per team via override_namespace
name: local-queue
spec:
clusterQueue: cluster-queue # Point to the ClusterQueue


@ -0,0 +1,36 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
data "google_client_config" "identity" {
count = var.credentials_config.fleet_host != null ? 1 : 0
}
provider "kubectl" {
config_path = (
var.credentials_config.kubeconfig == null
? null
: pathexpand(var.credentials_config.kubeconfig.path)
)
config_context = try(
var.credentials_config.kubeconfig.context, null
)
host = (
var.credentials_config.fleet_host == null
? null
: var.credentials_config.fleet_host
)
token = try(data.google_client_config.identity.0.access_token, null)
}


@ -0,0 +1,215 @@
# Deploy a batch system using Kueue
This tutorial shows you how to deploy a batch system using Kueue to perform Job queueing on Google Kubernetes Engine (GKE) using Terraform.
Jobs are applications that run to completion, such as machine learning, rendering, simulation, analytics, CI/CD, and similar workloads.
Kueue is a Cloud Native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.
Kueue has the following characteristics:
* It is optimized for cloud architectures, where resources are heterogeneous, interchangeable, and scalable.
* It provides a set of APIs to manage elastic quotas and manage Job queueing.
* It does not re-implement existing functionality such as autoscaling, pod scheduling, or Job lifecycle management.
* Kueue has built-in support for the Kubernetes batch/v1 Job API.
* It can integrate with other job APIs.
* Kueue refers to jobs defined with any API as Workloads, to avoid the confusion with the specific Kubernetes Job API.
When working with Kueue there are a few concepts that one needs to be familiar with:
* ResourceFlavor
An object that you can define to describe what resources are available in a cluster. Typically, it is associated with the characteristics of a group of Nodes: availability, pricing, architecture, models, etc.
* ClusterQueue
A cluster-scoped resource that governs a pool of resources, defining usage limits and fair sharing rules.
* LocalQueue
A namespaced resource that groups closely related workloads belonging to a single tenant.
* Workload
An application that will run to completion. It is the unit of admission in Kueue. Sometimes referred to as a job.
## Objectives
This tutorial is for cluster operators and other users who want to implement a batch system on Kubernetes. In this tutorial, you set up a shared cluster for two tenant teams. Each team has its own namespace where it creates Jobs, and both teams share the same global resources that are controlled with the corresponding quotas.
In this tutorial we will be doing the following using Terraform code available in a git repository:
1. Create a GKE cluster.
2. Create a namespace for Kueue (kueue-system).
3. Create a namespace for each team running batch jobs in the cluster (team-a, team-b).
4. Install Kueue in the namespace created for it.
5. Create the ResourceFlavor.
6. Create the ClusterQueue.
7. Create a LocalQueue for each of the teams in the corresponding namespace.
8. Create a manifest for each of the teams with a sample job associated with the corresponding LocalQueue.
Estimated time:
<walkthrough-tutorial-duration duration="30"></walkthrough-tutorial-duration>
To get started, click Start.
## Select or create a project
<walkthrough-project-setup billing="true"></walkthrough-project-setup>
## Create the Autopilot GKE cluster
1. Change to the ```autopilot-cluster``` directory.
```bash
cd autopilot-cluster
```
2. Create a new file ```terraform.tfvars``` in that directory.
```bash
touch terraform.tfvars
```
3. Open the <walkthrough-editor-open-file filePath="autopilot-cluster/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
4. Paste the following content in the file and update any value as needed.
```hcl
project_id = "<walkthrough-project-name/>"
cluster_name = "cluster"
cluster_create = {
deletion_protection = false
}
region = "europe-west1"
vpc_create = {
enable_cloud_nat = true
}
```
5. Initialize the terraform configuration.
```bash
terraform init
```
6. Apply the terraform configuration.
```bash
terraform apply
```
7. Fetch the cluster credentials.
```bash
gcloud container fleet memberships get-credentials cluster --project "<walkthrough-project-name/>"
```
8. Check that the cluster is reachable and that system pods are running.
```bash
kubectl get pods -n kube-system
```
## Install Kueue and create associated resources
1. Change to the ```patterns/batch``` directory.
```bash
cd ../batch
```
2. Create a new file ```terraform.tfvars``` in that directory.
```bash
touch terraform.tfvars
```
3. Open the <walkthrough-editor-open-file filePath="batch/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
4. Paste the following content in the file.
```hcl
credentials_config = {
kubeconfig = {
path = "~/.kube/config"
}
}
```
5. Initialize the terraform configuration.
```bash
terraform init
```
6. Apply the terraform configuration.
```bash
terraform apply
```
7. Check that the Kueue pods are ready (Use CTRL+C to exit watching)
```bash
kubectl get pods -n kueue-system -w
```
8. Check the status of the ClusterQueue
```bash
kubectl get clusterqueue cluster-queue -o wide -w
```
9. Check the status of the LocalQueue for the teams
```bash
kubectl get localqueue -n team-a local-queue -o wide -w
```
```bash
kubectl get localqueue -n team-b local-queue -o wide -w
```
## Run jobs in the cluster
1. Create Jobs for namespace team-a and team-b every 10 seconds associated with the corresponding LocalQueue:
```bash
./create_jobs.sh job-team-a.yaml job-team-b.yaml 10
```
Hit Ctrl-C when you want to stop the creation of jobs.
2. Observe the workloads being queued up, admitted in the ClusterQueue, and nodes being brought up with GKE Autopilot.
```bash
kubectl -n team-a get workloads
```
3. Copy a Job name from the previous step and observe the admission status and events for it through the Workloads API:
```bash
kubectl -n team-a describe workload JOB_NAME
```
## Destroy resources (optional)
1. Change to the ```patterns/autopilot-cluster``` directory.
```bash
cd ../autopilot-cluster
```
2. Destroy the cluster with the following command.
```bash
terraform destroy
```
## Congratulations
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>
You're all set!


@ -0,0 +1,58 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
variable "credentials_config" {
description = "Configure how Terraform authenticates to the cluster."
type = object({
fleet_host = optional(string)
kubeconfig = optional(object({
context = optional(string)
path = optional(string, "~/.kube/config")
}))
})
nullable = false
validation {
condition = (
(var.credentials_config.fleet_host != null) !=
(var.credentials_config.kubeconfig != null)
)
error_message = "Exactly one of fleet host or kubeconfig must be set."
}
}
variable "kueue_namespace" {
description = "Namespaces of the teams running jobs in the clusters."
type = string
nullable = false
default = "kueue-system"
}
variable "team_namespaces" {
description = "Namespaces of the teams running jobs in the clusters."
type = list(string)
nullable = false
default = [
"team-a",
"team-b"
]
}
variable "templates_path" {
description = "Path where manifest templates will be read from. Set to null to use the default manifests."
type = string
default = null
}


@ -0,0 +1,27 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
required_version = ">= 1.7.0"
required_providers {
google = {
source = "hashicorp/google"
version = ">= 5.11.0, < 6.0.0" # tftest
}
google-beta = {
source = "hashicorp/google-beta"
version = ">= 5.11.0, < 6.0.0" # tftest
}
}
}


@ -0,0 +1,66 @@
# Highly Available Kafka on GKE
<!-- BEGIN TOC -->
- [Introduction](#introduction)
- [Requirements](#requirements)
- [Cluster authentication](#cluster-authentication)
- [Kafka Configuration](#kafka-configuration)
- [Sample Configuration](#sample-configuration)
- [Variables](#variables)
<!-- END TOC -->
## Introduction
<a href="https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git&cloudshell_tutorial=kafka/tutorial.md&cloudshell_git_branch=master&cloudshell_workspace=blueprints/gke/patterns&show=ide%2Cterminal">
<img width="200px" src="../../../../assets/images/cloud-shell-button.png">
</a>
This blueprint shows how to deploy a highly available Kafka instance on GKE using the Strimzi operator.
## Requirements
This blueprint assumes the GKE cluster already exists. We recommend using the accompanying [Autopilot Cluster Pattern](../autopilot-cluster) to deploy a cluster according to Google's best practices. Once you have the cluster up-and-running, you can use this blueprint to deploy Kafka in it.
The Kafka and Strimzi manifests reference container images hosted on public registries, which means that the subnet where the GKE cluster is deployed needs to have Internet connectivity to download the images. If you're using the provided [Autopilot Cluster Pattern](../autopilot-cluster), you can set the `enable_cloud_nat` option of the `vpc_create` variable.
## Cluster authentication
Once you have a cluster with Internet connectivity, create a `terraform.tfvars` and set up the `credentials_config` variable. We recommend using Anthos Fleet to simplify accessing the control plane.
## Kafka Configuration
This template exposes several variables to configure the Kafka instance:
- `namespace` which controls the namespace used to deploy the Kafka instance
- `kafka_config` to customize the configuration of the Kafka instance. The default configuration deploys version 3.6.0 with 3 replicas, each with a 10Gi disk and a 4096 MB JVM heap.
- `zookeeper_config` to customize the configuration of the Zookeeper instance. The default configuration deploys 3 replicas, each with a 10Gi disk and a 2048 MB JVM heap.
Any other configuration can be applied by directly modifying the YAML manifests under the [manifest-templates](manifest-templates) directory.
## Sample Configuration
The following template can be used as a starting point for your `terraform.tfvars`:
```tfvars
credentials_config = {
kubeconfig = {
path = "~/.kube/config"
}
}
kafka_config = {
volume_claim_size = "15Gi"
replicas = 4
}
zookeeper_config = {
volume_claim_size = "15Gi"
}
```
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [credentials_config](variables.tf#L17) | Configure how Terraform authenticates to the cluster. | <code title="object&#40;&#123;&#10; fleet_host &#61; optional&#40;string&#41;&#10; kubeconfig &#61; optional&#40;object&#40;&#123;&#10; context &#61; optional&#40;string&#41;&#10; path &#61; optional&#40;string, &#34;&#126;&#47;.kube&#47;config&#34;&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
| [kafka_config](variables.tf#L36) | Configure Kafka cluster statefulset parameters. | <code title="object&#40;&#123;&#10; replicas &#61; optional&#40;number, 3&#41;&#10; volume_claim_size &#61; optional&#40;string, &#34;10Gi&#34;&#41;&#10; version &#61; optional&#40;string, &#34;3.6.0&#34;&#41;&#10; jvm_memory &#61; optional&#40;string, &#34;4096m&#34;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [namespace](variables.tf#L48) | Namespace used for Kafka cluster resources. | <code>string</code> | | <code>&#34;kafka&#34;</code> |
| [templates_path](variables.tf#L55) | Path where manifest templates will be read from. Set to null to use the default manifests. | <code>string</code> | | <code>null</code> |
| [zookeeper_config](variables.tf#L61) | Configure Zookeeper cluster statefulset parameters. | <code title="object&#40;&#123;&#10; replicas &#61; optional&#40;number, 3&#41;&#10; volume_claim_size &#61; optional&#40;string, &#34;10Gi&#34;&#41;&#10; jvm_memory &#61; optional&#40;string, &#34;2048m&#34;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
<!-- END TFDOC -->


@ -0,0 +1,49 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
locals {
wl_templates = [
for f in fileset(local.wl_templates_path, "*yaml") :
"${local.wl_templates_path}/${f}"
]
wl_templates_path = (
var.templates_path == null
? "${path.module}/manifest-templates"
: pathexpand(var.templates_path)
)
}
resource "helm_release" "strimzi-operator" {
name = "strimzi-operator"
repository = "https://strimzi.io/charts"
chart = "strimzi-kafka-operator"
namespace = var.namespace
create_namespace = true
}
resource "kubectl_manifest" "kafka-cluster" {
for_each = toset(local.wl_templates)
yaml_body = templatefile(each.value, {
name = "kafka"
namespace = var.namespace
kafka_config = var.kafka_config
zookeeper_config = var.zookeeper_config
})
timeouts {
create = "30m"
}
depends_on = [helm_release.strimzi-operator]
}

View File

@ -0,0 +1,148 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
name: "${name}"
namespace: "${namespace}"
spec:
kafka:
version: "${kafka_config.version}"
replicas: ${kafka_config.replicas}
template:
pod:
tolerations:
- key: "app.stateful/component"
operator: "Equal"
value: "kafka-broker"
effect: NoSchedule
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: "app.stateful/component"
operator: In
values:
- "kafka-broker"
topologySpreadConstraints:
- maxSkew: 1
topologyKey: "topology.kubernetes.io/zone"
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app.kubernetes.io/name: kafka
strimzi.io/cluster: "${name}"
strimzi.io/component-type: kafka
resources:
requests:
memory: 5Gi
cpu: "1"
limits:
memory: 5Gi
cpu: "2"
jvmOptions:
-Xms: ${kafka_config.jvm_memory}
-Xmx: ${kafka_config.jvm_memory}
listeners:
- name: plain
port: 9092
type: internal
tls: false
- name: tls
port: 9093
type: internal
tls: true
config:
offsets.topic.replication.factor: 3
transaction.state.log.replication.factor: 3
transaction.state.log.min.isr: 2
default.replication.factor: 3
min.insync.replicas: 2
inter.broker.protocol.version: "3.4"
storage:
type: jbod
volumes:
- id: 0
type: persistent-claim
size: ${kafka_config.volume_claim_size}
class: premium-rwo
deleteClaim: false
zookeeper:
template:
pod:
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 1
preference:
matchExpressions:
- key: "app.stateful/component"
operator: In
values:
- "zookeeper"
topologySpreadConstraints:
- maxSkew: 1
topologyKey: "topology.kubernetes.io/zone"
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app.kubernetes.io/name: zookeeper
strimzi.io/cluster: "${name}"
strimzi.io/component-type: zookeeper
replicas: ${zookeeper_config.replicas}
resources:
requests:
memory: 2560Mi
cpu: 1000m
limits:
memory: 2560Mi
cpu: 2000m
jvmOptions:
-Xms: ${zookeeper_config.jvm_memory}
-Xmx: ${zookeeper_config.jvm_memory}
storage:
type: persistent-claim
size: ${zookeeper_config.volume_claim_size}
class: premium-rwo
deleteClaim: false
entityOperator:
tlsSidecar:
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 128Mi
topicOperator:
resources:
requests:
cpu: 100m
memory: 512Mi
limits:
cpu: 500m
memory: 512Mi
userOperator:
resources:
requests:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi
limits:
cpu: 500m
ephemeral-storage: 1Gi
memory: 2Gi

View File

@ -0,0 +1,15 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

View File

@ -0,0 +1,69 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
data "google_client_config" "identity" {
count = var.credentials_config.fleet_host != null ? 1 : 0
}
# provider "kubernetes" {
# config_path = (
# var.credentials_config.kubeconfig == null
# ? null
# : pathexpand(var.credentials_config.kubeconfig.path)
# )
# config_context = try(
# var.credentials_config.kubeconfig.context, null
# )
# host = (
# var.credentials_config.fleet_host == null
# ? null
# : var.credentials_config.fleet_host
# )
# token = try(data.google_client_config.identity.0.access_token, null)
# }
provider "kubectl" {
host = (
var.credentials_config.fleet_host == null
? null
: var.credentials_config.fleet_host
)
config_path = (
var.credentials_config.kubeconfig == null
? null
: pathexpand(var.credentials_config.kubeconfig.path)
)
token = try(data.google_client_config.identity.0.access_token, null)
}
provider "helm" {
kubernetes {
config_path = (
var.credentials_config.kubeconfig == null
? null
: pathexpand(var.credentials_config.kubeconfig.path)
)
config_context = try(
var.credentials_config.kubeconfig.context, null
)
host = (
var.credentials_config.fleet_host == null
? null
: var.credentials_config.fleet_host
)
token = try(data.google_client_config.identity.0.access_token, null)
}
}

View File

@ -0,0 +1,156 @@
# Deploy Apache Kafka to GKE using Strimzi
This guide shows you how to use the Strimzi operator to deploy Apache Kafka clusters on GKE.
## Objectives
This tutorial covers the following steps:
- Create a GKE cluster.
- Deploy and configure the Strimzi operator.
- Configure Apache Kafka using the Strimzi operator.
Estimated time:
<walkthrough-tutorial-duration duration="30"></walkthrough-tutorial-duration>
To get started, click Start.
## Select/create a project
<walkthrough-project-setup billing="true"></walkthrough-project-setup>
## Create the Autopilot GKE cluster
1. Change to the ```autopilot-cluster``` directory.
```bash
cd autopilot-cluster
```
2. Create a new file ```terraform.tfvars``` in that directory.
```bash
touch terraform.tfvars
```
3. Open the <walkthrough-editor-open-file filePath="autopilot-cluster/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
4. Paste the following content in the file and update any value as needed.
```hcl
project_id = "<walkthrough-project-name/>"
cluster_name = "cluster"
cluster_create = {
deletion_protection = false
}
region = "europe-west1"
vpc_create = {
enable_cloud_nat = true
}
```
5. Initialize the terraform configuration.
```bash
terraform init
```
6. Apply the terraform configuration.
```bash
terraform apply
```
7. Fetch the cluster credentials.
```bash
gcloud container fleet memberships get-credentials cluster --project "<walkthrough-project-name/>"
```
8. Check that the cluster is ready by listing the system pods.
```bash
kubectl get pods -n kube-system
```
## Install the Kafka Strimzi operator and create associated resources
1. Change to the ```kafka``` directory.
```bash
cd ../kafka
```
2. Create a new file ```terraform.tfvars``` in that directory.
```bash
touch terraform.tfvars
```
3. Open the <walkthrough-editor-open-file filePath="kafka/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
4. Paste the following content in the file.
```hcl
credentials_config = {
kubeconfig = {
path = "~/.kube/config"
}
}
kafka_config = {
volume_claim_size = "15Gi"
replicas = 4
}
zookeeper_config = {
volume_claim_size = "15Gi"
}
```
5. Initialize the terraform configuration.
```bash
terraform init
```
6. Apply the terraform configuration.
```bash
terraform apply
```
7. Check that the Kafka pods are ready.
```bash
kubectl get pods -n kafka
```
8. Check that the Kafka volumes match the number of replicas.
```bash
kubectl get pv
```
9. Confirm the Kafka object is running.
```bash
kubectl get kafka -n kafka
```
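10. (Optional) As a quick end-to-end check, start a throwaway producer pod and send a test message. The image tag and the `kafka-kafka-bootstrap` service name below follow Strimzi's naming conventions for a cluster named `kafka`; adjust them if your setup differs.
```bash
kubectl -n kafka run kafka-producer -ti --rm --restart=Never \
  --image=quay.io/strimzi/kafka:latest-kafka-3.6.0 -- \
  bin/kafka-console-producer.sh --bootstrap-server kafka-kafka-bootstrap:9092 --topic test-topic
```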
## Destroy resources (optional)
1. Change to the ```patterns/autopilot-cluster``` directory.
```bash
cd ../autopilot-cluster
```
2. Destroy the cluster with the following command.
```bash
terraform destroy
```
## Congratulations
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>
You're all set!

View File

@ -0,0 +1,70 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
variable "credentials_config" {
description = "Configure how Terraform authenticates to the cluster."
type = object({
fleet_host = optional(string)
kubeconfig = optional(object({
context = optional(string)
path = optional(string, "~/.kube/config")
}))
})
nullable = false
validation {
condition = (
(var.credentials_config.fleet_host != null) !=
(var.credentials_config.kubeconfig != null)
)
error_message = "Exactly one of fleet host or kubeconfig must be set."
}
}
variable "kafka_config" {
description = "Configure Kafka cluster statefulset parameters."
type = object({
replicas = optional(number, 3)
volume_claim_size = optional(string, "10Gi")
version = optional(string, "3.6.0")
jvm_memory = optional(string, "4096m")
})
nullable = false
default = {}
}
variable "namespace" {
description = "Namespace used for Redis cluster resources."
type = string
nullable = false
default = "kafka"
}
variable "templates_path" {
description = "Path where manifest templates will be read from. Set to null to use the default manifests."
type = string
default = null
}
variable "zookeeper_config" {
description = "Configure Zookeper cluster statefulset parameters."
type = object({
replicas = optional(number, 3)
volume_claim_size = optional(string, "10Gi")
jvm_memory = optional(string, "2048m")
})
nullable = false
default = {}
}

View File

@ -0,0 +1,27 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
required_version = ">= 1.7.0"
required_providers {
google = {
source = "hashicorp/google"
version = ">= 5.11.0, < 6.0.0" # tftest
}
google-beta = {
source = "hashicorp/google-beta"
version = ">= 5.11.0, < 6.0.0" # tftest
}
}
}

View File

@ -0,0 +1,21 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
required_providers {
kubectl = {
source = "gavinbunney/kubectl"
}
}
}

View File

@ -0,0 +1,62 @@
# Highly Available Redis Cluster on GKE
<!-- BEGIN TOC -->
- [Introduction](#introduction)
- [Requirements](#requirements)
- [Cluster authentication](#cluster-authentication)
- [Redis Cluster Configuration](#redis-cluster-configuration)
- [Sample Configuration](#sample-configuration)
- [Variables](#variables)
<!-- END TOC -->
## Introduction
<a href="https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git&cloudshell_tutorial=redis-cluster/tutorial.md&cloudshell_git_branch=master&cloudshell_workspace=blueprints/gke/patterns&show=ide%2Cterminal">
<img width="200px" src="../../../../assets/images/cloud-shell-button.png">
</a>
This blueprint shows how to deploy a highly available Redis cluster on GKE following Google's recommended practices for creating a stateful application.
## Requirements
This blueprint assumes the GKE cluster already exists. We recommend using the accompanying [Autopilot Cluster Pattern](../autopilot-cluster) to deploy a cluster according to Google's best practices. Once you have the cluster up and running, you can use this blueprint to deploy a Redis cluster in it.
## Cluster authentication
Once you have the cluster up and running, create a `terraform.tfvars` file and set up the `credentials_config` variable. We recommend using Anthos Fleet to simplify accessing the control plane.
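For example, when the cluster is registered to a fleet, a minimal configuration can point Terraform at the Connect gateway endpoint (the project number and membership name below are placeholders to replace with your own):
```tfvars
credentials_config = {
  fleet_host = "https://connectgateway.googleapis.com/v1/projects/123456789/locations/global/gkeMemberships/cluster"
}
```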
## Redis Cluster Configuration
This template exposes several variables to configure the Redis cluster:
- `namespace`, which controls the namespace used to deploy the Redis instances.
- `image` to change the container image used by the Redis cluster. Defaults to `redis:6.2` (i.e. the official Redis image, version 6.2).
- `statefulset_config` to customize the Redis stateful set configuration. The default configuration deploys a 6-node cluster with requests for 1 CPU, 1Gi of RAM and a 10Gi volume per node.
Any other configuration can be applied by directly modifying the YAML manifests under the [manifest-templates](manifest-templates) directory.
## Sample Configuration
The following template can be used as a starting point for your terraform.tfvars.
```tfvars
credentials_config = {
kubeconfig = {
path = "~/.kube/config"
}
}
statefulset_config = {
replicas = 8
resource_requests = {
cpu = "2"
memory = "2Gi"
}
}
```
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [credentials_config](variables.tf#L17) | Configure how Terraform authenticates to the cluster. | <code title="object&#40;&#123;&#10; fleet_host &#61; optional&#40;string&#41;&#10; kubeconfig &#61; optional&#40;object&#40;&#123;&#10; context &#61; optional&#40;string&#41;&#10; path &#61; optional&#40;string, &#34;&#126;&#47;.kube&#47;config&#34;&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
| [image](variables.tf#L36) | Container image to use. | <code>string</code> | | <code>&#34;redis:6.2&#34;</code> |
| [namespace](variables.tf#L43) | Namespace used for Redis cluster resources. | <code>string</code> | | <code>&#34;redis&#34;</code> |
| [statefulset_config](variables.tf#L50) | Configure Redis cluster statefulset parameters. | <code title="object&#40;&#123;&#10; replicas &#61; optional&#40;number, 6&#41;&#10; resource_requests &#61; optional&#40;object&#40;&#123;&#10; cpu &#61; optional&#40;string, &#34;1&#34;&#41;&#10; memory &#61; optional&#40;string, &#34;1Gi&#34;&#41;&#10; &#125;&#41;, &#123;&#125;&#41;&#10; volume_claim_size &#61; optional&#40;string, &#34;10Gi&#34;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [templates_path](variables.tf#L68) | Path where manifest templates will be read from. Set to null to use the default manifests. | <code>string</code> | | <code>null</code> |
<!-- END TFDOC -->

View File

@ -0,0 +1,68 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
locals {
wl_templates = [
for f in fileset(local.wl_templates_path, "[0-9]*yaml") :
"${local.wl_templates_path}/${f}"
]
wl_templates_path = (
var.templates_path == null
? "${path.module}/manifest-templates"
: pathexpand(var.templates_path)
)
}
resource "kubernetes_namespace" "default" {
metadata {
name = var.namespace
}
}
resource "kubernetes_manifest" "default" {
for_each = toset(local.wl_templates)
manifest = yamldecode(templatefile(each.value, {
image = var.image
namespace = kubernetes_namespace.default.metadata.0.name
statefulset_config = var.statefulset_config
}))
dynamic "wait" {
for_each = strcontains(each.key, "statefulset") ? [""] : []
content {
fields = {
"status.readyReplicas" = var.statefulset_config.replicas
}
}
}
timeouts {
create = "30m"
}
}
resource "kubernetes_manifest" "cluster-start" {
manifest = yamldecode(templatefile("${local.wl_templates_path}/start-cluster.yaml", {
image = var.image
namespace = kubernetes_namespace.default.metadata.0.name
nodes = [
for i in range(var.statefulset_config.replicas) :
"redis-${i}.redis-cluster.${var.namespace}.svc.cluster.local"
]
}))
field_manager {
force_conflicts = true
}
depends_on = [kubernetes_manifest.default]
}

View File

@ -0,0 +1,28 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: v1
kind: ConfigMap
metadata:
name: redis-cluster
namespace: ${namespace}
data:
redis.conf: |+
cluster-enabled yes
cluster-node-timeout 15000
cluster-config-file /data/nodes.conf
appendonly yes
protected-mode no
dir /data
port 6379

View File

@ -0,0 +1,46 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: v1
kind: ConfigMap
metadata:
name: redis-probes
namespace: ${namespace}
data:
readiness.sh: |-
#!/bin/sh
pingResponse="$(redis-cli -h localhost ping)"
if [ "$?" -eq "124" ]; then
echo "PING timed out"
exit 1
fi
if [ "$pingResponse" != "PONG"]; then
echo "$pingResponse"
exit 1
fi
liveness.sh: |-
#!/bin/sh
pingResponse="$(redis-cli -h localhost ping | head -n1 | awk '{print $1;}')"
if [ "$?" -eq "124" ]; then
echo "PING timed out"
exit 1
fi
if [ "$pingResponse" != "PONG"] && [ "$pingResponse" != "LOADING" ] && [ "$pingResponse" != "MASTERDOWN" ]; then
echo "$pingResponse"
exit 1
fi

View File

@ -0,0 +1,24 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: redis-pdb
namespace: ${namespace}
spec:
minAvailable: 3
selector:
matchLabels:
app: redis

View File

@ -0,0 +1,111 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: redis
namespace: ${namespace}
spec:
serviceName: "redis-cluster"
replicas: ${statefulset_config.replicas}
selector:
matchLabels:
app: redis
template:
metadata:
labels:
app: redis
appCluster: redis-cluster
spec:
terminationGracePeriodSeconds: 20
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- redis
topologyKey: kubernetes.io/hostname
containers:
- name: redis
image: ${image}
command:
- "redis-server"
args:
- "/conf/redis.conf"
- "--protected-mode"
- "no"
resources:
requests:
cpu: ${statefulset_config.resource_requests.cpu}
ephemeral-storage: ${statefulset_config.volume_claim_size}
memory: ${statefulset_config.resource_requests.memory}
ports:
- name: redis
containerPort: 6379
protocol: "TCP"
- name: cluster
containerPort: 16379
protocol: "TCP"
startupProbe:
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 20
tcpSocket:
port: redis
livenessProbe:
periodSeconds: 5
timeoutSeconds: 5
successThreshold: 1
failureThreshold: 5
exec:
command: ["sh", "-c", "/probes/liveness.sh"]
readinessProbe:
periodSeconds: 5
timeoutSeconds: 1
successThreshold: 1
failureThreshold: 5
exec:
command: ["sh", "-c", "/probes/readiness.sh"]
volumeMounts:
- name: conf
mountPath: /conf
- name: data
mountPath: /data
- name: probes
mountPath: /probes
readOnly: true
volumes:
- name: conf
configMap:
name: redis-cluster
defaultMode: 493
- name: probes
configMap:
name: redis-probes
defaultMode: 365
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: ${statefulset_config.volume_claim_size}

View File

@ -0,0 +1,19 @@
# skip boilerplate check
apiVersion: v1
kind: Service
metadata:
name: redis-cluster
namespace: ${namespace}
spec:
clusterIP: None
ports:
- name: redis-port
port: 6379
protocol: TCP
targetPort: 6379
selector:
app: redis
appCluster: redis-cluster
sessionAffinity: None
type: ClusterIP

View File

@ -0,0 +1,26 @@
#!/bin/bash
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
urls=$(kubectl get pods -l app=redis -o jsonpath='{range .items[*]}{.status.podIP} {end}')
command="kubectl exec -it redis-0 -- redis-cli --cluster create --cluster-replicas 1 "
for url in $urls
do
command+=$url":6379 "
done
echo "Executing command: " $command
$command

View File

@ -0,0 +1,56 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: batch/v1
kind: Job
metadata:
name: redis-cluster-start
namespace: ${namespace}
spec:
suspend: false
completions: 1
template:
spec:
restartPolicy: Never
volumes:
- name: shared-data
emptyDir: {}
initContainers:
# we resolve node names in an init container using alpine
# because the redis image doesn't include nslookup
- name: resolve-node-names
image: alpine
volumeMounts:
- name: shared-data
mountPath: /tmp/shared-data
command:
- /bin/sh
- -c
- |
%{~ for n in nodes ~}
echo "$(nslookup ${n} | awk '/^Address: / { print $2 }'):6379" >> /tmp/shared-data/nodes
%{~ endfor ~}
containers:
- name: redis-cluster-start
image: ${image}
volumeMounts:
- name: shared-data
mountPath: /tmp/shared-data
command:
- /bin/sh
- -c
- |
redis-cli --cluster-yes --cluster-replicas 1 --cluster create $(cat /tmp/shared-data/nodes)

View File

@ -0,0 +1,36 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
data "google_client_config" "identity" {
count = var.credentials_config.fleet_host != null ? 1 : 0
}
provider "kubernetes" {
config_path = (
var.credentials_config.kubeconfig == null
? null
: pathexpand(var.credentials_config.kubeconfig.path)
)
config_context = try(
var.credentials_config.kubeconfig.context, null
)
host = (
var.credentials_config.fleet_host == null
? null
: var.credentials_config.fleet_host
)
token = try(data.google_client_config.identity.0.access_token, null)
}

View File

@ -0,0 +1,155 @@
# Deploy a Redis cluster on GKE
## Objectives
This tutorial covers the following steps:
- Create a GKE cluster.
- Create a Redis Cluster on GKE.
- Confirm the Redis cluster is up and running.
- Confirm creation of the volumes for the stateful set.
- Confirm the Pod Disruption Budget (PDB).
Estimated time:
<walkthrough-tutorial-duration duration="30"></walkthrough-tutorial-duration>
To get started, click Start.
## Select/create a project
<walkthrough-project-setup billing="true"></walkthrough-project-setup>
## Create the Autopilot GKE cluster
1. Change to the ```autopilot-cluster``` directory.
```bash
cd autopilot-cluster
```
2. Create a new file ```terraform.tfvars``` in that directory.
```bash
touch terraform.tfvars
```
3. Open the <walkthrough-editor-open-file filePath="autopilot-cluster/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
4. Paste the following content in the file and update any value as needed.
```hcl
project_id = "<walkthrough-project-name/>"
cluster_name = "cluster"
cluster_create = {
deletion_protection = false
}
region = "europe-west1"
vpc_create = { }
```
5. Initialize the terraform configuration.
```bash
terraform init
```
6. Apply the terraform configuration.
```bash
terraform apply
```
7. Fetch the cluster credentials.
```bash
gcloud container fleet memberships get-credentials cluster --project "<walkthrough-project-name/>"
```
8. Check that the cluster is ready by listing the system pods.
```bash
kubectl get pods -n kube-system
```
## Install Redis and create associated resources
1. Change to the ```redis-cluster``` directory.
```bash
cd ../redis-cluster
```
2. Create a new file ```terraform.tfvars``` in that directory.
```bash
touch terraform.tfvars
```
3. Open the <walkthrough-editor-open-file filePath="redis-cluster/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
4. Paste the following content in the file.
```hcl
credentials_config = {
kubeconfig = {
path = "~/.kube/config"
}
}
statefulset_config = {
replicas = 8
resource_requests = {
cpu = "1"
memory = "1.5Gi"
}
}
```
5. Initialize the terraform configuration.
```bash
terraform init
```
6. Apply the terraform configuration.
```bash
terraform apply
```
7. Check that the Redis pods are ready
```bash
kubectl get pods -n redis
```
8. Check that the Redis volumes match the number of replicas
```bash
kubectl get pv
```
9. Confirm the Pod Disruption Budget for Redis guarantees that at least 3 pods stay up during a voluntary disruption.
```bash
kubectl describe pdb redis-pdb -n redis
```
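10. (Optional) You can also inspect the cluster state from inside one of the pods; the pod and namespace names below assume the defaults used in this tutorial.
```bash
kubectl exec -it redis-0 -n redis -- redis-cli cluster info
```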
## Destroy resources (optional)
1. Change to the ```patterns/autopilot-cluster``` directory.
```bash
cd ../autopilot-cluster
```
2. Destroy the cluster with the following command.
```bash
terraform destroy
```
## Congratulations
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>
You're all set!

View File

@ -0,0 +1,72 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
variable "credentials_config" {
description = "Configure how Terraform authenticates to the cluster."
type = object({
fleet_host = optional(string)
kubeconfig = optional(object({
context = optional(string)
path = optional(string, "~/.kube/config")
}))
})
nullable = false
validation {
condition = (
(var.credentials_config.fleet_host != null) !=
(var.credentials_config.kubeconfig != null)
)
error_message = "Exactly one of fleet host or kubeconfig must be set."
}
}
variable "image" {
description = "Container image to use."
type = string
nullable = false
default = "redis:6.2"
}
variable "namespace" {
description = "Namespace used for Redis cluster resources."
type = string
nullable = false
default = "redis"
}
variable "statefulset_config" {
description = "Configure Redis cluster statefulset parameters."
type = object({
replicas = optional(number, 6)
resource_requests = optional(object({
cpu = optional(string, "1")
memory = optional(string, "1Gi")
}), {})
volume_claim_size = optional(string, "10Gi")
})
nullable = false
default = {}
validation {
condition = var.statefulset_config.replicas >= 6
error_message = "The minimum number of Redis cluster replicas is 6."
}
}
variable "templates_path" {
description = "Path where manifest templates will be read from. Set to null to use the default manifests."
type = string
default = null
}

View File

@ -0,0 +1,27 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
required_version = ">= 1.7.0"
required_providers {
google = {
source = "hashicorp/google"
version = ">= 5.11.0, < 6.0.0" # tftest
}
google-beta = {
source = "hashicorp/google-beta"
version = ">= 5.11.0, < 6.0.0" # tftest
}
}
}