GKE stateful blueprints (#2059)
* var definitions
* skeleton, untested
* fix errors, test with existing cluster
* test vpc creation, todo notes
* initial variables for AR and image
* initial variables for AR and image
* Add support for remote repositories to artifact-registry
* Add support for virtual repositories to artifact-registry
* Add support for extra config options to artifact-registry
* artifact registry module: add validation and precondition, fix tests
* ar module id/name
* registry
* service accoutn and roles
* fetch pods, remove image prefix
* small changes
* use additive IAM at project level
* use additive IAM at project level
* configmaps
* manifests
* fix statefulset manifest
* service manifest
* fix configmap mode
* add todo
* job (broken)
* job
* wait on manifest, endpoints datasource
* fix job
* Fix local
* sa
* Update README.md
* Restructure gke bp
* refactor tree and infra variables
* no create test
* simplify cluster SA
* test cluster and vpc creation
* project creation fixes
* use iam_members variable
* nits
* readme with examples
* readme with examples
* outputs
* variables, provider configuration
* variables, manifests
* start cluster job
* fix redis cluster creation
Co-authored-by: Julio Castillo <juliocc@users.noreply.github.com>
* Revert changes in autopilot cluster
* Default templates path, use namespace for node names
* Update readmes
* Fix IAM bindings
* Make STABLE the default release channel
* Use Cloud DNS as default DNS provider
* Allow optional Cloud NAT creation
* Allow backup agent and proxy only subnet
* Work around terraform not short-circuiting logical operators
* Rename create variables to be more consistent with other blueprints
* Add basic features
* Update variable names
* Initial kafka JS
* Move providers to a new file
* Kafka / Strimzi
* First possibily working version for MySQL (with a lot of todo's left)
* Explicitly use proxy repo + some other fixes
* Strimzi draft
* Refactor variables, use CluterIP as pointer for mysql-router for bootstraping
* Validate number of replicas, autoscale required number of running nodes to n/2+1
* Use seaprate service for bootstrap, do not recreate all resources on change of replicas count as the config is preserved in PV
* Test dual chart kafka
* Update chart for kafka
* Expose basic kafka configuration options
* Remove unused manifest
* Added batch blueprint
* Added README
* switch to kubectl_manifest
* Add README and support for static IP address
* Move namespace creation to helm
* Interpolate kafka variables
* Rename kafka-strimzi to kafka
* Added TUTORIAL for cloudshell for batch blueprint
* deleted tutorial
* Remove commented replace trigger
* Move to helm chart
* WIP of Cloud Shell tutorial for MySQL
* Rename folders
* Fix rename
* Update paths
* Unify styles
* Update paths
* Add Readme links
* Update mysql tutorial
* Fix path according to self-link
* Use relative path to cwd
* Fix service_account variable location
* Fix tfvars creation
* Restore some fixes for helm deployment
* Add cluster deletion_prevention
* Fixes for tutorial
* Update cluster docs
* Fixes to batch tutorial
* Bare bones readme for batch
* Update batch readme
* README fixes
* Fix README title for redis
* Fix Typos
* Make it easy to pass variables from autopilot-cluster to other modules
* Add connectivity test and bastion host
* updates to readme, and gpu fix
* Add versions.tf and README updates
* Fix typo
* Kafka and Redis README updates
* Update versions.tf
* Fixes
* Add boilerplate
* Fix linting
* Move mysql to separate branch
* Update cloud shell links
* Fix broken link

---------

Co-authored-by: Ludo <ludomagno@google.com>
Co-authored-by: Daniel Marzini <44803752+danielmarzini@users.noreply.github.com>
Co-authored-by: Wiktor Niesiobędzki <wiktorn@google.com>
Co-authored-by: Miren Esnaola <mirene@google.com>
parent da11396e3a
commit c42c4c141f
@ -58,3 +58,4 @@ blueprints/gke/autopilot/ansible/vars/vars.yaml
blueprints/gke/autopilot/bundle/monitoring/kustomization.yaml
blueprints/gke/autopilot/bundle/locust/kustomization.yaml
blueprints/gke/autopilot/bundle.tar.gz
blueprints/gke/patterns/batch/job-*.yaml
@ -0,0 +1,5 @@
# GKE Jumpstart Blueprints

This directory includes several blueprints related to Google Kubernetes Engine (GKE), following Google recommendations and best practices. The blueprints in this directory split the deployment process into two stages: an initial infrastructure stage that provisions the cluster, and additional workload stages that deploy specific types of applications/workloads.

As a design rule, all the blueprints in this directory provide sensible, secure defaults for most variables while still supporting enterprise-grade deployments, including the ability to use existing resources that are typically found in an enterprise environment.
@ -0,0 +1,112 @@
# GKE Autopilot Cluster Pattern

This blueprint illustrates how to use GKE features to deploy a secure cluster that meets Google's best practices. The cluster deployed by this blueprint can be used to deploy other blueprints such as [Redis](../redis-cluster), [Kafka](../kafka), and [Kueue](../batch).

<!-- BEGIN TOC -->
- [Design Decisions](#design-decisions)
- [GKE Onboarding Best Practices](#gke-onboarding-best-practices)
  - [Environment setup](#environment-setup)
  - [Cluster configuration](#cluster-configuration)
  - [Security](#security)
  - [Networking](#networking)
  - [Multitenancy](#multitenancy)
  - [Monitoring](#monitoring)
  - [Maintenance](#maintenance)
- [Variables](#variables)
- [Outputs](#outputs)
<!-- END TOC -->

## Design Decisions

The main purpose of this blueprint is to showcase how to use GKE features to deploy a secure Kubernetes cluster according to Google best practices, including:

- **No public IP addresses**: both the control plane and the nodes use private IP addresses. To simplify the deployment of workloads, we enable [Connect Gateway](https://cloud.google.com/anthos/multicluster-management/gateway) to securely access the control plane even from outside the cluster's VPC. We also use [Remote Repositories](https://cloud.google.com/artifact-registry/docs/repositories/remote-overview) to allow the cluster to download container images without requiring Internet egress to be configured in the cluster's VPC.

- **Reasonable but secure defaults** that the user can override. For example, by default we avoid deploying a Cloud NAT gateway, but it is possible to enable it with just a few changes to the configuration.

- **Bring your own infrastructure**: larger organizations might have teams dedicated to the provisioning and management of centralized infrastructure. This blueprint can be deployed to create any required infrastructure (GCP project, VPC, Artifact Registry, etc.), or you can leverage existing resources by setting the appropriate variables.
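
As a rough illustration of the last two points, a `terraform.tfvars` along the following lines would ask the blueprint to create the project, VPC and cluster, and to opt in to Cloud NAT. The variable names come from the Variables table in this README; the project id, cluster name, region, billing account and folder are placeholders, not real resources.

```hcl
# terraform.tfvars (sketch, all values below are placeholders)
project_id   = "my-gke-project"
cluster_name = "cluster-0"
region       = "europe-west1"

# a non-null project_create asks the blueprint to create the project,
# which also forces VPC and cluster creation
project_create = {
  billing_account = "012345-ABCDEF-012345"
  parent          = "folders/1234567890"
}

# override the secure default and add a Cloud NAT gateway
vpc_create = {
  enable_cloud_nat = true
}
```

Leaving `project_create`, `vpc_create` and `cluster_create` at their `null` defaults instead makes the blueprint look up the existing project and cluster identified by `project_id`, `region` and `cluster_name`.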

## GKE Onboarding Best Practices

This Terraform blueprint helps you quickly implement most of the [GKE onboarding best practices](https://cloud.google.com/kubernetes-engine/docs/best-practices/onboarding#set-up-terraform) as outlined in the official GKE documentation. In this section we describe the relevant decisions this blueprint simplifies.


### Environment setup
- Set up Terraform: you'll need to install Terraform to use this blueprint. Instructions are [available here](https://registry.terraform.io/providers/hashicorp/google/latest/docs/guides/getting_started).
- Terraform state storage: this blueprint doesn't automate this step, but it can easily be done by specifying a [backend](https://developer.hashicorp.com/terraform/language/settings/backends/gcs).
- Create a metrics scope using Terraform: if you're creating a new project with this blueprint, you can enable metrics scope using the `metrics_scope` variable in the `project` module. Otherwise, metrics scope setup occurs outside this blueprint's scope.
- Set up Artifact Registry: by default a remote repository is created to allow downloading container images.
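
For the state storage step, a minimal sketch of a GCS backend definition could look like the following. The file name, bucket name and prefix are assumptions made for illustration, and the bucket has to exist before running `terraform init`.

```hcl
# backend.tf (sketch): keep Terraform state in a GCS bucket
terraform {
  backend "gcs" {
    bucket = "my-tf-state-bucket"    # hypothetical bucket, created beforehand
    prefix = "gke/autopilot-cluster" # hypothetical path inside the bucket
  }
}
```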

### Cluster configuration
This blueprint by default deploys an Autopilot cluster with private nodes and a private control plane. By using Autopilot, Google automatically handles node configuration, scaling, and security.

- Choose a mode of operation: this blueprint uses Autopilot clusters
- Isolate your cluster: this blueprint deploys a private cluster, with a private control plane
- Configure backup for GKE: disabled by default, but can easily be enabled through the `cluster_create.options.enable_backup_agent` variable, which sets `backup_configs` in the `gke-cluster-autopilot` module.
- Use Container-Optimized OS node images: Autopilot clusters always use COS
- Enable node auto-provisioning: automatically managed by Autopilot
- Separate kube-system Pods: automatically managed by Autopilot
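
As a sketch of how these cluster settings can be tuned, the `cluster_create` variable documented in the Variables table accepts overrides such as the ones below. The label and the chosen values are only examples; any attribute left out keeps its default.

```hcl
# terraform.tfvars fragment (sketch): override selected cluster defaults
cluster_create = {
  # example label, not required by the blueprint
  labels = {
    environment = "dev"
  }
  # allow `terraform destroy` to remove the cluster (default is true)
  deletion_protection = false
  options = {
    # turn on the Backup for GKE agent (default is false)
    enable_backup_agent = true
  }
}
```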

### Security
- Use the security posture dashboard: enabled by default in new clusters
- Use group authentication: not needed by this blueprint but can be enabled through the `enable_features.groups_for_rbac` variable of the `gke-cluster-autopilot` module.
- Use RBAC to restrict access to cluster resources: this blueprint deploys the underlying infrastructure, RBAC configuration is out of scope.
- Enable Shielded GKE Nodes: automatically managed by Autopilot
- Enable Workload Identity: automatically managed by Autopilot
- Enable security bulletin notifications: out of scope for this blueprint
- Use least privilege Google service accounts: this blueprint creates a new service account for the cluster
- Restrict network access to the control plane and nodes: this blueprint deploys a private cluster
- Use namespaces to restrict access to cluster resources: this blueprint deploys the underlying infrastructure, namespace handling is left to applications.

### Networking
- Create a custom mode VPC: this blueprint can optionally deploy a new custom VPC with a single subnet. Otherwise, an existing VPC and subnet can be used.
- Create a proxy-only subnet: the `vpc_create` variable allows the creation of a proxy-only subnet through its `proxy_only_subnet` attribute, if needed.
- Configure Shared VPC: by default a new VPC is created within the project, but a Shared VPC can be used when the blueprint handles project creation.
- Connect the cluster's VPC network to an on-premises network: skipped, out of scope for this blueprint
- Enable Cloud NAT: the `vpc_create` variable allows the creation of Cloud NAT through its `enable_cloud_nat` attribute, if needed.
- Configure Cloud DNS for GKE: enabled by default with cluster scope, through the `enable_features.dns` variable of the `gke-cluster-autopilot` module.
- Configure NodeLocal DNSCache: not needed by this blueprint
- Create firewall rules: only the default rules created by GKE
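
The two networking modes above can be sketched as follows, using only attributes listed in the Variables table. The VPC name, CIDR range and resource ids are illustrative assumptions; the second option is commented out because it applies when the blueprint does not create the VPC.

```hcl
# option 1 (sketch): let the blueprint create the VPC, adding a proxy-only subnet
vpc_create = {
  name              = "gke-vpc"     # hypothetical name, defaults to the prefix
  proxy_only_subnet = "10.0.1.0/24" # hypothetical range for the proxy-only subnet
}

# option 2 (sketch): point the new cluster at an existing VPC and subnet
# cluster_create = {
#   vpc = {
#     id        = "projects/my-net-project/global/networks/my-vpc"
#     subnet_id = "projects/my-net-project/regions/europe-west1/subnetworks/gke"
#     secondary_range_names = {
#       pods     = "pods"
#       services = "services"
#     }
#   }
# }
```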

### Multitenancy
For simplicity, multi-tenancy is not used in this blueprint.

### Monitoring
- Configure GKE alert policies: out of scope for this blueprint
- Enable Google Cloud Managed Service for Prometheus: automatically managed by Autopilot
- Configure control plane metrics: enabled by default
- Enable metrics packages: out of scope for this blueprint

### Maintenance
- Create environments: out of scope for this blueprint
- Subscribe to Pub/Sub events: out of scope for this blueprint
- Enroll in release channels: the REGULAR channel is used by default
- Configure maintenance windows: a daily maintenance window starting at 01:00 is set through the `maintenance_config` variable of the `gke-cluster-autopilot` module.
- Set Compute Engine quotas: out of scope for this blueprint
- Configure cost controls: TBD
- Configure billing alerts: out of scope for this blueprint
<!-- BEGIN TFDOC -->
## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [cluster_name](variables.tf#L42) | Name of new or existing cluster. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L70) | Project id of existing or created project. | <code>string</code> | ✓ | |
| [region](variables.tf#L75) | Region used for cluster and network resources. | <code>string</code> | ✓ | |
| [cluster_create](variables.tf#L17) | Cluster configuration for newly created cluster. Set to null to use existing cluster, or create using defaults in new project. | <code title="object({ deletion_protection = optional(bool, true) labels = optional(map(string)) master_authorized_ranges = optional(map(string), { rfc-1918-10-8 = "10.0.0.0/8" }) master_ipv4_cidr_block = optional(string, "172.16.255.0/28") vpc = optional(object({ id = string subnet_id = string secondary_range_names = optional(object({ pods = optional(string, "pods") services = optional(string, "services") }), {}) })) options = optional(object({ release_channel = optional(string, "REGULAR") enable_backup_agent = optional(bool, false) }), {}) })">object({…})</code> | | <code>null</code> |
| [fleet_project_id](variables.tf#L47) | GKE Fleet project id. If null cluster project will also be used for fleet. | <code>string</code> | | <code>null</code> |
| [prefix](variables.tf#L53) | Prefix used for resource names. | <code>string</code> | | <code>"jump-0"</code> |
| [project_create](variables.tf#L60) | Project configuration for newly created project. Leave null to use existing project. Project creation forces VPC and cluster creation. | <code title="object({ billing_account = string parent = optional(string) shared_vpc_host = optional(string) })">object({…})</code> | | <code>null</code> |
| [registry_create](variables.tf#L80) | Create remote Docker Artifact Registry. | <code>bool</code> | | <code>true</code> |
| [vpc_create](variables.tf#L86) | Project configuration for newly created VPC. Leave null to use existing VPC, or defaults when project creation is required. | <code title="object({ name = optional(string) subnet_name = optional(string) primary_range_nodes = optional(string, "10.0.0.0/24") secondary_range_pods = optional(string, "10.16.0.0/20") secondary_range_services = optional(string, "10.32.0.0/24") enable_cloud_nat = optional(bool, false) proxy_only_subnet = optional(string) })">object({…})</code> | | <code>null</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [created_resources](outputs.tf#L17) | IDs of the resources created, if any. | |
| [credentials_config](outputs.tf#L44) | Configure how Terraform authenticates to the cluster. | |
| [fleet_host](outputs.tf#L51) | Fleet Connect Gateway host that can be used to configure the GKE provider. | |
| [get_credentials](outputs.tf#L56) | Run one of these commands to get cluster credentials. Credentials via fleet allow reaching private clusters without direct connectivity. | |
| [region](outputs.tf#L70) | Region used for cluster and network resources. | |
<!-- END TFDOC -->
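
The `credentials_config` and `fleet_host` outputs are meant to be consumed by a later workload stage. A minimal sketch of that wiring is shown below, assuming the workload stage declares a `credentials_config` input variable with the same shape as the output (the variable is hypothetical here) and authenticates to Connect Gateway with the caller's access token.

```hcl
# hypothetical input mirroring this blueprint's credentials_config output
variable "credentials_config" {
  description = "Cluster credentials configuration from the infrastructure stage."
  type = object({
    fleet_host = string
  })
}

# access token of the identity running Terraform
data "google_client_config" "default" {}

# reach the private control plane through the fleet Connect Gateway endpoint
provider "kubernetes" {
  host  = var.credentials_config.fleet_host
  token = data.google_client_config.default.access_token
}
```

Going through the Connect Gateway host means the machine running Terraform does not need network connectivity to the cluster's private endpoint.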
@ -0,0 +1,134 @@
/**
 * Copyright 2024 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

locals {
  _cluster_sa = (
    local.cluster_create
    ? module.cluster-service-account.0.email
    : data.google_container_cluster.cluster.0.node_config.0.service_account
  )
  cluster_sa = (
    local._cluster_sa == "default"
    ? module.project.service_accounts.default.compute
    : local._cluster_sa
  )
  cluster_sa_roles = [
    "roles/artifactregistry.reader",
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
    "roles/stackdriver.resourceMetadata.writer"
  ]
  cluster_vpc = (
    local.use_shared_vpc || !local.vpc_create
    # cluster variable configures networking
    ? {
      network = try(
        var.cluster_create.vpc.id, null
      )
      secondary_range_names = try(
        var.cluster_create.vpc.secondary_range_names, null
      )
      subnet = try(
        var.cluster_create.vpc.subnet_id, null
      )
    }
    # VPC creation configures networking
    : {
      network               = module.vpc.0.id
      secondary_range_names = { pods = "pods", services = "services" }
      subnet                = values(module.vpc.0.subnet_ids)[0]
    }
  )
}

data "google_container_cluster" "cluster" {
  count    = !local.cluster_create ? 1 : 0
  project  = var.project_id
  location = var.region
  name     = var.cluster_name
}

module "cluster-service-account" {
  source     = "../../../../modules/iam-service-account"
  count      = local.cluster_create ? 1 : 0
  project_id = module.project.project_id
  name       = var.prefix
}

module "cluster" {
  source              = "../../../../modules/gke-cluster-autopilot"
  count               = local.cluster_create ? 1 : 0
  project_id          = module.project.project_id
  deletion_protection = var.cluster_create.deletion_protection
  name                = var.cluster_name
  location            = var.region
  vpc_config = {
    network                  = local.cluster_vpc.network
    subnetwork               = local.cluster_vpc.subnet
    secondary_range_names    = local.cluster_vpc.secondary_range_names
    master_authorized_ranges = var.cluster_create.master_authorized_ranges
    master_ipv4_cidr_block   = var.cluster_create.master_ipv4_cidr_block
  }
  private_cluster_config = {
    enable_private_endpoint = true
    master_global_access    = true
  }
  node_config = {
    service_account = module.cluster-service-account.0.email
  }
  labels          = var.cluster_create.labels
  release_channel = var.cluster_create.options.release_channel
  backup_configs = {
    enable_backup_agent = var.cluster_create.options.enable_backup_agent
  }
  enable_features = {
    dns = {
      provider = "CLOUD_DNS"
      scope    = "CLUSTER_SCOPE"
      domain   = "cluster.local"
    }
    cost_management = true
    gateway_api     = true
  }
  monitoring_config = {
    enable_api_server_metrics         = true
    enable_controller_manager_metrics = true
    enable_scheduler_metrics          = true
  }
  logging_config = {
    enable_api_server_logs         = true
    enable_scheduler_logs          = true
    enable_controller_manager_logs = true
  }
  maintenance_config = {
    daily_window_start_time = "01:00"
  }
}

check "cluster_networking" {
  assert {
    condition = (
      local.use_shared_vpc
      ? (
        try(var.cluster_create.vpc.id, null) != null &&
        try(var.cluster_create.vpc.subnet_id, null) != null
      )
      : true
    )
    error_message = "Cluster network and subnetwork are required in shared VPC mode."
  }
}
@ -0,0 +1,168 @@
/**
 * Copyright 2024 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

locals {
  cluster_create = var.cluster_create != null || local.vpc_create
  create_nat     = local.vpc_create && try(var.vpc_create.enable_cloud_nat, false) == true
  vpc_create = (
    !local.use_shared_vpc && (
      var.vpc_create != null || var.project_create != null
    )
  )
  fleet_host = join("", [
    "https://connectgateway.googleapis.com/v1/",
    "projects/${local.fleet_project.number}/",
    "locations/global/gkeMemberships/${var.cluster_name}"
  ])
  fleet_project = (
    var.fleet_project_id == null
    ? {
      project_id = var.project_id
      number     = module.project.number
    }
    : {
      project_id = var.fleet_project_id
      number     = module.fleet-project.0.number
    }
  )
  proxy_only_subnet = (local.vpc_create && try(var.vpc_create.proxy_only_subnet, null) != null) ? [
    {
      ip_cidr_range = var.vpc_create.proxy_only_subnet
      name          = "proxy"
      region        = var.region
      active        = true
    }
  ] : null
  use_shared_vpc = (
    try(var.project_create.shared_vpc_host, null) != null
  )
}

module "project" {
  source          = "../../../../modules/project"
  parent          = try(var.project_create.parent, null)
  billing_account = try(var.project_create.billing_account, null)
  name            = var.project_id
  project_create  = var.project_create != null
  services = compact([
    "anthos.googleapis.com",
    var.registry_create ? "artifactregistry.googleapis.com" : null,
    "cloudresourcemanager.googleapis.com",
    "connectgateway.googleapis.com",
    "container.googleapis.com",
    "gkeconnect.googleapis.com",
    "gkehub.googleapis.com",
    "stackdriver.googleapis.com"
  ])
  shared_vpc_service_config = !local.use_shared_vpc ? null : {
    attach       = true
    host_project = var.project_create.shared_vpc_host
    # grant required roles on the host project to service identities
    service_identity_iam = {
      "roles/compute.networkUser" = [
        "cloudservices", "container-engine"
      ]
      "roles/container.hostServiceAgentUser" = [
        "container-engine"
      ]
    }
  }
  iam_bindings_additive = merge(
    # allow GKE fleet service identity to manage clusters in this project
    {
      gkehub-robot = {
        role = "roles/gkehub.serviceAgent"
        member = (
          var.fleet_project_id == null
          ? "serviceAccount:${module.project.service_accounts.robots.gkehub}"
          : "serviceAccount:${module.fleet-project.0.service_accounts.robots.gkehub}"
        )
      }
    },
    # grant required roles to GKE node service account
    {
      for r in local.cluster_sa_roles : "gke-sa-${r}" => {
        role   = r
        member = "serviceAccount:${local.cluster_sa}"
      }
    }
  )
}

module "vpc" {
  source     = "../../../../modules/net-vpc"
  count      = local.vpc_create ? 1 : 0
  project_id = module.project.project_id
  name = coalesce(
    try(var.vpc_create.name, null), var.prefix
  )
  subnets = [{
    name = coalesce(
      try(var.vpc_create.subnet_name, null), "${var.prefix}-default"
    )
    region = var.region
    ip_cidr_range = try(
      var.vpc_create.primary_range_nodes, "10.0.0.0/24"
    )
    secondary_ip_ranges = {
      pods = try(
        var.vpc_create.secondary_range_pods, "10.16.0.0/20"
      )
      services = try(
        var.vpc_create.secondary_range_services, "10.32.0.0/24"
      )
    }
  }]
  subnets_proxy_only = local.proxy_only_subnet
}

module "fleet-project" {
  source         = "../../../../modules/project"
  count          = var.fleet_project_id == null ? 0 : 1
  name           = var.fleet_project_id
  project_create = false
}

module "fleet" {
  source     = "../../../../modules/gke-hub"
  project_id = local.fleet_project.project_id
  clusters = {
    (var.cluster_name) = (
      var.cluster_create != null
      ? module.cluster.0.id
      : "projects/${var.project_id}/locations/${var.region}/clusters/${var.cluster_name}"
    )
  }
}

module "registry" {
  source     = "../../../../modules/artifact-registry"
  count      = var.registry_create ? 1 : 0
  project_id = module.project.project_id
  location   = var.region
  name       = var.prefix
  format     = { docker = {} }
  mode       = { remote = true }
}

module "nat" {
  source         = "../../../../modules/net-cloudnat"
  count          = local.create_nat ? 1 : 0
  project_id     = module.project.project_id
  region         = var.region
  name           = "default"
  router_network = local.cluster_vpc.network
}
@ -0,0 +1,73 @@
/**
 * Copyright 2024 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

output "created_resources" {
  description = "IDs of the resources created, if any."
  value = merge(
    var.project_create == null ? {} : {
      project = module.project.project_id
    },
    !local.vpc_create ? {} : {
      subnet_id = one(values(module.vpc.0.subnet_ids))
      vpc_id    = module.vpc.0.id
    },
    !var.registry_create ? {} : {
      registry = module.registry.0.image_path
    },
    !local.cluster_create ? {} : {
      cluster              = module.cluster.0.id
      node_service_account = module.cluster-service-account.0.email
    },
    !local.create_nat ? {} : {
      router    = module.nat.0.id
      cloud_nat = module.nat.0.router.id
    },
    local.proxy_only_subnet == null ? {} : {
      proxy_only_subnet = one(values(module.vpc.0.subnets_proxy_only)).id
    },
  )
}

output "credentials_config" {
  description = "Configure how Terraform authenticates to the cluster."
  value = {
    fleet_host = local.fleet_host
  }
}

output "fleet_host" {
  description = "Fleet Connect Gateway host that can be used to configure the GKE provider."
  value       = local.fleet_host
}

output "get_credentials" {
  description = "Run one of these commands to get cluster credentials. Credentials via fleet allow reaching private clusters without direct connectivity."
  value = {
    direct = join("", [
      "gcloud container clusters get-credentials ${var.cluster_name} ",
      "--project ${var.project_id} --location ${var.region}"
    ])
    fleet = join("", [
      "gcloud container fleet memberships get-credentials ${var.cluster_name}",
      " --project ${var.project_id}"
    ])
  }
}

output "region" {
  description = "Region used for cluster and network resources."
  value       = var.region
}
@ -0,0 +1,98 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
variable "cluster_create" {
|
||||||
|
description = "Cluster configuration for newly created cluster. Set to null to use existing cluster, or create using defaults in new project."
|
||||||
|
type = object({
|
||||||
|
deletion_protection = optional(bool, true)
|
||||||
|
labels = optional(map(string))
|
||||||
|
master_authorized_ranges = optional(map(string), {
|
||||||
|
rfc-1918-10-8 = "10.0.0.0/8"
|
||||||
|
})
|
||||||
|
master_ipv4_cidr_block = optional(string, "172.16.255.0/28")
|
||||||
|
vpc = optional(object({
|
||||||
|
id = string
|
||||||
|
subnet_id = string
|
||||||
|
secondary_range_names = optional(object({
|
||||||
|
pods = optional(string, "pods")
|
||||||
|
services = optional(string, "services")
|
||||||
|
}), {})
|
||||||
|
}))
|
||||||
|
options = optional(object({
|
||||||
|
release_channel = optional(string, "REGULAR")
|
||||||
|
enable_backup_agent = optional(bool, false)
|
||||||
|
}), {})
|
||||||
|
})
|
||||||
|
default = null
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "cluster_name" {
|
||||||
|
description = "Name of new or existing cluster."
|
||||||
|
type = string
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "fleet_project_id" {
|
||||||
|
description = "GKE Fleet project id. If null cluster project will also be used for fleet."
|
||||||
|
type = string
|
||||||
|
default = null
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "prefix" {
|
||||||
|
description = "Prefix used for resource names."
|
||||||
|
type = string
|
||||||
|
nullable = false
|
||||||
|
default = "jump-0"
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "project_create" {
|
||||||
|
description = "Project configuration for newly created project. Leave null to use existing project. Project creation forces VPC and cluster creation."
|
||||||
|
type = object({
|
||||||
|
billing_account = string
|
||||||
|
parent = optional(string)
|
||||||
|
shared_vpc_host = optional(string)
|
||||||
|
})
|
||||||
|
default = null
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "project_id" {
|
||||||
|
description = "Project id of existing or created project."
|
||||||
|
type = string
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "region" {
|
||||||
|
description = "Region used for cluster and network resources."
|
||||||
|
type = string
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "registry_create" {
|
||||||
|
description = "Create remote Docker Artifact Registry."
|
||||||
|
type = bool
|
||||||
|
default = true
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "vpc_create" {
|
||||||
|
description = "VPC configuration for newly created VPC. Leave null to use existing VPC, or defaults when project creation is required."
|
||||||
|
type = object({
|
||||||
|
name = optional(string)
|
||||||
|
subnet_name = optional(string)
|
||||||
|
primary_range_nodes = optional(string, "10.0.0.0/24")
|
||||||
|
secondary_range_pods = optional(string, "10.16.0.0/20")
|
||||||
|
secondary_range_services = optional(string, "10.32.0.0/24")
|
||||||
|
enable_cloud_nat = optional(bool, false)
|
||||||
|
proxy_only_subnet = optional(string)
|
||||||
|
})
|
||||||
|
default = null
|
||||||
|
}
|
|
@ -0,0 +1,27 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# https://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
terraform {
|
||||||
|
required_version = ">= 1.7.0"
|
||||||
|
required_providers {
|
||||||
|
google = {
|
||||||
|
source = "hashicorp/google"
|
||||||
|
version = ">= 5.11.0, < 6.0.0" # tftest
|
||||||
|
}
|
||||||
|
google-beta = {
|
||||||
|
source = "hashicorp/google-beta"
|
||||||
|
version = ">= 5.11.0, < 6.0.0" # tftest
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
|
@ -0,0 +1,62 @@
|
||||||
|
# Batch Processing on GKE with Kueue
|
||||||
|
|
||||||
|
<!-- BEGIN TOC -->
|
||||||
|
- [Introduction](#introduction)
|
||||||
|
- [Requirements](#requirements)
|
||||||
|
- [Cluster authentication](#cluster-authentication)
|
||||||
|
- [Kueue Configuration](#kueue-configuration)
|
||||||
|
- [Sample Configuration](#sample-configuration)
|
||||||
|
- [Variables](#variables)
|
||||||
|
<!-- END TOC -->
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
<a href="https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git&cloudshell_tutorial=batch/tutorial.md&cloudshell_git_branch=master&cloudshell_workspace=blueprints/gke/patterns&show=ide%2Cterminal">
|
||||||
|
<img width="200px" src="../../../../assets/images/cloud-shell-button.png">
|
||||||
|
</a>
|
||||||
|
|
||||||
|
This blueprint shows how to deploy a batch system using [Kueue](https://kueue.sigs.k8s.io/docs/overview/) to perform job queuing on Google Kubernetes Engine (GKE) using Terraform.
|
||||||
|
|
||||||
|
Kueue is a Cloud Native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.
|
||||||
|
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
This blueprint assumes the GKE cluster already exists. We recommend using the accompanying [Autopilot Cluster Pattern](../autopilot-cluster) to deploy a cluster according to Google's best practices. Once you have the cluster up-and-running, you can use this blueprint to deploy Kueue in it.
|
||||||
|
|
||||||
|
The Kueue manifests use container images hosted by registry.k8s.io, which means that the subnet where the GKE cluster is deployed needs to have Internet connectivity to download the images. If you're using the provided [Autopilot Cluster Pattern](../autopilot-cluster), you can set the `enable_cloud_nat` option of the `vpc_create` variable.
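As a minimal sketch, assuming the cluster is created with the Autopilot Cluster Pattern, a `terraform.tfvars` for that pattern that enables Cloud NAT could look like the following. The project id, cluster name, and region are placeholder values, not defaults of the blueprint.

```hcl
# terraform.tfvars for the autopilot-cluster pattern (placeholder values)
project_id   = "my-project-id" # hypothetical project id
cluster_name = "cluster"
region       = "europe-west1"

# a non-null value requests a new cluster, here with default settings
cluster_create = {}

vpc_create = {
  # gives the cluster subnet outbound Internet access to pull images
  enable_cloud_nat = true
}
```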
|
||||||
|
|
||||||
|
## Cluster authentication
|
||||||
|
Once you have a cluster with Internet connectivity, create a `terraform.tfvars` file and set up the `credentials_config` variable. We recommend using Anthos Fleet to simplify accessing the control plane.
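As a sketch of the Fleet option, `credentials_config` could point at the Connect gateway endpoint of the cluster's fleet membership instead of a kubeconfig file. The URL shape and the `PROJECT_NUMBER` and `MEMBERSHIP_NAME` placeholders below are assumptions to be replaced with your own values. The variable validation requires exactly one of `fleet_host` or `kubeconfig` to be set.

```hcl
credentials_config = {
  # assumed Connect gateway URL shape; replace both placeholders
  fleet_host = "https://connectgateway.googleapis.com/v1/projects/PROJECT_NUMBER/locations/global/gkeMemberships/MEMBERSHIP_NAME"
}
```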
|
||||||
|
|
||||||
|
## Kueue Configuration
|
||||||
|
|
||||||
|
Only two variables are available to control Kueue's configuration:
- `team_namespaces`, which controls the namespaces used by different teams to run jobs.
- `kueue_namespace`, which controls the namespace where Kueue's own resources are deployed.
|
||||||
|
|
||||||
|
Any other configuration can be applied by directly modifying the YAML manifests under the [manifest-templates](manifest-templates) directory.
|
||||||
|
|
||||||
|
## Sample Configuration
|
||||||
|
|
||||||
|
Use the following template as a starting point for your `terraform.tfvars`:

```tfvars
credentials_config = {
  kubeconfig = {
    path = "~/.kube/config"
  }
}
team_namespaces = [
  "team-a",
  "team-b"
]
```
|
||||||
|
<!-- BEGIN TFDOC -->
|
||||||
|
## Variables
|
||||||
|
|
||||||
|
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [credentials_config](variables.tf#L17) | Configure how Terraform authenticates to the cluster. | <code title="object({ fleet_host = optional(string) kubeconfig = optional(object({ context = optional(string) path = optional(string, "~/.kube/config") })) })">object({…})</code> | ✓ | |
| [kueue_namespace](variables.tf#L36) | Namespace where Kueue's own resources are deployed. | <code>string</code> | | <code>"kueue-system"</code> |
| [team_namespaces](variables.tf#L43) | Namespaces of the teams running jobs in the clusters. | <code>list(string)</code> | | <code title="[ "team-a", "team-b" ]">[…]</code> |
| [templates_path](variables.tf#L53) | Path where manifest templates will be read from. Set to null to use the default manifests. | <code>string</code> | | <code>null</code> |
|
||||||
|
<!-- END TFDOC -->
|
|
@ -0,0 +1,26 @@
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
# All arguments except the last are manifest files; the last is the sleep time.
FILE_POSITIONS=$(($# - 1))
SLEEP_TIME=${!#}

while :
do
  for i in $(seq 1 $FILE_POSITIONS); do
    kubectl create -f "${!i}"
  done
  sleep "${SLEEP_TIME:-10}"
done
|
|
@ -0,0 +1,30 @@
|
||||||
|
# skip boilerplate check
|
||||||
|
|
||||||
|
apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-a # Job under team-a namespace
  generateName: sample-job-
  annotations:
    kueue.x-k8s.io/queue-name: local-queue # Point to the LocalQueue
spec:
  ttlSecondsAfterFinished: 60 # Job will be deleted after 60 seconds
  parallelism: 3 # This Job will have 3 replicas running at the same time
  completions: 3 # This Job requires 3 completions
  suspend: true # Set to true to allow Kueue to control the Job when it starts
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "512Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "512Mi"
      restartPolicy: Never
|
|
@ -0,0 +1,30 @@
|
||||||
|
# skip boilerplate check
|
||||||
|
|
||||||
|
apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-b # Job under team-b namespace
  generateName: sample-job-
  annotations:
    kueue.x-k8s.io/queue-name: local-queue # Point to the LocalQueue
spec:
  ttlSecondsAfterFinished: 60 # Job will be deleted after 60 seconds
  parallelism: 3 # This Job will have 3 replicas running at the same time
  completions: 3 # This Job requires 3 completions
  suspend: true # Set to true to allow Kueue to control the Job when it starts
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "512Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "512Mi"
      restartPolicy: Never
|
|
@ -0,0 +1,81 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
locals {
|
||||||
|
# use the bundled templates unless var.templates_path is set
wl_templates_path = coalesce(var.templates_path, "${path.module}/manifest-templates")
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
resource "kubectl_manifest" "kueue_namespace_manifest" {
  yaml_body = <<EOT
apiVersion: v1
kind: Namespace
metadata:
  labels:
    control-plane: controller-manager
  name: ${var.kueue_namespace}
EOT
}
|
||||||
|
|
||||||
|
data "kubectl_file_documents" "kueue_docs" {
|
||||||
|
content = file("${local.wl_templates_path}/kueue.yaml")
|
||||||
|
}
|
||||||
|
|
||||||
|
data "kubectl_path_documents" "cluster_resources_docs" {
|
||||||
|
pattern = "${local.wl_templates_path}/cluster-resources/*.yaml"
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubectl_manifest" "kueue_manifest" {
|
||||||
|
for_each = data.kubectl_file_documents.kueue_docs.manifests
|
||||||
|
yaml_body = each.value
|
||||||
|
override_namespace = var.kueue_namespace
|
||||||
|
depends_on = [kubectl_manifest.kueue_namespace_manifest]
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubectl_manifest" "cluster_resources_manifests" {
|
||||||
|
for_each = toset(data.kubectl_path_documents.cluster_resources_docs.documents)
|
||||||
|
yaml_body = each.value
|
||||||
|
depends_on = [kubectl_manifest.kueue_manifest]
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
resource "kubectl_manifest" "team_namespace_manifests" {
  for_each  = toset(var.team_namespaces)
  yaml_body = <<EOT
apiVersion: v1
kind: Namespace
metadata:
  name: ${each.value}
EOT
}
|
||||||
|
|
||||||
|
resource "kubectl_manifest" "local_queues_manifests" {
|
||||||
|
for_each = toset(var.team_namespaces)
|
||||||
|
yaml_body = file("${local.wl_templates_path}/team-resources/local-queue.yaml")
|
||||||
|
override_namespace = each.value
|
||||||
|
depends_on = [
|
||||||
|
kubectl_manifest.cluster_resources_manifests,
|
||||||
|
kubectl_manifest.team_namespace_manifests
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "local_file" "job_manifest_files" {
|
||||||
|
for_each = toset(var.team_namespaces)
|
||||||
|
content = templatefile("${local.wl_templates_path}/team-resources/job.yaml", {
|
||||||
|
namespace = each.value
|
||||||
|
})
|
||||||
|
filename = "${path.module}/job-${each.value}.yaml"
|
||||||
|
}
|
|
@ -0,0 +1,32 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cluster-queue
spec:
  namespaceSelector: {} # Available to all namespaces
  queueingStrategy: BestEffortFIFO # Default queueing strategy
  resourceGroups:
  - coveredResources: ["cpu", "memory", "ephemeral-storage"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 10
      - name: "memory"
        nominalQuota: 10Gi
      - name: "ephemeral-storage"
        nominalQuota: 10Gi
|
|
@ -0,0 +1,20 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# https://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
# [START gke_batch_kueue_intro_flavors]
|
||||||
|
apiVersion: kueue.x-k8s.io/v1beta1
|
||||||
|
kind: ResourceFlavor
|
||||||
|
metadata:
|
||||||
|
name: default-flavor # This ResourceFlavor will be used for all the resources
|
||||||
|
# [END gke_batch_kueue_intro_flavors]
|
File diff suppressed because it is too large
|
@ -0,0 +1,42 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: batch/v1
kind: Job
metadata:
  namespace: ${namespace} # Job under the team namespace
  generateName: sample-job-
  annotations:
    kueue.x-k8s.io/queue-name: local-queue # Point to the LocalQueue
spec:
  ttlSecondsAfterFinished: 60 # Job will be deleted after 60 seconds
  parallelism: 3 # This Job will have 3 replicas running at the same time
  completions: 3 # This Job requires 3 completions
  suspend: true # Set to true to allow Kueue to control the Job when it starts
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "512Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
            ephemeral-storage: "512Mi"
      restartPolicy: Never
|
|
@ -0,0 +1,21 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team # Placeholder, overridden by Terraform with each team namespace
  name: local-queue
spec:
  clusterQueue: cluster-queue # Point to the ClusterQueue
|
|
@ -0,0 +1,36 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
data "google_client_config" "identity" {
|
||||||
|
count = var.credentials_config.fleet_host != null ? 1 : 0
|
||||||
|
}
|
||||||
|
|
||||||
|
provider "kubectl" {
|
||||||
|
config_path = (
|
||||||
|
var.credentials_config.kubeconfig == null
|
||||||
|
? null
|
||||||
|
: pathexpand(var.credentials_config.kubeconfig.path)
|
||||||
|
)
|
||||||
|
config_context = try(
|
||||||
|
var.credentials_config.kubeconfig.context, null
|
||||||
|
)
|
||||||
|
host = (
|
||||||
|
var.credentials_config.fleet_host == null
|
||||||
|
? null
|
||||||
|
: var.credentials_config.fleet_host
|
||||||
|
)
|
||||||
|
token = try(data.google_client_config.identity.0.access_token, null)
|
||||||
|
}
|
|
@ -0,0 +1,215 @@
|
||||||
|
# Deploy a batch system using Kueue
|
||||||
|
|
||||||
|
This tutorial shows you how to deploy a batch system using Kueue to perform Job queueing on Google Kubernetes Engine (GKE) using Terraform.
|
||||||
|
|
||||||
|
Jobs are applications that run to completion, such as machine learning, rendering, simulation, analytics, CI/CD, and similar workloads.
|
||||||
|
|
||||||
|
Kueue is a Cloud Native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.
|
||||||
|
|
||||||
|
Kueue has the following characteristics:

* It is optimized for cloud architectures, where resources are heterogeneous, interchangeable, and scalable.
* It provides a set of APIs to manage elastic quotas and manage Job queueing.
* It does not re-implement existing functionality such as autoscaling, pod scheduling, or Job lifecycle management.
* It has built-in support for the Kubernetes `batch/v1.Job` API.
* It can integrate with other job APIs.
* Kueue refers to jobs defined with any API as Workloads, to avoid the confusion with the specific Kubernetes Job API.
|
||||||
|
|
||||||
|
When working with Kueue there are a few concepts that one needs to be familiar with:

* ResourceFlavor

  An object that you can define to describe what resources are available in a cluster. Typically, it is associated with the characteristics of a group of Nodes: availability, pricing, architecture, models, etc.

* ClusterQueue

  A cluster-scoped resource that governs a pool of resources, defining usage limits and fair sharing rules.

* LocalQueue

  A namespaced resource that groups closely related workloads belonging to a single tenant.

* Workload

  An application that will run to completion. It is the unit of admission in Kueue. Sometimes referred to as a job.
|
||||||
|
|
||||||
|
Kueue refers to jobs defined with any API as Workloads, to avoid the confusion with the specific Kubernetes Job API.
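
To make the relationship between a LocalQueue and a ClusterQueue concrete, here is a minimal Terraform sketch, assuming the `kubectl` provider used by this blueprint is already configured. It is an illustration only: the blueprint itself reads these objects from YAML templates, and the resource name `example_local_queue` is hypothetical.

```hcl
# Hypothetical illustration: one LocalQueue per team namespace,
# each pointing at the same cluster-wide ClusterQueue.
resource "kubectl_manifest" "example_local_queue" {
  for_each = toset(["team-a", "team-b"])
  yaml_body = yamlencode({
    apiVersion = "kueue.x-k8s.io/v1beta1"
    kind       = "LocalQueue"
    metadata = {
      name      = "local-queue"
      namespace = each.value # the tenant's namespace
    }
    spec = {
      clusterQueue = "cluster-queue" # shared pool of quota
    }
  })
}
```

Jobs then select their queue through the `kueue.x-k8s.io/queue-name` annotation, so each team only needs to know the name of the LocalQueue in its own namespace.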
|
||||||
|
|
||||||
|
## Objectives
|
||||||
|
|
||||||
|
This tutorial is for cluster operators and other users that want to implement a batch system on Kubernetes. In this tutorial, you set up a shared cluster for two tenant teams. Each team has their own namespace where they create Jobs and share the same global resources that are controlled with the corresponding quotas.
|
||||||
|
|
||||||
|
In this tutorial we will be doing the following using Terraform code available in a git repository:

1. Create a GKE cluster.
2. Create a namespace for Kueue (kueue-system).
3. Create a namespace for each team running batch jobs in the cluster (team-a, team-b).
4. Install Kueue in the namespace created for it.
5. Create the ResourceFlavor.
6. Create the ClusterQueue.
7. Create a LocalQueue for each of the teams in the corresponding namespace.
8. Create, for each team, a manifest for a sample job associated with the corresponding LocalQueue.
|
||||||
|
|
||||||
|
Estimated time:
|
||||||
|
<walkthrough-tutorial-duration duration="30"></walkthrough-tutorial-duration>
|
||||||
|
|
||||||
|
To get started, click Start.
|
||||||
|
|
||||||
|
## Select or create a project
|
||||||
|
|
||||||
|
<walkthrough-project-setup billing="true"></walkthrough-project-setup>
|
||||||
|
|
||||||
|
## Create the Autopilot GKE cluster
|
||||||
|
|
||||||
|
1. Change to the ```autopilot-cluster``` directory.

```bash
cd autopilot-cluster
```

2. Create a new file ```terraform.tfvars``` in that directory.

```bash
touch terraform.tfvars
```

3. Open the <walkthrough-editor-open-file filePath="autopilot-cluster/terraform.tfvars">file</walkthrough-editor-open-file> for editing.

4. Paste the following content in the file and update any value as needed.

```hcl
project_id = "<walkthrough-project-name/>"
cluster_name = "cluster"
cluster_create = {
  deletion_protection = false
}
region = "europe-west1"
vpc_create = {
  enable_cloud_nat = true
}
```

5. Initialize the Terraform configuration.

```bash
terraform init
```

6. Apply the Terraform configuration.

```bash
terraform apply
```

7. Fetch the cluster credentials.

```bash
gcloud container fleet memberships get-credentials cluster --project "<walkthrough-project-name/>"
```

8. Check that the cluster is reachable and the system pods are running.

```bash
kubectl get pods -n kube-system
```
|
||||||
|
|
||||||
|
## Install Kueue and create associated resources
|
||||||
|
|
||||||
|
1. Change to the ```patterns/batch``` directory.

```bash
cd ../batch
```

2. Create a new file ```terraform.tfvars``` in that directory.

```bash
touch terraform.tfvars
```

3. Open the <walkthrough-editor-open-file filePath="batch/terraform.tfvars">file</walkthrough-editor-open-file> for editing.

4. Paste the following content in the file.

```hcl
credentials_config = {
  kubeconfig = {
    path = "~/.kube/config"
  }
}
```

5. Initialize the Terraform configuration.

```bash
terraform init
```

6. Apply the Terraform configuration.

```bash
terraform apply
```

7. Check that the Kueue pods are ready (use Ctrl+C to stop watching).

```bash
kubectl get pods -n kueue-system -w
```

8. Check the status of the ClusterQueue.

```bash
kubectl get clusterqueue cluster-queue -o wide -w
```

9. Check the status of the LocalQueue for each team.

```bash
kubectl get localqueue -n team-a local-queue -o wide -w
```

```bash
kubectl get localqueue -n team-b local-queue -o wide -w
```
|
||||||
|
|
||||||
|
## Run jobs in the cluster
|
||||||
|
|
||||||
|
1. Create Jobs for namespaces team-a and team-b every 10 seconds, each associated with the corresponding LocalQueue:

```bash
./create_jobs.sh job-team-a.yaml job-team-b.yaml 10
```

Press Ctrl+C when you want to stop the creation of jobs.

2. Observe the workloads being queued up, admitted in the ClusterQueue, and nodes being brought up with GKE Autopilot.

```bash
kubectl -n team-a get workloads
```

3. Copy a Job name from the previous step and observe the admission status and events for a Job through the Workloads API:

```bash
kubectl -n team-a describe workload JOB_NAME
```
|
||||||
|
|
||||||
|
## Destroy resources (optional)
|
||||||
|
1. Change to the ```patterns/autopilot-cluster``` directory.

```bash
cd ../autopilot-cluster
```

2. Destroy the cluster with the following command.

```bash
terraform destroy
```
|
||||||
|
|
||||||
|
## Congratulations
|
||||||
|
|
||||||
|
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>
|
||||||
|
|
||||||
|
You’re all set!
|
|
@ -0,0 +1,58 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
variable "credentials_config" {
|
||||||
|
description = "Configure how Terraform authenticates to the cluster."
|
||||||
|
type = object({
|
||||||
|
fleet_host = optional(string)
|
||||||
|
kubeconfig = optional(object({
|
||||||
|
context = optional(string)
|
||||||
|
path = optional(string, "~/.kube/config")
|
||||||
|
}))
|
||||||
|
})
|
||||||
|
nullable = false
|
||||||
|
validation {
|
||||||
|
condition = (
|
||||||
|
(var.credentials_config.fleet_host != null) !=
|
||||||
|
(var.credentials_config.kubeconfig != null)
|
||||||
|
)
|
||||||
|
error_message = "Exactly one of fleet host or kubeconfig must be set."
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "kueue_namespace" {
|
||||||
|
description = "Namespace where Kueue's own resources are deployed."
|
||||||
|
type = string
|
||||||
|
nullable = false
|
||||||
|
default = "kueue-system"
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "team_namespaces" {
|
||||||
|
description = "Namespaces of the teams running jobs in the clusters."
|
||||||
|
type = list(string)
|
||||||
|
nullable = false
|
||||||
|
default = [
|
||||||
|
"team-a",
|
||||||
|
"team-b"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "templates_path" {
|
||||||
|
description = "Path where manifest templates will be read from. Set to null to use the default manifests."
|
||||||
|
type = string
|
||||||
|
default = null
|
||||||
|
}
|
||||||
|
|
|
@ -0,0 +1,27 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# https://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
terraform {
|
||||||
|
required_version = ">= 1.7.0"
|
||||||
|
required_providers {
|
||||||
|
google = {
|
||||||
|
source = "hashicorp/google"
|
||||||
|
version = ">= 5.11.0, < 6.0.0" # tftest
|
||||||
|
}
|
||||||
|
google-beta = {
|
||||||
|
source = "hashicorp/google-beta"
|
||||||
|
version = ">= 5.11.0, < 6.0.0" # tftest
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
|
@ -0,0 +1,66 @@
|
||||||
|
# Highly Available Kafka on GKE
|
||||||
|
|
||||||
|
<!-- BEGIN TOC -->
|
||||||
|
- [Introduction](#introduction)
|
||||||
|
- [Requirements](#requirements)
|
||||||
|
- [Cluster authentication](#cluster-authentication)
|
||||||
|
- [Kafka Configuration](#kafka-configuration)
|
||||||
|
- [Sample Configuration](#sample-configuration)
|
||||||
|
- [Variables](#variables)
|
||||||
|
<!-- END TOC -->
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
|
||||||
|
<a href="https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git&cloudshell_tutorial=kafka/tutorial.md&cloudshell_git_branch=master&cloudshell_workspace=blueprints/gke/patterns&show=ide%2Cterminal">
|
||||||
|
<img width="200px" src="../../../../assets/images/cloud-shell-button.png">
|
||||||
|
</a>
|
||||||
|
|
||||||
|
This blueprint shows how to deploy a highly available Kafka instance on GKE using the Strimzi operator.
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
This blueprint assumes the GKE cluster already exists. We recommend using the accompanying [Autopilot Cluster Pattern](../autopilot-cluster) to deploy a cluster according to Google's best practices. Once you have the cluster up-and-running, you can use this blueprint to deploy Kafka in it.
|
||||||
|
|
||||||
|
The Kafka manifests pull container images from the Internet, which means that the subnet where the GKE cluster is deployed needs Internet connectivity. If you're using the provided [Autopilot Cluster Pattern](../autopilot-cluster), you can set the `enable_cloud_nat` option of the `vpc_create` variable.
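
As a minimal sketch of the Cloud NAT option, the `terraform.tfvars` of the Autopilot Cluster Pattern could look like the fragment below. The `project_id`, `cluster_name` and `region` values are placeholders chosen for illustration and need to be replaced with your own.

```hcl
# terraform.tfvars for the autopilot-cluster pattern (illustrative values)
project_id   = "my-project-id" # placeholder, use your own project
cluster_name = "cluster"
region       = "europe-west1"

# Ask the pattern to create Cloud NAT so that nodes can pull images.
vpc_create = {
  enable_cloud_nat = true
}
```

If you bring your own VPC instead, any equivalent egress path to the Internet should work, since the only requirement here is that image pulls succeed.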
|
||||||
|
|
||||||
|
## Cluster authentication
|
||||||
|
Once you have a cluster with Internet connectivity, create a `terraform.tfvars` file and set up the `credentials_config` variable. We recommend using Anthos Fleet to simplify accessing the control plane.
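
A sketch of the fleet-based option is shown below. The variable validation accepts exactly one of `fleet_host` or `kubeconfig`, so only `fleet_host` is set. `PROJECT_NUMBER` and `CLUSTER_NAME` are placeholders, and the URL layout is an assumption based on the usual Connect gateway endpoint format. Check the value against your own fleet membership before using it.

```hcl
# terraform.tfvars (sketch): authenticate through the fleet Connect gateway.
# PROJECT_NUMBER and CLUSTER_NAME are placeholders. The URL layout is an
# assumption and should be verified against your fleet membership.
credentials_config = {
  fleet_host = "https://connectgateway.googleapis.com/v1/projects/PROJECT_NUMBER/locations/global/gkeMemberships/CLUSTER_NAME"
}
```

With `fleet_host` set, the providers in this blueprint authenticate with the access token of the identity running Terraform, so no local kubeconfig file is needed.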
|
||||||
|
|
||||||
|
## Kafka Configuration
|
||||||
|
|
||||||
|
This template exposes several variables to configure the Kafka instance:
|
||||||
|
- `namespace` which controls the namespace used to deploy the Kafka instance
|
||||||
|
- `kafka_config` to customize the configuration of the Kafka instance. The default configuration deploys version 3.6.0 with 3 replicas, with a disk of 10Gi and 4096 MB of RAM.
|
||||||
|
- `zookeeper_config` to customize the configuration of the Zookeeper instance. The default configuration deploys 3 replicas, with a disk of 10Gi and 2048 MB of RAM.
|
||||||
|
|
||||||
|
Any other configuration can be applied by directly modifying the YAML manifests under the [manifest-templates](manifest-templates) directory.
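
The fragment below is a sketch of how these variables might be combined with a custom manifest directory. The `templates_path` value is a hypothetical location, and the other values simply restate the documented defaults so they can be tuned in one place.

```hcl
# terraform.tfvars (sketch): tune Kafka and Zookeeper sizing.
kafka_config = {
  replicas          = 3
  version           = "3.6.0"
  jvm_memory        = "4096m"
  volume_claim_size = "10Gi"
}

zookeeper_config = {
  replicas          = 3
  jvm_memory        = "2048m"
  volume_claim_size = "10Gi"
}

# Hypothetical directory holding a modified copy of the default manifests.
templates_path = "~/kafka-manifest-templates"
```

Two details of the default manifest are worth keeping in mind when changing these values. The broker memory request and limit are fixed at 5Gi in the template, so `jvm_memory` should stay below that figure unless the template is edited too. The template also sets a default replication factor of 3 with `min.insync.replicas` at 2, so lowering `replicas` below 3 would likely require changing those settings as well.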
|
||||||
|
|
||||||
|
## Sample Configuration
|
||||||
|
|
||||||
|
Use the following template as a starting point for your `terraform.tfvars`:
|
||||||
|
```tfvars
|
||||||
|
credentials_config = {
|
||||||
|
kubeconfig = {
|
||||||
|
path = "~/.kube/config"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
kafka_config = {
|
||||||
|
volume_claim_size = "15Gi"
|
||||||
|
replicas = 4
|
||||||
|
}
|
||||||
|
|
||||||
|
zookeeper_config = {
|
||||||
|
volume_claim_size = "15Gi"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
<!-- BEGIN TFDOC -->
|
||||||
|
## Variables
|
||||||
|
|
||||||
|
| name | description | type | required | default |
|
||||||
|
|---|---|:---:|:---:|:---:|
|
||||||
|
| [credentials_config](variables.tf#L17) | Configure how Terraform authenticates to the cluster. | <code title="object({ fleet_host = optional(string) kubeconfig = optional(object({ context = optional(string) path = optional(string, "~/.kube/config") })) })">object({…})</code> | ✓ | |
|
||||||
|
| [kafka_config](variables.tf#L36) | Configure Kafka cluster statefulset parameters. | <code title="object({ replicas = optional(number, 3) volume_claim_size = optional(string, "10Gi") version = optional(string, "3.6.0") jvm_memory = optional(string, "4096m") })">object({…})</code> | | <code>{}</code> |
|
||||||
|
| [namespace](variables.tf#L48) | Namespace used for Kafka cluster resources. | <code>string</code> | | <code>"kafka"</code> |
|
||||||
|
| [templates_path](variables.tf#L55) | Path where manifest templates will be read from. Set to null to use the default manifests. | <code>string</code> | | <code>null</code> |
|
||||||
|
| [zookeeper_config](variables.tf#L61) | Configure Zookeeper cluster statefulset parameters. | <code title="object({ replicas = optional(number, 3) volume_claim_size = optional(string, "10Gi") jvm_memory = optional(string, "2048m") })">object({…})</code> | | <code>{}</code> |
|
||||||
|
<!-- END TFDOC -->
|
|
@ -0,0 +1,49 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
locals {
|
||||||
|
wl_templates = [
|
||||||
|
for f in fileset(local.wl_templates_path, "*yaml") :
|
||||||
|
"${local.wl_templates_path}/${f}"
|
||||||
|
]
|
||||||
|
wl_templates_path = (
|
||||||
|
var.templates_path == null
|
||||||
|
? "${path.module}/manifest-templates"
|
||||||
|
: pathexpand(var.templates_path)
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "helm_release" "strimzi-operator" {
|
||||||
|
name = "strimzi-operator"
|
||||||
|
repository = "https://strimzi.io/charts"
|
||||||
|
chart = "strimzi-kafka-operator"
|
||||||
|
namespace = var.namespace
|
||||||
|
create_namespace = true
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubectl_manifest" "kafka-cluster" {
|
||||||
|
for_each = toset(local.wl_templates)
|
||||||
|
yaml_body = templatefile(each.value, {
|
||||||
|
name = "kafka"
|
||||||
|
namespace = var.namespace
|
||||||
|
kafka_config = var.kafka_config
|
||||||
|
zookeeper_config = var.zookeeper_config
|
||||||
|
})
|
||||||
|
timeouts {
|
||||||
|
create = "30m"
|
||||||
|
}
|
||||||
|
depends_on = [helm_release.strimzi-operator]
|
||||||
|
}
|
|
@ -0,0 +1,148 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# https://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: kafka.strimzi.io/v1beta2
|
||||||
|
kind: Kafka
|
||||||
|
metadata:
|
||||||
|
name: "${name}"
|
||||||
|
namespace: "${namespace}"
|
||||||
|
spec:
|
||||||
|
kafka:
|
||||||
|
version: "${kafka_config.version}"
|
||||||
|
replicas: ${kafka_config.replicas}
|
||||||
|
template:
|
||||||
|
pod:
|
||||||
|
tolerations:
|
||||||
|
- key: "app.stateful/component"
|
||||||
|
operator: "Equal"
|
||||||
|
value: "kafka-broker"
|
||||||
|
effect: NoSchedule
|
||||||
|
affinity:
|
||||||
|
nodeAffinity:
|
||||||
|
preferredDuringSchedulingIgnoredDuringExecution:
|
||||||
|
- weight: 1
|
||||||
|
preference:
|
||||||
|
matchExpressions:
|
||||||
|
- key: "app.stateful/component"
|
||||||
|
operator: In
|
||||||
|
values:
|
||||||
|
- "kafka-broker"
|
||||||
|
topologySpreadConstraints:
|
||||||
|
- maxSkew: 1
|
||||||
|
topologyKey: "topology.kubernetes.io/zone"
|
||||||
|
whenUnsatisfiable: DoNotSchedule
|
||||||
|
labelSelector:
|
||||||
|
matchLabels:
|
||||||
|
app.kubernetes.io/name: kafka
|
||||||
|
strimzi.io/cluster: "${name}"
|
||||||
|
strimzi.io/component-type: kafka
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
memory: 5Gi
|
||||||
|
cpu: "1"
|
||||||
|
limits:
|
||||||
|
memory: 5Gi
|
||||||
|
cpu: "2"
|
||||||
|
jvmOptions:
|
||||||
|
-Xms: ${kafka_config.jvm_memory}
|
||||||
|
-Xmx: ${kafka_config.jvm_memory}
|
||||||
|
listeners:
|
||||||
|
- name: plain
|
||||||
|
port: 9092
|
||||||
|
type: internal
|
||||||
|
tls: false
|
||||||
|
- name: tls
|
||||||
|
port: 9093
|
||||||
|
type: internal
|
||||||
|
tls: true
|
||||||
|
config:
|
||||||
|
offsets.topic.replication.factor: 3
|
||||||
|
transaction.state.log.replication.factor: 3
|
||||||
|
transaction.state.log.min.isr: 2
|
||||||
|
default.replication.factor: 3
|
||||||
|
min.insync.replicas: 2
|
||||||
|
inter.broker.protocol.version: "3.4"
|
||||||
|
storage:
|
||||||
|
type: jbod
|
||||||
|
volumes:
|
||||||
|
- id: 0
|
||||||
|
type: persistent-claim
|
||||||
|
size: ${kafka_config.volume_claim_size}
|
||||||
|
class: premium-rwo
|
||||||
|
deleteClaim: false
|
||||||
|
zookeeper:
|
||||||
|
template:
|
||||||
|
pod:
|
||||||
|
affinity:
|
||||||
|
nodeAffinity:
|
||||||
|
preferredDuringSchedulingIgnoredDuringExecution:
|
||||||
|
- weight: 1
|
||||||
|
preference:
|
||||||
|
matchExpressions:
|
||||||
|
- key: "app.stateful/component"
|
||||||
|
operator: In
|
||||||
|
values:
|
||||||
|
- "zookeeper"
|
||||||
|
topologySpreadConstraints:
|
||||||
|
- maxSkew: 1
|
||||||
|
topologyKey: "topology.kubernetes.io/zone"
|
||||||
|
whenUnsatisfiable: DoNotSchedule
|
||||||
|
labelSelector:
|
||||||
|
matchLabels:
|
||||||
|
app.kubernetes.io/name: zookeeper
|
||||||
|
strimzi.io/cluster: "${name}"
|
||||||
|
strimzi.io/component-type: zookeeper
|
||||||
|
replicas: ${zookeeper_config.replicas}
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
memory: 2560Mi
|
||||||
|
cpu: 1000m
|
||||||
|
limits:
|
||||||
|
memory: 2560Mi
|
||||||
|
cpu: 2000m
|
||||||
|
jvmOptions:
|
||||||
|
-Xms: ${zookeeper_config.jvm_memory}
|
||||||
|
-Xmx: ${zookeeper_config.jvm_memory}
|
||||||
|
storage:
|
||||||
|
type: persistent-claim
|
||||||
|
size: ${zookeeper_config.volume_claim_size}
|
||||||
|
class: premium-rwo
|
||||||
|
deleteClaim: false
|
||||||
|
entityOperator:
|
||||||
|
tlsSidecar:
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
cpu: 100m
|
||||||
|
memory: 128Mi
|
||||||
|
limits:
|
||||||
|
cpu: 500m
|
||||||
|
memory: 128Mi
|
||||||
|
topicOperator:
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
cpu: 100m
|
||||||
|
memory: 512Mi
|
||||||
|
limits:
|
||||||
|
cpu: 500m
|
||||||
|
memory: 512Mi
|
||||||
|
userOperator:
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
cpu: 500m
|
||||||
|
ephemeral-storage: 1Gi
|
||||||
|
memory: 2Gi
|
||||||
|
limits:
|
||||||
|
cpu: 500m
|
||||||
|
ephemeral-storage: 1Gi
|
||||||
|
memory: 2Gi
|
|
@ -0,0 +1,15 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
|
@ -0,0 +1,69 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
data "google_client_config" "identity" {
|
||||||
|
count = var.credentials_config.fleet_host != null ? 1 : 0
|
||||||
|
}
|
||||||
|
|
||||||
|
# provider "kubernetes" {
|
||||||
|
# config_path = (
|
||||||
|
# var.credentials_config.kubeconfig == null
|
||||||
|
# ? null
|
||||||
|
# : pathexpand(var.credentials_config.kubeconfig.path)
|
||||||
|
# )
|
||||||
|
# config_context = try(
|
||||||
|
# var.credentials_config.kubeconfig.context, null
|
||||||
|
# )
|
||||||
|
# host = (
|
||||||
|
# var.credentials_config.fleet_host == null
|
||||||
|
# ? null
|
||||||
|
# : var.credentials_config.fleet_host
|
||||||
|
# )
|
||||||
|
# token = try(data.google_client_config.identity.0.access_token, null)
|
||||||
|
# }
|
||||||
|
|
||||||
|
provider "kubectl" {
|
||||||
|
host = (
|
||||||
|
var.credentials_config.fleet_host == null
|
||||||
|
? null
|
||||||
|
: var.credentials_config.fleet_host
|
||||||
|
)
|
||||||
|
config_path = (
|
||||||
|
var.credentials_config.kubeconfig == null
|
||||||
|
? null
|
||||||
|
: pathexpand(var.credentials_config.kubeconfig.path)
|
||||||
|
)
|
||||||
|
token = try(data.google_client_config.identity.0.access_token, null)
|
||||||
|
}
|
||||||
|
|
||||||
|
provider "helm" {
|
||||||
|
kubernetes {
|
||||||
|
config_path = (
|
||||||
|
var.credentials_config.kubeconfig == null
|
||||||
|
? null
|
||||||
|
: pathexpand(var.credentials_config.kubeconfig.path)
|
||||||
|
)
|
||||||
|
config_context = try(
|
||||||
|
var.credentials_config.kubeconfig.context, null
|
||||||
|
)
|
||||||
|
host = (
|
||||||
|
var.credentials_config.fleet_host == null
|
||||||
|
? null
|
||||||
|
: var.credentials_config.fleet_host
|
||||||
|
)
|
||||||
|
token = try(data.google_client_config.identity.0.access_token, null)
|
||||||
|
}
|
||||||
|
}
|
|
@ -0,0 +1,156 @@
|
||||||
|
# Deploy Apache Kafka to GKE using Strimzi
|
||||||
|
|
||||||
|
This guide shows you how to use the Strimzi operator to deploy Apache Kafka clusters on GKE.
|
||||||
|
|
||||||
|
## Objectives
|
||||||
|
|
||||||
|
This tutorial covers the following steps:
|
||||||
|
|
||||||
|
- Create a GKE cluster.
|
||||||
|
- Deploy and configure the Strimzi operator
|
||||||
|
- Configure Apache Kafka using the Strimzi operator
|
||||||
|
|
||||||
|
Estimated time:
|
||||||
|
<walkthrough-tutorial-duration duration="30"></walkthrough-tutorial-duration>
|
||||||
|
|
||||||
|
To get started, click Start.
|
||||||
|
|
||||||
|
## Select/create a project
|
||||||
|
|
||||||
|
<walkthrough-project-setup billing="true"></walkthrough-project-setup>
|
||||||
|
|
||||||
|
## Create the Autopilot GKE cluster
|
||||||
|
|
||||||
|
1. Change to the ```autopilot-cluster``` directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd autopilot-cluster
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Create a new file ```terraform.tfvars``` in that directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
touch terraform.tfvars
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Open the <walkthrough-editor-open-file filePath="autopilot-cluster/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
|
||||||
|
|
||||||
|
4. Paste the following content in the file and update any value as needed.
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
project_id = "<walkthrough-project-name/>"
|
||||||
|
cluster_name = "cluster"
|
||||||
|
cluster_create = {
|
||||||
|
deletion_protection = false
|
||||||
|
}
|
||||||
|
region = "europe-west1"
|
||||||
|
vpc_create = {
|
||||||
|
enable_cloud_nat = true
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Initialize the terraform configuration.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform init
|
||||||
|
```
|
||||||
|
|
||||||
|
6. Apply the terraform configuration.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform apply
|
||||||
|
```
|
||||||
|
|
||||||
|
7. Fetch the cluster credentials.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
gcloud container fleet memberships get-credentials cluster --project "<walkthrough-project-name/>"
|
||||||
|
```
|
||||||
|
|
||||||
|
8. Check that the system pods are running.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get pods -n kube-system
|
||||||
|
```
|
||||||
|
|
||||||
|
## Install the Kafka Strimzi operator and create associated resources
|
||||||
|
|
||||||
|
1. Change to the ```patterns/kafka``` directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd ../kafka
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Create a new file ```terraform.tfvars``` in that directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
touch terraform.tfvars
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Open the <walkthrough-editor-open-file filePath="kafka/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
|
||||||
|
|
||||||
|
4. Paste the following content in the file.
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
credentials_config = {
|
||||||
|
kubeconfig = {
|
||||||
|
path = "~/.kube/config"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
kafka_config = {
|
||||||
|
volume_claim_size = "15Gi"
|
||||||
|
replicas = 4
|
||||||
|
}
|
||||||
|
zookeeper_config = {
|
||||||
|
volume_claim_size = "15Gi"
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Initialize the terraform configuration.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform init
|
||||||
|
```
|
||||||
|
|
||||||
|
6. Apply the terraform configuration.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform apply
|
||||||
|
```
|
||||||
|
|
||||||
|
7. Check that the Kafka pods are ready.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get pods -n kafka
|
||||||
|
```
|
||||||
|
|
||||||
|
8. Check that the Kafka volumes match the number of replicas.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get pv
|
||||||
|
```
|
||||||
|
|
||||||
|
9. Confirm the Kafka object is running.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get kafka -n kafka
|
||||||
|
```
|
||||||
|
|
||||||
|
## Destroy resources (optional)
|
||||||
|
1. Change to the ```patterns/autopilot-cluster``` directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd ../autopilot-cluster
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Destroy the cluster with the following command.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform destroy
|
||||||
|
```
|
||||||
|
|
||||||
|
## Congratulations
|
||||||
|
|
||||||
|
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>
|
||||||
|
|
||||||
|
You’re all set!
|
|
@ -0,0 +1,70 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
variable "credentials_config" {
|
||||||
|
description = "Configure how Terraform authenticates to the cluster."
|
||||||
|
type = object({
|
||||||
|
fleet_host = optional(string)
|
||||||
|
kubeconfig = optional(object({
|
||||||
|
context = optional(string)
|
||||||
|
path = optional(string, "~/.kube/config")
|
||||||
|
}))
|
||||||
|
})
|
||||||
|
nullable = false
|
||||||
|
validation {
|
||||||
|
condition = (
|
||||||
|
(var.credentials_config.fleet_host != null) !=
|
||||||
|
(var.credentials_config.kubeconfig != null)
|
||||||
|
)
|
||||||
|
error_message = "Exactly one of fleet host or kubeconfig must be set."
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "kafka_config" {
|
||||||
|
description = "Configure Kafka cluster statefulset parameters."
|
||||||
|
type = object({
|
||||||
|
replicas = optional(number, 3)
|
||||||
|
volume_claim_size = optional(string, "10Gi")
|
||||||
|
version = optional(string, "3.6.0")
|
||||||
|
jvm_memory = optional(string, "4096m")
|
||||||
|
})
|
||||||
|
nullable = false
|
||||||
|
default = {}
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "namespace" {
|
||||||
|
description = "Namespace used for Kafka cluster resources."
|
||||||
|
type = string
|
||||||
|
nullable = false
|
||||||
|
default = "kafka"
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "templates_path" {
|
||||||
|
description = "Path where manifest templates will be read from. Set to null to use the default manifests."
|
||||||
|
type = string
|
||||||
|
default = null
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "zookeeper_config" {
|
||||||
|
description = "Configure Zookeeper cluster statefulset parameters."
|
||||||
|
type = object({
|
||||||
|
replicas = optional(number, 3)
|
||||||
|
volume_claim_size = optional(string, "10Gi")
|
||||||
|
jvm_memory = optional(string, "2048m")
|
||||||
|
})
|
||||||
|
nullable = false
|
||||||
|
default = {}
|
||||||
|
}
|
|
@ -0,0 +1,27 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# https://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
terraform {
|
||||||
|
required_version = ">= 1.7.0"
|
||||||
|
required_providers {
|
||||||
|
google = {
|
||||||
|
source = "hashicorp/google"
|
||||||
|
version = ">= 5.11.0, < 6.0.0" # tftest
|
||||||
|
}
|
||||||
|
google-beta = {
|
||||||
|
source = "hashicorp/google-beta"
|
||||||
|
version = ">= 5.11.0, < 6.0.0" # tftest
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
|
@ -0,0 +1,21 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# https://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
terraform {
|
||||||
|
required_providers {
|
||||||
|
kubectl = {
|
||||||
|
source = "gavinbunney/kubectl"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
|
@ -0,0 +1,62 @@
|
||||||
|
# Highly Available Redis Cluster on GKE
|
||||||
|
|
||||||
|
<!-- BEGIN TOC -->
|
||||||
|
- [Introduction](#introduction)
|
||||||
|
- [Requirements](#requirements)
|
||||||
|
- [Cluster authentication](#cluster-authentication)
|
||||||
|
- [Redis Cluster Configuration](#redis-cluster-configuration)
|
||||||
|
- [Sample Configuration](#sample-configuration)
|
||||||
|
- [Variables](#variables)
|
||||||
|
<!-- END TOC -->
|
||||||
|
|
||||||
|
## Introduction
|
||||||
|
<a href="https://shell.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git&cloudshell_tutorial=redis-cluster/tutorial.md&cloudshell_git_branch=master&cloudshell_workspace=blueprints/gke/patterns&show=ide%2Cterminal">
|
||||||
|
<img width="200px" src="../../../../assets/images/cloud-shell-button.png">
|
||||||
|
</a>
|
||||||
|
|
||||||
|
This blueprint shows how to deploy a highly available Redis cluster on GKE following Google's recommended practices for creating a stateful application.
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
This blueprint assumes the GKE cluster already exists. We recommend using the accompanying [Autopilot Cluster Pattern](../autopilot-cluster) to deploy a cluster according to Google's best practices. Once you have the cluster up-and-running, you can use this blueprint to deploy Redis in it.
|
||||||
|
|
||||||
|
## Cluster authentication
|
||||||
|
Once you have a cluster, create a `terraform.tfvars` file and set up the `credentials_config` variable. We recommend using Anthos Fleet to simplify accessing the control plane.
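
If you prefer a local kubeconfig over the fleet option, a sketch like the one below selects a specific context. The context name is a made-up placeholder, and since exactly one of `fleet_host` or `kubeconfig` may be set, `fleet_host` is left out.

```hcl
# terraform.tfvars (sketch): authenticate with a local kubeconfig.
credentials_config = {
  kubeconfig = {
    # Placeholder context name, list yours with `kubectl config get-contexts`.
    context = "my-gke-context"
    path    = "~/.kube/config"
  }
}
```

Omitting `context` falls back to whatever context is currently active in the kubeconfig file, which is convenient on a workstation but easy to get wrong when several clusters are configured.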
|
||||||
|
|
||||||
|
## Redis Cluster Configuration
|
||||||
|
|
||||||
|
This template exposes several variables to configure the Redis cluster:
|
||||||
|
- `namespace` which controls the namespace used to deploy the Redis instances
|
||||||
|
- `image` to change the container image used by the Redis cluster. Defaults to `redis:6.2` (i.e. the official Redis image, version 6.2)
|
||||||
|
- `statefulset_config` to customize the configuration of the Redis StatefulSet. The default configuration deploys a 6-node cluster with requests for 1 CPU, 1Gi of RAM and a 10Gi volume.
|
||||||
|
|
||||||
|
Any other configuration can be applied by directly modifying the YAML manifests under the [manifest-templates](manifest-templates) directory.
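
As an illustration, the fragment below sets every variable listed above and points the blueprint at a custom manifest directory. The `templates_path` value is a hypothetical location, and the image and namespace simply restate the defaults.

```hcl
# terraform.tfvars (sketch): override the Redis image, namespace and sizing.
image     = "redis:6.2"
namespace = "redis"

statefulset_config = {
  replicas          = 6
  volume_claim_size = "20Gi"
}

# Hypothetical directory holding a modified copy of the default manifests.
templates_path = "~/redis-manifest-templates"
```

When using a custom directory, note that this blueprint only applies files whose names start with a digit and end in `yaml`, and it also expects a `start-cluster.yaml` file in the same directory. Copying the default `manifest-templates` directory and editing it in place is probably the safest starting point.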
|
||||||
|
|
||||||
|
## Sample Configuration
|
||||||
|
|
||||||
|
Use the following template as a starting point for your `terraform.tfvars`:
|
||||||
|
```tfvars
|
||||||
|
credentials_config = {
|
||||||
|
kubeconfig = {
|
||||||
|
path = "~/.kube/config"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
statefulset_config = {
|
||||||
|
replicas = 8
|
||||||
|
resource_requests = {
|
||||||
|
cpu = "2"
|
||||||
|
memory = "2Gi"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
<!-- BEGIN TFDOC -->
|
||||||
|
## Variables
|
||||||
|
|
||||||
|
| name | description | type | required | default |
|
||||||
|
|---|---|:---:|:---:|:---:|
|
||||||
|
| [credentials_config](variables.tf#L17) | Configure how Terraform authenticates to the cluster. | <code title="object({ fleet_host = optional(string) kubeconfig = optional(object({ context = optional(string) path = optional(string, "~/.kube/config") })) })">object({…})</code> | ✓ | |
|
||||||
|
| [image](variables.tf#L36) | Container image to use. | <code>string</code> | | <code>"redis:6.2"</code> |
|
||||||
|
| [namespace](variables.tf#L43) | Namespace used for Redis cluster resources. | <code>string</code> | | <code>"redis"</code> |
|
||||||
|
| [statefulset_config](variables.tf#L50) | Configure Redis cluster statefulset parameters. | <code title="object({ replicas = optional(number, 6) resource_requests = optional(object({ cpu = optional(string, "1") memory = optional(string, "1Gi") }), {}) volume_claim_size = optional(string, "10Gi") })">object({…})</code> | | <code>{}</code> |
|
||||||
|
| [templates_path](variables.tf#L68) | Path where manifest templates will be read from. Set to null to use the default manifests. | <code>string</code> | | <code>null</code> |
|
||||||
|
<!-- END TFDOC -->
|
|
@ -0,0 +1,68 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
locals {
|
||||||
|
wl_templates = [
|
||||||
|
for f in fileset(local.wl_templates_path, "[0-9]*yaml") :
|
||||||
|
"${local.wl_templates_path}/${f}"
|
||||||
|
]
|
||||||
|
wl_templates_path = (
|
||||||
|
var.templates_path == null
|
||||||
|
? "${path.module}/manifest-templates"
|
||||||
|
: pathexpand(var.templates_path)
|
||||||
|
)
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubernetes_namespace" "default" {
|
||||||
|
metadata {
|
||||||
|
name = var.namespace
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubernetes_manifest" "default" {
|
||||||
|
for_each = toset(local.wl_templates)
|
||||||
|
manifest = yamldecode(templatefile(each.value, {
|
||||||
|
image = var.image
|
||||||
|
namespace = kubernetes_namespace.default.metadata.0.name
|
||||||
|
statefulset_config = var.statefulset_config
|
||||||
|
}))
|
||||||
|
dynamic "wait" {
|
||||||
|
for_each = strcontains(each.key, "statefulset") ? [""] : []
|
||||||
|
content {
|
||||||
|
fields = {
|
||||||
|
"status.readyReplicas" = var.statefulset_config.replicas
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
timeouts {
|
||||||
|
create = "30m"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
resource "kubernetes_manifest" "cluster-start" {
|
||||||
|
manifest = yamldecode(templatefile("${local.wl_templates_path}/start-cluster.yaml", {
|
||||||
|
image = var.image
|
||||||
|
namespace = kubernetes_namespace.default.metadata.0.name
|
||||||
|
nodes = [
|
||||||
|
for i in range(var.statefulset_config.replicas) :
|
||||||
|
"redis-${i}.redis-cluster.${var.namespace}.svc.cluster.local"
|
||||||
|
]
|
||||||
|
}))
|
||||||
|
field_manager {
|
||||||
|
force_conflicts = true
|
||||||
|
}
|
||||||
|
depends_on = [kubernetes_manifest.default]
|
||||||
|
}
|
|
@ -0,0 +1,28 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: v1
|
||||||
|
kind: ConfigMap
|
||||||
|
metadata:
|
||||||
|
name: redis-cluster
|
||||||
|
namespace: ${namespace}
|
||||||
|
data:
|
||||||
|
redis.conf: |+
|
||||||
|
cluster-enabled yes
|
||||||
|
cluster-node-timeout 15000
|
||||||
|
cluster-config-file /data/nodes.conf
|
||||||
|
appendonly yes
|
||||||
|
protected-mode no
|
||||||
|
dir /data
|
||||||
|
port 6379
|
|
@ -0,0 +1,46 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: v1
|
||||||
|
kind: ConfigMap
|
||||||
|
metadata:
|
||||||
|
name: redis-probes
|
||||||
|
namespace: ${namespace}
|
||||||
|
data:
|
||||||
|
readiness.sh: |-
|
||||||
|
#!/bin/sh
|
||||||
|
|
||||||
|
pingResponse="$(redis-cli -h localhost ping)"
|
||||||
|
if [ "$?" -eq "124" ]; then
|
||||||
|
echo "PING timed out"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$pingResponse" != "PONG" ]; then
|
||||||
|
echo "$pingResponse"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
liveness.sh: |-
|
||||||
|
#!/bin/sh
|
||||||
|
|
||||||
|
pingResponse="$(redis-cli -h localhost ping | head -n1 | awk '{print $1;}')"
|
||||||
|
if [ "$?" -eq "124" ]; then
|
||||||
|
echo "PING timed out"
|
||||||
|
exit 1
|
||||||
|
fi
|
||||||
|
|
||||||
|
if [ "$pingResponse" != "PONG" ] && [ "$pingResponse" != "LOADING" ] && [ "$pingResponse" != "MASTERDOWN" ]; then
|
||||||
|
echo "$pingResponse"
|
||||||
|
exit 1
|
||||||
|
fi
|
|
@ -0,0 +1,24 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: policy/v1
|
||||||
|
kind: PodDisruptionBudget
|
||||||
|
metadata:
|
||||||
|
name: redis-pdb
|
||||||
|
namespace: ${namespace}
|
||||||
|
spec:
|
||||||
|
minAvailable: 3
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: redis
|
|
@ -0,0 +1,111 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: apps/v1
|
||||||
|
kind: StatefulSet
|
||||||
|
metadata:
|
||||||
|
name: redis
|
||||||
|
namespace: ${namespace}
|
||||||
|
spec:
|
||||||
|
serviceName: "redis-cluster"
|
||||||
|
replicas: ${statefulset_config.replicas}
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: redis
|
||||||
|
template:
|
||||||
|
metadata:
|
||||||
|
labels:
|
||||||
|
app: redis
|
||||||
|
appCluster: redis-cluster
|
||||||
|
spec:
|
||||||
|
terminationGracePeriodSeconds: 20
|
||||||
|
affinity:
|
||||||
|
podAntiAffinity:
|
||||||
|
preferredDuringSchedulingIgnoredDuringExecution:
|
||||||
|
- weight: 100
|
||||||
|
podAffinityTerm:
|
||||||
|
labelSelector:
|
||||||
|
matchExpressions:
|
||||||
|
- key: app
|
||||||
|
operator: In
|
||||||
|
values:
|
||||||
|
- redis
|
||||||
|
topologyKey: kubernetes.io/hostname
|
||||||
|
containers:
|
||||||
|
- name: redis
|
||||||
|
image: ${image}
|
||||||
|
command:
|
||||||
|
- "redis-server"
|
||||||
|
args:
|
||||||
|
- "/conf/redis.conf"
|
||||||
|
- "--protected-mode"
|
||||||
|
- "no"
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
cpu: ${statefulset_config.resource_requests.cpu}
|
||||||
|
ephemeral-storage: ${statefulset_config.volume_claim_size}
|
||||||
|
memory: ${statefulset_config.resource_requests.memory}
|
||||||
|
ports:
|
||||||
|
- name: redis
|
||||||
|
containerPort: 6379
|
||||||
|
protocol: "TCP"
|
||||||
|
- name: cluster
|
||||||
|
containerPort: 16379
|
||||||
|
protocol: "TCP"
|
||||||
|
startupProbe:
|
||||||
|
periodSeconds: 5
|
||||||
|
timeoutSeconds: 5
|
||||||
|
successThreshold: 1
|
||||||
|
failureThreshold: 20
|
||||||
|
tcpSocket:
|
||||||
|
port: redis
|
||||||
|
livenessProbe:
|
||||||
|
periodSeconds: 5
|
||||||
|
timeoutSeconds: 5
|
||||||
|
successThreshold: 1
|
||||||
|
failureThreshold: 5
|
||||||
|
exec:
|
||||||
|
command: ["sh", "-c", "/probes/liveness.sh"]
|
||||||
|
readinessProbe:
|
||||||
|
periodSeconds: 5
|
||||||
|
timeoutSeconds: 1
|
||||||
|
successThreshold: 1
|
||||||
|
failureThreshold: 5
|
||||||
|
exec:
|
||||||
|
command: ["sh", "-c", "/probes/readiness.sh"]
|
||||||
|
volumeMounts:
|
||||||
|
- name: conf
|
||||||
|
mountPath: /conf
|
||||||
|
- name: data
|
||||||
|
mountPath: /data
|
||||||
|
- name: probes
|
||||||
|
mountPath: /probes
|
||||||
|
readOnly: true
|
||||||
|
volumes:
|
||||||
|
- name: conf
|
||||||
|
configMap:
|
||||||
|
name: redis-cluster
|
||||||
|
defaultMode: 493
|
||||||
|
- name: probes
|
||||||
|
configMap:
|
||||||
|
name: redis-probes
|
||||||
|
defaultMode: 365
|
||||||
|
volumeClaimTemplates:
|
||||||
|
- metadata:
|
||||||
|
name: data
|
||||||
|
spec:
|
||||||
|
accessModes: [ "ReadWriteOnce" ]
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
storage: ${statefulset_config.volume_claim_size}
|
|
@ -0,0 +1,19 @@
|
||||||
|
# skip boilerplate check
|
||||||
|
|
||||||
|
apiVersion: v1
|
||||||
|
kind: Service
|
||||||
|
metadata:
|
||||||
|
name: redis-cluster
|
||||||
|
namespace: ${namespace}
|
||||||
|
spec:
|
||||||
|
clusterIP: None
|
||||||
|
ports:
|
||||||
|
- name: redis-port
|
||||||
|
port: 6379
|
||||||
|
protocol: TCP
|
||||||
|
targetPort: 6379
|
||||||
|
selector:
|
||||||
|
app: redis
|
||||||
|
appCluster: redis-cluster
|
||||||
|
sessionAffinity: None
|
||||||
|
type: ClusterIP
|
|
@ -0,0 +1,26 @@
|
||||||
|
#!/bin/bash
|
||||||
|
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
urls=$(kubectl get pods -l app=redis -o jsonpath='{range.items[*]}{.status.podIP} {end}')
|
||||||
|
command="kubectl exec -it redis-0 -- redis-cli --cluster create --cluster-replicas 1 "
|
||||||
|
|
||||||
|
for url in $urls
|
||||||
|
do
|
||||||
|
command+=$url":6379 "
|
||||||
|
done
|
||||||
|
|
||||||
|
echo "Executing command: " $command
|
||||||
|
$command
|
|
@ -0,0 +1,56 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
apiVersion: batch/v1
|
||||||
|
kind: Job
|
||||||
|
metadata:
|
||||||
|
name: redis-cluster-start
|
||||||
|
namespace: ${namespace}
|
||||||
|
spec:
|
||||||
|
suspend: false
|
||||||
|
completions: 1
|
||||||
|
template:
|
||||||
|
spec:
|
||||||
|
restartPolicy: Never
|
||||||
|
volumes:
|
||||||
|
- name: shared-data
|
||||||
|
emptyDir: {}
|
||||||
|
|
||||||
|
initContainers:
|
||||||
|
# we resolve node names in an init container using alpine
|
||||||
|
# because the redis image doesn't include nslookup
|
||||||
|
- name: resolve-node-names
|
||||||
|
image: alpine
|
||||||
|
volumeMounts:
|
||||||
|
- name: shared-data
|
||||||
|
mountPath: /tmp/shared-data
|
||||||
|
command:
|
||||||
|
- /bin/sh
|
||||||
|
- -c
|
||||||
|
- |
|
||||||
|
%{~ for n in nodes ~}
|
||||||
|
echo "$(nslookup ${n} | awk '/^Address: / { print $2 }'):6379" >> /tmp/shared-data/nodes
|
||||||
|
%{~ endfor ~}
|
||||||
|
|
||||||
|
containers:
|
||||||
|
- name: redis-cluster-start
|
||||||
|
image: ${image}
|
||||||
|
volumeMounts:
|
||||||
|
- name: shared-data
|
||||||
|
mountPath: /tmp/shared-data
|
||||||
|
command:
|
||||||
|
- /bin/sh
|
||||||
|
- -c
|
||||||
|
- |
|
||||||
|
redis-cli --cluster-yes --cluster-replicas 1 --cluster create $(cat /tmp/shared-data/nodes)
|
|
@ -0,0 +1,36 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
data "google_client_config" "identity" {
|
||||||
|
count = var.credentials_config.fleet_host != null ? 1 : 0
|
||||||
|
}
|
||||||
|
|
||||||
|
provider "kubernetes" {
|
||||||
|
config_path = (
|
||||||
|
var.credentials_config.kubeconfig == null
|
||||||
|
? null
|
||||||
|
: pathexpand(var.credentials_config.kubeconfig.path)
|
||||||
|
)
|
||||||
|
config_context = try(
|
||||||
|
var.credentials_config.kubeconfig.context, null
|
||||||
|
)
|
||||||
|
host = (
|
||||||
|
var.credentials_config.fleet_host == null
|
||||||
|
? null
|
||||||
|
: var.credentials_config.fleet_host
|
||||||
|
)
|
||||||
|
token = try(data.google_client_config.identity.0.access_token, null)
|
||||||
|
}
|
|
@ -0,0 +1,155 @@
|
||||||
|
# Deploy a Redis cluster on GKE
|
||||||
|
|
||||||
|
|
||||||
|
## Objectives
|
||||||
|
|
||||||
|
This tutorial covers the following steps:
|
||||||
|
|
||||||
|
- Create a GKE cluster.
|
||||||
|
- Create a Redis Cluster on GKE.
|
||||||
|
- Confirm that Redis is up and running.
|
||||||
|
- Confirm creation of the volumes for the stateful set.
|
||||||
|
- Confirm the Pod Disruption Budget (PDB).
|
||||||
|
|
||||||
|
Estimated time:
|
||||||
|
<walkthrough-tutorial-duration duration="30"></walkthrough-tutorial-duration>
|
||||||
|
|
||||||
|
To get started, click Start.
|
||||||
|
|
||||||
|
## Select or create a project
|
||||||
|
|
||||||
|
<walkthrough-project-setup billing="true"></walkthrough-project-setup>
|
||||||
|
|
||||||
|
## Create the Autopilot GKE cluster
|
||||||
|
|
||||||
|
1. Change to the ```autopilot-cluster``` directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd autopilot-cluster
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Create a new file ```terraform.tfvars``` in that directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
touch terraform.tfvars
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Open the <walkthrough-editor-open-file filePath="autopilot-cluster/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
|
||||||
|
|
||||||
|
4. Paste the following content in the file and update any value as needed.
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
project_id = "<walkthrough-project-name/>"
|
||||||
|
cluster_name = "cluster"
|
||||||
|
cluster_create = {
|
||||||
|
deletion_protection = false
|
||||||
|
}
|
||||||
|
region = "europe-west1"
|
||||||
|
vpc_create = { }
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Initialize the Terraform configuration.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform init
|
||||||
|
```
|
||||||
|
|
||||||
|
6. Apply the Terraform configuration.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform apply
|
||||||
|
```
|
||||||
|
|
||||||
|
7. Fetch the cluster credentials.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
gcloud container fleet memberships get-credentials cluster --project "<walkthrough-project-name/>"
|
||||||
|
```
|
||||||
|
|
||||||
|
8. Check that the cluster is reachable and its system pods are running.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get pods -n kube-system
|
||||||
|
```
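
The command in step 8 lists the pods in the `kube-system` namespace. To look at node readiness directly, a small helper along these lines can summarize `kubectl get nodes` output. This is only a sketch: the function name `count_not_ready` is hypothetical, and it assumes the default column layout, where the second column holds the node status.

```bash
# Hypothetical helper: count nodes whose STATUS column is not exactly "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
# A status such as "Ready,SchedulingDisabled" is counted as not ready.
count_not_ready() {
  awk '$2 != "Ready" { n++ } END { print n + 0 }'
}

# Usage against a live cluster (needs kubectl and the credentials from step 7):
#   kubectl get nodes --no-headers | count_not_ready
```

On an Autopilot cluster the node list may stay short until workloads are scheduled, so a low node count at this point is not necessarily a problem.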
|
||||||
|
|
||||||
|
## Install Redis and create associated resources
|
||||||
|
|
||||||
|
1. Change to the ```redis-cluster``` directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd ../redis-cluster
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Create a new file ```terraform.tfvars``` in that directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
touch terraform.tfvars
|
||||||
|
```
|
||||||
|
|
||||||
|
3. Open the <walkthrough-editor-open-file filePath="redis-cluster/terraform.tfvars">file</walkthrough-editor-open-file> for editing.
|
||||||
|
|
||||||
|
4. Paste the following content in the file.
|
||||||
|
|
||||||
|
```hcl
|
||||||
|
credentials_config = {
|
||||||
|
kubeconfig = {
|
||||||
|
path = "~/.kube/config"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
statefulset_config = {
|
||||||
|
replicas = 8
|
||||||
|
resource_requests = {
|
||||||
|
cpu = "1"
|
||||||
|
memory = "1.5Gi"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
```
|
||||||
|
|
||||||
|
5. Initialize the Terraform configuration.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform init
|
||||||
|
```
|
||||||
|
|
||||||
|
6. Apply the Terraform configuration.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform apply
|
||||||
|
```
|
||||||
|
|
||||||
|
7. Check that the Redis pods are ready.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get pods -n redis
|
||||||
|
```
|
||||||
|
|
||||||
|
8. Check that the number of Redis volumes matches the number of replicas.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl get pv
|
||||||
|
```
|
||||||
|
|
||||||
|
9. Confirm that the Pod Disruption Budget for Redis guarantees that at least 3 pods stay up during a voluntary disruption.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
kubectl describe pdb redis-pdb -n redis
|
||||||
|
```
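
With `replicas = 8` in `terraform.tfvars` and the `--cluster-replicas 1` flag passed by the cluster start job, `redis-cli` should split the pods into masters and replicas. The sketch below estimates the expected number of masters. `expected_masters` is a hypothetical helper name, and the formula (pods divided by replicas-per-master plus one, rounded down) is an assumption about how `redis-cli --cluster create` assigns roles, not something this blueprint states.

```bash
# Hypothetical helper: expected number of Redis masters.
# $1: total pods (statefulset_config.replicas)
# $2: replicas per master (the start job passes --cluster-replicas 1)
expected_masters() {
  echo $(( $1 / ($2 + 1) ))
}

# Usage:
#   expected_masters 8 1
# Compare with the cluster_size field reported by a running pod:
#   kubectl exec -n redis redis-0 -- redis-cli cluster info
```

Under the same assumption, the minimum of 6 replicas enforced by the `statefulset_config` validation corresponds to 3 masters, which is the smallest topology Redis Cluster accepts.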
|
||||||
|
|
||||||
|
## Destroy resources (optional)
|
||||||
|
1. Change to the ```autopilot-cluster``` directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
cd ../autopilot-cluster
|
||||||
|
```
|
||||||
|
|
||||||
|
2. Destroy the cluster with the following command.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
terraform destroy
|
||||||
|
```
|
||||||
|
|
||||||
|
## Congratulations
|
||||||
|
|
||||||
|
<walkthrough-conclusion-trophy></walkthrough-conclusion-trophy>
|
||||||
|
|
||||||
|
You’re all set!
|
|
@ -0,0 +1,72 @@
|
||||||
|
/**
|
||||||
|
* Copyright 2024 Google LLC
|
||||||
|
*
|
||||||
|
* Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
* you may not use this file except in compliance with the License.
|
||||||
|
* You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
|
||||||
|
variable "credentials_config" {
|
||||||
|
description = "Configure how Terraform authenticates to the cluster."
|
||||||
|
type = object({
|
||||||
|
fleet_host = optional(string)
|
||||||
|
kubeconfig = optional(object({
|
||||||
|
context = optional(string)
|
||||||
|
path = optional(string, "~/.kube/config")
|
||||||
|
}))
|
||||||
|
})
|
||||||
|
nullable = false
|
||||||
|
validation {
|
||||||
|
condition = (
|
||||||
|
(var.credentials_config.fleet_host != null) !=
|
||||||
|
(var.credentials_config.kubeconfig != null)
|
||||||
|
)
|
||||||
|
error_message = "Exactly one of fleet host or kubeconfig must be set."
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "image" {
|
||||||
|
description = "Container image to use."
|
||||||
|
type = string
|
||||||
|
nullable = false
|
||||||
|
default = "redis:6.2"
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "namespace" {
|
||||||
|
description = "Namespace used for Redis cluster resources."
|
||||||
|
type = string
|
||||||
|
nullable = false
|
||||||
|
default = "redis"
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "statefulset_config" {
|
||||||
|
description = "Configure Redis cluster statefulset parameters."
|
||||||
|
type = object({
|
||||||
|
replicas = optional(number, 6)
|
||||||
|
resource_requests = optional(object({
|
||||||
|
cpu = optional(string, "1")
|
||||||
|
memory = optional(string, "1Gi")
|
||||||
|
}), {})
|
||||||
|
volume_claim_size = optional(string, "10Gi")
|
||||||
|
})
|
||||||
|
nullable = false
|
||||||
|
default = {}
|
||||||
|
validation {
|
||||||
|
condition = var.statefulset_config.replicas >= 6
|
||||||
|
error_message = "The minimum number of Redis cluster replicas is 6."
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "templates_path" {
|
||||||
|
description = "Path where manifest templates will be read from. Set to null to use the default manifests."
|
||||||
|
type = string
|
||||||
|
default = null
|
||||||
|
}
|
|
@ -0,0 +1,27 @@
|
||||||
|
# Copyright 2024 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# https://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
terraform {
|
||||||
|
required_version = ">= 1.7.0"
|
||||||
|
required_providers {
|
||||||
|
google = {
|
||||||
|
source = "hashicorp/google"
|
||||||
|
version = ">= 5.11.0, < 6.0.0" # tftest
|
||||||
|
}
|
||||||
|
google-beta = {
|
||||||
|
source = "hashicorp/google-beta"
|
||||||
|
version = ">= 5.11.0, < 6.0.0" # tftest
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|