Batch Processing on GKE with Kueue

Introduction

This blueprint shows how to use Terraform to deploy a batch system on Google Kubernetes Engine (GKE), with Kueue handling job queueing.

Kueue is a Cloud Native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.

Requirements

This blueprint assumes the GKE cluster already exists. We recommend using the accompanying Autopilot Cluster Pattern to deploy a cluster according to Google's best practices. Once the cluster is up and running, you can use this blueprint to deploy Kueue on it.

The Kueue manifests use container images hosted on registry.k8s.io, so the subnet where the GKE cluster is deployed needs Internet connectivity to download them. If you're using the provided Autopilot Cluster Pattern, you can provide this by setting the enable_cloud_nat option of the vpc_create variable.
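
As a rough sketch, the relevant part of the cluster pattern's terraform.tfvars might look like the following. It assumes vpc_create is an object that accepts a boolean enable_cloud_nat attribute. Any other attributes that variable needs are omitted here, so check the pattern's own variables before copying it.

```hcl
# terraform.tfvars for the Autopilot Cluster Pattern (not for this blueprint).
# Assumption: vpc_create is an object with a boolean enable_cloud_nat attribute;
# other attributes it may require are left out of this sketch.
vpc_create = {
  enable_cloud_nat = true
}
```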

Cluster Authentication

Once you have a cluster with Internet connectivity, create a terraform.tfvars file and set up the credentials_config variable. We recommend using Anthos Fleet to simplify accessing the control plane.
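
If the cluster is registered to a fleet, the credentials could point at the Connect gateway instead of a local kubeconfig. The sketch below assumes credentials_config accepts a fleet_host attribute, which this README does not confirm, so verify it against the blueprint's variables.tf. PROJECT_NUMBER and MEMBERSHIP_NAME are placeholders.

```hcl
# Assumption: credentials_config accepts a fleet_host attribute as an
# alternative to kubeconfig. Replace the placeholders with your own values.
credentials_config = {
  fleet_host = "https://connectgateway.googleapis.com/v1/projects/PROJECT_NUMBER/locations/global/gkeMemberships/MEMBERSHIP_NAME"
}
```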

Kueue Configuration

Only two variables are available to control Kueue's configuration:

  • team_namespaces, which controls the namespaces used by different teams to run jobs.
  • kueue_namespace, which controls the namespace where Kueue's own resources are deployed.

Any other configuration can be applied by directly modifying the YAML manifests under the manifest-templates directory.
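
For example, a terraform.tfvars fragment along these lines would deploy Kueue into a custom namespace and read the manifests from a local, edited copy of the templates. The namespace name and the path are hypothetical. templates_path comes from the Variables table, and leaving it unset (null) keeps the default manifests.

```hcl
# Hypothetical values, shown only to illustrate the two settings.
kueue_namespace = "batch-queueing"           # default is "kueue-system"
templates_path  = "./my-manifest-templates"  # local, edited copy of the manifests
```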

Sample Configuration

Use the following template as a starting point for your terraform.tfvars:

credentials_config = {
  kubeconfig = {
    path = "~/.kube/config"
  }
}
team_namespaces = [
  "team-a",
  "team-b"
]

Variables

| name | description | type | required | default |
|---|---|---|---|---|
| credentials_config | Configure how Terraform authenticates to the cluster. | object({…}) | ✓ | |
| kueue_namespace | Namespace where Kueue's own resources are deployed. | string | | "kueue-system" |
| team_namespaces | Namespaces of the teams running jobs in the clusters. | list(string) | | […] |
| templates_path | Path where manifest templates will be read from. Set to null to use the default manifests. | string | | null |