Batch Processing on GKE with Kueue
Introduction
This blueprint shows how to deploy a batch system using Kueue to perform job queuing on Google Kubernetes Engine (GKE) using Terraform.
Kueue is a Cloud Native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.
Requirements
This blueprint assumes the GKE cluster already exists. We recommend using the accompanying Autopilot Cluster Pattern to deploy a cluster according to Google's best practices. Once you have the cluster up and running, you can use this blueprint to deploy Kueue to it.
The Kueue manifests use container images hosted by registry.k8s.io, which means that the subnet where the GKE cluster is deployed needs to have Internet connectivity to download the images. If you're using the provided Autopilot Cluster Pattern, you can set the enable_cloud_nat option of the vpc_create variable.
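For example, when the cluster is created with the Autopilot Cluster Pattern, the setting could look like the following in that pattern's terraform.tfvars. This is a minimal sketch that assumes enable_cloud_nat is a boolean flag and that the other vpc_create attributes can be left unset; check the pattern's variables.tf before relying on it.

```hcl
# terraform.tfvars for the Autopilot Cluster Pattern (not for this blueprint).
# Assumption: enable_cloud_nat is a boolean and the remaining vpc_create
# attributes are optional.
vpc_create = {
  enable_cloud_nat = true
}
```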
Cluster authentication
Once you have a cluster with Internet connectivity, create a terraform.tfvars file and set up the credentials_config variable. We recommend using Anthos Fleet to simplify accessing the control plane.
Kueue Configuration
Only two variables are available to control Kueue's configuration:
- teams_namespaces, which controls the namespaces used by different teams to run jobs.
- kueue_namespace, which controls the namespace where Kueue's own resources are deployed.
Any other configuration can be applied by directly modifying the YAML manifests under the manifest-templates directory.
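As a sketch of how the remaining variables fit together, the snippet below sets the Kueue namespace and points templates_path (listed in the variables table of this README) at a private copy of the manifests, so the defaults shipped with the blueprint stay untouched. The directory name is hypothetical, and the snippet assumes that a custom directory contains the same template files as manifest-templates.

```hcl
# Namespace for Kueue's own resources; "kueue-system" is the documented default.
kueue_namespace = "kueue-system"

# Hypothetical directory holding an edited copy of manifest-templates.
# Leave this unset (null) to use the default manifests.
templates_path = "./my-manifest-templates"
```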
Sample Configuration
Use the following template as a starting point for your terraform.tfvars:
credentials_config = {
  kubeconfig = {
    path = "~/.kube/config"
  }
}
teams_namespaces = [
  "team-a",
  "team-b"
]
Variables
name | description | type | required | default |
---|---|---|---|---|
credentials_config | Configure how Terraform authenticates to the cluster. | object({…}) | ✓ | |
kueue_namespace | Namespace where Kueue's own resources are deployed. | string | | "kueue-system" |
team_namespaces | Namespaces of the teams running jobs in the clusters. | list(string) | | […] |
templates_path | Path where manifest templates will be read from. Set to null to use the default manifests. | string | | null |