# GKE Multitenant Blueprint
This blueprint presents an opinionated architecture to handle multiple homogeneous GKE clusters. The general idea is to deploy a single project hosting multiple clusters that leverage several useful GKE features.

The pattern used in this design is useful, for example, in cases where multiple clusters host/support the same workloads, such as in a multi-regional deployment. Furthermore, combined with Anthos Config Sync and proper RBAC, this architecture can be used to host multiple tenants (e.g. teams, applications) sharing the clusters.
This blueprint is used as part of the [FAST GKE stage](../../../fast/stages/3-gke-multitenant/), but it can also be used independently.
<p align="center">
  <img src="diagram.png" alt="GKE multitenant">
</p>
The overall architecture is based on the following design decisions:
- All clusters are assumed to be [private](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters), therefore only [VPC-native clusters](https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips) are supported.
- Logging and monitoring are configured to use Cloud Operations for system components and user workloads.
- [GKE metering](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-usage-metering) is enabled by default, with usage data stored in a BigQuery dataset created within the project.
- Optional [GKE Fleet](https://cloud.google.com/kubernetes-engine/docs/fleets-overview) support, with the possibility to enable any of the following features:
  - [Fleet workload identity](https://cloud.google.com/anthos/fleet-management/docs/use-workload-identity)
  - [Anthos Config Management](https://cloud.google.com/anthos-config-management/docs/overview)
  - [Anthos Service Mesh](https://cloud.google.com/service-mesh/docs/overview)
  - [Anthos Identity Service](https://cloud.google.com/anthos/identity/setup/fleet)
  - [Multi-cluster services](https://cloud.google.com/kubernetes-engine/docs/concepts/multi-cluster-services)
  - [Multi-cluster ingress](https://cloud.google.com/kubernetes-engine/docs/concepts/multi-cluster-ingress)
- Support for [Config Sync](https://cloud.google.com/anthos-config-management/docs/config-sync-overview), [Hierarchy Controller](https://cloud.google.com/anthos-config-management/docs/concepts/hierarchy-controller), and [Policy Controller](https://cloud.google.com/anthos-config-management/docs/concepts/policy-controller) when using Anthos Config Management.
- [Groups for GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/google-groups-rbac) can be enabled to facilitate the creation of flexible RBAC policies referencing group principals.
- Support for [application layer secret encryption](https://cloud.google.com/kubernetes-engine/docs/how-to/encrypting-secrets).
- Support for customizing the peering configuration of the control plane VPC (e.g. to import/export routes to the peered network).
- Some features are enabled by default in all clusters:
  - [Intranode visibility](https://cloud.google.com/kubernetes-engine/docs/how-to/intranode-visibility)
  - [Dataplane V2](https://cloud.google.com/kubernetes-engine/docs/concepts/dataplane-v2)
  - [Shielded GKE nodes](https://cloud.google.com/kubernetes-engine/docs/how-to/shielded-gke-nodes)
  - [Workload identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity)
  - [Node local DNS cache](https://cloud.google.com/kubernetes-engine/docs/how-to/nodelocal-dns-cache)
  - [Use of the GCE persistent disk CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver)
  - Node [auto-upgrade](https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades) and [auto-repair](https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-repair) for all node pools

<!--
- [GKE subsetting for L4 internal load balancers](https://cloud.google.com/kubernetes-engine/docs/concepts/service-load-balancer#subsetting) enabled by default in all clusters
-->
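To make the per-cluster options above concrete, the following sketch shows how some of them might be combined on a single cluster entry. The attribute names used here (`enable_features`, `database_encryption`, `groups_for_rbac`, `peering_config`) are illustrative assumptions, not a verified schema; refer to the gke-cluster module variables for the exact shape.

```hcl
# Illustrative sketch only: attribute names below are assumptions,
# check the gke-cluster module variables for the exact schema.
clusters = {
  cluster-0 = {
    location = "europe-west1"
    enable_features = {
      # application layer secret encryption with a Cloud KMS key
      database_encryption = {
        state    = "ENCRYPTED"
        key_name = "projects/kms-prj/locations/europe-west1/keyRings/gke/cryptoKeys/clusters"
      }
      # Groups for GKE, so RBAC policies can reference group principals
      groups_for_rbac = "gke-security-groups@example.com"
    }
    private_cluster_config = {
      enable_private_endpoint = true
      master_global_access    = true
      # customize control plane VPC peering (route import/export)
      peering_config = {
        export_routes = true
        import_routes = false
      }
    }
    vpc_config = {
      subnetwork = "projects/prj-host/regions/europe-west1/subnetworks/gke-0"
    }
  }
}
```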
## Basic usage
The following example shows how to deploy two clusters with one node pool each.
```hcl
locals {
  cluster_defaults = {
    private_cluster_config = {
      enable_private_endpoint = true
      master_global_access    = true
    }
  }
  subnet_self_links = {
    ew1 = "projects/prj-host/regions/europe-west1/subnetworks/gke-0"
    ew3 = "projects/prj-host/regions/europe-west3/subnetworks/gke-0"
  }
}

module "gke-fleet" {
  source             = "./fabric/blueprints/gke/multitenant-fleet/"
  project_id         = var.project_id
  billing_account_id = var.billing_account_id
  folder_id          = var.folder_id
  prefix             = "myprefix"
  group_iam = {
    "gke-admin@example.com" = [
      "roles/container.admin"
    ]
  }
  iam = {
    "roles/container.clusterAdmin" = [
      "cicd@my-cicd-project.iam.gserviceaccount.com"
    ]
  }
  clusters = {
    cluster-0 = {
      location               = "europe-west1"
      private_cluster_config = local.cluster_defaults.private_cluster_config
      vpc_config = {
        subnetwork             = local.subnet_self_links.ew1
        master_ipv4_cidr_block = "172.16.10.0/28"
      }
    }
    cluster-1 = {
      location               = "europe-west3"
      private_cluster_config = local.cluster_defaults.private_cluster_config
      vpc_config = {
        subnetwork             = local.subnet_self_links.ew3
        master_ipv4_cidr_block = "172.16.20.0/28"
      }
    }
  }
  nodepools = {
    cluster-0 = {
      nodepool-0 = {
        node_config = {
          disk_type    = "pd-balanced"
          machine_type = "n2-standard-4"
          spot         = true
        }
      }
    }
    cluster-1 = {
      nodepool-0 = {
        node_config = {
          disk_type    = "pd-balanced"
          machine_type = "n2-standard-4"
        }
      }
    }
  }
  vpc_config = {
    host_project_id = "my-host-project-id"
    vpc_self_link   = "projects/prj-host/global/networks/prod-0"
  }
}
# tftest modules=7 resources=27
```
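The module exposes the created resources through outputs (see the Outputs section below). As a minimal sketch, assuming the `gke-fleet` module instance from the example above, downstream stages could consume them like this:

```hcl
# Re-export the blueprint's cluster ids for use by other stages.
output "cluster_ids" {
  description = "Ids of the GKE clusters created by the blueprint."
  value       = module.gke-fleet.cluster_ids
}

output "gke_project_id" {
  description = "Id of the project hosting the clusters."
  value       = module.gke-fleet.project_id
}
```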
## GKE Fleet

This example deploys two clusters and configures several GKE Fleet features:

- Enables [multi-cluster ingress](https://cloud.google.com/kubernetes-engine/docs/concepts/multi-cluster-ingress) and sets the configuration cluster to `cluster-0`.
- Enables [multi-cluster services](https://cloud.google.com/kubernetes-engine/docs/concepts/multi-cluster-services) and assigns the [required roles](https://cloud.google.com/kubernetes-engine/docs/how-to/multi-cluster-services#authenticating) to its service accounts.
- Creates a `default` Config Management template with binary authorization, Config Sync (backed by a git repository), Hierarchy Controller, and Policy Controller enabled.
- Configures the two clusters to use the `default` Config Management template.

```hcl
locals {
  subnet_self_links = {
    ew1 = "projects/prj-host/regions/europe-west1/subnetworks/gke-0"
    ew3 = "projects/prj-host/regions/europe-west3/subnetworks/gke-0"
  }
}

module "gke" {
  source             = "./fabric/blueprints/gke/multitenant-fleet/"
  project_id         = var.project_id
  billing_account_id = var.billing_account_id
  folder_id          = var.folder_id
  prefix             = "myprefix"
  clusters = {
    cluster-0 = {
      location = "europe-west1"
      vpc_config = {
        subnetwork = local.subnet_self_links.ew1
      }
    }
    cluster-1 = {
      location = "europe-west3"
      vpc_config = {
        subnetwork = local.subnet_self_links.ew3
      }
    }
  }
  nodepools = {
    cluster-0 = {
      nodepool-0 = {
        node_config = {
          disk_type    = "pd-balanced"
          machine_type = "n2-standard-4"
          spot         = true
        }
      }
    }
    cluster-1 = {
      nodepool-0 = {
        node_config = {
          disk_type    = "pd-balanced"
          machine_type = "n2-standard-4"
        }
      }
    }
  }
  fleet_features = {
    appdevexperience             = false
    configmanagement             = true
    identityservice              = true
    multiclusteringress          = "cluster-0"
    multiclusterservicediscovery = true
    servicemesh                  = true
  }
  fleet_workload_identity = true
  fleet_configmanagement_templates = {
    default = {
      binauthz = true
      config_sync = {
        git = {
          gcp_service_account_email = null
          https_proxy               = null
          policy_dir                = "configsync"
          secret_type               = "none"
          source_format             = "hierarchy"
          sync_branch               = "main"
          sync_repo                 = "https://github.com/myorg/myrepo"
          sync_rev                  = null
          sync_wait_secs            = null
        }
        prevent_drift = true
        source_format = "hierarchy"
      }
      hierarchy_controller = {
        enable_hierarchical_resource_quota = true
        enable_pod_tree_labels             = true
      }
      policy_controller = {
        audit_interval_seconds     = 30
        exemptable_namespaces      = ["kube-system"]
        log_denies_enabled         = true
        referential_rules_enabled  = true
        template_library_installed = true
      }
      version = "1.10.2"
    }
  }
  fleet_configmanagement_clusters = {
    default = ["cluster-0", "cluster-1"]
  }
  vpc_config = {
    host_project_id = "my-host-project-id"
    vpc_self_link   = "projects/prj-host/global/networks/prod-0"
  }
}
# tftest modules=8 resources=38
```
<!-- TFDOC OPTS files:1 -->
<!-- BEGIN TFDOC -->
## Files

| name | description | modules |
|---|---|---|
| [gke-clusters.tf](./gke-clusters.tf) | GKE clusters. | <code>gke-cluster</code> |
| [gke-hub.tf](./gke-hub.tf) | GKE hub configuration. | <code>gke-hub</code> |
| [gke-nodepools.tf](./gke-nodepools.tf) | GKE nodepools. | <code>gke-nodepool</code> |
| [main.tf](./main.tf) | Project and usage dataset. | <code>bigquery-dataset</code> · <code>project</code> |
| [outputs.tf](./outputs.tf) | Output variables. | |
| [variables.tf](./variables.tf) | Module variables. | |
## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [billing_account_id](variables.tf#L17) | Billing account id. | <code>string</code> | ✓ | |
| [folder_id](variables.tf#L132) | Folder used for the GKE project in folders/nnnnnnnnnnn format. | <code>string</code> | ✓ | |
| [prefix](variables.tf#L179) | Prefix used for resource names. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L188) | ID of the project that will contain all the clusters. | <code>string</code> | ✓ | |
| [vpc_config](variables.tf#L200) | Shared VPC project and VPC details. | <code title="object({ host_project_id = string vpc_self_link = string })">object({…})</code> | ✓ | |
| [clusters](variables.tf#L22) | Clusters configuration. Refer to the gke-cluster module for type details. | <code title="map(object({ cluster_autoscaling = optional(any) description = optional(string) enable_addons = optional(any, { horizontal_pod_autoscaling = true, http_load_balancing = true }) enable_features = optional(any, { workload_identity = true }) issue_client_certificate = optional(bool, false) labels = optional(map(string)) location = string logging_config = optional(list(string), [&quot;SYSTEM_COMPONENTS&quot;]) maintenance_config = optional(any, { daily_window_start_time = &quot;03:00&quot; recurring_window = null maintenance_exclusion = [] }) max_pods_per_node = optional(number, 110) min_master_version = optional(string) monitoring_config = optional(object({ enable_components = optional(list(string), [&quot;SYSTEM_COMPONENTS&quot;]) managed_prometheus = optional(bool) })) node_locations = optional(list(string)) private_cluster_config = optional(any) release_channel = optional(string) vpc_config = object({ subnetwork = string network = optional(string) secondary_range_blocks = optional(object({ pods = string services = string })) secondary_range_names = optional(object({ pods = string services = string }), { pods = &quot;pods&quot;, services = &quot;services&quot; }) master_authorized_ranges = optional(map(string)) master_ipv4_cidr_block = optional(string) }) }))">map(object({…}))</code> | | <code>{}</code> |
| [fleet_configmanagement_clusters](variables.tf#L70) | Config management features enabled on specific sets of member clusters, in config name => [cluster name] format. | <code>map(list(string))</code> | | <code>{}</code> |
| [fleet_configmanagement_templates](variables.tf#L77) | Sets of config management configurations that can be applied to member clusters, in config name => {options} format. | <code title="map(object({ binauthz = bool config_sync = object({ git = object({ gcp_service_account_email = string https_proxy = string policy_dir = string secret_type = string sync_branch = string sync_repo = string sync_rev = string sync_wait_secs = number }) prevent_drift = string source_format = string }) hierarchy_controller = object({ enable_hierarchical_resource_quota = bool enable_pod_tree_labels = bool }) policy_controller = object({ audit_interval_seconds = number exemptable_namespaces = list(string) log_denies_enabled = bool referential_rules_enabled = bool template_library_installed = bool }) version = string }))">map(object({…}))</code> | | <code>{}</code> |
| [fleet_features](variables.tf#L112) | Enable and configure fleet features. Set to null to disable GKE Hub if fleet workload identity is not used. | <code title="object({ appdevexperience = bool configmanagement = bool identityservice = bool multiclusteringress = string multiclusterservicediscovery = bool servicemesh = bool })">object({…})</code> | | <code>null</code> |
| [fleet_workload_identity](variables.tf#L125) | Use Fleet Workload Identity for clusters. Enables GKE Hub if set to true. | <code>bool</code> | | <code>false</code> |
| [group_iam](variables.tf#L137) | Project-level IAM bindings for groups. Use group emails as keys, list of roles as values. | <code>map(list(string))</code> | | <code>{}</code> |
| [iam](variables.tf#L144) | Project-level authoritative IAM bindings for users and service accounts in {ROLE => [MEMBERS]} format. | <code>map(list(string))</code> | | <code>{}</code> |
| [labels](variables.tf#L151) | Project-level labels. | <code>map(string)</code> | | <code>{}</code> |
| [nodepools](variables.tf#L157) | Nodepools configuration. Refer to the gke-nodepool module for type details. | <code title="map(map(object({ gke_version = optional(string) labels = optional(map(string), {}) max_pods_per_node = optional(number) name = optional(string) node_config = optional(any, { disk_type = &quot;pd-balanced&quot; }) node_count = optional(map(number), { initial = 1 }) node_locations = optional(list(string)) nodepool_config = optional(any) pod_range = optional(any) reservation_affinity = optional(any) service_account = optional(any) sole_tenant_nodegroup = optional(string) tags = optional(list(string)) taints = optional(list(any)) })))">map(map(object({…})))</code> | | <code>{}</code> |
| [project_services](variables.tf#L193) | Additional project services to enable. | <code>list(string)</code> | | <code>[]</code> |
## Outputs

| name | description | sensitive |
|---|---|:---:|
| [cluster_ids](outputs.tf#L17) | Cluster ids. | |
| [clusters](outputs.tf#L24) | Cluster resources. | |
| [project_id](outputs.tf#L29) | GKE project id. | |
<!-- END TFDOC -->