# GKE Multitenant

This stage allows creation and management of a fleet of GKE multitenant clusters, optionally leveraging GKE Hub to configure additional features. It's designed to be replicated once for every homogeneous set of clusters, either per environment or with more granularity as needed (e.g. teams or sets of teams sharing similar requirements).

The following diagram illustrates the high-level design of created resources, which can be adapted to specific requirements via variables:

[Diagram: GKE multitenant]

## Design overview and choices

> The detailed architecture of the underlying resources is explained in the documentation of the [GKE multitenant module](../../../../blueprints/gke/multitenant-fleet/README.md).

This stage creates a project containing as many clusters and node pools as requested by the user through the [variables](#variables) explained below. The GKE clusters are created with the following setup:

- All clusters are assumed to be [private](https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters), therefore only [VPC-native clusters](https://cloud.google.com/kubernetes-engine/docs/concepts/alias-ips) are supported.
- Logging and monitoring are configured to use Cloud Operations for system components and user workloads.
- [GKE metering](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-usage-metering) is enabled by default, with data stored in a BigQuery dataset created within the project.
- Optional [GKE Fleet](https://cloud.google.com/kubernetes-engine/docs/fleets-overview) support with the possibility to enable any of the following features:
  - [Fleet workload identity](https://cloud.google.com/anthos/fleet-management/docs/use-workload-identity)
  - [Anthos Config Management](https://cloud.google.com/anthos-config-management/docs/overview)
  - [Anthos Service Mesh](https://cloud.google.com/service-mesh/docs/overview)
  - [Anthos Identity Service](https://cloud.google.com/anthos/identity/setup/fleet)
  - [Multi-cluster services](https://cloud.google.com/kubernetes-engine/docs/concepts/multi-cluster-services)
  - [Multi-cluster ingress](https://cloud.google.com/kubernetes-engine/docs/concepts/multi-cluster-ingress)
- Support for [Config Sync](https://cloud.google.com/anthos-config-management/docs/config-sync-overview), [Hierarchy Controller](https://cloud.google.com/anthos-config-management/docs/concepts/hierarchy-controller), and [Policy Controller](https://cloud.google.com/anthos-config-management/docs/concepts/policy-controller) when using Anthos Config Management.
- [Groups for GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/google-groups-rbac) can be enabled to facilitate the creation of flexible RBAC policies referencing group principals.
- Support for [application layer secret encryption](https://cloud.google.com/kubernetes-engine/docs/how-to/encrypting-secrets).
- Support for customizing the peering configuration of the control plane VPC (e.g. to import/export routes to the peered network).
- Some features are enabled by default in all clusters:
  - [Intranode visibility](https://cloud.google.com/kubernetes-engine/docs/how-to/intranode-visibility)
  - [Dataplane v2](https://cloud.google.com/kubernetes-engine/docs/concepts/dataplane-v2)
  - [Shielded GKE nodes](https://cloud.google.com/kubernetes-engine/docs/how-to/shielded-gke-nodes)
  - [Workload identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity)
  - [Node local DNS cache](https://cloud.google.com/kubernetes-engine/docs/how-to/nodelocal-dns-cache)
  - [Use of the GCE persistent disk CSI driver](https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver)
  - Node [auto-upgrade](https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-upgrades) and [auto-repair](https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-repair) for all node pools

## How to run this stage

This stage is meant to be executed after the FAST "foundational" stages: bootstrap, resource management, security and networking stages.

It's of course possible to run this stage in isolation. Refer to the *[Running in isolation](#running-in-isolation)* section below for details.

Before running this stage, you need to make sure you have the correct credentials and permissions, and localize variables by assigning values that match your configuration.

### Provider and Terraform variables

As with all other FAST stages, the [mechanism used to pass variable values and pre-built provider files from one stage to the next](../../0-bootstrap/README.md#output-files-and-cross-stage-variables) is also leveraged here.

The commands to link or copy the provider and terraform variable files can be easily derived from the `stage-links.sh` script in the FAST root folder, passing it a single argument with the local output files folder (if configured) or the GCS output bucket in the automation project (derived from stage 0 outputs). The following examples demonstrate both cases, and the resulting commands that then need to be copy/pasted and run.

```bash
../../../stage-links.sh ~/fast-config

# copy and paste the following commands for '3-gke-multitenant'

ln -s /home/ludomagno/fast-config/providers/3-gke-multitenant-providers.tf ./
ln -s /home/ludomagno/fast-config/tfvars/0-globals.auto.tfvars.json ./
ln -s /home/ludomagno/fast-config/tfvars/0-bootstrap.auto.tfvars.json ./
ln -s /home/ludomagno/fast-config/tfvars/1-resman.auto.tfvars.json ./
ln -s /home/ludomagno/fast-config/tfvars/2-networking.auto.tfvars.json ./
ln -s /home/ludomagno/fast-config/tfvars/2-security.auto.tfvars.json ./
```

```bash
../../../stage-links.sh gs://xxx-prod-iac-core-outputs-0

# copy and paste the following commands for '3-gke-multitenant'

gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/providers/3-gke-multitenant-providers.tf ./
gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/tfvars/0-globals.auto.tfvars.json ./
gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/tfvars/0-bootstrap.auto.tfvars.json ./
gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/tfvars/1-resman.auto.tfvars.json ./
gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/tfvars/2-networking.auto.tfvars.json ./
gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/tfvars/2-security.auto.tfvars.json ./
```

### Impersonating the automation service account

The preconfigured provider file uses impersonation to run with this stage's automation service account's credentials. The `gcp-devops` and `organization-admins` groups have the necessary IAM bindings in place to do that, so make sure the current user is a member of one of those groups.

### Variable configuration

Variables in this stage -- like most other FAST stages -- are broadly divided into three separate sets:

- variables which refer to global values for the whole organization (org id, billing account id, prefix, etc.), which are pre-populated via the `0-globals.auto.tfvars.json` file linked or copied above
- variables which refer to resources managed by previous stages, which are pre-populated here via the `*.auto.tfvars.json` files linked or copied above
- and finally variables that optionally control this stage's behaviour and customizations, and can be set in a custom `terraform.tfvars` file

The last set is explained in the [Customizations](#customizations) section below, and the full list can be found in the [Variables](#variables) table at the bottom of this document.

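Because the first two variable sets depend entirely on the provider and `*.auto.tfvars.json` files linked or copied above, it can help to confirm they are all present before running Terraform. The following is a minimal sketch of such a check and is not part of FAST: the `check_stage_files` function name is hypothetical, and the file list simply mirrors the `stage-links.sh` output shown earlier, so adjust it if your setup produces different files.

```bash
# Hypothetical helper (not part of FAST): lists the files this stage expects
# that are missing from a folder, and returns non-zero if any are missing.
# The body runs in a subshell, so its variables do not leak into the caller.
check_stage_files() (
  dir="${1:-.}"
  missing=0
  for f in \
    3-gke-multitenant-providers.tf \
    0-globals.auto.tfvars.json \
    0-bootstrap.auto.tfvars.json \
    1-resman.auto.tfvars.json \
    2-networking.auto.tfvars.json \
    2-security.auto.tfvars.json
  do
    # -e follows symlinks, so a dangling link also counts as missing
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
)
```

For example, `check_stage_files . && terraform init` only proceeds when every expected file is present in the current folder.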
### Running the stage

Once provider and variable values are in place and the correct user is configured, the stage can be run:

```bash
terraform init
terraform apply
```

### Running in isolation

It's of course possible to run this stage in isolation, by making sure the architectural prerequisites are satisfied (e.g., networking), and that the Service Account running the stage is granted the roles/permissions below:

- on the organization or network folder level
  - `roles/xpnAdmin` or a custom role which includes the following permissions
    - `compute.organizations.enableXpnResource`
    - `compute.organizations.disableXpnResource`
    - `compute.subnetworks.setIamPolicy`
- on each folder where projects are created
  - `roles/logging.admin`
  - `roles/owner`
  - `roles/resourcemanager.folderAdmin`
  - `roles/resourcemanager.projectCreator`
  - `roles/xpnAdmin`
- on the host project for the Shared VPC
  - `roles/browser`
  - `roles/compute.viewer`
- on the organization or billing account
  - `roles/billing.admin`

The VPC host project, VPC and subnets should already exist.

## Customizations

### Cluster and node pools

This stage is designed with multi-tenancy in mind, and the expectation is that GKE clusters will mostly share a common set of defaults.
Variables are designed to support this approach for both clusters and node pools:

- the `cluster_default` variable allows defining common defaults for all clusters
- the `clusters` variable is used to declare the actual GKE clusters and allows overriding defaults on a per-cluster basis
- the `nodepool_defaults` variable allows defining common defaults for all node pools
- the `nodepools` variable is used to declare cluster node pools and allows overriding defaults on a per-cluster basis

There are two additional variables that influence cluster configuration: `authenticator_security_group` to configure [Google Groups for RBAC](https://cloud.google.com/kubernetes-engine/docs/how-to/google-groups-rbac), and `dns_domain` to configure [Cloud DNS for GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/cloud-dns).

### Fleet management

Fleet management is entirely optional, and uses four separate variables:

- `fleet_features`: specifies the [GKE fleet](https://cloud.google.com/anthos/fleet-management/docs/fleet-concepts#fleet-enabled-components) features you want to activate
- `fleet_configmanagement_templates`: defines configuration templates for specific sets of features ([Config Management](https://cloud.google.com/anthos-config-management/docs/how-to/install-anthos-config-management) currently)
- `fleet_configmanagement_clusters`: specifies which clusters are managed by fleet features, and the optional Config Management template for each cluster
- `fleet_workload_identity`: enables optional centralized [Workload Identity](https://cloud.google.com/anthos/fleet-management/docs/use-workload-identity)

Leave all these variables unset (or set to `null`) to disable fleet management.

## Files

| name | description | modules | resources |
|---|---|---|---|
| [main.tf](./main.tf) | GKE multitenant for development environment. | multitenant-fleet | |
| [outputs.tf](./outputs.tf) | Output variables. | | google_storage_bucket_object · local_file |
| [variables.tf](./variables.tf) | Module variables. | | |

## Variables

| name | description | type | required | default | producer |
|---|---|:---:|:---:|:---:|:---:|
| [automation](variables.tf#L21) | Automation resources created by the bootstrap stage. | object({…}) | ✓ | | 0-bootstrap |
| [billing_account](variables.tf#L29) | Billing account id. If billing account is not part of the same org set `is_org_level` to false. | object({…}) | ✓ | | 0-bootstrap |
| [folder_ids](variables.tf#L174) | Folders to be used for the networking resources in folders/nnnnnnnnnnn format. If null, folder will be created. | object({…}) | ✓ | | 1-resman |
| [host_project_ids](variables.tf#L189) | Host project for the shared VPC. | object({…}) | ✓ | | 2-networking |
| [prefix](variables.tf#L242) | Prefix used for resources that need unique names. | string | ✓ | | |
| [vpc_self_links](variables.tf#L258) | Self link for the shared VPC. | object({…}) | ✓ | | 2-networking |
| [clusters](variables.tf#L42) | Clusters configuration. Refer to the gke-cluster-standard module for type details. | map(object({…})) | | {} | |
| [fleet_configmanagement_clusters](variables.tf#L111) | Config management features enabled on specific sets of member clusters, in config name => [cluster name] format. | map(list(string)) | | {} | |
| [fleet_configmanagement_templates](variables.tf#L119) | Sets of config management configurations that can be applied to member clusters, in config name => {options} format. | map(object({…})) | | {} | |
| [fleet_features](variables.tf#L154) | Enable and configure fleet features. Set to null to disable GKE Hub if fleet workload identity is not used. | object({…}) | | null | |
| [fleet_workload_identity](variables.tf#L167) | Use Fleet Workload Identity for clusters. Enables GKE Hub if set to true. | bool | | false | |
| [group_iam](variables.tf#L182) | Project-level authoritative IAM bindings for groups in {GROUP_EMAIL => [ROLES]} format. Use group emails as keys, list of roles as values. | map(list(string)) | | {} | |
| [iam](variables.tf#L197) | Project-level authoritative IAM bindings for users and service accounts in {ROLE => [MEMBERS]} format. | map(list(string)) | | {} | |
| [labels](variables.tf#L204) | Project-level labels. | map(string) | | {} | |
| [nodepools](variables.tf#L210) | Nodepools configuration. Refer to the gke-nodepool module for type details. | map(map(object({…}))) | | {} | |
| [outputs_location](variables.tf#L236) | Path where providers, tfvars files, and lists for the following stages are written. Leave empty to disable. | string | | null | |
| [project_services](variables.tf#L251) | Additional project services to enable. | list(string) | | [] | |

## Outputs

| name | description | sensitive | consumers |
|---|---|:---:|---|
| [cluster_ids](outputs.tf#L57) | Cluster ids. | | |
| [clusters](outputs.tf#L62) | Cluster resources. | ✓ | |
| [project_id](outputs.tf#L68) | GKE project id. | | |