cloud-foundation-fabric/fast/stages/3-gke-multitenant/dev
Ludovico Magnocavallo 5453c585e0
FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052)
Co-authored-by: Julio Castillo <jccb@google.com>
2023-02-04 15:00:45 +01:00
README.md
diagram.png
main.tf
outputs.tf
variables.tf

README.md

GKE Multitenant

This stage allows creation and management of a fleet of GKE multitenant clusters, optionally leveraging GKE Hub to configure additional features. It's designed to be replicated once for every homogeneous set of clusters, either per environment or with more granularity as needed (e.g. teams or sets of teams sharing similar requirements).

The following diagram illustrates the high-level design of created resources, which can be adapted to specific requirements via variables:

[Diagram: GKE multitenant (diagram.png)]

Design overview and choices

The detailed architecture of the underlying resources is explained in the documentation of the GKE multitenant module.

This stage creates a project containing as many clusters and node pools as requested by the user through the variables explained below. The GKE clusters are created with the following setup:

How to run this stage

This stage is meant to be executed after the "foundational stages" (i.e., stages 0-bootstrap, 1-resman, 2-networking (either VPN or NVA) and 2-security) have been run.

It's of course possible to run this stage in isolation, by making sure the architectural prerequisites are satisfied (e.g., networking), and that the Service Account running the stage is granted the roles/permissions below:

  • on the organization or network folder level
    • roles/compute.xpnAdmin or a custom role which includes the following permissions
      • compute.organizations.enableXpnResource
      • compute.organizations.disableXpnResource
      • compute.subnetworks.setIamPolicy
  • on each folder where projects are created
    • roles/logging.admin
    • roles/owner
    • roles/resourcemanager.folderAdmin
    • roles/resourcemanager.projectCreator
    • roles/compute.xpnAdmin
  • on the host project for the Shared VPC
    • roles/browser
    • roles/compute.viewer
  • on the organization or billing account
    • roles/billing.admin

The VPC host project, VPC and subnets should already exist.
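
When running the stage in isolation, the folder-level roles listed above could be granted with a small Terraform configuration applied separately from this stage. The following is only a sketch: the service account email and folder id are hypothetical placeholders, and it uses roles/compute.xpnAdmin, the identifier of the Shared VPC Admin role. Organization, host project and billing account bindings would need analogous resources.

```hcl
# Hypothetical inputs: replace with your own folder id and service account.
locals {
  stage_sa  = "serviceAccount:gke-stage@my-automation-project.iam.gserviceaccount.com"
  folder_id = "folders/1234567890"

  # Roles required on each folder where projects are created.
  folder_roles = [
    "roles/logging.admin",
    "roles/owner",
    "roles/resourcemanager.folderAdmin",
    "roles/resourcemanager.projectCreator",
    "roles/compute.xpnAdmin",
  ]
}

# One additive (non-authoritative) binding per role.
resource "google_folder_iam_member" "stage_sa" {
  for_each = toset(local.folder_roles)
  folder   = local.folder_id
  role     = each.value
  member   = local.stage_sa
}
```

Additive bindings are used here so that existing IAM policies on the folder are left untouched.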

Providers configuration

If you're running this on top of FAST, you should run the following commands to create the providers file, and populate the required variables from the previous stage.

# Variable `outputs_location` is set to `~/fast-config` in stage 1-resman
cd fast/stages/3-gke-multitenant/dev
ln -s ~/fast-config/providers/03-gke-dev-providers.tf .

Variable configuration

There are two broad sets of variables you will need to fill in:

  • variables shared by other stages (organization id, billing account id, etc.), or derived from a resource managed by a different stage (folder id, automation project id, etc.)
  • variables specific to resources managed by this stage

Variables passed in from other stages

To avoid the tedious job of filling in the first group of variables with values derived from other stages' outputs, the same mechanism used above for the provider configuration can be used to leverage pre-configured .tfvars files.

If you configured a valid path for outputs_location in the previous stages, simply link the relevant *.auto.tfvars.json files from the tfvars folder under the path you specified, where the * is the name of the stage that produced each file. This stage needs three .tfvars files:

# Variable `outputs_location` is set to `~/fast-config`
ln -s ~/fast-config/tfvars/00-bootstrap.auto.tfvars.json .
ln -s ~/fast-config/tfvars/01-resman.auto.tfvars.json .
ln -s ~/fast-config/tfvars/02-networking.auto.tfvars.json .

If you're not using FAST, refer to the Variables table at the bottom of this document for a full list of variables, their origin (e.g., a stage or specific to this one), and descriptions explaining their meaning.
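
The second set of variables, those specific to this stage, can go in a regular terraform.tfvars file next to the linked files. The sketch below only uses variables whose types are fully spelled out in the Variables table; every value (group, user, label, service) is a hypothetical placeholder.

```hcl
# terraform.tfvars: illustrative values only

# Project-level labels.
labels = {
  environment = "dev"
}

# Authoritative bindings for groups, in {GROUP_EMAIL => [ROLES]} format.
group_iam = {
  "gke-admins@example.com" = ["roles/container.admin"]
}

# Authoritative bindings for users and service accounts, in {ROLE => [MEMBERS]} format.
iam = {
  "roles/container.clusterViewer" = ["user:jane@example.com"]
}

# Additional project services to enable.
project_services = ["monitoring.googleapis.com"]

# Where output files for the following stages are written.
outputs_location = "~/fast-config"
```

All of these variables have defaults, so any of them can be left out.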

Cluster and node pools

This stage is designed with multi-tenancy in mind, and the expectation is that GKE clusters will mostly share a common set of defaults. Variables are designed to support this approach for both clusters and node pools:

  • the cluster_default variable allows defining common defaults for all clusters
  • the clusters variable is used to declare the actual GKE clusters and allows overriding defaults on a per-cluster basis
  • the nodepool_defaults variable allows defining common defaults for all node pools
  • the nodepools variable is used to declare cluster node pools and allows overriding defaults on a per-cluster basis

Two additional variables influence cluster configuration: authenticator_security_group configures Google Groups for RBAC, and dns_domain configures Cloud DNS for GKE.

Fleet management

Fleet management is entirely optional, and uses four separate variables:

  • fleet_features: specifies the GKE fleet features you want to activate
  • fleet_configmanagement_templates: defines configuration templates for specific sets of features (Config Management currently)
  • fleet_configmanagement_clusters: specifies which clusters are managed by fleet features, and the optional Config Management template for each cluster
  • fleet_workload_identity: enables optional centralized Workload Identity

Leave all these variables unset (or set to null) to disable fleet management.
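
As a rough illustration of how the fleet variables fit together, the sketch below turns on fleet Workload Identity and maps one cluster to a Config Management template. The names "default" and "cluster-0" are hypothetical. The fleet_features and fleet_configmanagement_templates objects are left out because their attributes are not listed in this document, so this fragment is not complete on its own.

```hcl
# terraform.tfvars: illustrative values only

# Use Fleet Workload Identity for clusters; this also enables GKE Hub.
fleet_workload_identity = true

# Config name => [cluster names].
# "default" is a hypothetical key that would also need to be defined in
# fleet_configmanagement_templates; "cluster-0" is a hypothetical key of
# the clusters variable.
fleet_configmanagement_clusters = {
  default = ["cluster-0"]
}
```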

Running Terraform

Once the provider and variable configuration is complete, you can apply this stage:

terraform init
terraform apply

Files

| name | description | modules | resources |
|---|---|---|---|
| main.tf | GKE multitenant for development environment. | multitenant-fleet | |
| outputs.tf | Output variables. | | google_storage_bucket_object · local_file |
| variables.tf | Module variables. | | |

Variables

| name | description | type | required | default | producer |
|---|---|---|---|---|---|
| automation | Automation resources created by the bootstrap stage. | object({…}) | ✓ | | 0-bootstrap |
| billing_account | Billing account id. If billing account is not part of the same org set is_org_level to false. | object({…}) | ✓ | | 0-bootstrap |
| folder_ids | Folders to be used for the networking resources in folders/nnnnnnnnnnn format. If null, folder will be created. | object({…}) | ✓ | | 1-resman |
| host_project_ids | Host project for the shared VPC. | object({…}) | ✓ | | 2-networking |
| prefix | Prefix used for resources that need unique names. | string | ✓ | | |
| vpc_self_links | Self link for the shared VPC. | object({…}) | ✓ | | 2-networking |
| clusters | Clusters configuration. Refer to the gke-cluster module for type details. | map(object({…})) | | {} | |
| fleet_configmanagement_clusters | Config management features enabled on specific sets of member clusters, in config name => [cluster name] format. | map(list(string)) | | {} | |
| fleet_configmanagement_templates | Sets of config management configurations that can be applied to member clusters, in config name => {options} format. | map(object({…})) | | {} | |
| fleet_features | Enable and configure fleet features. Set to null to disable GKE Hub if fleet workload identity is not used. | object({…}) | | null | |
| fleet_workload_identity | Use Fleet Workload Identity for clusters. Enables GKE Hub if set to true. | bool | | false | |
| group_iam | Project-level authoritative IAM bindings for groups in {GROUP_EMAIL => [ROLES]} format. Use group emails as keys, list of roles as values. | map(list(string)) | | {} | |
| iam | Project-level authoritative IAM bindings for users and service accounts in {ROLE => [MEMBERS]} format. | map(list(string)) | | {} | |
| labels | Project-level labels. | map(string) | | {} | |
| nodepools | Nodepools configuration. Refer to the gke-nodepool module for type details. | map(map(object({…}))) | | {} | |
| outputs_location | Path where providers, tfvars files, and lists for the following stages are written. Leave empty to disable. | string | | null | |
| project_services | Additional project services to enable. | list(string) | | [] | |

Outputs

| name | description | sensitive | consumers |
|---|---|---|---|
| cluster_ids | Cluster ids. | | |
| clusters | Cluster resources. | | |
| project_id | GKE project id. | | |