cloud-foundation-fabric/blueprints/gke/multitenant-fleet
Ludovico Magnocavallo 5453c585e0
FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052)
Co-authored-by: Julio Castillo <jccb@google.com>
2023-02-04 15:00:45 +01:00

GKE Multitenant Blueprint

This blueprint presents an opinionated architecture to handle multiple homogeneous GKE clusters. The general idea behind this blueprint is to deploy a single project hosting multiple clusters leveraging several useful GKE features.

The pattern used in this design is useful, for example, in cases where multiple clusters host or support the same workloads, such as in a multi-regional deployment. Furthermore, combined with Anthos Config Sync and proper RBAC, this architecture can be used to host multiple tenants (e.g. teams, applications) sharing the clusters.

This blueprint is used as part of the FAST GKE stage but it can also be used independently if desired.

GKE multitenant (architecture diagram)

The overall architecture is based on the following design decisions:

  • All clusters are hosted in a single project, attached to a Shared VPC host project.
  • GKE usage metering is enabled and exported to a BigQuery dataset created in the same project.
  • GKE Fleet support is optional, with configurable fleet features, Fleet Workload Identity, and Config Management templates.

Basic usage

The following example shows how to deploy two clusters, each with one node pool:

locals {
  cluster_defaults = {
    private_cluster_config = {
      enable_private_endpoint = true
      master_global_access    = true
    }
  }
  subnet_self_links = {
    ew1 = "projects/prj-host/regions/europe-west1/subnetworks/gke-0"
    ew3 = "projects/prj-host/regions/europe-west3/subnetworks/gke-0"
  }
}

module "gke-fleet" {
  source             = "./fabric/blueprints/gke/multitenant-fleet/"
  project_id         = var.project_id
  billing_account_id = var.billing_account_id
  folder_id          = var.folder_id
  prefix             = "myprefix"
  group_iam = {
    "gke-admin@example.com" = [
      "roles/container.admin"
    ]
  }
  iam = {
    "roles/container.clusterAdmin" = [
      "serviceAccount:cicd@my-cicd-project.iam.gserviceaccount.com"
    ]
  }
  clusters = {
    cluster-0 = {
      location               = "europe-west1"
      private_cluster_config = local.cluster_defaults.private_cluster_config
      vpc_config = {
        subnetwork             = local.subnet_self_links.ew1
        master_ipv4_cidr_block = "172.16.10.0/28"
      }
    }
    cluster-1 = {
      location               = "europe-west3"
      private_cluster_config = local.cluster_defaults.private_cluster_config
      vpc_config = {
        subnetwork             = local.subnet_self_links.ew3
        master_ipv4_cidr_block = "172.16.20.0/28"
      }
    }
  }
  nodepools = {
    cluster-0 = {
      nodepool-0 = {
        node_config = {
          disk_type    = "pd-balanced"
          machine_type = "n2-standard-4"
          spot         = true
        }
      }
    }
    cluster-1 = {
      nodepool-0 = {
        node_config = {
          disk_type    = "pd-balanced"
          machine_type = "n2-standard-4"
        }
      }
    }
  }
  vpc_config = {
    host_project_id = "my-host-project-id"
    vpc_self_link   = "projects/prj-host/global/networks/prod-0"
  }
}
# tftest modules=7 resources=27
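
The module's outputs (listed under Outputs below) can be surfaced by the calling configuration; a minimal sketch, assuming the `gke-fleet` module instance from the example above:

```hcl
# Sketch: re-export the blueprint's outputs from a parent configuration.
# Assumes the "gke-fleet" module instance defined in the example above.

output "gke_project_id" {
  description = "Project hosting the GKE clusters."
  value       = module.gke-fleet.project_id
}

output "gke_cluster_ids" {
  description = "Cluster ids, e.g. for use in provider configuration."
  value       = module.gke-fleet.cluster_ids
}
```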

GKE Fleet

This example deploys two clusters and configures several GKE Fleet features:

  • Enables multi-cluster ingress and sets the configuration cluster to cluster-0.
  • Enables multi-cluster services and assigns the required roles to its service accounts.
  • Creates a default Config Management template with binary authorization, Config Sync from a git repository, hierarchy controller, and policy controller enabled.
  • Configures the two clusters to use the default Config Management template.

locals {
  subnet_self_links = {
    ew1 = "projects/prj-host/regions/europe-west1/subnetworks/gke-0"
    ew3 = "projects/prj-host/regions/europe-west3/subnetworks/gke-0"
  }
}

module "gke" {
  source             = "./fabric/blueprints/gke/multitenant-fleet/"
  project_id         = var.project_id
  billing_account_id = var.billing_account_id
  folder_id          = var.folder_id
  prefix             = "myprefix"
  clusters = {
    cluster-0 = {
      location = "europe-west1"
      vpc_config = {
        subnetwork = local.subnet_self_links.ew1
      }
    }
    cluster-1 = {
      location = "europe-west3"
      vpc_config = {
        subnetwork = local.subnet_self_links.ew3
      }
    }
  }
  nodepools = {
    cluster-0 = {
      nodepool-0 = {
        node_config = {
          disk_type    = "pd-balanced"
          machine_type = "n2-standard-4"
          spot         = true
        }
      }
    }
    cluster-1 = {
      nodepool-0 = {
        node_config = {
          disk_type    = "pd-balanced"
          machine_type = "n2-standard-4"
        }
      }
    }
  }
  fleet_features = {
    appdevexperience             = false
    configmanagement             = true
    identityservice              = true
    multiclusteringress          = "cluster-0"
    multiclusterservicediscovery = true
    servicemesh                  = true
  }
  fleet_workload_identity = true
  fleet_configmanagement_templates = {
    default = {
      binauthz = true
      config_sync = {
        git = {
          gcp_service_account_email = null
          https_proxy               = null
          policy_dir                = "configsync"
          secret_type               = "none"
          source_format             = "hierarchy"
          sync_branch               = "main"
          sync_repo                 = "https://github.com/myorg/myrepo"
          sync_rev                  = null
          sync_wait_secs            = null
        }
        prevent_drift = true
        source_format = "hierarchy"
      }
      hierarchy_controller = {
        enable_hierarchical_resource_quota = true
        enable_pod_tree_labels             = true
      }
      policy_controller = {
        audit_interval_seconds     = 30
        exemptable_namespaces      = ["kube-system"]
        log_denies_enabled         = true
        referential_rules_enabled  = true
        template_library_installed = true
      }
      version = "1.10.2"
    }
  }
  fleet_configmanagement_clusters = {
    default = ["cluster-0", "cluster-1"]
  }
  vpc_config = {
    host_project_id = "my-host-project-id"
    vpc_self_link   = "projects/prj-host/global/networks/prod-0"
  }
}

# tftest modules=8 resources=38

Files

| name | description | modules |
|---|---|---|
| gke-clusters.tf | GKE clusters. | gke-cluster |
| gke-hub.tf | GKE hub configuration. | gke-hub |
| gke-nodepools.tf | GKE nodepools. | gke-nodepool |
| main.tf | Project and usage dataset. | bigquery-dataset · project |
| outputs.tf | Output variables. | |
| variables.tf | Module variables. | |

Variables

| name | description | type | required | default |
|---|---|---|:---:|:---:|
| billing_account_id | Billing account id. | string | ✓ | |
| folder_id | Folder used for the GKE project in folders/nnnnnnnnnnn format. | string | ✓ | |
| prefix | Prefix used for resource names. | string | ✓ | |
| project_id | ID of the project that will contain all the clusters. | string | ✓ | |
| vpc_config | Shared VPC project and VPC details. | object({…}) | ✓ | |
| clusters | Clusters configuration. Refer to the gke-cluster module for type details. | map(object({…})) | | {} |
| fleet_configmanagement_clusters | Config management features enabled on specific sets of member clusters, in config name => [cluster name] format. | map(list(string)) | | {} |
| fleet_configmanagement_templates | Sets of config management configurations that can be applied to member clusters, in config name => {options} format. | map(object({…})) | | {} |
| fleet_features | Enable and configure fleet features. Set to null to disable GKE Hub if fleet workload identity is not used. | object({…}) | | null |
| fleet_workload_identity | Use Fleet Workload Identity for clusters. Enables GKE Hub if set to true. | bool | | false |
| group_iam | Project-level IAM bindings for groups. Use group emails as keys, list of roles as values. | map(list(string)) | | {} |
| iam | Project-level authoritative IAM bindings for users and service accounts in {ROLE => [MEMBERS]} format. | map(list(string)) | | {} |
| labels | Project-level labels. | map(string) | | {} |
| nodepools | Nodepools configuration. Refer to the gke-nodepool module for type details. | map(map(object({…}))) | | {} |
| project_services | Additional project services to enable. | list(string) | | [] |
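
As the variable descriptions note, GKE Hub can be enabled either through fleet_features or by setting fleet_workload_identity alone; a minimal sketch of the latter, reusing the placeholder values from the examples above:

```hcl
# Sketch: enable GKE Hub via Fleet Workload Identity only, leaving
# fleet_features at its null default. All values are placeholders.
module "gke" {
  source                  = "./fabric/blueprints/gke/multitenant-fleet/"
  project_id              = var.project_id
  billing_account_id      = var.billing_account_id
  folder_id               = var.folder_id
  prefix                  = "myprefix"
  fleet_workload_identity = true
  vpc_config = {
    host_project_id = "my-host-project-id"
    vpc_self_link   = "projects/prj-host/global/networks/prod-0"
  }
}
```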

Outputs

| name | description | sensitive |
|---|---|:---:|
| cluster_ids | Cluster ids. | |
| clusters | Cluster resources. | |
| project_id | GKE project id. | |