cloud-foundation-fabric/fast
Ludovico Magnocavallo 5453c585e0
FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052)
* rename stages

* remove support for external org billing, rename output files

* resman: make groups optional, align on new billing account variable

* bootstrap: multitenant outputs

* tenant bootstrap stage, untested

* fix folder name

* fix stage 0 output names

* optional creation for tag keys in organization module

* single tenant bootstrap minus tag

* rename output files, add tenant tag key

* fix organization module tag values output

* test skipping creation for tags in organization module

* single tenant bootstrap plan working

* multitenant bootstrap

* tfdoc

* fix check links error messages

* fix links

* tfdoc

* fix links

* rename fast tests, fix bootstrap tests

* multitenant stages have their own folder, simplify stage numbering

* stage renumbering

* wip

* rename tests

* exclude fast providers in fixture

* stage 0 tests

* stage 1 tests

* network stages tests

* stage tests

* tfdoc

* fix links

* tfdoc

* multitenant tests

* remove local files

* stage links command

* fix links script, TODO

* wip

* wip single tenant bootstrap

* working tenant bootstrap

* update gitignore

* remove local files

* tfdoc

* remove local files

* allow tests for tenant bootstrap stage

* tenant bootstrap proxies stage 1 tfvars

* stage 2 and 3 service accounts and IAM in tenant bootstrap

* wip

* wip

* wip

* drop multitenant bootstrap

* tfdoc

* add missing stage 2 SAs, fix org-level IAM condition

* wip

* wip

* optional tag value creation in organization module

* stage 1 working

* linting

* linting

* READMEs

* wip

* Make stage-links script work in old macos bash

* stage links command help

* fix output file names

* diagrams

* fix svg

* stage 0 skeleton and diagram

* test svg

* test svg

* test diagram

* diagram

* readme

* fix stage links script

* stage 0 readme

* README changes

* stage readmes

* fix outputs order

* fix link

* fix tests

* stage 1 test

* skip stage example

* boilerplate

* fix tftest skip

* default bootstrap stage log sinks to log buckets

* add logging to tenant bootstrap

* move iam variables out of tenant config

* fix cicd, reintroduce missing variable

* use optional in stage 1 cicd variable

* rename extras stage

* rename and move identity providers local, use optional for cicd variable

* tfdoc

* add support for wif pool and providers, ci/cd

* tfdoc

* fix links

* better handling of modules repository

* add missing role on logging project

* fix cicd pools in locals, test cicd

* fix workflow extension

* fix module source replacement

* allow tenant bootstrap cicd sa to impersonate resman sa

* tenant workflow templates fix for no providers file

* fix output files, push github workflow template to new repository

* remove try from outpout files

* align stage 1 cicd internals to stage 0

* tfdoc

* tests

* fix tests

* tests

* improve variable descriptions

* use optional in fast features

* actually create tenant log sinks, and allow the resman sa to do it

* test

* tests

* aaaand tests again

* fast features tenant override

* fast features tenant override

* fix wording

* add missing comment

* configure pf service accounts

* add missing comment

* tfdoc

* tests

* IAM docs

* update copyright

---------

Co-authored-by: Julio Castillo <jccb@google.com>
2023-02-04 15:00:45 +01:00
..
assets FAST: fixes to GitHub workflow and 02/net outputs (#976) 2022-11-15 08:48:32 +01:00
extras FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052) 2023-02-04 15:00:45 +01:00
stages FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052) 2023-02-04 15:00:45 +01:00
stages-multitenant FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052) 2023-02-04 15:00:45 +01:00
README.md FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052) 2023-02-04 15:00:45 +01:00
stage-links.sh FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052) 2023-02-04 15:00:45 +01:00
stages.png FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052) 2023-02-04 15:00:45 +01:00
stages.svg FAST multitenant bootstrap and resource management, rename org-level FAST stages (#1052) 2023-02-04 15:00:45 +01:00

README.md

Fabric FAST

Setting up a production-ready GCP organization is often a time-consuming process. Fabric FAST aims to speed up this process via two complementary goals. On the one hand, FAST provides a design of a GCP organization that includes the typical elements required by enterprise customers. Secondly, we provide a reference implementation of the FAST design using Terraform.

Note that while our implementation is necessarily influenced (and constrained) by the way Terraform works, the design we put forward only refers to GCP constructs and features. In other words, while we use Terraform for our reference implementation, in theory, the FAST design can be implemented using any other tool (e.g., Pulumi, bash scripts, or even calling the relevant APIs directly).

Fabric FAST comes from engineers in Google Cloud's Professional Services Organization, with a combined experience of decades solving the typical technical problems faced by GCP customers. While every GCP user has specific requirements, many common issues arise repeatedly. Solving those issues correctly from the beginning is key to a robust and scalable GCP setup. It's those common issues and their solutions that Fabric FAST aims to collect and present coherently.

Fabric FAST was initially conceived to help enterprises quickly set up a GCP organization following battle-tested and widely-used patterns. Despite its origin in enterprise environments, FAST includes many customization points making it an ideal blueprint for organizations of all sizes, ranging from startups to the largest companies.

Guiding principles

Contracts and stages

FAST uses the concept of stages, which individually perform precise tasks but taken together build a functional, ready-to-use GCP organization. More importantly, stages are modeled around the security boundaries that typically appear in mature organizations. This arrangement allows delegating ownership of each stage to the team responsible for the types of resources it manages. For example, as its name suggests, the networking stage sets up all the networking elements and is usually the responsibility of a dedicated networking team within the organization.

From the perspective of FAST's overall design, stages also work as contacts or interfaces, defining a set of pre-requisites and inputs required to perform their designed task and generating outputs needed by other stages lower in the chain. The diagram below shows the relationships between stages.

Stages diagram

Please refer to the stages section for further details on each stage. For details on tenant-level stages which introduce a deeper level of autonomy via nested FAST setups rooted in a top-level folder, refer to the multitenant stages section below.

Security-first design

Security was, from the beginning, one of the most critical elements in the design of Fabric FAST. Many of FAST's design decisions aim to build the foundations of a secure organization. In fact, the first two stages deal mainly with the organization-wide security setup.

FAST also aims to minimize the number of permissions granted to principals according to the security-first approach previously mentioned. We achieve this through the meticulous use of groups, service accounts, custom roles, and Cloud IAM Conditions, among other things.

Extensive use of factories

A resource factory consumes a simple representation of a resource (e.g., in YAML) and deploys it (e.g., using Terraform). Used correctly, factories can help decrease the management overhead of large-scale infrastructure deployments. See "Resource Factories: A descriptive approach to Terraform" for more details and the rationale behind factories.

FAST uses YAML-based factories to deploy subnets and firewall rules and, as its name suggests, in the project factory stage.

CI/CD

One of our objectives with FAST is to provide a lightweight reference design for the IaC repositories, and a built-in implementation for running our code in automated pipelines. Our CI/CD approach leverages Workload Identity Federation, and provides sample workflow configurations for several major providers. Refer to the CI/CD section in the bootstrap stage for more details. We also provide separate optional small stages to help you configure your CI/CD provider.

Multitenant organizations

FAST has built-in support for complex multitenant organizations, where each tenant has complete control over a separate hierarchy rooted in a top-level folder. This approach is particularly suited for large enterprises or governments, where country-level subsidiaries or government agencies have a wide degree of autonomy within a shared GCP organization managed by a central entity.

FAST implements multitenancy via dedicated stages for tenant-level bootstrap and resource management, which configure separate hierarchies within the organization rooted in top-level folders, so that subsequent FAST stages (networking, security, data, etc.) can be used directly for each tenant. The diagram below shows the relationships between organization-level and tenant-level stages.

Stages diagram

Implementation

There are many decisions and tasks required to convert an empty GCP organization to one that can host production environments safely. Arguably, FAST could expose those decisions as configuration options to allow for different outcomes. However, supporting all the possible combinations is almost impossible and leads to code which is hard to maintain efficiently.

Instead, FAST aims to leverage different reference architectures as “pluggable modules”, and then have a small set of variables covering only the essential options of each stage. While we could expose every option of the underlying resources as stage-level variables, we prefer to provide the basic implementation and encourage users to modify the codebase if additional (or different) behavior is needed.

Since we expect users to customize FAST to their specific needs, we strive to make its code easy to understand and modify. Root-level modules (i.e., stages) should be low in complexity, which among other things, means:

  • Code should avoid magic and be as explicit as possible.
  • We hide advanced features and complexity behind modules.
  • We prefer as little indirection as possible.
  • We favor flat over nested.

We also recognize that FAST users don't need all of its features. Therefore, you don't need to use our project factory or our GKE implementation if you don't want to. Instead, remove those stages or pieces of code and keep what suits you.

Those familiar with Python will note that FAST follows many of the maxims in the Zen of Python.

Roadmap

Besides the features already described, FAST also includes:

  • Stage to deploy environment-specific multitenant GKE clusters following Google's best practices
  • Stage to deploy a fully featured data platform
  • Reference implementation to use FAST in CI/CD pipelines
  • Static policy enforcement (planned)