cloud-foundation-fabric/CONTRIBUTING.md

60 KiB

Contributing

Contributors are the engine that keeps Fabric alive so if you were or are planning to be active in this repo, a huge thanks from all of us for dedicating your time!!! If you have free time and are looking for suggestions on what to work on, our issue tracker generally has a few pending feature requests: you are welcome to send a PR for any of them.

Table of Contents

I just found a bug / have a feature request

Feel free to open a new issue if you find something that does not work, need clarifications on usage (a good incentive for us to improve docs!), or have a feature request.

If you feel like tackling it directly via a PR check out the quick developer workflow below, we always welcome new contributors!

Quick developer workflow

For small or first time issues the simplest way is to fork our repo, but if you are a regular contributor or are developing a large addition, ask us to be added directly to the repo so you can work on local branches, it makes life easier for both of us!

Fork or clone and go through the usual edit/add/commit cycle until your code is ready.

git checkout master
git pull
git checkout -b username/my-feature
git add -A
git commit -m "changed ham so that spam and eggs"

Once you are satisfied with your changes, make sure Terraform linting is ok. If you changed Python code you need to conform to our standard linting, see the last section for details on how to configure it.

terraform fmt -recursive

If you changed variables or outputs you need to regenerate the relevant tables in the documentation via our tfdoc tool. For help installing Python requirements and setting up virtualenv see the last section.

# point tfdoc to the folder containing your changed variables and outputs
./tools/tfdoc.py modules/my-changed-module

If the folder contains files which won't be pushed to the repository, for example provider files used in FAST stages, you need to change the command above to specifically exclude them from tfdoc generated output.

# exclude a local provider file from the generated documentation
./tools/tfdoc.py -x 00-bootstrap-providers.tf fast/stages/00-bootstrap

Run tests to make sure your changes work and you didn't accidentally break their consumers. Again, if you need help setting up the Python virtualenv and requirements or want to run specific test subsets see the last section.

pytest tests

Keep in mind we also test documentation examples so even if your PR only changes README files, you need to run a subset of tests.

# use if you only changed README examples, ignore if you ran all tests
pytest tests/examples

Once everything looks good, add/commit any pending changes then push and open a PR on GitHub. We typically enforce a set of design and style conventions, so please make sure you have familiarized yourself with the following sections and implemented them in your code, to avoid lengthy review cycles.

HINT: if you work on high-latency or low-bandwidth network use TF_PLUGIN_CACHE_DIR environment variable to dramatically speed up the tests, for example:

TF_PLUGIN_CACHE_DIR=/tmp/tfcache pytest tests

Or just add into your terraformrc:

plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"

Developer's handbook

Over the years we have assembled a specific set of design principles and style conventions that allow for better readability and make understanding and changing code more predictable.

We expect your code to conform to those principles in both design and style, so that it integrates well with the rest of Fabric/FAST without having to go through long and painful PR cycles before it can be merged.

The sections below describe our design approach and style conventions, with specific mentions of FAST stages where their larger scope requires additional rules.

The Zen of Fabric

While our approach to Terraform is constantly evolving as we meet new requirements or language features are released, there's a small set of core principles which influences all our code, and that you are expected to make yours before sending a PR.

Borrowing the format from the Zen of Python here is our fundamental design philosophy:

  • always design for composition as our objective is to support whole infrastructures
  • encapsulate logical entities that match single functional units in modules or stages to improve readability and composition (don't design by product or feature)
  • adopt common interfaces across modules and design small variable spaces to decrease cognitive overload
  • write flat and concise code which is easy to clone, evolve and troubleshoot independently
  • don't aim at covering all use cases but make default ones simple and complex ones possible, to support rapid prototyping and specific production requirements
  • when in doubt always prefer code readability for simplified maintenance and to achieve IaC as documentation
  • don't be too opinionated in resource configurations as this makes it harder for users to implement their exact requirements
  • avoid side effects and never rely on external tools to eliminate friction and reduce fragility

The following sections describe how these principles are applied in practice, with actual code examples from modules and FAST stages.

Design principles in action

This section illustrates how our design principles translate into actual code. We consider this a living document that can be updated at any time.

Design by logical entity instead of product/feature

“The most fundamental problem in computer science is problem decomposition: how to take a complex problem and divide it up into pieces that can be solved independently.”

— John Ousterhout in "A Philosophy of Software Design"

This is probably our oldest and most important design principle. When designing a module or a FAST stage we look at its domain from a functional point of view: what is the subset of resources (or modules for FAST) that fully describes one entity and allows encapsulating its full configuration?

It's a radically different approach from designing by product or feature, where boundaries are drawn around a single GCP functionality.

Our modules -- and in a much broader sense our FAST stages -- are all designed to encapsulate a set of functionally related resources and their configurations. This achieves two main goals: to dramatically improve readability by using a single block of code -- a module declaration -- for a logical component; and to allow consumers to rely on outputs without having to worry about the dependency chain, as all related resources and configurations are managed internally in the module or stage.

Taking IAM as an example, we do not offer a single module to centrally manage role bindings (the product/feature based approach) but implement it instead in each module (the logical component approach) since:

  • users understand IAM as an integral part of each resource, having bindings in the same context improves readability and speeds up changes
  • resources are not fully usable before their relevant IAM bindings have been applied, encapsulating those allows referencing fully configured resources from the outside
  • managing resources and their bindings in a single module makes code more portable with fewer dependencies

The most extensive examples of this approach are our resource management modules. For instance, the project module encapsulates resources for project, project services, logging sinks, project-level IAM bindings, Shared VPC enablement and attachment, metrics scope, budget alerts, organization policies, and several other functionalities in a single place.

A typical project module code block is easy to read as it centralizes all the information in one place, and allows consumers referencing it to trust that it will behave as a fully configured unit.

module "project" {
  source          = "./modules/project"
  parent          = "folders/1234567890"
  name            = "project-example"
  billing_account = local.billing_account
  services        = [
    "container.googleapis.com",
    "stackdriver.googleapis.com",
    "storage.googleapis.com",
  ]
  iam = {
    "roles/viewer" = ["user1:one@example.org"]
  }
  policy_boolean = {
    "constraints/compute.disableGuestAttributesAccess" = true
    "constraints/compute.skipDefaultNetworkCreation" = true
  }
  service_encryption_key_ids = {
    compute = [local.kms.europe-west1.compute]
    storage = [local.kms.europe.gcs]
  }
  shared_vpc_service_config = {
    attach       = true
    host_project = "project-host"
  }
}

Define and reuse stable interfaces

Our second oldest and most important principle also stems from the need to design for composition: whenever the same functionality is implemented in different modules, a stable variables interface should be designed and reused identically across them.

Adopting the same interface across different modules reduces cognitive overload on users, improves readability by turning configurations into repeated patterns, and makes code more robust by using the same implementation everywhere.

Taking IAM again as an example, every module that allows management of IAM bindings conforms to the same interface.

module "project" {
  source          = "./modules/project"
  name            = "project-example"
  iam = {
    "roles/viewer" = ["user1:one@example.org"]
  }
}

module "pubsub" {
  source     = "./modules/pubsub"
  project_id = module.project.project_id
  name       = "my-topic"
  iam = {
    "roles/pubsub.viewer"     = ["group:foo@example.com"]
    "roles/pubsub.subscriber" = ["user:user1@example.com"]
  }
}

We have several such interfaces defined for IAM, log sinks, organizational policies, etc. and always reuse them across modules.

Design interfaces to support actual usage

“When developing a module, look for opportunities to take a little bit of extra suffering upon yourself in order to reduce the suffering of your users.”

“Providing choice is good, but interfaces should be designed to make the common case as simple as possible”

— John Ousterhout in "A Philosophy of Software Design"

Variables should not simply map to the underlying resource attributes, but their interfaces should be designed to match common use cases to reduce friction and offer the highest possible degree of legibility.

This translates into different practical approaches:

  • multiple sets of interfaces that support the same feature which are then internally combined into the same resources (e.g. IAM groups below)
  • functional interfaces that don't map 1:1 to resources (e.g. project service identities below)
  • crossing the project boundary to configure resources which support key logical functionality (e.g shared VPC below)

The most pervasive example of the first practical approach above is IAM: given its importance we implement both a role-based interface and a group-based interface, which is less verbose and makes it easy to understand at a glance the roles assigned to a specific group. Both interfaces provide data that is then internally combined to drive the same IAM binding resource, and are available for authoritative and additive roles.

module "project" {
  source          = "./modules/project"
  name            = "project-example"
  group_iam = {
    "roles/editor" = [
      "group:foo@example.com"
    ]
  }
  iam = {
    "roles/editor" = [
      "serviceAccount:${module.project.service_accounts.cloud_services}"
    ]
  }
}

Another practical consequence of this design principle is supporting common use cases via interfaces that don't directly map to a resource. The example below shows support for enabling service identities access to KMS keys used for CMEK encryption in the project module: there's no specific resource for service identities, but it's such a frequent use case that we support them directly in the module.

module "project" {
  source          = "./modules/project"
  name            = "project-example"
  service_encryption_key_ids = {
    compute = [local.kms.europe-west1.compute]
    storage = [local.kms.europe.gcs]
  }
}

The principle also applies to output interfaces: it's often useful to assemble specific pieces of information in the module itself, as this improves overall code legibility. For example, we also support service identities in the project module's outputs (used here self-referentially).

module "project" {
  source          = "./modules/project"
  name            = "project-example"
  iam = {
    "roles/editor" = [
      "serviceAccount:${module.project.service_accounts.cloud_services}"
    ]
  }
}

And the last practical application of the principle which we show here is crossing project boundaries to support specific functionality, as in the two examples below that support Shared VPC in the project module.

Host-based management, typically used where absolute control over service project attachment is required:

module "project" {
  source          = "./modules/project"
  name            = "project-host"
  shared_vpc_host_config = {
    enabled          = true
    service_projects = [
      "prj-1", "prj-2"
    ]
  }
}

Service-based attachment, more common and typically used to delegate service project attachment at project creation, possibly from a project factory.

module "project" {
  source          = "./modules/project"
  name            = "prj-1"
  shared_vpc_service_config = {
    attach       = true
    host_project = "project-host"
  }
}

Design compact variable spaces

"The best modules are those whose interfaces are much simpler than their implementations"

— John Ousterhout in "A Philosophy of Software Design"

Designing variable spaces is one of the most complex aspects to get right, as they are the main entry point through which users consume modules, examples and FAST stages. We always strive to design small variable spaces by leveraging objects and implementing defaults so that users can quickly produce highly readable code.

One of many examples of this approach comes from disk support in the compute-vm module, where preset defaults allow quick VM management with very few lines of code, and optional variables allow progressively expanding the code when more control is needed.

This brings up an instance with a 10GB PD balanced boot disk using a Debian 11 image, and is generally a good default when a quick VM is needed for experimentation.

module "simple-vm-example" {
  source     = "./modules/compute-vm"
  project_id = var.project_id
  zone     = "europe-west1-b"
  name       = "test"
}

Changing boot disks defaults is of course possible, and adds some verbosity to the simple example above as you need to specify all of them.

module "simple-vm-example" {
  source     = "./modules/compute-vm"
  project_id = var.project_id
  zone       = "europe-west1-b"
  name       = "test"
  boot_disk  = {
    initialize_params = {
      image = "projects/debian-cloud/global/images/family/cos-97-lts"
      type = "pd-balanced"
      size = 10
    }
  }
}

Where this results in objects with too many attributes, we usually split attributes between required and optional by adding a second level, as in this example where VM attached_disks[].options contains less used attributes and can be set to null if not needed.

module "simple-vm-example" {
  source     = "./modules/compute-vm"
  project_id = var.project_id
  zone       = "europe-west1-b"
  name       = "test"
  attached_disks = [
    { name="data", size=10, source=null, source_type=null, options=null }
  ]
}

Whenever options are not passed like in the example above, we typically infer their values from a defaults variable which can be customized when using defaults across several items. In the following example instead of specifying regional PD options for both disks, we set their options to null and change the defaults used for all disks.

module "simple-vm-example" {
  source     = "./modules/compute-vm"
  project_id = var.project_id
  zone       = "europe-west1-b"
  name       = "test"
  attached_disk_defaults = {
    auto_delete = false
    mode = "READ_WRITE"
    replica_zone = "europe-west1-c"
    type = "pd-balanced"
  }
  attached_disks = [
    { name="data1", size=10, source=null, source_type=null, options=null },
    { name="data2", size=10, source=null, source_type=null, options=null }
  ]
}

Depend outputs on internal resources

We mentioned this principle when discussing encapsulation above but it's worth repeating it explicitly: set explicit dependencies in outputs so consumers will wait for full resource configuration.

As an example, users can safely reference the project module's project_id output from other modules, knowing that the dependency tree for project configurations (service activation, IAM, etc.) has already been defined inside the module itself. In this particular example the output is also interpolated instead of derived from the resource, so as to avoid issues when used in for_each keys.

output "project_id" {
  description = "Project id."
  value       = "${local.prefix}${var.name}"
  depends_on = [
    google_project.project,
    data.google_project.project,
    google_project_organization_policy.boolean,
    google_project_organization_policy.list,
    google_project_service.project_services,
    google_compute_shared_vpc_service_project.service_projects,
    google_project_iam_member.shared_vpc_host_robots,
    google_kms_crypto_key_iam_member.service_identity_cmek,
    google_project_service_identity.servicenetworking,
    google_project_iam_member.servicenetworking
  ]
}

Why we don't use random strings in names

This is more a convention than a design principle, but it's still important enough to be mentioned here: we never use random strings for resource naming and instead rely on an optional prefix variable which is implemented in most modules.

This matches actual use where naming is a key requirement that needs to integrate with company-wide CMDBs and naming schemes used on-prem or in other clouds, and usually is formed by concatenating progressively more specific tokens (something like myco-gcp-dev-net-hub-0).

Our approach supports easy implementation of company-specific policies and good readability, while still allowing a fairly compact way of ensuring unique resources have unique names.

# prefix = "foo-gcp-dev"

module "project" {
  source = "./modules/project"
  name   = "net-host-0"
  prefix = var.prefix
}

module "project" {
  source = "./modules/project"
  name   = "net-svc-0"
  prefix = var.prefix
}

FAST stage design

Due to their increased complexity and larger scope, FAST stages have some additional design considerations. Please refer to the FAST documentation for additional context.

Standalone usage

Each FAST stage should be designed so that it can optionally be used in isolation, with no dependencies on anything other than its variables.

Stage interfaces

“The best modules are those that provide powerful functionality yet have simple interfaces.”

— John Ousterhout in "A Philosophy of Software Design"

Stages are designed based on the concept of "contracts" or interfaces, which define what information is produced by one stage via outputs, which is then consumed by subsequent stages via variables.

Interfaces are compact in size (few variables) but broad in scope (variables typically leverage maps), so that consumers can declare in variable types only the bits of information they are interested in.

For example, resource management stages only export three map variables: folder_ids, service_accounts, tag_names. Those variables contain values for all the relevant resources created, but consumers are only interested in some of them and only need to declare those: networking stages for example only declare the folder and service account names they need.

variable "folder_ids" {
  # tfdoc:variable:source 01-resman
  description = "Folders to be used for the networking resources in folders/nnnnnnnnnnn format. If null, folder will be created."
  type = object({
    networking      = string
    networking-dev  = string
    networking-prod = string
  })
}

When creating a new stage or adding a feature to an existing one, always try to leverage the existing interfaces when some of the information you produce needs to cross the stage boundary, so as to minimize impact on producers and consumers logically dependent on your stage.

Output files

FAST stages rely on generated provider and tfvars files, as an easy convenience that allows automated setup and passing of contract values between stages.

Files are written to a special GCS bucket in order to be leveraged by both humans and CI/CD workflows, and optionally also written to local storage if needed.

When editing or adding a stage, you are expected to maintain the output files system so any new contact output is also present in files.

Style guide reference

Similarly to our design principles above, we evolved a set of style conventions that we try to standardize on to make code more legible and uniform. This reduces friction when coding, and ideally moves us closer to the goal of using IaC as live documentation.

Group logical resources or modules in separate files

Over time and as our codebase got larger, we switched away from the canonical main.tf/outputs.tf/variables.tf triplet of file names and now tend to prefer descriptive file names that refer to the logical entities (resources or modules) they contain.

We still use traditional names for variables and outputs, but tend to use main only for top-level locals or resources (e.g. the project resource in the project module), or for those resources that would end up in very small files.

While some older modules and examples are still using three files, we are slowly bringing all code up to date and any new development should use descriptive file names.

Our tfdoc tool has a way of generating a documentation table that maps file names with descriptions and the actual resources and modules they contain, refer to the last section for details on how to activate the mode in your code.

Enforce line lengths

We enforce line length for legibility, and adopted the 79 characters convention from other languages for simplicity.

This convention is relaxed for long resource attribute names (even though in some cases you might want to alias them to short local names), and for variable and output descriptions.

In most other cases you should break long lines, especially in for and for_each loops. Some of the conventions we adopted:

  • break after opening and before closing braces/parenthesis
  • break after a colon in for loops
  • add extra parenthesis and breaks to split long ternary operators
  • break right before the : and ? in long ternary operators

This is one of many examples.

locals {
  sink_bindings = {
    for type in ["bigquery", "pubsub", "logging", "storage"] :
    type => {
      for name, sink in var.logging_sinks :
      name => sink if sink.iam && sink.type == type
    }
  }
}

Use alphabetical order for outputs and variables

We enforce alphabetical ordering for outputs and variables and have a check that prevents PRs using the wrong order to be merged. We also tend to prefer alphabetical ordering in locals when there's no implied logical grouping (e.g. for successive data transformations).

Additionally, we adopt a convention similar to the one used in Python for private class members, so that locals only referenced from inside the same locals block are prefixed by _, as in the example shown in the next section.

locals {
  # compute the host project IAM bindings for this project's service identities
  _svpc_service_iam = flatten([
    for role, services in local._svpc_service_identity_iam : [
      for service in services : { role = role, service = service }
    ]
  ])
  _svpc_service_identity_iam = coalesce(
    local.svpc_service_config.service_identity_iam, {}
  )
  svpc_host_config = {
    enabled = coalesce(
      try(var.shared_vpc_host_config.enabled, null), false
    )
    service_projects = coalesce(
      try(var.shared_vpc_host_config.service_projects, null), []
    )
  }
  svpc_service_config = coalesce(var.shared_vpc_service_config, {
    host_project = null, service_identity_iam = {}
  })
  svpc_service_iam = {
    for b in local._svpc_service_iam : "${b.role}:${b.service}" => b
  }
}

Move complex transformations to locals

When data needs to be transformed in a for or for_each loop, we prefer moving the relevant code to locals so that module or resource attribute values have as little line noise as possible. This is especially relevant for complex transformations, which should be split in multiple smaller stages with descriptive names.

This is an example from the project module. Notice how we're breaking two of the rules above: line length in the last local so as to use the same formatting as the previous one, and alphabetical ordering so the order follows the transformation steps. Our rules are meant to improve legibility, so when they don't feel free to ignore them (and sometimes we'll push back anyway).

locals {
  _group_iam_roles = distinct(flatten(values(var.group_iam)))
  _group_iam = {
    for r in local._group_iam_roles : r => [
      for k, v in var.group_iam : "group:${k}" if try(index(v, r), null) != null
    ]
  }
  _iam_additive_pairs = flatten([
    for role, members in var.iam_additive : [
      for member in members : { role = role, member = member }
    ]
  ])
  _iam_additive_member_pairs = flatten([
    for member, roles in var.iam_additive_members : [
      for role in roles : { role = role, member = member }
    ]
  ])
  iam = {
    for role in distinct(concat(keys(var.iam), keys(local._group_iam))) :
    role => concat(
      try(var.iam[role], []),
      try(local._group_iam[role], [])
    )
  }
  iam_additive = {
    for pair in concat(local._iam_additive_pairs, local._iam_additive_member_pairs) :
    "${pair.role}-${pair.member}" => pair
  }
}

The prefix variable

If you would like to use a "prefix" variable for resource names, please keep its definition consistent across all modules:

# variables.tf
variable "prefix" {
  description = "Optional prefix used for resource names."
  type        = string
  default     = null
  validation {
    condition     = var.prefix != ""
    error_message = "Prefix cannot be empty, please use null instead."
  }
}

# main.tf
locals {
  prefix = var.prefix == null ? "" : "${var.prefix}-"
}

For blueprints the prefix is mandatory:

variable "prefix" {
  description = "Prefix used for resource names."
  type        = string
  validation {
    condition     = var.prefix != ""
    error_message = "Prefix cannot be empty."
  }
}

Interacting with checks and tools

Our modules are designed for composition and live in a monorepo together with several end-to-end blueprints, so it was inevitable that over time we found ways of ensuring that a change does not break consumers.

Our tests exercise most of the code in the repo including documentation examples, and leverages the tftest Python library we developed and independently published on PyPi.

Automated workflows run checks on PRs to ensure all tests pass, together with a few other controls that ensure code is linted, documentation reflects variables and outputs files, etc.

The following sections describe how interact with the above, and how to leverage some of the small utilities contained in this repo.

Python environment setup

All our tests and tools use Python, this section shows you how to bring up an environment with the correct dependencies installed.

First, follow the official guide so that you have a working virtual environment and pip installed.

Once you have created and activated a virtual environment, install the dependencies we use for testing and tools.

pip install -r tests/requirements.txt
pip install -r tools/requirements.txt

Automated checks on PRs

We run two GitHub workflows on PRs:

  • .github/workflows/linting.yml
  • .github/workflows/tests.yml

The linting workflow tests:

  • that the correct copyright boilerplate is present in all files, using tools/check_boilerplate.py
  • that all Terraform code is linted via terraform fmt
  • that Terraform variables and outputs are sorted alphabetically
  • that all README files have up to date outputs, variables, and files (where relevant) tables, via tools/check_documentation.py
  • that all links in README files are syntactically correct and valid if internal, via tools/check_links.py
  • that resource names used in FAST stages stay within a length limit, via tools/check_names.py
  • that all Python code has been formatted with the correct yapf style

You can run those checks individually on your code to address any error before sending a PR, all you need to do is run the same command used in the workflow file from within your virtual environment. To run documentation tests for example if you changed the project module:

./tools/check_documentation.py modules/project

Our tools generally support a --help switch, so you can also use them for other purposes:

/tools/check_documentation.py --help
Usage: check_documentation.py [OPTIONS] [DIRS]...

  Cycle through modules and ensure READMEs are up-to-date.

Options:
  -x, --exclude-file TEXT
  --files / --no-files
  --show-diffs / --no-show-diffs
  --show-extra / --no-show-extra
  --help                          Show this message and exit.

As a convenience, we provide a script that runs the same set of checks as our GitHub workflow. Before submitting a PR, run tools/lint.sh and fix any errors. You can use the tools described above to find out more about the failures.

The test workflow runs test suites in parallel. Refer to the next section for more details on running and writing tests.

Using and writing tests

Our testing approach follows a simple philosophy: we mainly test to ensure code works, and that it does not break due to changes to dependencies (modules) or provider resources.

This makes testing very simple, as a successful terraform plan run in a test case is often enough. We only write more specialized tests when we need to check the output of complex transformations in for loops.

As our testing needs are very simple, we also wanted to reduce the friction required to write new tests as much as possible: our tests are written in Python and use pytest which is the standard for the language, leveraging our tftest library, which wraps the Terraform executable and returns familiar data structures for most commands.

Writing pytest unit tests to check plan results is really easy, but since wrapping modules and examples in dedicated fixtures and hand-coding checks gets annoying after a while, we developed additional ways that allow us to simplify the overall process.

In the following sections we describe the three testing approaches we currently have:

  • Example-based tests: this is perhaps the easiest and most common way to test either a module or a blueprint. You simply have to provide an example call to your module and a few metadata values in the module's README.md.
  • tfvars-based tests: allows you to test a module or blueprint by providing variables via tfvar files and an expected plan result in form of an inventory. This type of test is useful, for example, for FAST stages that don't have any examples within their READMEs.
  • Python-based (legacy) tests: in some situations you might still want to interact directly with tftest via Python, if that's the case, use this method to write custom Python logic to test your module in any way you see fit.

Testing via README.md example blocks

This is the preferred method to write tests for modules and blueprints. Example-based tests are triggered from HCL Markdown fenced code blocks in any file named README.md, hence there's no need to create any additional files or revert to Python to write a test. Most of our documentation examples are using this method.

To enable an example for testing just use the special tftest comment as the last line in the example, listing the number of modules and resources expected.

A few preset variables are available for use, as shown in this example from the dns module documentation.

module "private-dns" {
  source          = "./modules/dns"
  project_id      = "myproject"
  type            = "private"
  name            = "test-example"
  domain          = "test.example."
  client_networks = [var.vpc.self_link]
  recordsets = {
    "A localhost" = { ttl = 300, records = ["127.0.0.1"] }
  }
}
# tftest modules=1 resources=2

This is enough to tell our test suite to run this example and assert that the resulting plan has one module (modules=1) and two resources (resources=2)

Note that all HCL code examples in READMEs are automatically tested. To prevent this behavior, include tftest skip somewhere in the code.

Testing examples against an inventory YAML

If you want to go further, you can define a yaml "inventory" with the plan and output results you want to test.

Continuing with the example above, imagine you want to ensure the plan also includes the creation of the A record specified in the recordsets variable. To do this we add the inventory parameter to the tftest directive, as shown below.

module "private-dns" {
  source          = "./modules/dns"
  project_id      = "myproject"
  type            = "private"
  name            = "test-example"
  domain          = "test.example."
  client_networks = [var.vpc.self_link]
  recordsets = {
    "A localhost" = { ttl = 300, records = ["127.0.0.1"] }
  }
}
# tftest modules=1 resources=2 inventory=recordsets.yaml

Next define the corresponding "inventory" yaml file which will be used to assert values from the plan. The inventory is loaded from tests/[module path]/examples/[inventory_name]. In our example we have to create tests/modules/dns/examples/recordsets.yaml.

In the inventory file you have three sections available, and all of them are optional:

  • values is a map of resource indexes (the same ones used by Terraform state) and their attribute name and values; you can define just the attributes you are interested in and the rest will be ignored
  • counts is a map of resource types (eg google_compute_engine) and the number of times each type occurs in the plan; here too only define the ones that need checking
  • outputs is a map of outputs and their values; where a value is unknown at plan time use the special __missing__ token

Going back to our example, we create the inventory with values for the recordset and we also include the zone for good measure.

# file: tests/modules/dns/examples/recordsets.yaml
values:
  module.private-dns.google_dns_managed_zone.non-public[0]:
    dns_name: test.example.
    forwarding_config: []
    name: test-example
    peering_config: []
    project: myproject
    reverse_lookup: false
    service_directory_config: []
    visibility: private
  module.private-dns.google_dns_record_set.cloud-static-records["A localhost"]:
    managed_zone: test-example
    name: localhost.test.example.
    project: myproject
    routing_policy: []
    rrdatas:
    - 127.0.0.1
    ttl: 300
    type: A

counts:
  google_dns_managed_zone: 1
  google_dns_record_set: 1

Using external files

In some situations your module might require additional files to properly test it. This is a common situation with modules that implement factories that drive the creation of resources from YAML files. If you're in this situation, you can still use example-based tests as described below:

  • create your regular hcl code block example and add the tftest directive as described above.
  • create a new code block with the contents of the additional file and use the tftest-file directive. You have to specify a label for the file and a relative path where the file will live.
  • update your hcl code block to use the files parameters and pass a comma separated list of file ids that you want to make available to the module.

Continuing with the DNS example, imagine you want to load the recordsets from a YAML file

module "private-dns" {
  source          = "./modules/dns"
  project_id      = "myproject"
  type            = "private"
  name            = "test-example"
  domain          = "test.example."
  client_networks = [var.vpc.self_link]
  recordsets      = yamldecode(file("records/example.yaml"))
}
# tftest modules=1 resources=2 files=records
# tftest-file id=records path=records/example.yaml
A localhost:
 ttl: 300
 records: ["127.0.0.1"]
A myhost:
 ttl: 600
 records: ["10.10.0.1"]

Note that you can use the files parameters together with inventory to allow more fine-grained assertions. Please review the subnet factory in the net-vpc module for an example of this.

Running tests for specific examples

As mentioned before, we use pytest as our test runner, so you can use any of the standard test selection options available in pytest.

Example-based test are named based on the section within the README.md that contains them. You can use this name to select specific tests.

Here we show a few commonly used selection commands:

  • Run all examples:
    • pytest tests/examples
  • Run all examples for blueprints only:
    • pytest -k blueprints tests/examples
  • Run all examples for modules only:
    • pytest -k modules tests/examples
  • Run all examples for the net-vpc module:
    • pytest -k 'modules and net-vpc:' tests/examples
  • Run a specific example (identified by a substring match on its name) from the net-vpc module:
    • pytest -k 'modules and net-vpc: and ipv6' tests/examples
  • Run a specific example (identified by its full name) from the net-vpc module:
    • pytest -v 'tests/examples/test_plan.py::test_example[modules/net-vpc:IPv6:1]'
  • Run tests for all blueprints except those under the gke directory:
    • pytest -k 'blueprints and not gke' tests/examples

[!NOTE] The colon symbol (:) in pytest keyword expression 'modules and net-vpc:' makes sure that net-vpc is matched but net-vpc-firewall or net-vpc-peering are not.

Tip: to list all tests matched by your keyword expression (-k ...) without actually running them, you can use the --collect-only flag.

The following command executes a dry run that lists all example-based tests for the gke-cluster-autopilot module:

pytest -k 'modules and gke-cluster-autopilot:' tests/examples --collect-only

Once you find the expression matching your desired test(s), remove the --collect-only flag.

The next command executes an example-based test found in the Monitoring Configuration section of the README file for the gke-cluster-autopilot module. That section actually has two tests, so the :2 part selects the second test only:

pytest -k 'modules and gke-cluster-autopilot: and monitoring and :2' tests/examples

Generating the inventory automatically

Building an inventory file by hand is difficult. To simplify this task, the default test runner for examples prints the inventory for the full plan if it succeeds. Therefore, you can start without an inventory and then run a test to get the full plan and extract the pieces you want to build the inventory file.

Suppose you want to generate the inventory for the last DNS example above (the one creating the recordsets from a YAML file). Assuming that example is the first code block under the "Private Zone" section in the README for the dns module, you can run the following command to build the inventory:

pytest -s 'tests/examples/test_plan.py::test_example[modules/dns:Private Zone:1]'

which will generate a output similar to this:

==================================== test session starts ====================================
platform ... -- Python 3.11.2, pytest-7.2.1, pluggy-1.0.0
rootdir: ...
plugins: xdist-3.1.0
collected 1 item

tests/examples/test_plan.py

values:
  module.private-dns.google_dns_managed_zone.non-public[0]:
    description: Terraform managed.
    dns_name: test.example.
    dnssec_config: []
    force_destroy: false
    forwarding_config: []
    labels: null
    name: test-example
    peering_config: []
    private_visibility_config:
    - gke_clusters: []
      networks:
      - network_url: projects/xxx/global/networks/aaa
    project: myproject
    reverse_lookup: false
    service_directory_config: []
    timeouts: null
    visibility: private
  module.private-dns.google_dns_record_set.cloud-static-records["A localhost"]:
    managed_zone: test-example
    name: localhost.test.example.
    project: myproject
    routing_policy: []
    rrdatas:
    - 127.0.0.1
    ttl: 300
    type: A
  module.private-dns.google_dns_record_set.cloud-static-records["A myhost"]:
    managed_zone: test-example
    name: myhost.test.example.
    project: myproject
    routing_policy: []
    rrdatas:
    - 10.10.0.1
    ttl: 600
    type: A

counts:
  google_dns_managed_zone: 1
  google_dns_record_set: 2
  modules: 1
  resources: 3

outputs: {}

.

===================================== 1 passed in 3.46s =====================================

You can use that output to build the inventory file.

Note that for complex modules, the output can be very large and includes a lot of details about the resources. Extract only those resources and fields that are relevant to your test. There is a fine balance between asserting the critical bits related to your test scenario and including too many details that end up making the test too specific.

Building tests for blueprints

Generally blueprints are used as top-level modules which means that usually their READMEs include sample values for their variables but there are no examples showing how to use them as modules.

If you want to test a blueprint using an example, we suggest adding a "Test" section at the end of the README and include the example there. See any existing blueprint for a concrete example.

Testing via tfvars and yaml (aka tftest-based tests)

The second approach to testing requires you to:

  • create a folder in the right tests hierarchy where specific test files will be hosted
  • define tfvars files each with a specific variable configuration to test
  • define yaml "inventory" files with the plan and output results you want to test
  • declare which of these files need to be run as tests in a tftest.yaml file

Let's go through each step in succession, assuming you are testing the new net-lb-app-ext module.

First create a new folder under tests/modules replacing any dash in the module name with underscores. Note that if you were testing a blueprint the folder would go in tests/blueprints.

mkdir tests/modules/net_glb

Then define a tfvars file with one of the module configurations you want to test. If you have a lot of variables which are shared across different tests, you can group all the common variables in a single tfvars file and associate it with each test's specific tfvars file (check the organization module test for an example).

# file: tests/modules/net_glb/test-simple.tfvars
name       = "glb-test-0"
project_id = "my-project"
backend_buckets_config = {
  default = {
    bucket_name = "my-bucket"
  }
}

Next define the corresponding "inventory" yaml file which will be used to assert values from the plan that uses the tfvars file above. In the inventory file you have three sections available:

  • values is a map of resource indexes (the same ones used by Terraform state) and their attribute name and values; you can define just the attributes you are interested in and the other will be ignored
  • counts is a map of resource types (eg google_compute_engine) and the number of times each type occurs in the plan; here too just define the ones the that need checking
  • outputs is a map of outputs and their values; where a value is unknown at plan time use the special __missing__ token
# file: tests/modules/net_glb/test-simple.yaml
values:
  google_compute_global_forwarding_rule.default:
    description: Terraform managed.
    load_balancing_scheme: EXTERNAL
  google_compute_target_http_proxy.default[0]:
    name: glb-test-1
counts:
  google_compute_backend_bucket: 1
  google_compute_global_forwarding_rule: 1
  google_compute_health_check: 1
  google_compute_target_http_proxy: 1
  google_compute_url_map: 1
outputs:
  address: __missing__
  backend_service_ids: __missing__
  forwarding_rule: __missing__
  group_ids: __missing__
  health_check_ids: __missing__
  neg_ids: __missing__

Create as many pairs of tfvars/yaml files as you need to test every scenario and feature, then create the file that triggers our fixture and converts them into pytest tests.

# file: tests/modules/net_glb/tftest.yaml
module: modules/net-lb-app-ext
# if there are variables shared among all tests you can define a common file
# common_tfvars:
#   - defaults.tfvars
tests:
  # run a test named `test-plan`, load the specified tfvars files
  # use the default inventory file of `test-plan.yaml`
  test-plan:
    tfvars: # if omitted, we load test-plan.tfvars by default
      - test-plan.tfvars
      - test-plan-extra.tfvars
    inventory:
      - test-plan.yaml
  # You can use `extra_files` to include additional tf files outside 
  # the module's path before running the test.
  # extra_files:  
  #   - ../plugin-x/*.tf

  # You can omit the tfvars and inventory sections and they will
  # default to the name of the test. The following two examples are equivalent:
  #
  # test-plan2:
  #   tfvars:
  #     - test-plan2.tfvars
  #   inventory:
  #     - test-plan2.yaml
  # test-plan2:

A good example of tests showing different ways of leveraging our framework is in the tests/modules/organization folder.

Generating the inventory for tftest-based tests

Just as you can generate an initial inventory for example-based tests, you can do the same for tftest-based tests. Currently the process relies on an additional tool (tools/plan_summary.py) but but we have plans to unify both cases in the future.

As an example, if you want to generate the inventory for the organization module using the common.tfvars and audit_config.tfvars found in tests/modules/organization/, simply run plan_summary.py as follows:

$ python tools/plan_summary.py modules/organization \
   tests/modules/organization/common.tfvars \
   tests/modules/organization/audit_config.tfvars

values:
  google_organization_iam_audit_config.config["allServices"]:
    audit_log_config:
    - exempted_members:
      - user:me@example.org
      log_type: DATA_WRITE
    - exempted_members: []
      log_type: DATA_READ
    org_id: '1234567890'
    service: allServices

counts:
  google_organization_iam_audit_config: 1
  modules: 0
  resources: 1

outputs:
  custom_role_id: {}
  custom_roles: {}
  firewall_policies: {}
  firewall_policy_id: {}
  network_tag_keys: {}
  network_tag_values: {}
  organization_id: organizations/1234567890
  sink_writer_identities: {}
  tag_keys: {}
  tag_values: {}

You can now use this output to create the inventory file for your test. As mentioned before, please only use those values relevant to your test scenario.

Running end-to-end tests

You can use end-to-end tests to verify your code against GCP API. These tests verify that terraform apply succeeds, terraform plan is empty afterwards and that terraform destroy raises no error.

Prerequisites

Prepare following information:

  • billing account id
  • organization id
  • parent folder under which resources will be created
    • (you may want to disable / restore to default some organization policies under this folder)
  • decide in which region you want to deploy (choose one, that has wide service coverage)
  • (optional) prepare service account that has necessary permissions (able to assign billing account to project, resource creation etc)
  • prepare a prefix (this is to provide project and other global resources name uniqueness)

How does it work

Each test case is provided by additional environment defined in variables.tf. This simplifies writing the examples as this follows the same structure as for non-end-to-end tests, and allows multiple, independent and concurrent runs of tests.

The test environment can be provisioned automatically during the test run (which takes ~2 minutes) and destroyed at the end, when all tests finish (option 1 below), which is targeting automated runs in CI/CD pipeline, or it can be provisioned manually (option 2 below) to reduce test time, which might be typical use case for tests run locally.

For development, to keep the feedback loop short, consider using local sandbox and paste specific example in main.tf file.

Option 1 - automatically provision and de-provision testing infrastructure

Set variables in environment:

export TFTEST_E2E_billing_account="123456-123456-123456"  # billing account id to associate projects
export TFTEST_E2E_group_email="group@example.org" # existing group within organization
export TFTEST_E2E_organization_id="1234567890" # your organization id
export TFTEST_E2E_parent="folders/1234567890"  # folder under which test resources will be created
export TFTEST_E2E_prefix="your-unique-prefix"  # unique prefix for projects, no longer than 7 characters
export TFTEST_E2E_region="europe-west4"  # region to use

To use Service Account Impersonation, use provider environment variable

export GOOGLE_IMPERSONATE_SERVICE_ACCOUNT=<username>@<project-id>.iam.gserviceaccount.com

You can keep the prefix the same for all the tests run, the tests will add necessary suffix for subsequent runs, and in case tests are run in parallel, use separate suffix for the workers.

Run the tests

pytest tests/examples_e2e

Option 2 - Provision manually test environment and use it for tests

Provision manually test environment

In tests/examples_e2e/setup_module create terraform.tfvars with following values:

billing_account = "123456-123456-123456"  # billing account id to associate projects
group_email     = "group@example.org"  # existing group within organization
organization_id = "1234567890"  # your organization id
parent          = "folders/1234567890"  # folder under which test resources will be created
prefix          = "your-unique-prefix"  # unique prefix for projects
region          = "europe-west4"  # region to use
timestamp       = "1696444185" # generate your own timestamp - will be used as a part of prefix for globally unique resources

If you use service account impersonation, set GOOGLE_IMPERSONATE_SERVICE_ACCOUNT

export GOOGLE_IMPERSONATE_SERVICE_ACCOUNT=<username>@<project-id>.iam.gserviceaccount.com

Provision the environment using terraform

(cd tests/examples_e2e/setup_module/ && terraform init && terraform apply)

This will generate also tests/examples_e2e/setup_module/e2e_tests.tfvars for you, which can be used by tests.

Setup your environment
export TFTEST_E2E_TFVARS_PATH=`pwd`/tests/examples_e2e/setup_module/e2e_tests.tfvars  # generated above
export TFTEST_E2E_prefix="your-unique-prefix"  # unique prefix for projects, no longer than 7 characters
De-provision the environment

Once you are done with the tests, run:

(cd tests/examples_e2e/setup_module/ && terraform apply -destroy)

To remove all resources created for testing and tests/examples_e2e/setup_module/e2e_tests.tfvars file.

Run tests

Run tests using:

pytest tests/examples_e2e

Creating sandbox environment for examples

When developing it is convenient to have a module that represents chosen example, so you can inspect the environment after running apply and quickly verify fixes. Shell script create_e2e_sandbox.sh will create such environment for you.

Prepare the environment variables as defined in Option 1 above and run:

$ tools/create_e2e_sandbox.sh <directory>

The script will create in <directory> following structure:

<directory>
├── default-versions.tf
├── e2e_tests.auto.tfvars -> infra/e2e_tests.tfvars
├── fabric -> <cloud-foundation-fabric root>
├── infra
│   ├── e2e_tests.tfvars
│   ├── e2e_tests.tfvars.tftpl
│   ├── main.tf
│   ├── randomizer.auto.tfvars
│   ├── terraform.tfvars
│   └── variables.tf
├── main.tf
└── variables.tf

The infra directory contains the sandbox infrastructure as well as all environment variables dumped into terraform.tfvars file. The script runs terraform init and terraform apply -auto-approve in this folder.

The <direcotry> has empty main.tf where you can paste any example, and it will get all necessary variables from e2e_tests.auto.tfvars file.

If there are any changes to the test sandbox, you can rerun the script and only changes will be applied to the project.

Cleaning up interrupted E2E tests / failed destroys

Tests take the effort to clean after themselves but in following situations some resources may be left in GCP:

  • you interrupt the test run
  • terraform destroy failed (for example, because of some bug in the example of module code)

To clean up the old dangling resources you may run this commands, to remove folders and projects older than 1 week


for folder_id in $(
  gcloud resource-manager folders list --folder "${TFTEST_E2E_parent}" --filter="createTime<-P1W" --format='value(name)'
  ) ; do
  for project_id in $(
  gcloud alpha projects list --folder "${folder_id}" --format='value(project_id)'
  ) ; do
    echo $project_id
    gcloud projects delete --quiet "${project_id}"
  done
  gcloud resource-manager folders delete --quiet "${folder_id}"
done

Take care, as this may also attempt to remove folders/projects created for Option 2 or sandbox.

Writing tests in Python (legacy approach)

Where possible, we recommend using the testing methods described in the previous sections. However, if you need it, you can still write tests using Python directly.

In general, you should try to use the plan_summary fixture, which runs a a terraform plan and returns a PlanSummary object. The most important arguments to plan_summary are:

  • the path of the Terraform module you want to test, relative to the root of the repository
  • a list of paths representing the tfvars file to pass in to terraform. These paths are relative to the python file defining the test.

If successful, plan_summary will return a PlanSummary object with the values, counts and outputs attributes following the same semantics described in the previous section. You can use this fields to write your custom tests.

Like before let's imagine we're writing a (python) test for net-lb-app-ext module. First create a new folder under tests/modules replacing any dash in the module name with underscores. You also need to create an empty __init__.py file in it, to ensure pytest discovers you new tests automatically.

mkdir tests/modules/net_glb
touch tests/modules/net_glb/__init__.py

Now create a file containing your tests, e.g. test_plan.py:

def test_name(plan_summary, tfvars_to_yaml, tmp_path):
  s = plan_summary('modules/net-lb-app-ext', tf_var_files=['test-plan.tfvars'])
  address = 'google_compute_url_map.default'
  assert s.values[address]['project'] == 'my-project'

For more examples on how to write python tests, check the tests for the organization module.

Running tests from a temporary directory

Most of the time you can run tests using the pytest command as described in previous. However, the plan_summary fixture allows copying the root module and running the test from a temporary directory.

To enable this option, just define the environment variable TFTEST_COPY and any tests using the plan_summary fixture will automatically run from a temporary directory.

Running tests from temporary directories is useful if:

  • you're running tests in parallel using pytest-xdist. In this case, just run you tests as follows:

    TFTEST_COPY=1 pytest -n 4
    
  • you're running tests for the fast/ directory which contain tfvars and auto.tfvars files (which are read by terraform automatically) making your tests fail. In this case, you can run

    TFTEST_COPY=1 pytest fast/
    

Fabric tools

The main tool you will interact with in development is tfdoc, used to generate file, output and variable tables in README documents.

By default, tfdoc expects the path to a folder as its argument, and will parse variables and outputs files contained in it and embed generated tables in its README file.

You decide where the generated tables will be placed (or replaced if they already exist) via two special HTML comment tags, that mark the beginning and end of the space that will be managed by tfdoc.

<!-- BEGIN TFDOC -->
<!-- everything between these two tags will be managed by tfdoc -->
<!-- END TFDOC -->

You can also set tfdoc options directly in a README file, so that a) you don't need to remember to pass the right options when running the tool, and b) our automated workflow checks will know how to generate the right output.

<!-- the following comment configures tfdoc options -->
<!-- TFDOC OPTS files:1 show_extra:1 -->

When generating the files table, a special annotation can be used to fill in the file description in Terraform files:

# tfdoc:file:description Networking stage resources.

The tool can also be run so that it prints the generated output on standard output instead of replacing in files. Run tfdoc --help to see all available options.