cloud-foundation-fabric/fast/stages/2-networking-d-separate-envs
Networking with separated single environment

This stage sets up the shared network infrastructure for the whole organization. It implements a single shared VPC per environment, where each environment is independently connected to the on-premises environment, to maintain a fully separated routing domain on GCP.

While no communication between environments is implemented in this design, that can be achieved with a number of different options:

  • VPC Peering - which is not recommended as it would effectively create a full line of sight between workloads belonging to different environments
  • VPN HA tunnels between environments, exchanging a subset of well-defined routes.
  • Multi-NIC appliances implemented in 2-networking-c-nva and 2-networking-e-nva-bgp connecting the different environments, allowing the use of NVAs to enforce networking policies.

The following diagram illustrates the high-level design, and should be used as a reference for the following sections. The final number of subnets, and their IP addressing design will of course depend on customer-specific requirements, and can be easily changed via variables or external data files without having to edit the actual code.

Networking diagram

Table of contents

Design overview and choices

VPC design

This architecture creates one VPC for each environment, each in its respective project. Each VPC hosts external connectivity and shared services solely serving its own environment.

As each environment is fully independent, this design trivialises the creation of new environments.

External connectivity

External connectivity to on-prem is implemented here via HA VPN (two tunnels per region), as this is the minimum common denominator often used directly, or as a stop-gap solution to validate routing and transfer data, while waiting for interconnects to be provisioned.

Connectivity to additional on-prem sites or other cloud providers should be implemented in a similar fashion, via VPN tunnels or interconnects on each of the environment VPCs, sharing the same regional router.

IP ranges, subnetting, routing

Minimizing the number of routes (and subnets) in use on the cloud environment is an important consideration, as it simplifies management and avoids hitting Cloud Router and VPC quotas and limits. For this reason, we recommend careful planning of the IP space used in your cloud environment, to be able to use large IP CIDR blocks in routes whenever possible.

This stage uses a dedicated /16 block (which should of course be sized to your needs) shared by all regions and environments, and subnets created in each VPC derive their ranges from their relevant block.

Each VPC also defines and reserves two "special" CIDR ranges dedicated to PSA (Private Service Access) and Internal Application Load Balancers (L7 LBs).

Routes in GCP are either automatically created for VPC subnets, manually created via static routes, or dynamically programmed by Cloud Routers via BGP sessions, which can be configured to advertise VPC ranges, and/or custom ranges via custom advertisements.

In this setup:

  • routes between multiple subnets within the same VPC are automatically programmed by GCP
  • on-premises is connected to each environment VPC and dynamically exchanges BGP routes with GCP using HA VPN

Internet egress

Cloud NAT provides the simplest path for internet egress. This setup uses Cloud NAT, with optional per-VPC NAT gateways. Cloud NAT is disabled by default; enable it by setting the enable_cloud_nat variable.
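Since enable_cloud_nat is a plain boolean (see the Variables table below), turning on the per-VPC NAT gateways only takes one line in terraform.tfvars:

```hcl
# deploy the optional Cloud NAT gateways in each environment VPC
enable_cloud_nat = true
```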

Several other scenarios are possible of course, with varying degrees of complexity:

  • a forward proxy, with optional URL filters
  • a default route to on-prem to leverage existing egress infrastructure
  • a full-fledged perimeter firewall to control egress and implement additional security features like IPS

Future pluggable modules will make it easy to experiment with, or deploy, the above scenarios.

VPC and Hierarchical Firewall

The GCP Firewall is a stateful, distributed feature that allows the creation of L4 policies, either via VPC-level rules or more recently via hierarchical policies applied on the resource hierarchy (organization, folders).

The current setup adopts both firewall types, and uses hierarchical rules on the Networking folder for common ingress rules (egress is open by default), e.g. from health check or IAP forwarder ranges, and VPC rules for environment or workload-level ingress.

Rules and policies are defined in simple YAML files, described below.

DNS

DNS often goes hand in hand with networking, especially on GCP where Cloud DNS zones and policies are associated at the VPC level. This setup implements both DNS flows:

  • on-prem to cloud via private zones for cloud-managed domains, and an inbound policy used as forwarding target or via delegation (requires some extra configuration) from on-prem DNS resolvers
  • cloud to on-prem via forwarding zones for the on-prem managed domains
  • Private Google Access is enabled via DNS Response Policies for most of the supported domains

To complete the configuration, the 35.199.192.0/19 range should be routed on the VPN tunnels from on-prem, and the following names configured for DNS forwarding to cloud:

  • private.googleapis.com
  • restricted.googleapis.com
  • gcp.example.com (used as a placeholder)

From cloud, the example.com domain (used as a placeholder) is forwarded to on-prem.

This configuration is battle-tested, and flexible enough to lend itself to simple modifications without subverting its design, for example by forwarding and peering root zones to bypass Cloud DNS external resolution.

Stage structure and files layout

VPCs

VPCs are defined in separate files, one for each of prod and dev.

These files contain different resources:

  • the project module (project) contains the VPC, enables the required APIs, and configures the project as a Shared VPC host
  • the VPC module (net-vpc) manages the subnets, the explicit routes for {private,restricted}.googleapis.com, and the DNS inbound policy. Non-infrastructural subnets are created leveraging resource factories. Sample subnets are shipped in data/subnets and can be easily customized to fit users' needs. PSA ranges are configured via the psa_ranges variable if managed services are needed.
  • Cloud NAT (net-cloudnat) manages the networking infrastructure required to enable internet egress.
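As a sketch, PSA ranges could be enabled with a terraform.tfvars fragment like the following; the range names and CIDRs are purely illustrative, and the authoritative schema for psa_ranges is defined in variables.tf:

```hcl
# illustrative per-environment PSA ranges, e.g. for Cloud SQL;
# check variables.tf for the authoritative psa_ranges schema
psa_ranges = {
  dev = {
    cloudsql-mysql = "10.60.0.0/24"
  }
  prod = {
    cloudsql-mysql = "10.61.0.0/24"
  }
}
```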

VPNs

Connectivity to on-prem is implemented with HA VPN (net-vpn) and defined in vpn-onprem.tf. The file implements a single logical connection between each environment and onprem in the primary region, and the relevant parameters for its configuration are found in the vpn_onprem_dev_primary_config and vpn_onprem_prod_primary_config variables.

Routing and BGP

Each VPC network (net-vpc) manages a separate routing table, which can define static routes (e.g. to private.googleapis.com) and receives dynamic routes from BGP sessions established with neighbor networks (e.g. from onprem).

Static routes are defined in net-*.tf files, in the routes section of each net-vpc module.

Firewall

VPC firewall rules (net-vpc-firewall) are defined per VPC in each net-*.tf file, and leverage a resource factory to create rules in bulk. To add a new firewall rule, create a new file or edit an existing one in the data_folder directory defined in the net-vpc-firewall module, following the examples in the "Rules factory" section of the module documentation. Sample firewall rules are shipped in data/firewall-rules/dev and can be easily customised.

Hierarchical firewall policies (folder) are defined in main.tf and managed through a policy factory implemented by the net-firewall-policy module, which is then applied to the Networking folder containing all the core networking infrastructure. Policies are defined in the file referenced by rules_file; to define a new one, follow the firewall policy module documentation. Sample hierarchical firewall rules are shipped in data/hierarchical-ingress-rules.yaml and can be easily customised.

DNS architecture

The DNS (dns) infrastructure is defined in the respective dns-xxx.tf files.

Cloud DNS manages onprem forwarding and environment-specific zones (i.e. dev.gcp.example.com and prod.gcp.example.com).

Cloud to on-prem

Leveraging the forwarding zones defined in each environment, the cloud environment can resolve in-addr.arpa. and onprem.example.com. using the on-premises DNS infrastructure. On-prem resolver IPs are set in the dns variable.
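A minimal sketch of the dns variable in terraform.tfvars, assuming per-environment lists of on-prem resolver addresses; the IPs and attribute names below are illustrative, and the authoritative schema is in variables.tf:

```hcl
# illustrative on-prem resolver IPs; verify the attribute names
# against the dns variable declared in variables.tf
dns = {
  dev  = ["10.10.10.2"]
  prod = ["10.20.10.2"]
}
```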

DNS queries sent to the on-premises infrastructure come from the 35.199.192.0/19 source range, which is only accessible from within a VPC or networks connected to one.

When implementing this architecture, make sure you'll be able to route packets coming from the /19 range to the right environment (requests coming from prod must be routed back to prod, and requests coming from dev back to dev). As an alternative, consider leveraging self-managed DNS resolvers (e.g. CoreDNS forwarders) in each environment.

On-prem to cloud

The Inbound DNS Policy defined on each VPC automatically reserves the first available IP address on each created subnet (typically the third one in a CIDR) to expose the Cloud DNS service so that it can be consumed from outside of GCP.

How to run this stage

This stage is meant to be executed after the resource management stage has run, as it leverages the automation service account and bucket created there, and additional resources configured in the bootstrap stage.

It's of course possible to run this stage in isolation, but that's outside the scope of this document, and you would need to refer to the code for the previous stages for the environmental requirements.

Before running this stage, you need to make sure you have the correct credentials and permissions, and localize variables by assigning values that match your configuration.

Provider and Terraform variables

As in all other FAST stages, the mechanism used to pass variable values and pre-built provider files from one stage to the next is also leveraged here.

The commands to link or copy the provider and terraform variable files can be easily derived from the stage-links.sh script in the FAST root folder, passing it a single argument with the local output files folder (if configured) or the GCS output bucket in the automation project (derived from stage 0 outputs). The following examples demonstrate both cases, and the resulting commands that then need to be copy/pasted and run.

../../stage-links.sh ~/fast-config

# copy and paste the following commands for '2-networking-d-separate-envs'

ln -s ~/fast-config/providers/2-networking-providers.tf ./
ln -s ~/fast-config/tfvars/0-globals.auto.tfvars.json ./
ln -s ~/fast-config/tfvars/0-bootstrap.auto.tfvars.json ./
ln -s ~/fast-config/tfvars/1-resman.auto.tfvars.json ./
../../stage-links.sh gs://xxx-prod-iac-core-outputs-0

# copy and paste the following commands for '2-networking-d-separate-envs'

gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/providers/2-networking-providers.tf ./
gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/tfvars/0-globals.auto.tfvars.json ./
gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/tfvars/0-bootstrap.auto.tfvars.json ./
gcloud alpha storage cp gs://xxx-prod-iac-core-outputs-0/tfvars/1-resman.auto.tfvars.json ./

Impersonating the automation service account

The preconfigured provider file uses impersonation to run with this stage's automation service account's credentials. The gcp-devops and organization-admins groups have the necessary IAM bindings in place to do that, so make sure the current user is a member of one of those groups.

Variable configuration

Variables in this stage -- like most other FAST stages -- are broadly divided into three separate sets:

  • variables which refer to global values for the whole organization (org id, billing account id, prefix, etc.), which are pre-populated via the 0-globals.auto.tfvars.json file linked or copied above
  • variables which refer to resources managed by previous stages, which are prepopulated here via the 0-bootstrap.auto.tfvars.json and 1-resman.auto.tfvars.json files linked or copied above
  • and finally variables that optionally control this stage's behaviour and customizations, and can be set in a custom terraform.tfvars file

The latter set is explained in the Customization sections below, and the full list can be found in the Variables table at the bottom of this document.

Note that the outputs_location variable is disabled by default; you need to explicitly set it in your terraform.tfvars file if you want output files to be generated by this stage. This is a sample terraform.tfvars that configures it; refer to the bootstrap stage documentation for more details:

outputs_location = "~/fast-config"

Using delayed billing association for projects

This configuration is possible but unsupported and only exists for development purposes, use at your own risk:

  • temporarily switch billing_account.id to null in 0-globals.auto.tfvars.json
  • for each project resource in the project modules used in this stage (dev-spoke-project, prod-spoke-project)
    • apply using -target, for example terraform apply -target 'module.prod-spoke-project.google_project.project[0]'
    • untaint the project resource after applying, for example terraform untaint 'module.prod-spoke-project.google_project.project[0]'
  • go through the process to associate the billing account with the two projects
  • switch billing_account.id back to the real billing account id
  • resume applying normally

Running the stage

Once provider and variable values are in place and the correct user is configured, the stage can be run:

terraform init
terraform apply

Post-deployment activities

  • On-prem routers should be configured to advertise all relevant CIDRs to the GCP environments. To avoid hitting GCP quotas, we recommend aggregating routes as much as possible.
  • On-prem routers should accept BGP sessions from their cloud peers.
  • On-prem DNS servers should have forward zones for GCP-managed ones.

Private Google Access

Private Google Access (or PGA) enables VMs and on-prem systems to consume Google APIs from within the Google network, and is already fully configured on this environment:

  • DNS response policies in the landing project implement rules for all supported domains reachable via PGA
  • routes for the private and restricted ranges are defined in all VPCs

To enable PGA access from on-premises, advertise the private/restricted ranges via the vpn_onprem_dev_primary_config and vpn_onprem_prod_primary_config variables, using router or tunnel custom advertisements.

Customizations

Configuring the VPNs to on prem

This stage includes basic support for an HA VPN connecting each environment landing zone in the primary region to on-prem. Configuration is done via the vpn_onprem_dev_primary_config and vpn_onprem_prod_primary_config variables, which closely mirror the variables defined in the net-vpn-ha module.

Support for the onprem VPNs is disabled by default so that no resources are created; this is an example of how to configure one variable to enable the VPN for dev in the primary region:

vpn_onprem_dev_primary_config = {
  peer_external_gateways = {
    default = {
      redundancy_type = "SINGLE_IP_INTERNALLY_REDUNDANT"
      interfaces      = ["8.8.8.8"]
    }
  }
  router_config = {
    asn = 65501
    custom_advertise = {
      all_subnets = false
      ip_ranges   = {
        "10.1.0.0/16"     = "gcp"
        "35.199.192.0/19" = "gcp-dns"
        "199.36.153.4/30" = "gcp-restricted"
      }
    }
  }
  tunnels = {
    "0" = {
      bgp_peer = {
        address = "169.254.1.1"
        asn     = 65500
      }
      bgp_session_range               = "169.254.1.2/30"
      peer_external_gateway_interface = 0
      shared_secret                   = "foo"
      vpn_gateway_interface           = 0
    }
    "1" = {
      bgp_peer = {
        address = "169.254.2.1"
        asn     = 65500
      }
      bgp_session_range               = "169.254.2.2/30"
      peer_external_gateway_interface = 1
      shared_secret                   = "foo"
      vpn_gateway_interface           = 1
    }
  }
}

Changing default regions

Regions are defined via the regions variable which sets up a mapping between the regions.primary and regions.secondary logical names and actual GCP region names. If you need to change regions from the defaults:

  • change the values of the mappings in the regions variable to the regions you are going to use
  • change the regions in the factory subnet files in the data folder
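For example, a terraform.tfvars fragment remapping both logical names (the region values shown are just examples):

```hcl
# remap the logical region names to the GCP regions you want to use
regions = {
  primary   = "europe-west8"
  secondary = "europe-west12"
}
```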

Files

name description modules resources
dns-dev.tf Development spoke DNS zones and peerings setup. dns · dns-response-policy
dns-prod.tf Production spoke DNS zones and peerings setup. dns · dns-response-policy
main.tf Networking folder and hierarchical policy. folder · net-firewall-policy
monitoring-vpn-onprem.tf VPN monitoring alerts. google_monitoring_alert_policy
monitoring.tf Network monitoring dashboards. google_monitoring_dashboard
net-dev.tf Dev spoke VPC and related resources. net-cloudnat · net-vpc · net-vpc-firewall · project
net-prod.tf Production spoke VPC and related resources. net-cloudnat · net-vpc · net-vpc-firewall · project
outputs.tf Module outputs. google_storage_bucket_object · local_file
regions.tf Compute short names for regions.
test-resources.tf Temporary instances for testing. compute-vm
variables.tf Module variables.
vpn-onprem.tf VPN between landing and onprem. net-vpn-ha

Variables

name description type required default producer
automation Automation resources created by the bootstrap stage. object({…}) 0-bootstrap
billing_account Billing account id. If billing account is not part of the same org set is_org_level to false. object({…}) 0-bootstrap
folder_ids Folders to be used for the networking resources in folders/nnnnnnnnnnn format. If null, folder will be created. object({…}) 1-resman
organization Organization details. object({…}) 0-bootstrap
prefix Prefix used for resources that need unique names. Use 9 characters or less. string 0-bootstrap
alert_config Configuration for monitoring alerts. object({…}) {…}
custom_roles Custom roles defined at the org level, in key => id format. object({…}) null 0-bootstrap
dns DNS configuration. object({…}) {}
enable_cloud_nat Deploy Cloud NAT. bool false
essential_contacts Email used for essential contacts, unset if null. string null
factories_config Configuration for network resource factories. object({…}) {…}
outputs_location Path where providers and tfvars files for the following stages are written. Leave empty to disable. string null
psa_ranges IP ranges used for Private Service Access (e.g. CloudSQL). object({…}) null
regions Region definitions. object({…}) {…}
service_accounts Automation service accounts in name => email format. object({…}) null 1-resman
vpn_onprem_dev_primary_config VPN gateway configuration for onprem interconnection from dev in the primary region. object({…}) null
vpn_onprem_prod_primary_config VPN gateway configuration for onprem interconnection from prod in the primary region. object({…}) null

Outputs

name description sensitive consumers
dev_cloud_dns_inbound_policy IP Addresses for Cloud DNS inbound policy for the dev environment.
host_project_ids Network project ids.
host_project_numbers Network project numbers.
prod_cloud_dns_inbound_policy IP Addresses for Cloud DNS inbound policy for the prod environment.
shared_vpc_self_links Shared VPC host projects.
tfvars Terraform variables file for the following stages.
vpn_gateway_endpoints External IP Addresses for the GCP VPN gateways.