Networking
This stage sets up the shared network infrastructure for the whole organization. It implements a single shared VPC per environment, where each environment is independently connected to the on-premises environment, to maintain a fully separated routing domain on GCP.
While no communication between environments is implemented in this design, it can be achieved with a number of different options:
- VPC Peering - which is not recommended as it would effectively create a full line of sight between workloads belonging to different environments
- HA VPN tunnels between environments, exchanging a subset of well-defined routes.
- Multi-NIC appliances connecting the different environments, allowing the use of NVAs to enforce networking policies.
The following diagram illustrates the high-level design, and should be used as a reference for the following sections. The final number of subnets, and their IP addressing design will of course depend on customer-specific requirements, and can be easily changed via variables or external data files without having to edit the actual code.
Design overview and choices
VPC design
This architecture creates one VPC for each environment, each in its respective project. Each VPC hosts external connectivity and shared services solely serving its own environment.
As each environment is fully independent, this design trivialises the creation of new environments.
External connectivity
External connectivity to on-prem is implemented here via HA VPN (two tunnels per region), as this is the minimum common denominator often used directly, or as a stop-gap solution to validate routing and transfer data, while waiting for interconnects to be provisioned.
Connectivity to additional on-prem sites or other cloud providers should be implemented in a similar fashion, via VPN tunnels or interconnects on each of the environment VPCs, sharing the same regional router.
IP ranges, subnetting, routing
Minimizing the number of routes (and subnets) in use on the cloud environment is an important consideration, as it simplifies management and avoids hitting Cloud Router and VPC quotas and limits. For this reason, we recommend careful planning of the IP space used in your cloud environment, to be able to use large IP CIDR blocks in routes whenever possible.
This stage uses a dedicated /16 block (which should of course be sized to your needs) shared by all regions and environments, and subnets created in each VPC derive their ranges from their relevant block.
Each VPC also defines and reserves two "special" CIDR ranges dedicated to PSA (Private Service Access) and Internal HTTP(S) Load Balancers (L7 ILB).
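As a purely illustrative sketch (the ranges below are made up, and the actual addressing for this stage is driven by its variables and data files), a dedicated /16 could be carved per environment, reserving separate blocks for regular subnets, PSA and L7 ILB:

```hcl
# Illustrative CIDR plan only: these locals are not part of the stage,
# and the real ranges are configured via variables and data files.
locals {
  cidr_plan = {
    dev = {
      default = "10.128.32.0/24" # primary subnet, europe-west1
      psa     = "10.128.40.0/24" # reserved for Private Service Access
      l7ilb   = "10.128.41.0/24" # reserved for L7 ILB proxy-only subnets
    }
    prod = {
      default = "10.128.64.0/24" # primary subnet, europe-west1
      psa     = "10.128.72.0/24"
      l7ilb   = "10.128.73.0/24"
    }
  }
}
```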
Routes in GCP are either automatically created for VPC subnets, manually created via static routes, or dynamically programmed by Cloud Routers via BGP sessions, which can be configured to advertise VPC ranges, and/or custom ranges via custom advertisements.
In this setup:
- routes between multiple subnets within the same VPC are automatically programmed by GCP
- on-premises is connected to each environment VPC and dynamically exchanges BGP routes with GCP using HA VPN
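Custom ranges advertised by the Cloud Routers towards on-prem can be tuned via the `custom_adv` variable, a simple name => range map. A hedged sketch (names and ranges below are illustrative only; check `variables.tf` for the defaults actually shipped with the stage):

```hcl
# terraform.tfvars - illustrative advertisement map, adapt names and ranges
custom_adv = {
  cloud_dns             = "35.199.192.0/19" # Cloud DNS forwarding range
  gcp_dev               = "10.128.32.0/19"  # aggregate covering dev subnets
  gcp_prod              = "10.128.64.0/19"  # aggregate covering prod subnets
  googleapis_private    = "199.36.153.8/30"
  googleapis_restricted = "199.36.153.4/30"
}
```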
Internet egress
The path of least resistance for Internet egress is using Cloud NAT, and that is what's implemented in this setup, with a NAT gateway configured for each VPC.
Several other scenarios are possible of course, with varying degrees of complexity:
- a forward proxy, with optional URL filters
- a default route to on-prem to leverage existing egress infrastructure
- a full-fledged perimeter firewall to control egress and implement additional security features like IPS
Future pluggable modules will make it easy to experiment with, or deploy, the above scenarios.
VPC and Hierarchical Firewall
The GCP Firewall is a stateful, distributed feature that allows the creation of L4 policies, either via VPC-level rules or more recently via hierarchical policies applied on the resource hierarchy (organization, folders).
The current setup adopts both firewall types, and uses hierarchical rules on the Networking folder for common ingress rules (egress is open by default), e.g. from health check or IAP forwarder ranges, and VPC rules for environment or workload-level ingress.
Rules and policies are defined in simple YAML files, described below.
DNS
DNS often goes hand in hand with networking, especially on GCP where Cloud DNS zones and policies are associated at the VPC level. This setup implements both DNS flows:
- on-prem to cloud via private zones for cloud-managed domains, and an inbound policy used as forwarding target or via delegation (requires some extra configuration) from on-prem DNS resolvers
- cloud to on-prem via forwarding zones for the on-prem managed domains
- Private Google Access is enabled for a selection of the supported domains, namely:
  - `private.googleapis.com`
  - `restricted.googleapis.com`
  - `gcr.io`
  - `packages.cloud.google.com`
  - `pkg.dev`
  - `pki.goog`
To complete the configuration, the `35.199.192.0/19` range should be routed on the VPN tunnels from on-prem, and the following names configured for DNS forwarding to cloud:
- `private.googleapis.com`
- `restricted.googleapis.com`
- `gcp.example.com` (used as a placeholder)
From cloud, the `example.com` domain (used as a placeholder) is forwarded to on-prem.
This configuration is battle-tested, and flexible enough to lend itself to simple modifications without subverting its design, for example by forwarding and peering root zones to bypass Cloud DNS external resolution.
How to run this stage
This stage is meant to be executed after the resman stage has run, as it leverages the automation service account and bucket created there, and additional resources configured in the bootstrap stage.
It's of course possible to run this stage in isolation, but that's outside the scope of this document, and you would need to refer to the code for the previous stages for the environmental requirements.
Before running this stage, you need to make sure you have the correct credentials and permissions, and localize variables by assigning values that match your configuration.
Providers configuration
The default way of making sure you have the right permissions is to use the identity of the service account pre-created for this stage during the resource management stage, and to be a member of the group that can impersonate it via provider-level configuration (`gcp-devops` or `organization-admins`).
To simplify setup, the previous stage pre-configures a valid providers file in its output, and optionally writes it to a local file if the `outputs_location` variable is set to a valid path.
If you have set a valid value for `outputs_location` in the bootstrap stage, simply link the relevant `providers.tf` file from this stage's folder in the path you specified:
```bash
# `outputs_location` is set to `~/fast-config`
ln -s ~/fast-config/providers/02-networking-providers.tf .
```
If you have not configured `outputs_location` in bootstrap, you can derive the providers file from that stage's outputs:

```bash
cd ../01-resman
terraform output -json providers | jq -r '.["02-networking"]' \
  > ../02-networking/providers.tf
```
Variable configuration
There are two broad sets of variables you will need to fill in:
- variables shared by other stages (org id, billing account id, etc.), or derived from a resource managed by a different stage (folder id, automation project id, etc.)
- variables specific to resources managed by this stage
To avoid the tedious job of filling in the first group of variables with values derived from other stages' outputs, the same mechanism used above for the provider configuration can be used to leverage pre-configured `.tfvars` files.
If you have set a valid value for `outputs_location` in the bootstrap and in the resman stage, simply link the relevant `terraform-*.auto.tfvars.json` files from this stage's folder in the path you specified, where the `*` above is set to the name of the stage that produced it. For this stage, a single `.tfvars` file is available:

```bash
# `outputs_location` is set to `~/fast-config`
ln -s ../../configs/example/02-networking/terraform-bootstrap.auto.tfvars.json
ln -s ../../configs/example/02-networking/terraform-resman.auto.tfvars.json
# also copy the tfvars file used for the bootstrap stage
cp ../00-bootstrap/terraform.tfvars .
```
A second set of variables is specific to this stage; they are all optional, so if you need to customize them, add them to the file copied from bootstrap.
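As a small, hedged example (values are placeholders), stage-specific customizations added to that file might look like this:

```hcl
# terraform.tfvars - optional stage-specific variables, values are placeholders
outputs_location = "~/fast-config" # where providers/tfvars for following stages are written
data_dir         = "data"          # folder holding subnet and firewall rule definitions
```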
Please refer to the Variables table below for a map of the variable origins, and to the sections below on how to adapt this stage to your networking configuration.
VPCs
VPCs are defined in separate files, one for each of `prod` and `dev`.
Each file contains the same resources, described in the following paragraphs.
The project (`project`) contains the VPC, enables the required APIs, and sets itself as a "host project".
The VPC (`net-vpc`) manages the DNS inbound policy, explicit routes for `{private,restricted}.googleapis.com`, and its subnets. Subnets are created leveraging a "resource factory" paradigm, where the configuration is separated from the module that implements it, and stored in a well-structured file. To add a new subnet, simply create a new file in the `data_folder` directory defined in the module, following the examples found in the Fabric `net-vpc` documentation. Sample subnets are shipped in `data/subnets`, and can be easily customised to fit your needs.
Subnets for L7 ILBs are handled differently, and defined in variable `l7ilb_subnets`, while ranges for PSA are configured by variable `psa_ranges` - such variables are consumed by the spoke VPCs.
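The authoritative schema for both variables is in `variables.tf`; as a rough, hypothetical sketch (attribute names and nesting below are illustrative, not the actual type definitions):

```hcl
# Hypothetical shapes only - check variables.tf for the real schemas.
psa_ranges = {
  dev  = { cloudsql-mysql = "10.128.62.0/24" }
  prod = { cloudsql-mysql = "10.128.94.0/24" }
}

l7ilb_subnets = {
  dev  = [{ ip_cidr_range = "10.128.60.0/24", region = "europe-west1" }]
  prod = [{ ip_cidr_range = "10.128.92.0/24", region = "europe-west1" }]
}
```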
Cloud NAT (`net-cloudnat`) manages the networking infrastructure required to enable Internet egress.
VPNs
Connectivity to on-prem is implemented with HA VPN (`net-vpn`) and defined in `vpn-onprem-{dev,prod}.tf`. Each file provisionally implements a single logical connection between on-prem and its environment in `europe-west1`; the relevant configuration parameters are found in variable `vpn_onprem_configs`.
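Its exact type is defined in `variables.tf`; very roughly, and with purely hypothetical attribute names, a per-environment entry might look like this:

```hcl
# Hypothetical sketch - attribute names and nesting are illustrative,
# the authoritative schema is the vpn_onprem_configs type in variables.tf.
vpn_onprem_configs = {
  dev-ew1 = {
    peer_external_gateway = {
      redundancy_type = "SINGLE_IP_INTERNALLY_REDUNDANT"
      interfaces      = [{ id = 0, ip_address = "8.8.8.8" }] # placeholder peer IP
    }
    tunnels = [{
      peer_asn              = 65534
      session_range         = "169.254.1.0/30"
      secret                = "foobar" # placeholder shared secret
      vpn_gateway_interface = 0
    }]
  }
}
```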
Routing and BGP
Each VPC network (`net-vpc`) manages a separate routing table, which can define static routes (e.g. to `private.googleapis.com`) and receives dynamic routes from BGP sessions established with neighbor networks (e.g. from on-prem).
Static routes are defined in the `net-*.tf` files, in the `routes` section of each `net-vpc` module.
Firewall
VPC firewall rules (`net-vpc-firewall`) are defined per VPC in each `net-*.tf` file and leverage a resource factory to create rules in bulk.
To add a new firewall rule, create a new file or edit an existing one in the `data_folder` directory defined in the `net-vpc-firewall` module, following the examples in the "Rules factory" section of the module documentation. Sample firewall rules are shipped in `data/firewall-rules/dev` and can be easily customised.
Hierarchical firewall policies (`folder`) are defined in `main.tf` and managed through a policy factory implemented by the `folder` module, which applies the defined hierarchical policies to the `Networking` folder containing all the core networking infrastructure. Policies are defined in the `rules_file` file - to define a new one simply follow the instructions in the "Firewall policy factory" documentation. Sample hierarchical firewall policies are shipped in `data/hierarchical-policy-rules.yaml` and can be easily customised.
DNS architecture
The DNS (`dns`) infrastructure is defined in the respective `dns-xxx.tf` files.
Cloud DNS manages on-prem forwarding and environment-specific zones (i.e. `dev.gcp.example.com` and `prod.gcp.example.com`).
Cloud to on-prem
Leveraging the forwarding zones defined in each environment, the cloud environment can resolve `in-addr.arpa.` and `onprem.example.com.` using the on-premises DNS infrastructure. On-prem resolver IPs are set in variable `dns`.
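Assuming the variable is keyed by environment (its type is `map(list(string))`), a placeholder configuration might look like this:

```hcl
# terraform.tfvars - on-prem resolver addresses, values are placeholders
dns = {
  dev  = ["10.10.10.2", "10.10.10.3"]
  prod = ["10.20.10.2", "10.20.10.3"]
}
```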
DNS queries sent to the on-premises infrastructure come from the `35.199.192.0/19` source range, which is only accessible from within a VPC or networks connected to one.
When implementing this architecture, make sure you'll be able to route packets coming from the /19 range back to the right environment (requests originating in prod routed back to prod, and requests originating in dev back to dev). As an alternative, consider leveraging self-managed DNS resolvers (e.g. CoreDNS forwarders) in each environment.
On-prem to cloud
The Inbound DNS Policy defined on each VPC automatically reserves the first available IP address on each created subnet (typically the third one in a CIDR) to expose the Cloud DNS service so that it can be consumed from outside of GCP.
Private Google Access
Private Google Access (or PGA) enables VMs and on-prem systems to consume Google APIs from within the Google network, and is already fully configured on this environment.
For PGA to work:
- Private Google Access should be enabled on the subnet. Subnets created by the `net-vpc` module are PGA-enabled by default.
- `199.36.153.4/30` (`restricted.googleapis.com`) and `199.36.153.8/30` (`private.googleapis.com`) should be routed from on-prem to the VPC, and from there to the `default-internet-gateway`. Per variable `vpn_onprem_configs` such ranges are advertised to on-prem; furthermore, every VPC has explicit routes set in case the `0.0.0.0/0` route is changed.
- A private DNS zone for `googleapis.com` should be created and configured per this article, as implemented in module `googleapis-private-zone` in `dns-xxx.tf`.
Preliminary activities
Before running `terraform apply` on this stage, make sure to adapt all of `variables.tf` to your needs, and to update all references to regions (e.g. `europe-west1` or `ew1`) in the whole directory to match your preferences.
If you're not using FAST, you'll also need to create a `providers.tf` file to configure the GCS backend and the service account used to run the deployment.
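A minimal sketch of such a file, assuming an existing GCS bucket for state and service account impersonation (bucket and account names are placeholders):

```hcl
# providers.tf - placeholder bucket and service account names
terraform {
  backend "gcs" {
    bucket = "my-tf-state-bucket"
    prefix = "02-networking"
  }
}

provider "google" {
  impersonate_service_account = "networking-automation@my-prj.iam.gserviceaccount.com"
}

provider "google-beta" {
  impersonate_service_account = "networking-automation@my-prj.iam.gserviceaccount.com"
}
```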
You're now ready to run `terraform init` and `apply`.
Post-deployment activities
- On-prem routers should be configured to advertise all relevant CIDRs to the GCP environments. To avoid hitting GCP quotas, we recommend aggregating routes as much as possible.
- On-prem routers should accept BGP sessions from their cloud peers.
- On-prem DNS servers should have forwarding zones for the GCP-managed domains.
Files
name | description | modules | resources
---|---|---|---
dns-dev.tf | Development spoke DNS zones and peerings setup. | dns | 
dns-prod.tf | Production spoke DNS zones and peerings setup. | dns | 
main.tf | Networking folder and hierarchical policy. | folder | 
monitoring.tf | Network monitoring dashboards. | | google_monitoring_dashboard
outputs.tf | Module outputs. | | google_storage_bucket_object · local_file
spoke-dev.tf | Dev spoke VPC and related resources. | net-cloudnat · net-vpc · net-vpc-firewall · project | google_project_iam_binding
spoke-prod.tf | Production spoke VPC and related resources. | net-cloudnat · net-vpc · net-vpc-firewall · project | google_project_iam_binding
test-resources.tf | Temporary instances for testing. | compute-vm | 
variables.tf | Module variables. | | 
vpn-onprem-dev.tf | VPN between dev and onprem. | net-vpn-ha | 
vpn-onprem-prod.tf | VPN between prod and onprem. | net-vpn-ha | 
Variables
name | description | type | required | default | producer
---|---|---|---|---|---
automation | Automation resources created by the bootstrap stage. | object({…}) | ✓ | | 00-bootstrap
billing_account | Billing account id and organization id ('nnnnnnnn' or null). | object({…}) | ✓ | | 00-bootstrap
folder_ids | Folders to be used for the networking resources in folders/nnnnnnnnnnn format. If null, folder will be created. | object({…}) | ✓ | | 01-resman
organization | Organization details. | object({…}) | ✓ | | 00-bootstrap
prefix | Prefix used for resources that need unique names. Use 9 characters or less. | string | ✓ | | 00-bootstrap
custom_adv | Custom advertisement definitions in name => range format. | map(string) | | {…} | 
custom_roles | Custom roles defined at the org level, in key => id format. | object({…}) | | null | 00-bootstrap
data_dir | Relative path for the folder storing configuration data for network resources. | string | | "data" | 
dns | Onprem DNS resolvers. | map(list(string)) | | {…} | 
l7ilb_subnets | Subnets used for L7 ILBs. | map(list(object({…}))) | | {…} | 
outputs_location | Path where providers and tfvars files for the following stages are written. Leave empty to disable. | string | | null | 
psa_ranges | IP ranges used for Private Service Access (e.g. CloudSQL). | object({…}) | | null | 
router_onprem_configs | Configurations for routers used for onprem connectivity. | map(object({…})) | | {…} | 
service_accounts | Automation service accounts in name => email format. | object({…}) | | null | 01-resman
vpn_onprem_configs | VPN gateway configuration for onprem interconnection. | map(object({…})) | | {…} | 
Outputs
name | description | sensitive | consumers |
---|---|---|---|
dev_cloud_dns_inbound_policy | IP Addresses for Cloud DNS inbound policy for the dev environment. | | 
host_project_ids | Network project ids. | | 
host_project_numbers | Network project numbers. | | 
prod_cloud_dns_inbound_policy | IP Addresses for Cloud DNS inbound policy for the prod environment. | | 
shared_vpc_self_links | Shared VPC host projects. | | 
tfvars | Terraform variables file for the following stages. | ✓ | 
vpn_gateway_endpoints | External IP Addresses for the GCP VPN gateways. | | 