Add Data Platform to FAST (#510)
* Import Fast from dev repository.
* merge tools changes
* add boilerplate to validate_schema
* stage 02-security
* Copy FAST top level README
* TODO list
* fix linting action to account for fast
* remove providers file
* add missing boilerplate
* update factory README
* align examples tfdoc
* fast readmes tfdoc
* disable markdown link check
* really disable markdown link check
* update TODO
* switch to local module refs in stage0
* replace module refs in 02-sec
* Move first draft to fast branch
* Fix roles and variables. Add e2e DAG example!
* Fix example
* Fix KMS
* First draft: README
* Update README
* Add DLP, update README
* Add todos
* Merge master
* Fix and test KMS, fix and test existing prj (it works also with a single prj), update README
* Fix README and demo
* add on TF files
* Remove block comments
* simplify service_encryption_keys logic
* Fix TODOs
* fix tfdoc description
* fix demo README
* fix sample files
* rename tf files
* Fix outputs file name, fix README, remove dependencies on composer resource
* Add test.
* Initial README update
* README review
* Fix issues & readme
* Fix test error
* Add datacatalog
* Fix test, for real? :-)
* support policy_boolean
* split Cloud NAT flag
* Fix Shared VPC, first try :-)
* Fix tests and resource name
* README refactor
* Fix secondary range logic
* First commit
* Replace existing data platform
* Replace DP example tests with the new one.
* Fix test module location.
* Fix test module location, for real.
* Support DataPlatform project in VPC-SC
* Fix VPC-SC
* Add TODO, VPC-SC
* Possible improvement to handle VPC-SC perimeter projects with folder as variable
* Fix module path
* Initial fix for KMS
* Add PubSub encryption
* Update READMEs
* Fix composer roles and README.
* Fixes.
* Add DLP documentation link.
* Temp commit with errors
* Refactor variables
* rebase
* Fix new variables
* Fix misconfiguration and tests.
* Revert VPC-SC logic
* Fix variable typos
* README fixes
* Fix Project Name logic
* Fix linting
* update README
* mandatory project creation, refactor
* formatting
* add TODO for service accounts descriptive name
* use project module to assign shared vpc roles
* Fix shared-vpc-project module
* Fix vpc name and tests
* update to newer version

Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Julio Castillo <juliocc@users.noreply.github.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>
This commit is contained in:
parent 9076c2f2b0
commit bf64a3dfda
@@ -5,7 +5,7 @@ This section contains **[foundational examples](./foundations/)** that bootstrap
 Currently available examples:
 
 - **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](./cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Granular Cloud DNS IAM for Shared VPC](./cloud-operations/dns-shared-vpc), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Packer image builder](./cloud-operations/packer-image-builder), [On-prem SA key management](./cloud-operations/onprem-sa-key-management)
-- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/gcs-to-bq-with-least-privileges/), [Cloud Storage to Bigquery with Cloud Dataflow with least privileges](./data-solutions/gcs-to-bq-with-least-privileges/)
+- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/gcs-to-bq-with-least-privileges/), [Cloud Storage to Bigquery with Cloud Dataflow with least privileges](./data-solutions/gcs-to-bq-with-least-privileges/), [Data Platform Foundations](./data-solutions/data-platform-foundations/)
 - **factories** - [The why and the how of resource factories](./factories/README.md)
 - **foundations** - [single level hierarchy](./foundations/environments/) (environments), [multiple level hierarchy](./foundations/business-units/) (business units + environments)
 - **networking** - [hub and spoke via peering](./networking/hub-and-spoke-peering/), [hub and spoke via VPN](./networking/hub-and-spoke-vpn/), [DNS and Google Private Access for on-premises](./networking/onprem-google-access-dns/), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [ILB as next hop](./networking/ilb-next-hop), [PSC for on-premises Cloud Function invocation](./networking/private-cloud-function-from-onprem/), [decentralized firewall](./networking/decentralized-firewall)
@@ -18,6 +18,6 @@ They are meant to be used as minimal but complete starting points to create actu
 ### Data Platform Foundations
 
-<a href="./data-platform-foundations/" title="Data Platform Foundations"><img src="./data-platform-foundations/02-resources/diagram.png" align="left" width="280px"></a>
+<a href="./data-platform-foundations/" title="Data Platform Foundations"><img src="./data-platform-foundations/images/overview_diagram.png" align="left" width="280px"></a>
 This [example](./data-platform-foundations/) implements a robust and flexible Data Foundation on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.
 <br clear="left">
@@ -1,72 +0,0 @@
# Data Platform Foundations - Environment (Step 1)

This is the first step needed to deploy Data Platform Foundations, which creates projects and service accounts. Please refer to the [top-level Data Platform README](../README.md) for prerequisites.

The projects that will be created are:

- Common services
- Landing
- Orchestration & Transformation
- DWH
- Datamart

A main service account named `projects-editor-sa` will be created under the common services project, and it will be granted editor permissions on all the projects in scope.

This is a high-level diagram of the created resources:

![Environment - Phase 1](./diagram.png "High-level Environment diagram")

## Running the example

To create the infrastructure:

- specify your variables in a `terraform.tfvars` file

```hcl
billing_account_id = "1234-1234-1234"
root_node          = "folders/12345678"
admins             = ["user:xxxxx@yyyyy.com"]
```

- make sure you have the right authentication set up (application default credentials, or a service account key) with the right permissions
- **The output of this stage contains the values for the resources stage**
- the `admins` variable contains a list of principals allowed to impersonate the service accounts; these principals will be granted the `iam.serviceAccountTokenCreator` role
- run `terraform init` and `terraform apply`

Once done testing, you can clean up resources by running `terraform destroy`.
### CMEK configuration

You can configure GCP resources to use existing CMEK keys by setting the `service_encryption_key_ids` variable. You need to specify a `global` and a `multiregional` key.
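As an illustration, the variable can be set in `terraform.tfvars` as below; the project, key ring and key names are placeholders, not values taken from this example:

```hcl
# Hypothetical key URLs: point these at keys in your own, pre-existing KMS project.
service_encryption_key_ids = {
  global        = "projects/my-kms-project/locations/global/keyRings/my-kr/cryptoKeys/key-global"
  multiregional = "projects/my-kms-project/locations/europe/keyRings/my-kr/cryptoKeys/key-eu"
}
```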
### VPC-SC configuration

You can assign projects to an existing VPC-SC standard perimeter by setting the `service_perimeter_standard` variable. You can retrieve the list of existing perimeters from the GCP console, or with the following command:

```
gcloud access-context-manager perimeters list --format="json" | grep name
```

The script uses the `google_access_context_manager_service_perimeter_resource` Terraform resource. If this resource is used alongside the `vpc-sc` module, remember to uncomment the lifecycle block in the `vpc-sc` module so they don't fight over which resources should be in the perimeter.
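For reference, the lifecycle block mentioned above is typically shaped like the sketch below; the exact attribute path is an assumption, so check the actual `vpc-sc` module source before relying on it:

```hcl
resource "google_access_context_manager_service_perimeter" "default" {
  # ... perimeter configuration ...
  lifecycle {
    # Assumed attribute path: let the standalone
    # google_access_context_manager_service_perimeter_resource entries manage
    # project membership instead of this resource.
    ignore_changes = [status[0].resources]
  }
}
```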
<!-- BEGIN TFDOC -->

## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [billing_account_id](variables.tf#L21) | Billing account id. | <code>string</code> | ✓ | |
| [root_node](variables.tf#L50) | Parent folder or organization in 'folders/folder_id' or 'organizations/org_id' format. | <code>string</code> | ✓ | |
| [admins](variables.tf#L15) | List of users allowed to impersonate the service account. | <code>list(string)</code> | | <code>null</code> |
| [prefix](variables.tf#L26) | Prefix used to generate project id and name. | <code>string</code> | | <code>null</code> |
| [project_names](variables.tf#L32) | Override this variable if you need non-standard names. | <code title="object({ datamart = string dwh = string landing = string services = string transformation = string })">object({…})</code> | | <code title="{ datamart = "datamart" dwh = "datawh" landing = "landing" services = "services" transformation = "transformation" }">{…}</code> |
| [service_account_names](variables.tf#L55) | Override this variable if you need non-standard names. | <code title="object({ main = string })">object({…})</code> | | <code title="{ main = "data-platform-main" }">{…}</code> |
| [service_encryption_key_ids](variables.tf#L65) | Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project. | <code title="object({ multiregional = string global = string })">object({…})</code> | | <code title="{ multiregional = null global = null }">{…}</code> |
| [service_perimeter_standard](variables.tf#L78) | VPC Service control standard perimeter name in the form of 'accessPolicies/ACCESS_POLICY_NAME/servicePerimeters/PERIMETER_NAME'. All projects will be added to the perimeter in enforced mode. | <code>string</code> | | <code>null</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [project_ids](outputs.tf#L17) | Project ids for created projects. | |
| [service_account](outputs.tf#L28) | Main service account. | |
| [service_encryption_key_ids](outputs.tf#L33) | Cloud KMS encryption keys in {LOCATION => [KEY_URL]} format. | |

<!-- END TFDOC -->
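Since `root_node` accepts only the two formats listed above, a `validation` block could be added to the variable to fail fast on malformed input; this is a sketch, not part of the original module:

```hcl
variable "root_node" {
  description = "Parent folder or organization in 'folders/folder_id' or 'organizations/org_id' format."
  type        = string
  validation {
    # Accept only 'folders/<numeric id>' or 'organizations/<numeric id>'.
    condition     = can(regex("^(folders|organizations)/[0-9]+$", var.root_node))
    error_message = "root_node must be in 'folders/folder_id' or 'organizations/org_id' format."
  }
}
```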
Binary file not shown (before: 275 KiB).
@@ -1,162 +0,0 @@
/**
 * Copyright 2020 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

###############################################################################
#                                  projects                                   #
###############################################################################

module "project-datamart" {
  source          = "../../../../modules/project"
  parent          = var.root_node
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = var.project_names.datamart
  services = [
    "bigquery.googleapis.com",
    "bigquerystorage.googleapis.com",
    "bigqueryreservation.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com",
  ]
  iam_additive = {
    "roles/owner" = [module.sa-services-main.iam_email]
  }
  service_encryption_key_ids = {
    bq      = [var.service_encryption_key_ids.multiregional]
    storage = [var.service_encryption_key_ids.multiregional]
  }
  # If used, remember to uncomment 'lifecycle' block in the
  # modules/vpc-sc/google_access_context_manager_service_perimeter resource.
  service_perimeter_standard = var.service_perimeter_standard
}

module "project-dwh" {
  source          = "../../../../modules/project"
  parent          = var.root_node
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = var.project_names.dwh
  services = [
    "bigquery.googleapis.com",
    "bigquerystorage.googleapis.com",
    "bigqueryreservation.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com",
  ]
  iam_additive = {
    "roles/owner" = [module.sa-services-main.iam_email]
  }
  service_encryption_key_ids = {
    bq      = [var.service_encryption_key_ids.multiregional]
    storage = [var.service_encryption_key_ids.multiregional]
  }
  # If used, remember to uncomment 'lifecycle' block in the
  # modules/vpc-sc/google_access_context_manager_service_perimeter resource.
  service_perimeter_standard = var.service_perimeter_standard
}

module "project-landing" {
  source          = "../../../../modules/project"
  parent          = var.root_node
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = var.project_names.landing
  services = [
    "pubsub.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com",
  ]
  iam_additive = {
    "roles/owner" = [module.sa-services-main.iam_email]
  }
  service_encryption_key_ids = {
    pubsub  = [var.service_encryption_key_ids.global]
    storage = [var.service_encryption_key_ids.multiregional]
  }
  # If used, remember to uncomment 'lifecycle' block in the
  # modules/vpc-sc/google_access_context_manager_service_perimeter resource.
  service_perimeter_standard = var.service_perimeter_standard
}

module "project-services" {
  source          = "../../../../modules/project"
  parent          = var.root_node
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = var.project_names.services
  services = [
    "bigquery.googleapis.com",
    "cloudresourcemanager.googleapis.com",
    "iam.googleapis.com",
    "pubsub.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com",
    "sourcerepo.googleapis.com",
    "stackdriver.googleapis.com",
    "cloudasset.googleapis.com",
    "cloudkms.googleapis.com"
  ]
  iam_additive = {
    "roles/owner" = [module.sa-services-main.iam_email]
  }
  service_encryption_key_ids = {
    storage = [var.service_encryption_key_ids.multiregional]
  }
  # If used, remember to uncomment 'lifecycle' block in the
  # modules/vpc-sc/google_access_context_manager_service_perimeter resource.
  service_perimeter_standard = var.service_perimeter_standard
}

module "project-transformation" {
  source          = "../../../../modules/project"
  parent          = var.root_node
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = var.project_names.transformation
  services = [
    "bigquery.googleapis.com",
    "cloudbuild.googleapis.com",
    "compute.googleapis.com",
    "dataflow.googleapis.com",
    "servicenetworking.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com",
  ]
  iam_additive = {
    "roles/owner" = [module.sa-services-main.iam_email]
  }
  service_encryption_key_ids = {
    compute  = [var.service_encryption_key_ids.global]
    storage  = [var.service_encryption_key_ids.multiregional]
    dataflow = [var.service_encryption_key_ids.global]
  }
  # If used, remember to uncomment 'lifecycle' block in the
  # modules/vpc-sc/google_access_context_manager_service_perimeter resource.
  service_perimeter_standard = var.service_perimeter_standard
}

###############################################################################
#                              service accounts                               #
###############################################################################

module "sa-services-main" {
  source     = "../../../../modules/iam-service-account"
  project_id = module.project-services.project_id
  name       = var.service_account_names.main
  iam        = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}
@@ -1,36 +0,0 @@
/**
 * Copyright 2020 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

output "project_ids" {
  description = "Project ids for created projects."
  value = {
    datamart       = module.project-datamart.project_id
    dwh            = module.project-dwh.project_id
    landing        = module.project-landing.project_id
    services       = module.project-services.project_id
    transformation = module.project-transformation.project_id
  }
}

output "service_account" {
  description = "Main service account."
  value       = module.sa-services-main.email
}

output "service_encryption_key_ids" {
  description = "Cloud KMS encryption keys in {LOCATION => [KEY_URL]} format."
  value       = var.service_encryption_key_ids
}
@@ -1,82 +0,0 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

variable "admins" {
  description = "List of users allowed to impersonate the service account."
  type        = list(string)
  default     = null
}

variable "billing_account_id" {
  description = "Billing account id."
  type        = string
}

variable "prefix" {
  description = "Prefix used to generate project id and name."
  type        = string
  default     = null
}

variable "project_names" {
  description = "Override this variable if you need non-standard names."
  type = object({
    datamart       = string
    dwh            = string
    landing        = string
    services       = string
    transformation = string
  })
  default = {
    datamart       = "datamart"
    dwh            = "datawh"
    landing        = "landing"
    services       = "services"
    transformation = "transformation"
  }
}

variable "root_node" {
  description = "Parent folder or organization in 'folders/folder_id' or 'organizations/org_id' format."
  type        = string
}

variable "service_account_names" {
  description = "Override this variable if you need non-standard names."
  type = object({
    main = string
  })
  default = {
    main = "data-platform-main"
  }
}

variable "service_encryption_key_ids" {
  description = "Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project."
  type = object({
    multiregional = string
    global        = string
  })
  default = {
    multiregional = null
    global        = null
  }
}

variable "service_perimeter_standard" {
  description = "VPC Service control standard perimeter name in the form of 'accessPolicies/ACCESS_POLICY_NAME/servicePerimeters/PERIMETER_NAME'. All projects will be added to the perimeter in enforced mode."
  type        = string
  default     = null
}
@@ -0,0 +1,139 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description land project and resources.

locals {
  land_orch_service_accounts = [
    module.load-sa-df-0.iam_email, module.orch-sa-cmp-0.iam_email
  ]
}

module "land-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "lnd"
  group_iam = {
    (local.groups.data-engineers) = [
      "roles/bigquery.dataEditor",
      "roles/pubsub.editor",
      "roles/storage.admin",
      "roles/storage.objectViewer",
    ]
  }
  iam = {
    "roles/bigquery.dataEditor"   = [module.land-sa-bq-0.iam_email]
    "roles/bigquery.dataViewer"   = local.land_orch_service_accounts
    "roles/bigquery.jobUser"      = [module.orch-sa-cmp-0.iam_email]
    "roles/bigquery.user"         = [module.load-sa-df-0.iam_email]
    "roles/pubsub.publisher"      = [module.land-sa-ps-0.iam_email]
    "roles/pubsub.subscriber"     = local.land_orch_service_accounts
    "roles/storage.objectAdmin"   = [module.load-sa-df-0.iam_email]
    "roles/storage.objectCreator" = [module.land-sa-cs-0.iam_email]
    "roles/storage.objectViewer"  = [module.orch-sa-cmp-0.iam_email]
    "roles/storage.admin"         = [module.load-sa-df-0.iam_email]
  }
  services = concat(var.project_services, [
    "bigquery.googleapis.com",
    "bigqueryreservation.googleapis.com",
    "bigquerystorage.googleapis.com",
    "cloudkms.googleapis.com",
    "pubsub.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com",
  ])
  service_encryption_key_ids = {
    bq      = [try(local.service_encryption_keys.bq, null)]
    pubsub  = [try(local.service_encryption_keys.pubsub, null)]
    storage = [try(local.service_encryption_keys.storage, null)]
  }
}

# Cloud Storage

module "land-sa-cs-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.land-project.project_id
  prefix     = var.prefix
  name       = "lnd-cs-0"
  # TODO: descriptive name
  display_name = "TODO"
  iam = {
    "roles/iam.serviceAccountTokenCreator" = [
      local.groups_iam.data-engineers
    ]
  }
}

module "land-cs-0" {
  source         = "../../../modules/gcs"
  project_id     = module.land-project.project_id
  prefix         = var.prefix
  name           = "lnd-cs-0"
  location       = var.region
  storage_class  = "REGIONAL"
  encryption_key = try(local.service_encryption_keys.storage, null)
  force_destroy  = var.data_force_destroy
  # retention_policy = {
  #   retention_period = 7776000 # 90 * 24 * 60 * 60
  #   is_locked        = false
  # }
}

# PubSub

module "land-sa-ps-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.land-project.project_id
  prefix     = var.prefix
  name       = "lnd-ps-0"
  # TODO: descriptive name
  display_name = "TODO"
  iam = {
    "roles/iam.serviceAccountTokenCreator" = [
      local.groups_iam.data-engineers
    ]
  }
}

module "land-ps-0" {
  source     = "../../../modules/pubsub"
  project_id = module.land-project.project_id
  name       = "${var.prefix}-lnd-ps-0"
  kms_key    = try(local.service_encryption_keys.pubsub, null)
}

# BigQuery

module "land-sa-bq-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.land-project.project_id
  prefix     = var.prefix
  name       = "lnd-bq-0"
  # TODO: descriptive name
  display_name = "TODO"
  iam = {
    "roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
  }
}

module "land-bq-0" {
  source         = "../../../modules/bigquery-dataset"
  project_id     = module.land-project.project_id
  id             = "${replace(var.prefix, "-", "_")}lnd_bq_0"
  location       = var.region
  encryption_key = try(local.service_encryption_keys.bq, null)
}
@ -0,0 +1,148 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Load project and VPC.

locals {
  load_service_accounts = [
    "serviceAccount:${module.load-project.service_accounts.robots.dataflow}",
    module.load-sa-df-0.iam_email
  ]
  load_subnet = (
    local.use_shared_vpc
    ? var.network_config.subnet_self_links.orchestration
    : values(module.load-vpc.0.subnet_self_links)[0]
  )
  load_vpc = (
    local.use_shared_vpc
    ? var.network_config.network_self_link
    : module.load-vpc.0.self_link
  )
}

# Project

module "load-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "lod"
  group_iam = {
    (local.groups.data-engineers) = [
      "roles/compute.viewer",
      "roles/dataflow.admin",
      "roles/dataflow.developer",
      "roles/viewer",
    ]
  }
  iam = {
    "roles/bigquery.jobUser" = [module.load-sa-df-0.iam_email]
    "roles/dataflow.admin" = [
      module.orch-sa-cmp-0.iam_email, module.load-sa-df-0.iam_email
    ]
    "roles/dataflow.worker"     = [module.load-sa-df-0.iam_email]
    "roles/storage.objectAdmin" = local.load_service_accounts
    # TODO: are these needed on the shared VPC?
    # "roles/compute.serviceAgent" = [
    #   "serviceAccount:${module.load-project.service_accounts.robots.compute}"
    # ]
    # "roles/dataflow.serviceAgent" = [
    #   "serviceAccount:${module.load-project.service_accounts.robots.dataflow}"
    # ]
  }
  services = concat(var.project_services, [
    "bigquery.googleapis.com",
    "bigqueryreservation.googleapis.com",
    "bigquerystorage.googleapis.com",
    "cloudkms.googleapis.com",
    "compute.googleapis.com",
    "dataflow.googleapis.com",
    "dlp.googleapis.com",
    "pubsub.googleapis.com",
    "servicenetworking.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com"
  ])
  service_encryption_key_ids = {
    pubsub   = [try(local.service_encryption_keys.pubsub, null)]
    dataflow = [try(local.service_encryption_keys.dataflow, null)]
    storage  = [try(local.service_encryption_keys.storage, null)]
  }
  shared_vpc_service_config = local.shared_vpc_project == null ? null : {
    attach               = true
    host_project         = local.shared_vpc_project
    service_identity_iam = {}
    # service_identity_iam = {
    #   "compute.networkUser" = ["dataflow"]
    # }
  }
}

module "load-sa-df-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.load-project.project_id
  prefix     = var.prefix
  name       = "load-df-0"
  # TODO: descriptive name
  display_name = "TODO"
  iam = {
    "roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
    "roles/iam.serviceAccountUser"         = [module.orch-sa-cmp-0.iam_email]
  }
}

module "load-cs-df-0" {
  source         = "../../../modules/gcs"
  project_id     = module.load-project.project_id
  prefix         = var.prefix
  name           = "load-cs-0"
  storage_class  = "REGIONAL"
  location       = var.region
  encryption_key = try(local.service_encryption_keys.storage, null)
}

# internal VPC resources

module "load-vpc" {
  source     = "../../../modules/net-vpc"
  count      = local.use_shared_vpc ? 0 : 1
  project_id = module.load-project.project_id
  name       = "${var.prefix}-default"
  subnets = [
    {
      ip_cidr_range      = "10.10.0.0/24"
      name               = "default"
      region             = var.region
      secondary_ip_range = {}
    }
  ]
}

module "load-vpc-firewall" {
  source       = "../../../modules/net-vpc-firewall"
  count        = local.use_shared_vpc ? 0 : 1
  project_id   = module.load-project.project_id
  network      = module.load-vpc.0.name
  admin_ranges = ["10.10.0.0/24"]
}

module "load-nat" {
  source         = "../../../modules/net-cloudnat"
  count          = local.use_shared_vpc ? 0 : 1
  project_id     = module.load-project.project_id
  name           = "${var.prefix}-default"
  region         = var.region
  router_network = module.load-vpc.0.name
}
@ -1,83 +0,0 @@
# Data Platform Foundations - Resources (Step 2)

This is the second step needed to deploy Data Platform Foundations, which creates the resources needed to store and process the data in the projects created in the [previous step](../01-environment/README.md). Please refer to the [top-level README](../README.md) for prerequisites and instructions on running the first step.

![Data Foundation - Phase 2](./diagram.png "High-level diagram")

The resources created in each project are:

- Common
- Landing
  - [x] GCS
  - [x] Pub/Sub
- Orchestration & Transformation
  - [x] Dataflow
- DWH
  - [x] BigQuery (L0/1/2)
  - [x] GCS
- Datamart
  - [x] BigQuery (views/tables)
  - [x] GCS
  - [ ] BigTable

## Running the example

In the previous step, we created the environment (projects and service account) that we are going to use in this step.

To create the resources, copy the output of the environment step (**project_ids**) and paste it into `terraform.tfvars`:

- Specify your variables in a `terraform.tfvars` file; you can use the output from the environment stage

```hcl
project_ids = {
  datamart       = "datamart-project_id"
  dwh            = "dwh-project_id"
  landing        = "landing-project_id"
  services       = "services-project_id"
  transformation = "transformation-project_id"
}
```

- The `providers.tf` file has been configured to impersonate the **main** service account
- To run Terraform:

```bash
terraform plan
terraform apply
```

Once done testing, you can clean up resources by running `terraform destroy`.

### CMEK configuration

You can configure GCP resources to use existing CMEK keys by setting the `service_encryption_key_ids` variable. You need to specify both a `global` and a `multiregional` key.

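As a minimal sketch, the CMEK keys can be passed in `terraform.tfvars`; the project, keyring, and key names below are hypothetical placeholders, not values from this example:

```hcl
# Hypothetical Cloud KMS key IDs; replace with keys from your own KMS project.
service_encryption_key_ids = {
  global        = "projects/my-kms-project/locations/global/keyRings/my-keyring/cryptoKeys/key-global"
  multiregional = "projects/my-kms-project/locations/europe/keyRings/my-keyring/cryptoKeys/key-eu"
}
```

Both keys must already exist, and the relevant service agents need `roles/cloudkms.cryptoKeyEncrypterDecrypter` on them.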
<!-- BEGIN TFDOC -->

## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [project_ids](variables.tf#L108) | Project IDs. | <code title="object({ datamart = string dwh = string landing = string services = string transformation = string })">object({…})</code> | ✓ |  |
| [admins](variables.tf#L16) | List of users allowed to impersonate the service account. | <code>list(string)</code> |  | <code>null</code> |
| [datamart_bq_datasets](variables.tf#L22) | Datamart Bigquery datasets. | <code title="map(object({ iam = map(list(string)) location = string }))">map(object({…}))</code> |  | <code title="{ bq_datamart_dataset = { location = "EU" iam = { } } }">{…}</code> |
| [dwh_bq_datasets](variables.tf#L40) | DWH Bigquery datasets. | <code title="map(object({ location = string iam = map(list(string)) }))">map(object({…}))</code> |  | <code title="{ bq_raw_dataset = { iam = {} location = "EU" } }">{…}</code> |
| [landing_buckets](variables.tf#L54) | List of landing buckets to create. | <code title="map(object({ location = string name = string }))">map(object({…}))</code> |  | <code title="{ raw-data = { location = "EU" name = "raw-data" } data-schema = { location = "EU" name = "data-schema" } }">{…}</code> |
| [landing_pubsub](variables.tf#L72) | List of landing pubsub topics and subscriptions to create. | <code title="map(map(object({ iam = map(list(string)) labels = map(string) options = object({ ack_deadline_seconds = number message_retention_duration = number retain_acked_messages = bool expiration_policy_ttl = number }) })))">map(map(object({…})))</code> |  | <code title="{ landing-1 = { sub1 = { iam = { } labels = {} options = null } sub2 = { iam = {} labels = {}, options = null }, } }">{…}</code> |
| [landing_service_account](variables.tf#L102) | Landing service account name. | <code>string</code> |  | <code>"sa-landing"</code> |
| [service_account_names](variables.tf#L119) | Project service account names. | <code title="object({ datamart = string dwh = string landing = string services = string transformation = string })">object({…})</code> |  | <code title="{ datamart = "sa-datamart" dwh = "sa-datawh" landing = "sa-landing" services = "sa-services" transformation = "sa-transformation" }">{…}</code> |
| [service_encryption_key_ids](variables.tf#L137) | Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project. | <code title="object({ multiregional = string global = string })">object({…})</code> |  | <code title="{ multiregional = null global = null }">{…}</code> |
| [transformation_buckets](variables.tf#L149) | List of transformation buckets to create. | <code title="map(object({ location = string name = string }))">map(object({…}))</code> |  | <code title="{ temp = { location = "EU" name = "temp" }, templates = { location = "EU" name = "templates" }, }">{…}</code> |
| [transformation_subnets](variables.tf#L167) | List of subnets to create in the transformation Project. | <code title="list(object({ ip_cidr_range = string name = string region = string secondary_ip_range = map(string) }))">list(object({…}))</code> |  | <code title="[ { ip_cidr_range = "10.1.0.0/20" name = "transformation-subnet" region = "europe-west3" secondary_ip_range = {} }, ]">[…]</code> |
| [transformation_vpc_name](variables.tf#L185) | Name of the VPC created in the transformation Project. | <code>string</code> |  | <code>"transformation-vpc"</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [datamart-datasets](outputs.tf#L17) | List of bigquery datasets created for the datamart project. |  |
| [dwh-datasets](outputs.tf#L24) | List of bigquery datasets created for the dwh project. |  |
| [landing-buckets](outputs.tf#L29) | List of buckets created for the landing project. |  |
| [landing-pubsub](outputs.tf#L34) | List of pubsub topics and subscriptions created for the landing project. |  |
| [transformation-buckets](outputs.tf#L44) | List of buckets created for the transformation project. |  |
| [transformation-vpc](outputs.tf#L49) | Transformation VPC details. |  |

<!-- END TFDOC -->
Binary file not shown.
@ -1,211 +0,0 @@
/**
 * Copyright 2020 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

###############################################################################
#                                    IAM                                      #
###############################################################################

module "datamart-sa" {
  source     = "../../../../modules/iam-service-account"
  project_id = var.project_ids.datamart
  name       = var.service_account_names.datamart
  iam_project_roles = {
    "${var.project_ids.datamart}" = ["roles/editor"]
  }
  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}

module "dwh-sa" {
  source     = "../../../../modules/iam-service-account"
  project_id = var.project_ids.dwh
  name       = var.service_account_names.dwh
  iam_project_roles = {
    "${var.project_ids.dwh}" = ["roles/bigquery.admin"]
  }
  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}

module "landing-sa" {
  source     = "../../../../modules/iam-service-account"
  project_id = var.project_ids.landing
  name       = var.service_account_names.landing
  iam_project_roles = {
    "${var.project_ids.landing}" = [
      "roles/pubsub.publisher",
      "roles/storage.objectCreator"
    ]
  }
  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}

module "services-sa" {
  source     = "../../../../modules/iam-service-account"
  project_id = var.project_ids.services
  name       = var.service_account_names.services
  iam_project_roles = {
    "${var.project_ids.services}" = ["roles/editor"]
  }
  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}

module "transformation-sa" {
  source     = "../../../../modules/iam-service-account"
  project_id = var.project_ids.transformation
  name       = var.service_account_names.transformation
  iam_project_roles = {
    "${var.project_ids.transformation}" = [
      "roles/logging.logWriter",
      "roles/monitoring.metricWriter",
      "roles/dataflow.admin",
      "roles/iam.serviceAccountUser",
      "roles/bigquery.dataOwner",
      "roles/bigquery.jobUser",
      "roles/dataflow.worker",
      "roles/bigquery.metadataViewer",
      "roles/storage.objectViewer",
    ],
    "${var.project_ids.landing}" = [
      "roles/storage.objectViewer",
    ],
    "${var.project_ids.dwh}" = [
      "roles/bigquery.dataOwner",
      "roles/bigquery.jobUser",
      "roles/bigquery.metadataViewer",
    ]
  }
  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}

###############################################################################
#                                    GCS                                      #
###############################################################################

module "landing-buckets" {
  source     = "../../../../modules/gcs"
  for_each   = var.landing_buckets
  project_id = var.project_ids.landing
  prefix     = var.project_ids.landing
  name       = each.value.name
  location   = each.value.location
  iam = {
    "roles/storage.objectCreator" = [module.landing-sa.iam_email]
    "roles/storage.admin"         = [module.transformation-sa.iam_email]
  }
  encryption_key = var.service_encryption_key_ids.multiregional
}

module "transformation-buckets" {
  source     = "../../../../modules/gcs"
  for_each   = var.transformation_buckets
  project_id = var.project_ids.transformation
  prefix     = var.project_ids.transformation
  name       = each.value.name
  location   = each.value.location
  iam = {
    "roles/storage.admin" = [module.transformation-sa.iam_email]
  }
  encryption_key = var.service_encryption_key_ids.multiregional
}

###############################################################################
#                                  Bigquery                                   #
###############################################################################

module "datamart-bq" {
  source     = "../../../../modules/bigquery-dataset"
  for_each   = var.datamart_bq_datasets
  project_id = var.project_ids.datamart
  id         = each.key
  location   = each.value.location
  iam = {
    for k, v in each.value.iam : k => (
      k == "roles/bigquery.dataOwner"
      ? concat(v, [module.datamart-sa.iam_email])
      : v
    )
  }
  encryption_key = var.service_encryption_key_ids.multiregional
}

module "dwh-bq" {
  source     = "../../../../modules/bigquery-dataset"
  for_each   = var.dwh_bq_datasets
  project_id = var.project_ids.dwh
  id         = each.key
  location   = each.value.location
  iam = {
    for k, v in each.value.iam : k => (
      k == "roles/bigquery.dataOwner"
      ? concat(v, [module.dwh-sa.iam_email])
      : v
    )
  }
  encryption_key = var.service_encryption_key_ids.multiregional
}

###############################################################################
#                                  Network                                    #
###############################################################################

module "vpc-transformation" {
  source     = "../../../../modules/net-vpc"
  project_id = var.project_ids.transformation
  name       = var.transformation_vpc_name
  subnets    = var.transformation_subnets
}

module "firewall" {
  source              = "../../../../modules/net-vpc-firewall"
  project_id          = var.project_ids.transformation
  network             = module.vpc-transformation.name
  admin_ranges        = []
  http_source_ranges  = []
  https_source_ranges = []
  ssh_source_ranges   = []

  custom_rules = {
    iap-svc = {
      description          = "Dataflow service."
      direction            = "INGRESS"
      action               = "allow"
      sources              = ["dataflow"]
      targets              = ["dataflow"]
      ranges               = []
      use_service_accounts = false
      rules                = [{ protocol = "tcp", ports = ["12345-12346"] }]
      extra_attributes     = {}
    }
  }
}

###############################################################################
#                                  Pub/Sub                                    #
###############################################################################

module "landing-pubsub" {
  source     = "../../../../modules/pubsub"
  for_each   = var.landing_pubsub
  project_id = var.project_ids.landing
  name       = each.key
  subscriptions = {
    for k, v in each.value : k => { labels = v.labels, options = v.options }
  }
  subscription_iam = {
    for k, v in each.value : k => merge(v.iam, {
      "roles/pubsub.subscriber" = [module.transformation-sa.iam_email]
    })
  }
  kms_key = var.service_encryption_key_ids.global
}
@ -1,60 +0,0 @@
/**
 * Copyright 2020 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

output "datamart-datasets" {
  description = "List of bigquery datasets created for the datamart project."
  value = [
    for k, datasets in module.datamart-bq : datasets.dataset_id
  ]
}

output "dwh-datasets" {
  description = "List of bigquery datasets created for the dwh project."
  value       = [for k, datasets in module.dwh-bq : datasets.dataset_id]
}

output "landing-buckets" {
  description = "List of buckets created for the landing project."
  value       = [for k, bucket in module.landing-buckets : bucket.name]
}

output "landing-pubsub" {
  description = "List of pubsub topics and subscriptions created for the landing project."
  value = {
    for t in module.landing-pubsub : t.topic.name => {
      id            = t.topic.id
      subscriptions = { for s in t.subscriptions : s.name => s.id }
    }
  }
}

output "transformation-buckets" {
  description = "List of buckets created for the transformation project."
  value       = [for k, bucket in module.transformation-buckets : bucket.name]
}

output "transformation-vpc" {
  description = "Transformation VPC details."
  value = {
    name = module.vpc-transformation.name
    subnets = {
      for k, s in module.vpc-transformation.subnets : k => {
        ip_cidr_range = s.ip_cidr_range
        region        = s.region
      }
    }
  }
}
@ -1,23 +0,0 @@
/**
 * Copyright 2022 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

provider "google" {
  impersonate_service_account = "data-platform-main@${var.project_ids.services}.iam.gserviceaccount.com"
}

provider "google-beta" {
  impersonate_service_account = "data-platform-main@${var.project_ids.services}.iam.gserviceaccount.com"
}
@ -1,189 +0,0 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

variable "admins" {
  description = "List of users allowed to impersonate the service account."
  type        = list(string)
  default     = null
}

variable "datamart_bq_datasets" {
  description = "Datamart Bigquery datasets."
  type = map(object({
    iam      = map(list(string))
    location = string
  }))
  default = {
    bq_datamart_dataset = {
      location = "EU"
      iam = {
        # "roles/bigquery.dataOwner"  = []
        # "roles/bigquery.dataEditor" = []
        # "roles/bigquery.dataViewer" = []
      }
    }
  }
}

variable "dwh_bq_datasets" {
  description = "DWH Bigquery datasets."
  type = map(object({
    location = string
    iam      = map(list(string))
  }))
  default = {
    bq_raw_dataset = {
      iam      = {}
      location = "EU"
    }
  }
}

variable "landing_buckets" {
  description = "List of landing buckets to create."
  type = map(object({
    location = string
    name     = string
  }))
  default = {
    raw-data = {
      location = "EU"
      name     = "raw-data"
    }
    data-schema = {
      location = "EU"
      name     = "data-schema"
    }
  }
}

variable "landing_pubsub" {
  description = "List of landing pubsub topics and subscriptions to create."
  type = map(map(object({
    iam    = map(list(string))
    labels = map(string)
    options = object({
      ack_deadline_seconds       = number
      message_retention_duration = number
      retain_acked_messages      = bool
      expiration_policy_ttl      = number
    })
  })))
  default = {
    landing-1 = {
      sub1 = {
        iam = {
          # "roles/pubsub.subscriber" = []
        }
        labels  = {}
        options = null
      }
      sub2 = {
        iam     = {}
        labels  = {}
        options = null
      }
    }
  }
}

variable "landing_service_account" {
  description = "Landing service account name."
  type        = string
  default     = "sa-landing"
}

variable "project_ids" {
  description = "Project IDs."
  type = object({
    datamart       = string
    dwh            = string
    landing        = string
    services       = string
    transformation = string
  })
}

variable "service_account_names" {
  description = "Project service account names."
  type = object({
    datamart       = string
    dwh            = string
    landing        = string
    services       = string
    transformation = string
  })
  default = {
    datamart       = "sa-datamart"
    dwh            = "sa-datawh"
    landing        = "sa-landing"
    services       = "sa-services"
    transformation = "sa-transformation"
  }
}

variable "service_encryption_key_ids" {
  description = "Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project."
  type = object({
    multiregional = string
    global        = string
  })
  default = {
    multiregional = null
    global        = null
  }
}

variable "transformation_buckets" {
  description = "List of transformation buckets to create."
  type = map(object({
    location = string
    name     = string
  }))
  default = {
    temp = {
      location = "EU"
      name     = "temp"
    }
    templates = {
      location = "EU"
      name     = "templates"
    }
  }
}

variable "transformation_subnets" {
  description = "List of subnets to create in the transformation Project."
  type = list(object({
    ip_cidr_range      = string
    name               = string
    region             = string
    secondary_ip_range = map(string)
  }))
  default = [
    {
      ip_cidr_range      = "10.1.0.0/20"
      name               = "transformation-subnet"
      region             = "europe-west3"
      secondary_ip_range = {}
    },
  ]
}

variable "transformation_vpc_name" {
  description = "Name of the VPC created in the transformation Project."
  type        = string
  default     = "transformation-vpc"
}
|
|
@ -0,0 +1,121 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Orchestration Cloud Composer definition.

module "orch-sa-cmp-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.orch-project.project_id
  prefix     = var.prefix
  name       = "orc-cmp-0"
  # TODO: descriptive name
  display_name = "TODO"
  iam = {
    "roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
    "roles/iam.serviceAccountUser"         = [module.orch-sa-cmp-0.iam_email]
  }
}

resource "google_composer_environment" "orch-cmp-0" {
  provider = google-beta
  project  = module.orch-project.project_id
  name     = "${var.prefix}-orc-cmp-0"
  region   = var.region
  config {
    node_count = var.composer_config.node_count
    node_config {
      zone            = "${var.region}-b"
      service_account = module.orch-sa-cmp-0.email
      network         = local.orch_vpc
      subnetwork      = local.orch_subnet
      tags            = ["composer-worker", "http-server", "https-server"]
      ip_allocation_policy {
        use_ip_aliases = true
        cluster_secondary_range_name = try(
          var.network_config.composer_secondary_ranges.pods, "pods"
        )
        services_secondary_range_name = try(
          var.network_config.composer_secondary_ranges.services, "services"
        )
      }
    }
    software_config {
      image_version = var.composer_config.airflow_version
      env_variables = merge(
        var.composer_config.env_variables, {
          DTL_L0_PRJ         = module.lake-0-project.project_id
          DTL_L0_BQ_DATASET  = module.lake-0-bq-0.dataset_id
          DTL_L0_GCS         = module.lake-0-cs-0.url
          DTL_L1_PRJ         = module.lake-1-project.project_id
          DTL_L1_BQ_DATASET  = module.lake-1-bq-0.dataset_id
          DTL_L1_GCS         = module.lake-1-cs-0.url
          DTL_L2_PRJ         = module.lake-2-project.project_id
          DTL_L2_BQ_DATASET  = module.lake-2-bq-0.dataset_id
          DTL_L2_GCS         = module.lake-2-cs-0.url
          DTL_PLG_PRJ        = module.lake-plg-project.project_id
          DTL_PLG_BQ_DATASET = module.lake-plg-bq-0.dataset_id
          DTL_PLG_GCS        = module.lake-plg-cs-0.url
          GCP_REGION         = var.region
          LND_PRJ            = module.land-project.project_id
          LND_BQ             = module.land-bq-0.dataset_id
          LND_GCS            = module.land-cs-0.url
          LND_PS             = module.land-ps-0.id
          LOD_PRJ            = module.load-project.project_id
          LOD_GCS_STAGING    = module.load-cs-df-0.url
          LOD_NET_VPC        = local.load_vpc
          LOD_NET_SUBNET     = local.load_subnet
          LOD_SA_DF          = module.load-sa-df-0.email
          ORC_PRJ            = module.orch-project.project_id
          ORC_GCS            = module.orch-cs-0.url
          TRF_PRJ            = module.transf-project.project_id
          TRF_GCS_STAGING    = module.transf-cs-df-0.url
          TRF_NET_VPC        = local.transf_vpc
          TRF_NET_SUBNET     = local.transf_subnet
          TRF_SA_DF          = module.transf-sa-df-0.email
          TRF_SA_BQ          = module.transf-sa-bq-0.email
        }
      )
    }
    private_environment_config {
      enable_private_endpoint = true
      cloud_sql_ipv4_cidr_block = try(
        var.network_config.composer_ip_ranges.cloudsql, "10.20.10.0/24"
      )
      master_ipv4_cidr_block = try(
        var.network_config.composer_ip_ranges.gke_master, "10.20.11.0/28"
      )
      web_server_ipv4_cidr_block = try(
        var.network_config.composer_ip_ranges.web_server, "10.20.11.16/28"
      )
    }

    dynamic "encryption_config" {
      for_each = (
        try(local.service_encryption_keys.composer != null, false)
        ? { 1 = 1 }
        : {}
      )
      content {
        kms_key_name = try(local.service_encryption_keys.composer, null)
      }
    }

    # web_server_network_access_control {
    #   allowed_ip_range {
    #     value       = "172.16.0.0/12"
    #     description = "Allowed ip range"
    #   }
    # }
  }
}
@@ -0,0 +1,168 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Orchestration project and VPC.

locals {
  orch_subnet = (
    local.use_shared_vpc
    ? var.network_config.subnet_self_links.orchestration
    : values(module.orch-vpc.0.subnet_self_links)[0]
  )
  orch_vpc = (
    local.use_shared_vpc
    ? var.network_config.network_self_link
    : module.orch-vpc.0.self_link
  )
}

module "orch-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "orc"
  group_iam = {
    (local.groups.data-engineers) = [
      "roles/bigquery.dataEditor",
      "roles/bigquery.jobUser",
      "roles/cloudbuild.builds.editor",
      "roles/composer.admin",
      "roles/composer.environmentAndStorageObjectAdmin",
      "roles/iap.httpsResourceAccessor",
      "roles/iam.serviceAccountUser",
      "roles/compute.networkUser",
      "roles/storage.objectAdmin",
      "roles/storage.admin",
    ]
  }
  iam = {
    "roles/bigquery.dataEditor" = [
      module.load-sa-df-0.iam_email,
      module.transf-sa-df-0.iam_email,
      module.orch-sa-cmp-0.iam_email,
    ]
    "roles/bigquery.jobUser" = [
      module.load-sa-df-0.iam_email,
      module.transf-sa-df-0.iam_email,
      module.orch-sa-cmp-0.iam_email,
    ]
    "roles/composer.worker" = [
      module.orch-sa-cmp-0.iam_email
    ]
    "roles/iam.serviceAccountUser" = [
      module.orch-sa-cmp-0.iam_email
    ]
    "roles/storage.objectAdmin" = [
      module.load-sa-df-0.iam_email,
      module.orch-sa-cmp-0.iam_email,
      "serviceAccount:${module.orch-project.service_accounts.robots.composer}",
    ]
    "roles/storage.admin" = [
      module.load-sa-df-0.iam_email,
      module.transf-sa-df-0.iam_email
    ]
  }
  oslogin = false
  policy_boolean = {
    "constraints/compute.requireOsLogin" = false
  }
  services = concat(var.project_services, [
    "artifactregistry.googleapis.com",
    "bigquery.googleapis.com",
    "bigqueryreservation.googleapis.com",
    "bigquerystorage.googleapis.com",
    "cloudbuild.googleapis.com",
    "cloudkms.googleapis.com",
    "composer.googleapis.com",
    "compute.googleapis.com",
    "container.googleapis.com",
    "containerregistry.googleapis.com",
    "dataflow.googleapis.com",
    "pubsub.googleapis.com",
    "servicenetworking.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com"
  ])
  service_encryption_key_ids = {
    composer = [try(local.service_encryption_keys.composer, null)]
    storage  = [try(local.service_encryption_keys.storage, null)]
  }
  shared_vpc_service_config = local.shared_vpc_project == null ? null : {
    attach               = true
    host_project         = local.shared_vpc_project
    service_identity_iam = {}
    # service_identity_iam = {
    #   "roles/composer.sharedVpcAgent" = [
    #     "composer"
    #   ]
    #   "roles/compute.networkUser" = [
    #     "cloudservices", "container-engine", "dataflow"
    #   ]
    #   "roles/container.hostServiceAgentUser" = [
    #     "container-engine"
    #   ]
    # }
  }
}

# Cloud Storage

module "orch-cs-0" {
  source         = "../../../modules/gcs"
  project_id     = module.orch-project.project_id
  prefix         = var.prefix
  name           = "orc-cs-0"
  location       = var.region
  storage_class  = "REGIONAL"
  encryption_key = try(local.service_encryption_keys.storage, null)
}

# internal VPC resources

module "orch-vpc" {
  source     = "../../../modules/net-vpc"
  count      = local.use_shared_vpc ? 0 : 1
  project_id = module.orch-project.project_id
  name       = "${var.prefix}-default"
  subnets = [
    {
      ip_cidr_range = "10.10.0.0/24"
      name          = "default"
      region        = var.region
      secondary_ip_range = {
        pods     = "10.10.8.0/22"
        services = "10.10.12.0/24"
      }
    }
  ]
}

module "orch-vpc-firewall" {
  source       = "../../../modules/net-vpc-firewall"
  count        = local.use_shared_vpc ? 0 : 1
  project_id   = module.orch-project.project_id
  network      = module.orch-vpc.0.name
  admin_ranges = ["10.10.0.0/24"]
}

module "orch-nat" {
  count          = local.use_shared_vpc ? 0 : 1
  source         = "../../../modules/net-cloudnat"
  project_id     = module.orch-project.project_id
  name           = "${var.prefix}-default"
  region         = var.region
  router_network = module.orch-vpc.0.name
}
@@ -1,8 +0,0 @@
# Manual pipeline Example

Once you have deployed the projects ([step 1](../01-environment/README.md)) and the resources ([step 2](../02-resources/README.md)), you can use them to run your data pipelines.

Here we will demo two pipelines:

* [GCS to Bigquery](./gcs_to_bigquery.md)
* [PubSub to Bigquery](./pubsub_to_bigquery.md)
@@ -1,140 +0,0 @@
# Manual pipeline Example: GCS to Bigquery

In this example we will publish person messages in the following format:

```bash
name,surname,1617898199
```

A Dataflow pipeline will read those messages and import them into a BigQuery table in the DWH project.

[TODO] An authorized view will be created in the datamart project to expose the table.
[TODO] Further automation is expected in the future.

## Set up the env vars

```bash
export DWH_PROJECT_ID=**dwh_project_id**
export LANDING_PROJECT_ID=**landing_project_id**
export TRANSFORMATION_PROJECT_ID=**transformation_project_id**
```

## Create BQ table

Those steps should be done as the DWH Service Account.

You can run the following command to create a table:

```bash
gcloud --impersonate-service-account=sa-datawh@$DWH_PROJECT_ID.iam.gserviceaccount.com \
  alpha bq tables create person \
  --project=$DWH_PROJECT_ID --dataset=bq_raw_dataset \
  --description "This is a Test Person table" \
  --schema name=STRING,surname=STRING,timestamp=TIMESTAMP
```

## Produce CSV data file, JSON schema file and UDF JS file

Those steps should be done as the landing Service Account.

Let's now create a series of messages we can use to import:

```bash
for i in {0..10}
do
  echo "Lorenzo,Caggioni,$(date +%s)" >> person.csv
done
```

and copy the file to the GCS bucket:

```bash
gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person.csv gs://$LANDING_PROJECT_ID-eu-raw-data
```

Let's create the data JSON schema:

```bash
cat <<'EOF' >> person_schema.json
{
  "BigQuery Schema": [
    {
      "name": "name",
      "type": "STRING"
    },
    {
      "name": "surname",
      "type": "STRING"
    },
    {
      "name": "timestamp",
      "type": "TIMESTAMP"
    }
  ]
}
EOF
```

and copy the file to the GCS bucket:

```bash
gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person_schema.json gs://$LANDING_PROJECT_ID-eu-data-schema
```

Let's create the UDF function to transform message data:

```bash
cat <<'EOF' >> person_udf.js
function transform(line) {
  var values = line.split(',');

  var obj = new Object();
  obj.name = values[0];
  obj.surname = values[1];
  obj.timestamp = values[2];
  var jsonString = JSON.stringify(obj);

  return jsonString;
}
EOF
```

and copy the file to the GCS bucket:

```bash
gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person_udf.js gs://$LANDING_PROJECT_ID-eu-data-schema
```

If you want to check the files copied to GCS, you can use the Transformation service account:

```bash
gsutil -i sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com ls gs://$LANDING_PROJECT_ID-eu-raw-data
gsutil -i sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com ls gs://$LANDING_PROJECT_ID-eu-data-schema
```

## Dataflow

Those steps should be done as the transformation Service Account.

Let's then start a Dataflow batch pipeline from a Google-provided template, using internal IPs only, the created network and subnetwork, the appropriate service account, and the required parameters:

```bash
gcloud --impersonate-service-account=sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com dataflow jobs run test_batch_01 \
  --gcs-location gs://dataflow-templates/latest/GCS_Text_to_BigQuery \
  --project $TRANSFORMATION_PROJECT_ID \
  --region europe-west3 \
  --disable-public-ips \
  --network transformation-vpc \
  --subnetwork regions/europe-west3/subnetworks/transformation-subnet \
  --staging-location gs://$TRANSFORMATION_PROJECT_ID-eu-temp \
  --service-account-email sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com \
  --parameters \
javascriptTextTransformFunctionName=transform,\
JSONPath=gs://$LANDING_PROJECT_ID-eu-data-schema/person_schema.json,\
javascriptTextTransformGcsPath=gs://$LANDING_PROJECT_ID-eu-data-schema/person_udf.js,\
inputFilePattern=gs://$LANDING_PROJECT_ID-eu-raw-data/person.csv,\
outputTable=$DWH_PROJECT_ID:bq_raw_dataset.person,\
bigQueryLoadingTemporaryDirectory=gs://$TRANSFORMATION_PROJECT_ID-eu-temp
```
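To sanity-check the UDF logic before uploading it, the same line-to-JSON mapping can be reproduced locally. This Python sketch is illustrative only; it simply mirrors `person_udf.js` and is not part of the pipeline:

```python
import json


def transform(line):
    # Mirror of person_udf.js: split the CSV line and emit a JSON
    # string with the name/surname/timestamp schema fields.
    name, surname, timestamp = line.split(",")
    return json.dumps({"name": name, "surname": surname, "timestamp": timestamp})


print(transform("Lorenzo,Caggioni,1617898199"))
# → {"name": "Lorenzo", "surname": "Caggioni", "timestamp": "1617898199"}
```

Each output line corresponds to one row of the `bq_raw_dataset.person` table created above.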
@@ -1,75 +0,0 @@
# Manual pipeline Example: PubSub to Bigquery

In this example we will publish person messages in the following format:

```txt
name: Name
surname: Surname
timestamp: 1617898199
```

A Dataflow pipeline will read those messages and import them into a BigQuery table in the DWH project.

An authorized view will be created in the datamart project to expose the table.

[TODO] Further automation is expected in the future.

## Set up the env vars

```bash
export DWH_PROJECT_ID=**dwh_project_id**
export LANDING_PROJECT_ID=**landing_project_id**
export TRANSFORMATION_PROJECT_ID=**transformation_project_id**
```

## Create BQ table

Those steps should be done as the DWH Service Account.

You can run the following command to create a table:

```bash
gcloud --impersonate-service-account=sa-datawh@$DWH_PROJECT_ID.iam.gserviceaccount.com \
  alpha bq tables create person \
  --project=$DWH_PROJECT_ID --dataset=bq_raw_dataset \
  --description "This is a Test Person table" \
  --schema name=STRING,surname=STRING,timestamp=TIMESTAMP
```

## Produce PubSub messages

Those steps should be done as the landing Service Account.

Let's now create a series of messages we can use to import:

```bash
for i in {0..10}
do
  gcloud --impersonate-service-account=sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com pubsub topics publish projects/$LANDING_PROJECT_ID/topics/landing-1 --message="{\"name\": \"Lorenzo\", \"surname\": \"Caggioni\", \"timestamp\": \"$(date +%s)\"}"
done
```

If you want to check the messages published, you can use the Transformation service account and read a message (the message won't be acked and will stay in the subscription):

```bash
gcloud --impersonate-service-account=sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com pubsub subscriptions pull projects/$LANDING_PROJECT_ID/subscriptions/sub1
```

## Dataflow

Those steps should be done as the transformation Service Account.

Let's then start a Dataflow streaming pipeline from a Google-provided template, using internal IPs only, the created network and subnetwork, the appropriate service account, and the required parameters:

```bash
gcloud --impersonate-service-account=sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com dataflow jobs run test_streaming01 \
  --gcs-location gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
  --project $TRANSFORMATION_PROJECT_ID \
  --region europe-west3 \
  --disable-public-ips \
  --network transformation-vpc \
  --subnetwork regions/europe-west3/subnetworks/transformation-subnet \
  --staging-location gs://$TRANSFORMATION_PROJECT_ID-eu-temp \
  --service-account-email sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com \
  --parameters \
inputSubscription=projects/$LANDING_PROJECT_ID/subscriptions/sub1,\
outputTableSpec=$DWH_PROJECT_ID:bq_raw_dataset.person
```
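The payload published by the loop above is plain JSON. Building it programmatically (a Python sketch, illustrative only; the field names simply match the `bq_raw_dataset.person` schema) avoids the shell quoting:

```python
import json
import time


def person_message(name, surname):
    # Same payload as the gcloud loop above: the three schema fields,
    # with an epoch-seconds timestamp, serialized as a JSON string.
    return json.dumps(
        {"name": name, "surname": surname, "timestamp": str(int(time.time()))}
    )


print(person_message("Lorenzo", "Caggioni"))
```

The resulting string can be passed as-is to `--message` or to a Pub/Sub client library publish call.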
@@ -1,26 +0,0 @@
{
  "schema": {
    "fields": [
      {
        "mode": "NULLABLE",
        "name": "name",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "surname",
        "type": "STRING"
      },
      {
        "mode": "NULLABLE",
        "name": "age",
        "type": "INTEGER"
      },
      {
        "mode": "NULLABLE",
        "name": "boolean_val",
        "type": "BOOLEAN"
      }
    ]
  }
}
@@ -0,0 +1,167 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Transformation project and VPC.

locals {
  transf_subnet = (
    local.use_shared_vpc
    ? var.network_config.subnet_self_links.transformation
    : values(module.transf-vpc.0.subnet_self_links)[0]
  )
  transf_vpc = (
    local.use_shared_vpc
    ? var.network_config.network_self_link
    : module.transf-vpc.0.self_link
  )
}

module "transf-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "trf"
  group_iam = {
    (local.groups.data-engineers) = [
      "roles/bigquery.jobUser",
      "roles/dataflow.admin",
    ]
  }
  iam = {
    "roles/bigquery.dataViewer" = [
      module.orch-sa-cmp-0.iam_email
    ]
    "roles/bigquery.jobUser" = [
      module.transf-sa-bq-0.iam_email,
    ]
    "roles/dataflow.admin" = [
      module.orch-sa-cmp-0.iam_email,
    ]
    "roles/dataflow.worker" = [
      module.transf-sa-df-0.iam_email
    ]
    "roles/storage.objectAdmin" = [
      module.transf-sa-df-0.iam_email,
      module.orch-sa-cmp-0.iam_email,
      "serviceAccount:${module.transf-project.service_accounts.robots.dataflow}"
    ]
  }
  services = concat(var.project_services, [
    "bigquery.googleapis.com",
    "bigqueryreservation.googleapis.com",
    "bigquerystorage.googleapis.com",
    "cloudkms.googleapis.com",
    "compute.googleapis.com",
    "dataflow.googleapis.com",
    "dlp.googleapis.com",
    "pubsub.googleapis.com",
    "servicenetworking.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com"
  ])
  service_encryption_key_ids = {
    dataflow = [try(local.service_encryption_keys.dataflow, null)]
    storage  = [try(local.service_encryption_keys.storage, null)]
  }
  shared_vpc_service_config = local.shared_vpc_project == null ? null : {
    attach               = true
    host_project         = local.shared_vpc_project
    service_identity_iam = {}
  }
}

# Cloud Storage

module "transf-sa-df-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.transf-project.project_id
  prefix     = var.prefix
  name       = "trf-df-0"
  # TODO: descriptive name
  display_name = "TODO"
  iam = {
    "roles/iam.serviceAccountTokenCreator" = [
      local.groups_iam.data-engineers,
      module.orch-sa-cmp-0.iam_email
    ],
    "roles/iam.serviceAccountUser" = [
      module.orch-sa-cmp-0.iam_email
    ]
  }
}

module "transf-cs-df-0" {
  source         = "../../../modules/gcs"
  project_id     = module.transf-project.project_id
  prefix         = var.prefix
  name           = "trf-cs-0"
  location       = var.region
  storage_class  = "REGIONAL"
  encryption_key = try(local.service_encryption_keys.storage, null)
}

# BigQuery

module "transf-sa-bq-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.transf-project.project_id
  prefix     = var.prefix
  name       = "trf-bq-0"
  # TODO: descriptive name
  display_name = "TODO"
  iam = {
    "roles/iam.serviceAccountTokenCreator" = [
      local.groups_iam.data-engineers,
      module.orch-sa-cmp-0.iam_email
    ],
    "roles/iam.serviceAccountUser" = [
      module.orch-sa-cmp-0.iam_email
    ]
  }
}

# internal VPC resources

module "transf-vpc" {
  source     = "../../../modules/net-vpc"
  count      = local.use_shared_vpc ? 0 : 1
  project_id = module.transf-project.project_id
  name       = "${var.prefix}-default"
  subnets = [
    {
      ip_cidr_range      = "10.10.0.0/24"
      name               = "default"
      region             = var.region
      secondary_ip_range = {}
    }
  ]
}

module "transf-vpc-firewall" {
  source       = "../../../modules/net-vpc-firewall"
  count        = local.use_shared_vpc ? 0 : 1
  project_id   = module.transf-project.project_id
  network      = module.transf-vpc.0.name
  admin_ranges = ["10.10.0.0/24"]
}

module "transf-nat" {
  source         = "../../../modules/net-cloudnat"
  count          = local.use_shared_vpc ? 0 : 1
  project_id     = module.transf-project.project_id
  name           = "${var.prefix}-default"
  region         = var.region
  router_network = module.transf-vpc.0.name
}
@@ -0,0 +1,213 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Datalake projects.

locals {
  lake_group_iam = {
    (local.groups.data-engineers) = [
      "roles/bigquery.dataEditor",
      "roles/storage.admin",
    ],
    (local.groups.data-analysts) = [
      "roles/bigquery.dataViewer",
      "roles/bigquery.jobUser",
      "roles/bigquery.user",
      "roles/datacatalog.viewer",
      "roles/datacatalog.tagTemplateViewer",
      "roles/storage.objectViewer",
    ]
  }
  lake_iam = {
    "roles/bigquery.dataEditor" = [
      module.load-sa-df-0.iam_email,
      module.transf-sa-df-0.iam_email,
      module.transf-sa-bq-0.iam_email,
      module.orch-sa-cmp-0.iam_email,
    ]
    "roles/bigquery.jobUser" = [
      module.load-sa-df-0.iam_email,
      module.transf-sa-df-0.iam_email,
    ]
    "roles/storage.admin" = [
      module.load-sa-df-0.iam_email,
      module.transf-sa-df-0.iam_email,
    ]
    "roles/storage.objectCreator" = [
      module.load-sa-df-0.iam_email,
      module.transf-sa-df-0.iam_email,
      module.transf-sa-bq-0.iam_email,
      module.orch-sa-cmp-0.iam_email,
    ]
    "roles/storage.objectViewer" = [
      module.transf-sa-df-0.iam_email,
      module.transf-sa-bq-0.iam_email,
      module.orch-sa-cmp-0.iam_email,
    ]
  }
  lake_services = concat(var.project_services, [
    "bigquery.googleapis.com",
    "bigqueryreservation.googleapis.com",
    "bigquerystorage.googleapis.com",
    "cloudkms.googleapis.com",
    "compute.googleapis.com",
    "dataflow.googleapis.com",
    "pubsub.googleapis.com",
    "servicenetworking.googleapis.com",
    "storage.googleapis.com",
    "storage-component.googleapis.com"
  ])
}

# Project

module "lake-0-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "dtl-0"
  group_iam       = local.lake_group_iam
  iam             = local.lake_iam
  services        = local.lake_services
  service_encryption_key_ids = {
    bq      = [try(local.service_encryption_keys.bq, null)]
    storage = [try(local.service_encryption_keys.storage, null)]
  }
}

module "lake-1-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "dtl-1"
  group_iam       = local.lake_group_iam
  iam             = local.lake_iam
  services        = local.lake_services
  service_encryption_key_ids = {
    bq      = [try(local.service_encryption_keys.bq, null)]
    storage = [try(local.service_encryption_keys.storage, null)]
  }
}

module "lake-2-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "dtl-2"
  group_iam       = local.lake_group_iam
  iam             = local.lake_iam
  services        = local.lake_services
  service_encryption_key_ids = {
    bq      = [try(local.service_encryption_keys.bq, null)]
    storage = [try(local.service_encryption_keys.storage, null)]
  }
}

module "lake-plg-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "dtl-plg"
  group_iam       = local.lake_group_iam
  iam             = local.lake_iam
  services        = local.lake_services
  service_encryption_key_ids = {
    bq      = [try(local.service_encryption_keys.bq, null)]
    storage = [try(local.service_encryption_keys.storage, null)]
  }
}

# Bigquery

module "lake-0-bq-0" {
  source         = "../../../modules/bigquery-dataset"
  project_id     = module.lake-0-project.project_id
  id             = "${replace(var.prefix, "-", "_")}_dtl_0_bq_0"
  location       = var.region
  encryption_key = try(local.service_encryption_keys.bq, null)
}

module "lake-1-bq-0" {
  source         = "../../../modules/bigquery-dataset"
  project_id     = module.lake-1-project.project_id
  id             = "${replace(var.prefix, "-", "_")}_dtl_1_bq_0"
  location       = var.region
  encryption_key = try(local.service_encryption_keys.bq, null)
}

module "lake-2-bq-0" {
  source         = "../../../modules/bigquery-dataset"
  project_id     = module.lake-2-project.project_id
  id             = "${replace(var.prefix, "-", "_")}_dtl_2_bq_0"
  location       = var.region
  encryption_key = try(local.service_encryption_keys.bq, null)
}

module "lake-plg-bq-0" {
  source         = "../../../modules/bigquery-dataset"
  project_id     = module.lake-plg-project.project_id
  id             = "${replace(var.prefix, "-", "_")}_dtl_plg_bq_0"
  location       = var.region
  encryption_key = try(local.service_encryption_keys.bq, null)
}

# Cloud storage

module "lake-0-cs-0" {
  source         = "../../../modules/gcs"
  project_id     = module.lake-0-project.project_id
  prefix         = var.prefix
  name           = "dtl-0-cs-0"
  location       = var.region
  storage_class  = "REGIONAL"
  encryption_key = try(local.service_encryption_keys.storage, null)
  force_destroy  = var.data_force_destroy
}

module "lake-1-cs-0" {
  source         = "../../../modules/gcs"
  project_id     = module.lake-1-project.project_id
  prefix         = var.prefix
  name           = "dtl-1-cs-0"
  location       = var.region
  storage_class  = "REGIONAL"
  encryption_key = try(local.service_encryption_keys.storage, null)
  force_destroy  = var.data_force_destroy
}

module "lake-2-cs-0" {
  source         = "../../../modules/gcs"
  project_id     = module.lake-2-project.project_id
  prefix         = var.prefix
  name           = "dtl-2-cs-0"
  location       = var.region
  storage_class  = "REGIONAL"
|
||||||
|
encryption_key = try(local.service_encryption_keys.storage, null)
|
||||||
|
force_destroy = var.data_force_destroy
|
||||||
|
}
|
||||||
|
|
||||||
|
module "lake-plg-cs-0" {
|
||||||
|
source = "../../../modules/gcs"
|
||||||
|
project_id = module.lake-plg-project.project_id
|
||||||
|
prefix = var.prefix
|
||||||
|
name = "dtl-plg-cs-0"
|
||||||
|
location = var.region
|
||||||
|
storage_class = "REGIONAL"
|
||||||
|
encryption_key = try(local.service_encryption_keys.storage, null)
|
||||||
|
force_destroy = var.data_force_destroy
|
||||||
|
}
|
|
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description common project.

module "common-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "cmn"
  group_iam = {
    (local.groups.data-engineers) = [
      "roles/dlp.reader",
      "roles/dlp.user",
      "roles/dlp.estimatesAdmin",
    ]
    (local.groups.data-security) = [
      "roles/dlp.admin",
    ]
  }
  iam = {
    "roles/dlp.user" = [
      module.load-sa-df-0.iam_email,
      module.transf-sa-df-0.iam_email
    ]
  }
  services = concat(var.project_services, [
    "datacatalog.googleapis.com",
    "dlp.googleapis.com",
  ])
}

# To create KMS keys in the common project, uncomment this section and assign
# the key links accordingly in the local.service_encryption_keys variable.

# module "cmn-kms-0" {
#   source     = "../../../modules/kms"
#   project_id = module.cmn-prj.project_id
#   keyring = {
#     name     = "${var.prefix}-kr-global",
#     location = var.location_config.region
#   }
#   keys = {
#     pubsub = null
#   }
# }

# module "cmn-kms-1" {
#   source     = "../../../modules/kms"
#   project_id = module.cmn-prj.project_id
#   keyring = {
#     name     = "${var.prefix}-kr-mregional",
#     location = var.location_config.region
#   }
#   keys = {
#     bq      = null
#     storage = null
#   }
# }

# module "cmn-kms-2" {
#   source     = "../../../modules/kms"
#   project_id = module.cmn-prj.project_id
#   keyring = {
#     name     = "${var.prefix}-kr-regional",
#     location = var.location_config.region
#   }
#   keys = {
#     composer = null
#     dataflow = null
#   }
# }
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description exposure project.

module "exp-project" {
  source          = "../../../modules/project"
  parent          = var.folder_id
  billing_account = var.billing_account_id
  prefix          = var.prefix
  name            = "exp"
}
# Data Platform

This module implements an opinionated Data Platform Architecture that creates and sets up projects and related resources, to be used to build your end-to-end data environment.

The code is intentionally simple, as it's intended to provide a generic initial setup and then allow easy customizations to complete the implementation of the intended design.

The following diagram is a high-level reference of the resources created and managed here:

![Data Platform architecture overview](./images/overview_diagram.png "Data Platform architecture overview")

A demo pipeline is also part of this example: it can be built and run on top of the foundational infrastructure to verify or test the setup quickly.

## Design overview and choices
Despite its simplicity, this stage implements the basics of a design that we've seen working well for various customers.

The approach adapts to different high-level requirements:

- boundaries for each step
- clear and defined actors
- least privilege principle
- reliance on service account impersonation

The code in this example doesn't address organization-level configuration (organization policies, VPC-SC, centralized logs). We expect those aspects to be addressed in stages external to this script.

### Project structure

The Data Platform is designed to rely on several projects, one project per data stage. The stages identified are:

- landing
- load
- data lake
- orchestration
- transformation
- exposure

This separation into projects allows adhering to the least-privilege principle by relying on project-level roles.

The script will create the following projects:

- **Landing** Used to store data temporarily. Data are pushed to Cloud Storage, BigQuery, or Cloud Pub/Sub. Resources are configured with a three-month lifecycle policy.
- **Load** Used to load data from `landing` to the `data lake`. The load is made with minimal to zero transformation logic (mainly `cast`). This stage can anonymize or tokenize Personally Identifiable Information (PII); alternatively, that can be done in the transformation stage, depending on your requirements. The use of [Cloud Dataflow templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates) is recommended.
- **Data Lake** Projects where data are stored. It is composed of 3 layers that progressively process and define data:
  - **L0 - Raw data** Structured data, stored in an adequate format: structured data stored in BigQuery, unstructured data stored on Cloud Storage with additional metadata stored in BigQuery (for example, pictures stored in Cloud Storage and the analysis of those images from the Cloud Vision API stored in BigQuery).
  - **L1 - Cleansed, aggregated and standardized data**
  - **L2 - Curated layer**
  - **Playground** Stores temporary tables that Data Analysts may use to perform R&D on data available in other Data Lake layers.
- **Orchestration** Used to host Cloud Composer, which orchestrates all tasks that move your data on its journey.
- **Transformation** Used to move data between layers of the Data Lake. We strongly suggest relying on the BigQuery engine to perform transformations. If BigQuery doesn't have the features needed to perform your transformation, we recommend using Cloud Dataflow together with [Cloud Dataflow templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates). This stage can optionally be used to anonymize or tokenize PII.
- **Exposure** Used to host resources that share your processed data with external systems. For the purpose of this example, we leave this project empty. Depending on the access pattern, data can be presented on Cloud SQL, BigQuery, or Bigtable. For BigQuery data, we strongly suggest relying on [authorized views](https://cloud.google.com/bigquery/docs/authorized-views).

### Roles

We assign roles on resources at the project level, assigning the appropriate roles to groups. We recommend not adding human users directly to the resource-access groups with IAM permissions to access data.

### Service accounts

Service account creation follows the least privilege principle: each service account performs a single task that requires access to a defined set of resources. The table below gives a high-level overview of the roles each service account has on each data layer. For simplicity, `READ` and `WRITE` roles are shown; for detailed roles please refer to the code.

|Service Account|Landing|Data Lake L0|Data Lake L1|Data Lake L2|
|-|:-:|:-:|:-:|:-:|
|landing-sa|WRITE|-|-|-|
|load-sa|READ|READ/WRITE|-|-|
|transformation-sa|-|READ/WRITE|READ/WRITE|READ/WRITE|
|orchestration-sa|-|-|-|-|

- Each service account performs a single task, having access to the minimum number of resources (for example, the Cloud Dataflow service account has access to the Landing project and the Data Lake L0 project).
- Each service account has the least privilege on each project.
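As a sketch of how impersonation rights might be granted for one of these service accounts, using this repository's `iam-service-account` module (the group address and the referenced `load-project` module name are illustrative placeholders):

```hcl
module "load-sa-df-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.load-project.project_id
  name       = "load-df-0"
  iam = {
    # allow the data engineers group to mint tokens for this service account
    "roles/iam.serviceAccountTokenCreator" = [
      "group:gcp-data-engineers@DOMAIN.com"
    ]
  }
}
```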
#### Service Account Keys

The use of service account keys within a data pipeline incurs several security risks, as these are long-lived credentials that could be leaked without oversight or control. This example relies on service account impersonation to avoid the creation of private keys.
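As done in the providers configuration of this example, impersonation can be enabled directly on the Terraform providers, so no key file ever exists (the service account address is a placeholder):

```hcl
provider "google" {
  impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com"
}

provider "google-beta" {
  impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com"
}
```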
### User groups

User groups provide a stable frame of reference that allows decoupling the final set of permissions for each group from the stage where entities and resources are created and their IAM bindings defined.

We use three groups to control access to resources:

- *Data Engineers* They handle and run the Data Hub, with read access to all resources in order to troubleshoot possible issues with pipelines. This team can also impersonate any service account.
- *Data Analysts* They perform analysis on datasets, with read access to the Data Lake L2 project and BigQuery READ/WRITE access to the playground project.
- *Data Security* They handle security configurations related to the Data Hub. This team has admin access to the common project to configure Cloud DLP templates or Data Catalog policy tags.

In the table below you can find a high-level overview of the roles each group has on each project. For simplicity, `READ`, `WRITE` and `ADMIN` roles are shown; for detailed roles please refer to the code.

|Group|Landing|Load|Transformation|Data Lake L0|Data Lake L1|Data Lake L2|Data Lake Playground|Orchestration|Common|
|-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|Data Engineers|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|
|Data Analysts|-|-|-|-|-|READ|READ/WRITE|-|-|
|Data Security|-|-|-|-|-|-|-|-|ADMIN|

### Groups

We use three groups based on the required access:

- *Data Engineers*: the group that handles and runs the Data Hub. The group has read access to all resources to troubleshoot possible issues with the pipeline. The team can also impersonate all service accounts. Default value: `gcp-data-engineers@DOMAIN.COM`.
- *Data Analysts*: the group that performs analysis on the datasets. The group has read access to the Data Lake L2 project and BigQuery READ/WRITE access to the `playground` project. Default value: `gcp-data-analysts@DOMAIN.COM`.
- *Data Security*: the group handling security configurations related to the Data Hub. Default value: `gcp-data-security@DOMAIN.COM`.

### Virtual Private Cloud (VPC) design

The Data Platform accepts an existing [Shared VPC](https://cloud.google.com/vpc/docs/shared-vpc) as input to run resources. You can configure subnets for data resources by specifying the links to the subnets in the `network_config` variable. You may want to configure a Shared VPC to host your resources if your pipelines need to reach on-premises resources.

If the `network_config` variable is not provided, the script will create a VPC with the default configuration in each project that requires one: the *load*, *transformation*, and *orchestration* projects.
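A sketch of a `network_config` value matching the variable type declared in [`variables.tf`](./variables.tf); project, network, subnet and secondary range names are illustrative placeholders:

```hcl
network_config = {
  host_project      = "my-host-project"
  network_self_link = "https://www.googleapis.com/compute/v1/projects/my-host-project/global/networks/my-shared-vpc"
  subnet_self_links = {
    load           = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/europe-west1/subnetworks/load"
    transformation = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/europe-west1/subnetworks/transformation"
    orchestration  = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/europe-west1/subnetworks/orchestration"
  }
  composer_ip_ranges = {
    cloudsql   = "192.168.0.0/24"
    gke_master = "192.168.1.0/28"
    web_server = "192.168.1.16/28"
  }
  composer_secondary_ranges = {
    pods     = "pods"
    services = "services"
  }
}
```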
### IP ranges and subnetting

To deploy your Data Platform you need the following ranges:

- Load project VPC for Cloud Dataflow workers. Range: `/24`.
- Transformation VPC for Cloud Dataflow workers. Range: `/24`.
- Orchestration VPC for Cloud Composer:
  - Cloud SQL. Range: `/24`
  - GKE Master. Range: `/28`
  - Web Server. Range: `/28`
  - Secondary IP ranges. Pods range: `/22`, services range: `/24`

### Resource naming convention

Resources follow the naming convention described below.

- `prefix-layer` for projects
- `prefix-layer-product` for resources
- `prefix-layer[2]-gcp-product[2]-counter` for services and service accounts

### Encryption

We suggest a centralized approach to key management, where Organization Security is the only team that can access encryption material, and keyrings and keys are managed in a project external to the Data Platform.

![Centralized Cloud Key Management high-level diagram](./images/kms_diagram.png "Centralized Cloud Key Management high-level diagram")

To configure the use of Cloud KMS on resources, specify the key URLs in the `service_encryption_keys` variable. Key locations should match resource locations. Example:

```hcl
service_encryption_keys = {
  bq       = "KEY_URL_MULTIREGIONAL"
  composer = "KEY_URL_REGIONAL"
  dataflow = "KEY_URL_REGIONAL"
  storage  = "KEY_URL_MULTIREGIONAL"
  pubsub   = "KEY_URL_MULTIREGIONAL"
}
```

We consider this step optional; whether to implement it depends on customer policies and security best practices.

## Data Anonymization

We suggest using Cloud Data Loss Prevention to identify, mask, or tokenize your confidential data. Implementing a Data Loss Prevention strategy is out of scope for this example, but we enable the service in two different projects so that such a strategy can be implemented. We expect you will use [Cloud Data Loss Prevention templates](https://cloud.google.com/dlp/docs/concepts-templates) in one of the following ways:

- during the ingestion phase, from Dataflow
- during the transformation phase, from [BigQuery](https://cloud.google.com/bigquery/docs/scan-with-dlp) or [Cloud Dataflow](https://cloud.google.com/architecture/running-automated-dataflow-pipeline-de-identify-pii-dataset)

We implemented a centralized model for Cloud Data Loss Prevention resources: templates will be stored in the security project:

![Centralized Cloud Data Loss Prevention high-level diagram](./images/dlp_diagram.png "Centralized Cloud Data Loss Prevention high-level diagram")

## How to run this script

To deploy this example on your GCP organization, you will need

- a folder or organization where new projects will be created
- a billing account that will be associated with the new projects

The Data Platform is meant to be executed by a service account (or a regular user) having this minimal set of permissions:

- Org level:
  - `"compute.organizations.enableXpnResource"`
  - `"compute.organizations.disableXpnResource"`
  - `"compute.subnetworks.setIamPolicy"`
- Folder level:
  - `"roles/logging.admin"`
  - `"roles/owner"`
  - `"roles/resourcemanager.folderAdmin"`
  - `"roles/resourcemanager.projectCreator"`
- Cloud KMS keys (if Cloud KMS keys are configured):
  - `"roles/cloudkms.admin"`, or the permissions `cloudkms.cryptoKeys.getIamPolicy`, `cloudkms.cryptoKeys.list`, `cloudkms.cryptoKeys.setIamPolicy`
- On the host project for the Shared VPC(s):
  - `"roles/browser"`
  - `"roles/compute.viewer"`
  - `"roles/dns.admin"`

## Variable configuration

There are three sets of variables you will need to fill in:

```hcl
prefix = "PRFX"
project_create = {
  parent             = "folders/123456789012"
  billing_account_id = "111111-222222-333333"
}
organization = {
  domain = "DOMAIN.com"
}
```

For finer details, check the variables in [`variables.tf`](./variables.tf) and update them according to the desired configuration. Remember to create the team groups described in the [Groups](#groups) section.

Once the configuration is complete, run the project factory by running

```bash
terraform init
terraform apply
```

## Customizations

### Create Cloud KMS keys as part of the Data Platform

To create Cloud KMS keys in the Data Platform, you can uncomment the Cloud KMS resources configured in the [`06-common.tf`](./06-common.tf) file and update the Cloud KMS key pointers in `local.service_encryption_keys.*` to the local resources created.
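As a sketch, assuming the commented `cmn-kms-*` modules are uncommented and expose a `key_ids` output (an assumption about the `kms` module interface; check the module before relying on it), the locals could point at the newly created keys:

```hcl
# hypothetical wiring of locals to the uncommented KMS modules
locals {
  service_encryption_keys = {
    pubsub   = module.cmn-kms-0.key_ids.pubsub
    bq       = module.cmn-kms-1.key_ids.bq
    storage  = module.cmn-kms-1.key_ids.storage
    composer = module.cmn-kms-2.key_ids.composer
    dataflow = module.cmn-kms-2.key_ids.dataflow
  }
}
```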
### Assign roles at BigQuery dataset level

To handle multiple groups of `data-analysts` accessing the same Data Lake layer projects, but only the datasets belonging to a specific group, you may want to assign roles at the BigQuery dataset level instead of at the project level.

To do this, remove the project-level IAM bindings for the `data-analysts` group and grant roles at the BigQuery dataset level using the `iam` variable on the `bigquery-dataset` modules.
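A minimal sketch of dataset-level bindings via the `bigquery-dataset` module; the per-team analyst group address is an illustrative placeholder:

```hcl
module "lake-2-bq-0" {
  source     = "../../../modules/bigquery-dataset"
  project_id = module.lake-2-project.project_id
  id         = "${replace(var.prefix, "-", "_")}_dtl_2_bq_0"
  location   = var.region
  iam = {
    # hypothetical per-team analyst group, bound only to this dataset
    "roles/bigquery.dataViewer" = [
      "group:gcp-data-analysts-team-a@DOMAIN.com"
    ]
  }
}
```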
## Demo pipeline

The application layer is out of scope of this script, but as a demo it provides a Cloud Composer DAG to move data from the `landing` area to the `Data Lake L2` dataset.

Just follow the commands you find in the `demo_commands` Terraform output, go to the Cloud Composer UI and run the `data_pipeline_dag`.

Description of the commands:

- 01: copy sample data to the `landing` Cloud Storage bucket, impersonating the `load` service account.
- 02: copy the sample data structure definitions to the `orchestration` Cloud Storage bucket, impersonating the `orchestration` service account.
- 03: copy the Cloud Composer DAG to the Cloud Composer storage bucket, impersonating the `orchestration` service account.
- 04: open the Cloud Composer Airflow UI and run the imported DAG.
- 05: run the BigQuery query to see the results.

<!-- BEGIN TFDOC -->

## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [billing_account_id](variables.tf#L17) | Billing account id. | <code>string</code> | ✓ |  |
| [folder_id](variables.tf#L41) | Folder to be used for the networking resources in folders/nnnn format. | <code>string</code> | ✓ |  |
| [organization_domain](variables.tf#L79) | Organization domain. | <code>string</code> | ✓ |  |
| [prefix](variables.tf#L84) | Unique prefix used for resource names. | <code>string</code> | ✓ |  |
| [composer_config](variables.tf#L22) | Cloud Composer config. | <code title="object({ node_count = number airflow_version = string env_variables = map(string) })">object({…})</code> |  | <code title="{ node_count = 3 airflow_version = "composer-1.17.5-airflow-2.1.4" env_variables = {} }">{…}</code> |
| [data_force_destroy](variables.tf#L35) | Flag to set 'force_destroy' on data services like BigQuery or Cloud Storage. | <code>bool</code> |  | <code>false</code> |
| [groups](variables.tf#L46) | Groups. | <code>map(string)</code> |  | <code title="{ data-analysts = "gcp-data-analysts" data-engineers = "gcp-data-engineers" data-security = "gcp-data-security" }">{…}</code> |
| [network_config](variables.tf#L56) | Shared VPC network configurations to use. If null, networks will be created in projects with preconfigured values. | <code title="object({ host_project = string network_self_link = string subnet_self_links = object({ load = string transformation = string orchestration = string }) composer_ip_ranges = object({ cloudsql = string gke_master = string web_server = string }) composer_secondary_ranges = object({ pods = string services = string }) })">object({…})</code> |  | <code>null</code> |
| [project_services](variables.tf#L89) | List of core services enabled on all projects. | <code>list(string)</code> |  | <code title="[ "cloudresourcemanager.googleapis.com", "iam.googleapis.com", "serviceusage.googleapis.com", "stackdriver.googleapis.com" ]">[…]</code> |
| [region](variables.tf#L100) | Region used for regional resources. | <code>string</code> |  | <code>"europe-west1"</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [bigquery-datasets](outputs.tf#L17) | BigQuery datasets. |  |
| [demo_commands](outputs.tf#L93) | Demo commands. |  |
| [gcs-buckets](outputs.tf#L28) | GCS buckets. |  |
| [kms_keys](outputs.tf#L42) | Cloud KMS keys. |  |
| [projects](outputs.tf#L47) | GCP projects information. |  |
| [vpc_network](outputs.tf#L75) | VPC network. |  |
| [vpc_subnet](outputs.tf#L84) | VPC subnetworks. |  |

<!-- END TFDOC -->

## TODOs

Features to add in future releases:

- add support for column-level access on BigQuery
- add example templates for Data Catalog
- add an example of how to use Cloud Data Loss Prevention
- add a solution to handle the lifecycle of tables, views, and authorized views
- add a solution to handle the metadata lifecycle

Fixes:

- Composer requires the "Require OS Login" policy to not be enforced
- external Shared VPC
# See the License for the specific language governing permissions and
# limitations under the License.

# The `impersonate_service_account` option requires the identity running Terraform
# to have the role `roles/iam.serviceAccountTokenCreator` on the specified service account.

terraform {
  backend "gcs" {
    bucket                      = "BUCKET_NAME"
    prefix                      = "PREFIX"
    impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com"
  }
}

provider "google" {
  impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com"
}

provider "google-beta" {
  impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com"
}
# Data ingestion Demo

In this folder you can find an example to ingest data on the data platform instantiated [here](../). See the details in the [README](../#demo-pipeline) to run the demo.
[
  {
    "mode": "REQUIRED",
    "name": "id",
    "type": "INTEGER",
    "description": "ID"
  },
  {
    "mode": "REQUIRED",
    "name": "customer_id",
    "type": "INTEGER",
    "description": "ID"
  },
  {
    "mode": "REQUIRED",
    "name": "purchase_id",
    "type": "INTEGER",
    "description": "ID"
  },
  {
    "mode": "REQUIRED",
    "name": "customer_name",
    "type": "STRING",
    "description": "Name"
  },
  {
    "mode": "REQUIRED",
    "name": "customer_surname",
    "type": "STRING",
    "description": "Surname"
  },
  {
    "mode": "REQUIRED",
    "name": "purchase_item",
    "type": "STRING",
    "description": "Item Name"
  },
  {
    "mode": "REQUIRED",
    "name": "price",
    "type": "FLOAT",
    "description": "Item Price"
  },
  {
    "mode": "REQUIRED",
    "name": "purchase_timestamp",
    "type": "TIMESTAMP",
    "description": "Timestamp"
  }
]
1,Name1,Surname1,1636972001
2,Name2,Surname2,1636972002
3,Name3,Surname3,1636972003
4,Name4,Surname4,1636972004
5,Name5,Surname5,1636972005
6,Name6,Surname6,1636972006
7,Name7,Surname7,1636972007
8,Name8,Surname8,1636972008
9,Name9,Surname9,1636972009
10,Name11,Surname11,1636972010
11,Name12,Surname12,1636972011
12,Name13,Surname13,1636972012
@@ -0,0 +1,26 @@
[
  {
    "mode": "REQUIRED",
    "name": "id",
    "type": "INTEGER",
    "description": "ID"
  },
  {
    "mode": "REQUIRED",
    "name": "name",
    "type": "STRING",
    "description": "Name"
  },
  {
    "mode": "REQUIRED",
    "name": "surname",
    "type": "STRING",
    "description": "Surname"
  },
  {
    "mode": "REQUIRED",
    "name": "timestamp",
    "type": "TIMESTAMP",
    "description": "Timestamp"
  }
]
@@ -0,0 +1,28 @@
{
  "BigQuery Schema": [
    {
      "mode": "REQUIRED",
      "name": "id",
      "type": "INTEGER",
      "description": "ID"
    },
    {
      "mode": "REQUIRED",
      "name": "name",
      "type": "STRING",
      "description": "Name"
    },
    {
      "mode": "REQUIRED",
      "name": "surname",
      "type": "STRING",
      "description": "Surname"
    },
    {
      "mode": "REQUIRED",
      "name": "timestamp",
      "type": "TIMESTAMP",
      "description": "Timestamp"
    }
  ]
}
@@ -0,0 +1,12 @@
function transform(line) {
  var values = line.split(',');

  var obj = new Object();
  obj.id = values[0];
  obj.name = values[1];
  obj.surname = values[2];
  obj.timestamp = values[3];
  var jsonString = JSON.stringify(obj);

  return jsonString;
}
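As a quick sanity check, the line-to-JSON transformation the UDF above performs can be sketched in Python (a hypothetical helper for local testing, not part of the demo files):

```python
import json

def transform(line: str) -> str:
    """Mimic the JavaScript UDF: split a CSV line into named fields
    and serialize them as a JSON string."""
    values = line.split(",")
    obj = {
        "id": values[0],
        "name": values[1],
        "surname": values[2],
        "timestamp": values[3],
    }
    return json.dumps(obj)

# One row from customers.csv:
print(transform("1,Name1,Surname1,1636972001"))
# {"id": "1", "name": "Name1", "surname": "Surname1", "timestamp": "1636972001"}
```

Note that, like the JavaScript original, all fields stay strings; BigQuery coerces them to the declared schema types on load.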
@@ -0,0 +1,20 @@
1,1,Car1,5000,1636972012
1,1,Car1,7000,1636972045
1,2,Car1,6000,1636972088
1,2,Car1,8000,1636972099
1,3,Car1,10000,1636972102
1,3,Car1,50000,1636972180
1,4,Car1,13000,1636972260
1,4,Car1,5000,1636972302
1,5,Car1,2000,1636972408
1,1,Car1,77000,1636972501
1,1,Car1,64000,1636975001
1,8,Car1,2000,1636976001
1,9,Car1,4000,1636977001
1,10,Car1,18000,1636982001
1,11,Car1,21000,1636992001
1,11,Car1,33000,1636932001
1,11,Car1,37000,1636872001
1,11,Car1,26000,1636772001
1,12,Car1,22000,1636672001
1,4,Car1,11000,1636952001
@@ -0,0 +1,32 @@
[
  {
    "mode": "REQUIRED",
    "name": "id",
    "type": "INTEGER",
    "description": "ID"
  },
  {
    "mode": "REQUIRED",
    "name": "customer_id",
    "type": "INTEGER",
    "description": "ID"
  },
  {
    "mode": "REQUIRED",
    "name": "item",
    "type": "STRING",
    "description": "Item Name"
  },
  {
    "mode": "REQUIRED",
    "name": "price",
    "type": "FLOAT",
    "description": "Item Price"
  },
  {
    "mode": "REQUIRED",
    "name": "timestamp",
    "type": "TIMESTAMP",
    "description": "Timestamp"
  }
]
@@ -0,0 +1,34 @@
{
  "BigQuery Schema": [
    {
      "mode": "REQUIRED",
      "name": "id",
      "type": "INTEGER",
      "description": "ID"
    },
    {
      "mode": "REQUIRED",
      "name": "customer_id",
      "type": "INTEGER",
      "description": "ID"
    },
    {
      "mode": "REQUIRED",
      "name": "item",
      "type": "STRING",
      "description": "Item Name"
    },
    {
      "mode": "REQUIRED",
      "name": "price",
      "type": "FLOAT",
      "description": "Item Price"
    },
    {
      "mode": "REQUIRED",
      "name": "timestamp",
      "type": "TIMESTAMP",
      "description": "Timestamp"
    }
  ]
}
@@ -0,0 +1,13 @@
function transform(line) {
  var values = line.split(',');

  var obj = new Object();
  obj.id = values[0];
  obj.customer_id = values[1];
  obj.item = values[2];
  obj.price = values[3];
  obj.timestamp = values[4];
  var jsonString = JSON.stringify(obj);

  return jsonString;
}
@@ -0,0 +1,201 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# --------------------------------------------------------------------------------
# Load the dependencies
# --------------------------------------------------------------------------------

import csv
import datetime
import io
import logging
import os

from airflow import models
from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator
from airflow.operators import dummy
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# --------------------------------------------------------------------------------
# Set variables
# --------------------------------------------------------------------------------
DTL_L0_PRJ = os.environ.get("DTL_L0_PRJ")
DTL_L0_BQ_DATASET = os.environ.get("DTL_L0_BQ_DATASET")
DTL_L0_GCS = os.environ.get("DTL_L0_GCS")
DTL_L1_PRJ = os.environ.get("DTL_L1_PRJ")
DTL_L1_BQ_DATASET = os.environ.get("DTL_L1_BQ_DATASET")
DTL_L1_GCS = os.environ.get("DTL_L1_GCS")
DTL_L2_PRJ = os.environ.get("DTL_L2_PRJ")
DTL_L2_BQ_DATASET = os.environ.get("DTL_L2_BQ_DATASET")
DTL_L2_GCS = os.environ.get("DTL_L2_GCS")
DTL_PLG_PRJ = os.environ.get("DTL_PLG_PRJ")
DTL_PLG_BQ_DATASET = os.environ.get("DTL_PLG_BQ_DATASET")
DTL_PLG_GCS = os.environ.get("DTL_PLG_GCS")
GCP_REGION = os.environ.get("GCP_REGION")
LND_PRJ = os.environ.get("LND_PRJ")
LND_BQ = os.environ.get("LND_BQ")
LND_GCS = os.environ.get("LND_GCS")
LND_PS = os.environ.get("LND_PS")
LOD_PRJ = os.environ.get("LOD_PRJ")
LOD_GCS_STAGING = os.environ.get("LOD_GCS_STAGING")
LOD_NET_VPC = os.environ.get("LOD_NET_VPC")
LOD_NET_SUBNET = os.environ.get("LOD_NET_SUBNET")
LOD_SA_DF = os.environ.get("LOD_SA_DF")
ORC_PRJ = os.environ.get("ORC_PRJ")
ORC_GCS = os.environ.get("ORC_GCS")
TRF_PRJ = os.environ.get("TRF_PRJ")
TRF_GCS_STAGING = os.environ.get("TRF_GCS_STAGING")
TRF_NET_VPC = os.environ.get("TRF_NET_VPC")
TRF_NET_SUBNET = os.environ.get("TRF_NET_SUBNET")
TRF_SA_DF = os.environ.get("TRF_SA_DF")
TRF_SA_BQ = os.environ.get("TRF_SA_BQ")
DF_ZONE = os.environ.get("GCP_REGION") + "-b"
DF_REGION = BQ_REGION = os.environ.get("GCP_REGION")

# --------------------------------------------------------------------------------
# Set default arguments
# --------------------------------------------------------------------------------

# If you are running Airflow in more than one time zone,
# see https://airflow.apache.org/docs/apache-airflow/stable/timezone.html
# for best practices.
yesterday = datetime.datetime.now() - datetime.timedelta(days=1)

default_args = {
    'owner': 'airflow',
    'start_date': yesterday,
    'depends_on_past': False,
    'email': [''],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': datetime.timedelta(minutes=5),
    'dataflow_default_options': {
        'project': LOD_PRJ,
        'location': DF_REGION,
        'zone': DF_ZONE,
        'stagingLocation': LOD_GCS_STAGING,
        'tempLocation': LOD_GCS_STAGING + "/tmp",
        'serviceAccountEmail': LOD_SA_DF,
        'subnetwork': LOD_NET_SUBNET,
        'ipConfiguration': "WORKER_IP_PRIVATE"
    },
}

# --------------------------------------------------------------------------------
# Main DAG
# --------------------------------------------------------------------------------

with models.DAG(
        'data_pipeline_dag',
        default_args=default_args,
        schedule_interval=None) as dag:
    start = dummy.DummyOperator(
        task_id='start',
        trigger_rule='all_success'
    )

    end = dummy.DummyOperator(
        task_id='end',
        trigger_rule='all_success'
    )

    customers_import = DataflowTemplateOperator(
        task_id="dataflow_customer_import",
        template="gs://dataflow-templates/latest/GCS_Text_to_BigQuery",
        parameters={
            "javascriptTextTransformFunctionName": "transform",
            "JSONPath": ORC_GCS + "/customers_schema.json",
            "javascriptTextTransformGcsPath": ORC_GCS + "/customers_udf.js",
            "inputFilePattern": LND_GCS + "/customers.csv",
            "outputTable": DTL_L0_PRJ + ":" + DTL_L0_BQ_DATASET + ".customers",
            "bigQueryLoadingTemporaryDirectory": LOD_GCS_STAGING + "/tmp/bq/",
        },
    )

    purchases_import = DataflowTemplateOperator(
        task_id="dataflow_purchases_import",
        template="gs://dataflow-templates/latest/GCS_Text_to_BigQuery",
        parameters={
            "javascriptTextTransformFunctionName": "transform",
            "JSONPath": ORC_GCS + "/purchases_schema.json",
            "javascriptTextTransformGcsPath": ORC_GCS + "/purchases_udf.js",
            "inputFilePattern": LND_GCS + "/purchases.csv",
            "outputTable": DTL_L0_PRJ + ":" + DTL_L0_BQ_DATASET + ".purchases",
            "bigQueryLoadingTemporaryDirectory": LOD_GCS_STAGING + "/tmp/bq/",
        },
    )

    join_customer_purchase = BigQueryInsertJobOperator(
        task_id='bq_join_customer_purchase',
        gcp_conn_id='bigquery_default',
        project_id=TRF_PRJ,
        location=BQ_REGION,
        configuration={
            'jobType': 'QUERY',
            'query': {
                'query': """SELECT
                    c.id as customer_id,
                    p.id as purchase_id,
                    c.name as name,
                    c.surname as surname,
                    p.item as item,
                    p.price as price,
                    p.timestamp as timestamp
                FROM `{dtl_0_prj}.{dtl_0_dataset}.customers` c
                JOIN `{dtl_0_prj}.{dtl_0_dataset}.purchases` p ON c.id = p.customer_id
                """.format(dtl_0_prj=DTL_L0_PRJ, dtl_0_dataset=DTL_L0_BQ_DATASET),
                'destinationTable': {
                    'projectId': DTL_L1_PRJ,
                    'datasetId': DTL_L1_BQ_DATASET,
                    'tableId': 'customer_purchase'
                },
                'writeDisposition': 'WRITE_TRUNCATE',
                'useLegacySql': False
            }
        },
        impersonation_chain=[TRF_SA_BQ]
    )

    l2_customer_purchase = BigQueryInsertJobOperator(
        task_id='bq_l2_customer_purchase',
        gcp_conn_id='bigquery_default',
        project_id=TRF_PRJ,
        location=BQ_REGION,
        configuration={
            'jobType': 'QUERY',
            'query': {
                'query': """SELECT
                    customer_id,
                    purchase_id,
                    name,
                    surname,
                    item,
                    price,
                    timestamp
                FROM `{dtl_1_prj}.{dtl_1_dataset}.customer_purchase`
                """.format(dtl_1_prj=DTL_L1_PRJ, dtl_1_dataset=DTL_L1_BQ_DATASET),
                'destinationTable': {
                    'projectId': DTL_L2_PRJ,
                    'datasetId': DTL_L2_BQ_DATASET,
                    'tableId': 'customer_purchase'
                },
                'writeDisposition': 'WRITE_TRUNCATE',
                'useLegacySql': False
            }
        },
        impersonation_chain=[TRF_SA_BQ]
    )

    start >> [customers_import, purchases_import] >> join_customer_purchase >> l2_customer_purchase >> end
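The final `>>` chain gives the DAG a fan-out/fan-in shape: the two Dataflow imports run in parallel after `start`, and the BigQuery joins run only after both succeed. A plain-Python sketch of that ordering (no Airflow needed; task ids copied from the DAG above, the topological-sort helper is illustrative only):

```python
# Adjacency list mirroring the DAG wiring: task -> upstream tasks.
deps = {
    "start": [],
    "dataflow_customer_import": ["start"],
    "dataflow_purchases_import": ["start"],
    "bq_join_customer_purchase": ["dataflow_customer_import",
                                  "dataflow_purchases_import"],
    "bq_l2_customer_purchase": ["bq_join_customer_purchase"],
    "end": ["bq_l2_customer_purchase"],
}

def run_order(deps):
    """Topological sort: a task becomes runnable only once all its
    upstream tasks are done, just as the scheduler would order them."""
    done, order = set(), []
    while len(done) < len(deps):
        for task, ups in deps.items():
            if task not in done and all(u in done for u in ups):
                done.add(task)
                order.append(task)
    return order

print(run_order(deps))
```

The join task appears after both imports in any valid ordering, which is exactly the guarantee `start >> [a, b] >> join` encodes.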
3 binary image files added (27 KiB, 20 KiB, 70 KiB), not shown.
@@ -0,0 +1,53 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Core locals.

locals {
  groups = {
    for k, v in var.groups : k => "${v}@${var.organization_domain}"
  }
  groups_iam = {
    for k, v in local.groups : k => "group:${v}"
  }
  service_encryption_keys = var.service_encryption_keys
  shared_vpc_project      = try(var.network_config.host_project, null)
  use_shared_vpc          = var.network_config != null
}

module "shared-vpc-project" {
  source         = "../../../modules/project"
  count          = local.use_shared_vpc ? 1 : 0
  name           = var.network_config.host_project
  project_create = false
  iam_additive = {
    "roles/compute.networkUser" = [
      # load Dataflow service agent and worker service account
      module.load-project.service_accounts.robots.dataflow,
      module.load-sa-df-0.iam_email,
      # orchestration Composer service agents
      module.orch-project.service_accounts.robots.cloudservices,
      module.orch-project.service_accounts.robots.container-engine,
      module.orch-project.service_accounts.robots.dataflow,
    ],
    "roles/composer.sharedVpcAgent" = [
      # orchestration Composer service agent
      module.orch-project.service_accounts.robots.composer
    ],
    "roles/container.hostServiceAgentUser" = [
      # orchestration Composer service agents
      module.orch-project.service_accounts.robots.dataflow,
    ]
  }
}
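The `groups` and `groups_iam` locals above expand short group names into fully qualified IAM members. The same two-step mapping in Python, assuming the `example.com` domain from the sample tfvars (illustration only, not part of the stage):

```python
def group_iam_members(groups, domain):
    """Expand short group names into 'group:<name>@<domain>' IAM members,
    mirroring the groups/groups_iam locals."""
    # step 1: short name -> email (the `groups` local)
    emails = {k: f"{v}@{domain}" for k, v in groups.items()}
    # step 2: email -> IAM member string (the `groups_iam` local)
    return {k: f"group:{v}" for k, v in emails.items()}

print(group_iam_members({"data-analysts": "gcp-data-analysts"}, "example.com"))
# {'data-analysts': 'group:gcp-data-analysts@example.com'}
```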
@@ -0,0 +1,104 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Output variables.

output "bigquery-datasets" {
  description = "BigQuery datasets."
  value = {
    land-bq-0     = module.land-bq-0.dataset_id,
    lake-0-bq-0   = module.lake-0-bq-0.dataset_id,
    lake-1-bq-0   = module.lake-1-bq-0.dataset_id,
    lake-2-bq-0   = module.lake-2-bq-0.dataset_id,
    lake-plg-bq-0 = module.lake-plg-bq-0.dataset_id,
  }
}

output "gcs-buckets" {
  description = "GCS buckets."
  value = {
    lake-0-cs-0   = module.lake-0-cs-0.name,
    lake-1-cs-0   = module.lake-1-cs-0.name,
    lake-2-cs-0   = module.lake-2-cs-0.name,
    lake-plg-cs-0 = module.lake-plg-cs-0.name,
    land-cs-0     = module.land-cs-0.name,
    lod-cs-df     = module.load-cs-df-0.name,
    orch-cs-0     = module.orch-cs-0.name,
    transf-cs-df  = module.transf-cs-df-0.name,
  }
}

output "kms_keys" {
  description = "Cloud KMS keys."
  value       = local.service_encryption_keys
}

output "projects" {
  description = "GCP projects information."
  value = {
    project_number = {
      lake-0         = module.lake-0-project.number,
      lake-1         = module.lake-1-project.number,
      lake-2         = module.lake-2-project.number,
      lake-plg       = module.lake-plg-project.number,
      exposure       = module.exp-project.number,
      landing        = module.land-project.number,
      load           = module.load-project.number,
      orchestration  = module.orch-project.number,
      transformation = module.transf-project.number,
    }
    project_id = {
      lake-0         = module.lake-0-project.project_id,
      lake-1         = module.lake-1-project.project_id,
      lake-2         = module.lake-2-project.project_id,
      lake-plg       = module.lake-plg-project.project_id,
      exposure       = module.exp-project.project_id,
      landing        = module.land-project.project_id,
      load           = module.load-project.project_id,
      orchestration  = module.orch-project.project_id,
      transformation = module.transf-project.project_id,
    }
  }
}

output "vpc_network" {
  description = "VPC network."
  value = {
    load           = local.load_vpc
    orchestration  = local.orch_vpc
    transformation = local.transf_vpc
  }
}

output "vpc_subnet" {
  description = "VPC subnetworks."
  value = {
    load           = local.load_subnet
    orchestration  = local.orch_subnet
    transformation = local.transf_subnet
  }
}

output "demo_commands" {
  description = "Demo commands."
  value = {
    "01" = "gsutil -i ${module.land-sa-cs-0.email} cp demo/data/*.csv gs://${module.land-cs-0.name}"
    "02" = "gsutil -i ${module.orch-sa-cmp-0.email} cp demo/data/*.j* gs://${module.orch-cs-0.name}"
    "03" = "gsutil -i ${module.orch-sa-cmp-0.email} cp demo/*.py ${google_composer_environment.orch-cmp-0.config[0].dag_gcs_prefix}/"
    "04" = "Open ${google_composer_environment.orch-cmp-0.config[0].airflow_uri} and run the uploaded DAG."
    "05" = <<EOT
bq query --project_id=${module.lake-2-project.project_id} --use_legacy_sql=false 'SELECT * FROM `${module.lake-2-project.project_id}.${module.lake-2-bq-0.dataset_id}.customer_purchase` LIMIT 1000'
EOT
  }
}
@@ -0,0 +1,8 @@
prefix = "prefix"
project_create = {
  parent             = "folders/123456789012"
  billing_account_id = "111111-222222-333333"
}
organization = {
  domain = "example.com"
}
@@ -0,0 +1,116 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Terraform variables.

variable "billing_account_id" {
  description = "Billing account id."
  type        = string
}

variable "composer_config" {
  type = object({
    node_count      = number
    airflow_version = string
    env_variables   = map(string)
  })
  default = {
    node_count      = 3
    airflow_version = "composer-1.17.5-airflow-2.1.4"
    env_variables   = {}
  }
}

variable "data_force_destroy" {
  description = "Flag to set 'force_destroy' on data services like BigQuery or Cloud Storage."
  type        = bool
  default     = false
}

variable "folder_id" {
  description = "Folder to be used for the networking resources in folders/nnnn format."
  type        = string
}

variable "groups" {
  description = "Groups."
  type        = map(string)
  default = {
    data-analysts  = "gcp-data-analysts"
    data-engineers = "gcp-data-engineers"
    data-security  = "gcp-data-security"
  }
}

variable "network_config" {
  description = "Shared VPC network configurations to use. If null, networks will be created in projects with preconfigured values."
  type = object({
    host_project      = string
    network_self_link = string
    subnet_self_links = object({
      load           = string
      transformation = string
      orchestration  = string
    })
    composer_ip_ranges = object({
      cloudsql   = string
      gke_master = string
      web_server = string
    })
    composer_secondary_ranges = object({
      pods     = string
      services = string
    })
  })
  default = null
}

variable "organization_domain" {
  description = "Organization domain."
  type        = string
}

variable "prefix" {
  description = "Unique prefix used for resource names."
  type        = string
}

variable "project_services" {
  description = "List of core services enabled on all projects."
  type        = list(string)
  default = [
    "cloudresourcemanager.googleapis.com",
    "iam.googleapis.com",
    "serviceusage.googleapis.com",
    "stackdriver.googleapis.com"
  ]
}

variable "region" {
  description = "Region used for regional resources."
  type        = string
  default     = "europe-west1"
}

variable "service_encryption_keys" {
  description = "Cloud KMS keys to use to encrypt different services. Key location should match service region."
  type = object({
    bq       = string
    composer = string
    dataflow = string
    storage  = string
    pubsub   = string
  })
  default = null
}
@@ -195,6 +195,7 @@ resource "google_organization_iam_binding" "org_admin_delegated" {
     "roles/compute.orgFirewallPolicyAdmin",
     "roles/compute.xpnAdmin",
     "roles/orgpolicy.policyAdmin",
+    module.organization.custom_role_id.serviceProjectNetworkAdmin
   ],
   local.billing_org ? [
     "roles/billing.admin",
|
@ -67,6 +67,16 @@ locals {
|
||||||
billing_account_id = var.billing_account.id
|
billing_account_id = var.billing_account.id
|
||||||
prefix = var.prefix
|
prefix = var.prefix
|
||||||
})
|
})
|
||||||
|
"03-data-platform-dev" = jsonencode({
|
||||||
|
billing_account_id = var.billing_account.id
|
||||||
|
organization = var.organization
|
||||||
|
prefix = var.prefix
|
||||||
|
})
|
||||||
|
"03-data-platform-prod" = jsonencode({
|
||||||
|
billing_account_id = var.billing_account.id
|
||||||
|
organization = var.organization
|
||||||
|
prefix = var.prefix
|
||||||
|
})
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@ -20,6 +20,8 @@ locals {
|
||||||
# used here for convenience, in organization.tf members are explicit
|
# used here for convenience, in organization.tf members are explicit
|
||||||
billing_ext_users = concat(
|
billing_ext_users = concat(
|
||||||
[
|
[
|
||||||
|
module.branch-dp-dev-sa.iam_email,
|
||||||
|
module.branch-dp-prod-sa.iam_email,
|
||||||
module.branch-network-sa.iam_email,
|
module.branch-network-sa.iam_email,
|
||||||
module.branch-security-sa.iam_email,
|
module.branch-security-sa.iam_email,
|
||||||
],
|
],
|
||||||
|
|
|
@@ -0,0 +1,137 @@
/**
 * Copyright 2022 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

# tfdoc:file:description Data Platform stages resources.

# top-level Data Platform folder and service account

module "branch-dp-folder" {
  source = "../../../modules/folder"
  parent = "organizations/${var.organization.id}"
  name   = "Dataplatform"
}

# TODO: check if these modules can be deleted. Should a single data-platform
# stage be created to run both dev and prod?
# module "branch-dp-sa" {
#   source      = "../../../modules/iam-service-account"
#   project_id  = var.automation_project_id
#   name        = "resman-dp-0"
#   description = "Terraform Data Platform production service account."
#   prefix      = local.prefixes.prod
# }

# module "branch-dp-gcs" {
#   source     = "../../../modules/gcs"
#   project_id = var.automation_project_id
#   name       = "dp-0"
#   prefix     = local.prefixes.prod
#   versioning = true
#   iam = {
#     "roles/storage.objectAdmin" = [module.branch-dp-sa.iam_email]
#   }
# }

# environment: development folder

module "branch-dp-dev-folder" {
  source = "../../../modules/folder"
  parent = module.branch-dp-folder.id
  # naming: environment descriptive name
  name = "Data Platform - Development"
  # environment-wide human permissions on the whole Data Platform environment
  group_iam = {}
  iam = {
    # remove owner here and at project level if SA does not manage project resources
    "roles/owner" = [
      module.branch-dp-dev-sa.iam_email
    ]
    "roles/logging.admin" = [
      module.branch-dp-dev-sa.iam_email
    ]
    "roles/resourcemanager.folderAdmin" = [
      module.branch-dp-dev-sa.iam_email
    ]
    "roles/resourcemanager.projectCreator" = [
      module.branch-dp-dev-sa.iam_email
    ]
  }
}

module "branch-dp-dev-sa" {
  source     = "../../../modules/iam-service-account"
  project_id = var.automation_project_id
  name       = "resman-dp-dev-0"
  # naming: environment in description
  description = "Terraform Data Platform development service account."
  prefix      = local.prefixes.dev
}

module "branch-dp-dev-gcs" {
  source     = "../../../modules/gcs"
  project_id = var.automation_project_id
  name       = "resman-dp-0"
  prefix     = local.prefixes.dev
  versioning = true
  iam = {
    "roles/storage.objectAdmin" = [module.branch-dp-dev-sa.iam_email]
  }
}

# environment: production folder

module "branch-dp-prod-folder" {
  source = "../../../modules/folder"
  parent = module.branch-dp-folder.id
  # naming: environment descriptive name
  name = "Data Platform - Production"
  # environment-wide human permissions on the whole Data Platform environment
  group_iam = {}
  iam = {
    # remove owner here and at project level if SA does not manage project resources
    "roles/owner" = [
      module.branch-dp-prod-sa.iam_email
    ]
    "roles/logging.admin" = [
      module.branch-dp-prod-sa.iam_email
    ]
    "roles/resourcemanager.folderAdmin" = [
      module.branch-dp-prod-sa.iam_email
    ]
    "roles/resourcemanager.projectCreator" = [
      module.branch-dp-prod-sa.iam_email
    ]
  }
}

module "branch-dp-prod-sa" {
  source     = "../../../modules/iam-service-account"
  project_id = var.automation_project_id
  name       = "resman-dp-0"
  # naming: environment in description
  description = "Terraform Data Platform production service account."
  prefix      = local.prefixes.prod
}

module "branch-dp-prod-gcs" {
  source     = "../../../modules/gcs"
  project_id = var.automation_project_id
  name       = "resman-dp-0"
  prefix     = local.prefixes.prod
  versioning = true
  iam = {
    "roles/storage.objectAdmin" = [module.branch-dp-prod-sa.iam_email]
  }
}
@@ -18,6 +18,11 @@

 locals {
+  # set to the empty list if you remove the data platform branch
+  branch_dataplatform_pf_sa_iam_emails = [
+    module.branch-dp-dev-sa.iam_email,
+    module.branch-dp-prod-sa.iam_email
+  ]
   # set to the empty list if you remove the teams branch
   branch_teams_pf_sa_iam_emails = [
     module.branch-teams-dev-projectfactory-sa.iam_email,
@ -58,7 +63,10 @@ module "organization" {
|
||||||
"roles/compute.xpnAdmin" = [
|
"roles/compute.xpnAdmin" = [
|
||||||
module.branch-network-sa.iam_email
|
module.branch-network-sa.iam_email
|
||||||
]
|
]
|
||||||
"roles/orgpolicy.policyAdmin" = local.branch_teams_pf_sa_iam_emails
|
"roles/orgpolicy.policyAdmin" = concat(
|
||||||
|
local.branch_dataplatform_pf_sa_iam_emails,
|
||||||
|
local.branch_teams_pf_sa_iam_emails
|
||||||
|
)
|
||||||
},
|
},
|
||||||
local.billing_org ? {
|
local.billing_org ? {
|
||||||
"roles/billing.costsManager" = local.branch_teams_pf_sa_iam_emails
|
"roles/billing.costsManager" = local.branch_teams_pf_sa_iam_emails
|
||||||
|
@ -71,6 +79,7 @@ module "organization" {
|
||||||
# [
|
# [
|
||||||
# for k, v in module.branch-teams-team-sa : v.iam_email
|
# for k, v in module.branch-teams-team-sa : v.iam_email
|
||||||
# ],
|
# ],
|
||||||
|
local.branch_dataplatform_pf_sa_iam_emails,
|
||||||
local.branch_teams_pf_sa_iam_emails
|
local.branch_teams_pf_sa_iam_emails
|
||||||
)
|
)
|
||||||
} : {}
|
} : {}
|
||||||
|
|
|
@@ -15,6 +15,10 @@
 */

 locals {
+  _data_platform_sas = {
+    dev  = module.branch-dp-dev-sa.iam_email
+    prod = module.branch-dp-prod-sa.iam_email
+  }
   _project_factory_sas = {
     dev  = module.branch-teams-dev-projectfactory-sa.iam_email
     prod = module.branch-teams-prod-projectfactory-sa.iam_email
@@ -30,6 +34,16 @@ locals {
       name   = "security"
       sa     = module.branch-security-sa.email
     })
+    "03-data-platform-dev" = templatefile("${path.module}/../../assets/templates/providers.tpl", {
+      bucket = module.branch-dp-dev-gcs.name
+      name   = "dp-dev"
+      sa     = module.branch-dp-dev-sa.email
+    })
+    "03-data-platform-prod" = templatefile("${path.module}/../../assets/templates/providers.tpl", {
+      bucket = module.branch-dp-prod-gcs.name
+      name   = "dp-prod"
+      sa     = module.branch-dp-prod-sa.email
+    })
     "03-project-factory-dev" = templatefile("${path.module}/../../assets/templates/providers.tpl", {
       bucket = module.branch-teams-dev-projectfactory-gcs.name
       name   = "team-dev"
@@ -48,12 +62,14 @@ locals {
   }
   tfvars = {
     "02-networking" = jsonencode({
+      data_platform_sa = local._data_platform_sas
       folder_ids = {
         networking      = module.branch-network-folder.id
         networking-dev  = module.branch-network-dev-folder.id
         networking-prod = module.branch-network-prod-folder.id
       }
       project_factory_sa = local._project_factory_sas
     })
     "02-security" = jsonencode({
       folder_id = module.branch-security-folder.id
@@ -61,6 +77,14 @@ locals {
         for k, v in local._project_factory_sas : k => [v]
       }
     })
+    "03-data-platform-dev" = jsonencode({
+      folder_id        = module.branch-dp-dev-folder.id
+      data_platform_sa = module.branch-dp-dev-sa.iam_email
+    })
+    "03-data-platform-prod" = jsonencode({
+      folder_id        = module.branch-dp-prod-folder.id
+      data_platform_sa = module.branch-dp-prod-sa.iam_email
+    })
   }
 }
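For reference, each entry in the `tfvars` map above is serialized by `jsonencode` into a small JSON tfvars file consumed by the matching stage. As an illustrative sketch (folder id and service account email are placeholder values, not real outputs), the generated `03-data-platform-dev` file would look roughly like:

```json
{
  "folder_id": "folders/123456789012",
  "data_platform_sa": "serviceAccount:myprefix-dev-resman-dp-0@myprefix-prod-iac-core-0.iam.gserviceaccount.com"
}
```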
@@ -0,0 +1,33 @@
# skip boilerplate check

allow-dataflow-load-ingress-traffic:
  description: "Allow traffic on Cloud Dataflow subnet"
  direction: INGRESS
  action: allow
  sources: []
  ranges:
    - 10.10.0.0/24
    - 10.10.1.0/24
  targets: []
  use_service_accounts: false
  rules:
    - protocol: tcp
      ports:
        - 12345
        - 12346

allow-composer-health-checks:
  description: "Allow Health Checks"
  direction: INGRESS
  action: allow
  sources: []
  ranges:
    - 130.211.0.0/22
    - 35.191.0.0/16
  targets: []
  use_service_accounts: false
  rules:
    - protocol: tcp
      ports:
        - 80
        - 443
@@ -0,0 +1,5 @@
# skip boilerplate check

region: europe-west1
description: Default subnet for dev Data Platform - Load layer Dataflow
ip_cidr_range: 10.10.0.0/24
@@ -0,0 +1,8 @@
# skip boilerplate check

region: europe-west1
description: Default subnet for dev Data Platform - Orchestration layer Composer
ip_cidr_range: 172.18.16.0/24
secondary_ip_range:
  pods: 172.18.24.0/22
  services: 172.18.28.0/24
@@ -0,0 +1,5 @@
# skip boilerplate check

region: europe-west1
description: Default subnet for dev Data Platform - Transformation layer Dataflow
ip_cidr_range: 10.10.1.0/24
@@ -0,0 +1,5 @@
# skip boilerplate check

region: europe-west1
description: Default subnet for prod Data Platform - Load layer Dataflow
ip_cidr_range: 10.20.0.0/24
@@ -0,0 +1,8 @@
# skip boilerplate check

region: europe-west1
description: Default subnet for prod Data Platform - Orchestration layer Composer
ip_cidr_range: 10.20.2.0/24
secondary_ip_range:
  pods: 10.20.8.0/22
  services: 10.20.12.0/24
@@ -0,0 +1,5 @@
# skip boilerplate check

region: europe-west1
description: Default subnet for prod Data Platform - Transformation layer Dataflow
ip_cidr_range: 10.20.1.0/24
@@ -89,5 +89,5 @@ module "landing-nat-ew1" {
   router_create  = true
   router_name    = "prod-nat-ew1"
   router_network = module.landing-vpc.name
-  router_asn     = 4200001024
+  router_asn     = 65530
 }
@@ -27,6 +27,30 @@ locals {
       shared_vpc_self_link = module.prod-spoke-vpc.self_link
       vpc_host_project     = module.prod-spoke-project.project_id
     })
+    "03-data-platform-prod" = jsonencode({
+      network_config = {
+        host_project = module.prod-spoke-project.project_id
+        network      = module.prod-spoke-vpc.self_link
+        vpc_subnet_range = {
+          load           = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-lod-ew1"].ip_cidr_range
+          orchestration  = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-orc-ew1"].ip_cidr_range
+          transformation = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-trf-ew1"].ip_cidr_range
+        }
+        vpc_subnet_self_link = {
+          load           = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-lod-ew1"].self_link
+          orchestration  = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-orc-ew1"].self_link
+          transformation = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-trf-ew1"].self_link
+        }
+      }
+    })
   }
 }
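The networking stage therefore hands the full network configuration to the data platform stage via a generated tfvars file. As an illustrative sketch (project name, network name and prefix are placeholders; the CIDR ranges match the subnet factory files in this change), the rendered `03-data-platform-prod` tfvars would look roughly like:

```json
{
  "network_config": {
    "host_project": "myprefix-prod-net-spoke-0",
    "network": "https://www.googleapis.com/compute/v1/projects/myprefix-prod-net-spoke-0/global/networks/prod-spoke-0",
    "vpc_subnet_range": {
      "load": "10.20.0.0/24",
      "orchestration": "10.20.2.0/24",
      "transformation": "10.20.1.0/24"
    },
    "vpc_subnet_self_link": {
      "load": "https://www.googleapis.com/compute/v1/projects/myprefix-prod-net-spoke-0/regions/europe-west1/subnetworks/prod-dp-lod-ew1",
      "orchestration": "https://www.googleapis.com/compute/v1/projects/myprefix-prod-net-spoke-0/regions/europe-west1/subnetworks/prod-dp-orc-ew1",
      "transformation": "https://www.googleapis.com/compute/v1/projects/myprefix-prod-net-spoke-0/regions/europe-west1/subnetworks/prod-dp-trf-ew1"
    }
  }
}
```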
@@ -27,6 +27,7 @@ module "dev-spoke-project" {
     disable_dependent_services = false
   }
   services = [
+    "container.googleapis.com",
     "compute.googleapis.com",
     "dns.googleapis.com",
     "iap.googleapis.com",
@@ -92,7 +93,7 @@ module "dev-spoke-cloudnat" {
   name           = "dev-nat-${local.region_trigram[each.value]}"
   router_create  = true
   router_network = module.dev-spoke-vpc.name
-  router_asn     = 4200001024
+  router_asn     = 65530
   logging_filter = "ERRORS_ONLY"
 }
@@ -112,6 +113,7 @@ resource "google_project_iam_binding" "dev_spoke_project_iam_delegated" {
   project = module.dev-spoke-project.project_id
   role    = "roles/resourcemanager.projectIamAdmin"
   members = [
+    var.data_platform_sa.dev,
     var.project_factory_sa.dev
   ]
   condition {
@@ -92,7 +92,7 @@ module "prod-spoke-cloudnat" {
   name           = "prod-nat-${local.region_trigram[each.value]}"
   router_create  = true
   router_network = module.prod-spoke-vpc.name
-  router_asn     = 4200001024
+  router_asn     = 65530
   logging_filter = "ERRORS_ONLY"
 }
@@ -112,6 +112,7 @@ resource "google_project_iam_binding" "prod_spoke_project_iam_delegated" {
   project = module.prod-spoke-project.project_id
   role    = "roles/resourcemanager.projectIamAdmin"
   members = [
+    var.data_platform_sa.prod,
     var.project_factory_sa.prod
   ]
   condition {
@@ -50,6 +50,13 @@ variable "data_dir" {
   default = "data"
 }

+variable "data_platform_sa" {
+  # tfdoc:variable:source 01-resman
+  description = "IAM emails for Data Platform service accounts."
+  type        = map(string)
+  default     = {}
+}
+
 variable "dns" {
   description = "Onprem DNS resolvers."
   type        = map(list(string))
@@ -0,0 +1,6 @@
# Data Platform

The Data Platform (DP) builds on top of your foundations to create and set up projects (and related resources) to be used for your workloads. It is organized in folders representing environments (e.g. "dev", "prod"), each implemented by a standalone Terraform setup.

This directory contains a single DP environment ([`dev/`](./dev/)) as an example. To implement multiple environments (e.g. "prod" and "dev"), copy the `dev` folder once per environment, then customize variables following the instructions in [`dev/README.md`](./dev/README.md).
@@ -0,0 +1,140 @@
# Data Platform

The Data Platform (DP) builds on top of your foundations to create and set up projects (and related resources) to be used for your data platform.

<p align="center">
  <img src="diagram.png" alt="Data Platform diagram">
</p>

## Design overview and choices

The DP creates projects in a well-defined context, according to your resource management structure. Within the DP folder, resources are organized by environment.

Projects are created for each environment across the different data layers, to keep service account and group roles separate. Roles are assigned at the project level.

The Data Platform takes care of the following activities:

- Project creation
- API/Services enablement
- Service accounts creation
- IAM roles assignment for groups and service accounts
- KMS keys roles assignment
- Shared VPC attachment and subnets IAM binding
- Project-level org policies definition
- Billing setup (billing account attachment and budget configuration)
- Creation of the resources needed in each project by your data platform

You can find more details on the Data Platform design in its [README](../../../examples/data-solutions/data-platform-foundations/).

### User Groups

The DP relies on user groups to assign roles. They provide a stable frame of reference that allows decoupling the final set of permissions for each group from the stage where entities and resources are created and their IAM bindings defined. You can find more detail on the user groups used by the DP [here](../../../examples/data-solutions/data-platform-foundations/#groups).

### Network

The DP relies on the Shared VPC defined in the [02-networking](../../02-networking) stage.

### Encryption

The DP may rely on Cloud KMS crypto keys created by the [02-security](../../02-security) stage.

## How to run this stage

This stage is meant to be executed after the "foundational stages" (i.e., stages [`00-bootstrap`](../../00-bootstrap), [`01-resman`](../../01-resman), [`02-networking`](../../02-networking) and [`02-security`](../../02-security)) have been run.

It's of course possible to run this stage in isolation, by making sure the architectural prerequisites are satisfied (e.g., networking), and that the service account running the stage is granted the roles/permissions below:
- One service account per environment, each with the appropriate permissions:
  - at the organization level, a custom role for networking operations including the following permissions:
    - `compute.organizations.enableXpnResource`
    - `compute.organizations.disableXpnResource`
    - `compute.subnetworks.setIamPolicy`
  - the role `roles/orgpolicy.policyAdmin` at the organization level
  - on each folder where projects are created:
    - `roles/logging.admin`
    - `roles/owner`
    - `roles/resourcemanager.folderAdmin`
    - `roles/resourcemanager.projectCreator`
  - on the host project for the Shared VPC:
    - `roles/browser`
    - `roles/compute.viewer`
- VPC host projects and their subnets should exist when creating projects
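As a sketch of how the folder-level grants above could be wired up when running in isolation, assuming you use this repository's `folder` module with placeholder organization id and service account email:

```hcl
module "dp-folder" {
  source = "../../../modules/folder"
  parent = "organizations/123456789012" # placeholder
  name   = "Data Platform"
  iam = {
    # placeholder service account; use the SA that runs this stage
    "roles/logging.admin"                  = ["serviceAccount:dp-0@iac-project.iam.gserviceaccount.com"]
    "roles/owner"                          = ["serviceAccount:dp-0@iac-project.iam.gserviceaccount.com"]
    "roles/resourcemanager.folderAdmin"    = ["serviceAccount:dp-0@iac-project.iam.gserviceaccount.com"]
    "roles/resourcemanager.projectCreator" = ["serviceAccount:dp-0@iac-project.iam.gserviceaccount.com"]
  }
}
```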
### Providers configuration

If you're running this on top of FAST, run the following commands to create the providers file, and populate the required variables from the previous stage.

```bash
# variable `outputs_location` is set to `../../../config` in stage 01-resman
cd fabric-fast/stages/03-data-platform/dev
ln -s ../../../config/03-data-platform-dev/providers.tf
```

### Variable configuration

There are two broad sets of variables you will need to fill in:

- variables shared by other stages (org id, billing account id, etc.), or derived from a resource managed by a different stage (folder id, automation project id, etc.)
- variables specific to resources managed by this stage

To avoid the tedious job of filling in the first group of variables with values derived from other stages' outputs, the same mechanism used above for the provider configuration can be used to leverage pre-configured `.tfvars` files.

If you configured a valid path for `outputs_location` in the bootstrap and networking stages, simply link the relevant `terraform-*.auto.tfvars.json` files from this stage's outputs folder (under the path you specified), where the `*` above is set to the name of the stage that produced it. For this stage, the following `.tfvars` files are available:

```bash
# variable `outputs_location` is set to `../../../config` in stages 00-bootstrap and 02-networking
ln -s ../../../config/03-data-platform-dev/terraform-bootstrap.auto.tfvars.json
ln -s ../../../config/03-data-platform-dev/terraform-networking.auto.tfvars.json
```

If you're not using FAST, refer to the [Variables](#variables) table at the bottom of this document for a full list of variables, their origin (e.g., a stage or specific to this one), and descriptions explaining their meaning.

Once the configuration is complete, run this stage with:

```bash
terraform init
terraform apply
```
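If you're not linking the generated files, a minimal hand-written `terraform.tfvars` for this stage could look like the sketch below (all values are placeholders; the `network_config` shape follows this stage's `variables.tf`):

```hcl
billing_account_id  = "012345-67890A-BCDEF0"
folder_id           = "folders/123456789012"
organization_domain = "example.com"
prefix              = "myprefix"
network_config = {
  host_project      = "myprefix-dev-net-spoke-0"
  network_self_link = "https://www.googleapis.com/compute/v1/projects/myprefix-dev-net-spoke-0/global/networks/dev-spoke-0"
  subnet_self_links = {
    load           = "https://www.googleapis.com/compute/v1/projects/myprefix-dev-net-spoke-0/regions/europe-west1/subnetworks/dev-dp-lod-ew1"
    orchestration  = "https://www.googleapis.com/compute/v1/projects/myprefix-dev-net-spoke-0/regions/europe-west1/subnetworks/dev-dp-orc-ew1"
    transformation = "https://www.googleapis.com/compute/v1/projects/myprefix-dev-net-spoke-0/regions/europe-west1/subnetworks/dev-dp-trf-ew1"
  }
}
```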
<!-- TFDOC OPTS files:1 show_extra:1 -->
<!-- BEGIN TFDOC -->

## Files

| name | description | modules | resources |
|---|---|---|---|
| [main.tf](./main.tf) | Data Platform. | <code>data-platform-foundations</code> |  |
| [outputs.tf](./outputs.tf) | Output variables. |  | <code>local_file</code> |
| [providers.tf](./providers.tf) | Provider configurations. |  |  |
| [variables.tf](./variables.tf) | Terraform Variables. |  |  |

## Variables

| name | description | type | required | default | producer |
|---|---|:---:|:---:|:---:|:---:|
| [billing_account_id](variables.tf#L17) | Billing account id. | <code>string</code> | ✓ |  | <code>00-bootstrap</code> |
| [folder_id](variables.tf#L66) | Folder to be used for the networking resources in folders/nnnn format. | <code>string</code> | ✓ |  | <code>resman</code> |
| [network_config](variables.tf#L94) | Network configurations to use. Specify a shared VPC to use, if null networks will be created in projects. | <code title="object({ host_project = string network = string vpc_subnet_self_link = object({ load = string transformation = string orchestration = string }) })">object({…})</code> | ✓ |  |  |
| [organization](variables.tf#L107) | Organization details. | <code title="object({ domain = string id = number customer_id = string })">object({…})</code> | ✓ |  | <code>00-bootstrap</code> |
| [prefix](variables.tf#L123) | Unique prefix used for resource names. Not used for projects if 'project_create' is null. | <code>string</code> | ✓ |  | <code>00-bootstrap</code> |
| [composer_config](variables.tf#L23) |  | <code title="object({ node_count = number ip_range_cloudsql = string ip_range_gke_master = string ip_range_web_server = string project_policy_boolean = map(bool) region = string ip_allocation_policy = object({ use_ip_aliases = string cluster_secondary_range_name = string services_secondary_range_name = string }) })">object({…})</code> |  | <code title="{ node_count = 3 ip_range_cloudsql = &quot;172.18.29.0/24&quot; ip_range_gke_master = &quot;172.18.30.0/28&quot; ip_range_web_server = &quot;172.18.30.16/28&quot; project_policy_boolean = { &quot;constraints/compute.requireOsLogin&quot; = true } region = &quot;europe-west1&quot; ip_allocation_policy = { use_ip_aliases = &quot;true&quot; cluster_secondary_range_name = &quot;pods&quot; services_secondary_range_name = &quot;services&quot; } }">{…}</code> |  |
| [data_force_destroy](variables.tf#L54) | Flag to set 'force_destroy' on data services like BigQuery or Cloud Storage. | <code>bool</code> |  | <code>false</code> |  |
| [enable_cloud_nat](variables.tf#L60) | Network Cloud NAT flag. | <code>bool</code> |  | <code>false</code> |  |
| [groups](variables.tf#L72) | Groups. | <code>map(string)</code> |  | <code title="{ data-analysts = &quot;gcp-data-analysts&quot; data-engineers = &quot;gcp-data-engineers&quot; data-security = &quot;gcp-data-security&quot; }">{…}</code> |  |
| [location_config](variables.tf#L82) | Locations where resources will be deployed. Map to configure region and multiregion specs. | <code title="object({ region = string multi_region = string })">object({…})</code> |  | <code title="{ region = &quot;europe-west1&quot; multi_region = &quot;eu&quot; }">{…}</code> |  |
| [outputs_location](variables.tf#L117) | Path where providers, tfvars files, and lists for the following stages are written. Leave empty to disable. | <code>string</code> |  | <code>null</code> |  |
| [project_id](variables.tf#L129) | Project id, references existing project if `project_create` is null. | <code title="object({ landing = string load = string orchestration = string trasformation = string datalake-l0 = string datalake-l1 = string datalake-l2 = string datalake-playground = string common = string exposure = string })">object({…})</code> |  | <code title="{ landing = &quot;lnd&quot; load = &quot;lod&quot; orchestration = &quot;orc&quot; trasformation = &quot;trf&quot; datalake-l0 = &quot;dtl-0&quot; datalake-l1 = &quot;dtl-1&quot; datalake-l2 = &quot;dtl-2&quot; datalake-playground = &quot;dtl-plg&quot; common = &quot;cmn&quot; exposure = &quot;exp&quot; }">{…}</code> |  |
| [project_services](variables.tf#L157) | List of core services enabled on all projects. | <code>list(string)</code> |  | <code title="[ &quot;cloudresourcemanager.googleapis.com&quot;, &quot;iam.googleapis.com&quot;, &quot;serviceusage.googleapis.com&quot;, &quot;stackdriver.googleapis.com&quot; ]">[…]</code> |  |

## Outputs

| name | description | sensitive | consumers |
|---|---|:---:|---|
| [bigquery_datasets](outputs.tf#L35) | BigQuery datasets. |  |  |
| [demo_commands](outputs.tf#L65) | Demo commands. |  |  |
| [gcs_buckets](outputs.tf#L40) | GCS buckets. |  |  |
| [kms_keys](outputs.tf#L45) | Cloud KMS keys. |  |  |
| [projects](outputs.tf#L50) | GCP projects information. |  |  |
| [vpc_network](outputs.tf#L55) | VPC network. |  |  |
| [vpc_subnet](outputs.tf#L60) | VPC subnetworks. |  |  |

<!-- END TFDOC -->
@@ -0,0 +1,39 @@
/**
 * Copyright 2022 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

# tfdoc:file:description Data Platform.

locals {
  _network_config = merge(
    var.network_config_composer,
    var.network_config
  )
}

module "data-platform" {
  source                  = "../../../../examples/data-solutions/data-platform-foundations"
  billing_account_id      = var.billing_account_id
  composer_config         = var.composer_config
  data_force_destroy      = var.data_force_destroy
  folder_id               = var.folder_id
  groups                  = var.groups
  network_config          = local._network_config
  organization_domain     = var.organization_domain
  prefix                  = var.prefix
  project_services        = var.project_services
  region                  = var.region
  service_encryption_keys = var.service_encryption_keys
}
@@ -0,0 +1,61 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# tfdoc:file:description Output variables.

locals {
  tfvars = {}
}

resource "local_file" "tfvars" {
  for_each = var.outputs_location == null ? {} : local.tfvars
  filename = "${var.outputs_location}/${each.key}/terraform-dataplatform-dev.auto.tfvars.json"
  content  = each.value
}

# outputs

output "bigquery_datasets" {
  description = "BigQuery datasets."
  value       = module.data-platform.bigquery-datasets
}

output "gcs_buckets" {
  description = "GCS buckets."
  value       = module.data-platform.gcs-buckets
}

output "kms_keys" {
  description = "Cloud KMS keys."
  value       = module.data-platform.kms_keys
}

output "projects" {
  description = "GCP projects information."
  value       = module.data-platform.projects
}

output "vpc_network" {
  description = "VPC network."
  value       = module.data-platform.vpc_network
}

output "vpc_subnet" {
  description = "VPC subnetworks."
  value       = module.data-platform.vpc_subnet
}

output "demo_commands" {
  description = "Demo commands."
  value       = module.data-platform.demo_commands
}
@ -0,0 +1,141 @@
|
||||||
|
# Copyright 2022 Google LLC
|
||||||
|
#
|
||||||
|
# Licensed under the Apache License, Version 2.0 (the "License");
|
||||||
|
# you may not use this file except in compliance with the License.
|
||||||
|
# You may obtain a copy of the License at
|
||||||
|
#
|
||||||
|
# https://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
#
|
||||||
|
# Unless required by applicable law or agreed to in writing, software
|
||||||
|
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
# See the License for the specific language governing permissions and
|
||||||
|
# limitations under the License.
|
||||||
|
|
||||||
|
# tfdoc:file:description Terraform Variables.
|
||||||
|
|
||||||
|
variable "billing_account_id" {
|
||||||
|
# tfdoc:variable:source 00-bootstrap
|
||||||
|
description = "Billing account id."
|
||||||
|
type = string
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "composer_config" {
|
||||||
|
type = object({
|
||||||
|
node_count = number
|
||||||
|
airflow_version = string
|
||||||
|
env_variables = map(string)
|
||||||
|
})
|
||||||
|
default = {
|
||||||
|
node_count = 3
|
||||||
|
airflow_version = "composer-1.17.5-airflow-2.1.4"
|
||||||
|
env_variables = {}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "data_force_destroy" {
|
||||||
|
description = "Flag to set 'force_destroy' on data services like BiguQery or Cloud Storage."
|
||||||
|
type = bool
|
||||||
|
default = false
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "folder_id" {
|
||||||
|
# tfdoc:variable:source resman
|
||||||
|
description = "Folder to be used for the networking resources in folders/nnnn format."
|
||||||
|
type = string
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "groups" {
|
||||||
|
description = "Groups."
|
||||||
|
type = map(string)
|
||||||
|
default = {
|
||||||
|
data-analysts = "gcp-data-analysts"
|
||||||
|
data-engineers = "gcp-data-engineers"
|
||||||
|
data-security = "gcp-data-security"
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
variable "network_config" {
|
||||||
|
description = "Network configurations to use. Specify a shared VPC to use, if null networks will be created in projects."
|
||||||
|
type = object({
|
||||||
|
host_project = string
|
||||||
|
network_self_link = string
|
||||||
|
subnet_self_links = object({
|
||||||
|
      load           = string
      transformation = string
      orchestration  = string
    })
  })
}

variable "network_config_composer" {
  description = "Network configurations to use for Composer."
  type = object({
    composer_ip_ranges = object({
      cloudsql   = string
      gke_master = string
      web_server = string
    })
    composer_secondary_ranges = object({
      pods     = string
      services = string
    })
  })
  default = {
    composer_ip_ranges = {
      cloudsql   = "172.18.29.0/24"
      gke_master = "172.18.30.0/28"
      web_server = "172.18.30.16/28"
    }
    composer_secondary_ranges = {
      pods     = "pods"
      services = "services"
    }
  }
}

variable "organization_domain" {
  description = "Organization domain."
  type        = string
}

variable "outputs_location" {
  description = "Path where providers, tfvars files, and lists for the following stages are written. Leave empty to disable."
  type        = string
  default     = null
}

variable "prefix" {
  # tfdoc:variable:source 00-bootstrap
  description = "Unique prefix used for resource names. Not used for projects if 'project_create' is null."
  type        = string
}

variable "project_services" {
  description = "List of core services enabled on all projects."
  type        = list(string)
  default = [
    "cloudresourcemanager.googleapis.com",
    "iam.googleapis.com",
    "serviceusage.googleapis.com",
    "stackdriver.googleapis.com"
  ]
}

variable "region" {
  description = "Region used for regional resources."
  type        = string
  default     = "europe-west1"
}

variable "service_encryption_keys" { # service encryption keys
  description = "Cloud KMS keys to use to encrypt different services. Key location should match service region."
  type = object({
    bq       = string
    composer = string
    dataflow = string
    storage  = string
    pubsub   = string
  })
  default = null
}
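For context, a minimal `terraform.tfvars` for this stage might look like the following. This is a sketch rather than repo content: the required variable names and sample IDs are taken from the variables above and from the test fixture, while the KMS project, keyring, and key names are hypothetical placeholders; variables with defaults (`region`, `project_services`, `network_config_composer`) can be omitted.

```hcl
organization_domain = "example.com"
billing_account_id  = "123456-123456-123456"
folder_id           = "folders/12345678"
prefix              = "myco"

# Optional CMEK configuration; key locations should match var.region.
# The project/keyring/key paths below are placeholders.
service_encryption_keys = {
  bq       = "projects/kms-prj/locations/europe-west1/keyRings/kr/cryptoKeys/bq"
  composer = "projects/kms-prj/locations/europe-west1/keyRings/kr/cryptoKeys/composer"
  dataflow = "projects/kms-prj/locations/europe-west1/keyRings/kr/cryptoKeys/dataflow"
  storage  = "projects/kms-prj/locations/europe-west1/keyRings/kr/cryptoKeys/storage"
  pubsub   = "projects/kms-prj/locations/europe-west1/keyRings/kr/cryptoKeys/pubsub"
}
```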
@@ -17,8 +17,8 @@ Refer to each stage's documentation for a detailed description of its purpose, t

 - [Security](02-security/README.md)
   Manages centralized security configurations in a separate stage, and is typically owned by the security team. This stage implements VPC Security Controls via separate perimeters for environments and central services, and creates projects to host centralized KMS keys used by the whole organization. It's meant to be easily extended to include other security-related resources which are required, like Secret Manager.
-- Networking ([VPN](02-networking-vpn/README.md)/[NVA](02-networking-nva/README.md))
-  Manages centralized network resources in a separate stage, and is typically owned by the networking team. This stage implements a hub-and-spoke design, and includes connectivity via VPN to on-premises, and YAML-based factories for firewall rules (hierarchical and VPC-level) and subnets. It's currently available in two versions: [spokes connected via VPN](02-networking-vpn/README.md), [and spokes connected via appliances](02-networking-nva/README.md).
+- [Networking](02-networking/README.md)
+  Manages centralized network resources in a separate stage, and is typically owned by the networking team. This stage implements a hub-and-spoke design, and includes connectivity via VPN to on-premises, and YAML-based factories for firewall rules (hierarchical and VPC-level) and subnets.

 ## Environment-level resources (03)
@@ -14,13 +14,10 @@
  * limitations under the License.
  */

-module "test-environment" {
-  source             = "../../../../../examples/data-solutions/data-platform-foundations/01-environment"
-  billing_account_id = var.billing_account
-  root_node          = var.root_node
-}
-
-module "test-resources" {
-  source      = "../../../../../examples/data-solutions/data-platform-foundations/02-resources"
-  project_ids = module.test-environment.project_ids
-}
+module "test" {
+  source              = "../../../../../examples/data-solutions/data-platform-foundations/"
+  organization_domain = "example.com"
+  billing_account_id  = "123456-123456-123456"
+  folder_id           = "folders/12345678"
+  prefix              = "prefix"
+}
@@ -1,26 +0,0 @@
-/**
- * Copyright 2022 Google LLC
- *
- * Licensed under the Apache License, Version 2.0 (the "License");
- * you may not use this file except in compliance with the License.
- * You may obtain a copy of the License at
- *
- *      http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-variable "billing_account" {
-  type    = string
-  default = "123456-123456-123456"
-}
-
-variable "root_node" {
-  description = "The resource name of the parent Folder or Organization. Must be of the form folders/folder_id or organizations/org_id."
-  type        = string
-  default     = "folders/12345678"
-}
@@ -12,8 +12,16 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

+import os
+
+import pytest
+
+FIXTURES_DIR = os.path.join(os.path.dirname(__file__), 'fixture')
+
+
 def test_resources(e2e_plan_runner):
   "Test that plan works and the numbers of resources is as expected."
-  modules, resources = e2e_plan_runner()
-  assert len(modules) == 6
-  assert len(resources) == 53
+  modules, resources = e2e_plan_runner(FIXTURES_DIR)
+  assert len(modules) == 40
+  assert len(resources) == 287
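The `e2e_plan_runner` fixture above is defined in the repository's test `conftest.py`: it runs a plan on the fixture directory and returns the modules and resources found in it, so the test just checks their counts. As a rough illustration only (the helper below is hypothetical, not the repo's actual fixture), tallying modules and resources from Terraform's documented JSON plan representation can be sketched as:

```python
# Hypothetical sketch of the counting step behind an e2e plan runner.
# Terraform's JSON plan nests modules under planned_values.root_module,
# each with optional "resources" and "child_modules" lists.

def count_modules_and_resources(plan):
    """Walk root_module and its child_modules, collecting addresses."""
    modules, resources = [], []
    stack = [plan["planned_values"]["root_module"]]
    while stack:
        mod = stack.pop()
        if mod.get("address"):  # the root module itself has no address
            modules.append(mod["address"])
        resources.extend(r["address"] for r in mod.get("resources", []))
        stack.extend(mod.get("child_modules", []))
    return modules, resources

# Tiny hand-written plan fragment for illustration.
plan = {
    "planned_values": {
        "root_module": {
            "resources": [],
            "child_modules": [
                {
                    "address": "module.test",
                    "resources": [{"address": "module.test.google_project.p"}],
                    "child_modules": [],
                }
            ],
        }
    }
}

modules, resources = count_modules_and_resources(plan)
print(len(modules), len(resources))  # prints: 1 1
```

In a real run the plan would come from `terraform show -json` on the fixture directory, and the counts would match the assertions in the test above.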