Add Data Platform to FAST (#510)

* Import Fast from dev repository.
Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>

* Import Fast from dev repository.
Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>

* merge tools changes

* Import Fast from dev repository.
Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>

* add boilerplate to validate_schema

Co-authored-by: Julio Castillo <juliocc@users.noreply.github.com>

* stage 02-security

* Import Fast from dev repository.

Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>

* Copy FAST top level README

* Copy FAST top level README

* TODO list

* TODO list

* fix linting action to account for fast

* remove providers file

* add missing boilerplate

* update factory README

* align examples tfdoc

* fast readmes tfdoc

* disable markdown link check

* really disable markdown link check

* update TODO

* switch to local module refs in stage0

* replace module refs in 02-sec

* Import Fast from dev repository.
Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>

* merge tools changes

* Import Fast from dev repository.
Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>

* add boilerplate to validate_schema

Co-authored-by: Julio Castillo <juliocc@users.noreply.github.com>

* Import Fast from dev repository.
Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>

* stage 02-security

* Import Fast from dev repository.

Co-authored-by: Julio Castillo <jccb@google.com>
Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>

* Copy FAST top level README

* Copy FAST top level README

* TODO list

* TODO list

* fix linting action to account for fast

* remove providers file

* add missing boilerplate

* update factory README

* align examples tfdoc

* fast readmes tfdoc

* disable markdown link check

* really disable markdown link check

* update TODO

* switch to local module refs in stage0

* replace module refs in 02-sec

* Move first draft to fast branch

* Fix roles and variables. Add e2e DAG example!

* Fix example

* Fix KMS

* First draft: README

* Update README

* Add DLP, update README

* Update Readme

* README

* Add todos

* Merge master

* Merge master

* Merge master

* Fix and test KMS, Fix and test existing prj (it works also with single prj), Update README

* Fix README and Demo

* add  on TF files

* Remove block comments

* simplify service_encryption_keys logic

* fix README

* Fix TODOs

* fix tfdoc description

* fix demo README

* fix sample files

* rename tf files

* Fix outputs file name, fix README, remove dependencies on composer resource

* Add test.

* Fix README.

* Initial README update

* README review

* Fix issues & readme

* Fix README

* Fix README

* Fix test error

* Fix test error

* Add datacatalog

* Fix test, for real? :-)

* fix readme

* support policy_boolean

* split Cloud NAT flag

* Fix README.

* Fix Shared VPC, first try :-)

* Fix tests and resource name

* fix tests

* fix tests

* README refactor

* Fix secondary range logic

* First commit

* Replace existing data platform

* Fix secondary range logic

* Fix README

* Replace DP example tests with the new one.

* Fix test module location.

* Fix test module location, for real.

* Support DataPlatform project in VPC-SC

* Fix VPC-SC

* Add TODO, VPC-SC

* Possible improvement to handle VPC-SC perimeter projects with folder as variable

* Add TODO

* Fix module path

* Initial fix for KMS

* Add PubSub encryption

* Fix secondary range logic

* First commit

* Support DataPlatform project in VPC-SC

* Fix VPC-SC

* Add TODO, VPC-SC

* Possible improvement to handle VPC-SC perimeter projects with folder as variable

* Add TODO

* Fix module path

* Initial fix for KMS

* Update READMEs

* Update README

* Fix composer roles and README.

* Fix test.

* Fixes.

* Add DLP documentation link.

* Temp commit with errors

* Refactor variables

* Fix secondary range logic

* First commit

* Support DataPlatform project in VPC-SC

* Fix VPC-SC

* Add TODO, VPC-SC

* Possible improvement to handle VPC-SC perimeter projects with folder as variable

* Add TODO

* Fix module path

* Initial fix for KMS

* rebase

* rebase

* rebase

* Rebase

* rebase

* Update READMEs

* Fixes.

* Fix new variables

* Fix misconfiguration and tests.

* Fix secondary range logic

* First commit

* Support DataPlatform project in VPC-SC

* Fix VPC-SC

* Add TODO, VPC-SC

* Possible improvement to handle VPC-SC perimeter projects with folder as variable

* Add TODO

* Fix module path

* Initial fix for KMS

* rebase

* rebase

* rebase

* Rebase

* rebase

* Update READMEs

* Fixes.

* Rebase - Fix secondary range logic

* Rebase - First commit

* Support DataPlatform project in VPC-SC

* Fix VPC-SC

* Possible improvement to handle VPC-SC perimeter projects with folder as variable

* Initial fix for KMS

* Fix secondary range logic

* First commit

* Support DataPlatform project in VPC-SC

* Fix VPC-SC

* Fix module path

* Initial fix for KMS

* Update READMEs

* Fixes.

* Fix new variables

* Revert VPC-SC logic

* Fix variable typos

* README fixes

* Fix Project Name logic

* Fix Linting

* README

* update README

* update README

* update README

* mandatory project creation, refactor

* formatting

* add TODO for service accounts descriptive name

* use project module to assign shared vpc roles

* Fix shared-vpc-project module

* Fix vpc name and tests

* README

* update to newer version

Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Simone Ruffilli <sruffilli@google.com>
Co-authored-by: Julio Castillo <juliocc@users.noreply.github.com>
Co-authored-by: Julio Castillo <jccb@google.com>
lcaggio 2022-02-11 17:32:16 +01:00 committed by GitHub
parent 9076c2f2b0
commit bf64a3dfda
73 changed files with 2732 additions and 1288 deletions

View File

@ -5,7 +5,7 @@ This section contains **[foundational examples](./foundations/)** that bootstrap
Currently available examples:
- **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](./cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Granular Cloud DNS IAM for Shared VPC](./cloud-operations/dns-shared-vpc), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Packer image builder](./cloud-operations/packer-image-builder), [On-prem SA key management](./cloud-operations/onprem-sa-key-management)
- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/gcs-to-bq-with-least-privileges/), [Cloud Storage to Bigquery with Cloud Dataflow with least privileges](./data-solutions/gcs-to-bq-with-least-privileges/)
- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/gcs-to-bq-with-least-privileges/), [Cloud Storage to Bigquery with Cloud Dataflow with least privileges](./data-solutions/gcs-to-bq-with-least-privileges/), [Data Platform Foundations](./data-solutions/data-platform-foundations/)
- **factories** - [The why and the how of resource factories](./factories/README.md)
- **foundations** - [single level hierarchy](./foundations/environments/) (environments), [multiple level hierarchy](./foundations/business-units/) (business units + environments)
- **networking** - [hub and spoke via peering](./networking/hub-and-spoke-peering/), [hub and spoke via VPN](./networking/hub-and-spoke-vpn/), [DNS and Google Private Access for on-premises](./networking/onprem-google-access-dns/), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [ILB as next hop](./networking/ilb-next-hop), [PSC for on-premises Cloud Function invocation](./networking/private-cloud-function-from-onprem/), [decentralized firewall](./networking/decentralized-firewall)

View File

@ -18,6 +18,6 @@ They are meant to be used as minimal but complete starting points to create actu
### Data Platform Foundations
<a href="./data-platform-foundations/" title="Data Platform Foundations"><img src="./data-platform-foundations/02-resources/diagram.png" align="left" width="280px"></a>
<a href="./data-platform-foundations/" title="Data Platform Foundations"><img src="./data-platform-foundations/images/overview_diagram.png" align="left" width="280px"></a>
This [example](./data-platform-foundations/) implements a robust and flexible Data Foundation on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.
<br clear="left">

View File

@ -1,72 +0,0 @@
# Data Platform Foundations - Environment (Step 1)
This is the first step needed to deploy Data Platform Foundations, which creates projects and service accounts. Please refer to the [top-level Data Platform README](../README.md) for prerequisites.
The projects that will be created are:
- Common services
- Landing
- Orchestration & Transformation
- DWH
- Datamart
A main service account named `projects-editor-sa` will be created under the common services project, and it will be granted editor permissions on all the projects in scope.
This is a high level diagram of the created resources:
![Environment - Phase 1](./diagram.png "High-level Environment diagram")
## Running the example
To create the infrastructure:
- specify your variables in a `terraform.tfvars`
```hcl
billing_account = "1234-1234-1234"
parent = "folders/12345678"
admins = ["user:xxxxx@yyyyy.com"]
```
- make sure you have the right authentication setup (application default credentials, or a service account key) with the right permissions
- **The output of this stage contains the values for the resources stage**
- the `admins` variable contains a list of principals allowed to impersonate the service accounts. These principals will be given the `iam.serviceAccountTokenCreator` role
- run `terraform init` and `terraform apply`
Once done testing, you can clean up resources by running `terraform destroy`.
### CMEK configuration
You can configure GCP resources to use existing CMEK keys by configuring the `service_encryption_key_ids` variable. You need to specify a 'global' and a 'multiregional' key.
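For example, a minimal `terraform.tfvars` entry could look like the following sketch (the project, key ring, and key names are illustrative placeholders):
```hcl
# terraform.tfvars - illustrative Cloud KMS key IDs, replace with keys from your own project
service_encryption_key_ids = {
  multiregional = "projects/my-kms-project/locations/eu/keyRings/my-keyring/cryptoKeys/my-mr-key"
  global        = "projects/my-kms-project/locations/global/keyRings/my-keyring/cryptoKeys/my-global-key"
}
```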
### VPC-SC configuration
You can assign projects to an existing VPC-SC standard perimeter by configuring the `service_perimeter_standard` variable. You can retrieve the list of existing perimeters from the GCP console or with the following command:
```bash
gcloud access-context-manager perimeters list --format="json" | grep name
```
The script uses the `google_access_context_manager_service_perimeter_resource` Terraform resource. If this resource is used alongside the `vpc-sc` module, remember to uncomment the lifecycle block in the `vpc-sc` module so they don't fight over which resources should be in the perimeter.
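As an illustrative sketch, the perimeter can then be passed in `terraform.tfvars`; the access policy number and perimeter name below are placeholders:
```hcl
# terraform.tfvars - placeholder access policy and perimeter names, replace with your own
service_perimeter_standard = "accessPolicies/123456789/servicePerimeters/default_perimeter"
```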
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [billing_account_id](variables.tf#L21) | Billing account id. | <code>string</code> | ✓ | |
| [root_node](variables.tf#L50) | Parent folder or organization in 'folders/folder_id' or 'organizations/org_id' format. | <code>string</code> | ✓ | |
| [admins](variables.tf#L15) | List of users allowed to impersonate the service account. | <code>list&#40;string&#41;</code> | | <code>null</code> |
| [prefix](variables.tf#L26) | Prefix used to generate project id and name. | <code>string</code> | | <code>null</code> |
| [project_names](variables.tf#L32) | Override this variable if you need non-standard names. | <code title="object&#40;&#123;&#10; datamart &#61; string&#10; dwh &#61; string&#10; landing &#61; string&#10; services &#61; string&#10; transformation &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; datamart &#61; &#34;datamart&#34;&#10; dwh &#61; &#34;datawh&#34;&#10; landing &#61; &#34;landing&#34;&#10; services &#61; &#34;services&#34;&#10; transformation &#61; &#34;transformation&#34;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [service_account_names](variables.tf#L55) | Override this variable if you need non-standard names. | <code title="object&#40;&#123;&#10; main &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; main &#61; &#34;data-platform-main&#34;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [service_encryption_key_ids](variables.tf#L65) | Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project. | <code title="object&#40;&#123;&#10; multiregional &#61; string&#10; global &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; multiregional &#61; null&#10; global &#61; null&#10;&#125;">&#123;&#8230;&#125;</code> |
| [service_perimeter_standard](variables.tf#L78) | VPC Service control standard perimeter name in the form of 'accessPolicies/ACCESS_POLICY_NAME/servicePerimeters/PERIMETER_NAME'. All projects will be added to the perimeter in enforced mode. | <code>string</code> | | <code>null</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [project_ids](outputs.tf#L17) | Project ids for created projects. | |
| [service_account](outputs.tf#L28) | Main service account. | |
| [service_encryption_key_ids](outputs.tf#L33) | Cloud KMS encryption keys in {LOCATION => [KEY_URL]} format. | |
<!-- END TFDOC -->

Binary file not shown (before: 275 KiB).

View File

@ -1,162 +0,0 @@
/**
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
###############################################################################
# projects #
###############################################################################
module "project-datamart" {
source = "../../../../modules/project"
parent = var.root_node
billing_account = var.billing_account_id
prefix = var.prefix
name = var.project_names.datamart
services = [
"bigquery.googleapis.com",
"bigquerystorage.googleapis.com",
"bigqueryreservation.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com",
]
iam_additive = {
"roles/owner" = [module.sa-services-main.iam_email]
}
service_encryption_key_ids = {
bq = [var.service_encryption_key_ids.multiregional]
storage = [var.service_encryption_key_ids.multiregional]
}
# If used, remember to uncomment 'lifecycle' block in the
# modules/vpc-sc/google_access_context_manager_service_perimeter resource.
service_perimeter_standard = var.service_perimeter_standard
}
module "project-dwh" {
source = "../../../../modules/project"
parent = var.root_node
billing_account = var.billing_account_id
prefix = var.prefix
name = var.project_names.dwh
services = [
"bigquery.googleapis.com",
"bigquerystorage.googleapis.com",
"bigqueryreservation.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com",
]
iam_additive = {
"roles/owner" = [module.sa-services-main.iam_email]
}
service_encryption_key_ids = {
bq = [var.service_encryption_key_ids.multiregional]
storage = [var.service_encryption_key_ids.multiregional]
}
# If used, remember to uncomment 'lifecycle' block in the
# modules/vpc-sc/google_access_context_manager_service_perimeter resource.
service_perimeter_standard = var.service_perimeter_standard
}
module "project-landing" {
source = "../../../../modules/project"
parent = var.root_node
billing_account = var.billing_account_id
prefix = var.prefix
name = var.project_names.landing
services = [
"pubsub.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com",
]
iam_additive = {
"roles/owner" = [module.sa-services-main.iam_email]
}
service_encryption_key_ids = {
pubsub = [var.service_encryption_key_ids.global]
storage = [var.service_encryption_key_ids.multiregional]
}
# If used, remember to uncomment 'lifecycle' block in the
# modules/vpc-sc/google_access_context_manager_service_perimeter resource.
service_perimeter_standard = var.service_perimeter_standard
}
module "project-services" {
source = "../../../../modules/project"
parent = var.root_node
billing_account = var.billing_account_id
prefix = var.prefix
name = var.project_names.services
services = [
"bigquery.googleapis.com",
"cloudresourcemanager.googleapis.com",
"iam.googleapis.com",
"pubsub.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com",
"sourcerepo.googleapis.com",
"stackdriver.googleapis.com",
"cloudasset.googleapis.com",
"cloudkms.googleapis.com"
]
iam_additive = {
"roles/owner" = [module.sa-services-main.iam_email]
}
service_encryption_key_ids = {
storage = [var.service_encryption_key_ids.multiregional]
}
# If used, remember to uncomment 'lifecycle' block in the
# modules/vpc-sc/google_access_context_manager_service_perimeter resource.
service_perimeter_standard = var.service_perimeter_standard
}
module "project-transformation" {
source = "../../../../modules/project"
parent = var.root_node
billing_account = var.billing_account_id
prefix = var.prefix
name = var.project_names.transformation
services = [
"bigquery.googleapis.com",
"cloudbuild.googleapis.com",
"compute.googleapis.com",
"dataflow.googleapis.com",
"servicenetworking.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com",
]
iam_additive = {
"roles/owner" = [module.sa-services-main.iam_email]
}
service_encryption_key_ids = {
compute = [var.service_encryption_key_ids.global]
storage = [var.service_encryption_key_ids.multiregional]
dataflow = [var.service_encryption_key_ids.global]
}
# If used, remember to uncomment 'lifecycle' block in the
# modules/vpc-sc/google_access_context_manager_service_perimeter resource.
service_perimeter_standard = var.service_perimeter_standard
}
###############################################################################
# service accounts #
###############################################################################
module "sa-services-main" {
source = "../../../../modules/iam-service-account"
project_id = module.project-services.project_id
name = var.service_account_names.main
iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}

View File

@ -1,36 +0,0 @@
/**
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
output "project_ids" {
description = "Project ids for created projects."
value = {
datamart = module.project-datamart.project_id
dwh = module.project-dwh.project_id
landing = module.project-landing.project_id
services = module.project-services.project_id
transformation = module.project-transformation.project_id
}
}
output "service_account" {
description = "Main service account."
value = module.sa-services-main.email
}
output "service_encryption_key_ids" {
description = "Cloud KMS encryption keys in {LOCATION => [KEY_URL]} format."
value = var.service_encryption_key_ids
}

View File

@ -1,82 +0,0 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
variable "admins" {
description = "List of users allowed to impersonate the service account."
type = list(string)
default = null
}
variable "billing_account_id" {
description = "Billing account id."
type = string
}
variable "prefix" {
description = "Prefix used to generate project id and name."
type = string
default = null
}
variable "project_names" {
description = "Override this variable if you need non-standard names."
type = object({
datamart = string
dwh = string
landing = string
services = string
transformation = string
})
default = {
datamart = "datamart"
dwh = "datawh"
landing = "landing"
services = "services"
transformation = "transformation"
}
}
variable "root_node" {
description = "Parent folder or organization in 'folders/folder_id' or 'organizations/org_id' format."
type = string
}
variable "service_account_names" {
description = "Override this variable if you need non-standard names."
type = object({
main = string
})
default = {
main = "data-platform-main"
}
}
variable "service_encryption_key_ids" {
description = "Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project."
type = object({
multiregional = string
global = string
})
default = {
multiregional = null
global = null
}
}
variable "service_perimeter_standard" {
description = "VPC Service control standard perimeter name in the form of 'accessPolicies/ACCESS_POLICY_NAME/servicePerimeters/PERIMETER_NAME'. All projects will be added to the perimeter in enforced mode."
type = string
default = null
}

View File

@ -0,0 +1,139 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description land project and resources.
locals {
land_orch_service_accounts = [
module.load-sa-df-0.iam_email, module.orch-sa-cmp-0.iam_email
]
}
module "land-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "lnd"
group_iam = {
(local.groups.data-engineers) = [
"roles/bigquery.dataEditor",
"roles/pubsub.editor",
"roles/storage.admin",
"roles/storage.objectViewer",
]
}
iam = {
"roles/bigquery.dataEditor" = [module.land-sa-bq-0.iam_email]
"roles/bigquery.dataViewer" = local.land_orch_service_accounts
"roles/bigquery.jobUser" = [module.orch-sa-cmp-0.iam_email]
"roles/bigquery.user" = [module.load-sa-df-0.iam_email]
"roles/pubsub.publisher" = [module.land-sa-ps-0.iam_email]
"roles/pubsub.subscriber" = local.land_orch_service_accounts
"roles/storage.objectAdmin" = [module.load-sa-df-0.iam_email]
"roles/storage.objectCreator" = [module.land-sa-cs-0.iam_email]
"roles/storage.objectViewer" = [module.orch-sa-cmp-0.iam_email]
"roles/storage.admin" = [module.load-sa-df-0.iam_email]
}
services = concat(var.project_services, [
"bigquery.googleapis.com",
"bigqueryreservation.googleapis.com",
"bigquerystorage.googleapis.com",
"cloudkms.googleapis.com",
"pubsub.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com",
])
service_encryption_key_ids = {
bq = [try(local.service_encryption_keys.bq, null)]
pubsub = [try(local.service_encryption_keys.pubsub, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
}
# Cloud Storage
module "land-sa-cs-0" {
source = "../../../modules/iam-service-account"
project_id = module.land-project.project_id
prefix = var.prefix
name = "lnd-cs-0"
# TODO: descriptive name
display_name = "TODO"
iam = {
"roles/iam.serviceAccountTokenCreator" = [
local.groups_iam.data-engineers
]
}
}
module "land-cs-0" {
source = "../../../modules/gcs"
project_id = module.land-project.project_id
prefix = var.prefix
name = "lnd-cs-0"
location = var.region
storage_class = "REGIONAL"
encryption_key = try(local.service_encryption_keys.storage, null)
force_destroy = var.data_force_destroy
# retention_policy = {
# retention_period = 7776000 # 90 * 24 * 60 * 60
# is_locked = false
# }
}
# PubSub
module "land-sa-ps-0" {
source = "../../../modules/iam-service-account"
project_id = module.land-project.project_id
prefix = var.prefix
name = "lnd-ps-0"
# TODO: descriptive name
display_name = "TODO"
iam = {
"roles/iam.serviceAccountTokenCreator" = [
local.groups_iam.data-engineers
]
}
}
module "land-ps-0" {
source = "../../../modules/pubsub"
project_id = module.land-project.project_id
name = "${var.prefix}-lnd-ps-0"
kms_key = try(local.service_encryption_keys.pubsub, null)
}
# BigQuery
module "land-sa-bq-0" {
source = "../../../modules/iam-service-account"
project_id = module.land-project.project_id
prefix = var.prefix
name = "lnd-bq-0"
# TODO: descriptive name
display_name = "TODO"
iam = {
"roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
}
}
module "land-bq-0" {
source = "../../../modules/bigquery-dataset"
project_id = module.land-project.project_id
id = "${replace(var.prefix, "-", "_")}lnd_bq_0"
location = var.region
encryption_key = try(local.service_encryption_keys.bq, null)
}

View File

@ -0,0 +1,148 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Load project and VPC.
locals {
load_service_accounts = [
"serviceAccount:${module.load-project.service_accounts.robots.dataflow}",
module.load-sa-df-0.iam_email
]
load_subnet = (
local.use_shared_vpc
? var.network_config.subnet_self_links.orchestration
: values(module.load-vpc.0.subnet_self_links)[0]
)
load_vpc = (
local.use_shared_vpc
? var.network_config.network_self_link
: module.load-vpc.0.self_link
)
}
# Project
module "load-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "lod"
group_iam = {
(local.groups.data-engineers) = [
"roles/compute.viewer",
"roles/dataflow.admin",
"roles/dataflow.developer",
"roles/viewer",
]
}
iam = {
"roles/bigquery.jobUser" = [module.load-sa-df-0.iam_email]
"roles/dataflow.admin" = [
module.orch-sa-cmp-0.iam_email, module.load-sa-df-0.iam_email
]
"roles/dataflow.worker" = [module.load-sa-df-0.iam_email]
"roles/storage.objectAdmin" = local.load_service_accounts
# TODO: these are needed on the shared VPC?
# "roles/compute.serviceAgent" = [
# "serviceAccount:${module.load-project.service_accounts.robots.compute}"
# ]
# "roles/dataflow.serviceAgent" = [
# "serviceAccount:${module.load-project.service_accounts.robots.dataflow}"
# ]
}
services = concat(var.project_services, [
"bigquery.googleapis.com",
"bigqueryreservation.googleapis.com",
"bigquerystorage.googleapis.com",
"cloudkms.googleapis.com",
"compute.googleapis.com",
"dataflow.googleapis.com",
"dlp.googleapis.com",
"pubsub.googleapis.com",
"servicenetworking.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com"
])
service_encryption_key_ids = {
pubsub = [try(local.service_encryption_keys.pubsub, null)]
dataflow = [try(local.service_encryption_keys.dataflow, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
shared_vpc_service_config = local.shared_vpc_project == null ? null : {
attach = true
host_project = local.shared_vpc_project
service_identity_iam = {}
# service_identity_iam = {
# "compute.networkUser" = ["dataflow"]
# }
}
}
module "load-sa-df-0" {
source = "../../../modules/iam-service-account"
project_id = module.load-project.project_id
prefix = var.prefix
name = "load-df-0"
# TODO: descriptive name
display_name = "TODO"
iam = {
"roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
"roles/iam.serviceAccountUser" = [module.orch-sa-cmp-0.iam_email]
}
}
module "load-cs-df-0" {
source = "../../../modules/gcs"
project_id = module.load-project.project_id
prefix = var.prefix
name = "load-cs-0"
storage_class = "REGIONAL"
location = var.region
encryption_key = try(local.service_encryption_keys.storage, null)
}
# internal VPC resources
module "load-vpc" {
source = "../../../modules/net-vpc"
count = local.use_shared_vpc ? 0 : 1
project_id = module.load-project.project_id
name = "${var.prefix}-default"
subnets = [
{
ip_cidr_range = "10.10.0.0/24"
name = "default"
region = var.region
secondary_ip_range = {}
}
]
}
module "load-vpc-firewall" {
source = "../../../modules/net-vpc-firewall"
count = local.use_shared_vpc ? 0 : 1
project_id = module.load-project.project_id
network = module.load-vpc.0.name
admin_ranges = ["10.10.0.0/24"]
}
module "load-nat" {
source = "../../../modules/net-cloudnat"
count = local.use_shared_vpc ? 0 : 1
project_id = module.load-project.project_id
name = "${var.prefix}-default"
region = var.region
router_network = module.load-vpc.0.name
}

View File

@ -1,83 +0,0 @@
# Data Platform Foundations - Resources (Step 2)
This is the second step needed to deploy Data Platform Foundations, which creates resources needed to store and process the data, in the projects created in the [previous step](../01-environment/README.md). Please refer to the [top-level README](../README.md) for prerequisites and how to run the first step.
![Data Foundation - Phase 2](./diagram.png "High-level diagram")
The resources that will be created in each project are:
- Common
- Landing
- [x] GCS
- [x] Pub/Sub
- Orchestration & Transformation
- [x] Dataflow
- DWH
- [x] Bigquery (L0/1/2)
- [x] GCS
- Datamart
- [x] Bigquery (views/table)
- [x] GCS
- [ ] BigTable
## Running the example
In the previous step, we created the environment (projects and service account) which we are going to use in this step.
To create the resources, copy the output of the environment step (**project_ids**) and paste it into the `terraform.tfvars`:
- Specify your variables in a `terraform.tfvars`; you can use the output from the environment stage
```hcl
project_ids = {
datamart = "datamart-project_id"
dwh = "dwh-project_id"
landing = "landing-project_id"
services = "services-project_id"
transformation = "transformation-project_id"
}
```
- The providers.tf file has been configured to impersonate the **main** service account
- To launch terraform:
```bash
terraform init
terraform plan
terraform apply
```
Once done testing, you can clean up resources by running `terraform destroy`.
### CMEK configuration
You can configure GCP resources to use existing CMEK keys by configuring the `service_encryption_key_ids` variable. You need to specify a 'global' and a 'multiregional' key.
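As a sketch, you can pass the same keys used in the environment step (the values below are illustrative placeholders):
```hcl
# terraform.tfvars - illustrative Cloud KMS key IDs, typically the same keys used in step 1
service_encryption_key_ids = {
  multiregional = "projects/my-kms-project/locations/eu/keyRings/my-keyring/cryptoKeys/my-mr-key"
  global        = "projects/my-kms-project/locations/global/keyRings/my-keyring/cryptoKeys/my-global-key"
}
```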
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [project_ids](variables.tf#L108) | Project IDs. | <code title="object&#40;&#123;&#10; datamart &#61; string&#10; dwh &#61; string&#10; landing &#61; string&#10; services &#61; string&#10; transformation &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
| [admins](variables.tf#L16) | List of users allowed to impersonate the service account. | <code>list&#40;string&#41;</code> | | <code>null</code> |
| [datamart_bq_datasets](variables.tf#L22) | Datamart Bigquery datasets. | <code title="map&#40;object&#40;&#123;&#10; iam &#61; map&#40;list&#40;string&#41;&#41;&#10; location &#61; string&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code title="&#123;&#10; bq_datamart_dataset &#61; &#123;&#10; location &#61; &#34;EU&#34;&#10; iam &#61; &#123;&#10; &#125;&#10; &#125;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [dwh_bq_datasets](variables.tf#L40) | DWH Bigquery datasets. | <code title="map&#40;object&#40;&#123;&#10; location &#61; string&#10; iam &#61; map&#40;list&#40;string&#41;&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code title="&#123;&#10; bq_raw_dataset &#61; &#123;&#10; iam &#61; &#123;&#125;&#10; location &#61; &#34;EU&#34;&#10; &#125;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [landing_buckets](variables.tf#L54) | List of landing buckets to create. | <code title="map&#40;object&#40;&#123;&#10; location &#61; string&#10; name &#61; string&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code title="&#123;&#10; raw-data &#61; &#123;&#10; location &#61; &#34;EU&#34;&#10; name &#61; &#34;raw-data&#34;&#10; &#125;&#10; data-schema &#61; &#123;&#10; location &#61; &#34;EU&#34;&#10; name &#61; &#34;data-schema&#34;&#10; &#125;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [landing_pubsub](variables.tf#L72) | List of landing pubsub topics and subscriptions to create. | <code title="map&#40;map&#40;object&#40;&#123;&#10; iam &#61; map&#40;list&#40;string&#41;&#41;&#10; labels &#61; map&#40;string&#41;&#10; options &#61; object&#40;&#123;&#10; ack_deadline_seconds &#61; number&#10; message_retention_duration &#61; number&#10; retain_acked_messages &#61; bool&#10; expiration_policy_ttl &#61; number&#10; &#125;&#41;&#10;&#125;&#41;&#41;&#41;">map&#40;map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;&#41;</code> | | <code title="&#123;&#10; landing-1 &#61; &#123;&#10; sub1 &#61; &#123;&#10; iam &#61; &#123;&#10; &#125;&#10; labels &#61; &#123;&#125;&#10; options &#61; null&#10; &#125;&#10; sub2 &#61; &#123;&#10; iam &#61; &#123;&#125;&#10; labels &#61; &#123;&#125;,&#10; options &#61; null&#10; &#125;,&#10; &#125;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [landing_service_account](variables.tf#L102) | landing service accounts list. | <code>string</code> | | <code>&#34;sa-landing&#34;</code> |
| [service_account_names](variables.tf#L119) | Project service accounts list. | <code title="object&#40;&#123;&#10; datamart &#61; string&#10; dwh &#61; string&#10; landing &#61; string&#10; services &#61; string&#10; transformation &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; datamart &#61; &#34;sa-datamart&#34;&#10; dwh &#61; &#34;sa-datawh&#34;&#10; landing &#61; &#34;sa-landing&#34;&#10; services &#61; &#34;sa-services&#34;&#10; transformation &#61; &#34;sa-transformation&#34;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [service_encryption_key_ids](variables.tf#L137) | Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project. | <code title="object&#40;&#123;&#10; multiregional &#61; string&#10; global &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; multiregional &#61; null&#10; global &#61; null&#10;&#125;">&#123;&#8230;&#125;</code> |
| [transformation_buckets](variables.tf#L149) | List of transformation buckets to create. | <code title="map&#40;object&#40;&#123;&#10; location &#61; string&#10; name &#61; string&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code title="&#123;&#10; temp &#61; &#123;&#10; location &#61; &#34;EU&#34;&#10; name &#61; &#34;temp&#34;&#10; &#125;,&#10; templates &#61; &#123;&#10; location &#61; &#34;EU&#34;&#10; name &#61; &#34;templates&#34;&#10; &#125;,&#10;&#125;">&#123;&#8230;&#125;</code> |
| [transformation_subnets](variables.tf#L167) | List of subnets to create in the transformation Project. | <code title="list&#40;object&#40;&#123;&#10; ip_cidr_range &#61; string&#10; name &#61; string&#10; region &#61; string&#10; secondary_ip_range &#61; map&#40;string&#41;&#10;&#125;&#41;&#41;">list&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code title="&#91;&#10; &#123;&#10; ip_cidr_range &#61; &#34;10.1.0.0&#47;20&#34;&#10; name &#61; &#34;transformation-subnet&#34;&#10; region &#61; &#34;europe-west3&#34;&#10; secondary_ip_range &#61; &#123;&#125;&#10; &#125;,&#10;&#93;">&#91;&#8230;&#93;</code> |
| [transformation_vpc_name](variables.tf#L185) | Name of the VPC created in the transformation Project. | <code>string</code> | | <code>&#34;transformation-vpc&#34;</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [datamart-datasets](outputs.tf#L17) | List of bigquery datasets created for the datamart project. | |
| [dwh-datasets](outputs.tf#L24) | List of bigquery datasets created for the dwh project. | |
| [landing-buckets](outputs.tf#L29) | List of buckets created for the landing project. | |
| [landing-pubsub](outputs.tf#L34) | List of pubsub topics and subscriptions created for the landing project. | |
| [transformation-buckets](outputs.tf#L44) | List of buckets created for the transformation project. | |
| [transformation-vpc](outputs.tf#L49) | Transformation VPC details. | |
<!-- END TFDOC -->

Binary file not shown (before: 470 KiB).

View File

@ -1,211 +0,0 @@
/**
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
###############################################################################
# IAM #
###############################################################################
module "datamart-sa" {
source = "../../../../modules/iam-service-account"
project_id = var.project_ids.datamart
name = var.service_account_names.datamart
iam_project_roles = {
"${var.project_ids.datamart}" = ["roles/editor"]
}
iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}
module "dwh-sa" {
source = "../../../../modules/iam-service-account"
project_id = var.project_ids.dwh
name = var.service_account_names.dwh
iam_project_roles = {
"${var.project_ids.dwh}" = ["roles/bigquery.admin"]
}
iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}
module "landing-sa" {
source = "../../../../modules/iam-service-account"
project_id = var.project_ids.landing
name = var.service_account_names.landing
iam_project_roles = {
"${var.project_ids.landing}" = [
"roles/pubsub.publisher",
"roles/storage.objectCreator"]
}
iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}
module "services-sa" {
source = "../../../../modules/iam-service-account"
project_id = var.project_ids.services
name = var.service_account_names.services
iam_project_roles = {
"${var.project_ids.services}" = ["roles/editor"]
}
iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}
module "transformation-sa" {
source = "../../../../modules/iam-service-account"
project_id = var.project_ids.transformation
name = var.service_account_names.transformation
iam_project_roles = {
"${var.project_ids.transformation}" = [
"roles/logging.logWriter",
"roles/monitoring.metricWriter",
"roles/dataflow.admin",
"roles/iam.serviceAccountUser",
"roles/bigquery.dataOwner",
"roles/bigquery.jobUser",
"roles/dataflow.worker",
"roles/bigquery.metadataViewer",
"roles/storage.objectViewer",
],
"${var.project_ids.landing}" = [
"roles/storage.objectViewer",
],
"${var.project_ids.dwh}" = [
"roles/bigquery.dataOwner",
"roles/bigquery.jobUser",
"roles/bigquery.metadataViewer",
]
}
iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
}
###############################################################################
# GCS #
###############################################################################
module "landing-buckets" {
source = "../../../../modules/gcs"
for_each = var.landing_buckets
project_id = var.project_ids.landing
prefix = var.project_ids.landing
name = each.value.name
location = each.value.location
iam = {
"roles/storage.objectCreator" = [module.landing-sa.iam_email]
"roles/storage.admin" = [module.transformation-sa.iam_email]
}
encryption_key = var.service_encryption_key_ids.multiregional
}
module "transformation-buckets" {
source = "../../../../modules/gcs"
for_each = var.transformation_buckets
project_id = var.project_ids.transformation
prefix = var.project_ids.transformation
name = each.value.name
location = each.value.location
iam = {
"roles/storage.admin" = [module.transformation-sa.iam_email]
}
encryption_key = var.service_encryption_key_ids.multiregional
}
###############################################################################
# Bigquery #
###############################################################################
module "datamart-bq" {
source = "../../../../modules/bigquery-dataset"
for_each = var.datamart_bq_datasets
project_id = var.project_ids.datamart
id = each.key
location = each.value.location
iam = {
for k, v in each.value.iam : k => (
k == "roles/bigquery.dataOwner"
? concat(v, [module.datamart-sa.iam_email])
: v
)
}
encryption_key = var.service_encryption_key_ids.multiregional
}
module "dwh-bq" {
source = "../../../../modules/bigquery-dataset"
for_each = var.dwh_bq_datasets
project_id = var.project_ids.dwh
id = each.key
location = each.value.location
iam = {
for k, v in each.value.iam : k => (
k == "roles/bigquery.dataOwner"
? concat(v, [module.dwh-sa.iam_email])
: v
)
}
encryption_key = var.service_encryption_key_ids.multiregional
}
###############################################################################
# Network #
###############################################################################
module "vpc-transformation" {
source = "../../../../modules/net-vpc"
project_id = var.project_ids.transformation
name = var.transformation_vpc_name
subnets = var.transformation_subnets
}
module "firewall" {
source = "../../../../modules/net-vpc-firewall"
project_id = var.project_ids.transformation
network = module.vpc-transformation.name
admin_ranges = []
http_source_ranges = []
https_source_ranges = []
ssh_source_ranges = []
custom_rules = {
iap-svc = {
description = "Dataflow service."
direction = "INGRESS"
action = "allow"
sources = ["dataflow"]
targets = ["dataflow"]
ranges = []
use_service_accounts = false
rules = [{ protocol = "tcp", ports = ["12345-12346"] }]
extra_attributes = {}
}
}
}
###############################################################################
# Pub/Sub #
###############################################################################
module "landing-pubsub" {
source = "../../../../modules/pubsub"
for_each = var.landing_pubsub
project_id = var.project_ids.landing
name = each.key
subscriptions = {
for k, v in each.value : k => { labels = v.labels, options = v.options }
}
subscription_iam = {
for k, v in each.value : k => merge(v.iam, {
"roles/pubsub.subscriber" = [module.transformation-sa.iam_email]
})
}
kms_key = var.service_encryption_key_ids.global
}

View File

@ -1,60 +0,0 @@
/**
* Copyright 2020 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
output "datamart-datasets" {
description = "List of bigquery datasets created for the datamart project."
value = [
for k, datasets in module.datamart-bq : datasets.dataset_id
]
}
output "dwh-datasets" {
description = "List of bigquery datasets created for the dwh project."
value = [for k, datasets in module.dwh-bq : datasets.dataset_id]
}
output "landing-buckets" {
description = "List of buckets created for the landing project."
value = [for k, bucket in module.landing-buckets : bucket.name]
}
output "landing-pubsub" {
description = "List of pubsub topics and subscriptions created for the landing project."
value = {
for t in module.landing-pubsub : t.topic.name => {
id = t.topic.id
subscriptions = { for s in t.subscriptions : s.name => s.id }
}
}
}
output "transformation-buckets" {
description = "List of buckets created for the transformation project."
value = [for k, bucket in module.transformation-buckets : bucket.name]
}
output "transformation-vpc" {
description = "Transformation VPC details."
value = {
name = module.vpc-transformation.name
subnets = {
for k, s in module.vpc-transformation.subnets : k => {
ip_cidr_range = s.ip_cidr_range
region = s.region
}
}
}
}

View File

@ -1,23 +0,0 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
provider "google" {
impersonate_service_account = "data-platform-main@${var.project_ids.services}.iam.gserviceaccount.com"
}
provider "google-beta" {
impersonate_service_account = "data-platform-main@${var.project_ids.services}.iam.gserviceaccount.com"
}

View File

@ -1,189 +0,0 @@
# Copyright 2020 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
variable "admins" {
description = "List of users allowed to impersonate the service account."
type = list(string)
default = null
}
variable "datamart_bq_datasets" {
description = "Datamart Bigquery datasets."
type = map(object({
iam = map(list(string))
location = string
}))
default = {
bq_datamart_dataset = {
location = "EU"
iam = {
# "roles/bigquery.dataOwner" = []
# "roles/bigquery.dataEditor" = []
# "roles/bigquery.dataViewer" = []
}
}
}
}
variable "dwh_bq_datasets" {
description = "DWH Bigquery datasets."
type = map(object({
location = string
iam = map(list(string))
}))
default = {
bq_raw_dataset = {
iam = {}
location = "EU"
}
}
}
variable "landing_buckets" {
description = "List of landing buckets to create."
type = map(object({
location = string
name = string
}))
default = {
raw-data = {
location = "EU"
name = "raw-data"
}
data-schema = {
location = "EU"
name = "data-schema"
}
}
}
variable "landing_pubsub" {
description = "List of landing pubsub topics and subscriptions to create."
type = map(map(object({
iam = map(list(string))
labels = map(string)
options = object({
ack_deadline_seconds = number
message_retention_duration = number
retain_acked_messages = bool
expiration_policy_ttl = number
})
})))
default = {
landing-1 = {
sub1 = {
iam = {
# "roles/pubsub.subscriber" = []
}
labels = {}
options = null
}
sub2 = {
iam = {}
labels = {},
options = null
},
}
}
}
variable "landing_service_account" {
description = "landing service accounts list."
type = string
default = "sa-landing"
}
variable "project_ids" {
description = "Project IDs."
type = object({
datamart = string
dwh = string
landing = string
services = string
transformation = string
})
}
variable "service_account_names" {
description = "Project service accounts list."
type = object({
datamart = string
dwh = string
landing = string
services = string
transformation = string
})
default = {
datamart = "sa-datamart"
dwh = "sa-datawh"
landing = "sa-landing"
services = "sa-services"
transformation = "sa-transformation"
}
}
variable "service_encryption_key_ids" {
description = "Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project."
type = object({
multiregional = string
global = string
})
default = {
multiregional = null
global = null
}
}
variable "transformation_buckets" {
description = "List of transformation buckets to create."
type = map(object({
location = string
name = string
}))
default = {
temp = {
location = "EU"
name = "temp"
},
templates = {
location = "EU"
name = "templates"
},
}
}
variable "transformation_subnets" {
description = "List of subnets to create in the transformation Project."
type = list(object({
ip_cidr_range = string
name = string
region = string
secondary_ip_range = map(string)
}))
default = [
{
ip_cidr_range = "10.1.0.0/20"
name = "transformation-subnet"
region = "europe-west3"
secondary_ip_range = {}
},
]
}
variable "transformation_vpc_name" {
description = "Name of the VPC created in the transformation Project."
type = string
default = "transformation-vpc"
}

View File

@ -0,0 +1,121 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Orchestration Cloud Composer definition.
module "orch-sa-cmp-0" {
source = "../../../modules/iam-service-account"
project_id = module.orch-project.project_id
prefix = var.prefix
name = "orc-cmp-0"
# TODO: descriptive name
display_name = "TODO"
iam = {
"roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
"roles/iam.serviceAccountUser" = [module.orch-sa-cmp-0.iam_email]
}
}
resource "google_composer_environment" "orch-cmp-0" {
provider = google-beta
project = module.orch-project.project_id
name = "${var.prefix}-orc-cmp-0"
region = var.region
config {
node_count = var.composer_config.node_count
node_config {
zone = "${var.region}-b"
service_account = module.orch-sa-cmp-0.email
network = local.orch_vpc
subnetwork = local.orch_subnet
tags = ["composer-worker", "http-server", "https-server"]
ip_allocation_policy {
use_ip_aliases = "true"
cluster_secondary_range_name = try(
var.network_config.composer_secondary_ranges.pods, "pods"
)
services_secondary_range_name = try(
var.network_config.composer_secondary_ranges.services, "services"
)
}
}
software_config {
image_version = var.composer_config.airflow_version
env_variables = merge(
var.composer_config.env_variables, {
DTL_L0_PRJ = module.lake-0-project.project_id
DTL_L0_BQ_DATASET = module.lake-0-bq-0.dataset_id
DTL_L0_GCS = module.lake-0-cs-0.url
DTL_L1_PRJ = module.lake-1-project.project_id
DTL_L1_BQ_DATASET = module.lake-1-bq-0.dataset_id
DTL_L1_GCS = module.lake-1-cs-0.url
DTL_L2_PRJ = module.lake-2-project.project_id
DTL_L2_BQ_DATASET = module.lake-2-bq-0.dataset_id
DTL_L2_GCS = module.lake-2-cs-0.url
DTL_PLG_PRJ = module.lake-plg-project.project_id
DTL_PLG_BQ_DATASET = module.lake-plg-bq-0.dataset_id
DTL_PLG_GCS = module.lake-plg-cs-0.url
GCP_REGION = var.region
LND_PRJ = module.land-project.project_id
LND_BQ = module.land-bq-0.dataset_id
LND_GCS = module.land-cs-0.url
LND_PS = module.land-ps-0.id
LOD_PRJ = module.load-project.project_id
LOD_GCS_STAGING = module.load-cs-df-0.url
LOD_NET_VPC = local.load_vpc
LOD_NET_SUBNET = local.load_subnet
LOD_SA_DF = module.load-sa-df-0.email
ORC_PRJ = module.orch-project.project_id
ORC_GCS = module.orch-cs-0.url
TRF_PRJ = module.transf-project.project_id
TRF_GCS_STAGING = module.transf-cs-df-0.url
TRF_NET_VPC = local.transf_vpc
TRF_NET_SUBNET = local.transf_subnet
TRF_SA_DF = module.transf-sa-df-0.email
TRF_SA_BQ = module.transf-sa-bq-0.email
}
)
}
private_environment_config {
enable_private_endpoint = "true"
cloud_sql_ipv4_cidr_block = try(
var.network_config.composer_ip_ranges.cloudsql, "10.20.10.0/24"
)
master_ipv4_cidr_block = try(
var.network_config.composer_ip_ranges.gke_master, "10.20.11.0/28"
)
web_server_ipv4_cidr_block = try(
var.network_config.composer_ip_ranges.web_server, "10.20.11.16/28"
)
}
dynamic "encryption_config" {
for_each = (
try(local.service_encryption_keys.composer != null, false)
? { 1 = 1 }
: {}
)
content {
kms_key_name = try(local.service_encryption_keys.composer, null)
}
}
# web_server_network_access_control {
# allowed_ip_range {
# value = "172.16.0.0/12"
# description = "Allowed ip range"
# }
# }
}
}

View File

@ -0,0 +1,168 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Orchestration project and VPC.
locals {
orch_subnet = (
local.use_shared_vpc
? var.network_config.subnet_self_links.orchestration
: values(module.orch-vpc.0.subnet_self_links)[0]
)
orch_vpc = (
local.use_shared_vpc
? var.network_config.network_self_link
: module.orch-vpc.0.self_link
)
}
module "orch-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "orc"
group_iam = {
(local.groups.data-engineers) = [
"roles/bigquery.dataEditor",
"roles/bigquery.jobUser",
"roles/cloudbuild.builds.editor",
"roles/composer.admin",
"roles/composer.environmentAndStorageObjectAdmin",
"roles/iap.httpsResourceAccessor",
"roles/iam.serviceAccountUser",
"roles/compute.networkUser",
"roles/storage.objectAdmin",
"roles/storage.admin",
"roles/compute.networkUser"
]
}
iam = {
"roles/bigquery.dataEditor" = [
module.load-sa-df-0.iam_email,
module.transf-sa-df-0.iam_email,
module.orch-sa-cmp-0.iam_email,
]
"roles/bigquery.jobUser" = [
module.load-sa-df-0.iam_email,
module.transf-sa-df-0.iam_email,
module.orch-sa-cmp-0.iam_email,
]
"roles/composer.worker" = [
module.orch-sa-cmp-0.iam_email
]
"roles/iam.serviceAccountUser" = [
module.orch-sa-cmp-0.iam_email
]
"roles/storage.objectAdmin" = [
module.load-sa-df-0.iam_email,
module.orch-sa-cmp-0.iam_email,
"serviceAccount:${module.orch-project.service_accounts.robots.composer}",
]
"roles/storage.admin" = [
module.load-sa-df-0.iam_email,
module.transf-sa-df-0.iam_email
]
}
oslogin = false
policy_boolean = {
"constraints/compute.requireOsLogin" = false
}
services = concat(var.project_services, [
"artifactregistry.googleapis.com",
"bigquery.googleapis.com",
"bigqueryreservation.googleapis.com",
"bigquerystorage.googleapis.com",
"cloudbuild.googleapis.com",
"cloudkms.googleapis.com",
"composer.googleapis.com",
"compute.googleapis.com",
"container.googleapis.com",
"containerregistry.googleapis.com",
"dataflow.googleapis.com",
"pubsub.googleapis.com",
"servicenetworking.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com"
])
service_encryption_key_ids = {
composer = [try(local.service_encryption_keys.composer, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
shared_vpc_service_config = local.shared_vpc_project == null ? null : {
attach = true
host_project = local.shared_vpc_project
service_identity_iam = {}
# service_identity_iam = {
# "roles/composer.sharedVpcAgent" = [
# "composer"
# ]
# "roles/compute.networkUser" = [
# "cloudservices", "container-engine", "dataflow"
# ]
# "roles/container.hostServiceAgentUser" = [
# "container-engine"
# ]
# }
}
}
# Cloud Storage
module "orch-cs-0" {
source = "../../../modules/gcs"
project_id = module.orch-project.project_id
prefix = var.prefix
name = "orc-cs-0"
location = var.region
storage_class = "REGIONAL"
encryption_key = try(local.service_encryption_keys.storage, null)
}
# internal VPC resources
module "orch-vpc" {
source = "../../../modules/net-vpc"
count = local.use_shared_vpc ? 0 : 1
project_id = module.orch-project.project_id
name = "${var.prefix}-default"
subnets = [
{
ip_cidr_range = "10.10.0.0/24"
name = "default"
region = var.region
secondary_ip_range = {
pods = "10.10.8.0/22"
services = "10.10.12.0/24"
}
}
]
}
module "orch-vpc-firewall" {
source = "../../../modules/net-vpc-firewall"
count = local.use_shared_vpc ? 0 : 1
project_id = module.orch-project.project_id
network = module.orch-vpc.0.name
admin_ranges = ["10.10.0.0/24"]
}
module "orch-nat" {
count = local.use_shared_vpc ? 0 : 1
source = "../../../modules/net-cloudnat"
project_id = module.orch-project.project_id
name = "${var.prefix}-default"
region = var.region
router_network = module.orch-vpc.0.name
}


@ -1,8 +0,0 @@
# Manual pipeline Example
Once you have deployed the projects ([step 1](../01-environment/README.md)) and resources ([step 2](../02-resources/README.md)), you can use them to run your data pipelines.
Here we demo two pipelines:
* [GCS to BigQuery](./gcs_to_bigquery.md)
* [PubSub to BigQuery](./pubsub_to_bigquery.md)


@ -1,140 +0,0 @@
# Manual pipeline Example: GCS to BigQuery
In this example we will publish person messages in the following format:
```bash
name,surname,1617898199
```
A Dataflow pipeline will read those messages and import them into a BigQuery table in the DWH project.
[TODO] An authorized view will be created in the datamart project to expose the table.
[TODO] Further automation is expected in the future.
## Set up the env vars
```bash
export DWH_PROJECT_ID=**dwh_project_id**
export LANDING_PROJECT_ID=**landing_project_id**
export TRANSFORMATION_PROJECT_ID=*transformation_project_id*
```
## Create BQ table
These steps should be done as the DWH Service Account.
Run the following command to create the table:
```bash
gcloud --impersonate-service-account=sa-datawh@$DWH_PROJECT_ID.iam.gserviceaccount.com \
alpha bq tables create person \
--project=$DWH_PROJECT_ID --dataset=bq_raw_dataset \
--description "This is a Test Person table" \
--schema name=STRING,surname=STRING,timestamp=TIMESTAMP
```
## Produce CSV data file, JSON schema file and UDF JS file
These steps should be done as the landing Service Account.
Let's now create a series of records we can import:
```bash
for i in {0..10}
do
echo "Lorenzo,Caggioni,$(date +%s)" >> person.csv
done
```
and copy files to the GCS bucket:
```bash
gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person.csv gs://$LANDING_PROJECT_ID-eu-raw-data
```
Let's create the data JSON schema:
```bash
cat <<'EOF' >> person_schema.json
{
"BigQuery Schema": [
{
"name": "name",
"type": "STRING"
},
{
"name": "surname",
"type": "STRING"
},
{
"name": "timestamp",
"type": "TIMESTAMP"
}
]
}
EOF
```
and copy files to the GCS bucket:
```bash
gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person_schema.json gs://$LANDING_PROJECT_ID-eu-data-schema
```
Let's create the data UDF function to transform message data:
```bash
cat <<'EOF' >> person_udf.js
function transform(line) {
var values = line.split(',');
var obj = new Object();
obj.name = values[0];
obj.surname = values[1];
obj.timestamp = values[2];
var jsonString = JSON.stringify(obj);
return jsonString;
}
EOF
```
and copy files to the GCS bucket:
```bash
gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person_udf.js gs://$LANDING_PROJECT_ID-eu-data-schema
```
If you want to check the files copied to GCS, you can use the Transformation service account:
```bash
gsutil -i sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com ls gs://$LANDING_PROJECT_ID-eu-raw-data
gsutil -i sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com ls gs://$LANDING_PROJECT_ID-eu-data-schema
```
## Dataflow
These steps should be done as the transformation Service Account.
Let's then start a Dataflow batch pipeline from a Google-provided template, using internal IPs only, the created network and subnetwork, the appropriate service account, and the required parameters:
```bash
gcloud --impersonate-service-account=sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com dataflow jobs run test_batch_01 \
--gcs-location gs://dataflow-templates/latest/GCS_Text_to_BigQuery \
--project $TRANSFORMATION_PROJECT_ID \
--region europe-west3 \
--disable-public-ips \
--network transformation-vpc \
--subnetwork regions/europe-west3/subnetworks/transformation-subnet \
--staging-location gs://$TRANSFORMATION_PROJECT_ID-eu-temp \
--service-account-email sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com \
--parameters \
javascriptTextTransformFunctionName=transform,\
JSONPath=gs://$LANDING_PROJECT_ID-eu-data-schema/person_schema.json,\
javascriptTextTransformGcsPath=gs://$LANDING_PROJECT_ID-eu-data-schema/person_udf.js,\
inputFilePattern=gs://$LANDING_PROJECT_ID-eu-raw-data/person.csv,\
outputTable=$DWH_PROJECT_ID:bq_raw_dataset.person,\
bigQueryLoadingTemporaryDirectory=gs://$TRANSFORMATION_PROJECT_ID-eu-temp
```


@ -1,75 +0,0 @@
# Manual pipeline Example: PubSub to BigQuery
In this example we will publish person messages in the following format:
```txt
name: Name
surname: Surname
timestamp: 1617898199
```
A Dataflow pipeline will read those messages and import them into a BigQuery table in the DWH project.
An authorized view will be created in the datamart project to expose the table.
[TODO] Further automation is expected in the future.
## Set up the env vars
```bash
export DWH_PROJECT_ID=**dwh_project_id**
export LANDING_PROJECT_ID=**landing_project_id**
export TRANSFORMATION_PROJECT_ID=*transformation_project_id*
```
## Create BQ table
These steps should be done as the DWH Service Account.
Run the following command to create the table:
```bash
gcloud --impersonate-service-account=sa-datawh@$DWH_PROJECT_ID.iam.gserviceaccount.com \
alpha bq tables create person \
--project=$DWH_PROJECT_ID --dataset=bq_raw_dataset \
--description "This is a Test Person table" \
--schema name=STRING,surname=STRING,timestamp=TIMESTAMP
```
## Produce PubSub messages
These steps should be done as the landing Service Account.
Let's now create a series of messages to import:
```bash
for i in {0..10}
do
gcloud --impersonate-service-account=sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com pubsub topics publish projects/$LANDING_PROJECT_ID/topics/landing-1 --message="{\"name\": \"Lorenzo\", \"surname\": \"Caggioni\", \"timestamp\": \"$(date +%s)\"}"
done
```
If you want to check the published messages, you can use the Transformation service account and read one (the message won't be acked and will stay in the subscription):
```bash
gcloud --impersonate-service-account=sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com pubsub subscriptions pull projects/$LANDING_PROJECT_ID/subscriptions/sub1
```
## Dataflow
These steps should be done as the transformation Service Account.
Let's then start a Dataflow streaming pipeline from a Google-provided template, using internal IPs only, the created network and subnetwork, the appropriate service account, and the required parameters:
```bash
gcloud dataflow jobs run test_streaming01 \
--gcs-location gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
--project $TRANSFORMATION_PROJECT_ID \
--region europe-west3 \
--disable-public-ips \
--network transformation-vpc \
--subnetwork regions/europe-west3/subnetworks/transformation-subnet \
--staging-location gs://$TRANSFORMATION_PROJECT_ID-eu-temp \
--service-account-email sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com \
--parameters \
inputSubscription=projects/$LANDING_PROJECT_ID/subscriptions/sub1,\
outputTableSpec=$DWH_PROJECT_ID:bq_raw_dataset.person
```


@ -1,26 +0,0 @@
{
"schema": {
"fields": [
{
"mode": "NULLABLE",
"name": "name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "surname",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "age",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "boolean_val",
"type": "BOOLEAN"
}
]
}
}


@ -0,0 +1,167 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Transformation project and VPC.
locals {
transf_subnet = (
local.use_shared_vpc
? var.network_config.subnet_self_links.transformation
: values(module.transf-vpc.0.subnet_self_links)[0]
)
transf_vpc = (
local.use_shared_vpc
? var.network_config.network_self_link
: module.transf-vpc.0.self_link
)
}
module "transf-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "trf"
group_iam = {
(local.groups.data-engineers) = [
"roles/bigquery.jobUser",
"roles/dataflow.admin",
]
}
iam = {
"roles/bigquery.dataViewer" = [
module.orch-sa-cmp-0.iam_email
]
"roles/bigquery.jobUser" = [
module.transf-sa-bq-0.iam_email,
]
"roles/dataflow.admin" = [
module.orch-sa-cmp-0.iam_email,
]
"roles/dataflow.worker" = [
module.transf-sa-df-0.iam_email
]
"roles/storage.objectAdmin" = [
module.transf-sa-df-0.iam_email,
module.orch-sa-cmp-0.iam_email,
"serviceAccount:${module.transf-project.service_accounts.robots.dataflow}"
]
}
services = concat(var.project_services, [
"bigquery.googleapis.com",
"bigqueryreservation.googleapis.com",
"bigquerystorage.googleapis.com",
"cloudkms.googleapis.com",
"compute.googleapis.com",
"dataflow.googleapis.com",
"dlp.googleapis.com",
"pubsub.googleapis.com",
"servicenetworking.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com"
])
service_encryption_key_ids = {
dataflow = [try(local.service_encryption_keys.dataflow, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
shared_vpc_service_config = local.shared_vpc_project == null ? null : {
attach = true
host_project = local.shared_vpc_project
service_identity_iam = {}
}
}
# Cloud Storage
module "transf-sa-df-0" {
source = "../../../modules/iam-service-account"
project_id = module.transf-project.project_id
prefix = var.prefix
name = "trf-df-0"
# TODO: descriptive name
display_name = "TODO"
iam = {
"roles/iam.serviceAccountTokenCreator" = [
local.groups_iam.data-engineers,
module.orch-sa-cmp-0.iam_email
],
"roles/iam.serviceAccountUser" = [
module.orch-sa-cmp-0.iam_email
]
}
}
module "transf-cs-df-0" {
source = "../../../modules/gcs"
project_id = module.transf-project.project_id
prefix = var.prefix
name = "trf-cs-0"
location = var.region
storage_class = "REGIONAL"
encryption_key = try(local.service_encryption_keys.storage, null)
}
# BigQuery
module "transf-sa-bq-0" {
source = "../../../modules/iam-service-account"
project_id = module.transf-project.project_id
prefix = var.prefix
name = "trf-bq-0"
# TODO: descriptive name
display_name = "TODO"
iam = {
"roles/iam.serviceAccountTokenCreator" = [
local.groups_iam.data-engineers,
module.orch-sa-cmp-0.iam_email
],
"roles/iam.serviceAccountUser" = [
module.orch-sa-cmp-0.iam_email
]
}
}
# internal VPC resources
module "transf-vpc" {
source = "../../../modules/net-vpc"
count = local.use_shared_vpc ? 0 : 1
project_id = module.transf-project.project_id
name = "${var.prefix}-default"
subnets = [
{
ip_cidr_range = "10.10.0.0/24"
name = "default"
region = var.region
secondary_ip_range = {}
}
]
}
module "transf-vpc-firewall" {
source = "../../../modules/net-vpc-firewall"
count = local.use_shared_vpc ? 0 : 1
project_id = module.transf-project.project_id
network = module.transf-vpc.0.name
admin_ranges = ["10.10.0.0/24"]
}
module "transf-nat" {
source = "../../../modules/net-cloudnat"
count = local.use_shared_vpc ? 0 : 1
project_id = module.transf-project.project_id
name = "${var.prefix}-default"
region = var.region
router_network = module.transf-vpc.0.name
}


@ -0,0 +1,213 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Datalake projects.
locals {
lake_group_iam = {
(local.groups.data-engineers) = [
"roles/bigquery.dataEditor",
"roles/storage.admin",
],
(local.groups.data-analysts) = [
"roles/bigquery.dataViewer",
"roles/bigquery.jobUser",
"roles/bigquery.user",
"roles/datacatalog.viewer",
"roles/datacatalog.tagTemplateViewer",
"roles/storage.objectViewer",
]
}
lake_iam = {
"roles/bigquery.dataEditor" = [
module.load-sa-df-0.iam_email,
module.transf-sa-df-0.iam_email,
module.transf-sa-bq-0.iam_email,
module.orch-sa-cmp-0.iam_email,
]
"roles/bigquery.jobUser" = [
module.load-sa-df-0.iam_email,
module.transf-sa-df-0.iam_email,
]
"roles/storage.admin" = [
module.load-sa-df-0.iam_email,
module.transf-sa-df-0.iam_email,
]
"roles/storage.objectCreator" = [
module.load-sa-df-0.iam_email,
module.transf-sa-df-0.iam_email,
module.transf-sa-bq-0.iam_email,
module.orch-sa-cmp-0.iam_email,
]
"roles/storage.objectViewer" = [
module.transf-sa-df-0.iam_email,
module.transf-sa-bq-0.iam_email,
module.orch-sa-cmp-0.iam_email,
]
}
lake_services = concat(var.project_services, [
"bigquery.googleapis.com",
"bigqueryreservation.googleapis.com",
"bigquerystorage.googleapis.com",
"cloudkms.googleapis.com",
"compute.googleapis.com",
"dataflow.googleapis.com",
"pubsub.googleapis.com",
"servicenetworking.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com"
])
}
# Project
module "lake-0-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "dtl-0"
group_iam = local.lake_group_iam
iam = local.lake_iam
services = local.lake_services
service_encryption_key_ids = {
bq = [try(local.service_encryption_keys.bq, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
}
module "lake-1-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "dtl-1"
group_iam = local.lake_group_iam
iam = local.lake_iam
services = local.lake_services
service_encryption_key_ids = {
bq = [try(local.service_encryption_keys.bq, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
}
module "lake-2-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "dtl-2"
group_iam = local.lake_group_iam
iam = local.lake_iam
services = local.lake_services
service_encryption_key_ids = {
bq = [try(local.service_encryption_keys.bq, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
}
module "lake-plg-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "dtl-plg"
group_iam = local.lake_group_iam
iam = local.lake_iam
services = local.lake_services
service_encryption_key_ids = {
bq = [try(local.service_encryption_keys.bq, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
}
# Bigquery
module "lake-0-bq-0" {
source = "../../../modules/bigquery-dataset"
project_id = module.lake-0-project.project_id
id = "${replace(var.prefix, "-", "_")}_dtl_0_bq_0"
location = var.region
encryption_key = try(local.service_encryption_keys.bq, null)
}
module "lake-1-bq-0" {
source = "../../../modules/bigquery-dataset"
project_id = module.lake-1-project.project_id
id = "${replace(var.prefix, "-", "_")}_dtl_1_bq_0"
location = var.region
encryption_key = try(local.service_encryption_keys.bq, null)
}
module "lake-2-bq-0" {
source = "../../../modules/bigquery-dataset"
project_id = module.lake-2-project.project_id
id = "${replace(var.prefix, "-", "_")}_dtl_2_bq_0"
location = var.region
encryption_key = try(local.service_encryption_keys.bq, null)
}
module "lake-plg-bq-0" {
source = "../../../modules/bigquery-dataset"
project_id = module.lake-plg-project.project_id
id = "${replace(var.prefix, "-", "_")}_dtl_plg_bq_0"
location = var.region
encryption_key = try(local.service_encryption_keys.bq, null)
}
# Cloud storage
module "lake-0-cs-0" {
source = "../../../modules/gcs"
project_id = module.lake-0-project.project_id
prefix = var.prefix
name = "dtl-0-cs-0"
location = var.region
storage_class = "REGIONAL"
encryption_key = try(local.service_encryption_keys.storage, null)
force_destroy = var.data_force_destroy
}
module "lake-1-cs-0" {
source = "../../../modules/gcs"
project_id = module.lake-1-project.project_id
prefix = var.prefix
name = "dtl-1-cs-0"
location = var.region
storage_class = "REGIONAL"
encryption_key = try(local.service_encryption_keys.storage, null)
force_destroy = var.data_force_destroy
}
module "lake-2-cs-0" {
source = "../../../modules/gcs"
project_id = module.lake-2-project.project_id
prefix = var.prefix
name = "dtl-2-cs-0"
location = var.region
storage_class = "REGIONAL"
encryption_key = try(local.service_encryption_keys.storage, null)
force_destroy = var.data_force_destroy
}
module "lake-plg-cs-0" {
source = "../../../modules/gcs"
project_id = module.lake-plg-project.project_id
prefix = var.prefix
name = "dtl-plg-cs-0"
location = var.region
storage_class = "REGIONAL"
encryption_key = try(local.service_encryption_keys.storage, null)
force_destroy = var.data_force_destroy
}


@ -0,0 +1,83 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description common project.
module "common-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "cmn"
group_iam = {
(local.groups.data-engineers) = [
"roles/dlp.reader",
"roles/dlp.user",
"roles/dlp.estimatesAdmin",
]
(local.groups.data-security) = [
"roles/dlp.admin",
]
}
iam = {
"roles/dlp.user" = [
module.load-sa-df-0.iam_email,
module.transf-sa-df-0.iam_email
]
}
services = concat(var.project_services, [
"datacatalog.googleapis.com",
"dlp.googleapis.com",
])
}
# To create KMS keys in the common project: uncomment this section and assign key links accordingly in the local.service_encryption_keys variable
# module "cmn-kms-0" {
# source = "../../../modules/kms"
# project_id = module.cmn-prj.project_id
# keyring = {
# name = "${var.prefix}-kr-global",
# location = var.location_config.region
# }
# keys = {
# pubsub = null
# }
# }
# module "cmn-kms-1" {
# source = "../../../modules/kms"
# project_id = module.cmn-prj.project_id
# keyring = {
# name = "${var.prefix}-kr-mregional",
# location = var.location_config.region
# }
# keys = {
# bq = null
# storage = null
# }
# }
# module "cmn-kms-2" {
# source = "../../../modules/kms"
# project_id = module.cmn-prj.project_id
# keyring = {
# name = "${var.prefix}-kr-regional",
# location = var.location_config.region
# }
# keys = {
# composer = null
# dataflow = null
# }
# }


@ -12,18 +12,12 @@
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
required_version = ">= 1.0.0"
required_providers {
google = {
source = "hashicorp/google"
version = ">= 4.0.0"
}
google-beta = {
source = "hashicorp/google-beta"
version = ">= 4.0.0"
}
}
# tfdoc:file:description Exposure project.
module "exp-project" {
source = "../../../modules/project"
parent = var.folder_id
billing_account = var.billing_account_id
prefix = var.prefix
name = "exp"
}


@ -1,61 +1,274 @@
# Data Foundation Platform
# Data Platform
The goal of this example is to Build a robust and flexible Data Foundation on GCP, providing opinionated defaults while still allowing customers to quickly and reliably build and scale out additional data pipelines.
This module implements an opinionated Data Platform Architecture that creates and sets up projects and related resources, to be used to create your end to end data environment.
The example is composed of three separate provisioning workflows, which are designed to be plugged together to create end-to-end Data Foundations that support multiple data pipelines on top.
The code is intentionally simple, as it's intended to provide a generic initial setup and then allow easy customizations to complete the implementation of the intended design.
1. **[Environment Setup](./01-environment/)**
*(once per environment)*
* projects
* VPC configuration
* Composer environment and identity
* shared buckets and datasets
1. **[Data Source Setup](./02-resources)**
*(once per data source)*
* landing and archive bucket
* internal and external identities
* domain specific datasets
1. **[Pipeline Setup](./03-pipeline)**
*(once per pipeline)*
* pipeline-specific tables and views
* pipeline code
* Composer DAG
The following diagram is a high-level reference of the resources created and managed here:
The resulting GCP architecture is outlined in this diagram
![Target architecture](./02-resources/diagram.png)
![Data Platform architecture overview](./images/overview_diagram.png "Data Platform architecture overview")
A demo pipeline is also part of this example: it can be built and run on top of the foundational infrastructure to quickly verify or test the setup.
A demo pipeline is also part of this example: it can be built and run on top of the foundational infrastructure to verify or test the setup quickly.
## Prerequisites
## Design overview and choices
In order to bring up this example, you will need
Despite its simplicity, this stage implements the basics of a design that we've seen working well for various customers.
The approach adapts to different high-level requirements:
- boundaries for each step
- clear and defined actors
- least privilege principle
- rely on service account impersonation
The code in this example doesn't address Organization-level configuration (Organization policy, VPC-SC, centralized logs). We expect those aspects to be addressed in stages external to this script.
### Project structure
The Data Platform is designed to rely on several projects, one project per data stage. The stages identified are:
- landing
- load
- data lake
- orchestration
- transformation
- exposure
This separation into projects allows adhering to the least-privilege principle by relying on project-level roles.
The script will create the following projects:
- **Landing** This project is intended to store data temporarily. Data are pushed to Cloud Storage, BigQuery, or Cloud Pub/Sub. Resources are configured with a 3-month lifecycle policy.
- **Load** This project is intended to load data from `landing` to the `data lake`. The load is made with minimal to zero transformation logic (mainly `cast`). This stage can optionally anonymize or tokenize Personally Identifiable Information (PII); alternatively, this can be done in the transformation stage, depending on your requirements. The use of [Cloud Dataflow templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates) is recommended.
- **Data Lake** Projects where data are stored. It is composed of 3 layers that progressively process and define data:
- **L0 - Raw data** Data stored in the appropriate format: structured data stored in BigQuery, unstructured data stored on Cloud Storage with additional metadata stored in BigQuery (for example, pictures stored in Cloud Storage and Cloud Vision API analysis of those images stored in BigQuery).
- **L1 - Cleansed, aggregated and standardized data**
- **L2 - Curated layer**
- **Playground** Stores temporary tables that Data Analysts may use to perform R&D on data available in other Data Lake layers.
- **Orchestration** This project is intended to host Cloud Composer. Cloud Composer will orchestrate all tasks that move your data along its journey.
- **Transformation** This project is used to move data between layers of the Data Lake. We strongly suggest relying on the BigQuery engine to perform transformations. If BigQuery doesn't have the features needed to perform your transformation, we recommend using Cloud Dataflow together with [Cloud Dataflow templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates). This stage can optionally be used to anonymize/tokenize PII.
- **Exposure** This project is intended to host resources used to share your processed data with external systems. For the purpose of this example, we leave this project empty. Depending on the access pattern, data can be presented via Cloud SQL, BigQuery, or Bigtable. For BigQuery data, we strongly suggest relying on [Authorized views](https://cloud.google.com/bigquery/docs/authorized-views).
### Roles
We assign roles on resources at the project level, granting the appropriate roles to groups. We recommend not granting IAM permissions to access data to individual human users directly; rely on the resource-access groups instead.
### Service accounts
Service account creation follows the least-privilege principle: each service account performs a single task that requires access to a defined set of resources. The table below gives a high-level overview of the roles for each service account on each data layer. For simplicity, only `READ` or `WRITE` are shown; for the detailed roles please refer to the code.
|Service Account|Landing|DataLake L0|DataLake L1|DataLake L2|
|-|:-:|:-:|:-:|:-:|
|landing-sa|WRITE|-|-|-|
|load-sa|READ|READ/WRITE|-|-|
|transformation-sa|-|READ/WRITE|READ/WRITE|READ/WRITE|
|orchestration-sa|-|-|-|-|
- Each service account performs a single task, having access to the minimum number of resources (for example, the Cloud Dataflow Service Account has access to the Landing project and the Data Lake L0 project)
- Each Service Account has the least privilege on each project.
#### Service Account Keys
The use of service account keys within a data pipeline introduces several security risks, as these credentials could be leaked without oversight or control. This example relies on service account impersonation to avoid the creation of private keys.
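For reference, impersonation is enabled by granting `roles/iam.serviceAccountTokenCreator` on the relevant service accounts, as done throughout this stage. A minimal sketch, mirroring the transformation Dataflow service account defined elsewhere in this diff, with explanatory comments added:
```hcl
module "transf-sa-df-0" {
  source     = "../../../modules/iam-service-account"
  project_id = module.transf-project.project_id
  prefix     = var.prefix
  name       = "trf-df-0"
  iam = {
    # the data engineers group and the Composer service account can mint
    # short-lived tokens for this service account, so no private keys are needed
    "roles/iam.serviceAccountTokenCreator" = [
      local.groups_iam.data-engineers,
      module.orch-sa-cmp-0.iam_email
    ]
  }
}
```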
### User groups
User groups are important. They provide a stable frame of reference that allows decoupling the final set of permissions for each group, from the stage where entities and resources are created and their IAM bindings defined.
We use three groups to control access to resources:
- *Data Engineers*: they handle and run the Data Hub, with read access to all resources in order to troubleshoot possible issues with pipelines. This team can also impersonate any service account.
- *Data Analysts*: they perform analysis on datasets, with read access to the data lake L2 project, and BigQuery READ/WRITE access to the playground project.
- *Data Security*: they handle security configurations related to the Data Hub. This team has admin access to the common project to configure Cloud DLP templates or Data Catalog policy tags.
The table below gives a high-level overview of the roles for each group on each project. For simplicity, only `READ`, `WRITE`, and `ADMIN` are shown; for the detailed roles please refer to the code.
|Group|Landing|Load|Transformation|Data Lake L0|Data Lake L1|Data Lake L2|Data Lake Playground|Orchestration|Common|
|-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|Data Engineers|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|ADMIN|
|Data Analyst|-|-|-|-|-|READ|READ/WRITE|-|-|
|Data Security|-|-|-|-|-|-|-|-|ADMIN|
### Groups
We use three groups based on the required access:
- *Data Engineers*: the group that handles and runs the Data Hub. The group has Read access to all resources to troubleshoot possible issues with the pipeline. The team can also impersonate all service accounts. Default value: `gcp-data-engineers@DOMAIN.COM`.
- *Data Analysts*: the group that performs analysis on the dataset. The group has Read access to the Data Lake L2 project and BigQuery READ/WRITE access to the `playground` project. Default value: `gcp-data-analysts@DOMAIN.COM`.
- *Data Security*: the group handling security configurations related to the Data Hub. Default value: `gcp-data-security@DOMAIN.COM`.
### Virtual Private Cloud (VPC) design
The Data Platform accepts as input an existing [Shared-VPC](https://cloud.google.com/vpc/docs/shared-vpc) to run resources. You can configure subnets for data resources by specifying the link to the subnet in the `network_config` variable. You may want to use a Shared VPC to host your resources if your pipelines need to reach on-premises resources.
If the `network_config` variable is not provided, the script will create a VPC with the default configuration in each project that requires one: the *load*, *transformation*, and *orchestration* projects.
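A minimal sketch of the expected shape (all values below are placeholders; the attributes mirror the `network_config` variable definition in [`variables.tf`](./variables.tf)):
```hcl
network_config = {
  host_project      = "HOST_PROJECT_ID"
  network_self_link = "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/global/networks/VPC_NAME"
  subnet_self_links = {
    load           = "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/LOAD_SUBNET"
    transformation = "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/TRANSFORMATION_SUBNET"
    orchestration  = "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/ORCHESTRATION_SUBNET"
  }
  composer_ip_ranges = {
    cloudsql   = "10.20.10.0/24"
    gke_master = "10.20.11.0/28"
    web_server = "10.20.11.16/28"
  }
  composer_secondary_ranges = {
    pods     = "pods"
    services = "services"
  }
}
```
The CIDRs above are the defaults used by the Composer configuration in this stage; the secondary range values are the names of the secondary ranges defined on the orchestration subnet.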
### IP ranges, subnetting
To deploy your Data Platform you need the following ranges:
- Load project VPC for Cloud Dataflow workers. Range: '/24'.
- Transformation VPC for Cloud Dataflow workers. Range: '/24'.
- Orchestration VPC for Cloud Composer:
- Cloud SQL. Range: '/24'
- GKE Master. Range: '/28'
- Web Server: Range: '/28'
- Secondary IP ranges. Pods range: '/22', Services range: '/24'
### Resource naming convention
Resources follow the naming convention described below.
- `prefix-layer` for projects
- `prefix-layer-product` for resources
- `prefix-layer[2]-gcp-product[2]-counter` for services and service accounts
### Encryption
We suggest a centralized approach to key management, where Organization Security is the only team that can access encryption material, and keyrings and keys are managed in a project external to the Data Platform.
![Centralized Cloud Key Management high-level diagram](./images/kms_diagram.png "Centralized Cloud Key Management high-level diagram")
To configure the use of Cloud Key Management on resources, you have to specify the key URL in the `service_encryption_keys` variable. Key locations should match resource locations. Example:
```hcl
service_encryption_keys = {
bq = "KEY_URL_MULTIREGIONAL"
composer = "KEY_URL_REGIONAL"
dataflow = "KEY_URL_REGIONAL"
storage = "KEY_URL_MULTIREGIONAL"
pubsub = "KEY_URL_MULTIREGIONAL"
}
```
We consider this step optional; it depends on customer policy and security best practices.
## Data Anonymization
We suggest using Cloud Data Loss Prevention to identify/mask/tokenize your confidential data. Implementing the Data Loss Prevention strategy is out of scope for this example; we enable the service in 2 different projects so that the strategy can be implemented. We expect you will use [Cloud Data Loss Prevention templates](https://cloud.google.com/dlp/docs/concepts-templates) in one of the following ways:
- During the ingestion phase, from Dataflow
- During the transformation phase, from [BigQuery](https://cloud.google.com/bigquery/docs/scan-with-dlp) or [Cloud Dataflow](https://cloud.google.com/architecture/running-automated-dataflow-pipeline-de-identify-pii-dataset)
We implemented a centralized model for Cloud Data Loss Prevention resources. Templates will be stored in the security project:
![Centralized Cloud Data Loss Prevention high-level diagram](./images/dlp_diagram.png "Centralized Cloud Data Loss Prevention high-level diagram")
## How to run this script
To deploy this example on your GCP organization, you will need
- a folder or organization where new projects will be created
- a billing account that will be associated to new projects
- an identity (user or service account) with owner permissions on the folder or org, and billing user permissions on the billing account
- a billing account that will be associated with the new projects
## Bringing up the platform
The Data Platform is meant to be executed by a Service Account (or a regular user) having this minimal set of permissions:
[![Open in Cloud Shell](https://gstatic.com/cloudssh/images/open-btn.svg)](https://ssh.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https%3A%2F%2Fgithub.com%2Fterraform-google-modules%2Fcloud-foundation-fabric.git&cloudshell_open_in_editor=README.md&cloudshell_workspace=examples%2Fdata-solutions%2Fdata-platform-foundations)
- Org level
- `"compute.organizations.enableXpnResource"`
- `"compute.organizations.disableXpnResource"`
- `"compute.subnetworks.setIamPolicy"`
- Folder level
- `"roles/logging.admin"`
- `"roles/owner"`
- `"roles/resourcemanager.folderAdmin"`
- `"roles/resourcemanager.projectCreator"`
- Cloud Key Management keys (if Cloud Key Management keys are configured):
- `"roles/cloudkms.admin"` or Permissions: `cloudkms.cryptoKeys.getIamPolicy`, `cloudkms.cryptoKeys.list`, `cloudkms.cryptoKeys.setIamPolicy`
- on the host project for the Shared VPC(s)
- `"roles/browser"`
- `"roles/compute.viewer"`
- `"roles/dns.admin"`
The end-to-end example is composed of 2 foundational steps and 1 optional step:
## Variable configuration
1. [Environment setup](./01-environment/)
1. [Data source setup](./02-resources/)
1. (Optional) [Pipeline setup](./03-pipeline/)
There are three sets of variables you will need to fill in:
The environment setup is designed to manage a single environment. Various strategies like workspaces, branching, or even separate clones can be used to support multiple environments.
```hcl
prefix = "PRFX"
project_create = {
parent = "folders/123456789012"
billing_account_id = "111111-222222-333333"
}
organization = {
domain = "DOMAIN.com"
}
```
## TODO
For finer details, check the variables in [`variables.tf`](./variables.tf) and update them according to the desired configuration. Remember to create the team groups described in the [Groups](#groups) section.
| Description | Priority (1:High - 5:Low ) | Status | Remarks |
|-------------|----------|:------:|---------|
| DLP best practices in the pipeline | 2 | Not Started | |
| Add Composer with a static DAG running the example | 3 | Not Started | |
| Integrate [CI/CD composer data processing workflow framework](https://github.com/jaketf/ci-cd-for-data-processing-workflow) | 3 | Not Started | |
| Schema changes, how to handle | 4 | Not Started | |
| Data lineage | 4 | Not Started | |
| Data quality checks | 4 | Not Started | |
| Shared-VPC | 5 | Not Started | |
| Logging & monitoring | TBD | Not Started | |
| Orcestration for ingestion pipeline (just in the readme) | TBD | Not Started | |
Once the configuration is complete, run the project factory:
```bash
terraform init
terraform apply
```
## Customizations
### Create Cloud Key Management keys as part of the Data Platform
To create Cloud Key Management keys in the Data Platform, you can uncomment the Cloud Key Management resources configured in the [`06-common.tf`](./06-common.tf) file and update the Cloud Key Management key pointers in `local.service_encryption_keys.*` to reference the local resources created.
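As a rough sketch, assuming the `kms` module exposes a `key_ids` output map (not shown in this diff), the pointers could then look like this instead of being read from the variable:
```hcl
locals {
  # hypothetical: assumes the cmn-kms-* modules in 06-common.tf are uncommented
  # and that the kms module exposes a key_ids output map
  service_encryption_keys = {
    bq       = module.cmn-kms-1.key_ids["bq"]
    storage  = module.cmn-kms-1.key_ids["storage"]
    composer = module.cmn-kms-2.key_ids["composer"]
    dataflow = module.cmn-kms-2.key_ids["dataflow"]
    pubsub   = module.cmn-kms-0.key_ids["pubsub"]
  }
}
```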
### Assign roles at BQ Dataset level
To handle multiple groups of `data-analysts` accessing the same Data Lake layer projects but only the datasets belonging to a specific group, you may want to assign roles at the BigQuery dataset level instead of at the project level.
To do this, remove the project-level IAM binding for the `data-analysts` group and grant roles at the BigQuery dataset level using the `iam` variable on the `bigquery-dataset` modules.
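A hedged sketch of what this could look like on one of the existing datasets (the per-team analyst group below is hypothetical; the `iam` map follows the same role-to-members convention used elsewhere in this stage):
```hcl
module "lake-2-bq-0" {
  source         = "../../../modules/bigquery-dataset"
  project_id     = module.lake-2-project.project_id
  id             = "${replace(var.prefix, "-", "_")}_dtl_2_bq_0"
  location       = var.region
  encryption_key = try(local.service_encryption_keys.bq, null)
  iam = {
    # hypothetical per-team analyst group, granted read access only on this dataset
    "roles/bigquery.dataViewer" = [
      "group:gcp-data-analysts-team-a@${var.organization_domain}"
    ]
  }
}
```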
## Demo pipeline
The application layer is out of scope for this script, but as a demo it is provided with a Cloud Composer DAG to move data from the `landing` area to the `DataLake L2` dataset.
Just follow the commands you find in the `demo_commands` Terraform output, go to the Cloud Composer UI, and run the `data_pipeline_dag`.
Description of commands:
- 01: copy sample data to the `landing` Cloud Storage bucket, impersonating the `landing` service account.
- 02: copy the sample data structure definitions to the `orchestration` Cloud Storage bucket, impersonating the `orchestration` service account.
- 03: copy the Cloud Composer DAG to the Cloud Composer Storage bucket, impersonating the `orchestration` service account.
- 04: open the Cloud Composer Airflow UI and run the imported DAG.
- 05: run the BigQuery query to see the results.
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [billing_account_id](variables.tf#L17) | Billing account id. | <code>string</code> | ✓ | |
| [folder_id](variables.tf#L41) | Folder to be used for the networking resources in folders/nnnn format. | <code>string</code> | ✓ | |
| [organization_domain](variables.tf#L79) | Organization domain. | <code>string</code> | ✓ | |
| [prefix](variables.tf#L84) | Unique prefix used for resource names. | <code>string</code> | ✓ | |
| [composer_config](variables.tf#L22) | | <code title="object&#40;&#123;&#10; node_count &#61; number&#10; airflow_version &#61; string&#10; env_variables &#61; map&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; node_count &#61; 3&#10; airflow_version &#61; &#34;composer-1.17.5-airflow-2.1.4&#34;&#10; env_variables &#61; &#123;&#125;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [data_force_destroy](variables.tf#L35) | Flag to set 'force_destroy' on data services like BigQuery or Cloud Storage. | <code>bool</code> | | <code>false</code> |
| [groups](variables.tf#L46) | Groups. | <code>map&#40;string&#41;</code> | | <code title="&#123;&#10; data-analysts &#61; &#34;gcp-data-analysts&#34;&#10; data-engineers &#61; &#34;gcp-data-engineers&#34;&#10; data-security &#61; &#34;gcp-data-security&#34;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [network_config](variables.tf#L56) | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | <code title="object&#40;&#123;&#10; host_project &#61; string&#10; network_self_link &#61; string&#10; subnet_self_links &#61; object&#40;&#123;&#10; load &#61; string&#10; transformation &#61; string&#10; orchestration &#61; string&#10; &#125;&#41;&#10; composer_ip_ranges &#61; object&#40;&#123;&#10; cloudsql &#61; string&#10; gke_master &#61; string&#10; web_server &#61; string&#10; &#125;&#41;&#10; composer_secondary_ranges &#61; object&#40;&#123;&#10; pods &#61; string&#10; services &#61; string&#10; &#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [project_services](variables.tf#L89) | List of core services enabled on all projects. | <code>list&#40;string&#41;</code> | | <code title="&#91;&#10; &#34;cloudresourcemanager.googleapis.com&#34;,&#10; &#34;iam.googleapis.com&#34;,&#10; &#34;serviceusage.googleapis.com&#34;,&#10; &#34;stackdriver.googleapis.com&#34;&#10;&#93;">&#91;&#8230;&#93;</code> |
| [region](variables.tf#L100) | Region used for regional resources. | <code>string</code> | | <code>&#34;europe-west1&#34;</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [bigquery-datasets](outputs.tf#L17) | BigQuery datasets. | |
| [demo_commands](outputs.tf#L93) | Demo commands. | |
| [gcs-buckets](outputs.tf#L28) | GCS buckets. | |
| [kms_keys](outputs.tf#L42) | Cloud KMS keys. | |
| [projects](outputs.tf#L47) | GCP projects information. | |
| [vpc_network](outputs.tf#L75) | VPC network. | |
| [vpc_subnet](outputs.tf#L84) | VPC subnetworks. | |
<!-- END TFDOC -->
## TODOs
Features to add in future releases
- add support for column level access on BigQuery
- add example templates for Data Catalog
- add example on how to use Cloud Data Loss Prevention
- add solution to handle tables, views, and authorized views lifecycle
- add solution to handle metadata lifecycle
Fixes
- Composer requires the "Require OS Login" org policy not to be enforced
- external Shared VPC


@ -12,18 +12,19 @@
# See the License for the specific language governing permissions and
# limitations under the License.
# The `impersonate_service_account` option requires the identity launching Terraform to have
# the `roles/iam.serviceAccountTokenCreator` role on the specified Service Account.
terraform {
required_version = ">= 1.0.0"
required_providers {
google = {
source = "hashicorp/google"
version = ">= 4.0.0"
}
google-beta = {
source = "hashicorp/google-beta"
version = ">= 4.0.0"
}
backend "gcs" {
bucket = "BUCKET_NAME"
prefix = "PREFIX"
impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com"
}
}
provider "google" {
impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com"
}
provider "google-beta" {
impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com"
}


@ -0,0 +1,3 @@
# Data ingestion Demo
In this folder you can find an example to ingest data on the `data platform` instantiated [here](../). See the details in the [README](../#demo-pipeline) to run the demo.


@ -0,0 +1,50 @@
[
{
"mode": "REQUIRED",
"name": "id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "customer_id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "purchase_id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "customer_name",
"type": "STRING",
"description": "Name"
},
{
"mode": "REQUIRED",
"name": "customer_surname",
"type": "STRING",
"description": "Surname"
},
{
"mode": "REQUIRED",
"name": "purchase_item",
"type": "STRING",
"description": "Item Name"
},
{
"mode": "REQUIRED",
"name": "price",
"type": "FLOAT",
"description": "Item Price"
},
{
"mode": "REQUIRED",
"name": "purchase_timestamp",
"type": "TIMESTAMP",
"description": "Timestamp"
}
]


@ -0,0 +1,12 @@
1,Name1,Surname1,1636972001
2,Name2,Surname2,1636972002
3,Name3,Surname3,1636972003
4,Name4,Surname4,1636972004
5,Name5,Surname5,1636972005
6,Name6,Surname6,1636972006
7,Name7,Surname7,1636972007
8,Name8,Surname8,1636972008
9,Name9,Surname9,1636972009
10,Name11,Surname11,1636972010
11,Name12,Surname12,1636972011
12,Name13,Surname13,1636972012


@ -0,0 +1,26 @@
[
{
"mode": "REQUIRED",
"name": "id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "name",
"type": "STRING",
"description": "Name"
},
{
"mode": "REQUIRED",
"name": "surname",
"type": "STRING",
"description": "Surname"
},
{
"mode": "REQUIRED",
"name": "timestamp",
"type": "TIMESTAMP",
"description": "Timestamp"
}
]


@ -0,0 +1,28 @@
{
"BigQuery Schema": [
{
"mode": "REQUIRED",
"name": "id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "name",
"type": "STRING",
"description": "Name"
},
{
"mode": "REQUIRED",
"name": "surname",
"type": "STRING",
"description": "Surname"
},
{
"mode": "REQUIRED",
"name": "timestamp",
"type": "TIMESTAMP",
"description": "Timestamp"
}
]
}


@ -0,0 +1,12 @@
function transform(line) {
var values = line.split(',');
var obj = new Object();
obj.id = values[0]
obj.name = values[1];
obj.surname = values[2];
obj.timestamp = values[3];
var jsonString = JSON.stringify(obj);
return jsonString;
}


@ -0,0 +1,20 @@
1,1,Car1,5000,1636972012
1,1,Car1,7000,1636972045
1,2,Car1,6000,1636972088
1,2,Car1,8000,16369720099
1,3,Car1,10000,1636972102
1,3,Car1,50000,1636972180
1,4,Car1,13000,1636972260
1,4,Car1,5000,1636972302
1,5,Car1,2000,1636972408
1,1,Car1,77000,1636972501
1,1,Car1,64000,1636975001
1,8,Car1,2000,1636976001
1,9,Car1,4000,1636977001
1,10,Car1,18000,1636982001
1,11,Car1,21000,1636992001
1,11,Car1,33000,1636932001
1,11,Car1,37000,1636872001
1,11,Car1,26000,1636772001
1,12,Car1,22000,1636672001
1,4,Car1,11000,1636952001


@ -0,0 +1,32 @@
[
{
"mode": "REQUIRED",
"name": "id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "customer_id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "item",
"type": "STRING",
"description": "Item Name"
},
{
"mode": "REQUIRED",
"name": "price",
"type": "FLOAT",
"description": "Item Price"
},
{
"mode": "REQUIRED",
"name": "timestamp",
"type": "TIMESTAMP",
"description": "Timestamp"
}
]


@ -0,0 +1,34 @@
{
"BigQuery Schema": [
{
"mode": "REQUIRED",
"name": "id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "customer_id",
"type": "INTEGER",
"description": "ID"
},
{
"mode": "REQUIRED",
"name": "item",
"type": "STRING",
"description": "Item Name"
},
{
"mode": "REQUIRED",
"name": "price",
"type": "FLOAT",
"description": "Item Price"
},
{
"mode": "REQUIRED",
"name": "timestamp",
"type": "TIMESTAMP",
"description": "Timestamp"
}
]
}


@ -0,0 +1,13 @@
function transform(line) {
var values = line.split(',');
var obj = new Object();
obj.id = values[0];
obj.customer_id = values[1];
obj.item = values[2];
obj.price = values[3];
obj.timestamp = values[4];
var jsonString = JSON.stringify(obj);
return jsonString;
}


@ -0,0 +1,201 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# --------------------------------------------------------------------------------
# Load The Dependencies
# --------------------------------------------------------------------------------
import csv
import datetime
import io
import logging
import os
from airflow import models
from airflow.contrib.operators.dataflow_operator import DataflowTemplateOperator
from airflow.operators import dummy
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
# --------------------------------------------------------------------------------
# Set variables
# ------------------------------------------------------------
DTL_L0_PRJ = os.environ.get("DTL_L0_PRJ")
DTL_L0_BQ_DATASET = os.environ.get("DTL_L0_BQ_DATASET")
DTL_L0_GCS = os.environ.get("DTL_L0_GCS")
DTL_L1_PRJ = os.environ.get("DTL_L1_PRJ")
DTL_L1_BQ_DATASET = os.environ.get("DTL_L1_BQ_DATASET")
DTL_L1_GCS = os.environ.get("DTL_L1_GCS")
DTL_L2_PRJ = os.environ.get("DTL_L2_PRJ")
DTL_L2_BQ_DATASET = os.environ.get("DTL_L2_BQ_DATASET")
DTL_L2_GCS = os.environ.get("DTL_L2_GCS")
DTL_PLG_PRJ = os.environ.get("DTL_PLG_PRJ")
DTL_PLG_BQ_DATASET = os.environ.get("DTL_PLG_BQ_DATASET")
DTL_PLG_GCS = os.environ.get("DTL_PLG_GCS")
GCP_REGION = os.environ.get("GCP_REGION")
LND_PRJ = os.environ.get("LND_PRJ")
LND_BQ = os.environ.get("LND_BQ")
LND_GCS = os.environ.get("LND_GCS")
LND_PS = os.environ.get("LND_PS")
LOD_PRJ = os.environ.get("LOD_PRJ")
LOD_GCS_STAGING = os.environ.get("LOD_GCS_STAGING")
LOD_NET_VPC = os.environ.get("LOD_NET_VPC")
LOD_NET_SUBNET = os.environ.get("LOD_NET_SUBNET")
LOD_SA_DF = os.environ.get("LOD_SA_DF")
ORC_PRJ = os.environ.get("ORC_PRJ")
ORC_GCS = os.environ.get("ORC_GCS")
TRF_PRJ = os.environ.get("TRF_PRJ")
TRF_GCS_STAGING = os.environ.get("TRF_GCS_STAGING")
TRF_NET_VPC = os.environ.get("TRF_NET_VPC")
TRF_NET_SUBNET = os.environ.get("TRF_NET_SUBNET")
TRF_SA_DF = os.environ.get("TRF_SA_DF")
TRF_SA_BQ = os.environ.get("TRF_SA_BQ")
DF_ZONE = os.environ.get("GCP_REGION") + "-b"
DF_REGION = BQ_REGION = os.environ.get("GCP_REGION")
# --------------------------------------------------------------------------------
# Set default arguments
# --------------------------------------------------------------------------------
# If you are running Airflow in more than one time zone
# see https://airflow.apache.org/docs/apache-airflow/stable/timezone.html
# for best practices
yesterday = datetime.datetime.now() - datetime.timedelta(days=1)
default_args = {
'owner': 'airflow',
'start_date': yesterday,
'depends_on_past': False,
'email': [''],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': datetime.timedelta(minutes=5),
'dataflow_default_options': {
'project': LOD_PRJ,
'location': DF_REGION,
'zone': DF_ZONE,
'stagingLocation': LOD_GCS_STAGING,
'tempLocation': LOD_GCS_STAGING + "/tmp",
'serviceAccountEmail': LOD_SA_DF,
'subnetwork': LOD_NET_SUBNET,
'ipConfiguration': "WORKER_IP_PRIVATE"
},
}
# --------------------------------------------------------------------------------
# Main DAG
# --------------------------------------------------------------------------------
with models.DAG(
'data_pipeline_dag',
default_args=default_args,
schedule_interval=None) as dag:
start = dummy.DummyOperator(
task_id='start',
trigger_rule='all_success'
)
end = dummy.DummyOperator(
task_id='end',
trigger_rule='all_success'
)
customers_import = DataflowTemplateOperator(
task_id="dataflow_customer_import",
template="gs://dataflow-templates/latest/GCS_Text_to_BigQuery",
parameters={
"javascriptTextTransformFunctionName": "transform",
"JSONPath": ORC_GCS + "/customers_schema.json",
"javascriptTextTransformGcsPath": ORC_GCS + "/customers_udf.js",
"inputFilePattern": LND_GCS + "/customers.csv",
"outputTable": DTL_L0_PRJ + ":"+DTL_L0_BQ_DATASET+".customers",
"bigQueryLoadingTemporaryDirectory": LOD_GCS_STAGING + "/tmp/bq/",
},
)
purchases_import = DataflowTemplateOperator(
task_id="dataflow_purchases_import",
template="gs://dataflow-templates/latest/GCS_Text_to_BigQuery",
parameters={
"javascriptTextTransformFunctionName": "transform",
"JSONPath": ORC_GCS + "/purchases_schema.json",
"javascriptTextTransformGcsPath": ORC_GCS + "/purchases_udf.js",
"inputFilePattern": LND_GCS + "/purchases.csv",
"outputTable": DTL_L0_PRJ + ":"+DTL_L0_BQ_DATASET+".purchases",
"bigQueryLoadingTemporaryDirectory": LOD_GCS_STAGING + "/tmp/bq/",
},
)
join_customer_purchase = BigQueryInsertJobOperator(
task_id='bq_join_customer_purchase',
gcp_conn_id='bigquery_default',
project_id=TRF_PRJ,
location=BQ_REGION,
configuration={
'jobType':'QUERY',
'query':{
'query':"""SELECT
c.id as customer_id,
p.id as purchase_id,
c.name as name,
c.surname as surname,
p.item as item,
p.price as price,
p.timestamp as timestamp
FROM `{dtl_0_prj}.{dtl_0_dataset}.customers` c
JOIN `{dtl_0_prj}.{dtl_0_dataset}.purchases` p ON c.id = p.customer_id
""".format(dtl_0_prj=DTL_L0_PRJ, dtl_0_dataset=DTL_L0_BQ_DATASET, ),
'destinationTable':{
'projectId': DTL_L1_PRJ,
'datasetId': DTL_L1_BQ_DATASET,
'tableId': 'customer_purchase'
},
'writeDisposition':'WRITE_TRUNCATE',
"useLegacySql": False
}
},
impersonation_chain=[TRF_SA_BQ]
)
l2_customer_purchase = BigQueryInsertJobOperator(
task_id='bq_l2_customer_purchase',
gcp_conn_id='bigquery_default',
project_id=TRF_PRJ,
location=BQ_REGION,
configuration={
'jobType':'QUERY',
'query':{
'query':"""SELECT
customer_id,
purchase_id,
name,
surname,
item,
price,
timestamp
FROM `{dtl_1_prj}.{dtl_1_dataset}.customer_purchase`
""".format(dtl_1_prj=DTL_L1_PRJ, dtl_1_dataset=DTL_L1_BQ_DATASET, ),
'destinationTable':{
'projectId': DTL_L2_PRJ,
'datasetId': DTL_L2_BQ_DATASET,
'tableId': 'customer_purchase'
},
'writeDisposition':'WRITE_TRUNCATE',
"useLegacySql": False
}
},
impersonation_chain=[TRF_SA_BQ]
)
start >> [customers_import, purchases_import] >> join_customer_purchase >> l2_customer_purchase >> end

Binary files not shown: three image files added (27 KiB, 20 KiB, and 70 KiB), most likely the diagrams under `./images/` referenced in the README.


@ -0,0 +1,53 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Core locals.
locals {
groups = {
for k, v in var.groups : k => "${v}@${var.organization_domain}"
}
groups_iam = {
for k, v in local.groups : k => "group:${v}"
}
service_encryption_keys = var.service_encryption_keys
shared_vpc_project = try(var.network_config.host_project, null)
use_shared_vpc = var.network_config != null
}
module "shared-vpc-project" {
source = "../../../modules/project"
count = local.use_shared_vpc ? 1 : 0
name = var.network_config.host_project
project_create = false
iam_additive = {
"roles/compute.networkUser" = [
# load Dataflow service agent and worker service account
module.load-project.service_accounts.robots.dataflow,
module.load-sa-df-0.iam_email,
# orchestration Composer service agents
module.orch-project.service_accounts.robots.cloudservices,
module.orch-project.service_accounts.robots.container-engine,
module.orch-project.service_accounts.robots.dataflow,
],
"roles/composer.sharedVpcAgent" = [
# orchestration Composer service agent
module.orch-project.service_accounts.robots.composer
],
"roles/container.hostServiceAgentUser" = [
# orchestration Composer service agents
module.orch-project.service_accounts.robots.dataflow,
]
}
}


@ -0,0 +1,104 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Output variables.
output "bigquery-datasets" {
description = "BigQuery datasets."
value = {
land-bq-0 = module.land-bq-0.dataset_id,
lake-0-bq-0 = module.lake-0-bq-0.dataset_id,
lake-1-bq-0 = module.lake-1-bq-0.dataset_id,
lake-2-bq-0 = module.lake-2-bq-0.dataset_id,
lake-plg-bq-0 = module.lake-plg-bq-0.dataset_id,
}
}
output "gcs-buckets" {
description = "GCS buckets."
value = {
lake-0-cs-0 = module.lake-0-cs-0.name,
lake-1-cs-0 = module.lake-1-cs-0.name,
lake-2-cs-0 = module.lake-2-cs-0.name,
lake-plg-cs-0 = module.lake-plg-cs-0.name,
land-cs-0 = module.land-cs-0.name,
lod-cs-df = module.load-cs-df-0.name,
orch-cs-0 = module.orch-cs-0.name,
transf-cs-df = module.transf-cs-df-0.name,
}
}
output "kms_keys" {
description = "Cloud KMS keys."
value = local.service_encryption_keys
}
output "projects" {
description = "GCP projects information."
value = {
project_number = {
lake-0 = module.lake-0-project.number,
lake-1 = module.lake-1-project.number,
lake-2 = module.lake-2-project.number,
lake-plg = module.lake-plg-project.number,
exposure = module.exp-project.number,
landing = module.land-project.number,
load = module.load-project.number,
orchestration = module.orch-project.number,
transformation = module.transf-project.number,
}
project_id = {
lake-0 = module.lake-0-project.project_id,
lake-1 = module.lake-1-project.project_id,
lake-2 = module.lake-2-project.project_id,
lake-plg = module.lake-plg-project.project_id,
exposure = module.exp-project.project_id,
landing = module.land-project.project_id,
load = module.load-project.project_id,
orchestration = module.orch-project.project_id,
transformation = module.transf-project.project_id,
}
}
}
output "vpc_network" {
description = "VPC network."
value = {
load = local.load_vpc
orchestration = local.orch_vpc
transformation = local.transf_vpc
}
}
output "vpc_subnet" {
description = "VPC subnetworks."
value = {
load = local.load_subnet
orchestration = local.orch_subnet
transformation = local.transf_subnet
}
}
output "demo_commands" {
description = "Demo commands."
value = {
01 = "gsutil -i ${module.land-sa-cs-0.email} cp demo/data/*.csv gs://${module.land-cs-0.name}"
02 = "gsutil -i ${module.orch-sa-cmp-0.email} cp demo/data/*.j* gs://${module.orch-cs-0.name}"
03 = "gsutil -i ${module.orch-sa-cmp-0.email} cp demo/*.py ${google_composer_environment.orch-cmp-0.config[0].dag_gcs_prefix}/"
04 = "Open ${google_composer_environment.orch-cmp-0.config.0.airflow_uri} and run uploaded DAG."
05 = <<EOT
bq query --project_id=${module.lake-2-project.project_id} --use_legacy_sql=false 'SELECT * FROM `${module.lake-2-project.project_id}.${module.lake-2-bq-0.dataset_id}.customer_purchase` LIMIT 1000'
EOT
}
}

View File

@ -0,0 +1,8 @@
prefix = "prefix"
project_create = {
parent = "folders/123456789012"
billing_account_id = "111111-222222-333333"
}
organization = {
domain = "example.com"
}

View File

@ -0,0 +1,116 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Terraform Variables.
variable "billing_account_id" {
description = "Billing account id."
type = string
}
variable "composer_config" {
type = object({
node_count = number
airflow_version = string
env_variables = map(string)
})
default = {
node_count = 3
airflow_version = "composer-1.17.5-airflow-2.1.4"
env_variables = {}
}
}
variable "data_force_destroy" {
description = "Flag to set 'force_destroy' on data services like BigQuery or Cloud Storage."
type = bool
default = false
}
variable "folder_id" {
description = "Folder to be used for the Data Platform resources in folders/nnnn format."
type = string
}
variable "groups" {
description = "Groups."
type = map(string)
default = {
data-analysts = "gcp-data-analysts"
data-engineers = "gcp-data-engineers"
data-security = "gcp-data-security"
}
}
variable "network_config" {
description = "Shared VPC network configurations to use. If null, networks will be created in projects with preconfigured values."
type = object({
host_project = string
network_self_link = string
subnet_self_links = object({
load = string
transformation = string
orchestration = string
})
composer_ip_ranges = object({
cloudsql = string
gke_master = string
web_server = string
})
composer_secondary_ranges = object({
pods = string
services = string
})
})
default = null
}
variable "organization_domain" {
description = "Organization domain."
type = string
}
variable "prefix" {
description = "Unique prefix used for resource names."
type = string
}
variable "project_services" {
description = "List of core services enabled on all projects."
type = list(string)
default = [
"cloudresourcemanager.googleapis.com",
"iam.googleapis.com",
"serviceusage.googleapis.com",
"stackdriver.googleapis.com"
]
}
variable "region" {
description = "Region used for regional resources."
type = string
default = "europe-west1"
}
variable "service_encryption_keys" { # service encription key
description = "Cloud KMS to use to encrypt different services. Key location should match service region."
type = object({
bq = string
composer = string
dataflow = string
storage = string
pubsub = string
})
default = null
}

View File

@ -195,6 +195,7 @@ resource "google_organization_iam_binding" "org_admin_delegated" {
"roles/compute.orgFirewallPolicyAdmin",
"roles/compute.xpnAdmin",
"roles/orgpolicy.policyAdmin",
module.organization.custom_role_id.serviceProjectNetworkAdmin
],
local.billing_org ? [
"roles/billing.admin",

View File

@ -67,6 +67,16 @@ locals {
billing_account_id = var.billing_account.id
prefix = var.prefix
})
"03-data-platform-dev" = jsonencode({
billing_account_id = var.billing_account.id
organization = var.organization
prefix = var.prefix
})
"03-data-platform-prod" = jsonencode({
billing_account_id = var.billing_account.id
organization = var.organization
prefix = var.prefix
})
}
}

View File

@ -20,6 +20,8 @@ locals {
# used here for convenience, in organization.tf members are explicit
billing_ext_users = concat(
[
module.branch-dp-dev-sa.iam_email,
module.branch-dp-prod-sa.iam_email,
module.branch-network-sa.iam_email,
module.branch-security-sa.iam_email,
],

View File

@ -0,0 +1,137 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
# tfdoc:file:description Data Platform stages resources.
# top-level Data Platform folder and service account
module "branch-dp-folder" {
source = "../../../modules/folder"
parent = "organizations/${var.organization.id}"
name = "Dataplatform"
}
# TODO: check whether the commented modules below can be deleted, or whether a single data-platform configuration should run both dev and prod.
# module "branch-dp-sa" {
# source = "../../../modules/iam-service-account"
# project_id = var.automation_project_id
# name = "resman-dp-0"
# description = "Terraform Data Platform production service account."
# prefix = local.prefixes.prod
# }
# module "branch-dp-gcs" {
# source = "../../../modules/gcs"
# project_id = var.automation_project_id
# name = "dp-0"
# prefix = local.prefixes.prod
# versioning = true
# iam = {
# "roles/storage.objectAdmin" = [module.branch-dp-sa.iam_email]
# }
# }
# environment: development folder
module "branch-dp-dev-folder" {
source = "../../../modules/folder"
parent = module.branch-dp-folder.id
# naming: environment descriptive name
name = "Data Platform - Development"
# environment-wide human permissions on the whole Data Platform environment
group_iam = {}
iam = {
# remove owner here and at project level if SA does not manage project resources
"roles/owner" = [
module.branch-dp-dev-sa.iam_email
]
"roles/logging.admin" = [
module.branch-dp-dev-sa.iam_email
]
"roles/resourcemanager.folderAdmin" = [
module.branch-dp-dev-sa.iam_email
]
"roles/resourcemanager.projectCreator" = [
module.branch-dp-dev-sa.iam_email
]
}
}
module "branch-dp-dev-sa" {
source = "../../../modules/iam-service-account"
project_id = var.automation_project_id
name = "resman-dp-dev-0"
# naming: environment in description
description = "Terraform Data Platform development service account."
prefix = local.prefixes.dev
}
module "branch-dp-dev-gcs" {
source = "../../../modules/gcs"
project_id = var.automation_project_id
name = "resman-dp-0"
prefix = local.prefixes.dev
versioning = true
iam = {
"roles/storage.objectAdmin" = [module.branch-dp-dev-sa.iam_email]
}
}
# environment: production folder
module "branch-dp-prod-folder" {
source = "../../../modules/folder"
parent = module.branch-dp-folder.id
# naming: environment descriptive name
name = "Data Platform - Production"
# environment-wide human permissions on the whole Data Platform environment
group_iam = {}
iam = {
# remove owner here and at project level if SA does not manage project resources
"roles/owner" = [
module.branch-dp-prod-sa.iam_email
]
"roles/logging.admin" = [
module.branch-dp-prod-sa.iam_email
]
"roles/resourcemanager.folderAdmin" = [
module.branch-dp-prod-sa.iam_email
]
"roles/resourcemanager.projectCreator" = [
module.branch-dp-prod-sa.iam_email
]
}
}
module "branch-dp-prod-sa" {
source = "../../../modules/iam-service-account"
project_id = var.automation_project_id
name = "resman-dp-0"
# naming: environment in description
description = "Terraform Data Platform production service account."
prefix = local.prefixes.prod
}
module "branch-dp-prod-gcs" {
source = "../../../modules/gcs"
project_id = var.automation_project_id
name = "resman-dp-0"
prefix = local.prefixes.prod
versioning = true
iam = {
"roles/storage.objectAdmin" = [module.branch-dp-prod-sa.iam_email]
}
}

View File

@ -18,6 +18,11 @@
locals {
# set to the empty list if you remove the data platform branch
branch_dataplatform_pf_sa_iam_emails = [
module.branch-dp-dev-sa.iam_email,
module.branch-dp-prod-sa.iam_email
]
# set to the empty list if you remove the teams branch
branch_teams_pf_sa_iam_emails = [
module.branch-teams-dev-projectfactory-sa.iam_email,
@ -58,7 +63,10 @@ module "organization" {
"roles/compute.xpnAdmin" = [
module.branch-network-sa.iam_email
]
"roles/orgpolicy.policyAdmin" = local.branch_teams_pf_sa_iam_emails
"roles/orgpolicy.policyAdmin" = concat(
local.branch_dataplatform_pf_sa_iam_emails,
local.branch_teams_pf_sa_iam_emails
)
},
local.billing_org ? {
"roles/billing.costsManager" = local.branch_teams_pf_sa_iam_emails
@ -71,6 +79,7 @@ module "organization" {
# [
# for k, v in module.branch-teams-team-sa : v.iam_email
# ],
local.branch_dataplatform_pf_sa_iam_emails,
local.branch_teams_pf_sa_iam_emails
)
} : {}

View File

@ -15,6 +15,10 @@
*/
locals {
_data_platform_sas = {
dev = module.branch-dp-dev-sa.iam_email
prod = module.branch-dp-prod-sa.iam_email
}
_project_factory_sas = {
dev = module.branch-teams-dev-projectfactory-sa.iam_email
prod = module.branch-teams-prod-projectfactory-sa.iam_email
@ -30,6 +34,16 @@ locals {
name = "security"
sa = module.branch-security-sa.email
})
"03-data-platform-dev" = templatefile("${path.module}/../../assets/templates/providers.tpl", {
bucket = module.branch-dp-dev-gcs.name
name = "dp-dev"
sa = module.branch-dp-dev-sa.email
})
"03-data-platform-prod" = templatefile("${path.module}/../../assets/templates/providers.tpl", {
bucket = module.branch-dp-prod-gcs.name
name = "dp-prod"
sa = module.branch-dp-prod-sa.email
})
"03-project-factory-dev" = templatefile("${path.module}/../../assets/templates/providers.tpl", {
bucket = module.branch-teams-dev-projectfactory-gcs.name
name = "team-dev"
@ -48,12 +62,14 @@ locals {
}
tfvars = {
"02-networking" = jsonencode({
data_platform_sa = local._data_platform_sas
folder_ids = {
networking = module.branch-network-folder.id
networking-dev = module.branch-network-dev-folder.id
networking-prod = module.branch-network-prod-folder.id
}
project_factory_sa = local._project_factory_sas
data_platform_sa = local._data_platform_sas
})
"02-security" = jsonencode({
folder_id = module.branch-security-folder.id
@ -61,6 +77,14 @@ locals {
for k, v in local._project_factory_sas : k => [v]
}
})
"03-data-platform-dev" = jsonencode({
folder_id = module.branch-dp-dev-folder.id
data_platform_sa = module.branch-dp-dev-sa.iam_email
})
"03-data-platform-prod" = jsonencode({
folder_id = module.branch-dp-prod-folder.id
data_platform_sa = module.branch-dp-prod-sa.iam_email
})
}
}

View File

@ -0,0 +1,33 @@
# skip boilerplate check
allow-dataflow-load-ingress-traffic:
description: "Allow traffic on Cloud Dataflow subnet"
direction: INGRESS
action: allow
sources: []
ranges:
- 10.10.0.0/24
- 10.10.1.0/24
targets: []
use_service_accounts: false
rules:
- protocol: tcp
ports:
- 12345
- 12346
allow-composer-health-checks:
description: "Allow Health Checks"
direction: INGRESS
action: allow
sources: []
ranges:
- 130.211.0.0/22
- 35.191.0.0/16
targets: []
use_service_accounts: false
rules:
- protocol: tcp
ports:
- 80
- 443

View File

@ -0,0 +1,5 @@
# skip boilerplate check
region: europe-west1
description: Default subnet for dev Data Platform - Load layer Dataflow
ip_cidr_range: 10.10.0.0/24

View File

@ -0,0 +1,8 @@
# skip boilerplate check
region: europe-west1
description: Default subnet for dev Data Platform - Orchestration layer Composer
ip_cidr_range: 172.18.16.0/24
secondary_ip_range:
pods: 172.18.24.0/22
services: 172.18.28.0/24

View File

@ -0,0 +1,5 @@
# skip boilerplate check
region: europe-west1
description: Default subnet for dev Data Platform - Transformation layer Dataflow
ip_cidr_range: 10.10.1.0/24

View File

@ -0,0 +1,5 @@
# skip boilerplate check
region: europe-west1
description: Default subnet for prod Data Platform - Load layer Dataflow
ip_cidr_range: 10.20.0.0/24

View File

@ -0,0 +1,8 @@
# skip boilerplate check
region: europe-west1
description: Default subnet for prod Data Platform - Orchestration layer Composer
ip_cidr_range: 10.20.2.0/24
secondary_ip_range:
pods: 10.20.8.0/22
services: 10.20.12.0/24

View File

@ -0,0 +1,5 @@
# skip boilerplate check
region: europe-west1
description: Default subnet for prod Data Platform - Transformation layer Dataflow
ip_cidr_range: 10.20.1.0/24

View File

@ -89,5 +89,5 @@ module "landing-nat-ew1" {
router_create = true
router_name = "prod-nat-ew1"
router_network = module.landing-vpc.name
router_asn = 4200001024
router_asn = 65530
}

View File

@ -27,6 +27,30 @@ locals {
shared_vpc_self_link = module.prod-spoke-vpc.self_link
vpc_host_project = module.prod-spoke-project.project_id
})
"03-data-platform-prod" = jsonencode({
network_self_link = module.prod-spoke-vpc.self_link
subnet_self_links = {
load = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-lod-ew1"].self_link
orchestration = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-orc-ew1"].self_link
transformation = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-trf-ew1"].self_link
}
})
"03-data-platform-prod" = jsonencode({
network_config = {
host_project = module.prod-spoke-project.project_id
network = module.prod-spoke-vpc.self_link
vpc_subnet_range = {
load = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-lod-ew1"].ip_cidr_range
orchestration = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-orc-ew1"].ip_cidr_range
transformation = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-trf-ew1"].ip_cidr_range
}
vpc_subnet_self_link = {
load = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-lod-ew1"].self_link
orchestration = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-orc-ew1"].self_link
transformation = module.prod-spoke-vpc.subnets["europe-west1/prod-dp-trf-ew1"].self_link
}
}
})
}
}

View File

@ -27,6 +27,7 @@ module "dev-spoke-project" {
disable_dependent_services = false
}
services = [
"container.googleapis.com",
"compute.googleapis.com",
"dns.googleapis.com",
"iap.googleapis.com",
@ -92,7 +93,7 @@ module "dev-spoke-cloudnat" {
name = "dev-nat-${local.region_trigram[each.value]}"
router_create = true
router_network = module.dev-spoke-vpc.name
router_asn = 4200001024
router_asn = 65530
logging_filter = "ERRORS_ONLY"
}
@ -112,6 +113,7 @@ resource "google_project_iam_binding" "dev_spoke_project_iam_delegated" {
project = module.dev-spoke-project.project_id
role = "roles/resourcemanager.projectIamAdmin"
members = [
var.data_platform_sa.dev,
var.project_factory_sa.dev
]
condition {

View File

@ -92,7 +92,7 @@ module "prod-spoke-cloudnat" {
name = "prod-nat-${local.region_trigram[each.value]}"
router_create = true
router_network = module.prod-spoke-vpc.name
router_asn = 4200001024
router_asn = 65530
logging_filter = "ERRORS_ONLY"
}
@ -112,6 +112,7 @@ resource "google_project_iam_binding" "prod_spoke_project_iam_delegated" {
project = module.prod-spoke-project.project_id
role = "roles/resourcemanager.projectIamAdmin"
members = [
var.data_platform_sa.prod,
var.project_factory_sa.prod
]
condition {

View File

@ -50,6 +50,13 @@ variable "data_dir" {
default = "data"
}
variable "data_platform_sa" {
# tfdoc:variable:source 01-resman
description = "IAM emails for Data Platform service accounts."
type = map(string)
default = {}
}
variable "dns" {
description = "Onprem DNS resolvers."
type = map(list(string))

View File

@ -0,0 +1,6 @@
# Data Platform
The Data Platform (DP) builds on top of your foundations to create and set up projects (and related resources) to be used for your workloads.
It is organized in folders representing environments (e.g. "dev", "prod"), each implemented by a stand-alone Terraform setup.
This directory contains a single environment ([`dev/`](./dev/)) as an example. To implement multiple environments (e.g. "prod" and "dev"), copy the `dev` folder once per environment, then customize variables following the instructions in [`dev/README.md`](./dev/README.md).

View File

@ -0,0 +1,140 @@
# Data Platform
The Data Platform (DP) builds on top of your foundations to create and set up projects (and related resources) to be used for your data platform.
<p align="center">
<img src="diagram.png" alt="Data Platform diagram">
</p>
## Design overview and choices
The DP creates projects in a well-defined context, according to your resource management structure. Within the DP folder, resources are organized by environment.
Projects are created for each environment and data layer, so that service account and group roles remain separated. Roles are assigned at the project level.
The Data Platform takes care of the following activities:
- Project creation
- API/Services enablement
- Service accounts creation
- IAM roles assignment for groups and service accounts
- KMS keys roles assignment
- Shared VPC attachment and subnets IAM binding
- Project-level org policies definition
- Billing setup (billing account attachment and budget configuration)
- Creation of the resources each project needs to run your data platform.
You can find more details on the implementation in the Data Platform [README](../../../examples/data-solutions/data-platform-foundations/).
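For orientation, a minimal invocation of the underlying example module looks like the sketch below; it mirrors the test fixture included in this PR, and the source path and all values are placeholders.
```hcl
# Minimal sketch based on the test fixture; source path and values are placeholders.
module "data-platform" {
  source              = "./fabric/examples/data-solutions/data-platform-foundations"
  organization_domain = "example.com"
  billing_account_id  = "123456-123456-123456"
  folder_id           = "folders/12345678"
  prefix              = "myco"
}
```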
### User Groups
The DP relies on user groups to assign roles. They provide a stable frame of reference that decouples the final set of permissions for each group from the stage where entities and resources are created and their IAM bindings defined. [Here](../../../examples/data-solutions/data-platform-foundations/#groups) you can find more details on the user groups used by the DP.
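Group names are passed through the `groups` variable and combined with the organization domain at apply time; the tfvars sketch below simply restates the defaults from `variables.tf`.
```hcl
# tfvars sketch: these are the default group names; the stage appends "@<organization_domain>".
groups = {
  data-analysts  = "gcp-data-analysts"
  data-engineers = "gcp-data-engineers"
  data-security  = "gcp-data-security"
}
```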
### Network
The DP relies on the Shared VPC defined in the [02-networking](../../../02-network-vpn) stage.
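When attaching to the Shared VPC, the `network_config` variable carries the host project and the self links of the load, orchestration and transformation subnets; the project id and self links below are placeholders, not outputs of this PR.
```hcl
# tfvars sketch: project id and subnet self links are placeholders.
network_config = {
  host_project      = "prod-net-spoke-0"
  network_self_link = "https://www.googleapis.com/compute/v1/projects/prod-net-spoke-0/global/networks/prod-spoke-0"
  subnet_self_links = {
    load           = "https://www.googleapis.com/compute/v1/projects/prod-net-spoke-0/regions/europe-west1/subnetworks/prod-dp-lod-ew1"
    orchestration  = "https://www.googleapis.com/compute/v1/projects/prod-net-spoke-0/regions/europe-west1/subnetworks/prod-dp-orc-ew1"
    transformation = "https://www.googleapis.com/compute/v1/projects/prod-net-spoke-0/regions/europe-west1/subnetworks/prod-dp-trf-ew1"
  }
}
```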
### Encryption
The DP may rely on Cloud KMS crypto keys created by the [02-security](../../../02-security) stage.
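CMEK is driven by the `service_encryption_keys` variable, with one key per service in the same region as the resources it encrypts; the key ids below are placeholders for keys typically created by the security stage.
```hcl
# tfvars sketch: key ids are placeholders; keys must live in the same region as the services.
service_encryption_keys = {
  bq       = "projects/my-sec-core/locations/europe-west1/keyRings/my-keyring/cryptoKeys/bq"
  composer = "projects/my-sec-core/locations/europe-west1/keyRings/my-keyring/cryptoKeys/composer"
  dataflow = "projects/my-sec-core/locations/europe-west1/keyRings/my-keyring/cryptoKeys/dataflow"
  pubsub   = "projects/my-sec-core/locations/europe-west1/keyRings/my-keyring/cryptoKeys/pubsub"
  storage  = "projects/my-sec-core/locations/europe-west1/keyRings/my-keyring/cryptoKeys/storage"
}
```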
## How to run this stage
This stage is meant to be executed after "foundational stages" (i.e., stages [`00-bootstrap`](../../00-bootstrap), [`01-resman`](../../01-resman), [`02-networking`](../../02-networking) and [`02-security`](../../02-security)) have been run.
It's of course possible to run this stage in isolation, by making sure the architectural prerequisites are satisfied (e.g., networking), and that the Service Account running the stage is granted the roles/permissions below (a minimal grant sketch follows the list):
- One service account per environment, each with appropriate permissions
- at the organization level a custom role for networking operations including the following permissions
- `"compute.organizations.enableXpnResource"`,
- `"compute.organizations.disableXpnResource"`,
- `"compute.subnetworks.setIamPolicy"`,
- and role `"roles/orgpolicy.policyAdmin"`
- on each folder where projects are created
- `"roles/logging.admin"`
- `"roles/owner"`
- `"roles/resourcemanager.folderAdmin"`
- `"roles/resourcemanager.projectCreator"`
- on the host project for the Shared VPC
- `"roles/browser"`
- `"roles/compute.viewer"`
- VPC Host projects and their subnets should exist when creating projects
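The sketch below shows one way to grant the folder-level roles listed above to a pre-existing service account when running the stage outside of Fast, using the repository's folder module; the module path, parent, folder name and service account email are illustrative assumptions.
```hcl
# Hypothetical standalone grant; parent, folder name and service account email are assumptions.
module "dp-dev-folder" {
  source = "./fabric/modules/folder"
  parent = "organizations/123456789012"
  name   = "Data Platform - Development"
  iam = {
    "roles/owner"                          = ["serviceAccount:dp-dev-0@my-automation-prj.iam.gserviceaccount.com"]
    "roles/logging.admin"                  = ["serviceAccount:dp-dev-0@my-automation-prj.iam.gserviceaccount.com"]
    "roles/resourcemanager.folderAdmin"    = ["serviceAccount:dp-dev-0@my-automation-prj.iam.gserviceaccount.com"]
    "roles/resourcemanager.projectCreator" = ["serviceAccount:dp-dev-0@my-automation-prj.iam.gserviceaccount.com"]
  }
}
```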
### Providers configuration
If you're running this on top of Fast, run the following commands to link the providers file generated by the previous stages and populate the required variables.
```bash
# Variable `outputs_location` is set to `../../../config` in stage 01-resman
cd fabric-fast/stages/03-data-platform/dev
ln -s ../../../config/03-data-platform-dev/providers.tf
```
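The linked `providers.tf` is generated by 01-resman from a template; it should look roughly like the sketch below, with the bucket and impersonated service account taken from that stage's outputs (the values shown here are placeholders, not the template itself).
```hcl
# Sketch of the generated provider configuration; bucket and service account are placeholders.
terraform {
  backend "gcs" {
    bucket                      = "myco-dev-resman-dp-0"
    impersonate_service_account = "myco-dev-resman-dp-dev-0@my-automation-prj.iam.gserviceaccount.com"
  }
}
provider "google" {
  impersonate_service_account = "myco-dev-resman-dp-dev-0@my-automation-prj.iam.gserviceaccount.com"
}
provider "google-beta" {
  impersonate_service_account = "myco-dev-resman-dp-dev-0@my-automation-prj.iam.gserviceaccount.com"
}
```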
### Variable configuration
There are two broad sets of variables you will need to fill in:
- variables shared by other stages (org id, billing account id, etc.), or derived from a resource managed by a different stage (folder id, automation project id, etc.)
- variables specific to resources managed by this stage
To avoid the tedious job of filling in the first group of variables with values derived from other stages' outputs, the same mechanism used above for the provider configuration can be used to leverage pre-configured `.tfvars` files.
If you configured a valid path for `outputs_location` in the bootstrap and networking stages, simply link the relevant `terraform-*.auto.tfvars.json` files from this stage's outputs folder (under the path you specified), where the `*` above is set to the name of the stage that produced it. For this stage, the following `.tfvars` files are available:
```bash
# Variable `outputs_location` is set to `../../../config` in stages 00-bootstrap and 02-networking
ln -s ../../../config/03-data-platform-dev/terraform-bootstrap.auto.tfvars.json
ln -s ../../../config/03-data-platform-dev/terraform-networking.auto.tfvars.json
```
If you're not using Fast, refer to the [Variables](#variables) table at the bottom of this document for a full list of variables, their origin (e.g., a stage or specific to this one), and descriptions explaining their meaning.
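As a reference for standalone use, a minimal `terraform.tfvars` based on the required variables in this stage's `variables.tf` would look like the sketch below; every value is a placeholder.
```hcl
# Minimal standalone tfvars sketch; all values are placeholders.
billing_account_id  = "111111-222222-333333"
folder_id           = "folders/123456789012"
organization_domain = "example.com"
prefix              = "myco"
# network_config uses the shape shown in the Network section above.
```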
Once the configuration is complete, apply this stage by running
```bash
terraform init
terraform apply
```
<!-- TFDOC OPTS files:1 show_extra:1 -->
<!-- BEGIN TFDOC -->
## Files
| name | description | modules | resources |
|---|---|---|---|
| [main.tf](./main.tf) | Data Platform. | <code>data-platform-foundations</code> | |
| [outputs.tf](./outputs.tf) | Output variables. | | <code>local_file</code> |
| [providers.tf](./providers.tf) | Provider configurations. | | |
| [variables.tf](./variables.tf) | Terraform Variables. | | |
## Variables
| name | description | type | required | default | producer |
|---|---|:---:|:---:|:---:|:---:|
| [billing_account_id](variables.tf#L17) | Billing account id. | <code>string</code> | ✓ | | <code>00-bootstrap</code> |
| [folder_id](variables.tf#L66) | Folder to be used for the Data Platform resources in folders/nnnn format. | <code>string</code> | ✓ | | <code>resman</code> |
| [network_config](variables.tf#L94) | Network configurations to use. Specify a shared VPC to use, if null networks will be created in projects. | <code title="object&#40;&#123;&#10; host_project &#61; string&#10; network &#61; string&#10; vpc_subnet_self_link &#61; object&#40;&#123;&#10; load &#61; string&#10; transformation &#61; string&#10; orchestration &#61; string&#10; &#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | | |
| [organization](variables.tf#L107) | Organization details. | <code title="object&#40;&#123;&#10; domain &#61; string&#10; id &#61; number&#10; customer_id &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | | <code>00-bootstrap</code> |
| [prefix](variables.tf#L123) | Unique prefix used for resource names. Not used for projects if 'project_create' is null. | <code>string</code> | ✓ | | <code>00-bootstrap</code> |
| [composer_config](variables.tf#L23) | | <code title="object&#40;&#123;&#10; node_count &#61; number&#10; ip_range_cloudsql &#61; string&#10; ip_range_gke_master &#61; string&#10; ip_range_web_server &#61; string&#10; project_policy_boolean &#61; map&#40;bool&#41;&#10; region &#61; string&#10; ip_allocation_policy &#61; object&#40;&#123;&#10; use_ip_aliases &#61; string&#10; cluster_secondary_range_name &#61; string&#10; services_secondary_range_name &#61; string&#10; &#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; node_count &#61; 3&#10; ip_range_cloudsql &#61; &#34;172.18.29.0&#47;24&#34;&#10; ip_range_gke_master &#61; &#34;172.18.30.0&#47;28&#34;&#10; ip_range_web_server &#61; &#34;172.18.30.16&#47;28&#34;&#10; project_policy_boolean &#61; &#123;&#10; &#34;constraints&#47;compute.requireOsLogin&#34; &#61; true&#10; &#125;&#10; region &#61; &#34;europe-west1&#34;&#10; ip_allocation_policy &#61; &#123;&#10; use_ip_aliases &#61; &#34;true&#34;&#10; cluster_secondary_range_name &#61; &#34;pods&#34;&#10; services_secondary_range_name &#61; &#34;services&#34;&#10; &#125;&#10;&#125;">&#123;&#8230;&#125;</code> | |
| [data_force_destroy](variables.tf#L54) | Flag to set 'force_destroy' on data services like BigQuery or Cloud Storage. | <code>bool</code> | | <code>false</code> | |
| [enable_cloud_nat](variables.tf#L60) | Network Cloud NAT flag. | <code>bool</code> | | <code>false</code> | |
| [groups](variables.tf#L72) | Groups. | <code>map&#40;string&#41;</code> | | <code title="&#123;&#10; data-analysts &#61; &#34;gcp-data-analysts&#34;&#10; data-engineers &#61; &#34;gcp-data-engineers&#34;&#10; data-security &#61; &#34;gcp-data-security&#34;&#10;&#125;">&#123;&#8230;&#125;</code> | |
| [location_config](variables.tf#L82) | Locations where resources will be deployed. Map to configure region and multiregion specs. | <code title="object&#40;&#123;&#10; region &#61; string&#10; multi_region &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; region &#61; &#34;europe-west1&#34;&#10; multi_region &#61; &#34;eu&#34;&#10;&#125;">&#123;&#8230;&#125;</code> | |
| [outputs_location](variables.tf#L117) | Path where providers, tfvars files, and lists for the following stages are written. Leave empty to disable. | <code>string</code> | | <code>null</code> | |
| [project_id](variables.tf#L129) | Project id, references existing project if `project_create` is null. | <code title="object&#40;&#123;&#10; landing &#61; string&#10; load &#61; string&#10; orchestration &#61; string&#10; trasformation &#61; string&#10; datalake-l0 &#61; string&#10; datalake-l1 &#61; string&#10; datalake-l2 &#61; string&#10; datalake-playground &#61; string&#10; common &#61; string&#10; exposure &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; landing &#61; &#34;lnd&#34;&#10; load &#61; &#34;lod&#34;&#10; orchestration &#61; &#34;orc&#34;&#10; trasformation &#61; &#34;trf&#34;&#10; datalake-l0 &#61; &#34;dtl-0&#34;&#10; datalake-l1 &#61; &#34;dtl-1&#34;&#10; datalake-l2 &#61; &#34;dtl-2&#34;&#10; datalake-playground &#61; &#34;dtl-plg&#34;&#10; common &#61; &#34;cmn&#34;&#10; exposure &#61; &#34;exp&#34;&#10;&#125;">&#123;&#8230;&#125;</code> | |
| [project_services](variables.tf#L157) | List of core services enabled on all projects. | <code>list&#40;string&#41;</code> | | <code title="&#91;&#10; &#34;cloudresourcemanager.googleapis.com&#34;,&#10; &#34;iam.googleapis.com&#34;,&#10; &#34;serviceusage.googleapis.com&#34;,&#10; &#34;stackdriver.googleapis.com&#34;&#10;&#93;">&#91;&#8230;&#93;</code> | |
## Outputs
| name | description | sensitive | consumers |
|---|---|:---:|---|
| [bigquery_datasets](outputs.tf#L35) | BigQuery datasets. | | |
| [demo_commands](outputs.tf#L65) | Demo commands. | | |
| [gcs_buckets](outputs.tf#L40) | GCS buckets. | | |
| [kms_keys](outputs.tf#L45) | Cloud KMS keys. | | |
| [projects](outputs.tf#L50) | GCP projects information. | | |
| [vpc_network](outputs.tf#L55) | VPC network. | | |
| [vpc_subnet](outputs.tf#L60) | VPC subnetworks. | | |
<!-- END TFDOC -->

Binary file not shown.


View File

@ -0,0 +1,39 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
# tfdoc:file:description Data Platform.
locals {
_network_config = merge(
var.network_config_composer,
var.network_config
)
}
module "data-platform" {
source = "../../../../examples/data-solutions/data-platform-foundations"
billing_account_id = var.billing_account_id
composer_config = var.composer_config
data_force_destroy = var.data_force_destroy
folder_id = var.folder_id
groups = var.groups
network_config = local._network_config
organization_domain = var.organization_domain
prefix = var.prefix
project_services = var.project_services
region = var.region
service_encryption_keys = var.service_encryption_keys
}

View File

@ -0,0 +1,61 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Output variables.
locals {
tfvars = {}
}
resource "local_file" "tfvars" {
for_each = var.outputs_location == null ? {} : local.tfvars
filename = "${var.outputs_location}/${each.key}/terraform-dataplatform-dev.auto.tfvars.json"
content = each.value
}
# outputs
output "bigquery_datasets" {
description = "BigQuery datasets."
value = module.data-platform.bigquery-datasets
}
output "gcs_buckets" {
description = "GCS buckets."
value = module.data-platform.gcs-buckets
}
output "kms_keys" {
description = "Cloud KMS keys."
value = module.data-platform.kms_keys
}
output "projects" {
description = "GCP projects information."
value = module.data-platform.projects
}
output "vpc_network" {
description = "VPC network."
value = module.data-platform.vpc_network
}
output "vpc_subnet" {
description = "VPC subnetworks."
value = module.data-platform.vpc_subnet
}
output "demo_commands" {
description = "Demo commands."
value = module.data-platform.demo_commands
}

View File

@ -0,0 +1,141 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Terraform Variables.
variable "billing_account_id" {
# tfdoc:variable:source 00-bootstrap
description = "Billing account id."
type = string
}
variable "composer_config" {
type = object({
node_count = number
airflow_version = string
env_variables = map(string)
})
default = {
node_count = 3
airflow_version = "composer-1.17.5-airflow-2.1.4"
env_variables = {}
}
}
variable "data_force_destroy" {
description = "Flag to set 'force_destroy' on data services like BigQuery or Cloud Storage."
type = bool
default = false
}
variable "folder_id" {
# tfdoc:variable:source resman
description = "Folder to be used for the Data Platform resources in folders/nnnn format."
type = string
}
variable "groups" {
description = "Groups."
type = map(string)
default = {
data-analysts = "gcp-data-analysts"
data-engineers = "gcp-data-engineers"
data-security = "gcp-data-security"
}
}
variable "network_config" {
description = "Network configurations to use. Specify the shared VPC to use; if null, networks will be created in projects."
type = object({
host_project = string
network_self_link = string
subnet_self_links = object({
load = string
transformation = string
orchestration = string
})
})
}
variable "network_config_composer" {
description = "Network configurations to use for Composer."
type = object({
composer_ip_ranges = object({
cloudsql = string
gke_master = string
web_server = string
})
composer_secondary_ranges = object({
pods = string
services = string
})
})
default = {
composer_ip_ranges = {
cloudsql = "172.18.29.0/24"
gke_master = "172.18.30.0/28"
web_server = "172.18.30.16/28"
}
composer_secondary_ranges = {
pods = "pods"
services = "services"
}
}
}
variable "organization_domain" {
description = "Organization domain."
type = string
}
variable "outputs_location" {
description = "Path where providers, tfvars files, and lists for the following stages are written. Leave empty to disable."
type = string
default = null
}
variable "prefix" {
# tfdoc:variable:source 00-bootstrap
description = "Unique prefix used for resource names. Not used for projects if 'project_create' is null."
type = string
}
variable "project_services" {
description = "List of core services enabled on all projects."
type = list(string)
default = [
"cloudresourcemanager.googleapis.com",
"iam.googleapis.com",
"serviceusage.googleapis.com",
"stackdriver.googleapis.com"
]
}
variable "region" {
description = "Region used for regional resources."
type = string
default = "europe-west1"
}
variable "service_encryption_keys" { # service encription key
description = "Cloud KMS to use to encrypt different services. Key location should match service region."
type = object({
bq = string
composer = string
dataflow = string
storage = string
pubsub = string
})
default = null
}

View File

@ -8,21 +8,21 @@ Refer to each stage's documentation for a detailed description of its purpose, t
## Organizational level (00-01)
- [Bootstrap](00-bootstrap/README.md)
- [Bootstrap](00-bootstrap/README.md)
Enables critical organization-level functionality that depends on broad permissions. It has two primary purposes. The first is to bootstrap the resources needed for automation of this and the following stages (service accounts, GCS buckets). The second is to apply the minimum amount of configuration needed at the organization level, to avoid the need for broad permissions later on, and to implement a minimum of security features like sinks and exports from the start.
- [Resource Management](01-resman/README.md)
- [Resource Management](01-resman/README.md)
Creates the base resource hierarchy (folders) and the automation resources required later to delegate deployment of each part of the hierarchy to separate stages. This stage also configures organization-level policies and any exceptions needed by different branches of the resource hierarchy.
## Shared resources (02)
- [Security](02-security/README.md)
- [Security](02-security/README.md)
Manages centralized security configurations in a separate stage, and is typically owned by the security team. This stage implements VPC Security Controls via separate perimeters for environments and central services, and creates projects to host centralized KMS keys used by the whole organization. It's meant to be easily extended to include other security-related resources which are required, like Secret Manager.
- Networking ([VPN](02-networking-vpn/README.md)/[NVA](02-networking-nva/README.md))
Manages centralized network resources in a separate stage, and is typically owned by the networking team. This stage implements a hub-and-spoke design, and includes connectivity via VPN to on-premises, and YAML-based factories for firewall rules (hierarchical and VPC-level) and subnets. It's currently available in two versions: [spokes connected via VPN](02-networking-vpn/README.md), [and spokes connected via appliances](02-networking-nva/README.md).
- [Networking](02-networking/README.md)
Manages centralized network resources in a separate stage, and is typically owned by the networking team. This stage implements a hub-and-spoke design, and includes connectivity via VPN to on-premises, and YAML-based factories for firewall rules (hierarchical and VPC-level) and subnets.
## Environment-level resources (03)
- [Project Factory](03-project-factory/README.md)
- [Project Factory](03-project-factory/README.md)
YAML-based factory to create and configure application or team-level projects. Configuration includes VPC-level settings for Shared VPC, service-level configuration for CMEK encryption via centralized keys, and service account creation for workloads and applications. This stage is meant to be used once per environment.
- Data Platform (in development)
- GKE Multitenant (in development)

View File

@ -14,13 +14,10 @@
* limitations under the License.
*/
module "test-environment" {
source = "../../../../../examples/data-solutions/data-platform-foundations/01-environment"
billing_account_id = var.billing_account
root_node = var.root_node
}
module "test-resources" {
source = "../../../../../examples/data-solutions/data-platform-foundations/02-resources"
project_ids = module.test-environment.project_ids
module "test" {
source = "../../../../../examples/data-solutions/data-platform-foundations/"
organization_domain = "example.com"
billing_account_id = "123456-123456-123456"
folder_id = "folders/12345678"
prefix = "prefix"
}

View File

@ -1,26 +0,0 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
variable "billing_account" {
type = string
default = "123456-123456-123456"
}
variable "root_node" {
description = "The resource name of the parent Folder or Organization. Must be of the form folders/folder_id or organizations/org_id."
type = string
default = "folders/12345678"
}

View File

@ -12,8 +12,16 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pytest
FIXTURES_DIR = os.path.join(os.path.dirname(__file__), 'fixture')
def test_resources(e2e_plan_runner):
"Test that plan works and the numbers of resources is as expected."
modules, resources = e2e_plan_runner()
assert len(modules) == 6
assert len(resources) == 53
modules, resources = e2e_plan_runner(FIXTURES_DIR)
assert len(modules) == 40
assert len(resources) == 287