# Data Platform
This module implements an opinionated Data Platform Architecture that creates and sets up projects and related resources composing an end-to-end data environment.

The code is intentionally simple, as it's intended to provide a generic initial setup and then allow easy customizations to complete the implementation of the intended design.

The following diagram is a high-level reference of the resources created and managed here:

![Data Platform architecture overview](./images/overview_diagram.png "Data Platform architecture overview")

A demo Airflow pipeline is also part of this blueprint: it can be built and run on top of the foundational infrastructure to quickly verify or test the setup.

## Design overview and choices
Despite its simplicity, this stage implements the basics of a design that we've seen work well for various customers.

The approach adapts to different high-level requirements:

- boundaries for each step
- clearly defined actors
- least privilege principle
- reliance on service account impersonation

The code in this blueprint doesn't address Organization-level configurations (Organization policy, VPC-SC, centralized logs). We expect those elements to be managed by automation stages external to this script, like those in [FAST](../../../fast), with this blueprint deployed on top of them as one of the [stages](../../../fast/stages/3-data-platform/dev/README.md).

### Project structure
The Data Platform is designed to rely on several projects, one project per data stage. The stages identified are:

- drop off
- load
- data warehouse
- orchestration
- transformation
- exposure

This separation into projects allows adhering to the least-privilege principle by using project-level roles.

The script will create the following projects:

- **Drop off** Used to store temporary data. Data is pushed to Cloud Storage, BigQuery, or Cloud Pub/Sub. Resources are configured with a customizable lifecycle policy.
- **Load** Used to load data from the drop off zone to the data warehouse. The load is made with minimal to zero transformation logic (mainly `cast`). Anonymization or tokenization of Personally Identifiable Information (PII) can be implemented here or in the transformation stage, depending on your requirements. The use of [Cloud Dataflow templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates) is recommended. When you need to handle workloads from different teams and strong role separation is needed between them, we suggest customizing the script and using separate `Load` projects.
- **Data Warehouse** Several projects distributed across 3 separate layers, to host progressively processed and refined data:
  - **Landing - Raw data** Data stored in relevant formats: structured data stored in BigQuery, unstructured data stored on Cloud Storage with additional metadata stored in BigQuery (for example, pictures stored in Cloud Storage and the Cloud Vision API analysis of those images stored in BigQuery).
  - **Curated - Cleansed, aggregated and curated data**
  - **Confidential - Curated and unencrypted layer**
- **Orchestration** Used to host Cloud Composer, which orchestrates all tasks that move data across layers.
- **Transformation** Used to move data between Data Warehouse layers. We strongly suggest relying on BigQuery Engine to perform the transformations. If BigQuery doesn't have the features needed to perform your transformations, you can use Cloud Dataflow with [Cloud Dataflow templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates). This stage can also optionally anonymize or tokenize PII. When you need to handle workloads from different teams and strong role separation is needed between them, we suggest customizing the script and using separate `Transformation` projects.
- **Exposure** Used to host resources that share processed data with external systems. Depending on the access pattern, data can be presented via Cloud SQL, BigQuery, or Bigtable. For BigQuery data, we strongly suggest relying on [Authorized views](https://cloud.google.com/bigquery/docs/authorized-views); a sketch follows this list.

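
For BigQuery exposure, the authorized-view pattern can be expressed with a single dataset-access resource. A minimal sketch, assuming hypothetical project, dataset and view names (the blueprint itself does not create these):

```hcl
# Authorize a view in the exposure project to read the confidential
# source dataset, so consumers never query the source tables directly.
# All project, dataset and table names below are hypothetical.
resource "google_bigquery_dataset_access" "authorized_view" {
  project    = "myco-dwh-conf"     # project hosting the source dataset
  dataset_id = "confidential_data" # source dataset being exposed
  view {
    project_id = "myco-exp"        # exposure project hosting the view
    dataset_id = "exposure"
    table_id   = "orders_summary_view"
  }
}
```
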
### Roles

We assign roles on resources at the project level, granting the appropriate roles via groups (humans) and service accounts (services and applications) according to best practices.

### Service accounts

Service accounts follow the least privilege principle, each performing a single task that requires access to a defined set of resources. The table below shows a high-level overview of roles for each service account on each data layer, using `READ` or `WRITE` access patterns for simplicity. For detailed roles please refer to the code.

|Service Account|Drop off|DWH Landing|DWH Curated|DWH Confidential|
|-|:-:|:-:|:-:|:-:|
|`drop-sa`|`WRITE`|-|-|-|
|`load-sa`|`READ`|`READ`/`WRITE`|-|-|
|`transformation-sa`|-|`READ`/`WRITE`|`READ`/`WRITE`|`READ`/`WRITE`|
|`orchestration-sa`|-|-|-|-|

A full reference of IAM roles managed by the Data Platform [is available here](./IAM.md).

Using service account keys within a data pipeline exposes you to several security risks stemming from credential leaks. This blueprint shows how to leverage service account impersonation to avoid the need to create keys.

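
Service account impersonation also works for Terraform itself: the `google` provider can impersonate a service account so that no key file is ever created. A minimal sketch, with a hypothetical service account email:

```hcl
# The caller's own credentials (e.g. your user account) are used to mint
# short-lived tokens for the service account; no key is ever downloaded.
provider "google" {
  impersonate_service_account = "dp-deployer@my-seed-project.iam.gserviceaccount.com"
}
```
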
### User groups
User groups provide a stable frame of reference that allows decoupling the final set of permissions from the stage where entities and resources are created, and their IAM bindings defined.
We use three groups to control access to resources:

- *Data Engineers*. They handle and run the Data Hub, with read access to all resources in order to troubleshoot possible issues with pipelines. This team can also impersonate any service account.
- *Data Analysts*. They perform analysis on datasets, with read access to the Data Warehouse Confidential project, and BigQuery READ/WRITE access to the playground project.
- *Data Security*. They handle security configurations related to the Data Hub. This team has admin access to the common project to configure Cloud DLP templates or Data Catalog policy tags.

The table below shows a high-level overview of roles for each group on each project, using `READ`, `WRITE` and `ADMIN` access patterns for simplicity. For detailed roles please refer to the code.

|Group|Drop off|Load|Transformation|DWH Landing|DWH Curated|DWH Confidential|Orchestration|Common|
|-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:|
|Data Engineers|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|
|Data Analysts|-|-|-|-|-|`READ`|-|-|
|Data Security|-|-|-|-|-|-|-|`ADMIN`|

You can configure groups via the `groups` variable.
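
The defaults below mirror the `groups` variable documented in the Variables table; the names identify groups in your `organization_domain`, and the groups themselves must already exist:

```tfvars
groups = {
  data-analysts  = "gcp-data-analysts"
  data-engineers = "gcp-data-engineers"
  data-security  = "gcp-data-security"
}
```
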
### Virtual Private Cloud (VPC) design
As is often the case in real-world configurations, this blueprint accepts as input an existing [Shared-VPC](https://cloud.google.com/vpc/docs/shared-vpc) via the `network_config` variable. Make sure that the GKE API (`container.googleapis.com`) is enabled in the VPC host project.

If the `network_config` variable is not provided, one VPC will be created in each project that supports network resources (load, transformation and orchestration).

### IP ranges and subnetting
To deploy this blueprint with self-managed VPCs you need the following ranges:
- one /24 for the load project VPC subnet used for Cloud Dataflow workers
- one /24 for the transformation VPC subnet used for Cloud Dataflow workers
- one /24 for the orchestration VPC subnet used for Composer workers
- one /22 and one /24 for the secondary ranges associated with the orchestration VPC subnet

If you are using Shared VPC, you need one subnet with one /22 and one /24 secondary range defined for Composer pods and services; a sample `network_config` is sketched after the list below.

In both VPC scenarios, you also need these ranges for Composer:

- one /24 for Cloud SQL
- one /28 for the GKE control plane

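
For reference, a hedged `network_config` sketch for the Shared VPC case; every project, network, subnet and secondary-range name below is hypothetical, and the structure mirrors the `network_config` variable documented in the Variables table:

```tfvars
network_config = {
  host_project      = "my-host-project"
  network_self_link = "https://www.googleapis.com/compute/v1/projects/my-host-project/global/networks/my-shared-vpc"
  subnet_self_links = {
    load           = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/europe-west1/subnetworks/dp-load"
    transformation = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/europe-west1/subnetworks/dp-transformation"
    orchestration  = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/europe-west1/subnetworks/dp-orchestration"
  }
  composer_ip_ranges = {
    cloudsql   = "192.168.254.0/24"   # /24 for Cloud SQL
    gke_master = "192.168.255.0/28"   # /28 for the GKE control plane
  }
  composer_secondary_ranges = {
    pods     = "pods"       # assumed to be the name of the /22 secondary range
    services = "services"   # assumed to be the name of the /24 secondary range
  }
}
```
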
### Resource naming conventions
Resources follow the naming convention described below:

- `prefix-layer` for projects
- `prefix-layer-product` for resources
- `prefix-layer[2]-gcp-product[2]-counter` for services and service accounts

### Encryption
We suggest a centralized approach to key management, where Organization Security is the only team that can access encryption material, and keyrings and keys are managed in a project external to the Data Platform.

![Centralized Cloud Key Management high-level diagram](./images/kms_diagram.png "Centralized Cloud Key Management high-level diagram")

To configure the use of Cloud KMS on resources, you have to specify the key id in the `service_encryption_keys` variable. Key locations should match resource locations. Example:

```tfvars
service_encryption_keys = {
  bq       = "KEY_URL_MULTIREGIONAL"
  composer = "KEY_URL_REGIONAL"
  dataflow = "KEY_URL_REGIONAL"
  storage  = "KEY_URL_MULTIREGIONAL"
  pubsub   = "KEY_URL_MULTIREGIONAL"
}
```
This step is optional and depends on customer policies and security best practices.
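
Key ids follow the standard Cloud KMS resource-name format `projects/<project>/locations/<location>/keyRings/<keyring>/cryptoKeys/<key>`. A hypothetical example, assuming keys hosted in an external security project:

```tfvars
# All project, keyring and key names below are made up.
service_encryption_keys = {
  bq       = "projects/my-sec-project/locations/eu/keyRings/my-mr-keyring/cryptoKeys/bq-key"
  composer = "projects/my-sec-project/locations/europe-west1/keyRings/my-regional-keyring/cryptoKeys/composer-key"
  dataflow = "projects/my-sec-project/locations/europe-west1/keyRings/my-regional-keyring/cryptoKeys/dataflow-key"
  storage  = "projects/my-sec-project/locations/eu/keyRings/my-mr-keyring/cryptoKeys/storage-key"
  pubsub   = "projects/my-sec-project/locations/eu/keyRings/my-mr-keyring/cryptoKeys/pubsub-key"
}
```
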
## Data Anonymization
We suggest using Cloud Data Loss Prevention to identify/mask/tokenize your confidential data.

While implementing a Data Loss Prevention strategy is out of scope for this blueprint, we enable the service in two different projects so that [Cloud Data Loss Prevention templates](https://cloud.google.com/dlp/docs/concepts-templates) can be configured in one of two ways:

- during the ingestion phase, from Dataflow
- during the transformation phase, from [BigQuery](https://cloud.google.com/bigquery/docs/scan-with-dlp) or [Cloud Dataflow](https://cloud.google.com/architecture/running-automated-dataflow-pipeline-de-identify-pii-dataset)

Cloud Data Loss Prevention resources and templates should be stored in the security project:

![Centralized Cloud Data Loss Prevention high-level diagram](./images/dlp_diagram.png "Centralized Cloud Data Loss Prevention high-level diagram")

You can find more details and best practices on using DLP for de-identification and re-identification of PII in large-scale datasets in the [GCP documentation](https://cloud.google.com/architecture/de-identification-re-identification-pii-using-cloud-dlp).

## Data Catalog
[Data Catalog](https://cloud.google.com/data-catalog) helps you document your data entries at scale. Data Catalog relies on [tags](https://cloud.google.com/data-catalog/docs/tags-and-tag-templates#tags) and [tag templates](https://cloud.google.com/data-catalog/docs/tags-and-tag-templates#tag-templates) to manage metadata for all data entries in a unified and centralized service. To implement [column-level security](https://cloud.google.com/bigquery/docs/column-level-security-intro) on BigQuery, we suggest using `Tags` and `Tag templates`.

The default configuration will implement 3 tags:

- `3_Confidential`: policy tag for columns that include very sensitive information, such as credit card numbers.
- `2_Private`: policy tag for columns that include sensitive personal identifiable information (PII), such as a person's first name.
- `1_Sensitive`: policy tag for columns that include data that cannot be made public, such as the credit limit.

Anything that is not tagged is available to all users who have access to the data warehouse.

For the purpose of the blueprint, no group has access to tagged data. You can configure tags and their associated roles via the `data_catalog_tags` variable. We suggest using the "[Best practices for using policy tags in BigQuery](https://cloud.google.com/bigquery/docs/best-practices-policy-tags)" article as a guide to designing your tag structure and access pattern.

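
A hedged example in the `{tag => {ROLE => [MEMBERS]}}` format, granting a hypothetical group fine-grained read access on the `1_Sensitive` policy tag:

```tfvars
data_catalog_tags = {
  "3_Confidential" = null
  "2_Private"      = null
  "1_Sensitive" = {
    # the group email is hypothetical
    "roles/datacatalog.categoryFineGrainedReader" = [
      "group:gcp-data-analysts@example.com"
    ]
  }
}
```
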
## How to run this script
To deploy this blueprint on your GCP organization, you will need:

- a folder or organization where new projects will be created
- a billing account that will be associated with the new projects

The Data Platform is meant to be executed by a Service Account (or a regular user) with this minimal set of permissions (see the example grants after this list):

- **Billing account**:
  - `roles/billing.user`
- **Folder level**:
  - `roles/resourcemanager.folderAdmin`
  - `roles/resourcemanager.projectCreator`
- **KMS Keys** (if CMEK encryption is in use):
  - `roles/cloudkms.admin` or a custom role with `cloudkms.cryptoKeys.getIamPolicy`, `cloudkms.cryptoKeys.list`, `cloudkms.cryptoKeys.setIamPolicy` permissions
- **Shared VPC host project** (if configured):
  - `roles/compute.xpnAdmin` on the host project folder or org
  - `roles/resourcemanager.projectIamAdmin` on the host project, either with no conditions or with a condition allowing [delegated role grants](https://medium.com/google-cloud/managing-gcp-service-usage-through-delegated-role-grants-a843610f2226#:~:text=Delegated%20role%20grants%20is%20a,setIamPolicy%20permission%20on%20a%20resource.) for `roles/compute.networkUser`, `roles/composer.sharedVpcAgent`, `roles/container.hostServiceAgentUser`

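
As an illustration, the grants above could be applied to a hypothetical deployer service account with `gcloud`; the billing account and folder ids are the same placeholders used in the variable configuration below:

```bash
# Hypothetical deployer identity and placeholder ids.
SA="dp-deployer@my-seed-project.iam.gserviceaccount.com"

gcloud beta billing accounts add-iam-policy-binding 111111-222222-333333 \
  --member="serviceAccount:${SA}" --role="roles/billing.user"

for role in roles/resourcemanager.folderAdmin roles/resourcemanager.projectCreator; do
  gcloud resource-manager folders add-iam-policy-binding 123456789012 \
    --member="serviceAccount:${SA}" --role="${role}"
done
```
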
## Variable configuration
At minimum, you will need to fill in the following required variables:

```tfvars
billing_account_id  = "111111-222222-333333"
folder_id           = "folders/123456789012"
organization_domain = "domain.com"
prefix              = "myco"
```

For finer details, check the variables in [`variables.tf`](./variables.tf) and update them according to the desired configuration. Remember to create the team groups described [above](#user-groups).
2022-02-04 23:51:11 -08:00
2022-02-06 11:59:16 -08:00
Once the configuration is complete, run the project factory:
```bash
terraform init
terraform apply
```
## How to use this blueprint from Terraform
While this blueprint can be used as a standalone deployment, it can also be called directly as a Terraform module by providing the variable values as shown below:
```hcl
module "data-platform" {
source = "./fabric/blueprints/data-solutions/data-platform-foundations"
billing_account_id = var.billing_account_id
folder_id = var.folder_id
organization_domain = "example.com"
prefix = "myprefix"
}
# tftest modules=39 resources=287
```
## Customizations
### Create Cloud Key Management keys as part of the Data Platform
To create Cloud Key Management keys in the Data Platform, you can uncomment the Cloud Key Management resources configured in the [`06-common.tf`](./06-common.tf) file and update the Cloud Key Management key references in `local.service_encryption_keys.*` to point to the newly created local resources.
### Assign roles at BQ Dataset level
To handle multiple groups of `data-analysts` accessing the same Data Warehouse layer projects while restricting each group to its own datasets, you may want to assign roles at the BigQuery dataset level instead of the project level.

To do this, remove the project-level IAM bindings for the `data-analysts` group and grant roles at the BigQuery dataset level using the `iam` variable on the `bigquery-dataset` modules, as shown in the sketch below.

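
A minimal sketch using the `bigquery-dataset` module's `iam` variable (a `{role => members}` map); the project id, dataset id and group email are hypothetical:

```hcl
module "team-a-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "myco-dwh-conf"
  id         = "team_a_confidential"
  iam = {
    # dataset-level grant replacing the project-level binding
    "roles/bigquery.dataViewer" = ["group:team-a-analysts@example.com"]
  }
}
```
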
## Demo pipeline
The application layer is out of scope of this script. For demo purposes only, several Cloud Composer DAGs are provided. The demos import data from the `drop off` area to the `Data Warehouse Confidential` dataset using different features.

You can find examples in the [demo](./demo) folder.

<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [billing_account_id](variables.tf#L17) | Billing account id. | <code>string</code> | ✓ | |
| [folder_id](variables.tf#L122) | Folder to be used for the networking resources in folders/nnnn format. | <code>string</code> | ✓ | |
| [organization_domain](variables.tf#L166) | Organization domain. | <code>string</code> | ✓ | |
| [prefix](variables.tf#L171) | Prefix used for resource names. | <code>string</code> | ✓ | |
| [composer_config](variables.tf#L22) | Cloud Composer config. | <code title="object&#40;&#123;&#10; disable_deployment &#61; optional&#40;bool&#41;&#10; environment_size &#61; optional&#40;string, &#34;ENVIRONMENT_SIZE_SMALL&#34;&#41;&#10; software_config &#61; optional&#40;object&#40;&#123;&#10; airflow_config_overrides &#61; optional&#40;any&#41;&#10; pypi_packages &#61; optional&#40;any&#41;&#10; env_variables &#61; optional&#40;map&#40;string&#41;&#41;&#10; image_version &#61; string&#10; &#125;&#41;, &#123;&#10; image_version &#61; &#34;composer-2-airflow-2&#34;&#10; &#125;&#41;&#10; workloads_config &#61; optional&#40;object&#40;&#123;&#10; scheduler &#61; optional&#40;object&#40;&#10; &#123;&#10; cpu &#61; number&#10; memory_gb &#61; number&#10; storage_gb &#61; number&#10; count &#61; number&#10; &#125;&#10; &#41;, &#123;&#10; cpu &#61; 0.5&#10; memory_gb &#61; 1.875&#10; storage_gb &#61; 1&#10; count &#61; 1&#10; &#125;&#41;&#10; web_server &#61; optional&#40;object&#40;&#10; &#123;&#10; cpu &#61; number&#10; memory_gb &#61; number&#10; storage_gb &#61; number&#10; &#125;&#10; &#41;, &#123;&#10; cpu &#61; 0.5&#10; memory_gb &#61; 1.875&#10; storage_gb &#61; 1&#10; &#125;&#41;&#10; worker &#61; optional&#40;object&#40;&#10; &#123;&#10; cpu &#61; number&#10; memory_gb &#61; number&#10; storage_gb &#61; number&#10; min_count &#61; number&#10; max_count &#61; number&#10; &#125;&#10; &#41;, &#123;&#10; cpu &#61; 0.5&#10; memory_gb &#61; 1.875&#10; storage_gb &#61; 1&#10; min_count &#61; 1&#10; max_count &#61; 3&#10; &#125;&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; environment_size &#61; &#34;ENVIRONMENT_SIZE_SMALL&#34;&#10; software_config &#61; &#123;&#10; image_version &#61; &#34;composer-2-airflow-2&#34;&#10; &#125;&#10; workloads_config &#61; &#123;&#10; scheduler &#61; &#123;&#10; cpu &#61; 0.5&#10; memory_gb &#61; 1.875&#10; storage_gb &#61; 1&#10; count &#61; 1&#10; &#125;&#10; web_server &#61; &#123;&#10; cpu &#61; 0.5&#10; memory_gb &#61; 1.875&#10; storage_gb &#61; 1&#10; &#125;&#10; worker &#61; &#123;&#10; cpu &#61; 0.5&#10; memory_gb &#61; 1.875&#10; storage_gb &#61; 1&#10; min_count &#61; 1&#10; max_count &#61; 3&#10; &#125;&#10; &#125;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [data_catalog_tags](variables.tf#L105) | List of Data Catalog Policy tags to be created with optional IAM binding configuration in {tag => {ROLE => [MEMBERS]}} format. | <code>map&#40;map&#40;list&#40;string&#41;&#41;&#41;</code> | | <code title="&#123;&#10; &#34;3_Confidential&#34; &#61; null&#10; &#34;2_Private&#34; &#61; null&#10; &#34;1_Sensitive&#34; &#61; null&#10;&#125;">&#123;&#8230;&#125;</code> |
| [data_force_destroy](variables.tf#L116) | Flag to set 'force_destroy' on data services like BigQuery or Cloud Storage. | <code>bool</code> | | <code>false</code> |
| [groups](variables.tf#L127) | User groups. | <code>map&#40;string&#41;</code> | | <code title="&#123;&#10; data-analysts &#61; &#34;gcp-data-analysts&#34;&#10; data-engineers &#61; &#34;gcp-data-engineers&#34;&#10; data-security &#61; &#34;gcp-data-security&#34;&#10;&#125;">&#123;&#8230;&#125;</code> |
| [location](variables.tf#L137) | Location used for multi-regional resources. | <code>string</code> | | <code>&#34;eu&#34;</code> |
| [network_config](variables.tf#L143) | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | <code title="object&#40;&#123;&#10; host_project &#61; string&#10; network_self_link &#61; string&#10; subnet_self_links &#61; object&#40;&#123;&#10; load &#61; string&#10; transformation &#61; string&#10; orchestration &#61; string&#10; &#125;&#41;&#10; composer_ip_ranges &#61; object&#40;&#123;&#10; cloudsql &#61; string&#10; gke_master &#61; string&#10; &#125;&#41;&#10; composer_secondary_ranges &#61; object&#40;&#123;&#10; pods &#61; string&#10; services &#61; string&#10; &#125;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [project_services](variables.tf#L180) | List of core services enabled on all projects. | <code>list&#40;string&#41;</code> | | <code title="&#91;&#10; &#34;cloudresourcemanager.googleapis.com&#34;,&#10; &#34;iam.googleapis.com&#34;,&#10; &#34;serviceusage.googleapis.com&#34;,&#10; &#34;stackdriver.googleapis.com&#34;&#10;&#93;">&#91;&#8230;&#93;</code> |
| [project_suffix](variables.tf#L191) | Suffix used only for project ids. | <code>string</code> | | <code>null</code> |
| [region](variables.tf#L197) | Region used for regional resources. | <code>string</code> | | <code>&#34;europe-west1&#34;</code> |
| [service_encryption_keys](variables.tf#L203) | Cloud KMS to use to encrypt different services. Key location should match service region. | <code title="object&#40;&#123;&#10; bq &#61; string&#10; composer &#61; string&#10; dataflow &#61; string&#10; storage &#61; string&#10; pubsub &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [bigquery-datasets](outputs.tf#L17) | BigQuery datasets. | |
| [demo_commands](outputs.tf#L27) | Demo commands. Relevant only if Composer is deployed. | |
| [gcs-buckets](outputs.tf#L40) | GCS buckets. | |
| [kms_keys](outputs.tf#L53) | Cloud KMS keys. | |
| [projects](outputs.tf#L58) | GCP projects information. | |
| [vpc_network](outputs.tf#L84) | VPC network. | |
| [vpc_subnet](outputs.tf#L93) | VPC subnetworks. | |
<!-- END TFDOC -->
## TODOs
Features to add in future releases:
- Add example on how to use Cloud Data Loss Prevention
- Add solution to handle Tables, Views, and Authorized Views lifecycle
- Add solution to handle Metadata lifecycle