This example creates the infrastructure needed to run a [Cloud Dataflow](https://cloud.google.com/dataflow) pipeline to import data from [GCS](https://cloud.google.com/storage) to [Bigquery](https://cloud.google.com/bigquery).
The example is designed to match real-world use cases with a minimum amount of resources. It can be used as a starting point for more complex scenarios.
This is the high level diagram:
![GCS to Biquery High-level diagram](diagram.png "GCS to Biquery High-level diagram")
## Managed resources and services
This sample creates several distinct groups of resources:
- projects
- Cloud KMS project
- Service Project configured for GCE instances, GCS buckets, Dataflow instances and BigQuery tables
- networking
- VPC network
- One subnet
- Firewall rules for [SSH access via IAP](https://cloud.google.com/iap/docs/using-tcp-forwarding) and open communication within the VPC
- IAM
- One service account for GGE instances
- One service account for Dataflow instances
- One service account for Bigquery tables
- KMS
- One contintent key ring (example: 'Europe')
- One crypto key (Procection level: softwere) for Cloud Engine
- One crypto key (Protection level: softwere) for Cloud Storage
- One regional key ring ('example: 'europe-west1')
- One crypto key (Protection level: softwere) for Cloud Dataflow
- GCE
- One instance encrypted with a CMEK Cryptokey hosted in Cloud KMS
- GCS
- One bucket encrypted with a CMEK Cryptokey hosted in Cloud KMS
- BQ
- One dataset encrypted with a CMEK Cryptokey hosted in Cloud KMS
- Two tables encrypted with a CMEK Cryptokey hosted in Cloud KMS
You can run now the simple pipeline you can find [here](./scripts/data_ingestion/). Once you have installed required packages and copied a file into the GCS bucket, you can trigger the pipeline using internal ips with a command simila to:
| [project_id](variables.tf#L31) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | |
| [prefix](variables.tf#L16) | Unique prefix used for resource names. Not used for project if 'project_create' is null. | <code>string</code> | | <code>null</code> |
| [project_create](variables.tf#L22) | Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <codetitle="object({ billing_account_id = string parent = string })">object({…})</code> | | <code>null</code> |
| [region](variables.tf#L36) | The region where resources will be deployed. | <code>string</code> | | <code>"europe-west1"</code> |
| [vpc_subnet_range](variables.tf#L42) | Ip range used for the VPC subnet created for the example. | <code>string</code> | | <code>"10.0.0.0/20"</code> |