This example creates the infrastructure needed to run a [Cloud Dataflow](https://cloud.google.com/dataflow) pipeline that imports data from [GCS](https://cloud.google.com/storage) to [BigQuery](https://cloud.google.com/bigquery). The example creates several service accounts with least-privilege roles on the resources. To run the pipeline, the principals listed in `data_eng_principals` can impersonate all of those service accounts.
The example is designed to match real-world use cases with a minimal amount of resources and a few compromises, listed below. It can be used as a starting point for more complex scenarios.
- One table. Tables are defined in Terraform for the purpose of the example. In a real-world scenario, table creation would probably be handled in a separate Terraform state or with a different tool/pipeline (for example, Dataform).
In this example you can also configure users or groups of users, granting them a viewer role on the resources created and the ability to impersonate service accounts, so that they can test Dataflow pipelines before automating them with Composer or any other orchestration system, as in the sketch below.
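As an illustration, a manual test run could impersonate a service account when launching the pipeline. This is a hypothetical sketch assuming the classic `GCS_Text_to_BigQuery` Dataflow template: the job, project, bucket, table, and service account names below are placeholders, and ready-to-run commands are emitted in the Terraform outputs.
```
# Hypothetical manual test run; all names are placeholders, use the
# pre-built commands from the Terraform outputs for the real values.
gcloud dataflow jobs run test_batch_01 \
  --project test-demo-tf-001 \
  --region europe-west1 \
  --impersonate-service-account orchestrator@test-demo-tf-001.iam.gserviceaccount.com \
  --gcs-location gs://dataflow-templates/latest/GCS_Text_to_BigQuery \
  --parameters "javascriptTextTransformGcsPath=gs://data-bucket/person_udf.js,\
JSONPath=gs://data-bucket/person_schema.json,\
javascriptTextTransformFunctionName=transform,\
inputFilePattern=gs://data-bucket/person.csv,\
outputTable=test-demo-tf-001:datalake.person,\
bigQueryLoadingTemporaryDirectory=gs://data-bucket/tmp"
```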
## Deploy your environment
Run Terraform init:
```
$ terraform init
```
Configure the Terraform variables in your `terraform.tfvars` file. You need to specify at least the following variables:
```
project_create = {
  billing_account_id = "001122-334455-667788"
  parent             = "folders/123456789012"
}
project_name        = "test-demo-tf-001"
data_eng_principals = ["user:your_email@domain.example"]
```
You can now run:
```
$ terraform apply
```
You should see the Terraform output listing the resources created, together with some pre-built commands you can use to run the example following the steps below.
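For instance, you can inspect the outputs and print a single pre-built command by name (here using the `command-01-gcs` output referenced below):
```
$ terraform output
$ terraform output command-01-gcs
```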
For the purpose of the example, we will import a CSV file from GCS to BigQuery with the following structure:
```
name,surname,timestamp
```
We need to create three files:
- A `person.csv` file containing your data in the form `name,surname,timestamp`. Here is an example line: `Lorenzo,Caggioni,1637771951`.
- A `person_udf.js` file containing the JavaScript UDF used by the Dataflow template.
- A `person_schema.json` file containing the table schema used to import the CSV.
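As a reference, here is a minimal, hypothetical version of the three files, assuming the classic `GCS_Text_to_BigQuery` Dataflow template (which reads the schema from a top-level `BigQuery Schema` key); adjust the field names and types to your data.
```
# Create minimal demo files locally; the contents are illustrative only.
cat > person.csv <<'EOF'
Lorenzo,Caggioni,1637771951
EOF

# UDF invoked by the Dataflow template: maps one CSV line to a JSON record.
cat > person_udf.js <<'EOF'
function transform(line) {
  var values = line.split(',');
  return JSON.stringify({
    name: values[0],
    surname: values[1],
    timestamp: values[2]
  });
}
EOF

# BigQuery schema used to create and load the destination table.
cat > person_schema.json <<'EOF'
{
  "BigQuery Schema": [
    {"name": "name", "type": "STRING"},
    {"name": "surname", "type": "STRING"},
    {"name": "timestamp", "type": "TIMESTAMP"}
  ]
}
EOF
```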
You can find examples of those files in the `./data-demo` folder. You can copy the example files to the GCS bucket using the command returned in the Terraform output as `command-01-gcs`.
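The pre-built command typically wraps a `gsutil` copy with service account impersonation, along these lines (the bucket and service account names are placeholders; use the values from your Terraform outputs):
```
# Hypothetical copy; take the actual bucket and service account from the outputs.
gsutil -i orchestrator@test-demo-tf-001.iam.gserviceaccount.com \
  cp ./data-demo/* gs://test-demo-tf-001-data/
```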
## Variables

| name | description | type | required | default |
|---|---|---|---|---|
| cmek_encryption | Flag to enable CMEK on the GCP resources created. | <code>bool</code> |  | <code>false</code> |
| data_eng_principals | Groups with the Service Account Token Creator role on service accounts, in IAM format, e.g. 'group:group@domain.com'. | <code>list(string)</code> |  | <code>[]</code> |
| project_create | Provide values if project creation is needed; uses the existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object({ billing_account_id = string, parent = string })">object({…})</code> |  | <code>null</code> |
| region | The region where resources will be deployed. | <code>string</code> |  | <code>"europe-west1"</code> |
| vpc_subnet_range | IP range used for the VPC subnet created for the example. | <code>string</code> |  | <code>"10.0.0.0/20"</code> |