Bugfixing Data Foundations (#310)

* Bugfixing Data Foundations and impersonation support
- Fixed SA permissions
- Use impersonation to avoid exporting SA private keys
- Fixed required API enablement
- Added firewall rules required by Dataflow
- Added provider configuration for SA impersonation
javiergp 2021-09-28 17:13:18 +02:00 committed by GitHub
parent 8b69638f89
commit 15b2736a7c
10 changed files with 171 additions and 112 deletions

View File

@@ -25,10 +25,12 @@ To create the infrastructure:
 ```tfm
 billing_account = "1234-1234-1234"
 parent = "folders/12345678"
+admins = ["user:xxxxx@yyyyy.com"]
 ```
-- make sure you have the right authentication setup (application default credentials, or a service account key)
+- make sure you have the right authentication setup (application default credentials, or a service account key) with the right permissions
 - **The output of this stage contains the values for the resources stage**
+- the `admins` variable contains a list of principals allowed to impersonate the service accounts. These principals will be given the `iam.serviceAccountTokenCreator` role
 - run `terraform init` and `terraform apply`
 Once done testing, you can clean up resources by running `terraform destroy`.
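Note: a quick way to confirm the impersonation grant works before moving to the resources stage is to mint a short-lived token as the main service account. A hedged sketch: it assumes the default `data-platform-main` account name, and `SERVICES_PROJECT_ID` is a placeholder for your services project id.

```bash
# Succeeds only if your principal holds roles/iam.serviceAccountTokenCreator
# on the target service account.
gcloud auth print-access-token \
  --impersonate-service-account=data-platform-main@SERVICES_PROJECT_ID.iam.gserviceaccount.com
```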
@@ -57,6 +59,9 @@ The script use 'google_access_context_manager_service_perimeter_resource' terraf
 | *service_account_names* | Override this variable if you need non-standard names. | <code title="object&#40;&#123;&#10;main &#61; string&#10;&#125;&#41;">object({...})</code> | | <code title="&#123;&#10;main &#61; &#34;data-platform-main&#34;&#10;&#125;">...</code> |
 | *service_encryption_key_ids* | Cloud KMS encryption key in {LOCATION => [KEY_URL]} format. Keys belong to existing project. | <code title="object&#40;&#123;&#10;multiregional &#61; string&#10;global &#61; string&#10;&#125;&#41;">object({...})</code> | | <code title="&#123;&#10;multiregional &#61; null&#10;global &#61; null&#10;&#125;">...</code> |
 | *service_perimeter_standard* | VPC Service control standard perimeter name in the form of 'accessPolicies/ACCESS_POLICY_NAME/servicePerimeters/PERIMETER_NAME'. All projects will be added to the perimeter in enforced mode. | <code title="">string</code> | | <code title="">null</code> |
+| *admins* | List of users allowed to impersonate the service account | <code title="">list</code> | | <code title="">null</code> |
 ## Outputs

View File

@@ -31,8 +31,9 @@ module "project-datamart" {
     "storage.googleapis.com",
     "storage-component.googleapis.com",
   ]
-  iam = {
-    "roles/editor" = [module.sa-services-main.iam_email]
+  iam_additive = {
+    "roles/owner" = [module.sa-services-main.iam_email]
   }
   service_encryption_key_ids = {
     bq = [var.service_encryption_key_ids.multiregional]
@@ -56,8 +57,8 @@ module "project-dwh" {
     "storage.googleapis.com",
     "storage-component.googleapis.com",
   ]
-  iam = {
-    "roles/editor" = [module.sa-services-main.iam_email]
+  iam_additive = {
+    "roles/owner" = [module.sa-services-main.iam_email]
   }
   service_encryption_key_ids = {
     bq = [var.service_encryption_key_ids.multiregional]
@@ -79,8 +80,8 @@ module "project-landing" {
     "storage.googleapis.com",
     "storage-component.googleapis.com",
   ]
-  iam = {
-    "roles/editor" = [module.sa-services-main.iam_email]
+  iam_additive = {
+    "roles/owner" = [module.sa-services-main.iam_email]
   }
   service_encryption_key_ids = {
     pubsub = [var.service_encryption_key_ids.global]
@@ -98,6 +99,10 @@ module "project-services" {
   prefix = var.prefix
   name   = var.project_names.services
   services = [
+    "bigquery.googleapis.com",
+    "cloudresourcemanager.googleapis.com",
+    "iam.googleapis.com",
+    "pubsub.googleapis.com",
     "storage.googleapis.com",
     "storage-component.googleapis.com",
     "sourcerepo.googleapis.com",
@@ -105,8 +110,8 @@ module "project-services" {
     "cloudasset.googleapis.com",
     "cloudkms.googleapis.com"
   ]
-  iam = {
-    "roles/editor" = [module.sa-services-main.iam_email]
+  iam_additive = {
+    "roles/owner" = [module.sa-services-main.iam_email]
   }
   service_encryption_key_ids = {
     storage = [var.service_encryption_key_ids.multiregional]
@@ -123,6 +128,7 @@ module "project-transformation" {
   prefix = var.prefix
   name   = var.project_names.transformation
   services = [
+    "bigquery.googleapis.com",
     "cloudbuild.googleapis.com",
     "compute.googleapis.com",
     "dataflow.googleapis.com",
@@ -130,8 +136,8 @@ module "project-transformation" {
     "storage.googleapis.com",
     "storage-component.googleapis.com",
   ]
-  iam = {
-    "roles/editor" = [module.sa-services-main.iam_email]
+  iam_additive = {
+    "roles/owner" = [module.sa-services-main.iam_email]
   }
   service_encryption_key_ids = {
     compute = [var.service_encryption_key_ids.global]
@@ -151,4 +157,6 @@ module "sa-services-main" {
   source     = "../../../modules/iam-service-account"
   project_id = module.project-services.project_id
   name       = var.service_account_names.main
+  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
 }
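Note: the conditional `iam` binding above is what replaces key downloads; it is roughly equivalent to this manual grant (a hedged sketch, with a placeholder project id and the example principal from the tfvars above):

```bash
gcloud iam service-accounts add-iam-policy-binding \
  data-platform-main@SERVICES_PROJECT_ID.iam.gserviceaccount.com \
  --member="user:xxxxx@yyyyy.com" \
  --role="roles/iam.serviceAccountTokenCreator"
```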

View File

@@ -12,6 +12,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
+variable "admins" {
+  description = "List of users allowed to impersonate the service account"
+  type        = list(string)
+  default     = null
+}
+
 variable "billing_account_id" {
   description = "Billing account id."
   type        = string

View File

@@ -26,7 +26,7 @@ In the previous step, we created the environment (projects and service account)
 To create the resources, copy the output of the environment step (**project_ids**) and paste it into the `terraform.tfvars`:
-- Specify your variables in a `terraform.tvars`, you can use the ouptu from the environment stage
+- Specify your variables in a `terraform.tfvars`; you can use the output from the environment stage
 ```tfm
 project_ids = {
@@ -38,15 +38,14 @@ project_ids = {
 }
 ```
-- Get a key for the service account created in the environment stage:
-  - Go into services project
-  - Go into IAM page
-  - Go into the service account section
-  - Creaet a new key for the service account created in previeous step (**service_account**)
-  - Download the json key into the current folder
-- make sure you have the right authentication setup: `export GOOGLE_APPLICATION_CREDENTIALS=PATH_TO_SERVICE_ACCOUT_KEY.json`
-- run `terraform init` and `terraform apply`
+- The providers.tf file has been configured to impersonate the **main** service account
+- To launch terraform:
+```bash
+terraform plan
+terraform apply
+```
 Once done testing, you can clean up resources by running `terraform destroy`.
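Note: because the provider impersonates the service account, your local credentials only need to be valid application default credentials for a principal listed in `admins`. A hedged sketch of the minimal setup:

```bash
# ADC for your own user; the google provider exchanges these for
# short-lived tokens of the impersonated service account.
gcloud auth application-default login
terraform init
terraform plan
```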
 ### CMEK configuration
@@ -68,6 +67,8 @@ You can configure GCP resources to use existing CMEK keys configuring the 'servi
 | *transformation_buckets* | List of transformation buckets to create | <code title="map&#40;object&#40;&#123;&#10;location &#61; string&#10;name &#61; string&#10;&#125;&#41;&#41;">map(object({...}))</code> | | <code title="&#123;&#10;temp &#61; &#123;&#10;location &#61; &#34;EU&#34;&#10;name &#61; &#34;temp&#34;&#10;&#125;,&#10;templates &#61; &#123;&#10;location &#61; &#34;EU&#34;&#10;name &#61; &#34;templates&#34;&#10;&#125;,&#10;&#125;">...</code> |
 | *transformation_subnets* | List of subnets to create in the transformation Project. | <code title="list&#40;object&#40;&#123;&#10;ip_cidr_range &#61; string&#10;name &#61; string&#10;region &#61; string&#10;secondary_ip_range &#61; map&#40;string&#41;&#10;&#125;&#41;&#41;">list(object({...}))</code> | | <code title="&#91;&#10;&#123;&#10;ip_cidr_range &#61; &#34;10.1.0.0&#47;20&#34;&#10;name &#61; &#34;transformation-subnet&#34;&#10;region &#61; &#34;europe-west3&#34;&#10;secondary_ip_range &#61; &#123;&#125;&#10;&#125;,&#10;&#93;">...</code> |
 | *transformation_vpc_name* | Name of the VPC created in the transformation Project. | <code title="">string</code> | | <code title="">transformation-vpc</code> |
+| *admins* | List of users allowed to impersonate the service account | <code title="">list</code> | | <code title="">null</code> |
 ## Outputs

View File

@@ -25,12 +25,18 @@ module "datamart-sa" {
   iam_project_roles = {
     "${var.project_ids.datamart}" = ["roles/editor"]
   }
+  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
 }
 module "dwh-sa" {
   source     = "../../../modules/iam-service-account"
   project_id = var.project_ids.dwh
   name       = var.service_account_names.dwh
+  iam_project_roles = {
+    "${var.project_ids.dwh}" = ["roles/bigquery.admin"]
+  }
+  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
 }
 module "landing-sa" {
@@ -38,8 +44,11 @@ module "landing-sa" {
   project_id = var.project_ids.landing
   name       = var.service_account_names.landing
   iam_project_roles = {
-    "${var.project_ids.landing}" = ["roles/pubsub.publisher"]
+    "${var.project_ids.landing}" = [
+      "roles/pubsub.publisher",
+      "roles/storage.objectCreator"]
   }
+  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
 }
 module "services-sa" {
@@ -49,6 +58,7 @@ module "services-sa" {
   iam_project_roles = {
     "${var.project_ids.services}" = ["roles/editor"]
   }
+  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
 }
 module "transformation-sa" {
@@ -66,8 +76,17 @@ module "transformation-sa" {
       "roles/dataflow.worker",
       "roles/bigquery.metadataViewer",
      "roles/storage.objectViewer",
+    ],
+    "${var.project_ids.landing}" = [
+      "roles/storage.objectViewer",
+    ],
+    "${var.project_ids.dwh}" = [
+      "roles/bigquery.dataOwner",
+      "roles/bigquery.jobUser",
+      "roles/bigquery.metadataViewer",
     ]
   }
+  iam = var.admins != null ? { "roles/iam.serviceAccountTokenCreator" = var.admins } : {}
 }
 ###############################################################################
@@ -147,6 +166,31 @@ module "vpc-transformation" {
   subnets = var.transformation_subnets
 }
+module "firewall" {
+  source               = "../../../modules/net-vpc-firewall"
+  project_id           = var.project_ids.transformation
+  network              = module.vpc-transformation.name
+  admin_ranges_enabled = false
+  admin_ranges         = [""]
+  http_source_ranges   = []
+  https_source_ranges  = []
+  ssh_source_ranges    = []
+  custom_rules = {
+    iap-svc = {
+      description          = "Dataflow service."
+      direction            = "INGRESS"
+      action               = "allow"
+      sources              = ["dataflow"]
+      targets              = ["dataflow"]
+      ranges               = []
+      use_service_accounts = false
+      rules                = [{ protocol = "tcp", ports = ["12345-12346"] }]
+      extra_attributes     = {}
+    }
+  }
+}
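Note: the custom rule opens TCP ports 12345-12346 between instances tagged `dataflow`; Dataflow worker VMs carry that network tag and need these ports open among themselves for shuffle traffic. To confirm the rule after apply (a hedged sketch; `TRANSFORMATION_PROJECT_ID` is a placeholder and the rule name depends on the module's naming convention):

```bash
gcloud compute firewall-rules list \
  --project TRANSFORMATION_PROJECT_ID \
  --filter="network~transformation-vpc"
```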
 ###############################################################################
 # Pub/Sub                                                                     #
 ###############################################################################

View File

@@ -0,0 +1,20 @@
+# Copyright 2020 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+provider "google" {
+  impersonate_service_account = "data-platform-main@${var.project_ids.services}.iam.gserviceaccount.com"
+}
+provider "google-beta" {
+  impersonate_service_account = "data-platform-main@${var.project_ids.services}.iam.gserviceaccount.com"
+}

View File

@@ -12,6 +12,13 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
+
+variable "admins" {
+  description = "List of users allowed to impersonate the service account"
+  type        = list(string)
+  default     = null
+}
+
 variable "datamart_bq_datasets" {
   description = "Datamart Bigquery datasets"
   type = map(object({

View File

@@ -3,43 +3,38 @@
 In this example we will publish person messages in the following format:
 ```bash
-Lorenzo,Caggioni,1617898199
+name,surname,1617898199
 ```
-a Dataflow pipeline will read those messages and import them into a Bigquery table in the DWH project.
+A Dataflow pipeline will read those messages and import them into a BigQuery table in the DWH project.
 [TODO] An authorized view will be created in the datamart project to expose the table.
-[TODO] Remove hardcoded 'lcaggio' variables and made ENV variable for it.
 [TODO] Further automation is expected in future.
-Create and download keys for Service accounts you created.
-## Create BQ table
-Those steps should be done as Transformation Service Account:
+## Set up the env vars
 ```bash
-gcloud auth activate-service-account sa-dwh@dwh-lc01.iam.gserviceaccount.com --key-file=sa-dwh.json --project=dwh-lc01
+export DWH_PROJECT_ID=**dwh_project_id**
+export LANDING_PROJECT_ID=**landing_project_id**
+export TRANSFORMATION_PROJECT_ID=**transformation_project_id**
 ```
-and you can run the command to create a table:
+## Create BQ table
+Those steps should be done as DWH Service Account.
+You can run the command to create a table:
 ```bash
-bq mk \
-  -t \
+gcloud --impersonate-service-account=sa-dwh@$DWH_PROJECT_ID.iam.gserviceaccount.com \
+  alpha bq tables create person \
+  --project=$DWH_PROJECT_ID --dataset=bq_raw_dataset \
   --description "This is a Test Person table" \
-  dwh-lc01:bq_raw_dataset.person \
-  name:STRING,surname:STRING,timestamp:TIMESTAMP
+  --schema name=STRING,surname=STRING,timestamp=TIMESTAMP
 ```
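Note: to double-check that the table exists with the expected schema (a hedged sketch; it assumes the `bq` CLI is installed and authenticated):

```bash
bq show --schema --format=prettyjson $DWH_PROJECT_ID:bq_raw_dataset.person
```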
 ## Produce CSV data file, JSON schema file and UDF JS file
 Those steps should be done as landing Service Account:
-```bash
-gcloud auth activate-service-account sa-landing@landing-lc01.iam.gserviceaccount.com --key-file=sa-landing.json --project=landing-lc01
-```
 Let's now create a series of messages we can use to import:
 ```bash
@@ -52,7 +47,7 @@ done
 and copy files to the GCS bucket:
 ```bash
-gsutil cp person.csv gs://landing-lc01-eu-raw-data
+gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person.csv gs://$LANDING_PROJECT_ID-eu-raw-data
 ```
 Let's create the data JSON schema:
@@ -81,7 +76,8 @@ EOF
 and copy files to the GCS bucket:
 ```bash
-gsutil cp person_schema.json gs://landing-lc01-eu-data-schema
+gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person_schema.json gs://$LANDING_PROJECT_ID-eu-data-schema
 ```
 Let's create the data UDF function to transform message data:
@@ -105,47 +101,40 @@ EOF
 and copy files to the GCS bucket:
 ```bash
-gsutil cp person_udf.js gs://landing-lc01-eu-data-schema
+gsutil -i sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com cp person_udf.js gs://$LANDING_PROJECT_ID-eu-data-schema
 ```
 If you want to check files copied to GCS, you can use the Transformation service account:
 ```bash
-gcloud auth activate-service-account sa-transformation@transformation-lc01.iam.gserviceaccount.com --key-file=sa-transformation.json --project=transformation-lc01
-```
-and read a message (message won't be acked and will stay in the subscription):
-```bash
-gsutil ls gs://landing-lc01-eu-raw-data
-gsutil ls gs://landing-lc01-eu-data-schema
+gsutil -i sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com ls gs://$LANDING_PROJECT_ID-eu-raw-data
+gsutil -i sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com ls gs://$LANDING_PROJECT_ID-eu-data-schema
 ```
 ## Dataflow
-Those steps should be done as transformation Service Account:
-```bash
-gcloud auth activate-service-account sa-transformation@transformation-lc01.iam.gserviceaccount.com --key-file=sa-transformation.json --project=transformation-lc01
-```
-Let's than start a Dataflwo batch pipeline using a Google provided template using internal only IPs, the created network and subnetwork, the appropriate service account and requested parameters:
+Those steps should be done as transformation Service Account.
+Let's then start a Dataflow batch pipeline from a Google-provided template, using internal-only IPs, the created network and subnetwork, the appropriate service account and the requested parameters:
 ```bash
-gcloud dataflow jobs run test_batch_lcaggio01 \
+gcloud --impersonate-service-account=sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com dataflow jobs run test_batch_01 \
   --gcs-location gs://dataflow-templates/latest/GCS_Text_to_BigQuery \
-  --project transformation-lc01 \
+  --project $TRANSFORMATION_PROJECT_ID \
   --region europe-west3 \
   --disable-public-ips \
   --network transformation-vpc \
   --subnetwork regions/europe-west3/subnetworks/transformation-subnet \
-  --staging-location gs://transformation-lc01-eu-temp \
-  --service-account-email sa-transformation@transformation-lc01.iam.gserviceaccount.com \
+  --staging-location gs://$TRANSFORMATION_PROJECT_ID-eu-temp \
+  --service-account-email sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com \
   --parameters \
   javascriptTextTransformFunctionName=transform,\
-  JSONPath=gs://landing-lc01-eu-data-schema/person_schema.json,\
-  javascriptTextTransformGcsPath=gs://landing-lc01-eu-data-schema/person_udf.js,\
-  inputFilePattern=gs://landing-lc01-eu-raw-data/person.csv,\
-  outputTable=dwh-lc01:bq_raw_dataset.person,\
-  bigQueryLoadingTemporaryDirectory=gs://transformation-lc01-eu-temp
+  JSONPath=gs://$LANDING_PROJECT_ID-eu-data-schema/person_schema.json,\
+  javascriptTextTransformGcsPath=gs://$LANDING_PROJECT_ID-eu-data-schema/person_udf.js,\
+  inputFilePattern=gs://$LANDING_PROJECT_ID-eu-raw-data/person.csv,\
+  outputTable=$DWH_PROJECT_ID:bq_raw_dataset.person,\
+  bigQueryLoadingTemporaryDirectory=gs://$TRANSFORMATION_PROJECT_ID-eu-temp
 ```
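Note: once submitted, you can watch the batch job progress while impersonating the same service account (a hedged sketch):

```bash
gcloud --impersonate-service-account=sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com \
  dataflow jobs list --project $TRANSFORMATION_PROJECT_ID --region europe-west3 --status active
```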

View File

@@ -3,8 +3,8 @@
 In this example we will publish person messages in the following format:
 ```txt
-name: Lorenzo
-surname: Caggioni
+name: Name
+surname: Surname
 timestamp: 1617898199
 ```
@@ -12,85 +12,64 @@ a Dataflow pipeline will read those messages and import them into a Bigquery tab
 An authorized view will be created in the datamart project to expose the table.
-[TODO] Remove hardcoded 'lcaggio' variables and made ENV variable for it.
 [TODO] Further automation is expected in future.
-Create and download keys for Service accounts you created, be sure to have `iam.serviceAccountKeys.create` permission on projects or at folder level.
+## Set up the env vars
 ```bash
-gcloud iam service-accounts keys create sa-landing.json --iam-account=sa-landing@landing-lc01.iam.gserviceaccount.com
-gcloud iam service-accounts keys create sa-transformation.json --iam-account=sa-transformation@transformation-lc01.iam.gserviceaccount.com
-gcloud iam service-accounts keys create sa-dwh.json --iam-account=sa-dwh@dwh-lc01.iam.gserviceaccount.com
+export DWH_PROJECT_ID=**dwh_project_id**
+export LANDING_PROJECT_ID=**landing_project_id**
+export TRANSFORMATION_PROJECT_ID=**transformation_project_id**
 ```
 ## Create BQ table
-Those steps should be done as Transformation Service Account:
+Those steps should be done as DWH Service Account.
+You can run the command to create a table:
 ```bash
-gcloud auth activate-service-account sa-dwh@dwh-lc01.iam.gserviceaccount.com --key-file=sa-dwh.json --project=dwh-lc01
-```
-and you can run the command to create a table:
-```bash
-bq mk \
-  -t \
+gcloud --impersonate-service-account=sa-dwh@$DWH_PROJECT_ID.iam.gserviceaccount.com \
+  alpha bq tables create person \
+  --project=$DWH_PROJECT_ID --dataset=bq_raw_dataset \
   --description "This is a Test Person table" \
-  dwh-lc01:bq_raw_dataset.person \
-  name:STRING,surname:STRING,timestamp:TIMESTAMP
+  --schema name=STRING,surname=STRING,timestamp=TIMESTAMP
 ```
 ## Produce PubSub messages
 Those steps should be done as landing Service Account:
-```bash
-gcloud auth activate-service-account sa-landing@landing-lc01.iam.gserviceaccount.com --key-file=sa-landing.json --project=landing-lc01
-```
-and let's now create a series of messages we can use to import:
+Let's now create a series of messages we can use to import:
 ```bash
 for i in {0..10}
 do
-gcloud pubsub topics publish projects/landing-lc01/topics/landing-1 --message="{\"name\": \"Lorenzo\", \"surname\": \"Caggioni\", \"timestamp\": \"$(date +%s)\"}"
+gcloud --impersonate-service-account=sa-landing@$LANDING_PROJECT_ID.iam.gserviceaccount.com pubsub topics publish projects/$LANDING_PROJECT_ID/topics/landing-1 --message="{\"name\": \"Lorenzo\", \"surname\": \"Caggioni\", \"timestamp\": \"$(date +%s)\"}"
 done
 ```
-if you want to check messages published, you can use the Transformation service account:
+If you want to check messages published, you can use the Transformation service account and read a message (the message won't be acked and will stay in the subscription):
 ```bash
-gcloud auth activate-service-account sa-transformation@transformation-lc01.iam.gserviceaccount.com --key-file=sa-transformation.json --project=transformation-lc01
-```
-and read a message (message won't be acked and will stay in the subscription):
-```bash
-gcloud pubsub subscriptions pull projects/landing-lc01/subscriptions/sub1
+gcloud --impersonate-service-account=sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com pubsub subscriptions pull projects/$LANDING_PROJECT_ID/subscriptions/sub1
 ```
 ## Dataflow
 Those steps should be done as transformation Service Account:
-```bash
-gcloud auth activate-service-account sa-transformation@transformation-lc01.iam.gserviceaccount.com --key-file=sa-transformation.json --project=transformation-lc01
-```
-Let's than start a Dataflwo streaming pipeline using a Google provided template using internal only IPs, the created network and subnetwork, the appropriate service account and requested parameters:
+Let's then start a Dataflow streaming pipeline from a Google-provided template, using internal-only IPs, the created network and subnetwork, the appropriate service account and the requested parameters:
 ```bash
-gcloud dataflow jobs run test_lcaggio01 \
+gcloud dataflow jobs run test_streaming01 \
   --gcs-location gs://dataflow-templates/latest/PubSub_Subscription_to_BigQuery \
-  --project transformation-lc01 \
+  --project $TRANSFORMATION_PROJECT_ID \
   --region europe-west3 \
   --disable-public-ips \
   --network transformation-vpc \
   --subnetwork regions/europe-west3/subnetworks/transformation-subnet \
-  --staging-location gs://transformation-lc01-eu-temp \
-  --service-account-email sa-transformation@transformation-lc01.iam.gserviceaccount.com \
+  --staging-location gs://$TRANSFORMATION_PROJECT_ID-eu-temp \
+  --service-account-email sa-transformation@$TRANSFORMATION_PROJECT_ID.iam.gserviceaccount.com \
   --parameters \
-  inputSubscription=projects/landing-lc01/subscriptions/sub1,\
-  outputTableSpec=dwh-lc01:bq_raw_dataset.person
+  inputSubscription=projects/$LANDING_PROJECT_ID/subscriptions/sub1,\
+  outputTableSpec=$DWH_PROJECT_ID:bq_raw_dataset.person
 ```
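Note: after a few messages flow through the pipeline, you can verify that rows landed in the table (a hedged sketch using the `bq` CLI):

```bash
bq query --project_id=$DWH_PROJECT_ID --use_legacy_sql=false \
  'SELECT * FROM bq_raw_dataset.person LIMIT 10'
```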

View File

@@ -24,4 +24,4 @@ def test_resources(e2e_plan_runner):
   "Test that plan works and the number of resources is as expected."
   modules, resources = e2e_plan_runner(FIXTURES_DIR)
   assert len(modules) == 6
-  assert len(resources) == 45
+  assert len(resources) == 53