Remove older GCS to BQ example (#523)
* remove older GCS to BQ example
* remove tests
parent be33a7f880
commit c2a2b799b9
@ -5,7 +5,7 @@ This section contains **[foundational examples](./foundations/)** that bootstrap
Currently available examples:

- **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](./cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Granular Cloud DNS IAM for Shared VPC](./cloud-operations/dns-shared-vpc), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Packer image builder](./cloud-operations/packer-image-builder), [On-prem SA key management](./cloud-operations/onprem-sa-key-management)
- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms/), [Cloud Storage to Bigquery with Cloud Dataflow](./data-solutions/gcs-to-bq-with-dataflow/)
- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/gcs-to-bq-with-least-privileges/), [Cloud Storage to Bigquery with Cloud Dataflow with least privileges](./data-solutions/gcs-to-bq-with-least-privileges/)
- **factories** - [The why and the how of resource factories](./factories/README.md)
- **foundations** - [single level hierarchy](./foundations/environments/) (environments), [multiple level hierarchy](./foundations/business-units/) (business units + environments)
- **networking** - [hub and spoke via peering](./networking/hub-and-spoke-peering/), [hub and spoke via VPN](./networking/hub-and-spoke-vpn/), [DNS and Google Private Access for on-premises](./networking/onprem-google-access-dns/), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [ILB as next hop](./networking/ilb-next-hop), [PSC for on-premises Cloud Function invocation](./networking/private-cloud-function-from-onprem/), [decentralized firewall](./networking/decentralized-firewall)
@ -11,9 +11,9 @@ They are meant to be used as minimal but complete starting points to create actu
<a href="./cmek-via-centralized-kms/" title="CMEK on Cloud Storage and Compute Engine via centralized Cloud KMS"><img src="./cmek-via-centralized-kms/diagram.png" align="left" width="280px"></a> This [example](./cmek-via-centralized-kms/) implements [CMEK](https://cloud.google.com/kms/docs/cmek) for GCS and GCE, via keys hosted in KMS running in a centralized project. The example shows the basic resources and permissions for the typical use case of application projects implementing encryption at rest via a centrally managed KMS service.
<br clear="left">

### Cloud Storage to Bigquery with Cloud Dataflow
<a href="./gcs-to-bq-with-dataflow/" title="Cloud Storage to Bigquery with Cloud Dataflow"><img src="./gcs-to-bq-with-dataflow/diagram.png" align="left" width="280px"></a> This [example](./gcs-to-bq-with-dataflow/) implements [Cloud Storage](https://cloud.google.com/kms/docs/cmek) to Bigquery data import using Cloud Dataflow.
All resources use CMEK hosted in Cloud KMS running in a centralized project. The example shows the basic resources and permissions for the typical use case to read, transform and import data from Cloud Storage to Bigquery.
### Cloud Storage to Bigquery with Cloud Dataflow with least privileges

<a href="./gcs-to-bq-with-least-privileges/" title="Cloud Storage to Bigquery with Cloud Dataflow with least privileges"><img src="./gcs-to-bq-with-least-privileges/diagram.png" align="left" width="280px"></a> This [example](./gcs-to-bq-with-least-privileges/) implements the resources required to run GCS to BigQuery Dataflow pipelines. The solution relies on a set of service accounts created following the least privilege principle.
<br clear="left">

### Data Platform Foundations
@ -21,4 +21,3 @@ All resources use CMEK hosted in Cloud KMS running in a centralized project. The
<a href="./data-platform-foundations/" title="Data Platform Foundations"><img src="./data-platform-foundations/02-resources/diagram.png" align="left" width="280px"></a>
This [example](./data-platform-foundations/) implements a robust and flexible Data Foundation on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.
<br clear="left">
@ -1,134 +0,0 @@
# Cloud Storage to Bigquery with Cloud Dataflow

This example creates the infrastructure needed to run a [Cloud Dataflow](https://cloud.google.com/dataflow) pipeline to import data from [GCS](https://cloud.google.com/storage) to [Bigquery](https://cloud.google.com/bigquery).

The solution will use:
- internal IPs for GCE and Dataflow instances
- CMEK encryption for GCS bucket, GCE instances, Dataflow instances and BigQuery tables
- Cloud NAT to let resources communicate with the Internet, run system updates, and install packages

The example is designed to match real-world use cases with a minimum amount of resources. It can be used as a starting point for more complex scenarios.

This is the high level diagram:

![GCS to Bigquery High-level diagram](diagram.png "GCS to Bigquery High-level diagram")

## Managed resources and services

This sample creates several distinct groups of resources:

- projects
  - Cloud KMS project
  - Service Project configured for GCE instances, GCS buckets, Dataflow instances and BigQuery tables
- networking
  - VPC network
  - One subnet
  - Firewall rules for [SSH access via IAP](https://cloud.google.com/iap/docs/using-tcp-forwarding) and open communication within the VPC
- IAM
  - One service account for GCE instances
  - One service account for Dataflow instances
  - One service account for Bigquery tables
- KMS
  - One continent key ring (example: 'Europe')
    - One crypto key (Protection level: software) for Compute Engine
    - One crypto key (Protection level: software) for Cloud Storage
  - One regional key ring (example: 'europe-west1')
    - One crypto key (Protection level: software) for Cloud Dataflow
- GCE
  - One instance encrypted with a CMEK Cryptokey hosted in Cloud KMS
- GCS
  - One bucket encrypted with a CMEK Cryptokey hosted in Cloud KMS
- BQ
  - One dataset encrypted with a CMEK Cryptokey hosted in Cloud KMS
  - Two tables encrypted with a CMEK Cryptokey hosted in Cloud KMS

## Test your environment with Cloud Dataflow
You can now connect to the GCE instance with the following command:

```bash
gcloud compute ssh vm-example
```

You can now run the simple pipeline you can find [here](./scripts/data_ingestion/). Once you have installed the required packages and copied a file into the GCS bucket, you can trigger the pipeline using internal IPs with a command similar to:

```bash
python data_ingestion.py \
  --runner=DataflowRunner \
  --max_num_workers=10 \
  --autoscaling_algorithm=THROUGHPUT_BASED \
  --region=### REGION ### \
  --staging_location=gs://### TEMP BUCKET NAME ###/ \
  --temp_location=gs://### TEMP BUCKET NAME ###/ \
  --project=### PROJECT ID ### \
  --input=gs://### DATA BUCKET NAME ###/### FILE NAME ###.csv \
  --output=### DATASET NAME ###.### TABLE NAME ### \
  --service_account_email=### SERVICE ACCOUNT EMAIL ### \
  --network=### NETWORK NAME ### \
  --subnetwork=### SUBNET NAME ### \
  --dataflow_kms_key=### CRYPTOKEY ID ### \
  --no_use_public_ips
```

For example:

```bash
python data_ingestion.py \
  --runner=DataflowRunner \
  --max_num_workers=10 \
  --autoscaling_algorithm=THROUGHPUT_BASED \
  --region=europe-west1 \
  --staging_location=gs://lc-001-eu-df-tmplocation/ \
  --temp_location=gs://lc-001-eu-df-tmplocation/ \
  --project=lcaggio-demo \
  --input=gs://lc-eu-data/person.csv \
  --output=bq_dataset.df_import \
  --service_account_email=df-test@lcaggio-demo.iam.gserviceaccount.com \
  --network=local \
  --subnetwork=regions/europe-west1/subnetworks/subnet \
  --dataflow_kms_key=projects/lcaggio-demo-kms/locations/europe-west1/keyRings/my-keyring-regional/cryptoKeys/key-df \
  --no_use_public_ips
```

You can check data imported into Google BigQuery from the Google Cloud Console UI.
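
As a minimal sketch of how the placeholders in the command template map to concrete values, the Python helper below assembles the same flag list from a dictionary. The function name `build_ingestion_command` and the `example` dictionary are hypothetical and not part of this example; only the flag names and the sample values are taken from the commands shown in this section.

```python
import shlex


def build_ingestion_command(params):
    """Assemble the data_ingestion.py command line from a dict of values.

    Hypothetical helper for illustration: only the flag names come from
    the README command, the function itself is an assumption.
    """
    args = [
        "python", "data_ingestion.py",
        "--runner=DataflowRunner",
        "--max_num_workers=10",
        "--autoscaling_algorithm=THROUGHPUT_BASED",
    ]
    # Flags whose values depend on the deployed environment, in README order.
    for flag in (
        "region", "staging_location", "temp_location", "project",
        "input", "output", "service_account_email",
        "network", "subnetwork", "dataflow_kms_key",
    ):
        args.append(f"--{flag}={params[flag]}")
    # Keep Dataflow workers on internal IPs only.
    args.append("--no_use_public_ips")
    return args


# Sample values copied from the README example command.
example = {
    "region": "europe-west1",
    "staging_location": "gs://lc-001-eu-df-tmplocation/",
    "temp_location": "gs://lc-001-eu-df-tmplocation/",
    "project": "lcaggio-demo",
    "input": "gs://lc-eu-data/person.csv",
    "output": "bq_dataset.df_import",
    "service_account_email": "df-test@lcaggio-demo.iam.gserviceaccount.com",
    "network": "local",
    "subnetwork": "regions/europe-west1/subnetworks/subnet",
    "dataflow_kms_key": (
        "projects/lcaggio-demo-kms/locations/europe-west1/"
        "keyRings/my-keyring-regional/cryptoKeys/key-df"
    ),
}

if __name__ == "__main__":
    print(shlex.join(build_ingestion_command(example)))
```

Building the argument list this way fails early with a `KeyError` if one of the environment-specific values is missing, rather than submitting a job with a leftover placeholder.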

## Test your environment with 'bq' CLI
You can now connect to the GCE instance with the following command:

```bash
gcloud compute ssh vm-example
```

You can now run a simple `bq load` command to import data into Bigquery. Below is an example command:

```bash
bq load \
  --source_format=CSV \
  bq_dataset.bq_import \
  gs://my-bucket/person.csv \
  schema_bq_import.json
```

You can check data imported into Google BigQuery from the Google Cloud Console UI.
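
Before loading a file, it can help to check locally that its rows fit the table schema. The sketch below is a hedged illustration and not part of this example: it assumes the CSV has no header row and exactly the three columns listed in `schema_bq_import.json` (`name`, `surname`, `age`), and the function name `invalid_rows` is hypothetical.

```python
import csv
import io
import json
from decimal import Decimal, InvalidOperation

# Same fields as schema_bq_import.json, inlined so the sketch is self-contained.
SCHEMA = json.loads("""
[
  {"name": "name", "type": "STRING"},
  {"name": "surname", "type": "STRING"},
  {"name": "age", "type": "NUMERIC"}
]
""")


def invalid_rows(csv_text, schema):
    """Return (line_number, reason) tuples for rows that do not fit the schema.

    Assumption: the CSV has no header row and one column per schema field.
    """
    problems = []
    for line_no, row in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if len(row) != len(schema):
            problems.append((line_no, "wrong number of columns"))
            continue
        for value, field in zip(row, schema):
            if field["type"] == "NUMERIC":
                try:
                    Decimal(value)
                except InvalidOperation:
                    problems.append((line_no, f"{field['name']} is not numeric"))
    return problems


if __name__ == "__main__":
    sample = "Ada,Lovelace,36\nBob,Smith,abc\nonly,two\n"
    for line_no, reason in invalid_rows(sample, SCHEMA):
        print(f"line {line_no}: {reason}")
```

This only mirrors the column count and the numeric check; BigQuery applies its own, stricter validation when the load job runs.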

<!-- BEGIN TFDOC -->

## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [project_id](variables.tf#L31) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | |
| [prefix](variables.tf#L16) | Unique prefix used for resource names. Not used for project if 'project_create' is null. | <code>string</code> | | <code>null</code> |
| [project_create](variables.tf#L22) | Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object({ billing_account_id = string parent = string })">object({…})</code> | | <code>null</code> |
| [region](variables.tf#L36) | The region where resources will be deployed. | <code>string</code> | | <code>"europe-west1"</code> |
| [vpc_subnet_range](variables.tf#L42) | Ip range used for the VPC subnet created for the example. | <code>string</code> | | <code>"10.0.0.0/20"</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [bq_tables](outputs.tf#L15) | Bigquery Tables. | |
| [buckets](outputs.tf#L20) | GCS Bucket Cloud KMS crypto keys. | |
| [data_ingestion_command](outputs.tf#L28) | | |
| [project_id](outputs.tf#L48) | Project id. | |
| [vm](outputs.tf#L53) | GCE VM. | |

<!-- END TFDOC -->
@ -1,20 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


terraform {
  backend "gcs" {
    bucket = ""
  }
}
@ -1,65 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module "bigquery-dataset" {
  source     = "../../../modules/bigquery-dataset"
  project_id = module.project.project_id
  id         = "example_dataset"
  location   = var.region
  access = {
    reader-group = { role = "READER", type = "user" }
    owner        = { role = "OWNER", type = "user" }
  }
  access_identities = {
    reader-group = module.service-account-bq.email
    owner        = module.service-account-bq.email
  }
  encryption_key = module.kms.keys.key-bq.id
  tables = {
    bq_import = {
      friendly_name = "BQ import"
      labels        = {}
      options       = null
      partitioning = {
        field = null
        range = null # use start/end/interval for range
        time  = null
      }
      schema = file("${path.module}/schema_bq_import.json")
      options = {
        clustering      = null
        expiration_time = null
        encryption_key  = module.kms.keys.key-bq.id
      }
      deletion_protection = false
    },
    df_import = {
      friendly_name = "Dataflow import"
      labels        = {}
      options       = null
      partitioning = {
        field = null
        range = null # use start/end/interval for range
        time  = null
      }
      schema = file("${path.module}/schema_df_import.json")
      options = {
        clustering      = null
        expiration_time = null
        encryption_key  = module.kms.keys.key-bq.id
      }
      deletion_protection = false
    }
  }
}
Binary file not shown.
Before Width: | Height: | Size: 197 KiB |
|
@ -1,54 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

locals {
  vm-startup-script = join("\n", [
    "#! /bin/bash",
    "apt-get update && apt-get install -y bash-completion git python3-venv gcc build-essential python-dev python3-dev",
    "pip3 install --upgrade setuptools pip"
  ])
}

module "vm" {
  source     = "../../../modules/compute-vm"
  project_id = module.project.project_id
  zone       = "${var.region}-b"
  name       = "${var.prefix}-vm-0"
  network_interfaces = [{
    network    = module.vpc.self_link,
    subnetwork = local.subnet_self_link,
    nat        = false,
    addresses  = null
  }]
  attached_disks = [{
    name = "data", size = 10, source = null, source_type = null, options = null
  }]
  boot_disk = {
    image        = "projects/debian-cloud/global/images/family/debian-10"
    type         = "pd-ssd"
    size         = 10
    encrypt_disk = true
  }
  encryption = {
    encrypt_boot            = true
    disk_encryption_key_raw = null
    kms_key_self_link       = module.kms.key_ids.key-gce
  }
  metadata = {
    startup-script = local.vm-startup-script
  }
  service_account        = module.service-account-gce.email
  service_account_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  tags                   = ["ssh"]
}
@ -1,49 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module "gcs-data" {
  source        = "../../../modules/gcs"
  project_id    = module.project.project_id
  prefix        = var.prefix
  name          = "data"
  location      = var.region
  storage_class = "REGIONAL"
  iam = {
    "roles/storage.admin" = [
      "serviceAccount:${module.service-account-gce.email}",
    ],
    "roles/storage.objectViewer" = [
      "serviceAccount:${module.service-account-df.email}",
    ]
  }
  encryption_key = module.kms.keys.key-gcs.id
  force_destroy  = true
}

module "gcs-df-tmp" {
  source        = "../../../modules/gcs"
  project_id    = module.project.project_id
  prefix        = var.prefix
  name          = "df-tmp"
  location      = var.region
  storage_class = "REGIONAL"
  iam = {
    "roles/storage.admin" = [
      "serviceAccount:${module.service-account-gce.email}",
      "serviceAccount:${module.service-account-df.email}",
    ]
  }
  encryption_key = module.kms.keys.key-gcs.id
  force_destroy  = true
}
@ -1,60 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module "service-account-bq" {
  source     = "../../../modules/iam-service-account"
  project_id = module.project.project_id
  name       = "bq-test"
  prefix     = var.prefix
  iam_project_roles = {
    (module.project.project_id) = [
      "roles/bigquery.admin",
      "roles/logging.logWriter",
      "roles/monitoring.metricWriter",
    ]
  }
}

module "service-account-df" {
  source     = "../../../modules/iam-service-account"
  project_id = module.project.project_id
  name       = "df-test"
  prefix     = var.prefix
  iam_project_roles = {
    (module.project.project_id) = [
      "roles/bigquery.dataOwner",
      "roles/bigquery.jobUser",
      "roles/bigquery.metadataViewer",
      "roles/dataflow.worker",
      "roles/storage.objectViewer",
    ]
  }
}

module "service-account-gce" {
  source     = "../../../modules/iam-service-account"
  project_id = module.project.project_id
  name       = "gce-test"
  prefix     = var.prefix
  iam_project_roles = {
    (module.project.project_id) = [
      "roles/bigquery.dataOwner",
      "roles/bigquery.jobUser",
      "roles/dataflow.admin",
      "roles/iam.serviceAccountUser",
      "roles/logging.logWriter",
      "roles/monitoring.metricWriter",
    ]
  }
}
@ -1,63 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module "kms" {
  source     = "../../../modules/kms"
  project_id = module.project.project_id
  keyring = {
    name     = "${var.prefix}-keyring",
    location = var.region
  }
  keys = {
    key-df  = null
    key-gce = null
    key-gcs = null
    key-bq  = null
  }
  key_iam = {
    key-gce = {
      "roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
        "serviceAccount:${module.project.service_accounts.robots.compute}"
      ]
    },
    key-gcs = {
      "roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
        "serviceAccount:${module.project.service_accounts.robots.storage}"
      ]
    },
    key-bq = {
      "roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
        "serviceAccount:${module.project.service_accounts.robots.bq}"
      ]
    },
    key-df = {
      "roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
        "serviceAccount:${module.project.service_accounts.robots.dataflow}",
        "serviceAccount:${module.project.service_accounts.robots.compute}",
      ]
    }
  }
}

# module "kms-regional" {
#   source     = "../../../modules/kms"
#   project_id = module.project-kms.project_id
#   keyring = {
#     name     = "my-keyring-regional",
#     location = var.region
#   }
#   keys = { key-df = null }
#   key_iam = {
#   }
# }
@ -1,69 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

locals {
  subnet_name      = module.vpc.subnets["${var.region}/${var.prefix}-subnet-0"].name
  subnet_self_link = module.vpc.subnets["${var.region}/${var.prefix}-subnet-0"].self_link
}

module "project" {
  source          = "../../../modules/project"
  name            = var.project_id
  parent          = try(var.project_create.parent, null)
  billing_account = try(var.project_create.billing_account_id, null)
  project_create  = var.project_create != null
  prefix          = var.project_create == null ? null : var.prefix
  services = [
    "bigquery.googleapis.com",
    "bigqueryreservation.googleapis.com",
    "bigquerystorage.googleapis.com",
    "cloudkms.googleapis.com",
    "compute.googleapis.com",
    "dataflow.googleapis.com",
    "servicenetworking.googleapis.com",
    "storage.googleapis.com",
  ]
  service_config = {
    disable_on_destroy = false, disable_dependent_services = false
  }
}

module "vpc" {
  source     = "../../../modules/net-vpc"
  project_id = module.project.project_id
  name       = "${var.prefix}-vpc"
  subnets = [
    {
      ip_cidr_range      = var.vpc_subnet_range
      name               = "${var.prefix}-subnet-0"
      region             = var.region
      secondary_ip_range = {}
    }
  ]
}

module "vpc-firewall" {
  source       = "../../../modules/net-vpc-firewall"
  project_id   = module.project.project_id
  network      = module.vpc.name
  admin_ranges = [var.vpc_subnet_range]
}

module "nat" {
  source         = "../../../modules/net-cloudnat"
  project_id     = module.project.project_id
  region         = var.region
  name           = "${var.prefix}-default"
  router_network = module.vpc.name
}
@ -1,59 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

output "bq_tables" {
  description = "Bigquery Tables."
  value       = module.bigquery-dataset.table_ids
}

output "buckets" {
  description = "GCS Bucket Cloud KMS crypto keys."
  value = {
    data   = module.gcs-data.name
    df-tmp = module.gcs-df-tmp.name
  }
}

output "data_ingestion_command" {
  value = <<-EOF
    python data_ingestion.py \
      --runner=DataflowRunner \
      --max_num_workers=10 \
      --autoscaling_algorithm=THROUGHPUT_BASED \
      --region=${var.region} \
      --staging_location=${module.gcs-df-tmp.url} \
      --temp_location=${module.gcs-df-tmp.url}/ \
      --project=${var.project_id} \
      --input=${module.gcs-data.url}/### FILE NAME ###.csv \
      --output=${module.bigquery-dataset.dataset_id}.${module.bigquery-dataset.table_ids.df_import} \
      --service_account_email=${module.service-account-df.email} \
      --network=${module.vpc.name} \
      --subnetwork=${local.subnet_name} \
      --dataflow_kms_key=${module.kms.key_ids.key-df} \
      --no_use_public_ips
  EOF
}

output "project_id" {
  description = "Project id."
  value       = module.project.project_id
}

output "vm" {
  description = "GCE VM."
  value = {
    name    = module.vm.instance.name
    address = module.vm.internal_ip
  }
}
@ -1,14 +0,0 @@
[
  {
    "name": "name",
    "type": "STRING"
  },
  {
    "name": "surname",
    "type": "STRING"
  },
  {
    "name": "age",
    "type": "NUMERIC"
  }
]
@ -1,22 +0,0 @@
[
  {
    "mode": "NULLABLE",
    "name": "name",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "surname",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "age",
    "type": "NUMERIC"
  },
  {
    "mode": "NULLABLE",
    "name": "_TIMESTAMP",
    "type": "TIMESTAMP"
  }
]
@ -1,4 +0,0 @@
# Scripts
In this section you can find two simple scripts to test your environment:
- [Data ingestion](./data_ingestion/): a simple Apache Beam Python pipeline to import data from Google Cloud Storage into Bigquery.
- [Person details generator](./person_details_generator/): a simple script to generate some random data to test your environment.
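
The generator script itself is not shown here. As a rough idea of what such a script could look like, the sketch below writes random `name,surname,age` rows matching the columns of `schema_bq_import.json`. It is an assumption for illustration, not the script from this folder: the function name `generate_people_csv`, the sample name lists, and the age range are all hypothetical.

```python
import csv
import io
import random

# Hypothetical sample values; any list of names would do.
FIRST_NAMES = ["Ada", "Alan", "Grace", "Linus"]
SURNAMES = ["Lovelace", "Turing", "Hopper", "Torvalds"]


def generate_people_csv(count, seed=None):
    """Return CSV text with `count` random name,surname,age rows (no header)."""
    rng = random.Random(seed)  # a fixed seed gives repeatable test data
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for _ in range(count):
        writer.writerow([
            rng.choice(FIRST_NAMES),
            rng.choice(SURNAMES),
            rng.randint(18, 90),  # assumed age range
        ])
    return buffer.getvalue()


if __name__ == "__main__":
    # Write a small file that could then be copied to the data bucket.
    with open("person.csv", "w", newline="") as handle:
        handle.write(generate_people_csv(100, seed=42))
```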
@ -1,201 +0,0 @@
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. We also recommend that a
|
||||
file or class name and description of purpose be included on the
|
||||
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright [yyyy] [name of copyright owner]
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
|
@ -1,99 +0,0 @@
|
|||
# Ingest CSV files from GCS into BigQuery
|
||||
|
||||
In this example we create a Python [Apache Beam](https://beam.apache.org/) pipeline running on [Google Cloud Dataflow](https://cloud.google.com/dataflow/) to import CSV files into BigQuery, adding a timestamp to each row. The architecture is shown below:
|
||||
|
||||
![Apache Beam pipeline to import CSV from GCS into BQ](diagram.png)
|
||||
|
||||
The architecture uses:
|
||||
* [Google Cloud Storage](https://cloud.google.com/storage/) to store CSV source files
|
||||
* [Google Cloud Dataflow](https://cloud.google.com/dataflow/) to read files from Google Cloud Storage, transform the data based on the structure of the file, and import it into Google BigQuery
|
||||
* [Google BigQuery](https://cloud.google.com/bigquery/) to store data in a Data Lake.
|
||||
|
||||
You can use this script as a starting point to import your files into Google BigQuery. You'll probably need to adapt the script logic to your requirements.
|
||||
|
||||
## 1. Prerequisites
|
||||
- A running GCP project with billing enabled
|
||||
- gcloud installed and initialized for your project
|
||||
- Google Cloud Dataflow API enabled
|
||||
- Google Cloud Storage bucket containing the CSV file to import, with name, surname and age columns. Example: `Mario,Rossi,30`.
|
||||
- Google Cloud Storage Bucket for temp and staging Google Dataflow files
|
||||
- Google BigQuery dataset
|
||||
- [Python](https://www.python.org/) >= 3.7 and python-dev module
|
||||
- gcc
|
||||
- Google Cloud [Application Default Credentials](https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login)
|
||||
|
||||
## 2. Create virtual environment
|
||||
Create a new virtual environment (recommended) and install requirements:
|
||||
|
||||
```
|
||||
virtualenv env
|
||||
source ./env/bin/activate
|
||||
pip3 install --upgrade setuptools pip
|
||||
pip3 install -r requirements.txt
|
||||
```
|
||||
|
||||
## 4. Upload files into Google Cloud Storage
|
||||
Upload the files to import into Google BigQuery to a Google Cloud Storage bucket. You can use `gsutil` with a command like:
|
||||
```
|
||||
gsutil cp [LOCAL_OBJECT_LOCATION] gs://[DESTINATION_BUCKET_NAME]/
|
||||
```
|
||||
|
||||
Files need to be in CSV format, for example:
|
||||
```
|
||||
Enrico,Bianchi,20
|
||||
Mario,Rossi,30
|
||||
```
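
As a rough illustration of how one such row maps to a BigQuery-style record, here is a minimal sketch using only the standard library. The function name `parse_row` is hypothetical, and converting `age` to an integer is an assumption made for this example; the pipeline script itself splits the line with a regular expression and keeps the values as strings.

```python
import csv
import io


def parse_row(line):
    """Turn one CSV line such as 'Mario,Rossi,30' into a dict."""
    # csv.reader copes with quoted fields, unlike a plain split(",").
    name, surname, age = next(csv.reader(io.StringIO(line)))
    # Assumption for this sketch: age is always a whole number.
    return {"name": name, "surname": surname, "age": int(age)}


for line in ("Enrico,Bianchi,20", "Mario,Rossi,30"):
    print(parse_row(line))
```

A row with a missing or extra column raises `ValueError` when unpacking, which may or may not be the behavior you want for malformed input.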
|
||||
|
||||
You can use the [person_details_generator](../person_details_generator/) script if you want to create random person details.
|
||||
|
||||
## 5. Run pipeline
|
||||
You can check parameters accepted by the `data_ingestion.py` script with the following command:
|
||||
```
|
||||
python data_ingestion.py --help
|
||||
```
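
The script only defines `--input` and `--output` itself; every other flag is handed to Apache Beam unchanged. The sketch below shows that split with `argparse.parse_known_args`, which is the mechanism the script uses. The wrapper name `split_args` is hypothetical and exists only for this illustration.

```python
import argparse


def split_args(argv):
    """Separate the script's own flags from the ones meant for the runner."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", help="File to read (local path or gs:// URI).")
    parser.add_argument("--output", help="BigQuery table, e.g. dataset.table.")
    # Flags the parser does not know are returned untouched in a second list.
    known_args, pipeline_args = parser.parse_known_args(argv)
    return known_args, pipeline_args


known_args, pipeline_args = split_args([
    "--runner=DirectRunner",
    "--input=gs://bucket_name/person.csv",
    "--output=bq_dataset.df_import",
])
print(known_args.input, known_args.output, pipeline_args)
```

This is why runner options such as `--region` or `--temp_location` do not appear in the script's `--help` output: they are interpreted later by Beam's pipeline options, not by the script's parser.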
|
||||
|
||||
You can run the pipeline locally with the following command:
|
||||
```
|
||||
python data_ingestion.py \
|
||||
--runner=DirectRunner \
|
||||
--project=###PUT HERE PROJECT ID### \
|
||||
--input=###PUT HERE THE FILE TO IMPORT. EXAMPLE: gs://bucket_name/person.csv ### \
|
||||
--output=###PUT HERE BQ DATASET.TABLE###
|
||||
```
|
||||
|
||||
or you can run the pipeline on Google Dataflow using the following command:
|
||||
|
||||
```
|
||||
python data_ingestion.py \
|
||||
--runner=DataflowRunner \
|
||||
--max_num_workers=100 \
|
||||
--autoscaling_algorithm=THROUGHPUT_BASED \
|
||||
--region=###PUT HERE REGION### \
|
||||
--staging_location=###PUT HERE GCS STAGING LOCATION### \
|
||||
--temp_location=###PUT HERE GCS TMP LOCATION### \
|
||||
--project=###PUT HERE PROJECT ID### \
|
||||
--input=###PUT HERE GCS BUCKET NAME. EXAMPLE: gs://bucket_name/person.csv### \
|
||||
--output=###PUT HERE BQ DATASET NAME. EXAMPLE: bq_dataset.df_import###
|
||||
```
|
||||
|
||||
The example below runs the pipeline on a specific network and subnetwork, using private IPs and a KMS key to encrypt data at rest:
|
||||
|
||||
```
|
||||
python data_ingestion.py \
|
||||
--runner=DataflowRunner \
|
||||
--max_num_workers=100 \
|
||||
--autoscaling_algorithm=THROUGHPUT_BASED \
|
||||
--region=###PUT HERE REGION### \
|
||||
--staging_location=###PUT HERE GCS STAGING LOCATION### \
|
||||
--temp_location=###PUT HERE GCS TMP LOCATION### \
|
||||
--project=###PUT HERE PROJECT ID### \
|
||||
--network=###PUT HERE YOUR NETWORK### \
|
||||
--subnetwork=###PUT HERE YOUR SUBNETWORK. EXAMPLE: regions/europe-west1/subnetworks/subnet### \
|
||||
--dataflow_kms_key=###PUT HERE KMS KEY. Example: projects/lcaggio-d-4-kms/locations/europe-west1/keyRings/my-keyring-regional/cryptoKeys/key-df### \
|
||||
--input=###PUT HERE GCS BUCKET NAME. EXAMPLE: gs://bucket_name/person.csv### \
|
||||
--output=###PUT HERE BQ DATASET NAME. EXAMPLE: bq_dataset.df_import### \
|
||||
--no_use_public_ips
|
||||
```
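
Mistyped resource paths are an easy way to get a failed job submission, so it can help to check them before launching. The sketch below validates the two path shapes used in the example above. The helper name `check_private_run_flags` and both patterns are assumptions derived only from those examples; Dataflow may accept other forms (for instance a full URL for the subnetwork) that this check would reject.

```python
import re

# Patterns inferred from the example values in the command above.
SUBNETWORK_RE = re.compile(r"^regions/[a-z0-9-]+/subnetworks/[a-z0-9-]+$")
KMS_KEY_RE = re.compile(
    r"^projects/[^/]+/locations/[^/]+/keyRings/[^/]+/cryptoKeys/[^/]+$")


def check_private_run_flags(subnetwork, kms_key):
    """Return a list of problems found in the two resource paths."""
    problems = []
    if not SUBNETWORK_RE.match(subnetwork):
        problems.append(
            "subnetwork should look like regions/REGION/subnetworks/NAME")
    if not KMS_KEY_RE.match(kms_key):
        problems.append(
            "KMS key should look like "
            "projects/P/locations/L/keyRings/R/cryptoKeys/K")
    return problems


print(check_private_run_flags(
    "regions/europe-west1/subnetworks/subnet",
    "projects/my-project/locations/europe-west1/keyRings/my-keyring-regional/cryptoKeys/key-df",
))
```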
|
||||
|
||||
## 6. Check results
|
||||
You can check data imported into Google BigQuery from the Google Cloud Console UI.
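
If you prefer to check from code instead of the console, a query along the following lines may work. This is only a sketch: it assumes the `google-cloud-bigquery` client library is installed and Application Default Credentials are configured, the function names `preview_query` and `preview_rows` are hypothetical, and the column names are taken from the schema declared in `data_ingestion.py`.

```python
def preview_query(table, limit=10):
    """Build a query that lists the most recently ingested rows."""
    return (
        "SELECT name, surname, age, _TIMESTAMP "
        f"FROM `{table}` ORDER BY _TIMESTAMP DESC LIMIT {int(limit)}"
    )


def preview_rows(table, limit=10):
    """Run the preview query and return the rows as plain dicts."""
    # Imported lazily so the query builder works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    rows = client.query(preview_query(table, limit)).result()
    return [dict(row.items()) for row in rows]


print(preview_query("bq_dataset.df_import"))
```

Calling `preview_rows("bq_dataset.df_import")` runs the query in the default project of your credentials and incurs the usual BigQuery query cost.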
|
|
@ -1,3 +0,0 @@
|
|||
apache-beam[gcp]
|
||||
setuptools
|
||||
wheel
|
|
@ -1,134 +0,0 @@
|
|||
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Dataflow pipeline. Reads a CSV file and writes to a BQ table adding a timestamp.
"""


import argparse
import logging
import re

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class DataIngestion:
    """A helper class which contains the logic to translate the file into
    a format BigQuery will accept."""

    def parse_method(self, string_input):
        """Translate CSV row to dictionary.
        Args:
            string_input: A comma separated list of values in the form of
                name,surname,age
                Example string_input: mario,rossi,30
        Returns:
            A dict mapping BigQuery column names as keys. Values are kept
            as strings.
            example output:
                {
                    'name': 'mario',
                    'surname': 'rossi',
                    'age': '30'
                }
        """
        # Strip out carriage return, newline and quote characters.
        values = re.split(",", re.sub('\r\n', '', re.sub('"', '',
                                                         string_input)))
        row = dict(
            zip(('name', 'surname', 'age'),
                values))
        return row


class InjectTimestamp(beam.DoFn):
    """A class which adds a timestamp to each row.
    Args:
        element: A dictionary mapping BigQuery column names
        Example:
            {
                'name': 'mario',
                'surname': 'rossi',
                'age': '30'
            }
    Returns:
        The input dictionary with a timestamp value added
        Example:
            {
                'name': 'mario',
                'surname': 'rossi',
                'age': '30',
                '_TIMESTAMP': 1545730073
            }
    """

    def process(self, element):
        import time
        # time.time() returns seconds since the epoch regardless of the
        # worker's local timezone; mktime(gmtime()) would be offset by it.
        element['_TIMESTAMP'] = int(time.time())
        return [element]


def run(argv=None):
    """The main function which creates the pipeline and runs it."""

    parser = argparse.ArgumentParser()

    parser.add_argument(
        '--input',
        dest='input',
        required=False,
        help='Input file to read. This can be a local file or '
        'a file in a Google Storage Bucket.')

    parser.add_argument(
        '--output',
        dest='output',
        required=False,
        help='Output BQ table to write results to.')

    # Parse arguments from the command line.
    known_args, pipeline_args = parser.parse_known_args(argv)

    # DataIngestion is a class we built in this script to hold the logic for
    # transforming the file into a BigQuery table.
    data_ingestion = DataIngestion()

    # Initiate the pipeline using the pipeline arguments
    p = beam.Pipeline(options=PipelineOptions(pipeline_args))

    (p
     # Read the file. This is the source of the pipeline.
     | 'Read from a File' >> beam.io.ReadFromText(known_args.input)
     # Translates CSV row to a dictionary object consumable by BigQuery.
     | 'String To BigQuery Row' >>
     beam.Map(lambda s: data_ingestion.parse_method(s))
     # Add the timestamp on each row
     | 'Inject Timestamp - ' >> beam.ParDo(InjectTimestamp())
     # Write data to BigQuery. Note: BigQuerySink is deprecated in newer
     # Beam releases; beam.io.WriteToBigQuery is the suggested replacement.
     | 'Write to BigQuery' >> beam.io.Write(
         beam.io.BigQuerySink(
             # BigQuery table name.
             known_args.output,
             # BigQuery table schema
             schema='name:STRING,surname:STRING,age:NUMERIC,_TIMESTAMP:TIMESTAMP',
             # Never creates the table: it must already exist in BigQuery.
             create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
             # Deletes all data in the BigQuery table before writing.
             write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))
    p.run().wait_until_finish()


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()
|
Binary file not shown.
Before Width: | Height: | Size: 88 KiB |
|
@ -1,201 +0,0 @@
|
|||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf of
|
||||
the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if provided along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the NOTICE text from the Work, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. We also recommend that a
|
||||
file or class name and description of purpose be included on the
|
||||
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright [yyyy] [name of copyright owner]
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
|
@ -1,17 +0,0 @@
|
|||
# Create random Person PII data
|
||||
|
||||
This example contains a Python script that generates random person PII data in CSV format.
|
||||
|
||||
To see how to use the script, run:
|
||||
|
||||
```bash
|
||||
python3 person_details_generator.py --help
|
||||
```
|
||||
|
||||
## Example
|
||||
To create a file `person.csv` with 10000 random person records, run:
|
||||
```bash
|
||||
python3 person_details_generator.py \
|
||||
--count 10000 \
|
||||
--output person.csv
|
||||
```
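
For reference, the generation logic can be approximated with the standard library alone. The sketch below is not the script itself: the function name `generate_people` and the `seed` parameter are hypothetical additions that make runs reproducible, while the name lists and the 1 to 100 age range mirror the script's defaults.

```python
import random


def generate_people(count, first_names, last_names, seed=None):
    """Return `count` CSV rows in the form name,surname,age."""
    # A dedicated Random instance keeps seeding local to this call.
    rng = random.Random(seed)
    return [
        f"{rng.choice(first_names)},{rng.choice(last_names)},{rng.randint(1, 100)}"
        for _ in range(count)
    ]


rows = generate_people(
    3,
    ["Lorenzo", "Giacomo", "Chiara", "Miriam"],
    ["Rossi", "Bianchi", "Brambilla", "Caggioni"],
    seed=42,
)
print("\n".join(rows))
```

Passing the same `seed` yields the same rows, which can be handy when you need a stable test file.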
|
|
@ -1 +0,0 @@
|
|||
click
|
|
@ -1,47 +0,0 @@
|
|||
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Generate random person PIIs based on arrays of names and surnames."""


import click
import logging
import random


@click.command()
@click.option("--count", default=100, help="Number of generated names.")
# default=None keeps --output a string option; a False default would make
# click infer a boolean type and reject file names.
@click.option("--output", default=None, help=(
    "Name of the output file. Content will be overwritten. "
    "If not defined, standard output will be used."))
@click.option("--first_names", default="Lorenzo,Giacomo,Chiara,Miriam", help=(
    "String of names, comma separated. Default 'Lorenzo,Giacomo,Chiara,Miriam'"))
@click.option("--last_names", default="Rossi,Bianchi,Brambilla,Caggioni", help=(
    "String of surnames, comma separated. Default 'Rossi,Bianchi,Brambilla,Caggioni'"))
def main(count=100, output=None, first_names=None, last_names=None):
    # One "name,surname,age" row per line, without a trailing newline.
    generated_names = "\n".join(
        random.choice(first_names.split(',')) + "," +
        random.choice(last_names.split(',')) + "," +
        str(random.randint(1, 100)) for _ in range(count))
    if output:
        # The context manager closes the file even if the write fails.
        with open(output, "w") as f:
            f.write(generated_names)
    else:
        print(generated_names)


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    main()
|
|
@ -1,46 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.


variable "prefix" {
  description = "Unique prefix used for resource names. Not used for project if 'project_create' is null."
  type        = string
  default     = null
}

variable "project_create" {
  description = "Provide values if project creation is needed; uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format."
  type = object({
    billing_account_id = string
    parent             = string
  })
  default = null
}

variable "project_id" {
  description = "Project ID, references existing project if `project_create` is null."
  type        = string
}

variable "region" {
  description = "The region where resources will be deployed."
  type        = string
  default     = "europe-west1"
}

variable "vpc_subnet_range" {
  description = "IP range used for the VPC subnet created for the example."
  type        = string
  default     = "10.0.0.0/20"
}
@ -1,29 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

terraform {
  required_version = ">= 1.0.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.0.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 4.0.0"
    }
  }
}
@ -1,13 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
@ -1,25 +0,0 @@
/**
 * Copyright 2022 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

module "test" {
  source     = "../../../../../examples/data-solutions/gcs-to-bq-with-dataflow/"
  prefix     = var.prefix
  project_id = var.project_id
  project_create = {
    billing_account_id = var.billing_account_id
    parent             = var.parent
  }
}
@ -1,35 +0,0 @@
/**
 * Copyright 2022 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

variable "billing_account_id" {
  default = "012345-678901-234567"
}

variable "parent" {
  default = "folders/01234567890"
}

variable "prefix" {
  default = "fabric"
}

variable "project_id" {
  default = "gcs-to-bq"
}

variable "region" {
  default = "europe-west1"
}
@ -1,19 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

def test_resources(e2e_plan_runner):
  "Test that plan works and the number of resources is as expected."
  modules, resources = e2e_plan_runner()
  assert len(modules) == 12
  assert len(resources) == 57