Remove older GCS to BQ example (#523)

* remove older GCS to BQ example

* remove tests
Ludovico Magnocavallo 2022-02-08 07:30:03 +01:00 committed by GitHub
parent be33a7f880
commit c2a2b799b9
30 changed files with 4 additions and 1488 deletions

View File

@ -5,7 +5,7 @@ This section contains **[foundational examples](./foundations/)** that bootstrap
Currently available examples:
- **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](./cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Granular Cloud DNS IAM for Shared VPC](./cloud-operations/dns-shared-vpc), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Packer image builder](./cloud-operations/packer-image-builder), [On-prem SA key management](./cloud-operations/onprem-sa-key-management)
- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms/), [Cloud Storage to Bigquery with Cloud Dataflow](./data-solutions/gcs-to-bq-with-dataflow/)
- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/gcs-to-bq-with-least-privileges/), [Cloud Storage to Bigquery with Cloud Dataflow with least privileges](./data-solutions/gcs-to-bq-with-least-privileges/)
- **factories** - [The why and the how of resource factories](./factories/README.md)
- **foundations** - [single level hierarchy](./foundations/environments/) (environments), [multiple level hierarchy](./foundations/business-units/) (business units + environments)
- **networking** - [hub and spoke via peering](./networking/hub-and-spoke-peering/), [hub and spoke via VPN](./networking/hub-and-spoke-vpn/), [DNS and Google Private Access for on-premises](./networking/onprem-google-access-dns/), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [ILB as next hop](./networking/ilb-next-hop), [PSC for on-premises Cloud Function invocation](./networking/private-cloud-function-from-onprem/), [decentralized firewall](./networking/decentralized-firewall)

View File

@ -11,9 +11,9 @@ They are meant to be used as minimal but complete starting points to create actu
<a href="./cmek-via-centralized-kms/" title="CMEK on Cloud Storage and Compute Engine via centralized Cloud KMS"><img src="./cmek-via-centralized-kms/diagram.png" align="left" width="280px"></a> This [example](./cmek-via-centralized-kms/) implements [CMEK](https://cloud.google.com/kms/docs/cmek) for GCS and GCE, via keys hosted in KMS running in a centralized project. The example shows the basic resources and permissions for the typical use case of application projects implementing encryption at rest via a centrally managed KMS service.
<br clear="left">
### Cloud Storage to Bigquery with Cloud Dataflow
<a href="./gcs-to-bq-with-dataflow/" title="Cloud Storage to Bigquery with Cloud Dataflow"><img src="./gcs-to-bq-with-dataflow/diagram.png" align="left" width="280px"></a> This [example](./gcs-to-bq-with-dataflow/) implements [Cloud Storage](https://cloud.google.com/storage) to Bigquery data import using Cloud Dataflow.
All resources use CMEK hosted in Cloud KMS running in a centralized project. The example shows the basic resources and permissions for the typical use case to read, transform and import data from Cloud Storage to Bigquery.
### Cloud Storage to Bigquery with Cloud Dataflow with least privileges
<a href="./gcs-to-bq-with-least-privileges/" title="Cloud Storage to Bigquery with Cloud Dataflow with least privileges"><img src="./gcs-to-bq-with-least-privileges/diagram.png" align="left" width="280px"></a> This [example](./gcs-to-bq-with-least-privileges/) implements the resources required to run GCS to BigQuery Dataflow pipelines. The solution relies on a set of service accounts created following the least privilege principle.
<br clear="left">
### Data Platform Foundations
@ -21,4 +21,3 @@ All resources use CMEK hosted in Cloud KMS running in a centralized project. The
<a href="./data-platform-foundations/" title="Data Platform Foundations"><img src="./data-platform-foundations/02-resources/diagram.png" align="left" width="280px"></a>
This [example](./data-platform-foundations/) implements a robust and flexible Data Foundation on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.
<br clear="left">

View File

@ -1,134 +0,0 @@
# Cloud Storage to Bigquery with Cloud Dataflow
This example creates the infrastructure needed to run a [Cloud Dataflow](https://cloud.google.com/dataflow) pipeline to import data from [GCS](https://cloud.google.com/storage) to [Bigquery](https://cloud.google.com/bigquery).
The solution uses:
- internal IPs for GCE and Dataflow instances
- CMEK encryption for the GCS bucket, GCE instances, Dataflow instances and BigQuery tables
- Cloud NAT to let resources communicate with the Internet, run system updates, and install packages
The example is designed to match real-world use cases with a minimum amount of resources. It can be used as a starting point for more complex scenarios.
This is the high level diagram:
![GCS to BigQuery high-level diagram](diagram.png "GCS to BigQuery high-level diagram")
## Managed resources and services
This sample creates several distinct groups of resources:
- projects
  - Cloud KMS project
  - Service Project configured for GCE instances, GCS buckets, Dataflow instances and BigQuery tables
- networking
  - VPC network
  - One subnet
  - Firewall rules for [SSH access via IAP](https://cloud.google.com/iap/docs/using-tcp-forwarding) and open communication within the VPC
- IAM
  - One service account for GCE instances
  - One service account for Dataflow instances
  - One service account for BigQuery tables
- KMS
  - One continent key ring (example: 'Europe')
    - One crypto key (Protection level: software) for Compute Engine
    - One crypto key (Protection level: software) for Cloud Storage
  - One regional key ring (example: 'europe-west1')
    - One crypto key (Protection level: software) for Cloud Dataflow
- GCE
  - One instance encrypted with a CMEK Cryptokey hosted in Cloud KMS
- GCS
  - One bucket encrypted with a CMEK Cryptokey hosted in Cloud KMS
- BQ
  - One dataset encrypted with a CMEK Cryptokey hosted in Cloud KMS
  - Two tables encrypted with a CMEK Cryptokey hosted in Cloud KMS
## Test your environment with Cloud Dataflow
You can now connect to the GCE instance with the following command:
```bash
gcloud compute ssh vm-example
```
You can now run the simple pipeline found [here](./scripts/data_ingestion/). Once you have installed the required packages and copied a file into the GCS bucket, you can trigger the pipeline using internal IPs with a command similar to:
```bash
python data_ingestion.py \
--runner=DataflowRunner \
--max_num_workers=10 \
--autoscaling_algorithm=THROUGHPUT_BASED \
--region=### REGION ### \
--staging_location=gs://### TEMP BUCKET NAME ###/ \
--temp_location=gs://### TEMP BUCKET NAME ###/ \
--project=### PROJECT ID ### \
--input=gs://### DATA BUCKET NAME###/### FILE NAME ###.csv \
--output=### DATASET NAME ###.### TABLE NAME ### \
--service_account_email=### SERVICE ACCOUNT EMAIL ### \
--network=### NETWORK NAME ### \
--subnetwork=### SUBNET NAME ### \
--dataflow_kms_key=### CRYPTOKEY ID ### \
--no_use_public_ips
```
For example:
```bash
python data_ingestion.py \
--runner=DataflowRunner \
--max_num_workers=10 \
--autoscaling_algorithm=THROUGHPUT_BASED \
--region=europe-west1 \
--staging_location=gs://lc-001-eu-df-tmplocation/ \
--temp_location=gs://lc-001-eu-df-tmplocation/ \
--project=lcaggio-demo \
--input=gs://lc-eu-data/person.csv \
--output=bq_dataset.df_import \
--service_account_email=df-test@lcaggio-demo.iam.gserviceaccount.com \
--network=local \
--subnetwork=regions/europe-west1/subnetworks/subnet \
--dataflow_kms_key=projects/lcaggio-demo-kms/locations/europe-west1/keyRings/my-keyring-regional/cryptoKeys/key-df \
--no_use_public_ips
```
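The placeholder command and the concrete example above use the same set of flags. As a minimal sketch, the flag list could also be assembled programmatically. The helper `build_ingestion_command` and its parameter names are hypothetical and not part of this example's scripts; only the flag names and the sample values are copied from the commands above.

```python
import shlex


def build_ingestion_command(
    region, temp_bucket, project, input_uri, output_table,
    service_account, network, subnetwork, kms_key,
):
    """Return the data_ingestion.py invocation as an argument list.

    Hypothetical helper: it only mirrors the flags shown in the README.
    """
    temp_location = f"gs://{temp_bucket}/"
    return [
        "python", "data_ingestion.py",
        "--runner=DataflowRunner",
        "--max_num_workers=10",
        "--autoscaling_algorithm=THROUGHPUT_BASED",
        f"--region={region}",
        # Staging and temp files share the same bucket, as in the README.
        f"--staging_location={temp_location}",
        f"--temp_location={temp_location}",
        f"--project={project}",
        f"--input={input_uri}",
        f"--output={output_table}",
        f"--service_account_email={service_account}",
        f"--network={network}",
        f"--subnetwork={subnetwork}",
        f"--dataflow_kms_key={kms_key}",
        # Keep Dataflow workers on internal IPs only.
        "--no_use_public_ips",
    ]


command = build_ingestion_command(
    region="europe-west1",
    temp_bucket="lc-001-eu-df-tmplocation",
    project="lcaggio-demo",
    input_uri="gs://lc-eu-data/person.csv",
    output_table="bq_dataset.df_import",
    service_account="df-test@lcaggio-demo.iam.gserviceaccount.com",
    network="local",
    subnetwork="regions/europe-west1/subnetworks/subnet",
    kms_key=(
        "projects/lcaggio-demo-kms/locations/europe-west1/"
        "keyRings/my-keyring-regional/cryptoKeys/key-df"
    ),
)
# Print a shell-safe, single-line version of the command.
print(shlex.join(command))
```

The list form can be passed directly to `subprocess.run`, which avoids shell quoting issues when bucket or table names come from variables.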
You can check data imported into Google BigQuery from the Google Cloud Console UI.
## Test your environment with 'bq' CLI
You can now connect to the GCE instance with the following command:
```bash
gcloud compute ssh vm-example
```
You can now run a simple `bq load` command to import data into BigQuery. An example command follows:
```bash
bq load \
--source_format=CSV \
bq_dataset.bq_import \
gs://my-bucket/person.csv \
schema_bq_import.json
```
You can check data imported into Google BigQuery from the Google Cloud Console UI.
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [project_id](variables.tf#L31) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | |
| [prefix](variables.tf#L16) | Unique prefix used for resource names. Not used for project if 'project_create' is null. | <code>string</code> | | <code>null</code> |
| [project_create](variables.tf#L22) | Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object&#40;&#123;&#10; billing_account_id &#61; string&#10; parent &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [region](variables.tf#L36) | The region where resources will be deployed. | <code>string</code> | | <code>&#34;europe-west1&#34;</code> |
| [vpc_subnet_range](variables.tf#L42) | IP range used for the VPC subnet created for the example. | <code>string</code> | | <code>&#34;10.0.0.0&#47;20&#34;</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [bq_tables](outputs.tf#L15) | Bigquery Tables. | |
| [buckets](outputs.tf#L20) | GCS bucket names. | |
| [data_ingestion_command](outputs.tf#L28) | | |
| [project_id](outputs.tf#L48) | Project id. | |
| [vm](outputs.tf#L53) | GCE VM. | |
<!-- END TFDOC -->

View File

@ -1,20 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
backend "gcs" {
bucket = ""
}
}

View File

@ -1,65 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
module "bigquery-dataset" {
source = "../../../modules/bigquery-dataset"
project_id = module.project.project_id
id = "example_dataset"
location = var.region
access = {
reader-group = { role = "READER", type = "user" }
owner = { role = "OWNER", type = "user" }
}
access_identities = {
reader-group = module.service-account-bq.email
owner = module.service-account-bq.email
}
encryption_key = module.kms.keys.key-bq.id
tables = {
bq_import = {
friendly_name = "BQ import"
labels = {}
options = null
partitioning = {
field = null
range = null # use start/end/interval for range
time = null
}
schema = file("${path.module}/schema_bq_import.json")
options = {
clustering = null
expiration_time = null
encryption_key = module.kms.keys.key-bq.id
}
deletion_protection = false
},
df_import = {
friendly_name = "Dataflow import"
labels = {}
options = null
partitioning = {
field = null
range = null # use start/end/interval for range
time = null
}
schema = file("${path.module}/schema_df_import.json")
options = {
clustering = null
expiration_time = null
encryption_key = module.kms.keys.key-bq.id
}
deletion_protection = false
}
}
}

Binary file not shown.


View File

@ -1,54 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
locals {
vm-startup-script = join("\n", [
"#! /bin/bash",
"apt-get update && apt-get install -y bash-completion git python3-venv gcc build-essential python-dev python3-dev",
"pip3 install --upgrade setuptools pip"
])
}
module "vm" {
source = "../../../modules/compute-vm"
project_id = module.project.project_id
zone = "${var.region}-b"
name = "${var.prefix}-vm-0"
network_interfaces = [{
network = module.vpc.self_link,
subnetwork = local.subnet_self_link,
nat = false,
addresses = null
}]
attached_disks = [{
name = "data", size = 10, source = null, source_type = null, options = null
}]
boot_disk = {
image = "projects/debian-cloud/global/images/family/debian-10"
type = "pd-ssd"
size = 10
encrypt_disk = true
}
encryption = {
encrypt_boot = true
disk_encryption_key_raw = null
kms_key_self_link = module.kms.key_ids.key-gce
}
metadata = {
startup-script = local.vm-startup-script
}
service_account = module.service-account-gce.email
service_account_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
tags = ["ssh"]
}

View File

@ -1,49 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
module "gcs-data" {
source = "../../../modules/gcs"
project_id = module.project.project_id
prefix = var.prefix
name = "data"
location = var.region
storage_class = "REGIONAL"
iam = {
"roles/storage.admin" = [
"serviceAccount:${module.service-account-gce.email}",
],
"roles/storage.objectViewer" = [
"serviceAccount:${module.service-account-df.email}",
]
}
encryption_key = module.kms.keys.key-gcs.id
force_destroy = true
}
module "gcs-df-tmp" {
source = "../../../modules/gcs"
project_id = module.project.project_id
prefix = var.prefix
name = "df-tmp"
location = var.region
storage_class = "REGIONAL"
iam = {
"roles/storage.admin" = [
"serviceAccount:${module.service-account-gce.email}",
"serviceAccount:${module.service-account-df.email}",
]
}
encryption_key = module.kms.keys.key-gcs.id
force_destroy = true
}

View File

@ -1,60 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
module "service-account-bq" {
source = "../../../modules/iam-service-account"
project_id = module.project.project_id
name = "bq-test"
prefix = var.prefix
iam_project_roles = {
(module.project.project_id) = [
"roles/bigquery.admin",
"roles/logging.logWriter",
"roles/monitoring.metricWriter",
]
}
}
module "service-account-df" {
source = "../../../modules/iam-service-account"
project_id = module.project.project_id
name = "df-test"
prefix = var.prefix
iam_project_roles = {
(module.project.project_id) = [
"roles/bigquery.dataOwner",
"roles/bigquery.jobUser",
"roles/bigquery.metadataViewer",
"roles/dataflow.worker",
"roles/storage.objectViewer",
]
}
}
module "service-account-gce" {
source = "../../../modules/iam-service-account"
project_id = module.project.project_id
name = "gce-test"
prefix = var.prefix
iam_project_roles = {
(module.project.project_id) = [
"roles/bigquery.dataOwner",
"roles/bigquery.jobUser",
"roles/dataflow.admin",
"roles/iam.serviceAccountUser",
"roles/logging.logWriter",
"roles/monitoring.metricWriter",
]
}
}

View File

@ -1,63 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
module "kms" {
source = "../../../modules/kms"
project_id = module.project.project_id
keyring = {
name = "${var.prefix}-keyring",
location = var.region
}
keys = {
key-df = null
key-gce = null
key-gcs = null
key-bq = null
}
key_iam = {
key-gce = {
"roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
"serviceAccount:${module.project.service_accounts.robots.compute}"
]
},
key-gcs = {
"roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
"serviceAccount:${module.project.service_accounts.robots.storage}"
]
},
key-bq = {
"roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
"serviceAccount:${module.project.service_accounts.robots.bq}"
]
},
key-df = {
"roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
"serviceAccount:${module.project.service_accounts.robots.dataflow}",
"serviceAccount:${module.project.service_accounts.robots.compute}",
]
}
}
}
# module "kms-regional" {
# source = "../../../modules/kms"
# project_id = module.project-kms.project_id
# keyring = {
# name = "my-keyring-regional",
# location = var.region
# }
# keys = { key-df = null }
# key_iam = {
# }
# }

View File

@ -1,69 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
locals {
subnet_name = module.vpc.subnets["${var.region}/${var.prefix}-subnet-0"].name
subnet_self_link = module.vpc.subnets["${var.region}/${var.prefix}-subnet-0"].self_link
}
module "project" {
source = "../../../modules/project"
name = var.project_id
parent = try(var.project_create.parent, null)
billing_account = try(var.project_create.billing_account_id, null)
project_create = var.project_create != null
prefix = var.project_create == null ? null : var.prefix
services = [
"bigquery.googleapis.com",
"bigqueryreservation.googleapis.com",
"bigquerystorage.googleapis.com",
"cloudkms.googleapis.com",
"compute.googleapis.com",
"dataflow.googleapis.com",
"servicenetworking.googleapis.com",
"storage.googleapis.com",
]
service_config = {
disable_on_destroy = false, disable_dependent_services = false
}
}
module "vpc" {
source = "../../../modules/net-vpc"
project_id = module.project.project_id
name = "${var.prefix}-vpc"
subnets = [
{
ip_cidr_range = var.vpc_subnet_range
name = "${var.prefix}-subnet-0"
region = var.region
secondary_ip_range = {}
}
]
}
module "vpc-firewall" {
source = "../../../modules/net-vpc-firewall"
project_id = module.project.project_id
network = module.vpc.name
admin_ranges = [var.vpc_subnet_range]
}
module "nat" {
source = "../../../modules/net-cloudnat"
project_id = module.project.project_id
region = var.region
name = "${var.prefix}-default"
router_network = module.vpc.name
}

View File

@ -1,59 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
output "bq_tables" {
description = "Bigquery Tables."
value = module.bigquery-dataset.table_ids
}
output "buckets" {
description = "GCS bucket names."
value = {
data = module.gcs-data.name
df-tmp = module.gcs-df-tmp.name
}
}
output "data_ingestion_command" {
value = <<-EOF
python data_ingestion.py \
--runner=DataflowRunner \
--max_num_workers=10 \
--autoscaling_algorithm=THROUGHPUT_BASED \
--region=${var.region} \
--staging_location=${module.gcs-df-tmp.url} \
--temp_location=${module.gcs-df-tmp.url}/ \
--project=${var.project_id} \
--input=${module.gcs-data.url}/### FILE NAME ###.csv \
--output=${module.bigquery-dataset.dataset_id}.${module.bigquery-dataset.table_ids.df_import} \
--service_account_email=${module.service-account-df.email} \
--network=${module.vpc.name} \
--subnetwork=${local.subnet_name} \
--dataflow_kms_key=${module.kms.key_ids.key-df} \
--no_use_public_ips
EOF
}
output "project_id" {
description = "Project id."
value = module.project.project_id
}
output "vm" {
description = "GCE VM."
value = {
name = module.vm.instance.name
address = module.vm.internal_ip
}
}

View File

@ -1,14 +0,0 @@
[
{
"name": "name",
"type": "STRING"
},
{
"name": "surname",
"type": "STRING"
},
{
"name": "age",
"type": "NUMERIC"
}
]

View File

@ -1,22 +0,0 @@
[
{
"mode": "NULLABLE",
"name": "name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "surname",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "age",
"type": "NUMERIC"
},
{
"mode": "NULLABLE",
"name": "_TIMESTAMP",
"type": "TIMESTAMP"
}
]

View File

@ -1,4 +0,0 @@
# Scripts
In this section you can find two simple scripts to test your environment:
- [Data ingestion](./data_ingestion/): a simple Apache Beam Python pipeline to import data from Google Cloud Storage into Bigquery.
- [Person details generator](./person_details_generator/): a simple script to generate some random data to test your environment.

View File

@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -1,99 +0,0 @@
# Ingest CSV files from GCS into Bigquery
In this example we create a Python [Apache Beam](https://beam.apache.org/) pipeline running on [Google Cloud Dataflow](https://cloud.google.com/dataflow/) to import CSV files into BigQuery, adding a timestamp to each row. The architecture is shown below:
![Apache Beam pipeline to import CSV from GCS into BQ](diagram.png)
The architecture uses:
* [Google Cloud Storage](https://cloud.google.com/storage/) to store CSV source files
* [Google Cloud Dataflow](https://cloud.google.com/dataflow/) to read files from Google Cloud Storage, transform the data based on the structure of the file, and import the data into Google BigQuery
* [Google BigQuery](https://cloud.google.com/bigquery/) to store data in a Data Lake.
You can use this script as a starting point to import your files into Google BigQuery. You'll probably need to adapt the script logic to your requirements.
## 1. Prerequisites
- A running GCP project with a billing account enabled
- gcloud installed and initialized for your project
- Google Cloud Dataflow API enabled
- Google Cloud Storage bucket containing the file to import, in CSV format with name, surname and age columns. Example: `Mario,Rossi,30`.
- Google Cloud Storage Bucket for temp and staging Google Dataflow files
- Google BigQuery dataset
- [Python](https://www.python.org/) >= 3.7 and python-dev module
- gcc
- Google Cloud [Application Default Credentials](https://cloud.google.com/sdk/gcloud/reference/auth/application-default/login)
## 2. Create virtual environment
Create a new virtual environment (recommended) and install requirements:
```
virtualenv env
source ./env/bin/activate
pip3 install --upgrade setuptools pip
pip3 install -r requirements.txt
```
## 4. Upload files into Google Cloud Storage
Upload the files to be imported into Google BigQuery to a Google Cloud Storage bucket. You can use `gsutil` with a command like:
```
gsutil cp [LOCAL_OBJECT_LOCATION] gs://[DESTINATION_BUCKET_NAME]/
```
Files need to be in CSV format. For example:
```
Enrico,Bianchi,20
Mario,Rossi,30
```
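As a rough illustration of how one such row maps to a BigQuery-style record, the sketch below uses Python's standard `csv` module rather than the regular expression used in the pipeline. The helper name `parse_row` and the integer cast for `age` are assumptions made for this example; they are not part of `data_ingestion.py`.

```python
import csv

# Column names follow the CSV layout described above.
COLUMNS = ('name', 'surname', 'age')


def parse_row(line):
    """Turn one CSV line into a dict keyed by column name (hypothetical helper)."""
    # csv.reader copes with quoted fields and commas embedded in quotes.
    values = next(csv.reader([line.rstrip('\r\n')]))
    row = dict(zip(COLUMNS, values))
    # Assumption: every row carries a numeric age.
    row['age'] = int(row['age'])
    return row


print(parse_row('Mario,Rossi,30'))
# {'name': 'Mario', 'surname': 'Rossi', 'age': 30}
```

Using `csv.reader` is a design choice for this sketch only: it tolerates quoted values, which a plain split on commas would not.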
You can use the [person_details_generator](../person_details_generator/) script if you want to create random person details.
## 5. Run pipeline
You can check parameters accepted by the `data_ingestion.py` script with the following command:
```
python data_ingestion.py --help
```
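The script only declares `--input` and `--output` itself; every other flag is handed to the Beam runner. A minimal sketch of that split, using only `argparse` from the standard library, might look like the following. The function name `split_args` and the value `my-project` are placeholders chosen for this example.

```python
import argparse


def split_args(argv):
    """Separate the script's own flags from the ones meant for the runner."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input', help='Input file to read.')
    parser.add_argument('--output', help='Output BQ table to write results to.')
    # parse_known_args returns unrecognized flags instead of failing on them.
    return parser.parse_known_args(argv)


known_args, pipeline_args = split_args([
    '--runner=DirectRunner',
    '--project=my-project',  # placeholder project id
    '--input=gs://bucket_name/person.csv',
    '--output=bq_dataset.df_import',
])
print(known_args.input, known_args.output)
print(pipeline_args)
# ['--runner=DirectRunner', '--project=my-project']
```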
You can run the pipeline locally with the following command:
```
python data_ingestion.py \
--runner=DirectRunner \
--project=###PUT HERE PROJECT ID### \
--input=###PUT HERE THE FILE TO IMPORT. EXAMPLE: gs://bucket_name/person.csv ### \
--output=###PUT HERE BQ DATASET.TABLE###
```
or you can run the pipeline on Google Dataflow using the following command:
```
python data_ingestion.py \
--runner=DataflowRunner \
--max_num_workers=100 \
--autoscaling_algorithm=THROUGHPUT_BASED \
--region=###PUT HERE REGION### \
--staging_location=###PUT HERE GCS STAGING LOCATION### \
--temp_location=###PUT HERE GCS TMP LOCATION### \
--project=###PUT HERE PROJECT ID### \
--input=###PUT HERE GCS BUCKET NAME. EXAMPLE: gs://bucket_name/person.csv### \
--output=###PUT HERE BQ DATASET NAME. EXAMPLE: bq_dataset.df_import###
```
Below is an example that runs the pipeline specifying network and subnetwork, using private IPs and a KMS key to encrypt data at rest:
```
python data_ingestion.py \
--runner=DataflowRunner \
--max_num_workers=100 \
--autoscaling_algorithm=THROUGHPUT_BASED \
--region=###PUT HERE REGION### \
--staging_location=###PUT HERE GCS STAGING LOCATION### \
--temp_location=###PUT HERE GCS TMP LOCATION### \
--project=###PUT HERE PROJECT ID### \
--network=###PUT HERE YOUR NETWORK### \
--subnetwork=###PUT HERE YOUR SUBNETWORK. EXAMPLE: regions/europe-west1/subnetworks/subnet### \
--dataflowKmsKey=###PUT HERE KMS KEY. Example: projects/lcaggio-d-4-kms/locations/europe-west1/keyRings/my-keyring-regional/cryptoKeys/key-df### \
--input=###PUT HERE GCS BUCKET NAME. EXAMPLE: gs://bucket_name/person.csv### \
--output=###PUT HERE BQ DATASET NAME. EXAMPLE: bq_dataset.df_import### \
--no_use_public_ips
```
## 6. Check results
You can check data imported into Google BigQuery from the Google Cloud Console UI.
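
When inspecting the rows, it may help to remember that the pipeline stores `_TIMESTAMP` as Unix epoch seconds (the docstring in `data_ingestion.py` uses `1545730073` as its example). The sketch below converts such a value to a readable UTC datetime; the helper name `epoch_to_utc` is an assumption for this example.

```python
from datetime import datetime, timezone


def epoch_to_utc(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(epoch_to_utc(1545730073).isoformat())
# 2018-12-25T09:27:53+00:00
```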

View File

@ -1,3 +0,0 @@
apache-beam[gcp]
setuptools
wheel

View File

@ -1,134 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Dataflow pipeline. Reads a CSV file and writes to a BQ table adding a timestamp.
"""
import argparse
import logging
import re

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


class DataIngestion:
    """A helper class which contains the logic to translate the file into
    a format BigQuery will accept."""

    def parse_method(self, string_input):
        """Translate CSV row to dictionary.

        Args:
            string_input: A comma separated list of values in the form of
                name,surname,age
                Example string_input: mario,rossi,30

        Returns:
            A dict mapping BigQuery column names as keys. Values are kept
            as strings.
            example output:
                {
                    'name': 'mario',
                    'surname': 'rossi',
                    'age': '30'
                }
        """
        # Strip out carriage return, newline and quote characters.
        values = re.split(",", re.sub('\r\n', '', re.sub('"', '',
                                                         string_input)))
        row = dict(
            zip(('name', 'surname', 'age'),
                values))
        return row


class InjectTimestamp(beam.DoFn):
    """A class which adds a timestamp to each row.

    Args:
        element: A dictionary mapping BigQuery column names
        Example:
            {
                'name': 'mario',
                'surname': 'rossi',
                'age': 30
            }

    Returns:
        The input dictionary with a timestamp value added
        Example:
            {
                'name': 'mario',
                'surname': 'rossi',
                'age': 30,
                '_TIMESTAMP': 1545730073
            }
    """

    def process(self, element):
        import time
        # Seconds since the Unix epoch. time.mktime(time.gmtime()) would
        # treat the UTC struct as local time and skew the result.
        element['_TIMESTAMP'] = int(time.time())
        return [element]


def run(argv=None):
    """The main function which creates the pipeline and runs it."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--input',
        dest='input',
        required=False,
        help='Input file to read. This can be a local file or '
        'a file in a Google Storage Bucket.')
    parser.add_argument(
        '--output',
        dest='output',
        required=False,
        help='Output BQ table to write results to.')

    # Parse arguments from the command line.
    known_args, pipeline_args = parser.parse_known_args(argv)

    # DataIngestion is a class we built in this script to hold the logic for
    # transforming the file into a BigQuery table.
    data_ingestion = DataIngestion()

    # Initiate the pipeline using the pipeline arguments
    p = beam.Pipeline(options=PipelineOptions(pipeline_args))

    (p
     # Read the file. This is the source of the pipeline.
     | 'Read from a File' >> beam.io.ReadFromText(known_args.input)
     # Translates CSV row to a dictionary object consumable by BigQuery.
     | 'String To BigQuery Row' >>
     beam.Map(lambda s: data_ingestion.parse_method(s))
     # Add the timestamp on each row
     | 'Inject Timestamp - ' >> beam.ParDo(InjectTimestamp())
     # Write data to Bigquery
     | 'Write to BigQuery' >> beam.io.Write(
         beam.io.BigQuerySink(
             # BigQuery table name.
             known_args.output,
             # Bigquery table schema
             schema='name:STRING,surname:STRING,age:NUMERIC,_TIMESTAMP:TIMESTAMP',
             # Never creates the table: it must already exist in BigQuery.
             create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
             # Deletes all data in the BigQuery table before writing.
             write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE)))
    p.run().wait_until_finish()


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    run()

Binary file not shown.

View File

@ -1,201 +0,0 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

View File

@ -1,17 +0,0 @@
# Create random Person PII data
This example contains a Python script that generates Person PII data in CSV format.
To see how to use the script, run:
```bash
python3 person_details_generator.py --help
```
## Example
To create a file `person.csv` with 10000 random person records, run:
```bash
python3 person_details_generator.py \
--count 10000 \
--output person.csv
```
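If reproducible test data is needed, one option is to seed the random generator. The sketch below is a simplified, hypothetical variant of the generator logic (the function name `generate_rows` and the `seed` parameter do not exist in `person_details_generator.py`); it produces rows in the same `name,surname,age` layout.

```python
import random


def generate_rows(count, first_names, last_names, seed=None):
    """Return `count` CSV rows of the form name,surname,age."""
    # A dedicated Random instance keeps the seed local to this call.
    rng = random.Random(seed)
    return [
        f'{rng.choice(first_names)},{rng.choice(last_names)},{rng.randint(1, 100)}'
        for _ in range(count)
    ]


rows = generate_rows(3, ['Lorenzo', 'Chiara'], ['Rossi', 'Bianchi'], seed=42)
print('\n'.join(rows))
```

Calling the function twice with the same seed yields the same rows, which makes downstream pipeline runs easier to compare.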

View File

@ -1,47 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Generate random person PIIs based on arrays of names and surnames."""
import click
import logging
import random


@click.command()
@click.option("--count", default=100, help="Number of generated names.")
@click.option("--output", default=False, help=(
    "Name of the output file. Content will be overwritten. "
    "If not defined, standard output will be used."))
@click.option("--first_names", default="Lorenzo,Giacomo,Chiara,Miriam", help=(
    "String of Names, comma separated. Default 'Lorenzo,Giacomo,Chiara,Miriam'"))
@click.option("--last_names", default="Rossi,Bianchi,Brambilla,Caggioni", help=(
    "String of Names, comma separated. Default 'Rossi,Bianchi,Brambilla,Caggioni'"))
def main(count=100, output=False, first_names=None, last_names=None):
    # Build one "name,surname,age" line per person, dropping the final newline.
    generated_names = "".join(
        random.choice(first_names.split(',')) + "," +
        random.choice(last_names.split(',')) + "," +
        str(random.randint(1, 100)) + "\n" for _ in range(count))[:-1]
    if output:
        # The context manager closes the file even if the write fails.
        with open(output, "w") as f:
            f.write(generated_names)
    else:
        print(generated_names)


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    main()

View File

@ -1,46 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
variable "prefix" {
  description = "Unique prefix used for resource names. Not used for project if 'project_create' is null."
  type        = string
  default     = null
}

variable "project_create" {
  description = "Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format."
  type = object({
    billing_account_id = string
    parent             = string
  })
  default = null
}

variable "project_id" {
  description = "Project id, references existing project if `project_create` is null."
  type        = string
}

variable "region" {
  description = "The region where resources will be deployed."
  type        = string
  default     = "europe-west1"
}

variable "vpc_subnet_range" {
  description = "Ip range used for the VPC subnet created for the example."
  type        = string
  default     = "10.0.0.0/20"
}

View File

@ -1,29 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
  required_version = ">= 1.0.0"
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = ">= 4.0.0"
    }
    google-beta = {
      source  = "hashicorp/google-beta"
      version = ">= 4.0.0"
    }
  }
}

View File

@ -1,13 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@ -1,25 +0,0 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
module "test" {
  source     = "../../../../../examples/data-solutions/gcs-to-bq-with-dataflow/"
  prefix     = var.prefix
  project_id = var.project_id
  project_create = {
    billing_account_id = var.billing_account_id
    parent             = var.parent
  }
}

View File

@ -1,35 +0,0 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
variable "billing_account_id" {
  default = "012345-678901-234567"
}

variable "parent" {
  default = "folders/01234567890"
}

variable "prefix" {
  default = "fabric"
}

variable "project_id" {
  default = "gcs-to-bq"
}

variable "region" {
  default = "europe-west1"
}

View File

@ -1,19 +0,0 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
def test_resources(e2e_plan_runner):
    "Test that plan works and the numbers of resources is as expected."
    modules, resources = e2e_plan_runner()
    assert len(modules) == 12
    assert len(resources) == 57