Merge pull request #909 from GoogleCloudPlatform/lcaggio/fix-pipeline
GCS2BQ: Move images and templates in sub-folders
@@ -5,7 +5,7 @@ This section **[networking blueprints](./networking/)** that implement core patt
Currently available blueprints:
- **cloud operations** - [Resource tracking and remediation via Cloud Asset feeds](./cloud-operations/asset-inventory-feed-remediation), [Granular Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Granular Cloud DNS IAM for Shared VPC](./cloud-operations/dns-shared-vpc), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Packer image builder](./cloud-operations/packer-image-builder), [On-prem SA key management](./cloud-operations/onprem-sa-key-management), [TCP healthcheck for unmanaged GCE instances](./cloud-operations/unmanaged-instances-healthcheck), [HTTP Load Balancer with Cloud Armor](./cloud-operations/glb_and_armor)
-- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/gcs-to-bq-with-least-privileges/), [Cloud Storage to Bigquery with Cloud Dataflow with least privileges](./data-solutions/gcs-to-bq-with-least-privileges/), [Data Platform Foundations](./data-solutions/data-platform-foundations/), [SQL Server AlwaysOn availability groups blueprint](./data-solutions/sqlserver-alwayson), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion/), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2/)
+- **data solutions** - [GCE/GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms/), [Cloud Storage to Bigquery with Cloud Dataflow with least privileges](./data-solutions/gcs-to-bq-with-least-privileges/), [Data Platform Foundations](./data-solutions/data-platform-foundations/), [SQL Server AlwaysOn availability groups blueprint](./data-solutions/sqlserver-alwayson), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion/), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2/)
- **factories** - [The why and the how of resource factories](./factories/README.md)
- **GKE** - [GKE multitenant fleet](./gke/multitenant-fleet/), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [Binary Authorization Pipeline](./gke/binauthz/), [Multi-cluster mesh on GKE (fleet API)](./gke/multi-cluster-mesh-gke-fleet-api/)
- **networking** - [hub and spoke via peering](./networking/hub-and-spoke-peering/), [hub and spoke via VPN](./networking/hub-and-spoke-vpn/), [DNS and Google Private Access for on-premises](./networking/onprem-google-access-dns/), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [ILB as next hop](./networking/ilb-next-hop), [Connecting to on-premise services leveraging PSC and hybrid NEGs](./networking/psc-hybrid/), [decentralized firewall](./networking/decentralized-firewall)
@@ -13,7 +13,7 @@ They are meant to be used as minimal but complete starting points to create actu
### Cloud Storage to Bigquery with Cloud Dataflow with least privileges
-<a href="./gcs-to-bq-with-least-privileges/" title="Cloud Storage to Bigquery with Cloud Dataflow with least privileges"><img src="./gcs-to-bq-with-least-privileges/diagram.png" align="left" width="280px"></a> This [blueprint](./gcs-to-bq-with-least-privileges/) implements the resources required to run GCS to BigQuery Dataflow pipelines. The solution relies on a set of service accounts created following the least privilege principle.
+<a href="./gcs-to-bq-with-least-privileges/" title="Cloud Storage to Bigquery with Cloud Dataflow with least privileges"><img src="./gcs-to-bq-with-least-privileges/images/diagram.png" align="left" width="280px"></a> This [blueprint](./gcs-to-bq-with-least-privileges/) implements the resources required to run GCS to BigQuery Dataflow pipelines. The solution relies on a set of service accounts created following the least privilege principle.
<br clear="left">
### Data Platform Foundations
@@ -23,7 +23,7 @@ Whether you’re transferring from another Cloud Service Provider or you’re ta
## Architecture
-![GCS to BigQuery High-level diagram](diagram.png "GCS to BigQuery High-level diagram")
+![GCS to BigQuery High-level diagram](images/diagram.png "GCS to BigQuery High-level diagram")
The main components that we will be setting up are (to learn more about these products, click on the hyperlinks):
@@ -61,11 +61,11 @@ __Note__: To grant a user a role, take a look at the [Granting and Revoking Acce
Click on the button below, sign in if required, and when the prompt appears, click on “confirm”.
-[![Open Cloudshell](shell_button.png)](https://goo.gle/GoDataPipe)
+[![Open Cloudshell](images/shell_button.png)](https://goo.gle/GoDataPipe)
This will clone the repository to your Cloud Shell, and a screen like this one will appear:
-![cloud_shell](cloud_shell.png)
+![cloud_shell](images/cloud_shell.png)
Before you deploy the architecture, make sure you run the following command to move your Cloud Shell session into your service project:
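The command itself lies outside this hunk; as a minimal sketch (not taken from this repository), switching the active project in Cloud Shell usually looks like this, with `SERVICE_PROJECT_ID` as a placeholder for your own service project ID:

```sh
# Point the active gcloud configuration at the service project.
# SERVICE_PROJECT_ID is a placeholder, not a value from this repo.
gcloud config set project SERVICE_PROJECT_ID
```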
@@ -87,7 +87,7 @@ Before we deploy the architecture, you will need the following information:
2. In the editor, edit the `terraform.tfvars.sample` file with the variables you gathered in the step above.
-![editor](editor.png)
+![editor](images/editor.png)
* a. Fill in __data_eng_principals__ with the list of users or groups allowed to impersonate the service accounts (see the sketch below).
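A minimal sketch of how this variable could be filled in `terraform.tfvars`, with purely hypothetical principals:

```hcl
# terraform.tfvars -- hypothetical example values, not from this repository.
# Principals that are allowed to impersonate the blueprint's service
# accounts, in the usual IAM "user:"/"group:" notation.
data_eng_principals = [
  "user:jane.doe@example.com",
  "group:data-engineers@example.com",
]
```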
@@ -105,7 +105,7 @@ Before we deploy the architecture, you will need the following information:
Resource creation will take a few minutes; at the end, this is the output you should expect for successful completion, along with a list of the created resources:
-![output](output.png)
+![output](images/output.png)
__Congratulations!__ You have successfully deployed the foundation for running your first ETL pipeline on Google Cloud.
@@ -168,16 +168,16 @@ This command will start a dataflow job called test_batch_01 that uses a Dataflow
The expected output is the following:
-![second_output](second_output.png)
+![second_output](images/second_output.png)
Then, if you navigate to Dataflow on the console, you will see the following:
-![dataflow_console](dataflow_console.png)
+![dataflow_console](images/dataflow_console.png)
This shows that the job you started from Cloud Shell is currently running in Dataflow.
If you click on the job name, you can see the job graph that was created and how every step of the Dataflow pipeline is progressing:
-![dataflow_execution](dataflow_execution.png)
+![dataflow_execution](images/dataflow_execution.png)
Once the job completes, navigate to BigQuery in the console; under __SERVICE_PROJECT_ID__ → datalake → person, you can see the data that was successfully imported into BigQuery by the Dataflow job.
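As a quick verification from Cloud Shell, here is a sketch of a row count against that table, assuming the dataset and table names shown above and using __SERVICE_PROJECT_ID__ as a placeholder:

```sh
# Count the rows the Dataflow job loaded into the person table.
# SERVICE_PROJECT_ID is a placeholder for your service project ID.
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS rows_loaded FROM `SERVICE_PROJECT_ID.datalake.person`'
```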
Binary image files were moved into the images/ sub-folder; sizes are unchanged (144 KiB, 32 KiB, 72 KiB, 39 KiB, 68 KiB, 19 KiB, 67 KiB, 10 KiB).
@@ -47,7 +47,7 @@ output "command_01_gcs" {
output "command_02_dataflow" {
description = "Command to run Dataflow template impersonating the service account."
value = templatefile("${path.module}/dataflow.tftpl", {
value = templatefile("${path.module}/templates/dataflow.tftpl", {
    sa_orch_email = module.service-account-orch.email
    project_id = module.project.project_id
    region = var.region
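Since this output is a plain string rendered by `templatefile`, the generated command can be retrieved directly once the stack has been applied; a small sketch, assuming Terraform ≥ 0.14 for the `-raw` flag:

```sh
# Print the rendered Dataflow launch command from the applied stack;
# pipe it to a shell to execute it.
terraform output -raw command_02_dataflow | bash
```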
@@ -68,7 +68,7 @@ output "command_02_dataflow" {
output "command_03_bq" {
description = "BigQuery command to query imported data."
value = templatefile("${path.module}/bigquery.tftpl", {
value = templatefile("${path.module}/templates/bigquery.tftpl", {
    project_id = module.project.project_id
    bigquery_dataset = module.bigquery-dataset.dataset_id
    bigquery_table = module.bigquery-dataset.tables["person"].table_id
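The template files themselves are not part of this diff; purely to illustrate how `templatefile` interpolates the variables passed above, a hypothetical one-line `templates/bigquery.tftpl` could look like:

```sh
# Hypothetical templates/bigquery.tftpl -- illustrative only.
# ${project_id}, ${bigquery_dataset} and ${bigquery_table} are the
# template variables supplied by the templatefile() call above.
bq query --use_legacy_sql=false \
  'SELECT * FROM `${project_id}.${bigquery_dataset}.${bigquery_table}` LIMIT 10'
```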