Update README.md

Ludovico Magnocavallo 2022-01-03 09:05:46 +01:00 committed by GitHub
parent 47acc03188
commit 910f8be666
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 4 additions and 4 deletions

@@ -1,12 +1,12 @@
 # Cloud Storage to Bigquery with Cloud Dataflow with least privileges
-This example creates the infrastructure needed to run a [Cloud Dataflow](https://cloud.google.com/dataflow) pipeline to import data from [GCS](https://cloud.google.com/storage) to [Bigquery](https://cloud.google.com/bigquery). The example will create different Service Account with least privileges on resources. To run the pipeline, users listed in `data_eng_users` or `data_eng_groups` can impersonate all those Service Accounts.
+This example creates the infrastructure needed to run a [Cloud Dataflow](https://cloud.google.com/dataflow) pipeline to import data from [GCS](https://cloud.google.com/storage) to [Bigquery](https://cloud.google.com/bigquery). The example will create different service accounts with least privileges on resources. To run the pipeline, users listed in `data_eng_users` or `data_eng_groups` can impersonate all those service accounts.
 The solution will use:
 - internal IPs for GCE and Dataflow instances
-- Cloud NAT to let resources comunicate to the Internet, run system updates, and install packages
-- relay on Google Service Account impersonification to better split roles
-- Service Account with least privilege on each resources
+- Cloud NAT to let resources egress to the Internet, to run system updates and install packages
+- rely on impersonation to avoid the use of service account keys
+- service accounts with least privilege on each resources
 The example is designed to match real-world use cases with a minimum amount of resources. It can be used as a starting point for more complex scenarios.