Data Foundations: reorder first level README and folder structure (#251)

* reorder first level README and folder structure
* Link fix

Co-authored-by: Yoram Ben-Yaacov <benyaacov@google.com>

parent 99bfd4da98
commit 34dd1f565c

@@ -1,38 +1,50 @@

# Data Foundation Platform

## General

The goal of this example is to build a **robust and flexible** Data Foundation on GCP that provides **opinionated defaults**, while allowing customers to **build and scale out** additional data pipelines **quickly and reliably**.

The example is composed of three separate provisioning workflows, designed to be plugged together to create an end-to-end Data Foundation Platform that supports multiple data pipelines on top. The three workflows are represented in the diagram below.

- **[Environment Setup](./environment/)** *(once per environment)*
  - projects
  - VPC configuration
  - Composer environment and identity
  - shared buckets and datasets
- **[Data Source Setup](./datasource)** *(once per data source)*
  - landing and archive bucket
  - internal and external identities
  - domain specific datasets
- **[Pipeline Setup](./pipeline)** *(once per pipeline)*
  - pipeline-specific tables and views
  - pipeline code
  - Composer DAG

![Three Main Workflows](./img/three_main_workflows.png)
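
As a minimal sketch of how the three workflows chain together (assuming each linked folder is a standalone Terraform root module — which the structure suggests but this page does not state — with variables supplied via tfvars), bringing up a single environment, data source, and pipeline could look like:

```bash
# Sketch only: bring up one environment, one data source, and one pipeline.
# Assumes each folder is a Terraform root module with its required
# variables supplied via a terraform.tfvars file (not shown here).
cd environment      # once per environment
terraform init && terraform apply

cd ../datasource    # once per data source
terraform init && terraform apply

cd ../pipeline      # once per pipeline
terraform init && terraform apply
```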

The resulting GCP architecture is outlined in the diagram below.

![Target architecture](./datasource/diagram.png)

A demo pipeline is also part of this example: it can be built and run on top of the foundational infrastructure to quickly verify and test the setup.

## Prerequisites

In order to bring up this example, you will need:

- a folder or organization where the new projects will be created
- a billing account that will be associated with the new projects
- an identity (user or service account) with owner permissions on the folder or organization, and billing user permissions on the billing account

Since this example is intended for data infrastructure engineers, we expect the initial organization or folder, and a service account with owner privileges on it, to be pre-created and provided as variables.
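
As a sketch of that identity bootstrap (all IDs below are placeholders, and the exact invocations may vary by gcloud SDK version), the required grants could be applied as follows:

```bash
# Placeholder IDs; substitute your own folder, billing account, and identity.
export FOLDER_ID="1234567890"                  # numeric folder ID
export BILLING_ACCOUNT="ABCDEF-012345-6789AB"
export SA="tf-provisioner@my-seed-project.iam.gserviceaccount.com"

# Owner on the root folder (use `gcloud organizations add-iam-policy-binding`
# instead if the root node is an organization).
gcloud resource-manager folders add-iam-policy-binding "${FOLDER_ID}" \
  --member="serviceAccount:${SA}" --role="roles/owner"

# Billing user on the billing account (may require `gcloud beta` on older SDKs).
gcloud billing accounts add-iam-policy-binding "${BILLING_ACCOUNT}" \
  --member="serviceAccount:${SA}" --role="roles/billing.user"
```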

## Bringing up the platform

The end-to-end example is composed of two foundational steps, plus an optional step that can be repeated once per pipeline:

- [environment setup](./environment/)
- [data source setup](./datasource/)
- (Optional) [pipeline setup](./pipeline/)

The environment setup is designed to manage a single environment. Various strategies, like workspaces, branching, or even separate clones, can be used to support multiple environments; a workspace-based sketch follows, and a directory-per-environment walkthrough closes this page.
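
For instance, a workspace-based variant could look like this minimal sketch (assuming the environment stage is a plain Terraform root module; the per-environment tfvars file names are hypothetical):

```bash
cd environment

# One workspace per environment; each workspace keeps isolated state.
terraform workspace new dev
terraform workspace new prod

# Target one environment at a time with its own (hypothetical) tfvars.
terraform workspace select dev
terraform apply -var-file=dev.tfvars
```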

## TODO

| Description | Priority (1: High - 5: Low) | Status | Remarks |
|-------------|-----------------------------|:------:|---------|

@@ -4,7 +4,7 @@

Now that we have all the needed projects, we will create all the assets needed to store and process the data.

![Data Foundation - Phase 2](./diagram.png)

This example will create the following resources per project:

@@ -21,7 +21,7 @@ This example will create the following projects:

A master service account named `projects-editor-sa` will be created under the common services project and will be granted editor permissions on all the projects in scope.
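
Roughly, those grants amount to the following (a hand-run sketch of what the Terraform code automates; the project IDs are placeholders):

```bash
# Placeholders for the projects created earlier in this example.
export COMMON_PROJECT="my-common-services-project"
export TARGET_PROJECTS="my-landing-project my-transformation-project"

# Create the master service account in the common services project.
gcloud iam service-accounts create projects-editor-sa \
  --project="${COMMON_PROJECT}"

# Grant it editor on every project in scope.
for p in ${TARGET_PROJECTS}; do
  gcloud projects add-iam-policy-binding "${p}" \
    --member="serviceAccount:projects-editor-sa@${COMMON_PROJECT}.iam.gserviceaccount.com" \
    --role="roles/editor"
done
```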

![Data Foundation - Phase 1](./diagram.png)

## Running the example

@@ -1,27 +0,0 @@

# Managing Multiple Environments

Terraform is a great tool for provisioning immutable infrastructure. There are several ways to get Terraform to provision different environments from a single repo; here we use the most basic and naive method: state separation.

State separation signals more mature usage of Terraform, but with that additional maturity comes additional complexity. There are two primary methods to separate state between environments: directories and workspaces. This walkthrough uses the directory method.

For this example, assume three environments:

- Dev
- QA
- Prod

```bash
# Choose a working folder that will hold one clone per environment.
export data_platform_folder="dpm"

mkdir "${data_platform_folder}"
cd "${data_platform_folder}"

# One clone per environment; each keeps its own configuration (tfvars)
# and its own Terraform state.
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git dev
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git prod
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git qa
```

Now you have a directory per environment, in which you can apply the needed configuration (tfvars files) and provision it independently.
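
As a next step, each clone gets its own configuration and is provisioned on its own. A sketch only: the variable names in the tfvars below are hypothetical and must be replaced with the variables the example actually declares.

```bash
# Hypothetical per-environment configuration; the variable names are
# illustrative and must match what the example actually declares.
cat > dev/terraform.tfvars <<'EOF'
billing_account_id = "ABCDEF-012345-6789AB"
root_node          = "folders/1234567890"
environment        = "dev"
EOF

# Provision dev from its own clone, with its own state.
cd dev
terraform init
terraform apply
```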