Data Foundations: reorder first level README and folder structure (#251)

* reorder first level README and folder structure

* Link fix

Co-authored-by: Yoram Ben-Yaacov <benyaacov@google.com>
Ludovico Magnocavallo 2021-05-30 20:04:56 +02:00 committed by GitHub
parent 99bfd4da98
commit 34dd1f565c
50 changed files with 36 additions and 51 deletions

@@ -1,38 +1,50 @@
# Data Platform Foundations
# Data Foundation Platform
## General
The goal of this example is to build a robust and flexible Data Foundation on GCP, providing opinionated defaults while still allowing customers to quickly and reliably build and scale out additional data pipelines.
The goal of this project is to build a **robust and flexible** Data Foundation on GCP that provides **opinionated defaults**, while allowing customers to **build and scale** out additional data pipelines **quickly and reliably**.
The example is composed of three separate provisioning workflows, which are designed to be plugged together to create end-to-end Data Foundations that support multiple data pipelines on top.
There are three provisioning workflows to enable an end-to-end Foundational Data Platform along with Data Pipelines on top of it. This is represented in the diagram below.
- **[Environment Setup](./environment/)**
*(once per environment)*
- projects
- VPC configuration
- Composer environment and identity
- shared buckets and datasets
- **[Data Source Setup](./datasource)**
*(once per data source)*
- landing and archive bucket
- internal and external identities
- domain specific datasets
- **[Pipeline Setup](./pipeline)**
*(once per pipeline)*
- pipeline-specific tables and views
- pipeline code
- Composer DAG
![Three Main Workflows](./img/three_main_workflows.png)
The resulting GCP architecture is outlined in the diagram below.
![Target architecture](./datasource/diagram.png)
## Target architecture
A demo pipeline is also part of this example: it can be built and run on top of the foundational infrastructure to quickly verify or test the setup.
![Target architecture](./img/Data_Foundation-phase2.png)
## Prerequisites
In this example, we will create the infrastructure needed for the foundational build and run a demo pipeline.
In order to bring up this example, you will need:
## Before you begin
- a folder or organization where new projects will be created
- a billing account that will be associated to new projects
- an identity (user or service account) with owner permissions on the folder or org, and billing user permissions on the billing account
Since this example is intended for data infrastructure engineers, we expect that an initial organization/folder and a service account with owner privileges will be pre-created and provided as variables.
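As an illustration only, these inputs could be passed to Terraform through environment variables; the variable names below are assumptions, and the authoritative ones are defined in each stage's `variables.tf`:

```bash
# Hypothetical variable names -- check each stage's variables.tf for the real ones.
export TF_VAR_root_node="folders/1234567890"             # or organizations/1234567890
export TF_VAR_billing_account_id="ABCDEF-012345-6789AB"
# Credentials for the pre-created owner service account (key file path is a placeholder).
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/df-owner-sa-key.json"
```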
## Bringing up the platform
This example assumes that the following resources were already created and provided:
The end-to-end example is composed of two foundational and 1-n optional steps:
- Root node (organization or folder)
- Service account with owner permissions on the root node, used to apply Terraform code
- [environment setup](./environment/)
- [data source setup](./datasource/)
- (Optional) [pipeline setup](./pipeline/)
## Building the Platform
The environment setup is designed to manage a single environment. Various strategies like workspaces, branching, or even separate clones can be used to support multiple environments.
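For illustration, the workspace-based approach might look roughly like the sketch below; the environment names and per-environment tfvars files are assumptions, not part of this example:

```bash
# Workspace-based state separation: one state per workspace inside a single clone.
cd environment
terraform init
terraform workspace new dev           # create (and switch to) the dev workspace
terraform workspace new prod
terraform workspace select dev
terraform apply -var-file=dev.tfvars  # hypothetical per-environment variables file
```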
Building the platform is composed of two foundational and two optional steps (a command-level sketch follows the list):
- [Foundations 1 - project creation](./infra/tf-phase1/README.md)
- [Foundations 2 - assets deployment](./infra/tf-phase2/README.md)
- [Optional - manual pipeline example](./data-pipeline/README.md)
- [Optional - managing multiple environments](./manageing_multiple_environments.md)
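A minimal sketch of the foundational flow, assuming the paths linked above and a hypothetical shared `terraform.tfvars` file:

```bash
# Phase 1: create the projects.
cd infra/tf-phase1
terraform init
terraform apply -var-file=../terraform.tfvars   # hypothetical shared variables file

# Phase 2: deploy the assets into those projects.
cd ../tf-phase2
terraform init
terraform apply -var-file=../terraform.tfvars
```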
## TODO list
## TODO
| Description | Priority (1: High - 5: Low) | Status | Remarks |
|-------------|----------|:------:|---------|

@@ -4,7 +4,7 @@
Now that we have all the needed projects, we will create the assets required to store and process the data.
![Data Foundation - Phase 2](../../img/Data_Foundation-phase2.png)
![Data Foundation - Phase 2](./diagram.png)
This example will create the following resources per project:

@@ -21,7 +21,7 @@ This example will create the next projects:
A master service account named `projects-editor-sa` will be created under the common services project and granted editor permissions on all the projects in scope.
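For reference, the grant performed by the Terraform code is roughly equivalent to the following gcloud command, shown here with placeholder project IDs:

```bash
# Roughly what the Terraform code grants, repeated for every project in scope.
gcloud projects add-iam-policy-binding my-landing-project \
  --member="serviceAccount:projects-editor-sa@my-services-project.iam.gserviceaccount.com" \
  --role="roles/editor"
```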
![Data Foundation - Phase 1](../../img/Data_Foundation-phase1.png)
![Data Foundation - Phase 1](./diagram.png)
## Running the example

@ -1,27 +0,0 @@
# Managing Multiple Environments
Terraform is a great tool for provisioning immutable infrastructure.
There are several ways to get Terraform to provision different environments from one repo. Here I'm going to use the most basic and naive method: state separation.
State separation signals more mature usage of Terraform, but with additional maturity comes additional complexity.
There are two primary methods to separate state between environments: directories and workspaces. I'm going to use the directory method.
For this example I'll assume we have three environments:
- Dev
- QA
- Prod
```bash
export data_platform_folder="dpm"
mkdir ${data_platform_folder}
cd ${data_platform_folder}
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git dev
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git prod
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git qa
```
You now have a directory per environment, in which you can set the required configuration (tfvars files) and provision it.
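A sketch of provisioning one environment from its clone; the stage path and tfvars handling are assumptions and should be adapted to the actual repository layout:

```bash
# Provision dev from its own clone; repeat the same steps in the qa and prod clones.
cd dev/data-solutions/data-platform-foundations/environment   # hypothetical stage path
# edit terraform.tfvars with dev-specific values (folder, billing account, ...)
terraform init
terraform apply
```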