Data Foundations: reorder first level README and folder structure (#251)

* reorder first level README and folder structure

* Link fix

Co-authored-by: Yoram Ben-Yaacov <benyaacov@google.com>
Ludovico Magnocavallo 2021-05-30 20:04:56 +02:00 committed by GitHub
parent 99bfd4da98
commit 34dd1f565c
50 changed files with 36 additions and 51 deletions

@@ -1,38 +1,50 @@
# Data Platform Foundations
# Data Foundation Platform
## General
The goal of this example is to build a robust and flexible Data Foundation on GCP, providing opinionated defaults while still allowing customers to quickly and reliably build and scale out additional data pipelines.
The goal of this project is to build a **robust and flexible** Data Foundation on GCP that provides **opinionated defaults**, while allowing customers to **build and scale** out additional data pipelines **quickly and reliably**.
The example is composed of three separate provisioning workflows, which are designed to be plugged together to create end-to-end Data Foundations that support multiple data pipelines on top.
There are three provisioning workflows to enable an end-to-end Foundational Data Platform along with Data Pipelines on top of it. This is represented in the diagram below.
- **[Environment Setup](./environment/)**
*(once per environment)*
- projects
- VPC configuration
- Composer environment and identity
- shared buckets and datasets
- **[Data Source Setup](./datasource)**
*(once per data source)*
- landing and archive bucket
- internal and external identities
- domain specific datasets
- **[Pipeline Setup](./pipeline)**
*(once per pipeline)*
- pipeline-specific tables and views
- pipeline code
- Composer DAG
![Three Main Workflows](./img/three_main_workflows.png)
The resulting GCP architecture is outlined in the diagram below.
![Target architecture](./datasource/diagram.png)
## Target architecture
A demo pipeline is also part of this example: it can be built and run on top of the foundational infrastructure to quickly verify or test the setup.
![Target architecture](./img/Data_Foundation-phase2.png)
## Prerequisites
In this example, we will create the infrastructure needed for the foundational build and run a demo pipeline.
In order to bring up this example, you will need:
## Before you begin
- a folder or organization where new projects will be created
- a billing account that will be associated to new projects
- an identity (user or service account) with owner permissions on the folder or org, and billing user permissions on the billing account
Since this example is intended for data infrastructure engineers, we expect that an initial organization/folder and a service account with owner privileges will be pre-created and provided as variables.
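As an illustration only, these inputs could be passed to Terraform through environment variables; the variable names below are assumptions, and the authoritative ones are defined in each stage's `variables.tf`:

```bash
# Hypothetical variable names -- check each stage's variables.tf for the real ones.
export TF_VAR_root_node="folders/1234567890"             # or organizations/1234567890
export TF_VAR_billing_account_id="ABCDEF-012345-6789AB"
# Credentials for the pre-created owner service account (key file path is a placeholder).
export GOOGLE_APPLICATION_CREDENTIALS="$HOME/df-owner-sa-key.json"
```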
## Bringing up the platform
This example assumes that the following resources were already created and provided:
The end-to-end example is composed of two foundational and 1-n optional steps:
- Root node (organization or folder)
- Service account with owner permissions on the root node, used to apply Terraform code
- [environment setup](./environment/)
- [data source setup](./datasource/)
- (Optional) [pipeline setup](./pipeline/)
## Building the Platform
The environment setup is designed to manage a single environment. Various strategies like workspaces, branching, or even separate clones can be used to support multiple environments.
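For illustration, the workspace-based approach might look roughly like the sketch below; the environment names and per-environment tfvars files are assumptions, not part of this example:

```bash
# Workspace-based state separation: one state per workspace inside a single clone.
cd environment
terraform init
terraform workspace new dev           # create (and switch to) the dev workspace
terraform workspace new prod
terraform workspace select dev
terraform apply -var-file=dev.tfvars  # hypothetical per-environment variables file
```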
Building the platform is composed of two foundational and two optional steps (a command-level sketch follows the list):
- [Foundations 1 - project creation](./infra/tf-phase1/README.md)
- [Foundations 2 - assets deployment](./infra/tf-phase2/README.md)
- [Optional - manual pipeline example](./data-pipeline/README.md)
- [Optional - managing multiple environments](./manageing_multiple_environments.md)
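A minimal sketch of the foundational flow, assuming the paths linked above and a hypothetical shared `terraform.tfvars` file:

```bash
# Phase 1: create the projects.
cd infra/tf-phase1
terraform init
terraform apply -var-file=../terraform.tfvars   # hypothetical shared variables file

# Phase 2: deploy the assets into those projects.
cd ../tf-phase2
terraform init
terraform apply -var-file=../terraform.tfvars
```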
## TODO list
## TODO
| Description | Priority (1: High - 5: Low) | Status | Remarks |
|-------------|----------|:------:|---------|

@@ -4,7 +4,7 @@
Now that we have all the needed projects, we will create the assets required to store and process the data.
![Data Foundation - Phase 2](../../img/Data_Foundation-phase2.png)
![Data Foundation - Phase 2](./diagram.png)
This example will create the following resources per project:

@@ -21,7 +21,7 @@ This example will create the next projects:
A master service account named `projects-editor-sa` will be created under the common services project and granted editor permissions on all the projects in scope.
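For reference, the grant performed by the Terraform code is roughly equivalent to the following gcloud command, shown here with placeholder project IDs:

```bash
# Roughly what the Terraform code grants, repeated for every project in scope.
gcloud projects add-iam-policy-binding my-landing-project \
  --member="serviceAccount:projects-editor-sa@my-services-project.iam.gserviceaccount.com" \
  --role="roles/editor"
```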
![Data Foundation - Phase 1](../../img/Data_Foundation-phase1.png)
![Data Foundation - Phase 1](./diagram.png)
## Running the example

@ -1,27 +0,0 @@
# Managing Multiple Environments
Terraform is a great tool for provisioning immutable infrastructure.
There are several ways to get Terraform to provision different environments from one repo. Here I'm going to use the most basic and naive method: state separation.
State separation signals more mature usage of Terraform, but with additional maturity comes additional complexity.
There are two primary methods to separate state between environments: directories and workspaces. I'm going to use the directory method.
For this example I'll assume we have three environments:
- Dev
- QA
- Prod
```bash
export data_platform_folder="dpm"
mkdir ${data_platform_folder}
cd ${data_platform_folder}
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git dev
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git prod
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git qa
```
You now have a directory per environment, in which you can set the required configuration (tfvars files) and provision it.
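A sketch of provisioning one environment from its clone; the stage path and tfvars handling are assumptions and should be adapted to the actual repository layout:

```bash
# Provision dev from its own clone; repeat the same steps in the qa and prod clones.
cd dev/data-solutions/data-platform-foundations/environment   # hypothetical stage path
# edit terraform.tfvars with dev-specific values (folder, billing account, ...)
terraform init
terraform apply
```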