Data Foundations: reorder first level README and folder structure (#251)

* reorder first level README and folder structure
* Link fix

Co-authored-by: Yoram Ben-Yaacov <benyaacov@google.com>

parent 99bfd4da98
commit 34dd1f565c

@@ -1,38 +1,50 @@

# Data Foundation Platform

## General

The goal of this example is to build a **robust and flexible** Data Foundation on GCP that provides **opinionated defaults**, while allowing customers to **build and scale out** additional data pipelines **quickly and reliably**.

The example is composed of three separate provisioning workflows, designed to be plugged together to create an end-to-end Data Foundation Platform that supports multiple data pipelines on top. The three workflows are represented in the diagram below.

- **[Environment Setup](./environment/)** *(once per environment)*
  - projects
  - VPC configuration
  - Composer environment and identity
  - shared buckets and datasets
- **[Data Source Setup](./datasource)** *(once per data source)*
  - landing and archive bucket
  - internal and external identities
  - domain specific datasets
- **[Pipeline Setup](./pipeline)** *(once per pipeline)*
  - pipeline-specific tables and views
  - pipeline code
  - Composer DAG

![Three Main Workflows](./img/three_main_workflows.png)
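
As a minimal sketch of how the three workflows chain together (assuming each linked folder is a standalone Terraform root module — which the structure suggests but this page does not state — with variables supplied via tfvars), bringing up a single environment, data source, and pipeline could look like:

```bash
# Sketch only: bring up one environment, one data source, and one pipeline.
# Assumes each folder is a Terraform root module with its required
# variables supplied via a terraform.tfvars file (not shown here).
cd environment      # once per environment
terraform init && terraform apply

cd ../datasource    # once per data source
terraform init && terraform apply

cd ../pipeline      # once per pipeline
terraform init && terraform apply
```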

The resulting GCP architecture is outlined in the diagram below.

![Target architecture](./datasource/diagram.png)

A demo pipeline is also part of this example: it can be built and run on top of the foundational infrastructure to quickly verify and test the setup.

## Prerequisites

In order to bring up this example, you will need:

- a folder or organization where the new projects will be created
- a billing account that will be associated with the new projects
- an identity (user or service account) with owner permissions on the folder or organization, and billing user permissions on the billing account

Since this example is intended for data infrastructure engineers, we expect the initial organization or folder, and a service account with owner privileges on it, to be pre-created and provided as variables.
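
As a sketch of that identity bootstrap (all IDs below are placeholders, and the exact invocations may vary by gcloud SDK version), the required grants could be applied as follows:

```bash
# Placeholder IDs; substitute your own folder, billing account, and identity.
export FOLDER_ID="1234567890"                  # numeric folder ID
export BILLING_ACCOUNT="ABCDEF-012345-6789AB"
export SA="tf-provisioner@my-seed-project.iam.gserviceaccount.com"

# Owner on the root folder (use `gcloud organizations add-iam-policy-binding`
# instead if the root node is an organization).
gcloud resource-manager folders add-iam-policy-binding "${FOLDER_ID}" \
  --member="serviceAccount:${SA}" --role="roles/owner"

# Billing user on the billing account (may require `gcloud beta` on older SDKs).
gcloud billing accounts add-iam-policy-binding "${BILLING_ACCOUNT}" \
  --member="serviceAccount:${SA}" --role="roles/billing.user"
```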

## Bringing up the platform

The end-to-end example is composed of two foundational steps, plus an optional step that can be repeated once per pipeline:

- [environment setup](./environment/)
- [data source setup](./datasource/)
- (Optional) [pipeline setup](./pipeline/)

The environment setup is designed to manage a single environment. Various strategies, like workspaces, branching, or even separate clones, can be used to support multiple environments; a workspace-based sketch follows, and a directory-per-environment walkthrough closes this page.
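
For instance, a workspace-based variant could look like this minimal sketch (assuming the environment stage is a plain Terraform root module; the per-environment tfvars file names are hypothetical):

```bash
cd environment

# One workspace per environment; each workspace keeps isolated state.
terraform workspace new dev
terraform workspace new prod

# Target one environment at a time with its own (hypothetical) tfvars.
terraform workspace select dev
terraform apply -var-file=dev.tfvars
```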

## TODO

| Description | Priority (1: High - 5: Low) | Status | Remarks |
|-------------|-----------------------------|:------:|---------|

@@ -4,7 +4,7 @@

Now that we have all the needed projects, we will create all the assets needed to store and process the data.

![Data Foundation - Phase 2](./diagram.png)

This example will create the following resources per project:

@@ -21,7 +21,7 @@ This example will create the following projects:

A master service account named `projects-editor-sa` will be created under the common services project and will be granted editor permissions on all the projects in scope.
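
Roughly, those grants amount to the following (a hand-run sketch of what the Terraform code automates; the project IDs are placeholders):

```bash
# Placeholders for the projects created earlier in this example.
export COMMON_PROJECT="my-common-services-project"
export TARGET_PROJECTS="my-landing-project my-transformation-project"

# Create the master service account in the common services project.
gcloud iam service-accounts create projects-editor-sa \
  --project="${COMMON_PROJECT}"

# Grant it editor on every project in scope.
for p in ${TARGET_PROJECTS}; do
  gcloud projects add-iam-policy-binding "${p}" \
    --member="serviceAccount:projects-editor-sa@${COMMON_PROJECT}.iam.gserviceaccount.com" \
    --role="roles/editor"
done
```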

![Data Foundation - Phase 1](./diagram.png)

## Running the example

@@ -1,27 +0,0 @@

# Managing Multiple Environments

Terraform is a great tool for provisioning immutable infrastructure. There are several ways to get Terraform to provision different environments from a single repo; here we use the most basic and naive method: state separation.

State separation signals more mature usage of Terraform, but with that additional maturity comes additional complexity. There are two primary methods to separate state between environments: directories and workspaces. This walkthrough uses the directory method.

For this example, assume three environments:

- Dev
- QA
- Prod

```bash
# Choose a working folder that will hold one clone per environment.
export data_platform_folder="dpm"

mkdir "${data_platform_folder}"
cd "${data_platform_folder}"

# One clone per environment; each keeps its own configuration (tfvars)
# and its own Terraform state.
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git dev
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git prod
git clone https://github.com/yorambenyaacov/cloud-foundation-fabric.git qa
```

Now you have a directory per environment, in which you can apply the needed configuration (tfvars files) and provision it independently.
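
As a next step, each clone gets its own configuration and is provisioned on its own. A sketch only: the variable names in the tfvars below are hypothetical and must be replaced with the variables the example actually declares.

```bash
# Hypothetical per-environment configuration; the variable names are
# illustrative and must match what the example actually declares.
cat > dev/terraform.tfvars <<'EOF'
billing_account_id = "ABCDEF-012345-6789AB"
root_node          = "folders/1234567890"
environment        = "dev"
EOF

# Provision dev from its own clone, with its own state.
cd dev
terraform init
terraform apply
```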