cloud-foundation-fabric/data-solutions/data-platform-foundations
Yoram Ben-Yaacov c53b755684 Adding data-platform-foundations code 2021-05-18 19:30:21 +03:00

Data Platform Foundations

General

The goal of this project is to build a robust and flexible Data Foundation on GCP that provides opinionated defaults while allowing customers to build and scale out additional data pipelines quickly and reliably.

Three provisioning workflows enable an end-to-end foundational data platform, along with the data pipelines that run on top of it. They are shown in the diagram below.

[Diagram: Three Main Workflows]

Target architecture

[Diagram: Target architecture]

In this example we will create the infrastructure needed for the foundational build and run a demo pipeline.

Before you begin

Since this example is intended for data infrastructure engineers, we expect that an organization or folder, and a service account with owner privileges on it, have been pre-created and are provided as variables.

This example assumes the following items have already been created and provided:

  • Organization / folder
  • Terraform runner Service account with owner permissions on the above organization / folder
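
As a minimal sketch of how these two prerequisites might be passed to Terraform, the fragment below declares them as input variables and has the Google provider impersonate the runner service account. The variable names `root_node` and `terraform_runner_sa` are hypothetical and are not taken from this repository; check the variables defined under `infra` for the names the code actually expects.

```hcl
# Hypothetical variable names, for illustration only.

# Parent resource under which the platform projects are created.
variable "root_node" {
  description = "Parent organization or folder, as 'organizations/ORG_ID' or 'folders/FOLDER_ID'."
  type        = string

  validation {
    condition     = can(regex("^(organizations|folders)/[0-9]+$", var.root_node))
    error_message = "The root_node value must be 'organizations/ORG_ID' or 'folders/FOLDER_ID'."
  }
}

# Pre-created service account with owner permissions on the parent above.
variable "terraform_runner_sa" {
  description = "Email of the pre-created Terraform runner service account."
  type        = string
}

# Run all Google provider calls as the Terraform runner service account.
provider "google" {
  impersonate_service_account = var.terraform_runner_sa
}
```

With this approach, the values would typically be supplied through a `terraform.tfvars` file or `-var` flags, so that no identifiers are hard-coded in the configuration.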

Building the Platform

Building the platform consists of four steps:

  1. (Optional) Managed resources and services
  2. Data Platform Foundations - Phase 1: Building the projects
  3. Data Platform Foundations - Phase 2: Deploy assets
  4. (Optional) Manual pipeline Example

TODO list

| Description | Priority (1: High - 5: Low) | Status | Remarks |
|---|---|---|---|
| DLP best practices in the pipeline | 2 | Not Started | |
| KMS support (CMEK) | 2 | Not Started | |
| VPC-SC | 3 | Not Started | |
| Add Composer with a static DAG running the example | 3 | Not Started | |
| Integrate CI/CD composer data processing workflow framework | 3 | Not Started | |
| Schema changes, how to handle | 4 | Not Started | |
| Data lineage | 4 | Not Started | |
| Data quality checks | 4 | Not Started | |
| Shared-VPC | 5 | Not Started | |
| Logging & monitoring | TBD | Not Started | |
| Orchestration for ingestion pipeline (just in the readme) | TBD | Not Started | |