
# Data Platform Foundations

## General

The goal of this project is to build a robust and flexible data foundation on GCP that provides opinionated defaults, while allowing customers to build and scale out additional data pipelines quickly and reliably.

There are three provisioning workflows that together enable an end-to-end foundational data platform, with data pipelines on top of it. This is represented in the diagram below.

*Three Main Workflows*

## Target architecture

In this example we will create the infrastructure needed for the foundational build, then run a demo pipeline on top of it.

## Before you begin

Since this example is intended for data infrastructure engineers, we expect an initial organization or folder, and a service account with owner privileges on it, to be pre-created and provided as variables.

This example assumes that the following resources were already created and provided:

- Root node (organization or folder)
- Service account with owner permissions on the root node, used to apply the Terraform code
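The two pre-created resources above are passed to Terraform as input variables. A minimal sketch of how that wiring might look is below; the variable names (`root_node`, `terraform_sa`) and values are illustrative assumptions — check the example's `variables.tf` for the actual interface.

```hcl
# terraform.tfvars — illustrative sketch only; variable names are assumed,
# not taken from the module's actual interface.

# Root node the platform is created under: an organization or a folder.
root_node = "folders/123456789012"

# Pre-created service account with owner permissions on the root node,
# used by Terraform to apply this configuration.
terraform_sa = "data-platform-tf@my-project.iam.gserviceaccount.com"
```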

## Building the Platform

Building the platform is composed of two foundational steps and two optional steps.

## TODO list

| Description | Priority (1: High - 5: Low) | Status | Remarks |
|---|---|---|---|
| DLP best practices in the pipeline | 2 | Not Started | |
| KMS support (CMEK) | 2 | Not Started | |
| VPC-SC | 3 | Not Started | |
| Add Composer with a static DAG running the example | 3 | Not Started | |
| Integrate CI/CD composer data processing workflow framework | 3 | Not Started | |
| Schema changes, how to handle | 4 | Not Started | |
| Data lineage | 4 | Not Started | |
| Data quality checks | 4 | Not Started | |
| Shared-VPC | 5 | Not Started | |
| Logging & monitoring | TBD | Not Started | |
| Orchestration for ingestion pipeline (just in the README) | TBD | Not Started | |