Data Foundation Platform

The goal of this example is to build a robust and flexible Data Foundation on GCP that provides opinionated defaults, while still allowing customers to quickly and reliably build and scale out additional data pipelines.

The example is composed of three separate provisioning workflows, designed to be plugged together to create an end-to-end Data Foundation that supports multiple data pipelines on top (see the sketch after this list):

  1. Environment Setup (once per environment)
    • projects
    • VPC configuration
    • Composer environment and identity
    • shared buckets and datasets
  2. Data Source Setup (once per data source)
    • landing and archive bucket
    • internal and external identities
    • domain specific datasets
  3. Pipeline Setup (once per pipeline)
    • pipeline-specific tables and views
    • pipeline code
    • Composer DAG
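
As a rough sketch of how these stages plug together, the commands below run the first two stages and feed an output of the environment stage into the data source stage. The output name and variable name (`project_id`) are illustrative assumptions, not the modules' actual interfaces.

```bash
# Hedged sketch: chaining the provisioning stages. The stage
# directories match this example; the output and variable names
# used below are assumptions.
cd 01-environment
terraform init
terraform apply

# Feed an identifier created by the environment stage into the next
# stage (the actual output/variable names may differ).
project_id=$(terraform output -raw project_id)

cd ../02-resources
terraform init
terraform apply -var "project_id=${project_id}"
```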

The resulting GCP architecture is outlined in the following diagram.

[Target architecture diagram]

A demo pipeline is also part of this example: it can be built and run on top of the foundational infrastructure to quickly verify or test the setup.

Prerequisites

To bring up this example, you will need:

  • a folder or organization where new projects will be created
  • a billing account that will be associated to new projects
  • an identity (user or service account) with owner permissions on the folder or organization, and billing user permissions on the billing account (the corresponding grants are sketched below)
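
As a minimal sketch, assuming a user identity and the gcloud CLI, the grants above could be assigned as follows; FOLDER_ID, BILLING_ACCOUNT_ID, and the member are placeholders.

```bash
# Illustrative IAM grants for the prerequisites; replace the
# placeholders with your folder ID, billing account ID, and identity.
gcloud resource-manager folders add-iam-policy-binding FOLDER_ID \
  --member="user:you@example.com" \
  --role="roles/owner"

gcloud beta billing accounts add-iam-policy-binding BILLING_ACCOUNT_ID \
  --member="user:you@example.com" \
  --role="roles/billing.user"
```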

Bringing up the platform

The end-to-end example is composed of two foundational steps and one optional step (a sample run is sketched after this list):

  1. Environment setup
  2. Data source setup
  3. (Optional) Pipeline setup
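
A sample end-to-end run might look like the following. This is a sketch, assuming each stage directory already carries the variable values it needs (for example via tfvars files), not a verbatim transcript of this example's setup.

```bash
# Illustrative end-to-end run: apply each stage in order. The optional
# pipeline stage can be repeated for every additional pipeline.
for stage in 01-environment 02-resources 03-pipeline; do
  (cd "$stage" && terraform init && terraform apply)
done
```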

The environment setup is designed to manage a single environment. Strategies such as Terraform workspaces, branching, or separate clones can be used to support multiple environments; a workspace-based approach is sketched below.
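
For example, a workspace-based approach might look like this; the per-environment tfvars file names are assumptions.

```bash
# Illustrative multi-environment setup via Terraform workspaces;
# per-environment settings live in separate tfvars files (names assumed).
cd 01-environment

terraform workspace new dev || terraform workspace select dev
terraform apply -var-file=dev.tfvars

terraform workspace new prod || terraform workspace select prod
terraform apply -var-file=prod.tfvars
```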

TODO

Description | Priority (1: High - 5: Low) | Status
DLP best practices in the pipeline | 2 | Not Started
Add Composer with a static DAG running the example | 3 | Not Started
Integrate CI/CD Composer data processing workflow framework | 3 | Not Started
How to handle schema changes | 4 | Not Started
Data lineage | 4 | Not Started
Data quality checks | 4 | Not Started
Shared VPC | 5 | Not Started
Logging & monitoring | TBD | Not Started
Orchestration for the ingestion pipeline (just in the README) | TBD | Not Started