Data Playground

This blueprint creates a minimum viable architecture for a data experimentation project: the required APIs enabled, a VPC and firewall rules in place, a BigQuery dataset, a GCS bucket, and a Vertex AI notebook to get started.

This is the high-level diagram:

High-level diagram

Managed resources and services

This sample creates several distinct groups of resources:

  • project
  • networking
  • Vertex AI Workbench notebook configured with a private IP and using a dedicated Service Account
  • One GCS bucket
  • One BigQuery dataset

Virtual Private Cloud (VPC) design

As is often the case in real-world configurations, this blueprint accepts an existing Shared VPC as input via the network_config variable. Make sure that container.googleapis.com, notebooks.googleapis.com and servicenetworking.googleapis.com are enabled in the VPC host project.

If the network_config variable is not provided, a VPC will be created in the project with preconfigured values.
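
When a Shared VPC is used, the three APIs listed above could be enabled on the host project with a short Terraform snippet. This is a minimal sketch and not part of the blueprint: the variable name host_project_id and the resource name host_apis are assumptions made for illustration.

```hcl
# Hypothetical variable: id of the Shared VPC host project.
variable "host_project_id" {
  description = "Shared VPC host project id (assumed name, not defined by the blueprint)."
  type        = string
}

# Enable the APIs the blueprint expects on the host project.
resource "google_project_service" "host_apis" {
  for_each = toset([
    "container.googleapis.com",
    "notebooks.googleapis.com",
    "servicenetworking.googleapis.com",
  ])

  project = var.host_project_id
  service = each.value

  # Keep the API enabled even if this resource is later destroyed.
  disable_on_destroy = false
}
```

Setting disable_on_destroy to false is a conservative choice here, since other workloads in the host project may depend on the same APIs.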

Deploy your environment

We assume the identity running the following steps has one of the following roles:

  • resourcemanager.projectCreator, if a new project will be created.
  • owner on the project, if you use an existing project.
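
If these role bindings are themselves managed with Terraform, they might look like the sketch below. Only one of the two resources is needed, depending on which case applies. The member address, the folder id, and the project id are placeholder values, and the resource names are hypothetical.

```hcl
# Placeholder identity: replace with the user or service account running Terraform.
locals {
  deployer = "user:deployer@example.com"
}

# Case 1: a new project will be created under a folder.
resource "google_folder_iam_member" "project_creator" {
  folder = "folders/467898377"
  role   = "roles/resourcemanager.projectCreator"
  member = local.deployer
}

# Case 2: an existing project is reused.
resource "google_project_iam_member" "owner" {
  project = "data-001"
  role    = "roles/owner"
  member  = local.deployer
}
```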

Run Terraform init:

$ terraform init

Configure the Terraform variables in your terraform.tfvars file. You need to specify at least the following variables:

prefix     = "prefix"
project_id = "data-001"
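
For the case where the project does not exist yet, a fuller terraform.tfvars could look like the following sketch. The project_create attributes mirror the ones used in the Test section of this README; the billing account and folder ids are placeholders, and the last two lines only restate the documented defaults.

```hcl
# terraform.tfvars (illustrative values)
prefix     = "prefix"
project_id = "data-001"

# Set project_create only when a new project should be created;
# leave it unset (null) to reuse the existing project named by project_id.
project_create = {
  billing_account_id = "123456-123456-123456"
  parent             = "folders/467898377"
}

# Optional overrides; these values are the documented defaults.
location = "EU"
region   = "europe-west1"
```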

You can now run:

$ terraform apply

You can now connect to the Vertex AI notebook to perform your data analysis.

Variables

| name | description | type | required | default |
|---|---|---|---|---|
| prefix | Prefix used for resource names. | string | ✓ | |
| project_id | Project id, references existing project if project_create is null. | string | ✓ | |
| location | The location where resources will be deployed. | string | | "EU" |
| network_config | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | object({…}) | | null |
| project_create | Provide values if project creation is needed, uses existing project if null. Parent format: folders/folder_id or organizations/org_id. | object({…}) | | null |
| region | The region where resources will be deployed. | string | | "europe-west1" |

Outputs

| name | description | sensitive |
|---|---|---|
| bucket | GCS Bucket URL. | |
| dataset | BigQuery dataset. | |
| notebook | Vertex AI notebook details. | |
| project | Project id. | |
| vpc | VPC Network. | |
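
When the blueprint is called as a module, the outputs listed above can be re-exported or passed to other resources. The following is a minimal sketch: the module label playground and the output names playground_bucket and playground_project are hypothetical, and the source path follows the one used in the Test section.

```hcl
module "playground" {
  source     = "./fabric/blueprints/data-solutions/data-playground"
  project_id = "data-001"
  prefix     = "prefix"
}

# Re-export the bucket URL produced by the blueprint.
output "playground_bucket" {
  description = "GCS bucket URL from the data playground blueprint."
  value       = module.playground.bucket
}

# Re-export the project id.
output "playground_project" {
  description = "Project id from the data playground blueprint."
  value       = module.playground.project
}
```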

Test

module "test" {
  source     = "./fabric/blueprints/data-solutions/data-playground"
  project_id = "sampleproject"
  prefix     = "tst"
  project_create = {
    billing_account_id = "123456-123456-123456",
    parent             = "folders/467898377"
  }
}
# tftest modules=8 resources=39