Improve Data Playground example (#738)
* First commit * Fix README * Improve READMEs * Implement PR comments. * Fix Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
This commit is contained in:
parent
c3e7d98d09
commit
c0e17f4732
|
@ -36,6 +36,6 @@ This [example](./cloudsql-multiregion/) creates a [Cloud SQL instance](https://c
|
|||
|
||||
### Data Playground starter with Cloud Vertex AI Notebook and GCS
|
||||
|
||||
<a href="./data-playground/" title="Data Playground starter with Cloud Vertex AI Notebook and GCS"><img src="./data-playground/diagram.png" align="left" width="280px"></a>
|
||||
This [example](./data-playground/) creates a [Vertex AI Notebook](https://cloud.google.com/vertex-ai/docs/workbench/introduction) running under a VPC network and a starter GCS bucket to store inputs and outputs of data experiments.
|
||||
<a href="./data-playground/" title="Data Playground project with Cloud Vertex AI Notebook, BigQuery and GCS"><img src="./data-playground/diagram.png" align="left" width="280px"></a>
|
||||
This [example](./data-playground/) creates a [Vertex AI Notebook](https://cloud.google.com/vertex-ai/docs/workbench/introduction) running on a VPC with a private IP and a dedicated Service Account. A GCS bucket and a BigQuery dataset are created to store inputs and outputs of data experiments.
|
||||
<br clear="left">
|
|
@ -1,6 +1,6 @@
|
|||
# Data Playground
|
||||
|
||||
This example creates a minimum viable template for a data experimentation project with the needed APIs enabled, basic VPC and Firewall set in place, GCS bucket and an AI notebook to get started.
|
||||
This example creates a minimum viable architecture for a data experimentation project with the needed APIs enabled, VPC and Firewall set in place, BigQuesy dataset, GCS bucket and an AI notebook to get started.
|
||||
|
||||
This is the high level diagram:
|
||||
|
||||
|
@ -10,34 +10,58 @@ This is the high level diagram:
|
|||
|
||||
This sample creates several distinct groups of resources:
|
||||
|
||||
- projects
|
||||
- Service Project configured for GCE instances and GCS buckets
|
||||
- project
|
||||
- networking
|
||||
- VPC network
|
||||
- One default subnet
|
||||
- VPC network with a default subnet and CloudNat
|
||||
- Firewall rules for [SSH access via IAP](https://cloud.google.com/iap/docs/using-tcp-forwarding) and open communication within the VPC
|
||||
- Vertex AI notebook
|
||||
- One Jupyter lab notebook instance with public access
|
||||
- GCS
|
||||
- One bucket initial bucket
|
||||
- Vertex AI Workbench notebook configured with a private IP and using a dedicated Service Account
|
||||
- One GCS bucket
|
||||
- One BigQuery dataset
|
||||
|
||||
## Deploy your enviroment
|
||||
We assume the identiy running the following steps has the following role:
|
||||
|
||||
- resourcemanager.projectCreator in case a new project will be created.
|
||||
- owner on the project in case you use an existing project.
|
||||
|
||||
Run Terraform init:
|
||||
```
|
||||
$ terraform init
|
||||
```
|
||||
|
||||
Configure the Terraform variable in your terraform.tfvars file. You need to spefify at least the following variables:
|
||||
```
|
||||
prefix = "prefix"
|
||||
project_id = "data-001"
|
||||
```
|
||||
|
||||
You can run now:
|
||||
```
|
||||
$ terraform apply
|
||||
```
|
||||
|
||||
You can now connect to the Vertex AI notbook to perform your data analysy.
|
||||
<!-- BEGIN TFDOC -->
|
||||
|
||||
## Variables
|
||||
| name | description | type | required | default |
|
||||
| ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------- | ----------- | -------- | ------------ |
|
||||
| project\_id | Project id, references existing project if \`project\_create\` is null. | string | ✓ | |
|
||||
| location | The location where resources will be deployed | string | | europe |
|
||||
| region | The region where resources will be deployed. | string | | europe-west1 |
|
||||
| project\_create | Provide values if project creation is needed, uses existing project if null. Parent format: folders/folder\_id or organizations/org\_id | object({…}) | | null |
|
||||
| prefix | Unique prefix used for resource names. Not used for project if 'project\_create' is null. | string | | dp |
|
||||
| service\_encryption\_keys | Cloud KMS to use to encrypt different services. Key location should match service region. | object({…}) | | null |
|
||||
| vpc\_config | Parameters to create a simple VPC for the Data Playground | object({…}) | | {...} |
|
||||
|
||||
| name | description | type | required | default |
|
||||
|---|---|:---:|:---:|:---:|
|
||||
| [prefix](variables.tf#L36) | Unique prefix used for resource names. Not used for project if 'project_create' is null. | <code>string</code> | ✓ | |
|
||||
| [project_id](variables.tf#L22) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | |
|
||||
| [location](variables.tf#L16) | The location where resources will be deployed. | <code>string</code> | | <code>"EU"</code> |
|
||||
| [project_create](variables.tf#L27) | Provide values if project creation is needed, uses existing project if null. Parent format: folders/folder_id or organizations/org_id | <code title="object({ billing_account_id = string parent = string })">object({…})</code> | | <code>null</code> |
|
||||
| [region](variables.tf#L41) | The region where resources will be deployed. | <code>string</code> | | <code>"europe-west1"</code> |
|
||||
| [vpc_config](variables.tf#L57) | Parameters to create a VPC. | <code title="object({ ip_cidr_range = string })">object({…})</code> | | <code title="{ ip_cidr_range = "10.0.0.0/20" }">{…}</code> |
|
||||
|
||||
## Outputs
|
||||
| Name | Description |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------- |
|
||||
| bucket | GCS Bucket URL. |
|
||||
| project | Project id |
|
||||
| vpc | VPC Network name |
|
||||
| notebook | Vertex AI notebook name |
|
||||
|
||||
| name | description | sensitive |
|
||||
|---|---|:---:|
|
||||
| [bucket](outputs.tf#L15) | GCS Bucket URL. | |
|
||||
| [dataset](outputs.tf#L20) | GCS Bucket URL. | |
|
||||
| [notebook](outputs.tf#L25) | Vertex AI notebook details. | |
|
||||
| [project](outputs.tf#L33) | Project id | |
|
||||
| [vpc](outputs.tf#L38) | VPC Network | |
|
||||
|
||||
<!-- END TFDOC -->
|
||||
|
|
Binary file not shown.
Before Width: | Height: | Size: 27 KiB After Width: | Height: | Size: 18 KiB |
|
@ -27,25 +27,33 @@ module "project" {
|
|||
project_create = var.project_create != null
|
||||
prefix = var.project_create == null ? null : var.prefix
|
||||
services = [
|
||||
"stackdriver.googleapis.com",
|
||||
"compute.googleapis.com",
|
||||
"storage-component.googleapis.com",
|
||||
"storage.googleapis.com",
|
||||
"servicenetworking.googleapis.com",
|
||||
"bigquery.googleapis.com",
|
||||
"bigquerystorage.googleapis.com",
|
||||
"bigqueryreservation.googleapis.com",
|
||||
"composer.googleapis.com",
|
||||
"compute.googleapis.com",
|
||||
"dataflow.googleapis.com",
|
||||
"ml.googleapis.com",
|
||||
"notebooks.googleapis.com",
|
||||
"composer.googleapis.com"
|
||||
"servicenetworking.googleapis.com",
|
||||
"stackdriver.googleapis.com",
|
||||
"storage.googleapis.com",
|
||||
"storage-component.googleapis.com"
|
||||
]
|
||||
policy_boolean = {
|
||||
# "constraints/compute.requireOsLogin" = false
|
||||
# Example of applying a project wide policy, mainly useful for Composer
|
||||
}
|
||||
service_encryption_key_ids = {
|
||||
compute = [try(local.service_encryption_keys.compute, null)]
|
||||
bq = [try(local.service_encryption_keys.bq, null)]
|
||||
storage = [try(local.service_encryption_keys.storage, null)]
|
||||
}
|
||||
|
||||
service_config = {
|
||||
disable_on_destroy = false,
|
||||
disable_dependent_services = false
|
||||
}
|
||||
}
|
||||
|
||||
###############################################################################
|
||||
|
@ -55,11 +63,11 @@ module "project" {
|
|||
module "vpc" {
|
||||
source = "../../../modules/net-vpc"
|
||||
project_id = module.project.project_id
|
||||
name = var.vpc_config.vpc_name
|
||||
name = "${var.prefix}-vpc"
|
||||
subnets = [
|
||||
{
|
||||
ip_cidr_range = var.vpc_config.ip_cidr_range
|
||||
name = var.vpc_config.subnet_name
|
||||
name = "${var.prefix}-subnet"
|
||||
region = var.region
|
||||
secondary_ip_range = {}
|
||||
}
|
||||
|
@ -71,27 +79,73 @@ module "vpc-firewall" {
|
|||
project_id = module.project.project_id
|
||||
network = module.vpc.name
|
||||
admin_ranges = [var.vpc_config.ip_cidr_range]
|
||||
custom_rules = {
|
||||
#TODO Remove and rely on 'ssh' tag once terraform-provider-google/issues/9273 is fixed
|
||||
("${var.prefix}-iap") = {
|
||||
description = "Enable SSH from IAP on Notebooks."
|
||||
direction = "INGRESS"
|
||||
action = "allow"
|
||||
sources = []
|
||||
ranges = ["35.235.240.0/20"]
|
||||
targets = ["notebook-instance"]
|
||||
use_service_accounts = false
|
||||
rules = [{ protocol = "tcp", ports = [22] }]
|
||||
extra_attributes = {}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
module "cloudnat" {
|
||||
source = "../../../modules/net-cloudnat"
|
||||
project_id = module.project.project_id
|
||||
name = "${var.prefix}-default"
|
||||
region = var.region
|
||||
router_network = module.vpc.name
|
||||
}
|
||||
|
||||
###############################################################################
|
||||
# GCS #
|
||||
# Storage #
|
||||
###############################################################################
|
||||
|
||||
module "base-gcs-bucket" {
|
||||
module "bucket" {
|
||||
source = "../../../modules/gcs"
|
||||
project_id = module.project.project_id
|
||||
prefix = module.project.project_id
|
||||
name = "base"
|
||||
prefix = var.prefix
|
||||
location = var.location
|
||||
name = "data"
|
||||
encryption_key = try(local.service_encryption_keys.storage, null) # Example assignment of an encryption key
|
||||
}
|
||||
|
||||
module "dataset" {
|
||||
source = "../../../modules/bigquery-dataset"
|
||||
project_id = module.project.project_id
|
||||
id = "${var.prefix}_data"
|
||||
encryption_key = try(local.service_encryption_keys.bq, null) # Example assignment of an encryption key
|
||||
}
|
||||
|
||||
###############################################################################
|
||||
# Vertex AI Notebook #
|
||||
###############################################################################
|
||||
# TODO: Add encryption_key to Vertex AI notebooks as well
|
||||
# TODO: Add shared VPC support
|
||||
|
||||
module "service-account-notebook" {
|
||||
source = "../../../modules/iam-service-account"
|
||||
project_id = module.project.project_id
|
||||
name = "notebook-sa"
|
||||
iam_project_roles = {
|
||||
(module.project.project_id) = [
|
||||
"roles/bigquery.admin",
|
||||
"roles/bigquery.jobUser",
|
||||
"roles/bigquery.dataEditor",
|
||||
"roles/bigquery.user",
|
||||
"roles/storage.admin",
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
resource "google_notebooks_instance" "playground" {
|
||||
name = "data-play-notebook"
|
||||
name = "${var.prefix}-notebook"
|
||||
location = format("%s-%s", var.region, "b")
|
||||
machine_type = "e2-medium"
|
||||
project = module.project.project_id
|
||||
|
@ -104,10 +158,17 @@ resource "google_notebooks_instance" "playground" {
|
|||
install_gpu_driver = true
|
||||
boot_disk_type = "PD_SSD"
|
||||
boot_disk_size_gb = 110
|
||||
disk_encryption = try(local.service_encryption_keys.compute != null, false) ? "CMEK" : "GMEK"
|
||||
kms_key = try(local.service_encryption_keys.compute, null)
|
||||
|
||||
no_public_ip = false
|
||||
no_public_ip = true
|
||||
no_proxy_access = false
|
||||
|
||||
network = module.vpc.network.id
|
||||
subnet = module.vpc.subnets[format("%s/%s", var.region, var.vpc_config.subnet_name)].id
|
||||
subnet = module.vpc.subnets[format("%s/%s", var.region, "${var.prefix}-subnet")].id
|
||||
|
||||
service_account = module.service-account-notebook.email
|
||||
|
||||
#TODO Uncomment once terraform-provider-google/issues/9273 is fixed
|
||||
# tags = ["ssh"]
|
||||
}
|
||||
|
|
|
@ -14,12 +14,20 @@
|
|||
|
||||
output "bucket" {
|
||||
description = "GCS Bucket URL."
|
||||
value = module.base-gcs-bucket.url
|
||||
value = module.bucket.url
|
||||
}
|
||||
|
||||
output "dataset" {
|
||||
description = "GCS Bucket URL."
|
||||
value = module.dataset.id
|
||||
}
|
||||
|
||||
output "notebook" {
|
||||
description = "Vertex AI notebook"
|
||||
value = resource.google_notebooks_instance.playground.name
|
||||
description = "Vertex AI notebook details."
|
||||
value = {
|
||||
name = resource.google_notebooks_instance.playground.name
|
||||
id = resource.google_notebooks_instance.playground.id
|
||||
}
|
||||
}
|
||||
|
||||
output "project" {
|
||||
|
@ -30,4 +38,4 @@ output "project" {
|
|||
output "vpc" {
|
||||
description = "VPC Network"
|
||||
value = module.vpc.name
|
||||
}
|
||||
}
|
||||
|
|
|
@ -16,7 +16,7 @@
|
|||
variable "location" {
|
||||
description = "The location where resources will be deployed."
|
||||
type = string
|
||||
default = "europe"
|
||||
default = "EU"
|
||||
}
|
||||
|
||||
variable "project_id" {
|
||||
|
@ -36,7 +36,6 @@ variable "project_create" {
|
|||
variable "prefix" {
|
||||
description = "Unique prefix used for resource names. Not used for project if 'project_create' is null."
|
||||
type = string
|
||||
default = "dp"
|
||||
}
|
||||
|
||||
variable "region" {
|
||||
|
@ -48,21 +47,19 @@ variable "region" {
|
|||
variable "service_encryption_keys" { # service encription key
|
||||
description = "Cloud KMS to use to encrypt different services. Key location should match service region."
|
||||
type = object({
|
||||
bq = string
|
||||
compute = string
|
||||
storage = string
|
||||
})
|
||||
default = null
|
||||
}
|
||||
|
||||
variable "vpc_config" {
|
||||
description = "Parameters to create a simple VPC for the Data Playground"
|
||||
description = "Parameters to create a VPC."
|
||||
type = object({
|
||||
ip_cidr_range = string
|
||||
subnet_name = string
|
||||
vpc_name = string
|
||||
})
|
||||
default = {
|
||||
ip_cidr_range = "10.0.0.0/20"
|
||||
subnet_name = "default-subnet"
|
||||
vpc_name = "data-playground-vpc"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
|
@ -17,6 +17,7 @@
|
|||
module "test" {
|
||||
source = "../../../../../examples/data-solutions/data-playground/"
|
||||
project_id = "sampleproject"
|
||||
prefix = "tst"
|
||||
project_create = {
|
||||
billing_account_id = "123456-123456-123456",
|
||||
parent = "folders/467898377"
|
||||
|
|
|
@ -22,5 +22,5 @@ FIXTURES_DIR = os.path.join(os.path.dirname(__file__), 'fixture')
|
|||
def test_resources(e2e_plan_runner):
|
||||
"Test that plan works and the numbers of resources is as expected."
|
||||
modules, resources = e2e_plan_runner(FIXTURES_DIR)
|
||||
assert len(modules) == 4
|
||||
assert len(resources) == 23
|
||||
assert len(modules) == 7
|
||||
assert len(resources) == 34
|
||||
|
|
Loading…
Reference in New Issue