Merge pull request #1337 from GoogleCloudPlatform/lcaggio/vertex-01

Improve Vertex mlops blueprint
Julio Castillo 2023-04-24 21:01:39 +02:00 committed by GitHub
commit 84a7b988a3
13 changed files with 469 additions and 242 deletions


@ -98,5 +98,5 @@ module "test" {
prefix = "prefix"
}
# tftest modules=9 resources=47
# tftest modules=9 resources=48
```


@ -86,5 +86,5 @@ module "test" {
parent = "folders/467898377"
}
}
# tftest modules=8 resources=40
# tftest modules=8 resources=41
```


@ -1,24 +1,30 @@
# MLOps with Vertex AI
## Introduction
## Tagline
Create a Vertex AI environment needed for MLOps.
## Detailed
This example implements the infrastructure required to deploy an end-to-end [MLOps process](https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf) using the [Vertex AI](https://cloud.google.com/vertex-ai) platform.
## GCP resources
## Architecture
The blueprint will deploy all the required resources to have a fully functional MLOps environment containing:
- Vertex Workbench (for the experimentation environment)
- GCP Project (optional) to host all the resources
- Isolated VPC network and a subnet to be used by Vertex and Dataflow. Alternatively, an external Shared VPC can be configured using the `network_config` variable.
- Firewall rule to allow the internal subnet communication required by Dataflow
- Cloud NAT required to reach the internet from the different computing resources (Vertex and Dataflow)
- GCS buckets to host Vertex AI and Cloud Build artifacts. By default the buckets will be regional and should match the Vertex AI region for the different resources (e.g. Vertex Managed Datasets) and processes (e.g. Vertex training)
- BigQuery Dataset where the training data will be stored. This is optional, since the training data could already be hosted in an existing BigQuery dataset.
- Artifact Registry Docker repository to host the custom images.
- Service account (`mlops-[env]@`) with the minimum permissions required by Vertex AI and Dataflow (if Dataflow is used within the Vertex AI pipeline).
- Service account (`github@`) used by Workload Identity Federation to federate the GitHub identity (optional).
- Secret to store the GitHub SSH key used to access the CI/CD code repository.
1. Vertex Workbench (for the experimentation environment).
1. GCP Project (optional) to host all the resources.
1. Isolated VPC network and a subnet to be used by Vertex and Dataflow. Alternatively, an external Shared VPC can be configured using the `network_config` variable (see the sketch after this list).
1. Firewall rule to allow the internal subnet communication required by Dataflow.
1. Cloud NAT required to reach the internet from the different computing resources (Vertex and Dataflow).
1. GCS buckets to host Vertex AI and Cloud Build artifacts. By default the buckets will be regional and should match the Vertex AI region for the different resources (e.g. Vertex Managed Datasets) and processes (e.g. Vertex training).
1. BigQuery Dataset where the training data will be stored. This is optional, since the training data could already be hosted in an existing BigQuery dataset.
1. Artifact Registry Docker repository to host the custom images.
1. Service account (`PREFIX-sa-mlops`) with the minimum permissions required by Vertex AI and Dataflow (if Dataflow is used within the Vertex AI pipeline).
1. Service account (`PREFIX-sa-github@`) used by Workload Identity Federation to federate the GitHub identity (optional).
1. Secret Manager secret to store the GitHub SSH key used to access the CI/CD code repository.
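If an existing Shared VPC is used, the `network_config` variable can be set along these lines (a minimal sketch; the host project and self links below are placeholders, not values from this blueprint):

```hcl
network_config = {
  host_project      = "my-host-project"                                                      # hypothetical Shared VPC host project id
  network_self_link = "projects/my-host-project/global/networks/my-shared-vpc"               # hypothetical VPC self link
  subnet_self_link  = "projects/my-host-project/regions/europe-west4/subnetworks/my-subnet"  # hypothetical subnet self link
}
```

When `network_config` is null, the blueprint creates its own VPC, subnet, firewall rule and Cloud NAT as listed above.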
## Documentation
![MLOps project description](./images/mlops_projects.png "MLOps project description")
@ -46,69 +52,81 @@ Please note that these groups are not suitable for production grade environments
## What's next?
This blueprint can be used as a building block for setting up an end-to-end MLOps solution. As a next step, you can follow this [guide](https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build) to set up a Vertex AI pipeline and run it on the deployed infrastructure.
## Usage
Basic usage of this module is as follows:
```hcl
module "test" {
source = "./fabric/blueprints/data-solutions/vertex-mlops/"
notebooks = {
"myworkbench" = {
type = "USER_MANAGED"
}
}
prefix = "pref-dev"
project_config = {
billing_account_id = "000000-123456-123456"
parent = "folders/111111111111"
project_id = "test-dev"
}
}
# tftest modules=11 resources=60
```
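A more complete configuration can enable group-based IAM, Workload Identity Federation and CMEK encryption through the variables documented below. The following is a sketch with placeholder group emails, repository claims and KMS key names (none of these values come from this repository):

```hcl
module "test" {
  source = "./fabric/blueprints/data-solutions/vertex-mlops/"
  notebooks = {
    "myworkbench" = {
      type = "USER_MANAGED"
    }
  }
  groups = {
    gcp-ml-ds     = "gcp-ml-ds@example.com"
    gcp-ml-eng    = "gcp-ml-eng@example.com"
    gcp-ml-viewer = "gcp-ml-viewer@example.com"
  }
  identity_pool_claims = "attribute.repository/ORGANIZATION/REPO"
  service_encryption_keys = {
    aiplatform    = "projects/kms-prj/locations/europe-west4/keyRings/mlops/cryptoKeys/aiplatform"
    bq            = "projects/kms-prj/locations/europe-west4/keyRings/mlops/cryptoKeys/bq"
    notebooks     = "projects/kms-prj/locations/europe-west4/keyRings/mlops/cryptoKeys/notebooks"
    secretmanager = "projects/kms-prj/locations/europe-west4/keyRings/mlops/cryptoKeys/secretmanager"
    storage       = "projects/kms-prj/locations/europe-west4/keyRings/mlops/cryptoKeys/storage"
  }
  prefix = "pref-dev"
  project_config = {
    billing_account_id = "000000-123456-123456"
    parent             = "folders/111111111111"
    project_id         = "test-dev"
  }
}
```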
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [project_id](variables.tf#L101) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | |
| [notebooks](variables.tf#L69) | Vertex AI workbenches to be deployed. A service account for the runtimes/instances is also deployed. | <code title="map&#40;object&#40;&#123;&#10; type &#61; string&#10; machine_type &#61; optional&#40;string, &#34;n1-standard-4&#34;&#41;&#10; internal_ip_only &#61; optional&#40;bool, true&#41;&#10; idle_shutdown &#61; optional&#40;bool, false&#41;&#10; owner &#61; optional&#40;string&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | ✓ | |
| [project_config](variables.tf#L96) | Provide 'billing_account_id' if project creation is needed; otherwise the existing 'project_id' is used. 'parent' is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object&#40;&#123;&#10; billing_account_id &#61; optional&#40;string&#41;&#10; parent &#61; optional&#40;string&#41;&#10; project_id &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
| [bucket_name](variables.tf#L18) | GCS bucket name to store the Vertex AI artifacts. | <code>string</code> | | <code>null</code> |
| [dataset_name](variables.tf#L24) | BigQuery Dataset to store the training data. | <code>string</code> | | <code>null</code> |
| [groups](variables.tf#L30) | Name of the groups (name@domain.org) to apply opinionated IAM permissions. | <code title="object&#40;&#123;&#10; gcp-ml-ds &#61; string&#10; gcp-ml-eng &#61; string&#10; gcp-ml-viewer &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; gcp-ml-ds &#61; null&#10; gcp-ml-eng &#61; null&#10; gcp-ml-viewer &#61; null&#10;&#125;">&#123;&#8230;&#125;</code> |
| [identity_pool_claims](variables.tf#L45) | Claims to be used by Workload Identity Federation (e.g. attribute.repository/ORGANIZATION/REPO). If a non-null value is provided, the google_iam_workload_identity_pool resource will be created. | <code>string</code> | | <code>null</code> |
| [labels](variables.tf#L51) | Labels to be assigned at project level. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
| [location](variables.tf#L57) | Location used for multi-regional resources. | <code>string</code> | | <code>&#34;eu&#34;</code> |
| [network_config](variables.tf#L63) | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | <code title="object&#40;&#123;&#10; host_project &#61; string&#10; network_self_link &#61; string&#10; subnet_self_link &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [notebooks](variables.tf#L73) | Vertex AI workbenches to be deployed. | <code title="map&#40;object&#40;&#123;&#10; owner &#61; string&#10; region &#61; string&#10; subnet &#61; string&#10; internal_ip_only &#61; optional&#40;bool, false&#41;&#10; idle_shutdown &#61; optional&#40;bool&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [prefix](variables.tf#L86) | Prefix used for the project id. | <code>string</code> | | <code>null</code> |
| [project_create](variables.tf#L92) | Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object&#40;&#123;&#10; billing_account_id &#61; string&#10; parent &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [project_services](variables.tf#L106) | List of core services enabled on all projects. | <code>list&#40;string&#41;</code> | | <code title="&#91;&#10; &#34;aiplatform.googleapis.com&#34;,&#10; &#34;artifactregistry.googleapis.com&#34;,&#10; &#34;bigquery.googleapis.com&#34;,&#10; &#34;cloudbuild.googleapis.com&#34;,&#10; &#34;compute.googleapis.com&#34;,&#10; &#34;datacatalog.googleapis.com&#34;,&#10; &#34;dataflow.googleapis.com&#34;,&#10; &#34;iam.googleapis.com&#34;,&#10; &#34;monitoring.googleapis.com&#34;,&#10; &#34;notebooks.googleapis.com&#34;,&#10; &#34;secretmanager.googleapis.com&#34;,&#10; &#34;servicenetworking.googleapis.com&#34;,&#10; &#34;serviceusage.googleapis.com&#34;&#10;&#93;">&#91;&#8230;&#93;</code> |
| [region](variables.tf#L126) | Region used for regional resources. | <code>string</code> | | <code>&#34;europe-west4&#34;</code> |
| [repo_name](variables.tf#L132) | Cloud Source Repository name. Set to null to avoid creating it. | <code>string</code> | | <code>null</code> |
| [sa_mlops_name](variables.tf#L138) | Name for the MLOPs Service Account. | <code>string</code> | | <code>&#34;sa-mlops&#34;</code> |
| [groups](variables.tf#L30) | Name of the groups (name@domain.org) to apply opinionated IAM permissions. | <code title="object&#40;&#123;&#10; gcp-ml-ds &#61; optional&#40;string&#41;&#10; gcp-ml-eng &#61; optional&#40;string&#41;&#10; gcp-ml-viewer &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [identity_pool_claims](variables.tf#L41) | Claims to be used by Workload Identity Federation (e.g. attribute.repository/ORGANIZATION/REPO). If a non-null value is provided, the google_iam_workload_identity_pool resource will be created. | <code>string</code> | | <code>null</code> |
| [labels](variables.tf#L47) | Labels to be assigned at project level. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
| [location](variables.tf#L53) | Location used for multi-regional resources. | <code>string</code> | | <code>&#34;eu&#34;</code> |
| [network_config](variables.tf#L59) | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | <code title="object&#40;&#123;&#10; host_project &#61; string&#10; network_self_link &#61; string&#10; subnet_self_link &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [prefix](variables.tf#L90) | Prefix used for the project id. | <code>string</code> | | <code>null</code> |
| [region](variables.tf#L110) | Region used for regional resources. | <code>string</code> | | <code>&#34;europe-west4&#34;</code> |
| [repo_name](variables.tf#L116) | Cloud Source Repository name. Set to null to avoid creating it. | <code>string</code> | | <code>null</code> |
| [service_encryption_keys](variables.tf#L122) | Cloud KMS keys used to encrypt the different services. Key locations should match the service region. | <code title="object&#40;&#123;&#10; aiplatform &#61; optional&#40;string&#41;&#10; bq &#61; optional&#40;string&#41;&#10; notebooks &#61; optional&#40;string&#41;&#10; secretmanager &#61; optional&#40;string&#41;&#10; storage &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [github](outputs.tf#L33) | GitHub configuration. | |
| [notebook](outputs.tf#L39) | Vertex AI managed notebook details. | |
| [project](outputs.tf#L44) | The project resource as returned by the `project` module. | |
| [project_id](outputs.tf#L49) | Project ID. | |
| [github](outputs.tf#L30) | GitHub configuration. | |
| [notebook](outputs.tf#L35) | Vertex AI notebook IDs. | |
| [project](outputs.tf#L43) | The project resource as returned by the `project` module. | |
<!-- END TFDOC -->
## TODO
- Add support for User Managed Notebooks, an SA permission option and a non-default SA for Single User mode.
- Improve default naming for local VPC and Cloud NAT
## Test
```hcl
module "test" {
source = "./fabric/blueprints/data-solutions/vertex-mlops/"
labels = {
"env" : "dev",
"team" : "ml"
"env" = "dev",
"team" = "ml"
}
bucket_name = "test-dev"
dataset_name = "test"
bucket_name = "gcs-test"
dataset_name = "bq-test"
identity_pool_claims = "attribute.repository/ORGANIZATION/REPO"
notebooks = {
"myworkbench" : {
"owner" : "user@example.com",
"region" : "europe-west4",
"subnet" : "default",
"myworkbench" = {
type = "USER_MANAGED"
}
}
prefix = "pref"
project_id = "test-dev"
project_create = {
prefix = "pref-dev"
project_config = {
billing_account_id = "000000-123456-123456"
parent = "folders/111111111111"
project_id = "test-dev"
}
}
# tftest modules=12 resources=57
# tftest modules=13 resources=65
```


@ -16,9 +16,9 @@
terraform {
provider_meta "google" {
module_name = "blueprints/terraform/fabric-blueprints:vertex-mlops/v1.0.0"
module_name = "blueprints/terraform/fabric-blueprints:vertex-mlops/v21.0.0"
}
provider_meta "google-beta" {
module_name = "blueprints/terraform/fabric-blueprints:vertex-mlops/v1.0.0"
module_name = "blueprints/terraform/fabric-blueprints:vertex-mlops/v21.0.0"
}
}


@ -44,14 +44,11 @@ module "artifact_registry" {
project_id = module.project.project_id
location = var.region
format = "DOCKER"
# iam = {
# "roles/artifactregistry.admin" = ["group:cicd@example.com"]
# }
}
module "service-account-github" {
source = "../../../modules/iam-service-account"
name = "sa-github"
name = "${var.prefix}-sa-github"
project_id = module.project.project_id
iam = var.identity_pool_claims == null ? {} : { "roles/iam.workloadIdentityUser" = ["principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github_pool[0].name}/${var.identity_pool_claims}"] }
}
@ -63,6 +60,9 @@ module "secret-manager" {
secrets = {
github-key = [var.region]
}
encryption_key = {
"${var.region}" = var.service_encryption_keys.secretmanager
}
iam = {
github-key = {
"roles/secretmanager.secretAccessor" = [
@ -71,4 +71,4 @@ module "secret-manager" {
]
}
}
}
}


@ -64,8 +64,7 @@ locals {
}
)
service_encryption_keys = var.service_encryption_keys
shared_vpc_project = try(var.network_config.host_project, null)
shared_vpc_project = try(var.network_config.host_project, null)
subnet = (
local.use_shared_vpc
@ -109,7 +108,7 @@ module "gcs-bucket" {
location = var.region
storage_class = "REGIONAL"
versioning = false
encryption_key = try(local.service_encryption_keys.storage, null)
encryption_key = var.service_encryption_keys.storage
}
# Default bucket for Cloud Build to prevent error: "'us' violates constraint gcp.resourceLocations"
@ -117,12 +116,12 @@ module "gcs-bucket" {
module "gcs-bucket-cloudbuild" {
source = "../../../modules/gcs"
project_id = module.project.project_id
name = "${var.project_id}_cloudbuild"
name = "${var.prefix}_cloudbuild"
prefix = var.prefix
location = var.region
storage_class = "REGIONAL"
versioning = false
encryption_key = try(local.service_encryption_keys.storage, null)
encryption_key = var.service_encryption_keys.storage
}
module "bq-dataset" {
@ -131,7 +130,7 @@ module "bq-dataset" {
project_id = module.project.project_id
id = var.dataset_name
location = var.region
encryption_key = try(local.service_encryption_keys.bq, null)
encryption_key = var.service_encryption_keys.bq
}
module "vpc-local" {
@ -190,19 +189,28 @@ module "cloudnat" {
module "project" {
source = "../../../modules/project"
name = var.project_id
parent = try(var.project_create.parent, null)
billing_account = try(var.project_create.billing_account_id, null)
project_create = var.project_create != null
name = var.project_config.project_id
parent = var.project_config.parent
billing_account = var.project_config.billing_account_id
project_create = var.project_config.billing_account_id != null
prefix = var.prefix
group_iam = local.group_iam
iam = {
"roles/aiplatform.user" = [module.service-account-mlops.iam_email]
"roles/aiplatform.user" = [
module.service-account-mlops.iam_email,
module.service-account-notebook.iam_email
]
"roles/artifactregistry.reader" = [module.service-account-mlops.iam_email]
"roles/artifactregistry.writer" = [module.service-account-github.iam_email]
"roles/bigquery.dataEditor" = [module.service-account-mlops.iam_email]
"roles/bigquery.jobUser" = [module.service-account-mlops.iam_email]
"roles/bigquery.user" = [module.service-account-mlops.iam_email]
"roles/bigquery.dataEditor" = [
module.service-account-mlops.iam_email,
module.service-account-notebook.iam_email
]
"roles/bigquery.jobUser" = [
module.service-account-mlops.iam_email,
module.service-account-notebook.iam_email
]
"roles/bigquery.user" = [module.service-account-mlops.iam_email, module.service-account-notebook.iam_email]
"roles/cloudbuild.builds.editor" = [
module.service-account-mlops.iam_email,
module.service-account-github.iam_email
@ -213,6 +221,8 @@ module "project" {
"roles/dataflow.worker" = [module.service-account-mlops.iam_email]
"roles/iam.serviceAccountUser" = [
module.service-account-mlops.iam_email,
module.service-account-notebook.iam_email,
module.service-account-github.iam_email,
"serviceAccount:${module.project.service_accounts.robots.cloudbuild}"
]
"roles/monitoring.metricWriter" = [module.service-account-mlops.iam_email]
@ -223,28 +233,41 @@ module "project" {
]
"roles/storage.admin" = [
module.service-account-mlops.iam_email,
module.service-account-github.iam_email
module.service-account-github.iam_email,
module.service-account-notebook.iam_email
]
}
labels = var.labels
org_policies = {
# Example of applying a project wide policy
# "compute.requireOsLogin" = {
# rules = [{ enforce = false }]
# }
}
service_encryption_key_ids = {
bq = [try(local.service_encryption_keys.bq, null)]
compute = [try(local.service_encryption_keys.compute, null)]
cloudbuild = [try(local.service_encryption_keys.storage, null)]
notebooks = [try(local.service_encryption_keys.compute, null)]
storage = [try(local.service_encryption_keys.storage, null)]
aiplatform = [var.service_encryption_keys.aiplatform]
bq = [var.service_encryption_keys.bq]
cloudbuild = [var.service_encryption_keys.storage]
notebooks = [var.service_encryption_keys.notebooks]
secretmanager = [var.service_encryption_keys.secretmanager]
storage = [var.service_encryption_keys.storage]
}
services = var.project_services
services = [
"aiplatform.googleapis.com",
"artifactregistry.googleapis.com",
"bigquery.googleapis.com",
"bigquerystorage.googleapis.com",
"cloudbuild.googleapis.com",
"compute.googleapis.com",
"datacatalog.googleapis.com",
"dataflow.googleapis.com",
"iam.googleapis.com",
"ml.googleapis.com",
"monitoring.googleapis.com",
"notebooks.googleapis.com",
"secretmanager.googleapis.com",
"servicenetworking.googleapis.com",
"serviceusage.googleapis.com",
"stackdriver.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com"
]
shared_vpc_service_config = local.shared_vpc_project == null ? null : {
attach = true
host_project = local.shared_vpc_project
@ -254,11 +277,8 @@ module "project" {
module "service-account-mlops" {
source = "../../../modules/iam-service-account"
name = var.sa_mlops_name
name = "${var.prefix}-sa-mlops"
project_id = module.project.project_id
iam = {
"roles/iam.serviceAccountUser" = [module.service-account-github.iam_email]
}
}
resource "google_project_iam_member" "shared_vpc" {
@ -268,11 +288,8 @@ resource "google_project_iam_member" "shared_vpc" {
member = "serviceAccount:${module.project.service_accounts.robots.notebooks}"
}
resource "google_sourcerepo_repository" "code-repo" {
count = var.repo_name == null ? 0 : 1
name = var.repo_name
project = module.project.project_id
}


@ -0,0 +1,169 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
apiVersion: blueprints.cloud.google.com/v1alpha1
kind: BlueprintMetadata
metadata:
name: fabric-blueprint-vertex-mlops
spec:
title: MLOps with Vertex AI
source:
repo: https://github.com/GoogleCloudPlatform/cloud-foundation-fabric.git
sourceType: git
version: 21.0.0
actuationTool:
type: Terraform
version: '>= 1.3.0'
description:
tagline: MLOps with Vertex AI
detailed: |-
This example implements the infrastructure required to deploy an end-to-end MLOps process using the Vertex AI platform.
architecture:
- Vertex Workbench (for the experimentation environment).
- GCP Project (optional) to host all the resources.
- Isolated VPC network and a subnet to be used by Vertex and Dataflow. Alternatively, an external Shared VPC can be configured using the `network_config` variable.
- Firewall rule to allow the internal subnet communication required by Dataflow.
- Cloud NAT required to reach the internet from the different computing resources (Vertex and Dataflow).
- GCS buckets to host Vertex AI and Cloud Build artifacts. By default the buckets will be regional and should match the Vertex AI region for the different resources (e.g. Vertex Managed Datasets) and processes (e.g. Vertex training).
- BigQuery Dataset where the training data will be stored. This is optional, since the training data could already be hosted in an existing BigQuery dataset.
- Artifact Registry Docker repository to host the custom images.
- Service account (`PREFIX-sa-mlops`) with the minimum permissions required by Vertex AI and Dataflow (if Dataflow is used within the Vertex AI pipeline).
- Service account (`PREFIX-sa-github@`) used by Workload Identity Federation to federate the GitHub identity (optional).
- Secret Manager to store the GitHub SSH key used to access the CI/CD code repository.
documentation:
- title: Architecture Diagram
url: https://github.com/GoogleCloudPlatform/cloud-foundation-fabric/blob/master/blueprints/data-solutions/vertex-mlops/images/mlops_projects.png
variables:
- name: notebooks
description: Vertex AI workbenches to be deployed. A service account for the runtimes/instances is also deployed.
type: |-
map(object({
type = string
machine_type = optional(string, "n1-standard-4")
internal_ip_only = optional(bool, true)
idle_shutdown = optional(bool, false)
owner = optional(string)
}))
required: true
- name: project_config
description: Provide 'billing_account_id' if project creation is needed; otherwise the existing 'project_id' is used. 'parent' is in 'folders/nnn' or 'organizations/nnn' format.
type: |-
object({
billing_account_id = optional(string)
parent = optional(string)
project_id = string
})
required: true
- name: bucket_name
description: GCS bucket name to store the Vertex AI artifacts.
type: string
default: null
required: false
- name: dataset_name
description: BigQuery Dataset to store the training data.
type: string
default: null
required: false
- name: groups
description: Name of the groups (group_name@domain.org) to apply opinionated IAM permissions.
type: |-
object({
gcp-ml-ds = optional(string),
gcp-ml-eng = optional(string),
gcp-ml-viewer = optional(string)
})
default: {}
required: false
- name: identity_pool_claims
description: Claims to be used by Workload Identity Federation (e.g. attribute.repository/ORGANIZATION/REPO). If a non-null value is provided, the google_iam_workload_identity_pool resource will be created.
type: string
default: null
required: false
- name: labels
description: Labels to be assigned at project level.
type: map(string)
required: false
default: {}
- name: location
description: Location used for multi-regional resources.
type: string
default: eu
required: false
- name: network_config
description: Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values.
type: |-
object({
host_project = string
network_self_link = string
subnet_self_link = string
})
default: null
required: false
- name: prefix
description: Prefix used for the project id.
type: string
default: null
required: false
- name: region
description: Region used for regional resources.
type: string
default: europe-west4
required: false
- name: repo_name
description: Cloud Source Repository name. Set to null to avoid creating it.
type: string
default: null
required: false
- name: service_encryption_keys
description: Cloud KMS keys used to encrypt the different services. Key locations should match the service region.
type: |-
object({
aiplatform = optional(string)
bq = optional(string)
notebooks = optional(string)
secretmanager = optional(string)
storage = optional(string)
})
default: {}
required: false
outputs:
- name: github
description: GitHub configuration.
- name: notebook
description: Vertex AI notebook IDs.
- name: project
description: The project resource as returned by the project module.
roles:
- level: Project
roles:
- roles/owner
services:
- aiplatform.googleapis.com
- artifactregistry.googleapis.com
- bigquery.googleapis.com
- bigquerystorage.googleapis.com
- cloudbuild.googleapis.com
- compute.googleapis.com
- datacatalog.googleapis.com
- dataflow.googleapis.com
- iam.googleapis.com
- ml.googleapis.com
- monitoring.googleapis.com
- notebooks.googleapis.com
- secretmanager.googleapis.com
- servicenetworking.googleapis.com
- serviceusage.googleapis.com
- stackdriver.googleapis.com
- storage.googleapis.com
- storage-component.googleapis.com


@ -1,60 +0,0 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
resource "google_notebooks_runtime" "runtime" {
for_each = var.notebooks
name = each.key
project = module.project.project_id
location = var.notebooks[each.key].region
access_config {
access_type = "SINGLE_USER"
runtime_owner = var.notebooks[each.key].owner
}
software_config {
enable_health_monitoring = true
idle_shutdown = var.notebooks[each.key].idle_shutdown
idle_shutdown_timeout = 1800
}
virtual_machine {
virtual_machine_config {
machine_type = "n1-standard-4"
network = local.vpc
subnet = local.subnet
internal_ip_only = var.notebooks[each.key].internal_ip_only
dynamic "encryption_config" {
for_each = try(local.service_encryption_keys.compute, null) == null ? [] : [1]
content {
kms_key = local.service_encryption_keys.compute
}
}
metadata = {
notebook-disable-nbconvert = "false"
notebook-disable-downloads = "false"
notebook-disable-terminal = "false"
#notebook-disable-root = "true"
#notebook-upgrade-schedule = "48 4 * * MON"
}
data_disk {
initialize_params {
disk_size_gb = "100"
disk_type = "PD_STANDARD"
}
}
}
}
}


@ -14,9 +14,6 @@
* limitations under the License.
*/
# TODO(): proper outputs
locals {
docker_split = try(split("/", module.artifact_registry.id), null)
docker_repo = try("${local.docker_split[3]}-docker.pkg.dev/${local.docker_split[1]}/${local.docker_split[5]}", null)
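  # Hypothetical example of the derivation above: an Artifact Registry id such as
  # "projects/my-prj/locations/europe-west4/repositories/docker-repo" yields
  # docker_repo = "europe-west4-docker.pkg.dev/my-prj/docker-repo".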
@ -31,22 +28,19 @@ locals {
}
output "github" {
description = "Github Configuration."
value = local.gh_config
}
output "notebook" {
description = "Vertex AI managed notebook details."
value = { for k, v in resource.google_notebooks_runtime.runtime : k => v.id }
description = "Vertex AI notebooks ids."
value = merge(
{ for k, v in resource.google_notebooks_runtime.runtime : k => v.id },
{ for k, v in resource.google_notebooks_instance.playground : k => v.id }
)
}
output "project" {
description = "The project resource as return by the `project` module."
value = module.project
}
output "project_id" {
description = "Project ID."
value = module.project.project_id
}


@ -1,20 +0,0 @@
bucket_name = "creditcards-dev"
dataset_name = "creditcards"
identity_pool_claims = "attribute.repository/ORGANIZATION/REPO"
labels = {
"env" : "dev",
"team" : "ml"
}
notebooks = {
"myworkbench" : {
"owner" : "user@example.com",
"region" : "europe-west4",
"subnet" : "default",
}
}
prefix = "pref"
project_id = "creditcards-dev"
project_create = {
billing_account_id = "000000-123456-123456"
parent = "folders/111111111111"
}


@ -30,15 +30,11 @@ variable "dataset_name" {
variable "groups" {
description = "Name of the groups (name@domain.org) to apply opinionated IAM permissions."
type = object({
gcp-ml-ds = string
gcp-ml-eng = string
gcp-ml-viewer = string
gcp-ml-ds = optional(string)
gcp-ml-eng = optional(string)
gcp-ml-viewer = optional(string)
})
default = {
gcp-ml-ds = null
gcp-ml-eng = null
gcp-ml-viewer = null
}
default = {}
nullable = false
}
@ -71,16 +67,24 @@ variable "network_config" {
}
variable "notebooks" {
description = "Vertex AI workbenchs to be deployed."
description = "Vertex AI workbenchs to be deployed. Service Account runtime/instances deployed."
type = map(object({
owner = string
region = string
subnet = string
internal_ip_only = optional(bool, false)
idle_shutdown = optional(bool)
type = string
machine_type = optional(string, "n1-standard-4")
internal_ip_only = optional(bool, true)
idle_shutdown = optional(bool, false)
owner = optional(string)
}))
default = {}
nullable = false
validation {
condition = alltrue([
for k, v in var.notebooks : contains(["USER_MANAGED", "MANAGED"], v.type)])
error_message = "All `type` must be one of `USER_MANAGED` or `MANAGED`."
}
validation {
condition = alltrue([
for k, v in var.notebooks : (v.type == "MANAGED" && try(v.owner != null, false) || v.type == "USER_MANAGED")])
error_message = "`owner` must be set for `MANAGED` instances."
}
}
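# A hypothetical value satisfying the validations above: managed runtimes need an
# owner, user-managed instances do not.
#
# notebooks = {
#   "experiments" = { type = "MANAGED", owner = "user@example.com" }
#   "playground"  = { type = "USER_MANAGED" }
# }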
variable "prefix" {
@ -89,38 +93,18 @@ variable "prefix" {
default = null
}
variable "project_create" {
description = "Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format."
variable "project_config" {
description = "Provide 'billing_account_id' value if project creation is needed, uses existing 'project_id' if null. Parent is in 'folders/nnn' or 'organizations/nnn' format."
type = object({
billing_account_id = string
parent = string
billing_account_id = optional(string)
parent = optional(string)
project_id = string
})
default = null
}
variable "project_id" {
description = "Project id, references existing project if `project_create` is null."
type = string
}
variable "project_services" {
description = "List of core services enabled on all projects."
type = list(string)
default = [
"aiplatform.googleapis.com",
"artifactregistry.googleapis.com",
"bigquery.googleapis.com",
"cloudbuild.googleapis.com",
"compute.googleapis.com",
"datacatalog.googleapis.com",
"dataflow.googleapis.com",
"iam.googleapis.com",
"monitoring.googleapis.com",
"notebooks.googleapis.com",
"secretmanager.googleapis.com",
"servicenetworking.googleapis.com",
"serviceusage.googleapis.com"
]
validation {
condition = var.project_config.project_id != null
error_message = "Project id must be set."
}
nullable = false
}
variable "region" {
@ -135,18 +119,15 @@ variable "repo_name" {
default = null
}
variable "sa_mlops_name" {
description = "Name for the MLOPs Service Account."
type = string
default = "sa-mlops"
}
variable "service_encryption_keys" { # service encription key
variable "service_encryption_keys" {
description = "Cloud KMS to use to encrypt different services. Key location should match service region."
type = object({
bq = string
compute = string
storage = string
aiplatform = optional(string)
bq = optional(string)
notebooks = optional(string)
secretmanager = optional(string)
storage = optional(string)
})
default = null
}
default = {}
nullable = false
}


@ -0,0 +1,127 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
resource "google_vertex_ai_metadata_store" "store" {
provider = google-beta
project = module.project.project_id
name = "default"
description = "Vertex Ai Metadata Store"
region = var.region
dynamic "encryption_spec" {
for_each = var.service_encryption_keys.aiplatform == null ? [] : [""]
content {
kms_key_name = var.service_encryption_keys.aiplatform
}
}
# `state` value will be decided automatically based on the result of the configuration
lifecycle {
ignore_changes = [state]
}
}
module "service-account-notebook" {
source = "../../../modules/iam-service-account"
project_id = module.project.project_id
name = "notebook-sa"
}
resource "google_notebooks_runtime" "runtime" {
for_each = { for k, v in var.notebooks : k => v if v.type == "MANAGED" }
name = "${var.prefix}-${each.key}"
project = module.project.project_id
location = var.region
access_config {
access_type = "SINGLE_USER"
runtime_owner = try(var.notebooks[each.key].owner, null)
}
software_config {
enable_health_monitoring = true
}
virtual_machine {
virtual_machine_config {
machine_type = var.notebooks[each.key].machine_type
network = local.vpc
subnet = local.subnet
internal_ip_only = var.notebooks[each.key].internal_ip_only
dynamic "encryption_config" {
for_each = var.service_encryption_keys.notebooks == null ? [] : [1]
content {
kms_key = var.service_encryption_keys.notebooks
}
}
metadata = {
notebook-disable-nbconvert = "false"
notebook-disable-downloads = "true"
notebook-disable-terminal = "false"
notebook-disable-root = "true"
}
data_disk {
initialize_params {
disk_size_gb = "100"
disk_type = "PD_STANDARD"
}
}
}
}
}
resource "google_notebooks_instance" "playground" {
for_each = { for k, v in var.notebooks : k => v if v.type == "USER_MANAGED" }
name = "${var.prefix}-${each.key}"
location = "${var.region}-b"
machine_type = var.notebooks[each.key].machine_type
project = module.project.project_id
container_image {
repository = "gcr.io/deeplearning-platform-release/base-cpu"
tag = "latest"
}
install_gpu_driver = true
boot_disk_type = "PD_SSD"
boot_disk_size_gb = 110
disk_encryption = var.service_encryption_keys.notebooks != null ? "CMEK" : null
kms_key = var.service_encryption_keys.notebooks
no_public_ip = var.notebooks[each.key].internal_ip_only
no_proxy_access = false
network = local.vpc
subnet = local.subnet
instance_owners = try(tolist(var.notebooks[each.key].owner), null)
service_account = module.service-account-notebook.email
metadata = {
notebook-disable-nbconvert = "false"
notebook-disable-downloads = "false"
notebook-disable-terminal = "false"
notebook-disable-root = "true"
}
# Remove once terraform-provider-google/issues/9164 is fixed
lifecycle {
ignore_changes = [disk_encryption, kms_key]
}
#TODO Uncomment once terraform-provider-google/issues/9273 is fixed
# tags = ["ssh"]
depends_on = [
google_project_iam_member.shared_vpc,
]
}


@ -18,6 +18,7 @@
service_agent: "service-%s@gcp-sa-adsdatahub.iam.gserviceaccount.com"
- name: "aiplatform"
service_agent: "service-%s@gcp-sa-aiplatform.iam.gserviceaccount.com"
jit: true
- name: "aiplatform-cc"
service_agent: "service-%s@gcp-sa-aiplatform-cc.iam.gserviceaccount.com"
- name: "alloydb"