MLOps with Vertex AI

Tagline

Create a Vertex AI environment needed for MLOps.

Detailed

This example implements the infrastructure required to deploy an end-to-end MLOps process using the Vertex AI platform.

Architecture

The blueprint deploys all the resources required for a fully functional MLOps environment, containing:

  1. Vertex Workbench (for the experimentation environment).
  2. GCP Project (optional) to host all the resources.
  3. Isolated VPC network and a subnet to be used by Vertex AI and Dataflow. Alternatively, an external Shared VPC can be configured using the network_config variable, as shown in the sketch after this list.
  4. Firewall rule to allow the internal subnet communication required by Dataflow.
  5. Cloud NAT required to reach the internet from the different computing resources (Vertex and Dataflow).
  6. GCS buckets to host Vertex AI and Cloud Build artifacts. By default the buckets are regional and should match the Vertex AI region of the different resources (e.g. Vertex Managed Dataset) and processes (e.g. Vertex training).
  7. BigQuery Dataset where the training data will be stored. This is optional, since the training data could be already hosted in an existing BigQuery dataset.
  8. Artifact Registry Docker repository to host the custom images.
  9. Service account (PREFIX-sa-mlops) with the minimum permissions required by Vertex AI and Dataflow (if this service is used within the Vertex AI pipelines).
  10. Service account (PREFIX-sa-github@) to be used by Workload Identity Federation to federate the GitHub identity (optional).
  11. Secret Manager secret to store the GitHub SSH key used to access the CI/CD code repository.
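
If you plan to attach the blueprint to an existing Shared VPC instead of letting it create its own network, the network_config variable is the entry point. The following is only a sketch: the host_project, network_self_link and subnet_self_link attribute names, as well as the example values, are assumptions, so check variables.tf for the exact object shape.

# Illustrative terraform.tfvars fragment; attribute names are assumed, verify them in variables.tf.
network_config = {
  host_project      = "shared-vpc-host-project-id"
  network_self_link = "projects/shared-vpc-host-project-id/global/networks/vpc-name"
  subnet_self_link  = "projects/shared-vpc-host-project-id/regions/europe-west4/subnetworks/subnet-name"
}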

Documentation

MLOps project description

Pre-requirements

User groups

Assigning roles based on user groups is a way to decouple the final set of permissions from the stage where entities and resources are created and their IAM bindings are defined. You can configure the group names through the groups variable (an illustrative sketch follows the group list below). These groups should be created before running Terraform.

We use the following groups to control access to resources:

  • Data Scientist (gcp-ml-ds@<company.org>). They manage notebooks and create ML pipelines.
  • ML Engineers (gcp-ml-eng@<company.org>). They manage the different Vertex resources.
  • ML Viewers (gcp-ml-viewer@<company.org>). Group with viewer permission for the different resources.
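
The groups variable maps these conventions to your own group names. The snippet below is only a sketch: the attribute names and whether the values should include the domain are assumptions, so check variables.tf for the real object shape before using it.

# Illustrative terraform.tfvars fragment; attribute names and value format are assumed.
groups = {
  "gcp-ml-ds"     = "gcp-ml-ds"     # Data Scientists
  "gcp-ml-eng"    = "gcp-ml-eng"    # ML Engineers
  "gcp-ml-viewer" = "gcp-ml-viewer" # ML Viewers
}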

Please note that these groups are not suitable for production-grade environments. Roles can be customized in the main.tf file.

Instructions

Deploy the experimentation environment

  • Create a terraform.tfvars file and set the variables to match your desired configuration. You can use the provided terraform.tfvars.sample as a reference; an illustrative example is also sketched right after this list.
  • Run terraform init and terraform apply.
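
As an illustration, a minimal terraform.tfvars that mirrors the Usage example below could look like the following; adjust every value to your own environment (the ids shown here are placeholders).

# Minimal illustrative terraform.tfvars; all ids below are placeholders.
prefix = "pref-dev"
project_config = {
  billing_account_id = "000000-123456-123456"
  parent             = "folders/111111111111"
  project_id         = "test-dev"
}
notebooks = {
  "myworkbench" = {
    type = "USER_MANAGED"
  }
}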

What's next?

This blueprint can be used as a building block for setting up an end-to-end MLOps solution. As a next step, you can follow this guide to set up a Vertex AI pipeline and run it on the deployed infrastructure.

Usage

Basic usage of this module is as follows:

module "test" {
  source = "./fabric/blueprints/data-solutions/vertex-mlops/"
  notebooks = {
    "myworkbench" = {
      type = "USER_MANAGED"
    }
  }
  prefix = "pref-dev"
  project_config = {
    billing_account_id = "000000-123456-123456"
    parent             = "folders/111111111111"
    project_id         = "test-dev"
  }
}
# tftest modules=11 resources=62

Variables

| name | description | type | required | default |
|---|---|---|---|---|
| notebooks | Vertex AI workbenches to be deployed. Service Account runtime/instances deployed. | map(object({…})) | ✓ |  |
| project_config | Provide 'billing_account_id' value if project creation is needed, uses existing 'project_id' if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | object({…}) | ✓ |  |
| bucket_name | GCS bucket name to store the Vertex AI artifacts. | string |  | null |
| dataset_name | BigQuery dataset to store the training data. | string |  | null |
| deletion_protection | Prevent Terraform from destroying data storage resources (storage buckets, GKE clusters, CloudSQL instances) in this blueprint. When this field is set in Terraform state, a terraform destroy or terraform apply that would delete data storage resources will fail. | bool |  | false |
| groups | Name of the groups (name@domain.org) to apply opinionated IAM permissions. | object({…}) |  | {} |
| identity_pool_claims | Claims to be used by Workload Identity Federation (e.g. attribute.repository/ORGANIZATION/REPO). If a non-null value is provided, the google_iam_workload_identity_pool resource will be created. | string |  | null |
| labels | Labels to be assigned at project level. | map(string) |  | {} |
| location | Location used for multi-regional resources. | string |  | "eu" |
| network_config | Shared VPC network configurations to use. If null, networks will be created in projects with preconfigured values. | object({…}) |  | null |
| prefix | Prefix used for the project id. | string |  | null |
| region | Region used for regional resources. | string |  | "europe-west4" |
| repo_name | Cloud Source Repository name. Set to null to avoid creating it. | string |  | null |
| service_encryption_keys | Cloud KMS keys to use to encrypt the different services. Key location should match service region. | object({…}) |  | {} |
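
As an example of the CMEK wiring, service_encryption_keys maps individual services to existing Cloud KMS key ids, whose location should match the region of the resources they encrypt. The attribute names below are placeholders rather than the real object keys, so check variables.tf for the actual shape.

# Illustrative only: the attribute names are placeholders, check variables.tf for the real keys.
service_encryption_keys = {
  aiplatform = "projects/kms-project/locations/europe-west4/keyRings/mlops/cryptoKeys/vertex"
  storage    = "projects/kms-project/locations/europe-west4/keyRings/mlops/cryptoKeys/gcs"
}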

Outputs

| name | description | sensitive |
|---|---|---|
| github | GitHub configuration. |  |
| notebook | Vertex AI notebook ids. |  |
| project_id | Project ID. |  |
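
When the blueprint is instantiated from a wrapping configuration, as in the Usage example above, these outputs can be consumed like any other module output; for instance, to re-export the id of the project managed by the blueprint:

# Re-export the blueprint's project id from a wrapping root module
# (assumes the module is named "test" as in the Usage example above).
output "mlops_project_id" {
  description = "Project ID used by the Vertex MLOps blueprint."
  value       = module.test.project_id
}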

Test

module "test" {
  source = "./fabric/blueprints/data-solutions/vertex-mlops/"
  labels = {
    "env"  = "dev",
    "team" = "ml"
  }
  bucket_name          = "gcs-test"
  dataset_name         = "bq_test"
  identity_pool_claims = "attribute.repository/ORGANIZATION/REPO"
  notebooks = {
    "myworkbench" = {
      type = "USER_MANAGED"
    }
  }
  prefix = var.prefix
  project_config = {
    billing_account_id = var.billing_account_id
    parent             = var.folder_id
    project_id         = "test-dev"
  }
}
# tftest modules=13 resources=67 e2e