Vertex Pipelines MLOps framework blueprint (#1038)

* First release of the MLOps blueprint
This commit is contained in:
javiergp 2023-02-02 19:13:13 +01:00 committed by GitHub
parent 46ba8f5691
commit ce1f86d20b
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
15 changed files with 799 additions and 2 deletions

View File

@@ -6,7 +6,7 @@ Currently available blueprints:
- **apigee** - [Apigee Hybrid on GKE](./apigee/hybrid-gke/), [Apigee X analytics in BigQuery](./apigee/bigquery-analytics), [Apigee network patterns](./apigee/network-patterns/)
- **cloud operations** - [Active Directory Federation Services](./cloud-operations/adfs), [Cloud Asset Inventory feeds for resource change tracking and remediation](./cloud-operations/asset-inventory-feed-remediation), [Fine-grained Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Cloud DNS & Shared VPC design](./cloud-operations/dns-shared-vpc), [Delegated Role Grants](./cloud-operations/iam-delegated-role-grants), [Networking Dashboard](./cloud-operations/network-dashboard), [Managing on-prem service account keys by uploading public keys](./cloud-operations/onprem-sa-key-management), [Compute Image builder with Hashicorp Packer](./cloud-operations/packer-image-builder), [Packer example](./cloud-operations/packer-image-builder/packer), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Configuring workload identity federation for Terraform Cloud/Enterprise workflow](./cloud-operations/terraform-enterprise-wif), [TCP healthcheck and restart for unmanaged GCE instances](./cloud-operations/unmanaged-instances-healthcheck), [Migrate for Compute Engine (v5) blueprints](./cloud-operations/vm-migration), [Configuring workload identity federation to access Google Cloud resources from apps running on Azure](./cloud-operations/workload-identity-federation)
- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground)
- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops)
- **factories** - [The why and the how of Resource Factories](./factories), [Google Cloud Identity Group Factory](./factories/cloud-identity-group-factory), [Google Cloud BQ Factory](./factories/bigquery-factory), [Google Cloud VPC Firewall Factory](./factories/net-vpc-firewall-yaml), [Minimal Project Factory](./factories/project-factory)
- **GKE** - [Binary Authorization Pipeline Blueprint](./gke/binauthz), [Storage API](./gke/binauthz/image), [Multi-cluster mesh on GKE (fleet API)](./gke/multi-cluster-mesh-gke-fleet-api), [GKE Multitenant Blueprint](./gke/multitenant-fleet), [Shared VPC with GKE support](./networking/shared-vpc-gke/)
- **networking** - [Decentralized firewall management](./networking/decentralized-firewall), [Decentralized firewall validator](./networking/decentralized-firewall/validator), [Network filtering with Squid](./networking/filtering-proxy), [Network filtering with Squid with isolated VPCs using Private Service Connect](./networking/filtering-proxy-psc), [HTTP Load Balancer with Cloud Armor](./networking/glb-and-armor), [Hub and Spoke via VPN](./networking/hub-and-spoke-vpn), [Hub and Spoke via VPC Peering](./networking/hub-and-spoke-peering), [Internal Load Balancer as Next Hop](./networking/ilb-next-hop), On-prem DNS and Google Private Access, [Calling a private Cloud Function from On-premises](./networking/private-cloud-function-from-onprem), [Hybrid connectivity to on-premise services through PSC](./networking/psc-hybrid), [PSC Producer](./networking/psc-hybrid/psc-producer), [PSC Consumer](./networking/psc-hybrid/psc-consumer), [Shared VPC with optional GKE cluster](./networking/shared-vpc-gke)

Binary file not shown.

Binary file not shown.

View File

@@ -52,6 +52,13 @@ running on a VPC with a private IP and a dedicated Service Account. A GCS bucket
### SQL Server Always On Availability Groups
<a href="./sqlserver-alwayson/" title="SQL Server Always On Availability Groups"><img src="https://cloud.google.com/compute/images/sqlserver-ag-architecture.svg" align="left" width="280px"></a>
This [blueprint](./data-platform-foundations/) implements SQL Server Always On Availability Groups using Fabric modules. It builds a two-node cluster with a fileshare witness instance in an existing VPC and adds the necessary firewalling. The actual setup process (apart from Active Directory operations) has been scripted, so that the least amount of manual work needs to be performed.
This [blueprint](./sqlserver-alwayson/) implements SQL Server Always On Availability Groups using Fabric modules. It builds a two-node cluster with a fileshare witness instance in an existing VPC and adds the necessary firewalling. The actual setup process (apart from Active Directory operations) has been scripted, so that the least amount of manual work needs to be performed.
<br clear="left">
### MLOps with Vertex AI
<a href="./vertex-mlops/" title="MLOps with Vertex AI"><img src="./vertex-mlops/images/mlops_projects.png" align="left" width="280px"></a>
This [blueprint](./vertex-mlops/) implements the infrastructure required for a fully functional MLOps environment using Vertex AI: activation of the required GCP services, Vertex AI Workbench, GCS buckets to host Vertex AI and Cloud Build artifacts, an Artifact Registry Docker repository to host custom images, the required service accounts, networking, and an optional Workload Identity Federation provider for GitHub integration.
<br clear="left">

View File

@@ -0,0 +1,79 @@
# MLOps with Vertex AI
## Introduction
This example implements the infrastructure required to deploy an end-to-end [MLOps process](https://services.google.com/fh/files/misc/practitioners_guide_to_mlops_whitepaper.pdf) using the [Vertex AI](https://cloud.google.com/vertex-ai) platform.
## GCP resources
The blueprint will deploy all the resources required for a fully functional MLOps environment, containing:
- Vertex AI Workbench (for the experimentation environment)
- GCP project (optional) to host all the resources
- Isolated VPC network and a subnet to be used by Vertex AI and Dataflow. Alternatively, an external Shared VPC can be configured using the `network_config` variable (see the sketch after this list).
- Firewall rule to allow the internal subnet communication required by Dataflow
- Cloud NAT required to reach the internet from the different computing resources (Vertex and Dataflow)
- GCS buckets to host Vertex AI and Cloud Build artifacts. By default the buckets are regional and should match the Vertex AI region used for the different resources (e.g. Vertex AI Managed Datasets) and processes (e.g. Vertex AI training).
- BigQuery dataset where the training data will be stored. This is optional, since the training data could already be hosted in an existing BigQuery dataset.
- Artifact Registry Docker repository to host the custom images.
- Service account (`mlops-[env]@`) with the minimum permissions required by Vertex AI and Dataflow (if Dataflow is used inside Vertex AI pipelines).
- Service account (`github@`) to be used by Workload Identity Federation to federate GitHub identities (optional).
- Secret Manager secret to store the GitHub SSH key used to access the CI/CD code repository.
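When attaching to an existing Shared VPC, a minimal sketch of the `network_config` variable could look like this (the host project and self links below are hypothetical):
```hcl
network_config = {
  host_project      = "my-host-project" # hypothetical Shared VPC host project
  network_self_link = "https://www.googleapis.com/compute/v1/projects/my-host-project/global/networks/my-vpc"
  subnet_self_link  = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/europe-west4/subnetworks/my-subnet"
}
```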
![MLOps project description](./images/mlops_projects.png "MLOps project description")
## Prerequisites
### User groups
Assigning roles to user groups is a way to decouple the final set of permissions from the stage where entities and resources are created and their IAM bindings are defined. You can configure the group names through the `groups` variable. These groups should be created before running Terraform.
We use the following groups to control access to resources:
- *Data Scientists* (gcp-ml-ds@<company.org>). They manage notebooks and create ML pipelines.
- *ML Engineers* (gcp-ml-eng@<company.org>). They manage the different Vertex AI resources.
- *ML Viewers* (gcp-ml-viewer@<company.org>). Group with viewer permissions on the different resources.
Please note that these role assignments are not suitable for production-grade environments. Roles can be customized in the `main.tf` file.
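A minimal sketch of the `groups` variable, assuming a hypothetical `example.com` Cloud Identity domain:
```hcl
groups = {
  gcp-ml-ds     = "gcp-ml-ds@example.com"
  gcp-ml-eng    = "gcp-ml-eng@example.com"
  gcp-ml-viewer = "gcp-ml-viewer@example.com"
}
```
Any group left as `null` is simply skipped when the opinionated IAM bindings are built (see the `group_iam` local in `main.tf`).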
## Instructions
### Deploy the experimentation environment
- Create a `terraform.tfvars` file and specify the variables to match your desired configuration. You can use the provided `terraform.tfvars.sample` as a reference (a minimal sketch is also shown below).
- Run `terraform init` and `terraform apply`.
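For reference, a minimal `terraform.tfvars` sketch for creating a new project (all identifiers below are hypothetical):
```hcl
project_id = "my-mlops-dev" # combined with var.prefix by the project module
prefix     = "myco"
project_create = {
  billing_account_id = "000000-000000-000000"
  parent             = "folders/000000000000"
}
bucket_name  = "my-mlops-artifacts" # Vertex AI artifacts bucket
dataset_name = "my_training_data"   # BigQuery dataset for training data
```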
## What's next?
This blueprint can be used as a building block for an end-to-end MLOps solution. As a next step, you can follow this [guide](https://cloud.google.com/architecture/architecture-for-mlops-using-tfx-kubeflow-pipelines-and-cloud-build) to set up a Vertex AI pipeline and run it on the deployed infrastructure.
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [project_id](variables.tf#L101) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ | |
| [bucket_name](variables.tf#L18) | GCS bucket name to store the Vertex AI artifacts. | <code>string</code> | | <code>null</code> |
| [dataset_name](variables.tf#L24) | BigQuery Dataset to store the training data. | <code>string</code> | | <code>null</code> |
| [groups](variables.tf#L30) | Name of the groups (name@domain.org) to apply opinionated IAM permissions. | <code title="object&#40;&#123;&#10; gcp-ml-ds &#61; string&#10; gcp-ml-eng &#61; string&#10; gcp-ml-viewer &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code title="&#123;&#10; gcp-ml-ds &#61; null&#10; gcp-ml-eng &#61; null&#10; gcp-ml-viewer &#61; null&#10;&#125;">&#123;&#8230;&#125;</code> |
| [identity_pool_claims](variables.tf#L45) | Claims to be used by Workload Identity Federation (e.g. attribute.repository/ORGANIZATION/REPO). If a non-null value is provided, the google_iam_workload_identity_pool resource will be created. | <code>string</code> | | <code>null</code> |
| [labels](variables.tf#L51) | Labels to be assigned at project level. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
| [location](variables.tf#L57) | Location used for multi-regional resources. | <code>string</code> | | <code>&#34;eu&#34;</code> |
| [network_config](variables.tf#L63) | Shared VPC network configurations to use. If null, networks will be created in projects with preconfigured values. | <code title="object&#40;&#123;&#10; host_project &#61; string&#10; network_self_link &#61; string&#10; subnet_self_link &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [notebooks](variables.tf#L73) | Vertex AI Workbench instances to be deployed. | <code title="map&#40;object&#40;&#123;&#10; owner &#61; string&#10; region &#61; string&#10; subnet &#61; string&#10; internal_ip_only &#61; optional&#40;bool, false&#41;&#10; idle_shutdown &#61; optional&#40;bool&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [prefix](variables.tf#L86) | Prefix used for the project id. | <code>string</code> | | <code>null</code> |
| [project_create](variables.tf#L92) | Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | <code title="object&#40;&#123;&#10; billing_account_id &#61; string&#10; parent &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [project_services](variables.tf#L106) | List of core services enabled on all projects. | <code>list&#40;string&#41;</code> | | <code title="&#91;&#10; &#34;aiplatform.googleapis.com&#34;,&#10; &#34;artifactregistry.googleapis.com&#34;,&#10; &#34;bigquery.googleapis.com&#34;,&#10; &#34;cloudbuild.googleapis.com&#34;,&#10; &#34;compute.googleapis.com&#34;,&#10; &#34;datacatalog.googleapis.com&#34;,&#10; &#34;dataflow.googleapis.com&#34;,&#10; &#34;iam.googleapis.com&#34;,&#10; &#34;monitoring.googleapis.com&#34;,&#10; &#34;notebooks.googleapis.com&#34;,&#10; &#34;secretmanager.googleapis.com&#34;,&#10; &#34;servicenetworking.googleapis.com&#34;,&#10; &#34;serviceusage.googleapis.com&#34;&#10;&#93;">&#91;&#8230;&#93;</code> |
| [region](variables.tf#L126) | Region used for regional resources. | <code>string</code> | | <code>&#34;europe-west4&#34;</code> |
| [repo_name](variables.tf#L132) | Cloud Source Repository name. Set to null to avoid creating it. | <code>string</code> | | <code>null</code> |
| [sa_mlops_name](variables.tf#L138) | Name for the MLOps Service Account. | <code>string</code> | | <code>&#34;sa-mlops&#34;</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [github](outputs.tf#L33) | GitHub configuration. | |
| [notebook](outputs.tf#L39) | Vertex AI managed notebook details. | |
| [project](outputs.tf#L44) | The project resource as returned by the `project` module. | |
| [project_id](outputs.tf#L49) | Project ID. | |
<!-- END TFDOC -->
# TODO
- Add support for User Managed Notebooks, an SA permissions option, and a non-default SA for Single User mode.
- Improve default naming for the local VPC and Cloud NAT.

View File

@@ -0,0 +1,74 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
resource "google_iam_workload_identity_pool" "github_pool" {
count = var.identity_pool_claims == null ? 0 : 1
project = module.project.project_id
workload_identity_pool_id = "gh-pool"
display_name = "Github Actions Identity Pool"
description = "Identity pool for Github Actions"
}
resource "google_iam_workload_identity_pool_provider" "github_provider" {
count = var.identity_pool_claims == null ? 0 : 1
project = module.project.project_id
workload_identity_pool_id = google_iam_workload_identity_pool.github_pool[0].workload_identity_pool_id
workload_identity_pool_provider_id = "gh-provider"
display_name = "Github Actions provider"
description = "OIDC provider for Github Actions"
attribute_mapping = {
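    # map claims from the GitHub OIDC token to Google attributes; the mapped
    # attribute.repository is what var.identity_pool_claims matches against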
"google.subject" = "assertion.sub"
"attribute.repository" = "assertion.repository"
}
oidc {
issuer_uri = "https://token.actions.githubusercontent.com"
}
}
module "artifact_registry" {
source = "../../../modules/artifact-registry"
id = "docker-repo"
project_id = module.project.project_id
location = var.region
format = "DOCKER"
# iam = {
# "roles/artifactregistry.admin" = ["group:cicd@example.com"]
# }
}
module "service-account-github" {
source = "../../../modules/iam-service-account"
name = "sa-github"
project_id = module.project.project_id
iam = var.identity_pool_claims == null ? {} : { "roles/iam.workloadIdentityUser" = ["principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github_pool[0].name}/${var.identity_pool_claims}"] }
}
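# With identity_pool_claims = "attribute.repository/ORGANIZATION/REPO", the
# principalSet member granted roles/iam.workloadIdentityUser above expands to:
# principalSet://iam.googleapis.com/projects/<project-number>/locations/global/workloadIdentityPools/gh-pool/attribute.repository/ORGANIZATION/REPO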
# NOTE: Secret manager module at the moment does not support CMEK
module "secret-manager" {
project_id = module.project.project_id
source = "../../../modules/secret-manager"
secrets = {
github-key = [var.region]
}
iam = {
github-key = {
"roles/secretmanager.secretAccessor" = [
"serviceAccount:${module.project.service_accounts.robots.cloudbuild}",
module.service-account-mlops.iam_email
]
}
}
}

Binary file not shown.


View File

@@ -0,0 +1,278 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
locals {
group_iam = merge(
var.groups.gcp-ml-viewer == null ? {} : {
(var.groups.gcp-ml-viewer) = [
"roles/aiplatform.viewer",
"roles/artifactregistry.reader",
"roles/dataflow.viewer",
"roles/logging.viewer",
"roles/storage.objectViewer"
]
},
var.groups.gcp-ml-ds == null ? {} : {
(var.groups.gcp-ml-ds) = [
"roles/aiplatform.admin",
"roles/artifactregistry.admin",
"roles/bigquery.dataEditor",
"roles/bigquery.jobUser",
"roles/bigquery.user",
"roles/cloudbuild.builds.editor",
"roles/cloudfunctions.developer",
"roles/dataflow.developer",
"roles/dataflow.worker",
"roles/iam.serviceAccountUser",
"roles/logging.logWriter",
"roles/logging.viewer",
"roles/notebooks.admin",
"roles/pubsub.editor",
"roles/serviceusage.serviceUsageConsumer",
"roles/storage.admin"
]
},
var.groups.gcp-ml-eng == null ? {} : {
(var.groups.gcp-ml-eng) = [
"roles/aiplatform.admin",
"roles/artifactregistry.admin",
"roles/bigquery.dataEditor",
"roles/bigquery.jobUser",
"roles/bigquery.user",
"roles/dataflow.developer",
"roles/dataflow.worker",
"roles/iam.serviceAccountUser",
"roles/logging.logWriter",
"roles/logging.viewer",
"roles/serviceusage.serviceUsageConsumer",
"roles/storage.admin"
]
}
)
service_encryption_keys = var.service_encryption_keys
shared_vpc_project = try(var.network_config.host_project, null)
subnet = (
local.use_shared_vpc
? var.network_config.subnet_self_link
: values(module.vpc-local.0.subnet_self_links)[0]
)
vpc = (
local.use_shared_vpc
? var.network_config.network_self_link
: module.vpc-local.0.self_link
)
use_shared_vpc = var.network_config != null
shared_vpc_bindings = {
"roles/compute.networkUser" = [
"robot-df", "notebooks"
]
}
shared_vpc_role_members = {
robot-df = "serviceAccount:${module.project.service_accounts.robots.dataflow}"
notebooks = "serviceAccount:${module.project.service_accounts.robots.notebooks}"
}
# reassemble in a format suitable for for_each
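  # e.g. "roles/compute.networkUser-robot-df" = {
  #   role = "roles/compute.networkUser", member = "robot-df" }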
shared_vpc_bindings_map = {
for binding in flatten([
for role, members in local.shared_vpc_bindings : [
for member in members : { role = role, member = member }
]
]) : "${binding.role}-${binding.member}" => binding
}
}
module "gcs-bucket" {
count = var.bucket_name == null ? 0 : 1
source = "../../../modules/gcs"
project_id = module.project.project_id
name = var.bucket_name
prefix = var.prefix
location = var.region
storage_class = "REGIONAL"
versioning = false
encryption_key = try(local.service_encryption_keys.storage, null)
}
# Default bucket for Cloud Build to prevent error: "'us' violates constraint constraints/gcp.resourceLocations"
# https://stackoverflow.com/questions/53206667/cloud-build-fails-with-resource-location-constraint
module "gcs-bucket-cloudbuild" {
source = "../../../modules/gcs"
project_id = module.project.project_id
name = "${var.project_id}_cloudbuild"
prefix = var.prefix
location = var.region
storage_class = "REGIONAL"
versioning = false
encryption_key = try(local.service_encryption_keys.storage, null)
}
module "bq-dataset" {
count = var.dataset_name == null ? 0 : 1
source = "../../../modules/bigquery-dataset"
project_id = module.project.project_id
id = var.dataset_name
location = var.region
encryption_key = try(local.service_encryption_keys.bq, null)
}
module "vpc-local" {
count = local.use_shared_vpc ? 0 : 1
source = "../../../modules/net-vpc"
project_id = module.project.project_id
name = "default"
subnets = [
{
"name" : "default",
"region" : "${var.region}",
"ip_cidr_range" : "10.4.0.0/24",
"secondary_ip_range" : null
}
]
psa_config = {
ranges = {
"vertex" : "10.13.0.0/18"
}
routes = null
}
}
module "firewall" {
count = local.use_shared_vpc ? 0 : 1
source = "../../../modules/net-vpc-firewall"
project_id = module.project.project_id
network = module.vpc-local[0].name
default_rules_config = {
disabled = true
}
ingress_rules = {
dataflow-ingress = {
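      # Dataflow workers communicate with each other on TCP ports 12345-12346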
description = "Dataflow service."
direction = "INGRESS"
action = "allow"
sources = ["dataflow"]
targets = ["dataflow"]
ranges = []
use_service_accounts = false
rules = [{ protocol = "tcp", ports = ["12345-12346"] }]
extra_attributes = {}
}
}
}
module "cloudnat" {
count = local.use_shared_vpc ? 0 : 1
source = "../../../modules/net-cloudnat"
project_id = module.project.project_id
region = var.region
name = "default"
router_network = module.vpc-local[0].self_link
}
module "project" {
source = "../../../modules/project"
name = var.project_id
parent = try(var.project_create.parent, null)
billing_account = try(var.project_create.billing_account_id, null)
project_create = var.project_create != null
prefix = var.prefix
group_iam = local.group_iam
iam = {
"roles/aiplatform.user" = [module.service-account-mlops.iam_email]
"roles/artifactregistry.reader" = [module.service-account-mlops.iam_email]
"roles/artifactregistry.writer" = [module.service-account-github.iam_email]
"roles/bigquery.dataEditor" = [module.service-account-mlops.iam_email]
"roles/bigquery.jobUser" = [module.service-account-mlops.iam_email]
"roles/bigquery.user" = [module.service-account-mlops.iam_email]
"roles/cloudbuild.builds.editor" = [
module.service-account-mlops.iam_email,
module.service-account-github.iam_email
]
"roles/cloudfunctions.invoker" = [module.service-account-mlops.iam_email]
"roles/dataflow.developer" = [module.service-account-mlops.iam_email]
"roles/dataflow.worker" = [module.service-account-mlops.iam_email]
"roles/iam.serviceAccountUser" = [
module.service-account-mlops.iam_email,
"serviceAccount:${module.project.service_accounts.robots.cloudbuild}"
]
"roles/monitoring.metricWriter" = [module.service-account-mlops.iam_email]
"roles/run.invoker" = [module.service-account-mlops.iam_email]
"roles/serviceusage.serviceUsageConsumer" = [
module.service-account-mlops.iam_email,
module.service-account-github.iam_email
]
"roles/storage.admin" = [
module.service-account-mlops.iam_email,
module.service-account-github.iam_email
]
}
labels = var.labels
org_policies = {
# Example of applying a project wide policy
# "constraints/compute.requireOsLogin" = {
# enforce = false
# }
}
service_encryption_key_ids = {
bq = [try(local.service_encryption_keys.bq, null)]
compute = [try(local.service_encryption_keys.compute, null)]
cloudbuild = [try(local.service_encryption_keys.storage, null)]
notebooks = [try(local.service_encryption_keys.compute, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
services = var.project_services
shared_vpc_service_config = local.shared_vpc_project == null ? null : {
attach = true
host_project = local.shared_vpc_project
}
}
module "service-account-mlops" {
source = "../../../modules/iam-service-account"
name = var.sa_mlops_name
project_id = module.project.project_id
iam = {
"roles/iam.serviceAccountUser" = [module.service-account-github.iam_email]
}
}
resource "google_project_iam_member" "shared_vpc" {
count = local.use_shared_vpc ? 1 : 0
project = var.network_config.host_project
role = "roles/compute.networkUser"
member = "serviceAccount:${module.project.service_accounts.robots.notebooks}"
}
resource "google_sourcerepo_repository" "code-repo" {
count = var.repo_name == null ? 0 : 1
name = var.repo_name
project = module.project.project_id
}

View File

@@ -0,0 +1,60 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
resource "google_notebooks_runtime" "runtime" {
for_each = var.notebooks
name = each.key
project = module.project.project_id
location = var.notebooks[each.key].region
access_config {
access_type = "SINGLE_USER"
runtime_owner = var.notebooks[each.key].owner
}
software_config {
enable_health_monitoring = true
idle_shutdown = var.notebooks[each.key].idle_shutdown
idle_shutdown_timeout = 1800
}
virtual_machine {
virtual_machine_config {
machine_type = "n1-standard-4"
network = local.vpc
subnet = local.subnet
internal_ip_only = var.notebooks[each.key].internal_ip_only
dynamic "encryption_config" {
for_each = try(local.service_encryption_keys.compute, null) == null ? [] : [1]
content {
kms_key = local.service_encryption_keys.compute
}
}
metadata = {
notebook-disable-nbconvert = "false"
notebook-disable-downloads = "false"
notebook-disable-terminal = "false"
#notebook-disable-root = "true"
#notebook-upgrade-schedule = "48 4 * * MON"
}
data_disk {
initialize_params {
disk_size_gb = "100"
disk_type = "PD_STANDARD"
}
}
}
}
}

View File

@@ -0,0 +1,52 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
# TODO(): proper outputs
locals {
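  # module.artifact_registry.id has the form
  # projects/<project>/locations/<region>/repositories/<repo>; the split below
  # rebuilds the Docker URL <region>-docker.pkg.dev/<project>/<repo>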
docker_split = try(split("/", module.artifact_registry.id), null)
docker_repo = try("${local.docker_split[3]}-docker.pkg.dev/${local.docker_split[1]}/${local.docker_split[5]}", null)
gh_config = {
WORKLOAD_ID_PROVIDER = try(google_iam_workload_identity_pool_provider.github_provider[0].name, null)
SERVICE_ACCOUNT = try(module.service-account-github.email, null)
PROJECT_ID = module.project.project_id
DOCKER_REPO = local.docker_repo
SA_MLOPS = module.service-account-mlops.email
SUBNETWORK = local.subnet
}
}
output "github" {
description = "Github Configuration."
value = local.gh_config
}
output "notebook" {
description = "Vertex AI managed notebook details."
value = { for k, v in resource.google_notebooks_runtime.runtime : k => v.id }
}
output "project" {
description = "The project resource as return by the `project` module."
value = module.project
}
output "project_id" {
description = "Project ID."
value = module.project.project_id
}

View File

@@ -0,0 +1,20 @@
bucket_name = "creditcards-dev"
dataset_name = "creditcards"
identity_pool_claims = "attribute.repository/ORGANIZATION/REPO"
labels = {
"env" : "dev",
"team" : "ml"
}
notebooks = {
"myworkbench" : {
"owner" : "user@example.com",
"region" : "europe-west4",
"subnet" : "default",
}
}
prefix = "pref"
project_id = "creditcards-dev"
project_create = {
billing_account_id = "000000-123456-123456"
parent = "folders/111111111111"
}

View File

@@ -0,0 +1,152 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
variable "bucket_name" {
description = "GCS bucket name to store the Vertex AI artifacts."
type = string
default = null
}
variable "dataset_name" {
description = "BigQuery Dataset to store the training data."
type = string
default = null
}
variable "groups" {
description = "Name of the groups (name@domain.org) to apply opinionated IAM permissions."
type = object({
gcp-ml-ds = string
gcp-ml-eng = string
gcp-ml-viewer = string
})
default = {
gcp-ml-ds = null
gcp-ml-eng = null
gcp-ml-viewer = null
}
nullable = false
}
variable "identity_pool_claims" {
description = "Claims to be used by Workload Identity Federation (i.e.: attribute.repository/ORGANIZATION/REPO). If a not null value is provided, then google_iam_workload_identity_pool resource will be created."
type = string
default = null
}
variable "labels" {
description = "Labels to be assigned at project level."
type = map(string)
default = {}
}
variable "location" {
description = "Location used for multi-regional resources."
type = string
default = "eu"
}
variable "network_config" {
description = "Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values."
type = object({
host_project = string
network_self_link = string
subnet_self_link = string
})
default = null
}
variable "notebooks" {
description = "Vertex AI workbenchs to be deployed."
type = map(object({
owner = string
region = string
subnet = string
internal_ip_only = optional(bool, false)
idle_shutdown = optional(bool)
}))
default = {}
nullable = false
}
variable "prefix" {
description = "Prefix used for the project id."
type = string
default = null
}
variable "project_create" {
description = "Provide values if project creation is needed, uses existing project if null. Parent is in 'folders/nnn' or 'organizations/nnn' format."
type = object({
billing_account_id = string
parent = string
})
default = null
}
variable "project_id" {
description = "Project id, references existing project if `project_create` is null."
type = string
}
variable "project_services" {
description = "List of core services enabled on all projects."
type = list(string)
default = [
"aiplatform.googleapis.com",
"artifactregistry.googleapis.com",
"bigquery.googleapis.com",
"cloudbuild.googleapis.com",
"compute.googleapis.com",
"datacatalog.googleapis.com",
"dataflow.googleapis.com",
"iam.googleapis.com",
"monitoring.googleapis.com",
"notebooks.googleapis.com",
"secretmanager.googleapis.com",
"servicenetworking.googleapis.com",
"serviceusage.googleapis.com"
]
}
variable "region" {
description = "Region used for regional resources."
type = string
default = "europe-west4"
}
variable "repo_name" {
description = "Cloud Source Repository name. null to avoid to create it."
type = string
default = null
}
variable "sa_mlops_name" {
description = "Name for the MLOPs Service Account."
type = string
default = "sa-mlops"
}
variable "service_encryption_keys" { # service encription key
description = "Cloud KMS to use to encrypt different services. Key location should match service region."
type = object({
bq = string
compute = string
storage = string
})
default = null
}

View File

@@ -0,0 +1,13 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

View File

@@ -0,0 +1,39 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
module "projects" {
source = "../../../../../blueprints/data-solutions/vertex-mlops/"
labels = {
"env" : "dev",
"team" : "ml"
}
bucket_name = "test-dev"
dataset_name = "test"
identity_pool_claims = "attribute.repository/ORGANIZATION/REPO"
notebooks = {
"myworkbench" : {
"owner" : "user@example.com",
"region" : "europe-west4",
"subnet" : "default",
}
}
prefix = "pref"
project_id = "test-dev"
project_create = {
billing_account_id = "000000-123456-123456"
parent = "folders/111111111111"
}
}

View File

@@ -0,0 +1,23 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import pytest
FIXTURES_DIR = os.path.join(os.path.dirname(__file__), 'fixture')
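# e2e_plan_runner is a pytest fixture provided by the repository's shared test
# harness; it runs a Terraform plan on the given fixture directory and returns
# the planned modules and resources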
def test_resources(e2e_plan_runner):
"Test that plan works and the numbers of resources is as expected."
modules, resources = e2e_plan_runner(FIXTURES_DIR)
# TODO: to re-enable the per-module resource count check, print _, then test
assert len(modules) > 0 and len(resources) > 0