Merge pull request #1210 from GoogleCloudPlatform/lcaggio/bqml

Blueprint - BigQuery ML and Vertex AI Pipeline
This commit is contained in:
lcaggio 2023-03-06 13:51:02 +01:00 committed by GitHub
commit ca31192570
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
14 changed files with 1025 additions and 1 deletions

View File

@ -6,7 +6,7 @@ Currently available blueprints:
- **apigee** - [Apigee Hybrid on GKE](./apigee/hybrid-gke/), [Apigee X analytics in BigQuery](./apigee/bigquery-analytics), [Apigee network patterns](./apigee/network-patterns/)
- **cloud operations** - [Active Directory Federation Services](./cloud-operations/adfs), [Cloud Asset Inventory feeds for resource change tracking and remediation](./cloud-operations/asset-inventory-feed-remediation), [Fine-grained Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Cloud DNS & Shared VPC design](./cloud-operations/dns-shared-vpc), [Delegated Role Grants](./cloud-operations/iam-delegated-role-grants), [Networking Dashboard](./cloud-operations/network-dashboard), [Managing on-prem service account keys by uploading public keys](./cloud-operations/onprem-sa-key-management), [Compute Image builder with Hashicorp Packer](./cloud-operations/packer-image-builder), [Packer example](./cloud-operations/packer-image-builder/packer), [Compute Engine quota monitoring](./cloud-operations/quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Configuring workload identity federation with Terraform Cloud/Enterprise workflows](./cloud-operations/terraform-cloud-dynamic-credentials), [TCP healthcheck and restart for unmanaged GCE instances](./cloud-operations/unmanaged-instances-healthcheck), [Migrate for Compute Engine (v5) blueprints](./cloud-operations/vm-migration), [Configuring workload identity federation to access Google Cloud resources from apps running on Azure](./cloud-operations/workload-identity-federation)
- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [#SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops), [Shielded Folder](./data-solutions/shielded-folder)
- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud Composer version 2 private instance, supporting Shared VPC and external CMEK key](./data-solutions/composer-2), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [#SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops), [Shielded Folder](./data-solutions/shielded-folder), [BigQuery ML and Vertex AI Pipeline](./data-solutions/bq-ml)
- **factories** - [The why and the how of Resource Factories](./factories), [Google Cloud Identity Group Factory](./factories/cloud-identity-group-factory), [Google Cloud BQ Factory](./factories/bigquery-factory), [Google Cloud VPC Firewall Factory](./factories/net-vpc-firewall-yaml), [Minimal Project Factory](./factories/project-factory)
- **GKE** - [Binary Authorization Pipeline Blueprint](./gke/binauthz), [Storage API](./gke/binauthz/image), [Multi-cluster mesh on GKE (fleet API)](./gke/multi-cluster-mesh-gke-fleet-api), [GKE Multitenant Blueprint](./gke/multitenant-fleet), [Shared VPC with GKE support](./networking/shared-vpc-gke/)
- **networking** - [Calling a private Cloud Function from On-premises](./networking/private-cloud-function-from-onprem), [Decentralized firewall management](./networking/decentralized-firewall), [Decentralized firewall validator](./networking/decentralized-firewall/validator), [Network filtering with Squid](./networking/filtering-proxy), [GLB and multi-regional daisy-chaining through hybrid NEGs](./networking/glb-hybrid-neg-internal), [Hybrid connectivity to on-premise services through PSC](./networking/psc-hybrid), [HTTP Load Balancer with Cloud Armor](./networking/glb-and-armor), [Hub and Spoke via VPN](./networking/hub-and-spoke-vpn), [Hub and Spoke via VPC Peering](./networking/hub-and-spoke-peering), [Internal Load Balancer as Next Hop](./networking/ilb-next-hop), [Network filtering with Squid with isolated VPCs using Private Service Connect](./networking/filtering-proxy-psc), On-prem DNS and Google Private Access, [PSC Producer](./networking/psc-hybrid/psc-producer), [PSC Consumer](./networking/psc-hybrid/psc-consumer), [Shared VPC with optional GKE cluster](./networking/shared-vpc-gke)

View File

@ -69,3 +69,9 @@ This [blueprint](./vertex-mlops/) implements the infrastructure required to have
This [blueprint](./shielded-folder/) implements an opinionated folder configuration according to GCP best practices. Configurations implemented on the folder would be beneficial to host workloads inheriting constraints from the folder they belong to.
<br clear="left">
### BigQuery ML and Vertex AI Pipeline
<a href="./bq-ml/" title="BigQuery ML and Vertex AI Pipeline"><img src="./bq-ml/images/diagram.png" align="left" width="280px"></a>
This [blueprint](./bq-ml/) provides the necessary infrastructure to create a complete development environment for building and deploying machine learning models using BigQuery ML and Vertex AI. With this blueprint, you can deploy your models to a Vertex AI endpoint or use them within BigQuery ML.
<br clear="left">

View File

@ -0,0 +1,102 @@
# BigQuery ML and Vertex AI Pipeline
This blueprint provides the necessary infrastructure to create a complete development environment for building and deploying machine learning models using BigQuery ML and Vertex AI. With this blueprint, you can deploy your models to a Vertex AI endpoint or use them within BigQuery ML.
This is the high-level diagram:
![High-level diagram](diagram.png "High-level diagram")
It also includes the IAM wiring needed to make such scenarios work. Regional resources are used in this example, but the same logic applies to dual-region, multi-region, or global resources.
The example is designed to match real-world use cases with a minimal amount of resources, and can be used as a starting point for your scenario.
## Managed resources and services
This sample creates several distinct groups of resources:
- Networking
  - VPC network
  - Subnet
  - Firewall rules for SSH access via IAP and open communication within the VPC
  - Cloud NAT
- IAM
  - Vertex AI Workbench service account
  - Vertex AI pipeline service account
- Storage
  - GCS bucket
  - BigQuery dataset
## Customization
### Virtual Private Cloud (VPC) design
As is often the case in real-world configurations, this blueprint accepts an existing Shared VPC as input via the `vpc_config` variable.
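As a reference, a Shared VPC configuration passed to the blueprint could look like the following (all project, network, and subnet names here are placeholders):
```hcl
vpc_config = {
  host_project      = "my-host-project"
  network_self_link = "https://www.googleapis.com/compute/v1/projects/my-host-project/global/networks/my-shared-vpc"
  subnet_self_link  = "https://www.googleapis.com/compute/v1/projects/my-host-project/regions/us-central1/subnetworks/my-subnet"
}
```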
### Customer Managed Encryption Keys
As is often the case in real-world configurations, this blueprint accepts existing Cloud KMS keys as input via the `service_encryption_keys` variable, and uses them to encrypt resources.
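As a sketch, a CMEK configuration could look like the following (key ring and key names are placeholders; per the `service_encryption_keys` variable description, each key location should match the region of the service it encrypts):
```hcl
service_encryption_keys = {
  bq      = "projects/my-kms-project/locations/us/keyRings/my-keyring/cryptoKeys/bq-key"
  compute = "projects/my-kms-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/compute-key"
  storage = "projects/my-kms-project/locations/us/keyRings/my-keyring/cryptoKeys/storage-key"
}
```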
## Demo
In the [`demo`](./demo/) folder, you can find an example of creating a Vertex AI pipeline from a publicly available dataset and deploying the model to be used from a Vertex AI managed endpoint or from within BigQuery.
To run the demo:
- Connect to the Vertex AI workbench instance
- Clone this repository
- Run the [`demo/bmql_pipeline.ipynb`](demo/bmql_pipeline.ipynb) Jupyter notebook
## Files
| name | description | modules | resources |
|---|---|---|---|
| [datastorage.tf](./datastorage.tf) | Datastorage resources. | <code>bigquery-dataset</code> · <code>gcs</code> | |
| [main.tf](./main.tf) | Core resources. | <code>project</code> | |
| [outputs.tf](./outputs.tf) | Output variables. | | |
| [variables.tf](./variables.tf) | Terraform variables. | | |
| [versions.tf](./versions.tf) | Version pins. | | |
| [vertex.tf](./vertex.tf) | Vertex resources. | <code>iam-service-account</code> | <code>google_notebooks_instance</code> · <code>google_vertex_ai_metadata_store</code> |
| [vpc.tf](./vpc.tf) | VPC resources. | <code>net-cloudnat</code> · <code>net-vpc</code> · <code>net-vpc-firewall</code> | <code>google_project_iam_member</code> |
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [prefix](variables.tf#L23) | Prefix used for resource names. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L41) | Project id references existing project if `project_create` is null. | <code>string</code> | ✓ | |
| [location](variables.tf#L17) | The location where resources will be deployed. | <code>string</code> | | <code>&#34;US&#34;</code> |
| [project_create](variables.tf#L32) | Provide values if project creation is needed, use existing project if null. Parent format: folders/folder_id or organizations/org_id. | <code title="object&#40;&#123;&#10; billing_account_id &#61; string&#10; parent &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [region](variables.tf#L46) | The region where resources will be deployed. | <code>string</code> | | <code>&#34;us-central1&#34;</code> |
| [service_encryption_keys](variables.tf#L52) | Cloud KMS to use to encrypt different services. The key location should match the service region. | <code title="object&#40;&#123;&#10; bq &#61; string&#10; compute &#61; string&#10; storage &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [vpc_config](variables.tf#L62) | Shared VPC network configurations to use. If null networks will be created in projects with pre-configured values. | <code title="object&#40;&#123;&#10; host_project &#61; string&#10; network_self_link &#61; string&#10; subnet_self_link &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [bucket](outputs.tf#L17) | GCS Bucket URL. | |
| [dataset](outputs.tf#L22) | BigQuery dataset ID. | |
| [notebook](outputs.tf#L27) | Vertex AI notebook details. | |
| [project](outputs.tf#L35) | Project id. | |
| [service-account-vertex](outputs.tf#L40) | Service account to be used for Vertex AI pipelines. | |
| [vertex-ai-metadata-store](outputs.tf#L45) | Vertex AI Metadata Store ID. | |
| [vpc](outputs.tf#L50) | VPC Network. | |
<!-- END TFDOC -->
## Test
```hcl
module "test" {
source = "./fabric/blueprints/data-solutions/bq-ml/"
project_create = {
billing_account_id = "123456-123456-123456"
parent = "folders/12345678"
}
project_id = "project-1"
prefix = "prefix"
}
# tftest modules=9 resources=46
```

View File

@ -0,0 +1,32 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Datastorage resources.
module "bucket" {
source = "../../../modules/gcs"
project_id = module.project.project_id
prefix = var.prefix
location = var.location
name = "data"
encryption_key = try(local.service_encryption_keys.storage, null) # Example assignment of an encryption key
}
module "dataset" {
source = "../../../modules/bigquery-dataset"
project_id = module.project.project_id
id = "${replace(var.prefix, "-", "_")}_data"
encryption_key = try(local.service_encryption_keys.bq, null) # Example assignment of an encryption key
location = var.location
}

View File

@ -0,0 +1,40 @@
# BigQuery ML and Vertex AI Pipeline Demo
This demo shows how to combine BigQuery ML (BQML) and Vertex AI to create an ML pipeline leveraging the infrastructure created in the blueprint.
More in detail, this tutorial focuses on the following three steps:
- define a Vertex AI pipeline to create features, and to train and evaluate BQML models
- serve a BQML model through an API powered by a Vertex AI Endpoint
- create batch predictions via BigQuery
This tutorial also shows how to make explainable predictions, in order to understand which features most influence the model's output.
# Dataset
This tutorial uses a fictitious e-commerce dataset containing programmatically generated data for a fictitious e-commerce store called The Look. The dataset is publicly available on BigQuery at `bigquery-public-data.thelook_ecommerce`.
# Goal
The goal of this tutorial is to train a classification ML model using BigQuery ML that predicts whether a new web session is going to convert.
The tutorial focuses on how to combine Vertex AI and BigQuery ML to create a model that can be used both for near-real-time and batch predictions, rather than on the design of the model itself.
# Main components
In this tutorial we will make use of the following main components:
- BigQuery:
  - standard: to create a view which contains the model features and the target variable
  - ML: to train, evaluate and make batch predictions
- Vertex AI:
  - Pipeline: to define a configurable and reusable set of steps to train and evaluate a BQML model
  - Experiment: to keep track of all the trainings done via the Pipeline
  - Model Registry: to keep track of the trained versions of a specific model
  - Endpoint: to serve the model via API
  - Workbench: to run this demo
# How to get started
1. Access the Vertex AI Workbench
2. Clone this repository
3. Run the [`bmql_pipeline.ipynb`](bmql_pipeline.ipynb) Jupyter notebook

View File

@ -0,0 +1,459 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Copyright 2023 Google LLC**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Copyright 2023 Google LLC\n",
"#\n",
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Install python requirements and import packages"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install -r requirements.txt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import kfp\n",
"import google_cloud_pipeline_components.v1.bigquery as bqop\n",
"\n",
"from google.cloud import aiplatform as aip\n",
"from google.cloud import bigquery"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Set your env variables"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"EXPERIMENT_NAME = 'bqml-experiment'\n",
"ENDPOINT_DISPLAY_NAME = 'bqml-endpoint'\n",
"DATASET = \"{}_data\".format(PREFIX.replace(\"-\",\"_\")) \n",
"LOCATION = 'US'\n",
"MODEL_NAME = 'bqml-model'\n",
"PIPELINE_NAME = 'bqml-vertex-pipeline'\n",
"PIPELINE_ROOT = f\"gs://{PREFIX}-data\"\n",
"PREFIX = 'your-prefix'\n",
"PROJECT_ID = 'your-project-id'\n",
"REGION = 'us-central1'\n",
"SERVICE_ACCOUNT = f\"vertex-sa@{PROJECT_ID}.iam.gserviceaccount.com\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Vertex AI Pipeline Definition\n",
"\n",
"Let's first define the queries for the features and target creation and the query to train the model\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this query creates the features for our model and the target value we would like to predict\n",
"\n",
"features_query = \"\"\"\n",
"CREATE VIEW if NOT EXISTS `{project_id}.{dataset}.ecommerce_abt` AS\n",
"WITH abt AS (\n",
" SELECT user_id,\n",
" session_id,\n",
" city,\n",
" postal_code,\n",
" browser,\n",
" traffic_source,\n",
" min(created_at) AS session_starting_ts,\n",
" sum(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) has_purchased\n",
" FROM `bigquery-public-data.thelook_ecommerce.events` \n",
" GROUP BY user_id,\n",
" session_id,\n",
" city,\n",
" postal_code,\n",
" browser,\n",
" traffic_source\n",
"), previous_orders AS (\n",
" SELECT user_id,\n",
" array_agg (struct(created_at AS order_creations_ts,\n",
" o.order_id,\n",
" o.status,\n",
" oi.order_cost)) as user_orders\n",
" FROM `bigquery-public-data.thelook_ecommerce.orders` o\n",
" JOIN (SELECT order_id,\n",
" sum(sale_price) order_cost \n",
" FROM `bigquery-public-data.thelook_ecommerce.order_items`\n",
" GROUP BY 1) oi\n",
" ON o.order_id = oi.order_id\n",
" GROUP BY 1\n",
")\n",
"SELECT abt.*,\n",
" CASE WHEN extract(DAYOFWEEK FROM session_starting_ts) IN (1,7)\n",
" THEN 'WEEKEND' \n",
" ELSE 'WEEKDAY'\n",
" END AS day_of_week,\n",
" extract(HOUR FROM session_starting_ts) hour_of_day,\n",
" (SELECT count(DISTINCT uo.order_id) \n",
" FROM unnest(user_orders) uo \n",
" WHERE uo.order_creations_ts < session_starting_ts \n",
" AND status IN ('Shipped', 'Complete', 'Processing')) AS number_of_successful_orders,\n",
" IFNULL((SELECT sum(DISTINCT uo.order_cost) \n",
" FROM unnest(user_orders) uo \n",
" WHERE uo.order_creations_ts < session_starting_ts \n",
" AND status IN ('Shipped', 'Complete', 'Processing')), 0) AS sum_previous_orders,\n",
" (SELECT count(DISTINCT uo.order_id) \n",
" FROM unnest(user_orders) uo \n",
" WHERE uo.order_creations_ts < session_starting_ts \n",
" AND status IN ('Cancelled', 'Returned')) AS number_of_unsuccessful_orders\n",
"FROM abt \n",
"LEFT JOIN previous_orders pso \n",
"ON abt.user_id = pso.user_id\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this query create the train job on BQ ML\n",
"train_query = \"\"\"\n",
"CREATE OR REPLACE MODEL `{project_id}.{dataset}.{model_name}`\n",
"OPTIONS(MODEL_TYPE='{model_type}',\n",
" INPUT_LABEL_COLS=['has_purchased'],\n",
" ENABLE_GLOBAL_EXPLAIN=TRUE,\n",
" MODEL_REGISTRY='VERTEX_AI',\n",
" DATA_SPLIT_METHOD = 'RANDOM',\n",
" DATA_SPLIT_EVAL_FRACTION = {split_fraction}\n",
" ) AS \n",
"SELECT * EXCEPT (session_id, session_starting_ts, user_id) \n",
"FROM `{project_id}.{dataset}.ecommerce_abt`\n",
"WHERE extract(ISOYEAR FROM session_starting_ts) = 2022\n",
"\"\"\""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following code block, we are defining our Vertex AI pipeline. It is made up of three main steps:\n",
"1. Create a BigQuery dataset that will contain the BigQuery ML models\n",
"2. Train the BigQuery ML model, in this case, a logistic regression\n",
"3. Evaluate the BigQuery ML model with the standard evaluation metrics\n",
"\n",
"The pipeline takes as input the following variables:\n",
"- ```dataset```: name of the dataset where the artifacts will be stored\n",
"- ```evaluate_job_conf```: bq dict configuration to define where to store evaluation metrics\n",
"- ```location```: BigQuery location\n",
"- ```model_name```: the display name of the BigQuery ML model\n",
"- ```project_id```: the project id where the GCP resources will be created\n",
"- ```split_fraction```: the percentage of data that will be used as an evaluation dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"@kfp.dsl.pipeline(name='bqml-pipeline', pipeline_root=PIPELINE_ROOT)\n",
"def pipeline(\n",
" model_name: str,\n",
" split_fraction: float,\n",
" evaluate_job_conf: dict, \n",
" dataset: str = DATASET,\n",
" project_id: str = PROJECT_ID,\n",
" location: str = LOCATION,\n",
" ):\n",
"\n",
" create_dataset = bqop.BigqueryQueryJobOp(\n",
" project=project_id,\n",
" location=location,\n",
" query=f'CREATE SCHEMA IF NOT EXISTS {dataset}'\n",
" )\n",
"\n",
" create_features_view = bqop.BigqueryQueryJobOp(\n",
" project=project_id,\n",
" location=location,\n",
" query=features_query.format(dataset=dataset, project_id=project_id),\n",
" #job_configuration_query = {\"writeDisposition\": \"WRITE_TRUNCATE\"} #, \"destinationTable\":{\"projectId\":project_id,\"datasetId\":dataset,\"tableId\":\"ecommerce_abt_table\"}} #{\"destinationTable\":{\"projectId\":\"project_id\",\"datasetId\":dataset,\"tableId\":\"ecommerce_abt_table\"}}, #\"writeDisposition\": \"WRITE_TRUNCATE\", \n",
"\n",
" ).after(create_dataset)\n",
"\n",
" create_bqml_model = bqop.BigqueryCreateModelJobOp(\n",
" project=project_id,\n",
" location=location,\n",
" query=train_query.format(model_type = 'LOGISTIC_REG'\n",
" , project_id = project_id\n",
" , dataset = dataset\n",
" , model_name = model_name\n",
" , split_fraction=split_fraction)\n",
" ).after(create_features_view)\n",
"\n",
" evaluate_bqml_model = bqop.BigqueryEvaluateModelJobOp(\n",
" project=project_id,\n",
" location=location,\n",
" model=create_bqml_model.outputs[\"model\"],\n",
" job_configuration_query=evaluate_job_conf\n",
" ).after(create_bqml_model)\n",
"\n",
"\n",
"# this is to compile our pipeline and generate the json description file\n",
"kfp.v2.compiler.Compiler().compile(pipeline_func=pipeline,\n",
" package_path=f'{PIPELINE_NAME}.json') "
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Create Experiment\n",
"\n",
"We will create an experiment to keep track of our training and tasks on a specific issue or problem."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_experiment = aip.Experiment.get_or_create(\n",
" experiment_name=EXPERIMENT_NAME,\n",
" description='This is a new experiment to keep track of bqml trainings',\n",
" project=PROJECT_ID,\n",
" location=REGION\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Running the same training Vertex AI pipeline with different parameters\n",
"\n",
"One of the main tasks during the training phase is to compare different models or to try the same model with different inputs. We can leverage the power of Vertex AI Pipelines to submit the same steps with different training parameters. Thanks to the experiments artifact, it is possible to easily keep track of all the tests that have been done. This simplifies the process of selecting the best model to deploy.\n",
"\n",
"In this demo case, we will run the same training pipeline while changing the split data percentage between training and test data."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# this configuration is needed in order to persist the evaluation metrics on big query\n",
"job_configuration_query = {\"destinationTable\": {\"projectId\": PROJECT_ID, \"datasetId\": DATASET}, \"writeDisposition\": \"WRITE_TRUNCATE\"}\n",
"\n",
"for split_fraction in [0.1, 0.2]:\n",
" job_configuration_query['destinationTable']['tableId'] = MODEL_NAME+'-fraction-{}-eval_table'.format(int(split_fraction*100))\n",
" pipeline = aip.PipelineJob(\n",
" parameter_values = {'split_fraction':split_fraction, 'model_name': MODEL_NAME+'-fraction-{}'.format(int(split_fraction*100)), 'evaluate_job_conf': job_configuration_query },\n",
" display_name=PIPELINE_NAME,\n",
" template_path=f'{PIPELINE_NAME}.json',\n",
" pipeline_root=PIPELINE_ROOT,\n",
" enable_caching=True\n",
" )\n",
"\n",
" pipeline.submit(service_account=SERVICE_ACCOUNT, experiment=my_experiment)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Deploy the model on a Vertex AI endpoint\n",
"\n",
"Thanks to the integration of Vertex AI Endpoint, creating a live endpoint to serve the model we prefer is very straightforward."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# get the model from the Model Registry \n",
"model = aip.Model(model_name=f'{MODEL_NAME}-fraction-10')\n",
"\n",
"# let's create a Vertex Endpoint where we will deploy the ML model\n",
"endpoint = aip.Endpoint.create(\n",
" display_name=ENDPOINT_DISPLAY_NAME,\n",
" project=PROJECT_ID,\n",
" location=REGION,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# deploy the BigQuery ML model on Vertex Endpoint\n",
"# have a coffe - this step can take up 10/15 minutes to finish\n",
"model.deploy(endpoint=endpoint, deployed_model_display_name='bqml-deployed-model')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Let's get a prediction from new data\n",
"inference_test = {\n",
" 'postal_code': '97700-000',\n",
" 'number_of_successful_orders': 0,\n",
" 'city': 'Santiago',\n",
" 'sum_previous_orders': 1,\n",
" 'number_of_unsuccessful_orders': 0,\n",
" 'day_of_week': 'WEEKDAY',\n",
" 'traffic_source': 'Facebook',\n",
" 'browser': 'Firefox',\n",
" 'hour_of_day': 20\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"my_prediction = endpoint.predict([inference_test])\n",
"\n",
"my_prediction"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# batch prediction on BigQuery\n",
"\n",
"explain_predict_query = \"\"\"\n",
"SELECT *\n",
"FROM ML.EXPLAIN_PREDICT(MODEL `{project_id}.{dataset}.{model_name}`,\n",
" (SELECT * EXCEPT (session_id, session_starting_ts, user_id, has_purchased) \n",
" FROM `{project_id}.{dataset}.ecommerce_abt`\n",
" WHERE extract(ISOYEAR FROM session_starting_ts) = 2023),\n",
" STRUCT(5 AS top_k_features, 0.5 AS threshold))\n",
"LIMIT 100\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# batch prediction on BigQuery\n",
"\n",
"with open(\"sql/explain_predict.sql\") as file:\n",
" explain_predict_query = file.read()\n",
"\n",
"client = bigquery_client = bigquery.Client(location=LOCATION, project=PROJECT_ID)\n",
"batch_predictions = bigquery_client.query(\n",
" explain_predict_query.format(\n",
" project_id=PROJECT_ID,\n",
" dataset=DATASET,\n",
" model_name=f'{MODEL_NAME}-fraction-10')\n",
" ).to_dataframe()\n",
"\n",
"batch_predictions"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusions\n",
"\n",
"Thanks to this tutorial we were able to:\n",
"- Define a re-usable Vertex AI pipeline to train and evaluate BQ ML models\n",
"- Use a Vertex AI Experiment to keep track of multiple trainings for the same model with different paramenters (in this case a different split for train/test data)\n",
"- Deploy the preferred model on a Vertex AI managed Endpoint in order to serve the model for real-time use cases via API\n",
"- Make batch prediction via Big Query and see what are the top 5 features which influenced the algorithm output"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.8.9"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}

View File

@ -0,0 +1,2 @@
kfp==1.8.19
google-cloud-pipeline-components==1.0.39

Binary file not shown (new image, 56 KiB)

View File

@ -0,0 +1,65 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Core resources.
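# The locals below resolve network references: when var.vpc_config is set the
# existing Shared VPC is used, otherwise the VPC created in vpc.tf is used.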
locals {
service_encryption_keys = var.service_encryption_keys
shared_vpc_project = try(var.vpc_config.host_project, null)
subnet = (
local.use_shared_vpc
? var.vpc_config.subnet_self_link
: values(module.vpc.0.subnet_self_links)[0]
)
use_shared_vpc = var.vpc_config != null
vpc = (
local.use_shared_vpc
? var.vpc_config.network_self_link
: module.vpc.0.self_link
)
}
module "project" {
source = "../../../modules/project"
name = var.project_id
parent = try(var.project_create.parent, null)
billing_account = try(var.project_create.billing_account_id, null)
project_create = var.project_create != null
prefix = var.project_create == null ? null : var.prefix
services = [
"aiplatform.googleapis.com",
"bigquery.googleapis.com",
"bigquerystorage.googleapis.com",
"bigqueryreservation.googleapis.com",
"compute.googleapis.com",
"ml.googleapis.com",
"notebooks.googleapis.com",
"servicenetworking.googleapis.com",
"stackdriver.googleapis.com",
"storage.googleapis.com",
"storage-component.googleapis.com"
]
shared_vpc_service_config = local.shared_vpc_project == null ? null : {
attach = true
host_project = local.shared_vpc_project
}
service_encryption_key_ids = {
compute = [try(local.service_encryption_keys.compute, null)]
bq = [try(local.service_encryption_keys.bq, null)]
storage = [try(local.service_encryption_keys.storage, null)]
}
service_config = {
disable_on_destroy = false, disable_dependent_services = false
}
}

View File

@ -0,0 +1,53 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Output variables.
output "bucket" {
description = "GCS Bucket URL."
value = module.bucket.url
}
output "dataset" {
description = "BigQuery dataset ID."
value = module.dataset.id
}
output "notebook" {
description = "Vertex AI notebook details."
value = {
name = resource.google_notebooks_instance.playground.name
id = resource.google_notebooks_instance.playground.id
}
}
output "project" {
description = "Project id."
value = module.project.project_id
}
output "service-account-vertex" {
description = "Service account to be used for Vertex AI pipelines."
value = module.service-account-vertex.email
}
output "vertex-ai-metadata-store" {
description = "Vertex AI Metadata Store ID."
value = google_vertex_ai_metadata_store.store.id
}
output "vpc" {
description = "VPC Network."
value = local.vpc
}

View File

@ -0,0 +1,70 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Terraform variables.
variable "location" {
description = "The location where resources will be deployed."
type = string
default = "US"
}
variable "prefix" {
description = "Prefix used for resource names."
type = string
validation {
condition = var.prefix != ""
error_message = "Prefix cannot be empty."
}
}
variable "project_create" {
description = "Provide values if project creation is needed, use existing project if null. Parent format: folders/folder_id or organizations/org_id."
type = object({
billing_account_id = string
parent = string
})
default = null
}
variable "project_id" {
description = "Project id references existing project if `project_create` is null."
type = string
}
variable "region" {
description = "The region where resources will be deployed."
type = string
default = "us-central1"
}
variable "service_encryption_keys" {
description = "Cloud KMS to use to encrypt different services. The key location should match the service region."
type = object({
bq = string
compute = string
storage = string
})
default = null
}
variable "vpc_config" {
description = "Shared VPC network configurations to use. If null networks will be created in projects with pre-configured values."
type = object({
host_project = string
network_self_link = string
subnet_self_link = string
})
default = null
}

View File

@ -0,0 +1,27 @@
# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
terraform {
required_version = ">= 1.3.1"
required_providers {
google = {
source = "hashicorp/google"
version = ">= 4.55.0" # tftest
}
google-beta = {
source = "hashicorp/google-beta"
version = ">= 4.55.0" # tftest
}
}
}

View File

@ -0,0 +1,104 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description Vertex resources.
resource "google_vertex_ai_metadata_store" "store" {
provider = google-beta
project = module.project.project_id
name = "default" #"${var.prefix}-metadata-store"
description = "Vertex AI Metadata Store"
region = var.region
#TODO Check/Implement P4SA logic for IAM role
# encryption_spec {
# kms_key_name = var.service_encryption_keys.ai_metadata_store
# }
}
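# Service account attached to the Vertex AI Workbench notebook instance.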
module "service-account-notebook" {
source = "../../../modules/iam-service-account"
project_id = module.project.project_id
name = "notebook-sa"
iam_project_roles = {
(module.project.project_id) = [
"roles/bigquery.admin",
"roles/bigquery.jobUser",
"roles/bigquery.dataEditor",
"roles/bigquery.user",
"roles/dialogflow.client",
"roles/storage.admin",
"roles/aiplatform.user",
"roles/iam.serviceAccountUser"
]
}
}
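# Service account used to run Vertex AI pipelines (see the demo notebook).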
module "service-account-vertex" {
source = "../../../modules/iam-service-account"
project_id = module.project.project_id
name = "vertex-sa"
iam_project_roles = {
(module.project.project_id) = [
"roles/bigquery.admin",
"roles/bigquery.jobUser",
"roles/bigquery.dataEditor",
"roles/bigquery.user",
"roles/dialogflow.client",
"roles/storage.admin",
"roles/aiplatform.user"
]
}
}
resource "google_notebooks_instance" "playground" {
name = "${var.prefix}-notebook"
location = format("%s-%s", var.region, "b")
machine_type = "e2-medium"
project = module.project.project_id
container_image {
repository = "gcr.io/deeplearning-platform-release/base-cpu"
tag = "latest"
}
install_gpu_driver = true
boot_disk_type = "PD_SSD"
boot_disk_size_gb = 110
disk_encryption = try(local.service_encryption_keys.compute != null, false) ? "CMEK" : null
kms_key = try(local.service_encryption_keys.compute, null)
no_public_ip = true
no_proxy_access = false
network = local.vpc
subnet = local.subnet
service_account = module.service-account-notebook.email
# Enable Secure Boot
shielded_instance_config {
enable_secure_boot = true
}
# Remove once terraform-provider-google/issues/9164 is fixed
lifecycle {
ignore_changes = [disk_encryption, kms_key]
}
#TODO Uncomment once terraform-provider-google/issues/9273 is fixed
# tags = ["ssh"]
depends_on = [
google_project_iam_member.shared_vpc,
]
}

View File

@ -0,0 +1,64 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# tfdoc:file:description VPC resources.
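# The VPC, firewall and Cloud NAT below are only created when no Shared VPC
# is passed in via var.vpc_config.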
module "vpc" {
source = "../../../modules/net-vpc"
count = local.use_shared_vpc ? 0 : 1
project_id = module.project.project_id
name = "${var.prefix}-vpc"
subnets = [
{
ip_cidr_range = "10.0.0.0/20"
name = "${var.prefix}-subnet"
region = var.region
}
]
}
module "vpc-firewall" {
source = "../../../modules/net-vpc-firewall"
count = local.use_shared_vpc ? 0 : 1
project_id = module.project.project_id
network = module.vpc.0.name
default_rules_config = {
admin_ranges = ["10.0.0.0/20"]
}
ingress_rules = {
#TODO Remove and rely on 'ssh' tag once terraform-provider-google/issues/9273 is fixed
("${var.prefix}-iap") = {
description = "Enable SSH from IAP on Notebooks."
source_ranges = ["35.235.240.0/20"]
targets = ["notebook-instance"]
rules = [{ protocol = "tcp", ports = [22] }]
}
}
}
module "cloudnat" {
source = "../../../modules/net-cloudnat"
count = local.use_shared_vpc ? 0 : 1
project_id = module.project.project_id
name = "${var.prefix}-default"
region = var.region
router_network = module.vpc.0.name
}
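# When a Shared VPC is used, grant the Notebooks service agent the network
# user role on the host project.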
resource "google_project_iam_member" "shared_vpc" {
count = local.use_shared_vpc ? 1 : 0
project = var.vpc_config.host_project
role = "roles/compute.networkUser"
member = "serviceAccount:${module.project.service_accounts.robots.notebooks}"
}