sql fix and more comments on demo notebook

Giorgio Conte 2023-03-06 11:21:30 +00:00
parent cd24d90e0d
commit 0ac6dd65cf
3 changed files with 61 additions and 5 deletions


@ -8,6 +8,8 @@ More in details, this tutorial will focus on the following three steps:
- serve a BQ model through an API powered by Vertex AI Endpoint
- create batch prediction via BigQuery
This tutorial also shows how to make explainable predictions, in order to understand which features most influence the algorithm's outputs.
# Dataset
This tutorial uses a fictitious e-commerce dataset of programmatically generated data from a fictional online store called The Look. The dataset is publicly available on BigQuery at `bigquery-public-data.thelook_ecommerce`.


@ -1,5 +1,40 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"**Copyright 2023 Google LLC**"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
"# you may not use this file except in compliance with the License.\n",
"# You may obtain a copy of the License at\n",
"#\n",
"# https://www.apache.org/licenses/LICENSE-2.0\n",
"#\n",
"# Unless required by applicable law or agreed to in writing, software\n",
"# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
"# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
"# See the License for the specific language governing permissions and\n",
"# limitations under the License."
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Install python requirements and import packages"
]
},
{
"cell_type": "code",
"execution_count": null,
@ -26,7 +61,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Set your env variable"
"# Set your env variables"
]
},
{
@ -159,7 +194,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Running the same training Verte AI pipeline with different parameters\n",
"# Running the same training Vertex AI pipeline with different parameters\n",
"\n",
"One of the main tasks during the training phase is to compare different models or to try the same model with different inputs. We can leverage the power of Vertex AI Pipelines to submit the same steps with different training parameters. Thanks to the experiments artifact, it is possible to easily keep track of all the tests that have been done. This simplifies the process of selecting the best model to deploy.\n",
"\n",
@ -266,13 +301,32 @@
"# batch prediction on BigQuery\n",
"\n",
"with open(\"sql/explain_predict.sql\") as file:\n",
" train_query = file.read()\n",
" explain_predict_query = file.read()\n",
"\n",
"client = bigquery_client = bigquery.Client(location=LOCATION, project=PROJECT_ID)\n",
"batch_predictions = bigquery_client.query(train_query.format(project_id=PROJECT_ID, dataset=DATASET, model_name=f'{MODEL_NAME}-fraction-10')).to_dataframe()\n",
"batch_predictions = bigquery_client.query(\n",
" explain_predict_query.format(\n",
" project_id=PROJECT_ID,\n",
" dataset=DATASET,\n",
" model_name=f'{MODEL_NAME}-fraction-10')\n",
" ).to_dataframe()\n",
"\n",
"batch_predictions"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Conclusions\n",
"\n",
"Thanks to this tutorial we were able to:\n",
"- Define a re-usable Vertex AI pipeline to train and evaluate BQ ML models\n",
"- Use a Vertex AI Experiment to keep track of multiple training runs for the same model with different parameters (in this case a different split for train/test data)\n",
"- Deploy the preferred model on a Vertex AI managed Endpoint in order to serve the model for real-time use cases via API\n",
"- Make batch predictions via BigQuery and see which top 5 features most influenced the algorithm output"
]
}
],
"metadata": {


@ -15,7 +15,7 @@
*/
SELECT *
FROM ML.EXPLAIN_PREDICT(MODEL `{project_id}.{dataset}.{model-name}`,
FROM ML.EXPLAIN_PREDICT(MODEL `{project_id}.{dataset}.{model_name}`,
(SELECT * EXCEPT (session_id, session_starting_ts, user_id, has_purchased)
FROM `{project_id}.{dataset}.ecommerce_abt`
WHERE extract(ISOYEAR FROM session_starting_ts) = 2023),
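
A note on why the placeholder rename matters: the notebook fills this SQL template with Python's `str.format`, passing `model_name=...` as a keyword argument. A replacement field named `{model-name}` is looked up under the literal key `model-name`, which no keyword argument can supply, so formatting fails with a `KeyError`. The sketch below reproduces that behaviour on a one-line excerpt of the template; the parameter values and the `render` helper are hypothetical and exist only for illustration.

```python
# Hypothetical values, used only to illustrate the placeholder behaviour.
params = {
    "project_id": "my-project",
    "dataset": "my_dataset",
    "model_name": "my_model-fraction-10",
}

# One-line excerpts of the template before and after the fix.
old_template = "MODEL `{project_id}.{dataset}.{model-name}`"
new_template = "MODEL `{project_id}.{dataset}.{model_name}`"


def render(template, **kwargs):
    """Format the template, reporting a missing key instead of raising."""
    try:
        return template.format(**kwargs)
    except KeyError as exc:
        return f"KeyError: {exc}"


old_result = render(old_template, **params)
new_result = render(new_template, **params)

print(old_result)  # the hyphenated field name cannot be matched
print(new_result)  # the underscore field name matches the keyword argument
```

With the underscore spelling, the placeholder name matches the `model_name` keyword used in the notebook's batch prediction cell, so the query string renders as intended.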