History

Ludovico Magnocavallo 6941313c7d Factories refactor (#1843 ) * factories refactor doc * Adds file schema and filesystem organization * Update 20231106-factories.md * move factories out of blueprints and create new factories README * align factory in billing-account module * align factory in dataplex-datascan module * align factory in billing-account module * align factory in net-firewall-policy module * align factory in dns-response-policy module * align factory in net-vpc-firewall module * align factory in net-vpc module * align factory variable names in FAST * remove decentralized firewall blueprint * bump terraform version * bump module versions * update top-level READMEs * move project factory to modules * fix variable names and tests * tfdoc * remove changelog link * add project factory to top-level README * fix cludrun eventarc diff * fix README * fix cludrun eventarc diff --------- Co-authored-by: Simone Ruffilli <sruffilli@google.com>		2024-02-26 10:16:52 +00:00
..
README.md	Updated bigquery module (#1861 )	2023-11-30 14:33:50 +01:00
main.tf	Updated bigquery module (#1861 )	2023-11-30 14:33:50 +01:00
outputs.tf	Add outputs to BigQuery dataset module (#1752 )	2023-10-13 17:02:47 +02:00
variables.tf	Updated bigquery module (#1861 )	2023-11-30 14:33:50 +01:00
versions.tf	Factories refactor (#1843 )	2024-02-26 10:16:52 +00:00

README.md

Google Cloud Bigquery Module

This module allows managing a single BigQuery dataset, including access configuration, tables and views.

TODO

check for dynamic values in tables and views
add support for external tables

Examples

Simple dataset with access configuration

Access configuration defaults to using the separate google_bigquery_dataset_access resource, so as to leave the default dataset access rules untouched.

You can choose to manage the google_bigquery_dataset access rules instead via the dataset_access variable, but be sure to always have at least one OWNER access and to avoid duplicating accesses, or terraform apply will fail.

The access variables are split into access and access_identities variables, so that dynamic values can be passed in for identities (eg a service account email generated by a different module or resource).

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  access = {
    reader-group   = { role = "READER", type = "group" }
    owner          = { role = "OWNER", type = "user" }
    project_owners = { role = "OWNER", type = "special_group" }
    view_1         = { role = "READER", type = "view" }
  }
  access_identities = {
    reader-group   = "playground-test@ludomagno.net"
    owner          = "ludo@ludomagno.net"
    project_owners = "projectOwners"
    view_1         = "my-project|my-dataset|my-table"
  }
}
# tftest modules=1 resources=5 inventory=simple.yaml

IAM roles

Access configuration can also be specified via IAM instead of basic roles via the iam variable. When using IAM, basic roles cannot be used via the access family variables.

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  iam = {
    "roles/bigquery.dataOwner" = ["user:user1@example.org"]
  }
}
# tftest modules=1 resources=2 inventory=iam.yaml

Authorized Views, Datasets, and Routines

You can specify authorized views, datasets, and routines via the authorized_views, authorized_datasets and authorized_routines variables, respectively.

// Create private BigQuery dataset that will not be publicly accessible, except via the authorized BigQuery resources
module "bigquery-dataset-private" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "private_project"
  id         = "private_dataset"
  authorized_views = [
    {
      project_id = "auth_view_project"
      dataset_id = "auth_view_dataset"
      table_id   = "auth_view"
    }
  ]
  authorized_datasets = [
    {
      project_id = "auth_dataset_project"
      dataset_id = "auth_dataset"
    }
  ]
  authorized_routines = [
    {
      project_id = "auth_routine_project"
      dataset_id = "auth_routine_dataset"
      routine_id = "auth_routine"
    }
  ]
}

// Create authorized view in a public dataset
module "bigquery-authorized-views-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_view_project"
  id         = "auth_view_dataset"
  views = {
    auth_view = {
      friendly_name       = "Public"
      labels              = {}
      query               = "SELECT * FROM `private_project.private_dataset.private_table`"
      use_legacy_sql      = false
      deletion_protection = true
    }
  }
}

// Create public authorized dataset
module "bigquery-authorized-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_dataset_project"
  id         = "auth_dataset"
}

// Create public authorized routine
module "bigquery-authorized-authorized-routine-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_routine_project"
  id         = "auth_routine_dataset"
}

resource "google_bigquery_routine" "public-routine" {
  project         = "private_project"
  dataset_id      = module.bigquery-authorized-authorized-routine-dataset-public.dataset_id
  routine_id      = "auth_routine"
  routine_type    = "TABLE_VALUED_FUNCTION"
  language        = "SQL"
  definition_body = <<-EOS
    SELECT 1 + value AS value
  EOS
  arguments {
    name          = "value"
    argument_kind = "FIXED_TYPE"
    data_type     = jsonencode({ "typeKind" = "INT64" })
  }
  return_table_type = jsonencode({ "columns" = [
    { "name" = "value", "type" = { "typeKind" = "INT64" } },
  ] })
}
# tftest modules=4 resources=9 inventory=authorized_resources.yaml

Authorized views can be specified both using the standard access options and the authorized_views blocks. The example configuration below uses both blocks, and will create a dataset with three authorized views view_id_1, view_id_2, and view_id_3.

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  authorized_views = [
    {
      project_id = "view_project"
      dataset_id = "view_dataset"
      table_id   = "view_id_1"
    },
    {
      project_id = "view_project"
      dataset_id = "view_dataset"
      table_id   = "view_id_2"
    }
  ]
  access = {
    view_2 = { role = "READER", type = "view" }
    view_3 = { role = "READER", type = "view" }
  }
  access_identities = {
    view_2 = "view_project|view_dataset|view_id_2"
    view_3 = "view_project|view_dataset|view_id_3"
  }
}
# tftest modules=1 resources=4 inventory=authorized_resources_views.yaml

Dataset options

Dataset options are set via the options variable. all options must be specified, but a null value can be set to options that need to use defaults.

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  options = {
    default_table_expiration_ms     = 3600000
    default_partition_expiration_ms = null
    delete_contents_on_destroy      = false
    max_time_travel_hours           = 168
  }
}
# tftest modules=1 resources=1 inventory=options.yaml

Tables and views

Tables are created via the tables variable, or the view variable for views. Support for external tables will be added in a future release.

locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
    }
  }
}
# tftest modules=1 resources=2 inventory=tables.yaml

If partitioning is needed, populate the partitioning variable using either the time or range attribute.

locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  tables = {
    table_a = {
      deletion_protection = true
      friendly_name       = "Table a"
      schema              = local.countries_schema
      partitioning = {
        time = { type = "DAY", expiration_ms = null }
      }
    }
  }
}
# tftest modules=1 resources=2 inventory=partitioning.yaml

To create views use the view variable. If you're querying a table created by the same module terraform apply will initially fail and eventually succeed once the underlying table has been created. You can probably also use the module's output in the view's query to create a dependency on the table.

locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
    }
  }
  views = {
    population = {
      friendly_name       = "Population"
      query               = "SELECT SUM(population) FROM my_dataset.countries"
      use_legacy_sql      = false
      deletion_protection = true
    }
  }
}

# tftest modules=1 resources=3 inventory=views.yaml

Variables

name	description	type	required	default
id	Dataset id.	`string`	✓
project_id	Id of the project where datasets will be created.	`string`	✓
access	Map of access rules with role and identity type. Keys are arbitrary and must match those in the `access_identities` variable, types are `domain`, `group`, `special_group`, `user`, `view`.	`map(object({…}))`		`{}`
access_identities	Map of access identities used for basic access roles. View identities have the format 'project_id\|dataset_id\|table_id'.	`map(string)`		`{}`
authorized_datasets	An array of datasets to be authorized on the dataset.	`list(object({…}))`		`[]`
authorized_routines	An array of authorized routine to be authorized on the dataset.	`list(object({…}))`		`[]`
authorized_views	An array of views to be authorized on the dataset.	`list(object({…}))`		`[]`
dataset_access	Set access in the dataset resource instead of using separate resources.	`bool`		`false`
description	Optional description.	`string`		`"Terraform managed."`
encryption_key	Self link of the KMS key that will be used to protect destination table.	`string`		`null`
friendly_name	Dataset friendly name.	`string`		`null`
iam	IAM bindings in {ROLE => [MEMBERS]} format. Mutually exclusive with the access_* variables used for basic roles.	`map(list(string))`		`{}`
labels	Dataset labels.	`map(string)`		`{}`
location	Dataset location.	`string`		`"EU"`
materialized_views	Materialized views definitions.	`map(object({…}))`		`{}`
options	Dataset options.	`object({…})`		`{}`
tables	Table definitions. Options and partitioning default to null. Partitioning can only use `range` or `time`, set the unused one to null.	`map(object({…}))`		`{}`
views	View definitions.	`map(object({…}))`		`{}`

Outputs

name	description	sensitive
dataset	Dataset resource.
dataset_id	Dataset id.
id	Fully qualified dataset id.
materialized_view_ids	Map of fully qualified materialized view ids keyed by view ids.
materialized_views	Materialized view resources.
self_link	Dataset self link.
table_ids	Map of fully qualified table ids keyed by table ids.
tables	Table resources.
view_ids	Map of fully qualified view ids keyed by view ids.
views	View resources.