# Google Cloud BigQuery Module
This module allows managing a single BigQuery dataset, including access configuration, tables and views.
## TODO
- [ ] check for dynamic values in tables and views
- [ ] add support for external tables
## Examples
### Simple dataset with access configuration
By default, access is configured through separate `google_bigquery_dataset_access` resources, which leaves the dataset's default access rules untouched.
Alternatively, set the `dataset_access` variable to `true` to manage access rules directly in the `google_bigquery_dataset` resource. In that case, always include at least one `OWNER` entry and avoid duplicate entries, or `terraform apply` will fail.
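
A minimal sketch of the `dataset_access` mode, using only variables documented in the table below (`access` and `access_identities` are described in the next paragraph). Project, dataset, and email values are placeholders; note the mandatory `OWNER` entry.

```hcl
# Sketch: access rules managed inline in the google_bigquery_dataset resource.
# All identifiers and emails below are placeholders.
module "bigquery-dataset" {
  source         = "./fabric/modules/bigquery-dataset"
  project_id     = "my-project"
  id             = "my_dataset"
  dataset_access = true
  access = {
    # at least one OWNER entry is required when dataset_access is true
    owner  = { role = "OWNER", type = "user" }
    reader = { role = "READER", type = "group" }
  }
  access_identities = {
    owner  = "owner@example.org"
    reader = "readers@example.org"
  }
}
```
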
Access rules are split across two variables, `access` (role and identity type) and `access_identities` (the identity itself), so that identities can use dynamic values, e.g. a service account email generated by a different module or resource. Both maps must use the same keys.
```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  access = {
    reader-group   = { role = "READER", type = "group" }
    owner          = { role = "OWNER", type = "user" }
    project_owners = { role = "OWNER", type = "special_group" }
    view_1         = { role = "READER", type = "view" }
  }
  access_identities = {
    reader-group   = "playground-test@ludomagno.net"
    owner          = "ludo@ludomagno.net"
    project_owners = "projectOwners"
    view_1         = "my-project|my-dataset|my-table"
  }
}
# tftest modules=1 resources=5 inventory=simple.yaml
```
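
The following sketch shows a dynamic identity: the email of a service account created in the same configuration, whose value is only known after apply. The `google_service_account` resource and its `bq-reader` id are hypothetical, and using `type = "user"` for a service account assumes the module passes it to the resource's `user_by_email` field, which BigQuery also accepts for service accounts.

```hcl
# Hypothetical service account created elsewhere in the same configuration.
resource "google_service_account" "reader" {
  project    = "my-project"
  account_id = "bq-reader"
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  access = {
    # static key and attributes, known at plan time
    sa-reader = { role = "READER", type = "user" }
  }
  access_identities = {
    # dynamic value, kept out of the access map
    sa-reader = google_service_account.reader.email
  }
}
```
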
### IAM roles
Access can also be configured with IAM roles instead of basic roles, via the `iam` variable. The two approaches are mutually exclusive: when using IAM, basic roles cannot be set via the `access_*` family of variables.
```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  iam = {
    "roles/bigquery.dataOwner" = ["user:user1@example.org"]
  }
}
# tftest modules=1 resources=2 inventory=iam.yaml
```
### Authorized Views, Datasets, and Routines
You can specify authorized [views](https://cloud.google.com/bigquery/docs/authorized-views), [datasets](https://cloud.google.com/bigquery/docs/authorized-datasets?hl=en), and [routines](https://cloud.google.com/bigquery/docs/authorized-routines) via the `authorized_views`, `authorized_datasets`, and `authorized_routines` variables, respectively.
```hcl
// Create private BigQuery dataset that will not be publicly accessible, except via the authorized BigQuery resources
module "bigquery-dataset-private" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "private_project"
  id         = "private_dataset"
  authorized_views = [
    {
      project_id = "auth_view_project"
      dataset_id = "auth_view_dataset"
      table_id   = "auth_view"
    }
  ]
  authorized_datasets = [
    {
      project_id = "auth_dataset_project"
      dataset_id = "auth_dataset"
    }
  ]
  authorized_routines = [
    {
      project_id = "auth_routine_project"
      dataset_id = "auth_routine_dataset"
      routine_id = "auth_routine"
    }
  ]
}
// Create authorized view in a public dataset
module "bigquery-authorized-views-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_view_project"
  id         = "auth_view_dataset"
  views = {
    auth_view = {
      friendly_name       = "Public"
      labels              = {}
      query               = "SELECT * FROM `private_project.private_dataset.private_table`"
      use_legacy_sql      = false
      deletion_protection = true
    }
  }
}
// Create public authorized dataset
module "bigquery-authorized-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_dataset_project"
  id         = "auth_dataset"
}
// Create public authorized routine
module "bigquery-authorized-authorized-routine-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_routine_project"
  id         = "auth_routine_dataset"
}
resource "google_bigquery_routine" "public-routine" {
  // the routine must live in the same project as its dataset
  project         = "auth_routine_project"
  dataset_id      = module.bigquery-authorized-authorized-routine-dataset-public.dataset_id
  routine_id      = "auth_routine"
  routine_type    = "TABLE_VALUED_FUNCTION"
  language        = "SQL"
  definition_body = <<-EOS
    SELECT 1 + value AS value
  EOS
  arguments {
    name          = "value"
    argument_kind = "FIXED_TYPE"
    data_type     = jsonencode({ "typeKind" = "INT64" })
  }
  return_table_type = jsonencode({ "columns" = [
    { "name" = "value", "type" = { "typeKind" = "INT64" } },
  ] })
}
# tftest modules=4 resources=9 inventory=authorized_resources.yaml
```
Authorized views can be specified both through the standard `access` variables and through the `authorized_views` variable. The example configuration below uses both, and creates a dataset with three authorized views: `view_id_1`, `view_id_2` (declared in both places), and `view_id_3`.
```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  authorized_views = [
    {
      project_id = "view_project"
      dataset_id = "view_dataset"
      table_id   = "view_id_1"
    },
    {
      project_id = "view_project"
      dataset_id = "view_dataset"
      table_id   = "view_id_2"
    }
  ]
  access = {
    view_2 = { role = "READER", type = "view" }
    view_3 = { role = "READER", type = "view" }
  }
  access_identities = {
    view_2 = "view_project|view_dataset|view_id_2"
    view_3 = "view_project|view_dataset|view_id_3"
  }
}
# tftest modules=1 resources=4 inventory=authorized_resources_views.yaml
```
### Dataset options
Dataset options are set via the `options` variable. All attributes are optional: omit them, or set them to `null`, to use the defaults shown in the variables table below.
```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  options = {
    default_table_expiration_ms     = 3600000
    default_partition_expiration_ms = null
    delete_contents_on_destroy      = false
    max_time_travel_hours           = 168
  }
}
# tftest modules=1 resources=1 inventory=options.yaml
```
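
Because every `options` attribute is optional, you can also set only the ones you need. The sketch below uses attributes listed in the `options` type in the variables table; the values `und:ci` (case-insensitive collation) and `PHYSICAL` (storage billing model) come from the BigQuery API and are illustrative assumptions, not defaults of this module.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  options = {
    # omitted attributes keep the defaults from the variables table
    default_collation     = "und:ci"
    is_case_insensitive   = true
    storage_billing_model = "PHYSICAL"
  }
}
```
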
### Tables and views
Tables are created via the `tables` variable, and views via the `views` variable. Support for external tables will be added in a future release.
```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
    }
  }
}
# tftest modules=1 resources=2 inventory=tables.yaml
```
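
Per-table settings such as clustering go in each table's `options` attribute, as listed in the `tables` type in the variables table. A minimal sketch that clusters the table above by `country`:

```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
      options = {
        # rows are clustered by the country column
        clustering = ["country"]
      }
    }
  }
}
```
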
If partitioning is needed, set the table's `partitioning` attribute, using either its `time` or its `range` attribute.
```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my-dataset"
  tables = {
    table_a = {
      deletion_protection = true
      friendly_name       = "Table a"
      schema              = local.countries_schema
      partitioning = {
        time = { type = "DAY", expiration_ms = null }
      }
    }
  }
}
# tftest modules=1 resources=2 inventory=partitioning.yaml
```
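
Range partitioning uses the `range` attribute instead and requires an integer column. The sketch below assumes the module uses the top-level `partitioning.field` attribute as the range partitioning column; the bounds and interval are arbitrary illustrative values.

```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    table_b = {
      deletion_protection = true
      friendly_name       = "Table b"
      schema              = local.countries_schema
      partitioning = {
        # assumption: field names the INT64 column used for range partitioning
        field = "population"
        range = { start = 0, end = 1000000000, interval = 10000000 }
      }
    }
  }
}
```
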
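To create views, use the `views` variable. If a view queries a table created by the same module, `terraform apply` will initially fail, and succeed on a later run once the underlying table has been created. Referencing the table through module outputs in the view's query should create an explicit dependency instead; since a module's outputs cannot be used in its own inputs without creating a cycle, this requires defining the view in a separate module instance.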
```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
    }
  }
  views = {
    population = {
      friendly_name       = "Population"
      query               = "SELECT SUM(population) FROM my_dataset.countries"
      use_legacy_sql      = false
      deletion_protection = true
    }
  }
}
# tftest modules=1 resources=3 inventory=views.yaml
```
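
A sketch of defining the view in a separate module instance that references the table through outputs. It assumes the `tables` output is a map keyed like the `tables` variable, whose values expose the `google_bigquery_table` attributes `project`, `dataset_id`, and `table_id`; the `bigquery-views` module and `my_views` dataset are hypothetical. Both datasets use the module's default location, since BigQuery requires a view and the tables it queries to share a location.

```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
    }
  }
}

locals {
  # assumption: the tables output is keyed like the tables variable
  countries = module.bigquery-dataset.tables["countries"]
}

# Hypothetical second instance: referencing the first module's outputs
# makes Terraform create the table before the view.
module "bigquery-views" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_views"
  views = {
    population = {
      friendly_name       = "Population"
      query               = "SELECT SUM(population) FROM `${local.countries.project}.${local.countries.dataset_id}.${local.countries.table_id}`"
      use_legacy_sql      = false
      deletion_protection = true
    }
  }
}
```
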
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [id](variables.tf#L98) | Dataset id. | <code>string</code> | ✓ |  |
| [project_id](variables.tf#L162) | Id of the project where datasets will be created. | <code>string</code> | ✓ |  |
| [access](variables.tf#L17) | Map of access rules with role and identity type. Keys are arbitrary and must match those in the `access_identities` variable, types are `domain`, `group`, `special_group`, `user`, `view`. | <code title="map(object({ role = string type = string }))">map(object({…}))</code> |  | <code>{}</code> |
| [access_identities](variables.tf#L33) | Map of access identities used for basic access roles. View identities have the format 'project_id\|dataset_id\|table_id'. | <code>map(string)</code> |  | <code>{}</code> |
| [authorized_datasets](variables.tf#L39) | An array of datasets to be authorized on the dataset. | <code title="list(object({ dataset_id = string, project_id = string, }))">list(object({…}))</code> |  | <code>[]</code> |
| [authorized_routines](variables.tf#L48) | An array of routines to be authorized on the dataset. | <code title="list(object({ project_id = string, dataset_id = string, routine_id = string }))">list(object({…}))</code> |  | <code>[]</code> |
| [authorized_views](variables.tf#L58) | An array of views to be authorized on the dataset. | <code title="list(object({ dataset_id = string, project_id = string, table_id = string # this is the view id, but we keep table_id to stay consistent as the resource }))">list(object({…}))</code> |  | <code>[]</code> |
| [dataset_access](variables.tf#L68) | Set access in the dataset resource instead of using separate resources. | <code>bool</code> |  | <code>false</code> |
| [description](variables.tf#L74) | Optional description. | <code>string</code> |  | <code>"Terraform managed."</code> |
| [encryption_key](variables.tf#L80) | Self link of the KMS key that will be used to protect destination table. | <code>string</code> |  | <code>null</code> |
| [friendly_name](variables.tf#L86) | Dataset friendly name. | <code>string</code> |  | <code>null</code> |
| [iam](variables.tf#L92) | IAM bindings in {ROLE => [MEMBERS]} format. Mutually exclusive with the access_* variables used for basic roles. | <code>map(list(string))</code> |  | <code>{}</code> |
| [labels](variables.tf#L103) | Dataset labels. | <code>map(string)</code> |  | <code>{}</code> |
| [location](variables.tf#L109) | Dataset location. | <code>string</code> |  | <code>"EU"</code> |
| [materialized_views](variables.tf#L115) | Materialized view definitions. | <code title="map(object({ query = string deletion_protection = optional(bool) description = optional(string, &quot;Terraform managed.&quot;) friendly_name = optional(string) labels = optional(map(string), {}) enable_refresh = optional(bool) refresh_interval_ms = optional(bool) allow_non_incremental_definition = optional(bool) options = optional(object({ clustering = optional(list(string)) expiration_time = optional(number) }), {}) partitioning = optional(object({ field = optional(string) range = optional(object({ end = number interval = number start = number })) time = optional(object({ type = string expiration_ms = optional(number) field = optional(string) require_partition_filter = optional(bool) })) })) }))">map(object({…}))</code> |  | <code>{}</code> |
| [options](variables.tf#L148) | Dataset options. | <code title="object({ default_collation = optional(string) default_table_expiration_ms = optional(number) default_partition_expiration_ms = optional(number) delete_contents_on_destroy = optional(bool, false) is_case_insensitive = optional(bool) max_time_travel_hours = optional(number, 168) storage_billing_model = optional(string) })">object({…})</code> |  | <code>{}</code> |
| [tables](variables.tf#L167) | Table definitions. Options and partitioning default to null. Partitioning can only use `range` or `time`; set the unused one to null. | <code title="map(object({ deletion_protection = optional(bool) description = optional(string, &quot;Terraform managed.&quot;) friendly_name = optional(string) labels = optional(map(string), {}) schema = optional(string) options = optional(object({ clustering = optional(list(string)) encryption_key = optional(string) expiration_time = optional(number) }), {}) partitioning = optional(object({ field = optional(string) range = optional(object({ end = number interval = number start = number })) time = optional(object({ type = string expiration_ms = optional(number) field = optional(string) require_partition_filter = optional(bool) })) })) }))">map(object({…}))</code> |  | <code>{}</code> |
| [views](variables.tf#L198) | View definitions. | <code title="map(object({ query = string deletion_protection = optional(bool) description = optional(string, &quot;Terraform managed.&quot;) friendly_name = optional(string) labels = optional(map(string), {}) use_legacy_sql = optional(bool) }))">map(object({…}))</code> |  | <code>{}</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [dataset](outputs.tf#L17) | Dataset resource. |  |
| [dataset_id](outputs.tf#L22) | Dataset id. |  |
| [id](outputs.tf#L36) | Fully qualified dataset id. |  |
| [self_link](outputs.tf#L50) | Dataset self link. |  |
| [table_ids](outputs.tf#L64) | Map of fully qualified table ids keyed by table ids. |  |
| [tables](outputs.tf#L69) | Table resources. |  |
| [view_ids](outputs.tf#L74) | Map of fully qualified view ids keyed by view ids. |  |
| [views](outputs.tf#L79) | View resources. |  |
<!-- END TFDOC -->