Updated the DataQualitySpec for Dataplex Datascan (#2008)

* Updated the DataQualitySpec for Dataplex Datascan

* Fix linting

---------

Co-authored-by: Ludovico Magnocavallo <ludomagno@google.com>
Co-authored-by: Julio Castillo <jccb@google.com>
shourya116 2024-01-30 20:44:49 +05:30 committed by GitHub
parent 37fc16ab42
commit 7b58114d65
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
3 changed files with 34 additions and 17 deletions

@@ -431,21 +431,21 @@ module "dataplex-datascan" {
 | name | description | type | required | default |
 |---|---|:---:|:---:|:---:|
 | [data](variables.tf#L17) | The data source for DataScan. The source can be either a Dataplex `entity` or a BigQuery `resource`. | <code title="object&#40;&#123;&#10; entity &#61; optional&#40;string&#41;&#10; resource &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
-| [name](variables.tf#L157) | Name of Dataplex Scan. | <code>string</code> | ✓ | |
-| [project_id](variables.tf#L168) | The ID of the project where the Dataplex DataScan will be created. | <code>string</code> | ✓ | |
-| [region](variables.tf#L173) | Region for the Dataplex DataScan. | <code>string</code> | ✓ | |
+| [name](variables.tf#L162) | Name of Dataplex Scan. | <code>string</code> | ✓ | |
+| [project_id](variables.tf#L173) | The ID of the project where the Dataplex DataScan will be created. | <code>string</code> | ✓ | |
+| [region](variables.tf#L178) | Region for the Dataplex DataScan. | <code>string</code> | ✓ | |
 | [data_profile_spec](variables.tf#L29) | DataProfileScan related setting. Variable descriptions are provided in https://cloud.google.com/dataplex/docs/reference/rest/v1/DataProfileSpec. | <code title="object&#40;&#123;&#10; sampling_percent &#61; optional&#40;number&#41;&#10; row_filter &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
-| [data_quality_spec](variables.tf#L38) | DataQualityScan related setting. Variable descriptions are provided in https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualitySpec. | <code title="object&#40;&#123;&#10; sampling_percent &#61; optional&#40;number&#41;&#10; row_filter &#61; optional&#40;string&#41;&#10; rules &#61; list&#40;object&#40;&#123;&#10; column &#61; optional&#40;string&#41;&#10; ignore_null &#61; optional&#40;bool, null&#41;&#10; dimension &#61; string&#10; threshold &#61; optional&#40;number&#41;&#10; non_null_expectation &#61; optional&#40;object&#40;&#123;&#125;&#41;&#41;&#10; range_expectation &#61; optional&#40;object&#40;&#123;&#10; min_value &#61; optional&#40;number&#41;&#10; max_value &#61; optional&#40;number&#41;&#10; strict_min_enabled &#61; optional&#40;bool&#41;&#10; strict_max_enabled &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#41;&#10; regex_expectation &#61; optional&#40;object&#40;&#123;&#10; regex &#61; string&#10; &#125;&#41;&#41;&#10; set_expectation &#61; optional&#40;object&#40;&#123;&#10; values &#61; list&#40;string&#41;&#10; &#125;&#41;&#41;&#10; uniqueness_expectation &#61; optional&#40;object&#40;&#123;&#125;&#41;&#41;&#10; statistic_range_expectation &#61; optional&#40;object&#40;&#123;&#10; statistic &#61; string&#10; min_value &#61; optional&#40;number&#41;&#10; max_value &#61; optional&#40;number&#41;&#10; strict_min_enabled &#61; optional&#40;bool&#41;&#10; strict_max_enabled &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#41;&#10; row_condition_expectation &#61; optional&#40;object&#40;&#123;&#10; sql_expression &#61; string&#10; &#125;&#41;&#41;&#10; table_condition_expectation &#61; optional&#40;object&#40;&#123;&#10; sql_expression &#61; string&#10; &#125;&#41;&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
-| [data_quality_spec_file](variables.tf#L80) | Path to a YAML file containing DataQualityScan related setting. Input content can use either camelCase or snake_case. Variables description are provided in https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualitySpec. | <code title="object&#40;&#123;&#10; path &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
-| [description](variables.tf#L88) | Custom description for DataScan. | <code>string</code> | | <code>null</code> |
-| [execution_schedule](variables.tf#L94) | Schedule DataScan to run periodically based on a cron schedule expression. If not specified, the DataScan is created with `on_demand` schedule, which means it will not run until the user calls `dataScans.run` API. | <code>string</code> | | <code>null</code> |
-| [group_iam](variables.tf#L100) | Authoritative IAM binding for organization groups, in {GROUP_EMAIL => [ROLES]} format. Group emails need to be static. Can be used in combination with the `iam` variable. | <code>map&#40;list&#40;string&#41;&#41;</code> | | <code>&#123;&#125;</code> |
-| [iam](variables.tf#L107) | Dataplex DataScan IAM bindings in {ROLE => [MEMBERS]} format. | <code>map&#40;list&#40;string&#41;&#41;</code> | | <code>&#123;&#125;</code> |
-| [iam_bindings](variables.tf#L114) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | <code title="map&#40;object&#40;&#123;&#10; members &#61; list&#40;string&#41;&#10; role &#61; string&#10; condition &#61; optional&#40;object&#40;&#123;&#10; expression &#61; string&#10; title &#61; string&#10; description &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
-| [iam_bindings_additive](variables.tf#L129) | Individual additive IAM bindings. Keys are arbitrary. | <code title="map&#40;object&#40;&#123;&#10; member &#61; string&#10; role &#61; string&#10; condition &#61; optional&#40;object&#40;&#123;&#10; expression &#61; string&#10; title &#61; string&#10; description &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
-| [incremental_field](variables.tf#L144) | The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time. If not specified, a data scan will run for all data in the table. | <code>string</code> | | <code>null</code> |
-| [labels](variables.tf#L150) | Resource labels. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
-| [prefix](variables.tf#L162) | Optional prefix used to generate Dataplex DataScan ID. | <code>string</code> | | <code>null</code> |
+| [data_quality_spec](variables.tf#L38) | DataQualityScan related setting. Variable descriptions are provided in https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualitySpec. | <code title="object&#40;&#123;&#10; sampling_percent &#61; optional&#40;number&#41;&#10; row_filter &#61; optional&#40;string&#41;&#10; post_scan_actions &#61; optional&#40;object&#40;&#123;&#10; bigquery_export &#61; optional&#40;object&#40;&#123;&#10; results_table &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10; &#125;&#41;&#41;&#10; rules &#61; list&#40;object&#40;&#123;&#10; column &#61; optional&#40;string&#41;&#10; ignore_null &#61; optional&#40;bool, null&#41;&#10; dimension &#61; string&#10; threshold &#61; optional&#40;number&#41;&#10; non_null_expectation &#61; optional&#40;object&#40;&#123;&#125;&#41;&#41;&#10; range_expectation &#61; optional&#40;object&#40;&#123;&#10; min_value &#61; optional&#40;number&#41;&#10; max_value &#61; optional&#40;number&#41;&#10; strict_min_enabled &#61; optional&#40;bool&#41;&#10; strict_max_enabled &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#41;&#10; regex_expectation &#61; optional&#40;object&#40;&#123;&#10; regex &#61; string&#10; &#125;&#41;&#41;&#10; set_expectation &#61; optional&#40;object&#40;&#123;&#10; values &#61; list&#40;string&#41;&#10; &#125;&#41;&#41;&#10; uniqueness_expectation &#61; optional&#40;object&#40;&#123;&#125;&#41;&#41;&#10; statistic_range_expectation &#61; optional&#40;object&#40;&#123;&#10; statistic &#61; string&#10; min_value &#61; optional&#40;number&#41;&#10; max_value &#61; optional&#40;number&#41;&#10; strict_min_enabled &#61; optional&#40;bool&#41;&#10; strict_max_enabled &#61; optional&#40;bool&#41;&#10; &#125;&#41;&#41;&#10; row_condition_expectation &#61; optional&#40;object&#40;&#123;&#10; sql_expression &#61; string&#10; &#125;&#41;&#41;&#10; table_condition_expectation &#61; optional&#40;object&#40;&#123;&#10; sql_expression &#61; string&#10; &#125;&#41;&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
+| [data_quality_spec_file](variables.tf#L85) | Path to a YAML file containing DataQualityScan related setting. Input content can use either camelCase or snake_case. Variables description are provided in https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualitySpec. | <code title="object&#40;&#123;&#10; path &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
+| [description](variables.tf#L93) | Custom description for DataScan. | <code>string</code> | | <code>null</code> |
+| [execution_schedule](variables.tf#L99) | Schedule DataScan to run periodically based on a cron schedule expression. If not specified, the DataScan is created with `on_demand` schedule, which means it will not run until the user calls `dataScans.run` API. | <code>string</code> | | <code>null</code> |
+| [group_iam](variables.tf#L105) | Authoritative IAM binding for organization groups, in {GROUP_EMAIL => [ROLES]} format. Group emails need to be static. Can be used in combination with the `iam` variable. | <code>map&#40;list&#40;string&#41;&#41;</code> | | <code>&#123;&#125;</code> |
+| [iam](variables.tf#L112) | Dataplex DataScan IAM bindings in {ROLE => [MEMBERS]} format. | <code>map&#40;list&#40;string&#41;&#41;</code> | | <code>&#123;&#125;</code> |
+| [iam_bindings](variables.tf#L119) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | <code title="map&#40;object&#40;&#123;&#10; members &#61; list&#40;string&#41;&#10; role &#61; string&#10; condition &#61; optional&#40;object&#40;&#123;&#10; expression &#61; string&#10; title &#61; string&#10; description &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
+| [iam_bindings_additive](variables.tf#L134) | Individual additive IAM bindings. Keys are arbitrary. | <code title="map&#40;object&#40;&#123;&#10; member &#61; string&#10; role &#61; string&#10; condition &#61; optional&#40;object&#40;&#123;&#10; expression &#61; string&#10; title &#61; string&#10; description &#61; optional&#40;string&#41;&#10; &#125;&#41;&#41;&#10;&#125;&#41;&#41;">map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
+| [incremental_field](variables.tf#L149) | The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time. If not specified, a data scan will run for all data in the table. | <code>string</code> | | <code>null</code> |
+| [labels](variables.tf#L155) | Resource labels. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
+| [prefix](variables.tf#L167) | Optional prefix used to generate Dataplex DataScan ID. | <code>string</code> | | <code>null</code> |
 ## Outputs

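The sketch below calls the module with the new `post_scan_actions` field set inline. It assumes the variable shape listed in the updated `data_quality_spec` row of the README hunk above. The module source path, project, dataset, table and column names are all hypothetical placeholders. The `//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET_ID/tables/TABLE_ID` form used for `results_table` comes from the Dataplex REST reference, not from this commit.

```hcl
module "dataplex-datascan" {
  # Hypothetical source path: adjust to where the module lives in your tree
  source     = "./fabric/modules/dataplex-datascan"
  name       = "datascan"
  prefix     = "test"
  project_id = "my-project-name"
  region     = "us-central1"
  data = {
    # Placeholder BigQuery table to scan
    resource = "//bigquery.googleapis.com/projects/my-project-name/datasets/my_dataset/tables/my_table"
  }
  data_quality_spec = {
    post_scan_actions = {
      bigquery_export = {
        # Placeholder table that receives the scan results
        results_table = "//bigquery.googleapis.com/projects/my-project-name/datasets/my_dataset/tables/dq_results"
      }
    }
    rules = [
      {
        # "id" is a placeholder column name
        column               = "id"
        dimension            = "COMPLETENESS"
        non_null_expectation = {}
      }
    ]
  }
}
```

You can omit `post_scan_actions` entirely. Because it is declared with `optional()`, it then defaults to `null` and no export is configured.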
@@ -17,9 +17,10 @@
 locals {
   prefix = var.prefix == null || var.prefix == "" ? "" : "${var.prefix}-"
   _file_data_quality_spec = var.data_quality_spec_file == null ? null : {
-    sampling_percent = try(local._file_data_quality_spec_raw.samplingPercent, local._file_data_quality_spec_raw.sampling_percent, null)
-    row_filter       = try(local._file_data_quality_spec_raw.rowFilter, local._file_data_quality_spec_raw.row_filter, null)
-    rules            = local._parsed_rules
+    sampling_percent  = try(local._file_data_quality_spec_raw.samplingPercent, local._file_data_quality_spec_raw.sampling_percent, null)
+    row_filter        = try(local._file_data_quality_spec_raw.rowFilter, local._file_data_quality_spec_raw.row_filter, null)
+    rules             = local._parsed_rules
+    post_scan_actions = try(local._file_data_quality_spec_raw.postScanActions, local._file_data_quality_spec_raw.post_scan_actions, null)
   }
   data_quality_spec = (
     var.data_quality_spec != null || var.data_quality_spec_file != null ?
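The new `post_scan_actions` local uses the same `try()` fallback as the existing fields. As a result, a YAML file passed through `data_quality_spec_file` can spell the key either `postScanActions` or `post_scan_actions`. The sketch below reproduces only that lookup, run against a hypothetical inline YAML document decoded with `yamldecode`, so you can check the behaviour in isolation. The local names and the table path are placeholders and are not part of the module.

```hcl
locals {
  # Hypothetical file content, written in camelCase
  raw = yamldecode(<<-EOT
    samplingPercent: 10
    postScanActions:
      bigqueryExport:
        resultsTable: //bigquery.googleapis.com/projects/my-project-name/datasets/my_dataset/tables/dq_results
  EOT
  )

  # try() returns the first argument that evaluates without an error:
  # the camelCase key matches here, so the snake_case lookup is only
  # used when postScanActions is absent, and null when neither exists
  post_scan_actions = try(local.raw.postScanActions, local.raw.post_scan_actions, null)
}

output "post_scan_actions" {
  value = local.post_scan_actions
}
```

This expression only accepts both spellings for the top-level key. The nested `bigqueryExport` and `resultsTable` keys pass through exactly as written in the YAML.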
@@ -71,6 +72,17 @@ resource "google_dataplex_datascan" "datascan" {
     content {
       sampling_percent = try(local.data_quality_spec.sampling_percent, null)
       row_filter       = try(local.data_quality_spec.row_filter, null)
+      dynamic "post_scan_actions" {
+        for_each = local.data_quality_spec.post_scan_actions != null ? [""] : []
+        content {
+          dynamic "bigquery_export" {
+            for_each = local.data_quality_spec.post_scan_actions.bigquery_export != null ? [""] : []
+            content {
+              results_table = try(local.data_quality_spec.post_scan_actions.bigquery_export.results_table, null)
+            }
+          }
+        }
+      }
       dynamic "rules" {
         for_each = local.data_quality_spec.rules
         content {

@@ -41,6 +41,11 @@ variable "data_quality_spec" {
   type = object({
     sampling_percent = optional(number)
     row_filter       = optional(string)
+    post_scan_actions = optional(object({
+      bigquery_export = optional(object({
+        results_table = optional(string)
+      }))
+    }))
     rules = list(object({
       column      = optional(string)
       ignore_null = optional(bool, null)
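The new attributes are declared with `optional()` and no default, so omitting them leaves them `null`. The `!= null ? [""] : []` checks in the `dynamic` blocks rely on exactly that. The standalone sketch below counts how many `post_scan_actions` blocks the same condition would produce. The variable and output names are hypothetical, and optional object attributes require Terraform 1.3 or later.

```hcl
# Hypothetical variable mirroring only the nested shape added in this commit
variable "example_spec" {
  type = object({
    post_scan_actions = optional(object({
      bigquery_export = optional(object({
        results_table = optional(string)
      }))
    }))
  })
  default = {}
}

output "post_scan_actions_blocks" {
  # Same condition the module uses for its dynamic "post_scan_actions" block
  value = length(var.example_spec.post_scan_actions != null ? [""] : [])
}
```

With the default value, the output `post_scan_actions_blocks` should be `0`. Setting `example_spec = { post_scan_actions = {} }` should change it to `1`: an empty object is not `null`, even though its own `bigquery_export` attribute still is.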