# Google Cloud Dataproc

This module manages a Google Cloud [Dataproc](https://cloud.google.com/dataproc) cluster resource, including IAM.

<!-- BEGIN TOC -->
- [TODO](#todo)
- [Examples](#examples)
  - [Simple](#simple)
  - [Cluster configuration](#cluster-configuration)
  - [Cluster with CMEK encryption](#cluster-with-cmek-encryption)
- [IAM](#iam)
  - [Authoritative IAM](#authoritative-iam)
  - [Additive IAM](#additive-iam)
- [Variables](#variables)
- [Outputs](#outputs)
<!-- END TOC -->

## TODO

- [ ] Add support for Cloud Dataproc [autoscaling policy](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/dataproc_autoscaling_policy_iam).

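Until the module manages autoscaling policies itself, a policy created outside the module can be referenced through the `autoscaling_config.policy_uri` attribute of `dataproc_config.cluster_config`, which is part of the variable type documented in the Variables table. The sketch below uses the Google provider's `google_dataproc_autoscaling_policy` resource; the policy ID, instance limits and scaling factors are illustrative values, not recommendations.

```hcl
# Autoscaling policy managed outside the module (illustrative values).
resource "google_dataproc_autoscaling_policy" "default" {
  project   = "my-project"
  policy_id = "my-cluster-policy"
  location  = "europe-west1" # must match the cluster region

  worker_config {
    max_instances = 5
  }

  basic_algorithm {
    yarn_config {
      graceful_decommission_timeout = "30s"
      scale_up_factor               = 0.5
      scale_down_factor             = 0.5
    }
  }
}

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  dataproc_config = {
    cluster_config = {
      autoscaling_config = {
        # The provider exposes the policy resource name via the `name` attribute.
        policy_uri = google_dataproc_autoscaling_policy.default.name
      }
    }
  }
}
```
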
## Examples

### Simple

```hcl
module "processing-dp-cluster-2" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
}
# tftest modules=1 resources=1
```

### Cluster configuration

To configure the cluster, set the `cluster_config` attribute of the `dataproc_config` variable.

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        subnetwork             = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
        zone                   = "europe-west1-b"
        service_account        = ""
        service_account_scopes = ["cloud-platform"]
        internal_ip_only       = true
      }
    }
  }
}
# tftest modules=1 resources=1
```

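Other blocks listed under `cluster_config` in the Variables table follow the same pattern. As a sketch, assuming these attributes are passed through to the provider's `google_dataproc_cluster` arguments of the same name, a cluster that pins an image version and deletes itself after an idle hour could be declared as follows. The image version string and the TTL value are illustrative and should be checked against the Dataproc documentation.

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      software_config = {
        # Illustrative image version; choose one supported for your workload.
        image_version = "2.1-debian11"
        # override_properties is not optional in the variable type, so set it explicitly.
        override_properties = {}
      }
      lifecycle_config = {
        # Assumed duration format: delete the cluster after 3600 seconds without jobs.
        idle_delete_ttl = "3600s"
      }
    }
  }
}
```
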
### Cluster with CMEK encryption

To encrypt the cluster with a customer-managed encryption key (CMEK), set the `encryption_config` attribute of `dataproc_config.cluster_config`. The Compute Engine service agent and the Cloud Storage service agent need the `CryptoKey Encrypter/Decrypter` role on the configured KMS key ([Documentation](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/customer-managed-encryption)).

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        subnetwork             = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
        zone                   = "europe-west1-b"
        service_account        = ""
        service_account_scopes = ["cloud-platform"]
        internal_ip_only       = true
      }
      # encryption_config is nested in cluster_config, as in the variable type.
      encryption_config = {
        kms_key_name = "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
      }
    }
  }
}
# tftest modules=1 resources=1
```

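The KMS role required by the CMEK example above can be granted outside the module. A minimal sketch, assuming the key already exists and that `PROJECT_NUMBER` is replaced with the number of the project hosting the cluster, uses the Google provider's `google_kms_crypto_key_iam_member` resource; the local names and the `dataproc_cmek` resource name are illustrative.

```hcl
locals {
  # Same placeholder key path as in the CMEK example.
  dataproc_kms_key = "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
  # Compute Engine and Cloud Storage service agents; replace PROJECT_NUMBER.
  dataproc_kms_members = [
    "serviceAccount:service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com",
    "serviceAccount:service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com",
  ]
}

# One additive IAM member per service agent on the crypto key.
resource "google_kms_crypto_key_iam_member" "dataproc_cmek" {
  for_each      = toset(local.dataproc_kms_members)
  crypto_key_id = local.dataproc_kms_key
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = each.value
}
```

To ensure the grants exist before the cluster is created, the module call can declare `depends_on = [google_kms_crypto_key_iam_member.dataproc_cmek]`.
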
## IAM

IAM is managed via several variables that implement different features and levels of control:

- `iam` and `group_iam` configure authoritative bindings that manage individual roles exclusively, and are internally merged
- `iam_bindings` configure authoritative bindings with optional support for conditions, and are not internally merged with the previous two variables
- `iam_bindings_additive` configure additive bindings via individual role/member pairs with optional support for conditions

The authoritative and additive approaches can be used together, provided different roles are managed by each. Some care must also be taken with the `group_iam` variable to ensure that variable keys are static values, so that Terraform is able to compute the dependency graph.

Refer to the [project module](../project/README.md#iam) for examples of the IAM interface.

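The Authoritative IAM and Additive IAM examples in this section cover `iam`, `group_iam` and `iam_bindings_additive`. A minimal sketch of `iam_bindings`, based only on the variable type in the Variables table, might look like the following; the `dataproc-editors` key and the group address are hypothetical placeholders, and the optional `condition` attribute is omitted.

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  iam_bindings = {
    # Keys are arbitrary; this one is hypothetical.
    dataproc-editors = {
      role    = "roles/dataproc.editor"
      members = ["group:gcp-data-engineers@example.net"]
    }
  }
}
```

Since `iam_bindings` is authoritative and not merged with `iam` or `group_iam`, it is probably safest not to manage the same role through both.
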
### Authoritative IAM

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  group_iam = {
    "gcp-data-engineers@example.net" = [
      "roles/dataproc.viewer"
    ]
  }
  iam = {
    "roles/dataproc.viewer" = [
      "serviceAccount:service-account@PROJECT_ID.iam.gserviceaccount.com"
    ]
  }
}
# tftest modules=1 resources=2
```

### Additive IAM

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  iam_bindings_additive = {
    am1-viewer = {
      member = "user:am1@example.com"
      role   = "roles/dataproc.viewer"
    }
  }
}
# tftest modules=1 resources=2
```

<!-- BEGIN TFDOC -->
## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [name](variables.tf#L235) | Cluster name. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L250) | Project ID. | <code>string</code> | ✓ | |
| [region](variables.tf#L255) | Dataproc region. | <code>string</code> | ✓ | |
| [dataproc_config](variables.tf#L17) | Dataproc cluster config. | <code title="object({ graceful_decommission_timeout = optional(string) cluster_config = optional(object({ staging_bucket = optional(string) temp_bucket = optional(string) gce_cluster_config = optional(object({ zone = optional(string) network = optional(string) subnetwork = optional(string) service_account = optional(string) service_account_scopes = optional(list(string)) tags = optional(list(string), []) internal_ip_only = optional(bool) metadata = optional(map(string), {}) reservation_affinity = optional(object({ consume_reservation_type = string key = string values = string })) node_group_affinity = optional(object({ node_group_uri = string })) shielded_instance_config = optional(object({ enable_secure_boot = bool enable_vtpm = bool enable_integrity_monitoring = bool })) })) master_config = optional(object({ num_instances = number machine_type = string min_cpu_platform = string image_uri = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) accelerators = optional(object({ accelerator_type = string accelerator_count = number })) })) worker_config = optional(object({ num_instances = number machine_type = string min_cpu_platform = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) image_uri = string accelerators = optional(object({ accelerator_type = string accelerator_count = number })) })) preemptible_worker_config = optional(object({ num_instances = number preemptibility = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) })) software_config = optional(object({ image_version = optional(string) override_properties = map(string) optional_components = optional(list(string)) })) security_config = optional(object({ kerberos_config = object({ cross_realm_trust_admin_server = optional(string) cross_realm_trust_kdc = optional(string) cross_realm_trust_realm = optional(string) cross_realm_trust_shared_password_uri = optional(string) enable_kerberos = optional(string) kdc_db_key_uri = optional(string) key_password_uri = optional(string) keystore_uri = optional(string) keystore_password_uri = optional(string) kms_key_uri = string realm = optional(string) root_principal_password_uri = string tgt_lifetime_hours = optional(string) truststore_password_uri = optional(string) truststore_uri = optional(string) }) })) autoscaling_config = optional(object({ policy_uri = string })) initialization_action = optional(object({ script = string timeout_sec = optional(string) })) encryption_config = optional(object({ kms_key_name = string })) lifecycle_config = optional(object({ idle_delete_ttl = optional(string) auto_delete_time = optional(string) })) endpoint_config = optional(object({ enable_http_port_access = string })) dataproc_metric_config = optional(object({ metrics = list(object({ metric_source = string metric_overrides = optional(list(string)) })) })) metastore_config = optional(object({ dataproc_metastore_service = string })) })) virtual_cluster_config = optional(object({ staging_bucket = optional(string) auxiliary_services_config = optional(object({ metastore_config = optional(object({ dataproc_metastore_service = string })) spark_history_server_config = optional(object({ dataproc_cluster = string })) })) kubernetes_cluster_config = object({ kubernetes_namespace = optional(string) kubernetes_software_config = object({ component_version = list(map(string)) properties = optional(list(map(string))) }) gke_cluster_config = object({ gke_cluster_target = optional(string) node_pool_target = optional(object({ node_pool = string roles = list(string) node_pool_config = optional(object({ autoscaling = optional(object({ min_node_count = optional(number) max_node_count = optional(number) })) config = object({ machine_type = optional(string) preemptible = optional(bool) local_ssd_count = optional(number) min_cpu_platform = optional(string) spot = optional(bool) }) locations = optional(list(string)) })) })) }) }) })) })">object({…})</code> | | <code>{}</code> |
| [group_iam](variables.tf#L185) | Authoritative IAM binding for organization groups, in {GROUP_EMAIL => [ROLES]} format. Group emails need to be static. Can be used in combination with the `iam` variable. | <code>map(list(string))</code> | | <code>{}</code> |
| [iam](variables.tf#L192) | IAM bindings in {ROLE => [MEMBERS]} format. | <code>map(list(string))</code> | | <code>{}</code> |
| [iam_bindings](variables.tf#L199) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | <code title="map(object({ members = list(string) role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [iam_bindings_additive](variables.tf#L214) | Individual additive IAM bindings. Keys are arbitrary. | <code title="map(object({ member = string role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [labels](variables.tf#L229) | The resource labels for instance to use to annotate any related underlying resources, such as Compute Engine VMs. | <code>map(string)</code> | | <code>{}</code> |
| [prefix](variables.tf#L240) | Optional prefix used to generate project id and name. | <code>string</code> | | <code>null</code> |
| [service_account](variables.tf#L260) | Service account to set on the Dataproc cluster. | <code>string</code> | | <code>null</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [bucket_names](outputs.tf#L19) | List of bucket names which have been assigned to the cluster. | |
| [http_ports](outputs.tf#L24) | The map of port descriptions to URLs. | |
| [id](outputs.tf#L29) | Fully qualified cluster id. | |
| [instance_names](outputs.tf#L34) | List of instance names which have been assigned to the cluster. | |
| [name](outputs.tf#L43) | The name of the cluster. | |

<!-- END TFDOC -->
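
The outputs are consumed like any other module output. For example, assuming a module call named `processing-dp-cluster` as in the examples above, the cluster ID and web interface URLs could be re-exported from the root module. The output names are illustrative, and `http_ports` is expected to be populated only when HTTP port access is enabled through `endpoint_config`.

```hcl
output "dataproc_cluster_id" {
  description = "Fully qualified ID of the Dataproc cluster."
  value       = module.processing-dp-cluster.id
}

output "dataproc_http_ports" {
  description = "Web interface URLs exposed by the cluster."
  value       = module.processing-dp-cluster.http_ports
}
```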