# Google Cloud Dataproc

This module manages a Google Cloud [Dataproc](https://cloud.google.com/dataproc) cluster resource, including IAM.

<!-- BEGIN TOC -->
- [TODO](#todo)
- [Examples](#examples)
  - [Simple](#simple)
  - [Cluster configuration](#cluster-configuration)
  - [Cluster with CMEK encryption](#cluster-with-cmek-encryption)
- [IAM](#iam)
  - [Authoritative IAM](#authoritative-iam)
  - [Additive IAM](#additive-iam)
- [Variables](#variables)
- [Outputs](#outputs)
<!-- END TOC -->
## TODO

- [ ] Add support for Cloud Dataproc [autoscaling policy](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/dataproc_autoscaling_policy_iam).

## Examples
### Simple
```hcl
module "processing-dp-cluster-2" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
}
# tftest modules=1 resources=1
```
### Cluster configuration

To set the cluster configuration, use the `dataproc_config.cluster_config` variable.

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        subnetwork             = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
        zone                   = "europe-west1-b"
        service_account        = ""
        service_account_scopes = ["cloud-platform"]
        internal_ip_only       = true
      }
    }
  }
}
# tftest modules=1 resources=1
```
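
Node sizing can be set through the same variable. The sketch below is an assumption derived from the `dataproc_config` type listed under [Variables](#variables): `master_config` and `worker_config` declare `num_instances`, `machine_type`, `min_cpu_platform` and `image_uri` as non-optional attributes, so the ones not needed here are set to `null`. The module name, machine types and instance counts are illustrative, and whether the module accepts `null` for these attributes has not been verified.

```hcl
module "processing-dp-cluster-sized" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  dataproc_config = {
    cluster_config = {
      # One master node; values are illustrative.
      master_config = {
        num_instances    = 1
        machine_type     = "n2-standard-4"
        min_cpu_platform = null # assumed to mean "use the default"
        image_uri        = null # assumed to mean "use the default"
      }
      # Two primary workers; values are illustrative.
      worker_config = {
        num_instances    = 2
        machine_type     = "n2-standard-4"
        min_cpu_platform = null
        image_uri        = null
      }
    }
  }
}
```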
### Cluster with CMEK encryption

To encrypt the cluster with a Customer Managed Encryption Key (CMEK), set the `dataproc_config.encryption_config` variable. The Compute Engine service agent and the Cloud Storage service agent need the `CryptoKey Encrypter/Decrypter` role on the configured KMS key ([Documentation](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/customer-managed-encryption)).

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        subnetwork             = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
        zone                   = "europe-west1-b"
        service_account        = ""
        service_account_scopes = ["cloud-platform"]
        internal_ip_only       = true
      }
    }
    encryption_config = {
      kms_key_name = "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
    }
  }
}
# tftest modules=1 resources=1
```
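
The role grants mentioned above are not part of the CMEK example. One possible way to create them, sketched with plain Google provider resources rather than this module, is shown below. The local value, the data source name and the resource names are hypothetical, and the key path is the same placeholder used in the example; adapt them to your setup.

```hcl
locals {
  # Placeholder key path, same as in the module example.
  kms_key = "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
}

# Looks up the project number, which is part of the service agent emails.
data "google_project" "cluster" {
  project_id = "my-project"
}

# Compute Engine service agent.
resource "google_kms_crypto_key_iam_member" "compute_agent" {
  crypto_key_id = local.kms_key
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:service-${data.google_project.cluster.number}@compute-system.iam.gserviceaccount.com"
}

# Cloud Storage service agent.
resource "google_kms_crypto_key_iam_member" "storage_agent" {
  crypto_key_id = local.kms_key
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = "serviceAccount:service-${data.google_project.cluster.number}@gs-project-accounts.iam.gserviceaccount.com"
}
```

Additive `iam_member` resources are used here so that existing bindings on the key are left untouched.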
## IAM

IAM is managed via several variables that implement different features and levels of control:

- `iam` and `group_iam` configure authoritative bindings that manage individual roles exclusively, and are internally merged
- `iam_bindings` configures authoritative bindings with optional support for conditions, and is not internally merged with the previous two variables
- `iam_bindings_additive` configures additive bindings via individual role/member pairs with optional support for conditions

The authoritative and additive approaches can be used together, provided different roles are managed by each. Some care must also be taken with the `group_iam` variable to ensure that variable keys are static values, so that Terraform is able to compute the dependency graph.

Refer to the [project module](../project/README.md#iam) for examples of the IAM interface.

### Authoritative IAM
```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  group_iam = {
    "gcp-data-engineers@example.net" = [
      "roles/dataproc.viewer"
    ]
  }
  iam = {
    "roles/dataproc.viewer" = [
      "serviceAccount:service-account@PROJECT_ID.iam.gserviceaccount.com"
    ]
  }
}
# tftest modules=1 resources=2
```
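
The `iam_bindings` variable is the other authoritative option. A minimal sketch is shown below, assuming the shape given in the variables table (`{KEY => {role = ROLE, members = [], condition = {}}}`). The module name, the `editors` key and the choice of role are illustrative, and the optional `condition` attribute is left out.

```hcl
module "processing-dp-cluster-bindings" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  iam_bindings = {
    # Keys are arbitrary; "editors" is only a label.
    editors = {
      role = "roles/dataproc.editor"
      members = [
        "group:gcp-data-engineers@example.net"
      ]
    }
  }
}
```

Because these bindings are not merged with `iam` and `group_iam`, a role managed here should not also be managed through those two variables.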
### Additive IAM
```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  iam_bindings_additive = {
    am1-viewer = {
      member = "user:am1@example.com"
      role   = "roles/dataproc.viewer"
    }
  }
}
# tftest modules=1 resources=2
```
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [name](variables.tf#L235) | Cluster name. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L250) | Project ID. | <code>string</code> | ✓ | |
| [region](variables.tf#L255) | Dataproc region. | <code>string</code> | ✓ | |
| [dataproc_config](variables.tf#L17) | Dataproc cluster config. | <code title="object({ graceful_decommission_timeout = optional(string) cluster_config = optional(object({ staging_bucket = optional(string) temp_bucket = optional(string) gce_cluster_config = optional(object({ zone = optional(string) network = optional(string) subnetwork = optional(string) service_account = optional(string) service_account_scopes = optional(list(string)) tags = optional(list(string), []) internal_ip_only = optional(bool) metadata = optional(map(string), {}) reservation_affinity = optional(object({ consume_reservation_type = string key = string values = string })) node_group_affinity = optional(object({ node_group_uri = string })) shielded_instance_config = optional(object({ enable_secure_boot = bool enable_vtpm = bool enable_integrity_monitoring = bool })) })) master_config = optional(object({ num_instances = number machine_type = string min_cpu_platform = string image_uri = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) accelerators = optional(object({ accelerator_type = string accelerator_count = number })) })) worker_config = optional(object({ num_instances = number machine_type = string min_cpu_platform = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) image_uri = string accelerators = optional(object({ accelerator_type = string accelerator_count = number })) })) preemptible_worker_config = optional(object({ num_instances = number preemptibility = string disk_config = optional(object({ boot_disk_type = string boot_disk_size_gb = number num_local_ssds = number })) })) software_config = optional(object({ image_version = optional(string) override_properties = map(string) optional_components = optional(list(string)) })) security_config = optional(object({ kerberos_config = object({ cross_realm_trust_admin_server = optional(string) cross_realm_trust_kdc = optional(string) cross_realm_trust_realm = optional(string) cross_realm_trust_shared_password_uri = optional(string) enable_kerberos = optional(string) kdc_db_…">object({…})</code> | | … |
| [group_iam](variables.tf#L185) | Authoritative IAM binding for organization groups, in {GROUP_EMAIL => [ROLES]} format. Group emails need to be static. Can be used in combination with the `iam` variable. | <code>map(list(string))</code> | | <code>{}</code> |
| [iam](variables.tf#L192) | IAM bindings in {ROLE => [MEMBERS]} format. | <code>map(list(string))</code> | | <code>{}</code> |
| [iam_bindings](variables.tf#L199) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | <code title="map(object({ members = list(string) role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [iam_bindings_additive](variables.tf#L214) | Individual additive IAM bindings. Keys are arbitrary. | <code title="map(object({ member = string role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [labels](variables.tf#L229) | The resource labels for instance to use to annotate any related underlying resources, such as Compute Engine VMs. | <code>map(string)</code> | | <code>{}</code> |
| [prefix](variables.tf#L240) | Optional prefix used to generate project id and name. | <code>string</code> | | <code>null</code> |
| [service_account](variables.tf#L260) | Service account to set on the Dataproc cluster. | <code>string</code> | | <code>null</code> |

## Outputs
| name | description | sensitive |
|---|---|:---:|
| [bucket_names](outputs.tf#L19) | List of bucket names which have been assigned to the cluster. | |
| [http_ports](outputs.tf#L24) | The map of port descriptions to URLs. | |
| [id](outputs.tf#L29) | Fully qualified cluster id. | |
| [instance_names](outputs.tf#L34) | List of instance names which have been assigned to the cluster. | |
| [name](outputs.tf#L43) | The name of the cluster. | |
<!-- END TFDOC -->
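
As a brief illustration of how the outputs listed above might be consumed, the sketch below re-exports two of them from a calling configuration. The output names `dataproc_cluster_id` and `dataproc_bucket_names` are hypothetical; only the `id` and `bucket_names` module outputs come from the table.

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
}

# Re-export the fully qualified cluster id.
output "dataproc_cluster_id" {
  description = "Fully qualified cluster id."
  value       = module.processing-dp-cluster.id
}

# Re-export the buckets assigned to the cluster.
output "dataproc_bucket_names" {
  description = "Bucket names assigned to the cluster."
  value       = module.processing-dp-cluster.bucket_names
}
```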