# Google Cloud Dataproc
This module manages a Google Cloud [Dataproc](https://cloud.google.com/dataproc) cluster resource, including IAM.
- [TODO](#todo)
- [Examples](#examples)
  - [Simple](#simple)
  - [Cluster configuration](#cluster-configuration)
  - [Cluster with CMEK encryption](#cluster-with-cmek-encryption)
- [IAM](#iam)
  - [Authoritative IAM](#authoritative-iam)
  - [Additive IAM](#additive-iam)
- [Variables](#variables)
- [Outputs](#outputs)
## TODO
- [ ] Add support for Cloud Dataproc [autoscaling policy](https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/dataproc_autoscaling_policy_iam).
## Examples
### Simple
```hcl
module "processing-dp-cluster-2" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
}
# tftest modules=1 resources=1
```
### Cluster configuration
To set the cluster configuration, use the `dataproc_config.cluster_config` variable.
```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        subnetwork             = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
        zone                   = "europe-west1-b"
        service_account        = ""
        service_account_scopes = ["cloud-platform"]
        internal_ip_only       = true
      }
    }
  }
}
# tftest modules=1 resources=1
```
### Cluster with CMEK encryption
To encrypt the cluster with a Customer Managed Encryption Key (CMEK), set the `dataproc_config.encryption_config` variable. The Compute Engine service agent and the Cloud Storage service agent need the `CryptoKey Encrypter/Decrypter` role on the configured KMS key ([Documentation](https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/customer-managed-encryption)).
```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        subnetwork             = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
        zone                   = "europe-west1-b"
        service_account        = ""
        service_account_scopes = ["cloud-platform"]
        internal_ip_only       = true
      }
    }
    encryption_config = {
      kms_key_name = "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
    }
  }
}
# tftest modules=1 resources=1
```
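The role grants described above are not created by the example itself. A minimal sketch of how they might be added with the Google provider's `google_kms_crypto_key_iam_member` resource follows. `PROJECT_NUMBER`, the key path, and the local and resource names are placeholders chosen for illustration, not values defined by this module.

```hcl
# Sketch only: PROJECT_NUMBER and the key path are placeholders to replace.
locals {
  kms_key_id = "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
  # Service agents that need access to the key, per the paragraph above.
  cmek_service_agents = {
    compute = "serviceAccount:service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com"
    storage = "serviceAccount:service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com"
  }
}

# One additive grant of the Encrypter/Decrypter role per service agent.
resource "google_kms_crypto_key_iam_member" "dataproc_cmek" {
  for_each      = local.cmek_service_agents
  crypto_key_id = local.kms_key_id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = each.value
}
```

The grants must be in place before the cluster is created, so a caller would typically make the cluster module depend on them.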
## IAM
IAM is managed via several variables that implement different features and levels of control:
- `iam` and `group_iam` configure authoritative bindings that manage individual roles exclusively, and are internally merged
- `iam_bindings` configures authoritative bindings with optional support for conditions, and is not internally merged with the previous two variables
- `iam_bindings_additive` configures additive bindings via individual role/member pairs with optional support for conditions
The authoritative and additive approaches can be used together, provided different roles are managed by each. Some care must also be taken with the `group_iam` variable to ensure that variable keys are static values, so that Terraform is able to compute the dependency graph.
Refer to the [project module](../project/README.md#iam) for examples of the IAM interface.
### Authoritative IAM
```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  group_iam = {
    "gcp-data-engineers@example.net" = [
      "roles/dataproc.viewer"
    ]
  }
  iam = {
    "roles/dataproc.viewer" = [
      "serviceAccount:service-account@PROJECT_ID.iam.gserviceaccount.com"
    ]
  }
}
# tftest modules=1 resources=2
```
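The `iam_bindings` variable can be used in the same way. The following is a minimal sketch, assuming the `{KEY => {role = ROLE, members = []}}` shape given in the variables table and leaving out the optional condition. The `dataproc-editor` key and the chosen role are illustrative only.

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  iam_bindings = {
    # The key is arbitrary and only identifies the binding.
    dataproc-editor = {
      role = "roles/dataproc.editor"
      members = [
        "serviceAccount:service-account@PROJECT_ID.iam.gserviceaccount.com"
      ]
    }
  }
}
```

Because these bindings are authoritative and are not merged with `iam` and `group_iam`, the role used here should not also be managed through those variables.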
### Additive IAM
```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  iam_bindings_additive = {
    am1-viewer = {
      member = "user:am1@example.com"
      role   = "roles/dataproc.viewer"
    }
  }
}
# tftest modules=1 resources=2
```
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [name](variables.tf#L235) | Cluster name. | `string` | ✓ |  |
| [project_id](variables.tf#L250) | Project ID. | `string` | ✓ |  |
| [region](variables.tf#L255) | Dataproc region. | `string` | ✓ |  |
| [dataproc_config](variables.tf#L17) | Dataproc cluster config. | `object({…})` |  | `{}` |
| [group_iam](variables.tf#L185) | Authoritative IAM binding for organization groups, in {GROUP_EMAIL => [ROLES]} format. Group emails need to be static. Can be used in combination with the `iam` variable. | `map(list(string))` |  | `{}` |
| [iam](variables.tf#L192) | IAM bindings in {ROLE => [MEMBERS]} format. | `map(list(string))` |  | `{}` |
| [iam_bindings](variables.tf#L199) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | `map(object({…}))` |  | `{}` |
| [iam_bindings_additive](variables.tf#L214) | Individual additive IAM bindings. Keys are arbitrary. | `map(object({…}))` |  | `{}` |
| [labels](variables.tf#L229) | The resource labels for instance to use to annotate any related underlying resources, such as Compute Engine VMs. | `map(string)` |  | `{}` |
| [prefix](variables.tf#L240) | Optional prefix used to generate project id and name. | `string` |  | `null` |
| [service_account](variables.tf#L260) | Service account to set on the Dataproc cluster. | `string` |  | `null` |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [bucket_names](outputs.tf#L19) | List of bucket names which have been assigned to the cluster. | |
| [http_ports](outputs.tf#L24) | The map of port descriptions to URLs. | |
| [id](outputs.tf#L29) | Fully qualified cluster id. | |
| [instance_names](outputs.tf#L34) | List of instance names which have been assigned to the cluster. | |
| [name](outputs.tf#L43) | The name of the cluster. | |
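
The outputs listed above can be consumed from the calling configuration. The sketch below assumes a module block named `processing-dp-cluster`, as in the examples. The output names `dataproc_cluster_id` and `dataproc_bucket_names` are hypothetical.

```hcl
# Sketch: re-export selected module outputs from the root configuration.
output "dataproc_cluster_id" {
  description = "Fully qualified id of the Dataproc cluster."
  value       = module.processing-dp-cluster.id
}

output "dataproc_bucket_names" {
  description = "Bucket names assigned to the Dataproc cluster."
  value       = module.processing-dp-cluster.bucket_names
}
```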