cloud-foundation-fabric/modules/dataproc
Ludovico Magnocavallo 6941313c7d
Factories refactor (#1843)
* factories refactor doc

* Adds file schema and filesystem organization

* Update 20231106-factories.md

* move factories out of blueprints and create new factories  README

* align factory in billing-account module

* align factory in dataplex-datascan module

* align factory in billing-account module

* align factory in net-firewall-policy module

* align factory in dns-response-policy module

* align factory in net-vpc-firewall module

* align factory in net-vpc module

* align factory variable names in FAST

* remove decentralized firewall blueprint

* bump terraform version

* bump module versions

* update top-level READMEs

* move project factory to modules

* fix variable names and tests

* tfdoc

* remove changelog link

* add project factory to top-level README

* fix cludrun eventarc diff

* fix README

* fix cludrun eventarc diff

---------

Co-authored-by: Simone Ruffilli <sruffilli@google.com>
2024-02-26 10:16:52 +00:00
..
README.md hotfix/dataproc-variables: fix type of kubernetes_software_config.component_version and properties (#2104) 2024-02-22 07:23:38 +00:00
iam.tf Extend FAST to support different principal types (#2064) 2024-02-12 14:35:30 +01:00
main.tf hotfix/dataproc-variables: fix type of kubernetes_software_config.component_version and properties (#2104) 2024-02-22 07:23:38 +00:00
outputs.tf Ensure all modules have an `id` output (#1410) 2023-06-02 16:07:22 +02:00
variables-iam.tf Extend FAST to support different principal types (#2064) 2024-02-12 14:35:30 +01:00
variables.tf hotfix/dataproc-variables: fix type of kubernetes_software_config.component_version and properties (#2104) 2024-02-22 07:23:38 +00:00
versions.tf Factories refactor (#1843) 2024-02-26 10:16:52 +00:00

README.md

Google Cloud Dataproc

This module Manages a Google Cloud Dataproc cluster resource, including IAM.

TODO

Examples

Simple

module "processing-dp-cluster-2" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
}
# tftest modules=1 resources=1

Cluster configuration on GCE

To set cluster configuration use the 'dataproc_config.cluster_config' variable.

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        subnetwork             = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
        zone                   = "europe-west1-b"
        service_account        = ""
        service_account_scopes = ["cloud-platform"]
        internal_ip_only       = true
      }
    }
  }
}
# tftest modules=1 resources=1

Cluster configuration on GCE with CMEK encryption

To set cluster configuration use the Customer Managed Encryption key, set dataproc_config.encryption_config. variable. The Compute Engine service agent and the Cloud Storage service agent need to have CryptoKey Encrypter/Decrypter role on they configured KMS key (Documentation).

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        subnetwork             = "https://www.googleapis.com/compute/v1/projects/PROJECT/regions/europe-west1/subnetworks/SUBNET"
        zone                   = "europe-west1-b"
        service_account        = ""
        service_account_scopes = ["cloud-platform"]
        internal_ip_only       = true
      }
    }
    encryption_config = {
      kms_key_name = "projects/project-id/locations/region/keyRings/key-ring-name/cryptoKeys/key-name"
    }
  }
}
# tftest modules=1 resources=1

Cluster configuration on GKE

To set cluster configuration GKE use the 'dataproc_config.virtual_cluster_config' variable.

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-gke-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  dataproc_config = {
    virtual_cluster_config = {
      kubernetes_cluster_config = {
        kubernetes_namespace = "foobar"
        kubernetes_software_config = {
          component_version = {
            "SPARK" : "3.1-dataproc-7"
          }
          properties = {
            "spark:spark.kubernetes.container.image" : "us-east4-docker.pkg.dev/cloud-dataproc/dpgke/sparkengine:dataproc-14"
          }
        }
        gke_cluster_config = {
          gke_cluster_target = "projects/my-project/locations/my-location/clusters/gke-cluster-name"
          node_pool_target = {
            node_pool = "node-pool-name"
            roles     = ["DEFAULT"]
          }
        }
      }
    }
  }
}
# tftest modules=1 resources=1

IAM

IAM is managed via several variables that implement different features and levels of control:

  • iam and iam_by_principals configure authoritative bindings that manage individual roles exclusively, and are internally merged
  • iam_bindings configure authoritative bindings with optional support for conditions, and are not internally merged with the previous two variables
  • iam_bindings_additive configure additive bindings via individual role/member pairs with optional support conditions

The authoritative and additive approaches can be used together, provided different roles are managed by each. Some care must also be taken with the iam_by_principals variable to ensure that variable keys are static values, so that Terraform is able to compute the dependency graph.

Refer to the project module for examples of the IAM interface.

Authoritative IAM

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  iam_by_principals = {
    "group:gcp-data-engineers@example.net" = [
      "roles/dataproc.viewer"
    ]
  }
  iam = {
    "roles/dataproc.viewer" = [
      "serviceAccount:service-account@PROJECT_ID.iam.gserviceaccount.com"
    ]
  }
}
# tftest modules=1 resources=2

Additive IAM

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = "my-project"
  name       = "my-cluster"
  region     = "europe-west1"
  prefix     = "prefix"
  iam_bindings_additive = {
    am1-viewer = {
      member = "user:am1@example.com"
      role   = "roles/dataproc.viewer"
    }
  }
}
# tftest modules=1 resources=2

Variables

name description type required default
name Cluster name. string
project_id Project ID. string
region Dataproc region. string
dataproc_config Dataproc cluster config. object({…}) {}
iam IAM bindings in {ROLE => [MEMBERS]} format. map(list(string)) {}
iam_bindings Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. map(object({…})) {}
iam_bindings_additive Individual additive IAM bindings. Keys are arbitrary. map(object({…})) {}
iam_by_principals Authoritative IAM binding in {PRINCIPAL => [ROLES]} format. Principals need to be statically defined to avoid cycle errors. Merged internally with the iam variable. map(list(string)) {}
labels The resource labels for instance to use to annotate any related underlying resources, such as Compute Engine VMs. map(string) {}
prefix Optional prefix used to generate project id and name. string null
service_account Service account to set on the Dataproc cluster. string null

Outputs

name description sensitive
bucket_names List of bucket names which have been assigned to the cluster.
http_ports The map of port descriptions to URLs.
id Fully qualified cluster id.
instance_names List of instance names which have been assigned to the cluster.
name The name of the cluster.