This example shows how to leverage [Serverless VPC Access](https://cloud.google.com/vpc/docs/configure-serverless-vpc-access) and Cloud Functions to organize a highly performant TCP healthcheck for unmanaged GCE instances. Healthchecker Cloud Function uses [goroutines](https://gobyexample.com/goroutines) to achieve parallel healthchecking for multiple instances and handles up to 1 thousand VMs checked in less than a second execution time.
**_NOTE:_** [Managed Instance Groups](https://cloud.google.com/compute/docs/instance-groups/autohealing-instances-in-migs) has autohealing functionality out of the box, current example is more applicable for standalone VMs or VMs in an unmanaged instance group.
The example contains the following components:
- [Cloud Scheduler](https://cloud.google.com/scheduler) to initiate a healthcheck on a schedule.
- [Serverless VPC Connector](https://cloud.google.com/vpc/docs/configure-serverless-vpc-access) to allow Cloud Functions TCP level access to private GCE instances.
- **Healthchecker** Cloud Function to perform TCP checks against GCE instances.
- **Restarter** PubSub topic to keep track of instances which are to be restarted.
- **Restarter** Cloud Function to perform GCE instance reset for instances which are failing TCP healthcheck.
The resources created in this example are shown in the high level diagram below:
**_NOTE:_** In the current example `healthchecker` is used along with the `restarter` cloud function, but restarter can be replaced with another function like [Pubsub2Inbox](https://github.com/GoogleCloudPlatform/professional-services/tree/main/tools/pubsub2inbox) for email notifications.
Clone this repository or [open it in cloud shell](https://ssh.cloud.google.com/cloudshell/editor?cloudshell_git_repo=https%3A%2F%2Fgithub.com%2Fterraform-google-modules%2Fcloud-foundation-fabric&cloudshell_print=cloud-shell-readme.txt&cloudshell_working_dir=examples%2Fcloud-operations%2Funmanaged-instances-healthcheck), then go through the following steps to create resources:
-`terraform init`
-`terraform apply -var project_id=my-project-id`
Once done testing, you can clean up resources by running `terraform destroy`. To persist state, check out the `backend.tf.sample` file.
## Testing the example
Configure `gcloud` with the project used for the deployment
```bash
gcloud config set project <MY-PROJECT-ID>
```
Wait until cloud scheduler executes the healthchecker function
```bash
gcloud scheduler jobs describe healthchecker-schedule
```
Check the healthchecker function logs to ensure instance is checked and healthy
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [billing_account](variables.tf#L16) | Billing account id used as default for new projects. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L33) | Project id to create a project when `project_create` is `true`, or to be used when `false`. | <code>string</code> | ✓ | |
| [grace_period](variables.tf#L56) | Grace period for an instance startup. | <code>string</code> | | <code>"180s"</code> |
| [location](variables.tf#L21) | App Engine location used in the example (required for CloudFunctions). | <code>string</code> | | <code>"europe-west"</code> |
| [project_create](variables.tf#L27) | Create project instead of using an existing one. | <code>bool</code> | | <code>false</code> |
| [region](variables.tf#L38) | Compute region used in the example. | <code>string</code> | | <code>"europe-west1"</code> |
| [root_node](variables.tf#L44) | The resource name of the parent folder or organization for project creation, in 'folders/folder_id' or 'organizations/org_id' format. | <code>string</code> | | <code>null</code> |