cloud-foundation-fabric/blueprints/cloud-operations/network-quota-monitoring/src/README.md


# Network Quota Monitoring Tool
Networking dashboard and discovery tool refactor (#1020) * wip * wip * wip * wip * wip * discovery * single discovery * page token * batch requests * remove plugin name * streamline * streamline * dynamic routes * dynamic routes * forwarding rules and addresses * batch requests * metrics * notes * notes * streamline * fixes, dump * streamline * remove globals * wip metrics * subnet time series * networks per project plugin * firewall rules timeseries * use names in metric labels * firewall policies timeseries * wip * instances per network timeseries * routes timeseries * custom quota * simpler quota, network peering timeseries * peering timeseries * timeseries names * wip descriptors * metric descriptors * fixes * wip * Use partial for all cf init functions * Add requirements.txt * fix org key mismatch * Fix folder short cli name * Fix instance_networks when iterable is empty * more readability and fixing some strings * replace() -> removeprefix and remove unneeded quoting * setdefault in init()s * Fix next hop type * Remove unneeded fstring * create descriptors * create descriptors log * rename descriptor requests function * non-working metrics implementation (duplicate timeseries batched) * timeseries * fixes * write timseries * fix timeseries plugins * start documenting code * docstrings and comments * docstrings comments and small fixes * rename cf to src * discover nodes instead of just projects * discovery node can be a folder or org * cf entrypoint and fixes * cf deployment * remove old paths * cloud function deploy readme * diagrams * resource ids in example * discovery tool readme * top-level README * Some documentation fixes * Add secondary ranges * Update README.md * add legend to scope diagram * improve description of discovery configuration variable * add comment in example for custom quotas file * rename op_project to monitoring_project * dashboard metric rename wip * Update discover-cai-compute.py * deploy sample dashboard Co-authored-by: Julio 
Castillo <jccb@google.com> Co-authored-by: Aurélien Legrand <aurelien.legrand01@gmail.com>
2022-12-18 01:07:24 -08:00
This tool constitutes the discovery and data gathering side of the Network Dashboard, and can be used in combination with the related [Terraform deployment examples](../), or packaged in different ways including standalone manual use.
- [Network Quota Monitoring Tool](#network-quota-monitoring-tool)
  - [Quick Usage Example](#quick-usage-example)
  - [High Level Architecture and Plugin Design](#high-level-architecture-and-plugin-design)
  - [Debugging and Troubleshooting](#debugging-and-troubleshooting)
## Quick Usage Example
The tool behaves like a regular CLI app, with several options documented via the usual short help:
```text
./main.py --help

Usage: main.py [OPTIONS]

  CLI entry point.

Options:
  -dr, --discovery-root TEXT       Root node for asset discovery,
                                   organizations/nnn or folders/nnn.  [required]
  -mon, --monitoring-project TEXT  GCP monitoring project where metrics will be
                                   stored.  [required]
  -p, --project TEXT               GCP project id to be monitored, can be
                                   specified multiple times.
  -f, --folder INTEGER             GCP folder id to be monitored, can be
                                   specified multiple times.
  --custom-quota-file FILENAME     Custom quota file in yaml format.
  --dump-file FILENAME             Export JSON representation of resources to
                                   file.
  --load-file FILENAME             Load JSON resources from file, skips init and
                                   discovery.
  --debug-plugin TEXT              Run only core and specified timeseries plugin.
  --help                           Show this message and exit.
```
In normal use three pieces of information need to be passed in:
- the monitoring project where metric descriptors and timeseries will be stored
- the discovery root scope (organization or top-level folder, [see here for examples](../deploy-cloud-function/README.md#discovery-configuration))
- the list of folders and/or projects that contain the resources to be monitored (folders will discover all included projects)
To account for custom quotas that are not yet exposed via API, or that are applied to individual networks, a YAML file with quota overrides can be specified via the `--custom-quota-file` option. Refer to the [included sample](./custom-quotas.sample) for details on its format.
A typical invocation might look like this:
```bash
./main.py \
-dr organizations/1234567890 \
-mon my-monitoring-project \
--folder 1234567890 --folder 987654321 \
--project my-net-project \
--custom-quota-file custom-quotas.yaml
```
## High Level Architecture and Plugin Design
The tool is composed of two main processing phases:
- the discovery of resources within a predefined scope using Cloud Asset Inventory and Compute APIs
- the computation of metric timeseries derived from discovered resources
Once both phases are complete, the tool sends generated timeseries to Cloud Operations together with any missing metric descriptors.
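How the tool performs this final step is not detailed here. The following is a minimal sketch of one plausible approach, assuming metric descriptors are identified by their type string and that writes are chunked to respect the Cloud Monitoring limit of 200 time series per request. The function names and metric types are hypothetical and are not taken from the tool's code.

```python
from itertools import islice

# Cloud Monitoring accepts at most 200 time series per write request.
BATCH_SIZE = 200


def missing_descriptors(needed, existing):
  """Return the metric types that are needed but not yet defined, sorted."""
  return sorted(set(needed) - set(existing))


def batched(items, size=BATCH_SIZE):
  """Yield successive lists of at most `size` items."""
  iterator = iter(items)
  while batch := list(islice(iterator, size)):
    yield batch


# Hypothetical metric types, used only for illustration.
needed = ['network/subnets_used', 'network/firewall_rules_used']
existing = ['network/subnets_used']
to_create = missing_descriptors(needed, existing)

# 450 placeholder time series end up in three write requests.
batches = list(batched(range(450)))
```

Creating only the missing descriptors keeps repeated runs idempotent, and chunking keeps each write within the API limit regardless of how many resources were discovered.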
Every action during those phases is delegated to a series of plugins, which conform to simple interfaces and exchange predefined basic types with the main module. Plugins are registered at runtime, and are split in broad categories depending on the stage where they execute:
- init plugin functions have the task of preparing the required keys in the shared resource data structure. Init functions are usually small, and there is one for each discovery plugin
- discovery plugin functions do the bulk of the work of discovering resources; they return HTTP Requests (e.g. calls to GCP APIs) or Resource objects (extracted from the API responses) to the main module, and receive HTTP Responses
- timeseries plugin functions read from the shared resource data structure, and return computed Metric Descriptors and Timeseries objects
Plugins are registered via simple functions defined in the [plugin package initialization file](./plugins/__init__.py), and leverage [utility functions](./plugins/utils.py) for batching API requests and parsing results.
The main module cycles through the stages, calling each stage's plugins in succession and iterating over their results.
## Debugging and Troubleshooting
Note that Python 3.10 or later is required.
If you run into a `ModuleNotFoundError`, install the required dependencies:
`pip3 install -r requirements.txt`
A few convenience options are provided to simplify development, debugging and troubleshooting:
- the discovery phase results can be dumped to a JSON file, which can then be used to inspect the actual resource representation, or to skip the discovery phase entirely and speed up development of timeseries-related functions
- a single timeseries plugin can be optionally run alone, to focus debugging and decrease the amount of noise from logs and outputs
This is an example call that stores discovery results to a file:
```bash
./main.py \
-dr organizations/1234567890 \
-mon my-monitoring-project \
--folder 1234567890 --folder 987654321 \
--project my-net-project \
--custom-quota-file custom-quotas.yaml \
--dump-file out.json
```
And this is the corresponding call that skips the discovery phase and also runs a single timeseries plugin:
```bash
./main.py \
-dr organizations/1234567890 \
-mon my-monitoring-project \
--folder 1234567890 --folder 987654321 \
--project my-net-project \
--custom-quota-file custom-quotas.yaml \
--load-file out.json \
--debug-plugin plugins.series-firewall-rules.timeseries
```
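The dump and load workflow shown in the two calls above can be sketched in a few lines of Python. This is an illustrative approximation rather than the tool's implementation: `discover`, `get_resources` and `select_plugins` are hypothetical names, the resource dictionary is made up, and the plugin filter is simplified to plain name matching.

```python
import json
import os
import tempfile


def discover():
  """Stand-in for the (slow) discovery phase; returns a resources dict."""
  return {'networks': {'net-a': {'subnets': 3}},
          'firewall_rules': {'net-a': 12}}


def get_resources(dump_file=None, load_file=None):
  """Load resources from a file if given, else discover and optionally dump."""
  if load_file:
    with open(load_file) as f:
      return json.load(f)
  resources = discover()
  if dump_file:
    with open(dump_file, 'w') as f:
      json.dump(resources, f, indent=2)
  return resources


def select_plugins(names, debug_plugin=None):
  """Keep every plugin, or only the one named by debug_plugin."""
  return [n for n in names if debug_plugin in (None, n)]


with tempfile.TemporaryDirectory() as tmp:
  path = os.path.join(tmp, 'out.json')
  dumped = get_resources(dump_file=path)   # first run: discover and dump
  loaded = get_resources(load_file=path)   # later runs: skip discovery

selected = select_plugins(
    ['series-firewall-rules', 'series-networks'],
    debug_plugin='series-firewall-rules')
```

Because the dump is plain JSON, it can also be inspected or edited by hand to reproduce a specific resource layout while developing a timeseries plugin.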