docs(decisions): add architectural decision records structure (#9310)

* docs(decisions): add architectural decision records structure

Create a structured decision records system to document important technical choices across multiple domains (DevOps, Network, Consensus, etc.).

This implements a modified MADR template approach for preserving context, trade-offs, and reasoning behind significant architectural decisions.

* fix(docs): suggestions from code review

Co-authored-by: Marek <mail@marek.onl>

---------

Co-authored-by: Marek <mail@marek.onl>
Author: Gustavo Valverde
Date: 2025-03-10 14:17:26 +00:00
5 changed files with 288 additions and 0 deletions

docs/decisions/README.md

@@ -0,0 +1,22 @@
# Decision Log
We capture important decisions with [architectural decision records](https://adr.github.io/).
These records capture the context, trade-offs, and reasoning behind choices made at our community and technical crossroads. Our goal is to preserve an understanding of how the project has grown, and to capture enough insight to revisit previous decisions effectively.
To get started, create a new decision record using the template:
```sh
cp template.md NNNN-title-with-dashes.md
```
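For example, to start a hypothetical record (the number and title here are illustrative):
```sh
cp template.md 0004-choose-logging-backend.md
```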
For more on the rationale behind this approach, see [Michael Nygard's article](http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions).
We've adopted the MADR [ADR template](https://adr.github.io/madr/), which is a bit more verbose than Nygard's original; we may simplify it in the future.
## Evolving Decisions
Many decisions build on each other; that iteration is a major driver of change and
messiness in software. By laying out the "story arc" of a particular system within the
application, we hope future maintainers will be able to identify how to rewind
decisions when refactoring the application becomes necessary.


@@ -0,0 +1,51 @@
---
status: accepted
date: 2025-02-28
story: Appropriate UID/GID values for container users
---
# Use High UID/GID Values for Container Users
## Context & Problem Statement
Docker containers share the host's user namespace by default. If container UIDs/GIDs overlap with privileged host accounts, this could lead to privilege escalation if a container escape vulnerability is exploited. Low UIDs (especially in the system user range of 100-999) are particularly risky as they often map to privileged system users on the host.
Our previous approach used UID/GID 101 with the `--system` flag for user creation, which falls within the system user range and could potentially overlap with critical system users on the host.
## Priorities & Constraints
* Enhance security by reducing the risk of container user namespace overlaps
* Avoid warnings during container build related to system user ranges
* Maintain compatibility with common Docker practices
* Prevent potential privilege escalation in case of container escape
## Considered Options
* Option 1: Keep using low UID/GID (101) with `--system` flag
* Option 2: Use UID/GID (1000+) without `--system` flag
* Option 3: Use high UID/GID (10000+) without `--system` flag
## Decision Outcome
Chosen option: [Option 3: Use high UID/GID (10000+) without `--system` flag]
We decided to:
1. Change the default UID/GID from 101 to 10001
2. Remove the `--system` flag from user/group creation commands
3. Document the security rationale for these changes
This approach significantly reduces the risk of UID/GID collision with host system users while avoiding build-time warnings related to system user ranges. Using a very high UID/GID (10001) provides an additional security boundary in containers where user namespaces are shared with the host.
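As a minimal sketch of the chosen approach (assuming a Debian-based image; the user and group names are illustrative, not necessarily Zebra's actual Dockerfile commands):
```sh
# In a Dockerfile RUN step: create the group and user with a high,
# non-system UID/GID. No --system flag, so no SYS_UID_MAX warnings.
addgroup --gid 10001 zebra
adduser --uid 10001 --gid 10001 --disabled-password --gecos "" zebra
```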
### Expected Consequences
* Improved security posture by reducing the risk of container escapes leading to privilege escalation
* Elimination of build-time warnings related to system user UID/GID ranges
* Consistency with industry best practices for container security
* No functional impact on container operation, as the internal user permissions remain the same
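One quick way to sanity-check the resulting IDs in a built image (the image tag and user name are assumptions for illustration):
```sh
docker run --rm --entrypoint id zebrad:latest zebra
# Expected: uid=10001(zebra) gid=10001(zebra), not a low system ID.
```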
## More Information
* [NGINX Docker User ID Issue](https://github.com/nginxinc/docker-nginx/issues/490) - Demonstrates the risks of using UID 101 which overlaps with `systemd-network` user on Debian systems
* [.NET Docker Issue on System Users](https://github.com/dotnet/dotnet-docker/issues/4624) - Details the problems with using `--system` flag and the SYS_UID_MAX warnings
* [Docker Security Best Practices](https://docs.docker.com/develop/security-best-practices/) - General security recommendations for Docker containers


@@ -0,0 +1,51 @@
---
status: accepted
date: 2025-02-28
story: Volumes permissions and privilege management in container entrypoint
---
# Use gosu for Privilege Dropping in Entrypoint
## Context & Problem Statement
Running containerized applications as the root user is a security risk. If an attacker compromises the application, they gain root access within the container, potentially facilitating a container escape. However, some operations during container startup, such as creating directories or modifying file permissions in locations not owned by the application user, require root privileges. We need a way to perform these initial setup tasks as root, but then switch to a non-privileged user *before* executing the main application (`zebrad`). Using `USER` in the Dockerfile is insufficient because it applies to the entire runtime, and we need to change permissions *after* volumes are mounted.
## Priorities & Constraints
* Minimize the security risk by running the main application (`zebrad`) as a non-privileged user.
* Allow initial setup tasks (file/directory creation, permission changes) that require root privileges.
* Maintain a clean and efficient entrypoint script.
* Avoid complex signal handling and TTY issues associated with `su` and `sudo`.
* Ensure 1:1 parity with Docker's `--user` flag behavior.
## Considered Options
* Option 1: Use `USER` directive in Dockerfile.
* Option 2: Use `su` within the entrypoint script.
* Option 3: Use `sudo` within the entrypoint script.
* Option 4: Use `gosu` within the entrypoint script.
* Option 5: Use `chroot --userspec`.
* Option 6: Use `setpriv`.
## Decision Outcome
Chosen option: [Option 4: Use `gosu` within the entrypoint script]
We chose to use `gosu` because it provides a simple and secure way to drop privileges from root to a non-privileged user *after* performing necessary setup tasks. `gosu` avoids the TTY and signal-handling complexities of `su` and `sudo`. It's designed specifically for this use case (dropping privileges in container entrypoints) and leverages the same underlying mechanisms as Docker itself for user/group handling, ensuring consistent behavior.
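A minimal entrypoint sketch of this pattern (the user name and data path are assumptions for illustration, not Zebra's actual entrypoint):
```sh
#!/usr/bin/env sh
set -eu

if [ "$(id -u)" = "0" ]; then
    # Root-only setup: volumes are mounted by now, so their ownership
    # can only be fixed here, not at image build time.
    chown -R zebra:zebra /var/lib/zebrad
    # Drop privileges and re-run this script as the unprivileged user.
    # exec (here and inside gosu itself) keeps the final process as PID 1.
    exec gosu zebra "$0" "$@"
fi

# Past this point we are the non-privileged user.
exec zebrad "$@"
```
Because `gosu` accepts the same `user[:group]` spec as Docker's `--user` flag and relies on the same user/group handling, the two resolve users consistently.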
### Expected Consequences
* Improved security by running `zebrad` as a non-privileged user.
* Simplified entrypoint script compared to using `su` or `sudo`.
* Avoidance of TTY and signal-handling issues.
* Consistent behavior with Docker's `--user` flag.
* No negative impact on functionality, as initial setup tasks can still be performed.
## More Information
* [gosu GitHub repository](https://github.com/tianon/gosu#why) - Explains the rationale behind `gosu` and its advantages over `su` and `sudo`.
* [gosu usage warning](https://github.com/tianon/gosu#warning) - Highlights the core use case (stepping down from root) and potential vulnerabilities in other scenarios.
* Alternatives considered:
* `chroot --userspec`: While functional, it's less common and less directly suited to this specific task than `gosu`.
* `setpriv`: A viable alternative, but `gosu` is already well-established in our workflow and offers the desired functionality with a smaller footprint than a full `util-linux` installation.
* `su-exec`: Another minimal alternative, but it has known parser bugs that could lead to unexpected root execution.


@@ -0,0 +1,115 @@
---
status: proposed
date: 2025-02-28
story: Standardize filesystem hierarchy for Zebra deployments
---
# Standardize Filesystem Hierarchy: FHS vs. XDG
## Context & Problem Statement
Zebra currently has inconsistencies in its filesystem layout, particularly regarding where configuration, data, cache files, and binaries are stored. We need a standardized approach compatible with:
1. Traditional Linux systems.
2. Containerized deployments (Docker).
3. Cloud environments with stricter filesystem restrictions (e.g., Google's Container-Optimized OS).
We previously considered using the Filesystem Hierarchy Standard (FHS) exclusively ([Issue #3432](https://github.com/ZcashFoundation/zebra/issues/3432)). However, recent changes introduced the XDG Base Directory Specification, which offers a user-centric approach. We need to decide whether to:
* Adhere to FHS.
* Adopt XDG Base Directory Specification.
* Use a hybrid approach, leveraging the strengths of both.
The choice impacts how we structure our Docker images, where configuration files are located, and how users interact with Zebra in different environments.
## Priorities & Constraints
* **Security:** Minimize the risk of privilege escalation by adhering to least-privilege principles.
* **Maintainability:** Ensure a clear and consistent filesystem layout that is easy to understand and maintain.
* **Compatibility:** Work seamlessly across various Linux distributions, Docker, and cloud environments (particularly those with restricted filesystems like Google's Container-Optimized OS).
* **User Experience:** Provide a predictable and user-friendly experience for locating configuration and data files.
* **Flexibility:** Allow users to override default locations via environment variables where appropriate.
* **Avoid Breaking Changes:** Minimize disruption to existing users and deployments, if possible.
## Considered Options
### Option 1: FHS
* Configuration: `/etc/zebrad/`
* Data: `/var/lib/zebrad/`
* Cache: `/var/cache/zebrad/`
* Logs: `/var/log/zebrad/`
* Binary: `/opt/zebra/bin/zebrad` or `/usr/local/bin/zebrad`
### Option 2: XDG Base Directory Specification
* Configuration: `$HOME/.config/zebrad/`
* Data: `$HOME/.local/share/zebrad/`
* Cache: `$HOME/.cache/zebrad/`
* State: `$HOME/.local/state/zebrad/`
* Binary: `$HOME/.local/bin/zebrad` or `/usr/local/bin/zebrad`
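For reference, the specification defines a fallback for each variable when it is unset; a small sketch of the effective per-user locations:
```sh
# XDG spec-defined fallbacks when the variables are unset:
echo "config: ${XDG_CONFIG_HOME:-$HOME/.config}/zebrad/"
echo "data:   ${XDG_DATA_HOME:-$HOME/.local/share}/zebrad/"
echo "cache:  ${XDG_CACHE_HOME:-$HOME/.cache}/zebrad/"
echo "state:  ${XDG_STATE_HOME:-$HOME/.local/state}/zebrad/"
```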
### Option 3: Hybrid Approach (FHS for System-Wide, XDG for User-Specific)
* System-wide configuration: `/etc/zebrad/`
* User-specific configuration: `$XDG_CONFIG_HOME/zebrad/`
* System-wide data (read-only, shared): `/usr/share/zebrad/` (e.g., checkpoints)
* User-specific data: `$XDG_DATA_HOME/zebrad/`
* Cache: `$XDG_CACHE_HOME/zebrad/`
* State: `$XDG_STATE_HOME/zebrad/`
* Runtime: `$XDG_RUNTIME_DIR/zebrad/`
* Binary: `/opt/zebra/bin/zebrad` (system-wide) or `$HOME/.local/bin/zebrad` (user-specific)
## Pros and Cons of the Options
### FHS
* **Pros:**
* Traditional and well-understood by system administrators.
* Clear separation of configuration, data, cache, and binaries.
* Suitable for packaged software installations.
* **Cons:**
* Less user-friendly; requires root access to modify configuration.
* Can conflict with stricter cloud environments restricting writes to `/etc` and `/var`.
* Doesn't handle multi-user scenarios as gracefully as XDG.
### XDG Base Directory Specification
* **Pros:**
* User-centric: configuration and data stored in user-writable locations.
* Better suited for containerized and cloud environments.
* Handles multi-user scenarios gracefully.
* Clear separation of configuration, data, cache, and state.
* **Cons:**
* Less traditional; might be unfamiliar to some system administrators.
* Requires environment variables to be set correctly.
* Binary placement less standardized.
### Hybrid Approach (FHS for System-Wide, XDG for User-Specific)
* **Pros:**
* Combines strengths of FHS and XDG.
* Allows system-wide defaults while prioritizing user-specific configurations.
* Flexible and adaptable to different deployment scenarios.
* Clear binary placement in `/opt`.
* **Cons:**
* More complex than either FHS or XDG alone.
* Requires careful consideration of precedence rules (sketched below).
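To make the precedence concern concrete, one possible resolution order under the hybrid approach (purely a sketch; no decision has been made, and the config file name assumes Zebra's `zebrad.toml`):
```sh
# Hypothetical lookup: user-specific XDG location first, then FHS.
user_config="${XDG_CONFIG_HOME:-$HOME/.config}/zebrad/zebrad.toml"
if [ -f "$user_config" ]; then
    config="$user_config"
else
    config="/etc/zebrad/zebrad.toml"
fi
```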
## Decision Outcome
Pending
## Expected Consequences
Pending
## More Information
* [Filesystem Hierarchy Standard (FHS) v3.0](https://refspecs.linuxfoundation.org/FHS_3.0/fhs-3.0.html)
* [XDG Base Directory Specification](https://specifications.freedesktop.org/basedir-spec/latest/)
* [Zebra Issue #3432: Use the Filesystem Hierarchy Standard (FHS) for deployments and artifacts](https://github.com/ZcashFoundation/zebra/issues/3432)
* [Google Container-Optimized OS: Working with the File System](https://cloud.google.com/container-optimized-os/docs/concepts/disks-and-filesystem#working_with_the_file_system)


@@ -0,0 +1,49 @@
---
# status and date are the only required elements. Feel free to remove the rest.
status: {[proposed | rejected | accepted | deprecated | … | superseded by [ADR-NAME](adr-file-name.md)]}
date: {YYYY-MM-DD when the decision was last updated}
builds-on: {[Short Title](NNNN-short-title.md)}
story: {description or link to contextual issue}
---
# {short title of solved problem and solution}
## Context and Problem Statement
{2-3 sentences explaining the problem and the forces influencing the decision.}
<!-- The language in this section is value-neutral. It is simply describing facts. -->
## Priorities & Constraints <!-- optional -->
* {List of concerns or constraints}
* {Factors influencing the decision}
## Considered Options
* Option 1: Thing
* Option 2: Another
### Pros and Cons of the Options <!-- optional -->
#### Option 1: {Brief description}
* Good, because {reason}
* Bad, because {reason}
## Decision Outcome
Chosen option: [Option 1: Thing]
{Clearly state the chosen option and provide justification. Reference the "Pros and Cons of the Options" section above if applicable.}
### Expected Consequences <!-- optional -->
* List of outcomes resulting from this decision
<!-- Positive, negative, and/or neutral consequences, as long as they affect the team and project in the future. -->
## More Information <!-- optional -->
<!-- * Resources reviewed as part of making this decision -->
<!-- * Links to any supporting documents or resources -->
<!-- * Related PRs -->
<!-- * Related User Journeys -->