docs(decisions): add architectural decision records structure (#9310)
* docs(decisions): add architectural decision records structure

  Create a structured decision records system to document important technical choices across multiple domains (DevOps, Network, Consensus, etc.). This implements a modified MADR template approach for preserving context, trade-offs, and reasoning behind significant architectural decisions.

* fix(docs): suggestions from code review

  Co-authored-by: Marek <mail@marek.onl>

---------

Co-authored-by: Marek <mail@marek.onl>
parent de7e5b547f
commit f873aa12a6
@ -0,0 +1,22 @@
# Decision Log

We capture important decisions with [architectural decision records](https://adr.github.io/).

These records capture the context, trade-offs, and reasoning behind decisions made at our community and technical crossroads. Our goal is to preserve an understanding of how the project has grown, and to capture enough insight to revisit previous decisions effectively.

To get started, create a new decision record using the template:

```sh
cp template.md NNNN-title-with-dashes.md
```
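
If records are numbered sequentially, the copy step can be scripted. The following is a minimal sketch, assuming records live next to `template.md` and follow the `NNNN-title-with-dashes.md` pattern. The helper names `next_adr_number`, `slugify`, and `new_adr` are hypothetical and not part of this repository.

```sh
#!/bin/sh
# Sketch only: create the next decision record from template.md.
# Assumes records are named NNNN-title-with-dashes.md in the current directory.

# Print the next zero-padded record number (0001 if no record exists yet).
next_adr_number() {
    last=$(ls | sed -n 's/^\([0-9]\{4\}\)-.*\.md$/\1/p' | sort -n | tail -n 1)
    # Strip leading zeros so the shell does not read the value as octal.
    last=$(printf '%s' "${last:-0}" | sed 's/^0*//')
    printf '%04d\n' $(( ${last:-0} + 1 ))
}

# Turn a free-form title into a lowercase, dash-separated slug.
slugify() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Copy the template to the next free record name and print that name.
new_adr() {
    file="$(next_adr_number)-$(slugify "$1").md"
    cp template.md "$file"
    printf '%s\n' "$file"
}
```

With these functions loaded, `new_adr "Title With Dashes"` copies the template and prints the name of the new file.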

For the rationale behind this approach, see [Michael Nygard's article](http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions).

We use a modified version of the MADR [ADR template](https://adr.github.io/madr/), which is a bit more verbose than Nygard's original template. We may simplify it in the future.

## Evolving Decisions

Many decisions build on each other, a driver of iterative change and messiness
in software. By laying out the "story arc" of a particular system within the
application, we hope future maintainers will be able to identify how to rewind
decisions when refactoring the application becomes necessary.

@ -0,0 +1,51 @@
---
status: accepted
date: 2025-02-28
story: Appropriate UID/GID values for container users
---

# Use High UID/GID Values for Container Users

## Context & Problem Statement

Docker containers share the host's user namespace by default. If a container's UID/GID overlaps with a privileged host account, a container escape vulnerability could lead to privilege escalation. Low UIDs (especially those in the system user range of 100-999) are particularly risky because they often map to privileged system users on the host.

Our previous approach created the user with UID/GID 101 and the `--system` flag. That value falls within the system user range and could overlap with critical system users on the host.
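
To see whether a given UID collides with an existing account, one can look it up in a passwd-format file. This is a minimal sketch, not part of the Zebra images; the function name `uid_owner` is hypothetical, and it only inspects a local file (by default `/etc/passwd`), so accounts provided by other name services are not covered.

```sh
#!/bin/sh
# Sketch only: report which account, if any, owns a UID in a passwd-format file.
# passwd fields are colon-separated: name:password:UID:GID:...

# Usage: uid_owner UID [PASSWD_FILE]
# Prints the first matching account name, or nothing if the UID is unused.
uid_owner() {
    uid="$1"
    passwd_file="${2:-/etc/passwd}"
    awk -F: -v uid="$uid" '$3 == uid { print $1; exit }' "$passwd_file"
}
```

Running `uid_owner 101` on a host shows the account that a container process with UID 101 would be treated as after an escape; an empty result means the UID is not assigned in that file.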

## Priorities & Constraints

* Enhance security by reducing the risk that container UIDs/GIDs overlap with host accounts
* Avoid warnings during container build related to system user ranges
* Maintain compatibility with common Docker practices
* Prevent potential privilege escalation in case of container escape

## Considered Options

* Option 1: Keep using low UID/GID (101) with `--system` flag
* Option 2: Use UID/GID (1000+) without `--system` flag
* Option 3: Use high UID/GID (10000+) without `--system` flag

## Decision Outcome

Chosen option: [Option 3: Use high UID/GID (10000+) without `--system` flag]

We decided to:

1. Change the default UID/GID from 101 to 10001
2. Remove the `--system` flag from user/group creation commands
3. Document the security rationale for these changes

This approach significantly reduces the risk of UID/GID collisions with host system users while avoiding build-time warnings related to system user ranges. Using a very high UID/GID (10001) provides an additional safety margin in containers that share the host's user namespace.
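
The user-creation step could look roughly like the following. This is a sketch under stated assumptions: a Debian-based image where the `adduser` and `addgroup` commands are available, and a hypothetical user name `zebra`. The variable and function names (`APP_USER`, `APP_UID`, `APP_GID`, `is_high_id`, `create_app_user`) are illustrative and not taken from the actual Dockerfile.

```sh
#!/bin/sh
# Sketch only: create an unprivileged user with a high UID/GID.
# APP_USER is a hypothetical name; 10001 is the default chosen in this record.
APP_USER="${APP_USER:-zebra}"
APP_UID="${APP_UID:-10001}"
APP_GID="${APP_GID:-10001}"

# Succeeds only for IDs of 10000 or higher, as chosen in Option 3.
is_high_id() {
    [ "$1" -ge 10000 ]
}

# Must run as root, for example in a Dockerfile RUN step.
create_app_user() {
    is_high_id "$APP_UID" && is_high_id "$APP_GID" || {
        echo "UID/GID must be 10000 or higher" >&2
        return 1
    }
    # No --system flag: the IDs are explicit and lie above the system range.
    addgroup --gid "$APP_GID" "$APP_USER"
    adduser --uid "$APP_UID" --gid "$APP_GID" --home "/home/$APP_USER" \
        --disabled-password --gecos "" "$APP_USER"
}
```

The guard in `create_app_user` is a convenience for catching an accidental low override at build time; it is not a substitute for the rationale documented in this record.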

### Expected Consequences

* Improved security posture by reducing the risk that a container escape leads to privilege escalation
* Elimination of build-time warnings related to system user UID/GID ranges
* Consistency with industry best practices for container security
* No functional impact on container operation, as the internal user permissions remain the same

## More Information

* [NGINX Docker User ID Issue](https://github.com/nginxinc/docker-nginx/issues/490) - Demonstrates the risks of using UID 101, which overlaps with the `systemd-network` user on Debian systems
* [.NET Docker Issue on System Users](https://github.com/dotnet/dotnet-docker/issues/4624) - Details the problems with using the `--system` flag and the SYS_UID_MAX warnings
* [Docker Security Best Practices](https://docs.docker.com/develop/security-best-practices/) - General security recommendations for Docker containers

@ -0,0 +1,51 @@
---
status: accepted
date: 2025-02-28
story: Volume permissions and privilege management in container entrypoint
---

# Use gosu for Privilege Dropping in Entrypoint

## Context & Problem Statement

Running containerized applications as the root user is a security risk. If an attacker compromises the application, they gain root access within the container, which can facilitate a container escape. However, some operations during container startup, such as creating directories or changing file permissions in locations not owned by the application user, require root privileges. We need a way to perform these setup tasks as root and then switch to a non-privileged user *before* executing the main application (`zebrad`). The `USER` directive in the Dockerfile is insufficient because it applies to the entire runtime, including the entrypoint, and we need root to change permissions *after* volumes are mounted.

## Priorities & Constraints

* Minimize the security risk by running the main application (`zebrad`) as a non-privileged user.
* Allow initial setup tasks (file/directory creation, permission changes) that require root privileges.
* Maintain a clean and efficient entrypoint script.
* Avoid complex signal handling and TTY issues associated with `su` and `sudo`.
* Ensure 1:1 parity with Docker's `--user` flag behavior.

## Considered Options

* Option 1: Use `USER` directive in Dockerfile.
* Option 2: Use `su` within the entrypoint script.
* Option 3: Use `sudo` within the entrypoint script.
* Option 4: Use `gosu` within the entrypoint script.
* Option 5: Use `chroot --userspec`.
* Option 6: Use `setpriv`.

## Decision Outcome

Chosen option: [Option 4: Use `gosu` within the entrypoint script]

We chose `gosu` because it provides a simple and secure way to drop privileges from root to a non-privileged user *after* performing the necessary setup tasks. `gosu` avoids the TTY and signal-handling complexities of `su` and `sudo`. It is designed specifically for this use case (dropping privileges in container entrypoints) and leverages the same underlying mechanisms as Docker itself for user/group handling, ensuring consistent behavior.
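
The pattern described above can be sketched as follows. This is an illustration, not the project's actual entrypoint: the user name `zebra`, the `DATA_DIR` path, and the function names are assumptions. The only `gosu` behavior relied on is its documented form `gosu <user> <command> [args...]`.

```sh
#!/bin/sh
# Sketch only: do root-only setup, then drop privileges before running the app.
set -eu

# Hypothetical defaults; the real image may use different names and paths.
APP_USER="${APP_USER:-zebra}"
DATA_DIR="${DATA_DIR:-/var/lib/zebrad}"

# Succeeds when the given UID is root, i.e. privileges still need to be dropped.
running_as_root() {
    [ "$1" -eq 0 ]
}

entrypoint() {
    if running_as_root "$(id -u)"; then
        # Root-only setup, performed after volumes have been mounted.
        mkdir -p "$DATA_DIR"
        chown -R "$APP_USER:$APP_USER" "$DATA_DIR"
        # exec replaces the shell, so the application receives signals directly.
        exec gosu "$APP_USER" "$@"
    fi
    # Already unprivileged (for example, started with `docker run --user`).
    exec "$@"
}
```

A real entrypoint script would end with `entrypoint "$@"`, so that a container command such as `zebrad start` becomes the final process and runs as the non-privileged user.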

### Expected Consequences

* Improved security by running `zebrad` as a non-privileged user.
* Simplified entrypoint script compared to using `su` or `sudo`.
* Avoidance of TTY and signal-handling issues.
* Consistent behavior with Docker's `--user` flag.
* No negative impact on functionality, as initial setup tasks can still be performed.

## More Information

* [gosu GitHub repository](https://github.com/tianon/gosu#why) - Explains the rationale behind `gosu` and its advantages over `su` and `sudo`.
* [gosu usage warning](https://github.com/tianon/gosu#warning) - Highlights the core use case (stepping down from root) and potential vulnerabilities in other scenarios.
* Alternatives considered:
  * `chroot --userspec`: While functional, it's less common and less directly suited to this specific task than `gosu`.
  * `setpriv`: A viable alternative, but `gosu` is already well-established in our workflow and offers the desired functionality with a smaller footprint than a full `util-linux` installation.
  * `su-exec`: Another minimal alternative, but it has known parser bugs that could lead to unexpected root execution.

@ -0,0 +1,115 @@
---
status: proposed
date: 2025-02-28
story: Standardize filesystem hierarchy for Zebra deployments
---

# Standardize Filesystem Hierarchy: FHS vs. XDG

## Context & Problem Statement

Zebra currently has inconsistencies in its filesystem layout, particularly regarding where configuration, data, cache files, and binaries are stored. We need a standardized approach compatible with:

1. Traditional Linux systems.
2. Containerized deployments (Docker).
3. Cloud environments with stricter filesystem restrictions (e.g., Google's Container-Optimized OS).

We previously considered using the Filesystem Hierarchy Standard (FHS) exclusively ([Issue #3432](https://github.com/ZcashFoundation/zebra/issues/3432)). However, recent changes introduced the use of the XDG Base Directory Specification, which offers a user-centric approach. We need to decide whether to:

* Adhere to FHS.
* Adopt XDG Base Directory Specification.
* Use a hybrid approach, leveraging the strengths of both.

The choice impacts how we structure our Docker images, where configuration files are located, and how users interact with Zebra in different environments.

## Priorities & Constraints

* **Security:** Minimize the risk of privilege escalation by adhering to least-privilege principles.
* **Maintainability:** Ensure a clear and consistent filesystem layout that is easy to understand and maintain.
* **Compatibility:** Work seamlessly across various Linux distributions, Docker, and cloud environments (particularly those with restricted filesystems like Google's Container-Optimized OS).
* **User Experience:** Provide a predictable and user-friendly experience for locating configuration and data files.
* **Flexibility:** Allow users to override default locations via environment variables where appropriate.
* **Avoid Breaking Changes:** Minimize disruption to existing users and deployments, if possible.

## Considered Options

### Option 1: FHS

* Configuration: `/etc/zebrad/`
* Data: `/var/lib/zebrad/`
* Cache: `/var/cache/zebrad/`
* Logs: `/var/log/zebrad/`
* Binary: `/opt/zebra/bin/zebrad` or `/usr/local/bin/zebrad`

### Option 2: XDG Base Directory Specification

* Configuration: `$HOME/.config/zebrad/`
* Data: `$HOME/.local/share/zebrad/`
* Cache: `$HOME/.cache/zebrad/`
* State: `$HOME/.local/state/zebrad/`
* Binary: `$HOME/.local/bin/zebrad` or `/usr/local/bin/zebrad`

### Option 3: Hybrid Approach (FHS for System-Wide, XDG for User-Specific)

* System-wide configuration: `/etc/zebrad/`
* User-specific configuration: `$XDG_CONFIG_HOME/zebrad/`
* System-wide data (read-only, shared): `/usr/share/zebrad/` (e.g., checkpoints)
* User-specific data: `$XDG_DATA_HOME/zebrad/`
* Cache: `$XDG_CACHE_HOME/zebrad/`
* State: `$XDG_STATE_HOME/zebrad/`
* Runtime: `$XDG_RUNTIME_DIR/zebrad/`
* Binary: `/opt/zebra/bin/zebrad` (system-wide) or `$HOME/.local/bin/zebrad` (user-specific)
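
The `$XDG_*` paths in Options 2 and 3 relate to the `$HOME`-based paths through the defaults in the XDG Base Directory Specification: when a variable is unset or empty, the corresponding `$HOME` default applies. A minimal sketch of that resolution follows; the function name `xdg_dir` is hypothetical, and `XDG_RUNTIME_DIR` is left out because the specification defines no default for it.

```sh
#!/bin/sh
# Sketch only: resolve per-user directories following the XDG defaults.
APP_NAME="zebrad"

# Usage: xdg_dir config|data|cache|state
# ${VAR:-default} falls back when VAR is unset or empty, matching the spec.
xdg_dir() {
    case "$1" in
        config) base="${XDG_CONFIG_HOME:-$HOME/.config}" ;;
        data)   base="${XDG_DATA_HOME:-$HOME/.local/share}" ;;
        cache)  base="${XDG_CACHE_HOME:-$HOME/.cache}" ;;
        state)  base="${XDG_STATE_HOME:-$HOME/.local/state}" ;;
        *) echo "unknown directory kind: $1" >&2; return 1 ;;
    esac
    printf '%s/%s\n' "$base" "$APP_NAME"
}
```

Setting, for example, `XDG_CACHE_HOME=/mnt/cache` moves only the cache location, which is the kind of override the Flexibility priority asks for.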

## Pros and Cons of the Options

### FHS

* **Pros:**
  * Traditional and well-understood by system administrators.
  * Clear separation of configuration, data, cache, and binaries.
  * Suitable for packaged software installations.

* **Cons:**
  * Less user-friendly; requires root access to modify configuration.
  * Can conflict with stricter cloud environments that restrict writes to `/etc` and `/var`.
  * Doesn't handle multi-user scenarios as gracefully as XDG.

### XDG Base Directory Specification

* **Pros:**
  * User-centric: configuration and data stored in user-writable locations.
  * Better suited for containerized and cloud environments.
  * Handles multi-user scenarios gracefully.
  * Clear separation of configuration, data, cache, and state.

* **Cons:**
  * Less traditional; might be unfamiliar to some system administrators.
  * Relies on `$HOME` and the `XDG_*` environment variables being set correctly.
  * Binary placement is less standardized.

### Hybrid Approach (FHS for System-Wide, XDG for User-Specific)

* **Pros:**
  * Combines strengths of FHS and XDG.
  * Allows system-wide defaults while prioritizing user-specific configurations.
  * Flexible and adaptable to different deployment scenarios.
  * Clear binary placement in `/opt`.

* **Cons:**
  * More complex than either FHS or XDG alone.
  * Requires careful consideration of precedence rules.
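
To make the precedence concern concrete, here is one possible lookup order for the hybrid approach: an explicit override first, then the user-specific XDG location, then the system-wide FHS location. This is only a sketch of one candidate rule, since the decision is still pending. The names `ZEBRAD_CONFIG`, `SYSTEM_CONFIG_DIR`, `find_config`, and the file name `zebrad.toml` are assumptions made for illustration.

```sh
#!/bin/sh
# Sketch only: print the first existing config file, most specific first.
# ZEBRAD_CONFIG, SYSTEM_CONFIG_DIR, and zebrad.toml are hypothetical names.
find_config() {
    system_dir="${SYSTEM_CONFIG_DIR:-/etc/zebrad}"
    user_dir="${XDG_CONFIG_HOME:-$HOME/.config}/zebrad"
    for candidate in "${ZEBRAD_CONFIG:-}" \
                     "$user_dir/zebrad.toml" \
                     "$system_dir/zebrad.toml"; do
        if [ -n "$candidate" ] && [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    # Nothing found: the caller decides whether to fall back to built-in defaults.
    return 1
}
```

Whatever order is eventually chosen, writing it down as a single ordered list like this keeps the rule easy to document and test.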

## Decision Outcome

Pending

### Expected Consequences

Pending

## More Information

* [Filesystem Hierarchy Standard (FHS) v3.0](https://refspecs.linuxfoundation.org/FHS_3.0/fhs-3.0.html)
* [XDG Base Directory Specification](https://specifications.freedesktop.org/basedir-spec/latest/)
* [Zebra Issue #3432: Use the Filesystem Hierarchy Standard (FHS) for deployments and artifacts](https://github.com/ZcashFoundation/zebra/issues/3432)
* [Google Container-Optimized OS: Working with the File System](https://cloud.google.com/container-optimized-os/docs/concepts/disks-and-filesystem#working_with_the_file_system)

@ -0,0 +1,49 @@
---
# status and date are the only required elements. Feel free to remove the rest.
status: {[proposed | rejected | accepted | deprecated | … | superseded by [ADR-NAME](adr-file-name.md)]}
date: {YYYY-MM-DD when the decision was last updated}
builds-on: {[Short Title](2021-05-15-short-title.md)}
story: {description or link to contextual issue}
---

# {short title of solved problem and solution}

## Context and Problem Statement

{2-3 sentences explaining the problem and the forces influencing the decision.}
<!-- The language in this section is value-neutral. It is simply describing facts. -->

## Priorities & Constraints <!-- optional -->

* {List of concerns or constraints}
* {Factors influencing the decision}

## Considered Options

* Option 1: Thing
* Option 2: Another

### Pros and Cons of the Options <!-- optional -->

#### Option 1: {Brief description}

* Good, because {reason}
* Bad, because {reason}

## Decision Outcome

Chosen option: [Option 1: Thing]

{Clearly state the chosen option and provide justification. Reference the "Pros and Cons of the Options" section above if applicable.}

### Expected Consequences <!-- optional -->

* List of outcomes resulting from this decision
<!-- Positive, negative, and/or neutral consequences, as long as they affect the team and project in the future. -->

## More Information <!-- optional -->

<!-- * Resources reviewed as part of making this decision -->
<!-- * Links to any supporting documents or resources -->
<!-- * Related PRs -->
<!-- * Related User Journeys -->