RFC: Release process (#1063)
* rfc: initial draft of release process proposal

  This is a version of the notes I posted in Slack for preliminary discussion,
  slightly reformatted to fit the Zebra RFC template. @yaahc suggested I post
  them as an RFC, to have a concrete proposal for discussion.

* Add draft note and link to ticket and PR

Co-authored-by: teor <teor@riseup.net>
commit 0e5a6d7efd (parent 258bd881aa)

- Feature Name: `release_planning`
- Start Date: 2020-09-14
- Design PR: [ZcashFoundation/zebra#1063](https://github.com/ZcashFoundation/zebra/pull/1063)
- Zebra Issue: [ZcashFoundation/zebra#1963](https://github.com/ZcashFoundation/zebra/issues/1963)

# Draft

Note: This is a draft Zebra RFC. See
[ZcashFoundation/zebra#1963](https://github.com/ZcashFoundation/zebra/issues/1963)
for more details.

# Summary
[summary]: #summary

Release and distribution plans for Zebra.

# Motivation
[motivation]: #motivation

We need to plan our release and distribution processes for Zebra. Since these
processes determine how users get Zebra and place constraints on how we do
Zebra development, it's important to think through the implications of our
process choices.

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Zebra is developed *library-first*, as a collection of independently useful
Rust crates composed together to create `zebrad`, the full node implementation.
This means that our release and distribution processes need to handle
distribution of the `zebra-*` libraries as well as of `zebrad`.

The official distribution channels are as follows:

- `zebra-*` libraries are distributed via Cargo, using
  [crates.io](https://crates.io);
- `zebrad` is distributed in binary form via Docker images generated in CI,
  or in source form via `cargo install`.

The release process is controlled by pushing an appropriate tag to the
`ZcashFoundation/zebra` git repository.

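Concretely, a downstream user would consume these channels roughly as follows.
This is an illustrative sketch, not part of the proposal itself: the crate
names come from this repository, but the version requirement and the Docker
registry path are hypothetical, so the commands are shown as comments rather
than as something to run here.

```shell
# As a library, by adding a dependency to a downstream Cargo.toml:
#
#     [dependencies]
#     zebra-network = "3"        # version requirement illustrative
#
# As a binary, built from source via crates.io:
#
#     cargo install zebrad
#
# As a binary, via the CI-built Docker image (registry path hypothetical):
#
#     docker pull zcashfoundation/zebrad:latest
```
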
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

(This section should describe the mechanics of the release process in more
detail, once we have agreement on distribution channels. Currently, one
suggestion is described and motivated below.)

# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives

## Versioning

We previously agreed on a tentative versioning policy for `zebrad` and the
component `zebra-` libraries. In both cases, we follow semver rules. For
`zebrad`, we plan to align the major version number with the network upgrade
(NU) number, so that mainnet NU3 corresponds to `3.x`, mainnet NU4 to `4.x`,
and so on. For the `zebra-` libraries, we commit to following semver rules,
but plan to increment the major versions as fast as we need to implement the
features we want.

## Distribution Channels

To handle releases of the component `zebra-` libraries, there's a clear best
answer: publish the libraries to crates.io, so that they can be used by other
Rust projects by adding an appropriate line to their `Cargo.toml`.

For `zebrad` the situation is somewhat different, because it is an application,
not a library. Because Zcash is a living protocol and `zebrad` is living
software, whatever process we choose must consider how it handles updates.
Broadly speaking, possible processes can be divided into three categories:

1. Do not expect our users to update their software, and do not provide them a
   means to do so;

2. Use an existing update / deployment mechanism to distribute our software;

3. Write our own update / deployment mechanism to distribute our software.

The first category is mentioned only for completeness: we need to provide
users with a way to update their software, which means that standalone
binaries without an update mechanism are not a workable option for us. The
third category is also unfavorable, because it creates a large amount of work
for a task that is not the focus of our product. This suggests that we focus
on solutions in the second category.

One solution in the second category is to publish Docker images. This has a
number of attractive features. First, we already produce Docker images for
our own cloud deployments, so there is little-to-no marginal effort required
to produce them for others as a distribution mechanism. Second, providing
Docker images will make it easier for us to distribute a collection of
related software in the future (e.g., an easy-to-deploy Prometheus / Grafana
instance, or a sidecar Tor instance). Third, Docker has a solid upgrade
story: we can instruct users to run the `:latest` version of the image, or
steer them towards auto-update mechanisms like Watchtower.

While this solution works well for cloud deployments, Docker is not suitable
everywhere. What should we do outside of Docker? One option would be to
create packages for each platform-specific package manager (Homebrew,
something for Windows, various flavors of Linux distribution), but this
creates a large amount of additional work requiring platform-specific
knowledge. Worse, this work cannot be outsourced to others without giving up
control over our software distribution. If, for instance, a third party
creates a Homebrew package and we recommend that people install Zebra using
it, we become reliant on that third party to continue packaging our software
forever, or else we leave our users stranded.

Instead, we can publish `zebrad` as a Rust crate and recommend
`cargo install`. This approach has two major downsides. First, installation
takes longer, because Zebra is compiled locally. Second, as long as we depend
on `zcash_script`, we'll have to instruct users to install `libclang` and
some platform-specific equivalent of a `build-essential` package. However,
even for crates such as `zcash_script` that build native code, the
`cargo`-managed build process is far more reliable than typical build
processes for C or C++ projects: we would not be asking users to run
autotools or `./configure`, just a one-step `cargo install`. We also know
that it's possible to reliably build Zebra on each platform with minimal
additional steps, because we already do so in our CI.

In contrast to these downsides, distributing `zebrad` through Cargo has a
number of upsides. First, because we distribute our libraries using
crates.io, we already have to manage tooling for publishing to crates.io, so
there's no additional work required to publish `zebrad` this way. Second, we
get a cross-platform update mechanism with no additional work, since
`cargo install` will upgrade to the latest published version. Third, we don't
rely on any third parties to mediate the relationship between us and our
users, so users can get updates as soon as we publish them. Fourth, unlike a
system package manager, we can pin exact hashes of every transitive
dependency (via the `Cargo.lock`, which `cargo install` respects when passed
`--locked`). Fifth, we're positioned to pick up (or contribute to)
ecosystem-wide integrity improvements like a transparency log for crates.io
or work on reproducible builds for Rust.

This proposal is summarized above in the
[guide-level explanation](#guide-level-explanation).

## Release Processes

The next question is what kind of release processes and automation we should
use. Here are two important priorities for these processes:

1. Reducing the friction of doing any individual release, allowing us to move
   closer to a continuous deployment model;

2. Reducing the risk of error in the release process.

These are listed roughly in order of priority, but they're clearly related:
the more friction we have in the release process, the greater the risk of
error, and the greater the risk of error, the more friction we require to
prevent it.

Automation helps to reduce both friction and the risk of error, but although
it helps enormously, it has declining returns past some point (in the sense
that automating the final 5% of the work is significantly more complex and
error-prone than automating the first 95%). So the challenge is to find the
"entry point" for the automated part of the system that strikes the right
balance.

One possibility is the following. CD automation is triggered by pushing a new
tag to the git repository. Tags are specified as `crate-semver`, e.g.
`zebra-network-3.2.8`, `zebrad-3.9.1`, etc. When a new tag is pushed, the CD
automation parses the tag to determine the crate name. If it is `zebrad`, it
builds a new Docker image and publishes both the image and the crate.
Otherwise, it just publishes the crate.

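The tag-parsing step can be sketched in a few lines of shell. This is an
illustrative sketch of the assumed tag format, not the final CI code; it also
assumes plain `major.minor.patch` versions with no pre-release hyphens (e.g.
no `-alpha.1` suffix).

```shell
# Split a release tag of the form `crate-semver` into its two parts.
parse_tag() {
    local tag="$1"
    local version="${tag##*-}"        # text after the last '-': the semver
    local crate="${tag%-"$version"}"  # everything before it: the crate name
    echo "$crate $version"
}

parse_tag zebra-network-3.2.8   # -> zebra-network 3.2.8
parse_tag zebrad-3.9.1          # -> zebrad 3.9.1
```

The CD job would then branch on the crate name: `zebrad` triggers a Docker
image build plus a crates.io publish, while any other crate name triggers
just a publish.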
To publish a new version of any component crate, the process is:

1. Edit the `Cargo.toml` to increment the version number;

2. Update `crate/CHANGELOG.md` with a few human-readable sentences describing
   changes since the last release (examples: [cl1], [cl2], [cl3]);

3. Submit a PR with these changes;

4. Tag the merge commit and push the tag to the git repo.

[cl1]: https://github.com/ZcashFoundation/ed25519-zebra/blob/main/CHANGELOG.md
[cl2]: https://github.com/dalek-cryptography/x25519-dalek/blob/master/CHANGELOG.md
[cl3]: https://github.com/dalek-cryptography/curve25519-dalek/blob/master/CHANGELOG.md

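Step 4 can be exercised against a throwaway repository to show the exact
commands involved. The crate name and version below are illustrative, and the
final `git push` (shown as a comment) is the action that would actually
trigger the CD automation.

```shell
# Create a scratch repository standing in for ZcashFoundation/zebra.
dir="$(mktemp -d)"
cd "$dir"
git init -q .
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Release zebra-network 3.2.8"

# Tag the (merge) commit with the `crate-semver` tag name.
git tag zebra-network-3.2.8

# List tags to confirm; in the real repository, pushing the tag starts CD:
#     git push origin zebra-network-3.2.8
git tag --list 'zebra-network-*'
```
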
All subsequent steps (publishing to crates.io, building Docker images, etc.)
are fully automated.

Why make these choices?

- Triggering on a tag, rather than on some other event, ensures that each
  deployment corresponds to a particular, known state of the source
  repository.

- Editing the version number in the `Cargo.toml` should be done manually,
  because it is the source of truth for the crate version.

- Changelog entries should be written by hand, rather than auto-generated
  from `git log`, because the information useful in a changelog is generally
  not the same as the information useful in commit messages. (If this were
  not the case, changelogs would not be useful, because `git log` already
  exists.) Writing changelog entries by hand would be a burden if we queued
  up a massive set of changes between releases, but because releases are
  low-friction and we control the distribution channel, we can avoid this
  problem by releasing frequently, on a weekly or even daily basis.