From 5ee65cb897c94011c4a3ed7f72c34e8d45d97c67 Mon Sep 17 00:00:00 2001
From: Alexander Bezobchuk
Date: Mon, 17 Feb 2020 17:10:54 +0100
Subject: [PATCH] Merge PR #5650: ADR 019 - Protocol Buffer State Encoding

---
 docs/architecture/README.md                   |   1 +
 .../adr-019-protobuf-state-encoding.md        | 248 ++++++++++++++++++
 docs/architecture/adr-template.md             |   2 +-
 3 files changed, 250 insertions(+), 1 deletion(-)
 create mode 100644 docs/architecture/adr-019-protobuf-state-encoding.md

diff --git a/docs/architecture/README.md b/docs/architecture/README.md
index afa60a752..09632c114 100644
--- a/docs/architecture/README.md
+++ b/docs/architecture/README.md
@@ -43,3 +43,4 @@ Please add a entry below in your Pull Request for an ADR.
 - [ADR 016: Validator Consensus Key Rotation](./adr-016-validator-consensus-key-rotation.md)
 - [ADR 017: Historical Header Module](./adr-017-historical-header-module.md)
 - [ADR 018: Extendable Voting Periods](./adr-018-extendable-voting-period.md)
+- [ADR 019: Protocol Buffer State Encoding](./adr-019-protobuf-state-encoding.md)
diff --git a/docs/architecture/adr-019-protobuf-state-encoding.md b/docs/architecture/adr-019-protobuf-state-encoding.md
new file mode 100644
index 000000000..f43c53e81
--- /dev/null
+++ b/docs/architecture/adr-019-protobuf-state-encoding.md
@@ -0,0 +1,248 @@
+# ADR 019: Protocol Buffer State Encoding
+
+## Changelog
+
+- 2020 Feb 15: Initial Draft
+
+## Status
+
+Accepted
+
+## Context
+
+Currently, the Cosmos SDK utilizes [go-amino](https://github.com/tendermint/go-amino/) for binary
+and JSON object encoding over the wire bringing parity between logical objects and persistence objects.
+
+From the Amino docs:
+
+> Amino is an object encoding specification. It is a subset of Proto3 with an extension for interface
+> support. See the [Proto3 spec](https://developers.google.com/protocol-buffers/docs/proto3) for more
+> information on Proto3, which Amino is largely compatible with (but not with Proto2).
+>
+> The goal of the Amino encoding protocol is to bring parity into logic objects and persistence objects.
+
+Amino also aims to meet the following goals (not a complete list):
+
+- Binary bytes must be decode-able with a schema.
+- Schema must be upgradeable.
+- The encoder and decoder logic must be reasonably simple.
+
+However, we believe that Amino does not fulfill these goals completely and does not fully meet the
+needs of a truly flexible cross-language and multi-client compatible encoding protocol in the Cosmos SDK.
+Namely, Amino has proven to be a big pain point with regard to supporting object serialization across
+clients written in various languages while providing very little in the way of true backwards
+compatibility and upgradeability. Furthermore, through profiling and various benchmarks, Amino has
+been shown to be an extremely large performance bottleneck in the Cosmos SDK <sup>1</sup>. This is
+largely reflected in the performance of simulations and application transaction throughput.
+
+Thus, we need to adopt an encoding protocol that meets the following criteria for state serialization:
+
+- Language agnostic
+- Platform agnostic
+- Rich client support and thriving ecosystem
+- High performance
+- Minimal encoded message size
+- Codegen-based over reflection-based
+- Supports backward and forward compatibility
+
+Note, migrating away from Amino should be viewed as a two-pronged approach: state and client encoding.
+This ADR focuses on state serialization in the Cosmos SDK state machine. A corresponding ADR will be
+made to address client-side encoding.
+
+## Decision
+
+We will adopt [Protocol Buffers](https://developers.google.com/protocol-buffers) for serializing
+persisted structured data in the Cosmos SDK while providing a clean mechanism and developer UX for
+applications wishing to continue to use Amino. We will provide this mechanism by updating modules to
+accept a codec interface, `Marshaler`, instead of a concrete Amino codec. Furthermore, the Cosmos SDK
+will provide three concrete implementations of the `Marshaler` interface: `AminoCodec`, `ProtoCodec`,
+and `HybridCodec`.
+
+- `AminoCodec`: Uses Amino for both binary and JSON encoding.
+- `ProtoCodec`: Uses Protobuf for both binary and JSON encoding.
+- `HybridCodec`: Uses Amino for JSON encoding and Protobuf for binary encoding.
+
+Until the client migration landscape is fully understood and designed, modules will use a `HybridCodec`
+as the concrete codec they accept and/or extend. This means that all client JSON encoding, including
+genesis state, will still use Amino. The ultimate goal will be to replace Amino JSON encoding with
+Protobuf encoding and thus have modules accept and/or extend `ProtoCodec`.
+
+### Module Design
+
+For modules that do not require the ability to work with and serialize interfaces, the path to Protobuf
+migration is pretty straightforward. These modules are to simply migrate any existing types that
+are encoded and persisted via their concrete Amino codec to Protobuf and have their keeper accept a
+`Marshaler` that will be a `HybridCodec`. This migration is simple as things will just work as-is.
+
+Note, any business logic that needs to encode primitive types like `bool` or `int64` should use
+[gogoprotobuf](https://github.com/gogo/protobuf) Value types.
+
+Example:
+
+```go
+  ts, err := gogotypes.TimestampProto(completionTime)
+  if err != nil {
+    // ...
+  }
+
+  bz := cdc.MustMarshalBinaryLengthPrefixed(ts)
+```
+
+However, modules can vary greatly in purpose and design and so we must support the ability for modules
+to be able to encode and work with interfaces (e.g. `Account` or `Content`). These modules
+must define their own codec interface that extends `Marshaler`. These specific interfaces are unique
+to the module and will contain method contracts that know how to serialize the needed interfaces.
+
+Example:
+
+```go
+// x/auth/types/codec.go
+
+type Codec interface {
+  codec.Marshaler
+
+  MarshalAccount(acc exported.Account) ([]byte, error)
+  UnmarshalAccount(bz []byte) (exported.Account, error)
+
+  MarshalAccountJSON(acc exported.Account) ([]byte, error)
+  UnmarshalAccountJSON(bz []byte) (exported.Account, error)
+}
+```
+
+Note, concrete types implementing these interfaces can be defined outside the scope of the module
+that defines the interface (e.g. `ModuleAccount` in `x/supply`). To handle these cases, a Protobuf
+message using a `oneof` approach must be defined at the application level, along with a single codec
+that will be passed to _all_ modules.
+
+Example:
+
+```protobuf
+// app/codec/codec.proto
+
+import "third_party/proto/cosmos-proto/cosmos.proto";
+import "x/auth/types/types.proto";
+import "x/auth/vesting/types/types.proto";
+import "x/supply/types/types.proto";
+
+message Account {
+  option (cosmos_proto.interface_type) = "*github.com/cosmos/cosmos-sdk/x/auth/exported.Account";
+
+  // sum defines a list of all acceptable concrete Account implementations.
+  oneof sum {
+    cosmos_sdk.x.auth.v1.BaseAccount base_account = 1;
+    cosmos_sdk.x.auth.vesting.v1.ContinuousVestingAccount continuous_vesting_account = 2;
+    cosmos_sdk.x.auth.vesting.v1.DelayedVestingAccount delayed_vesting_account = 3;
+    cosmos_sdk.x.auth.vesting.v1.PeriodicVestingAccount periodic_vesting_account = 4;
+    cosmos_sdk.x.supply.v1.ModuleAccount module_account = 5;
+  }
+
+  // ...
+}
+```
+
+```go
+// app/codec/codec.go
+
+import (
+  "github.com/cosmos/cosmos-sdk/codec"
+  "github.com/cosmos/cosmos-sdk/x/auth"
+  "github.com/cosmos/cosmos-sdk/x/supply"
+  authexported "github.com/cosmos/cosmos-sdk/x/auth/exported"
+  // ...
+)
+
+var (
+  _ auth.Codec = (*Codec)(nil)
+  // ...
+)
+
+type Codec struct {
+  codec.Marshaler
+
+
+  amino *codec.Codec
+}
+
+func NewAppCodec(amino *codec.Codec) *Codec {
+  return &Codec{Marshaler: codec.NewHybridCodec(amino), amino: amino}
+}
+
+func (c *Codec) MarshalAccount(accI authexported.Account) ([]byte, error) {
+  acc := &Account{}
+  if err := acc.SetAccount(accI); err != nil {
+    return nil, err
+  }
+
+  return c.Marshaler.MarshalBinaryLengthPrefixed(acc)
+}
+
+func (c *Codec) UnmarshalAccount(bz []byte) (authexported.Account, error) {
+  acc := &Account{}
+  if err := c.Marshaler.UnmarshalBinaryLengthPrefixed(bz, acc); err != nil {
+    return nil, err
+  }
+
+  return acc.GetAccount(), nil
+}
+```
+
+Since the `Codec` implements `auth.Codec` (and all other required interfaces), it is passed to _all_
+the modules and satisfies all the interfaces. Now each module needing to work with interfaces will know
+about all the required types. Note, the use of `interface_type` allows us to avoid a significant
+amount of code boilerplate when implementing the `Codec`.
+
+A similar concept is to be applied for messages that contain interface fields. The module will
+define a "base" concrete message type (e.g. `MsgSubmitProposalBase`) that the application-level codec
+will extend via `oneof` (e.g. `MsgSubmitProposal`) that fulfills the required interface
+(e.g. `MsgSubmitProposalI`). Note, however, the module's message handler must now switch on the
+interface rather than the concrete type for this particular message.
+
+### Why Wasn't X Chosen Instead
+
+For a more complete comparison to alternative protocols, see [here](https://codeburst.io/json-vs-protocol-buffers-vs-flatbuffers-a4247f8bda6f).
+
+### Cap'n Proto
+
+While [Cap’n Proto](https://capnproto.org/) does seem like an advantageous alternative to Protobuf
+due to its native support for interfaces/generics and built-in canonicalization, it does lack the
+rich client ecosystem compared to Protobuf and is a bit less mature.
+
+### FlatBuffers
+
+[FlatBuffers](https://google.github.io/flatbuffers/) is also a potentially viable alternative, with the
+primary difference being that FlatBuffers does not need a parsing/unpacking step to a secondary
+representation before you can access data, often coupled with per-object memory allocation.
+
+However, it would require great effort to research and fully understand the scope of the migration
+and the path forward, which isn't immediately clear. In addition, FlatBuffers aren't designed for
+untrusted inputs.
+
+## Future Improvements & Roadmap
+
+The landscape and roadmap to restructuring queriers and tx generation to fully support
+Protobuf isn't fully understood yet. Once all modules are migrated, we will have a better
+understanding of how to proceed with client improvements (e.g. gRPC) <sup>2</sup>.
+
+## Consequences
+
+### Positive
+
+- Significant performance gains.
+- Supports backward and forward type compatibility.
+- Better support for cross-language clients.
+
+### Negative
+
+- Learning curve required to understand and implement Protobuf messages.
+- Less flexibility in cross-module type registration. We now need to define types
+at the application-level.
+- Client business logic and tx generation may become a bit more complex.
+
+### Neutral
+
+{neutral consequences}
+
+## References
+
+1. https://github.com/cosmos/cosmos-sdk/issues/4977
+2. https://github.com/cosmos/cosmos-sdk/issues/5444
diff --git a/docs/architecture/adr-template.md b/docs/architecture/adr-template.md
index 0638a71f4..71b67a011 100644
--- a/docs/architecture/adr-template.md
+++ b/docs/architecture/adr-template.md
@@ -37,4 +37,4 @@
 
 ## References
 
-- {reference link}
\ No newline at end of file
+- {reference link}