ADR 019: Protocol Buffer State Encoding

Changelog

  • 2020 Feb 15: Initial Draft
  • 2020 Feb 24: Updates to handle messages with interface fields

Status

Accepted

Context

Currently, the Cosmos SDK uses go-amino for binary and JSON object encoding over the wire, bringing parity between logical objects and persistence objects.

From the Amino docs:

Amino is an object encoding specification. It is a subset of Proto3 with an extension for interface support. See the Proto3 spec for more information on Proto3, which Amino is largely compatible with (but not with Proto2).

The goal of the Amino encoding protocol is to bring parity into logic objects and persistence objects.

Amino also aims to have the following goals (not a complete list):

  • Binary bytes must be decode-able with a schema.
  • Schema must be upgradeable.
  • The encoder and decoder logic must be reasonably simple.

However, we believe that Amino does not fulfill these goals completely and does not fully meet the needs of a truly flexible, cross-language, multi-client compatible encoding protocol in the Cosmos SDK. Namely, Amino has proven to be a major pain point for supporting object serialization across clients written in various languages, while providing little in the way of true backwards compatibility and upgradeability. Furthermore, profiling and various benchmarks have shown Amino to be an extremely large performance bottleneck in the Cosmos SDK [1]. This is largely reflected in the performance of simulations and application transaction throughput.

Thus, we need to adopt an encoding protocol that meets the following criteria for state serialization:

  • Language agnostic
  • Platform agnostic
  • Rich client support and thriving ecosystem
  • High performance
  • Minimal encoded message size
  • Codegen-based over reflection-based
  • Supports backward and forward compatibility

Note that migrating away from Amino should be viewed as a two-pronged effort: state encoding and client encoding. This ADR focuses on state serialization in the Cosmos SDK state machine. A corresponding ADR will be made to address client-side encoding.

Decision

We will adopt Protocol Buffers for serializing persisted structured data in the Cosmos SDK while providing a clean mechanism and developer UX for applications wishing to continue to use Amino. We will provide this mechanism by updating modules to accept a codec interface, Marshaler, instead of a concrete Amino codec. Furthermore, the Cosmos SDK will provide three concrete implementations of the Marshaler interface: AminoCodec, ProtoCodec, and HybridCodec.

  • AminoCodec: Uses Amino for both binary and JSON encoding.
  • ProtoCodec: Uses Protobuf for both binary and JSON encoding.
  • HybridCodec: Uses Amino for JSON encoding and Protobuf for binary encoding.

Until the client migration landscape is fully understood and designed, modules will use a HybridCodec as the concrete codec they accept and/or extend. This means that all client JSON encoding, including genesis state, will still use Amino. The ultimate goal is to replace Amino JSON encoding with Protobuf encoding and thus have modules accept and/or extend ProtoCodec.
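
The split between binary and JSON encoding can be illustrated with a small, self-contained sketch. It is not the SDK's implementation: the interface names `BinaryEncoder` and `JSONEncoder`, the method names, and both stub backends are hypothetical and reduced to one method each. The stubs do not emit real Protobuf or Amino output; they only make visible which backend handled which call.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BinaryEncoder and JSONEncoder are hypothetical, reduced halves of a
// Marshaler-style interface; the real interface has more methods.
type BinaryEncoder interface {
	EncodeBinary(v interface{}) ([]byte, error)
}

type JSONEncoder interface {
	EncodeJSON(v interface{}) ([]byte, error)
}

// Marshaler is what a module keeper would accept instead of a concrete codec.
type Marshaler interface {
	BinaryEncoder
	JSONEncoder
}

// protoStub stands in for a Protobuf binary encoder. It does not produce
// real Protobuf bytes; it only tags its output so the delegation is visible.
type protoStub struct{}

func (protoStub) EncodeBinary(v interface{}) ([]byte, error) {
	return []byte(fmt.Sprintf("proto-binary(%v)", v)), nil
}

// aminoStub stands in for an Amino JSON encoder, using encoding/json.
type aminoStub struct{}

func (aminoStub) EncodeJSON(v interface{}) ([]byte, error) {
	return json.Marshal(v)
}

// hybridCodec delegates binary encoding to one backend and JSON encoding
// to another by embedding both interfaces.
type hybridCodec struct {
	BinaryEncoder
	JSONEncoder
}

func newHybridCodec() Marshaler {
	return hybridCodec{BinaryEncoder: protoStub{}, JSONEncoder: aminoStub{}}
}

// keeper depends only on the Marshaler interface, so the concrete codec
// can be swapped without touching module code.
type keeper struct {
	cdc Marshaler
}

func main() {
	k := keeper{cdc: newHybridCodec()}

	bz, err := k.cdc.EncodeBinary(42)
	if err != nil {
		panic(err)
	}
	js, err := k.cdc.EncodeJSON(map[string]int{"height": 7})
	if err != nil {
		panic(err)
	}

	fmt.Println(string(bz))
	fmt.Println(string(js))
}
```

Because the keeper only sees the interface, moving a module from a hybrid codec to a pure Protobuf codec would, under this design, be a change at the construction site rather than inside the module.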

Module Design

For modules that do not need to work with and serialize interfaces, the path to Protobuf migration is straightforward. These modules simply migrate any existing types that are encoded and persisted via their concrete Amino codec to Protobuf and have their keeper accept a Marshaler that will be a HybridCodec. No further changes are needed; existing logic works as-is.

Note, any business logic that needs to encode primitive types like bool or int64 should use gogoprotobuf Value types.

Example:

  ts, err := gogotypes.TimestampProto(completionTime)
  if err != nil {
    // ...
  }

  bz := cdc.MustMarshalBinaryLengthPrefixed(ts)
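
As a rough illustration of what "length-prefixed" means in the call above, the sketch below frames an already-encoded message with its byte length. It assumes the prefix is a uvarint-encoded length, as in Protobuf's delimited framing; the helper names `lengthPrefix` and `stripLengthPrefix` are hypothetical and are not SDK functions.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// lengthPrefix prepends the uvarint-encoded byte length to an already
// encoded message. (Hypothetical helper, for illustration only.)
func lengthPrefix(msg []byte) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], uint64(len(msg)))

	out := make([]byte, 0, n+len(msg))
	out = append(out, buf[:n]...)
	return append(out, msg...)
}

// stripLengthPrefix reads the uvarint length and returns exactly that many
// payload bytes, or an error if the prefix is malformed or the payload is
// shorter than declared.
func stripLengthPrefix(bz []byte) ([]byte, error) {
	size, n := binary.Uvarint(bz)
	if n <= 0 {
		return nil, errors.New("invalid length prefix")
	}
	if uint64(len(bz)-n) < size {
		return nil, errors.New("payload shorter than declared length")
	}
	return bz[n : n+int(size)], nil
}

func main() {
	framed := lengthPrefix([]byte("hello"))
	fmt.Printf("framed:  %x\n", framed)

	payload, err := stripLengthPrefix(framed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("payload: %s\n", payload)
}
```

The prefix lets a reader know where one encoded value ends without inspecting its contents, which is useful when values are stored or streamed back to back.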

However, modules can vary greatly in purpose and design, so we must support modules that need to encode and work with interfaces (e.g. Account or Content). These modules must define their own codec interface that extends Marshaler. These specific interfaces are unique to the module and will contain method contracts that know how to serialize the needed interfaces.

Example:

// x/auth/types/codec.go

type Codec interface {
  codec.Marshaler

  MarshalAccount(acc exported.Account) ([]byte, error)
  UnmarshalAccount(bz []byte) (exported.Account, error)

  MarshalAccountJSON(acc exported.Account) ([]byte, error)
  UnmarshalAccountJSON(bz []byte) (exported.Account, error)
}

Note, concrete types implementing these interfaces can be defined outside the scope of the module that defines the interface (e.g. ModuleAccount in x/supply). To handle these cases, a Protobuf message must be defined at the application-level along with a single codec that will be passed to all modules using a oneof approach.

Example:

// app/codec/codec.proto

import "third_party/proto/cosmos-proto/cosmos.proto";
import "x/auth/types/types.proto";
import "x/auth/vesting/types/types.proto";
import "x/supply/types/types.proto";

message Account {
  option (cosmos_proto.interface_type) = "*github.com/cosmos/cosmos-sdk/x/auth/exported.Account";

  // sum defines a list of all acceptable concrete Account implementations.
  oneof sum {
    cosmos_sdk.x.auth.v1.BaseAccount                      base_account               = 1;
    cosmos_sdk.x.auth.vesting.v1.ContinuousVestingAccount continuous_vesting_account = 2;
    cosmos_sdk.x.auth.vesting.v1.DelayedVestingAccount    delayed_vesting_account    = 3;
    cosmos_sdk.x.auth.vesting.v1.PeriodicVestingAccount   periodic_vesting_account   = 4;
    cosmos_sdk.x.supply.v1.ModuleAccount                  module_account             = 5;
  }

  // ...
}

// app/codec/codec.go

type Codec struct {
  codec.Marshaler

  amino *codec.Codec
}

func NewAppCodec(amino *codec.Codec) *Codec {
  return &Codec{Marshaler: codec.NewHybridCodec(amino), amino: amino}
}

func (c *Codec) MarshalAccount(accI authexported.Account) ([]byte, error) {
  acc := &Account{}
  if err := acc.SetAccount(accI); err != nil {
    return nil, err
  }

  return c.Marshaler.MarshalBinaryLengthPrefixed(acc)
}

func (c *Codec) UnmarshalAccount(bz []byte) (authexported.Account, error) {
  acc := &Account{}
  if err := c.Marshaler.UnmarshalBinaryLengthPrefixed(bz, acc); err != nil {
    return nil, err
  }

  return acc.GetAccount(), nil
}

Since the Codec implements auth.Codec (and all other required interfaces), it is passed to all the modules and satisfies all the interfaces. Now each module needing to work with interfaces will know about all the required types. Note, the use of interface_type allows us to avoid a significant amount of code boilerplate when implementing the Codec.
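
To make the oneof approach more concrete, here is a self-contained sketch of what `SetAccount` and `GetAccount` on the application-level wrapper might do. Everything in it is a hand-written stand-in: the real methods would come from generated code, and the type names (`AccountWrapper`, `AccountBase`, `AccountModule`) and the simplified `Account` interface are assumptions made for illustration.

```go
package main

import "fmt"

// Account is a simplified stand-in for the exported account interface.
type Account interface {
	GetAddress() string
}

// BaseAccount and ModuleAccount stand in for concrete types that live in
// different modules.
type BaseAccount struct{ Address string }

func (a *BaseAccount) GetAddress() string { return a.Address }

type ModuleAccount struct {
	BaseAccount
	Name string
}

// isAccountSum mirrors the marker interface that Protobuf code generators
// typically emit for a oneof field.
type isAccountSum interface{ isAccountSum() }

// One wrapper type per oneof case.
type AccountBase struct{ BaseAccount *BaseAccount }
type AccountModule struct{ ModuleAccount *ModuleAccount }

func (*AccountBase) isAccountSum()   {}
func (*AccountModule) isAccountSum() {}

// AccountWrapper plays the role of the app-level Account message.
type AccountWrapper struct{ Sum isAccountSum }

// SetAccount stores a concrete account in the matching oneof case and
// rejects types the application has not listed.
func (w *AccountWrapper) SetAccount(acc Account) error {
	switch a := acc.(type) {
	case *BaseAccount:
		w.Sum = &AccountBase{BaseAccount: a}
	case *ModuleAccount:
		w.Sum = &AccountModule{ModuleAccount: a}
	default:
		return fmt.Errorf("unregistered account type %T", acc)
	}
	return nil
}

// GetAccount returns whichever concrete account is set, or nil if none.
func (w *AccountWrapper) GetAccount() Account {
	switch s := w.Sum.(type) {
	case *AccountBase:
		return s.BaseAccount
	case *AccountModule:
		return s.ModuleAccount
	}
	return nil
}

func main() {
	w := &AccountWrapper{}
	acc := &ModuleAccount{BaseAccount: BaseAccount{Address: "cosmos1mod"}, Name: "mint"}
	if err := w.SetAccount(acc); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(w.GetAccount().GetAddress())
}
```

The set of acceptable concrete types is closed at the application level: adding a new account implementation means adding a case to the oneof, which is the trade-off listed under the negative consequences of this ADR.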

A similar concept applies to messages that contain interface fields. The module will define a "base" concrete message type, which the application-level codec extends via a oneof so that the resulting message fulfills the required message interface.

Example:

The MsgSubmitEvidence defined by the x/evidence module contains a field Evidence which is an interface.

type MsgSubmitEvidence struct {
  Evidence  exported.Evidence
  Submitter sdk.AccAddress
}

Instead, we will implement a "base" message type and an interface which the concrete message type must implement.

// x/evidence/types/types.proto

message MsgSubmitEvidenceBase {
  bytes submitter = 1
    [
      (gogoproto.casttype) = "github.com/cosmos/cosmos-sdk/types.AccAddress"
    ];
}

// x/evidence/exported/evidence.go

type MsgSubmitEvidence interface {
  sdk.Msg

  GetEvidence() Evidence
  GetSubmitter() sdk.AccAddress
}

Notice the MsgSubmitEvidence interface extends sdk.Msg and allows for the Evidence interface to be retrieved from the concrete message type.

Now, the application-level codec will define the concrete MsgSubmitEvidence type and will have it fulfill the MsgSubmitEvidence interface defined by x/evidence.

// app/codec/codec.proto

message Evidence {
  option (gogoproto.equal)             = true;
  option (cosmos_proto.interface_type) = "github.com/cosmos/cosmos-sdk/x/evidence/exported.Evidence";

  oneof sum {
    cosmos_sdk.x.evidence.v1.Equivocation equivocation = 1;
  }
}

message MsgSubmitEvidence {
  option (gogoproto.equal)           = true;
  option (gogoproto.goproto_getters) = false;

  Evidence                                       evidence = 1;
  cosmos_sdk.x.evidence.v1.MsgSubmitEvidenceBase base     = 2
    [
      (gogoproto.nullable) = false,
      (gogoproto.embed)    = true
    ];
}

// app/codec/msgs.go

func (msg MsgSubmitEvidence) GetEvidence() eviexported.Evidence {
  return msg.Evidence.GetEvidence()
}

func (msg MsgSubmitEvidence) GetSubmitter() sdk.AccAddress {
  return msg.Submitter
}

Note, however, the module's message handler must now handle the interface MsgSubmitEvidence in addition to any concrete types.
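
A handler that matches on the interface rather than on a concrete type could look roughly like the sketch below. It is self-contained and uses simplified stand-ins: `Msg`, `Evidence`, the `string` submitter, and the `handle` function are assumptions for illustration and do not reproduce the SDK's handler signature.

```go
package main

import "fmt"

// Msg and Evidence are simplified stand-ins for the SDK interfaces.
type Msg interface{ Type() string }

type Evidence interface{ Route() string }

// MsgSubmitEvidence mirrors the interface the module exports; the
// submitter is a plain string here instead of an address type.
type MsgSubmitEvidence interface {
	Msg
	GetEvidence() Evidence
	GetSubmitter() string
}

type equivocation struct{}

func (equivocation) Route() string { return "equivocation" }

// appMsgSubmitEvidence stands in for the concrete message defined at the
// application level.
type appMsgSubmitEvidence struct {
	evidence  Evidence
	submitter string
}

func (m appMsgSubmitEvidence) Type() string          { return "submit_evidence" }
func (m appMsgSubmitEvidence) GetEvidence() Evidence { return m.evidence }
func (m appMsgSubmitEvidence) GetSubmitter() string  { return m.submitter }

// otherMsg is a message the handler does not know about.
type otherMsg struct{}

func (otherMsg) Type() string { return "other" }

// handle matches on the MsgSubmitEvidence interface, so any concrete
// message that fulfills it is accepted, wherever it is defined.
func handle(msg Msg) (string, error) {
	switch m := msg.(type) {
	case MsgSubmitEvidence:
		return fmt.Sprintf("%s submitted %s evidence",
			m.GetSubmitter(), m.GetEvidence().Route()), nil
	default:
		return "", fmt.Errorf("unrecognized message type: %s", msg.Type())
	}
}

func main() {
	res, err := handle(appMsgSubmitEvidence{evidence: equivocation{}, submitter: "cosmos1abc"})
	fmt.Println(res, err)

	_, err = handle(otherMsg{})
	fmt.Println(err)
}
```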

Why Wasn't X Chosen Instead

For a more complete comparison to alternative protocols, see here.

Cap'n Proto

While Cap'n Proto does seem like an advantageous alternative to Protobuf due to its native support for interfaces/generics and built-in canonicalization, it lacks the rich client ecosystem of Protobuf and is a bit less mature.

FlatBuffers

FlatBuffers is also a potentially viable alternative. Its primary difference is that it does not need a parsing/unpacking step into a secondary representation, often coupled with per-object memory allocation, before data can be accessed.

However, it would require great effort to research and fully understand the scope of the migration and the path forward, which isn't immediately clear. In addition, FlatBuffers isn't designed for untrusted inputs.

Future Improvements & Roadmap

The landscape and roadmap for restructuring queriers and tx generation to fully support Protobuf isn't fully understood yet. Once all modules are migrated, we will have a better understanding of how to proceed with client improvements (e.g. gRPC) [2].

Consequences

Positive

  • Significant performance gains.
  • Supports backward and forward type compatibility.
  • Better support for cross-language clients.

Negative

  • Learning curve required to understand and implement Protobuf messages.
  • Less flexibility in cross-module type registration. We now need to define types at the application-level.
  • Client business logic and tx generation may become a bit more complex.

Neutral

{neutral consequences}

References

  1. https://github.com/cosmos/cosmos-sdk/issues/4977
  2. https://github.com/cosmos/cosmos-sdk/issues/5444