11 KiB
ADR 019: Protocol Buffer State Encoding
Changelog
- 2020 Feb 15: Initial Draft
- 2020 Feb 24: Updates to handle messages with interface fields
Status
Accepted
Context
Currently, the Cosmos SDK utilizes go-amino for binary and JSON object encoding over the wire bringing parity between logical objects and persistence objects.
From the Amino docs:
Amino is an object encoding specification. It is a subset of Proto3 with an extension for interface support. See the Proto3 spec for more information on Proto3, which Amino is largely compatible with (but not with Proto2).
The goal of the Amino encoding protocol is to bring parity into logic objects and persistence objects.
Amino also aims to have the following goals (not a complete list):
- Binary bytes must be decode-able with a schema.
- Schema must be upgradeable.
- The encoder and decoder logic must be reasonably simple.
However, we believe that Amino does not fulfill these goals completely and does not fully meet the needs of a truly flexible cross-language and multi-client compatible encoding protocol in the Cosmos SDK. Namely, Amino has proven to be a big pain-point in regards to supporting object serialization across clients written in various languages while providing virtually little in the way of true backwards compatibility and upgradeability. Furthermore, through profiling and various benchmarks, Amino has been shown to be an extremely large performance bottleneck in the Cosmos SDK 1. This is largely reflected in the performance of simulations and application transaction throughput.
Thus, we need to adopt an encoding protocol that meets the following criteria for state serialization:
- Language agnostic
- Platform agnostic
- Rich client support and thriving ecosystem
- High performance
- Minimal encoded message size
- Codegen-based over reflection-based
- Supports backward and forward compatibility
Note, migrating away from Amino should be viewed as a two-pronged approach, state and client encoding. This ADR focuses on state serialization in the Cosmos SDK state machine. A corresponding ADR will be made to address client-side encoding.
Decision
We will adopt Protocol Buffers for serializing
persisted structured data in the Cosmos SDK while providing a clean mechanism and developer UX for
applications wishing to continue to use Amino. We will provide this mechanism by updating modules to
accept a codec interface, Marshaler
, instead of a concrete Amino codec. Furthermore, the Cosmos SDK
will provide three concrete implementations of the Marshaler
interface: AminoCodec
, ProtoCodec
,
and HybridCodec
.
AminoCodec
: Uses Amino for both binary and JSON encoding.ProtoCodec
: Uses Protobuf for or both binary and JSON encoding.HybridCodec
: Uses Amino for JSON encoding and Protobuf for binary encoding.
Until the client migration landscape is fully understood and designed, modules will use a HybridCodec
as the concrete codec it accepts and/or extends. This means that all client JSON encoding, including
genesis state, will still use Amino. The ultimate goal will be to replace Amino JSON encoding with
Protbuf encoding and thus have modules accept and/or extend ProtoCodec
.
Module Design
Modules that do not require the ability to work with and serialize interfaces, the path to Protobuf
migration is pretty straightforward. These modules are to simply migrate any existing types that
are encoded and persisted via their concrete Amino codec to Protobuf and have their keeper accept a
Marshaler
that will be a HybridCodec
. This migration is simple as things will just work as-is.
Note, any business logic that needs to encode primitive types like bool
or int64
should use
gogoprotobuf Value types.
Example:
ts, err := gogotypes.TimestampProto(completionTime)
if err != nil {
// ...
}
bz := cdc.MustMarshalBinaryLengthPrefixed(ts)
However, modules can vary greatly in purpose and design and so we must support the ability for modules
to be able to encode and work with interfaces (e.g. Account
or Content
). For these modules, they
must define their own codec interface that extends Marshaler
. These specific interfaces are unique
to the module and will contain method contracts that know how to serialize the needed interfaces.
Example:
// x/auth/types/codec.go
type Codec interface {
codec.Marshaler
MarshalAccount(acc exported.Account) ([]byte, error)
UnmarshalAccount(bz []byte) (exported.Account, error)
MarshalAccountJSON(acc exported.Account) ([]byte, error)
UnmarshalAccountJSON(bz []byte) (exported.Account, error)
}
Note, concrete types implementing these interfaces can be defined outside the scope of the module
that defines the interface (e.g. ModuleAccount
in x/supply
). To handle these cases, a Protobuf
message must be defined at the application-level along with a single codec that will be passed to all
modules using a oneof
approach.
Example:
// app/codec/codec.proto
import "third_party/proto/cosmos-proto/cosmos.proto";
import "x/auth/types/types.proto";
import "x/auth/vesting/types/types.proto";
import "x/supply/types/types.proto";
message Account {
option (cosmos_proto.interface_type) = "*github.com/cosmos/cosmos-sdk/x/auth/exported.Account";
// sum defines a list of all acceptable concrete Account implementations.
oneof sum {
cosmos_sdk.x.auth.v1.BaseAccount base_account = 1;
cosmos_sdk.x.auth.vesting.v1.ContinuousVestingAccount continuous_vesting_account = 2;
cosmos_sdk.x.auth.vesting.v1.DelayedVestingAccount delayed_vesting_account = 3;
cosmos_sdk.x.auth.vesting.v1.PeriodicVestingAccount periodic_vesting_account = 4;
cosmos_sdk.x.supply.v1.ModuleAccount module_account = 5;
}
// ...
}
// app/codec/codec.go
type Codec struct {
codec.Marshaler
amino *codec.Codec
}
func NewAppCodec(amino *codec.Codec) *Codec {
return &Codec{Marshaler: codec.NewHybridCodec(amino), amino: amino}
}
func (c *Codec) MarshalAccount(accI authexported.Account) ([]byte, error) {
acc := &Account{}
if err := acc.SetAccount(accI); err != nil {
return nil, err
}
return c.Marshaler.MarshalBinaryLengthPrefixed(acc)
}
func (c *Codec) UnmarshalAccount(bz []byte) (authexported.Account, error) {
acc := &Account{}
if err := c.Marshaler.UnmarshalBinaryLengthPrefixed(bz, acc); err != nil {
return nil, err
}
return acc.GetAccount(), nil
}
Since the Codec
implements auth.Codec
(and all other required interfaces), it is passed to all
the modules and satisfies all the interfaces. Now each module needing to work with interfaces will know
about all the required types. Note, the use of interface_type
allows us to avoid a significant
amount of code boilerplate when implementing the Codec
.
A similar concept is to be applied for messages that contain interfaces fields. The module will
define a "base" concrete message type that the application-level codec will extend via oneof
that
fulfills the required message interface.
Example:
The MsgSubmitEvidence
defined by the x/evidence
module contains a field Evidence
which is an
interface.
type MsgSubmitEvidence struct {
Evidence exported.Evidence
Submitter sdk.AccAddress
}
Instead, we will implement a "base" message type and an interface which the concrete message type must implement.
// x/evidence/types/types.proto
message MsgSubmitEvidenceBase {
bytes submitter = 1
[
(gogoproto.casttype) = "github.com/cosmos/cosmos-sdk/types.AccAddress"
];
}
// x/evidence/exported/evidence.go
type MsgSubmitEvidence interface {
sdk.Msg
GetEvidence() Evidence
GetSubmitter() sdk.AccAddress
}
Notice the MsgSubmitEvidence
interface extends sdk.Msg
and allows for the Evidence
interface
to be retrieved from the concrete message type.
Now, the application-level codec will define the concrete MsgSubmitEvidence
type and will have it
fulfill the MsgSubmitEvidence
interface defined by x/evidence
.
// app/codec/codec.proto
message Evidence {
option (gogoproto.equal) = true;
option (cosmos_proto.interface_type) = "github.com/cosmos/cosmos-sdk/x/evidence/exported.Evidence";
oneof sum {
cosmos_sdk.x.evidence.v1.Equivocation equivocation = 1;
}
}
message MsgSubmitEvidence {
option (gogoproto.equal) = true;
option (gogoproto.goproto_getters) = false;
Evidence evidence = 1;
cosmos_sdk.x.evidence.v1.MsgSubmitEvidenceBase base = 2
[
(gogoproto.nullable) = false,
(gogoproto.embed) = true
];
}
// app/codec/msgs.go
func (msg MsgSubmitEvidence) GetEvidence() eviexported.Evidence {
return msg.Evidence.GetEvidence()
}
func (msg MsgSubmitEvidence) GetSubmitter() sdk.AccAddress {
return msg.Submitter
}
Note, however, the module's message handler must now handle the interface MsgSubmitEvidence
in
addition to any concrete types.
Why Wasn't X Chosen Instead
For a more complete comparison to alternative protocols, see here.
Cap'n Proto
While Cap’n Proto does seem like an advantageous alternative to Protobuf due to it's native support for interfaces/generics and built in canonicalization, it does lack the rich client ecosystem compared to Protobuf and is a bit less mature.
FlatBuffers
FlatBuffers is also a potentially viable alternative, with the primary difference being that FlatBuffers does not need a parsing/unpacking step to a secondary representation before you can access data, often coupled with per-object memory allocation.
However, it would require great efforts into research and full understanding the scope of the migration and path forward -- which isn't immediately clear. In addition, FlatBuffers aren't designed for untrusted inputs.
Future Improvements & Roadmap
The landscape and roadmap to restructuring queriers and tx generation to fully support Protobuf isn't fully understood yet. Once all modules are migrated, we will have a better understanding on how to proceed with client improvements (e.g. gRPC) 2.
Consequences
Positive
- Significant performance gains.
- Supports backward and forward type compatibility.
- Better support for cross-language clients.
Negative
- Learning curve required to understand and implement Protobuf messages.
- Less flexibility in cross-module type registration. We now need to define types at the application-level.
- Client business logic and tx generation may become a bit more complex.
Neutral
{neutral consequences}