Propose ADR 027: Deterministic Protobuf Serialization (#6979)
* Add ADR 026: Protocol Buffer Regencode * Use Regencode in ADR-020 * Apply suggestions from code review Co-authored-by: Aaron Craelius <aaron@regen.network> * Add reasoning behind omitting empty fields * Rename to ADR 026: Deterministic Protobuf Serialization * i++ * explicitely -> explicitly * Fix description of optional values * Update reasoning behind omitting default values * Fix empty -> defaut * Update docs/architecture/adr-027-deterministic-protobuf-serialization.md Co-authored-by: Amaury Martiny <amaury.martiny@protonmail.com> * Add rules to avoid duplicate encoding and specify packed fields * Add rule for variant encoding * Add surrogate pair to test UTF-8 encoding Co-authored-by: Aaron Craelius <aaron@regen.network> Co-authored-by: Amaury Martiny <amaury.martiny@protonmail.com>
This commit is contained in:
parent
bf0142b272
commit
1b9f144b9d
|
@ -52,4 +52,5 @@ Please add a entry below in your Pull Request for an ADR.
|
|||
- [ADR 023: Protocol Buffer Naming and Versioning](./adr-023-protobuf-naming.md)
|
||||
- [ADR 024: Coin Metadata](./adr-024-coin-metadata.md)
|
||||
- [ADR 025: IBC Passive Channels](./adr-025-ibc-passive-channels.md)
|
||||
- [ADR 026: IBC Client Recovery Mechanisms](./adr-026-ibc-client-recovery-mechanisms.md)
|
||||
- [ADR 026: IBC Client Recovery Mechanisms](./adr-026-ibc-client-recovery-mechanisms.md)
|
||||
- [ADR 027: Deterministic Protobuf Serialization](./adr-027-deterministic-protobuf-serialization.md)
|
||||
|
|
|
@ -8,6 +8,7 @@
|
|||
- 2020 April 30: Switch to `Any`
|
||||
- 2020 May 14: Describe public key encoding
|
||||
- 2020 June 08: Store `TxBody` and `AuthInfo` as bytes in `SignDoc`; Document `TxRaw` as broadcast and storage type.
|
||||
- 2020 August 07: Use ADR 027 for serializing `SignDoc`.
|
||||
|
||||
## Status
|
||||
|
||||
|
@ -188,8 +189,6 @@ message SignDoc {
|
|||
bytes auth_info = 2;
|
||||
string chain_id = 3;
|
||||
uint64 account_number = 4;
|
||||
// account_sequence starts at 1 rather than 0 to avoid the case where
|
||||
// the default 0 value must be omitted in protobuf serialization
|
||||
uint64 account_sequence = 5;
|
||||
}
|
||||
```
|
||||
|
@ -197,8 +196,7 @@ message SignDoc {
|
|||
In order to sign in the default mode, clients take the following steps:
|
||||
|
||||
1. Serialize `TxBody` and `AuthInfo` using any valid protobuf implementation.
|
||||
2. Create a `SignDoc` and encode it. (The only requirement of the underlying
|
||||
protobuf implementation is that fields are serialized in order).
|
||||
2. Create a `SignDoc` and serialize it using [ADR 027](./adr-027-deterministic-protobuf-serialization.md).
|
||||
3. Sign the encoded `SignDoc` bytes.
|
||||
4. Build a `TxRaw` and serialize it for broadcasting.
|
||||
|
||||
|
@ -214,8 +212,7 @@ Signature verifiers do:
|
|||
3. For each required signer:
|
||||
- Pull account number and sequence from the state.
|
||||
- Obtain the public key either from state or `AuthInfo`'s `signer_infos`.
|
||||
- Create a `SignDoc` and serialize it. Due to the simplicity of the type it
|
||||
is expected that this matches the serialization used by the signer.
|
||||
- Create a `SignDoc` and serialize it using [ADR 027](./adr-027-deterministic-protobuf-serialization.md).
|
||||
- Verify the signature at the the same list position against the serialized `SignDoc`.
|
||||
|
||||
#### `SIGN_MODE_LEGACY_AMINO`
|
||||
|
@ -423,8 +420,6 @@ message SignDocAux {
|
|||
PublicKey public_key = 2;
|
||||
string chain_id = 3;
|
||||
uint64 account_number = 4;
|
||||
// account_sequence starts at 1 rather than 0 to avoid the case where
|
||||
// the default 0 value must be omitted in protobuf serialization
|
||||
uint64 account_sequence = 5;
|
||||
}
|
||||
```
|
||||
|
|
|
@ -0,0 +1,251 @@
|
|||
# ADR 027: Deterministic Protobuf Serialization
|
||||
|
||||
## Changelog
|
||||
|
||||
- 2020-08-07: Initial Draft
|
||||
|
||||
## Status
|
||||
|
||||
Proposed
|
||||
|
||||
## Context
|
||||
|
||||
[Protobuf](https://developers.google.com/protocol-buffers/docs/proto3)
|
||||
seralization is not unique (i.e. there exist a practically unlimited number of
|
||||
valid binary representations for a protobuf document)<sup>1</sup>. For signature
|
||||
verification in Cosmos SDK, signer and verifier need to agree on the same
|
||||
serialization of a SignDoc as defined in
|
||||
[ADR-020](./adr-020-protobuf-transaction-encoding.md) without transmitting the
|
||||
serialization. This document describes a deterministic serialization scheme for
|
||||
a subset of protobuf documents, that covers this use case but can be reused in
|
||||
other cases as well.
|
||||
|
||||
## Decision
|
||||
|
||||
The following encoding scheme is proposed to be used by other ADRs.
|
||||
|
||||
### Scope
|
||||
|
||||
This ADR defines a protobuf3 serializer. The output is a valid protobuf
|
||||
serialization, such that every protobuf parser can parse it.
|
||||
|
||||
No maps are supported in version 1 due to the complexity of defining a
|
||||
derterministic serialization. This might change in future. Implementations must
|
||||
reject documents containing maps as invalid input.
|
||||
|
||||
### Serialization rules
|
||||
|
||||
The serialization is based on the
|
||||
[protobuf 3 encoding](https://developers.google.com/protocol-buffers/docs/encoding)
|
||||
with the following additions:
|
||||
|
||||
1. Fields must be serialized only once in ascending order
|
||||
2. Extra fields or any extra data must not be added
|
||||
3. [Default values](https://developers.google.com/protocol-buffers/docs/proto3#default)
|
||||
must be omitted
|
||||
4. `repeated` fields of scalar numeric types must use
|
||||
[packed encoding](https://developers.google.com/protocol-buffers/docs/encoding#packed)
|
||||
by default.
|
||||
5. Variant encoding of integers must not be longer than needed.
|
||||
|
||||
While rule number 1. and 2. should be pretty straight forward and describe the
|
||||
default behaviour of all protobuf encoders the author is aware of, the 3rd rule
|
||||
is more interesting. After a protobuf 3 deserialization you cannot differentiate
|
||||
between unset fields and fields set to the default value<sup>2</sup>. At
|
||||
serialization level however, it is possible to set the fields with an empty
|
||||
value or omitting them entirely. This is a significant difference to e.g. JSON
|
||||
where a property can be empty (`""`, `0`), `null` or undefined, leading to 3
|
||||
different documents.
|
||||
|
||||
Omitting fields set to default values is valid because the parser must assign
|
||||
the default value to fields missing in the serialization<sup>3</sup>. For scalar
|
||||
types, omitting defaults is required by the spec<sup>4</sup>. For `repeated`
|
||||
fields, not serializing them is the only way to express empty lists. Enums must
|
||||
have a first element of numeric value 0, which is the default<sup>5</sup>. And
|
||||
message fields default to unset<sup>6</sup>.
|
||||
|
||||
Omitting defaults allows for some amount of forward compatibility: users of
|
||||
newer versions of a protobuf schema produce the same serialization as users of
|
||||
older versions as long as newly added fields are not used (i.e. set to their
|
||||
default value).
|
||||
|
||||
### Implementation
|
||||
|
||||
There are three main implementation strategies, ordered from the least to the
|
||||
most custom development:
|
||||
|
||||
- **Use a protobuf serializer that follows the above rules by default.** E.g.
|
||||
[gogoproto](https://pkg.go.dev/github.com/gogo/protobuf/gogoproto) is known to
|
||||
be compliant by in most cases, but not when certain annotations such as
|
||||
`nullable = false` are used. It might also be an option to configure an
|
||||
existing serializer accordingly.
|
||||
- **Normalize default values before encoding them.** If your serializer follows
|
||||
rule 1. and 2. and allows you to explicitly unset fields for serialization,
|
||||
you can normalize default values to unset. This can be done when working with
|
||||
[protobuf.js](https://www.npmjs.com/package/protobufjs):
|
||||
|
||||
```js
|
||||
const bytes = SignDoc.encode({
|
||||
bodyBytes: body.length > 0 ? body : null, // normalize empty bytes to unset
|
||||
authInfoBytes: authInfo.length > 0 ? authInfo : null, // normalize empty bytes to unset
|
||||
chainId: chainId || null, // normalize "" to unset
|
||||
accountNumber: accountNumber || null, // normalize 0 to unset
|
||||
accountSequence: accountSequence || null, // normalize 0 to unset
|
||||
}).finish();
|
||||
```
|
||||
|
||||
- **Use a hand-written serializer for the types you need.** If none of the above
|
||||
ways works for you, you can write a serializer yourself. For SignDoc this
|
||||
would look something like this in Go, building on existing protobuf utilities:
|
||||
|
||||
```go
|
||||
if !signDoc.body_bytes.empty() {
|
||||
buf.WriteUVarInt64(0xA) // wire type and field number for body_bytes
|
||||
buf.WriteUVarInt64(signDoc.body_bytes.length())
|
||||
buf.WriteBytes(signDoc.body_bytes)
|
||||
}
|
||||
|
||||
if !signDoc.auth_info.empty() {
|
||||
buf.WriteUVarInt64(0x12) // wire type and field number for auth_info
|
||||
buf.WriteUVarInt64(signDoc.auth_info.length())
|
||||
buf.WriteBytes(signDoc.auth_info)
|
||||
}
|
||||
|
||||
if !signDoc.chain_id.empty() {
|
||||
buf.WriteUVarInt64(0x1a) // wire type and field number for chain_id
|
||||
buf.WriteUVarInt64(signDoc.chain_id.length())
|
||||
buf.WriteBytes(signDoc.chain_id)
|
||||
}
|
||||
|
||||
if signDoc.account_number != 0 {
|
||||
buf.WriteUVarInt64(0x20) // wire type and field number for account_number
|
||||
buf.WriteUVarInt(signDoc.account_number)
|
||||
}
|
||||
|
||||
if signDoc.account_sequence != 0 {
|
||||
buf.WriteUVarInt64(0x28) // wire type and field number for account_sequence
|
||||
buf.WriteUVarInt(signDoc.account_sequence)
|
||||
}
|
||||
```
|
||||
|
||||
### Test vectors
|
||||
|
||||
Given the protobuf definition `Article.proto`
|
||||
|
||||
```protobuf
|
||||
package blog;
|
||||
syntax = "proto3";
|
||||
|
||||
enum Type {
|
||||
UNSPECIFIED = 0;
|
||||
IMAGES = 1;
|
||||
NEWS = 2;
|
||||
};
|
||||
|
||||
enum Review {
|
||||
UNSPECIFIED = 0;
|
||||
ACCEPTED = 1;
|
||||
REJECTED = 2;
|
||||
};
|
||||
|
||||
message Article {
|
||||
string title = 1;
|
||||
string description = 2;
|
||||
uint64 created = 3;
|
||||
uint64 updated = 4;
|
||||
bool public = 5;
|
||||
bool promoted = 6;
|
||||
Type type = 7;
|
||||
Review review = 8;
|
||||
repeated string comments = 9;
|
||||
repeated string backlinks = 10;
|
||||
};
|
||||
```
|
||||
|
||||
serializing the values
|
||||
|
||||
```yaml
|
||||
title: "The world needs change 🌳"
|
||||
description: ""
|
||||
created: 1596806111080
|
||||
updated: 0
|
||||
public: true
|
||||
promoted: false
|
||||
type: Type.NEWS
|
||||
review: Review.UNSPECIFIED
|
||||
comments: ["Nice one", "Thank you"]
|
||||
backlinks: []
|
||||
```
|
||||
|
||||
must result in the serialization
|
||||
|
||||
```
|
||||
0a1b54686520776f726c64206e65656473206368616e676520f09f8cb318e8bebec8bc2e280138024a084e696365206f6e654a095468616e6b20796f75
|
||||
```
|
||||
|
||||
When inspecting the serialized document, you see that every second field is
|
||||
omitted:
|
||||
|
||||
```
|
||||
$ echo 0a1b54686520776f726c64206e65656473206368616e676520f09f8cb318e8bebec8bc2e280138024a084e696365206f6e654a095468616e6b20796f75 | xxd -r -p | protoc --decode_raw
|
||||
1: "The world needs change \360\237\214\263"
|
||||
3: 1596806111080
|
||||
5: 1
|
||||
7: 2
|
||||
9: "Nice one"
|
||||
9: "Thank you"
|
||||
```
|
||||
|
||||
## Consequences
|
||||
|
||||
Having such an encoding available allows us to get deterministic serialization
|
||||
for all protobuf documents we need in the context of Cosmos SDK signing.
|
||||
|
||||
### Positive
|
||||
|
||||
- Well defined rules that can be verified independent of a reference
|
||||
implementation
|
||||
- Simple enough to keep the barrier to implement transaction signing low
|
||||
- It allows us to continue to use 0 and other empty values in SignDoc, avoiding
|
||||
the need to work around 0 sequences. This does not imply the change from
|
||||
https://github.com/cosmos/cosmos-sdk/pull/6949 should not be merged, but not
|
||||
too important anymore.
|
||||
|
||||
### Negative
|
||||
|
||||
- When implementing transaction signing, the encoding rules above must be
|
||||
understood and implemented.
|
||||
- The need for rule number 3. adds some complexity to implementations.
|
||||
|
||||
### Neutral
|
||||
|
||||
## References
|
||||
|
||||
- <sup>1</sup> _When a message is serialized, there is no guaranteed order for
|
||||
how its known or unknown fields should be written. Serialization order is an
|
||||
implementation detail and the details of any particular implementation may
|
||||
change in the future. Therefore, protocol buffer parsers must be able to parse
|
||||
fields in any order._ from
|
||||
https://developers.google.com/protocol-buffers/docs/encoding#order
|
||||
- <sup>2</sup> _Note that for scalar message fields, once a message is parsed
|
||||
there's no way of telling whether a field was explicitly set to the default
|
||||
value (for example whether a boolean was set to false) or just not set at all:
|
||||
you should bear this in mind when defining your message types. For example,
|
||||
don't have a boolean that switches on some behaviour when set to false if you
|
||||
don't want that behaviour to also happen by default._ from
|
||||
https://developers.google.com/protocol-buffers/docs/proto3#default
|
||||
- <sup>3</sup> _When a message is parsed, if the encoded message does not
|
||||
contain a particular singular element, the corresponding field in the parsed
|
||||
object is set to the default value for that field._ from
|
||||
https://developers.google.com/protocol-buffers/docs/proto3#default
|
||||
- <sup>4</sup> _Also note that if a scalar message field is set to its default,
|
||||
the value will not be serialized on the wire._ from
|
||||
https://developers.google.com/protocol-buffers/docs/proto3#default
|
||||
- <sup>5</sup> _For enums, the default value is the first defined enum value,
|
||||
which must be 0._ from
|
||||
https://developers.google.com/protocol-buffers/docs/proto3#default
|
||||
- <sup>6</sup> _For message fields, the field is not set. Its exact value is
|
||||
language-dependent._ from
|
||||
https://developers.google.com/protocol-buffers/docs/proto3#default
|
||||
- Encoding rules and parts of the reasoning taken from
|
||||
[canonical-proto3 Aaron Craelius](https://github.com/regen-network/canonical-proto3)
|
Loading…
Reference in New Issue