This has a test that the serialization implementation round trips
correctly, but is very much a work in progress. Issues with this code
include:
The deserialization logic for message headers is somewhat ugly because
of a lack of a convenient idiom for reading a few bytes at a time into a
fixed-size array. Perhaps this could be fixed with an extension trait,
or avoided altogether by using `u32`s instead of `[u8; 4]`, even when
the latter is the "morally correct type".
Deserialization does an extra allocation, copying the body into a
temporary buffer. This avoids two problems: 1) capping the number of
bytes that can be read by the `Read`er passed into the body parser and
2) allows making two passes over the body data, one to parse it and one
to compute the checksum.
We could avoid making two passes over the body by computing the checksum
simultaneously with the parsing. A convenient way to do this would be to
define a
```
struct ChecksumReader<R: Read> {
inner: R,
digest: Sha256,
}
impl<R: Read> Read for ChecksumReader<R> { /* ... */ }
```
and implement `Read` for `ChecksumReader` to forward reads from the
inner reader, copying data into the digest object as it does so. (It
could also have a maximum length to enforce that we do not read past the
nominal end of the body).
A similar `ChecksumWriter` could be used during serialization, although
because the checksum is at the beginning rather than the end of the
message it does not help avoid an allocation there. It could also be
generic over a `Digest` implementation, although because we need a
truncated double-SHA256 we would have to write a custom `Digest`
implementation, so this is probably not worthwhile unless we have other
checksum types.
Finally, this does very minimal testing -- just round-trip serialization
on a single message. It would be good to build in support for
property-based testing, probably using `proptest`; if we could define
generation and shrinking strategies for every component type of every
message, we could do strong randomized testing of the serialization.
The core serialization logic is now in zebra-chain and consists of two
pairs of traits:
These are analogues of the Serde `Serialize` and `Deserialize` traits,
but explicitly intended for consensus-critical serialization formats.
Thus some struct `Foo` may have derived `Serialize` and `Deserialize`
implementations for (internal) use with Serde, and explicitly-written
`ZcashSerialize` and `ZcashDeserialize` implementations for use in
consensus-critical contexts. The consensus-critical implementations
provide `zcash`-prefixed `zcash_serialize` and `zcash_deserialize`
methods to make it clear in client contexts that the serialization is
consensus-critical.
These are utility traits, analogous to the `ReadBytesExt` and
`WriteBytesExt` traits provided by `byteorder`. A generic
implementation is provided for any `io::Read` or `io::Write`, so that
bringing the traits into scope adds additional Zcash-specific traits to
generic readers and writers -- for instance, writing a `u64` in the
Bitcoin "CompactSize" format.
Currently these just have write_compactsize and read_compactsize methods which
allow reading and writing u64s to any `Read` or `Write` implementation using
the Bitcoin "CompactSize" variable integer encoding.
These methods read and write u64s rather than defining a new `CompactSize`
type, because the `CompactSize` is just an encoding detail, not a different
type with any distinct meaning.
The `NetworkAddress` type was a `(Services, SocketAddr)` pair as used in the
`version` handshake message, described as the `net_addr` struct in the Bitcoin
wiki protocol documentation. However, all of the other uses of the `net_addr`
struct are a `(Timestamp, Services, SocketAddr)` pair (where the timestamp is
the last-seen time of the peer), and the timestamp is omitted only during the
`version` messages, which are used only during the handshake, so it seems
better to include the timestamp field and omit it during serialization of
`version` packets.