Join appendices into one file

2018-02-13 21:17:27 +01:00 · 2018-02-13 21:17:27 +01:00 · ecb1f93e19
parent cdf08ecdb7
commit ecb1f93e19
7 changed files with 102 additions and 247 deletions
--- a/docs/spec/ibc/appendix-a.md
+++ b/docs/spec/ibc/appendix-a.md
@@ -1,11 +0,0 @@
-## Appendix A: Encoding Libraries
-
-([Back to table of contents](specification.md#contents))
-
-The specification has focused on semantics and functionality of the IBC protocol. However in order to facilitate the communication between multiple implementations of the protocol, we seek to define a standard syntax, or binary encoding, of the data structures defined above. Many structures are universal and for these, we provide one standard syntax. Other structures, such as _H<sub>h </sub>, U<sub>h </sub>, _and _X<sub>h</sub>_ are tied to the consensus engine and we can define the standard encoding for tendermint, but support for additional consensus engines must be added separately. Finally, there are some aspects of the messaging, such as the envelope to post this data (fees, nonce, signatures, etc.), which is different for every chain, and must be known to the relay, but are not important to the IBC algorithm itself and left undefined.
-
-In defining a standard binary encoding for all the "universal" components, we wish to make use of a standardized library, with efficient serialization and support in multiple languages. We considered two main formats: ethereum's rlp[[6](./footnotes.md#6)] and google's protobuf[[7](./footnotes.md#7)]. We decided for protobuf, as it is more widely supported, is more expressive for different data types, and supports code generation for very efficient (de)serialization codecs. It does have a learning curve and more setup to generate the code from the type specifications, but the ibc data types should not change often and this code generation setup only needs to happen once per language (and can be exposed in a common repo), so this is not a strong counter-argument. Efficiency, expressiveness, and wider support rule in its favor. It is also widely used in gRPC and in many microservice architectures.
-
-The tendermint-specific data structures are encoded with go-wire[[8](./footnotes.md#8)], the native binary encoding used inside of tendermint. Most blockchains define their own formats, and until some universal format for headers and signatures among blockchains emerge, it seems very premature to enforce any encoding here. These are defined as arbitrary byte slices in the protocol, to be parsed in an consensus engine-dependent manner.
-
-For the following appendixes, the data structure specifications will be in proto3[[9](./footnotes.md#9)] format.
--- a/docs/spec/ibc/appendix-b.md
+++ b/docs/spec/ibc/appendix-b.md
@@ -1,62 +0,0 @@
-## Appendix B: IBC Queue Format
-
-([Back to table of contents](specification.md#contents))
-
-The foundational data structure of the IBC protocol are the message queues stored inside each chain. We start with a well-defined binary representation of the keys and values used in these queues. The encodings mirror the semantics defined above:
-
-_key = _(_remote id, [send|receipt], [head|tail|index])_
-
-_V<sub>send</sub> = (maxHeight, maxTime, type, data)_
-
-_V<sub>receipt</sub> = (result, [success|error code])_
-
-
-```
- message QueueName {
-  // chain_id is which chain this queue is
-  // associated with
-  string chain_id = 1;
-  enum Purpose {
-          SEND = 0;
-          RECEIPT = 1;
-  }
-  Purpose purpose = 2;
- }
- // StateKey is a key for the head/tail of a given queue
- message StateKey {
-  QueueName queue = 1;
-  // both encode into one byte with varint encoding
-  // never clash with 8 byte message indexes
-  enum State {
-          HEAD = 0;
-          TAIL = 0x7f;
-  }
-  State state = 2;
- }
- // StateValue is the type stored under a StateKey
- message StateValue {
-    fixed64 index = 1;
- }
- // MessageKey is the key for message *index* in a given queue
- message MessageKey {
-  QueueName queue = 1;
-  fixed64 index = 2;
- }
- // SendValue is stored under a MessageKey in the SEND queue
- message SendValue {
-  uint64 maxHeight = 1;
-  google.protobuf.Timestamp maxTime = 2;
-  // use kind instead of type to avoid keyword conflict
-  bytes kind = 3;
-  bytes data = 4;
- }
- // ReceiptValue is stored under a MessageKey in the RECEIPT queue
- message ReceiptValue {
-  // 0 is success, others are application-defined errors
-  int32 errorCode = 1;
-  // contains result on success, optional info on error
-  bytes data = 2;
- }
-```
-
-Keys and values are binary encoded and stored as bytes in the merkle tree in order to generate the root hash stored in the block header, which validates all proofs. They are treated as arrays of bytes by the merkle proofs for deterministically generating the sequence of hashes, and passed as such in all interchain messages. Once the validity of a key value pair has been determined from the merkle proof and header, the bytes can be deserialized and the contents interpreted by the protocol.
--- a/docs/spec/ibc/appendix-c.md
+++ b/docs/spec/ibc/appendix-c.md
@@ -1,87 +0,0 @@
-## Appendix C: Merkle Proof Formats
-
-([Back to table of contents](specification.md#contents))
-
-A merkle tree (or a trie) generates one hash that can prove every element of the tree. Generating this hash starts with hashing the leaf nodes. Then hashing multiple leaf nodes together to get the hash of an inner node (two or more, based on degree k of the k-ary tree). And continue hashing together the inner nodes at each level of the tree, until it reaches a root hash. Once you have a known root hash, you can prove key/value belongs to this tree by tracing the path to the value and revealing the (k-1) hashes for all the paths we did not take on each level. If this is new to you, you can read a basic introduction[[10](./footnotes.md#10)].
-
-There are a number of different implementations of this basic idea, using different hash functions, as well as prefixes to prevent second preimage attacks (differentiating leaf nodes from inner nodes). Rather than force all chains that wish to participate in IBC to use the same data store, we provide a data structure that can represent merkle proofs from a variety of data stores, and provide for chaining proofs to allow for sub-trees. While searching for a solution, we did find the chainpoint proof format[[11](./footnotes.md#11)], which inspired this design significantly, but didn't (yet) offer the flexibility we needed.
-
-We generalize the left/right idiom to concatenating a (possibly empty) fixed prefix, the (just calculated) last hash, and a (possibly empty) fixed suffix. We must only define two fields on each level and can represent any type, even a 16-ary Patricia tree, with this structure. One must only translate from the store's native proof to this format, and it can be verified by any chain, providing compatibility for arbitrary data stores.
-
-The proof format also allows for chaining of trees, combining multiple merkle stores into a "multi-store". Many applications (such as the EVM) define a data store with a large proof size for internal use. Rather than force them to change the store (impossible), or live with huge proofs (inefficient), we provide the possibility to express merkle proofs connecting multiple subtrees. Thus, one could have one subtree for data, and a second for IBC. Each tree produces their own merkle root, and these are then hashed together to produce the root hash that is stored in the block header.
-
-A valid merkle proof for IBC must either consist of a proof of one tree, and prepend "ibc" to all key names as defined above, or use a subtree named "ibc" in the first section, and store the key names as above in the second tree.
-
-For those who wish to minimize the size of their merkle proofs, we recommend using Tendermint's IAVL+ tree implementation[[12](./footnotes.md#12)], which is designed for optimal proof size, and freely available for use. It uses an AVL tree (a type of binary tree) with ripemd160 as the hashing algorithm at each stage. This produces optimally compact proofs, ideal for posting in blockchain transactions. For a data store of _n_ values, there will be _log<sub>2</sub>(n)_ levels, each requiring one 20-byte hash for proving the branch not taken (plus possible metadata for the level). We can express a proof in a tree of 1 million elements in something around 400 bytes. If we further store all IBC messages in a separate subtree, we should expect the count of nodes in this tree to be a few thousand, and require less than 400 bytes, even for blockchains with a quite large state.
-
-```
- // HashOp is the hashing algorithm we use at each level
- enum HashOp {
-     RIPEMD160 = 0;
-     SHA224 = 1;
-     SHA256 = 2;
-     SHA384 = 3;
-     SHA512 = 4;
-     SHA3_224 = 5;
-     SHA3_256 = 6;
-     SHA3_384 = 7;
-     SHA3_512 = 8;
-     SHA256_X2 = 9;
- };
- // Op represents one hash in a chain of hashes.
- // An operation takes the output of the last level and returns
- // a hash for the next level:
- // Op(last) => Operation(prefix + last + sufix)
- //
- // A simple left/right hash would simply set prefix=left or
- // suffix=right and leave the other blank. However, one could
- // also represent the a Patricia trie proof by setting
- // prefix to the rlp encoding of all nodes before the branch
- // we select, and suffix to all those after the one we select.
- message Op {
-     bytes prefix = 1;
-     bytes suffix = 2;
-     HashOp op = 3;
- }
- // Data is the end value stored, used to generate the initial hash
- message Data {
-     bytes prefix = 1;
-     bytes key = 2;
-     bytes value = 3;
-     HashOp op = 4;
-     // If it is KeyValue, this is the data we want
-     // If it is SubTree, key is name of the tree, value is root hash
-     // Expect another branch to follow
-     enum DataType {
-         KeyValue = 0;
-         SubTree = 1;
-     }
-     DataType dataType = 5;
- }
- // Branch will hash data and then pass it through operations from
- // last to first in order to calculate the root node.
- //
- // Visualize Branch as representing the data closest to root as the
- // first item, and the leaf as the last item.
- message Branch {
-     repeated Op operations = 1;
-     Data data = 2;
- }
- // MerkleProof shows a veriable path from the data to
- // a root hash (potentially spanning multiple sub-trees).
- message MerkleProof {
-  // identify the header this is rooted in
-  string chainId = 1;
-  uint64 height = 2;
-  // this hash must match the header as well as the
-  // calculation from below
-  bytes rootHash = 3;
-  // branches start from the value, and then may
-  // include multiple subtree branches to embed it
-  //
-  // The first branch must have dataType KeyValue
-  // Following branches must have dataType SubTree
-  repeated Branch branches = 1;
- }
- ```
-
--- a/docs/spec/ibc/appendix-d.md
+++ b/docs/spec/ibc/appendix-d.md
@@ -1,33 +0,0 @@
-## Appendix D: Universal IBC Packets
-
-([Back to table of contents](specification.md#contents))
-
-The structures above can be used to define standard encodings for the basic IBC transactions that must be exposed by a blockchain: _IBCreceive_, _IBCreceipt_,_ IBCtimeout_, and _IBCcleanup_. As mentioned above, these are not complete transactions to be posted as is to a blockchain, but rather the "data" content of a transaction, which must also contain fees, nonce, and signatures. The other IBC transaction types _IBCregisterChain_, _IBCupdateHeader_, and _IBCchangeValidators_ are specific to the consensus engine and use unique encodings. We define the tendermint-specific format in the next section.
-
-```
- // IBCPacket sends a proven key/value pair from an IBCQueue.
- // Depending on the type of message, we require a certain type
- // of key (MessageKey at a given height, or StateKey).
- //
- // Includes src_chain and src_height to look up the proper
- // header to verify the merkle proof.
- message IBCPacket {
-  // chain id it is coming from
-  string src_chain = 1;
-  // height for the header the proof belongs to
-  uint64 src_height = 2;
-  // the message type, which determines what key/value mean
-  enum MsgType {
-          RECEIVE = 0;
-          RECEIPT = 1;
-          TIMEOUT = 2;
-          CLEANUP = 3;
-  }
-  MsgType msgType = 3;
-  bytes key = 4;
-  bytes value = 5;
-  // the proof of the message
-  MerkleProof proof = 6;
- }
-```
-
--- a/docs/spec/ibc/appendix-e.md
+++ b/docs/spec/ibc/appendix-e.md
@@ -1,49 +0,0 @@
-## Appendix E: Tendermint Header Proofs
-
-TODO: clean this all up
-
-This is a mess now, we need to figure out what formats we use, define go-wire, etc. or just point to the source???? Will do more later, need help here from the tendermint core team.
-
-In order to prove a merkle root, we must fully define the headers, signatures, and validator information returned from the Tendermint consensus engine, as well as the rules by which to verify a header. We also define here the messages used for creating and removing connections to other blockchains as well as how to handle forks.
-
-**Building Blocks: Header, PubKey, Signature, Commit, ValidatorSet**
-
-**-> needs input/support from Tendermint Core team (and go-crypto)**
-
-**Registering Chain**
-
-**Updating Header**
-
-**Validator Changes**
-
-ROOT of trust
-
-As mentioned in the definitions, all proofs are based on an original assumption. The root of trust here is either the genesis block (if it is newer than the unbonding period) or any signed header of the other chain.
-
-When governance on a pair of chain, the respective chains must agree to a root of trust on the counterparty chain. This can be the genesis block on a chain that launches with an IBC channel or a later block header.
-
-From this signed header, one can check the validator set against the validator hash stored in the header, and then verify the signatures match. This provides internal consistency and accountability, but if 5 nodes provide you different headers (eg. of forks), you must make a subjective decision which one to trust. This should be performed by on-chain governance to avoid an exploitable position of trust.
-
-VERIFYING HEADERS
-
-Once we have a trusted header with a known validator set, we can quickly validate any new header with the same validator set. To validate a new header, simply verifying that the validator hash has not changed, and that over 2/3 of the voting power in that set has properly signed a commit for that header. We can skip all intervening headers, as we have complete finality (no forks) and accountability (to punish a double-sign).
-
-This is safe as long as we have a valid signed header by the trusted validator set that is within the unbonding period for staking. In that case, if we were given a false (forked) header, we could use this as proof to slash the stake of all the double-signing validators. This demonstrates the importance of attribution and is the same security guarantee of any non-validating full node. Even in the presence of some ultra-powerful malicious actors, this makes the cost of creating a fake proof for a header equal to at least one third of all staked tokens, which should be significantly higher than any gain of a false message.
-
-UPDATING VALIDATORS SET
-
-If the validator hash is different than the trusted one, we must simultaneously both verify that if the change is valid while, as well as use using the new set to validate the header.  Since the entire validator set is not provided by default when we give a header and commit votes, this must be provided as extra data to the certifier.
-
-A validator change in Tendermint can be securely verified with the following checks:
-
-
-
-*   First, that the new header, validators, and signatures are internally consistent
-    *   We have a new set of validators that matches the hash on the new header
-    *   At least 2/3 of the voting power of the new set validates the new header
-*   Second, that the new header is also valid in the eyes of our trust set
-    *   Verify at least 2/3 of the voting power of our trusted set, which are also in the new set, properly signed a commit to the new header
-
-In that case, we can update to this header, and update the trusted validator set, with the same guarantees as above (the ability to slash at least one third of all staked tokens on any false proof).
-
-
--- a/docs/spec/ibc/appendix.md
+++ b/docs/spec/ibc/appendix.md
@@ -0,0 +1,97 @@
+# Appendices
+
+([Back to table of contents](specification.md#contents))
+
+## Appendix A: Encoding Libraries
+
+The specification has focused on the semantics and functionality of the IBC protocol. However, in order to facilitate communication between multiple implementations of the protocol, we seek to define a standard syntax, or binary encoding, of the data structures defined above. Many structures are universal, and for these we provide one standard syntax. Other structures, such as _H<sub>h</sub>_, _U<sub>h</sub>_, and _X<sub>h</sub>_, are tied to the consensus engine; we can define the standard encoding for Tendermint, but support for additional consensus engines must be added separately. Finally, some aspects of the messaging, such as the envelope used to post this data (fees, nonce, signatures, etc.), are different for every chain and must be known to the relay, but are not important to the IBC algorithm itself and are left undefined.
+
+In defining a standard binary encoding for all the "universal" components, we wish to use a standardized library with efficient serialization and support in multiple languages. We considered two main formats: Ethereum's RLP[[6](./footnotes.md#6)] and Google's protobuf[[7](./footnotes.md#7)]. We chose protobuf, as it is more widely supported, is more expressive for different data types, and supports code generation for very efficient (de)serialization codecs. It does have a learning curve and requires more setup to generate code from the type specifications, but the IBC data types should not change often, and this code generation setup only needs to happen once per language (and can be exposed in a common repo), so this is not a strong counter-argument. Efficiency, expressiveness, and wider support rule in its favor. It is also widely used in gRPC and in many microservice architectures.
+
+The Tendermint-specific data structures are encoded with go-wire[[8](./footnotes.md#8)], the native binary encoding used inside Tendermint. Most blockchains define their own formats, and until some universal format for headers and signatures among blockchains emerges, it seems premature to enforce any encoding here. These structures are defined as arbitrary byte slices in the protocol, to be parsed in a consensus-engine-dependent manner.
+
+For the following appendices, the data structure specifications will be in proto3[[9](./footnotes.md#9)] format.
+
+## Appendix B: IBC Queue Format
+
+The foundational data structures of the IBC protocol are the message queues stored inside each chain. We start with a well-defined binary representation of the keys and values used in these queues. The encodings mirror the semantics defined above:
+
+_key = (remote id, [send|receipt], [head|tail|index])_
+
+_V<sub>send</sub> = (maxHeight, maxTime, type, data)_
+
+_V<sub>receipt</sub> = (result, [success|error code])_
+
+Keys and values are binary encoded and stored as bytes in the merkle tree in order to generate the root hash stored in the block header, against which all proofs are validated. The merkle proofs treat them as arrays of bytes when deterministically generating the sequence of hashes, and they are passed as such in all interchain messages. Once the validity of a key/value pair has been determined from the merkle proof and header, the bytes can be deserialized and their contents interpreted by the protocol.
+
+See the [binary format as a protobuf specification](./protobuf/queue.proto).
+
+## Appendix C: Merkle Proof Formats
+
+A merkle tree (or a trie) generates one hash that can prove every element of the tree. Generating this hash starts with hashing the leaf nodes. Multiple leaf nodes (two or more, depending on the degree k of the k-ary tree) are then hashed together to get the hash of an inner node, and the inner nodes at each level are hashed together in turn until a single root hash remains. Once you have a known root hash, you can prove that a key/value pair belongs to this tree by tracing the path to the value and revealing the (k-1) hashes for all the paths not taken at each level. If this is new to you, you can read a basic introduction[[10](./footnotes.md#10)].
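The hashing and proof procedure described above can be sketched for the simplest case, a binary tree (k = 2), where a proof reveals one sibling hash per level. This is an illustration only: it assumes SHA-256 with a `0x00` prefix for leaves and `0x01` for inner nodes (one common way to separate the two and block second preimage attacks), requires a power-of-two number of leaves, and is not the IAVL+ hashing scheme.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// leafHash and innerHash use different one-byte prefixes so that a leaf
// can never be mistaken for an inner node.
func leafHash(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

func innerHash(left, right []byte) []byte {
	buf := append([]byte{0x01}, left...)
	buf = append(buf, right...)
	h := sha256.Sum256(buf)
	return h[:]
}

// buildLevels returns every level of the tree, leaves first and root last.
// For simplicity it assumes len(leaves) is a power of two.
func buildLevels(leaves [][]byte) [][][]byte {
	level := make([][]byte, len(leaves))
	for i, l := range leaves {
		level[i] = leafHash(l)
	}
	levels := [][][]byte{level}
	for len(level) > 1 {
		next := make([][]byte, len(level)/2)
		for i := range next {
			next[i] = innerHash(level[2*i], level[2*i+1])
		}
		levels = append(levels, next)
		level = next
	}
	return levels
}

// ProofStep is the hash of the branch not taken at one level.
type ProofStep struct {
	Sibling []byte
	Left    bool // true if the sibling sits to the left of our path
}

// prove collects one sibling hash per level for the leaf at index.
func prove(levels [][][]byte, index int) []ProofStep {
	var proof []ProofStep
	for _, level := range levels[:len(levels)-1] {
		sib := index ^ 1
		proof = append(proof, ProofStep{Sibling: level[sib], Left: sib < index})
		index /= 2
	}
	return proof
}

// verify recomputes the root from the leaf data and the revealed hashes.
func verify(root, data []byte, proof []ProofStep) bool {
	h := leafHash(data)
	for _, step := range proof {
		if step.Left {
			h = innerHash(step.Sibling, h)
		} else {
			h = innerHash(h, step.Sibling)
		}
	}
	return bytes.Equal(h, root)
}

func main() {
	levels := buildLevels([][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d")})
	root := levels[len(levels)-1][0]
	proof := prove(levels, 2)
	fmt.Println(len(proof), verify(root, []byte("c"), proof))
}
```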
+
+There are a number of different implementations of this basic idea, using different hash functions, as well as prefixes to prevent second preimage attacks (differentiating leaf nodes from inner nodes). Rather than force all chains that wish to participate in IBC to use the same data store, we provide a data structure that can represent merkle proofs from a variety of data stores and that allows proofs to be chained to support subtrees. While searching for a solution, we found the Chainpoint proof format[[11](./footnotes.md#11)], which inspired this design significantly, but did not (yet) offer the flexibility we needed.
+
+We generalize the left/right idiom to concatenating a (possibly empty) fixed prefix, the (just calculated) last hash, and a (possibly empty) fixed suffix. We only need to define these two fields on each level, and this structure can represent any tree type, even a 16-ary Patricia tree. One need only translate the store's native proof into this format, and it can then be verified by any chain, providing compatibility with arbitrary data stores.
+
+The proof format also allows trees to be chained, combining multiple merkle stores into a "multi-store". Many applications (such as the EVM) define a data store with a large proof size for internal use. Rather than force them to change the store (impossible) or live with huge proofs (inefficient), we make it possible to express merkle proofs that connect multiple subtrees. Thus, one could have one subtree for data and a second for IBC. Each tree produces its own merkle root, and these roots are then hashed together to produce the root hash that is stored in the block header.
+
+A valid merkle proof for IBC must either consist of a proof within a single tree, with "ibc" prepended to all key names defined above, or use a subtree named "ibc" in the first section and store the key names defined above in the second tree.
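Proof chaining can be sketched the same way. In this hypothetical Go example, the first branch proves a key/value pair against the root of an "ibc" subtree, and the second branch treats that subtree as a leaf (key = subtree name, value = subtree root) and proves it against the multi-store root stored in the header. The leaf encoding, the `0x00`/`0x01` prefixes, and the use of SHA-256 are assumptions; only the KeyValue-then-SubTree ordering follows the `MerkleProof` comments removed from `appendix-c.md`.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// sum hashes the concatenation of its arguments with SHA-256.
func sum(parts ...[]byte) []byte {
	var buf []byte
	for _, p := range parts {
		buf = append(buf, p...)
	}
	h := sha256.Sum256(buf)
	return h[:]
}

// Op computes SHA-256(prefix || last || suffix).
type Op struct{ Prefix, Suffix []byte }

func (o Op) Apply(last []byte) []byte { return sum(o.Prefix, last, o.Suffix) }

// Branch proves Key/Value up to a root by applying Ops from last to first.
// For a SubTree branch, Key is the subtree name and Value is its root hash.
type Branch struct {
	Key, Value []byte
	SubTree    bool
	Ops        []Op
}

// Root hashes the leaf (0x00 || len(key) || key || value; keys are assumed
// to be shorter than 256 bytes) and then applies the ops.
func (br Branch) Root() []byte {
	h := sum([]byte{0x00, byte(len(br.Key))}, br.Key, br.Value)
	for i := len(br.Ops) - 1; i >= 0; i-- {
		h = br.Ops[i].Apply(h)
	}
	return h
}

// VerifyChain requires a KeyValue branch first and SubTree branches after it,
// each of which must commit to the root computed by the branch before it.
func VerifyChain(headerRoot []byte, branches []Branch) bool {
	if len(branches) == 0 || branches[0].SubTree {
		return false
	}
	root := branches[0].Root()
	for _, br := range branches[1:] {
		if !br.SubTree || !bytes.Equal(br.Value, root) {
			return false
		}
		root = br.Root()
	}
	return bytes.Equal(root, headerRoot)
}

func main() {
	// A one-element "ibc" subtree: its root is just the leaf hash.
	kv := Branch{Key: []byte("queue-key"), Value: []byte("message")}

	// The multi-store hashes the "data" and "ibc" subtree leaves together.
	dataLeaf := Branch{Key: []byte("data"), Value: sum([]byte("app state")), SubTree: true}.Root()
	ibc := Branch{
		Key: []byte("ibc"), Value: kv.Root(), SubTree: true,
		Ops: []Op{{Prefix: append([]byte{0x01}, dataLeaf...)}},
	}
	headerRoot := ibc.Root()

	fmt.Println(VerifyChain(headerRoot, []Branch{kv, ibc}))
}
```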
+
+For those who wish to minimize the size of their merkle proofs, we recommend Tendermint's IAVL+ tree implementation[[12](./footnotes.md#12)], which is designed for optimal proof size and is freely available for use. It uses an AVL tree (a type of self-balancing binary tree) with RIPEMD-160 as the hashing algorithm at each stage. This produces compact proofs, ideal for posting in blockchain transactions. For a data store of _n_ values, there will be about _log<sub>2</sub>(n)_ levels, each requiring one 20-byte hash to prove the branch not taken (plus possible metadata for the level). A proof in a tree of 1 million elements therefore takes around 400 bytes (log<sub>2</sub>(10<sup>6</sup>) ≈ 20 levels × 20 bytes). If we further store all IBC messages in a separate subtree, we should expect this tree to hold only a few thousand nodes and to require less than 400 bytes per proof, even for blockchains with a quite large state.
+
+See the [binary format as a protobuf specification](./protobuf/merkle.proto).
+
+## Appendix D: Universal IBC Packets
+
+The structures above can be used to define standard encodings for the basic IBC transactions that must be exposed by a blockchain: _IBCreceive_, _IBCreceipt_, _IBCtimeout_, and _IBCcleanup_. As mentioned above, these are not complete transactions to be posted as-is to a blockchain, but rather the "data" content of a transaction, which must also contain fees, nonce, and signatures. The other IBC transaction types, _IBCregisterChain_, _IBCupdateHeader_, and _IBCchangeValidators_, are specific to the consensus engine and use unique encodings. We define the Tendermint-specific format in the next section.
+
+See the [binary format as a protobuf specification](./protobuf/messages.proto).
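The following Go sketch shows how a chain might process one of these packets once the envelope (fees, nonce, signatures) has been stripped. Its fields follow the `IBCPacket` message removed from `appendix-d.md` in this commit, but the proof is kept as opaque bytes and checked by a pluggable function, and the `Handler` type, the header-root map, and the error names are all hypothetical. The important ordering is that the packet is only interpreted after its proof has been checked against a header that is already trusted.

```go
package main

import (
	"errors"
	"fmt"
)

// MsgType mirrors the RECEIVE/RECEIPT/TIMEOUT/CLEANUP enum.
type MsgType int

const (
	Receive MsgType = iota
	Receipt
	Timeout
	Cleanup
)

// IBCPacket carries a proven key/value pair; the proof is kept opaque here.
type IBCPacket struct {
	SrcChain  string
	SrcHeight uint64
	MsgType   MsgType
	Key       []byte
	Value     []byte
	Proof     []byte
}

type headerID struct {
	chain  string
	height uint64
}

// Handler holds the roots of headers already accepted (e.g. via
// IBCupdateHeader) and a pluggable merkle proof checker.
type Handler struct {
	Roots       map[headerID][]byte
	VerifyProof func(root, key, value, proof []byte) bool
}

var (
	ErrUnknownHeader = errors.New("no trusted header for src_chain/src_height")
	ErrBadProof      = errors.New("merkle proof does not match header root")
)

// Route verifies the packet and returns the name of the transaction it maps to.
func (h Handler) Route(p IBCPacket) (string, error) {
	root, ok := h.Roots[headerID{p.SrcChain, p.SrcHeight}]
	if !ok {
		return "", ErrUnknownHeader
	}
	if !h.VerifyProof(root, p.Key, p.Value, p.Proof) {
		return "", ErrBadProof
	}
	// Only after the proof checks out are key and value interpreted.
	switch p.MsgType {
	case Receive:
		return "IBCreceive", nil
	case Receipt:
		return "IBCreceipt", nil
	case Timeout:
		return "IBCtimeout", nil
	case Cleanup:
		return "IBCcleanup", nil
	}
	return "", fmt.Errorf("unknown message type %d", p.MsgType)
}

func main() {
	h := Handler{
		Roots: map[headerID][]byte{{"chainA", 10}: []byte("root")},
		// Demo-only checker: accepts a proof equal to the root.
		VerifyProof: func(root, _, _, proof []byte) bool { return string(root) == string(proof) },
	}
	fmt.Println(h.Route(IBCPacket{SrcChain: "chainA", SrcHeight: 10, MsgType: Timeout, Proof: []byte("root")}))
}
```

Keying trusted roots by (src_chain, src_height) reflects why the packet carries both fields: together they select the header whose root the proof must match.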
+
+## Appendix E: Tendermint Header Proofs
+
+**TODO: clean this all up**
+
+This section is still a mess: we need to figure out which formats to use, define go-wire, etc., or just point to the source. More will follow later; we need help here from the Tendermint core team.
+
+In order to prove a merkle root, we must fully define the headers, signatures, and validator information returned from the Tendermint consensus engine, as well as the rules by which to verify a header. We also define here the messages used for creating and removing connections to other blockchains as well as how to handle forks.
+
+**Building Blocks: Header, PubKey, Signature, Commit, ValidatorSet**
+
+-> needs input/support from Tendermint Core team (and go-crypto)
+
+**Registering Chain**
+
+**Updating Header**
+
+**Validator Changes**
+
+**Root of Trust**
+
+As mentioned in the definitions, all proofs are based on an original assumption. The root of trust here is either the genesis block (if it is newer than the unbonding period) or any signed header of the other chain.
+
+When governance establishes a connection between a pair of chains, the respective chains must agree on a root of trust on the counterparty chain. This can be the genesis block on a chain that launches with an IBC channel, or a later block header.
+
+From this signed header, one can check the validator set against the validator hash stored in the header, and then verify that the signatures match. This provides internal consistency and accountability, but if 5 nodes provide you with different headers (e.g. of forks), you must make a subjective decision about which one to trust. This decision should be made by on-chain governance to avoid an exploitable position of trust.
+
+**Verifying Headers**
+
+Once we have a trusted header with a known validator set, we can quickly validate any new header with the same validator set. To validate a new header, simply verify that the validator hash has not changed and that validators holding over 2/3 of the voting power in that set have properly signed a commit for that header. We can skip all intervening headers, as we have complete finality (no forks) and accountability (to punish a double-sign).
+
+This is safe as long as we have a valid signed header from the trusted validator set that is within the unbonding period for staking. In that case, if we were given a false (forked) header, we could use it as proof to slash the stake of all the double-signing validators. This demonstrates the importance of attribution, and it is the same security guarantee as that of any non-validating full node. Even in the presence of some ultra-powerful malicious actors, this makes the cost of creating a fake proof for a header equal to at least one third of all staked tokens, which should be significantly higher than any gain from a false message.
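The header check above can be sketched in Go as follows. Signature verification itself is out of scope: the sketch assumes the caller has already verified each commit signature (for example with go-crypto) and passes in the set of public keys that signed. `validatorSetHash` is a hypothetical stand-in, since this appendix does not yet define Tendermint's real validator-hash encoding.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// Validator is a public key with its voting power.
type Validator struct {
	PubKey []byte
	Power  int64
}

// Header carries only the fields this sketch needs.
type Header struct {
	Height         int64
	ValidatorsHash []byte
}

// validatorSetHash is a hypothetical stand-in for Tendermint's validator
// hash: SHA-256 over each public key and its big-endian power, in order.
func validatorSetHash(vals []Validator) []byte {
	h := sha256.New()
	for _, v := range vals {
		h.Write(v.PubKey)
		var p [8]byte
		binary.BigEndian.PutUint64(p[:], uint64(v.Power))
		h.Write(p[:])
	}
	return h.Sum(nil)
}

// hasQuorum reports whether the signers hold more than 2/3 of the set's power.
func hasQuorum(vals []Validator, signed map[string]bool) bool {
	var total, signedPower int64
	for _, v := range vals {
		total += v.Power
		if signed[string(v.PubKey)] {
			signedPower += v.Power
		}
	}
	return 3*signedPower > 2*total // integer form of signed/total > 2/3
}

// VerifySameSet accepts next if it keeps the trusted validator set and
// more than 2/3 of that set's voting power signed its commit.
func VerifySameSet(trusted []Validator, next Header, signed map[string]bool) error {
	if !bytes.Equal(next.ValidatorsHash, validatorSetHash(trusted)) {
		return fmt.Errorf("height %d: validator set changed", next.Height)
	}
	if !hasQuorum(trusted, signed) {
		return fmt.Errorf("height %d: not enough voting power signed", next.Height)
	}
	return nil
}

func main() {
	vals := []Validator{{[]byte("A"), 40}, {[]byte("B"), 30}, {[]byte("C"), 30}}
	hdr := Header{Height: 11, ValidatorsHash: validatorSetHash(vals)}
	fmt.Println(VerifySameSet(vals, hdr, map[string]bool{"A": true, "B": true})) // 70% signed
	fmt.Println(VerifySameSet(vals, hdr, map[string]bool{"B": true, "C": true})) // 60% signed
}
```

Comparing `3*signed > 2*total` keeps the threshold exact without floating point; note that exactly 2/3 is rejected, since the rule requires over 2/3.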
+
+**Updating the Validator Set**
+
+If the validator hash is different from the trusted one, we must simultaneously verify that the change is valid and use the new set to validate the header. Since the entire validator set is not provided by default when we give a header and commit votes, it must be provided as extra data to the certifier.
+
+A validator change in Tendermint can be securely verified with the following checks:
+
+*   First, that the new header, validators, and signatures are internally consistent:
+    *   The new set of validators matches the validator hash in the new header
+    *   More than 2/3 of the voting power of the new set has signed the new header
+*   Second, that the new header is also valid in the eyes of our trusted set:
+    *   Validators from our trusted set that are also in the new set, holding more than 2/3 of the trusted set's voting power, properly signed a commit to the new header
+
+If both checks pass, we can update to this header and update the trusted validator set, with the same guarantees as above (the ability to slash at least one third of all staked tokens on any false proof).
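The two checks for a validator change can be sketched in Go in the same style. As before, `validatorSetHash` and the pre-verified `signed` set of public keys are assumptions rather than Tendermint APIs, and the quorum test uses "more than 2/3", matching the rule for verifying headers with an unchanged set.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"errors"
	"fmt"
)

// Validator is a public key with its voting power.
type Validator struct {
	PubKey []byte
	Power  int64
}

// Header carries only the fields this sketch needs.
type Header struct {
	Height         int64
	ValidatorsHash []byte
}

// validatorSetHash is a hypothetical stand-in for Tendermint's validator hash.
func validatorSetHash(vals []Validator) []byte {
	h := sha256.New()
	for _, v := range vals {
		h.Write(v.PubKey)
		var p [8]byte
		binary.BigEndian.PutUint64(p[:], uint64(v.Power))
		h.Write(p[:])
	}
	return h.Sum(nil)
}

// quorum reports whether validators in vals that signed and that also belong
// to members hold more than 2/3 of the total power of vals.
func quorum(vals []Validator, signed, members map[string]bool) bool {
	var total, got int64
	for _, v := range vals {
		total += v.Power
		k := string(v.PubKey)
		if signed[k] && members[k] {
			got += v.Power
		}
	}
	return 3*got > 2*total
}

// VerifyValidatorChange applies both checks; signed holds the public keys
// whose commit signatures on hdr were already verified.
func VerifyValidatorChange(trusted, next []Validator, hdr Header, signed map[string]bool) error {
	// First: the new header, validators, and signatures are internally consistent.
	if !bytes.Equal(hdr.ValidatorsHash, validatorSetHash(next)) {
		return errors.New("new validator set does not match header hash")
	}
	members := make(map[string]bool, len(next))
	for _, v := range next {
		members[string(v.PubKey)] = true
	}
	if !quorum(next, signed, members) {
		return errors.New("new set: not enough voting power signed")
	}
	// Second: trusted validators that are also in the new set signed with
	// more than 2/3 of the trusted set's voting power.
	if !quorum(trusted, signed, members) {
		return errors.New("trusted set: not enough voting power signed")
	}
	return nil
}

func main() {
	trusted := []Validator{{[]byte("A"), 40}, {[]byte("B"), 30}, {[]byte("C"), 30}}

	// C is replaced by D; A and B still carry 70% of the trusted power.
	next := []Validator{{[]byte("A"), 40}, {[]byte("B"), 30}, {[]byte("D"), 30}}
	hdr := Header{Height: 20, ValidatorsHash: validatorSetHash(next)}
	fmt.Println(VerifyValidatorChange(trusted, next, hdr, map[string]bool{"A": true, "B": true, "D": true}))

	// A consistent header signed only by validators we never trusted.
	other := []Validator{{[]byte("A"), 10}, {[]byte("D"), 50}, {[]byte("E"), 40}}
	hdr2 := Header{Height: 20, ValidatorsHash: validatorSetHash(other)}
	fmt.Println(VerifyValidatorChange(trusted, other, hdr2, map[string]bool{"D": true, "E": true}))
}
```

The second case in `main` shows why the trusted-set check matters: a set that is entirely unknown to us can produce an internally consistent header, but it cannot gather the trusted set's voting power.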
+
+
--- a/docs/spec/ibc/specification.md
+++ b/docs/spec/ibc/specification.md
@@ -36,13 +36,13 @@ The protocol makes no assumptions of block times or network delays in the transm
    1.  Handling Byzantine Failures
 1.  **[Conclusion](conclusion.md)**

-**[Appendix A: Encoding Libraries](appendix-a.md)**
+**[Appendix A: Encoding Libraries](appendix.md#appendix-a-encoding-libraries)**

-**[Appendix B: IBC Queue Format](appendix-b.md)**
+**[Appendix B: IBC Queue Format](appendix.md#appendix-b-ibc-queue-format)**

-**[Appendix C: Merkle Proof Format](appendix-c.md)**
+**[Appendix C: Merkle Proof Format](appendix.md#appendix-c-merkle-proof-formats)**

-**[Appendix D: Universal IBC Packets](appendix-d.md)**
+**[Appendix D: Universal IBC Packets](appendix.md#appendix-d-universal-ibc-packets)**

-**[Appendix E: Tendermint Header Proofs](appendix-e.md)**
+**[Appendix E: Tendermint Header Proofs](appendix.md#appendix-e-tendermint-header-proofs)**