2018-02-13 09:11:10 -08:00
|
|
|
## 4 Optimizations
|
|
|
|
|
|
|
|
([Back to table of contents](specification.md#contents))
|
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
The above sections describe a secure messaging protocol that can handle all
|
|
|
|
normal situations between two blockchains. It guarantees that all messages are
|
|
|
|
processed exactly once and in order, and provides a mechanism for non-blocking
|
|
|
|
atomic transactions spanning two blockchains. However, to increase efficiency
|
|
|
|
over millions of messages with many possible failure modes on both sides of the
|
|
|
|
connection, we can extend the protocol. These extensions allow us to clean up
|
|
|
|
the receipt queue to avoid state bloat, as well as more gracefully recover from
|
|
|
|
cases where large numbers of messages are not being relayed, or other failure
|
|
|
|
modes in the remote chain.
|
2018-02-13 09:11:10 -08:00
|
|
|
|
|
|
|
### 4.1 Timeouts
|
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
Sometimes it is desirable to have some timeout, an upper limit to how long you
|
|
|
|
will wait for a transaction to be processed before considering it an error. At
|
|
|
|
the same time, this is an obvious attack vector for a double spend, just
|
|
|
|
delaying the relay of the receipt or waiting to send the message in the first
|
|
|
|
place and then relaying it right after the cutoff to take advantage of
|
|
|
|
different local clocks on the two chains.
|
|
|
|
|
|
|
|
One solution to this is to include a timeout in the IBC message itself. When
|
|
|
|
sending it, one can specify a block height or timestamp on the **receiving**
|
|
|
|
chain after which it is no longer valid. If the message is posted before the
|
|
|
|
cutoff, it will be processed normally. If it is posted after that cutoff, it
|
|
|
|
will be a guaranteed error. Note that to make this secure, the timeout must be
|
|
|
|
relative to a condition on the **receiving** chain, and the sending chain must
|
|
|
|
have proof of the state of the receiving chain after the cutoff.
|
|
|
|
|
|
|
|
For a sending chain _A_ and a receiving chain _B_, with _k=(\_, \_, i)_ for
|
|
|
|
_A:q<sub>B.send</sub>_ or _B:q<sub>A.receipt</sub>_ we currently have the
|
|
|
|
following guarantees:
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-13 12:05:23 -08:00
|
|
|
_A:M<sub>k,v,h</sub> =_ ∅ _if message i was not sent before height h_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
_A:M<sub>k,v,h</sub> =_ ∅ _if message i was sent and receipt received
|
|
|
|
before height h (and the receipts for all messages j < i were also handled)_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
_A:M<sub>k,v,h </sub>_ ≠ ∅ _otherwise (message result is not yet
|
|
|
|
processed)_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-13 12:05:23 -08:00
|
|
|
_B:M<sub>k,v,h</sub> =_ ∅ _if message i was not received before height h_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
_B:M<sub>k,v,h </sub>_ ≠ ∅ _if message i was received before height
|
|
|
|
h (and all messages j < i were received)_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
Based on these guarantees, we can make a few modifications of the above
|
|
|
|
protocol to allow us to prove timeouts, by adding some fields to the messages
|
|
|
|
in the send queue, and defining an expired function that returns true iff
|
|
|
|
_h > maxHeight_ or _timestamp(H<sub>h </sub>) > maxTime_.
|
2018-02-13 09:11:10 -08:00
|
|
|
|
|
|
|
_V<sub>send</sub> = (maxHeight, maxTime, type, data)_
|
|
|
|
|
2018-02-13 12:10:30 -08:00
|
|
|
_expired(H<sub>h </sub>,V<sub>send </sub>)_ ⇒ _[true|false]_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
We then update message handling in _IBCreceive_, so it doesn't even call the
|
|
|
|
handler function if the timeout was reached, but rather directly writes and
|
|
|
|
error in the receipt queue:
|
2018-02-13 09:11:10 -08:00
|
|
|
|
|
|
|
_IBCreceive:_
|
2018-02-13 12:05:23 -08:00
|
|
|
* _expired(latestHeader, v)_ ⇒ _push(q<sub>S.receipt </sub>, (None, TimeoutError)),_
|
|
|
|
* _v = (\_, \_, type, data)_ ⇒ _(result, err) := f<sub>type</sub>(data); push(q<sub>S.receipt </sub>, (result, err));_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
and add a new _IBCtimeout_ function to accept tail proofs to demonstrate that
|
|
|
|
the message was not processed at some given header on the recipient chain. This
|
|
|
|
allows the sender chain to assert timeouts locally.
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-13 12:05:23 -08:00
|
|
|
_S:IBCtimeout(A, M<sub>k,v,h</sub>)_ ⇒ _match_
|
|
|
|
* _q<sub>A.send</sub> =_ ∅ ⇒ _Error("unregistered sender"),_
|
|
|
|
* _k = (\_, send, \_)_ ⇒ _Error("must be a receipt"),_
|
2018-02-26 01:48:12 -08:00
|
|
|
* _k = (d, \_, \_) and d_ ≠ _S_ ⇒ _Error("sent to a different
|
|
|
|
chain"),_
|
|
|
|
* _H<sub>h</sub>_ ∉ _T<sub>A</sub>_ ⇒ _Error("must submit header
|
|
|
|
for height h"),_
|
|
|
|
* _not valid(H<sub>h</sub> , M<sub>k,v,h </sub>)_ ⇒ _Error("invalid
|
|
|
|
merkle proof"),_
|
2018-02-13 12:05:23 -08:00
|
|
|
* _k = (S, receipt, tail)_ ⇒ _match_
|
2018-02-26 01:48:12 -08:00
|
|
|
* _tail_ ≥ _head(q<sub>S.send </sub>)_ ⇒ _Error("receipt
|
|
|
|
exists, no timeout proof")_
|
|
|
|
* _not expired(peek(q<sub>S.send </sub>))_ ⇒ _Error("message timeout
|
|
|
|
not yet reached")_
|
|
|
|
* _default_ ⇒ _(\_, \_, type, data) := pop(q<sub>S.send </sub>);
|
|
|
|
rollback<sub>type</sub>(data); Success_
|
2018-02-13 12:05:23 -08:00
|
|
|
* _default_ ⇒ _Error("must be a tail proof")_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
|
|
|
which processes timeouts in order, and adds one more condition to the queues:
|
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
_A:M<sub>k,v,h</sub> =_ ∅ _if message i was sent and timeout proven
|
|
|
|
before height h (and the receipts for all messages j < i were also handled)_
|
|
|
|
|
|
|
|
Now chain A can rollback all transactions that were blocked by this flood of
|
|
|
|
unrelayed messages, without waiting for chain B to process them and return a
|
|
|
|
receipt. Adding reasonable timeouts to all packets allows us to gracefully
|
|
|
|
handle any errors with the IBC relay processes, or a flood of unrelayed "spam"
|
|
|
|
IBC packets. If a blockchain requires a timeout on all messages, and imposes
|
|
|
|
some reasonable upper limit (or just assigns it automatically), we can
|
|
|
|
guarantee that if message _i_ is not processed by the upper limit of the
|
|
|
|
timeout period, then all previous messages must also have either been processed
|
|
|
|
or reached the timeout period.
|
|
|
|
|
|
|
|
Note that in order to avoid any possible "double-spend" attacks, the timeout
|
|
|
|
algorithm requires that the destination chain is running and reachable. One can
|
|
|
|
prove nothing in a complete network partition, and must wait to connect; the
|
|
|
|
timeout must be proven on the recipient chain, not simply the absence of a
|
|
|
|
response on the sending chain.
|
2018-02-13 09:11:10 -08:00
|
|
|
|
|
|
|
### 4.2 Clean up
|
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
While we clean up the _send queue_ upon getting a receipt, if left to run
|
|
|
|
indefinitely, the _receipt queues_ could grow without limit and create a major
|
|
|
|
storage requirement for the chains. However, we must not delete receipts until
|
|
|
|
they have been proven to be processed by the sending chain, or we lose
|
|
|
|
important information and sacrifice reliability.
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
The observant reader may also notice, that when we perform the timeout on the
|
|
|
|
sending chain, we do not update the _receipt queue_ on the receiving chain, and
|
|
|
|
now it is blocked waiting for a message _i_, which **no longer exists** on the
|
|
|
|
sending chain. We can update the guarantees of the receipt queue as follows to
|
|
|
|
allow us to handle both:
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-13 12:05:23 -08:00
|
|
|
_B:M<sub>k,v,h</sub> =_ ∅ _if message i was not received before height h_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
_B:M<sub>k,v,h</sub> =_ ∅ _if message i was provably resolved on the
|
|
|
|
sending chain before height h_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
_B:M<sub>k,v,h </sub>_ ≠ ∅ _otherwise (if message i was processed
|
|
|
|
before height h, and no ack of receipt from the sending chain)_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
Consider a connection where many messages have been sent, and their receipts
|
|
|
|
processed on the sending chain, either explicitly or through a timeout. We wish
|
|
|
|
to quickly advance over all the processed messages, either for a normal
|
|
|
|
cleanup, or to prepare the queue for normal use again after timeouts.
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
Through the definition of the send queue above, we see that all messages
|
|
|
|
_i < head_ have been fully processed, and all messages _head <= i < tail_ are
|
|
|
|
awaiting processing. By proving a much advanced _head_ of the _send queue_, we
|
|
|
|
can demonstrate that the sending chain already handled all messages. Thus, we
|
|
|
|
can safely advance our local _receipt queue_ to the new head of the remote
|
|
|
|
_send queue_.
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-13 12:05:23 -08:00
|
|
|
_S:IBCcleanup(A, M<sub>k,v,h</sub>)_ ⇒ _match_
|
|
|
|
* _q<sub>A.receipt</sub> =_ ∅ ⇒ _Error("unknown sender"),_
|
|
|
|
* _k = (\_, send, \_)_ ⇒ _Error("must be for the send queue"),_
|
2018-02-26 01:48:12 -08:00
|
|
|
* _k = (d, \_, \_) and d_ ≠ _S_ ⇒ _Error("sent to a different
|
|
|
|
chain"),_
|
|
|
|
* _k_ ≠ _(\_, \_, head)_ ⇒ _Error("Need a proof of the head of
|
|
|
|
the queue"),_
|
|
|
|
* _H<sub>h</sub>_ ∉ _T<sub>A</sub>_ ⇒ _Error("must submit header
|
|
|
|
for height h"),_
|
|
|
|
* _not valid(H<sub>h</sub> ,M<sub>k,v,h </sub>)_ ⇒ _Error("invalid
|
|
|
|
merkle proof"),_
|
2018-02-13 12:05:23 -08:00
|
|
|
* _head := v_ ⇒ _match_
|
2018-02-26 01:48:12 -08:00
|
|
|
* _head <= head(q<sub>A.receipt</sub>)_ ⇒ _Error("cleanup must go
|
|
|
|
forward"),_
|
2018-02-13 12:05:23 -08:00
|
|
|
* _default_ ⇒ _advance(q<sub>A.receipt </sub>, head); Success_
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
This allows us to invoke the _IBCcleanup_ function to resolve all outstanding
|
|
|
|
messages up to and including _head_ with one merkle proof. Note that this
|
|
|
|
handles both recovering from a blocked queue after timeouts, as well as a
|
|
|
|
routine cleanup method to recover space. In the cleanup scenario, we assume
|
|
|
|
that there may also be a number of messages that have been processed by the
|
|
|
|
receiving chain, but not yet posted to the sending chain,
|
|
|
|
_tail(B:q<sub>A.reciept </sub>) > head(A:q<sub>B.send </sub>)_. As such, the
|
|
|
|
_advance_ function must not modify any messages between the head and the tail.
|
2018-02-13 09:11:10 -08:00
|
|
|
|
2018-02-13 12:05:23 -08:00
|
|
|
![Cleaning up Packets](images/CleanUp.png)
|
2018-02-13 09:11:10 -08:00
|
|
|
|
|
|
|
|
|
|
|
### 4.3 Handling Byzantine Failures
|
|
|
|
|
2018-02-26 01:48:12 -08:00
|
|
|
While every message is guaranteed reliable in the face of malicious nodes or
|
|
|
|
relays, all guarantees break down when the entire blockchain on the other end
|
|
|
|
of the connection exhibits byzantine faults. These can be in two forms:
|
|
|
|
|
|
|
|
* failures of the consensus mechanism (reversing "final" blocks)
|
|
|
|
* failure at the application level (not performing the action defined by the
|
|
|
|
message).
|
|
|
|
|
|
|
|
The IBC protocol can only detect byzantine faults at the consensus level, and
|
|
|
|
is designed to halt with an error upon detecting any such fault. That is, if it
|
|
|
|
ever sees two different headers for the same height (or any evidence that
|
|
|
|
headers belong to different forks), then it must freeze the connection
|
|
|
|
immediately. The resolution of the fault must be handled by the blockchain
|
|
|
|
governance, as this is a serious incident and cannot be predefined.
|
|
|
|
|
|
|
|
If there is a big divide in the remote chain and they split eg. 60-40 as to the
|
|
|
|
direction of the chain, then the light-client protocol will refuses to follow
|
|
|
|
either fork. If both sides declare a hard fork and continue with new validator
|
|
|
|
sets that are not compatible with the consensus engine (they don't have ⅔
|
|
|
|
support from the previous block), then users will have to manually tell their
|
|
|
|
local client which chain to follow (or fork and follow both with different IDs).
|
|
|
|
|
|
|
|
The IBC protocol doesn't have the option to follow both chains as the queue and
|
|
|
|
associated state must map to exactly one remote chain. In a fork, the chain can
|
|
|
|
continue the connection with one fork, and optionally make a fresh connection
|
|
|
|
with the other fork (which will also have to adjust internally to wipe its view
|
|
|
|
of the connection clean).
|
|
|
|
|
|
|
|
The other major byzantine action is at the application level. Let us assume
|
|
|
|
messages represent transfer of value. If chain A sends a message with X tokens
|
|
|
|
to chain B, then it promises to remove X tokens from the local supply. And if
|
|
|
|
chain B handles this message with a success code, it promises to credit X
|
|
|
|
tokens to the account mentioned in the message. What if A isn't actually
|
|
|
|
removing tokens from the supply, or if B is not actually crediting accounts?
|
|
|
|
|
|
|
|
Such application level issues cannot be proven in a generic sense, but must be
|
|
|
|
handled individually by each application. The activity should be provable in
|
|
|
|
some manner (as it is all in an auditable blockchain), but there are too many
|
|
|
|
failure modes to attempt to enumerate, so we rely on the vigilance of the
|
|
|
|
participants in the extremely rare case of a rogue blockchain. Of course, this
|
|
|
|
misbehavior is provable and can negatively impact the value of the offending
|
|
|
|
chain, providing economic incentives for any normal chain not to run malicious
|
|
|
|
applications over IBC.
|