3 Messaging Queue
Messaging in distributed systems is a deeply researched field and a primitive upon which many other systems are built. We model communication as asynchronous message passing and make no timing assumptions about the communication channels. By doing this, we allow each zone to move at its own speed, unblocked by any other zone, yet able to communicate as fast as the network allows at that moment.
Another benefit of using message passing as our primitive is that the receiver decides how to act upon the incoming message. Just because one zone sends a message over an established IBC connection doesn't mean the other zone must execute the requested action. Each zone can apply its own business logic upon receiving the message to decide whether to accept or reject it. To maintain consistency, both sides need only agree on the proper state transitions associated with accepting or rejecting a message.
This encapsulation is difficult, if not impossible, to achieve in a shared-state scenario. Message passing allows each zone to ensure its security and autonomy, while simultaneously allowing the different systems to work as one whole. This can be seen as an analogue to a microservices architecture, but across organizational boundaries.
To build useful algorithms upon a provable, asynchronous messaging primitive, we introduce a reliable messaging queue (hereafter simply a queue), typical in asynchronous message passing, which allows us to guarantee causal ordering[5] and avoid blocking.
Causal ordering means that if x is causally before y on chain A, it must also be causally before y on chain B. Many events may happen concurrently (unrelated transactions on two different blockchains) with no causality relation, but every transaction on the same chain has a clear causality relation (the same as its order in the blockchain).
Message passing implies a causal ordering over multiple chains, which can be important for reasoning about the system. Given chains A and B, where x → y means x is causally before y, and a ⇒ b means a implies b:
A:send(msg_i) → B:receive(msg_i)
B:receive(msg_i) → A:receipt(msg_i)
A:send(msg_i) → A:send(msg_{i+1})
x → A:send(msg_i) ⇒ x → B:receive(msg_i)
y → B:receive(msg_i) ⇒ y → A:receipt(msg_i)
(https://en.wikipedia.org/wiki/Vector_clock)
In this section, we define an efficient implementation of a secure, reliable messaging queue.
3.1 Merkle Proofs for Queues
Given the three proofs we have available, we make use of the most flexible one, M_{k,v,h}, to provide proofs for a message queue. To do so, we must define a unique, deterministic, and predictable key in the merkle store for each message in the queue. We also specify a well-defined format for the content of each message in the queue, which can be parsed by all chains participating in IBC. The key format and queue ordering are conceptually explained here. The binary encoding format can be found in Appendix C.
We can visualize a queue as a slice pointing into an infinite array. It maintains a head and a tail pointing to two indexes, such that there is data for every index where head <= index < tail. Data is pushed to the tail and popped from the head. Another method, advance, is introduced to pop all messages up to (but not including) index i, and is useful for cleanup:
init: q_head = q_tail = 0
peek ⇒ m: if q_head = q_tail { return None } else { return q[q_head] }
pop ⇒ m: if q_head = q_tail { return None } else { q_head++; return q[q_head-1] }
push(m): q[q_tail] = m; q_tail++
advance(i): q_head = i; q_tail = max(q_tail, i)
head ⇒ i: q_head
tail ⇒ i: q_tail
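As a concrete reference, the following Go sketch mirrors these operations with an in-memory map standing in for the merkle store. The type and method names are illustrative, not part of the protocol, and Advance here never moves the head backwards.

```go
package main

import "fmt"

// Queue is an in-memory sketch of the operations above; a real
// implementation would back head, tail, and the indexed slots with
// keys in the chain's merkle store.
type Queue struct {
	head, tail uint64
	items      map[uint64][]byte
}

func NewQueue() *Queue {
	return &Queue{items: make(map[uint64][]byte)}
}

// Peek returns the message at the head without removing it.
func (q *Queue) Peek() ([]byte, bool) {
	if q.head == q.tail {
		return nil, false
	}
	return q.items[q.head], true
}

// Pop removes and returns the message at the head.
func (q *Queue) Pop() ([]byte, bool) {
	if q.head == q.tail {
		return nil, false
	}
	m := q.items[q.head]
	delete(q.items, q.head)
	q.head++
	return m, true
}

// Push appends a message at the tail.
func (q *Queue) Push(m []byte) {
	q.items[q.tail] = m
	q.tail++
}

// Advance pops every message below index i; unlike advance(i) above,
// it never moves the head backwards.
func (q *Queue) Advance(i uint64) {
	for ; q.head < i; q.head++ {
		delete(q.items, q.head)
	}
	if q.tail < i {
		q.tail = i
	}
}

func main() {
	q := NewQueue()
	q.Push([]byte("packet-0"))
	q.Push([]byte("packet-1"))
	m, _ := q.Pop()
	fmt.Printf("%s head=%d tail=%d\n", m, q.head, q.tail) // packet-0 head=1 tail=2
}
```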
Based upon this required functionality, we define a set of keys to be stored in the merkle tree, which allows us to efficiently implement and prove any of the above queries.
Key: (queue name, [head|tail|index])
The index is stored as a fixed-length unsigned integer in big-endian format, so that the lexicographical order of the byte representation of the keys matches the numeric order of the sequence numbers. This allows us to quickly iterate over the queue, as well as prove the content of a packet (or lack of a packet) at a given sequence. head and tail are two special constants that each store an integer index, and are chosen such that their serialization cannot collide with any possible index.
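A minimal sketch of this key layout, assuming an 8-byte index and single-byte suffixes for the head and tail constants (both choices are illustrative; the normative encoding is in Appendix C):

```go
package ibc

import "encoding/binary"

// indexKey appends the slot index to a queue prefix as a fixed-length
// big-endian integer, so the lexicographic order of the raw key bytes
// matches the numeric order of the indexes and the queue can be
// iterated with a simple range scan.
func indexKey(queuePrefix []byte, index uint64) []byte {
	buf := make([]byte, 8)
	binary.BigEndian.PutUint64(buf, index)
	return append(append([]byte{}, queuePrefix...), buf...)
}

// headKey and tailKey use short, fixed suffixes under the same prefix.
// Because they are shorter than the 8-byte index encoding, they never
// collide with any index key of the same queue (queue prefixes are
// assumed to live in the protected subspaces required below).
func headKey(queuePrefix []byte) []byte {
	return append(append([]byte{}, queuePrefix...), 'h')
}

func tailKey(queuePrefix []byte) []byte {
	return append(append([]byte{}, queuePrefix...), 't')
}
```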
A message queue is simply a set of serialized packets stored at predefined keys in a merkle store, which can produce proofs for any key. Once a packet is written it must be immutable (except for being deleted when popped from the queue). That is, if a value v is written at key k, then every valid proof M_{k,v,h} must refer to the same v. This property is essential to safely process asynchronous messages.
Every IBC implementation must provide a protected subspace of the merkle store for each queue, which cannot be affected by other modules.
3.2 Naming Queues
As mentioned above, in order for the receiver to unambiguously interpret the merkle proofs, we need a unique, deterministic, and predictable key in the merkle store for each message in the queue. We explained how the indexes are generated to provide each message in a queue a unique key, and mentioned the need for a unique name for each queue.
The queue name must be unambiguously associated with a given connection to another chain, so an observer can prove if a message was intended for chain A or chain B. In order to do so, upon registration of a connection with a remote chain, we create two queues with different names (prefixes).
- ibc:<chain A>:send - all outgoing packets destined to chain A
- ibc:<chain A>:receipt - the results of executing the packets received from chain A
These two queues have different purposes and store messages of different types. By parsing the key of a merkle proof, a recipient can uniquely identify which queue, if any, this message belongs to. We now define k = (remote id, [send|receipt], index). This tuple is used to route and verify every message, before the contents of the packet are processed by the appropriate application logic.
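A small sketch of how such names could be built and parsed, assuming chain ids contain no ":" and using the "ibc:<remote id>:send|receipt" layout from above (the normative serialization is defined in Appendix C):

```go
package ibc

import (
	"fmt"
	"strings"
)

// sendQueueName and receiptQueueName build the per-connection queue
// names described above. This assumes chain ids contain no ':'.
func sendQueueName(remoteID string) string    { return "ibc:" + remoteID + ":send" }
func receiptQueueName(remoteID string) string { return "ibc:" + remoteID + ":receipt" }

// parseQueueName recovers (remote id, send|receipt) from the queue part
// of a proven key, which is enough to decide how, if at all, a message
// should be routed before its contents are examined.
func parseQueueName(name string) (remoteID, kind string, err error) {
	parts := strings.Split(name, ":")
	if len(parts) != 3 || parts[0] != "ibc" || (parts[2] != "send" && parts[2] != "receipt") {
		return "", "", fmt.Errorf("not an IBC queue name: %q", name)
	}
	return parts[1], parts[2], nil
}
```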
3.3 Message Contents
Up to this point, we have focused on the semantics of the message key, and how we can produce a unique identifier for every possible message in every possible connection. The actual data written at that location has been left as an opaque blob, but by providing some structure to the messages, we can enable more functionality.
We define every message in a send queue to consist of a well-known type and opaque data. The IBC protocol relies on the type for routing, and lets the appropriate module process the data as it sees fit. The receipt queue stores whether execution resulted in an error, an optional error code, and an optional return value. We use the same index as the received message, so that the results of A:q_{B.send}[i] are stored at B:q_{A.receipt}[i]. (Read: the message at index i in the send queue for chain B, as stored on chain A.)
V_send = (type, data)
V_receipt = (result, [success|error code])
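In Go, these two message shapes might be represented roughly as follows; the field names are illustrative, and the binary encoding is defined in Appendix C.

```go
package ibc

// Packet mirrors V_send = (type, data): a routing type plus an opaque
// payload that only the handling module interprets.
type Packet struct {
	Type string
	Data []byte
}

// Receipt mirrors V_receipt = (result, [success|error code]) and is
// stored at the same index as the packet it answers.
type Receipt struct {
	Result []byte // optional return value
	Err    string // empty on success, otherwise an error code
}
```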
3.4 Sending a Message
A proper implementation of IBC requires all relevant state to be encapsulated, so that other modules can only interact with it via a fixed API (to be defined in the next sections) rather than directly mutating internal state. This allows the IBC module to provide security guarantees.
Sending an IBC packet involves an application module calling the send method of the IBC module with a packet and a destination chain id. The IBC module must ensure that the destination chain was already properly registered, and that the calling module has permission to write this packet. If so, the IBC module simply pushes the packet to the tail of the send queue, which enables all the proofs described above.
Permission to write a given packet type can be granted per module, so each module can maintain its own application-level invariants. Thus, the "coin" module can maintain the constant supply of tokens, while another module can maintain its own invariants, without IBC messages providing a means to escape their encapsulation. The IBC module must associate every supported message type with a particular handler (f_type) and return an error for unsupported types.
(IBCsend(D, type, data) ⇒ Success) ⇒ push(q_{D.send}, V_send{type, data})
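A minimal sketch of this send path, assuming an in-memory slice in place of the merkle-store queue and a simple per-type permission table (both are illustrative simplifications):

```go
package ibc

import (
	"errors"
	"fmt"
)

// Packet mirrors V_send = (type, data).
type Packet struct {
	Type string
	Data []byte
}

// Module is a minimal sketch of the send path; field and method names
// are illustrative. A real implementation would append to the
// merkle-store queue keys described in 3.1 rather than to a slice.
type Module struct {
	sendQueues  map[string][]Packet        // destination chain id -> q_{D.send}
	permissions map[string]map[string]bool // packet type -> calling module -> allowed
}

// Send checks that the destination connection is registered and that the
// caller may emit this packet type, then pushes to the tail of q_{D.send}.
func (m *Module) Send(caller, dest string, p Packet) error {
	if _, ok := m.sendQueues[dest]; !ok {
		return errors.New("unregistered destination chain")
	}
	if !m.permissions[p.Type][caller] {
		return fmt.Errorf("module %q may not send packets of type %q", caller, p.Type)
	}
	m.sendQueues[dest] = append(m.sendQueues[dest], p)
	return nil
}
```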
We also consider how a given blockchain A is expected to receive the packet from a source chain S with a merkle proof, given the current set of trusted headers for that chain, T_S:
A:IBCreceive(S, M_{k,v,h}) ⇒ match
- q_{S.receipt} = ∅ ⇒ Error("unregistered sender"),
- k = (_, receipt, _) ⇒ Error("must be a send"),
- k = (d, _, _) and d ≠ A ⇒ Error("sent to a different chain"),
- k = (_, send, i) and head(q_{S.receipt}) ≠ i ⇒ Error("out of order"),
- H_h ∉ T_S ⇒ Error("must submit header for height h"),
- valid(H_h, M_{k,v,h}) = false ⇒ Error("invalid merkle proof"),
- v = (type, data) ⇒ (result, err) := f_type(data); push(q_{S.receipt}, (result, err)); Success
Note that this requires not only a valid proof, but also that the proper header and all prior messages have been submitted beforehand. The call returns success upon accepting a proper message, even if executing the message returned an error (which must then be relayed to the sender).
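The sketch below walks through the same sequence of checks in Go. All type names, fields, and the verify callback are illustrative stand-ins for the objects above (Key for k, Proof for M_{k,v,h}, Handler for f_type); only the order and nature of the checks is meant to match.

```go
package ibc

import "errors"

// Key stands in for k = (destination chain, queue, index).
type Key struct {
	Dest  string // chain the packet was addressed to
	Queue string // "send" or "receipt"
	Index uint64
}

// Packet mirrors V_send; Receipt mirrors V_receipt.
type Packet struct {
	Type string
	Data []byte
}

type Receipt struct {
	Result []byte
	Err    string
}

// Proof is an illustrative stand-in for M_{k,v,h}.
type Proof struct {
	Key    Key
	Packet Packet
	Height uint64
}

// Handler plays the role of f_type.
type Handler func(data []byte) (result []byte, err error)

type Chain struct {
	ID             string
	trustedHeights map[string]map[uint64]bool        // sender -> heights with a known header H_h
	receiptQueues  map[string][]Receipt              // sender -> q_{S.receipt}
	handlers       map[string]Handler                // packet type -> f_type
	verify         func(sender string, p Proof) bool // merkle verification of M_{k,v,h} against H_h
}

// Receive mirrors A:IBCreceive(S, M_{k,v,h}): every check concerns the
// key, ordering, header, and proof; only then is the packet handed to
// f_type. It returns nil (Success) even when f_type itself errors,
// because that error is recorded in the receipt and relayed back.
func (c *Chain) Receive(sender string, p Proof) error {
	rq, ok := c.receiptQueues[sender]
	if !ok {
		return errors.New("unregistered sender")
	}
	if p.Key.Queue != "send" {
		return errors.New("must be a send")
	}
	if p.Key.Dest != c.ID {
		return errors.New("sent to a different chain")
	}
	// In-order check: i must be the next receipt index for this sender
	// (this sketch never prunes receipts, so the next index is the length).
	if uint64(len(rq)) != p.Key.Index {
		return errors.New("out of order")
	}
	if !c.trustedHeights[sender][p.Height] {
		return errors.New("must submit header for height h")
	}
	if !c.verify(sender, p) {
		return errors.New("invalid merkle proof")
	}
	handler, ok := c.handlers[p.Packet.Type]
	if !ok {
		return errors.New("unsupported packet type") // sketch-level guard, not part of the match above
	}
	result, err := handler(p.Packet.Data)
	rcpt := Receipt{Result: result}
	if err != nil {
		rcpt.Err = err.Error()
	}
	c.receiptQueues[sender] = append(rq, rcpt)
	return nil // Success, even if the handler returned an application error
}
```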
3.5 Receipts
When we wish to create a transaction that atomically commits or rolls back across two chains, we must look at the receipts from sending the original message. For example, if I want to send tokens from Alice on chain A to Bob on chain B, chain A must decrement Alice's account if and only if Bob's account was incremented on chain B. We can achieve that by storing a protected intermediate state on chain A, which is then committed or rolled back based on the result of executing the transaction on chain B.
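As an illustration of such a protected intermediate state, here is a rough escrow sketch for the token example; all names and fields are hypothetical, and the wiring of commit and rollback to receipts is shown below.

```go
package ibc

import "errors"

// escrow is a sketch of the "protected intermediate state" on chain A:
// the sender's tokens are locked when the packet is sent, and released
// (committed) or refunded (rolled back) only once the receipt from
// chain B arrives.
type escrow struct {
	balances map[string]uint64 // account -> spendable balance
	pending  map[uint64]uint64 // packet index -> escrowed amount
	senders  map[uint64]string // packet index -> originating account
}

func newEscrow() *escrow {
	return &escrow{
		balances: map[string]uint64{},
		pending:  map[uint64]uint64{},
		senders:  map[uint64]string{},
	}
}

// lock is called when the transfer packet is pushed to the send queue.
func (e *escrow) lock(index uint64, from string, amount uint64) error {
	if e.balances[from] < amount {
		return errors.New("insufficient funds")
	}
	e.balances[from] -= amount
	e.pending[index] = amount
	e.senders[index] = from
	return nil
}

// commit: chain B credited the recipient, so the escrowed tokens are retired here.
func (e *escrow) commit(index uint64) {
	delete(e.pending, index)
	delete(e.senders, index)
}

// rollback: chain B reported an error, so the sender is refunded.
func (e *escrow) rollback(index uint64) {
	e.balances[e.senders[index]] += e.pending[index]
	delete(e.pending, index)
	delete(e.senders, index)
}
```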
To do this requires that we not only provably send a message from chain A to chain B, but also provably return the result of that message (the receipt) from chain B to chain A. As noted above in the implementation of IBCreceive, if a valid IBC message was sent from A to B, then the result of executing it, even if it was an error, is stored in B:q_{A.receipt}. Since the receipts are stored in a queue with the same key construction as the send queue, we can generate the same set of proofs for them, and perform a similar sequence of steps to handle a receipt coming back to S for a message previously sent to A:
S:IBCreceipt(A, M_{k,v,h}) ⇒ match
- q_{A.send} = ∅ ⇒ Error("unregistered sender"),
- k = (_, send, _) ⇒ Error("must be a receipt"),
- k = (d, _, _) and d ≠ S ⇒ Error("sent to a different chain"),
- H_h ∉ T_A ⇒ Error("must submit header for height h"),
- not valid(H_h, M_{k,v,h}) ⇒ Error("invalid merkle proof"),
- k = (_, receipt, head|tail) ⇒ Error("only accepts message proofs"),
- k = (_, receipt, i) and head(q_{S.send}) ≠ i ⇒ Error("out of order"),
- v = (_, error) ⇒ (type, data) := pop(q_{S.send}); rollback_type(data); Success
- v = (res, success) ⇒ (type, data) := pop(q_{S.send}); commit_type(data, res); Success
This enforces that receipts are processed in order, allowing the application to rely on some basic ordering assumptions. It also removes the message from the send queue, as there is now proof that it was processed on the receiving chain and no further need to store this information.
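A sketch of just this commit/rollback dispatch, assuming the registration, key, ordering, header, and proof checks have already passed exactly as in IBCreceive; the handler maps stand in for commit_type and rollback_type.

```go
package ibc

import "errors"

// CommitFn and RollbackFn stand in for commit_type and rollback_type.
type CommitFn func(data, result []byte)
type RollbackFn func(data []byte)

type ReceiptHandlers struct {
	Commit   map[string]CommitFn
	Rollback map[string]RollbackFn
}

// SentPacket and ReceiptMsg mirror V_send and V_receipt.
type SentPacket struct {
	Type string
	Data []byte
}

type ReceiptMsg struct {
	Result []byte
	Err    string // empty means success
}

// applyReceipt consumes the receipt for the packet at the head of the
// local send queue: rollback on error, commit on success, then pop the
// packet (returning the shortened queue).
func applyReceipt(sendQueue []SentPacket, r ReceiptMsg, h ReceiptHandlers) ([]SentPacket, error) {
	if len(sendQueue) == 0 {
		return sendQueue, errors.New("no packet awaiting a receipt")
	}
	p := sendQueue[0]
	if r.Err != "" {
		if rb := h.Rollback[p.Type]; rb != nil {
			rb(p.Data) // undo the provisional state change on the sender
		}
	} else if cm := h.Commit[p.Type]; cm != nil {
		cm(p.Data, r.Result) // finalize it
	}
	return sendQueue[1:], nil // pop(q_{S.send})
}
```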
3.6 Relay Process
The blockchain itself only records the intention to send the given message to the recipient chain; it doesn't make any network connections, as that would add unbounded delays and non-determinism into the state machine. We define the concept of a relay process that connects two chains by querying one chain for all proofs needed to prove outgoing messages and submitting these proofs to the recipient chain.
The relay process must have access to accounts on both chains with sufficient balance to pay for transaction fees but needs no other permissions. Many relay processes may run in parallel without violating any safety consideration. However, they will consume unnecessary fees if they submit the same proof multiple times, so some minimal coordination is ideal.
As an example, here is a naive algorithm for relaying send messages from A to B, without error handling. We must also concurrently run the relay of receipts from B back to A, in order to complete the cycle. Note that all reads of variables belonging to a chain imply queries and all function calls imply submitting a transaction to the blockchain.
while true
  pending := tail(A:q_{B.send})
  received := tail(B:q_{A.receipt})
  if pending > received
    U_h := A:latestHeader
    B:updateHeader(U_h)
    for i := received...pending
      k := (B, send, i)
      packet := A:M_{k,v,h}
      B:IBCreceive(A, packet)
  sleep(desiredLatency)
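The same loop, sketched in Go against two hypothetical chain clients; the interfaces and method names are assumptions for illustration, with the read-only methods mapping to queries and the mutating calls mapping to transactions:

```go
package relay

import "time"

// Proof and Header are opaque stand-ins for M_{k,v,h} and U_h.
type Proof []byte
type Header []byte

// SourceChain and DestChain are hypothetical client interfaces.
type SourceChain interface {
	SendTail(dest string) (uint64, error)             // tail(A:q_{B.send})
	LatestHeader() (Header, uint64, error)            // latest header and its height
	PacketProof(dest string, i uint64) (Proof, error) // M_{k,v,h} for k = (dest, send, i)
}

type DestChain interface {
	ReceiptTail(src string) (uint64, error) // how many packets were already received
	UpdateHeader(Header) error
	IBCReceive(src string, p Proof) error
}

// Relay runs the naive loop above: when packets are pending, submit one
// header and then one proof per packet, in order. Error handling is
// reduced to abandoning the current round and retrying after a sleep.
func Relay(a SourceChain, b DestChain, aID, bID string, latency time.Duration) {
	for {
		pending, errA := a.SendTail(bID)
		received, errB := b.ReceiptTail(aID)
		if errA == nil && errB == nil && pending > received {
			if hdr, _, err := a.LatestHeader(); err == nil && b.UpdateHeader(hdr) == nil {
				for i := received; i < pending; i++ {
					proof, err := a.PacketProof(bID, i)
					if err != nil || b.IBCReceive(aID, proof) != nil {
						break // retry from the current head on the next round
					}
				}
			}
		}
		time.Sleep(latency)
	}
}
```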
Note that updating a header is a costly transaction compared to posting a merkle proof for a known header. Thus, a process could wait until many messages are pending, then submit one header along with multiple merkle proofs, rather than a separate header for each message. This decreases total computation cost (and fees) at the price of additional latency and is a trade-off each relay can dynamically adjust.
In the presence of multiple concurrent relays, any given relay can perform local optimizations to minimize the number of headers it submits, but remember the frequency of header submissions defines the latency of the packet transfer.
Indeed, it is ideal if each user that initiates the creation of an IBC packet also relays it to the recipient chain. The only constraint is that the relay must be able to pay the appropriate fees on the destination chain. However, in order to avoid bottlenecks, a group may sponsor an account to pay fees for a public relayer that moves all unrelayed packets (perhaps with a high latency).