Wormhole event data persistence design doc

Change-Id: I676831a71b05d63854dda5b0fe8a2020f87cbad3
jschuldt 2021-06-15 14:17:40 -05:00 committed by Justin Schuldt
parent 9bc408ca19
commit bb64648f39
3 changed files with 97 additions and 0 deletions

@ -0,0 +1,41 @@
## Wormhole event BigTable schema
### Row Keys
Row keys contain the MessageID, delimited by colons, like so: `EmitterChain:EmitterAddress:Sequence`.
BigTable indexes only the row key, so lookups must go through it (by exact key, key prefix, or key range). You cannot look rows up by the value of a column; you may, however, filter the rows that a read returns by column value.
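As a rough illustration of the row key layout, the sketch below builds and parses `EmitterChain:EmitterAddress:Sequence` keys and mimics a prefix scan over an in-memory sorted list. It uses no BigTable client; `MessageID`, `make_row_key`, `parse_row_key`, and `scan_prefix` are hypothetical names, and the hex-string address encoding is an assumption.

```python
from bisect import bisect_left
from typing import List, NamedTuple


class MessageID(NamedTuple):
    emitter_chain: int
    emitter_address: str  # assumed to be a hex string without colons
    sequence: int


def make_row_key(msg: MessageID) -> str:
    """Join the MessageID parts with colons: EmitterChain:EmitterAddress:Sequence."""
    return f"{msg.emitter_chain}:{msg.emitter_address}:{msg.sequence}"


def parse_row_key(row_key: str) -> MessageID:
    """Split a row key back into its three MessageID parts."""
    chain, address, sequence = row_key.split(":")
    return MessageID(int(chain), address, int(sequence))


def scan_prefix(sorted_keys: List[str], prefix: str) -> List[str]:
    """Return the keys that start with prefix, mimicking a row key prefix scan.

    The keys are sorted, so all matches form one contiguous run.
    """
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches
```

A prefix such as `"2:0xabc:"` would select every message from one emitter. Because keys sort as strings, an unpadded sequence of `10` sorts before `2`; whether to zero-pad the sequence is a design choice this sketch leaves open.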
### Column Families
BigTable requires every column to belong to a "column family". Families group columns that store related data. Grouping columns makes reads more efficient, as you may specify which families you want returned.
The column families listed below represent data unique to a phase of the attestation lifecycle.
- `MessagePublication` holds data about a user's interaction with a Wormhole contract. Contains data from the Guardian's VAA struct.
- `Signatures` holds observed signatures from Guardians within a GuardianSet. Holds signatures independent of index within GuardianSet. This column family will provide an account of "which Guardians observed X transaction, and when?".
- `VAAState` records incremental updates toward Guardian consensus. The VAAState column family holds the progression of signatures of a GuardianSet. Each update to the Signatures list of the VAA struct is recorded. This column family will provide an account of "which Guardians contributed to reaching quorum?".
- `QuorumState` stores the signed VAA once quorum is reached.
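To make the family grouping concrete, here is a minimal in-memory sketch of one row and a read restricted to selected families. It is not BigTable client code; `Row`, `read_families`, and every value in `example_row` are hypothetical placeholders.

```python
from typing import Dict, Iterable

# family -> column qualifier -> value (a real cell also carries a timestamp)
Row = Dict[str, Dict[str, bytes]]


def read_families(row: Row, families: Iterable[str]) -> Row:
    """Return a copy of the row containing only the requested column families."""
    wanted = set(families)
    return {
        family: dict(columns)
        for family, columns in row.items()
        if family in wanted
    }


# Placeholder data for illustration only.
example_row: Row = {
    "MessagePublication": {"Version": b"1", "Nonce": b"42"},
    "Signatures": {"guardian-address-placeholder": b"signature-bytes"},
    "QuorumState": {"SignedVAA": b"signed-vaa-bytes"},
}
```

A reader that only needs the final signed VAA could request `QuorumState` alone and skip the other families.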
### Column Qualifiers
Each column qualifier below is prefixed with its column family.
- `MessagePublication:Version` Version of the VAA schema.
- `MessagePublication:GuardianSetIndex` The index of the active Guardian set.
- `MessagePublication:Timestamp` Timestamp when the VAA was created by the Guardian.
- `MessagePublication:Nonce` Nonce of the user's transaction.
- `MessagePublication:Sequence` Sequence from the interaction with the Wormhole contract.
- `MessagePublication:EmitterChain` The chain the message was emitted on.
- `MessagePublication:EmitterAddress` The address of the contract that emitted the message.
- `MessagePublication:InitiatingTxID` The transaction identifier of the user's interaction with the contract.
- `MessagePublication:Payload` The payload of the user's message.
- `Signatures:{GuardianAddress}` This column qualifier will be the address of the Guardian, and the data stored here will be the signature broadcast by the Guardian. There will be a column in this family for each Guardian address that appears in a Guardian set. The column qualifier is a part of the data that is recorded here. See the [BigTable design docs](https://cloud.google.com/bigtable/docs/schema-design#columns) for the thought process behind this approach.
- `VAAState:Signatures:{GuardianSetIndex}` A list of objects containing Guardian signatures and the index of the Guardian within the GuardianSet. This is the Signatures list from the Guardian's VAA struct. Note that a BigTable column can store many values (aka "cells") for a single row, unique by timestamp. This column will hold cells containing a list of signatures, with each cell's list containing one more signature than the previous cell's. This will show the order in which signatures accumulate.
- `QuorumState:SignedVAA` The VAA with the signatures that contributed to quorum.
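The timestamped cells of `VAAState:Signatures:{GuardianSetIndex}` can be modeled roughly as follows. This is an in-memory sketch, not BigTable client code; `VersionedColumn`, the integer timestamps, and the `(index, signature)` tuple shape are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (index of the Guardian within the GuardianSet, signature); placeholder types
Signature = Tuple[int, str]


class VersionedColumn:
    """One column of one row, holding many cells keyed by timestamp."""

    def __init__(self) -> None:
        self.cells: Dict[int, List[Signature]] = {}

    def write(self, timestamp: int, signatures: List[Signature]) -> None:
        # Cells are unique by timestamp: same timestamp replaces the cell.
        self.cells[timestamp] = list(signatures)

    def latest(self) -> List[Signature]:
        """Return the signatures list from the newest cell, if any."""
        if not self.cells:
            return []
        return self.cells[max(self.cells)]

    def arrival_order(self) -> List[int]:
        """Guardian indexes in the order they first appear across cells."""
        seen = set()
        order = []
        for timestamp in sorted(self.cells):
            for index, _signature in self.cells[timestamp]:
                if index not in seen:
                    seen.add(index)
                    order.append(index)
        return order
```

Reading the cells oldest to newest recovers the order in which signatures accumulated, even when each stored list is itself sorted by Guardian index.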

@ -0,0 +1,53 @@
# Centralized datastore for Wormhole visualizations
## Objective
Persist transient Guardian events in a database along with on-chain data, for easier introspection via a block-explorer style GUI.
## Background
Events observed and broadcast between Guardians are transient. Before a message is fully attested by the Guardians, an end user has no way to determine where within the lifecycle of attestation their event is. Saving the attestation state along with the message identifiers would allow the development of discovery interfaces.
Building a GUI that allows querying and viewing Wormhole data by a single on-chain identifier would make using Wormhole a friendlier experience. Building such a GUI would be difficult without an off-chain datastore that captures the entire lifecycle of Wormhole events.
## Goals
- Persist user intent with the relevant metadata (sender address, transaction hash/signature).
- Expose the Guardian network's Verifiable Action Approval (VAA) state: individual signatures and if/when quorum was reached.
- Record the transaction hash/signature of all transactions performed by Guardians relevant to the user's intent.
- Allow querying by a transaction identifier and retrieving associated data.
## Non-Goals
- Centrally persisted Wormhole data does not aim to be a source of truth.
- Centrally persisted Wormhole data will not be publicly available for programmatic consumption.
## Overview
A Guardian can be configured to publish Wormhole events to a database. This will enable a discovery interface for users to query for Wormhole events, along with querying for message counts and statistics.
![Wormhole data flow](Wormhole-data-flow.svg)
## Detailed Design
A Google Cloud BigTable instance will be set up to store data about Wormhole events, with the schema described in the following section. BigTable is preferred because it does not require a global schema and because it can efficiently handle large amounts of historic data through row key sharding.
A block-explorer style web app will use BigTable to retrieve VAA state to create a discovery interface for Wormhole. The explorer web app could allow users to query for Wormhole events by a single identifier, similar to other block explorers, where a user may enter an address or a transaction identifier and see the relevant data.
### API / database schema
BigTable schema: [Wormhole event schema](./bigtable_event_schema.md)
## Caveats
It is undetermined how costly it will be to query for multiple transactions (rows) in the case of bridging tokens, for example, retrieving the `assetMeta` transaction along with the `transfer` message transaction.
## Alternatives Considered
### Database schema
Saving each Protobuf SignedObservation as its own row was considered. However, building a picture of the state of the user's intent with only SignedObservations is not ideal, as the logic to interpret the results would need to come from somewhere, and additional data would need to be sourced.
Using the VAA "digest" as the BigTable row key was considered. Using the VAA digest would make database writes easy within the existing codebase. However, indexing on digest would heavily penalize reads, as the digest will not be known to the user, so a full table scan would be required for every user request.
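The read penalty of a digest row key can be sketched with two in-memory tables. Everything here is hypothetical: `placeholder_digest` uses SHA-256 only as a stand-in (this document does not specify the real digest function), and the dictionaries stand in for BigTable tables.

```python
import hashlib
from typing import Dict, Optional, Tuple


def placeholder_digest(body: bytes) -> str:
    """Stand-in digest; not the real VAA digest function."""
    return hashlib.sha256(body).hexdigest()


def find_in_digest_table(
    table: Dict[str, dict], message_id: str
) -> Tuple[Optional[dict], int]:
    """Rows are keyed by digest, so every row may need to be examined."""
    examined = 0
    for row in table.values():
        examined += 1
        if row["message_id"] == message_id:
            return row, examined
    return None, examined


def find_in_message_id_table(
    table: Dict[str, dict], message_id: str
) -> Tuple[Optional[dict], int]:
    """Rows are keyed by MessageID, so one keyed lookup is enough."""
    return table.get(message_id), 1
```

With the digest as the key, the number of rows examined grows with the table; with the MessageID as the key, it stays at one regardless of table size.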