From 141be519a8b46dbbd72fec81dda5d76214e0f9e6 Mon Sep 17 00:00:00 2001
From: Richard Patel
Date: Mon, 31 Oct 2022 11:36:26 +0100
Subject: [PATCH] doc: add ledger-history.md

---
 doc/ledger-history.md | 131 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 131 insertions(+)
 create mode 100644 doc/ledger-history.md

diff --git a/doc/ledger-history.md b/doc/ledger-history.md
new file mode 100644
index 0000000..8a0035c
--- /dev/null
+++ b/doc/ledger-history.md
@@ -0,0 +1,131 @@
+# Ledger History
+
+Radiance proposes a neutral file format to capture Solana ledger history.
+
+It also ships various reference tools to work with this file format.
+
+## Ledger content
+
+*Ledger data* is broadly defined as transaction and consensus data that can be trustlessly validated.
+
+On Solana, ledger data is made up of only two classes of information.
+* Proof-of-History[^1] parameters (contain block hashes)
+* Transactions (contain user txs and consensus txs)
+
+[^1]: PoH is a cryptographic delay function based on recursive SHA256 hashing
+
+### Entries
+
+The Solana protocol propagates ledger data in the form of _entries_.
+
+On the wire, entries are additionally _shredded_ into network packets with erasure coding,
+but this bears no relevance to the Solana ledger itself.
+
+Entries have the following schema.
+
+```python
+class Entry:
+    num_hashes: uint64
+    prev_hash: Hash
+    transactions: list[Tx]
+```
+
+Note that a _block_ on Solana is made up of multiple entries.
+But on the ledger itself, the concept of blocks is only implied.
+
+### Existing Formats
+
+We find that the following representations of ledger data are widely used.
+
+1. **Shreds** as UDP packets
+   - Used in the peer-to-peer network
+   - Hard to capture and archive for long-term storage
+2. Archives of **blockstore** databases
+   - Archives of the RocksDB database used in the Solana Labs validator implementation
+   - Technically implementation-defined, forces use of librocksdb (C++)
+3. Google Cloud **Bigtable** integration for Solana RPC nodes
+   - Closed source
+   - Locked to one specific vendor
+   - Lacks PoH data
+
+We introduce a new format better suited for long-term archival and public distribution than the existing alternatives.
+
+## CARv1 File Format
+
+The **Content-addressable ARchive** is a streaming container format for blobs (files without a name).
+
+[IPLD CARv1 Specification](https://ipld.io/specs/transport/car/carv1/)
+
+### Content Addressing
+
+_Why not .tar.zst, .7z, .rar, etc?_
+
+Unlike traditional archive formats, all blobs in CARs are content-addressed with a hash function.
+Blobs are referred to by CIDs (content identifiers), which unambiguously refer to the exact byte contents.
+
+Leveraging the [IPLD Merkle-DAG](https://docs.ipfs.tech/concepts/merkle-dag/) construction,
+blobs can recursively refer to other CIDs to build arbitrarily complex acyclic graphs of data.
+
+Thus, if users know and trust a root CID (~35 bytes), they can safely retrieve blobs from any untrusted source.
+Notably, users have the ability to verify that untrusted blobs match exactly what was requested.
+
+### Determinism
+
+Ledger CAR files are reproducible and deterministic.
+Independent node operators would generate byte-for-byte identical CAR files for the same extent of ledger history,
+regardless of where that data is sourced from.
+
+### Header
+
+The header of the ledger CAR file is set to the following.
+
+```json
+{
+  "roots": ["bafkqaaa"],
+  "version": 1
+}
+```
+
+Rationale: The CAR file does not have a single root, so we place the "empty" multihash instead,
+as recommended by the [CARv1 spec](https://ipld.io/specs/transport/car/carv1/#number-of-roots).
+
+This implies that every ledger CAR file starts with the following byte content (hex).
+
+```
+19 a2 65 72 6f 6f 74 73
+81 d8 2a 45 00 01 55 00
+00 67 76 65 72 73 69 6f
+6e 01
+```
+
+### IPLD data types
+
+#### Transactions
+
+Each Solana transaction is mapped to an IPLD block in native (bincode) serialization.
+
+```
+type Transaction bytes
+```
+
+#### Entries
+
+```
+type Entry struct {
+  numHashes Int
+  hash Hash
+  txs TransactionList
+} representation tuple
+
+type TransactionList [ &Transaction ]
+```
+
+#### Blocks
+
+```
+type Block struct {
+  slot Int
+  entries [ Link ]
+  shredding [ Shredding ]
+} representation tuple
+```
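
The header bytes listed in the Header section can be re-derived by hand, which is a quick way to check the hex dump. The following is a minimal sketch, not code from Radiance: the helper names (`uvarint`, `cbor_text`, `cbor_bytes`, `ledger_car_header`) are hypothetical, and the CBOR helpers only cover the short lengths this header needs. It assumes the usual CARv1 layout of a varint length prefix followed by a DAG-CBOR map, with the root CID `bafkqaaa` decoding to the four bytes `01 55 00 00`.

```python
def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def cbor_text(s: str) -> bytes:
    """CBOR text string (major type 3); only lengths below 24 are handled."""
    data = s.encode("utf-8")
    assert len(data) < 24
    return bytes([0x60 | len(data)]) + data


def cbor_bytes(b: bytes) -> bytes:
    """CBOR byte string (major type 2); only lengths below 24 are handled."""
    assert len(b) < 24
    return bytes([0x40 | len(b)]) + b


# Binary form of "bafkqaaa": CIDv1 (0x01), raw codec (0x55),
# identity multihash (0x00) with a zero-length digest (0x00).
EMPTY_CID = bytes([0x01, 0x55, 0x00, 0x00])


def ledger_car_header() -> bytes:
    """Build the length-prefixed header (hypothetical helper, for illustration)."""
    # DAG-CBOR encodes a link as tag 42 wrapping a byte string that
    # starts with the 0x00 multibase-identity prefix.
    link = bytes([0xD8, 0x2A]) + cbor_bytes(b"\x00" + EMPTY_CID)
    body = (
        bytes([0xA2])                                # map with 2 entries
        + cbor_text("roots") + bytes([0x81]) + link  # array holding 1 link
        + cbor_text("version") + bytes([0x01])       # unsigned integer 1
    )
    return uvarint(len(body)) + body


# The hex dump from the Header section, row by row.
EXPECTED = bytes.fromhex(
    "19a265726f6f7473"
    "81d82a4500015500"
    "006776657273696f"
    "6e01"
)

assert ledger_car_header() == EXPECTED
```

Because the header is fixed, a reader could compare the first 26 bytes of a file against `EXPECTED` as a cheap sanity check before parsing any blocks.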