tpuproxy/doc/ledger-history.md

3.7 KiB

Ledger History

Radiance proposes a neutral file format to capture Solana ledger history.

It also ships various reference tool to work with this file format.

Ledger content

Ledger data is broadly defined as transaction and consensus data that can be trustlessly validated.

On Solana, ledger data is made up by only two classes of information.

  • Proof-of-History1 parameters (contains block hashes)
  • Transactions (contains user txs and consensus txs)

Entries

The Solana protocol propagates ledger data in the form of entries.

On the wire, entries are additionally shredded into network packets with erasure coding, but this bears no relevance to the Solana ledger itself.

Entries have the following schema.

class Entry:
  num_hashes: uint64
  prev_hash: Hash
  transactions: list[Tx]

Note that a block on Solana is made of up multiple entries. But on the ledger itself, the concept of blocks is only implied.

Existing Formats

We find that the following representations of ledger data are widely used.

  1. Shreds as UDP packets
    • Used in the peer-to-peer network
    • Hard to capture and archive for long-term storage
  2. Archives of blockstore databases
    • Archives of the RocksDB database used in the Solana Labs validator implementation
    • Technically implementation-defined, forces use librocksdb (C++)
  3. Google Cloud Bigtable integration for Solana RPC nodes
    • Closed source
    • Locked to one specific vendor
    • Lacks PoH data

We introduce a new format better suited for long-term archival and public distribution than the existing alternatives.

CARv1 File Format

The Content-addressable ARchive is a streaming container format for blobs (files without a name).

IPLD CARv1 Specification

Content Addressing

Why not .tar.zst, .7z, .rar, etc?

Unlike with traditional archive formats, all blobs in CARs are content-addressed with a hash function. Blobs are referred by CIDs (content identifiers) which unambiguously refer to the exact byte contents.

Leveraging the IPLD Merkle-DAG construction, blobs can recursively refer to other CIDs to build arbitrarily complex acyclic graphs of data.

Thus, if users know and trust a root CID (~35 bytes), they can safely retrieve blobs from any untrusted source. Notably, users have the ability to verify if untrusted blobs match exactly what was requested.

Determinism

Ledger CAR files are reproducible and deterministic. Independent node operators would generate byte-by-byte identical CAR files for the same extent of ledger history, regardless of where that data is sourced from.

Header

The header of the ledger CAR file is set to the following.

{
  "roots": ["bafkqaaa"],
  "version": 1
}

Rationale: The CAR file does not have a single root so we place the "empty" multihash instead, as recommended by the CARv1 spec.

This implies that any CARv1 file starts with the following byte content (hex).

19 a2 65 72 6f 6f 74 73
81 d8 2a 45 00 01 55 00
00 67 76 65 72 73 69 6f
6e 01

IPLD data types

Transactions

Each Solana transaction is mapped to an IPLD block in native (bincode) serialization.

type Transaction bytes

Entries

type Entry struct {
  numHashes  Int
  hash       Hash
  txs        TransactionList
} representation tuple

type TransactionList [ &Transaction ]

Blocks

type Block struct {
  slot      Int
  entries   [ Link ]
  shredding [ Shredding ]
} representation tuple

  1. PoH is a cryptographic delay function based on recursive SHA256 hashing ↩︎