3.7 KiB
Ledger History
Radiance proposes a neutral file format to capture Solana ledger history.
It also ships various reference tool to work with this file format.
Ledger content
Ledger data is broadly defined as transaction and consensus data that can be trustlessly validated.
On Solana, ledger data is made up by only two classes of information.
- Proof-of-History1 parameters (contains block hashes)
- Transactions (contains user txs and consensus txs)
Entries
The Solana protocol propagates ledger data in the form of entries.
On the wire, entries are additionally shredded into network packets with erasure coding, but this bears no relevance to the Solana ledger itself.
Entries have the following schema.
class Entry:
num_hashes: uint64
prev_hash: Hash
transactions: list[Tx]
Note that a block on Solana is made of up multiple entries. But on the ledger itself, the concept of blocks is only implied.
Existing Formats
We find that the following representations of ledger data are widely used.
- Shreds as UDP packets
- Used in the peer-to-peer network
- Hard to capture and archive for long-term storage
- Archives of blockstore databases
- Archives of the RocksDB database used in the Solana Labs validator implementation
- Technically implementation-defined, forces use librocksdb (C++)
- Google Cloud Bigtable integration for Solana RPC nodes
- Closed source
- Locked to one specific vendor
- Lacks PoH data
We introduce a new format better suited for long-term archival and public distribution than the existing alternatives.
CARv1 File Format
The Content-addressable ARchive is a streaming container format for blobs (files without a name).
Content Addressing
Why not .tar.zst, .7z, .rar, etc?
Unlike with traditional archive formats, all blobs in CARs are content-addressed with a hash function. Blobs are referred by CIDs (content identifiers) which unambiguously refer to the exact byte contents.
Leveraging the IPLD Merkle-DAG construction, blobs can recursively refer to other CIDs to build arbitrarily complex acyclic graphs of data.
Thus, if users know and trust a root CID (~35 bytes), they can safely retrieve blobs from any untrusted source. Notably, users have the ability to verify if untrusted blobs match exactly what was requested.
Determinism
Ledger CAR files are reproducible and deterministic. Independent node operators would generate byte-by-byte identical CAR files for the same extent of ledger history, regardless of where that data is sourced from.
Header
The header of the ledger CAR file is set to the following.
{
"roots": ["bafkqaaa"],
"version": 1
}
Rationale: The CAR file does not have a single root so we place the "empty" multihash instead, as recommended by the CARv1 spec.
This implies that any CARv1 file starts with the following byte content (hex).
19 a2 65 72 6f 6f 74 73
81 d8 2a 45 00 01 55 00
00 67 76 65 72 73 69 6f
6e 01
IPLD data types
Transactions
Each Solana transaction is mapped to an IPLD block in native (bincode) serialization.
type Transaction bytes
Entries
type Entry struct {
numHashes Int
hash Hash
txs TransactionList
} representation tuple
type TransactionList [ &Transaction ]
Blocks
type Block struct {
slot Int
entries [ Link ]
shredding [ Shredding ]
} representation tuple
-
PoH is a cryptographic delay function based on recursive SHA256 hashing ↩︎