doc: add ledger-history.md

2022-10-31 11:36:26 +01:00 · 2022-10-31 11:36:26 +01:00 · 141be519a8
parent 928fefea9d
commit 141be519a8
1 changed files with 131 additions and 0 deletions
--- a/doc/ledger-history.md
+++ b/doc/ledger-history.md
@ -0,0 +1,131 @@
+# Ledger History
+
+Radiance proposes a neutral file format to capture Solana ledger history.
+
+It also ships various reference tool to work with this file format.
+
+## Ledger content
+
+*Ledger data* is broadly defined as transaction and consensus data that can be trustlessly validated.
+
+On Solana, ledger data is made up by only two classes of information.
+* Proof-of-History[^1] parameters (contains block hashes)
+* Transactions (contains user txs and consensus txs)
+
+[^1]: PoH is a cryptographic delay function based on recursive SHA256 hashing
+
+### Entries
+
+The Solana protocol propagates ledger data in the form of _entries_.
+
+On the wire, entries are additionally _shredded_ into network packets with erasure coding,
+but this bears no relevance to the Solana ledger itself.
+
+Entries have the following schema.
+
+```python
+class Entry:
+  num_hashes: uint64
+  prev_hash: Hash
+  transactions: list[Tx]
+```
+
+Note that a _block_ on Solana is made of up multiple entries.
+But on the ledger itself, the concept of blocks is only implied.
+
+### Existing Formats
+
+We find that the following representations of ledger data are widely used.
+
+1. **Shreds** as UDP packets
+   - Used in the peer-to-peer network
+   - Hard to capture and archive for long-term storage
+2. Archives of **blockstore** databases
+   - Archives of the RocksDB database used in the Solana Labs validator implementation
+   - Technically implementation-defined, forces use librocksdb (C++)
+3. Google Cloud **Bigtable** integration for Solana RPC nodes
+   - Closed source
+   - Locked to one specific vendor
+   - Lacks PoH data
+
+We introduce a new format better suited for long-term archival and public distribution than the existing alternatives.
+
+## CARv1 File Format
+
+The **Content-addressable ARchive** is a streaming container format for blobs (files without a name).
+
+[IPLD CARv1 Specification](https://ipld.io/specs/transport/car/carv1/)
+
+### Content Addressing
+
+_Why not .tar.zst, .7z, .rar, etc?_
+
+Unlike with traditional archive formats, all blobs in CARs are content-addressed with a hash function.
+Blobs are referred by CIDs (content identifiers) which unambiguously refer to the exact byte contents.
+
+Leveraging the [IPLD Merkle-DAG](https://docs.ipfs.tech/concepts/merkle-dag/) construction,
+blobs can recursively refer to other CIDs to build arbitrarily complex acyclic graphs of data.
+
+Thus, if users know and trust a root CID (~35 bytes), they can safely retrieve blobs from any untrusted source.
+Notably, users have the ability to verify if untrusted blobs match exactly what was requested.
+
+### Determinism
+
+Ledger CAR files are reproducible and deterministic.
+Independent node operators would generate byte-by-byte identical CAR files for the same extent of ledger history,
+regardless of where that data is sourced from.
+
+### Header
+
+The header of the ledger CAR file is set to the following.
+
+```json
+{
+  "roots": ["bafkqaaa"],
+  "version": 1
+}
+```
+
+Rationale: The CAR file does not have a single root so we place the "empty" multihash instead,
+as recommended by the [CARv1 spec](https://ipld.io/specs/transport/car/carv1/#number-of-roots).
+
+This implies that any CARv1 file starts with the following byte content (hex).
+
+```
+19 a2 65 72 6f 6f 74 73
+81 d8 2a 45 00 01 55 00
+00 67 76 65 72 73 69 6f
+6e 01
+```
+
+### IPLD data types
+
+#### Transactions
+
+Each Solana transaction is mapped to an IPLD block in native (bincode) serialization.
+
+```
+type Transaction bytes
+```
+
+#### Entries
+
+```
+type Entry struct {
+  numHashes  Int
+  hash       Hash
+  txs        TransactionList
+} representation tuple
+
+type TransactionList [ &Transaction ]
+```
+
+#### Blocks
+
+```
+type Block struct {
+  slot      Int
+  entries   [ Link ]
+  shredding [ Shredding ]
+} representation tuple
+```