# Ledger History

Radiance proposes a neutral file format to capture Solana ledger history.

It also ships various reference tools to work with this file format.

## Ledger content

*Ledger data* is broadly defined as transaction and consensus data that can be trustlessly validated.

On Solana, ledger data is made up of only two classes of information.
* Proof-of-History[^1] parameters (contain block hashes)
* Transactions (contain user txs and consensus txs)

[^1]: PoH is a cryptographic delay function based on recursive SHA256 hashing

### Entries

The Solana protocol propagates ledger data in the form of _entries_.

On the wire, entries are additionally _shredded_ into network packets with erasure coding,
but this bears no relevance to the Solana ledger itself.

Entries have the following schema.

```python
class Entry:
  num_hashes: uint64        # PoH hashes performed since the previous entry
  hash: Hash                # PoH hash reached after num_hashes iterations
  transactions: list[Tx]
```

Note that a _block_ on Solana is made up of multiple entries.
But on the ledger itself, the concept of blocks is only implied.
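
To illustrate how `num_hashes` links consecutive entries, the following is a minimal sketch of the recursive SHA256 chain mentioned in the footnote above. It covers only entries without transactions; to our understanding, entries that carry transactions additionally mix a digest of those transactions into the chain, which is not shown here. The function name `poh_chain` and its parameters are hypothetical and chosen for illustration.

```python
import hashlib


def poh_chain(start_hash: bytes, num_hashes: int) -> bytes:
    """Apply SHA256 recursively `num_hashes` times, starting from `start_hash`.

    Assumption: the entry carries no transactions, so nothing is mixed in.
    """
    h = start_hash
    for _ in range(num_hashes):
        h = hashlib.sha256(h).digest()
    return h


if __name__ == "__main__":
    genesis = bytes(32)  # placeholder 32-byte start value, not a real block hash
    print(poh_chain(genesis, 3).hex())
```

Because each step depends on the previous output, the chain cannot be computed in parallel, but a verifier can check many entries concurrently once their start hashes are known.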

### Existing Formats

We find that the following representations of ledger data are widely used.

1. **Shreds** as UDP packets
   - Used in the peer-to-peer network
   - Hard to capture and archive for long-term storage
2. Archives of **blockstore** databases
   - Archives of the RocksDB database used in the Solana Labs validator implementation
   - Technically implementation-defined, forces use of librocksdb (C++)
3. Google Cloud **Bigtable** integration for Solana RPC nodes
   - Closed source
   - Locked to one specific vendor
   - Lacks PoH data

We introduce a new format better suited for long-term archival and public distribution than the existing alternatives.

## CARv1 File Format

The **Content-addressable ARchive** is a streaming container format for blobs (files without a name).

[IPLD CARv1 Specification](https://ipld.io/specs/transport/car/carv1/)

### Content Addressing

_Why not .tar.zst, .7z, .rar, etc.?_

Unlike with traditional archive formats, all blobs in CARs are content-addressed with a hash function.
Blobs are referred to by CIDs (content identifiers), which unambiguously identify the exact byte contents.

Leveraging the [IPLD Merkle-DAG](https://docs.ipfs.tech/concepts/merkle-dag/) construction,
blobs can recursively refer to other CIDs to build arbitrarily complex acyclic graphs of data.

Thus, if users know and trust a root CID (~35 bytes), they can safely retrieve blobs from any untrusted source.
Notably, users can verify that untrusted blobs match exactly what was requested.
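
As a rough sketch of that verification step, the snippet below derives a CID from a blob and compares it against the CID that was requested. It assumes CIDv1 with the `raw` codec (`0x55`) and a SHA2-256 multihash (`0x12`, 32 bytes), rendered in lowercase base32; the ledger CAR files may use other codecs or hash functions. The helper names `cid_v1_raw_sha256` and `verify_blob` are hypothetical.

```python
import base64
import hashlib


def cid_v1_raw_sha256(blob: bytes) -> str:
    """Build a CIDv1 string (assumed: raw codec, sha2-256) for `blob`."""
    digest = hashlib.sha256(blob).digest()
    # version 1, codec raw (0x55), multihash code sha2-256 (0x12), length 32 (0x20)
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    b32 = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + b32  # "b" is the multibase prefix for lowercase base32


def verify_blob(expected_cid: str, blob: bytes) -> bool:
    """Return True only if `blob` hashes to the CID that was requested."""
    return cid_v1_raw_sha256(blob) == expected_cid


if __name__ == "__main__":
    cid = cid_v1_raw_sha256(b"hello ledger")
    print(cid, verify_blob(cid, b"hello ledger"), verify_blob(cid, b"tampered"))
```

A blob fetched from an untrusted peer is accepted only when `verify_blob` returns `True`; any modification of the bytes changes the digest and therefore the CID.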

### Determinism

Ledger CAR files are reproducible and deterministic.
Independent node operators would generate byte-by-byte identical CAR files for the same extent of ledger history,
regardless of where that data is sourced from.

### Header

The header of the ledger CAR file is set to the following.

```json
{
  "roots": ["bafkqaaa"],
  "version": 1
}
```

Rationale: The CAR file does not have a single root, so we place the "empty" CID (`bafkqaaa`, an identity multihash of zero length) instead,
as recommended by the [CARv1 spec](https://ipld.io/specs/transport/car/carv1/#number-of-roots).

This implies that every ledger CAR file starts with the following byte content (hex).

```
19 a2 65 72 6f 6f 74 73
81 d8 2a 45 00 01 55 00
00 67 76 65 72 73 69 6f
6e 01
```
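
The bytes above can be reproduced by hand. The sketch below encodes the header as DAG-CBOR with a small purpose-built encoder (it only handles the few CBOR cases this header needs) and prepends the varint length prefix used by CARv1. All function names are hypothetical helpers, not part of any reference tool.

```python
def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, used by CARv1 as a length prefix."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def cbor_head(major: int, n: int) -> bytes:
    """CBOR initial byte(s) for a major type with argument n (n < 256 only)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 256:
        return bytes([(major << 5) | 24, n])
    raise ValueError("argument too large for this sketch")


def cbor_text(s: str) -> bytes:
    data = s.encode("utf-8")
    return cbor_head(3, len(data)) + data


# CIDv1, raw codec (0x55), identity multihash (0x00) with zero-length digest.
EMPTY_CID = bytes([0x01, 0x55, 0x00, 0x00])


def cbor_cid(cid: bytes) -> bytes:
    # DAG-CBOR link: tag 42 around a byte string with a 0x00 multibase prefix.
    payload = b"\x00" + cid
    return cbor_head(6, 42) + cbor_head(2, len(payload)) + payload


def car_header() -> bytes:
    body = (
        cbor_head(5, 2)          # map with two entries
        + cbor_text("roots")
        + cbor_head(4, 1)        # array with one element
        + cbor_cid(EMPTY_CID)
        + cbor_text("version")
        + cbor_head(0, 1)        # unsigned integer 1
    )
    return varint(len(body)) + body


if __name__ == "__main__":
    print(car_header().hex())
```

The leading `19` is the varint-encoded length (25) of the CBOR body that follows.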

### IPLD data types

#### Transactions

Each Solana transaction is mapped to an IPLD block in native (bincode) serialization.

```
type Transaction bytes
```

#### Entries

```
type Entry struct {
  numHashes  Int
  hash       Hash
  txs        TransactionList
} representation tuple

type TransactionList [ &Transaction ]
```
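
The `representation tuple` clause means the struct is serialized as an ordered list of its field values, without field names. The sketch below shows that mapping on plain Python values before any CBOR encoding takes place. It assumes `Hash` is a 32-byte value and stands in for links with CID strings; both are illustrative choices, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    num_hashes: int
    hash: bytes       # assumed: 32-byte PoH hash
    txs: list         # assumed: CID strings standing in for &Transaction links


def to_tuple(entry: Entry) -> list:
    """Tuple representation: field values in schema order, no keys."""
    return [entry.num_hashes, entry.hash, list(entry.txs)]


def from_tuple(fields: list) -> Entry:
    """Inverse mapping; relies purely on field position."""
    num_hashes, hash_, txs = fields
    return Entry(num_hashes, hash_, list(txs))


if __name__ == "__main__":
    entry = Entry(12500, bytes(32), ["bafkqaaa"])
    print(to_tuple(entry))
```

Since field names are omitted, the encoding is compact, but readers and writers must agree on the exact field order given in the schema.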

#### Blocks

```
type Block struct {
  slot      Int
  entries   [ Link ]
  shredding [ Shredding ]
} representation tuple
```