# Long term RPC Transaction History

There's a need for RPC to serve at least 6 months of transaction history. The
current history, on the order of days, is insufficient for downstream users.

6 months of transaction data cannot be stored practically in a validator's
rocksdb ledger, so an external data store is necessary. The validator's rocksdb
ledger will continue to serve as the primary data source, and RPC will fall
back to the external data store for data the ledger cannot serve.

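
The primary-then-fallback lookup described above could look roughly like the following. This is a minimal sketch, not the validator's actual code: the `BlockSource` trait, the `MapSource` type, and the `get_block_with_fallback` function are hypothetical names, and in-memory maps stand in for both the rocksdb ledger and the external data store.

```rust
use std::collections::HashMap;

/// Hypothetical abstraction over any store that can return block data by slot.
trait BlockSource {
    fn get_block(&self, slot: u64) -> Option<Vec<u8>>;
}

/// In-memory stand-in for either the local ledger or the external store.
struct MapSource(HashMap<u64, Vec<u8>>);

impl BlockSource for MapSource {
    fn get_block(&self, slot: u64) -> Option<Vec<u8>> {
        self.0.get(&slot).cloned()
    }
}

/// Try the primary source first; consult the fallback only on a miss.
fn get_block_with_fallback(
    primary: &dyn BlockSource,
    fallback: &dyn BlockSource,
    slot: u64,
) -> Option<Vec<u8>> {
    primary.get_block(slot).or_else(|| fallback.get_block(slot))
}
```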
The affected RPC endpoints are:

- [getFirstAvailableBlock](https://solana.com/docs/rpc/http/getfirstavailableblock)
- [getConfirmedBlock](https://solana.com/docs/rpc/deprecated/getconfirmedblock)
- [getConfirmedBlocks](https://solana.com/docs/rpc/deprecated/getconfirmedblocks)
- [getConfirmedSignaturesForAddress](https://solana.com/docs/rpc/http/getconfirmedsignaturesforaddress)
- [getConfirmedTransaction](https://solana.com/docs/rpc/deprecated/getConfirmedTransaction)
- [getSignatureStatuses](https://solana.com/docs/rpc/http/getsignaturestatuses)
- [getBlockTime](https://solana.com/docs/rpc/http/getblocktime)

Some system design constraints:

- The volume of data to store and search can quickly jump into the terabytes,
  and is immutable.
- The system should be as light as possible for SREs. For example, an SQL
  database cluster that requires an SRE to continually monitor and rebalance
  nodes is undesirable.
- Data must be searchable in real time; batched queries that take minutes or
  hours to run are unacceptable.
- The data should be easy to replicate worldwide to co-locate it with the RPC
  endpoints that will utilize it.
- Interfacing with the external data store should be easy and not require
  depending on risky, lightly-used, community-supported code libraries.

Based on these constraints, Google's BigTable product is selected as the data
store.

## Table Schema

A BigTable instance is used to hold all transaction data, broken up into
different tables for quick searching.

New data may be copied into the instance at any time without affecting the
existing data, and all data is immutable. Generally the expectation is that new
data will be uploaded once the current epoch completes, but there is no
limitation on the frequency of data dumps.

Cleanup of old data is automatic: once the data retention policy of the
instance tables is configured appropriately, expired data just disappears.
Therefore the order in which data is added becomes important. For example, if
data from epoch N-1 is added after data from epoch N, the older epoch data will
outlive the newer data. However, beyond producing _holes_ in query results,
this kind of unordered deletion will have no ill effect. Note that this method
of cleanup effectively allows for an unlimited amount of transaction data to be
stored, restricted only by the monetary costs of doing so.

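
The effect of insertion order on cleanup can be illustrated with a small simulation. This is a sketch under a stated assumption, namely that retention expires rows by the age of the write rather than by the slot or epoch they describe; `Row` and `surviving_epochs` are hypothetical names and no BigTable API is involved. If epochs 11 and 12 are written at times 10 and 30, and epoch 10 is backfilled late at time 50, then with a maximum age of 100 the surviving epochs at time 120 are 10 and 12, leaving a hole at epoch 11.

```rust
/// A stored row tagged with the time (in seconds) at which it was written.
struct Row {
    epoch: u64,
    written_at: u64,
}

/// Keep only rows younger than `max_age` at time `now`, mimicking an
/// age-based retention policy keyed on write time (an assumption here).
fn surviving_epochs(rows: &[Row], now: u64, max_age: u64) -> Vec<u64> {
    let mut epochs: Vec<u64> = rows
        .iter()
        .filter(|row| now.saturating_sub(row.written_at) < max_age)
        .map(|row| row.epoch)
        .collect();
    epochs.sort_unstable();
    epochs
}
```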
The table layout supports the existing RPC endpoints only. New RPC endpoints
in the future may require additions to the schema and potentially iterating over
all transactions to build up the necessary metadata.

## Accessing BigTable

BigTable has a gRPC endpoint that can be accessed using the
[tonic](https://crates.io/crates/tonic) crate and the raw protobuf API, as
currently no higher-level Rust crate for BigTable exists. Practically this makes
parsing the results of BigTable queries more complicated but is not a
significant issue.

## Data Population

The ongoing population of instance data will occur on an epoch cadence through
the use of a new `agave-ledger-tool` command that will convert rocksdb data for
a given slot range into the instance schema.

The same process will be run once, manually, to backfill the existing ledger
data.

### Block Table: `block`

This table contains the compressed block data for a given slot.

The row key is the slot as a 16-digit, zero-padded, lowercase hexadecimal
string, which ensures that the oldest slot with a confirmed block is always
first when the rows are listed. For example, the row key for slot 42 is
`000000000000002a`.

The row data is a compressed `StoredConfirmedBlock` struct.

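
The `block` row key format can be sketched in a few lines of Rust. The function names `slot_to_block_key` and `first_available_key` are illustrative assumptions, not names taken from the codebase, and a `BTreeMap` stands in for a table whose rows are listed in lexicographic key order.

```rust
use std::collections::BTreeMap;

/// Format a slot as a 16-digit, zero-padded, lowercase hexadecimal row key.
fn slot_to_block_key(slot: u64) -> String {
    format!("{:016x}", slot)
}

/// Return the first key in lexicographic order, which is the oldest slot
/// as long as every key was produced by `slot_to_block_key`.
fn first_available_key(table: &BTreeMap<String, Vec<u8>>) -> Option<&String> {
    table.keys().next()
}
```

The zero padding is what makes string order agree with numeric order. Without it, the keys for slots 300, 42, and 7 would be `12c`, `2a`, and `7`, which sort with slot 300 first.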
### Account Address Transaction Signature Lookup Table: `tx-by-addr`

This table contains the transactions that affect a given address.

The row key is
`<base58 address>/<slot-id-one's-complement-hex-slot-0-prefixed-to-16-digits>`.
The row data is a compressed `TransactionByAddrInfo` struct.

Taking the one's complement of the slot ensures that the newest slot with
transactions that affect an address will always be listed first.

Sysvar addresses are not indexed. However, frequently used programs such as Vote
or System are, and will likely have a row for every confirmed slot.

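
The `tx-by-addr` row key scheme described above can be sketched as follows. The function names are hypothetical, and the address is treated as an opaque, already base58-encoded string. Rust's `!` operator on a `u64` is the bitwise one's complement, so a larger slot yields a smaller hex suffix and therefore sorts earlier.

```rust
/// Build a `tx-by-addr` style row key: `<address>/<!slot as 16 hex digits>`.
fn tx_by_addr_key(address: &str, slot: u64) -> String {
    format!("{}/{:016x}", address, !slot)
}

/// Recover the slot from the hex suffix of a key built by `tx_by_addr_key`.
fn slot_from_tx_by_addr_key(key: &str) -> Option<u64> {
    let (_, hex) = key.rsplit_once('/')?;
    u64::from_str_radix(hex, 16).ok().map(|inverted| !inverted)
}
```

Because all keys for one address share the same prefix, a prefix scan on `<address>/` returns that address's rows newest slot first without any extra sorting.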
### Transaction Signature Lookup Table: `tx`

This table maps a transaction signature to its confirmed block and its index
within that block.

The row key is the base58-encoded transaction signature.
The row data is a compressed `TransactionInfo` struct.

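
Resolving a signature therefore takes two steps: look up the signature in `tx`, then use the slot it returns to address the `block` table. The sketch below illustrates this with an in-memory map. `TxLocation` is a hypothetical stand-in holding only the slot and index mentioned above; the actual fields of `TransactionInfo` are not specified here, and `locate_transaction` is likewise an illustrative name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the decoded row data: where a transaction lives.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TxLocation {
    slot: u64,
    index: u32,
}

/// Resolve a signature to the `block` row key and the index within that block.
fn locate_transaction(
    tx_table: &HashMap<String, TxLocation>,
    signature: &str,
) -> Option<(String, u32)> {
    let location = tx_table.get(signature)?;
    // The `block` table is keyed by the slot as 16 zero-padded hex digits.
    Some((format!("{:016x}", location.slot), location.index))
}
```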
### Entries Table: `entries`

> Support for the `entries` table was added in v1.18.0.

This table contains data about the entries in a slot.

The row key is the same as a `block` row key.

The row data is a compressed `Entries` struct, which is a list of entry-summary
data, including hash, number of hashes since previous entry, number of
transactions, and starting transaction index.

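
As a rough illustration of the entry-summary data, the sketch below models one summary with the four fields listed above. The struct name, the field names and types, and the 32-byte hash size are assumptions made for the example. It also assumes that the starting transaction index is the running count of transactions in the preceding entries of the same slot, which this document does not state explicitly.

```rust
/// Hypothetical shape of one entry summary, using the fields listed above.
#[derive(Debug, Clone, PartialEq)]
struct EntrySummarySketch {
    hash: [u8; 32],
    num_hashes: u64,
    num_transactions: u64,
    starting_transaction_index: usize,
}

/// Build summaries from `(hash, num_hashes, num_transactions)` tuples, deriving
/// each starting index from the transactions counted in earlier entries.
fn summarize(entries: &[([u8; 32], u64, u64)]) -> Vec<EntrySummarySketch> {
    let mut next_index = 0usize;
    entries
        .iter()
        .map(|&(hash, num_hashes, num_transactions)| {
            let summary = EntrySummarySketch {
                hash,
                num_hashes,
                num_transactions,
                starting_transaction_index: next_index,
            };
            next_index += num_transactions as usize;
            summary
        })
        .collect()
}
```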