Commit Graph

108 Commits

Author SHA1 Message Date
behzad nouri d87128e02c
fixes errors from clippy::needless_borrow (#29535)
https://rust-lang.github.io/rust-clippy/master/index.html#needless_borrow
2023-01-05 18:21:56 +00:00
Yueh-Hsuan Chiang f26a457cf5
Add comment block for DeadSlots LedgerColumn (#28341)
Add comment block for DeadSlots LedgerColumn
2022-10-20 09:47:19 -07:00
Yueh-Hsuan Chiang bf37228768
Add comment block for Root ledger column (#28358)
Add comment block for Root ledger column.
2022-10-19 17:06:21 -07:00
Yueh-Hsuan Chiang 9be7ada68d
Improve SlotColumn's code comment. (#28447)
Improve SlotColumn's code comment to include how
LedgerCleanupService manages the clean-up of a SlotColumn.
2022-10-18 23:50:22 -07:00
Yueh-Hsuan Chiang f910f2d7eb
Add comment block for Orphans column family. (#28340)
Add comment block for Orphans column family.
2022-10-16 23:47:02 -07:00
Yueh-Hsuan Chiang 6a96a4c2ee
Add comment block for ErasureMeta ledger column (#28356)
Add comment block for ErasureMeta ledger column.
2022-10-14 12:59:34 -07:00
Yueh-Hsuan Chiang 4539cb75fb
Add comment block for SlotMeta column family (#28339)
Add comment block for SlotMeta column family.
2022-10-14 12:56:56 -07:00
Yueh-Hsuan Chiang 5df10173dd
Add comment block for BankHash ledger column (#28357)
Add comment block for BankHash ledger column.
2022-10-13 11:41:58 -07:00
steviez 624f5cfcd5
Add rocksdb multi_get_bytes() method (#28244) 2022-10-07 18:05:13 -04:00
Yueh-Hsuan Chiang fb6abac4ca
Improve code comment for delete_range_cf (#28087)
#### Summary of Changes
Improve code comment for Blockstore::delete_range_cf esp. for the corner case
where the from Slot and to Slot are the same.
2022-09-27 20:12:08 -07:00
Yueh-Hsuan Chiang a76258f276
Improve code comments for ledgerstore columns. (#28054)
### Problem
The documentation of each column family is missing

### Summary
The goal is to create a comment block that will essentially include a high-level
concept on what each column family is about and what are their key/value formats.

This PR is the first cut that includes the key/value format of each column family.
This should at least provide an easy pointer for readers to understand what this
column family stores by searching its value type and how to access the data based
on the key type.
2022-09-27 00:31:23 -07:00
behzad nouri 9a57c64f21
patches clippy errors from new rust nightly release (#27996) 2022-09-22 22:23:03 +00:00
steviez 4e8e0cda7e
Remove extra data copy from several Rocks get() methods (#27693)
Several of the get() methods return a deserialized object (as opposed to
a Vec<u8>) by first getting a byte array out of Rocks, and then using
bincode::deserialize() to get the underlying type. However,
deserialize() only requires a u8 slice, not an owned Vec<u8>. So, we can
use get_pinned_cf() to reference memory owned by Rocks and avoid an
unnecessary copy.
2022-09-16 18:38:28 -05:00
Yueh-Hsuan Chiang 9831e4ddad
Remove daily rewrite/compaction of each ledger file (#27571)
#### Problem
Previously before #26651, our LedgerCleanupService needs RocksDB background
compactions to reclaim ledger disk space via our custom CompactionFilter.
However, since RocksDB's compaction isn't smart enough to know which file to pick,
we rely on the 1-day compaction period so that each file will be forced to be compacted
once a day so that we can reclaim ledger disk space in time.  The downside of this is
each ledger file will be rewritten once per day.

#### Summary of Changes
As #26651 makes LedgerCleanupService actively delete those files whose entire slot-range
is older than both --limit-ledger-size and the current root, we can remove the 1-day compaction
period and get rid of the daily ledger file rewrite.

The results on mainnet-beta shows that this PR reduces ~20% write-bytes-per-second
and reduces ~50% read-bytes-per-second on ledger disk.
2022-09-16 13:12:55 -07:00
Yueh-Hsuan Chiang ba3d9cd325
Add LedgerColumn::multi_get() (#26354)
#### Problem
Blockstore operations such as get_slots_since() issues multiple rocksdb::get()
at once which is not optimal for performance.

#### Summary of Changes
This PR adds LedgerColumn::multi_get() based on rocksdb::batched_multi_get(),
the optimized version of multi_get() where get requests are processed in batch
to minimize read I/O.
2022-09-12 15:01:22 -07:00
Yueh-Hsuan Chiang ed00365101
Add ledger tool command print-file-metadata (#26790)
Add ledger-tool command print-file-metadata

#### Summary of Changes
This PR adds a ledger tool subcommand print-file-metadata.
```
USAGE:
    solana-ledger-tool print-file-metadata [FLAGS] [OPTIONS] [SST_FILE_NAME]

    Prints the metadata of the specified ledger-store file.
    If no file name is unspecified, then it will print the metadata of all ledger files
```
2022-09-06 21:46:35 -07:00
Yueh-Hsuan Chiang 6d070cee08
Fix the boundary inconsistency between delete_file_in_range and delete_range (#27201)
#### Problem
RocksDB's delete_range applies to [from, to) while delete_file_in_range
applies to [from, to] by default, and the rust-rocksdb api does not include
the option to make delete_file_in_range apply to [from, to).  Such inconsistency
might cause `blockstore::run_purge` to produce an inconsistent result as it
invokes both delete_range and delete_file_in_range.

#### Summary of Changes
This PR makes all our purge / delete related functions to be inclusive
on both starting and ending slots.
2022-08-30 21:38:54 -07:00
steviez 14d9922105
Bump rust-rocksdb to 0.19.0 tag (#26949) 2022-08-11 09:41:01 -05:00
Yueh-Hsuan Chiang 99ef2184cc
Delete files older than the lowest_cleanup_slot in LedgerCleanupService::cleanup_ledger (#26651)
#### Problem
LedgerCleanupService requires compactions to propagate & digest range-delete tombstones
to eventually reclaim disk space.

#### Summary of Changes
This PR makes LedgerCleanupService::cleanup_ledger delete any file whose slot-range is
older than the lowest_cleanup_slot.  This allows us to reclaim disk space more often with
fewer IOps.  Experimental results on mainnet validators show that the PR can effectively
reduce 33% to 40% ledger disk size.
2022-08-09 00:48:06 +08:00
Yueh-Hsuan Chiang f284bba53b
Create const &strs for rocksdb perf write operation names (#26352)
#### Summary of Changes
Define PERF_METRIC_OP_NAME_PUT and PERF_METRIC_OP_NAME_WRITE_BATCH
to replace repetitive / hard-coded operation names for report_rocksdb_write_perf.
2022-07-13 00:05:26 +08:00
Yueh-Hsuan Chiang 9985215bc8
Make report_rocksdb_read_perf() to take a operation name. (#26351)
#### Problem
report_rocksdb_read_perf() always uses the hard-coded operation name "get"

#### Summary of Changes
As we will add a new read operation -- multi_get(), report_rocksdb_read_perf()
needs to have an input parameter for operation name.
2022-07-12 07:18:49 +08:00
Yueh-Hsuan Chiang 8674c96a66
Make the default values of FIFO compaction consistent with validator args (#25778)
#### Problem
When FIFO compaction is used, the size ratio between data shred and coding
shred is set to 1:1 based on the `--rocksdb_fifo_shred_storage_size` arg.
However, BlockstoreRocksFifoOptions::default() uses a slightly optimized
5:4 ratio instead, and the default() function is only used in benchmarks.

#### Summary of Changes
This PR makes both validator argument and BlockstoreRocksFifoOptions::default()
to use 1:1 ratio between data and coding shred size.
2022-06-07 15:24:58 +08:00
Yueh-Hsuan Chiang bcff88bf42
Use the new datapoint macro for RocksDB column family metrics (#25505)
#### Summary of Changes
Use the new datapoint macro that supports group-by for RocksDB column family metrics.
By using the new macro, we can further remove large chunks of boilerplate code that try to work around the previous datapoint macro that does not support group-by.
2022-05-31 09:26:57 -07:00
Yueh-Hsuan Chiang 24634b6e25
Use the new datapoint macro that supports group-by for RocksDB read/write metrics. (#25392)
#### Summary of Changes
Use the new datapoint macro that supports group-by for RocksDB read/write perf metrics.
2022-05-26 22:17:29 -07:00
Yueh-Hsuan Chiang 5b67960c76
(Refactor) Move blocktore options related stuff to blockstore_options.rs (#25509)
#### Problem
blockstore_db.rs has a mutual dependency between blockstore_metrics.rs.

#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.

By doing this, we address the mutual dependency and also make the code cleaner.
2022-05-26 16:59:26 -07:00
Michael Vines b05c7d91ed Fix derive_partial_eq_without_eq clippy lint 2022-05-22 22:22:21 -07:00
Yueh-Hsuan Chiang d3dc2db9fb
(LedgerStore) Rate-limit RocksDB perf sample by a minimum time interval (#25100)
#### Problem
The current RocksDB read/write perf metrics do not include the total operation nanos
and thus we have to include all fields that might contribute to the total operation nanos.

#### Summary of Changes
This PR includes the total operation nanos in RocksDB's read/write perf and reduces the
number of reported fields in its perf metric.
2022-05-21 16:42:33 -07:00
Jeff Biseda 8caf0aabd1
framework to preserve optimistic_slot in blockstore (#25362) 2022-05-20 16:46:23 -07:00
Yueh-Hsuan Chiang de2033f2f2
(LedgerStore) Rate-limit RocksDB perf sample by a minimum time interval (#25093)
#### Problem
When the number of RocksDB read/write operations spikes, its payload size
might exceed the limit (413 Payload Too Large).

#### Summary of Changes
This PR rate-limit the perf-sampling of RocksDB read/write operations by one second
in addition to the existing sampling that is configurable via the hidden validator
argument --rocksdb-perf-sample-interval.
2022-05-20 10:54:27 -07:00
Yueh-Hsuan Chiang 5625959f7e
(LedgerStore) Change perf_samples_counter from Arc<AtomicUsize> to AtomicUsize (#25043)
#### Problem
After #25042, each LedgerColumn has its own BlockstoreRocksDbWritePerfMetrics
and BlockstoreRocksDbReadPerfMetrics instances.  As it has total ownership,
its member field does not need to use Arc.

#### Summary of Changes
Change perf_samples_counter from Arc<AtomicUsize> to AtomicUsize
under BlockstoreRocksDbWritePerfMetrics and BlockstoreRocksDbReadPerfMetrics.
2022-05-16 11:31:07 -07:00
Yueh-Hsuan Chiang b2dcda8980
(LedgerStore) Move metric sample counters out from LedgerColumnOptions (#25042)
#### Problem
LedgerColumnOptions contain two fields, perf_read_counter and perf_write_counter,
that are not really options but internal counters.

#### Summary of Changes
This PR introduces BlockstoreRocksDbPerfSamplingStatus, a struct that holds internal
status for RocksDB perf sampling and moves perf_read_counter and perf_write_counter
out from LedgerColumnOptions.
2022-05-10 16:13:19 -07:00
Yueh-Hsuan Chiang 63bd0cdd5d
(LedgerStore) Move BlockstoreRocksDbColumnFamilyMetrics to blockstore_metric.rs (#24856)
#### Problem
blockstore_db.rs becomes bigger.

#### Summary of Changes
Move BlockstoreRocksDbColumnFamilyMetrics to blockstore_metric.rs out from blockstore_db.rs.
2022-05-03 14:46:59 -07:00
Yueh-Hsuan Chiang 0b9d04808f
(LedgerStore) Move trait ColumnMetrics and metric-macros to blockstore_metric.rs (#24855)
#### Problem
blockstore_db.rs becomes bigger.

#### Summary of Changes
Move trait ColumnMetrics and metric-macros to blockstore_metric.rs out from blockstore_db.rs.
2022-05-02 22:58:31 -07:00
Yueh-Hsuan Chiang eca0eb9585
(LedgerStore) Move metric-related functions to blockstore_metric.rs (#24854)
#### Problem
blockstore_db.rs becomes bigger.

#### Summary of Changes
This PR creates blockstore_metric.rs and moves metric-related functions out from blockstore_db.rs.
2022-05-02 20:53:25 -07:00
steviez 428cf54c91
Change BlockStore TryPrimaryThenSecondary to just Secondary (#23391) 2022-04-29 20:05:39 -05:00
Yueh-Hsuan Chiang 5245eb4229
(LedgerStore) Hidden validator argument for RocksDB perf samples (#24684)
#### Summary of Changes
This PR replaces the use of thread_rng in RocksDB perf metric samples by
AtomicU32 with Ordering::Relaxed to improve the performance of determining
whether to sample the current RocksDB's read/write perf metric.
2022-04-29 17:55:34 -07:00
Yueh-Hsuan Chiang b56c091b37
(LedgerStore) Hidden validator argument for RocksDB perf samples (#24682)
#### Problem
Currently, the number of RocksDB perf samples is controlled by an env arg
which is later handled using a lazy_static variable.  However, there is a known
performance overhead of using lazy_static as mentioned in
https://github.com/solana-labs/solana/pull/6472.

#### Summary of Changes
Instead, this PR uses a hidden validator argument, --rocksdb-perf-sample-interval,
for controlling how often RocksDB read/write performance sample is collected.
2022-04-29 15:28:50 -07:00
Yueh-Hsuan Chiang 27efcae16c
(LedgerStore) Convert Rocks from tuple to struct with named fields (#24761)
#### Problem
The RocksDB wrapper,`Rocks`, under blockstore_db is currently implemented
as a tuple with unnamed fields.  Accessing its fields requires syntax like `self.0`
which limits readability. 

#### Summary of Changes
This PR converts Rocks from tuple to struct so that it has more human-readable
fields.
2022-04-28 21:32:48 -07:00
Yueh-Hsuan Chiang 077bc4f407
(LedgerStore) Change the default RocksDB perf sample rate to 1 / 1000. (#24234) 2022-04-12 04:12:47 +00:00
Yueh-Hsuan Chiang 5a48ef72fd
(LedgerStore) Skip sampling check when ROCKSDB_PERF_CONTEXT_SAMPLES_IN_1K_DEFAULT = 0 (#24221)
#### Problem
Currently, even if SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K == 0, we are still doing
the sampling check for every RocksDB read.

```
thread_rng().gen_range(0, METRIC_SAMPLES_1K) > *ROCKSDB_PERF_CONTEXT_SAMPLES_IN_1K
```

#### Summary of Changes
This PR skips the sampling check when SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K
is set to 0.
2022-04-11 20:39:46 -07:00
Yueh-Hsuan Chiang 1f136de294
(LedgerStore) Report perf metrics for RocksDB deletes (#24138)
#### Summary of Changes
This PR enables perf metrics reporting for RocksDB deletes.
Samples are reported under "blockstore_rocksdb_write_perf" with op=delete
The sampling rate is still controlled by env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K
and its default to 10 (meaning we report 10 in 1000 perf samples).
2022-04-08 00:18:05 -07:00
Yueh-Hsuan Chiang b84521d47d
(LedgerStore) Report perf metrics for RocksDB write batch (#24061)
#### Summary of Changes
This PR enables perf metrics reporting for RocksDB write-batches.
Samples are reported under "blockstore_rocksdb_write_perf" with op=write_batch

Its cf_name tag is set to "write_batch" as well as each write-batch could include multiple column families.

The sampling rate is still controlled by env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K
and its default to 10 (meaning we report 10 in 1000 perf samples).
2022-04-08 00:17:51 -07:00
Yueh-Hsuan Chiang 206c3dd402
(LedgerStore) Enable RocksDB Perf metrics reporting for get_bytes and put_bytes (#24066)
#### Summary of Changes
Enable RocksDB Perf metrics reporting for get_bytes and put_bytes.
2022-04-07 00:24:10 -07:00
Yueh-Hsuan Chiang 4f0e887702
(LedgerStore) Report RocksDB perf metrics for Protobuf Columns (#24065)
This PR enables the reporting of both RocksDB read and write perf metrics for ProtobufColumns,
including TransactionStatus and Rewards.
2022-04-07 00:15:00 -07:00
Yueh-Hsuan Chiang 2d1f27ed8e
(LedgerStore) Perf Metric for RocksDB Writes (#23951)
#### Summary of Changes
This PR implements the reporting of RocksDB write perf metrics to blockstore_rocksdb_write_perf
based on RocksDB's PerfContext.  The default sample rate is 10 in 1000, and the env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K can control the sample rate.
2022-04-06 12:12:38 -07:00
Yueh-Hsuan Chiang 24cc6c33de
(LedgerStore)(Refactor) Move metric reporting functions to a dedicate mod (#24060)
Previously, the metric reporting functions are implemented under LedgerColumnMetric.
However, there're operations like write batch which is issued by the function inside Rocks.

This PR moves reporting functions to its own dedicate mod so that both LedgerColumn and
Rocks can report column perf metrics.
2022-04-05 15:06:17 -07:00
Yueh-Hsuan Chiang 0b5ed87220
(LedgerStore) Enable performance sampling in column family get() (#23834)
#### Summary of Changes
This PR enables RocksDB read side performance metrics to report to blockstore_rocksdb_read_perf.
The sampling rate is controlled by an env arg `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
specifies the number of perf samples for every 1000 operations.  The default value is set to 10, meaning
we will report 10 out of 1000 (or 1/100) reads.

The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
It includes many useful metrics including block read time, cache hit rate, and time spent on decompressing the block.
2022-04-01 13:13:32 -07:00
Yueh-Hsuan Chiang c83c95b56b
(LedgerStore) Create ColumnMetrics trait for CF metric reporting (#23763)
This PR does a refactoring on column family-related metrics reporting.
As the metric reporting is per column family basis, the PR creates
ColumnMetrics trait and move the metric reporting logic into it.

This refactoring will make future column metric reporting (such as
read PerfContext) much cleaner.
2022-03-23 20:51:49 -07:00
Yueh-Hsuan Chiang ae75b1a25f
(LedgerStore) Add compression type (#23578)
This PR adds `--rocksdb-ledger-compression` as a hidden argument to the validator
for specifying the compression algorithm for TransactionStatus.  Available compression
algorithms include `lz4`, `snappy`, `zlib`. The default value is `none`.

Experimental results show that with lz4 compression, we can achieve ~37% size-reduction
on the TransactionStatus column family, or ~8% size-reduction of the ledger store size.
2022-03-22 02:27:09 -07:00
Yueh-Hsuan Chiang f999eef452
(LedgerStore) Rename BlockstoreAdvancedOptions to LedgerColumnOptions (#23764)
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
2022-03-18 11:13:35 -07:00