Commit Graph

858 Commits

Author SHA1 Message Date
behzad nouri 3bbfaae7b6
moves shred stats to a separate file (#24484) 2022-04-19 18:25:09 +00:00
HaoranYi c6a751d658
Detect and report bank drop signal queue full and disconnect events (#24112)
* nonblocking send when when droping banks

* detect and report drop signal queue full/disconnect events

* comments

* use counter for reporting bank_drop_queue events

* reduce log

* use datapoint to report stats

* logging instead of reporting bank drop signal full

* fix a corner case for reporting

* fix build
2022-04-18 11:19:11 -05:00
Justin Starry 4ed647d8ec
Test that tick slot hashes update the recent blockhash queue (#24242) 2022-04-16 00:30:20 +08:00
HaoranYi e3ef0741be
simplify bank drop calls (#24142)
* simplify bank drop calls

* clippy: import

* Update ledger/src/blockstore_processor.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* Update runtime/src/accounts_background_service.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* Update runtime/src/bank.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* cleanup

* format

* use msb of bank_id to indicates that we are dropping

* clippy

* restore bank id

* clippy

* revert is-serialized_with_abs flag

* assert

* clippy

* whitespace

* fix bank drop callback check

* more fix

* remove msb dropping implementation

* fix

Co-authored-by: Brooks Prumo <brooks@prumo.org>
2022-04-14 08:43:54 -05:00
Yueh-Hsuan Chiang 077bc4f407
(LedgerStore) Change the default RocksDB perf sample rate to 1 / 1000. (#24234) 2022-04-12 04:12:47 +00:00
Yueh-Hsuan Chiang 5a48ef72fd
(LedgerStore) Skip sampling check when ROCKSDB_PERF_CONTEXT_SAMPLES_IN_1K_DEFAULT = 0 (#24221)
#### Problem
Currently, even if SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K == 0, we are still doing
the sampling check for every RocksDB read.

```
thread_rng().gen_range(0, METRIC_SAMPLES_1K) > *ROCKSDB_PERF_CONTEXT_SAMPLES_IN_1K
```

#### Summary of Changes
This PR skips the sampling check when SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K
is set to 0.
2022-04-11 20:39:46 -07:00
Jon Cinque 9b8850f99e
test-validator: Add `--max-compute-units` flag (#24130)
* test-validator: Add `--max-compute-units` flag

* Add `RuntimeConfig` for tweaking runtime behavior

* Actually add the file

* Move RuntimeConfig to runtime
2022-04-12 02:28:10 +02:00
Giorgio Gambino 60b2155bd3
Add accounts-filler-size command line option (#23896) 2022-04-11 13:10:09 -05:00
steviez 6ca84f8a40
Move PurgeType enum to blockstore_purge.rs (#24185) 2022-04-08 11:46:12 -05:00
Tyera Eulberg d2702201ca
Bump tonic, tonic-build, prost, and etcd-client (#24147)
* Bump tonic, prost, and etcd-client

* Restore doc ignores
2022-04-08 10:21:45 -06:00
Yueh-Hsuan Chiang 1f136de294
(LedgerStore) Report perf metrics for RocksDB deletes (#24138)
#### Summary of Changes
This PR enables perf metrics reporting for RocksDB deletes.
Samples are reported under "blockstore_rocksdb_write_perf" with op=delete
The sampling rate is still controlled by env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K
and its default to 10 (meaning we report 10 in 1000 perf samples).
2022-04-08 00:18:05 -07:00
Yueh-Hsuan Chiang b84521d47d
(LedgerStore) Report perf metrics for RocksDB write batch (#24061)
#### Summary of Changes
This PR enables perf metrics reporting for RocksDB write-batches.
Samples are reported under "blockstore_rocksdb_write_perf" with op=write_batch

Its cf_name tag is set to "write_batch" as well as each write-batch could include multiple column families.

The sampling rate is still controlled by env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K
and its default to 10 (meaning we report 10 in 1000 perf samples).
2022-04-08 00:17:51 -07:00
Yueh-Hsuan Chiang 206c3dd402
(LedgerStore) Enable RocksDB Perf metrics reporting for get_bytes and put_bytes (#24066)
#### Summary of Changes
Enable RocksDB Perf metrics reporting for get_bytes and put_bytes.
2022-04-07 00:24:10 -07:00
Yueh-Hsuan Chiang 4f0e887702
(LedgerStore) Report RocksDB perf metrics for Protobuf Columns (#24065)
This PR enables the reporting of both RocksDB read and write perf metrics for ProtobufColumns,
including TransactionStatus and Rewards.
2022-04-07 00:15:00 -07:00
Tyera Eulberg afeb1d3cca
Bump lru crate (#24150) 2022-04-06 16:18:42 -06:00
Yueh-Hsuan Chiang 2d1f27ed8e
(LedgerStore) Perf Metric for RocksDB Writes (#23951)
#### Summary of Changes
This PR implements the reporting of RocksDB write perf metrics to blockstore_rocksdb_write_perf
based on RocksDB's PerfContext.  The default sample rate is 10 in 1000, and the env arg SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K can control the sample rate.
2022-04-06 12:12:38 -07:00
Brooks Prumo c322842257
Replace channel with Mutex<Option> for AccountsPackage (#24013) 2022-04-06 05:47:19 -05:00
Yueh-Hsuan Chiang 24cc6c33de
(LedgerStore)(Refactor) Move metric reporting functions to a dedicate mod (#24060)
Previously, the metric reporting functions are implemented under LedgerColumnMetric.
However, there're operations like write batch which is issued by the function inside Rocks.

This PR moves reporting functions to its own dedicate mod so that both LedgerColumn and
Rocks can report column perf metrics.
2022-04-05 15:06:17 -07:00
carllin 4ea59d8cb4
Set drop callback on first root bank (#23999) 2022-04-05 13:02:33 -05:00
HaoranYi eb65ffb779
Optimize replay waking up (#24051)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* wip

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

* clean up import

* add mutex for signal senders in blockstore

* remove mut

* refactor: extract add signal functions

* make blockstore signal private

* increase replay wake up channel bounds

* reduce replay wakeup signal bound to 1

* reduce log level

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-05 08:57:12 -05:00
behzad nouri 855801cc95 removes deterministic-shred-seed feature 2022-04-05 12:04:12 +00:00
Jeff Biseda ee6bb0d5d3
track fec set turbine stats (#23989) 2022-04-04 14:44:21 -07:00
HaoranYi 6ba4e870c4
Blockstore should drop signals before validator exit (#24025)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* wip

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

* clean up import

* add mutex for signal senders in blockstore

* remove mut

* refactor: extract add signal functions

* make blockstore signal private

* let compiler infer mutex type

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-04 11:38:05 -05:00
Yueh-Hsuan Chiang 0b5ed87220
(LedgerStore) Enable performance sampling in column family get() (#23834)
#### Summary of Changes
This PR enables RocksDB read side performance metrics to report to blockstore_rocksdb_read_perf.
The sampling rate is controlled by an env arg `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
specifies the number of perf samples for every 1000 operations.  The default value is set to 10, meaning
we will report 10 out of 1000 (or 1/100) reads.

The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
It includes many useful metrics including block read time, cache hit rate, and time spent on decompressing the block.
2022-04-01 13:13:32 -07:00
HaoranYi ba770832d0
Poh timing service (#23736)
* initial work for poh timing report service

* add poh_timing_report_service to validator

* fix comments

* clippy

* imrove test coverage

* delete record when complete

* rename shred full to slot full.

* debug logging

* fix slot full

* remove debug comments

* adding fmt trait

* derive default

* default for poh timing reporter

* better comments

* remove commented code

* fix test

* more test fixes

* delete timestamps for slot that are older than root_slot

* debug log

* record poh start end in bank reset

* report full to start time instead

* fix poh slot offset

* report poh start for normal ticks

* fix typo

* refactor out poh point report fn

* rename

* optimize delete - delete only when last_root changed

* change log level to trace

* convert if to match

* remove redudant check

* fix SlotPohTiming comments

* review feedback on poh timing reporter

* review feedback on poh_recorder

* add test case for out-of-order arrival of timing points and incomplete timing points

* refactor poh_timing_points into its own mod

* remove option for poh_timing_report service

* move poh_timing_point_sender to constructor

* clippy

* better comments

* more clippy

* more clippy

* add slot poh timing point macro

* clippy

* assert in test

* comments and display fmt

* fix check

* assert format

* revise comments

* refactor

* extrac send fn

* revert reporting_poh_timing_point

* align loggin

* small refactor

* move type declaration to the top of the module

* replace macro with constructor

* clippy: remove redundant closure

* review comments

* simplify poh timing point creation

Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
2022-03-30 09:04:49 -05:00
behzad nouri cda3d66b21
uses first_coding_index for erasure meta obtained from coding shreds (#23974)
Now that nodes correctly populate position field in coding shreds, and
first_coding_index in erasure meta, the old code to maintain backward
compatibility can be removed.
The commit is working towards changing erasure coding schema to 32:64.
2022-03-30 13:55:11 +00:00
Jeff Washington (jwash) c24de17278
remove index hash calculation as an option (#23928) 2022-03-25 15:32:53 -05:00
behzad nouri 1f9c89c1e8
expands lifetime of SlotStats (#23872)
Current slot stats are removed when the slot is full or every 30 seconds
if the slot is before root:
https://github.com/solana-labs/solana/blob/493a8e234/ledger/src/blockstore.rs#L2017-L2027

In order to track if the slot is ultimately marked as dead or rooted and
emit more metrics, this commit expands lifetime of SlotStats while
bounding total size of cache using an LRU eviction policy.
2022-03-25 19:32:22 +00:00
steviez c31db81ac4
Use VoteAccountsHashMap type alias in all applicable spots (#23904) 2022-03-24 12:09:48 -05:00
Yueh-Hsuan Chiang c83c95b56b
(LedgerStore) Create ColumnMetrics trait for CF metric reporting (#23763)
This PR does a refactoring on column family-related metrics reporting.
As the metric reporting is per column family basis, the PR creates
ColumnMetrics trait and move the metric reporting logic into it.

This refactoring will make future column metric reporting (such as
read PerfContext) much cleaner.
2022-03-23 20:51:49 -07:00
Jon Cinque 7af48465fa
transaction-status: Add return data to meta (#23688)
* transaction-status: Add return data to meta

* Add return data to simulation results

* Use pretty-hex for printing return data

* Update arg name, make TransactionRecord struct

* Rename TransactionRecord -> ExecutionRecord
2022-03-22 23:17:05 +01:00
Yueh-Hsuan Chiang ae75b1a25f
(LedgerStore) Add compression type (#23578)
This PR adds `--rocksdb-ledger-compression` as a hidden argument to the validator
for specifying the compression algorithm for TransactionStatus.  Available compression
algorithms include `lz4`, `snappy`, `zlib`. The default value is `none`.

Experimental results show that with lz4 compression, we can achieve ~37% size-reduction
on the TransactionStatus column family, or ~8% size-reduction of the ledger store size.
2022-03-22 02:27:09 -07:00
Will Hickey c4ecfa5716
Bump version to v1.11 (#23807)
* Revert crossbeam_epoch to stable. 0.9.8 only works with nightly
* Remove unneeded unit expression
2022-03-21 17:40:50 -05:00
Yueh-Hsuan Chiang f999eef452
(LedgerStore) Rename BlockstoreAdvancedOptions to LedgerColumnOptions (#23764)
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
2022-03-18 11:13:35 -07:00
behzad nouri 6b0d34d70d
removes redundant Arcs from Blockstore (#23735) 2022-03-17 19:43:57 +00:00
Will Hickey 2f58c9e501
Bump version to 1.10.4 (#23743) 2022-03-17 14:02:13 -05:00
Yueh-Hsuan Chiang 86c695268e
(LedgerStore) Improve the function API of new_cf_descriptor (#23696)
As we start adding more options into BlockstoreOptions, it's better to allow
new_cf_descriptor to take the reference to BlockstoreOptions so that
we can avoid future function API changes on new_cf_descriptor.
2022-03-16 11:47:49 -07:00
dependabot[bot] 44ab660172
chore: bump libc from 0.2.119 to 0.2.120 (#23694)
* chore: bump libc from 0.2.119 to 0.2.120

Bumps [libc](https://github.com/rust-lang/libc) from 0.2.119 to 0.2.120.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.119...0.2.120)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* [auto-commit] Update all Cargo lock files

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: dependabot-buildkite <dependabot-buildkite@noreply.solana.com>
2022-03-16 00:34:40 -06:00
Michael Vines 390c5667f7 Setup bank hard_forks in load_bank_forks() 2022-03-15 23:08:07 -07:00
steviez e7cf84a593
Remove AccessType arg from create_new_ledger (#23591)
Creating a new ledger implicitly means that no other process could have
previously held access to it. Additionally, creating a new ledger
implicitly requires writing, so it follows that Primary access is
required and we can drop access type as an argument.
2022-03-15 00:07:16 -05:00
Michael Vines 390dc24608 Create leader schedule before processing blockstore 2022-03-14 15:29:58 -07:00
Michael Vines 115f376465 Factor out bank_forks_utils::load_bank_forks() 2022-03-14 15:29:58 -07:00
Michael Vines 63324be5b3 Remove last_full_snapshot_slot return value when it can be derived by the caller 2022-03-14 15:29:58 -07:00
Michael Vines cf5048faaa Eliminate to_loadresult() 2022-03-14 15:29:58 -07:00
Michael Vines 3d4bf1d00a Refactor bank_forks_utils::load() to invoke a common process_blockstore_from_root() 2022-03-14 15:29:58 -07:00
Michael Vines c2ce152be8 Inline do_process_blockstore_from_root 2022-03-14 15:29:58 -07:00
Michael Vines 07d5ee062d Push recyclers down the stack 2022-03-14 15:29:58 -07:00
HaoranYi fc2d1a61f3
signing_coding_time is not used (#23655) 2022-03-14 17:16:30 -05:00
Will Hickey 63bf0f66af
Bump version to 1.10.3 (#23648) 2022-03-14 11:18:45 -05:00
Yueh-Hsuan Chiang 1e20bd8f9a
(LedgerStore) Include storage type as a tag in RocksDB metric reporting (#23523)
#### Summary of Changes
This PR further enables group by operation on storage type in blockstore_rocksdb_cfs metrics.
Such group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.

To make things extensible, this PR introduces BlockstoreAdvancedOptions and move shred_storage_type. 
All fields in BlockstoreAdvancedOptions will support group-by operation in blockstore_rocksdb_cfs.

Dependency: #23580
2022-03-11 15:17:34 -08:00