Commit Graph

436 Commits

Author SHA1 Message Date
behzad nouri 61f0a7d9c3
replaces Mutex<PohRecorder> with RwLock<PohRecorder> (#26370)
Mutex causes superfluous lock contention when a read-only reference suffices.
2022-07-05 14:29:44 +00:00
Jeff Washington (jwash) 557bf6e656
allow initial hash calc to occur in bg (#26271)
* allow initial hash calc to occur in bg

* validator_initialized -> startup_verification_complete

* add infos for leader and vote

* rework snapshot for startup verification

* change to assert
2022-06-29 16:48:33 -05:00
behzad nouri 348fe9ebe2
verifies shred slot and parent in fetch stage (#26225)
Shred slot and parent are not verified until window-service where
resources are already wasted to sig-verify and deserialize shreds.
This commit moves above verification to earlier in the pipeline in fetch
stage.
2022-06-28 12:45:50 +00:00
Brooks Prumo 662818ef0d
Use `VoteAccount::node_pubkey()` (#26207) 2022-06-27 09:09:06 -05:00
Ryo Onodera cd2878acf9
Avoid to miss to root for local slots before the hard fork (#19912)
* Make sure to root local slots even with hard fork

* Address review comments

* Cleanup a bit

* Further clean up

* Further clean up a bit

* Add comment

* Tweak hard fork reconciliation code placement
2022-06-26 15:14:17 +09:00
Jeff Biseda bafdb7dd62
Revert handle start_http failure in rpc_service (#25400) (#26130)
* revert e263be2000
2022-06-22 10:52:27 -07:00
HaoranYi b5d0c7b468
Revert "tvu and tpu timeout on joining its microservices (#24111)" (#26132)
This reverts commit e105547c14.
2022-06-22 10:57:46 -05:00
Pankaj Garg 43ff65ece9
Use single send socket in UdpTpuConnection (#26105) 2022-06-21 14:56:21 -07:00
Boqin Qin(秦 伯钦) 611d2ec73c
core: fix double-readlock in validator (#26053) 2022-06-20 15:07:00 +00:00
Trent Nelson a5f290a66f core: disable quic servers on mainnet-beta 2022-06-17 20:04:05 -06:00
Lijun Wang 29b597cea5
Connection pool support in connection cache and QUIC connection reliability improvement (#25793)
* Connection pool in connection cache and handle connection errors

1. The connection not has a pool of connections per address, configurable, default 4
2. The connections per address share a lazy initialized endpoint
3. Handle connection issues better, avoid race conditions
4. Various log improvement for help debug connection issues
2022-06-10 09:25:24 -07:00
Yueh-Hsuan Chiang ee4469c882
Skip compaction in backup_and_clear_blockstore (#25810)
#### Problem
blockstore clean and compact is quite slow with wait-for-supermajority purge and can take 20-30 minutes
as described in #25710.

#### Summary of Changes
This PR removes the compaction logic in backup_and_clear_blockstore as the
actual the restoration from a bad fork is handled by `blockstore.purge_slots`
(which is done by issuing rocksdb range-delete that makes the bad fork
unavailable.)

Compaction is irreverent to the shred version, as its main job in this context
is to reclaim disk storage from the deleted slots, which we can let the rocksdb
automatic background compaction to handle it.

Fixes #25710
2022-06-09 17:11:50 +08:00
Jon Cinque 79a8ecd0ac
client: Remove static connection cache, plumb it instead (#25667)
* client: Remove static connection cache, plumb it instead

* Add TpuClient::new_with_connection_cache to not break downstream

* Refactor get_connection and RwLock into ConnectionCache

* Fix merge conflicts from new async TpuClient

* Remove `ConnectionCache::set_use_quic`

* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
2022-06-08 13:57:12 +02:00
Brennan Watt ba04063956
Add CPUmetrics (#25802)
Add in some CPU utilization metrics such as: number of vCPUs, clock frequency, average load across different time intervals, and number of total threads
2022-06-07 11:34:25 -07:00
Pankaj Garg 1c2ae470c5
Fix forwarding of transactions over QUIC (#25674)
* Spawn QUIC server to receive forwarded txs

* Update validator port range

* forward votes using UDP

* no forwarding from unstaked nodes

* forwarding stats in banking stage

* fix test builds

* fix lifetime of forward sender
2022-06-02 11:14:58 -07:00
Ryo Onodera aedcb05dc8
Record solana-validator ver to metrics at startup (#25635)
* Record solana-validator ver to metrics at startup

* Update Cargo.lock
2022-06-01 13:37:50 +09:00
Yueh-Hsuan Chiang 5b67960c76
(Refactor) Move blocktore options related stuff to blockstore_options.rs (#25509)
#### Problem
blockstore_db.rs has a mutual dependency between blockstore_metrics.rs.

#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.

By doing this, we address the mutual dependency and also make the code cleaner.
2022-05-26 16:59:26 -07:00
Michael Vines b05c7d91ed Fix derive_partial_eq_without_eq clippy lint 2022-05-22 22:22:21 -07:00
Jeff Biseda e263be2000
handle start_http failure in rpc_service (#25400) 2022-05-20 17:59:23 -07:00
Jeff Washington (jwash) 3a4f0d3397
println -> info (#25163) 2022-05-12 11:07:13 -05:00
HaoranYi 41d34d45e0
pass exit by ref (#25120) 2022-05-11 09:17:21 -05:00
DimAn 2fa9bc3e70
Add options to store full and/or incremental snapshots in separate locations (#24247) 2022-05-10 16:37:41 -04:00
Pankaj Garg 88c16c0176
Check if quic is enabled before warming up quic connections (#24821)
* Check if quic is enabled before warming up quic connections

* fix after rebase

* don't start warmup service if quic not enabled

* fix test
2022-05-01 03:52:38 +00:00
Michael Vines d0a8a16a57 ReplayStage no longer relies on Validator to reset the poh recorder at start 2022-04-22 21:17:49 -07:00
Michael Vines 84e3342612 Process blockstore after starting the TVU 2022-04-22 21:17:49 -07:00
Michael Vines 83e041299a Run real snapshot packager while processing blockstore at validator startup 2022-04-22 21:17:49 -07:00
Justin Starry c544742091
Local cluster test cleanup and refactoring (#24559)
* remove FixedSchedule.start_epoch

* use duration for timing

* Rename to partition bool to turbine_disabled

* simplify partition config
2022-04-22 12:14:07 +08:00
Michael Vines 05f32f287c solana-validator monitor now reports slot-level progress while loading blockstore 2022-04-19 22:09:48 -07:00
Michael Vines 9e4999ef6a Remove halt_at_slot from RuntimeConfig, it's not a runtime concern 2022-04-19 19:23:58 -07:00
Michael Vines 988210908c Move verify_udp_stats_access out of the way 2022-04-19 19:23:58 -07:00
Michael Vines c6f3da4879 blockstore_processor now accepts an Arc<Rwlock<BankForks>> 2022-04-19 19:23:58 -07:00
Michael Vines 0e2e0c8b7d Extract most storage-related services from the Tvu abstraction 2022-04-19 19:23:58 -07:00
Michael Vines 268a2109de Relocate hard forks info log 2022-04-19 19:23:58 -07:00
Michael Vines dd766042df Remove LedgerMetricReportService from TVU 2022-04-19 19:23:58 -07:00
sakridge 7a4a6597c0
Don't enforce ulimit for validator test config (#24272) 2022-04-12 22:06:37 +02:00
Jon Cinque 9b8850f99e
test-validator: Add `--max-compute-units` flag (#24130)
* test-validator: Add `--max-compute-units` flag

* Add `RuntimeConfig` for tweaking runtime behavior

* Actually add the file

* Move RuntimeConfig to runtime
2022-04-12 02:28:10 +02:00
Giorgio Gambino 60b2155bd3
Add accounts-filler-size command line option (#23896) 2022-04-11 13:10:09 -05:00
HaoranYi e105547c14
tvu and tpu timeout on joining its microservices (#24111)
* panic when test timeout

* nonblocking send when when droping banks

* debug log

* timeout for tvu

* unused varaible

* timeout for tpu

* Revert "debug log"

This reverts commit da780a3301a51d7c496141a85fcd35014fe6dff5.

* add timeout const

* fix typo

* Revert "nonblocking send when when droping banks".
I will create another pull request for this.

This reverts commit 088c98ec0facf825b5eca058fb860deba6d28888.

* Update core/src/tpu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tpu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tvu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tvu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-07 20:20:13 -05:00
Tyera Eulberg fb67ff14de
Remove replica-node crates (#24152) 2022-04-06 16:52:19 -06:00
Brooks Prumo c322842257
Replace channel with Mutex<Option> for AccountsPackage (#24013) 2022-04-06 05:47:19 -05:00
carllin 4ea59d8cb4
Set drop callback on first root bank (#23999) 2022-04-05 13:02:33 -05:00
HaoranYi 6ba4e870c4
Blockstore should drop signals before validator exit (#24025)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* wip

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

* clean up import

* add mutex for signal senders in blockstore

* remove mut

* refactor: extract add signal functions

* make blockstore signal private

* let compiler infer mutex type

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-04 11:38:05 -05:00
HaoranYi ffa4cafe1c
Revert sequential execution of validator_exit and validator_parallel_exit tests (#24048)
* handle channel disconnect

* revert sequential execution of validator_exit and parallel_validator_exit tests
2022-04-02 10:22:47 -05:00
HaoranYi 51b37f0184
Modify rpc_completed_slot_service to be non-blocking (#24007)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-03-31 16:44:23 -05:00
HaoranYi ba770832d0
Poh timing service (#23736)
* initial work for poh timing report service

* add poh_timing_report_service to validator

* fix comments

* clippy

* imrove test coverage

* delete record when complete

* rename shred full to slot full.

* debug logging

* fix slot full

* remove debug comments

* adding fmt trait

* derive default

* default for poh timing reporter

* better comments

* remove commented code

* fix test

* more test fixes

* delete timestamps for slot that are older than root_slot

* debug log

* record poh start end in bank reset

* report full to start time instead

* fix poh slot offset

* report poh start for normal ticks

* fix typo

* refactor out poh point report fn

* rename

* optimize delete - delete only when last_root changed

* change log level to trace

* convert if to match

* remove redudant check

* fix SlotPohTiming comments

* review feedback on poh timing reporter

* review feedback on poh_recorder

* add test case for out-of-order arrival of timing points and incomplete timing points

* refactor poh_timing_points into its own mod

* remove option for poh_timing_report service

* move poh_timing_point_sender to constructor

* clippy

* better comments

* more clippy

* more clippy

* add slot poh timing point macro

* clippy

* assert in test

* comments and display fmt

* fix check

* assert format

* revise comments

* refactor

* extrac send fn

* revert reporting_poh_timing_point

* align loggin

* small refactor

* move type declaration to the top of the module

* replace macro with constructor

* clippy: remove redundant closure

* review comments

* simplify poh timing point creation

Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
2022-03-30 09:04:49 -05:00
Jeff Washington (jwash) c24de17278
remove index hash calculation as an option (#23928) 2022-03-25 15:32:53 -05:00
HaoranYi 01af40d6b6
Fix intermittent validator_exit test failure (#23594)
* run validator_exit_test sequentially

* limit validator exit run to its own serial run subset
add 10ms delay in the validator exit tests

* fix intermittent validator exit failure

* no sleep

* undo the code move
2022-03-25 14:38:19 -05:00
Steven Luscher f44c8f296f
fix: thread `enforce_ulimit_nofile` config down when opening blockstore (#23925) 2022-03-25 03:13:33 -05:00
Jon Cinque 7af48465fa
transaction-status: Add return data to meta (#23688)
* transaction-status: Add return data to meta

* Add return data to simulation results

* Use pretty-hex for printing return data

* Update arg name, make TransactionRecord struct

* Rename TransactionRecord -> ExecutionRecord
2022-03-22 23:17:05 +01:00
Yueh-Hsuan Chiang f999eef452
(LedgerStore) Rename BlockstoreAdvancedOptions to LedgerColumnOptions (#23764)
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
2022-03-18 11:13:35 -07:00