Commit Graph

225 Commits

Author SHA1 Message Date
Tyera b389d509a8
Track max_complete_rewards_slot for use in rpc, bigtable (#30698)
* Add RewardsMessage enum

* Cache and update max_complete_rewards_slot

* Plumb max_complete_rewards_slot into JsonRpcRequestProcesseor

* Use max_complete_rewards_slot to check get_block requests

* Use max_complete_rewards_slot to limit Bigtable uploads

* Plumb max_complete_rewards_slot into RpcSubscriptions

* Use max_complete_rewards_slot to limit block subscriptions

* Nit: fix test
2023-03-14 12:08:48 -06:00
Wen 151585e596
Filter pubkey in gossip duplicateproof ingestion (#29879) 2023-02-03 11:41:32 -08:00
Ryo Onodera 40bbf99c74
Add fully-reproducible online tracer for banking (#29196)
* Add fully-reproducible online tracer for banking

* Don't use eprintln!()...

* Update programs/sbf/Cargo.lock...

* Remove meaningless assert_eq

* Group test-only code under aptly named mod

* Remove needless overflow handling in receive_until

* Delay stat aggregation as it's possible now

* Use Cow to avoid needless heap allocs

* Properly consume metrics action as soon as hold

* Trace UnprocessedTransactionStorage::len() instead

* Loosen joining api over type safety for replaystage

* Introce hash event to override these when simulating

* Use serde_with/serde_as instead of hacky workaround

* Update another Cargo.lock...

* Add detailed comment for Packet::buffer serialize

* Rename sender_overhead_minimized_receiver_loop()

* Use type interference for TraceError

* Another minor rename

* Retire now useless ForEach to simplify code

* Use type alias as much as possible

* Properly translate and propagate tracing errors

* Clarify --enable-banking-trace with better naming

* Consider unclean (signal-based) node restarts..

* Tweak logging and cli

* Remove Bank events as it's not needed anymore

* Make tpu own banking tracer thread

* Reduce diff a bit..

* Use latest serde_with

* Finally use the published rolling-file crate

* Make test code change more consistent

* Revive dead and non-terminating test code path...

* Dispose batches early now that possible

* Split off thread handle very early at ::new()

* Tweak message for TooSmallDirByteLimitl

* Remove too much of indirection

* Remove needless pub from ::channel()

* Clarify test comments

* Avoid needless event creation if tracer is disabled

* Write tests around file rotation and spill-over

* Remove unneeded PathBuf::clone()s...

* Introduce inner struct instead of tuple...

* Remove unused enum BankStatus...

* Avoid .unwrap() for the case of disabled tracer...
2023-01-25 21:54:38 +09:00
behzad nouri d75303f541
patches bug in sigverify-shreds when identity is hot-swapped (#29802)
Sigverify-shreds discards shreds from node's own leader slots:
https://github.com/solana-labs/solana/blob/6baab92ab/core/src/sigverify_shreds.rs#L153-L154

But if the identity is hot-swapped the pubkey would be wrong since it
is instantiated only once at startup:
https://github.com/solana-labs/solana/blob/6baab92ab/core/src/tvu.rs#L168
2023-01-21 20:07:41 +00:00
Wen b36791956e
Ingest duplicate proofs sent through Gossip (#29227)
* First draft of ingesting duplicate proofs in Gossip into blockstore.

* Add more unittests.

* Add more unittests for bad cases.

* Fix lint errors for tests.

* More linter fixes for tests.

* Lint fixes

* Rename get_entries, move location of comment.

* Some renaming changes and comment fixes.

* Fix compile warning, this enum is not used.

* Fix lint errors.

* Slow down cleanup because this could potentially be expensive.

* Forgot to reset cleanup count.

* Add protection against attackers when constructing chunk map when
we ingest Gossip proofs.

* Use duplicate shred index instead of get_entries.

* Rename ClusterInfoDuplicateShredListener and fix a few small problems.

* Use into_shreds to piece together the proof.

* Remove redundant code.

* Address a few small errors.

* Discard slots too advanced in the future.

* - Use oldest proof for each pubkey
- Limit number of pubkeys in each slot to 100

* Disable duplicate shred handling for now.

* Revert "Disable duplicate shred handling for now."

This reverts commit c3fcf403876cfbf90afe4d2265a826f21a5e24ab.
2023-01-19 13:00:56 -08:00
Ashwin Sekar f2ba16ee87
Plumb dumps from replay_stage to repair (#29058)
* Plumb dumps from replay_stage to repair

When dumping a slot from replay_stage as a result of duplicate or
ancestor hashes, properly update repair subtrees to keep weighting and
forks view accurate.

* add test

* pr comments
2022-12-25 09:58:30 -07:00
Jeff Biseda a44ea779bd
add support for a repair protocol whitelist (#29161) 2022-12-15 19:24:23 -08:00
Maximilian Schneider c8b0c3ede9
Update cost model to use requested_cu instead of estimated cu #27608 (#28281)
* Update cost model to use requested_cu instead of estimated cu #27608

* remove CostUpdate and CostModel from replay/tvu

* revive cost update service to send cost tracker stats

* CostModel is now static

* remove unused package

Co-authored-by: Tao Zhu <tao@solana.com>
2022-11-22 11:55:56 -06:00
Tyera c32377b5af
Split out quic- and udp-client definitions (#28762)
* Move ConnectionCache back to solana-client, and duplicate ThinClient, TpuClient there

* Dedupe thin_client modules

* Dedupe tpu_client modules

* Move TpuClient to TpuConnectionCache

* Move ThinClient to TpuConnectionCache

* Move TpuConnection and quic/udp trait implementations back to solana-client

* Remove enum_dispatch from solana-tpu-client

* Move udp-client to its own crate

* Move quic-client to its own crate
2022-11-18 12:21:45 -07:00
steviez 2272fd807e
Remove Blockstore manual compaction code (#28409)
The manual Blockstore compaction that was being initiated from
LedgerCleanupService has been disabled for quite some time in favor of
several optimizations.

Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
2022-10-28 10:39:00 +02:00
Tao Zhu 8bb039d08d
collect min prioritization fees when replaying sanitized transactions (#26709)
* Collect blocks' minimum prioritization fees when replaying sanitized transactions

* Limits block min-fee metrics reporting to top 10 writable accounts

* Add service thread to asynchronously update and finalize prioritization fee cache

* Add bench test for prioritization_fee_cache

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2022-08-31 08:00:55 -05:00
Brennan Watt 46a48760db
Switch concurrent replay from feature to param (#27401)
* Switch concurrent replay from feature to param
2022-08-26 12:36:02 -07:00
Tyera Eulberg b8b3d723da
Use new client crates (#27360)
* Update ancillary cli crates

* Update cli

* Update command-line tools

* Update rpc, etc

* Update client-test

* Update core, validator

* Update local-cluster
2022-08-24 10:47:02 -06:00
Jeff Biseda e50013acdf
Handle JsonRpcService startup failure (#27075) 2022-08-11 23:25:20 -07:00
Jeff Biseda 857be1e237
sign repair requests (#26833) 2022-07-31 15:48:51 -07:00
Tao Zhu f13b5c832d
Remove obsoleted metrics reporting to reduce lock contention on cost_model (#26608)
remove obsoleted metrics reporting to reduce lock contention on cost_model
2022-07-14 23:02:49 -05:00
Nicholas Clarke ee0a40937e
Add validator argument log_messages_bytes_limit to change log truncation limit.
Add new cli argument log_messages_bytes_limit to solana-validator to control how long program logs can be before truncation
2022-07-11 10:53:18 -05:00
behzad nouri 6f4838719b
decouples shreds sig-verify from tpu vote and transaction packets (#26300)
Shreds have different workload and traffic pattern from TPU vote and
transaction packets. Some of recent changes to SigVerifyStage are not
suitable or at least optimal for shreds sig-verify; e.g. random discard,
dedup with false positives, discard excess by IP-address, ...

SigVerifier trait is meant to abstract out the distinctions between the
two pipelines, but in practice it has led to more verbose and convoluted
code.

This commit discards SigVerifier implementation for shreds sig-verify
and instead provides a standalone stage for verifying shreds signatures.
2022-07-07 11:13:13 +00:00
behzad nouri d33c548660
bypasses window-service stage before retransmitting shreds (#26291)
With recent patches, window-service recv-window does not do much other
than redirecting packets/shreds to downstream channels.
The commit removes window-service recv-window and instead sends
packets/shreds directly from sigverify to retransmit-stage and
window-service insert thread.
2022-07-06 11:49:58 +00:00
Nick Rempel 7e4a5de99c
Refactor ConnectionCache::use_quic (#26235)
* Remove UseQuic type

Move to storing the UdpSocket on ConnectionCache and accepting a bool

* Remove use_quic from ConnectionCache constructor

Replace with separate with_udp constructor to force callers to choose
2022-07-05 10:49:42 -07:00
behzad nouri 61f0a7d9c3
replaces Mutex<PohRecorder> with RwLock<PohRecorder> (#26370)
Mutex causes superfluous lock contention when a read-only reference suffices.
2022-07-05 14:29:44 +00:00
behzad nouri 348fe9ebe2
verifies shred slot and parent in fetch stage (#26225)
Shred slot and parent are not verified until window-service where
resources are already wasted to sig-verify and deserialize shreds.
This commit moves above verification to earlier in the pipeline in fetch
stage.
2022-06-28 12:45:50 +00:00
behzad nouri 39ca788b95
discards shreds in sigverify if the slot leader is the node itself (#26229)
Shreds are dropped in window-service if the slot leader is the node
itself:
https://github.com/solana-labs/solana/blob/cd2878acf/core/src/window_service.rs#L181-L185

However this is done after wasting resources verifying signature on
these shreds, and requires a redundant 2nd lookup of the slot leader.

This commit instead discards such shreds in sigverify stage where we
already know the leader for the slot.
2022-06-27 20:12:23 +00:00
Ryo Onodera cd2878acf9
Avoid to miss to root for local slots before the hard fork (#19912)
* Make sure to root local slots even with hard fork

* Address review comments

* Cleanup a bit

* Further clean up

* Further clean up a bit

* Add comment

* Tweak hard fork reconciliation code placement
2022-06-26 15:14:17 +09:00
HaoranYi b5d0c7b468
Revert "tvu and tpu timeout on joining its microservices (#24111)" (#26132)
This reverts commit e105547c14.
2022-06-22 10:57:46 -05:00
behzad nouri 1f0f5dc03e verifies shred-version in fetch stage
Shred versions are not verified until window-service where resources are
already wasted to sig-verify and deserialize shreds.
The commit verifies shred-version earlier in the pipeline in fetch stage.
2022-06-22 12:17:37 +00:00
Jon Cinque 79a8ecd0ac
client: Remove static connection cache, plumb it instead (#25667)
* client: Remove static connection cache, plumb it instead

* Add TpuClient::new_with_connection_cache to not break downstream

* Refactor get_connection and RwLock into ConnectionCache

* Fix merge conflicts from new async TpuClient

* Remove `ConnectionCache::set_use_quic`

* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
2022-06-08 13:57:12 +02:00
Yueh-Hsuan Chiang 5b67960c76
(Refactor) Move blocktore options related stuff to blockstore_options.rs (#25509)
#### Problem
blockstore_db.rs has a mutual dependency between blockstore_metrics.rs.

#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.

By doing this, we address the mutual dependency and also make the code cleaner.
2022-05-26 16:59:26 -07:00
carllin 9651cdad99
Refactor Sigverify trait (#25359) 2022-05-24 16:01:41 -05:00
Pankaj Garg 88c16c0176
Check if quic is enabled before warming up quic connections (#24821)
* Check if quic is enabled before warming up quic connections

* fix after rebase

* don't start warmup service if quic not enabled

* fix test
2022-05-01 03:52:38 +00:00
sakridge 5a430c15e2
Separate sigverify metrics for each verifier (#24744) 2022-04-28 01:16:17 -07:00
Michael Vines 84e3342612 Process blockstore after starting the TVU 2022-04-22 21:17:49 -07:00
Justin Starry c544742091
Local cluster test cleanup and refactoring (#24559)
* remove FixedSchedule.start_epoch

* use duration for timing

* Rename to partition bool to turbine_disabled

* simplify partition config
2022-04-22 12:14:07 +08:00
Michael Vines 0e2e0c8b7d Extract most storage-related services from the Tvu abstraction 2022-04-19 19:23:58 -07:00
Michael Vines dd766042df Remove LedgerMetricReportService from TVU 2022-04-19 19:23:58 -07:00
sakridge 1b7d1f78de
Implement QUIC connection warmup service for future leaders (#24054)
* Increase connection timeouts

* Bump quic connection cache to 1024

* Use constant for quic connection timeout and add warm cache service

* Fixes to QUIC warmup service

* fix check failure

* fixes after rebase

* fix timeout test

Co-authored-by: Pankaj Garg <pankaj@solana.com>
2022-04-15 12:09:24 -07:00
HaoranYi e105547c14
tvu and tpu timeout on joining its microservices (#24111)
* panic when test timeout

* nonblocking send when when droping banks

* debug log

* timeout for tvu

* unused varaible

* timeout for tpu

* Revert "debug log"

This reverts commit da780a3301a51d7c496141a85fcd35014fe6dff5.

* add timeout const

* fix typo

* Revert "nonblocking send when when droping banks".
I will create another pull request for this.

This reverts commit 088c98ec0facf825b5eca058fb860deba6d28888.

* Update core/src/tpu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tpu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tvu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tvu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-07 20:20:13 -05:00
Brooks Prumo c322842257
Replace channel with Mutex<Option> for AccountsPackage (#24013) 2022-04-06 05:47:19 -05:00
carllin 4ea59d8cb4
Set drop callback on first root bank (#23999) 2022-04-05 13:02:33 -05:00
Jeff Washington (jwash) c24de17278
remove index hash calculation as an option (#23928) 2022-03-25 15:32:53 -05:00
Jeff Washington (jwash) db5d68f01f
HashCalc uses self.accounts_hash_cache_path (#23882) 2022-03-24 09:31:55 -05:00
Michael Vines d44f3d7216 Remove unhelpful log message 2022-03-15 17:07:13 -07:00
Tao Zhu 61cead9b9b Remove injection of exit signal into cost_update_service 2022-03-15 09:58:56 -05:00
Tyera Eulberg 102dd68a03
Rename AccountsDb plugins to Geyser plugins (#23604) 2022-03-14 19:18:46 -06:00
Carl Lin c878c9e2cb Revert "1. Persist to blockstore less frequently;"
This reverts commit 7aa1fb4e24.
2022-03-08 17:18:44 -08:00
Yueh-Hsuan Chiang b8b7163b66
(Ledger Store) Report RocksDB Column Family Metrics (#22503)
This PR enables blockstore to periodically report RocksDB column family properties.
The reported properties are under blockstore_rocksdb_cfs, and the properties also
support group by operation on cf_name.
2022-03-05 16:13:03 -08:00
buffalu 70ebab2c82
Add rustfmt.toml and `cargo fmt` (#23238)
* fmt

* formatted

Co-authored-by: Lucas B <buffalu@jito.network>
2022-02-19 13:32:29 +08:00
Ashwin Sekar ab92578b02
Fix the flaky test test_restart_tower_rollback (#23129)
* Add flag to disable voting until a slot to avoid duplicate voting

* Fix the tower rollback test and remove it from flaky.
2022-02-15 13:19:34 -07:00
Tao Zhu 7aa1fb4e24 1. Persist to blockstore less frequently;
2. reduce alpha for EMA to 1 percent to have roughly 200 data points for estimatio
2022-02-08 16:18:23 -06:00
Tao Zhu e52e48076e
bench should update leader schedule cache (#22991) 2022-02-08 02:28:28 +00:00