3329 Commits

Author SHA1 Message Date
Lucas Steuernagel b97b3dd4ab
Use BankForks on tests - Part 3 (#34248)
* Add BankForks to core tests

* Refactor functions under DCOU
2023-12-01 13:47:22 -03:00
Andrew Fitzgerald 2294801954
Do not derive Copy for EpochSchedule and Rent (#32767) 2023-12-01 07:57:25 -08:00
Andrew Fitzgerald 18309ba8da
TransactionScheduler: Schedule Filter (#34252) 2023-11-30 14:41:11 -08:00
steviez 479b7ee9f2
Bubble up errors in bank_fork_utils instead of exiting process (#34277)
There are operations in bank_fork_utils that may fail; we explicitly
call std::process::exit() on several of these. Granted we may end up
still exit the process higher up the call stack, but bubbling the errors
up allows a caller that could handle the error to do so.
2023-11-30 16:35:59 -06:00
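The change described in the commit above can be illustrated with a minimal sketch. The names here (`LoadError`, `starting_slot`, `starting_slot_or_genesis`) are hypothetical and are not taken from the repository; the point is only that a fallible helper returns a `Result` rather than calling `std::process::exit()`, so that a caller able to recover may do so.

```rust
use std::fmt;

/// Hypothetical error type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum LoadError {
    NoSnapshot,
}

impl fmt::Display for LoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LoadError::NoSnapshot => write!(f, "no snapshot available"),
        }
    }
}

impl std::error::Error for LoadError {}

/// Returns the failure to the caller instead of exiting the process.
/// `snapshot_slot` stands in for whatever state the real code inspects.
pub fn starting_slot(snapshot_slot: Option<u64>) -> Result<u64, LoadError> {
    snapshot_slot.ok_or(LoadError::NoSnapshot)
}

/// A caller that can handle the error falls back to slot 0 (assumed here
/// to mean "start from genesis"); other callers remain free to exit.
pub fn starting_slot_or_genesis(snapshot_slot: Option<u64>) -> u64 {
    starting_slot(snapshot_slot).unwrap_or(0)
}
```

With this shape, the decision to terminate moves to the outermost caller, which is the only place that knows whether the failure is fatal.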
steviez 935e06f8f1
Output BankHashDetails file when leader drops its own block (#34256)
Currently, the file is generated when a node drops a block that was
produced by another node. However, it would also be beneficial to see
the account state when a node drops its own block.

Output the file in this additional failure codepath.
2023-11-29 17:20:27 -06:00
Andrew Fitzgerald e949ef9daa
Trailing _us for scheduler time metrics (#34263) 2023-11-29 13:18:56 -08:00
Tyera e425c1acaa
Fix mac build (#34264)
Allow dead code instead
2023-11-29 04:07:13 +00:00
Tyera 2bde5c3cb2
Tag InterestingLimit enum with target_os (#34259)
Only linux has interesting limits
2023-11-29 00:52:32 +00:00
Andrew Fitzgerald df8893772e
TransactionScheduler: Clean already processed or old transactions from container (#34233) 2023-11-28 16:25:12 -08:00
Andrew Fitzgerald 656ec4bdf0
Bump prio-graph to 0.2.0 (#34235) 2023-11-28 08:23:06 -08:00
Andrew Fitzgerald 449d375565
Add metric for number of unschedulable transactions (#34230) 2023-11-28 08:20:53 -08:00
Andrew Fitzgerald 005c825b5c
Validate account locks on buffering (#34229) 2023-11-27 11:21:10 -08:00
Ikko Eltociear Ashimine c6451e9441
Fix typo in multi_iterator_scanner.rs (#34215)
targetting -> targeting
2023-11-24 23:12:57 -06:00
Ashwin Sekar 504f2ee892
Add deepest slot metric (#34186)
* reset to deepest slot when last vote is for an invalid fork.

* pr feedback: comments, height starts at 1
2023-11-21 04:07:00 -05:00
Andrew Fitzgerald 8a298f1628
TransactionScheduler: detailed consume worker metrics (#33895) 2023-11-20 10:46:04 -08:00
Brooks e02f25d5a2
Removes filler accounts (#34115) 2023-11-19 20:36:57 -05:00
Jeff Washington (jwash) 7dd8d4bb64
reverse logic on linux_report_network_limits (#34159) 2023-11-17 18:12:24 -06:00
Trent Nelson 2d3333cb46
system monitor: don't suggest query-only limits (#34149) 2023-11-17 13:18:37 -07:00
Andrew Fitzgerald 9bb82a3901
TransactionScheduler: Scheduler Count and Timing metrics (#33893) 2023-11-17 10:18:58 -08:00
Ashwin Sekar fb76b4cb6c
reduce locking in propagated check for VoteStateUpdate (#33997) 2023-11-15 01:24:30 -05:00
Tyera 0e91e96967
Geyser: add starting entry to ReplicaEntryInfo(V2) (#33963)
* Add ReplicaEntryInfoV2

* Add starting_transaction_index field to EntryNotification

* Populate starting_transaction_index in replay stage

* Cache and populate starting_transaction_index in banking stage

* Build ReplicaEntryInfoV2
2023-11-14 09:49:26 -07:00
Brooks 725ab37bf4
clippy: Replaces .get(0) with .first() (#34048) 2023-11-13 17:22:17 -05:00
Andrew Fitzgerald 81a007b3c8
TransactionScheduler: CLI and hookup for central-scheduler (#33890) 2023-11-13 22:18:54 +08:00
steviez b91da2242d
Change Blockstore max_root from RwLock<Slot> to AtomicU64 (#33998)
The Blockstore currently maintains a RwLock<Slot> of the maximum root
it has seen inserted. The value is initialized during
Blockstore::open() and updated during calls to Blockstore::set_roots().
The max root is queried fairly often for several use cases, and caching
the value is cheaper than constructing an iterator to look it up every
time.

However, the access pattern of this RwLock matches that of an atomic.
That is, there is no critical section of code that is run while the
lock is held. Rather, read/write locks are acquired only in order to
read/update, respectively. So, change the RwLock<Slot> to an AtomicU64.
2023-11-10 17:27:43 -06:00
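The reasoning in the commit above can be sketched as follows. This is not the Blockstore code: `MaxRoot` and its methods are hypothetical names, and `Ordering::Relaxed` is an assumption made for the sketch. It shows why a value that is only ever read or overwritten, with no critical section around it, fits an `AtomicU64`.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed alias; a slot is a plain u64 in this sketch.
type Slot = u64;

/// Hypothetical stand-in for the cached maximum root.
pub struct MaxRoot {
    value: AtomicU64,
}

impl MaxRoot {
    /// Initialized once, analogous to what happens when the store is opened.
    pub fn new(initial: Slot) -> Self {
        Self { value: AtomicU64::new(initial) }
    }

    /// A read is a single atomic load; no lock guard is held afterwards.
    pub fn get(&self) -> Slot {
        self.value.load(Ordering::Relaxed)
    }

    /// Raises the cached value if `root` is larger; never lowers it.
    pub fn update(&self, root: Slot) {
        self.value.fetch_max(root, Ordering::Relaxed);
    }
}
```

Because both operations complete in one atomic instruction, readers never block writers, which is the benefit the commit message points to.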
Ashwin Sekar b5256997f8
refactor: GossipDuplicateConfirmed/cluster_confirmed -> DuplicateConf… (#34012)
refactor: GossipDuplicateConfirmed/cluster_confirmed -> DuplicateConfirmed
2023-11-10 14:47:42 -05:00
behzad nouri 3ac2507d36
adds keep-alive-interval to repair QUIC transport config (#33866)
QUIC connections may time out due to infrequent repair requests. The commit
configures keep_alive_interval and max_idle_timeout to avoid timeouts.
2023-11-08 20:09:23 +00:00
Lijun Wang 69cec7e7b7
Remove RwLock on BlockNotifier (#33981) 2023-11-08 10:27:50 -08:00
steviez 73815aee51
Move and rename ledger services from core to ledger (#33947)
These services currently live in core/; however, they operate on the
ledger. More so, these two services operate on the blockstore only,
and not necessarily the entire ledger. So, it makes sense to move these
services out of core and into ledger. We've recently been doing similar
changes with breaking things out into individual crates in order to
reduce the scope of core.

So, this change moves the services from core/ to ledger/, and replaces
ledger with blockstore.
2023-11-08 11:58:31 -06:00
Lijun Wang eba1b2d3e3
Remove RwLock on TransactionNotifier (#33962)
* Remove RwLock on TransactionNotifier
2023-11-07 10:28:56 -08:00
Tyera d6ac9bea84
Geyser: return real parent blockhash, or default (#33873)
Return real parent blockhash, or default
2023-11-06 11:14:18 -07:00
Liam Vovk e840b9759a
Remove RWLock from EntryNotifier because it causes perf degradation (#33797)
* Remove RWLock from EntryNotifier because it causes perf degradation when entry notifications are enabled on geyser

* remove unused RWLock

* Remove RWLock
2023-11-06 00:55:36 -08:00
Jeff Biseda 63abc72e86
remove unused replay-loop-voting-stats values (#33935) 2023-10-31 23:40:45 -07:00
Jeff Biseda 3f805ad06d
improve batch_send error handling (#33936) 2023-10-31 23:39:26 -07:00
Ryo Onodera 136ab21f34
Define InstalledScheduler::wait_for_termination() (#33922)
* Define InstalledScheduler::wait_for_termination()

* Rename to wait_for_scheduler_termination

* Comment wait_for_termination and WaitReason better
2023-10-31 14:33:36 +09:00
Ryo Onodera 080285cb95
Adjust solana-core for cleaner scheduler-pr diff (#33881) 2023-10-27 12:29:41 +09:00
Andrew Fitzgerald ba112a021a
TransactionScheduler: SchedulerController (#33825) 2023-10-27 09:30:51 +08:00
behzad nouri e555a61c78
adds metrics to repair QUIC endpoint (#33818) 2023-10-25 18:59:14 +00:00
Pankaj Garg 78c31aa6b8
Use program cache fork graph in extract() (#33806)
* Use program cache fork graph instead of WorkingSlot trait

* Fix deadlocked tests

* keep WorkingSlot trait for now
2023-10-25 06:04:38 -07:00
steviez 9ffbe2afd8
Replace several .expect() statements with error handling (#33783) 2023-10-24 23:48:21 +02:00
Andrew Fitzgerald b0dcaf29e3
TransactionScheduler: Consume Scheduler w/ PrioGraph (#33612) 2023-10-24 11:33:04 +08:00
Jeff Washington (jwash) b0b4e1f0c0
remove IncludeSlotInHash after feature activation on mnb (#33816)
* remove IncludeSlotInHash after feature activation on mnb

* fix compile errors

* compile errors

* fix tests

* fix test results
2023-10-23 15:12:02 -07:00
Pankaj Garg 9d42cd7efe
Initialize fork graph in program cache during bank_forks creation (#33810)
* Initialize fork graph in program cache during bank_forks creation

* rename BankForks::new to BankForks::new_rw_arc

* fix compilation

* no need to set fork_graph on insert()

* fix partition tests
2023-10-23 09:32:41 -07:00
steviez 56ccffdaa5
Replace get_tmp_ledger_path!() with self cleaning version (#33702)
This macro is used a lot for tests to create a ledger path in order to
open a Blockstore. Files will be left on disk unless the test remembers
to call Blockstore::destroy() on the directory. So, instead of requiring
this, use the get_tmp_ledger_path_auto_delete!() macro that creates a
TempDir (which automatically deletes itself when it goes out of scope).
2023-10-21 11:38:31 +02:00
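The self-cleaning behavior described in the commit above relies on a value that removes its directory when dropped. The sketch below reproduces that idea with the standard library only; `AutoDeleteDir` is a hypothetical type written for illustration, not the macro or the `TempDir` type the commit refers to, and the process-id suffix is an assumption used to keep paths distinct.

```rust
use std::path::{Path, PathBuf};
use std::{env, fs, io, process};

/// Hypothetical guard: owns a directory and deletes it on drop.
pub struct AutoDeleteDir {
    path: PathBuf,
}

impl AutoDeleteDir {
    /// Creates `<system temp dir>/<name>-<pid>`.
    pub fn new(name: &str) -> io::Result<Self> {
        let path = env::temp_dir().join(format!("{}-{}", name, process::id()));
        fs::create_dir_all(&path)?;
        Ok(Self { path })
    }

    pub fn path(&self) -> &Path {
        &self.path
    }
}

impl Drop for AutoDeleteDir {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored because drop cannot fail.
        let _ = fs::remove_dir_all(&self.path);
    }
}
```

A test that binds the guard to a local variable gets cleanup for free when the variable goes out of scope, including on early return or panic unwinding.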
Ryo Onodera 5a963529a8
Add BankWithScheduler for upcoming scheduler code (#33704)
* Add BankWithScheduler for upcoming scheduler code

* Remove too confusing insert_without_scheduler()

* Add doc comment as a bonus

* Simplify BankForks::banks()

* Add derive(Debug) on BankWithScheduler
2023-10-21 15:56:43 +09:00
behzad nouri e0b59a6f53
prunes turbine QUIC connections (#33663)
The commit implements lazy eviction for turbine QUIC connections.
The cache is allowed to grow to 2 x capacity at which point at least
half of the entries with lowest stake are evicted, resulting in an
amortized O(1) performance.
2023-10-20 21:52:37 +00:00
behzad nouri dc3c827299
prunes repair QUIC connections (#33775)
The commit implements lazy eviction for repair QUIC connections.
The cache is allowed to grow to 2 x capacity at which point at least
half of the entries with lowest stake are evicted, resulting in an
amortized O(1) performance.
2023-10-20 17:50:54 +00:00
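The lazy eviction described in the two commits above (turbine and repair) can be sketched as below. This is an assumption-laden illustration, not the connection cache from the repository: `StakedCache`, its string keys, and its `u64` stakes are hypothetical. The cache is allowed to reach twice its capacity and is then cut back to the `capacity` highest-stake entries in one pass, so the pruning cost is spread over the inserts that preceded it.

```rust
use std::collections::HashMap;

/// Hypothetical cache keyed by a peer identifier, valued by stake.
pub struct StakedCache {
    capacity: usize,
    entries: HashMap<String, u64>,
}

impl StakedCache {
    pub fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new() }
    }

    /// Inserts an entry and prunes only once the cache hits 2 x capacity.
    pub fn insert(&mut self, key: String, stake: u64) {
        self.entries.insert(key, stake);
        if self.entries.len() >= 2 * self.capacity {
            self.prune();
        }
    }

    /// Keeps the `capacity` entries with the highest stake.
    fn prune(&mut self) {
        let mut by_stake: Vec<(String, u64)> = self.entries.drain().collect();
        if by_stake.len() > self.capacity {
            // Partial selection (average O(n)): after this call the first
            // `capacity` elements are the ones with the highest stake.
            by_stake.select_nth_unstable_by(self.capacity, |a, b| b.1.cmp(&a.1));
            by_stake.truncate(self.capacity);
        }
        self.entries = by_stake.into_iter().collect();
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }
}
```

Using a selection step rather than a full sort keeps each prune linear on average, which is what makes the amortized cost per insert constant.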
Pankaj Garg 59cb3b57ee
Set a global fork graph in program cache (#33776)
* Set a global fork graph in program cache

* fix deadlock

* review feedback
2023-10-20 08:47:03 -07:00
behzad nouri 7aa0faea96
separates out routing repair requests from establishing connections (#33742)
Currently each outgoing repair request will attempt to establish a
connection if one does not already exist. This is very wasteful and
consumes many tokio tasks if the remote node is down or unresponsive.

The commit decouples routing packets from establishing connections by
adding a buffering channel for each remote address. Outgoing packets are
always sent down this channel to be processed once the connection is
established. If the connection attempt fails, all packets already pushed to
the channel are dropped at once, reducing the number of attempts to make
a connection if the remote node is down or unresponsive.
2023-10-19 13:25:53 +00:00
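The decoupling described in the commit above can be sketched with one buffering channel per remote address. The sketch is synchronous and uses `std::sync::mpsc` for brevity, whereas the commit talks about tokio tasks; `Router`, `route`, and the string addresses are hypothetical. A new receiver is handed out only when no live channel exists, which is the signal to start a single connection attempt. Dropping that receiver on failure discards every buffered packet at once.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical router holding one buffering channel per remote address.
#[derive(Default)]
pub struct Router {
    channels: HashMap<String, Sender<Vec<u8>>>,
}

impl Router {
    pub fn new() -> Self {
        Self::default()
    }

    /// Queues `packet` for `addr`. Returns `Some(receiver)` only when a new
    /// channel had to be created, i.e. when the caller should start one
    /// connection attempt that will later drain the receiver.
    pub fn route(&mut self, addr: &str, packet: Vec<u8>) -> Option<Receiver<Vec<u8>>> {
        let packet = match self.channels.get(addr) {
            Some(sender) => match sender.send(packet) {
                // A connection (attempt) already owns the receiver.
                Ok(()) => return None,
                // Receiver was dropped: the earlier attempt failed, so the
                // packet is handed back and a fresh channel is created.
                Err(err) => err.0,
            },
            None => packet,
        };
        let (sender, receiver) = channel();
        sender.send(packet).expect("receiver is still in scope");
        self.channels.insert(addr.to_string(), sender);
        Some(receiver)
    }
}
```

Routing a packet therefore never triggers more than one pending connection attempt per address, regardless of how many packets are queued in the meantime.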
steviez 8bd0e4cd95
Change getHealth to compare optimistically confirmed slots (#33651)
The current getHealth mechanism checks a local accounts hash slot vs.
those of other nodes as specified by --known-validator. This is a
very coarse comparison given that the default for this value is 100
slots. More so, any nodes using a value larger than the default
(i.e. --incremental-snapshot-interval 500) will likely see getHealth
return status behind at some point.

Change the underlying mechanism of how health is computed. Instead of
using the accounts hash slots published in gossip, use the latest
optimistically confirmed slot from the cluster. Even when a node is
behind, it is able to observe the cluster's optimistically confirmed
slots by viewing votes published in gossip.

Thus, the latest cluster optimistically confirmed slot can be compared
against the latest optimistically confirmed bank from replay to
determine health. This new comparison is much more granular, and not
needing to depend on individual known validators is also a plus.
2023-10-16 11:21:33 -05:00
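The comparison described in the commit above can be sketched as a pure function. Everything named here is hypothetical (`Health`, `check_health`, and the `max_distance` threshold are assumptions for illustration, not the RPC implementation): the latest optimistically confirmed slot seen from the cluster is compared with the latest optimistically confirmed bank from replay, and the node is reported as behind when the gap exceeds a threshold.

```rust
/// Assumed alias; a slot is a plain u64 in this sketch.
type Slot = u64;

/// Hypothetical health result.
#[derive(Debug, PartialEq)]
pub enum Health {
    Ok,
    Behind { num_slots: Slot },
    Unknown,
}

/// Compares the cluster's latest optimistically confirmed slot with the
/// local one. `max_distance` is an assumed tolerance in slots.
pub fn check_health(
    cluster_confirmed: Option<Slot>,
    local_confirmed: Option<Slot>,
    max_distance: Slot,
) -> Health {
    match (cluster_confirmed, local_confirmed) {
        (Some(cluster), Some(local)) => {
            // saturating_sub covers the case where the local node is ahead
            // of what it has observed from the cluster.
            let distance = cluster.saturating_sub(local);
            if distance <= max_distance {
                Health::Ok
            } else {
                Health::Behind { num_slots: distance }
            }
        }
        // Without both values no comparison can be made.
        _ => Health::Unknown,
    }
}
```

Because the distance is measured in confirmed slots rather than snapshot intervals, the result does not depend on how often any particular validator produces snapshots.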
Brooks 452fd5d384
Adds `--no-skip-initial-accounts-db-clean` *hidden* CLI flag (#33664) 2023-10-12 13:32:40 -04:00