Commit Graph

593 Commits

Author SHA1 Message Date
steviez 479b7ee9f2
Bubble up errors in bank_fork_utils instead of exiting process (#34277)
There are operations in bank_fork_utils that may fail; we explicitly
call std::process::exit() on several of these. Granted, we may end up
exiting the process higher up the callstack, but bubbling the errors up
allows a caller that could handle the error to do so.
2023-11-30 16:35:59 -06:00
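A minimal sketch of the pattern, using a hypothetical `load_bank_forks` function and `BankForksUtilsError` type (neither is taken from the actual change): instead of calling std::process::exit(), the fallible function returns a Result, and the caller decides whether to exit or recover.

```rust
use std::fmt;

/// Hypothetical error type; the real crate's error enum may differ.
#[derive(Debug, PartialEq)]
pub enum BankForksUtilsError {
    NoSnapshotsFound,
}

impl fmt::Display for BankForksUtilsError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BankForksUtilsError::NoSnapshotsFound => write!(f, "no snapshots found"),
        }
    }
}

impl std::error::Error for BankForksUtilsError {}

/// Hypothetical fallible step. Previously a failure like this would have
/// called std::process::exit(1); now the error is returned to the caller.
pub fn load_bank_forks(snapshot_slots: &[u64]) -> Result<u64, BankForksUtilsError> {
    snapshot_slots
        .iter()
        .copied()
        .max()
        .ok_or(BankForksUtilsError::NoSnapshotsFound)
}

/// A caller higher up the callstack may still choose to exit, but it can
/// also recover; here it falls back to slot 0 purely for illustration.
pub fn load_or_default(snapshot_slots: &[u64]) -> u64 {
    match load_bank_forks(snapshot_slots) {
        Ok(slot) => slot,
        Err(_) => 0,
    }
}
```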
Brooks e02f25d5a2
Removes filler accounts (#34115) 2023-11-19 20:36:57 -05:00
Andrew Fitzgerald 81a007b3c8
TransactionScheduler: CLI and hookup for central-scheduler (#33890) 2023-11-13 22:18:54 +08:00
steviez b91da2242d
Change Blockstore max_root from RwLock<Slot> to AtomicU64 (#33998)
The Blockstore currently maintains a RwLock<Slot> of the maximum root
it has seen inserted. The value is initialized during
Blockstore::open() and updated during calls to Blockstore::set_roots().
The max root is queried fairly often for several use cases, and caching
the value is cheaper than constructing an iterator to look it up every
time.

However, the access patterns of this RwLock match those of an atomic.
That is, there is no critical section of code that is run while the
lock is held. Rather, read/write locks are acquired only in order to
read/update the value, respectively. So, change the RwLock<Slot> to an AtomicU64.
2023-11-10 17:27:43 -06:00
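A minimal sketch of the cached max root as an AtomicU64. `MaxRootCache` is a hypothetical stand-in for the relevant part of Blockstore, and the use of fetch_max with Ordering::Relaxed is an assumption about how such a field could be updated, not necessarily what the commit does.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

pub type Slot = u64;

/// Hypothetical stand-in for the Blockstore's cached max root.
pub struct MaxRootCache {
    // Previously RwLock<Slot>. No code runs while the lock is held,
    // so a single atomic provides the same read/update behavior.
    max_root: AtomicU64,
}

impl MaxRootCache {
    /// Initialized once, analogous to Blockstore::open().
    pub fn new(initial: Slot) -> Self {
        Self { max_root: AtomicU64::new(initial) }
    }

    /// Cheap read; no lock and no iterator over the roots column.
    pub fn max_root(&self) -> Slot {
        self.max_root.load(Ordering::Relaxed)
    }

    /// Analogous to Blockstore::set_roots(): fetch_max keeps the cached
    /// value monotonic even if two writers race.
    pub fn set_roots(&self, roots: &[Slot]) {
        if let Some(&max) = roots.iter().max() {
            self.max_root.fetch_max(max, Ordering::Relaxed);
        }
    }
}
```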
Ashwin Sekar b5256997f8
refactor: GossipDuplicateConfirmed/cluster_confirmed -> DuplicateConf… (#34012)
refactor: GossipDuplicateConfirmed/cluster_confirmed -> DuplicateConfirmed
2023-11-10 14:47:42 -05:00
steviez 73815aee51
Move and rename ledger services from core to ledger (#33947)
These services currently live in core/; however, they operate on the
ledger. More so, these two services operate on the blockstore only,
and not necessarily the entire ledger. So, it makes sense to move these
services out of core and into ledger. We've recently been doing similar
changes with breaking things out into individual crates in order to
reduce the scope of core.

So, this change moves the services from core/ to ledger/, and replaces
"ledger" with "blockstore" in their names.
2023-11-08 11:58:31 -06:00
Lijun Wang eba1b2d3e3
Remove RwLock on TransactionNotifier (#33962)
* Remove RwLock on TransactionNotifier
2023-11-07 10:28:56 -08:00
Liam Vovk e840b9759a
Remove RWLock from EntryNotifier because it causes perf degradation (#33797)
* Remove RWLock from EntryNotifier because it causes perf degradation when entry notifications are enabled on geyser

* remove unused RWLock

* Remove RWLock
2023-11-06 00:55:36 -08:00
Ryo Onodera 080285cb95
Adjust solana-core for cleaner scheduler-pr diff (#33881) 2023-10-27 12:29:41 +09:00
steviez 9ffbe2afd8
Replace several .expect() statements with error handling (#33783) 2023-10-24 23:48:21 +02:00
Pankaj Garg 9d42cd7efe
Initialize fork graph in program cache during bank_forks creation (#33810)
* Initialize fork graph in program cache during bank_forks creation

* rename BankForks::new to BankForks::new_rw_arc

* fix compilation

* no need to set fork_graph on insert()

* fix partition tests
2023-10-23 09:32:41 -07:00
behzad nouri e0b59a6f53
prunes turbine QUIC connections (#33663)
The commit implements lazy eviction for turbine QUIC connections.
The cache is allowed to grow to 2 x capacity at which point at least
half of the entries with lowest stake are evicted, resulting in an
amortized O(1) performance.
2023-10-20 21:52:37 +00:00
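A minimal sketch of the lazy eviction described above, assuming a generic cache keyed by node with a stake value per entry (`LazyEvictionCache` is a hypothetical name, not the turbine type). When the map reaches 2 x capacity, the lowest-staked entries are evicted until `capacity` remain; since that O(n) pass happens at most once every `capacity` inserts, the cost per insert is amortized O(1).

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical cache: key -> (stake, value).
pub struct LazyEvictionCache<K, V> {
    capacity: usize,
    entries: HashMap<K, (u64, V)>,
}

impl<K: Hash + Eq + Clone, V> LazyEvictionCache<K, V> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, entries: HashMap::new() }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }

    pub fn get(&self, key: &K) -> Option<&V> {
        self.entries.get(key).map(|(_, value)| value)
    }

    /// Inserts without evicting until the cache reaches 2 x capacity.
    pub fn insert(&mut self, key: K, stake: u64, value: V) {
        self.entries.insert(key, (stake, value));
        if self.entries.len() >= 2 * self.capacity {
            self.prune();
        }
    }

    /// Evicts the lowest-staked entries so that `capacity` remain,
    /// i.e. at least half of the entries at the 2 x capacity threshold.
    fn prune(&mut self) {
        let mut by_stake: Vec<(u64, K)> = self
            .entries
            .iter()
            .map(|(key, (stake, _))| (*stake, key.clone()))
            .collect();
        let num_evict = by_stake.len() - self.capacity;
        // Partition in O(n) so the `num_evict` smallest stakes come first.
        by_stake.select_nth_unstable_by_key(num_evict - 1, |(stake, _)| *stake);
        for (_, key) in by_stake.into_iter().take(num_evict) {
            self.entries.remove(&key);
        }
    }
}
```

Using a partial selection rather than a full sort keeps each prune linear in the cache size.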
behzad nouri dc3c827299
prunes repair QUIC connections (#33775)
The commit implements lazy eviction for repair QUIC connections.
The cache is allowed to grow to 2 x capacity at which point at least
half of the entries with lowest stake are evicted, resulting in an
amortized O(1) performance.
2023-10-20 17:50:54 +00:00
steviez 8bd0e4cd95
Change getHealth to compare optimistically confirmed slots (#33651)
The current getHealth mechanism compares the local accounts hash slot
against those of the nodes specified by --known-validator. This is a
very coarse comparison given that the default interval for these slots
is 100 slots. More so, any node using a value larger than the default
(e.g. --incremental-snapshot-interval 500) will likely see getHealth
return status behind at some point.

Change the underlying mechanism of how health is computed. Instead of
using the accounts hash slots published in gossip, use the latest
optimistically confirmed slot from the cluster. Even when a node is
behind, it is able to observe the cluster's optimistically confirmed
slots by viewing votes published in gossip.

Thus, the latest cluster optimistically confirmed slot can be compared
against the latest optimistically confirmed bank from replay to
determine health. This new comparison is much more granular, and no
longer depending on individual known validators is also a plus.
2023-10-16 11:21:33 -05:00
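A minimal sketch of the new comparison, assuming a hypothetical `check_health` function and a tolerance parameter `health_check_slot_distance` (the name and the values in the test are illustrative): the node is healthy if its latest optimistically confirmed bank from replay is within that distance of the latest optimistically confirmed slot observed from the cluster.

```rust
/// Hypothetical health status; mirrors the idea, not the RPC types.
#[derive(Debug, PartialEq)]
pub enum Health {
    Ok,
    Behind { num_slots: u64 },
}

/// `cluster_optimistically_confirmed_slot` comes from votes seen in gossip;
/// `my_optimistically_confirmed_slot` comes from local replay.
pub fn check_health(
    cluster_optimistically_confirmed_slot: u64,
    my_optimistically_confirmed_slot: u64,
    health_check_slot_distance: u64,
) -> Health {
    // saturating_sub: being ahead of the observed cluster slot counts as 0 behind.
    let behind = cluster_optimistically_confirmed_slot
        .saturating_sub(my_optimistically_confirmed_slot);
    if behind <= health_check_slot_distance {
        Health::Ok
    } else {
        Health::Behind { num_slots: behind }
    }
}
```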
Brooks 452fd5d384
Adds `--no-skip-initial-accounts-db-clean` *hidden* CLI flag (#33664) 2023-10-12 13:32:40 -04:00
Jeff Biseda 0f82662a7f
allow empty string for SOLANA_METRICS_CONFIG sanity checking (#33515) 2023-10-11 09:58:39 -07:00
Wen 630feeddf2
Add wen_restart module (#33344)
* Add wen_restart module:
- Implement reading LastVotedForkSlots from blockstore.
- Add proto file to record the intermediate results.
- Also link wen_restart into validator.
- Move recreation of tower outside replay_stage so we can get last_vote.

* Update lock file.

* Fix linter errors.

* Fix dependencies order.

* Update wen_restart explanation and small fixes.

* Generate tower outside tvu.

* Update validator/src/cli.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/protos/wen_restart.proto

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/build.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Rename proto directory.

* Rename InitRecord to MyLastVotedForkSlots, add imports.

* Update wen-restart/Cargo.toml

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Move prost-build dependency to project toml.

* No need to continue if the distance between slot and last_vote is
already larger than MAX_SLOTS_ON_VOTED_FORKS.

* Use 16k slots instead of 81k slots, a few more wording changes.

* Use AncestorIterator which does the same thing.

* Update Cargo.lock

* Update Cargo.lock

---------

Co-authored-by: Tyera <teulberg@gmail.com>
2023-10-06 15:04:37 -07:00
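A minimal sketch of reading the last voted fork slots, assuming the blockstore's parent links are modeled as a HashMap from slot to parent slot. `AncestorIterator` here is a hypothetical stand-in for the blockstore iterator of the same name, and the limit is passed in as a parameter rather than hard-coding the 16k value mentioned above.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the blockstore's AncestorIterator: yields the
/// parent, grandparent, ... of `start` by following parent links.
pub struct AncestorIterator<'a> {
    parents: &'a HashMap<u64, u64>,
    current: Option<u64>,
}

impl<'a> AncestorIterator<'a> {
    pub fn new(start: u64, parents: &'a HashMap<u64, u64>) -> Self {
        Self { parents, current: parents.get(&start).copied() }
    }
}

impl Iterator for AncestorIterator<'_> {
    type Item = u64;
    fn next(&mut self) -> Option<u64> {
        let slot = self.current?;
        self.current = self.parents.get(&slot).copied();
        Some(slot)
    }
}

/// Collects the last voted slot plus its ancestors, stopping as soon as an
/// ancestor lies more than `max_slots_on_voted_forks` below the last vote.
/// Assumes every parent slot is lower than its child.
pub fn last_voted_fork_slots(
    last_vote: u64,
    parents: &HashMap<u64, u64>,
    max_slots_on_voted_forks: u64,
) -> Vec<u64> {
    let mut slots = vec![last_vote];
    slots.extend(
        AncestorIterator::new(last_vote, parents)
            .take_while(|slot| last_vote - slot <= max_slots_on_voted_forks),
    );
    slots
}
```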
Ryo Onodera eb262aabe3
Enable the banking trace by default (#33497) 2023-10-04 09:01:28 +09:00
Andrew Fitzgerald e860019687
TransactionScheduler: Pipe BlockProductionMethod (#33217) 2023-09-18 10:05:27 -07:00
Brooks acd7ad96c3
Purges old accounts hash cache dirs (#33183) 2023-09-12 13:10:22 -04:00
behzad nouri e01269a9de
sends repair requests over QUIC protocol (#33016)
The commit implements client-side of serve-repair and
ancestor-hash-service over QUIC protocol.
2023-09-11 22:22:04 +00:00
behzad nouri 7fc6fea8d8
serves remote repair requests from QUIC endpoint (#33069)
The commit implements server-side of repair using QUIC protocol.

UDP repair requests are adapted as RemoteRequest and sent down the same
channel as remote requests arriving over QUIC, and the rest of the
server code is updated to operate on the RemoteRequest type.
2023-09-11 16:57:10 +00:00
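A minimal sketch of funneling both transports into one channel. The `RemoteRequest` fields and the `adapt_udp`/`serve` helpers are hypothetical; the point is that a UDP packet is wrapped in the same type as a QUIC request, so the server loop handles a single type.

```rust
use std::net::SocketAddr;
use std::sync::mpsc::{Receiver, Sender};

/// Hypothetical request type shared by both transports.
pub struct RemoteRequest {
    pub remote_address: SocketAddr,
    pub bytes: Vec<u8>,
    /// QUIC requests carry a channel for the response; adapted UDP
    /// requests do not, and are answered at `remote_address` instead.
    pub response_sender: Option<Sender<Vec<u8>>>,
}

/// Adapts a UDP packet so it can travel down the same channel as QUIC requests.
pub fn adapt_udp(from: SocketAddr, payload: &[u8]) -> RemoteRequest {
    RemoteRequest { remote_address: from, bytes: payload.to_vec(), response_sender: None }
}

/// One loop drains one channel regardless of transport. Returns
/// (address, request length) for each request handled.
pub fn serve(requests: Receiver<RemoteRequest>) -> Vec<(SocketAddr, usize)> {
    let mut handled = Vec::new();
    for request in requests {
        handled.push((request.remote_address, request.bytes.len()));
        if let Some(sender) = request.response_sender {
            // QUIC path: reply on the request's own channel (placeholder echo);
            // a closed receiver is ignored.
            let _ = sender.send(request.bytes);
        }
        // UDP path: a real server would reply to remote_address over the socket.
    }
    handled
}
```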
HaoranYi 2098230d8f
Improve Blockstore error logging (#32929)
* improve Blockstore error logging

* reviews

---------

Co-authored-by: HaoranYi <haoran.yi@solana.com>
2023-08-22 13:56:03 -05:00
steviez a4c8cc3ce0
Remove improper uses of &Arc<Bank> (#32802)
In most cases, either a &Bank or an Arc<Bank> is more proper.
- &Bank is used if the function only needs a momentary reference
- Arc<Bank> is used if the function needs its own copy

This PR leaves several instances of &Arc<Bank> around; these instances
are situations where a clone may only happen conditionally.
2023-08-18 16:46:34 -05:00
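A minimal sketch of the convention, with a placeholder `Bank` and hypothetical `BankHolder`/`maybe_hold` helpers: a momentary reader takes &Bank, an owner takes Arc<Bank> by value, and &Arc<Bank> remains only where the clone is conditional.

```rust
use std::sync::Arc;

/// Placeholder for the real Bank type.
pub struct Bank {
    pub slot: u64,
}

/// Only needs a momentary reference: take &Bank.
/// Callers holding an Arc<Bank> can pass &arc thanks to deref coercion.
pub fn bank_slot(bank: &Bank) -> u64 {
    bank.slot
}

/// Needs its own copy: take Arc<Bank> by value, so any clone is visible
/// at the call site instead of hidden at the bottom of the callstack.
pub struct BankHolder {
    bank: Arc<Bank>,
}

impl BankHolder {
    pub fn new(bank: Arc<Bank>) -> Self {
        Self { bank }
    }

    pub fn slot(&self) -> u64 {
        bank_slot(&self.bank)
    }
}

/// The remaining &Arc<Bank> case: the clone happens only conditionally.
pub fn maybe_hold(bank: &Arc<Bank>, keep: bool) -> Option<BankHolder> {
    keep.then(|| BankHolder::new(Arc::clone(bank)))
}
```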
Jeff Biseda 58cca78067
sanity check metrics configuration (#32799) 2023-08-11 14:38:33 -07:00
Pankaj Garg f4287d70bb
Move accounts-db code to its own crate (#32766) 2023-08-09 13:03:36 -07:00
Tao Zhu ef6af307a4
improve prioritization fee cache accuracy (#32692)
* improve prioritization cache accuracy
2023-08-07 19:27:28 -05:00
steviez 226d7d986b
Simplify root slot lookup from BankForks (#32717)
No need to get an Arc<Bank> when we want the root slot from BankForks;
can just use BankForks::root().
2023-08-04 12:03:22 -06:00
steviez 20fc3a5ded
Remove improper &Arc<Blockstore> instances (#32698)
Update to either &Blockstore if the function just needs a ref, or
Arc<Blockstore> if the function needs to hang onto a copy.
2023-08-03 15:10:25 -06:00
behzad nouri 69336ab5da
resets packet flags obtained from QUIC datagrams (#32673)
Packets obtained from the recycler have dirty meta information and need
to be re-initialized.
2023-08-01 21:50:25 +00:00
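A minimal sketch of the fix, with hypothetical `Meta` and `Packet` types (the field names and the 64-byte buffer are illustrative only): metadata is reset to its default before a recycled packet is filled from a datagram, so flags from the packet's previous use do not leak through.

```rust
/// Hypothetical packet metadata.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Meta {
    pub size: usize,
    pub discard: bool,
    pub forwarded: bool,
}

/// Hypothetical recycled packet with a fixed-size buffer.
pub struct Packet {
    pub buffer: [u8; 64],
    pub meta: Meta,
}

/// Copies a datagram into a recycled packet. Returns false (leaving the
/// packet untouched) if the datagram does not fit.
pub fn fill_from_datagram(packet: &mut Packet, datagram: &[u8]) -> bool {
    if datagram.len() > packet.buffer.len() {
        return false;
    }
    // The fix: re-initialize dirty meta before reuse.
    packet.meta = Meta::default();
    packet.buffer[..datagram.len()].copy_from_slice(datagram);
    packet.meta.size = datagram.len();
    true
}
```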
steviez e337631f32
Cleanup backup_and_clear_blockstore() (#32461)
Some of the cleanup tasks include ...
- Make subfunctions return a Result and allow error handling above
- Add some clarifying comments
- Give backup directory name a more meaningful name
- Add some additional logs (with timing info) for long running parts
2023-07-28 06:43:04 -05:00
Pankaj Garg aba637d5d9
Split snapshot_utils.rs into snapshot_bank_utils.rs (#32612) 2023-07-24 16:31:03 -07:00
Brooks 36b37221f2
Removes old accounts hash cache dir (#32604) 2023-07-24 17:34:56 -04:00
cavemanloverboy ba7d892ebb
sdk: impl `Signer` for all containers (#32181)
* impl signer for all containers

* trivial fixes

---------

Co-authored-by: hanako mumei <81144685+2501babe@users.noreply.github.com>
2023-07-24 21:54:33 +02:00
steviez 4bdd73a234
Minor cleanup in Validator::new() (#32480)
- Use .map_err() instead of match and return
- Adjust log severity and add context to generic "done" logs
2023-07-13 16:44:36 -05:00
behzad nouri a3ada9c5ea
separates out turbine QUIC from TPU implementation (#32368)
* separates out turbine QUIC from TPU implementation

Turbine being tied to QUIC implementation for TPU hinders development
and makes it hard to optimize QUIC specifically for turbine.
The commit separates out turbine QUIC from TPU implementation.

* Update core/src/validator.rs

Co-authored-by: Jon Cinque <me@jonc.dev>

* Update turbine/src/retransmit_stage.rs

Co-authored-by: Jon Cinque <me@jonc.dev>

---------

Co-authored-by: Jon Cinque <me@jonc.dev>
2023-07-12 14:15:28 +00:00
Jeff Biseda bad5197cb0
refactor core to create repair module (#32303) 2023-07-05 12:20:46 -07:00
steviez d5ad29d837
Make Blockstore::scan_and_fix_roots() take optional start/stop slots (#32289)
The optional args allow reuse by the ledger-tool repair roots command.
Also, hold the cleanup lock for the duration of Blockstore::scan_and_fix_roots().

This prevents a scenario where scan_and_fix_roots() could identify a
slot as needing to be marked root, that slot getting cleaned by
LedgerCleanupService, and then scan_and_fix_roots() marking the slot as
root on the now purged slot.
2023-06-28 22:32:03 -05:00
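A minimal sketch of why the lock is held across the whole call, using a hypothetical, heavily simplified Blockstore. The scan is reduced to marking every stored slot in the optional [start, stop] range; the real root-fixing logic is not reproduced. Because both `scan_and_fix_roots` and the cleanup path take `lowest_cleanup_slot` first, cleanup cannot purge a slot between it being identified and being marked as root.

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

/// Hypothetical, heavily simplified blockstore.
pub struct Blockstore {
    // Taken first by both scan_and_fix_roots() and the cleanup path.
    lowest_cleanup_slot: Mutex<u64>,
    slots: Mutex<BTreeSet<u64>>,
    roots: Mutex<BTreeSet<u64>>,
}

impl Blockstore {
    pub fn new(slots: impl IntoIterator<Item = u64>) -> Self {
        Self {
            lowest_cleanup_slot: Mutex::new(0),
            slots: Mutex::new(slots.into_iter().collect()),
            roots: Mutex::new(BTreeSet::new()),
        }
    }

    /// Returns how many slots were newly marked as roots.
    pub fn scan_and_fix_roots(&self, start: Option<u64>, stop: Option<u64>) -> usize {
        // This guard lives until the function returns, covering scan AND write.
        let lowest_cleanup_slot = self.lowest_cleanup_slot.lock().unwrap();
        let start = start.unwrap_or(0).max(*lowest_cleanup_slot);
        let stop = stop.unwrap_or(u64::MAX);
        if start > stop {
            return 0;
        }
        // Scan: identify candidate slots.
        let to_mark: Vec<u64> = self.slots.lock().unwrap().range(start..=stop).copied().collect();
        // Write: mark them; cleanup cannot have purged any in between.
        let mut roots = self.roots.lock().unwrap();
        let mut marked = 0;
        for slot in to_mark {
            if roots.insert(slot) {
                marked += 1;
            }
        }
        marked
    }

    /// Cleanup path: must acquire the same lock before purging.
    pub fn purge_below(&self, slot: u64) {
        let mut lowest_cleanup_slot = self.lowest_cleanup_slot.lock().unwrap();
        self.slots.lock().unwrap().retain(|s| *s >= slot);
        *lowest_cleanup_slot = slot;
    }
}
```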
Jeff Biseda 87c1b67d53
refactor core to create consensus module (#32282) 2023-06-27 17:25:08 -07:00
steviez 77b587aa4d
Add constant for disabled snapshot interval (#32236)
Slot::MAX was used to specify that a type of snapshots should not be
created; define a constant to be that value and reference the constant
to have a single point of edit.
2023-06-26 12:26:56 -05:00
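A minimal sketch of the single point of edit; `DISABLED_SNAPSHOT_ARCHIVE_INTERVAL` and `should_take_snapshot` are hypothetical names used for illustration.

```rust
pub type Slot = u64;

/// Hypothetical constant: Slot::MAX means "never create this snapshot type".
pub const DISABLED_SNAPSHOT_ARCHIVE_INTERVAL: Slot = Slot::MAX;

/// Checks against the named constant instead of a bare Slot::MAX.
/// An interval of 0 is also treated as disabled to avoid dividing by zero.
pub fn should_take_snapshot(slot: Slot, interval: Slot) -> bool {
    interval != 0 && interval != DISABLED_SNAPSHOT_ARCHIVE_INTERVAL && slot % interval == 0
}
```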
Brooks 5f1b5b877a
Replace boot_from_local_state with use_snapshot_archives_at_startup (#32260) 2023-06-26 12:44:25 -04:00
behzad nouri f6e039b0b3
moves turbine to a separate crate out of solana/core (#32226) 2023-06-22 16:22:11 +00:00
steviez 20a7cdd43d
Restrict access to Bank's HardForks (#32180)
* Restrict access to Bank's HardForks

Callers could previously obtain a lock to read/write HardForks from
any Bank. This would allow any caller to modify them, and created the
opportunity for inconsistent handling of what is considered a valid hard
fork (i.e. whether it is too old).

This PR adds a function to Bank so consistent sanity checks can be
applied; the caller will already have a Bank as that is where they would
have obtained the HardForks from in the first place. Additionally,
change the getter to return a copy of HardForks (simple Vec).

* Allow hard fork at bank slot if bank is not yet frozen
2023-06-20 23:44:43 -05:00
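A minimal sketch of the restricted API, with simplified, hypothetical `Bank` and `HardForks` types: the lock is private, the getter returns a clone, and `register_hard_fork` applies the sanity checks in one place, including the follow-up rule that a hard fork at the bank's own slot is allowed only while the bank is not yet frozen. Rejecting slots below the bank's slot is an assumed reading of "too old".

```rust
use std::sync::RwLock;

/// Hypothetical HardForks: sorted (slot, count) pairs in a simple Vec.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct HardForks {
    hard_forks: Vec<(u64, usize)>,
}

impl HardForks {
    fn register(&mut self, slot: u64) {
        let existing = self.hard_forks.iter().position(|(s, _)| *s == slot);
        if let Some(i) = existing {
            self.hard_forks[i].1 += 1;
        } else {
            self.hard_forks.push((slot, 1));
            self.hard_forks.sort_unstable();
        }
    }

    pub fn iter(&self) -> std::slice::Iter<'_, (u64, usize)> {
        self.hard_forks.iter()
    }
}

pub struct Bank {
    slot: u64,
    frozen: bool,
    hard_forks: RwLock<HardForks>, // no longer handed out to callers
}

impl Bank {
    pub fn new(slot: u64, frozen: bool) -> Self {
        Self { slot, frozen, hard_forks: RwLock::new(HardForks::default()) }
    }

    /// Returns a copy, so callers cannot mutate the bank's HardForks.
    pub fn hard_forks(&self) -> HardForks {
        self.hard_forks.read().unwrap().clone()
    }

    /// The single place where sanity checks are applied.
    pub fn register_hard_fork(&self, new_hard_fork_slot: u64) -> bool {
        let allowed = new_hard_fork_slot > self.slot
            || (new_hard_fork_slot == self.slot && !self.frozen);
        if allowed {
            self.hard_forks.write().unwrap().register(new_hard_fork_slot);
        }
        allowed
    }
}
```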
behzad nouri 469661d217
removes outdated tvu_forward socket (#32101)
Shreds are no longer sent to tvu_forward socket.
2023-06-20 20:50:16 +00:00
Brooks 47ff3cecc9
Enables creating snapshots after booting from local state (#32137) 2023-06-15 22:54:32 -04:00
behzad nouri ec0001ef85
adds code-path broadcasting shreds using QUIC (#31610)
adds quic connection cache to turbine

Working towards migrating turbine to QUIC.
2023-06-12 22:58:27 +00:00
behzad nouri aed4ecb633
adds quic receiver to shred-fetch-stage (#31576)
Working towards migrating turbine to QUIC.
2023-06-12 13:16:27 +00:00
Illia Bobyr 4353ac6797
Pass Arc<AtomicBool> by value, not by reference. (#31916)
`Arc` is already a reference internally, so it does not seem to be
beneficial to pass a reference to it; doing so just adds an extra layer
of indirection.

Functions that need to be able to increment `Arc` reference count need
to take `Arc<AtomicBool>`, but those that just want to read the
`AtomicBool` value can accept `&AtomicBool`, making them a bit more
generic.

This change focuses specifically on `Arc<AtomicBool>`. There are other
uses of `&Arc<T>` in the code base that could be converted in a similar
manner, but that would make the change even larger.
2023-06-01 17:25:48 -07:00
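A minimal sketch of the two signatures, using hypothetical `should_exit` and `spawn_worker` functions: a reader takes &AtomicBool, while a function that must keep the flag alive in another thread takes Arc<AtomicBool> by value.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Only reads the flag: &AtomicBool suffices, and it also accepts flags
/// that are not behind an Arc at all.
pub fn should_exit(exit: &AtomicBool) -> bool {
    exit.load(Ordering::Relaxed)
}

/// Must keep the flag alive in another thread: take Arc<AtomicBool> by
/// value so the caller's clone moves straight into the thread.
pub fn spawn_worker(exit: Arc<AtomicBool>) -> JoinHandle<u64> {
    thread::spawn(move || {
        let mut iterations = 0;
        // &exit is &Arc<AtomicBool>, which deref-coerces to &AtomicBool.
        while !should_exit(&exit) {
            iterations += 1;
            thread::sleep(Duration::from_millis(1));
        }
        iterations
    })
}
```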
Illia Bobyr e0389ba90f
GeyserPluginService: Use common `exit` flag. (#31915)
The Geyser plugin thread would never shut down correctly, as it was
using an exit flag that is never set.
2023-06-01 11:20:59 -07:00
steviez debe794987
Replace improper &Arc<...> with Arc<...> in Bank and Accounts (#31892)
The callstack updated in this PR passed an &Arc<...> down only to have
the bottom level clone the reference. Thus, we are giving shared
ownership so the reference is a bit redundant and arguably obscures the
intention to clone further down the callstack.
2023-05-31 12:36:44 -05:00