494 Commits

Author SHA1 Message Date
mergify[bot] c01560e136
v2.0: scheduler opt-in forwarding (backport of #1801) (#2285)
* scheduler opt-in forwarding (#1801)

(cherry picked from commit 61d8be0d6f)

* Scheduler: buffer packets for forwarding if forwarding is enabled (#2305)

---------

Co-authored-by: Andrew Fitzgerald <apfitzge@gmail.com>
2024-09-12 10:43:21 -05:00
mergify[bot] 50f12b0040
v2.0: Use node's latest vote for commitment calc. too (backport of #1964) (#1994)
Use node's latest vote for commitment calc. too (#1964)

* Use node's latest vote for commitment calc. too

* Make local_cluster test use finalized

* Update core/src/commitment_service.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Don't wrap with Option and update tests

---------

Co-authored-by: Tyera Eulberg <tyera@anza.xyz>
Co-authored-by: Tyera <teulberg@gmail.com>
(cherry picked from commit 556298982a)

Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
2024-07-04 00:33:22 -06:00
Tao Zhu f8ae688668
Revert "v2.0: Refactor and additional metrics for cost tracking (backport of #1888) (#1900)" (#1937)
Revert "v2.0: Refactor and additional metrics for cost tracking (backport of #1888) (#1900)"

This reverts commit 0aef62eac7.
2024-07-01 10:33:44 -06:00
mergify[bot] 0aef62eac7
v2.0: Refactor and additional metrics for cost tracking (backport of #1888) (#1900)
* Refactor and additional metrics for cost tracking (#1888)

* Refactor and add metrics:
- Combine remove_* and update_* functions to reduce locking on cost-tracker and iteration.
- Add method to calculate executed transaction cost by directly using actual execution cost and loaded accounts size;
- Wireup histogram to report loaded accounts size;
- Report time of block limits checking;
- Move account counters from ExecuteDetailsTimings to ExecuteAccountsDetails;

* Move committed transactions adjustment into its own function

(cherry picked from commit c3fadacf69)

* rename cost_tracker.account_data_size to better describe that its purpose is to track per-block new account allocation

---------

Co-authored-by: Tao Zhu <82401714+tao-stones@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
2024-06-28 16:09:36 -05:00
mergify[bot] edca6057eb
v2.0: chore: add dcou to apply_votes_to_tower (backport of #1831) (#1843)
chore: add dcou to apply_votes_to_tower (#1831)

* add dcou to apply_votes_to_tower

* cargo sort

* fix fmt

(cherry picked from commit 66bdefd178)

Co-authored-by: Yihau Chen <yihau.chen@icloud.com>
2024-06-24 15:34:53 -05:00
Andrew Fitzgerald b0737e0e59
change match to an if (#726) 2024-06-20 11:38:14 -05:00
Greg Cusack 7f2beb21a1
add retries to transaction sending in LocalCluster (#1747) 2024-06-18 00:20:17 +08:00
Greg Cusack 63fb9fe9d9
Remove `ThinClient` from `LocalCluster` (#1300)
* setup tpu client methods required for localcluster to use TpuClient

* add new_tpu_quic_client() for local cluster tests

* update local-cluster src files to use TpuClient. tests next

* finish removing thinclient from localcluster

* address comments

* add note for send_and_confirm_transaction_with_retries

* remove retry logic from tpu-client. Send directly to upcoming leaders without retry.
2024-06-13 10:31:10 -07:00
Yihau Chen ec9bd79849
clippy: fix legacy_numeric_constants (#1314)
clippy: legacy_numeric_constants
2024-05-15 11:29:19 +08:00
Lijun Wang f54c120450
Connection rate limiting (#948)
* use rate limit on connections

use rate limit on connections; missing file

* Change connection rate limit to 8/min instead of 4/s

* Addressed some feedback from Trent

* removed some comments

* fix test failures which are opening connections more frequently

* moved the flag up

* turn off rate limiting to debug CI

* Fix CI test failures

* differentiate the two throttling cases in stats: across connections or per IP address

* fmt issues

* Addressed some feedback from Trent

* Added unit tests

Cleanup connection cache rate limiter if exceeding certain threshold

missing files

CONNECITON_RATE_LIMITER_CLEANUP_THRESHOLD to 100_000

clippy issue

clippy issue

sort crates

* revert Cargo.lock changes

* Addressed some feedback from Pankaj
2024-05-14 17:33:43 -07:00
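The per-IP limit mentioned in #948 above (8 connections per minute, with a cleanup once the tracking table grows past 100_000 entries) could be sketched roughly as follows. This is an illustration only, not the code from the PR: the fixed-window algorithm, the type `ConnectionRateLimiter`, the method `is_allowed`, and all constant names are assumptions made for the example.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

// Values taken from the commit message; the names are hypothetical.
const MAX_CONNECTIONS_PER_WINDOW: u32 = 8;
const WINDOW: Duration = Duration::from_secs(60);
const CLEANUP_THRESHOLD: usize = 100_000;

/// Hypothetical fixed-window limiter keyed by remote IP address.
struct ConnectionRateLimiter {
    // For each IP: start of the current window and connections seen in it.
    windows: HashMap<IpAddr, (Instant, u32)>,
}

impl ConnectionRateLimiter {
    fn new() -> Self {
        Self { windows: HashMap::new() }
    }

    /// Returns true if a new connection from `ip` at time `now` is allowed.
    fn is_allowed(&mut self, ip: IpAddr, now: Instant) -> bool {
        // Drop expired windows once the table grows past the threshold.
        if self.windows.len() > CLEANUP_THRESHOLD {
            self.windows
                .retain(|_, (start, _)| now.duration_since(*start) < WINDOW);
        }
        let entry = self.windows.entry(ip).or_insert((now, 0));
        // Start a fresh window when the previous one has elapsed.
        if now.duration_since(entry.0) >= WINDOW {
            *entry = (now, 0);
        }
        if entry.1 < MAX_CONNECTIONS_PER_WINDOW {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}
```

Passing `now` explicitly keeps the sketch testable without sleeping. A separate limiter across all connections, which the commit also mentions, is not modelled here.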
carllin d36954cb72
Unignore `test_local_cluster_signature_subscribe` and remove `test_spend_and_verify_all_nodes_env_num_nodes` (#1256) 2024-05-09 13:02:16 -04:00
Brooks 11383ae0a1
Do not purge old snapshot archives in bank_to_snapshot_archive() (#1226) 2024-05-09 15:45:33 +00:00
Brooks fbbae8a59a
clippy: clone_from() (#1177)
```
error: assigning the result of `Clone::clone()` may be inefficient
   --> bucket_map/src/bucket.rs:979:17
    |
979 |                 hashed = hashed_raw.clone();
    |                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use `clone_from()`: `hashed.clone_from(&hashed_raw)`
    |
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#assigning_clones
    = note: `-D clippy::assigning-clones` implied by `-D warnings`
    = help: to override `-D warnings` add `#[allow(clippy::assigning_clones)]`
```
2024-05-03 15:21:10 +00:00
Greg Cusack 5cee9dd0d7
ignore flaky test: test_hard_fork_with_gap_in_roots (#1130)
ignore test_hard_fork_with_gap_in_roots test
2024-04-30 19:39:05 +00:00
Andrew Fitzgerald fb35f1912e
scheduler forward packets (#898) 2024-04-26 12:18:17 -05:00
behzad nouri 443bb6c1dc
migrates to the new contact-info (#823)
The commit replaces (most) uses of LegacyContactInfo with the new ContactInfo.
2024-04-24 18:47:04 +00:00
Justin Starry 1c1b4c3e28
Use poh grace ticks when new reset bank is pending (#794)
* Use poh grace ticks when new reset bank is pending

* feedback

* make it hidden
2024-04-18 17:32:29 +00:00
carllin 297a7aa40a
Fix vote refresh local cluster test (#830) 2024-04-17 17:14:38 -04:00
Ashwin Sekar 499d36e354
vote: update benches and tests to TowerSync (#725) 2024-04-11 22:15:02 -07:00
behzad nouri 293414f482
pads last erasure batch with empty data shreds (#639)
For duplicate blocks prevention we want to verify that the last erasure
batch was sufficiently propagated through turbine. This requires
additional bookkeeping because, depending on the erasure coding schema,
the entire batch might be recovered from only a few coding shreds.

In order to simplify above, this commit instead ensures that the last
erasure batch has >= 32 data shreds so that the batch cannot be
recovered unless 32+ shreds are received from turbine or repair.
2024-04-11 14:50:43 +00:00
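As a rough illustration of the padding rule described in #639 above, the number of empty data shreds needed to bring the last erasure batch up to 32 data shreds can be computed as below. The constant and function names are hypothetical and do not come from the shredder code.

```rust
// Minimum number of data shreds in the last erasure batch, per the commit message.
// The name is an assumption made for this sketch.
const MIN_LAST_BATCH_DATA_SHREDS: usize = 32;

/// Hypothetical helper: how many empty data shreds must be appended so that
/// the last erasure batch holds at least `MIN_LAST_BATCH_DATA_SHREDS`.
fn padding_needed(data_shreds_in_last_batch: usize) -> usize {
    // saturating_sub yields 0 when the batch is already large enough.
    MIN_LAST_BATCH_DATA_SHREDS.saturating_sub(data_shreds_in_last_batch)
}
```

For example, a last batch holding 5 data shreds would need 27 empty ones, while a batch of 32 or more needs none.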
Andrew Fitzgerald e91a5e2744
default staked client in LocalCluster (#716)
* default staked client in LocalCluster

* fix underflow
2024-04-10 15:33:07 -05:00
Ashwin Sekar 70c4cb0ba1
consensus: add dev-context-only-utils to tower (#687) 2024-04-09 16:39:57 -07:00
Andrew Fitzgerald 1744e9efd7
BankingStage Forwarding Filter (#685)
* add PacketFlags::FROM_STAKED_NODE

* Only forward packets from staked node

* fix local-cluster test forwarding

* review comment

* tpu_votes get marked as from_staked_node
2024-04-09 23:12:26 +00:00
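The forwarding filter from #685 above can be pictured with a minimal sketch like the following. Only the flag name `FROM_STAKED_NODE` comes from the commit message; the `Packet` struct, the bit position, and the function `forwardable` are assumptions for illustration and do not mirror the real packet types.

```rust
/// Simplified stand-in for the real packet flags; the bit value is assumed.
#[derive(Clone, Copy)]
struct PacketFlags(u8);

impl PacketFlags {
    const FROM_STAKED_NODE: u8 = 0b0000_0001;

    fn from_staked_node(self) -> bool {
        self.0 & Self::FROM_STAKED_NODE != 0
    }
}

/// Hypothetical packet carrying flags and an opaque payload.
struct Packet {
    flags: PacketFlags,
    payload: Vec<u8>,
}

/// Keeps only the packets that were received from a staked node.
fn forwardable(packets: Vec<Packet>) -> Vec<Packet> {
    packets
        .into_iter()
        .filter(|packet| packet.flags.from_staked_node())
        .collect()
}
```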
carllin de8e9e6850
Add all validators as entrypoint to local cluster (#567) 2024-04-05 20:38:13 -04:00
steviez 79e316eb56
Reduce the default number of IP echo server threads (#354)
The IP echo server currently spins up a worker thread for every thread
on the machine. Observing some data for nodes,
- MNB validators and RPC nodes look to get several hundred of these
  requests per day
- MNB entrypoint nodes look to get 2-3 requests per second on average

In both instances, the current threadpool is severely overprovisioned
which is a waste of resources. This PR plumbs a flag to control the
number of worker threads for this pool as well as setting a default of
two threads for this server. Two threads allow for one thread to always
listen on the TCP port while the other thread processes requests.
2024-04-01 10:24:59 -05:00
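The change in #354 above amounts to replacing a machine-sized default with a small fixed one that a flag can override. A minimal sketch, assuming hypothetical names (`DEFAULT_IP_ECHO_SERVER_THREADS`, `old_default_threads`, `ip_echo_server_threads`) that are not taken from the PR:

```rust
use std::num::NonZeroUsize;
use std::thread;

// The commit states a default of two threads; the constant name is assumed.
const DEFAULT_IP_ECHO_SERVER_THREADS: usize = 2;

/// Previous behaviour as described in the commit: one worker per hardware thread.
fn old_default_threads() -> usize {
    thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1)
}

/// New behaviour: use the flag value when given, otherwise two threads.
fn ip_echo_server_threads(flag: Option<NonZeroUsize>) -> usize {
    flag.map(NonZeroUsize::get)
        .unwrap_or(DEFAULT_IP_ECHO_SERVER_THREADS)
}
```

Taking `NonZeroUsize` for the flag rules out a zero-thread pool at the type level, which is one possible design choice rather than something the commit specifies.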
Brooks 182d27f718
Checks if bank snapshot is loadable before fastbooting (#343) 2024-03-28 11:14:23 -04:00
steviez 10d06773cd
Share the threadpool for tx execution and entry verification (#216)
Previously, entry verification had a dedicated threadpool used to verify
PoH hashes as well as some basic transaction verification via
Bank::verify_transaction(). It should also be noted that the entry
verification code provides logic to offload to a GPU if one is present.

Regardless of whether a GPU is present or not, some of the verification
must be done on a CPU. Moreover, the CPU verification of entries and
transaction execution are serial operations; entry verification finishes
first before moving onto transaction execution.

So, tx execution and entry verification are not competing for CPU cycles
at the same time and can use the same pool.

One exception to the above statement is that if someone is using the
feature to replay forks in parallel, then hypothetically, different
forks may end up competing for the same resources at the same time.
However, that is already true given that we had pools that were shared
between replay of multiple forks. So, this change doesn't really change
much for that case, but will reduce overhead in the single fork case
which is the vast majority of the time.
2024-03-27 16:33:21 -05:00
carllin b01d7923fc
Add local cluster utility functions (#355) 2024-03-26 00:34:15 -04:00
Greg Cusack 792d7454d9
switch to `solana-tpu-client` from `solana_client::tpu_client` for `bench-tps`, `dos/`, `LocalCluster`, `gossip/` (#310)
* switch over to solana-tpu-client for bench-tps, dos, gossip, local-cluster

* put TpuClientWrapper back in solana_client
2024-03-21 09:25:54 -07:00
steviez 4a67cd495b
Allow configuration of replay thread pools from CLI (#236)
Bubble up the constants to the CLI that control the sizes of the
following two thread pools:
- The thread pool used to replay multiple forks in parallel
- The thread pool used to execute transactions in parallel
2024-03-20 15:07:04 -05:00
Greg Cusack ed573ff60c
add in method for building a `TpuClient` for `LocalCluster` tests (#258)
* add in method for building a TpuClient for LocalCluster tests

* add cluster trait. leave dependency on solana_client::tpu_client
2024-03-18 17:58:11 -07:00
听寒 87a0071e9a
fix typo (#264)
fix typo in comment of test_optimistic_confirmation_violation_without_tower
2024-03-15 16:20:30 -05:00
Yihau Chen 3f9a7a52ea
[anza migration] rename crates (#10)
* rename geyser-plugin-interface

* rename cargo registry

* rename watchtower

* rename ledger tool

* rename validator

* rename install

* rename geyser plugin interface when patch
2024-03-03 12:31:24 +08:00
Ashwin Sekar e8c87e86ef
local-cluster: fix flaky optimistic_confirmation tests (#35356)
* local-cluster: fix flaky optimistic_confirmation tests

* pr feedback: latest_vote -> newest_vote, reword some comments
2024-02-29 12:05:20 -08:00
Ryo Onodera 024d6ecc4f
Add --unified-scheduler-handler-threads (#35195)
* Add --unified-scheduler-handler-threads

* Adjust value name

* Warn if the flag was ignored

* Tweak message a bit
2024-02-22 09:05:17 +09:00
behzad nouri 7a95e4fa90
uses Merkle shreds in broadcast duplicates (#35115)
The commit migrates away from legacy shreds in duplicate shreds tests.
2024-02-07 16:02:16 +00:00
Brooks daa2449ad4
Removes RwLock on AccountsDb::shrink_paths (#35027) 2024-02-01 09:35:34 -05:00
behzad nouri 79bbe4381a
adds chained_merkle_root to shredder arguments (#34952)
Working towards chaining Merkle root of erasure batches, the commit adds
chained_merkle_root to shredder arguments.
2024-01-27 15:04:31 +00:00
Brooks 02062a6b6a
Removes unused AccountsHashFaultInjector (#34977) 2024-01-26 19:21:23 -05:00
Ashwin Sekar 93271d91b0
gossip: notify state machine of duplicate proofs (#32963)
* gossip: notify state machine of duplicate proofs

* Add feature flag for ingesting duplicate proofs from Gossip.

* Use the Epoch the shred is in instead of the root bank epoch.

* Fix unittest by activating the feature.

* Add a test for feature disabled case.

* EpochSchedule is now not copyable, clone it explicitly.

* pr feedback: read epoch schedule on startup, add guard for ff recache

* pr feedback: bank_forks lock, -cached_slots_in_epoch, init ff

* pr feedback: bank.forks_try_read() -> read()

* pr feedback: fix local-cluster setup

* local-cluster: do not expose gossip internals, use retry mechanism instead

* local-cluster: split out case 4b into separate test and ignore

* pr feedback: avoid taking lock if ff is already found

* pr feedback: do not cache ff epoch

* pr feedback: bank_forks lock, revert to cached_slots_in_epoch

* pr feedback: move local variable into helper function

* pr feedback: use let else, remove epoch 0 hack

---------

Co-authored-by: Wen <crocoxu@gmail.com>
2024-01-26 07:58:37 -08:00
Andrew Fitzgerald 29737ab5e4
Use ThreadLocalMultiIterator for tests (#34947)
* Use ThreadLocalMultiIterator for tests

* some validator config was not using default_for_test
2024-01-25 11:22:27 -07:00
Andrew Fitzgerald 62e7ebd0cc
BlockProductionMethod::CentralScheduler as default (#34891) 2024-01-24 15:30:32 -08:00
Brooks 2f744f1639
Moves create_all_accounts_run_and_snapshot_dirs() into accounts-db utils (#34877) 2024-01-22 18:18:43 -05:00
steviez 3dd348802f
Bubble up genesis load errors instead of exiting (#34851)
The function open_genesis_config() performs several operations that
could fail. If any of these fail, the process exits immediately.

Instead of exiting immediately, bubble up the error and let the caller
decide the appropriate action. solana-validator and solana-ledger-tool
will functionally be unchanged, but this consolidates startup failures
for both of these processes.
2024-01-19 10:25:46 -05:00
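The pattern described in #34851 above, returning an error to the caller instead of exiting inside the loader, might look roughly like this. It is a sketch under assumptions: `GenesisLoadError`, `load_genesis_bytes`, and the `genesis.bin` file name are illustrative and are not the actual signature of open_genesis_config().

```rust
use std::fmt;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical error type for genesis loading failures.
#[derive(Debug)]
enum GenesisLoadError {
    Io { path: PathBuf, source: io::Error },
}

impl fmt::Display for GenesisLoadError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Io { path, source } => {
                write!(f, "failed to read genesis at {}: {source}", path.display())
            }
        }
    }
}

impl std::error::Error for GenesisLoadError {}

/// Reads the raw genesis file and reports failure to the caller
/// rather than terminating the process here.
fn load_genesis_bytes(ledger_path: &Path) -> Result<Vec<u8>, GenesisLoadError> {
    // File name assumed for the example.
    let path = ledger_path.join("genesis.bin");
    fs::read(&path).map_err(|source| GenesisLoadError::Io { path, source })
}
```

A binary that wants the old behaviour can still match on the `Err` case, print the message, and exit, while other callers are free to recover or retry.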
Pankaj Garg 6bbd3661e1
Throttle unstaked quic streams for a given connection (#34562)
* Throttle unstaked quic streams for a given connection

* Fix interval duration check

* move wait to handle_chunk

* set max unistreams to 0

* drop new streams

* cleanup

* some more cleanup

* fix tests

* update test and stop code

* fix bench-tps
2023-12-21 18:47:52 -08:00
GoodDaisy 03386cc7b9
Fix typos (#34459)
* Fix typos

* Fix typos

* fix typo
2023-12-21 13:06:00 -07:00
sakridge 210d320f16
Remove to_string which is not necessary (#34540) 2023-12-20 14:34:16 +01:00
Ryo Onodera d2b5afc410
Finish unified scheduler plumbing with min impl (#34300)
* Finalize unified scheduler plumbing with min impl

* Fix comment

* Rename leftover type name...

* Make logging text less ambiguous

* Make PhantomData simpler without already used S

* Make TaskHandler stateless again

* Introduce HandlerContext to simplify TaskHandler

* Add comment for coexistence of Pool::{new,new_dyn}

* Fix grammar

* Remove confusing const for upcoming changes

* Demote InstalledScheduler::context() into dcou

* Delay drop of context up to return_to_pool()-ing

* Revert "Demote InstalledScheduler::context() into dcou"

This reverts commit 049a126c905df0ba8ad975c5cb1007ae90a21050.

* Revert "Delay drop of context up to return_to_pool()-ing"

This reverts commit 60b1bd2511a714690b0b2331e49bc3d0c72e3475.

* Make context handling really type-safe

* Update comment

* Fix grammar...

* Refine type aliases for boxed traits

* Swap the tuple order for readability & semantics

* Simplify PooledScheduler::result_with_timings type

* Restore .in_sequence()

* Use where for aesthetics

* Simplify if...

* Fix typo...

* Polish ::schedule_execution() a bit

* Fix rebase conflicts..

* Make test more readable

* Fix test failures after rebase...
2023-12-19 09:50:41 +09:00
Ashwin Sekar 4f1116164c
local-cluster: fix flaky test_rpc_block_subscribe (#34421)
* local-cluster: fix flaky test_rpc_block_subscribe

* reenable test

* pr feedback: add comment linking pr
2023-12-14 14:50:48 -05:00
behzad nouri 750023530c
makes last erasure batch size >= 64 shreds (#34330) 2023-12-13 06:48:00 +00:00