Use node's latest vote for commitment calc. too (#1964)
* Use node's latest vote for commitment calc. too
* Make local_cluster test use finalized
* Update core/src/commitment_service.rs
Co-authored-by: Tyera <teulberg@gmail.com>
* Don't wrap with Option and update tests
---------
Co-authored-by: Tyera Eulberg <tyera@anza.xyz>
Co-authored-by: Tyera <teulberg@gmail.com>
(cherry picked from commit 556298982a)
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
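A minimal sketch of the idea in the commits above, assuming simplified stand-ins for the real commitment-service types (`VoteState`, the pubkey keys, and the helper names are illustrative): when aggregating stake per slot, the node's own latest local vote is used in place of the possibly stale copy of its vote account observed by the bank.

```rust
use std::collections::HashMap;

#[derive(Clone, Default)]
struct VoteState {
    // slots this validator has voted on (simplified stand-in for the real type)
    voted_slots: Vec<u64>,
}

/// Accumulate `stake` onto every slot the given vote state has voted for.
fn aggregate(commitment: &mut HashMap<u64, u64>, vote_state: &VoteState, stake: u64) {
    for slot in &vote_state.voted_slots {
        *commitment.entry(*slot).or_default() += stake;
    }
}

/// Build per-slot committed stake from all vote accounts, substituting the node's
/// own latest (possibly not yet landed on-chain) vote for its own account.
fn calculate_commitment(
    vote_accounts: &HashMap<String, (u64, VoteState)>, // pubkey -> (stake, vote state)
    node_pubkey: &str,
    node_vote_state: &VoteState,
) -> HashMap<u64, u64> {
    let mut commitment = HashMap::new();
    for (pubkey, (stake, on_chain_state)) in vote_accounts {
        // Prefer the node's own latest local vote over the bank's view of its account.
        let vote_state = if pubkey == node_pubkey {
            node_vote_state
        } else {
            on_chain_state
        };
        aggregate(&mut commitment, vote_state, *stake);
    }
    commitment
}
```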
* Refactor and additional metrics for cost tracking (#1888)
* Refactor and add metrics:
- Combine remove_* and update_* functions to reduce locking on cost-tracker and iteration.
- Add method to calculate executed transaction cost directly from the actual execution cost and loaded accounts size (sketched below);
- Wire up histogram to report loaded accounts size;
- Report time of block limits checking;
- Move account counters from ExecuteDetailsTimings to ExecuteAccountsDetails;
* Move committed transactions adjustment into its own function
(cherry picked from commit c3fadacf69)
* rename cost_tracker.account_data_size to better describe its purpose: tracking per-block new account allocations
---------
Co-authored-by: Tao Zhu <82401714+tao-stones@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
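A hedged sketch of the "calculate executed transaction cost" bullet above, not the actual cost_tracker API: the 32KiB chunk cost constant and the adjustment method are assumptions used for illustration. Once a transaction has executed, its cost can be recomputed from the actual execution CUs plus the actual loaded-accounts data size, and the block cost adjusted by the difference from the up-front estimate.

```rust
const HEAP_COST_PER_32K: u64 = 8; // assumed cost per 32KiB of loaded account data

/// Cost attributed to the account data a transaction actually loaded.
fn loaded_accounts_data_cost(loaded_accounts_bytes: u64) -> u64 {
    // round up to 32KiB chunks
    loaded_accounts_bytes.div_ceil(32 * 1024) * HEAP_COST_PER_32K
}

struct CostTracker {
    block_cost: u64,
}

impl CostTracker {
    /// Replace the estimated cost reserved for a committed transaction with its
    /// actual executed cost (execution CUs + loaded-accounts data cost).
    fn adjust_committed_transaction(
        &mut self,
        estimated_cost: u64,
        actual_execution_cost: u64,
        loaded_accounts_bytes: u64,
    ) {
        let actual_cost =
            actual_execution_cost + loaded_accounts_data_cost(loaded_accounts_bytes);
        // Swap the up-front estimate for the actual cost now that execution is done.
        self.block_cost = self
            .block_cost
            .saturating_sub(estimated_cost)
            .saturating_add(actual_cost);
    }
}
```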
* set up tpu client methods required for localcluster to use TpuClient
* add new_tpu_quic_client() for local cluster tests
* update local-cluster src files to use TpuClient. tests next
* finish removing thinclient from localcluster
* address comments
* add note for send_and_confirm_transaction_with_retries
* remove retry logic from tpu-client. Send directly to upcoming leaders without retry.
* use rate limit on connections
use rate limit on connections; add missing file
* Change connection rate limit to 8/min instead of 4/s (see the sketch after this list)
* Addressed some feedback from Trent
* removed some comments
* fix test failures caused by tests opening connections more frequently
* moved the flag up
* turn off rate limiting to debug CI
* Fix CI test failures
* differentiate the two throttling cases in stats: across connections vs. per IP addr
* fix fmt issues
* Addressed some feedback from Trent
* Added unit tests
Cleanup connection cache rate limiter if exceeding certain threshold
missing files
CONNECITON_RATE_LIMITER_CLEANUP_THRESHOLD to 100_000
clippy issue
clippy issue
sort crates
* revert Cargo.lock changes
* Addressed some feedback from Pankaj
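A hedged sketch of the connection rate limiting described in the list above (8 new connections per minute per IP, with the tracking map pruned once it grows past a cleanup threshold); the type name, window, and thresholds are illustrative, not the streamer's actual implementation.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

const MAX_CONNECTIONS_PER_MIN: usize = 8;
const CLEANUP_THRESHOLD: usize = 100_000;
const WINDOW: Duration = Duration::from_secs(60);

struct ConnectionRateLimiter {
    // per-IP timestamps of recent connection attempts
    attempts: HashMap<IpAddr, Vec<Instant>>,
}

impl ConnectionRateLimiter {
    fn new() -> Self {
        Self { attempts: HashMap::new() }
    }

    /// Returns true if `ip` is allowed to open a new connection now.
    fn is_allowed(&mut self, ip: IpAddr) -> bool {
        let now = Instant::now();
        // Prune stale entries once the map grows past the cleanup threshold
        // (a cheap defense against a peer cycling through many source addresses).
        if self.attempts.len() > CLEANUP_THRESHOLD {
            self.attempts.retain(|_, times| {
                times.retain(|t| now.duration_since(*t) < WINDOW);
                !times.is_empty()
            });
        }
        let times = self.attempts.entry(ip).or_default();
        times.retain(|t| now.duration_since(*t) < WINDOW);
        if times.len() < MAX_CONNECTIONS_PER_MIN {
            times.push(now);
            true
        } else {
            false
        }
    }
}
```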
```
error: assigning the result of `Clone::clone()` may be inefficient
--> bucket_map/src/bucket.rs:979:17
|
979 | hashed = hashed_raw.clone();
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use `clone_from()`: `hashed.clone_from(&hashed_raw)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#assigning_clones
= note: `-D clippy::assigning-clones` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::assigning_clones)]`
```
For duplicate block prevention, we want to verify that the last erasure
batch was sufficiently propagated through turbine. This requires
additional bookkeeping because, depending on the erasure coding schema,
the entire batch might be recovered from only a few coding shreds.
To simplify the above, this commit instead ensures that the last
erasure batch has >= 32 data shreds so that the batch cannot be
recovered unless 32+ shreds are received from turbine or repair.
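A hedged sketch of the padding approach described above, with illustrative constants and chunking rather than the actual shredder code:

```rust
const DATA_SHREDS_PER_FEC_SET: usize = 32;
const SHRED_PAYLOAD_SIZE: usize = 1015; // illustrative payload capacity per data shred

/// Split serialized entry `data` into per-shred payload chunks, padding with empty
/// chunks so the final erasure (FEC) batch contains a full 32 data shreds and thus
/// cannot be recovered unless 32+ shreds arrive via turbine or repair.
fn chunk_with_padded_last_batch(data: &[u8]) -> Vec<Vec<u8>> {
    let mut chunks: Vec<Vec<u8>> = data
        .chunks(SHRED_PAYLOAD_SIZE)
        .map(|c| c.to_vec())
        .collect();
    // Number of data shreds in the final (possibly partial) erasure batch.
    let rem = chunks.len() % DATA_SHREDS_PER_FEC_SET;
    if rem != 0 {
        // Pad with empty payloads; real shreds would carry explicit padding markers.
        chunks.resize(chunks.len() + (DATA_SHREDS_PER_FEC_SET - rem), Vec::new());
    }
    chunks
}
```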
* add PacketFlags::FROM_STAKED_NODE
* Only forward packets from staked node
* fix local-cluster test forwarding
* review comment
* tpu_votes get marked as from_staked_node
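A hedged sketch of the flag-and-filter idea in the bullets above; the packet flags are reduced to a bare bit here and the staked-node lookup is assumed, so this is not the real packet API.

```rust
use std::collections::HashSet;
use std::net::IpAddr;

const FROM_STAKED_NODE: u8 = 0b0000_1000; // illustrative bit position

struct Packet {
    flags: u8,
    src: IpAddr,
}

/// Mark packets whose source address belongs to a known staked node.
fn mark_staked(packets: &mut [Packet], staked_nodes: &HashSet<IpAddr>) {
    for p in packets.iter_mut() {
        if staked_nodes.contains(&p.src) {
            p.flags |= FROM_STAKED_NODE;
        }
    }
}

/// Only forward packets that were received from staked nodes.
fn filter_forwardable(packets: Vec<Packet>) -> Vec<Packet> {
    packets
        .into_iter()
        .filter(|p| p.flags & FROM_STAKED_NODE != 0)
        .collect()
}
```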
The IP echo server currently spins up a worker thread for every thread
on the machine. Observing some data for nodes:
- MNB validators and RPC nodes look to get several hundred of these
requests per day
- MNB entrypoint nodes look to get 2-3 requests per second on average
In both instances, the current threadpool is severely overprovisioned,
which is a waste of resources. This PR plumbs a flag to control the
number of worker threads for this pool, as well as setting a default of
two threads for this server. Two threads allow one thread to always
listen on the TCP port while the other processes requests.
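A hedged sketch of plumbing such a setting into the IP echo server's runtime; the flag name, thread name, and default below are illustrative, not the validator's actual CLI.

```rust
use tokio::runtime::Builder;

const DEFAULT_IP_ECHO_SERVER_THREADS: usize = 2;

fn ip_echo_server_runtime(num_threads: usize) -> tokio::runtime::Runtime {
    // Two threads by default: one can sit in accept() on the TCP port while the
    // other processes an in-flight request.
    Builder::new_multi_thread()
        .thread_name("solIpEchoSrv")
        .worker_threads(num_threads.max(1))
        .enable_all()
        .build()
        .expect("new tokio runtime")
}

fn main() {
    // e.g. parsed from a `--ip-echo-server-threads` CLI arg (assumed flag name)
    let num_threads = std::env::args()
        .nth(1)
        .and_then(|s| s.parse().ok())
        .unwrap_or(DEFAULT_IP_ECHO_SERVER_THREADS);
    let _runtime = ip_echo_server_runtime(num_threads);
}
```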
Previously, entry verification had a dedicated threadpool used to verify
PoH hashes as well as some basic transaction verification via
Bank::verify_transaction(). It should also be noted that the entry
verification code provides logic to offload to a GPU if one is present.
Regardless of whether a GPU is present or not, some of the verification
must be done on a CPU. Moreover, CPU verification of entries and
transaction execution are serial operations; entry verification finishes
before moving on to transaction execution.
So, tx execution and entry verification are not competing for CPU cycles
at the same time and can use the same pool.
One exception to the above statement is that if someone is using the
feature to replay forks in parallel, then hypothetically, different
forks may end up competing for the same resources at the same time.
However, that was already true given that the pools were shared
between replay of multiple forks. So, this change doesn't really change
much for that case, but will reduce overhead in the single fork case
which is the vast majority of the time.
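A hedged sketch of the pool-sharing argument above, using rayon directly with illustrative names and sizes: entry verification and transaction execution run serially on the same pool, so they never compete for its threads.

```rust
use rayon::ThreadPoolBuilder;

fn main() {
    let replay_pool = ThreadPoolBuilder::new()
        .num_threads(8) // illustrative size
        .thread_name(|i| format!("solReplayTx{i:02}"))
        .build()
        .expect("new rayon thread pool");

    // Phase 1: verify entry hashes in parallel on the shared pool.
    let entries: Vec<u64> = (0..1_000).collect(); // stand-in for real entries
    let all_valid = replay_pool.install(|| {
        use rayon::prelude::*;
        entries.par_iter().all(|_entry| true /* verify PoH hash here */)
    });

    // Phase 2: only after verification completes, execute transactions on the
    // same pool; the two phases never run concurrently.
    if all_valid {
        replay_pool.install(|| {
            use rayon::prelude::*;
            entries
                .par_iter()
                .for_each(|_entry| { /* execute transactions here */ });
        });
    }
}
```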
Bubble up the constants to the CLI that control the sizes of the
following two thread pools:
- The thread pool used to replay multiple forks in parallel
- The thread pool used to execute transactions in parallel
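A hedged sketch of exposing the two pool sizes on the CLI; the flag names and defaults are illustrative, not necessarily the validator's actual arguments.

```rust
use clap::Parser;

#[derive(Parser)]
struct Args {
    /// Number of threads used to replay forks in parallel (assumed flag name)
    #[arg(long, default_value_t = 4)]
    replay_forks_threads: usize,

    /// Number of threads used to execute transactions in parallel (assumed flag name)
    #[arg(long, default_value_t = 8)]
    replay_transactions_threads: usize,
}

fn main() {
    let args = Args::parse();
    println!(
        "forks pool: {} threads, transactions pool: {} threads",
        args.replay_forks_threads, args.replay_transactions_threads
    );
}
```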
* gossip: notify state machine of duplicate proofs
* Add feature flag for ingesting duplicate proofs from Gossip.
* Use the Epoch the shred is in instead of the root bank epoch.
* Fix unit test by activating the feature.
* Add a test for feature disabled case.
* EpochSchedule is now not copyable, clone it explicitly.
* pr feedback: read epoch schedule on startup, add guard for ff recache
* pr feedback: bank_forks lock, -cached_slots_in_epoch, init ff
* pr feedback: bank_forks.try_read() -> read()
* pr feedback: fix local-cluster setup
* local-cluster: do not expose gossip internals, use retry mechanism instead
* local-cluster: split out case 4b into separate test and ignore
* pr feedback: avoid taking lock if ff is already found
* pr feedback: do not cache ff epoch
* pr feedback: bank_forks lock, revert to cached_slots_in_epoch
* pr feedback: move local variable into helper function
* pr feedback: use let else, remove epoch 0 hack
---------
Co-authored-by: Wen <crocoxu@gmail.com>
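A hedged sketch of the gating described in the bullets above, with simplified stand-ins for the bank and feature-set types: a duplicate proof from gossip is only ingested if the feature is active for the epoch of the shred's slot, not the root bank's epoch.

```rust
#[derive(Clone)] // not Copy: cloned explicitly where needed, as noted above
struct EpochSchedule {
    slots_per_epoch: u64,
}

impl EpochSchedule {
    // Simplified epoch calculation (ignores warmup) for illustration only.
    fn get_epoch(&self, slot: u64) -> u64 {
        slot / self.slots_per_epoch
    }
}

/// Decide whether a gossip duplicate proof for `shred_slot` should be ingested.
fn should_ingest_duplicate_proof(
    shred_slot: u64,
    epoch_schedule: &EpochSchedule,       // read once at startup (see bullets above)
    feature_activated_epoch: Option<u64>, // epoch at which the feature activates, if any
) -> bool {
    let shred_epoch = epoch_schedule.get_epoch(shred_slot);
    matches!(feature_activated_epoch, Some(e) if shred_epoch >= e)
}
```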
The function open_genesis_config() performs several operations that
could fail. If any of these fail, the process exits immediately.
Instead of exiting immediately, bubble up the error and let the caller
decide the appropriate action. solana-validator and solana-ledger-tool
will functionally be unchanged, but this consolidates startup failures
for both of these processes.
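A hedged sketch of the error-bubbling change described above; the error variants, file name, and size check are illustrative, not the exact ledger-crate types.

```rust
use std::path::Path;

#[derive(Debug)]
enum OpenGenesisConfigError {
    Load(std::io::Error),
    TooLarge { size: u64, max: u64 },
}

struct GenesisConfig; // stand-in for the real type

fn open_genesis_config(
    ledger_path: &Path,
    max_genesis_archive_unpacked_size: u64,
) -> Result<GenesisConfig, OpenGenesisConfigError> {
    // Each failing step returns an error instead of calling process::exit().
    let metadata = std::fs::metadata(ledger_path.join("genesis.bin"))
        .map_err(OpenGenesisConfigError::Load)?;
    if metadata.len() > max_genesis_archive_unpacked_size {
        return Err(OpenGenesisConfigError::TooLarge {
            size: metadata.len(),
            max: max_genesis_archive_unpacked_size,
        });
    }
    Ok(GenesisConfig)
}

fn main() {
    // The caller (e.g. solana-validator startup) now decides how to react.
    match open_genesis_config(Path::new("/path/to/ledger"), 10 * 1024 * 1024) {
        Ok(_config) => println!("genesis loaded"),
        Err(err) => eprintln!("failed to open genesis config: {err:?}"),
    }
}
```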
* Throttle unstaked quic streams for a given connection (see the sketch after this list)
* Fix interval duration check
* move wait to handle_chunk
* set max unistreams to 0
* drop new streams
* cleanup
* some more cleanup
* fix tests
* update test and stop code
* fix bench-tps
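A hedged sketch of the per-connection throttle for unstaked streams from the list above; the limits and names are illustrative, the real code waits asynchronously inside handle_chunk, and it additionally sets the connection's max concurrent uni-streams to 0 so new streams from a throttled peer are dropped.

```rust
use std::time::{Duration, Instant};

const STREAM_THROTTLE_INTERVAL: Duration = Duration::from_millis(100);
const MAX_UNSTAKED_STREAMS_PER_INTERVAL: u64 = 128;

struct StreamCounter {
    interval_start: Instant,
    streams_in_interval: u64,
}

impl StreamCounter {
    fn new() -> Self {
        Self { interval_start: Instant::now(), streams_in_interval: 0 }
    }

    /// Called when handling a chunk on an unstaked connection: once the per-interval
    /// budget is exhausted, wait until the interval rolls over before accepting more.
    fn throttle(&mut self) {
        let now = Instant::now();
        if now.duration_since(self.interval_start) >= STREAM_THROTTLE_INTERVAL {
            self.interval_start = now;
            self.streams_in_interval = 0;
        }
        if self.streams_in_interval >= MAX_UNSTAKED_STREAMS_PER_INTERVAL {
            // Busy unstaked peers get delayed rather than serviced immediately.
            let remaining = STREAM_THROTTLE_INTERVAL
                .saturating_sub(now.duration_since(self.interval_start));
            std::thread::sleep(remaining);
            self.interval_start = Instant::now();
            self.streams_in_interval = 0;
        }
        self.streams_in_interval += 1;
    }
}
```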
* Finalize unified scheduler plumbing with min impl
* Fix comment
* Rename leftover type name...
* Make logging text less ambiguous
* Make PhantomData simpler without already-used S
* Make TaskHandler stateless again
* Introduce HandlerContext to simplify TaskHandler
* Add comment for coexistence of Pool::{new,new_dyn}
* Fix grammar
* Remove confusing const for upcoming changes
* Demote InstalledScheduler::context() into dcou
* Delay drop of context up to return_to_pool()-ing
* Revert "Demote InstalledScheduler::context() into dcou"
This reverts commit 049a126c905df0ba8ad975c5cb1007ae90a21050.
* Revert "Delay drop of context up to return_to_pool()-ing"
This reverts commit 60b1bd2511a714690b0b2331e49bc3d0c72e3475.
* Make context handling really type-safe
* Update comment
* Fix grammar...
* Refine type aliases for boxed traits
* Swap the tuple order for readability & semantics
* Simplify PooledScheduler::result_with_timings type
* Restore .in_sequence()
* Use where for aesthetics
* Simplify if...
* Fix typo...
* Polish ::schedule_execution() a bit
* Fix rebase conflicts..
* Make test more readable
* Fix test failures after rebase...