Commit Graph

2866 Commits

Author SHA1 Message Date
Brennan Watt 502f249904
Add proc net dev metrics to net stats (#26603)
* Add proc net dev metrics to net stats
2022-07-20 11:44:36 -07:00
sakridge 4a7fb2a808
Revert "core: disable quic servers on mainnet-beta" (#26216)
Enable QUIC server
2022-07-20 20:37:24 +02:00
Jeff Washington (jwash) 263911e7fd
save off what we find when calculating hash (#26663) 2022-07-19 09:55:52 -05:00
behzad nouri 2dd8573287
removes erroneous allow(dead_code) annotations from core (#26660) 2022-07-18 17:15:47 +00:00
Tao Zhu 22d465cd57
Share function to get priority details from various transaction types (#26643) 2022-07-15 18:17:22 -05:00
Jeff Washington (jwash) 47716a5e01
async hash verify on load (#26208)
* verify accounts hash in bg on startup

* fix some tests and loading from genesis

* add extra state for when background thread has completed
2022-07-15 14:29:56 -05:00
Tao Zhu f13b5c832d
Remove obsoleted metrics reporting to reduce lock contention on cost_model (#26608)
remove obsoleted metrics reporting to reduce lock contention on cost_model
2022-07-14 23:02:49 -05:00
Pankaj Garg 49a112ae74
Use pubkey of peer for active QUIC connection table (#26597)
* Use pubkey of peer for active QUIC connection table

* clippy

* update code
2022-07-13 09:59:01 -07:00
HaoranYi bf14440895
clean up and optimize account hash verify (#26560)
* remove unused code

* extract test related fault hash inject fn

* use rotate to optimize hashes removal

* use rotate to optimize snapshot hashes removal

* address code review feedback

* revise comments

* inline
2022-07-12 19:27:28 +00:00
Pankaj Garg ea7448c568
Use client certs in QUIC to get peer's stake (#26477)
* Use client certs in QUIC to get peer's stake

* fixes to cert processing

* integrate the code

* clippy

* more cleanup

* sort cargo deps

* test fixes

* info -> debug
2022-07-11 18:06:40 +00:00
Tao Zhu a3b094300b
Remove sender stakes from banking_stage buffer prioritization (#26512)
* remove sender stakes from banking_stage buffer prioritization
2022-07-11 12:46:15 -05:00
Nicholas Clarke ee0a40937e
Add validator argument log_messages_bytes_limit to change log truncation limit.
Add new cli argument log_messages_bytes_limit to solana-validator to control how long program logs can be before truncation
2022-07-11 10:53:18 -05:00
behzad nouri ba785cf8ab
removes erroneous uses of std::mem::swap (#26536)
All instances should be replaced by std::mem::{replace,take},
or just plain assignment.
2022-07-11 11:33:15 +00:00
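For illustration, here is a minimal std-only sketch contrasting `std::mem::swap` with `std::mem::take` and `std::mem::replace`. The `Stats` struct and its fields are hypothetical and not taken from the commit.

```rust
use std::mem;

// Hypothetical stats buffer that is periodically drained and reported.
#[derive(Default, Debug, PartialEq)]
struct Stats {
    packets: Vec<u64>,
    label: String,
}

impl Stats {
    // Verbose pattern: build an empty value by hand, then swap it in.
    fn drain_with_swap(&mut self) -> Vec<u64> {
        let mut out = Vec::new();
        mem::swap(&mut out, &mut self.packets);
        out
    }

    // Equivalent: `take` moves the value out and leaves `Default::default()`.
    fn drain_with_take(&mut self) -> Vec<u64> {
        mem::take(&mut self.packets)
    }

    // `replace` moves the value out and leaves a caller-chosen value behind.
    fn relabel(&mut self, label: &str) -> String {
        mem::replace(&mut self.label, label.to_string())
    }
}
```

`take` only applies when the field type implements `Default`; `replace` covers the other cases.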
Jeff Washington (jwash) 275e47f931
do this right: add 2nd pass at hash calc when failure seen (#26392) (#26538) 2022-07-10 23:10:22 -05:00
Ashwin Sekar 734fedea4c
Create a more compact vote state update transaction (#26092)
* Create a more compact vote state update transaction

* pr comments

* change root to not be an option and update abi
2022-07-07 22:29:02 -07:00
carllin 5bffee248c
Cleanup repair logging (#26461) 2022-07-07 15:02:43 -05:00
behzad nouri 6f4838719b
decouples shreds sig-verify from tpu vote and transaction packets (#26300)
Shreds have different workload and traffic pattern from TPU vote and
transaction packets. Some of the recent changes to SigVerifyStage are not
suitable or at least optimal for shreds sig-verify; e.g. random discard,
dedup with false positives, discard excess by IP-address, ...

SigVerifier trait is meant to abstract out the distinctions between the
two pipelines, but in practice it has led to more verbose and convoluted
code.

This commit discards SigVerifier implementation for shreds sig-verify
and instead provides a standalone stage for verifying shreds signatures.
2022-07-07 11:13:13 +00:00
behzad nouri d33c548660
bypasses window-service stage before retransmitting shreds (#26291)
With recent patches, window-service recv-window does not do much other
than redirecting packets/shreds to downstream channels.
The commit removes window-service recv-window and instead sends
packets/shreds directly from sigverify to retransmit-stage and
window-service insert thread.
2022-07-06 11:49:58 +00:00
Tao Zhu c1d89ad749
forward packets by prioritization in desc order (#25406)
- Forward packets by prioritization in desc order
- Add support of cost-tracking by transaction requested compute units
- Hook up account buckets to forwarder
- Add metrics for forwardable batches count
- Remove redundant invalid packets filtering at end of slot since forwarder will do the same when batch forwardable packets
- Add bench test for forwarding
2022-07-05 23:24:58 -05:00
Jeff Washington (jwash) 8eba4d1698
add 2nd pass at hash calc when failure seen (#26392) 2022-07-05 18:01:02 -05:00
behzad nouri d3a14f5b30
simplifies packet/shred sanity checks (#26356) 2022-07-05 21:41:19 +00:00
carllin ce39c14025
Add end-to-end replay slot metrics (#25752) 2022-07-05 13:58:51 -05:00
Nick Rempel 7e4a5de99c
Refactor ConnectionCache::use_quic (#26235)
* Remove UseQuic type

Move to storing the UdpSocket on ConnectionCache and accepting a bool

* Remove use_quic from ConnectionCache constructor

Replace with separate with_udp constructor to force callers to choose
2022-07-05 10:49:42 -07:00
behzad nouri 61f0a7d9c3
replaces Mutex<PohRecorder> with RwLock<PohRecorder> (#26370)
Mutex causes superfluous lock contention when a read-only reference suffices.
2022-07-05 14:29:44 +00:00
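A minimal sketch of why an RwLock reduces contention for read-mostly state: shared readers may hold the lock at the same time, while writers still get exclusive access. `Recorder` is a hypothetical stand-in, not the actual PohRecorder.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical stand-in for state that many threads only read.
struct Recorder {
    tick_height: u64,
}

// A Mutex's try_lock would fail here; an RwLock admits a second shared reader.
fn second_reader_admitted(lock: &RwLock<Recorder>) -> bool {
    let _first = lock.read().unwrap();
    lock.try_read().is_ok()
}

// Writers still take the lock exclusively when they need to mutate.
fn bump(lock: &RwLock<Recorder>) {
    lock.write().unwrap().tick_height += 1;
}

// Readers on several threads can hold the lock simultaneously.
fn read_from_threads(lock: &Arc<RwLock<Recorder>>, readers: usize) -> u64 {
    let handles: Vec<_> = (0..readers)
        .map(|_| {
            let lock = Arc::clone(lock);
            thread::spawn(move || {
                let guard = lock.read().unwrap();
                guard.tick_height
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```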
Pankaj Garg 94685e1222
Implement randomized pruning of QUIC connection from staked peers (#26299) 2022-06-30 17:56:15 -07:00
behzad nouri 88599fd760
skips shreds deserialization before retransmit (#26230)
Fully deserializing shreds in window-service before sending them to
retransmit stage adds latency to shreds propagation.
This commit instead channels through the payload and relies on only
partial deserialization of a few required fields: slot, shred-index,
shred-type.
2022-06-30 12:13:00 +00:00
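The partial deserialization described above can be sketched as reading fixed offsets out of the raw payload. The byte offsets and the shred-type byte values below are assumptions modeled on the legacy shred header (64-byte signature, then a 1-byte shred type, an 8-byte little-endian slot and a 4-byte little-endian index); verify them against the shred module before relying on them. The helper names are hypothetical.

```rust
use std::convert::TryInto;

// Assumed legacy layout: [0..64) signature, [64] shred type,
// [65..73) slot (u64 LE), [73..77) index (u32 LE).
const OFFSET_SHRED_TYPE: usize = 64;
const OFFSET_SLOT: usize = 65;
const OFFSET_INDEX: usize = 73;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShredType {
    Data,
    Code,
}

// Each helper reads only the bytes it needs and returns None instead of
// panicking on a short or malformed payload.
fn get_shred_type(payload: &[u8]) -> Option<ShredType> {
    match *payload.get(OFFSET_SHRED_TYPE)? {
        0b1010_0101 => Some(ShredType::Data), // assumed byte value
        0b0101_1010 => Some(ShredType::Code), // assumed byte value
        _ => None,
    }
}

fn get_slot(payload: &[u8]) -> Option<u64> {
    let bytes = payload.get(OFFSET_SLOT..OFFSET_SLOT + 8)?;
    Some(u64::from_le_bytes(bytes.try_into().ok()?))
}

fn get_index(payload: &[u8]) -> Option<u32> {
    let bytes = payload.get(OFFSET_INDEX..OFFSET_INDEX + 4)?;
    Some(u32::from_le_bytes(bytes.try_into().ok()?))
}
```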
Jack May 4563bf40f6
cleanup feature: tx-wide-compute-cap (#26326) 2022-06-29 23:54:45 -07:00
Jeff Washington (jwash) 557bf6e656
allow initial hash calc to occur in bg (#26271)
* allow initial hash calc to occur in bg

* validator_initialized -> startup_verification_complete

* add infos for leader and vote

* rework snapshot for startup verification

* change to assert
2022-06-29 16:48:33 -05:00
behzad nouri f875733a9e
patches bug in retransmit stats where slot stats are erroneously dropped (#26317)
slot_stats are submitted at a different cadence from the rest of
RetransmitStats. Current code erroneously clears slot_stats before
submitting any metrics.
2022-06-29 21:35:58 +00:00
behzad nouri b3406b5b2a
removes IndexedParallelIterator::with_min_len from retransmit (#26305)
In testing on mainnet-beta, with_min_len did not seem to have much impact
on the current retransmit code.
2022-06-29 13:27:17 +00:00
behzad nouri 348fe9ebe2
verifies shred slot and parent in fetch stage (#26225)
Shred slot and parent are not verified until window-service where
resources are already wasted to sig-verify and deserialize shreds.
This commit moves the above verification earlier in the pipeline, to the
fetch stage.
2022-06-28 12:45:50 +00:00
behzad nouri 39ca788b95
discards shreds in sigverify if the slot leader is the node itself (#26229)
Shreds are dropped in window-service if the slot leader is the node
itself:
https://github.com/solana-labs/solana/blob/cd2878acf/core/src/window_service.rs#L181-L185

However, this is done after wasting resources verifying signatures on
these shreds, and requires a redundant 2nd lookup of the slot leader.

This commit instead discards such shreds in sigverify stage where we
already know the leader for the slot.
2022-06-27 20:12:23 +00:00
behzad nouri 67936aaa74
moves Shred::seed to ShredId and adds test coverage (#26251)
Following commits will skip shred deserialization before retransmit, and
so we will only have a ShredId and not a fully deserialized shred to
obtain the shuffling seed from.
2022-06-27 17:58:43 +00:00
Brooks Prumo 662818ef0d
Use `VoteAccount::node_pubkey()` (#26207) 2022-06-27 09:09:06 -05:00
HaoranYi d5efbdb19b
Add timing measurement for gossip vote txn processing (#26163)
* add timing for gossip vote txn processing

* fix build

* fix too many arg error in clippy

* atomic interval
2022-06-27 08:53:34 -05:00
Ryo Onodera cd2878acf9
Avoid to miss to root for local slots before the hard fork (#19912)
* Make sure to root local slots even with hard fork

* Address review comments

* Cleanup a bit

* Further clean up

* Further clean up a bit

* Add comment

* Tweak hard fork reconciliation code placement
2022-06-26 15:14:17 +09:00
behzad nouri 30d2b112e4
bypasses rayon thread-pool for small retransmit shred batches (#26222)
In order to preserve current behavior, the threshold is set to the
current value of the argument to IndexedParallelIterator::with_min_len.
Follow up commits will recalibrate this threshold to optimize
performance on mainnet-beta.
2022-06-25 21:15:42 +00:00
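The size-threshold idea can be sketched as follows. The commit uses a rayon thread pool; this sketch swaps in `std::thread::scope` to stay dependency-free, and the threshold value and `process` function are hypothetical placeholders, not the values used in retransmit.

```rust
use std::thread;

// Hypothetical threshold; the commit derives it from the existing
// with_min_len argument, whose value is not reproduced here.
const PARALLEL_THRESHOLD: usize = 8;

// Stand-in for per-shred retransmit work.
fn process(shred: &u64) -> u64 {
    shred * 2
}

// Small batches run inline on the calling thread, avoiding dispatch overhead;
// larger batches are split into chunks processed on scoped worker threads.
// Output order matches input order in both paths.
fn process_batch(shreds: &[u64], num_threads: usize) -> Vec<u64> {
    if shreds.len() < PARALLEL_THRESHOLD || num_threads <= 1 {
        return shreds.iter().map(process).collect();
    }
    let chunk_size = (shreds.len() + num_threads - 1) / num_threads;
    thread::scope(|scope| {
        let handles: Vec<_> = shreds
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().map(process).collect::<Vec<_>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```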
Justin Starry 7cd7173b71
Refactor: Add get_delegated_stake method to VoteAccounts (#26221) 2022-06-25 16:41:35 +00:00
Justin Starry 44d1e62007
Refactor: No need to return stake in Bank::get_vote_account (#26220) 2022-06-25 16:27:43 +00:00
behzad nouri f1b82ec44d
factors out common retransmit work for shreds of the same slot (#26218)
Shreds arriving at a node for retransmit tend to belong to the same slot
(or just a couple of different slots). Slot leader and cluster nodes
are common for the shreds of the same slot, and so the common work to
look up these values can be factored out.
This commit first groups shreds by slot to factor out that common
lookup work.
2022-06-25 15:49:05 +00:00
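A minimal sketch of the group-by-slot idea: bucket shreds by slot, then run the shared lookup once per bucket. `Shred`, `lookup_slot_leader` and the lookup counter are hypothetical and only illustrate how the per-shred lookups collapse to one per slot.

```rust
use std::collections::HashMap;

// Hypothetical minimal shred carrying only what this sketch needs.
#[derive(Debug, Clone)]
struct Shred {
    slot: u64,
    index: u32,
}

// Hypothetical stand-in for the per-slot lookups (slot leader, cluster nodes).
fn lookup_slot_leader(slot: u64, lookups: &mut usize) -> String {
    *lookups += 1;
    format!("leader-{}", slot)
}

// Group shreds by slot first, so the common lookup runs once per slot
// instead of once per shred. Returns (slot, leader, number of shreds).
fn retransmit(shreds: Vec<Shred>, lookups: &mut usize) -> Vec<(u64, String, usize)> {
    let mut by_slot: HashMap<u64, Vec<Shred>> = HashMap::new();
    for shred in shreds {
        by_slot.entry(shred.slot).or_default().push(shred);
    }
    let mut out: Vec<(u64, String, usize)> = by_slot
        .into_iter()
        .map(|(slot, group)| (slot, lookup_slot_leader(slot, lookups), group.len()))
        .collect();
    out.sort_by_key(|entry| entry.0);
    out
}
```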
Jeff Washington (jwash) a3395a786a
vote_account uses AccountSharedData to avoid copies (#23687)
* vote_account uses AccountSharedData to avoid copies

* simpler deserialize
2022-06-24 15:08:01 -05:00
Tyera Eulberg a6ba5a9a05
Add transaction index in slot to geyser plugin TransactionInfo (#25688)
* Define shuffle to prep using same shuffle for multiple slices

* Determine transaction indexes and plumb to execute_batch

* Pair transaction_index with transaction in TransactionStatusService

* Add new ReplicaTransactionInfoVersion

* Plumb transaction_indexes through BankingStage

* Prepare BankingStage to receive transaction indexes from PohRecorder

* Determine transaction indexes in PohRecorder; add field to WorkingBank

* Add PohRecorder::record unit test

* Only pass starting_transaction_index around PohRecorder

* Add helper structs to simplify test DashMap

* Pass entry and starting-index into process_entries_with_callback together

* Add tx-index checks to test_rebatch_transactions

* Revert shuffle definition and use zip/unzip

* Only zip/unzip if randomize

* Add confirm_slot_entries test

* Review nits

* Add type alias to make sender docs more clear
2022-06-23 13:37:38 -06:00
behzad nouri f534b8981b
maps number of data shreds to erasure batch size (#25917)
In preparation for
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
  erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
  replace and remove entries_to_data_shreds from the public interface.
2022-06-23 13:27:54 +00:00
Jeff Biseda bafdb7dd62
Revert handle start_http failure in rpc_service (#25400) (#26130)
* revert e263be2000
2022-06-22 10:52:27 -07:00
Michael Vines f3639b76ce Remove some clippy lints 2022-06-22 09:23:22 -07:00
HaoranYi b5d0c7b468
Revert "tvu and tpu timeout on joining its microservices (#24111)" (#26132)
This reverts commit e105547c14.
2022-06-22 10:57:46 -05:00
behzad nouri faa6c32162 removes packet modifier from shred_fetch_stage
... in favor of just passing packet flags.
2022-06-22 12:17:37 +00:00
behzad nouri 1f0f5dc03e verifies shred-version in fetch stage
Shred versions are not verified until window-service where resources are
already wasted to sig-verify and deserialize shreds.
The commit verifies shred-version earlier in the pipeline in fetch stage.
2022-06-22 12:17:37 +00:00
Pankaj Garg 43ff65ece9
Use single send socket in UdpTpuConnection (#26105) 2022-06-21 14:56:21 -07:00
behzad nouri 75425521b4
moves slot updates notifications after shreds retransmit (#26094)
RetransmitSlotStats can already be utilized to track when the first
shred for a slot was received; therefore
    first_shreds_received: &Mutex<BTreeSet<Slot>>

is redundant. Sending update notifications after shreds retransmit will
also bypass the need for a mutex.
2022-06-21 17:19:40 -04:00
Lijun Wang 61946a49c3
Weight concurrent streams by stake (#25993)
Weight concurrent streams by stake for staked nodes
Ported changes from #25056 after addressing merge conflicts and some refactoring
2022-06-21 12:06:44 -07:00
behzad nouri d2afa6b418
moves packet-hasher out of the mutex (#26091)
Packet-hasher is not mutated across threads and does not need to be
wrapped in a mutex.
2022-06-21 16:29:27 +00:00
Pankaj Garg e344c8476f
Do not use UdpTpuConnection to forward votes (#26082)
* Do not use UdpTpuConnection to forward votes

* fix tests
2022-06-21 05:56:11 -07:00
Boqin Qin(秦 伯钦) 611d2ec73c
core: fix double-readlock in validator (#26053) 2022-06-20 15:07:00 +00:00
behzad nouri 47e62add5b
removes feature gate code adding shred-type to shred seed (#25963)
The feature is already activated on all clusters, and does not impact
processing of ledger/snapshots.
2022-06-20 14:39:24 +00:00
Trent Nelson a5f290a66f core: disable quic servers on mainnet-beta 2022-06-17 20:04:05 -06:00
behzad nouri b3d1f8d1ac
tracks number of shreds sent and received at different distances from the root (#25989) 2022-06-17 21:33:23 +00:00
behzad nouri eacb9183d4
patches bug where the 1st coding shred is not inserted into blockstore (#25916)
StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

https://github.com/solana-labs/solana/pull/7481
which added this code was not inserting coding shreds into blockstore.
Starting with
https://github.com/solana-labs/solana/pull/8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips the first coding shred because it does
not check whether the shred is code or data.
2022-06-16 13:59:15 +00:00
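The bug and its fix reduce to a filter predicate. This hypothetical sketch (not the actual StandardBroadcastRun::insert code) shows why checking only the index drops the first coding shred, and how also checking the shred type fixes it.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShredType {
    Data,
    Code,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Shred {
    shred_type: ShredType,
    index: u32,
}

// Buggy filter: skips every shred with index 0, including the first coding shred.
fn to_insert_buggy(shreds: &[Shred]) -> Vec<Shred> {
    shreds.iter().copied().filter(|s| s.index != 0).collect()
}

// Fixed filter: skips only the first *data* shred, which was already
// inserted synchronously.
fn to_insert_fixed(shreds: &[Shred]) -> Vec<Shred> {
    shreds
        .iter()
        .copied()
        .filter(|s| !(s.shred_type == ShredType::Data && s.index == 0))
        .collect()
}
```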
behzad nouri fe3c1d3d49
removes erroneous uses of &Arc<...> from broadcast-stage (#25962) 2022-06-15 13:44:24 +00:00
Brian Anderson db9004bd0f
Fix doc warnings (#25953) 2022-06-14 21:55:08 -06:00
Tao Zhu c96d9d127a
Include forwarding counters in leader slot metrics (#25874)
* To include forwarding counters in leader slot metrics

* Capture slot_end_detected time when checking leader slots, to be used in reporting later

* Simplify banking stage loop to report leader slot metrics

Co-authored-by: carllin <carl@solana.com>
2022-06-13 17:03:34 -05:00
Michael Vines ace24a7c82 A default tower is no longer considered to contain a stray last vote 2022-06-10 14:17:26 -07:00
Lijun Wang 29b597cea5
Connection pool support in connection cache and QUIC connection reliability improvement (#25793)
* Connection pool in connection cache and handle connection errors

1. The connection cache now has a pool of connections per address (configurable, default 4)
2. The connections per address share a lazily initialized endpoint
3. Handle connection issues better, avoiding race conditions
4. Various log improvements to help debug connection issues
2022-06-10 09:25:24 -07:00
Yueh-Hsuan Chiang ee4469c882
Skip compaction in backup_and_clear_blockstore (#25810)
#### Problem
blockstore clean and compact is quite slow with wait-for-supermajority purge and can take 20-30 minutes
as described in #25710.

#### Summary of Changes
This PR removes the compaction logic in backup_and_clear_blockstore, as the
actual restoration from a bad fork is handled by `blockstore.purge_slots`
(which issues a rocksdb range-delete that makes the bad fork
unavailable).

Compaction is irrelevant to the shred version, as its main job in this context
is to reclaim disk storage from the deleted slots, which we can leave to the
rocksdb automatic background compaction.

Fixes #25710
2022-06-09 17:11:50 +08:00
carllin bf8faa8a30
Report banking stage tracer metrics (#25620) 2022-06-09 00:25:37 -05:00
Jon Cinque 79a8ecd0ac
client: Remove static connection cache, plumb it instead (#25667)
* client: Remove static connection cache, plumb it instead

* Add TpuClient::new_with_connection_cache to not break downstream

* Refactor get_connection and RwLock into ConnectionCache

* Fix merge conflicts from new async TpuClient

* Remove `ConnectionCache::set_use_quic`

* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
2022-06-08 13:57:12 +02:00
behzad nouri 6c9f2eac78
removes fec_set_offset from UnfinishedSlotInfo (#25815)
If the blockstore has shreds for a slot, it should not recreate the
slot:
https://github.com/solana-labs/solana/blob/ff68bf6c2/ledger/src/leader_schedule_cache.rs#L142-L146
https://github.com/solana-labs/solana/pull/15849/files#r596657314

Therefore in broadcast stage if UnfinishedSlotInfo is None, then
fec_set_offset will be zero:
https://github.com/solana-labs/solana/blob/ff68bf6c2/core/src/broadcast_stage/standard_broadcast_run.rs#L111-L120

As a result, fec_set_offset will always be zero, and is therefore redundant
and can be removed.
2022-06-07 22:17:37 +00:00
Brennan Watt ba04063956
Add CPUmetrics (#25802)
Add in some CPU utilization metrics such as: number of vCPUs, clock frequency, average load across different time intervals, and number of total threads
2022-06-07 11:34:25 -07:00
apfitzge e6c21a3036
Convert Measure::this to measure! and remove Measure::this (#25776)
* Remove the args param from Measure::this since we don't ever use it

* banking_stage.rs: convert to measure!

* poh_recorder.rs: convert to measure!

* cost_update_service.rs: convert to measure!

* poh_service.rs: convert to measure!

* bank.rs: convert to measure!

* measure.rs: Remove Measure::this now that all have been converted to measure!
2022-06-06 20:21:05 -05:00
sakridge 447a3239e7
Add new replay metrics for replay blockstore_into_bank and complete (#25717) 2022-06-03 19:45:27 +02:00
behzad nouri 5dbf7d8f91
removes raw indexing into packet data (#25554)
Packets are at the boundary of the system where, the vast majority of the
time, they are received from an untrusted source. Raw indexing into the
data buffer can open attack vectors if the offsets are invalid.
Validating offsets beforehand is verbose and error prone.

The commit updates the Packet::data() api to take a SliceIndex and to always
return an Option. Call sites are thus forced to explicitly handle the
case where the offsets are invalid.
2022-06-03 01:05:06 +00:00
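A minimal sketch of a `data()` accessor generic over `std::slice::SliceIndex` that returns an `Option`, so invalid offsets surface as `None` rather than a panic. The struct layout and buffer size are illustrative assumptions, not the actual Packet definition.

```rust
use std::slice::SliceIndex;

const BUFFER_SIZE: usize = 1232; // illustrative buffer size for this sketch

struct Meta {
    size: usize,
}

struct Packet {
    buffer: [u8; BUFFER_SIZE],
    meta: Meta,
}

impl Packet {
    // Indexes only into the valid prefix (buffer[..meta.size]). Any index or
    // range that falls outside it yields None, so callers must handle it.
    fn data<I>(&self, index: I) -> Option<&I::Output>
    where
        I: SliceIndex<[u8]>,
    {
        self.buffer.get(..self.meta.size)?.get(index)
    }
}
```

Because `SliceIndex` covers both `usize` and range types, the same accessor serves single-byte reads and sub-slices.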
behzad nouri 81231a89b9 adds support for different variants of ShredCode and ShredData
The commit implements two new types:
    pub enum ShredCode {
        Legacy(legacy::ShredCode),
    }
    pub enum ShredData {
        Legacy(legacy::ShredData),
    }

Following commits will extend these types by adding merkle variants:
    pub enum ShredCode {
        Legacy(legacy::ShredCode),
        Merkle(merkle::ShredCode),
    }
    pub enum ShredData {
        Legacy(legacy::ShredData),
        Merkle(merkle::ShredData),
    }
2022-06-02 18:55:50 +00:00
Pankaj Garg 1c2ae470c5
Fix forwarding of transactions over QUIC (#25674)
* Spawn QUIC server to receive forwarded txs

* Update validator port range

* forward votes using UDP

* no forwarding from unstaked nodes

* forwarding stats in banking stage

* fix test builds

* fix lifetime of forward sender
2022-06-02 11:14:58 -07:00
HaoranYi d3ac4e941b
Bench: preshrink + sigverify (#25480)
* double shrinking

* add bench

* rename

* aggregate timing

* remove pre/post shrink time

* update api after merge
2022-06-02 09:19:01 -05:00
Tao Zhu 51ac599915
Add user requested CU (eg. compute_budget.compute_unit_limit) to immutable_deserialized_packet, to be used in cost model and prioritized forwarding (#25695) 2022-06-01 22:43:48 +00:00
Ryo Onodera aedcb05dc8
Record solana-validator ver to metrics at startup (#25635)
* Record solana-validator ver to metrics at startup

* Update Cargo.lock
2022-06-01 13:37:50 +09:00
Christian Kamm 02b26ddd82
SigVerify: Fix num_valid_packets metric (#25643)
It used to report the number of packets with successful signature
validations but was accidentally changed to count packets passed into
the verifier by e4409a87fe.

This restores the previous meaning.
2022-05-31 18:51:20 +10:00
carllin 90a3315b69
Detect tracer key in sigverify (#25579)
* Mark the tracer transaction

* simplify tracer check
2022-05-30 18:41:54 -05:00
Justin Starry e4409a87fe
Add pre shrink pass before sigverify batch (#25136) 2022-05-28 01:51:55 +10:00
Yueh-Hsuan Chiang 5b67960c76
(Refactor) Move blocktore options related stuff to blockstore_options.rs (#25509)
#### Problem
blockstore_db.rs has a mutual dependency with blockstore_metrics.rs.

#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.

By doing this, we address the mutual dependency and also make the code cleaner.
2022-05-26 16:59:26 -07:00
ryleung-solana 1ca5c3a7bd
Switch to using enum-dispatch to switch between UDP and Quic (#24713) 2022-05-26 11:21:16 -04:00
behzad nouri de612c25b3
removes shred wire layout specs from sigverify (#25520)
sigverify_shreds relies on wire layout specs of shreds:
https://github.com/solana-labs/solana/blob/0376ab41a/ledger/src/sigverify_shreds.rs#L39-L46
https://github.com/solana-labs/solana/blob/0376ab41a/ledger/src/sigverify_shreds.rs#L298-L305

In preparation for
https://github.com/solana-labs/solana/pull/25237
which adds a new shred variant with different layout and signed message,
this commit removes the shred layout specification from sigverify and
instead encapsulates it in the shred module.
2022-05-26 13:06:27 +00:00
Christian Kamm 0efb7478cd
FindPacketSenderStake: Remove parallelism to improve performance (#25562)
* FindPacketSenderStake: Remove parallelism to improve performance

The work unit sizes were so small that using the thread pool
slowed down this stage significantly.

* fix checks

Co-authored-by: Justin Starry <justin@solana.com>
2022-05-26 21:17:52 +10:00
behzad nouri cafa85bfbb
includes shred-type when computing turbine broadcast seed (#25556)
Indices for code and data shreds of the same slot overlap; and so they
will have the same random number generator seed when shuffling cluster
nodes for turbine broadcast.

This results in the same propagation path for code and data shreds of
the same index and effectively smaller sample size for re-transmitter
nodes. For example a 32:32 batch (32 code + 32 data shreds), is
retransmitted through _at most_ 32 unique nodes, whereas ideally we want
~64 unique re-transmitters.

This commit adds shred-type to seed function so that code and data
shreds of the same (slot, index) will (most likely) have different
propagation paths.
2022-05-25 20:31:53 +00:00
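The 32:32 arithmetic above can be checked with a toy seed function. This is not the validator's actual seed derivation: `DefaultHasher` is used only for brevity, and the function names are hypothetical. Without the shred type, a data shred and a code shred with the same (slot, index) get identical seeds, so 64 shreds yield only 32 distinct shuffles.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

#[derive(Clone, Copy, Hash)]
enum ShredType {
    Data,
    Code,
}

// Toy seed ignoring the shred type (the behavior before the commit).
fn seed_without_type(slot: u64, index: u32) -> u64 {
    let mut h = DefaultHasher::new();
    (slot, index).hash(&mut h);
    h.finish()
}

// Toy seed that also mixes in the shred type (the behavior after the commit).
fn seed_with_type(slot: u64, index: u32, shred_type: ShredType) -> u64 {
    let mut h = DefaultHasher::new();
    (slot, index, shred_type).hash(&mut h);
    h.finish()
}

// Counts distinct seeds for a batch of `batch` data + `batch` code shreds
// whose indices overlap (0..batch for both types).
fn distinct_seeds(batch: u32, with_type: bool) -> usize {
    let slot = 1;
    let mut seeds = HashSet::new();
    for index in 0..batch {
        for &shred_type in &[ShredType::Data, ShredType::Code] {
            let seed = if with_type {
                seed_with_type(slot, index, shred_type)
            } else {
                seed_without_type(slot, index)
            };
            seeds.insert(seed);
        }
    }
    seeds.len()
}
```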
behzad nouri 880684565c
limits read access into Packet data to Packet.meta.size (#25484)
Bytes past Packet.meta.size are not valid to read from.

The commit makes the buffer field private and instead provides two
methods:
* Packet::data() which returns an immutable reference to the underlying
  buffer up to Packet.meta.size. The rest of the buffer is not valid to
  read from.
* Packet::buffer_mut() which returns a mutable reference to the entirety
  of the underlying buffer to write into. The caller is responsible for
  updating Packet.meta.size after writing to the buffer.
2022-05-25 16:52:54 +00:00
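A minimal sketch of the two accessors described above: a private buffer, an immutable view bounded by `meta.size`, and a mutable view of the whole buffer. The buffer size is illustrative, and the clamp in `data()` is an addition for this sketch so that it cannot panic.

```rust
const BUFFER_SIZE: usize = 1232; // illustrative buffer size for this sketch

pub struct Meta {
    pub size: usize,
}

pub struct Packet {
    buffer: [u8; BUFFER_SIZE], // private: readers cannot reach past meta.size
    pub meta: Meta,
}

impl Packet {
    pub fn new() -> Self {
        Packet { buffer: [0u8; BUFFER_SIZE], meta: Meta { size: 0 } }
    }

    // Immutable view of the valid bytes only (clamped here so the sketch
    // cannot panic if meta.size is set too large).
    pub fn data(&self) -> &[u8] {
        &self.buffer[..self.meta.size.min(BUFFER_SIZE)]
    }

    // Mutable view of the whole buffer; the caller must update meta.size
    // after writing, otherwise data() will not expose the new bytes.
    pub fn buffer_mut(&mut self) -> &mut [u8] {
        &mut self.buffer[..]
    }
}
```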
carllin 9651cdad99
Refactor Sigverify trait (#25359) 2022-05-24 16:01:41 -05:00
Jeff Biseda 61c5a471e8
preserve optimistic_slot in blockstore (#25311) 2022-05-24 12:03:28 -07:00
Justin Starry e66ea7cb6a Clean up Bank::commit_transactions parameters 2022-05-24 20:24:42 +08:00
Justin Starry cad1c41ce2 Add Packet::deserialize_slice convenience method 2022-05-24 17:31:14 +08:00
steviez ec7ca411dd
Make PacketBatch packets vector non-public (#25413)
Upcoming changes to PacketBatch to support variable sized packets will
modify the internals of PacketBatch. So, this change removes usage of
the internal packet struct and instead uses accessors (which are
currently just wrappers of Vector functions but will change down the
road).
2022-05-23 15:30:15 -05:00
Christian Kamm 6429aff13b
findpacketsenderstake: add discard after receive (#25458)
This mimics a similar change in sigverify, see #25388
2022-05-23 21:27:20 +02:00
behzad nouri c248fb3f51
renames Packet Meta::{,set_}addr methods to {,set_}socket_addr (#25478)
In order to distinguish between Meta.addr field which is an IpAddr and
the methods which refer to a SocketAddr.
2022-05-23 15:48:59 +00:00
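To make the distinction concrete, here is a hypothetical `Meta` whose `addr` field holds only an `IpAddr`, while the renamed methods deal in a full `SocketAddr`. The separate `port` field is an assumption for this sketch.

```rust
use std::net::{IpAddr, Ipv4Addr, SocketAddr};

// Hypothetical Meta: `addr` is just the IP; the port is stored separately.
struct Meta {
    addr: IpAddr,
    port: u16,
}

impl Meta {
    // Returns a full SocketAddr, hence the name socket_addr rather than addr.
    fn socket_addr(&self) -> SocketAddr {
        SocketAddr::new(self.addr, self.port)
    }

    fn set_socket_addr(&mut self, socket_addr: &SocketAddr) {
        self.addr = socket_addr.ip();
        self.port = socket_addr.port();
    }
}
```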
Michael Vines b05c7d91ed Fix derive_partial_eq_without_eq clippy lint 2022-05-22 22:22:21 -07:00
sakridge e22be02d3a
sigverify: add discard before dedup (#25388) 2022-05-23 03:40:33 +02:00
Pankaj Garg 7fb0ef1fa5
Use async send for forwarding transactions (#25435) 2022-05-20 21:20:47 -07:00
Jeff Biseda e263be2000
handle start_http failure in rpc_service (#25400) 2022-05-20 17:59:23 -07:00
Brennan Watt e025376719
Fix packet accounting after dedup (#25357)
* Fix packet accounting after dedup
* Rename function to better represent intent
2022-05-20 17:00:13 -07:00
Brennan Watt 2fdc850176
Use Shared IP to Stake Map (#25377)
* Find packet sender stake stage use shared IP to stake map
2022-05-20 12:51:07 -07:00
Michael Vines c54e06355f
voteSubscribe pubsub notification now includes the vote transaction signature (#25291) 2022-05-19 18:28:46 -07:00
Michael Vines 97efbdc303
Defer tower saving until push_vote(), there's no need to do it sooner (#25374) 2022-05-19 18:27:58 -07:00
buffalu 971748b335
fix banking stage starvation (#25245) 2022-05-18 22:37:47 +02:00
Justin Starry 5548baf4dd
Don't drop transactions which use a request heap size ix (#25315) 2022-05-18 17:47:24 +08:00
steviez b27125815a
Simplify logic around MAX_ORPHAN_REPAIR_RESPONSES constant (#25032) 2022-05-17 19:45:45 -06:00
Tao Zhu b1b3702e6d
Prioritize transactions in banking stage by their compute unit price (#25178)
* - get prioritization fee from compute_budget instruction;
- update compute_budget::process_instruction function to take instruction iter to support sanitized versioned message;
- updated runtime.md

* update transaction fee calculation for prioritization fee rate as lamports per 10K CUs

* review changes

* fix test

* fix a bpf test

* fix bpf test

* patch feedback

* fix clippy

* fix bpf test

* feedback

* rename prioritization fee rate to compute unit price

* feedback

Co-authored-by: Justin Starry <justin@solana.com>
2022-05-16 12:06:33 +08:00
sakridge 52db2e19bc
Lower default batch size to 64 and add 2 banking threads (#25226) 2022-05-15 16:52:47 +02:00
sakridge 3d96a1ab76
Block packets in vote-only mode (#24906) 2022-05-14 17:53:37 +02:00
Pankaj Garg 71dd95e842
Tune banking_stage receive loop timing (#25172) 2022-05-13 03:42:08 +00:00
Jeff Washington (jwash) 896729f25e
keep track of oldest slot used by last hash calculation (#25152) 2022-05-12 11:18:08 -05:00
Jeff Washington (jwash) 3a4f0d3397
println -> info (#25163) 2022-05-12 11:07:13 -05:00
Pankaj Garg bcf4d54235
Update test_banking_stage_entryfication to be more deterministic (#25146)
* Update test_banking_stage_entryfication to be more deterministic

* revert to original test with updated checks
2022-05-12 15:36:19 +00:00
HaoranYi 41d34d45e0
pass exit by ref (#25120) 2022-05-11 09:17:21 -05:00
DimAn 2fa9bc3e70
Add options to store full and/or incremental snapshots in separate locations (#24247) 2022-05-10 16:37:41 -04:00
Justin Starry e3bdc38f0a
Add sanitized types for use in banking stage (#25067) 2022-05-11 00:30:48 +08:00
HaoranYi de96663cc4
fix typo (#25083) 2022-05-09 12:42:58 -05:00
Pankaj Garg 362b0526cd
Greedy receive in banking stage (#25060)
* Greedy receive in banking stage

* add upperbound to batch size and batching time

* update test_banking_stage_entryfication test
2022-05-08 10:47:55 -07:00
HaoranYi 8e37e364b1
fix typo in measure name (#25058) 2022-05-07 10:04:42 -05:00
Justin Starry c920d411f7
Clean up logging and make variables consistent (#25049) 2022-05-07 03:52:45 +08:00
Justin Starry 082502d4f3
Fail tx sanitization when ix program id uses lookup table (#25035)
* Fail tx sanitization when ix program id uses lookup table

* feedback
2022-05-07 03:19:50 +08:00
Christian Kamm cb6cd5d60f
FindPacketSenderStake: Improve metrics (#24971)
- separate names for vote and non-vote thread
- time unit postfixes (one is in ns!)
2022-05-06 21:16:13 +02:00
behzad nouri 492f89a170
checks account owner when initializing a vote-account (#25018)
A VoteAccount may only wrap an account if the account owner is
solana_vote_program::id(), or equivalently if this check returns true:
solana_vote_program::check_id(account.owner())
2022-05-06 16:22:49 +00:00
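One way to express the owner check is a fallible `TryFrom` conversion, sketched below with hypothetical `Pubkey`, `Account` and `VoteAccount` types; `VOTE_PROGRAM_ID` is a placeholder, not the real vote program id.

```rust
use std::convert::TryFrom;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pubkey([u8; 32]);

// Placeholder for solana_vote_program::id(); the bytes are arbitrary.
const VOTE_PROGRAM_ID: Pubkey = Pubkey([7u8; 32]);

struct Account {
    owner: Pubkey,
    data: Vec<u8>,
}

#[derive(Debug, PartialEq, Eq)]
enum Error {
    InvalidOwner(Pubkey),
}

struct VoteAccount {
    account: Account,
}

impl TryFrom<Account> for VoteAccount {
    type Error = Error;

    // Refuses to wrap accounts not owned by the vote program.
    fn try_from(account: Account) -> Result<Self, Self::Error> {
        if account.owner != VOTE_PROGRAM_ID {
            return Err(Error::InvalidOwner(account.owner));
        }
        Ok(VoteAccount { account })
    }
}
```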
behzad nouri a01291069a
initializes thread-pools with lazy_static instead of thread_local (#24853)
In addition to thread_local -> lazy_static change, a number of thread-pools are
initialized with get_max_thread_count to achieve parity with the older code in
terms of number of validator threads.
2022-05-05 20:00:50 +00:00
Justin Starry 7100f1c94b
Collect stats in streamer receiver and report fetch stage metrics (#25010) 2022-05-06 02:56:18 +08:00
carllin 870ac80b79
Prioritize BankingStage packets individually in min-max heap (#24187) 2022-05-04 21:50:56 -05:00
Christian Kamm 503d0baf6d
SigVerify: Add total time metrics for dedup/discard/verify (#24768)
* SigVerify: Add total time metrics for dedup/discard/verify

Previously it was impossible to determine the total time the stage spent
on these activities within a measurement window.

* SigVerify: Add _us postfix to time metrics
2022-05-03 14:59:25 +02:00
behzad nouri eff59193db
enforces that LAST_SHRED_IN_SLOT is also DATA_COMPLETE_SHRED (#24892)
A data shred cannot be LAST_SHRED_IN_SLOT if not also DATA_COMPLETE_SHRED.
So LAST_SHRED_IN_SLOT should also imply DATA_COMPLETE_SHRED:
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shredder.rs#L116-L117
https://github.com/solana-labs/solana/blob/74b586ae7/core/src/broadcast_stage/standard_broadcast_run.rs#L80-L81

However current shred constructs allow specifying a shred which is
LAST_SHRED_IN_SLOT but not DATA_COMPLETE_SHRED:
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L117-L118
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L272-L273

The commit updates ShredFlags so that if a shred is not
DATA_COMPLETE_SHRED it cannot be LAST_SHRED_IN_SLOT either.
2022-05-02 23:33:53 +00:00
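The invariant can be encoded directly in the bit values: if LAST_SHRED_IN_SLOT's bits are a superset of DATA_COMPLETE_SHRED's, setting the former necessarily sets the latter. The hand-rolled newtype below stands in for what the bitflags crate would generate, and the bit values are assumptions for illustration.

```rust
// Hand-rolled stand-in for a bitflags-generated type; bit values are assumed.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ShredFlags(u8);

impl ShredFlags {
    const SHRED_TICK_REFERENCE_MASK: ShredFlags = ShredFlags(0b0011_1111);
    const DATA_COMPLETE_SHRED: ShredFlags = ShredFlags(0b0100_0000);
    // Superset of DATA_COMPLETE_SHRED's bits, so it implies DATA_COMPLETE_SHRED.
    const LAST_SHRED_IN_SLOT: ShredFlags = ShredFlags(0b1100_0000);

    fn empty() -> Self {
        ShredFlags(0)
    }

    // True if every bit of `other` is set in `self`.
    fn contains(self, other: ShredFlags) -> bool {
        (self.0 & other.0) == other.0
    }

    fn insert(&mut self, other: ShredFlags) {
        self.0 |= other.0;
    }
}
```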
apfitzge 112a0b475a
Revert "Refactor to use EpochSchedule from within RentCollector struct" (#24893)
* Revert "Ran cargo fmt"

This reverts commit 9052e41b32.

* Revert "Fix build error introduced by my editor setup, part 2"

This reverts commit 4dfeab3b38.

* Revert "Fix build error introduced by my editor setup"

This reverts commit 87fb78dc56.

* Revert "Remove redundant epoch_schedule from AccountsPackage"

This reverts commit c2f7f2fff8.

* Revert "Fix a test"

This reverts commit 36c0bdaa78.

* Revert "Fixes to initial code"

This reverts commit ed7813e698.

* Revert "Removing redundant EpochSchedule param from fns"

This reverts commit 5472d2e605.
2022-05-02 13:46:17 -05:00
behzad nouri e812430e28
defines shred flags using bitflags crate (#24874)
Shred flags use raw bit-masking ops, which lack type safety:
https://github.com/solana-labs/solana/blob/a829ddc92/ledger/src/shred.rs#L112-L114

This commit instead uses bitflags crate to define shred flags.
2022-05-01 19:25:15 +00:00
Pankaj Garg 88c16c0176
Check if quic is enabled before warming up quic connections (#24821)
* Check if quic is enabled before warming up quic connections

* fix after rebase

* don't start warmup service if quic not enabled

* fix test
2022-05-01 03:52:38 +00:00
Justin Starry a61652104b
Avoid holding lock guards in match expressions (#24805)
* Avoid holding bank forks read lock for RPC requests

* Avoid using lock guards in temporaries

* revert fetch stage change
2022-04-29 16:32:46 +08:00
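The Rust rule behind this change is that temporaries created in a `match` scrutinee, such as a lock guard, live until the end of the entire `match`. The lock therefore stays held while every arm runs. The sketch below uses a plain std::sync::RwLock<u64> rather than BankForks, and the function names are hypothetical.

```rust
use std::sync::RwLock;

/// The read guard is a temporary in the scrutinee, so it lives until the end
/// of the whole `match`: the write lock cannot be taken inside an arm.
fn can_write_in_arm_guard_in_scrutinee(lock: &RwLock<u64>) -> bool {
    match *lock.read().unwrap() {
        0 => false,
        _ => lock.try_write().is_ok(), // read guard is still alive here
    }
}

/// Copying the value out first drops the guard at the end of the `let`
/// statement, before the `match` runs.
fn can_write_in_arm_value_bound_first(lock: &RwLock<u64>) -> bool {
    let value = *lock.read().unwrap();
    match value {
        0 => false,
        _ => lock.try_write().is_ok(),
    }
}
```

With a lock holding a non-zero value, the first function returns false, because `try_write` fails while the read guard is alive. The second returns true.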
Justin Starry 4e58b3870c
Update all BankForks methods to return owned values (#24801) 2022-04-28 18:51:00 +00:00
sakridge 5a430c15e2
Separate sigverify metrics for each verifier (#24744) 2022-04-28 01:16:17 -07:00
behzad nouri 0f60665100
replaces Shred::new_empty_coding with Shred::new_from_parity_shard (#24749)
This removes implementation details of shreds and payload offsets from
the shredder, so that the shredder does not need to mutate the payload:
https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L968-L977

Also, Shred::new_from_data can simply take a slice instead of
Option<&[u8]>:
https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L268-L278
2022-04-27 18:04:10 +00:00
behzad nouri 081c844d6e
removes Shred::new_empty_data_shred (#24714)
Shred::new_empty_data_shred returns an invalid shred (i.e.
shred.sanitize() returns error). The method is only used in tests and
can be easily replaced with Shred::new_from_data. To keep the shred API
surface small, this commit removes this method.
2022-04-26 23:13:12 +00:00
behzad nouri 12ae8d3be5
returns Error when Shred::sanitize fails (#24653)
Including the error in the output makes it possible to debug why
Shred::sanitize fails.
2022-04-25 23:19:37 +00:00
behzad nouri 895f76a93c
hides implementation details of shred from its public interface (#24563)
Working towards embedding versioning into the shred binary format, so that
a new variant of the shred struct can include merkle tree hashes of the
erasure set.
2022-04-25 12:43:22 +00:00
carllin 8a062273de
Move error counters to be reported by leader only at end of slot (#24581)
* Add error counters to leader metrics only

* Add dependencies
2022-04-23 18:10:47 -05:00
Michael Vines d0a8a16a57 ReplayStage no longer relies on Validator to reset the poh recorder at start 2022-04-22 21:17:49 -07:00
Michael Vines 84e3342612 Process blockstore after starting the TVU 2022-04-22 21:17:49 -07:00
Michael Vines 83e041299a Run real snapshot packager while processing blockstore at validator startup 2022-04-22 21:17:49 -07:00
HaoranYi 2d4defa477
fix typo (#24576) 2022-04-22 08:43:57 -05:00
Jon Cinque 0d51596224
sim: Override slot hashes account on simulation bank (#24543)
* sim: Override slot hashes during simulation

* Add simulation test program

* Address feedback

* Add AccountOverrides explicit type

* Cargo fmt
2022-04-22 12:32:31 +02:00
Justin Starry c544742091
Local cluster test cleanup and refactoring (#24559)
* remove FixedSchedule.start_epoch

* use duration for timing

* Rename partition bool to turbine_disabled

* simplify partition config
2022-04-22 12:14:07 +08:00
Justin Starry 02bfb85c16
Refactor transaction processing in banking stage (#24336)
* Refactor transaction processing in banking stage

* feedback

* more feedback
2022-04-21 21:06:26 +08:00
Justin Starry d5127abf46
Only add hashes for completed blocks to recent blockhashes (#24389)
* Only add hashes for completed blocks to recent blockhashes

* feedback
2022-04-21 21:05:29 +08:00
Tao Zhu a21fc3f303
Apply transaction actual execution units to cost_tracker (#24311)
* Pass the sum of consumed compute units to cost_tracker

* cost model tracks builtins and bpf programs separately, enabling block cost to be adjusted by actual bpf program execution costs

* Copied nightly-only experimental `checked_add_(un)signed` implementation to sdk

* Add function to update cost tracker with execution cost adjustment

* Review suggestion - using enum instead of struct for CommitTransactionDetails
Co-authored-by: Justin Starry <justin.m.starry@gmail.com>

* review - rename variable to distinguish accumulated_consumed_units from individual compute_units_consumed

* not to use signed integer operations

* Review - using saturating_add_assign!(), and checked_*().unwrap_or()

* Review - using Ordering enum to cmp

* replace checked_ with saturating_

* review - remove unnecessary Option<>

* Review - add function to report number of non-zero units account to metrics
2022-04-21 07:38:07 +00:00
Justin Starry 79923c3b58
Refactor: Rename BlockhashQueue fields and methods for clarity (#24426) 2022-04-21 11:57:17 +08:00
HaoranYi d0761d0ca4
demote receive_window_num_slot_shreds to debug logging (#24505) 2022-04-20 08:51:46 -05:00
Michael Vines 05f32f287c solana-validator monitor now reports slot-level progress while loading blockstore 2022-04-19 22:09:48 -07:00
Michael Vines 9e4999ef6a Remove halt_at_slot from RuntimeConfig, it's not a runtime concern 2022-04-19 19:23:58 -07:00
Michael Vines 988210908c Move verify_udp_stats_access out of the way 2022-04-19 19:23:58 -07:00
Michael Vines c6f3da4879 blockstore_processor now accepts an Arc<Rwlock<BankForks>> 2022-04-19 19:23:58 -07:00
Michael Vines 0e2e0c8b7d Extract most storage-related services from the Tvu abstraction 2022-04-19 19:23:58 -07:00
Michael Vines 268a2109de Relocate hard forks info log 2022-04-19 19:23:58 -07:00
Michael Vines dd766042df Remove LedgerMetricReportService from TVU 2022-04-19 19:23:58 -07:00
behzad nouri 705ea53353
moves sign_shred and new_coding_shred_header out of Shredder (#24487) 2022-04-19 20:00:05 +00:00
Tao Zhu 94b0186a96
Cost model tracks builtins and bpf programs separately (#24468)
* Cost model tracks builtins and bpf programs separately (enables adjusting block cost by actual bpf program execution costs)

* Address reviews: expand test; add metrics stat
2022-04-19 13:25:47 -05:00
behzad nouri 3bbfaae7b6
moves shred stats to a separate file (#24484) 2022-04-19 18:25:09 +00:00
Jeff Washington (jwash) d9d0dad258
report swap mem as bytes like other metrics (#24455) 2022-04-19 10:03:25 -05:00
behzad nouri 039488b562
drops redundant turbine propagation path (#24351)
Most nodes in the cluster receive the same shred from two different
nodes: their parent, and the first node of their neighborhood:
https://github.com/solana-labs/solana/blob/a8c695ba5/core/src/cluster_nodes.rs#L178-L197

Because of erasure coding, half of the shreds are already redundant,
so this redundant propagation path only adds extra overhead.

Additionally, the very first node of the broadcast tree has 2x fanout
(i.e. 400 nodes), which puts too much load on a single node.

This commit simplifies the broadcast tree by dropping the redundant
propagation path and removing the 2x fanout at the root node.
2022-04-19 00:11:29 +00:00
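The following is a minimal sketch of a single-path broadcast tree. It assumes the nodes are already shuffled into a list with the root at index 0, and that each node forwards to `fanout` children. The commit message implies a fanout of 200, since the old root's 2x fanout was 400 nodes. This is a simplified model for illustration, not the layout computed in core/src/cluster_nodes.rs.

```rust
use std::ops::Range;

/// Children of the node at `index` in a `fanout`-ary tree laid out over
/// `num_nodes` shuffled nodes, with the root at index 0.
fn children(index: usize, fanout: usize, num_nodes: usize) -> Range<usize> {
    let start = index
        .saturating_mul(fanout)
        .saturating_add(1)
        .min(num_nodes);
    let end = start.saturating_add(fanout).min(num_nodes);
    start..end
}

/// The single parent of a non-root node.
fn parent(index: usize, fanout: usize) -> Option<usize> {
    if index == 0 {
        None
    } else {
        Some((index - 1) / fanout)
    }
}
```

Under this model, the root forwards to exactly `fanout` nodes, with no 2x special case. Every other node has exactly one parent, so it receives each shred along a single path.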
behzad nouri 1d50832389
replaces counters with datapoints in gossip metrics (#24451) 2022-04-18 23:14:59 +00:00
Jason Davis c2f7f2fff8 Remove redundant epoch_schedule from AccountsPackage 2022-04-18 11:57:40 -05:00
Jason Davis 5472d2e605 Removing redundant EpochSchedule param from fns 2022-04-18 11:57:40 -05:00
Christian Kamm d2c6c04d3e banking-bench: Add and rearrange options
- Add write-lock-contention option, replacing same_payer
- write-lock-contention also has a same-batch-only value, where
  contention happens only inside batches, not between them
- Rename num-threads to batches-per-iteration, which is closer to what
  it is actually doing.
- Add num-banking-threads as a new option
- Rename packets-per-chunk to packets-per-batch, because this is closer
  to what's happening; and it was previously confusing that num-chunks
  had little to do with packets-per-chunk.

Example output for iterations=100 and a permutation of inputs:

contention,threads,batchsize,batchcount,tps
none,           3,192, 4,65290.30
none,           4,192, 4,77358.06
none,           5,192, 4,86436.65
none,           3, 12,64,43944.57
none,           4, 12,64,65852.15
none,           5, 12,64,70674.37
same-batch-only,3,192, 4,3928.21
same-batch-only,4,192, 4,6460.15
same-batch-only,5,192, 4,7242.85
same-batch-only,3, 12,64,11377.58
same-batch-only,4, 12,64,19582.79
same-batch-only,5, 12,64,24648.45
full,           3,192, 4,3914.26
full,           4,192, 4,2102.99
full,           5,192, 4,3041.87
full,           3, 12,64,11316.17
full,           4, 12,64,2224.99
full,           5, 12,64,5240.32
2022-04-18 09:43:46 -05:00
steviez 38f0d60b00
Move repeated logic into common function (#24373) 2022-04-18 00:16:06 -05:00
Tao Zhu 578d59c802 Remove the code that handles cost update for separate pr 2022-04-17 19:26:24 -05:00
Tao Zhu e97ffb55cb nit - renaming variables to concise names 2022-04-17 19:26:24 -05:00
Tao Zhu 6bc6384f8e refactor to consolidate info into single return field 2022-04-17 19:26:24 -05:00
Tao Zhu 9dadfb2e2c Add checked_add_signed() to apply cost adjustment to cost_tracker 2022-04-17 19:26:24 -05:00
Tao Zhu 810b1dff40 undo cost of executed-but-not-recorded transactions from cost_tracker 2022-04-17 19:26:24 -05:00
Tao Zhu 23d365d02f Address review comment: extract transaction was_executed status to avoid cloning execution_results 2022-04-17 19:26:24 -05:00
Tao Zhu 094da35b91 Address review comments:
1. use was_executed to correctly identify transactions that require cost adjustment;
2. add function to specifically handle execution cost adjustment without having to copy accounts
2022-04-17 19:26:24 -05:00
Tao Zhu 29ca21ed78 undo transaction cost from cost_tracker if it was not executed successfully 2022-04-17 19:26:24 -05:00
sakridge d71986cecf
Separate staked and un-staked on quic tpu port (#24339) 2022-04-16 10:54:22 +02:00
sakridge 1b7d1f78de
Implement QUIC connection warmup service for future leaders (#24054)
* Increase connection timeouts

* Bump quic connection cache to 1024

* Use constant for quic connection timeout and add warm cache service

* Fixes to QUIC warmup service

* fix check failure

* fixes after rebase

* fix timeout test

Co-authored-by: Pankaj Garg <pankaj@solana.com>
2022-04-15 12:09:24 -07:00
Christian Kamm 97f2eb8e65 Banking stage: Deserialize packets only once
Benchmarks show roughly a 6% improvement. The impact could be more
significant when transactions need to be retried a lot.

after patch:
{'name': 'banking_bench_total', 'median': '72767.43'}
{'name': 'banking_bench_tx_total', 'median': '80240.38'}
{'name': 'banking_bench_success_tx_total', 'median': '72767.43'}
test bench_banking_stage_multi_accounts ... bench:   6,137,264 ns/iter (+/- 1,364,111)
test bench_banking_stage_multi_programs ... bench:  10,086,435 ns/iter (+/- 2,921,440)

before patch:
{'name': 'banking_bench_total', 'median': '68572.26'}
{'name': 'banking_bench_tx_total', 'median': '75704.75'}
{'name': 'banking_bench_success_tx_total', 'median': '68572.26'}
test bench_banking_stage_multi_accounts ... bench:   6,521,007 ns/iter (+/- 1,926,741)
test bench_banking_stage_multi_programs ... bench:  10,526,433 ns/iter (+/- 2,736,530)
2022-04-15 00:57:11 -06:00
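The "roughly 6%" figure can be checked against the medians above. Here is a small helper, hypothetical and for illustration only, that computes the relative change:

```rust
/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

For banking_bench_total, `percent_change(68572.26, 72767.43)` is about +6.1%. For banking_bench_tx_total, `percent_change(75704.75, 80240.38)` is about +6.0%. The ns/iter improvements are about -5.9% for multi_accounts and -4.2% for multi_programs, where lower is better. Both are smaller than the reported +/- spread of either run.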
sakridge 7a4a6597c0
Don't enforce ulimit for validator test config (#24272) 2022-04-12 22:06:37 +02:00
Jon Cinque 9b8850f99e
test-validator: Add `--max-compute-units` flag (#24130)
* test-validator: Add `--max-compute-units` flag

* Add `RuntimeConfig` for tweaking runtime behavior

* Actually add the file

* Move RuntimeConfig to runtime
2022-04-12 02:28:10 +02:00
Michael Vines c1687b0604 Switch to await-aware tokio::sync::Mutex 2022-04-11 18:15:03 -04:00
Giorgio Gambino 60b2155bd3
Add accounts-filler-size command line option (#23896) 2022-04-11 13:10:09 -05:00
carllin ff3b6d2b8b
Remove duplicate increment (#24219) 2022-04-09 15:21:39 -05:00
Christian Kamm a058f348a2 Address review comments 2022-04-08 14:37:55 -05:00
Christian Kamm 2ed29771f2 Unittest for cost tracker after process_and_record_transactions 2022-04-08 14:37:55 -05:00
Christian Kamm 924b8ea1eb Adjustments to cost_tracker updates
- don't store pending tx signatures and costs in CostTracker
- apply tx costs to global state immediately again
- go from commit_or_cancel to update_or_remove, where the cost tracker
  is either updated with the true costs of successful txs, or the costs
  of retryable txs are removed
- move the function into qos_service and hold the cost tracker lock for
  the whole loop
2022-04-08 14:37:55 -05:00
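The following is a minimal sketch of the update_or_remove flow described above. It assumes a simplified tracker that keeps only a running block cost. BlockCostTracker, TxOutcome, and the free update_or_remove function are hypothetical names, not the CostTracker or QosService API. The sketch applies estimated costs immediately, then either corrects them with actual costs or removes them, holding the lock for the whole loop.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified tracker: only a running block cost.
#[derive(Default)]
struct BlockCostTracker {
    block_cost: u64,
}

/// Hypothetical outcome of one transaction after it was processed.
enum TxOutcome {
    /// Committed: replace the estimated cost with the actual cost.
    Committed { estimated: u64, actual: u64 },
    /// Retryable: the tx will be retried, so remove its estimated cost.
    Retryable { estimated: u64 },
}

impl BlockCostTracker {
    /// Apply an estimated cost immediately, if it fits under `limit`.
    fn try_add(&mut self, estimated: u64, limit: u64) -> bool {
        match self.block_cost.checked_add(estimated) {
            Some(total) if total <= limit => {
                self.block_cost = total;
                true
            }
            _ => false,
        }
    }
}

/// Either update the tracker with the true cost or remove the estimate,
/// holding the lock once for the whole loop rather than once per tx.
fn update_or_remove(tracker: &Mutex<BlockCostTracker>, outcomes: &[TxOutcome]) {
    let mut tracker = tracker.lock().unwrap();
    for outcome in outcomes {
        match *outcome {
            TxOutcome::Committed { estimated, actual } => {
                tracker.block_cost = tracker
                    .block_cost
                    .saturating_sub(estimated)
                    .saturating_add(actual);
            }
            TxOutcome::Retryable { estimated } => {
                tracker.block_cost = tracker.block_cost.saturating_sub(estimated);
            }
        }
    }
}
```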
Tao Zhu 9e07272af8 - Only commit successfully executed transactions' cost to cost_tracker;
- In-flight transactions are kept pending in cost_tracker until they are
  committed or cancelled;
2022-04-08 14:37:55 -05:00
Jeff Washington (jwash) 210f6a6fab
move hash calculation out of acct bg svc (#23689)
* move hash calculation out of acct bg svc

* pr feedback
2022-04-08 10:42:03 -05:00
steviez 1dd63631c0
Add high level overview comments on ledger_cleanup_service (#24184) 2022-04-08 00:49:21 -05:00
HaoranYi e105547c14
tvu and tpu timeout on joining their microservices (#24111)
* panic when test timeout

* nonblocking send when when droping banks

* debug log

* timeout for tvu

* unused variable

* timeout for tpu

* Revert "debug log"

This reverts commit da780a3301a51d7c496141a85fcd35014fe6dff5.

* add timeout const

* fix typo

* Revert "nonblocking send when when droping banks".
I will create another pull request for this.

This reverts commit 088c98ec0facf825b5eca058fb860deba6d28888.

* Update core/src/tpu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tpu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tvu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/tvu.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-07 20:20:13 -05:00
Jeff Washington (jwash) c27150b1a3
reserialize_bank_fields_with_hash (#23916)
* reserialize_bank_with_new_accounts_hash

* Update runtime/src/serde_snapshot.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* Update runtime/src/serde_snapshot/tests.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* Update runtime/src/serde_snapshot/tests.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* pr feedback

Co-authored-by: Brooks Prumo <brooks@prumo.org>
2022-04-07 14:05:57 -05:00
Jeff Washington (jwash) 550ca7bf92
compare contents of serialized banks instead of exact file format (#24141)
* compare contents of serialized banks instead of exact file format

* Update runtime/src/snapshot_utils.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* Update runtime/src/snapshot_utils.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

* pr feedback

* get rid of clone

* pr feedback

Co-authored-by: Brooks Prumo <brooks@prumo.org>
2022-04-06 21:55:44 -05:00
Jeff Washington (jwash) fddd162645
reserialize bank in ahv by first writing to temp file in abs (#23947) 2022-04-06 21:39:26 -05:00
Tyera Eulberg fb67ff14de
Remove replica-node crates (#24152) 2022-04-06 16:52:19 -06:00
Brooks Prumo c322842257
Replace channel with Mutex<Option> for AccountsPackage (#24013) 2022-04-06 05:47:19 -05:00
HaoranYi 302142bb25
fix typo (#24123) 2022-04-05 15:55:47 -05:00
behzad nouri db23295e1c
removes legacy weighted_shuffle and weighted_best methods (#24125)
The older weighted_shuffle is based on a heuristic that produces biased
samples, as shown in:
https://github.com/solana-labs/solana/pull/18343
and can be replaced with WeightedShuffle.

Also, as described in:
https://github.com/solana-labs/solana/pull/13919
weighted_best can be replaced with rand::distributions::WeightedIndex,
or WeightedShuffle::first.
2022-04-05 19:19:22 +00:00
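The replacements mentioned above sample an index with probability proportional to its weight. The following is a minimal sketch of the usual cumulative-weight technique: prefix sums plus binary search. The random draw is passed in as `r` so that the example stays deterministic. The function names are hypothetical, and this is not the rand implementation itself.

```rust
/// Prefix sums of the weights. Assumes the total fits in u64.
fn cumulative_weights(weights: &[u64]) -> Vec<u64> {
    weights
        .iter()
        .scan(0u64, |acc, &w| {
            *acc += w;
            Some(*acc)
        })
        .collect()
}

/// Map a uniform draw `r` in [0, total) to an index, where index `i` covers
/// the half-open range [cumulative[i - 1], cumulative[i]). Zero-weight
/// entries cover an empty range and are never returned.
fn weighted_index(cumulative: &[u64], r: u64) -> Option<usize> {
    let total = *cumulative.last()?;
    if r >= total {
        return None;
    }
    // First index whose cumulative weight is greater than r.
    Some(cumulative.partition_point(|&c| c <= r))
}
```

With weights `[1, 0, 3, 6]`, walking `r` over `0..10` selects each index exactly as many times as its weight, and never selects the zero-weight index.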
carllin 4ea59d8cb4
Set drop callback on first root bank (#23999) 2022-04-05 13:02:33 -05:00
behzad nouri 2282571493
removes outdated and flaky test_skip_repair from retransmit-stage (#24121)
test_skip_repair in retransmit-stage is no longer relevant because,
following https://github.com/solana-labs/solana/pull/19233,
repair packets are filtered out earlier in window-service, so the
retransmit stage does not know whether a shred is repaired or not.
Also, following the turbine peer shuffle changes in
https://github.com/solana-labs/solana/pull/24080,
the test has become flaky since it does not take into account how peers
are shuffled for each shred.
2022-04-05 16:02:53 +00:00
behzad nouri 2b718d00b0 removes legacy compatibility turbine peers shuffle code 2022-04-05 12:04:12 +00:00
behzad nouri d0b850cdd9 removes turbine peers shuffle patch feature 2022-04-05 12:04:12 +00:00
behzad nouri 855801cc95 removes deterministic-shred-seed feature 2022-04-05 12:04:12 +00:00
Jeff Biseda ee6bb0d5d3
track fec set turbine stats (#23989) 2022-04-04 14:44:21 -07:00
HaoranYi 6ba4e870c4
Blockstore should drop signals before validator exit (#24025)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* wip

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

* clean up import

* add mutex for signal senders in blockstore

* remove mut

* refactor: extract add signal functions

* make blockstore signal private

* let compiler infer mutex type

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-04 11:38:05 -05:00
behzad nouri 7cb3b6cbe2
demotes WeightedShuffle failures to error metrics (#24079)
Since call sites call unwrap anyway, panicking seems too punitive
for our use cases.
2022-04-03 16:20:06 +00:00
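The sketch below shows the general pattern behind this change, using hypothetical names. Instead of calling `.unwrap()` on a fallible result, it records the failure in an error counter, which stands in for an error metric, and falls back to a default value. The overflow check and the zero fallback are assumptions for illustration, not what WeightedShuffle actually does.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter standing in for an error metric.
static WEIGHT_ERRORS: AtomicU64 = AtomicU64::new(0);

#[derive(Debug, PartialEq)]
enum WeightError {
    Overflow,
}

/// Hypothetical fallible step, e.g. summing weights before shuffling.
fn total_weight(weights: &[u64]) -> Result<u64, WeightError> {
    weights
        .iter()
        .try_fold(0u64, |acc, &w| acc.checked_add(w).ok_or(WeightError::Overflow))
}

/// Instead of `total_weight(weights).unwrap()`, count the error and use a
/// fallback so the caller keeps running.
fn total_weight_or_zero(weights: &[u64]) -> u64 {
    match total_weight(weights) {
        Ok(total) => total,
        Err(_) => {
            WEIGHT_ERRORS.fetch_add(1, Ordering::Relaxed);
            0
        }
    }
}
```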
HaoranYi ffa4cafe1c
Revert sequential execution of validator_exit and validator_parallel_exit tests (#24048)
* handle channel disconnect

* revert sequential execution of validator_exit and parallel_validator_exit tests
2022-04-02 10:22:47 -05:00
Yueh-Hsuan Chiang 0b5ed87220
(LedgerStore) Enable performance sampling in column family get() (#23834)
#### Summary of Changes
This PR enables RocksDB read-side performance metrics to be reported to blockstore_rocksdb_read_perf.
The sampling rate is controlled by the environment variable `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
which specifies the number of perf samples for every 1000 operations. The default value is 10, meaning
we report 10 out of every 1000 (or 1 in 100) reads.

The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
They include many useful metrics, such as block read time, cache hit rate, and time spent decompressing blocks.
2022-04-01 13:13:32 -07:00
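The following sketches how the sampling rate described above could be read and applied. Only the environment variable name and the default of 10 come from the commit message. The names parse_samples_in_1k and should_sample, the clamp to 1000, and the counter-based sampling are assumptions. The actual blockstore code may sample differently, for example at random.

```rust
use std::env;

const SAMPLES_ENV_VAR: &str = "SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K";
/// Default from the commit message: 10 samples per 1000 operations.
const DEFAULT_SAMPLES_IN_1K: u64 = 10;

/// Parse the env var value; fall back to the default when unset or invalid.
/// Clamping to 1000 is an assumption (1000 means "sample every read").
fn parse_samples_in_1k(value: Option<&str>) -> u64 {
    value
        .and_then(|v| v.trim().parse::<u64>().ok())
        .map(|n| n.min(1000))
        .unwrap_or(DEFAULT_SAMPLES_IN_1K)
}

fn samples_in_1k_from_env() -> u64 {
    parse_samples_in_1k(env::var(SAMPLES_ENV_VAR).ok().as_deref())
}

/// Counter-based sampling: for samples_in_1k <= 1000, exactly that many of
/// every 1000 consecutive operations are sampled, spread across the window.
fn should_sample(op_count: u64, samples_in_1k: u64) -> bool {
    (op_count % 1000) * samples_in_1k % 1000 < samples_in_1k
}
```

With the default of 10, this samples operations 0, 100, 200, and so on, i.e. 1 in 100 reads.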
Pankaj Garg df4d92f9cf
Revert voting service to use UDP instead of QUIC (#24032) 2022-04-01 09:34:18 -07:00
HaoranYi 51b37f0184
Modify rpc_completed_slot_service to be non-blocking (#24007)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-03-31 16:44:23 -05:00
Jeff Washington (jwash) 9c8dad33c7
add epoch_schedule and rent_collector to hash calc (#24012) 2022-03-31 10:51:18 -05:00
Jeff Washington (jwash) da001d54e5
calculate_accounts_hash_helper uses config (#24003) 2022-03-31 09:29:45 -05:00
Jeff Washington (jwash) 125f9634fd
add hash calc config.use_write_cache (#24005) 2022-03-30 17:19:34 -05:00
HaoranYi ba770832d0
Poh timing service (#23736)
* initial work for poh timing report service

* add poh_timing_report_service to validator

* fix comments

* clippy

* improve test coverage

* delete record when complete

* rename shred full to slot full.

* debug logging

* fix slot full

* remove debug comments

* adding fmt trait

* derive default

* default for poh timing reporter

* better comments

* remove commented code

* fix test

* more test fixes

* delete timestamps for slots that are older than root_slot

* debug log

* record poh start end in bank reset

* report full to start time instead

* fix poh slot offset

* report poh start for normal ticks

* fix typo

* refactor out poh point report fn

* rename

* optimize delete - delete only when last_root changed

* change log level to trace

* convert if to match

* remove redundant check

* fix SlotPohTiming comments

* review feedback on poh timing reporter

* review feedback on poh_recorder

* add test case for out-of-order arrival of timing points and incomplete timing points

* refactor poh_timing_points into its own mod

* remove option for poh_timing_report service

* move poh_timing_point_sender to constructor

* clippy

* better comments

* more clippy

* more clippy

* add slot poh timing point macro

* clippy

* assert in test

* comments and display fmt

* fix check

* assert format

* revise comments

* refactor

* extract send fn

* revert reporting_poh_timing_point

* align logging

* small refactor

* move type declaration to the top of the module

* replace macro with constructor

* clippy: remove redundant closure

* review comments

* simplify poh timing point creation

Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
2022-03-30 09:04:49 -05:00
Jeff Washington (jwash) c24de17278
remove index hash calculation as an option (#23928) 2022-03-25 15:32:53 -05:00
HaoranYi 01af40d6b6
Fix intermittent validator_exit test failure (#23594)
* run validator_exit_test sequentially

* limit validator exit run to its own serial run subset
add 10ms delay in the validator exit tests

* fix intermittent validator exit failure

* no sleep

* undo the code move
2022-03-25 14:38:19 -05:00
ryleung-solana 6b85c2104c
Implement forwarding via TpuConnection (#23817) 2022-03-25 11:31:40 -04:00
Steven Luscher f44c8f296f
fix: thread `enforce_ulimit_nofile` config down when opening blockstore (#23925) 2022-03-25 03:13:33 -05:00
Jeff Washington (jwash) 51f5524e2f
make verify_accounts_package_hash like other hash calc (#23906) 2022-03-24 17:49:48 -05:00
Jeff Washington (jwash) 55d61023f7
document 'accounts' hash (#23907) 2022-03-24 15:58:52 -05:00
HaoranYi fedf4e984f
typo (#23910) 2022-03-24 15:21:59 -05:00
Jeff Washington (jwash) 37c36ce3fa
pass stats separately from CalcAccountsHashConfig (#23892) 2022-03-24 12:48:47 -05:00
steviez c31db81ac4
Use VoteAccountsHashMap type alias in all applicable spots (#23904) 2022-03-24 12:09:48 -05:00
ryleung-solana 82945ba973
Optimize TpuConnection and its implementations and refactor connection-cache to not use dyn in order to enable those changes (#23877) 2022-03-24 11:40:26 -04:00
Jeff Washington (jwash) 5b916961b5
HashCalc uses self.accounts_cache (#23890) 2022-03-24 10:34:28 -05:00
Jeff Washington (jwash) b22165ad69
hash calc uses self.filler_account_suffix (#23887) 2022-03-24 09:58:06 -05:00
Jeff Washington (jwash) 9022931689
calc hash uses self.num_hash_scan_passes (#23883) 2022-03-24 09:44:42 -05:00
Jeff Washington (jwash) db5d68f01f
HashCalc uses self.accounts_hash_cache_path (#23882) 2022-03-24 09:31:55 -05:00
Jeff Washington (jwash) 3e22d4b286
calc hash uses self.thread_pool_clean (#23881) 2022-03-23 20:52:38 -05:00
Jeff Washington (jwash) 9e61fe7583
add AccountsHashConfig to manage parameters (#23850) 2022-03-23 13:44:23 -05:00
HaoranYi db49b826f0
separate blockstore metrics from window service metrics (#23871) 2022-03-23 13:38:17 -05:00
HaoranYi 7ff8ed869c
typos (#23870) 2022-03-23 13:36:55 -05:00
Jeff Washington (jwash) b1280b670a
calculate_accounts_hash_without_index takes &self (#23846)
* calculate_accounts_hash_without_index takes &self

* Update runtime/src/snapshot_package.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

Co-authored-by: Brooks Prumo <brooks@prumo.org>
2022-03-23 11:57:32 -05:00
Justin Starry 92462ae031
Manually serialize and use `send_wire_transaction` for votes (#23826)
* Revert "core: partial versioned transaction support for voting service"

This reverts commit eb3df4c20e.

* Manually serialize vote tx before sending to TPU
2022-03-23 09:47:55 +08:00
Jon Cinque 7af48465fa
transaction-status: Add return data to meta (#23688)
* transaction-status: Add return data to meta

* Add return data to simulation results

* Use pretty-hex for printing return data

* Update arg name, make TransactionRecord struct

* Rename TransactionRecord -> ExecutionRecord
2022-03-22 23:17:05 +01:00
Trent Nelson eb3df4c20e core: partial versioned transaction support for voting service 2022-03-21 22:59:05 -06:00
HaoranYi 45a7c6edfb
Fix typos and a small refactor (#23805)
* fix typo

* remove packet_has_more_unprocessed_transactions function
2022-03-21 18:35:31 -05:00
Pankaj Garg 5d03b188c8
Use QUIC client in voting service (#23713)
* Use QUIC client in voting service

* guard quic-client usage with a flag

* add measure to time the quic client

* move time measure outside if block

* remove quic vs UDP flag from voting service
2022-03-21 09:10:16 -07:00
Tao Zhu 71ea05c176 replace nested for_each with flat_map 2022-03-18 16:37:41 -05:00
Tao Zhu 1c369fb55f Scan entire UnprocessedPacketBatches buffer to produce stake and locator of each packet 2022-03-18 16:37:41 -05:00
Yueh-Hsuan Chiang f999eef452
(LedgerStore) Rename BlockstoreAdvancedOptions to LedgerColumnOptions (#23764)
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass this struct down to LedgerColumn to allow it to perform metric reporting.
2022-03-18 11:13:35 -07:00
Tao Zhu 56428be629 Do not expose the inner cost_table, to encapsulate implementation details
and make future changes easier.
2022-03-18 12:58:43 -05:00
Tao Zhu 0ed23899e7 directly use compute_budget MAX_UNITS and DEFAULT_UNITS 2022-03-18 08:53:11 -05:00
Tao Zhu a4cacf3389 add deterministic default cost 2022-03-18 08:53:11 -05:00
Tao Zhu c478fe2047 add timing metrics, some renaming 2022-03-17 19:31:28 -05:00
Tao Zhu fd515097d8 leader qos part 2: add stage to find sender stake, set to packet meta 2022-03-17 19:31:28 -05:00
Stephen Akridge 976b138e76 Add tx weighting stage 2022-03-17 19:31:28 -05:00
Michael Vines 3773b753d1 Configure shrink paths during blockstore load 2022-03-15 23:08:07 -07:00
Michael Vines ab373bb1a9 Refactor new_banks_from_ledger() into load and process steps 2022-03-15 23:08:07 -07:00
Michael Vines 2da4e3eb6c Add --no-os-memory-stats-reporting 2022-03-15 17:07:40 -07:00
Michael Vines dbc62f2e28 Use consistent variable naming for DropBankService 2022-03-15 17:07:13 -07:00
Michael Vines d44f3d7216 Remove unhelpful log message 2022-03-15 17:07:13 -07:00
Tao Zhu 2d3501dff9 make upsert an infallible op 2022-03-15 17:05:41 -05:00
Tao Zhu 61cead9b9b Remove injection of exit signal into cost_update_service 2022-03-15 09:58:56 -05:00