Commit Graph

3081 Commits

Author SHA1 Message Date
Maximilian Schneider c8b0c3ede9
Update cost model to use requested_cu instead of estimated cu #27608 (#28281)
* Update cost model to use requested_cu instead of estimated cu #27608

* remove CostUpdate and CostModel from replay/tvu

* revive cost update service to send cost tracker stats

* CostModel is now static

* remove unused package

Co-authored-by: Tao Zhu <tao@solana.com>
2022-11-22 11:55:56 -06:00
apfitzge 637e8a937b
clean up: remove my_pubkey arg from consume_buffered_packets (#28888) 2022-11-22 11:40:04 -06:00
Jeff Washington (jwash) 20d8b5e98b
default some tests to write cache = true (#28917) 2022-11-21 15:53:39 -08:00
apfitzge dd723210ca
remove unnecessary clippy attributes (#28891) 2022-11-21 12:54:54 -06:00
behzad nouri d43b001189
rolls out merkle shreds to ~20% of testnet (#28905) 2022-11-21 16:20:02 +00:00
Michael Vines c6927151ef
Sort offline/wrong-shred nodes by stake weight while waiting for supermajority (#28872) 2022-11-18 15:26:21 -08:00
apfitzge a636038fff
Clean up: banking_stage_prepare_sanitized_batch (#28841)
Use measure! for bank.prepare_sanitized_batch_with_results
2022-11-18 14:04:44 -06:00
Tyera c32377b5af
Split out quic- and udp-client definitions (#28762)
* Move ConnectionCache back to solana-client, and duplicate ThinClient, TpuClient there

* Dedupe thin_client modules

* Dedupe tpu_client modules

* Move TpuClient to TpuConnectionCache

* Move ThinClient to TpuConnectionCache

* Move TpuConnection and quic/udp trait implementations back to solana-client

* Remove enum_dispatch from solana-tpu-client

* Move udp-client to its own crate

* Move quic-client to its own crate
2022-11-18 12:21:45 -07:00
apfitzge 88e6ea37d9
refactor: move more BankingStage cost_model stuff into qos_service (#28840) 2022-11-17 14:03:17 -06:00
Andrew Fitzgerald ee2f760d3d
MultiIteratorScanner - improve banking stage performance with high contention 2022-11-17 10:54:12 -06:00
Jeff Biseda 17ee3349f8
limit repairs to top staked requests in batch (#28673) 2022-11-16 16:30:41 -08:00
Ashwin Sekar ddf4ff2d26
Repair service documentation (#28592)
* repair doc update

* tree_root rename

* remove extra todo
2022-11-16 02:38:07 +00:00
Jeff Biseda e10d958352
signed repair request test fixes/cleanup (#28691) 2022-11-15 16:46:17 -08:00
Tao Zhu e5ae0b3371
check is_forwarded packet earlier (#28159)
* check and filter is_forwarded packet earlier

* review fix: renaming; and rebase
2022-11-11 23:32:03 +00:00
Brooks Prumo 4d6653598b
Upgrades to Rust 1.65.0 (#28741) 2022-11-09 17:15:03 -05:00
Brooks Prumo d1ba42180d
clippy for rust 1.65.0 (#28765) 2022-11-09 19:39:38 +00:00
Brooks Prumo 9e1cdc7e60
Enables not taking a bank snapshot (#28756) 2022-11-09 12:43:33 -05:00
Brooks Prumo 0b9426e734
Simplifies AHV's `test_max_hashes()` (#28754) 2022-11-07 02:32:33 +00:00
Brooks Prumo 064cfc70d2
Removes cluster_type from AccountsPackage (#28725) 2022-11-02 18:21:13 -04:00
Brooks Prumo d0f639745a
Uses AccountsPackage::default_for_tests() in AHV tests (#28723) 2022-11-02 14:13:35 -04:00
Lijun Wang f156bc12ca
Enforce stream receive timeout (#28513)
In the quic server handle_connection, when we timed out in receiving the chunks, we loop forever to wait for the chunk. If the client never provide another chunk, the server can hopelessly wait for that chunk and wasting server resources. Instead WAIT_FOR_CHUNK_TIMEOUT_MS is introduced to bound this to 10 seconds at maximum. The stream will be dropped if it times out.
2022-11-02 10:09:32 -07:00
Brooks Prumo 59bf1809fe
Uses SnapshotHash type in snapshot archive fields (#28681) 2022-10-31 14:28:35 -04:00
Dmitri Makarov 34865d032c chore: update Solana docs and code comments that specify "BPF" to "SBF" 2022-10-31 14:14:25 -04:00
Brooks Prumo 37507a2de6
Removes EAH parameter from serde_snapshot::reserialize_bank() (#28669) 2022-10-31 09:43:17 -04:00
sakridge 340ad68223
Banking stage refactor commit transactions (#28660)
* Refactor commit transactions step

* Cleanup token pre-balances

* Collect prebalances together

* Collect pre/post balances in separate function

* Fix clippy
2022-10-29 21:36:57 +02:00
steviez 6b93d05c37
Add LedgerCleanupService::find_slots_to_clean() test (#28656)
Add a test to better exercise find_slots_to_clean(), as well as a minor
bug fix to this method that was found as a result of writing test.
2022-10-29 00:55:21 +02:00
apfitzge 22ce49ae7f
Maintain original queue capacity for unprocessed packet buffer (#28661) 2022-10-28 16:37:21 -05:00
apfitzge 0a148b2bf7
remove unused handle_retryable_packets_elapsed (#28355) 2022-10-28 16:36:41 -05:00
steviez 2272fd807e
Remove Blockstore manual compaction code (#28409)
The manual Blockstore compaction that was being initiated from
LedgerCleanupService has been disabled for quite some time in favor of
several optimizations.

Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
2022-10-28 10:39:00 +02:00
Ashwin Sekar ae557a9eb5
Exit when stuck in an unrecoverable repair/purge loop (#28596)
* Exit when stuck in an unrecoverable repair/purge loop

* add tests
2022-10-27 20:06:06 -07:00
apfitzge 340d3b5468
rename and change capacity on unprocessed transaction storage - max_receive_size (#28586) 2022-10-26 10:03:47 -05:00
Brooks Prumo f158bab0ef
Tracks how long background requests wait before processing (#28581) 2022-10-25 12:10:53 -04:00
Brooks Prumo bc02789c43
Renames fn to calculate_accounts_hash_from_storages() (#28566) 2022-10-24 21:07:00 -04:00
Brooks Prumo 2354a0a343
Renames fn to calculate_accounts_hash_from_index() (#28568) 2022-10-24 19:20:08 -04:00
Ashwin Sekar 9eafad467c
Add convenience methods to VoteInstruction to distinguish vote types (#28526)
* Add convenience methods to VoteInstruction to distinguish vote types

* use matches! macro instead
2022-10-21 14:17:40 -06:00
Ashwin Sekar f207af765e
Split out voting and banking threads in banking stage (#27931)
* Split out voting and banking threads in banking stage

Additionally this allows us to aggressively prune the buffer for voting threads
as with the new vote state only the latest vote from each validator is
necessary.

* Update local cluster test to use new Vote ix

* Encapsulate transaction storage filtering better

* Address pr comments

* Commit cargo lock change

* clippy

* Remove unsafe impls

* pr comments

* compute_sanitized_transaction -> build_sanitized_transaction

* &Arc -> Arc

* Move test

* Refactor metrics enums

* clippy
2022-10-20 21:10:48 +00:00
Jeff Biseda 0df4be06a0
enable repair ping/pong cache (#28408) 2022-10-19 14:55:55 -07:00
carllin 274d9ea607
Check for valid address in broadcast (#28432)
Check for valid address
2022-10-19 14:49:22 -05:00
Brooks Prumo 1cc9cf927c
Supports warping with Epoch Accounts Hash (#28459) 2022-10-19 10:37:14 -04:00
behzad nouri e283461d99
enforces hash domain for ping-pong protocol (#28433)
https://github.com/solana-labs/solana/pull/27193
added hash domain to ping-pong protocol.
For backward compatibility responses both with and without domain were
generated and accepted.
Now that all clusters are upgraded, this commit enforces the hash domain
by removing the response without the domain.
2022-10-18 18:17:12 +00:00
Jeff Washington (jwash) 28a89a1d99
remove expected rent collection and rehashing completely (#28422) 2022-10-17 07:24:42 -07:00
steviez 39fa297bf6
Report total_transactions in replay-slot-stats (#28382)
We have transactions counted in replay-slot-end-to-end-stats, but that
metric is broken down to report things per thread.

So, report total_transactions for the entire slot (all threads) in
replay-slot-stats.
2022-10-15 14:07:03 +01:00
Brooks Prumo dd7fee8f32
Re-enqueues unhandled ABS requests (#28362) 2022-10-13 16:25:39 -04:00
Brooks Prumo 9cbd00fdbc
Converts PendingAccountsPackage to a channel (#28352) 2022-10-13 12:47:36 -04:00
Jason Davis e2fc9d51de Increase cpu metric reporting interval from 1s to 10s 2022-10-11 10:44:59 -05:00
Jeff Biseda 15050b14b9
use signed repair request variants (#28283) 2022-10-10 14:09:45 -07:00
Tao Zhu 50985f79a1
Correctly mark packets as forwarded (#28161)
Only mark packets accepted for forwarding as `forwarded`
2022-10-07 11:50:57 -05:00
Tao Zhu 0324573667
report additional transaction errors to metrics (#28285) 2022-10-07 10:36:22 -05:00
Brooks Prumo a8c6a9e5fc
Bank::freeze() waits for EAH calculation to complete (#28170) 2022-10-05 17:44:35 -04:00
Jason Davis c899ededfc Minor refactoring and cleaning of cpuid code 2022-10-05 11:43:27 -05:00
Jason Davis 3b2ab313de Use num-enum crate to make everything typesafe 2022-10-05 11:43:27 -05:00
Jason Davis 1e1455688d Convert magic numbers to named constants 2022-10-05 11:43:27 -05:00
Jason Davis fac772ff90 Update naming, style after PR review comments 2022-10-05 11:43:27 -05:00
Jason Davis 13b095b4ab Fix a fmt problem, I think. Shows up in the git check, but not when I run here 2022-10-05 11:43:27 -05:00
Jason Davis c8584b0cdd Cargo fmt applied 2022-10-05 11:43:27 -05:00
Jason Davis d841286c21 Add cpuid calls and metric reporting; change cpu info sampling interval from 1s to 10s 2022-10-05 11:43:27 -05:00
Jeff Biseda e3e888c0e0
stats for staked/unstaked repair requests (#28215) 2022-10-04 17:37:24 -07:00
behzad nouri 9e7a0e7420
rolls out merkle shreds to ~5% of testnet (#28199) 2022-10-04 19:36:16 +00:00
carllin 14a415ccf3
Consensus Logging (#28176) 2022-10-03 20:45:55 -05:00
haoran c4aab3f178 typo 2022-10-03 09:41:15 -05:00
Justin Starry c2bb2b8e60
Allow validators to reset to the slot which matches their last voted slot (#28172)
* Add failing test

* Allow resetting to duplicate was confirmed

* feedback

* feedback

* bump

* simplify change

* Revert "simplify change"

This reverts commit 72e5de3e5bdac595f71dc7fc01650ca3bc7da98e.

* update comment

* Update core/src/replay_stage.rs
2022-10-03 16:49:47 +08:00
Jeff Washington (jwash) cfc124c825
acct idx can no longer use write cache (#28150) 2022-09-30 10:55:27 -07:00
apfitzge 82558226f7
ImmutableDeserializedPacket rc to arc (#28145) 2022-09-30 12:07:48 -05:00
Tao Zhu 82e65593ee
Batch filtering invalid transactions before forwarding (#26798)
- Batch filtering invalid transactions (fail to sanitize, too old or already processed) before forwarding
- Combine packet filtering and forwarding to share sanitized transactions
- `iter_desc` is no longer needed, remove it;
- Add a method to share the logic of removing packets from buffer after they were removed from MinMaxHeap
- Add test coverage for forward_packet_batches_by_accounts
- rebase, resolve conflicts
2022-09-29 16:33:40 -05:00
Jeff Biseda 8b0f9b4917
make ping cache rate limit delay configurable (#27955) 2022-09-26 14:16:56 -07:00
behzad nouri f49beb0cbc
caches reed-solomon encoder/decoder instance (#27510)
ReedSolomon::new(...) initializes a matrix and a data-decode-matrix cache:
https://github.com/rust-rse/reed-solomon-erasure/blob/273ebbced/src/core.rs#L460-L466

In order to cache this computation, this commit caches the reed-solomon
encoder/decoder instance for each (data_shards, parity_shards) pair.
2022-09-25 18:09:47 +00:00
Jeff Biseda 9816c94d7e
metrics to distinguish why repair packets are dropped (#27960) 2022-09-24 23:20:05 -07:00
Jeff Biseda 8b43215ddd
count unsigned repair requests (#27953) 2022-09-24 12:56:02 -07:00
Tao Zhu e51cf46d6b
Remove priority from vote transactions (#28030)
vote transactions have same priority fee
2022-09-24 00:31:50 +00:00
behzad nouri 9ee53e594d
patches clippy errors from new rust nightly release (#28028) 2022-09-23 20:57:27 +00:00
Brooks Prumo d9b31fd6b0
ahv: Add debug logging for EAH (#27998) 2022-09-23 14:04:48 -04:00
Jeff Biseda 206cc9407b
allow unsigned repair requests (#27910) 2022-09-23 10:11:08 -07:00
behzad nouri 97c9af4c6b plumbs through flag to generate merkle variant of shreds 2022-09-23 16:45:18 +00:00
steviez e4affb9fea
Add Blockstore::highest_slot() method (#27981) 2022-09-23 04:53:43 -05:00
behzad nouri 9a57c64f21
patches clippy errors from new rust nightly release (#27996) 2022-09-22 22:23:03 +00:00
behzad nouri abfb996135
tracks number of staked/stale/dead nodes in turbine cluster-nodes (#27915) 2022-09-19 18:16:04 +00:00
Ashwin Sekar 9119dc13ec
Add structure to house unprocessed transactions in banking_stage (#27777)
Separate storage for voting and transaction threads:
- Voting threads utilize a shared reference in order to dedup extraneous
  votes
- Transactions have thread local storage like before
2022-09-14 10:40:44 -07:00
Ashwin Sekar c74df830b1
Add structure to collect and coalesce vote packets (#27558)
* Add structure to collect and coalesce vote packets

Will be used in banking stage to throw out extraneous vote packets
before processing

* pr comments

* Update inner lock to arc to improve performance
2022-09-14 00:44:26 -07:00
apfitzge 079bf561b0
Clean_up/upb_push_comment (#27707) 2022-09-12 18:59:41 -05:00
Jeff Washington (jwash) 765c628546
use exit signal for acct idx bg threads (#27483) 2022-09-12 11:51:12 -07:00
behzad nouri 4f22ee8f9b uses varint encoding for vote-state lockout offsets
The commit removes CompactVoteStateUpdate and instead reduces serialized
size of VoteStateUpdate using varint encoding for vote-state lockout
offsets.
2022-09-12 16:31:20 +00:00
Christian Kamm 90b8a3a44d
Remove KeypairInsecureClone trait and add insecure_clone() instead (#27396)
See discussion in #26248
2022-09-12 14:59:41 +00:00
Michael Vines 83d4d128c2 Add --process-ledger-before-service flag to solana-validator 2022-09-11 07:58:42 -07:00
Jeff Washington (jwash) 1f00b468e5
add enable_rehashing to AccountsPackage (#27644) 2022-09-08 09:25:25 -07:00
apfitzge a9c5adbf88
UnprocessedPacketBatches pop_max fn are only used in tests (#27645) 2022-09-08 11:01:14 -05:00
Maximilian Schneider cc58968b76
add new leader slot metric to track account contention throttling (#27654) 2022-09-08 09:22:58 -05:00
Xiang Zhu 4308c300b4
In ledger-tool delete the account files in the async way (#27622)
* In ledger-tool delete the account files in the async way

* format changes by ./cargo nightly fmt --all
2022-09-07 14:35:06 -07:00
Brooks Prumo 6a322de845
Make Accounts Background Services aware of Epoch Accounts Hash (#27626) 2022-09-07 20:41:40 +00:00
Lijun Wang 7f223dc582
Added option to turn on UDP for TPU transaction and make UDP based TPU off by default (#27462)
--tpu-enable-udp is introduced. And when this is on, the transaction receive and transaction forward is enabled using udp.

Except for a few tests which was hard-coded sending transactions using udp, most tests are being done with udp based tpu disabled.
2022-09-07 13:19:14 -07:00
apfitzge c04747dd66
cluster_slot_state_verifier: clippy nightly fixes (#27521)
clippy nightly fixes
2022-09-07 15:04:56 -05:00
apfitzge 1465ec947d
replay_stage: clippy nightly fixes (#27520)
clippy nightly fixes
2022-09-07 15:04:46 -05:00
apfitzge d6a1e7498f
Add tests for deserialize_and_collect_packets (#27623) 2022-09-07 12:52:18 -05:00
Jeff Washington (jwash) 22007a3c96
allow accounts hash calc to specify enable_rehashing (#27615) 2022-09-07 10:16:52 -07:00
Jeff Washington (jwash) a31d4a597d
serialize epoch_accounts_hash (#27516) 2022-09-07 10:07:00 -07:00
Brooks Prumo 93a4f80a2c
Handling snapshot requests is now required (#27537) 2022-09-07 10:08:42 -04:00
Jeff Biseda 269eb519dd
track time to coalesce entries in recv_slot_entries (#27525) 2022-09-06 16:07:17 -07:00
apfitzge a67d56f462
refactor: add function for deserializing and collecting packets - separate from channel receive (#27548) 2022-09-06 15:54:31 -05:00
Brooks Prumo 6684c62280
Add SnapshotUsage to SnapshotConfig (#27508) 2022-09-02 08:56:23 -04:00
Brennan Watt 242c9cb442
RPC Notifier Signal when Setup Complete (#27481)
* RPC notifier signal when ready
2022-09-01 16:39:55 -07:00
Tyera Eulberg 9b8bed86f9
Add getRecentPrioritizationFees RPC endpoint (#27278)
* Plumb priority_fee_cache into rpc

* Add PrioritizationFeeCache api

* Add getRecentPrioritizationFees rpc endpoint

* Use MAX_TX_ACCOUNT_LOCKS to limit input keys

* Remove unused cache apis

* Map fee data by slot, and make rpc account inputs optional

* Add priority_fee_cache to rpc test framework, and add test

* Add endpoint to jsonrpc docs

* Update docs/src/developing/clients/jsonrpc-api.md

* Update docs/src/developing/clients/jsonrpc-api.md
2022-09-01 23:12:12 +00:00
apfitzge 3bdc5b3f2b
separate packet_deserializer inside banking_stage (#27120)
* separate packet_deserializer inside banking_stage

* Make ReceivePacketResults into a struct with named fields
2022-09-01 10:00:48 -05:00
Tao Zhu 8bb039d08d
collect min prioritization fees when replaying sanitized transactions (#26709)
* Collect blocks' minimum prioritization fees when replaying sanitized transactions

* Limits block min-fee metrics reporting to top 10 writable accounts

* Add service thread to asynchronously update and finalize prioritization fee cache

* Add bench test for prioritization_fee_cache

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2022-08-31 08:00:55 -05:00
Haoran Yi 5b64107626 make pruned_bank channel unbonded.
In kin-sim, we found that bounded channel causes halt for account
background services. As the number of accounts grows, the time for
pruning and cleaning increases, which would leads to longer intervals
between the pruning of deaded bank slots. With 1.7B accounts, we will
exceed the 10K bounded channel threshold that causes halt of account
back ground services. Without pruning, the node will eventually run out
of memory.
2022-08-29 19:06:30 -05:00
Brooks Prumo 3c7cd62030
Move pruned_banks_receiver into PrunedBanksRequestHandler (#27445) 2022-08-29 13:30:06 -04:00
Brennan Watt 46a48760db
Switch concurrent replay from feature to param (#27401)
* Switch concurrent replay from feature to param
2022-08-26 12:36:02 -07:00
Will Hickey 5eefc256d6
Fix startup panic if removing accounts directory fails (#27386)
* Remove contents of accounts directory if deleting the directory fails.
2022-08-25 20:35:12 -05:00
Trent Nelson b1cff5d740 make fatal log message sound fatal 2022-08-25 21:49:12 +00:00
Jeff Biseda d1522fc790
coalesce entries in recv_slot_entries to target byte count (#27321) 2022-08-25 13:51:55 -07:00
Jeff Washington (jwash) 2da93bd45a
add text to assert (#27377) 2022-08-24 14:11:53 -05:00
Tyera Eulberg b8b3d723da
Use new client crates (#27360)
* Update ancillary cli crates

* Update cli

* Update command-line tools

* Update rpc, etc

* Update client-test

* Update core, validator

* Update local-cluster
2022-08-24 10:47:02 -06:00
Ashwin Sekar efa6201eda
Check overflow on vote tx compaction boundary (#27185)
* Check overflow on vote tx compaction boundary

Check for overflow during the conversion between VoteStateUpdate and
CompactVoteStateUpdate.

* Try removing clippy supress
2022-08-23 22:29:03 -07:00
Xiang Zhu 827d8e4bc0
Fallback to synchronous rm_dir call if path moving fails (#27306)
Remove some log lines, as suggested in PR #26910
2022-08-22 22:47:39 -07:00
Brennan Watt e4a7d01e10
Rust v1.63 (#27303)
* Upgrade to Rust v1.63.0

* Add nightly_clippy_allows

* Resolve some new clippy nightly lints

* Increase QUIC packets completion timeout

* Update quinn-udp crate

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-08-22 18:01:03 -07:00
Michael Vines 7bdeea10ad Assign custom names to the Rayon global thread pool 2022-08-22 17:56:55 +00:00
Michael Vines 7c01c1ecc6
Update delete_path thread name 2022-08-22 09:01:26 -07:00
Jeff Washington (jwash) fc1a4dd11a
run hash calc with index on failure (#27279) 2022-08-22 10:58:04 -05:00
Michael Vines 3f4731b37f Standardize thread names
Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use Camel case. It's more character dense than Snake or Kebab case
2022-08-20 07:49:39 -07:00
Xiang Zhu c54824e4f5
Account files remove (#26910)
* Create a new function cleanup_accounts_paths, a trivial change

* Remove account files asynchronously

* Update and simplify the implementation after the validator test runs.

* Fixes after testing on the dev device

* Discard tokio.  Use thread instead

* Fix comments format

* Fix config type to pass the github test

* Fix failed tests.  Handle the case of non-existing path

* Final cleanup, addressing the review comments
Avoided OsString.
Made the function more generic with "impl AsRef<Path>"

Co-authored-by: Jeff Washington <jeff.washington@solana.com>
2022-08-19 23:56:52 -07:00
apfitzge eb06bb61e8
banking stage: actually aggregate tracer packet stats (#27118)
* aggregated_tracer_packet_stats_option was alwasys None

* Actually accumulate tracer packet stats
2022-08-19 15:16:56 -05:00
Will Hickey dba2fd5a16
Enable QUIC client by default. Add arg to disable QUIC client. (Forward port #26927) (#27194)
Enable QUIC client by default. Add arg to disable QUIC client.

* Enable QUIC client by default. Add arg to disable QUIC client.
* Deprecate --disable-quic-servers arg
* Add #[ignore] annotation to failing tests
2022-08-19 09:15:15 -05:00
Brennan Watt 7573000d87
Revert "Rust v1.63.0 (#27148)" (#27245)
This reverts commit a2e7bdf50a.
2022-08-19 09:19:44 +01:00
behzad nouri 6928b2a5af
adds hash domain to ping-pong protocol (#27193)
In order to maintain backward compatibility, for now the responding node
will hash the token both with and without domain so that the other node
will accept the response regardless of its upgrade status.
Once the cluster has upgraded to the new code, we will remove the legacy
domain = false case.
2022-08-18 22:39:31 +00:00
Brennan Watt a2e7bdf50a
Rust v1.63.0 (#27148)
* Upgrade to Rust v1.63.0

* Add nightly_clippy_allows

* Resolve some new clippy nightly lints

* Increase QUIC packets completion timeout

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-08-17 15:48:33 -07:00
behzad nouri fea66c8b63
derives Error trait for ClusterInfoError and core::result::Error (#27208) 2022-08-17 22:01:51 +00:00
Jeff Washington (jwash) 225cddcffb
serialize incremental_snapshot_hash (#26839)
* serialize incremental_snapshot_hash

* pr feedback
2022-08-17 15:14:31 -05:00
behzad nouri 3b87aa9227
reverts wide fanout in broadcast when the root node is down (#26359)
A change included in
https://github.com/solana-labs/solana/pull/20480
was that when the root node in turbine broadcast tree is down, the
leader will broadcast the shred to all nodes in the first layer.
The intention was to mitigate the impact of dead nodes on shreds
propagation, because if the root node is down, then the entire cluster
will miss out the shred.
On the other hand, if x% of stake is down, this will cause 200*x% + 1
packets/shreds ratio at the broadcast stage which might contribute to
line-rate saturation and packet drop.
To avoid this bandwidth saturation issue, this commit reverts that logic
and always broadcasts shreds from the leader only to the root node.
As before we rely on erasure codes to recover shreds lost due to staked
nodes being offline.
2022-08-16 19:40:06 +00:00
Justin Starry bdce208fe5
clean feature: `request_units_deprecated` (#27102)
clean feature: request_units_deprecated
2022-08-13 13:12:35 +01:00
Jeff Biseda e50013acdf
Handle JsonRpcService startup failure (#27075) 2022-08-11 23:25:20 -07:00
janlegner fc6cee9c06
allow staked nodes weight override (#26870)
* Allowed staked nodes weight override (#26407)

* Allowed staked nodes weight override, passing only HashMap over to core module

Co-authored-by: Ondra Chaloupka <chalda@chainkeepers.io>
2022-08-11 14:34:04 -07:00
behzad nouri ac91cdab74
removes buffering when generating coding shreds in broadcast (#25807)
Given the 32:32 erasure recovery schema, current implementation requires
exactly 32 data shreds to generate coding shreds for the batch (except
for the final erasure batch in each slot).
As a result, when serializing ledger entries to data shreds, if the
number of data shreds is not a multiple of 32, the coding shreds for the
last batch cannot be generated until there are more data shreds to
complete the batch to 32 data shreds. This adds latency in generating
and broadcasting coding shreds.

In addition, with Merkle variants for shreds, data shreds cannot be
signed and broadcasted until coding shreds are also generated. As a
result *both* code and data shreds will be delayed before broadcast if
we still require exactly 32 data shreds for each batch.

This commit instead always generates and broadcast coding shreds as soon
as there any number of data shreds available. When serializing entries
to shreds:
* if the number of resulting data shreds is less than 32, then more
  coding shreds will be generated so that the resulting erasure batch
  has the same recovery probabilities as a 32:32 batch.
* if the number of data shreds is more than 32, then the data shreds are
  split uniformly into erasure batches with _at least_ 32 data shreds in
  each batch. Each erasure batch will have the same number of code and
  data shreds.

For example:
* If there are 19 data shreds, 27 coding shreds are generated. The
  resulting 19(data):27(code) erasure batch has the same recovery
  probabilities as a 32:32 batch.
* If there are 107 data shreds, they are split into 3 batches of 36:36,
  36:36 and 35:35 data:code shreds each.

A consequence of this change is that code and data shreds indices will
no longer align as there will be more coding shreds than data shreds
(not only in the last batch in each slot but also in the intermediate
ones);
2022-08-11 12:44:27 +00:00
Michael Vines 4e79d78629 `solana-validator monitor` how displays slot and gossip stake % while waiting for supermajority 2022-08-10 11:13:25 -07:00
apfitzge c03f3b1436
Separate file for ImmutableDeserializedPacket type (#26951) 2022-08-09 22:39:01 -07:00
Jeff Biseda 370de8129e
ancestor hashes socket ping/pong support (#26866) 2022-08-09 21:39:55 -07:00
Michael Vines ccfbc54195 Move vote program state and instructions to solana-program 2022-08-09 20:52:47 -07:00
apfitzge c2455e7aa4
Fix typo in test function (#27031) 2022-08-09 12:39:22 -07:00
Lijun Wang a69470fd45
Set receive_window per quic connection (#26936)
This change sets the receive_window for non-staked node to 1 * PACKET_DATA_SIZE, and maps the staked nodes's connection's receive_window between 1.2 * PACKET_DATA_SIZE to 10 * PACKET_DATA_SIZE based on the stakes.

The changes is based on Quinn library change to support per connection receive_window tweak at the server side. quinn-rs/quinn#1393
2022-08-09 10:02:47 -07:00
behzad nouri e2a2d271f2
adds number of coding shreds to broadcast metrics (#27006) 2022-08-09 13:59:40 +00:00
apfitzge b6d38aad69
tracer-packet-stats reporting should not reset id (#27012) 2022-08-09 06:38:08 -07:00
Yueh-Hsuan Chiang 99ef2184cc
Delete files older than the lowest_cleanup_slot in LedgerCleanupService::cleanup_ledger (#26651)
#### Problem
LedgerCleanupService requires compactions to propagate & digest range-delete tombstones
to eventually reclaim disk space.

#### Summary of Changes
This PR makes LedgerCleanupService::cleanup_ledger delete any file whose slot-range is
older than the lowest_cleanup_slot.  This allows us to reclaim disk space more often with
fewer IOps.  Experimental results on mainnet validators show that the PR can effectively
reduce 33% to 40% ledger disk size.
2022-08-09 00:48:06 +08:00
Christian Kamm cf58640937
Keypair: implement clone() (#26248)
* Keypair: implement clone()

This was not implemented upstream in ed25519-dalek to force everyone to
think twice before creating another copy of a potentially sensitive
private key in memory.

See https://github.com/dalek-cryptography/ed25519-dalek/issues/76

However, there are now 9 instances of
  Keypair::from_bytes(&keypair.to_bytes())
in the solana codebase and it would be preferable to have a function.

In particular since this also comes up when writing programs and can
cause users to either start messing with lifetimes or discover the
from_bytes() workaround themselves.

This patch opts to not implement the Clone trait. This avoids automatic
use in order to preserve some of the original "let developers think
twice about this" intention.

* Use Keypair::clone
2022-08-06 11:54:38 -06:00
Richard Patel 270315a7f6
transaction-status, storage-proto: add compute_units_consumed (#26528)
* transaction-status, storage-proto: add compute_units_consumed

* fix bpf test

Co-authored-by: Justin Starry <justin@solana.com>
2022-08-06 17:14:31 +00:00
Brennan Watt 5bc81a6c35
Io stats v2 (#26898)
* Use sysfs instead of procfs for disk stats

* Filter map to filter dmcrypt and mdraid volumes

* Unit test cover different kernel formats
2022-08-05 10:38:49 -07:00
Tyera Eulberg 2dca239480
Remove runtime dependency from solana-transaction-status (#26930)
* Move RewardType out of runtime

* Move collect_token_balances to solana-ledger

* Remove solana-runtime dependency
2022-08-05 00:20:27 -06:00
steviez 300666dce7
Make `solana-ledger-tool` run AccountsBackgroundService (#26914)
Prior to this change, long running commands like `solana-ledger-tool
verify` would OOM due to AccountsDb cleanup not happening.

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-08-04 15:44:31 -05:00
Boqin Qin(秦 伯钦) 83a0f5da0f
core: fix double-readlock in replay_stage (#26052) 2022-08-04 10:45:31 -04:00
Brennan Watt 457f9ef739
Reduce severity level of log in replay (#26893)
* Reduce active banks log severity from warn to trace
2022-08-03 13:51:16 -07:00
Brennan Watt f3b760dd91
Add IO metrics (#26804)
* Add Disk IO metrics
2022-08-02 14:29:53 -07:00
behzad nouri ec36f0c5df removes redundant Option<&Arc<...>> wrapper for Blockstore in serve-repair 2022-08-02 15:30:53 +00:00
behzad nouri 6423da0218 removes redundant Arc<RwLock<...>> wrapper off ServeRepair 2022-08-02 15:30:53 +00:00
Jeff Biseda ded9a35cd6
mark repair ping packets for discard only after successful signature verification (#26878) 2022-08-01 16:17:19 -07:00
Jeff Biseda 857be1e237
sign repair requests (#26833) 2022-07-31 15:48:51 -07:00
apfitzge fbfcc3febf
Bugfix: VoteProcessingTiming reset both counters (#26843) 2022-07-29 12:56:04 -05:00
Pankaj Garg fb922f613c
Compute maximum parallel QUIC streams using client stake (#26802)
* Compute maximum parallel QUIC streams using client stake

* clippy fixes

* Add unit test
2022-07-29 08:44:24 -07:00
Brennan Watt 467cb5def5
Concurrent slot replay (#26465)
* Concurrent replay slots

* Split out concurrent and single bank replay paths

* Sub function processing of replay results for readability

* Add feature switch for concurrent replay
2022-07-28 11:33:19 -07:00
Jeff Washington (jwash) 817f65bb50
add full_snapshot to hash config (#26811) 2022-07-28 09:46:34 -05:00
Ashwin Sekar 8d69e8d447
Compact vote state updates to reduce block size (#26616)
* Compact vote state updates to reduce block size

* Add rpc transaction tests
2022-07-27 13:23:44 -06:00
Ashwin Sekar ed539d65b4
Only take the latest vote for each validator in gossip (#25934)
* Only take the latest vote for each validator in gossip

Since the new vote updates are no longer incremental, there
is no value in storing intermediate votes.

* Address pr feedback

* Handle potential downgrade path, FullTowerVote -> Incremental

* Rename sent to bank -> gossip slot

* Handle downgrade case properly

* Only downgrade for newer votes and feature flag, ignore incremental votes otherwise

* Update test
2022-07-26 16:38:30 -06:00
carllin f6d5b253fb
Enforce a 12MB limit on outbound repair (#26493) 2022-07-24 20:44:22 -05:00
Jeff Washington (jwash) d9c7bc7e78
Revert "cleanup feature: default units per instruction (#26684)" (#26750)
This reverts commit 39a34db52a.
2022-07-23 11:03:46 -05:00
Trent Nelson a603c8b0bc Enable QUIC servers by default 2022-07-22 15:45:10 -06:00
Trent Nelson 2ee19f536a Revert "Revert "core: disable quic servers on mainnet-beta" (#26216)"
This reverts commit 4a7fb2a808.
2022-07-22 15:45:10 -06:00
Tao Zhu a6215c1b92
Remove unnecessary poh_recorder read lock acquire (#26743)
Remove unnecessary acquiring of poh_recorder read lock
2022-07-22 15:23:05 -05:00
Tao Zhu 333a48e4e2
Add comment for forwarder buffer iteration behavior (#26721)
Add comment of forwarder buffer iteration behavior
2022-07-22 01:15:17 +00:00
Brennan Watt 932abe98a7
Switch UDP stats from usize to u64 (#26700) 2022-07-20 15:20:30 -07:00
Jack May 39a34db52a
cleanup feature: default units per instruction (#26684) 2022-07-20 19:13:34 +00:00
Brennan Watt 502f249904
Add proc net dev metrics to net stats (#26603)
* Add proc net dev metrics to net stats
2022-07-20 11:44:36 -07:00
sakridge 4a7fb2a808
Revert "core: disable quic servers on mainnet-beta" (#26216)
Enable QUIC server
2022-07-20 20:37:24 +02:00
Jeff Washington (jwash) 263911e7fd
save off what we find when calculating hash (#26663) 2022-07-19 09:55:52 -05:00
behzad nouri 2dd8573287
removes erroneous allow(dead_code) annotations from core (#26660) 2022-07-18 17:15:47 +00:00
Tao Zhu 22d465cd57
Share function to get priority details from various transaction types (#26643) 2022-07-15 18:17:22 -05:00
Jeff Washington (jwash) 47716a5e01
async hash verify on load (#26208)
* verify accounts hash in bg on startup

* fix some tests and loading from genesis

* add extra state for when background thread has completed
2022-07-15 14:29:56 -05:00
Tao Zhu f13b5c832d
Remove obsoleted metrics reporting to reduce lock contention on cost_model (#26608)
remove obsoleted metrics reporting to reduce lock contention on cost_model
2022-07-14 23:02:49 -05:00
Pankaj Garg 49a112ae74
Use pubkey of peer for active QUIC connection table (#26597)
* Use pubkey of peer for active QUIC connection table

* clippy

* update code
2022-07-13 09:59:01 -07:00
HaoranYi bf14440895
clean up and optimize account hash verify (#26560)
* remove unused code

* extract test related fault hash inject fn

* use rotate to optimize hashes removal

* use rotate to optimize snapshot hashes removal

* address code reveiw feedbacks

* revise comments

* inline
2022-07-12 19:27:28 +00:00
Pankaj Garg ea7448c568
Use client certs in QUIC to get peer's stake (#26477)
* Use client certs in QUIC to get peer's stake

* fixes to cert processing

* integrate the code

* clippy

* more cleanup

* sort cargo deps

* test fixes

* info -> debug
2022-07-11 18:06:40 +00:00
Tao Zhu a3b094300b
Remove sender stakes from banking_stage buffer prioritization (#26512)
* remove sender stakes from banking_stage buffer prioritization
2022-07-11 12:46:15 -05:00
Nicholas Clarke ee0a40937e
Add validator argument log_messages_bytes_limit to change log truncation limit.
Add new cli argument log_messages_bytes_limit to solana-validator to control how long program logs can be before truncation
2022-07-11 10:53:18 -05:00
behzad nouri ba785cf8ab
removes erroneous uses of std::mem::swap (#26536)
All instances should be replace by std::mem::{replace,take},
or just plain assignment.
2022-07-11 11:33:15 +00:00
Jeff Washington (jwash) 275e47f931
do this right: add 2nd pass at hash calc when failure seen (#26392) (#26538) 2022-07-10 23:10:22 -05:00
Ashwin Sekar 734fedea4c
Create a more compact vote state update transaction (#26092)
* Create a more compact vote state update transaction

* pr comments

* change root to not be an option and update abi
2022-07-07 22:29:02 -07:00
carllin 5bffee248c
Cleanup repair logging (#26461) 2022-07-07 15:02:43 -05:00
behzad nouri 6f4838719b
decouples shreds sig-verify from tpu vote and transaction packets (#26300)
Shreds have different workload and traffic pattern from TPU vote and
transaction packets. Some of recent changes to SigVerifyStage are not
suitable or at least optimal for shreds sig-verify; e.g. random discard,
dedup with false positives, discard excess by IP-address, ...

SigVerifier trait is meant to abstract out the distinctions between the
two pipelines, but in practice it has led to more verbose and convoluted
code.

This commit discards SigVerifier implementation for shreds sig-verify
and instead provides a standalone stage for verifying shreds signatures.
2022-07-07 11:13:13 +00:00
behzad nouri d33c548660
bypasses window-service stage before retransmitting shreds (#26291)
With recent patches, window-service recv-window does not do much other
than redirecting packets/shreds to downstream channels.
The commit removes window-service recv-window and instead sends
packets/shreds directly from sigverify to retransmit-stage and
window-service insert thread.
2022-07-06 11:49:58 +00:00
Tao Zhu c1d89ad749
forward packets by prioritization in desc order (#25406)
- Forward packets by prioritization in desc order
- Add support of cost-tracking by transaction requested compute units
- Hook up account buckets to forwarder
- Add metrics for forwardable batches count
- Remove redundant invalid packets filtering at end of slot since forwarder will do the same when batch forwardable packets
- Add bench test for forwarding
2022-07-05 23:24:58 -05:00
Jeff Washington (jwash) 8eba4d1698
add 2nd pass at hash calc when failure seen (#26392) 2022-07-05 18:01:02 -05:00
behzad nouri d3a14f5b30
simplifies packet/shred sanity checks (#26356) 2022-07-05 21:41:19 +00:00
carllin ce39c14025
Add end-to-end replay slot metrics (#25752) 2022-07-05 13:58:51 -05:00
Nick Rempel 7e4a5de99c
Refactor ConnectionCache::use_quic (#26235)
* Remove UseQuic type

Move to storing the UdpSocket on ConnectionCache and accepting a bool

* Remove use_quic from ConnectionCache constructor

Replace with separate with_udp constructor to force callers to choose
2022-07-05 10:49:42 -07:00
behzad nouri 61f0a7d9c3
replaces Mutex<PohRecorder> with RwLock<PohRecorder> (#26370)
Mutex causes superfluous lock contention when a read-only reference suffices.
2022-07-05 14:29:44 +00:00
Pankaj Garg 94685e1222
Implement randomized pruning of QUIC connection from staked peers (#26299) 2022-06-30 17:56:15 -07:00
behzad nouri 88599fd760
skips shreds deserialization before retransmit (#26230)
Fully deserializing shreds in window-service before sending them to
retransmit stage adds latency to shreds propagation.
This commit instead channels through the payload and relies on only
partial deserialization of a few required fields: slot, shred-index,
shred-type.
2022-06-30 12:13:00 +00:00
Jack May 4563bf40f6
cleanup feature: tx-wide-compute-cap (#26326) 2022-06-29 23:54:45 -07:00
Jeff Washington (jwash) 557bf6e656
allow initial hash calc to occur in bg (#26271)
* allow initial hash calc to occur in bg

* validator_initialized -> startup_verification_complete

* add infos for leader and vote

* rework snapshot for startup verification

* change to assert
2022-06-29 16:48:33 -05:00
behzad nouri f875733a9e
patches bug in retransmit stats where slot stats are erroneously dropped (#26317)
slot_stats are submitted at a different cadence from the rest of
RetransmitStats. Current code erroneously clears slot_stats before
submitting any metrics.
2022-06-29 21:35:58 +00:00
behzad nouri b3406b5b2a
removes IndexedParallelIterator::with_min_len from retransmit (#26305)
Testing on mainnet-beta, with_min_len does not seem to have much impact
in the current retransmit code.
2022-06-29 13:27:17 +00:00
behzad nouri 348fe9ebe2
verifies shred slot and parent in fetch stage (#26225)
Shred slot and parent are not verified until window-service where
resources are already wasted to sig-verify and deserialize shreds.
This commit moves above verification to earlier in the pipeline in fetch
stage.
2022-06-28 12:45:50 +00:00
behzad nouri 39ca788b95
discards shreds in sigverify if the slot leader is the node itself (#26229)
Shreds are dropped in window-service if the slot leader is the node
itself:
https://github.com/solana-labs/solana/blob/cd2878acf/core/src/window_service.rs#L181-L185

However this is done after wasting resources verifying signature on
these shreds, and requires a redundant 2nd lookup of the slot leader.

This commit instead discards such shreds in sigverify stage where we
already know the leader for the slot.
2022-06-27 20:12:23 +00:00
behzad nouri 67936aaa74
moves Shred::seed to ShredId and adds test coverage (#26251)
Following commits will skip shreds deserializaton before retransmit, and
so we will only have a ShredId and not a fully deserialized shred to
obtain the shuffling seed from.
2022-06-27 17:58:43 +00:00
Brooks Prumo 662818ef0d
Use `VoteAccount::node_pubkey()` (#26207) 2022-06-27 09:09:06 -05:00
HaoranYi d5efbdb19b
Add timing measurement for gossip vote txn processing (#26163)
* add timing for gossip vote txn processing

* fix build

* fix too many arg error in clippy

* atomic interval
2022-06-27 08:53:34 -05:00
Ryo Onodera cd2878acf9
Avoid to miss to root for local slots before the hard fork (#19912)
* Make sure to root local slots even with hard fork

* Address review comments

* Cleanup a bit

* Further clean up

* Further clean up a bit

* Add comment

* Tweak hard fork reconciliation code placement
2022-06-26 15:14:17 +09:00
behzad nouri 30d2b112e4
bypasses rayon thread-pool for small retransmit shred batches (#26222)
In order to preserve current behavior, the threshold is set to the
current value of the argument to IndexedParallelIterator::with_min_len.
Follow up commits will recalibrate this threshold to optimize
performance on mainnet-beta.
2022-06-25 21:15:42 +00:00
Justin Starry 7cd7173b71
Refactor: Add get_delegated_stake method to VoteAccounts (#26221) 2022-06-25 16:41:35 +00:00
Justin Starry 44d1e62007
Refactor: No need to return stake in Bank::get_vote_account (#26220) 2022-06-25 16:27:43 +00:00
behzad nouri f1b82ec44d
factors out common retransmit work for shreds of the same slot (#26218)
Shreds arriving at a node for retransmit tend to belong to the same slot
(or a just a couple of different slots). Slot leader and cluster nodes
are common for the shreds of the same slot, and so the common work to
look up these values can be factored out.
This commit first group-bys shreds by slot to factor out that common
lookup work.
2022-06-25 15:49:05 +00:00
Jeff Washington (jwash) a3395a786a
vote_account uses AccountSharedData to avoid copies (#23687)
* vote_account uses AccountSharedData to avoid copies

* simpler deserialize
2022-06-24 15:08:01 -05:00
Tyera Eulberg a6ba5a9a05
Add transaction index in slot to geyser plugin TransactionInfo (#25688)
* Define shuffle to prep using same shuffle for multiple slices

* Determine transaction indexes and plumb to execute_batch

* Pair transaction_index with transaction in TransactionStatusService

* Add new ReplicaTransactionInfoVersion

* Plumb transaction_indexes through BankingStage

* Prepare BankingStage to receive transaction indexes from PohRecorder

* Determine transaction indexes in PohRecorder; add field to WorkingBank

* Add PohRecorder::record unit test

* Only pass starting_transaction_index around PohRecorder

* Add helper structs to simplify test DashMap

* Pass entry and starting-index into process_entries_with_callback together

* Add tx-index checks to test_rebatch_transactions

* Revert shuffle definition and use zip/unzip

* Only zip/unzip if randomize

* Add confirm_slot_entries test

* Review nits

* Add type alias to make sender docs more clear
2022-06-23 13:37:38 -06:00
behzad nouri f534b8981b
maps number of data shreds to erasure batch size (#25917)
In prepration of
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
  erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
  replace and remove entries_to_data_shreds from the public interface.
2022-06-23 13:27:54 +00:00
Jeff Biseda bafdb7dd62
Revert handle start_http failure in rpc_service (#25400) (#26130)
* revert e263be2000
2022-06-22 10:52:27 -07:00
Michael Vines f3639b76ce Remove some clippy lints 2022-06-22 09:23:22 -07:00
HaoranYi b5d0c7b468
Revert "tvu and tpu timeout on joining its microservices (#24111)" (#26132)
This reverts commit e105547c14.
2022-06-22 10:57:46 -05:00
behzad nouri faa6c32162 removes packet modifier from shred_fetch_stage
... in favor of just passing packet flags.
2022-06-22 12:17:37 +00:00
behzad nouri 1f0f5dc03e verifies shred-version in fetch stage
Shred versions are not verified until window-service where resources are
already wasted to sig-verify and deserialize shreds.
The commit verifies shred-version earlier in the pipeline in fetch stage.
2022-06-22 12:17:37 +00:00
Pankaj Garg 43ff65ece9
Use single send socket in UdpTpuConnection (#26105) 2022-06-21 14:56:21 -07:00
behzad nouri 75425521b4
moves slot updates notifications after shreds retransmit (#26094)
RetransmitSlotStats can already be utilized to track when the first
shred for a slot was received; therefore
    first_shreds_received: &Mutex<BTreeSet<Slot>>

is redundant. Sending update notifications after shreds retransmit will
also bypass the need for a mutex.
2022-06-21 17:19:40 -04:00
Lijun Wang 61946a49c3
Weight concurrent streams by stake (#25993)
Weight concurrent streams by stake for staked nodes
Ported changes from #25056 after address merge conflicts and some refactoring
2022-06-21 12:06:44 -07:00
behzad nouri d2afa6b418
moves packet-hasher out of the mutex (#26091)
Packet-hasher is not mutated across threads and does not need to be
wrapped in a mutex.
2022-06-21 16:29:27 +00:00
Pankaj Garg e344c8476f
Do not use UdpTpuConnection to forward votes (#26082)
* Do not use UdpTpuConnection to forward votes

* fix tests
2022-06-21 05:56:11 -07:00
Boqin Qin(秦 伯钦) 611d2ec73c
core: fix double-readlock in validator (#26053) 2022-06-20 15:07:00 +00:00
behzad nouri 47e62add5b
removes feature gate code adding shred-type to shred seed (#25963)
The feature is already activated on all clusters, and does not impact
processing of ledger/snapshots.
2022-06-20 14:39:24 +00:00
Trent Nelson a5f290a66f core: disable quic servers on mainnet-beta 2022-06-17 20:04:05 -06:00
behzad nouri b3d1f8d1ac
tracks number of shreds sent and received at different distances from the root (#25989) 2022-06-17 21:33:23 +00:00
behzad nouri eacb9183d4
patches bug where the 1st coding shred is not inserted into blockstore (#25916)
StandardBroadcastRun::insert skips 1st shred with index zero because
the 1st *data* shred is inserted synchronously:
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L239-L246
https://github.com/solana-labs/solana/blob/53695ecd2/core/src/broadcast_stage/standard_broadcast_run.rs#L334-L339

https://github.com/solana-labs/solana/pull/7481
which added this code was not inserting coding shreds into blockstore.
Starting with
https://github.com/solana-labs/solana/pull/8899
coding shreds are inserted into blockstore as well as data shreds, but
the insert logic erroneously skips first coding shred because it does
not check if shred is code or data.
2022-06-16 13:59:15 +00:00
behzad nouri fe3c1d3d49
removes erroneous uses of &Arc<...> from broadcast-stage (#25962) 2022-06-15 13:44:24 +00:00
Brian Anderson db9004bd0f
Fix doc warnings (#25953) 2022-06-14 21:55:08 -06:00
Tao Zhu c96d9d127a
Include forwarding counters in leader slot metrics (#25874)
* To include forwarding counters in leader slot metrics

* Capture slot_end_detected time when checking leader slots, to be used in reporting later

* Simplify banking stage loop to report leader slot metrics

Co-authored-by: carllin <carl@solana.com>
2022-06-13 17:03:34 -05:00
Michael Vines ace24a7c82 A default tower is no longer considered to contain a stray last vote 2022-06-10 14:17:26 -07:00
Lijun Wang 29b597cea5
Connection pool support in connection cache and QUIC connection reliability improvement (#25793)
* Connection pool in connection cache and handle connection errors

1. The connection not has a pool of connections per address, configurable, default 4
2. The connections per address share a lazy initialized endpoint
3. Handle connection issues better, avoid race conditions
4. Various log improvement for help debug connection issues
2022-06-10 09:25:24 -07:00
Yueh-Hsuan Chiang ee4469c882
Skip compaction in backup_and_clear_blockstore (#25810)
#### Problem
blockstore clean and compact is quite slow with wait-for-supermajority purge and can take 20-30 minutes
as described in #25710.

#### Summary of Changes
This PR removes the compaction logic in backup_and_clear_blockstore as the
actual the restoration from a bad fork is handled by `blockstore.purge_slots`
(which is done by issuing rocksdb range-delete that makes the bad fork
unavailable.)

Compaction is irreverent to the shred version, as its main job in this context
is to reclaim disk storage from the deleted slots, which we can let the rocksdb
automatic background compaction to handle it.

Fixes #25710
2022-06-09 17:11:50 +08:00
carllin bf8faa8a30
Report banking stage tracer metrics (#25620) 2022-06-09 00:25:37 -05:00
Jon Cinque 79a8ecd0ac
client: Remove static connection cache, plumb it instead (#25667)
* client: Remove static connection cache, plumb it instead

* Add TpuClient::new_with_connection_cache to not break downstream

* Refactor get_connection and RwLock into ConnectionCache

* Fix merge conflicts from new async TpuClient

* Remove `ConnectionCache::set_use_quic`

* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
2022-06-08 13:57:12 +02:00
behzad nouri 6c9f2eac78
removes fec_set_offset from UnfinishedSlotInfo (#25815)
If the blockstore has shreds for a slot, it should not recreate the
slot:
https://github.com/solana-labs/solana/blob/ff68bf6c2/ledger/src/leader_schedule_cache.rs#L142-L146
https://github.com/solana-labs/solana/pull/15849/files#r596657314

Therefore in broadcast stage if UnfinishedSlotInfo is None, then
fec_set_offset will be zero:
https://github.com/solana-labs/solana/blob/ff68bf6c2/core/src/broadcast_stage/standard_broadcast_run.rs#L111-L120

As a result fec_set_offset will always be zero, and is so redundant and
can be removed.
2022-06-07 22:17:37 +00:00
Brennan Watt ba04063956
Add CPUmetrics (#25802)
Add in some CPU utilization metrics such as: number of vCPUs, clock frequency, average load across different time intervals, and number of total threads
2022-06-07 11:34:25 -07:00
apfitzge e6c21a3036
Convert Measure::this to measure! and remove Measure::this (#25776)
* Remove the args param from Measure::this since we don't ever use it

* banking_stage.rs: convert to measure!

* poh_recorder.rs: convert to measure!

* cost_update_service.rs: convert to measure!

* poh_service.rs: convert to measure!

* bank.rs: convert to measure!

* measure.rs: Remove Measure::this now that all have been converted to measure!
2022-06-06 20:21:05 -05:00
sakridge 447a3239e7
Add new replay metrics for replay blockstore_into_bank and complete (#25717) 2022-06-03 19:45:27 +02:00
behzad nouri 5dbf7d8f91
removes raw indexing into packet data (#25554)
Packets are at the boundary of the system where, vast majority of the
time, they are received from an untrusted source. Raw indexing into the
data buffer can open attack vectors if the offsets are invalid.
Validating offsets beforehand is verbose and error prone.

The commit updates Packet::data() api to take a SliceIndex and always to
return an Option. The call-sites are so forced to explicitly handle the
case where the offsets are invalid.
2022-06-03 01:05:06 +00:00
behzad nouri 81231a89b9 adds support for different variants of ShredCode and ShredData
The commit implements two new types:
    pub enum ShredCode {
        Legacy(legacy::ShredCode),
    }
    pub enum ShredData {
        Legacy(legacy::ShredData),
    }

Following commits will extend these types by adding merkle variants:
    pub enum ShredCode {
        Legacy(legacy::ShredCode),
        Merkle(merkle::ShredCode),
    }
    pub enum ShredData {
        Legacy(legacy::ShredData),
        Merkle(merkle::ShredData),
    }
2022-06-02 18:55:50 +00:00
Pankaj Garg 1c2ae470c5
Fix forwarding of transactions over QUIC (#25674)
* Spawn QUIC server to receive forwarded txs

* Update validator port range

* forward votes using UDP

* no forwarding from unstaked nodes

* forwarding stats in banking stage

* fix test builds

* fix lifetime of forward sender
2022-06-02 11:14:58 -07:00
HaoranYi d3ac4e941b
Bench: preshrink + sigverify (#25480)
* double shrinking

* add bench

* rename

* aggregate timing

* remove pre/post shrink time

* update api after merge
2022-06-02 09:19:01 -05:00
Tao Zhu 51ac599915
Add user requested CU (eg. compute_budget.compute_unit_limit) to immutable_deserialized_packet, to be used in cost model and prioritized forwarding (#25695) 2022-06-01 22:43:48 +00:00
Ryo Onodera aedcb05dc8
Record solana-validator ver to metrics at startup (#25635)
* Record solana-validator ver to metrics at startup

* Update Cargo.lock
2022-06-01 13:37:50 +09:00
Christian Kamm 02b26ddd82
SigVerify: Fix num_valid_packets metric (#25643)
It used to report the number of packets with successful signature
validations but was accidentally changed to count packets passed into
the verifier by e4409a87fe.

This restores the previous meaning.
2022-05-31 18:51:20 +10:00
carllin 90a3315b69
Detect tracer key in sigverify (#25579)
* Mark the tracer transaction

* simplify tracer check
2022-05-30 18:41:54 -05:00
Justin Starry e4409a87fe
Add pre shrink pass before sigverify batch (#25136) 2022-05-28 01:51:55 +10:00
Yueh-Hsuan Chiang 5b67960c76
(Refactor) Move blocktore options related stuff to blockstore_options.rs (#25509)
#### Problem
blockstore_db.rs has a mutual dependency between blockstore_metrics.rs.

#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.

By doing this, we address the mutual dependency and also make the code cleaner.
2022-05-26 16:59:26 -07:00
ryleung-solana 1ca5c3a7bd
Switch to using enum-dispatch to switch between UDP and Quic (#24713) 2022-05-26 11:21:16 -04:00
behzad nouri de612c25b3
removes shred wire layout specs from sigverify (#25520)
sigverify_shreds relies on wire layout specs of shreds:
https://github.com/solana-labs/solana/blob/0376ab41a/ledger/src/sigverify_shreds.rs#L39-L46
https://github.com/solana-labs/solana/blob/0376ab41a/ledger/src/sigverify_shreds.rs#L298-L305

In preparation of
https://github.com/solana-labs/solana/pull/25237
which adds a new shred variant with different layout and signed message,
this commit removes shred layout specification from sigverify and
instead encapsulate that in shred module.
2022-05-26 13:06:27 +00:00
Christian Kamm 0efb7478cd
FindPacketSenderStake: Remove parallelism to improve performance (#25562)
* FindPacketSenderStake: Remove parallelism to improve performance

The work unit sizes were so small that using the thread pool
slowed down this stage significantly.

* fix checks

Co-authored-by: Justin Starry <justin@solana.com>
2022-05-26 21:17:52 +10:00
behzad nouri cafa85bfbb
includes shred-type when computing turbine broadcast seed (#25556)
Indices for code and data shreds of the same slot overlap; and so they
will have the same random number generator seed when shuffling cluster
nodes for turbine broadcast.

This results in the same propagation path for code and data shreds of
the same index and effectively smaller sample size for re-transmitter
nodes. For example a 32:32 batch (32 code + 32 data shreds), is
retransmitted through _at most_ 32 unique nodes, whereas ideally we want
~64 unique re-transmitters.

This commit adds shred-type to seed function so that code and data
sherds of the same (slot, index) will (most likely) have different
propagation paths.
2022-05-25 20:31:53 +00:00
behzad nouri 880684565c
limits read access into Packet data to Packet.meta.size (#25484)
Bytes past Packet.meta.size are not valid to read from.

The commit makes the buffer field private and instead provides two
methods:
* Packet::data() which returns an immutable reference to the underlying
  buffer up to Packet.meta.size. The rest of the buffer is not valid to
  read from.
* Packet::buffer_mut() which returns a mutable reference to the entirety
  of the underlying buffer to write into. The caller is responsible to
  update Packet.meta.size after writing to the buffer.
2022-05-25 16:52:54 +00:00
carllin 9651cdad99
Refactor Sigverify trait (#25359) 2022-05-24 16:01:41 -05:00
Jeff Biseda 61c5a471e8
preserve optimistic_slot in blockstore (#25311) 2022-05-24 12:03:28 -07:00
Justin Starry e66ea7cb6a Clean up Bank::commit_transactions parameters 2022-05-24 20:24:42 +08:00
Justin Starry cad1c41ce2 Add Packet::deserialize_slice convenience method 2022-05-24 17:31:14 +08:00
steviez ec7ca411dd
Make PacketBatch packets vector non-public (#25413)
Upcoming changes to PacketBatch to support variable sized packets will
modify the internals of PacketBatch. So, this change removes usage of
the internal packet struct and instead uses accessors (which are
currently just wrappers of Vector functions but will change down the
road).
2022-05-23 15:30:15 -05:00
Christian Kamm 6429aff13b
findpacketsenderstake: add discard after receive (#25458)
This mimics a similar change in sigverify, see #25388
2022-05-23 21:27:20 +02:00
behzad nouri c248fb3f51
renames Packet Meta::{,set_}addr methods to {,set_}socket_addr (#25478)
In order to distinguish between Meta.addr field which is an IpAddr and
the methods which refer to a SocketAddr.
2022-05-23 15:48:59 +00:00
Michael Vines b05c7d91ed Fix derive_partial_eq_without_eq clippy lint 2022-05-22 22:22:21 -07:00
sakridge e22be02d3a
sigverify: add discard before dedup (#25388) 2022-05-23 03:40:33 +02:00
Pankaj Garg 7fb0ef1fa5
Use async send for forwarding transactions (#25435) 2022-05-20 21:20:47 -07:00
Jeff Biseda e263be2000
handle start_http failure in rpc_service (#25400) 2022-05-20 17:59:23 -07:00
Brennan Watt e025376719
Fix packet accounting after dedup (#25357)
* Fix packet accounting after dedup
* Rename function to better represent intent
2022-05-20 17:00:13 -07:00
Brennan Watt 2fdc850176
Use Shared IP to Stake Map (#25377)
* Find packet sender stake stage use shared IP to stake map
2022-05-20 12:51:07 -07:00
Michael Vines c54e06355f
voteSubscribe pubsub notification now includes the vote transaction signature (#25291) 2022-05-19 18:28:46 -07:00
Michael Vines 97efbdc303
Defer tower saving until push_vote(), there's no need to do it sooner (#25374) 2022-05-19 18:27:58 -07:00
buffalu 971748b335
fix banking stage starvation (#25245) 2022-05-18 22:37:47 +02:00
Justin Starry 5548baf4dd
Don't drop transactions which use a request heap size ix (#25315) 2022-05-18 17:47:24 +08:00
steviez b27125815a
Simplify logic around MAX_ORPHAN_REPAIR_RESPONSES constant (#25032) 2022-05-17 19:45:45 -06:00
Tao Zhu b1b3702e6d
Prioritize transactions in banking stage by their compute unit price (#25178)
* - get prioritization fee from compute_budget instruction;
- update compute_budget::process_instruction function to take instruction iter to support sanitized versioned message;
- updated runtime.md

* update transaction fee calculation for prioritization fee rate as lamports per 10K CUs

* review changes

* fix test

* fix a bpf test

* fix bpf test

* patch feedback

* fix clippy

* fix bpf test

* feedback

* rename prioritization fee rate to compute unit price

* feedback

Co-authored-by: Justin Starry <justin@solana.com>
2022-05-16 12:06:33 +08:00
sakridge 52db2e19bc
Lower default batch size to 64 and add 2 banking threads (#25226) 2022-05-15 16:52:47 +02:00
sakridge 3d96a1ab76
Block packets in vote-only mode (#24906) 2022-05-14 17:53:37 +02:00
Pankaj Garg 71dd95e842
Tune banking_stage receive loop timing (#25172) 2022-05-13 03:42:08 +00:00
Jeff Washington (jwash) 896729f25e
keep track of oldest slot used by last hash calculation (#25152) 2022-05-12 11:18:08 -05:00
Jeff Washington (jwash) 3a4f0d3397
println -> info (#25163) 2022-05-12 11:07:13 -05:00
Pankaj Garg bcf4d54235
Update test_banking_stage_entryfication to be more deterministic (#25146)
* Update test_banking_stage_entryfication to be more deterministic

* revert to original test with updated checks
2022-05-12 15:36:19 +00:00
HaoranYi 41d34d45e0
pass exit by ref (#25120) 2022-05-11 09:17:21 -05:00
DimAn 2fa9bc3e70
Add options to store full and/or incremental snapshots in separate locations (#24247) 2022-05-10 16:37:41 -04:00
Justin Starry e3bdc38f0a
Add sanitized types for use in banking stage (#25067) 2022-05-11 00:30:48 +08:00
HaoranYi de96663cc4
fix typo (#25083) 2022-05-09 12:42:58 -05:00
Pankaj Garg 362b0526cd
Greedy receive in banking stage (#25060)
* Greedy receive in banking stage

* add upperbound to batch size and batching time

* update test_banking_stage_entryfication test
2022-05-08 10:47:55 -07:00
HaoranYi 8e37e364b1
fix typo in measure name (#25058) 2022-05-07 10:04:42 -05:00
Justin Starry c920d411f7
Clean up logging and make variables consistent (#25049) 2022-05-07 03:52:45 +08:00
Justin Starry 082502d4f3
Fail tx sanitization when ix program id uses lookup table (#25035)
* Fail tx sanitization when ix program id uses lookup table

* feedback
2022-05-07 03:19:50 +08:00
Christian Kamm cb6cd5d60f
FindPacketSenderStake: Improve metrics (#24971)
- separate names for vote and non-vote thread
- time unit postfixes (one is in ns!)
2022-05-06 21:16:13 +02:00
behzad nouri 492f89a170
checks account owner when initializing a vote-account (#25018)
A VoteAccount may only wrap an account if the account owner is
solana_vote_program:id or equivalently this check returns true:
solana_vote_program::check_id(account.owner())
2022-05-06 16:22:49 +00:00
behzad nouri a01291069a
initializes thread-pools with lazy_static instead of thread_local (#24853)
In addition to thread_local -> lazy_static change, a number of thread-pools are
initialized with get_max_thread_count to achieve parity with the older code in
terms of number of validator threads.
2022-05-05 20:00:50 +00:00
Justin Starry 7100f1c94b
Collect stats in streamer receiver and report fetch stage metrics (#25010) 2022-05-06 02:56:18 +08:00
carllin 870ac80b79
Prioritize BankingStage packets individually in min-max heap (#24187) 2022-05-04 21:50:56 -05:00
Christian Kamm 503d0baf6d
SigVerify: Add total time metrics for dedup/discard/verify (#24768)
* SigVerify: Add total time metrics for dedup/discard/verify

Previously it was impossible to determine the total time the stage spent
on these activities within a measurement window.

* SigVerify: Add _us postfix to time metrics
2022-05-03 14:59:25 +02:00
behzad nouri eff59193db
enforces that LAST_SHRED_IN_SLOT is also DATA_COMPLETE_SHRED (#24892)
A data shred cannot be LAST_SHRED_IN_SLOT if not also DATA_COMPLETE_SHRED.
So LAST_SHRED_IN_SLOT should also imply DATA_COMPLETE_SHRED:
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shredder.rs#L116-L117
https://github.com/solana-labs/solana/blob/74b586ae7/core/src/broadcast_stage/standard_broadcast_run.rs#L80-L81

However current shred constructs allow specifying a shred which is
LAST_SHRED_IN_SLOT but not DATA_COMPLETE_SHRED:
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L117-L118
https://github.com/solana-labs/solana/blob/74b586ae7/ledger/src/shred.rs#L272-L273

The commit updates ShredFlags so that if a shred is not
DATA_COMPLETE_SHRED it cannot be LAST_SHRED_IN_SLOT either.
2022-05-02 23:33:53 +00:00
apfitzge 112a0b475a
Revert "Refactor to use EpochSchedule from within RentCollector struct" (#24893)
* Revert "Ran cargo fmt"

This reverts commit 9052e41b32.

* Revert "Fix build error introduced by my editor setup, part 2"

This reverts commit 4dfeab3b38.

* Revert "Fix build error introduced by my editor setup"

This reverts commit 87fb78dc56.

* Revert "Remove redundant epoch_schedule from AccountsPackage"

This reverts commit c2f7f2fff8.

* Revert "Fix a test"

This reverts commit 36c0bdaa78.

* Revert "Fixes to initial code"

This reverts commit ed7813e698.

* Revert "Removing redundant EpochSchedule param from fns"

This reverts commit 5472d2e605.
2022-05-02 13:46:17 -05:00
behzad nouri e812430e28
defines shred flags using bitflags crate (#24874)
Shred flags uses raw bit-masking ops which lacks type-safety:
https://github.com/solana-labs/solana/blob/a829ddc92/ledger/src/shred.rs#L112-L114

This commit instead uses bitflags crate to define shred flags.
2022-05-01 19:25:15 +00:00
Pankaj Garg 88c16c0176
Check if quic is enabled before warming up quic connections (#24821)
* Check if quic is enabled before warming up quic connections

* fix after rebase

* don't start warmup service if quic not enabled

* fix test
2022-05-01 03:52:38 +00:00
Justin Starry a61652104b
Avoid holding lock guards in match expressions (#24805)
* Avoid holding bank forks read lock for RPC requests

* Avoid using lock guards in temporaries

* revert fetch stage change
2022-04-29 16:32:46 +08:00
Justin Starry 4e58b3870c
Update all BankForks methods to return owned values (#24801) 2022-04-28 18:51:00 +00:00
sakridge 5a430c15e2
Separate sigverify metrics for each verifier (#24744) 2022-04-28 01:16:17 -07:00
behzad nouri 0f60665100
replaces Shred::new_empty_coding with Shred::new_from_parity_shard (#24749)
Removing implementation details of shreds and payload offsets from
shredder, so that shredder does not need to mutate payload:
https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L968-L977

Also, Shred::new_from_data can simply obtain a slice as opposed to
Option<&[u8]>:
https://github.com/solana-labs/solana/blob/71ad12128/ledger/src/shred.rs#L268-L278
2022-04-27 18:04:10 +00:00
behzad nouri 081c844d6e
removes Shred::new_empty_data_shred (#24714)
Shred::new_empty_data_shred returns an invalid shred (i.e.
shred.sanitize() returns error). The method is only used in tests and
can be easily replaced with Shred::new_from_data. To keep the shred api
surface small, this commit removes this method.
2022-04-26 23:13:12 +00:00
behzad nouri 12ae8d3be5
returns Error when Shred::sanitize fails (#24653)
Including the error in the output allows to debug when Shred::sanitize
fails.
2022-04-25 23:19:37 +00:00
behzad nouri 895f76a93c
hides implementation details of shred from its public interface (#24563)
Working towards embedding versioning into shreds binary, so that a new
variant of shred struct can include merkle tree hashes of the erasure
set.
2022-04-25 12:43:22 +00:00