#### Problem
blockstore_db.rs has a mutual dependency between blockstore_metrics.rs.
#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.
By doing this, we address the mutual dependency and also make the code cleaner.
* FindPacketSenderStake: Remove parallelism to improve performance
The work unit sizes were so small that using the thread pool
slowed down this stage significantly.
* fix checks
Co-authored-by: Justin Starry <justin@solana.com>
Indices for code and data shreds of the same slot overlap; and so they
will have the same random number generator seed when shuffling cluster
nodes for turbine broadcast.
This results in the same propagation path for code and data shreds of
the same index and effectively smaller sample size for re-transmitter
nodes. For example a 32:32 batch (32 code + 32 data shreds), is
retransmitted through _at most_ 32 unique nodes, whereas ideally we want
~64 unique re-transmitters.
This commit adds shred-type to seed function so that code and data
sherds of the same (slot, index) will (most likely) have different
propagation paths.
Bytes past Packet.meta.size are not valid to read from.
The commit makes the buffer field private and instead provides two
methods:
* Packet::data() which returns an immutable reference to the underlying
buffer up to Packet.meta.size. The rest of the buffer is not valid to
read from.
* Packet::buffer_mut() which returns a mutable reference to the entirety
of the underlying buffer to write into. The caller is responsible to
update Packet.meta.size after writing to the buffer.
Upcoming changes to PacketBatch to support variable sized packets will
modify the internals of PacketBatch. So, this change removes usage of
the internal packet struct and instead uses accessors (which are
currently just wrappers of Vector functions but will change down the
road).
* - get prioritization fee from compute_budget instruction;
- update compute_budget::process_instruction function to take instruction iter to support sanitized versioned message;
- updated runtime.md
* update transaction fee calculation for prioritization fee rate as lamports per 10K CUs
* review changes
* fix test
* fix a bpf test
* fix bpf test
* patch feedback
* fix clippy
* fix bpf test
* feedback
* rename prioritization fee rate to compute unit price
* feedback
Co-authored-by: Justin Starry <justin@solana.com>
A VoteAccount may only wrap an account if the account owner is
solana_vote_program:id or equivalently this check returns true:
solana_vote_program::check_id(account.owner())
In addition to thread_local -> lazy_static change, a number of thread-pools are
initialized with get_max_thread_count to achieve parity with the older code in
terms of number of validator threads.
* SigVerify: Add total time metrics for dedup/discard/verify
Previously it was impossible to determine the total time the stage spent
on these activities within a measurement window.
* SigVerify: Add _us postfix to time metrics
Shred::new_empty_data_shred returns an invalid shred (i.e.
shred.sanitize() returns error). The method is only used in tests and
can be easily replaced with Shred::new_from_data. To keep the shred api
surface small, this commit removes this method.
The commit makes values in stake_delegations map in Stakes struct
generic. Stakes<Delegation> is equivalent to the old code and is used
for backward compatibility in BankFieldsTo{Serialize,Deserialize}.
But banks cache Stakes<StakeAccount> which includes the entire stake
account and StakeState deserialized from account. Doing so, will remove
the need to load stake account from accounts-db when working with
stake-delegations.
* Pass the sum of consumed compute units to cost_tracker
* cost model tracks builtins and bpf programs separately, enabling adjust block cost by actual bpf programs execution costs
* Copied nightly-only experimental `checked_add_(un)signed` implementation to sdk
* Add function to update cost tracker with execution cost adjustment
* Review suggestion - using enum instead of struct for CommitTransactionDetails
Co-authored-by: Justin Starry <justin.m.starry@gmail.com>
* review - rename variable to distinguish accumulated_consumed_units from individual compute_units_consumed
* not to use signed integer operations
* Review - using saturating_add_assign!(), and checked_*().unwrap_or()
* Review - using Ordering enum to cmp
* replace checked_ with saturating_
* review - remove unnecessary Option<>
* Review - add function to report number of non-zero units account to metrics
Most nodes in the cluster receive the same shred from two different
nodes: parent, and the first node of their neighborhood:
https://github.com/solana-labs/solana/blob/a8c695ba5/core/src/cluster_nodes.rs#L178-L197
Because of the erasure codings, half of the shreds are already
redundant. So this redundant propagation path will only add extra
overhead.
Additionally the very first node of the broadcast tree has 2x fanout
(i.e. 400 nodes) which adds too much load at one node.
This commit simplifies the broadcast tree by dropping the redundant
propagation path and removing the 2x fanout at root node.
- Add write-lock-contention option, replacing same_payer
- write-lock-contention also has a same-batch-only value, where
contention happens only inside batches, not between them
- Rename num-threads to batches-per-iteration, which is closer to what
it is actually doing.
- Add num-banking-threads as a new option
- Rename packets-per-chunk to packets-per-batch, because this is closer
to what's happening; and it was previously confusing that num-chunks
had little to do with packets-per-chunk.
Example output for a iterations=100 and a permutation of inputs:
contention,threads,batchsize,batchcount,tps
none, 3,192, 4,65290.30
none, 4,192, 4,77358.06
none, 5,192, 4,86436.65
none, 3, 12,64,43944.57
none, 4, 12,64,65852.15
none, 5, 12,64,70674.37
same-batch-only,3,192, 4,3928.21
same-batch-only,4,192, 4,6460.15
same-batch-only,5,192, 4,7242.85
same-batch-only,3, 12,64,11377.58
same-batch-only,4, 12,64,19582.79
same-batch-only,5, 12,64,24648.45
full, 3,192, 4,3914.26
full, 4,192, 4,2102.99
full, 5,192, 4,3041.87
full, 3, 12,64,11316.17
full, 4, 12,64,2224.99
full, 5, 12,64,5240.32
1. use was_executed to correctly identify transactions requires cost adjustment;
2. add function to specifically handle executino cost adjustment without have to copy accounts
* Increase connection timeouts
* Bump quic connection cache to 1024
* Use constant for quic connection timeout and add warm cache service
* Fixes to QUIC warmup service
* fix check failure
* fixes after rebase
* fix timeout test
Co-authored-by: Pankaj Garg <pankaj@solana.com>
Benchmarks show roughly a 6% improvement. The impact could be more
significant when transactions need to be retried a lot.
after patch:
{'name': 'banking_bench_total', 'median': '72767.43'}
{'name': 'banking_bench_tx_total', 'median': '80240.38'}
{'name': 'banking_bench_success_tx_total', 'median': '72767.43'}
test bench_banking_stage_multi_accounts
... bench: 6,137,264 ns/iter (+/- 1,364,111)
test bench_banking_stage_multi_programs
... bench: 10,086,435 ns/iter (+/- 2,921,440)
before patch:
{'name': 'banking_bench_total', 'median': '68572.26'}
{'name': 'banking_bench_tx_total', 'median': '75704.75'}
{'name': 'banking_bench_success_tx_total', 'median': '68572.26'}
test bench_banking_stage_multi_accounts
... bench: 6,521,007 ns/iter (+/- 1,926,741)
test bench_banking_stage_multi_programs
... bench: 10,526,433 ns/iter (+/- 2,736,530)
- don't store pending tx signatures and costs in CostTracker
- apply tx costs to global state immediately again
- go from commit_or_cancel to update_or_remove, where the cost tracker
is either updated with the true costs for successful tx, or the costs
of a retryable tx is removed
- move the function into qos_service and hold the cost tracker lock for
the whole loop
* panic when test timeout
* nonblocking send when when droping banks
* debug log
* timeout for tvu
* unused varaible
* timeout for tpu
* Revert "debug log"
This reverts commit da780a3301a51d7c496141a85fcd35014fe6dff5.
* add timeout const
* fix typo
* Revert "nonblocking send when when droping banks".
I will create another pull request for this.
This reverts commit 088c98ec0facf825b5eca058fb860deba6d28888.
* Update core/src/tpu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tpu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tvu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tvu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/validator.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
test_skip_repair in retransmit-stage is no longer relevant because
following: https://github.com/solana-labs/solana/pull/19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
https://github.com/solana-labs/solana/pull/24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.
#### Summary of Changes
This PR enables RocksDB read side performance metrics to report to blockstore_rocksdb_read_perf.
The sampling rate is controlled by an env arg `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
specifies the number of perf samples for every 1000 operations. The default value is set to 10, meaning
we will report 10 out of 1000 (or 1/100) reads.
The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
It includes many useful metrics including block read time, cache hit rate, and time spent on decompressing the block.
* initial work for poh timing report service
* add poh_timing_report_service to validator
* fix comments
* clippy
* imrove test coverage
* delete record when complete
* rename shred full to slot full.
* debug logging
* fix slot full
* remove debug comments
* adding fmt trait
* derive default
* default for poh timing reporter
* better comments
* remove commented code
* fix test
* more test fixes
* delete timestamps for slot that are older than root_slot
* debug log
* record poh start end in bank reset
* report full to start time instead
* fix poh slot offset
* report poh start for normal ticks
* fix typo
* refactor out poh point report fn
* rename
* optimize delete - delete only when last_root changed
* change log level to trace
* convert if to match
* remove redudant check
* fix SlotPohTiming comments
* review feedback on poh timing reporter
* review feedback on poh_recorder
* add test case for out-of-order arrival of timing points and incomplete timing points
* refactor poh_timing_points into its own mod
* remove option for poh_timing_report service
* move poh_timing_point_sender to constructor
* clippy
* better comments
* more clippy
* more clippy
* add slot poh timing point macro
* clippy
* assert in test
* comments and display fmt
* fix check
* assert format
* revise comments
* refactor
* extrac send fn
* revert reporting_poh_timing_point
* align loggin
* small refactor
* move type declaration to the top of the module
* replace macro with constructor
* clippy: remove redundant closure
* review comments
* simplify poh timing point creation
Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>