The num_repair field is only blockstore insertion metric being updated
outside of Blockstore::insert() call chain; move the update to insert()
with the rest of the fields in BlockstoreInsertionMetrics struct.
* Add dump_node to update stake for heaviest subtrees
Additionally refactor subtrees to store children as a hashset
* Add a more complicated forks test
* chose -> choose
* remove is_dumped flag and reuse latest_invalid_ancestor instead
* Update cost model to use requested_cu instead of estimated cu #27608
* remove CostUpdate and CostModel from replay/tvu
* revive cost update service to send cost tracker stats
* CostModel is now static
* remove unused package
Co-authored-by: Tao Zhu <tao@solana.com>
* Move ConnectionCache back to solana-client, and duplicate ThinClient, TpuClient there
* Dedupe thin_client modules
* Dedupe tpu_client modules
* Move TpuClient to TpuConnectionCache
* Move ThinClient to TpuConnectionCache
* Move TpuConnection and quic/udp trait implementations back to solana-client
* Remove enum_dispatch from solana-tpu-client
* Move udp-client to its own crate
* Move quic-client to its own crate
In the quic server handle_connection, when we timed out in receiving the chunks, we loop forever to wait for the chunk. If the client never provide another chunk, the server can hopelessly wait for that chunk and wasting server resources. Instead WAIT_FOR_CHUNK_TIMEOUT_MS is introduced to bound this to 10 seconds at maximum. The stream will be dropped if it times out.
The manual Blockstore compaction that was being initiated from
LedgerCleanupService has been disabled for quite some time in favor of
several optimizations.
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
* Split out voting and banking threads in banking stage
Additionally this allows us to aggressively prune the buffer for voting threads
as with the new vote state only the latest vote from each validator is
necessary.
* Update local cluster test to use new Vote ix
* Encapsulate transaction storage filtering better
* Address pr comments
* Commit cargo lock change
* clippy
* Remove unsafe impls
* pr comments
* compute_sanitized_transaction -> build_sanitized_transaction
* &Arc -> Arc
* Move test
* Refactor metrics enums
* clippy
https://github.com/solana-labs/solana/pull/27193
added hash domain to ping-pong protocol.
For backward compatibility responses both with and without domain were
generated and accepted.
Now that all clusters are upgraded, this commit enforces the hash domain
by removing the response without the domain.
We have transactions counted in replay-slot-end-to-end-stats, but that
metric is broken down to report things per thread.
So, report total_transactions for the entire slot (all threads) in
replay-slot-stats.
#### Summary of Changes
Removes the constant default for ShredStorageType::RocksFifo
as the shred storage size is either user-specified or derived
from --limit-ledger-size in #27459.
- Batch filtering invalid transactions (fail to sanitize, too old or already processed) before forwarding
- Combine packet filtering and forwarding to share sanitized transactions
- `iter_desc` is no longer needed, remove it;
- Add a method to share the logic of removing packets from buffer after they were removed from MinMaxHeap
- Add test coverage for forward_packet_batches_by_accounts
- rebase, resolve conflicts
Separate storage for voting and transaction threads:
- Voting threads utilize a shared reference in order to dedup extraneous
votes
- Transactions have thread local storage like before
* Add structure to collect and coalesce vote packets
Will be used in banking stage to throw out extraneous vote packets
before processing
* pr comments
* Update inner lock to arc to improve performance
--tpu-enable-udp is introduced. And when this is on, the transaction receive and transaction forward is enabled using udp.
Except for a few tests which was hard-coded sending transactions using udp, most tests are being done with udp based tpu disabled.
* Plumb priority_fee_cache into rpc
* Add PrioritizationFeeCache api
* Add getRecentPrioritizationFees rpc endpoint
* Use MAX_TX_ACCOUNT_LOCKS to limit input keys
* Remove unused cache apis
* Map fee data by slot, and make rpc account inputs optional
* Add priority_fee_cache to rpc test framework, and add test
* Add endpoint to jsonrpc docs
* Update docs/src/developing/clients/jsonrpc-api.md
* Update docs/src/developing/clients/jsonrpc-api.md
In kin-sim, we found that bounded channel causes halt for account
background services. As the number of accounts grows, the time for
pruning and cleaning increases, which would leads to longer intervals
between the pruning of deaded bank slots. With 1.7B accounts, we will
exceed the 10K bounded channel threshold that causes halt of account
back ground services. Without pruning, the node will eventually run out
of memory.
* Check overflow on vote tx compaction boundary
Check for overflow during the conversion between VoteStateUpdate and
CompactVoteStateUpdate.
* Try removing clippy supress
Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use Camel case. It's more character dense than Snake or Kebab case
* Create a new function cleanup_accounts_paths, a trivial change
* Remove account files asynchronously
* Update and simplify the implementation after the validator test runs.
* Fixes after testing on the dev device
* Discard tokio. Use thread instead
* Fix comments format
* Fix config type to pass the github test
* Fix failed tests. Handle the case of non-existing path
* Final cleanup, addressing the review comments
Avoided OsString.
Made the function more generic with "impl AsRef<Path>"
Co-authored-by: Jeff Washington <jeff.washington@solana.com>
In order to maintain backward compatibility, for now the responding node
will hash the token both with and without domain so that the other node
will accept the response regardless of its upgrade status.
Once the cluster has upgraded to the new code, we will remove the legacy
domain = false case.
A change included in
https://github.com/solana-labs/solana/pull/20480
was that when the root node in turbine broadcast tree is down, the
leader will broadcast the shred to all nodes in the first layer.
The intention was to mitigate the impact of dead nodes on shreds
propagation, because if the root node is down, then the entire cluster
will miss out the shred.
On the other hand, if x% of stake is down, this will cause 200*x% + 1
packets/shreds ratio at the broadcast stage which might contribute to
line-rate saturation and packet drop.
To avoid this bandwidth saturation issue, this commit reverts that logic
and always broadcasts shreds from the leader only to the root node.
As before we rely on erasure codes to recover shreds lost due to staked
nodes being offline.
Given the 32:32 erasure recovery schema, current implementation requires
exactly 32 data shreds to generate coding shreds for the batch (except
for the final erasure batch in each slot).
As a result, when serializing ledger entries to data shreds, if the
number of data shreds is not a multiple of 32, the coding shreds for the
last batch cannot be generated until there are more data shreds to
complete the batch to 32 data shreds. This adds latency in generating
and broadcasting coding shreds.
In addition, with Merkle variants for shreds, data shreds cannot be
signed and broadcasted until coding shreds are also generated. As a
result *both* code and data shreds will be delayed before broadcast if
we still require exactly 32 data shreds for each batch.
This commit instead always generates and broadcast coding shreds as soon
as there any number of data shreds available. When serializing entries
to shreds:
* if the number of resulting data shreds is less than 32, then more
coding shreds will be generated so that the resulting erasure batch
has the same recovery probabilities as a 32:32 batch.
* if the number of data shreds is more than 32, then the data shreds are
split uniformly into erasure batches with _at least_ 32 data shreds in
each batch. Each erasure batch will have the same number of code and
data shreds.
For example:
* If there are 19 data shreds, 27 coding shreds are generated. The
resulting 19(data):27(code) erasure batch has the same recovery
probabilities as a 32:32 batch.
* If there are 107 data shreds, they are split into 3 batches of 36:36,
36:36 and 35:35 data:code shreds each.
A consequence of this change is that code and data shreds indices will
no longer align as there will be more coding shreds than data shreds
(not only in the last batch in each slot but also in the intermediate
ones);
This change sets the receive_window for non-staked node to 1 * PACKET_DATA_SIZE, and maps the staked nodes's connection's receive_window between 1.2 * PACKET_DATA_SIZE to 10 * PACKET_DATA_SIZE based on the stakes.
The changes is based on Quinn library change to support per connection receive_window tweak at the server side. quinn-rs/quinn#1393
#### Problem
LedgerCleanupService requires compactions to propagate & digest range-delete tombstones
to eventually reclaim disk space.
#### Summary of Changes
This PR makes LedgerCleanupService::cleanup_ledger delete any file whose slot-range is
older than the lowest_cleanup_slot. This allows us to reclaim disk space more often with
fewer IOps. Experimental results on mainnet validators show that the PR can effectively
reduce 33% to 40% ledger disk size.
* Keypair: implement clone()
This was not implemented upstream in ed25519-dalek to force everyone to
think twice before creating another copy of a potentially sensitive
private key in memory.
See https://github.com/dalek-cryptography/ed25519-dalek/issues/76
However, there are now 9 instances of
Keypair::from_bytes(&keypair.to_bytes())
in the solana codebase and it would be preferable to have a function.
In particular since this also comes up when writing programs and can
cause users to either start messing with lifetimes or discover the
from_bytes() workaround themselves.
This patch opts to not implement the Clone trait. This avoids automatic
use in order to preserve some of the original "let developers think
twice about this" intention.
* Use Keypair::clone
Prior to this change, long running commands like `solana-ledger-tool
verify` would OOM due to AccountsDb cleanup not happening.
Co-authored-by: Michael Vines <mvines@gmail.com>
* Concurrent replay slots
* Split out concurrent and single bank replay paths
* Sub function processing of replay results for readability
* Add feature switch for concurrent replay
* Only take the latest vote for each validator in gossip
Since the new vote updates are no longer incremental, there
is no value in storing intermediate votes.
* Address pr feedback
* Handle potential downgrade path, FullTowerVote -> Incremental
* Rename sent to bank -> gossip slot
* Handle downgrade case properly
* Only downgrade for newer votes and feature flag, ignore incremental votes otherwise
* Update test
* Use client certs in QUIC to get peer's stake
* fixes to cert processing
* integrate the code
* clippy
* more cleanup
* sort cargo deps
* test fixes
* info -> debug
Shreds have different workload and traffic pattern from TPU vote and
transaction packets. Some of recent changes to SigVerifyStage are not
suitable or at least optimal for shreds sig-verify; e.g. random discard,
dedup with false positives, discard excess by IP-address, ...
SigVerifier trait is meant to abstract out the distinctions between the
two pipelines, but in practice it has led to more verbose and convoluted
code.
This commit discards SigVerifier implementation for shreds sig-verify
and instead provides a standalone stage for verifying shreds signatures.
With recent patches, window-service recv-window does not do much other
than redirecting packets/shreds to downstream channels.
The commit removes window-service recv-window and instead sends
packets/shreds directly from sigverify to retransmit-stage and
window-service insert thread.
- Forward packets by prioritization in desc order
- Add support of cost-tracking by transaction requested compute units
- Hook up account buckets to forwarder
- Add metrics for forwardable batches count
- Remove redundant invalid packets filtering at end of slot since forwarder will do the same when batch forwardable packets
- Add bench test for forwarding
* Remove UseQuic type
Move to storing the UdpSocket on ConnectionCache and accepting a bool
* Remove use_quic from ConnectionCache constructor
Replace with separate with_udp constructor to force callers to choose
Fully deserializing shreds in window-service before sending them to
retransmit stage adds latency to shreds propagation.
This commit instead channels through the payload and relies on only
partial deserialization of a few required fields: slot, shred-index,
shred-type.
* allow initial hash calc to occur in bg
* validator_initialized -> startup_verification_complete
* add infos for leader and vote
* rework snapshot for startup verification
* change to assert
slot_stats are submitted at a different cadence from the rest of
RetransmitStats. Current code erroneously clears slot_stats before
submitting any metrics.
Shred slot and parent are not verified until window-service where
resources are already wasted to sig-verify and deserialize shreds.
This commit moves above verification to earlier in the pipeline in fetch
stage.
Shreds are dropped in window-service if the slot leader is the node
itself:
https://github.com/solana-labs/solana/blob/cd2878acf/core/src/window_service.rs#L181-L185
However this is done after wasting resources verifying signature on
these shreds, and requires a redundant 2nd lookup of the slot leader.
This commit instead discards such shreds in sigverify stage where we
already know the leader for the slot.
Following commits will skip shreds deserializaton before retransmit, and
so we will only have a ShredId and not a fully deserialized shred to
obtain the shuffling seed from.
* Make sure to root local slots even with hard fork
* Address review comments
* Cleanup a bit
* Further clean up
* Further clean up a bit
* Add comment
* Tweak hard fork reconciliation code placement
In order to preserve current behavior, the threshold is set to the
current value of the argument to IndexedParallelIterator::with_min_len.
Follow up commits will recalibrate this threshold to optimize
performance on mainnet-beta.
Shreds arriving at a node for retransmit tend to belong to the same slot
(or a just a couple of different slots). Slot leader and cluster nodes
are common for the shreds of the same slot, and so the common work to
look up these values can be factored out.
This commit first group-bys shreds by slot to factor out that common
lookup work.
* Define shuffle to prep using same shuffle for multiple slices
* Determine transaction indexes and plumb to execute_batch
* Pair transaction_index with transaction in TransactionStatusService
* Add new ReplicaTransactionInfoVersion
* Plumb transaction_indexes through BankingStage
* Prepare BankingStage to receive transaction indexes from PohRecorder
* Determine transaction indexes in PohRecorder; add field to WorkingBank
* Add PohRecorder::record unit test
* Only pass starting_transaction_index around PohRecorder
* Add helper structs to simplify test DashMap
* Pass entry and starting-index into process_entries_with_callback together
* Add tx-index checks to test_rebatch_transactions
* Revert shuffle definition and use zip/unzip
* Only zip/unzip if randomize
* Add confirm_slot_entries test
* Review nits
* Add type alias to make sender docs more clear
In prepration of
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
replace and remove entries_to_data_shreds from the public interface.
Shred versions are not verified until window-service where resources are
already wasted to sig-verify and deserialize shreds.
The commit verifies shred-version earlier in the pipeline in fetch stage.
RetransmitSlotStats can already be utilized to track when the first
shred for a slot was received; therefore
first_shreds_received: &Mutex<BTreeSet<Slot>>
is redundant. Sending update notifications after shreds retransmit will
also bypass the need for a mutex.
* To include forwarding counters in leader slot metrics
* Capture slot_end_detected time when checking leader slots, to be used in reporting later
* Simplify banking stage loop to report leader slot metrics
Co-authored-by: carllin <carl@solana.com>
* Connection pool in connection cache and handle connection errors
1. The connection not has a pool of connections per address, configurable, default 4
2. The connections per address share a lazy initialized endpoint
3. Handle connection issues better, avoid race conditions
4. Various log improvement for help debug connection issues
#### Problem
blockstore clean and compact is quite slow with wait-for-supermajority purge and can take 20-30 minutes
as described in #25710.
#### Summary of Changes
This PR removes the compaction logic in backup_and_clear_blockstore as the
actual the restoration from a bad fork is handled by `blockstore.purge_slots`
(which is done by issuing rocksdb range-delete that makes the bad fork
unavailable.)
Compaction is irreverent to the shred version, as its main job in this context
is to reclaim disk storage from the deleted slots, which we can let the rocksdb
automatic background compaction to handle it.
Fixes#25710
* client: Remove static connection cache, plumb it instead
* Add TpuClient::new_with_connection_cache to not break downstream
* Refactor get_connection and RwLock into ConnectionCache
* Fix merge conflicts from new async TpuClient
* Remove `ConnectionCache::set_use_quic`
* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
Add in some CPU utilization metrics such as: number of vCPUs, clock frequency, average load across different time intervals, and number of total threads
* Remove the args param from Measure::this since we don't ever use it
* banking_stage.rs: convert to measure!
* poh_recorder.rs: convert to measure!
* cost_update_service.rs: convert to measure!
* poh_service.rs: convert to measure!
* bank.rs: convert to measure!
* measure.rs: Remove Measure::this now that all have been converted to measure!
Packets are at the boundary of the system where, vast majority of the
time, they are received from an untrusted source. Raw indexing into the
data buffer can open attack vectors if the offsets are invalid.
Validating offsets beforehand is verbose and error prone.
The commit updates Packet::data() api to take a SliceIndex and always to
return an Option. The call-sites are so forced to explicitly handle the
case where the offsets are invalid.
* Spawn QUIC server to receive forwarded txs
* Update validator port range
* forward votes using UDP
* no forwarding from unstaked nodes
* forwarding stats in banking stage
* fix test builds
* fix lifetime of forward sender
It used to report the number of packets with successful signature
validations but was accidentally changed to count packets passed into
the verifier by e4409a87fe.
This restores the previous meaning.
#### Problem
blockstore_db.rs has a mutual dependency between blockstore_metrics.rs.
#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.
By doing this, we address the mutual dependency and also make the code cleaner.
* FindPacketSenderStake: Remove parallelism to improve performance
The work unit sizes were so small that using the thread pool
slowed down this stage significantly.
* fix checks
Co-authored-by: Justin Starry <justin@solana.com>
Indices for code and data shreds of the same slot overlap; and so they
will have the same random number generator seed when shuffling cluster
nodes for turbine broadcast.
This results in the same propagation path for code and data shreds of
the same index and effectively smaller sample size for re-transmitter
nodes. For example a 32:32 batch (32 code + 32 data shreds), is
retransmitted through _at most_ 32 unique nodes, whereas ideally we want
~64 unique re-transmitters.
This commit adds shred-type to seed function so that code and data
sherds of the same (slot, index) will (most likely) have different
propagation paths.
Bytes past Packet.meta.size are not valid to read from.
The commit makes the buffer field private and instead provides two
methods:
* Packet::data() which returns an immutable reference to the underlying
buffer up to Packet.meta.size. The rest of the buffer is not valid to
read from.
* Packet::buffer_mut() which returns a mutable reference to the entirety
of the underlying buffer to write into. The caller is responsible to
update Packet.meta.size after writing to the buffer.
Upcoming changes to PacketBatch to support variable sized packets will
modify the internals of PacketBatch. So, this change removes usage of
the internal packet struct and instead uses accessors (which are
currently just wrappers of Vector functions but will change down the
road).
* - get prioritization fee from compute_budget instruction;
- update compute_budget::process_instruction function to take instruction iter to support sanitized versioned message;
- updated runtime.md
* update transaction fee calculation for prioritization fee rate as lamports per 10K CUs
* review changes
* fix test
* fix a bpf test
* fix bpf test
* patch feedback
* fix clippy
* fix bpf test
* feedback
* rename prioritization fee rate to compute unit price
* feedback
Co-authored-by: Justin Starry <justin@solana.com>
A VoteAccount may only wrap an account if the account owner is
solana_vote_program:id or equivalently this check returns true:
solana_vote_program::check_id(account.owner())
In addition to thread_local -> lazy_static change, a number of thread-pools are
initialized with get_max_thread_count to achieve parity with the older code in
terms of number of validator threads.
* SigVerify: Add total time metrics for dedup/discard/verify
Previously it was impossible to determine the total time the stage spent
on these activities within a measurement window.
* SigVerify: Add _us postfix to time metrics
Shred::new_empty_data_shred returns an invalid shred (i.e.
shred.sanitize() returns error). The method is only used in tests and
can be easily replaced with Shred::new_from_data. To keep the shred api
surface small, this commit removes this method.
The commit makes values in stake_delegations map in Stakes struct
generic. Stakes<Delegation> is equivalent to the old code and is used
for backward compatibility in BankFieldsTo{Serialize,Deserialize}.
But banks cache Stakes<StakeAccount> which includes the entire stake
account and StakeState deserialized from account. Doing so, will remove
the need to load stake account from accounts-db when working with
stake-delegations.
* Pass the sum of consumed compute units to cost_tracker
* cost model tracks builtins and bpf programs separately, enabling adjust block cost by actual bpf programs execution costs
* Copied nightly-only experimental `checked_add_(un)signed` implementation to sdk
* Add function to update cost tracker with execution cost adjustment
* Review suggestion - using enum instead of struct for CommitTransactionDetails
Co-authored-by: Justin Starry <justin.m.starry@gmail.com>
* review - rename variable to distinguish accumulated_consumed_units from individual compute_units_consumed
* not to use signed integer operations
* Review - using saturating_add_assign!(), and checked_*().unwrap_or()
* Review - using Ordering enum to cmp
* replace checked_ with saturating_
* review - remove unnecessary Option<>
* Review - add function to report number of non-zero units account to metrics
Most nodes in the cluster receive the same shred from two different
nodes: parent, and the first node of their neighborhood:
https://github.com/solana-labs/solana/blob/a8c695ba5/core/src/cluster_nodes.rs#L178-L197
Because of the erasure codings, half of the shreds are already
redundant. So this redundant propagation path will only add extra
overhead.
Additionally the very first node of the broadcast tree has 2x fanout
(i.e. 400 nodes) which adds too much load at one node.
This commit simplifies the broadcast tree by dropping the redundant
propagation path and removing the 2x fanout at root node.
- Add write-lock-contention option, replacing same_payer
- write-lock-contention also has a same-batch-only value, where
contention happens only inside batches, not between them
- Rename num-threads to batches-per-iteration, which is closer to what
it is actually doing.
- Add num-banking-threads as a new option
- Rename packets-per-chunk to packets-per-batch, because this is closer
to what's happening; and it was previously confusing that num-chunks
had little to do with packets-per-chunk.
Example output for a iterations=100 and a permutation of inputs:
contention,threads,batchsize,batchcount,tps
none, 3,192, 4,65290.30
none, 4,192, 4,77358.06
none, 5,192, 4,86436.65
none, 3, 12,64,43944.57
none, 4, 12,64,65852.15
none, 5, 12,64,70674.37
same-batch-only,3,192, 4,3928.21
same-batch-only,4,192, 4,6460.15
same-batch-only,5,192, 4,7242.85
same-batch-only,3, 12,64,11377.58
same-batch-only,4, 12,64,19582.79
same-batch-only,5, 12,64,24648.45
full, 3,192, 4,3914.26
full, 4,192, 4,2102.99
full, 5,192, 4,3041.87
full, 3, 12,64,11316.17
full, 4, 12,64,2224.99
full, 5, 12,64,5240.32
1. use was_executed to correctly identify transactions requires cost adjustment;
2. add function to specifically handle executino cost adjustment without have to copy accounts
* Increase connection timeouts
* Bump quic connection cache to 1024
* Use constant for quic connection timeout and add warm cache service
* Fixes to QUIC warmup service
* fix check failure
* fixes after rebase
* fix timeout test
Co-authored-by: Pankaj Garg <pankaj@solana.com>
Benchmarks show roughly a 6% improvement. The impact could be more
significant when transactions need to be retried a lot.
after patch:
{'name': 'banking_bench_total', 'median': '72767.43'}
{'name': 'banking_bench_tx_total', 'median': '80240.38'}
{'name': 'banking_bench_success_tx_total', 'median': '72767.43'}
test bench_banking_stage_multi_accounts
... bench: 6,137,264 ns/iter (+/- 1,364,111)
test bench_banking_stage_multi_programs
... bench: 10,086,435 ns/iter (+/- 2,921,440)
before patch:
{'name': 'banking_bench_total', 'median': '68572.26'}
{'name': 'banking_bench_tx_total', 'median': '75704.75'}
{'name': 'banking_bench_success_tx_total', 'median': '68572.26'}
test bench_banking_stage_multi_accounts
... bench: 6,521,007 ns/iter (+/- 1,926,741)
test bench_banking_stage_multi_programs
... bench: 10,526,433 ns/iter (+/- 2,736,530)
- don't store pending tx signatures and costs in CostTracker
- apply tx costs to global state immediately again
- go from commit_or_cancel to update_or_remove, where the cost tracker
is either updated with the true costs for successful tx, or the costs
of a retryable tx is removed
- move the function into qos_service and hold the cost tracker lock for
the whole loop
* panic when test timeout
* nonblocking send when when droping banks
* debug log
* timeout for tvu
* unused varaible
* timeout for tpu
* Revert "debug log"
This reverts commit da780a3301a51d7c496141a85fcd35014fe6dff5.
* add timeout const
* fix typo
* Revert "nonblocking send when when droping banks".
I will create another pull request for this.
This reverts commit 088c98ec0facf825b5eca058fb860deba6d28888.
* Update core/src/tpu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tpu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tvu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tvu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/validator.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
test_skip_repair in retransmit-stage is no longer relevant because
following: https://github.com/solana-labs/solana/pull/19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
https://github.com/solana-labs/solana/pull/24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.
#### Summary of Changes
This PR enables RocksDB read side performance metrics to report to blockstore_rocksdb_read_perf.
The sampling rate is controlled by an env arg `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
specifies the number of perf samples for every 1000 operations. The default value is set to 10, meaning
we will report 10 out of 1000 (or 1/100) reads.
The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
It includes many useful metrics including block read time, cache hit rate, and time spent on decompressing the block.
* initial work for poh timing report service
* add poh_timing_report_service to validator
* fix comments
* clippy
* imrove test coverage
* delete record when complete
* rename shred full to slot full.
* debug logging
* fix slot full
* remove debug comments
* adding fmt trait
* derive default
* default for poh timing reporter
* better comments
* remove commented code
* fix test
* more test fixes
* delete timestamps for slot that are older than root_slot
* debug log
* record poh start end in bank reset
* report full to start time instead
* fix poh slot offset
* report poh start for normal ticks
* fix typo
* refactor out poh point report fn
* rename
* optimize delete - delete only when last_root changed
* change log level to trace
* convert if to match
* remove redudant check
* fix SlotPohTiming comments
* review feedback on poh timing reporter
* review feedback on poh_recorder
* add test case for out-of-order arrival of timing points and incomplete timing points
* refactor poh_timing_points into its own mod
* remove option for poh_timing_report service
* move poh_timing_point_sender to constructor
* clippy
* better comments
* more clippy
* more clippy
* add slot poh timing point macro
* clippy
* assert in test
* comments and display fmt
* fix check
* assert format
* revise comments
* refactor
* extrac send fn
* revert reporting_poh_timing_point
* align loggin
* small refactor
* move type declaration to the top of the module
* replace macro with constructor
* clippy: remove redundant closure
* review comments
* simplify poh timing point creation
Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
* run validator_exit_test sequentially
* limit validator exit run to its own serial run subset
add 10ms delay in the validator exit tests
* fix intermittent validator exit failure
* no sleep
* undo the code move
* Revert "core: partial versioned transaction support for voting service"
This reverts commit eb3df4c20e.
* Manually serialize vote tx before sending to TPU
* transaction-status: Add return data to meta
* Add return data to simulation results
* Use pretty-hex for printing return data
* Update arg name, make TransactionRecord struct
* Rename TransactionRecord -> ExecutionRecord