* Make sure to root local slots even with hard fork
* Address review comments
* Cleanup a bit
* Further clean up
* Further clean up a bit
* Add comment
* Tweak hard fork reconciliation code placement
In order to preserve current behavior, the threshold is set to the
current value of the argument to IndexedParallelIterator::with_min_len.
Follow up commits will recalibrate this threshold to optimize
performance on mainnet-beta.
Shreds arriving at a node for retransmit tend to belong to the same slot
(or a just a couple of different slots). Slot leader and cluster nodes
are common for the shreds of the same slot, and so the common work to
look up these values can be factored out.
This commit first group-bys shreds by slot to factor out that common
lookup work.
* Define shuffle to prep using same shuffle for multiple slices
* Determine transaction indexes and plumb to execute_batch
* Pair transaction_index with transaction in TransactionStatusService
* Add new ReplicaTransactionInfoVersion
* Plumb transaction_indexes through BankingStage
* Prepare BankingStage to receive transaction indexes from PohRecorder
* Determine transaction indexes in PohRecorder; add field to WorkingBank
* Add PohRecorder::record unit test
* Only pass starting_transaction_index around PohRecorder
* Add helper structs to simplify test DashMap
* Pass entry and starting-index into process_entries_with_callback together
* Add tx-index checks to test_rebatch_transactions
* Revert shuffle definition and use zip/unzip
* Only zip/unzip if randomize
* Add confirm_slot_entries test
* Review nits
* Add type alias to make sender docs more clear
In prepration of
https://github.com/solana-labs/solana/pull/25807
which reworks erasure batch sizes, this commit:
* adds a helper function mapping the number of data shreds to the
erasure batch size.
* adds ProcessShredsStats to Shredder::entries_to_shreds in order to
replace and remove entries_to_data_shreds from the public interface.
Shred versions are not verified until window-service where resources are
already wasted to sig-verify and deserialize shreds.
The commit verifies shred-version earlier in the pipeline in fetch stage.
RetransmitSlotStats can already be utilized to track when the first
shred for a slot was received; therefore
first_shreds_received: &Mutex<BTreeSet<Slot>>
is redundant. Sending update notifications after shreds retransmit will
also bypass the need for a mutex.
* To include forwarding counters in leader slot metrics
* Capture slot_end_detected time when checking leader slots, to be used in reporting later
* Simplify banking stage loop to report leader slot metrics
Co-authored-by: carllin <carl@solana.com>
* Connection pool in connection cache and handle connection errors
1. The connection not has a pool of connections per address, configurable, default 4
2. The connections per address share a lazy initialized endpoint
3. Handle connection issues better, avoid race conditions
4. Various log improvement for help debug connection issues
#### Problem
blockstore clean and compact is quite slow with wait-for-supermajority purge and can take 20-30 minutes
as described in #25710.
#### Summary of Changes
This PR removes the compaction logic in backup_and_clear_blockstore as the
actual the restoration from a bad fork is handled by `blockstore.purge_slots`
(which is done by issuing rocksdb range-delete that makes the bad fork
unavailable.)
Compaction is irreverent to the shred version, as its main job in this context
is to reclaim disk storage from the deleted slots, which we can let the rocksdb
automatic background compaction to handle it.
Fixes#25710
* client: Remove static connection cache, plumb it instead
* Add TpuClient::new_with_connection_cache to not break downstream
* Refactor get_connection and RwLock into ConnectionCache
* Fix merge conflicts from new async TpuClient
* Remove `ConnectionCache::set_use_quic`
* Move DEFAULT_TPU_USE_QUIC to client, use ConnectionCache::default()
Add in some CPU utilization metrics such as: number of vCPUs, clock frequency, average load across different time intervals, and number of total threads
* Remove the args param from Measure::this since we don't ever use it
* banking_stage.rs: convert to measure!
* poh_recorder.rs: convert to measure!
* cost_update_service.rs: convert to measure!
* poh_service.rs: convert to measure!
* bank.rs: convert to measure!
* measure.rs: Remove Measure::this now that all have been converted to measure!
Packets are at the boundary of the system where, vast majority of the
time, they are received from an untrusted source. Raw indexing into the
data buffer can open attack vectors if the offsets are invalid.
Validating offsets beforehand is verbose and error prone.
The commit updates Packet::data() api to take a SliceIndex and always to
return an Option. The call-sites are so forced to explicitly handle the
case where the offsets are invalid.
* Spawn QUIC server to receive forwarded txs
* Update validator port range
* forward votes using UDP
* no forwarding from unstaked nodes
* forwarding stats in banking stage
* fix test builds
* fix lifetime of forward sender
It used to report the number of packets with successful signature
validations but was accidentally changed to count packets passed into
the verifier by e4409a87fe.
This restores the previous meaning.
#### Problem
blockstore_db.rs has a mutual dependency between blockstore_metrics.rs.
#### Summary of Changes
This PR removes the mutual dependency by moving the option-related stuff
out from blockstore_db.rs to its new home --- blockstore_options.rs.
By doing this, we address the mutual dependency and also make the code cleaner.
* FindPacketSenderStake: Remove parallelism to improve performance
The work unit sizes were so small that using the thread pool
slowed down this stage significantly.
* fix checks
Co-authored-by: Justin Starry <justin@solana.com>
Indices for code and data shreds of the same slot overlap; and so they
will have the same random number generator seed when shuffling cluster
nodes for turbine broadcast.
This results in the same propagation path for code and data shreds of
the same index and effectively smaller sample size for re-transmitter
nodes. For example a 32:32 batch (32 code + 32 data shreds), is
retransmitted through _at most_ 32 unique nodes, whereas ideally we want
~64 unique re-transmitters.
This commit adds shred-type to seed function so that code and data
sherds of the same (slot, index) will (most likely) have different
propagation paths.
Bytes past Packet.meta.size are not valid to read from.
The commit makes the buffer field private and instead provides two
methods:
* Packet::data() which returns an immutable reference to the underlying
buffer up to Packet.meta.size. The rest of the buffer is not valid to
read from.
* Packet::buffer_mut() which returns a mutable reference to the entirety
of the underlying buffer to write into. The caller is responsible to
update Packet.meta.size after writing to the buffer.
Upcoming changes to PacketBatch to support variable sized packets will
modify the internals of PacketBatch. So, this change removes usage of
the internal packet struct and instead uses accessors (which are
currently just wrappers of Vector functions but will change down the
road).
* - get prioritization fee from compute_budget instruction;
- update compute_budget::process_instruction function to take instruction iter to support sanitized versioned message;
- updated runtime.md
* update transaction fee calculation for prioritization fee rate as lamports per 10K CUs
* review changes
* fix test
* fix a bpf test
* fix bpf test
* patch feedback
* fix clippy
* fix bpf test
* feedback
* rename prioritization fee rate to compute unit price
* feedback
Co-authored-by: Justin Starry <justin@solana.com>
A VoteAccount may only wrap an account if the account owner is
solana_vote_program:id or equivalently this check returns true:
solana_vote_program::check_id(account.owner())
In addition to thread_local -> lazy_static change, a number of thread-pools are
initialized with get_max_thread_count to achieve parity with the older code in
terms of number of validator threads.
* SigVerify: Add total time metrics for dedup/discard/verify
Previously it was impossible to determine the total time the stage spent
on these activities within a measurement window.
* SigVerify: Add _us postfix to time metrics
Shred::new_empty_data_shred returns an invalid shred (i.e.
shred.sanitize() returns error). The method is only used in tests and
can be easily replaced with Shred::new_from_data. To keep the shred api
surface small, this commit removes this method.
* Pass the sum of consumed compute units to cost_tracker
* cost model tracks builtins and bpf programs separately, enabling adjust block cost by actual bpf programs execution costs
* Copied nightly-only experimental `checked_add_(un)signed` implementation to sdk
* Add function to update cost tracker with execution cost adjustment
* Review suggestion - using enum instead of struct for CommitTransactionDetails
Co-authored-by: Justin Starry <justin.m.starry@gmail.com>
* review - rename variable to distinguish accumulated_consumed_units from individual compute_units_consumed
* not to use signed integer operations
* Review - using saturating_add_assign!(), and checked_*().unwrap_or()
* Review - using Ordering enum to cmp
* replace checked_ with saturating_
* review - remove unnecessary Option<>
* Review - add function to report number of non-zero units account to metrics
Most nodes in the cluster receive the same shred from two different
nodes: parent, and the first node of their neighborhood:
https://github.com/solana-labs/solana/blob/a8c695ba5/core/src/cluster_nodes.rs#L178-L197
Because of the erasure codings, half of the shreds are already
redundant. So this redundant propagation path will only add extra
overhead.
Additionally the very first node of the broadcast tree has 2x fanout
(i.e. 400 nodes) which adds too much load at one node.
This commit simplifies the broadcast tree by dropping the redundant
propagation path and removing the 2x fanout at root node.
- Add write-lock-contention option, replacing same_payer
- write-lock-contention also has a same-batch-only value, where
contention happens only inside batches, not between them
- Rename num-threads to batches-per-iteration, which is closer to what
it is actually doing.
- Add num-banking-threads as a new option
- Rename packets-per-chunk to packets-per-batch, because this is closer
to what's happening; and it was previously confusing that num-chunks
had little to do with packets-per-chunk.
Example output for a iterations=100 and a permutation of inputs:
contention,threads,batchsize,batchcount,tps
none, 3,192, 4,65290.30
none, 4,192, 4,77358.06
none, 5,192, 4,86436.65
none, 3, 12,64,43944.57
none, 4, 12,64,65852.15
none, 5, 12,64,70674.37
same-batch-only,3,192, 4,3928.21
same-batch-only,4,192, 4,6460.15
same-batch-only,5,192, 4,7242.85
same-batch-only,3, 12,64,11377.58
same-batch-only,4, 12,64,19582.79
same-batch-only,5, 12,64,24648.45
full, 3,192, 4,3914.26
full, 4,192, 4,2102.99
full, 5,192, 4,3041.87
full, 3, 12,64,11316.17
full, 4, 12,64,2224.99
full, 5, 12,64,5240.32
1. use was_executed to correctly identify transactions requires cost adjustment;
2. add function to specifically handle executino cost adjustment without have to copy accounts
* Increase connection timeouts
* Bump quic connection cache to 1024
* Use constant for quic connection timeout and add warm cache service
* Fixes to QUIC warmup service
* fix check failure
* fixes after rebase
* fix timeout test
Co-authored-by: Pankaj Garg <pankaj@solana.com>
Benchmarks show roughly a 6% improvement. The impact could be more
significant when transactions need to be retried a lot.
after patch:
{'name': 'banking_bench_total', 'median': '72767.43'}
{'name': 'banking_bench_tx_total', 'median': '80240.38'}
{'name': 'banking_bench_success_tx_total', 'median': '72767.43'}
test bench_banking_stage_multi_accounts
... bench: 6,137,264 ns/iter (+/- 1,364,111)
test bench_banking_stage_multi_programs
... bench: 10,086,435 ns/iter (+/- 2,921,440)
before patch:
{'name': 'banking_bench_total', 'median': '68572.26'}
{'name': 'banking_bench_tx_total', 'median': '75704.75'}
{'name': 'banking_bench_success_tx_total', 'median': '68572.26'}
test bench_banking_stage_multi_accounts
... bench: 6,521,007 ns/iter (+/- 1,926,741)
test bench_banking_stage_multi_programs
... bench: 10,526,433 ns/iter (+/- 2,736,530)
- don't store pending tx signatures and costs in CostTracker
- apply tx costs to global state immediately again
- go from commit_or_cancel to update_or_remove, where the cost tracker
is either updated with the true costs for successful tx, or the costs
of a retryable tx is removed
- move the function into qos_service and hold the cost tracker lock for
the whole loop
* panic when test timeout
* nonblocking send when when droping banks
* debug log
* timeout for tvu
* unused varaible
* timeout for tpu
* Revert "debug log"
This reverts commit da780a3301a51d7c496141a85fcd35014fe6dff5.
* add timeout const
* fix typo
* Revert "nonblocking send when when droping banks".
I will create another pull request for this.
This reverts commit 088c98ec0facf825b5eca058fb860deba6d28888.
* Update core/src/tpu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tpu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tvu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/tvu.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Update core/src/validator.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
test_skip_repair in retransmit-stage is no longer relevant because
following: https://github.com/solana-labs/solana/pull/19233
repair packets are filtered out earlier in window-service and so
retransmit stage does not know if a shred is repaired or not.
Also, following turbine peer shuffle changes:
https://github.com/solana-labs/solana/pull/24080
the test has become flaky since it does not take into account how peers
are shuffled for each shred.
#### Summary of Changes
This PR enables RocksDB read side performance metrics to report to blockstore_rocksdb_read_perf.
The sampling rate is controlled by an env arg `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
specifies the number of perf samples for every 1000 operations. The default value is set to 10, meaning
we will report 10 out of 1000 (or 1/100) reads.
The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
It includes many useful metrics including block read time, cache hit rate, and time spent on decompressing the block.
* initial work for poh timing report service
* add poh_timing_report_service to validator
* fix comments
* clippy
* imrove test coverage
* delete record when complete
* rename shred full to slot full.
* debug logging
* fix slot full
* remove debug comments
* adding fmt trait
* derive default
* default for poh timing reporter
* better comments
* remove commented code
* fix test
* more test fixes
* delete timestamps for slot that are older than root_slot
* debug log
* record poh start end in bank reset
* report full to start time instead
* fix poh slot offset
* report poh start for normal ticks
* fix typo
* refactor out poh point report fn
* rename
* optimize delete - delete only when last_root changed
* change log level to trace
* convert if to match
* remove redudant check
* fix SlotPohTiming comments
* review feedback on poh timing reporter
* review feedback on poh_recorder
* add test case for out-of-order arrival of timing points and incomplete timing points
* refactor poh_timing_points into its own mod
* remove option for poh_timing_report service
* move poh_timing_point_sender to constructor
* clippy
* better comments
* more clippy
* more clippy
* add slot poh timing point macro
* clippy
* assert in test
* comments and display fmt
* fix check
* assert format
* revise comments
* refactor
* extrac send fn
* revert reporting_poh_timing_point
* align loggin
* small refactor
* move type declaration to the top of the module
* replace macro with constructor
* clippy: remove redundant closure
* review comments
* simplify poh timing point creation
Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
* run validator_exit_test sequentially
* limit validator exit run to its own serial run subset
add 10ms delay in the validator exit tests
* fix intermittent validator exit failure
* no sleep
* undo the code move
* Revert "core: partial versioned transaction support for voting service"
This reverts commit eb3df4c20e.
* Manually serialize vote tx before sending to TPU
* transaction-status: Add return data to meta
* Add return data to simulation results
* Use pretty-hex for printing return data
* Update arg name, make TransactionRecord struct
* Rename TransactionRecord -> ExecutionRecord
* Use QUIC client in voting service
* guard quic-client usage with a flag
* add measure to time the quic client
* move time measure outside if block
* remove quic vs UDP flag from voting service
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
#### Summary of Changes
This PR further enables group by operation on storage type in blockstore_rocksdb_cfs metrics.
Such group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.
To make things extensible, this PR introduces BlockstoreAdvancedOptions and move shred_storage_type.
All fields in BlockstoreAdvancedOptions will support group-by operation in blockstore_rocksdb_cfs.
Dependency: #23580
This PR enables blockstore to periodically report RocksDB column family properties.
The reported properties are under blockstore_rocksdb_cfs, and the properties also
support group by operation on cf_name.
#### Summary of Changes
This PR adds two hidden arguments to the validator that allow users to use RocksDB's FIFO compaction for storing shreds.
--shred-storage <SHRED_STORAGE>
EXPERIMENTAL: Controls how RocksDB compacts shreds. *WARNING*: You will lose your ledger data
when you switch between options. Possible values are: 'level': stores shreds using RocksDB's default (level)
compaction. 'fifo': stores shreds under RocksDB's FIFO compaction. This option is more efficient on
disk-write-bytes of the ledger store. [default: level] [possible values: level, fifo]
--shred-storage-size <SHRED_STORAGE_SIZE_BYTES>
The shred storage size in bytes. The suggested value is 50% of your ledger storage size in bytes. [default:
268435456000]
* refactor packet_threshold adjustment code into own struct and add unittest for it
* fix a typo in error message
* code review feedbacks
* another code review feedback
* Update core/src/ancestor_hashes_service.rs
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* share packet threshold with repair service (credit to carl)
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* Add simulation detection countermeasures
* Add program and test using TestValidator
* Remove incinerator deposit
* Remove incinerator
* Update Cargo.lock
* Add more features to simulation bank
* Update Cargo.lock per rebase
Co-authored-by: Jon Cinque <jon.cinque@gmail.com>
Transaction logs are not being saved to the database through the plugin interface.
Summary of Changes
Retain the transaction logs when transaction notification plugin is loaded.
Fixes #
lijunwangs/solana-accountsdb-plugin-postgres#6
* Add feature gate for new vote instruction and plumb through replay
Add tower versions
* Add check for slot hashes history
* Update is_recent check to exclude voting on hard fork root slot
* Move tower rollback test to flaky and ignore it until #22551 lands
* Refactor: Rename leader_first_tick_height field
* Refactor: add `PohRecorder::slot_for_tick_height` helper
* Refactor: Add type for poh leader status
As shown by the added benchmark, current code does worse if there is a
spam address plus a lot of unique addresses.
on current master:
test bench_packet_discard_many_senders ... bench: 1,997,960 ns/iter (+/- 103,715)
test bench_packet_discard_mixed_senders ... bench: 14,256,116 ns/iter (+/- 534,865)
test bench_packet_discard_single_sender ... bench: 1,306,809 ns/iter (+/- 61,992)
with this commit:
test bench_packet_discard_many_senders ... bench: 1,644,025 ns/iter (+/- 83,715)
test bench_packet_discard_mixed_senders ... bench: 1,089,789 ns/iter (+/- 86,324)
test bench_packet_discard_single_sender ... bench: 955,234 ns/iter (+/- 55,953)
* metrics for generate new bank forks
* fixed
* Apply suggestions from code review
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* --fixup
* fixup!
Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
* - report cost details for transactions selected to be packed into block;
- report estimated execution units packed into block, and actual units and time after execution
* revert reporting per-transaction details
* rollup transaction cost details (eg signature cost, wirte lock, data cost and execution costs) into block stats
* change naming from units to cu, use struct to replace tuple
* Fixup typo
* Add new feature
* Add new TransactionError
* Add framework for checking account state before and after transaction processing
* Fail transactions that leave new rent-paying accounts
* Only check rent-state of writable tx accounts
* Review comments: combine process_result success behavior; log and metrics before feature activation
* Fix tests that assume rent-exempt accounts are okay
* Remove test no longer relevant
* Remove native/sysvar special case
* Move metrics submission to report legacy->legacy rent paying transitions as well
Tpu::new() now matches Tvu::new() in having struct to reduce argument
list. Additionally, Rust supports partial moves, so there is no need to
clone the Tvu sockets out of Node object.
* Refactor Bank::load_and_execute_transactions
* Refactor: improve type safety of TransactionExecutionResult
* Add enum for extra type safety in execution results
* feedback
https://github.com/solana-labs/solana/pull/22169
verifies authorized-voter early on in vote-listener pipeline; and so
VoteTracker no longer needs to maintain and check for epoch authorized
voters.
* - qos_service metrics tagged with leader thread ids to separate gossip/tpu votes and transactions;
- qos_service metrics is reported with bank slot;
- replaced timer-based reporting with signal via channel; removed async report test as qos_service now lives within a thread
* - add tpu live packets (eg, not buffered packets) states to qos metrics reporting
The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921
However with the upcoming changes to erasure schema, there will be more
erasure coding shreds than data shreds and we can no longer infer coding
shreds indices from data shreds.
The commit adds constructs to track coding shreds indices explicitly.
next-shred-index is already readily available from returned data shreds.
The commit simplifies the api for upcoming changes to erasure coding
schema which will require explicit tracking of indices for coding shreds
as well as data shreds.
* use cost model to limit new account creation
* handle every system instruction
* remove &
* simplify match
* simplify match
* add datapoint for account data size
* add postgres error handling
* handle accounts:unlock_accounts
SlotMeta.last_index may be unknown and current code is using u64::MAX to
indicate that:
https://github.com/solana-labs/solana/blob/6c108c8fc/ledger/src/blockstore_meta.rs#L169-L174
This lacks type-safety and can introduce bugs if not always checked for
Several instances of slot_meta.last_index + 1 are also subject to
overflow.
This commit updates the type to Option<u64>. Backward compatibility is
maintained by customizing serde serialize/deserialize implementations.
https://github.com/solana-labs/solana/pull/17004
removed position field from coding-shred-header because as it stands the
field is redundant and unused.
However, with the upcoming changes to erasure coding schema this field
will no longer be redundant and needs to be populated.
* Moves create_message(), native_invoke() and process_cross_program_instruction()
from the InstructionProcessor to the InvokeContext so that they can have a useful "self" parameter.
* Moves InstructionProcessor into InvokeContext and Bank.
* Moves ExecuteDetailsTimings into its own file.
* Moves Executor into invoke_context.rs
* Moves PreAccount into its own file.
* impl AbiExample for BuiltinPrograms
with results code path.
- fix a bug that could unlock accounts that weren't locked
- add test to the refactored function
- skip enumerating transaction accounts if qos results is an error
- add #[must_use] annotation
- avoid clone error in results
- add qos error code to unlock_accounts match statement
- remove unnecessary AbiExample
The TransactionNotifierInterface interface for notifying transactions.
Changes to transaction_status_service to notify the notifier of the transaction data.
Interface to query the plugin's interest in transaction data
Problem
Slot status can be used of in other scenarios in addition to account information such as transactions, blocks. The current implementation is too tightly coupled.
Summary of Changes
Decouple the slot status notification from accounts notification. Created a new slot status notification module.
* - cache calculated transaction cost to allow sharing;
- atomic cost tracking op;
- only lock accounts for transactions eligible for current block;
- moved qos service and stats reporting to its own model;
- add cost_weight default to neutral (as 1), vote has zero weight;
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
* Update core/src/qos_service.rs
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
* Update core/src/qos_service.rs
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
Add additional checks to should_retransmit_and_persist()
- Check invalid shred index
- Update cases that check if node was leader
- Some comments and variable rename for clarity
* Move test-validator to own module to reduce core dependencies
* Fix a few TestValidator paths
* Use solana_test_validator crate for solana_test_validator bin
* Move client int tests to separate crate
Co-authored-by: Tyera Eulberg <tyera@solana.com>
- decouple cost_model from cost_tracker; allowing one cost_model
instance being shared within a validator;
- update cost_model api to calculate_cost(&self...)->transaction_cost
Turbine randomly shuffles cluster nodes on a broadcast tree for each
shred. This requires knowing the stakes and nodes' contact-infos (from
gossip).
However gossip is subject to partitioning and propogation delays.
Additionally unstaked nodes may join and leave the cluster at any
moment, changing the cluster view from one node to another.
This commit:
* Always arranges the unstaked nodes at the bottom of turbine broadcast
tree.
* Staked nodes are always included regardless of if their contact-info
is available in gossip or not.
* Uses the unbiased WeightedShuffle construct for shuffling nodes.
* add filler accounts to bloat validator and predict failure
* assert no accounts match filler
* cleanup magic numbers
* panic if can't load from snapshot with filler accounts specified
* some renames
* renames
* into_par_iter
* clean filler accts, too
Support using connection pooling and use multiple threads to do Postgres db operations. The performance is improved from 1500 RPS to 40,000 RPS measured during validator start.
Support multiple plugins at the same time.
Now that CRDS supports incremental snapshot hashes,
SnapshotPackagerService needs to push 'em!
This commit does two main things:
1. SnapshotPackagerService now knows about incremental snapshot hashes,
and will push SnapshotPackage::IncrementalSnapshot hashes to CRDS.
2. At startup, when loading from a full + incremental snapshot, the
hashes need to be passed all the way to SnapshotPackagerService so it
can push these starting hashes to CRDS. Those values have been piped
through.
Fixes#20441 and #20423
* Add separate vote processing tpu port
* Add feature to send to tpu vote port
* Add vote rejecting sigverify mode
* use packet.meta.is_simple_vote_tx in place of deserialization
* consolidate code that identifies vote tx atcommon path for cpu and gpu
* new key for feature set
* banking forward tpu vote
* add tpu vote port to dockerfile and other review changes
* Simplify thread id compare
* fix a test; updated cluster_info ABI change
Co-authored-by: Tao Zhu <tao@solana.com>
Co-authored-by: sakridge <sakridge@gmail.com>
Summary of Changes
Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows
Data stores of connection strings/credentials to be configured,
Accounts with patterns to be streamed
PostgreSQL implementation of the streaming for different destination stores to be plugged in.
The code comprises 4 major parts:
accountsdb-plugin-intf: defines the plugin interface which concrete plugin should implement.
accountsdb-plugin-manager: manages the load/unload of plugins and provide interfaces which the validator can notify of accounts update to plugins.
accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL
The validator integrations: updated streamed right after snapshot restore and after account update from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
* reimplement rpc pubsub with a broadcast queue
* update tests for new pubsub implementation
* fix: fix review suggestions
* chore(rpc): add additional pubsub metrics
* integrate max subscriptions check into SubscriptionTracker to reduce locking
* separate subscription control from tracker
* limit memory usage of items in pubsub broadcast queue, improve error handling
* add more pubsub metrics
* add final count metrics to pubsub
* add metric for total number of subscriptions
* fix small review suggestions
* remove by_params from SubscriptionTracker and add node_progress_watchers map instead
* add subscription tracker tests
* add metrics for number of pubsub notifications as a counter
* ignore clippy lint in TokenCounter
* fix underflow in token counter
* reduce queue capacity in pubsub tests
* fix(rpc): fix test timeouts
* fix race in account subscription test
* Add RpcSubscriptions::new_for_tests
Co-authored-by: Pavel Strakhov <p.strakhov@iconic.vc>
Co-authored-by: Nikita Podoliako <n.podoliako@zubr.io>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
* Add slot, count and accumulated-units to per-program-timings for determining transaction cost elements
* correct the stats naming; fixes the dirty bit resetting
Add `--incremental-snapshots` flag to enable incremental snapshots.
This will allow setting `--full-snapshot-interval-slots` and
`--incremental-snapshot-interval-slots`.
Also added `--maximum-incremental-snapshots-to-retain`.
Co-authored-by: Michael Vines <mvines@gmail.com>
This is the 2nd installment for the AccountsDb replication.
Summary of Changes
The basic google protocol buffer protocol for replicating updated slots and accounts. tonic/tokio is used for transporting the messages.
The basic framework of the client and server for replicating slots and accounts -- the persisting of accounts in the replica-side will be done at the next PR -- right now -- the accounts are streamed to the replica-node and dumped. Replication for information about Bank is also not done in this PR -- to be addressed in the next PR to limit the change size.
Functionality used by both the client and server side are encapsulated in the replica-lib crate.
There is no impact to the existing validator by default.
Tests:
Observe the confirmed slots replicated to the replica-node.
Observe the accounts for the confirmed slot are received at the replica-node side.
AccountsBackgroundService now knows about incremental snapshots. It is
now also in charge of deciding if an AccountsPackage is destined to be a
SnapshotPackage or not (or just used by AccountsHashVerifier).
!!! New behavior changes !!!
Taking snapshots (both bank and archive) **MUST** succeed.
This is required because of how the last full snapshot slot is
calculated, which is used by AccountsBackgroundService when calling
`clean_accounts()`.
File system calls are now unwrapped and will result in a crash. As Trent told me:
>Well I think if a snapshot fails due to some IO error, it's very likely that the operator is going to have to intervene before it works. We should exit error in this case, otherwise the validator might happily spin for several more hours, never successfully writing a complete snapshot, before something else brings it down. This would leave the validator's last local snapshot many more slots behind than it would be had we exited outright and potentially force the operator to abandon ledger continuity in favor of a quick catchup
Other errors will set the `exit` flag to `true`, and the node will gracefully shutdown.
Fixes#19167Fixes#19168
#### Problem
Snapshot names are overloaded, and there are multiple terms that mean the same thing. This is confusing. Here's a list of ones in the codebase that I've found:
```
- snapshot_dir
- snapshots_dir
- snapshot_path
- snapshot_output_dir
- snapshot_package_output_path
- snapshot_archives_dir
```
#### Summary of Changes
For all the ones that are about the directory where snapshot archives are stored, ensure they are `snapshot_archives_dir`. For the ones about the (bank) snapshots directory, set to `bank_snapshots_dir`.
Co-authored-by: Michael Vines <mvines@gmail.com>
Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.
This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.
Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.
Following commit will add these metrics back to window-service, earlier
in the pipeline.
Cross cluster gossip contamination is causing cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
https://github.com/solana-labs/solana/issues/17789
If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by shred-versions check,
because one of the variants of the contact-info will have matching
shred-version:
https://github.com/solana-labs/solana/issues/17789#issuecomment-896304969
The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots epochs ahead of the
root.
Renaming these types to better communicate their usages, which will
further diverge as incremental snapshot support is added.
With the new names, AccountsPacakge now refers to the type between
AccountsBackgroundProcess and AccountsHashVerifier, and SnapshotPackage
refers to the type between AccountsHashVerifier and
SnapshotPackagerService.
* added realtime cost checking logic to reject block that would exceed max limit:
- defines max limits at block_cost_limits.rs
- right after each bath's execution, accumulate its cost and check again
limit, return error if limit is exceeded
* update abi that changed due to adding additional TransactionError
* To avoid counting stats mltiple times, only accumulate execute-timing when a bank is completed
* gate it by a feature
* move cost const def into block_cost_limits.rs
* redefine the cost for signature and account access, removed signer part as it is not well defined for now
* check if per_program_timings of execute_timings before sending
AccountsHashVerifier will need access to both the full and incremental
snapshot archive interval slots config values, which is in the
SnapshotConfig.
Also, cleanup some `Option<>` params and their references.
While reviewing PR #18565, as issue was brought up to refactor some code
around verifying the bank after rebuilding from snapshots. A new
top-level function has been added to get the latest snapshot archives
and load the bank then verify. Additionally, new tests have been
written and existing tests have been updated to use this new function.
Fixes#18973
While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled. Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`. And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats. This bundling simplifies bank
rebuilding.
bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).
At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.
To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.
If two threads simultaneously call into ClusterNodesCache::get for the
same epoch, and the cache entry is outdated, then both threads recompute
cluster-nodes for the epoch and redundantly overwrite each other.
This commit wraps ClusterNodesCache entries in Arc<Mutex<...>>, so that
when needed only one thread does the computations to update the entry.