2866 Commits

Author SHA1 Message Date
HaoranYi 6ba4e870c4
Blockstore should drop signals before validator exit (#24025)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* wip

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

* clean up import

* add mutex for signal senders in blockstore

* remove mut

* refactor: extract add signal functions

* make blockstore signal private

* let compiler infer mutex type

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-04-04 11:38:05 -05:00
behzad nouri 7cb3b6cbe2
demotes WeightedShuffle failures to error metrics (#24079)
Since call-sites are calling unwrap anyways, panicking seems too punitive
for our use cases.
2022-04-03 16:20:06 +00:00
HaoranYi ffa4cafe1c
Revert sequential execution of validator_exit and validator_parallel_exit tests (#24048)
* handle channel disconnect

* revert sequential execution of validator_exit and parallel_validator_exit tests
2022-04-02 10:22:47 -05:00
Yueh-Hsuan Chiang 0b5ed87220
(LedgerStore) Enable performance sampling in column family get() (#23834)
#### Summary of Changes
This PR enables RocksDB read side performance metrics to report to blockstore_rocksdb_read_perf.
The sampling rate is controlled by the environment variable `SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K`,
which specifies the number of perf samples for every 1000 operations. The default value is 10, meaning
we will report 10 out of 1000 (or 1/100) reads.

The metrics are based on the RocksDB [PerfContext](https://github.com/facebook/rocksdb/blob/main/include/rocksdb/perf_context.h).
They include block read time, cache hit rate, and time spent decompressing blocks, among other useful metrics.
2022-04-01 13:13:32 -07:00
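A minimal sketch of the "N samples per 1000 operations" idea described in the commit above. Only the env variable name and the default of 10 come from the commit message; the function names and the deterministic counter-based selection are assumptions for illustration, and the real code may select samples differently (for example, randomly).

```rust
use std::env;

/// Default number of samples per 1000 operations, as stated in the commit message.
const DEFAULT_SAMPLES_IN_1K: u32 = 10;

/// Parses a raw setting, falling back to the default on missing or invalid
/// input and clamping to at most 1000. (Hypothetical helper.)
fn parse_samples_in_1k(raw: Option<&str>) -> u32 {
    raw.and_then(|s| s.trim().parse::<u32>().ok())
        .unwrap_or(DEFAULT_SAMPLES_IN_1K)
        .min(1000)
}

/// Reads the sampling rate from the environment variable named in the commit.
fn samples_in_1k_from_env() -> u32 {
    parse_samples_in_1k(
        env::var("SOLANA_METRICS_ROCKSDB_PERF_SAMPLES_IN_1K")
            .ok()
            .as_deref(),
    )
}

/// Hypothetical deterministic sampler: within every window of 1000
/// operations, the first `samples_in_1k` operations are sampled.
fn should_sample(op_index: u64, samples_in_1k: u32) -> bool {
    (op_index % 1000) < u64::from(samples_in_1k)
}
```

With the default setting, `should_sample` selects 10 of every 1000 operation indices, which matches the 1/100 ratio quoted in the commit message.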
Pankaj Garg df4d92f9cf
Revert voting service to use UDP instead of QUIC (#24032) 2022-04-01 09:34:18 -07:00
HaoranYi 51b37f0184
Modify rpc_completed_slot_service to be non-blocking (#24007)
* timeout for validator exits

* clippy

* print backtrace when panic

* add backtrace package

* increase time out to 30s

* debug logging

* make rpc complete service non blocking

* reduce log level

* remove logging

* recv_timeout

* remove backtrace

* remove sleep

* remove unused variable

* add comments

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/validator.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* whitespace

* more whitespace

* fix build

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-03-31 16:44:23 -05:00
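The `recv_timeout` and "non blocking" items in the commit above suggest a service loop that wakes up periodically to check an exit flag instead of blocking forever on a channel. The sketch below illustrates that pattern with the standard library's `std::sync::mpsc`; the function name, the 10 ms timeout, and the `u64` slot type are assumptions, not the validator's actual code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Hypothetical service loop: counts received slots until `exit` is set
/// or every sender has been dropped.
fn run_completed_slot_loop(receiver: &Receiver<u64>, exit: &AtomicBool) -> usize {
    let mut processed = 0;
    loop {
        // Checking the flag on every iteration lets the thread exit promptly.
        if exit.load(Ordering::Relaxed) {
            break;
        }
        match receiver.recv_timeout(Duration::from_millis(10)) {
            Ok(_slot) => processed += 1,
            // Nothing arrived in time: loop again and re-check `exit`.
            Err(RecvTimeoutError::Timeout) => continue,
            // All senders are gone: no more work can ever arrive.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    processed
}
```

A plain `recv()` would block until a message arrives or the channel disconnects, so a thread waiting on it cannot observe the exit flag in the meantime; the timeout bounds that wait.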
Jeff Washington (jwash) 9c8dad33c7
add epoch_schedule and rent_collector to hash calc (#24012) 2022-03-31 10:51:18 -05:00
Jeff Washington (jwash) da001d54e5
calculate_accounts_hash_helper uses config (#24003) 2022-03-31 09:29:45 -05:00
Jeff Washington (jwash) 125f9634fd
add hash calc config.use_write_cache (#24005) 2022-03-30 17:19:34 -05:00
HaoranYi ba770832d0
Poh timing service (#23736)
* initial work for poh timing report service

* add poh_timing_report_service to validator

* fix comments

* clippy

* improve test coverage

* delete record when complete

* rename shred full to slot full.

* debug logging

* fix slot full

* remove debug comments

* adding fmt trait

* derive default

* default for poh timing reporter

* better comments

* remove commented code

* fix test

* more test fixes

* delete timestamps for slot that are older than root_slot

* debug log

* record poh start end in bank reset

* report full to start time instead

* fix poh slot offset

* report poh start for normal ticks

* fix typo

* refactor out poh point report fn

* rename

* optimize delete - delete only when last_root changed

* change log level to trace

* convert if to match

* remove redundant check

* fix SlotPohTiming comments

* review feedback on poh timing reporter

* review feedback on poh_recorder

* add test case for out-of-order arrival of timing points and incomplete timing points

* refactor poh_timing_points into its own mod

* remove option for poh_timing_report service

* move poh_timing_point_sender to constructor

* clippy

* better comments

* more clippy

* more clippy

* add slot poh timing point macro

* clippy

* assert in test

* comments and display fmt

* fix check

* assert format

* revise comments

* refactor

* extract send fn

* revert reporting_poh_timing_point

* align logging

* small refactor

* move type declaration to the top of the module

* replace macro with constructor

* clippy: remove redundant closure

* review comments

* simplify poh timing point creation

Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
2022-03-30 09:04:49 -05:00
Jeff Washington (jwash) c24de17278
remove index hash calculation as an option (#23928) 2022-03-25 15:32:53 -05:00
HaoranYi 01af40d6b6
Fix intermittent validator_exit test failure (#23594)
* run validator_exit_test sequentially

* limit validator exit run to its own serial run subset
add 10ms delay in the validator exit tests

* fix intermittent validator exit failure

* no sleep

* undo the code move
2022-03-25 14:38:19 -05:00
ryleung-solana 6b85c2104c
Implement forwarding via TpuConnection (#23817) 2022-03-25 11:31:40 -04:00
Steven Luscher f44c8f296f
fix: thread `enforce_ulimit_nofile` config down when opening blockstore (#23925) 2022-03-25 03:13:33 -05:00
Jeff Washington (jwash) 51f5524e2f
make verify_accounts_package_hash like other hash calc (#23906) 2022-03-24 17:49:48 -05:00
Jeff Washington (jwash) 55d61023f7
document 'accounts' hash (#23907) 2022-03-24 15:58:52 -05:00
HaoranYi fedf4e984f
typo (#23910) 2022-03-24 15:21:59 -05:00
Jeff Washington (jwash) 37c36ce3fa
pass stats separately from CalcAccountsHashConfig (#23892) 2022-03-24 12:48:47 -05:00
steviez c31db81ac4
Use VoteAccountsHashMap type alias in all applicable spots (#23904) 2022-03-24 12:09:48 -05:00
ryleung-solana 82945ba973
Optimize TpuConnection and its implementations and refactor connection-cache to not use dyn in order to enable those changes (#23877) 2022-03-24 11:40:26 -04:00
Jeff Washington (jwash) 5b916961b5
HashCalc uses self.accounts_cache (#23890) 2022-03-24 10:34:28 -05:00
Jeff Washington (jwash) b22165ad69
hash calc uses self.filler_account_suffix (#23887) 2022-03-24 09:58:06 -05:00
Jeff Washington (jwash) 9022931689
calc hash uses self.num_hash_scan_passes (#23883) 2022-03-24 09:44:42 -05:00
Jeff Washington (jwash) db5d68f01f
HashCalc uses self.accounts_hash_cache_path (#23882) 2022-03-24 09:31:55 -05:00
Jeff Washington (jwash) 3e22d4b286
calc hash uses self.thread_pool_clean (#23881) 2022-03-23 20:52:38 -05:00
Jeff Washington (jwash) 9e61fe7583
add AccountsHashConfig to manage parameters (#23850) 2022-03-23 13:44:23 -05:00
HaoranYi db49b826f0
separate blockstore metrics from window service metrics (#23871) 2022-03-23 13:38:17 -05:00
HaoranYi 7ff8ed869c
typos (#23870) 2022-03-23 13:36:55 -05:00
Jeff Washington (jwash) b1280b670a
calculate_accounts_hash_without_index takes &self (#23846)
* calculate_accounts_hash_without_index takes &self

* Update runtime/src/snapshot_package.rs

Co-authored-by: Brooks Prumo <brooks@prumo.org>

Co-authored-by: Brooks Prumo <brooks@prumo.org>
2022-03-23 11:57:32 -05:00
Justin Starry 92462ae031
Manually serialize and use `send_wire_transaction` for votes (#23826)
* Revert "core: partial versioned transaction support for voting service"

This reverts commit eb3df4c20e.

* Manually serialize vote tx before sending to TPU
2022-03-23 09:47:55 +08:00
Jon Cinque 7af48465fa
transaction-status: Add return data to meta (#23688)
* transaction-status: Add return data to meta

* Add return data to simulation results

* Use pretty-hex for printing return data

* Update arg name, make TransactionRecord struct

* Rename TransactionRecord -> ExecutionRecord
2022-03-22 23:17:05 +01:00
Trent Nelson eb3df4c20e core: partial versioned transaction support for voting service 2022-03-21 22:59:05 -06:00
HaoranYi 45a7c6edfb
Fix typos and a small refactor (#23805)
* fix typo

* remove packet_has_more_unprocessed_transactions function
2022-03-21 18:35:31 -05:00
Pankaj Garg 5d03b188c8
Use QUIC client in voting service (#23713)
* Use QUIC client in voting service

* guard quic-client usage with a flag

* add measure to time the quic client

* move time measure outside if block

* remove quic vs UDP flag from voting service
2022-03-21 09:10:16 -07:00
Tao Zhu 71ea05c176 replace nested for_each with flat_map 2022-03-18 16:37:41 -05:00
Tao Zhu 1c369fb55f Scan entire UnprocessedPacketBatches buffer to produce stake and locator of each packet 2022-03-18 16:37:41 -05:00
Yueh-Hsuan Chiang f999eef452
(LedgerStore) Rename BlockstoreAdvancedOptions to LedgerColumnOptions (#23764)
This PR renames BlockstoreAdvancedOptions to LedgerColumnOptions, as we will
pass-down this struct to LedgerColumn to allow it to perform metric reporting.
2022-03-18 11:13:35 -07:00
Tao Zhu 56428be629 Stop exposing the inner cost_table, encapsulating implementation details and
making future changes easier.
2022-03-18 12:58:43 -05:00
Tao Zhu 0ed23899e7 directly use compute_budget MAX_UNITS and DEFAULT_UNITS 2022-03-18 08:53:11 -05:00
Tao Zhu a4cacf3389 add deterministic default cost 2022-03-18 08:53:11 -05:00
Tao Zhu c478fe2047 add timing metrics, some renaming 2022-03-17 19:31:28 -05:00
Tao Zhu fd515097d8 leader qos part 2: add stage to find sender stake, set to packet meta 2022-03-17 19:31:28 -05:00
Stephen Akridge 976b138e76 Add tx weighting stage 2022-03-17 19:31:28 -05:00
Michael Vines 3773b753d1 Configure shrink paths during blockstore load 2022-03-15 23:08:07 -07:00
Michael Vines ab373bb1a9 Refactor new_banks_from_ledger() into load and process steps 2022-03-15 23:08:07 -07:00
Michael Vines 2da4e3eb6c Add --no-os-memory-stats-reporting 2022-03-15 17:07:40 -07:00
Michael Vines dbc62f2e28 Use consistent variable naming for DropBankService 2022-03-15 17:07:13 -07:00
Michael Vines d44f3d7216 Remove unhelpful log message 2022-03-15 17:07:13 -07:00
Tao Zhu 2d3501dff9 make upsert infallible op 2022-03-15 17:05:41 -05:00
Tao Zhu 61cead9b9b Remove injection of exit signal into cost_update_service 2022-03-15 09:58:56 -05:00
Tao Zhu eb73dacd58 harden banking tests 2022-03-15 09:58:08 -05:00
Justin Starry 8c8f9694e0
Refactor: Sanitized transaction creation (#23558)
* Refactor: SanitizedTransaction::try_create optionally computes hash

* Refactor: Add SimpleAddressLoader
2022-03-15 12:02:22 +08:00
Tyera Eulberg 102dd68a03
Rename AccountsDb plugins to Geyser plugins (#23604) 2022-03-14 19:18:46 -06:00
Michael Vines 17cc095d28 Slot warping doesn't need to be in new_banks_from_ledger 2022-03-14 15:29:58 -07:00
Michael Vines 2e7ee0f177 Tower loading doesn't need to be in new_banks_from_ledger 2022-03-14 15:29:58 -07:00
Michael Vines 390dc24608 Create leader schedule before processing blockstore 2022-03-14 15:29:58 -07:00
Michael Vines 543d5d4a5d Reduce new_banks_from_ledger arguments 2022-03-14 15:29:58 -07:00
Michael Vines 115f376465 Factor out bank_forks_utils::load_bank_forks() 2022-03-14 15:29:58 -07:00
Michael Vines c2ce152be8 Inline do_process_blockstore_from_root 2022-03-14 15:29:58 -07:00
Tao Zhu 5ea6a1e500 code review 2022-03-14 13:14:27 -05:00
Tao Zhu 8590911b0a Replace type alias with newtype for UnprocessedPacketBatches 2022-03-14 13:14:27 -05:00
Brooks Prumo 7758c32035
Banking Stage drops transactions that'll exceed the total account data size limit (#23537) 2022-03-13 15:58:57 +00:00
Yueh-Hsuan Chiang 1e20bd8f9a
(LedgerStore) Include storage type as a tag in RocksDB metric reporting (#23523)
#### Summary of Changes
This PR further enables group by operation on storage type in blockstore_rocksdb_cfs metrics.
Such group-by allows us to further compare the performance metrics between rocks-level and
rocks-fifo.

To make things extensible, this PR introduces BlockstoreAdvancedOptions and moves shred_storage_type into it.
All fields in BlockstoreAdvancedOptions will support group-by operation in blockstore_rocksdb_cfs.

Dependency: #23580
2022-03-11 15:17:34 -08:00
Tao Zhu 35d1235ed0
- move `unprocessed_packet_batches` from `BankingStage` to its own (#23508)
module
- deserialize packets during receiving and buffering
2022-03-10 18:47:46 +00:00
carllin 588414a776
Report even if slot begins and ends in process_buffered_packets() (#23549) 2022-03-09 23:42:35 -05:00
Tao Zhu f68c5a274d remove persist_cost_table code 2022-03-09 21:05:47 -07:00
Tao Zhu 9f71958d7d Patch validator from loading persisted program costs 2022-03-09 21:05:47 -07:00
sakridge 7a9884c831
Quic limit connections (#23283)
* quic server limit connections

* bump per_ip

* Review comments

* Make the connections per port
2022-03-09 10:52:31 +01:00
Carl Lin 5a0cd05866 Revert "- estimate a program cost as 2 standard deviation above mean"
This reverts commit a25ac1c988.
2022-03-08 17:18:44 -08:00
Carl Lin 9acbfa5eb1 Revert "use EMA in place of Welford"
This reverts commit 6587dbfa47.
2022-03-08 17:18:44 -08:00
Carl Lin c878c9e2cb Revert "1. Persist to blockstore less frequently;"
This reverts commit 7aa1fb4e24.
2022-03-08 17:18:44 -08:00
Carl Lin 0a17edcc1f Revert "fix tests after merge"
This reverts commit ba2d83f580.
2022-03-08 17:18:44 -08:00
Michael Vines b719d6a2ad `solana-validator set-identity` no longer writes a tower file unnecessarily 2022-03-08 15:34:23 -08:00
Justin Starry 3114c199bd
Add RPC support for versioned transactions (#22530)
* Add RPC support for versioned transactions

* fix doc tests

* Add rpc test for versioned txs

* Switch to preflight bank
2022-03-08 15:20:34 +08:00
HaoranYi 181fffb916
rename status filename to be consistent (#23501) 2022-03-07 17:34:35 +00:00
Yueh-Hsuan Chiang b8b7163b66
(Ledger Store) Report RocksDB Column Family Metrics (#22503)
This PR enables blockstore to periodically report RocksDB column family properties.
The reported properties are under blockstore_rocksdb_cfs, and the properties also
support group by operation on cf_name.
2022-03-05 16:13:03 -08:00
Yueh-Hsuan Chiang 62d2a4cd88
Make ShredStorageType::RocksLevel public (#23272)
#### Summary of Changes
This PR adds two hidden arguments to the validator that allow users to use RocksDB's FIFO compaction for storing shreds.

        --shred-storage <SHRED_STORAGE>
            EXPERIMENTAL: Controls how RocksDB compacts shreds.  *WARNING*: You will lose your ledger data
            when you switch between options. Possible values are: 'level': stores shreds using RocksDB's default (level)
            compaction. 'fifo': stores shreds under RocksDB's FIFO compaction. This option is more efficient on
            disk-write-bytes of the ledger store. [default: level]  [possible values: level, fifo]

        --shred-storage-size <SHRED_STORAGE_SIZE_BYTES>
            The shred storage size in bytes. The suggested value is 50% of your ledger storage size in bytes. [default:
            268435456000]
2022-03-03 12:43:58 -08:00
Jeff Washington (jwash) 26aa18b3f3
fmt (#23448) 2022-03-02 11:54:58 -06:00
HaoranYi 41f78b9925
small optimization. use shift for pow of 2. (#22975) 2022-03-02 09:11:12 -06:00
HaoranYi 8de88d0a55
Refactor packet_threshold adjustment code into its own struct (#23216)
* refactor packet_threshold adjustment code into own struct and add unittest for it

* fix a typo in error message

* code review feedbacks

* another code review feedback

* Update core/src/ancestor_hashes_service.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* share packet threshold with repair service (credit to carl)

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-03-02 09:09:06 -06:00
HaoranYi 86e2f728c3
Fix a batch limits bug in banking (#23327)
* add thread index in thread name for debugging

* fix batch_limit

* use NUM_VOTE_THREAD instead of hardcoded number (credit to carllin)
2022-03-02 09:08:08 -06:00
Jeff Biseda c69e3b73ff
bench get_retransmit_peers (#23292) 2022-03-01 19:10:29 -08:00
Brooks Prumo 533eca3b4c
Simplify replay_blockstore_into_bank() (#23282) 2022-02-25 06:57:04 -06:00
Trent Nelson d4292774c5 checks 2022-02-25 08:05:28 +00:00
Justin Starry d0e85c293f
Fix rustfmt check (#23296) 2022-02-23 16:38:53 +08:00
Gavin Chan 20d031e2b8
Refactor ExecuteTimings w/ enum-indexed array (#23085) 2022-02-22 14:46:56 -08:00
Tyera Eulberg 7e08ae1d0c
Revert "Add simulation detection countermeasure (#22880)" (#23261)
This reverts commit c42b80f099.
2022-02-21 21:15:37 +00:00
buffalu 70ebab2c82
Add rustfmt.toml and `cargo fmt` (#23238)
* fmt

* formatted

Co-authored-by: Lucas B <buffalu@jito.network>
2022-02-19 13:32:29 +08:00
carllin 619335df1a
Add execute timings (#23097) 2022-02-17 01:14:32 -05:00
anatoly yakovenko 83d31c9e65
shrink batches when over 80% of the space is wasted (#23066)
* shrink batches when over 80% of the space is wasted
2022-02-16 08:18:17 -08:00
Jeff Biseda 115d71536b
forward_buffered_packets return packet count in error path (#23167) 2022-02-16 07:46:32 -08:00
Michael Vines a6d736572c `solana-validator set-identity` now supports the `--require-tower` flag 2022-02-15 19:45:00 -08:00
Tao Zhu 03bf66a51b flag end-of-slot when poh bank is gone 2022-02-15 15:01:27 -06:00
Ashwin Sekar ab92578b02
Fix the flaky test test_restart_tower_rollback (#23129)
* Add flag to disable voting until a slot to avoid duplicate voting

* Fix the tower rollback test and remove it from flaky.
2022-02-15 13:19:34 -07:00
Michael Vines c42b80f099
Add simulation detection countermeasure (#22880)
* Add simulation detection countermeasures

* Add program and test using TestValidator

* Remove incinerator deposit

* Remove incinerator

* Update Cargo.lock

* Add more features to simulation bank

* Update Cargo.lock per rebase

Co-authored-by: Jon Cinque <jon.cinque@gmail.com>
2022-02-15 13:09:59 +01:00
Lijun Wang c04438be4b
Retaining transaction logs when transaction plugin is loaded. (#22874)
Transaction logs are not being saved to the database through the plugin interface.

Summary of Changes

Retain the transaction logs when transaction notification plugin is loaded.

Fixes #
lijunwangs/solana-accountsdb-plugin-postgres#6
2022-02-11 20:29:07 -08:00
carllin 2f9e30a1f7
Introduce slot-specific packet metrics (#22906) 2022-02-11 03:07:45 -05:00
Justin Starry d5dec989b9
Enforce tx metadata upload with static types (#23028) 2022-02-10 13:28:18 +08:00
Yueh-Hsuan Chiang 1b287f1b59
(Ledger Cleanup) Add code comments for ledger_cleanup. (#22807) 2022-02-08 22:48:56 -08:00
Tao Zhu ba2d83f580 fix tests after merge 2022-02-08 16:18:23 -06:00
Tao Zhu 7aa1fb4e24 1. Persist to blockstore less frequently;
2. reduce alpha for EMA to 1 percent to have roughly 200 data points for estimatio
2022-02-08 16:18:23 -06:00
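For readers unfamiliar with the EMA mentioned in the commit above, here is a minimal sketch of an exponential moving average. The commit does not say how alpha relates to the "roughly 200 data points"; one common rule of thumb, assumed here, is alpha = 2 / (N + 1), under which alpha = 1% corresponds to N = 199. The type and function names are hypothetical.

```rust
/// Exponential moving average with smoothing factor `alpha` in (0, 1].
struct Ema {
    alpha: f64,
    value: Option<f64>,
}

impl Ema {
    fn new(alpha: f64) -> Self {
        Self { alpha, value: None }
    }

    /// Folds one sample into the average and returns the new estimate.
    /// The first sample seeds the average directly.
    fn update(&mut self, sample: f64) -> f64 {
        let next = match self.value {
            None => sample,
            Some(prev) => self.alpha * sample + (1.0 - self.alpha) * prev,
        };
        self.value = Some(next);
        next
    }
}

/// Rule-of-thumb alpha for an equivalent window of `n` samples (an assumption,
/// not taken from the commit).
fn alpha_for_window(n: u32) -> f64 {
    2.0 / (f64::from(n) + 1.0)
}
```

A smaller alpha makes the estimate react more slowly to new samples, which is the trade-off the commit adjusts by lowering alpha to 1 percent.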
Tao Zhu 6587dbfa47 use EMA in place of Welford 2022-02-08 16:18:23 -06:00
Tao Zhu a25ac1c988 - estimate a program cost as 2 standard deviation above mean
- replaced get_average / get_mode with get_default to assign max units to unknown program
2022-02-08 16:18:23 -06:00
Tao Zhu e52e48076e
bench should update leader schedule cache (#22991) 2022-02-08 02:28:28 +00:00
Ashwin Sekar 5acf0f6331
Add feature gate for new vote instruction and plumb through replay (#21683)
* Add feature gate for new vote instruction and plumb through replay

Add tower versions

* Add check for slot hashes history

* Update is_recent check to exclude voting on hard fork root slot

* Move tower rollback test to flaky and ignore it until #22551 lands
2022-02-07 14:06:19 -08:00
behzad nouri 27aaf9df85
removes VoteTracker::new in favor of VoteTracker::default (#22941)
VoteTracker::new does not need a bank and so is redundant:
https://github.com/solana-labs/solana/blob/5a230f418/core/src/cluster_info_vote_listener.rs#L103-L107
2022-02-04 19:01:59 +00:00
sakridge 5a230f418d
Add quic port for accepting transactions (#22753)
using quinn library

streamer: Sign TLS cert with validator identity key

Handle multiple incoming chunks
2022-02-04 15:27:09 +01:00
Tao Zhu 4bec182b32
Allow buffered packets to be consumed if bank is active, regardless of leader schedule (#22913) 2022-02-03 21:29:41 +00:00
Justin Starry 60af1a4cce
Refactor: Add trait for loading addresses (#22903) 2022-02-03 11:00:12 +00:00
carllin bd1850df25
Return actual committed transactions from process_transactions() (#22802) 2022-02-03 03:56:36 -05:00
Trent Nelson c62f9839a2 test-validator-bin: reinstate full rpc method set 2022-02-03 02:43:03 +00:00
Ikko Ashimine 58a70d76a3
fix typo in broadcast_duplicates_run.rs (#22888)
Creat -> Create
2022-02-02 12:29:14 -07:00
behzad nouri dccbddad80
adds reverse lookup index to cluster-nodes (#22892)
retransmit has to exclude slot leader from set of nodes for each shred; 
which currently requires a linear scan:
https://github.com/solana-labs/solana/blob/e3b137066/core/src/cluster_nodes.rs#L238-L242

This commit adds a reverse lookup index to avoid linear scan.
2022-02-02 19:27:50 +00:00
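The idea of replacing a linear scan with a reverse lookup index, as described in the commit above, can be sketched as follows. `NodeId`, `NodeIndex`, and the method names are hypothetical stand-ins (the real code works with node public keys), and the sketch assumes node ids are unique.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a node identity.
type NodeId = u64;

struct NodeIndex {
    nodes: Vec<NodeId>,
    /// Reverse lookup: node id -> position in `nodes`.
    index: HashMap<NodeId, usize>,
}

impl NodeIndex {
    /// Builds the reverse index once, when the node list is created.
    fn new(nodes: Vec<NodeId>) -> Self {
        let index = nodes.iter().enumerate().map(|(i, id)| (*id, i)).collect();
        Self { nodes, index }
    }

    /// O(n) lookup, the kind of scan the commit removes.
    fn position_linear(&self, id: NodeId) -> Option<usize> {
        self.nodes.iter().position(|n| *n == id)
    }

    /// Expected O(1) lookup through the reverse index.
    fn position_indexed(&self, id: NodeId) -> Option<usize> {
        self.index.get(&id).copied()
    }
}
```

The index costs extra memory and must be built alongside the node list, but it pays off when the lookup happens once per shred.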
behzad nouri e3b137066d
caches WeightedShuffle struct in ClusterNodes (#22877)
Instead of reconstructing WeightedShuffle struct for each shred
broadcast or retransmit, we can use the same struct with minimal
mutations.
2022-02-02 15:12:26 +00:00
Trent Nelson eac4a6df68 rpc: use minimal mode by default 2022-02-01 19:00:06 -07:00
behzad nouri 45e09664b8
removes Rng field from WeightedShuffle struct (#22850) 2022-02-01 15:27:23 +00:00
behzad nouri 604ca9316c
includes zero weighted entries in WeightedShuffle (#22829)
Current WeightedShuffle implementation excludes zero weighted entries
from the shuffle:
https://github.com/solana-labs/solana/blob/13e631dcf/gossip/src/weighted_shuffle.rs#L29-L30

Though mathematically this might make more sense, for our use-cases
(turbine specifically), this results in less efficient code:
https://github.com/solana-labs/solana/blob/13e631dcf/core/src/cluster_nodes.rs#L409-L430

This commit changes the implementation so that zero weighted indices are
also included in the shuffle but appear only at the end after non-zero
weighted indices.
2022-01-31 16:23:50 +00:00
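The ordering described in the commit above (non-zero weighted indices first, zero weighted indices only at the end) can be illustrated with a simple weighted shuffle. This is a sketch under stated assumptions, not the actual WeightedShuffle implementation: it uses a naive O(n^2) selection, a toy xorshift generator to avoid external crates, ignores modulo bias, and shuffles the zero weighted indices uniformly (the commit does not say how they are ordered among themselves).

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Returns a permutation of `0..weights.len()`: indices with non-zero weight
/// are drawn without replacement with probability proportional to weight,
/// then zero-weight indices follow in uniformly shuffled order.
fn weighted_shuffle(weights: &[u64], seed: u64) -> Vec<usize> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut nonzero: Vec<usize> = (0..weights.len()).filter(|&i| weights[i] > 0).collect();
    let mut zeros: Vec<usize> = (0..weights.len()).filter(|&i| weights[i] == 0).collect();
    // Assumes the weights sum without overflowing u64.
    let mut total: u64 = nonzero.iter().map(|&i| weights[i]).sum();
    let mut order = Vec::with_capacity(weights.len());

    while !nonzero.is_empty() {
        // Pick a point in [0, total) and find the index whose weight covers it.
        let mut target = rng.next_u64() % total;
        let mut pick = 0;
        for (pos, &i) in nonzero.iter().enumerate() {
            if target < weights[i] {
                pick = pos;
                break;
            }
            target -= weights[i];
        }
        let chosen = nonzero.remove(pick);
        total -= weights[chosen];
        order.push(chosen);
    }

    // Fisher-Yates shuffle of the zero-weight tail.
    for i in (1..zeros.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        zeros.swap(i, j);
    }
    order.extend(zeros);
    order
}
```

Because every index appears in the output, callers no longer need a separate pass to append the zero weighted entries, which is the inefficiency the commit message points at.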
Justin Starry 220aa6ada0
Fix poh recorder initialization on startup (#22755) 2022-01-28 14:21:15 +08:00
Michael Vines 331b953551 Add vote account address to vote subscription 2022-01-27 08:22:29 -08:00
Justin Starry d9c259a231
Set the correct root in block commitment cache initialization (#22750)
* Set the correct root in block commitment cache initialization

* clean up test

* bump
2022-01-27 00:48:00 +08:00
sakridge 2e56c59bcb
Handle already discarded packets in gpu sigverify path (#22680) 2022-01-24 14:35:47 +01:00
Justin Starry 1240217a73
Refactor: Rename variables and helper method to `PohRecorder` (#22676)
* Refactor: Rename leader_first_tick_height field

* Refactor: add `PohRecorder::slot_for_tick_height` helper

* Refactor: Add type for poh leader status
2022-01-23 10:28:50 +08:00
anatoly yakovenko d6011ba14d
Dedup bloom filter is too slow (#22607)
* Faster dedup

* use ahash

* fixup

* single threaded

* use duration type

* remove the count

* fixup
2022-01-21 20:23:48 -07:00
Michael Vines 6d5bbca630 Pacify clippy 2022-01-21 19:12:57 -08:00
Michael Vines ce4f7601af Avoid unstable_name_collisions warning 2022-01-21 19:12:57 -08:00
sakridge 38b02bbcc0
Handle already discarded packets in discard_excess_packets (#22594) 2022-01-21 17:22:50 +01:00
Jeff Biseda e7777281d6
regularly report network limits (#22563) 2022-01-20 12:38:42 -08:00
Trent Nelson cca3dbc76d system-monitor-service: support percentages from bigger numbers 2022-01-20 09:51:23 +00:00
Justin Starry 7f20c6149e
Refactor: move simple vote parsing to runtime (#22537) 2022-01-20 10:39:21 +08:00
anatoly yakovenko d343713f61
Optimize packet dedup (#22571)
* Use bloom filter to dedup packets

* dedup first

* Update bloom/src/bloom.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* fixup

* fixup

* fixup

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-01-19 13:58:20 -08:00
behzad nouri dcf44d2523
improves sigverify discard_excess_packets performance (#22577)
As shown by the added benchmark, current code does worse if there is a
spam address plus a lot of unique addresses.

on current master:
test bench_packet_discard_many_senders  ... bench:   1,997,960 ns/iter (+/- 103,715)
test bench_packet_discard_mixed_senders ... bench:  14,256,116 ns/iter (+/- 534,865)
test bench_packet_discard_single_sender ... bench:   1,306,809 ns/iter (+/- 61,992)

with this commit:
test bench_packet_discard_many_senders  ... bench:   1,644,025 ns/iter (+/- 83,715)
test bench_packet_discard_mixed_senders ... bench:   1,089,789 ns/iter (+/- 86,324)
test bench_packet_discard_single_sender ... bench:     955,234 ns/iter (+/- 55,953)
2022-01-19 18:10:02 +00:00
buffalu 650882217c
Add PacketBatch packet_indexes stat (#22564)
* collect stats on packet batch indices

* cleanup

* cleanup

* cleanup

* change name
2022-01-19 08:13:07 +00:00
anatoly yakovenko e616a7ebfc
Track discard time of excess packets in sigverify (#22554)
* discard time histogram

* closer to the if

* update
2022-01-18 15:09:39 -07:00
Michael Vines 8dc6f9f589 Remove unused mut 2022-01-18 12:10:31 -08:00
sakridge 49443406fd
Use VecDeque instead of Vec in sigverify stage (#22538)
avoid bad performance of remove(0) for a single sender
2022-01-17 18:37:05 +01:00
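To illustrate why `remove(0)` on a `Vec` performs badly, as noted in the commit above: each call shifts every remaining element one position to the left, so draining a queue from the front costs O(n^2) overall, while `VecDeque::pop_front` is O(1) per element. The two helper functions below are hypothetical and only show the contrast.

```rust
use std::collections::VecDeque;

/// Drains from the front of a Vec: every `remove(0)` shifts the remaining
/// elements, so the whole loop is O(n^2).
fn drain_front_vec(mut queue: Vec<u32>) -> Vec<u32> {
    let mut out = Vec::with_capacity(queue.len());
    while !queue.is_empty() {
        out.push(queue.remove(0));
    }
    out
}

/// Drains from the front of a VecDeque: `pop_front` is O(1), so the whole
/// loop is O(n).
fn drain_front_deque(mut queue: VecDeque<u32>) -> Vec<u32> {
    let mut out = Vec::with_capacity(queue.len());
    while let Some(item) = queue.pop_front() {
        out.push(item);
    }
    out
}
```

Both functions yield the elements in the same order; only the cost differs.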
anatoly yakovenko 2d94e6e5d3
metrics for generate new bank forks (#22492)
* metrics for generate new bank forks

* fixed

* Apply suggestions from code review

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* --fixup

* fixup!

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-01-17 09:59:47 -07:00
Tao Zhu a724fa2347
Add hidden cli option to allow validator reports replayed transaction cost metrics (#22369)
* add hidden cli option to allow validator reports replayed transaction cost detail metrics

* Update validator/src/main.rs

Co-authored-by: Michael Vines <mvines@gmail.com>

* - rebase master, using unbounded instead of channel; downgrade to datapoint_trace

* removed cli arg, prefer log at trace

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-01-15 00:31:21 +00:00
Tao Zhu 1309a9cea0
Add estimated and actual block cost units metrics (#22326)
* - report cost details for transactions selected to be packed into block;
- report estimated execution units packed into block, and actual units and time after execution

* revert reporting per-transaction details

* rollup transaction cost details (eg signature cost, write lock, data cost and execution costs) into block stats

* change naming from units to cu, use struct to replace tuple
2022-01-14 23:44:18 +00:00
Tao Zhu 9c9f2dd5bd port counting vote CUs to block cost (#22477) 2022-01-14 10:50:29 -06:00
Justin Starry f804ccdece
Store address table lookups in blockstore and bigtable (#22402) 2022-01-14 15:24:41 +08:00
carllin 4ab7d6c23e
Filter out outdated slots (#22450)
* Filter out outdated slots

* Fixup error
2022-01-13 19:51:00 -05:00
Tao Zhu 6614727be8 downgrade individual per-program-timing to trace to reduce writes to influx 2022-01-12 18:52:13 -06:00
Tyera Eulberg 637e366b18
Prevent rent-paying account creation (#22292)
* Fixup typo

* Add new feature

* Add new TransactionError

* Add framework for checking account state before and after transaction processing

* Fail transactions that leave new rent-paying accounts

* Only check rent-state of writable tx accounts

* Review comments: combine process_result success behavior; log and metrics before feature activation

* Fix tests that assume rent-exempt accounts are okay

* Remove test no longer relevant

* Remove native/sysvar special case

* Move metrics submission to report legacy->legacy rent paying transitions as well
2022-01-11 11:32:25 -07:00
Jeff Biseda 8b66625c95
convert std::sync::mpsc to crossbeam_channel (#22264) 2022-01-11 02:44:46 -08:00
steviez 5f1f4dcbdd
Use struct to pass all Tpu sockets as one argument to Tpu::new() (#21965)
Tpu::new() now matches Tvu::new() in taking a struct to reduce its argument
list. Additionally, Rust supports partial moves, so there is no need to
clone the Tvu sockets out of Node object.
2022-01-10 11:29:48 -06:00
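The partial moves mentioned in the commit above work as sketched below: individual fields can be moved out of an owned struct one at a time, without cloning, as long as the struct does not implement `Drop`. The struct and field names are hypothetical, and sockets are represented by strings to keep the example self-contained.

```rust
/// Hypothetical bundle of TPU sockets, passed around as a single argument.
#[derive(Debug, PartialEq)]
struct TpuSockets {
    transactions: Vec<String>,
    vote: Vec<String>,
}

/// Hypothetical node that owns several groups of sockets.
struct Node {
    id: u32,
    tpu: TpuSockets,
    tvu: Vec<String>,
}

/// Takes the node by value and moves its fields out one by one.
fn split_sockets(node: Node) -> (TpuSockets, Vec<String>, u32) {
    let tpu = node.tpu; // moves only `tpu` out of `node`
    let tvu = node.tvu; // `tvu` is still available and is moved here
    (tpu, tvu, node.id) // `id` remains readable after the partial moves
}
```

After a partial move, the struct as a whole can no longer be used, but its remaining fields can.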
Ashwin Sekar eeec1ce2ad
Add local cluster test to repro slot hash expiry bug (#21873) 2022-01-10 00:58:21 -05:00
Justin Starry 52d12cc802
Add runtime support for address table lookups (#22223)
* Add support for address table lookups in runtime

* feedback

* feedback
2022-01-07 11:59:09 +08:00
Trent Nelson 390ef0fbcd Consolidate process instruction execution timings to own struct 2022-01-06 03:56:46 -07:00
Trent Nelson 848b6dfbdd Add metrics for executor creation 2022-01-06 03:56:46 -07:00
Carl Lin b25e4a200b Add execute metrics 2022-01-06 03:56:46 -07:00
Trent Nelson 7d32909e17 move `ExecuteTimings` from `runtime::bank` to `program_runtime::timings` 2022-01-06 03:56:46 -07:00
Justin Starry 45458e7139
Refactor: Improve type safety and readability of transaction execution (#22215)
* Refactor Bank::load_and_execute_transactions

* Refactor: improve type safety of TransactionExecutionResult

* Add enum for extra type safety in execution results

* feedback
2022-01-05 10:15:15 +08:00
behzad nouri 4b24499916 removes total-size from return value of recv_mmsg 2022-01-04 21:06:59 +00:00
behzad nouri 01a096adc8 adds bitflags to Packet.Meta
Instead of a separate bool type for each flag, all the flags can be
encoded in a type-safe bitflags encoded in a single u8:
https://github.com/solana-labs/solana/blob/d6ec103be/sdk/src/packet.rs#L19-L31
2022-01-04 13:53:40 +00:00
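A hand-rolled sketch of the idea in the commit above: several boolean flags packed into one `u8` behind a dedicated type. The linked source presumably relies on the `bitflags` crate; this version avoids external dependencies, and the flag names and bit positions are assumptions chosen for illustration.

```rust
/// Hypothetical flag set stored in a single byte.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct PacketFlags(u8);

impl PacketFlags {
    // Illustrative flag names; each occupies one bit.
    const DISCARD: PacketFlags = PacketFlags(0b0001);
    const FORWARDED: PacketFlags = PacketFlags(0b0010);
    const REPAIR: PacketFlags = PacketFlags(0b0100);

    /// True if every bit of `other` is set in `self`.
    fn contains(self, other: PacketFlags) -> bool {
        self.0 & other.0 == other.0
    }

    /// Sets the bits of `other`.
    fn insert(&mut self, other: PacketFlags) {
        self.0 |= other.0;
    }

    /// Clears the bits of `other`.
    fn remove(&mut self, other: PacketFlags) {
        self.0 &= !other.0;
    }
}
```

Compared with one `bool` field per flag, the wrapper keeps the metadata at one byte regardless of how many flags (up to eight) are defined, and the type prevents mixing flags with unrelated integers.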
behzad nouri 73a7741c49 uses std::net::IpAddr type for Packet.Meta.addr 2022-01-04 13:53:40 +00:00
sakridge 2486e21ffe
Lower vote-only-mode to 400 (#22210) 2022-01-04 12:49:14 +01:00
Jeff Biseda ca8fef5855
retransmit consecutive leader blocks (#22157) 2022-01-04 00:24:16 -08:00
Yueh-Hsuan Chiang e8b7f96a89
Add struct BlockstoreOptions (#22121) 2022-01-03 18:30:45 -10:00
carllin 005592998d
Fix bug, add error specific timings (#22225) 2022-01-03 16:33:54 -05:00
behzad nouri 69d71f8f86
removes epoch_authorized_voters from VoteTracker (#22192)
https://github.com/solana-labs/solana/pull/22169
verifies authorized-voter early on in vote-listener pipeline; and so
VoteTracker no longer needs to maintain and check for epoch authorized
voters.
2022-01-03 21:07:47 +00:00
Jeff Biseda 0e4ede46d1
work around rust 39364 for stats_reporter_sender (#22227) 2022-01-03 11:46:02 -08:00
carllin d06e6c7425
Count compute units even when transaction errors (#22182) 2021-12-30 21:21:42 -05:00
Jeff Biseda 95dfcc546a
bypass retransmission for slots without propagated stats (#22176) 2021-12-30 16:07:34 -08:00
behzad nouri c0c6038654
checks for authorized voter early on in the vote-listener pipeline (#22169)
Before votes are verified to be signed by the authorized voter,
they might be dropped in the verified-vote-packets code. If there are
enough spam votes from unauthorized voters, this may
drop valid votes but keep the false ones.
https://github.com/solana-labs/solana/blob/57986f982/core/src/verified_vote_packets.rs#L165-L168
2021-12-30 15:03:14 +00:00
carllin 33d0b5e011
Revert "Count compute units even when transaction errors (#22059)" (#22174)
This reverts commit eaa8c67bde.
2021-12-30 02:42:32 -05:00
Lijun Wang f14928a970
Stream additional block metadata via plugin (#22023)
* Stream additional block metadata through plugin
blockhash, block_height, block_time, rewards are streamed
2021-12-29 15:12:01 -08:00
Justin Starry b1d9a2e60e
Don't forward packets received from TPU forwards port (#22078)
* Don't forward packets received from TPU forwards port

* Add banking stage test
2021-12-29 19:34:31 +01:00
carllin eaa8c67bde
Count compute units even when transaction errors (#22059) 2021-12-28 17:05:11 -05:00
carllin f061059e45
Prevent log spam (#22148) 2021-12-28 17:04:48 -05:00
Tao Zhu 3d6ab96587 push live packets straight to buffer, leader only process packets from buffer 2021-12-28 15:21:24 -06:00
Yueh-Hsuan Chiang b89cd8cd1a
Avoid cloning Vec<Entry> when calling entries_to_test_shreds() (#22093) 2021-12-24 12:32:43 -08:00
Justin Starry 93c776ce19
Refactor packet deduplication and harden bench test (#22080) 2021-12-22 23:05:10 -06:00
Tao Zhu dd80a525ef
Leader QoS service metrics (#21708)
* - qos_service metrics tagged with leader thread ids to separate gossip/tpu votes and transactions;
- qos_service metrics is reported with bank slot;
- replaced timer-based reporting with signal via channel; removed async report test as qos_service now lives within a thread

* - add tpu live packets (eg, not buffered packets) states to qos metrics reporting
2021-12-22 21:39:59 +00:00
behzad nouri 4d62f03297
uses enum instead of trait for VoteTransaction (#22019)
Box<dyn Trait> involves runtime dispatch, has significant overhead and
is slow. It also requires hacky boilerplate code for implementing Clone
or other basic traits:
https://github.com/solana-labs/solana/blob/e92a81b74/programs/vote/src/vote_state/mod.rs#L70-L102

Only limited known types can be VoteTransaction and they are all defined
in the same crate. So using a trait here only adds overhead.
https://github.com/solana-labs/solana/blob/e92a81b74/programs/vote/src/vote_state/mod.rs#L125-L165
https://github.com/solana-labs/solana/blob/e92a81b74/programs/vote/src/vote_state/mod.rs#L221-L264
2021-12-22 14:25:46 +00:00
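A minimal sketch of the idea behind 4d62f03297, assuming two simplified payload types: the `Vote` and `VoteStateUpdate` structs and their fields below are hypothetical stand-ins, not the definitions in the vote program. Because the set of variants is closed, `Clone`, `Debug` and `PartialEq` can be derived, and calls are resolved by a `match` rather than through a `Box<dyn Trait>` vtable.

```rust
// Hypothetical, simplified payloads; the real vote types carry more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct Vote {
    pub slots: Vec<u64>,
}

#[derive(Clone, Debug, PartialEq)]
pub struct VoteStateUpdate {
    // (slot, confirmation_count) pairs; an assumption made for illustration.
    pub lockouts: Vec<(u64, u32)>,
}

// A closed set of variants: basic traits are derived instead of hand-written.
#[derive(Clone, Debug, PartialEq)]
pub enum VoteTransaction {
    Vote(Vote),
    VoteStateUpdate(VoteStateUpdate),
}

impl VoteTransaction {
    // Static dispatch: each variant is handled by a match arm.
    pub fn slots(&self) -> Vec<u64> {
        match self {
            VoteTransaction::Vote(vote) => vote.slots.clone(),
            VoteTransaction::VoteStateUpdate(update) => {
                update.lockouts.iter().map(|(slot, _)| *slot).collect()
            }
        }
    }

    pub fn last_voted_slot(&self) -> Option<u64> {
        self.slots().last().copied()
    }
}
```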
Tao Zhu 9c5d82557a skip reporting all-zero stats 2021-12-21 16:20:36 -06:00
behzad nouri 65d59f4ef0
tracks erasure coding shreds' indices explicitly (#21822)
The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921

However with the upcoming changes to erasure schema, there will be more
erasure coding shreds than data shreds and we can no longer infer coding
shreds indices from data shreds.

The commit adds constructs to track coding shreds indices explicitly.
2021-12-19 22:37:55 +00:00
behzad nouri 7476dfeec0
removes Select in favor of recv_timeout/try_iter (#21981)
crossbeam_channel::Select::ready_timeout might return with success spuriously.
2021-12-18 17:39:07 +00:00
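The pattern named in 7476dfeec0 can be sketched as follows. This uses `std::sync::mpsc` rather than `crossbeam_channel` so that the example needs no external crate; `recv_batch` is a hypothetical helper name, not a function from the commit.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Waits at most `timeout` for the first item, then drains whatever else is
/// already queued without blocking again.
pub fn recv_batch<T>(
    receiver: &Receiver<T>,
    timeout: Duration,
) -> Result<Vec<T>, RecvTimeoutError> {
    // recv_timeout either yields an item or reports Timeout/Disconnected;
    // there is no "ready" signal that still has to be re-checked afterwards.
    let first = receiver.recv_timeout(timeout)?;
    let mut batch = vec![first];
    // try_iter stops as soon as the queue is empty.
    batch.extend(receiver.try_iter());
    Ok(batch)
}
```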
Jeff Biseda 3fe942ab30
new net-stats require a new table (#21996) 2021-12-18 00:13:16 -08:00
carllin 7f6fb6937a
Ensure AncestorHashesService selects an open port (#21919) 2021-12-18 00:44:01 -05:00
Jeff Biseda 97a1fa10a6
streamer send destination metrics for repair, gossip (#21564) 2021-12-17 15:21:05 -08:00
segfaultdoctor 76098dd42a
RPC Block Subscription (#21787)
* add stuff

* compiling

* add notify block

* wip

* feat: add blockSubscribe pubsub method

* address PR comments

Co-authored-by: Lucas B <buffalu@jito.network>
Co-authored-by: Zano <segfaultdoctor@protonmail.com>
2021-12-17 16:03:09 -07:00
behzad nouri 89d66c3210
removes next_shred_index from return value of entries to shreds api (#21961)
next-shred-index is already readily available from returned data shreds.
The commit simplifies the api for upcoming changes to erasure coding
schema which will require explicit tracking of indices for coding shreds
as well as data shreds.
2021-12-17 15:01:55 +00:00
Jeff Biseda 7ec39f5a1e
time based retransmit in replay_stage (#21498) 2021-12-17 05:44:40 -08:00
carllin 385efae4b3
Remove need to send bank in retransmit request from ReplayStage (#21943)
* Remove need to send bank in retransmitter
2021-12-16 21:11:01 -05:00
Justin Starry 6ff0be6a82
Clean up demote program write lock feature (#21949)
* Clean up demote program write lock feature

* fix test
2021-12-16 17:27:22 -05:00
carllin cb395abff7
Fix subtraction overflow (#21871) 2021-12-14 14:24:22 -05:00
behzad nouri 8d980f07ba
uses Option<Slot> for SlotMeta.parent_slot (#21808)
SlotMeta.parent_slot for the head of a detached chain of slots is
unknown and that is indicated by u64::MAX which lacks type-safety:
https://github.com/solana-labs/solana/blob/6c108c8fc/ledger/src/blockstore_meta.rs#L203-L205

The commit changes the type to Option<Slot>. Backward compatibility is
maintained by customizing serde serialize/deserialize implementations.
2021-12-14 18:57:11 +00:00
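A rough sketch of the compatibility mapping described in 8d980f07ba, assuming the legacy encoding stores `u64::MAX` for an unknown parent. The function names are hypothetical, and the commit itself does this inside custom serde serialize/deserialize implementations rather than free functions.

```rust
pub type Slot = u64;

/// Encodes the in-memory value into the legacy representation,
/// where u64::MAX stands for "parent unknown".
pub fn parent_slot_to_legacy(parent_slot: Option<Slot>) -> u64 {
    parent_slot.unwrap_or(u64::MAX)
}

/// Decodes the legacy representation back into an Option.
pub fn parent_slot_from_legacy(raw: u64) -> Option<Slot> {
    if raw == u64::MAX {
        None
    } else {
        Some(raw)
    }
}
```

With this mapping, callers have to handle the `None` case explicitly instead of remembering to compare against a sentinel.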
behzad nouri 4ceb2689f5
adds ShredId uniquely identifying each shred (#21820) 2021-12-14 17:34:02 +00:00
Jeff Washington (jwash) 90f41fd9b7
use cost model to limit new account creation (#21369)
* use cost model to limit new account creation

* handle every system instruction

* remove &

* simplify match

* simplify match

* add datapoint for account data size

* add postgres error handling

* handle accounts:unlock_accounts
2021-12-12 14:57:18 -06:00
behzad nouri e08139f949
uses Option<u64> for SlotMeta.last_index (#21775)
SlotMeta.last_index may be unknown and current code is using u64::MAX to
indicate that:
https://github.com/solana-labs/solana/blob/6c108c8fc/ledger/src/blockstore_meta.rs#L169-L174

This lacks type-safety and can introduce bugs if not always checked for.
Several instances of slot_meta.last_index + 1 are also subject to
overflow.

This commit updates the type to Option<u64>. Backward compatibility is
maintained by customizing serde serialize/deserialize implementations.
2021-12-11 14:47:20 +00:00
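The overflow concern in e08139f949 can be illustrated with a small sketch. `num_shreds` and `is_full` are hypothetical helpers written for this example; they only assume that `last_index` is the index of the final shred in a slot and `consumed` is the count of contiguous shreds received.

```rust
/// Number of shreds in the slot, if the last index is known and
/// the count fits in a u64.
pub fn num_shreds(last_index: Option<u64>) -> Option<u64> {
    // checked_add returns None instead of overflowing, so an index of
    // u64::MAX cannot silently wrap around to zero.
    last_index.and_then(|index| index.checked_add(1))
}

/// A slot is full once every shred up to and including last_index is present.
pub fn is_full(consumed: u64, last_index: Option<u64>) -> bool {
    num_shreds(last_index) == Some(consumed)
}
```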
Justin Starry 254ef3e7b6
Rename Packets to PacketBatch (#21794) 2021-12-11 09:44:15 -05:00
behzad nouri 8063273d09
adds more sanity checks to shreds (#21675) 2021-12-09 16:43:57 +00:00
Ashwin Sekar f0acf7681e
Add vote instructions that directly update on chain vote state (#21531)
* Add vote state instructions

UpdateVoteState and UpdateVoteStateSwitch

* cargo tree

* extract vote state version conversion to common fn
2021-12-07 16:47:26 -08:00
behzad nouri cd17f63d81
adds back position field to coding-shred-header (#21600)
https://github.com/solana-labs/solana/pull/17004
removed position field from coding-shred-header because as it stands the
field is redundant and unused.
However, with the upcoming changes to erasure coding schema this field
will no longer be redundant and needs to be populated.
2021-12-05 14:42:09 +00:00
Jeff Biseda 9c6b95e1e1
fix distance calculation in get_closest_completion (#21601) 2021-12-03 22:36:46 -08:00
Justin Starry 1430b58a6d
Remove deprecated slow epoch boundary methods (#21568) 2021-12-03 17:59:10 +00:00
Michael Vines b8837c04ec Reformat imports to a consistent style for imports
rustfmt.toml configuration:
  imports_granularity = "One"
  group_imports = "One"
2021-12-03 09:19:13 -08:00
Alexander Meißner b78f5b6032
Refactor: Cleanup InstructionProcessor (#21404)
* Moves create_message(), native_invoke() and process_cross_program_instruction()
from the InstructionProcessor to the InvokeContext so that they can have a useful "self" parameter.

* Moves InstructionProcessor into InvokeContext and Bank.

* Moves ExecuteDetailsTimings into its own file.

* Moves Executor into invoke_context.rs

* Moves PreAccount into its own file.

* impl AbiExample for BuiltinPrograms
2021-12-01 08:54:42 +01:00
Michael Vines ba9dfa0d22 Remove frozen account support 2021-11-29 08:38:11 -08:00
Tao Zhu 9edfc5936d
Refactor accounts.rs with Justin's comments to improve lock accounts (#21406)
with results code path.
- fix a bug that could unlock accounts that weren't locked
- add test to the refactored function
- skip enumerating transaction accounts if qos results is an error
- add #[must_use] annotation
- avoid clone error in results
- add qos error code to unlock_accounts match statement
- remove unnecessary AbiExample
2021-11-23 21:17:55 +00:00
Lijun Wang c29838fce1
Accountsdb plugin transaction part 3: Transaction Notifier (#21374)
The TransactionNotifierInterface interface for notifying transactions.
Changes to transaction_status_service to notify the notifier of the transaction data.
Interface to query the plugin's interest in transaction data
2021-11-23 09:55:53 -08:00
Tao Zhu 2602e7c3bc
Fix flaky test (#21402)
* the async test is flaky on ci

* fix unstable test by increasing stats reporting time
2021-11-23 09:47:17 -06:00
behzad nouri dd338b6c9f
changes Shred::parent return type to Option<Slot> (#21370)
Shred::parent can return garbage if the struct fields are invalid:
https://github.com/solana-labs/solana/blob/8a50b6302/ledger/src/shred.rs#L446-L453

The commit adds more sanity checks and changes the return type to Option<Slot>.
2021-11-23 14:45:26 +00:00
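A sketch of the kind of check dd338b6c9f describes, assuming the parent is derived by subtracting a 16-bit offset from the shred's slot. The function name and the exact validity rules are assumptions made for illustration, not a copy of `Shred::parent`.

```rust
/// Returns the parent slot, or None when the fields are inconsistent.
pub fn parent_slot(slot: u64, parent_offset: u16) -> Option<u64> {
    // Assumption: only slot 0 may be its own parent.
    if parent_offset == 0 && slot != 0 {
        return None;
    }
    // checked_sub rejects offsets larger than the slot instead of wrapping.
    slot.checked_sub(u64::from(parent_offset))
}
```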
Tao Zhu cd5a39ee43
the async test is flaky on ci (#21365) 2021-11-22 18:16:20 -06:00
Jeff Washington (jwash) 87831e7f8d
start system monitor earlier in validator so we get memory stats at startup (#21372) 2021-11-22 14:37:17 -06:00
sakridge f31ca8ba8c
Report cluster slots size (#21380) 2021-11-22 17:47:58 +01:00
Jeff Biseda 2ed7e3af89
prioritize slot repairs for unknown last index and close to completion (#21070) 2021-11-19 19:17:30 -08:00
sakridge 0bda0c3e0c
Add bank drop service (#21322) 2021-11-19 17:20:18 +01:00
behzad nouri 48dfdfb4d5 changes Blockstore::is_shred_duplicate arg type to ShredType 2021-11-19 14:16:39 +00:00
behzad nouri 57057f8d39 uses enum for shred type
Current code is using u8 which does not have any type-safety and can
contain invalid values:
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L167

Checks for invalid shred-types are scattered through the code:
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/blockstore.rs#L849-L851
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L346-L348

The commit uses enum for shred type with #[repr(u8)]. Backward
compatibility is maintained by implementing Serialize and Deserialize
compatible with u8, and adding a test to assert that.
2021-11-19 14:16:39 +00:00
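A minimal sketch of a `#[repr(u8)]` shred type as described in 57057f8d39. The discriminant values and the `TryFrom` error type are assumptions chosen for this example, and the commit's serde implementations are omitted to keep the block free of external crates.

```rust
use std::convert::TryFrom;

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ShredType {
    // Discriminants are illustrative assumptions.
    Data = 0b1010_0101,
    Code = 0b0101_1010,
}

impl From<ShredType> for u8 {
    fn from(shred_type: ShredType) -> u8 {
        shred_type as u8
    }
}

impl TryFrom<u8> for ShredType {
    // The rejected byte is returned so callers can report it.
    type Error = u8;

    fn try_from(byte: u8) -> Result<Self, Self::Error> {
        match byte {
            0b1010_0101 => Ok(ShredType::Data),
            0b0101_1010 => Ok(ShredType::Code),
            other => Err(other),
        }
    }
}
```

Invalid bytes are rejected once, at the conversion boundary, so the rest of the code can match on `ShredType` exhaustively.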
carllin b30c94ce55
ClusterInfoVoteListener send only missing votes to BankingStage (#20873) 2021-11-18 15:20:41 -08:00
Tao Zhu 0ca255220e
- Encapsulate QoS Service metrics reporting within QosService, so client (#21191)
code (eg banking_stage) doesn't need to worry about it.
- Remove dead cost_* stats from banking_stage, clean up call path.
2021-11-18 15:35:30 -06:00
Lijun Wang 89c45a57f8
Refactor slot status notification to decouple from accounts notifications (#21308)
Problem

Slot status can be used in other scenarios in addition to account information, such as transactions and blocks. The current implementation is too tightly coupled.

Summary of Changes

Decouple the slot status notification from accounts notification. Created a new slot status notification module.
2021-11-17 17:11:38 -08:00
Jeff Biseda d5de0c8e12
add --no-os-network-stats-reporting option (#21296) 2021-11-16 10:26:03 -08:00
sakridge 398af132a5
More set_root metrics (#21286) 2021-11-15 16:28:18 -07:00
Jeff Washington (jwash) f2bd9947cc
mem stats: rescale from kb to bytes (#21282) 2021-11-15 14:42:41 -06:00
Jeff Washington (jwash) f8dcb2f38b
report mem stats (#21258) 2021-11-13 00:59:41 +00:00
Michael Keleti b0ca335463
Rename "trusted" to "known" in `validators/` (#21197)
* Replaced trusted with known validator

* Format Convention
2021-11-12 11:57:55 -07:00
Tao Zhu 11153e1f87
refactor cost calculation (#21062)
* - cache calculated transaction cost to allow sharing;
- atomic cost tracking op;
- only lock accounts for transactions eligible for current block;
- moved qos service and stats reporting to its own model;
- add cost_weight default to neutral (as 1), vote has zero weight;

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Update core/src/qos_service.rs

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Update core/src/qos_service.rs

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-11-12 01:04:53 -06:00
Ivan Mironov c78f474373 Add validator option to change niceness of snapshot packager thread 2021-11-04 17:16:46 -06:00
Alexander Meißner 7200c5106e
Replaces MockInvokeContext by ThisInvokeContext in tests (#20881)
* Replaces MockInvokeContext by ThisInvokeContext in BpfLoader, SystemInstructionProcessor, CLIs, ConfigProcessor, StakeProcessor and VoteProcessor.

* Finally, removes MockInvokeContext, MockComputeMeter and MockLogger.

* Adjusts assert_instruction_count test.

* Moves ThisInvokeContext to the program-runtime crate.
2021-11-04 21:47:32 +01:00
Justin Starry 140a5f633d
Simplify replay vote tracking by using packet metadata (#21112) 2021-11-03 09:02:48 +00:00
steviez e6280fc1fa
Add additional checks for should_retransmit_and_persist() (#20672)
Add additional checks to should_retransmit_and_persist()

- Check invalid shred index
- Update cases that check if node was leader
- Some comments and variable rename for clarity
2021-11-03 02:01:07 -05:00
sakridge a8d78e89d3
Move test-validator to own module to reduce core dependencies (#20658)
* Move test-validator to own module to reduce core dependencies

* Fix a few TestValidator paths

* Use solana_test_validator crate for solana_test_validator bin

* Move client int tests to separate crate

Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-10-29 01:27:07 +00:00
Justin Starry 036d7fcc81
Clean up sanitized tx creation for tests (#21006) 2021-10-27 18:09:16 +01:00
sakridge 261dd96ae3
Swap banking stage vote channels (#20987) 2021-10-26 21:20:31 +02:00
behzad nouri 1297a13586
adds metrics tracking crds writes and votes (#20953) 2021-10-26 13:02:30 +00:00
Jeff Washington (jwash) 43ea579f63
add cli for --accounts-hash-num-passes (#20827) 2021-10-25 09:45:46 -05:00
Tao Zhu c2bfce90b3
- cost_tracker is data member of a bank, it can report metrics when bank is frozen (#20802)
- removed cost_tracker_stats and histogram
- move stats reporting outside of bank freeze
2021-10-24 22:19:23 -05:00
behzad nouri 5e1cf39c74
adds metrics for number of outgoing shreds in retransmit stage (#20882) 2021-10-24 13:12:27 +00:00
Michael Vines 350bb561eb Clippy 2021-10-23 08:21:20 +00:00
Jack May bfbbc53dac
Divorce the runtime from FeeCalculator (#20737) 2021-10-22 14:32:40 -07:00
Justin Starry 735016661b
Report timing info for stakes cache updates from txs (#20856) 2021-10-22 12:49:02 -04:00
Tao Zhu 71d0bd4605
Add counter for dropped duplicated packets, fix dropped_packets_count (#20834) 2021-10-21 02:56:48 +00:00
Trent Nelson fe098b5ddc rpc-send-tx-svc: add with_config constructor 2021-10-20 13:43:27 -06:00
Jeff Washington (jwash) 95e91a4863
disable gossip publish of snapshots when using filler accts (#20824) 2021-10-20 18:07:29 +00:00
Tao Zhu 7496b5784b
- make cost_tracker a member of bank, remove shared instance from TPU; (#20627)
- decouple cost_model from cost_tracker; allowing one cost_model
  instance to be shared within a validator;
- update cost_model api to calculate_cost(&self...)->transaction_cost
2021-10-19 14:37:33 -05:00
Jeff Biseda 4cac66244d
report udp stats from validator (#20587) 2021-10-15 15:11:11 -07:00
carllin 44ff30b65b
Retry `SampleNotDuplicateConfirmed` decisions in AncestorHashesService (#20240) 2021-10-15 11:40:03 -07:00
behzad nouri 0f03971c3c
adds counters for errors in window-service run_insert (#20670) 2021-10-15 14:13:26 +00:00
behzad nouri 0c0384ec32
revises turbine peers shuffling order (#20480)
Turbine randomly shuffles cluster nodes on a broadcast tree for each
shred. This requires knowing the stakes and nodes' contact-infos (from
gossip).

However gossip is subject to partitioning and propagation delays.
Additionally unstaked nodes may join and leave the cluster at any
moment, changing the cluster view from one node to another.

This commit:
* Always arranges the unstaked nodes at the bottom of turbine broadcast
  tree.
* Staked nodes are always included regardless of if their contact-info
  is available in gossip or not.
* Uses the unbiased WeightedShuffle construct for shuffling nodes.
2021-10-14 15:09:36 +00:00
sakridge 588168b99d
Add check for shred data header size (#20668) 2021-10-14 05:56:14 +02:00
Jack May da45be366a
Remove blockhash from fee calculation (#20641) 2021-10-13 13:10:58 -07:00
Tao Zhu 005d6863fd
- move cost tracker into bank, so each bank has its own cost tracker; (#20527)
- move related modules to runtime
2021-10-12 08:51:33 -05:00
Jeff Washington (jwash) a8e000a2a6
add filler accounts to bloat validator and predict failure (#20491)
* add filler accounts to bloat validator and predict failure

* assert no accounts match filler

* cleanup magic numbers

* panic if can't load from snapshot with filler accounts specified

* some renames

* renames

* into_par_iter

* clean filler accts, too
2021-10-11 12:46:27 -05:00
Michael Vines c16510152e Rework AVX/AVX2 detection again 2021-10-10 12:22:10 -07:00
carllin 838ff3b871
Separate out interrupted slots broadcast metrics (#20537) 2021-10-09 01:46:06 -07:00
Lijun Wang d621994fee
Accountsdb stream plugin improvement (#20419)
Support using connection pooling and use multiple threads to do Postgres db operations. The performance is improved from 1500 RPS to 40,000 RPS measured during validator start.

Support multiple plugins at the same time.
2021-10-08 20:06:58 -07:00
Brooks Prumo 5440c1d2e1
SnapshotPackagerService pushes incremental snapshot hashes to CRDS (#20442)
Now that CRDS supports incremental snapshot hashes,
SnapshotPackagerService needs to push 'em!

This commit does two main things:

1. SnapshotPackagerService now knows about incremental snapshot hashes,
   and will push SnapshotPackage::IncrementalSnapshot hashes to CRDS.
2. At startup, when loading from a full + incremental snapshot, the
   hashes need to be passed all the way to SnapshotPackagerService so it
   can push these starting hashes to CRDS.  Those values have been piped
   through.

Fixes #20441 and #20423
2021-10-08 15:14:56 -05:00
Tao Zhu 675fa6993b
- update const cost values with data collected by #19627 (#20314)
- update cost calculation to closely proposed fee schedule #16984
2021-10-08 14:48:50 -05:00
Tao Zhu 0ebd8c53ee
cost model to ignore vote transactions (#20510) 2021-10-07 12:49:07 -05:00
Tao Zhu 177a375479
Tpu vote 1.7 (#20187) (#20494)
* Add separate vote processing tpu port

* Add feature to send to tpu vote port

* Add vote rejecting sigverify mode

* use packet.meta.is_simple_vote_tx in place of deserialization

* consolidate code that identifies vote tx at common path for cpu and gpu

* new key for feature set

* banking forward tpu vote

* add tpu vote port to dockerfile and other review changes

* Simplify thread id compare

* fix a test; updated cluster_info ABI change

Co-authored-by: Tao Zhu <tao@solana.com>

Co-authored-by: sakridge <sakridge@gmail.com>
2021-10-07 09:38:23 +00:00
Michael Vines 7027d56064 Resolve nightly-2021-10-05 clippy complaints 2021-10-06 10:37:58 -07:00
Tao Zhu 03913f6661
add tx count and thread id to stats, each stat reports and resets when slot changes (#20451) 2021-10-06 00:09:19 -05:00
Justin Starry 129716f3f0
Optimize stakes cache and rewards at epoch boundaries (#20432)
* Optimize stakes cache and rewards at epoch boundaries

* Fetch from accounts db

* Add cli flag for disabling epoch boundary optimization
2021-10-06 00:53:26 -04:00
Tao Zhu 6ff508c643
add transaction cost histogram metrics (#20350) 2021-10-05 08:57:39 -05:00
Brooks Prumo 4cd50f5d45
Don't gossip more snapshot hashes than what we retain (#20379) 2021-10-01 15:59:45 -05:00
Lijun Wang fe97cb2ddf
AccountsDb plugin framework (#20047)
Summary of Changes

Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows:

* Data stores of connection strings/credentials to be configured,
* Accounts with patterns to be streamed
* PostgreSQL implementation of the streaming for different destination stores to be plugged in.

The code comprises 4 major parts:

* accountsdb-plugin-intf: defines the plugin interface which concrete plugins should implement.
* accountsdb-plugin-manager: manages the load/unload of plugins and provides interfaces through which the validator can notify plugins of account updates.
* accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL.
* The validator integrations: updates are streamed right after snapshot restore and after account updates from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
2021-09-30 14:26:17 -07:00
Jeff Biseda 3854cfaa00
Use batch_send in forward_buffered_packets (#20330) 2021-09-29 20:49:43 -07:00
sakridge 94668c95c2
Prune sigverify queue (#20331) 2021-09-30 05:41:05 +02:00
Brooks Prumo 3ea6a01254
Only gossip snapshot hashes for full snapshots (#20271) 2021-09-27 19:29:08 -05:00
Jeff Biseda 640e93187c
periodically report sigverify_stage stats (#19674) 2021-09-21 10:37:58 -07:00
sakridge 013e1d9d49
Limit transaction forwarding from banking_stage (#19940) 2021-09-21 08:49:41 -07:00
carllin e6b4dd3866
Add bank to banking stage regardless of if there is a working bank (#19855) 2021-09-17 16:55:53 -07:00
Pavel Strakhov 65227f44dc
Optimize RPC pubsub for multiple clients with the same subscription (#18943)
* reimplement rpc pubsub with a broadcast queue

* update tests for new pubsub implementation

* fix: fix review suggestions

* chore(rpc): add additional pubsub metrics

* integrate max subscriptions check into SubscriptionTracker to reduce locking

* separate subscription control from tracker

* limit memory usage of items in pubsub broadcast queue, improve error handling

* add more pubsub metrics

* add final count metrics to pubsub

* add metric for total number of subscriptions

* fix small review suggestions

* remove by_params from SubscriptionTracker and add node_progress_watchers map instead

* add subscription tracker tests

* add metrics for number of pubsub notifications as a counter

* ignore clippy lint in TokenCounter

* fix underflow in token counter

* reduce queue capacity in pubsub tests

* fix(rpc): fix test timeouts

* fix race in account subscription test

* Add RpcSubscriptions::new_for_tests

Co-authored-by: Pavel Strakhov <p.strakhov@iconic.vc>
Co-authored-by: Nikita Podoliako <n.podoliako@zubr.io>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-09-17 13:40:14 -06:00
sakridge dc69cc1ae4
Only allow votes when root distance gets too high (#19917) 2021-09-16 15:12:26 +02:00
Justin Starry ca3f147670
Add banking metrics for buffered and dropped packets (#19902) 2021-09-15 15:53:55 -05:00
Tao Zhu 67fa9945e1
Add few more metrics data points (#19624)
* Add slot, count and accumulated-units to per-program-timings for determining transaction cost elements

* correct the stats naming; fixes the dirty bit resetting
2021-09-15 09:49:49 -05:00
Justin Starry 34c1a9ac85
Report consumed_buffered_packets_count stat to metrics (#19900) 2021-09-15 14:19:39 +00:00
Tyera Eulberg c91519961c
Use f64 for stake math in get_stake_percent_in_gossip (#19895) 2021-09-14 23:36:30 -06:00
Michael 4ff50519ff
Add an info log to indicate the node has reached supermajority and print the active stake percentage (#19893) 2021-09-14 21:48:15 -06:00
Jeff Washington (jwash) b57e86abf2
cache account hash info (#19426)
* cache account hash info

* ledger_path -> accounts_hash_cache_path
2021-09-13 20:39:26 -05:00
carllin 87a7f00926
Track reset bank in PohRecorder (#19810) 2021-09-13 16:55:35 -07:00
Brooks Prumo 62c8bcf565
Add default() to SnapshotConfig (#19776) 2021-09-12 13:44:27 -05:00
Brooks Prumo 7aa5f6b833
Add CLI args for incremental snapshots (#19694)
Add `--incremental-snapshots` flag to enable incremental snapshots.
This will allow setting `--full-snapshot-interval-slots` and
`--incremental-snapshot-interval-slots`.

Also added `--maximum-incremental-snapshots-to-retain`.

Co-authored-by: Michael Vines <mvines@gmail.com>
2021-09-10 15:59:26 -05:00
Michael Vines 4386e09710 Reduce wait for supermajority threshold back to 80% 2021-09-09 21:17:35 -07:00
sakridge 3a8c678f62
Remove some copying (#19691) 2021-09-08 18:32:38 +02:00
Jeff Washington (jwash) 456bf15012
AccountsIndexConfig -> AccountsDbConfig (#19687) 2021-09-08 04:30:38 +00:00
Jeff Washington (jwash) d3f938f0cf
Remove Copy from AccountsIndexConfig. Not all types will support it (#19686) 2021-09-07 20:09:40 -05:00
Brooks Prumo a0552e5b46
Make startup aware of Incremental Snapshots (#19600) 2021-09-07 20:43:43 +00:00
behzad nouri 01a7ec8198
uses rayon thread-pool for retransmit-stage parallelization (#19486) 2021-09-07 15:15:01 +00:00
Brooks Prumo fe8ba81ce6
Rename to is_valid instead of is_invalid (#19670) 2021-09-07 09:31:54 -05:00
Brooks Prumo 9d9482b9d8
Plumb `maximum_incremental_snapshot_archives_to_retain` (#19640) 2021-09-06 18:01:56 -05:00
Sean Young d461a9ac10 verify_precompiles needs FeatureSet
Rather than pass in individual features, pass in the entire feature set
so that we can add the ed25519 program feature in a later commit.
2021-09-05 18:59:37 +01:00
Tyera Eulberg decec3cd8b
Demote write locks on transaction program ids (#19593)
* Add feature

* Demote write lock on program ids

* Fixup bpf tests

* Update MappedMessage::is_writable

* Comma nit

* Review comments
2021-09-04 03:05:30 +00:00
Brooks Prumo 1828579580
Pass SnapshotConfig to SnapshotPackagerService (#19616) 2021-09-03 21:42:32 +00:00
Brooks Prumo 5e25ee5ebe
Add maximum_incremental_snapshot_archives_to_retain to SnapshotConfig (#19612) 2021-09-03 20:21:32 +00:00
Brooks Prumo 7ab0aec61f
Rename maximum_full_snapshot_archives_to_retain (#19610)
To prepare for adding maximum_incremental_snapshot_archives_to_retain,
rename the current field in SnapshotConfig.
2021-09-03 11:28:10 -05:00
Brooks Prumo e9374d32a3
Revert "Make startup aware of Incremental Snapshots (#19550)" (#19599)
This reverts commit d45ced0a5d.
2021-09-02 19:14:41 -05:00
Brooks Prumo d45ced0a5d
Make startup aware of Incremental Snapshots (#19550) 2021-09-02 19:05:15 -05:00
Jeff Biseda 7a8eba10b2
add synchronization comment to handle_new_root (#19571) 2021-09-02 13:52:14 -07:00
Lijun Wang 8378e8790f
Accountsdb replication installment 2 (#19325)
This is the 2nd installment for the AccountsDb replication.

Summary of Changes

The basic google protocol buffer protocol for replicating updated slots and accounts. tonic/tokio is used for transporting the messages.

The basic framework of the client and server for replicating slots and accounts -- the persisting of accounts in the replica-side will be done at the next PR -- right now -- the accounts are streamed to the replica-node and dumped. Replication for information about Bank is also not done in this PR -- to be addressed in the next PR to limit the change size.

Functionality used by both the client and server side are encapsulated in the replica-lib crate.

There is no impact to the existing validator by default.

Tests:

Observe the confirmed slots replicated to the replica-node.
Observe the accounts for the confirmed slot are received at the replica-node side.
2021-09-01 14:10:16 -07:00
behzad nouri 6d9818b8e4
skips retransmit for shreds with unknown slot leader (#19472)
Shreds' signatures should be verified before they reach retransmit
stage, and if the leader is unknown they should fail signature check.
Therefore retransmit-stage can as well expect to know who the slot
leader is and otherwise just skip the shred.

Blockstore checking signature of recovered shreds before sending them to
retransmit stage:
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/blockstore.rs#L884-L930

Shred signature verifier:
https://github.com/solana-labs/solana/blob/4305d4b7b/core/src/sigverify_shreds.rs#L41-L57
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/sigverify_shreds.rs#L105
2021-09-01 15:44:26 +00:00
Brooks Prumo 1d5a8ebc6a
Revert "Add LastFullSnapshotSlot to SnapshotConfig (#19341)" (#19529)
This reverts commit 4d361af976.
2021-08-31 22:03:19 -05:00
Brooks Prumo fe9ee9134a
Make background services aware of incremental snapshots (#19401)
AccountsBackgroundService now knows about incremental snapshots.  It is
now also in charge of deciding if an AccountsPackage is destined to be a
SnapshotPackage or not (or just used by AccountsHashVerifier).

!!! New behavior changes !!!

Taking snapshots (both bank and archive) **MUST** succeed.

This is required because of how the last full snapshot slot is
calculated, which is used by AccountsBackgroundService when calling
`clean_accounts()`.

File system calls are now unwrapped and will result in a crash. As Trent told me:

>Well I think if a snapshot fails due to some IO error, it's very likely that the operator is going to have to intervene before it works.  We should exit error in this case, otherwise the validator might happily spin for several more hours, never successfully writing a complete snapshot, before something else brings it down.  This would leave the validator's last local snapshot many more slots behind than it would be had we exited outright and potentially force the operator to abandon ledger continuity in favor of a quick catchup

Other errors will set the `exit` flag to `true`, and the node will gracefully shutdown.

Fixes #19167 
Fixes #19168
2021-08-31 18:33:27 -05:00
Tyera Eulberg a3bef2e537
Fix shreds-to-hours/days estimations (#19477) 2021-08-30 13:16:06 -06:00
behzad nouri 8ad52fa095
implements copy-on-write for vote-accounts (#19362)
Bank::vote_accounts redundantly clones vote-accounts HashMap even though
an immutable reference will suffice:
https://github.com/solana-labs/solana/blob/95c998a19/runtime/src/bank.rs#L5174-L5186

This commit implements copy-on-write semantics for vote-accounts by
wrapping the underlying HashMap in Arc<...>.
2021-08-30 15:54:01 +00:00
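The copy-on-write approach in 8ad52fa095 can be sketched with `Arc::make_mut` from the standard library. The `VoteAccounts` wrapper below is a simplified, hypothetical stand-in that maps a string key to a stake, not the runtime's actual type.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Clone, Default)]
pub struct VoteAccounts {
    // Hypothetical layout: key -> stake. Cloning the wrapper clones the Arc only.
    accounts: Arc<HashMap<String, u64>>,
}

impl VoteAccounts {
    pub fn get(&self, key: &str) -> Option<u64> {
        self.accounts.get(key).copied()
    }

    pub fn insert(&mut self, key: &str, stake: u64) {
        // Arc::make_mut copies the map only if another handle still shares it.
        Arc::make_mut(&mut self.accounts).insert(key.to_string(), stake);
    }

    pub fn shares_storage_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.accounts, &other.accounts)
    }
}
```

Readers that only need a snapshot pay for a reference-count increment; the full copy is deferred until a writer actually mutates shared state.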
carllin 84db04ce6c
Fix duplicate broadcast test (#19365) 2021-08-27 17:53:24 -07:00
Justin Starry 2d7f036afd
Add solana-program-runtime crate (#19438) 2021-08-27 00:30:36 +00:00
Brooks Prumo 6d939811e9
Name snapshots consistently (#19346)
#### Problem

Snapshot names are overloaded, and there are multiple terms that mean the same thing. This is confusing. Here's a list of ones in the codebase that I've found:

```
- snapshot_dir
- snapshots_dir
- snapshot_path
- snapshot_output_dir
- snapshot_package_output_path
- snapshot_archives_dir
```

#### Summary of Changes

For all the ones that are about the directory where snapshot archives are stored, ensure they are `snapshot_archives_dir`. For the ones about the (bank) snapshots directory, set to `bank_snapshots_dir`.


Co-authored-by: Michael Vines <mvines@gmail.com>
2021-08-21 15:41:03 -05:00
Brooks Prumo 4d361af976
Add LastFullSnapshotSlot to SnapshotConfig (#19341) 2021-08-20 17:06:53 +00:00
behzad nouri 1deb4add81
removes Slot from TransmitShreds (#19327)
An earlier version of the code was funneling through stakes along with
shreds to broadcast:
https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127

This was changed to only slots as stakes computation was pushed further
down the pipeline in:
https://github.com/solana-labs/solana/pull/18971

However shreds themselves embody which slot they belong to. So pairing
them with slot is redundant and adds room for bugs should they become
inconsistent.
2021-08-20 13:48:33 +00:00
Trent Nelson e0bc5fa690 validator: Trusted validators are now called known validators 2021-08-19 22:43:49 -06:00
Jack May 3ec33e7d02
Fail secp256k1 if the instruction data looks incorrect (#19300) 2021-08-19 13:13:54 -07:00
Tao Zhu 4982dc20f9
replace function with const var for better readability (#19285) 2021-08-19 14:59:53 -05:00
Justin Starry c50b01cb60
Store versioned transactions in the ledger, disabled by default (#19139)
* Add support for versioned transactions, but disable by default

* merge conflicts

* trent's feedback

* bump Cargo.lock

* Fix transaction error encoding

* Rename legacy_transaction method

* cargo clippy

* Clean up casts, int arithmetic, and unused methods

* Check for duplicates in sanitized message conversion

* fix clippy

* fix new test

* Fix bpf conditional compilation for message module
2021-08-17 15:17:56 -07:00
Jeff Washington (jwash) 7c70f2158b
accounts_index_bins to AccountsIndexConfig (#19257)
* accounts_index_bins to AccountsIndexConfig

* rename param bins -> config

* rename BINS_FOR* to ACCOUNTS_INDEX_CONFIG_FOR*
2021-08-17 14:50:01 -05:00
Brooks Prumo f9986c66b8
Make SnapshotPackagerService aware of Incremental Snapshots (#19254)
Add a field to SnapshotPackage that is an enum for SnapshotType, so archive_snapshot_package() will do the right thing.

Fixes #19166
2021-08-17 13:01:59 -05:00
behzad nouri 7a8807b8bb retransmits shreds recovered from erasure codes
Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.
2021-08-17 13:44:10 +00:00
behzad nouri 3efccbffab sends shreds (instead of packets) to retransmit stage
Working towards channelling through shreds recovered from erasure codes
to retransmit stage.
2021-08-17 13:44:10 +00:00
behzad nouri 6e413331b5 removes erroneous uses of Arc<...> from retransmit stage 2021-08-17 13:44:10 +00:00
behzad nouri 8198a7eae1 adds packet/shred count stats to window-service
Adding back these metrics from the earlier commit which removed them
from retransmit stage.
2021-08-17 13:44:10 +00:00
behzad nouri bf437b0336 removes packet-count metrics from retransmit stage
Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are retransmitted as well.

Following commit will add these metrics back to window-service, earlier
in the pipeline.
2021-08-17 13:44:10 +00:00
behzad nouri 563aec0b4d
discards epoch-slots epochs ahead of the current root (#19256)
Cross cluster gossip contamination is causing cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
https://github.com/solana-labs/solana/issues/17789

If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by shred-versions check,
because one of the variants of the contact-info will have matching
shred-version:
https://github.com/solana-labs/solana/issues/17789#issuecomment-896304969

The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots that are epochs ahead
of the root.
2021-08-17 13:13:28 +00:00
behzad nouri f33b7abffb
adds back cluster partitions to broadcast-duplicates (#19253)
An earlier version of this code was aiming to create a partition by
manipulating stakes, and setting some of them to zero:
https://github.com/solana-labs/solana/blob/cde146155/core/src/broadcast_stage/broadcast_duplicates_run.rs#L65-L116

https://github.com/solana-labs/solana/pull/18971
moved stakes computation further down the stream, and so that logic
could no longer live there. This commit adds back cluster partitions
by intercepting packets before send.
2021-08-16 22:24:30 +00:00
Michael Vines 3e5ba594e0 Revert `TestValidatorGenesis::start()` to v1.7.8 signature; add `TestValidatorGenesis::start_with_socket_addr_space()` 2021-08-16 06:37:23 +00:00
Michael Vines b15fa9fbd2 Add EtcdTowerStorage 2021-08-14 09:46:36 -07:00
carllin 22674000bd
Add EpochSlots frozen state transition (#19112) 2021-08-13 14:21:52 -07:00
Brooks Prumo 176036aa58
Rename AccountsPackage to SnapshotPackage and AccountsPackagePre to AccountsPackage (#19231)
Renaming these types to better communicate their usages, which will
further diverge as incremental snapshot support is added.

With the new names, AccountsPackage now refers to the type between
AccountsBackgroundProcess and AccountsHashVerifier, and SnapshotPackage
refers to the type between AccountsHashVerifier and
SnapshotPackagerService.
2021-08-13 16:08:09 -05:00
behzad nouri b64eeb7729 removes erroneous uses of &Arc<...> from window-service 2021-08-13 17:26:31 +00:00
behzad nouri d57398a959 removes repeated bank-forks locking in window-service
Window service is repeatedly locking bank-forks to look-up working-bank
for every single shred:
https://github.com/solana-labs/solana/blob/5fde4ee3a/core/src/window_service.rs#L597-L606

This commit updates shred_filter signature in recv_window so that where
we already obtain the lock on bank-forks, we can also look-up
working-bank once for all packets:
https://github.com/solana-labs/solana/blob/5fde4ee3a/core/src/window_service.rs#L256-L277
2021-08-13 17:26:31 +00:00
Jack May 0b50bb2b20
Deprecate FeeCalculator returning APIs (#19120) 2021-08-13 09:08:20 -07:00
behzad nouri 7a789e0763
filters for recent contact-infos when checking for live stake (#19204)
Contact-infos are saved to disk:
https://github.com/solana-labs/solana/blob/9dfeee299/gossip/src/cluster_info.rs#L1678-L1683

and restored on validator start-up:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L450

Staked nodes entries will not expire until an epoch after. So when the
validator checks for online stake it is erroneously picking up
contact-infos restored from disk, which breaks the entire
wait-for-supermajority logic:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L1515-L1561

This commit adds an extra check for the age of contact-info entries and
filters out old ones.
2021-08-13 12:12:40 +00:00
Tao Zhu 414d904959
Reject blocks for costs above the max block cost (#18994)
* added realtime cost checking logic to reject blocks that would exceed the max limit:
- defines max limits at block_cost_limits.rs
- right after each batch's execution, accumulate its cost and check against the
  limit, return error if limit is exceeded

* update abi that changed due to adding additional TransactionError

* To avoid counting stats multiple times, only accumulate execute-timing when a bank is completed

* gate it by a feature

* move cost const def into block_cost_limits.rs

* redefine the cost for signature and account access, removed signer part as it is not well defined for now

* check if per_program_timings of execute_timings before sending
2021-08-12 10:48:47 -05:00
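The accumulate-then-check step described in the commit above can be sketched roughly as follows. This is an illustrative sketch only: `BlockCostTracker`, `BlockCostError`, and `add_batch_cost` are hypothetical names, and the limit value is supplied by the caller rather than taken from `block_cost_limits.rs`.

```rust
/// Hypothetical error type; the real commit adds a TransactionError variant instead.
#[derive(Debug, PartialEq, Eq)]
pub enum BlockCostError {
    ExceedsMaxBlockCost { accumulated: u64, limit: u64 },
}

/// Hypothetical per-block accumulator (assumed names, assumed cost unit).
pub struct BlockCostTracker {
    limit: u64,
    accumulated: u64,
}

impl BlockCostTracker {
    pub fn new(limit: u64) -> Self {
        Self { limit, accumulated: 0 }
    }

    /// Adds the cost of one executed batch, then checks the running total
    /// against the limit and returns an error once the limit is exceeded.
    pub fn add_batch_cost(&mut self, batch_cost: u64) -> Result<u64, BlockCostError> {
        self.accumulated = self.accumulated.saturating_add(batch_cost);
        if self.accumulated > self.limit {
            return Err(BlockCostError::ExceedsMaxBlockCost {
                accumulated: self.accumulated,
                limit: self.limit,
            });
        }
        Ok(self.accumulated)
    }
}
```

A caller would invoke `add_batch_cost` once per executed batch and reject the block on the first `Err`.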
Jeff Washington (jwash) e91988c977
cli for num account index bins (#19085) 2021-08-11 11:45:25 -05:00
Michael Vines 7ddda30126 `solana-test-validator` now uses FileTowerStorage 2021-08-11 00:20:46 -07:00
Michael Vines e9722474eb Move tower storage into its own module 2021-08-11 00:20:46 -07:00
Michael Vines d7ab510229 Move tower save into the VotingService 2021-08-11 00:20:46 -07:00
behzad nouri 00e5e12906 renames solana_runtime::vote_account::VoteAccount
Rename:
  VoteAccount    -> VoteAccountInner  # the private type
  ArcVoteAccount -> VoteAccount       # the public type
2021-08-10 22:54:17 +00:00
Brooks Prumo ccfa82461b
Pass SnapshotConfig to AccountsHashVerifier (#19154)
AccountsHashVerifier will need access to both the full and incremental
snapshot archive interval slots config values, which is in the
SnapshotConfig.

Also, cleanup some `Option<>` params and their references.
2021-08-10 14:02:34 -05:00
Brooks Prumo fd937548a0
Move SnapshotArchiveInfo and friends into its own module (#19114) 2021-08-08 07:57:06 -05:00
Brooks Prumo 00890957ee
Add snapshot_utils::bank_from_latest_snapshot_archives() (#18983)
While reviewing PR #18565, an issue was brought up to refactor some code
around verifying the bank after rebuilding from snapshots.  A new
top-level function has been added to get the latest snapshot archives
and load the bank then verify.  Additionally, new tests have been
written and existing tests have been updated to use this new function.

Fixes #18973

While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled.  Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`.  And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats.  This bundling simplifies bank
rebuilding.
2021-08-06 20:16:06 -05:00
Michael Vines 397801a2d8 Extract tower storage details from Tower struct 2021-08-06 10:04:37 -07:00
behzad nouri e4be00fece falls back on working-bank if root-bank::epoch-staked-nodes is none
bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).

At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.

To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.
2021-08-05 21:47:33 +00:00
behzad nouri eaf927cf49 allows only one thread to update cluster-nodes cache entry for an epoch
If two threads simultaneously call into ClusterNodesCache::get for the
same epoch, and the cache entry is outdated, then both threads recompute
cluster-nodes for the epoch and redundantly overwrite each other.

This commit wraps ClusterNodesCache entries in Arc<Mutex<...>>, so that
when needed only one thread does the computations to update the entry.
2021-08-05 21:47:33 +00:00
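The per-entry locking idea in the commit above (wrapping each cache entry in `Arc<Mutex<...>>` so only one thread recomputes it) might look roughly like this. It is a minimal sketch, not the actual `ClusterNodesCache`: `PerEntryCache` is a hypothetical type, `u64` stands in for the epoch key, and time-to-live handling is omitted.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical cache: the outer lock guards the map, the inner lock guards one entry.
pub struct PerEntryCache<T> {
    entries: Mutex<HashMap<u64, Arc<Mutex<Option<Arc<T>>>>>>,
}

impl<T> PerEntryCache<T> {
    pub fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    /// Returns the value for `epoch`, computing it at most once even when
    /// several threads ask for the same epoch at the same time.
    pub fn get(&self, epoch: u64, compute: impl FnOnce() -> T) -> Arc<T> {
        // Hold the map lock only long enough to fetch or create the entry handle.
        let slot = {
            let mut entries = self.entries.lock().unwrap();
            entries.entry(epoch).or_default().clone()
        };
        // Threads racing on the same epoch serialize on the entry lock here.
        let mut guard = slot.lock().unwrap();
        if let Some(value) = guard.as_ref() {
            return Arc::clone(value);
        }
        let value = Arc::new(compute());
        *guard = Some(Arc::clone(&value));
        value
    }
}
```

Because the map lock is released before the entry lock is taken, a slow computation for one epoch does not block lookups for other epochs.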
behzad nouri fb69f45f14 adds fallback & metric for when epoch staked-nodes are none 2021-08-05 21:47:33 +00:00
behzad nouri 50d0e830c9 unifies cluster-nodes computation & caching across turbine stages
Broadcast-stage is using epoch_staked_nodes based on the same slot that
shreds belong to:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

But retransmit-stage is using bank-epoch of the working-bank:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/retransmit_stage.rs#L272-L289

So the two are not consistent at epoch boundaries where some nodes may
have a working bank (or similarly a root bank) lagging other nodes. As a
result the node which obtains a packet may construct turbine broadcast
tree inconsistently with its parent node in the tree and so some packets
may fail to reach all nodes in the tree.
2021-08-05 21:47:33 +00:00
behzad nouri aa32738dd5 uses cluster-nodes cache in broadcast-stage
* Current caching mechanism does not update cluster-nodes when the epoch
  (and so epoch staked nodes) changes:
  https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344

* Additionally, the cache update has a concurrency bug in which the
  thread which does compare_and_swap may be blocked when it tries to
  obtain the write-lock on cache, while other threads will keep running
  ahead with the outdated cache (since the atomic timestamp is already
  updated).

In the new ClusterNodesCache, entries are keyed by epoch, and so if
epoch changes cluster-nodes will be recalculated. The time-to-live
eviction policy is also encapsulated and rigidly enforced.
2021-08-05 21:47:33 +00:00
behzad nouri 30bec3921e uses cluster-nodes cache in retransmit stage
The new cluster-nodes cache will:
  * ensure cluster-nodes are recalculated if the epoch (and so the epoch
    staked nodes) changes.
  * encapsulate time-to-live eviction policy.
2021-08-05 21:47:33 +00:00
behzad nouri ecc1c7957f implements cluster-nodes cache
Cluster nodes are cached keyed by the respective epoch from which stakes
are obtained, and so if epoch changes cluster-nodes will be recomputed.

A time-to-live eviction policy is enforced to refresh entries in case
gossip contact-infos are updated.
2021-08-05 21:47:33 +00:00
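An epoch-keyed cache with a time-to-live, as described in the commit above, could be sketched like this. The names (`EpochCache`, `get`) are hypothetical and the sketch is single-threaded; it only illustrates that a new epoch is a new key, and that an old entry is recomputed once its TTL has elapsed.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Hypothetical epoch-keyed cache; not the actual ClusterNodesCache.
pub struct EpochCache<T> {
    ttl: Duration,
    entries: HashMap<u64, (Instant, Arc<T>)>,
}

impl<T> EpochCache<T> {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, entries: HashMap::new() }
    }

    /// Returns the cached value for `epoch`, recomputing it when the entry
    /// is missing or older than the time-to-live.
    pub fn get(&mut self, epoch: u64, compute: impl FnOnce() -> T) -> Arc<T> {
        let now = Instant::now();
        if let Some((created, value)) = self.entries.get(&epoch) {
            if now.duration_since(*created) < self.ttl {
                return Arc::clone(value);
            }
        }
        // Missing or expired: recompute and overwrite the entry.
        let value = Arc::new(compute());
        self.entries.insert(epoch, (now, Arc::clone(&value)));
        value
    }
}
```

Keying by epoch means an epoch change never reuses stale stakes, while the TTL bounds how long outdated gossip contact-infos can be served within one epoch.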
behzad nouri 44b11154ca sends slots (instead of stakes) through broadcast flow
Current broadcast code is computing stakes for each slot before sending
them down the channel:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

Since the stakes are a function of epoch the slot belongs to (and so
does not necessarily change from one slot to another), forwarding the
slot itself would allow better caching downstream.

In addition we need to invalidate the cache if the epoch changes (which
the current code does not do), and that requires knowing which slot (and
so epoch) the currently broadcast shreds belong to:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344
2021-08-05 21:47:33 +00:00
Jeff Washington (jwash) e368f10973
add _for_tests to new_no_wallclock_throttle (#19086) 2021-08-05 14:50:25 -05:00
Jeff Washington (jwash) a9014ceceb
Bank::default_for_tests() (#19084) 2021-08-05 11:53:29 -05:00
behzad nouri 40914de811 updates cluster-slots with root-bank instead of root-slot + bank-forks
ClusterSlots::update is taking both root-slot and bank-forks only to
later lookup root-bank from bank-forks, which is redundant. Also
potentially by the time bank-forks is locked to obtain root-bank,
root-slot may have already changed and so be inconsistent with the
root-slot passed in as the argument.
https://github.com/solana-labs/solana/blob/6d95d679c/core/src/cluster_slots.rs#L32-L39
https://github.com/solana-labs/solana/blob/6d95d679c/core/src/cluster_slots.rs#L122
2021-08-05 14:43:06 +00:00
behzad nouri 2fc112edcf removes unused code from cluster-slots 2021-08-05 14:43:06 +00:00
Jeff Washington (jwash) bf16b0517c
add _for_tests to setup_bank_and_vote_pubkeys (#19060) 2021-08-05 08:43:35 -05:00
Jeff Washington (jwash) 14361906ca
for all tests, bank::new -> bank::new_for_tests (#19064) 2021-08-05 08:42:38 -05:00
Jeff Washington (jwash) 3280ae3e9f
add validator option --accounts-db-skip-shrink (#19028)
* add validator option --accounts-db-skip-shrink

* typo
2021-08-04 17:28:33 -05:00
Jeff Washington (jwash) 1ed12a07ab
introduce Bank::new_for_tests (#19062) 2021-08-04 15:06:57 -05:00
Brooks Prumo ca14475085
Add incremental_snapshot_archive_interval_slots to SnapshotConfig (#19026)
This commit also renames `snapshot_interval_slots` to
`full_snapshot_archive_interval_slots`, updates the comments on the
fields, and make appropriate updates where SnapshotConfig is used.
2021-08-04 14:40:20 -05:00
Trent Nelson 06a7a9e544 remove superfluous `collect()`s 2021-08-04 07:21:55 +00:00
carllin 03353d500f
Actively manage dead slots in AncestorHashesService (#18912) 2021-08-02 14:33:28 -07:00
behzad nouri 049fb0417f
allows sendmmsg api taking owned values (as well as references) (#18999)
Current signature of api in sendmmsg requires a slice of inner
references:
https://github.com/solana-labs/solana/blob/fe1ee4980/streamer/src/sendmmsg.rs#L130-L152

That forces the call-site to convert owned values to references even
though doing so is redundant and adds an extra level of indirection:
https://github.com/solana-labs/solana/blob/fe1ee4980/core/src/repair_service.rs#L291

This commit expands the api using AsRef and Borrow traits to allow
calling the method with owned values (as well as references like
before).
2021-07-30 20:58:49 +00:00
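The `AsRef`/`Borrow` approach in the commit above can be illustrated with a small stand-alone function. This sketch does not call `sendmmsg`: `describe_packets` is a hypothetical helper that only shows how one signature accepts both owned values and references.

```rust
use std::borrow::Borrow;
use std::net::SocketAddr;

/// Hypothetical helper: returns (payload length, destination port) per packet.
/// `T` may be `Vec<u8>`, `&[u8]`, etc.; `S` may be `SocketAddr` or `&SocketAddr`.
pub fn describe_packets<S, T>(packets: &[(T, S)]) -> Vec<(usize, u16)>
where
    S: Borrow<SocketAddr>,
    T: AsRef<[u8]>,
{
    packets
        .iter()
        .map(|(data, addr)| {
            // Both conversions are free: they only reborrow the underlying value.
            let data: &[u8] = data.as_ref();
            let addr: &SocketAddr = addr.borrow();
            (data.len(), addr.port())
        })
        .collect()
}
```

With this shape a call-site holding `Vec<(Vec<u8>, SocketAddr)>` can pass it directly, without first building a second vector of references.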
Tao Zhu 5d297ccf96
Cost model uses compute_unit to replace microsecond as cost unit (#18934)
* wip - cost_update_services to log both us and cu for each instruction to determine possible ratio

* replace microsecond with compute_unit as cost unit
2021-07-29 22:19:36 +00:00
Ryo Onodera da480bdb5f
Fix unstable retransmit-num_nodes (#18970) 2021-07-29 17:32:32 +00:00
behzad nouri d06dc6c8a6
shares cluster-nodes between retransmit threads (#18947)
cluster_nodes and last_peer_update are not shared between retransmit
threads, as each thread has its own value:
https://github.com/solana-labs/solana/blob/65ccfed86/core/src/retransmit_stage.rs#L476-L477

Additionally, with shared references, this code:
https://github.com/solana-labs/solana/blob/0167daa11/core/src/retransmit_stage.rs#L315-L328
has a concurrency bug where the thread which does compare_and_swap,
updates cluster_nodes much later after other threads have run with
outdated cluster_nodes for a while. In particular, the write-lock there
may block.
2021-07-29 16:20:15 +00:00
Trent Nelson 71f6d839f9 validator: remove disused cuda config argument 2021-07-29 03:08:52 +00:00
Trent Nelson 8ed0cd0fff validator: check target CPU features earlier 2021-07-29 03:08:52 +00:00
Trent Nelson c435f7b3e3 validator: add avx2 runtime check 2021-07-29 03:08:52 +00:00
Trent Nelson e641f257ef test-validator: move feature check earlier in startup 2021-07-29 03:08:52 +00:00
Trent Nelson 59641623d1 Improve check for Apple M1 silicon under Rosetta 2021-07-29 03:08:52 +00:00
Jeff Biseda 9255ae334d
drop outstanding_requests lock before sending repair requests (#18893) 2021-07-28 19:30:43 -07:00
sakridge 84e78316b1
Write helper for multithread update (#18808) 2021-07-29 03:16:36 +02:00
Jack May f1b9f97aef
remove avx error on macos (#18923) 2021-07-27 16:34:04 -07:00
carllin c0704d4ec9
Plumb signal from replay to ancestor hashes service (#18880) 2021-07-26 20:59:00 -07:00
carllin 1ee64afb12
Introduce AncestorHashesService (#18812) 2021-07-23 16:54:47 -07:00
behzad nouri d2d5f36a3c
adds validator flag to allow private ip addresses (#18850) 2021-07-23 15:25:03 +00:00
Ryo Onodera 611af87fdb
Really start caching by fixing swapped CAS... (#18842) 2021-07-23 10:17:19 +09:00
Brooks Prumo d1debcd971
Add incremental snapshot utils (#18504)
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks.  This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.

Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use common code paths for the different snapshots.

Also of note, some renaming has happened:
  1. Snapshots are now either `full_` or `incremental_` throughout the
     codebase.  If not specified, the code applies to both.
  2. Bank snapshots now are called "bank snapshots"
     (before they were called "slot snapshots", "bank snapshots", or
      just "snapshots").  The one exception is within `Bank`, where they
     are still just "snapshots", because they are already "bank
     snapshots".
  3. Snapshot archives now have `_archive` in the code.  This
     should clear up an ambiguity between bank snapshots and snapshot
     archives.
2021-07-22 14:40:37 -05:00
behzad nouri 7d56fa8363
sends packets in batches from sigverify-stage (#18446)
sigverify-stage is breaking batches to single-item vectors before
sending them down the channel:
https://github.com/solana-labs/solana/blob/d451363dc/core/src/sigverify_stage.rs#L88-L92

Also simplifying window-service code, reducing number of nested branches.
2021-07-22 14:49:21 +00:00
Michael Vines 61865c0ee0 `solana-validator set-identity` now loads the tower file for the new identity 2021-07-21 22:22:08 -07:00
carllin 588c0464b8
Add sampling logic and DuplicateSlotRepairStatus module (#18721) 2021-07-21 11:15:08 -07:00
behzad nouri bbd22f06f4
implements generic lookups into gossip crds table (#18765)
This commit adds CrdsEntry trait which allows generic lookups into crds
table. For example to get ContactInfo or LowestSlot associated with a
Pubkey, the lookup code would be respectively:
   crds.get::<&ContactInfo>(pubkey)
   crds.get::<&LowestSlot>(pubkey)
2021-07-21 12:16:26 +00:00
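The trait-driven lookup in the commit above, where the requested type selects what is fetched, can be sketched with a toy table. Everything here is a simplified assumption: `Table` and `Entry` are hypothetical stand-ins for the crds table and `CrdsEntry`, the two value structs carry made-up fields, and a `u32` key replaces `Pubkey`.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub struct ContactInfo {
    pub gossip_port: u16, // made-up field for illustration
}

#[derive(Debug, PartialEq)]
pub struct LowestSlot {
    pub lowest: u64, // made-up field for illustration
}

/// Toy table holding one map per value kind.
#[derive(Default)]
pub struct Table {
    contact_infos: HashMap<u32, ContactInfo>,
    lowest_slots: HashMap<u32, LowestSlot>,
}

/// Hypothetical analogue of CrdsEntry: each implementor knows how to find itself.
pub trait Entry<'a>: Sized {
    fn get_entry(table: &'a Table, key: u32) -> Option<Self>;
}

impl<'a> Entry<'a> for &'a ContactInfo {
    fn get_entry(table: &'a Table, key: u32) -> Option<Self> {
        table.contact_infos.get(&key)
    }
}

impl<'a> Entry<'a> for &'a LowestSlot {
    fn get_entry(table: &'a Table, key: u32) -> Option<Self> {
        table.lowest_slots.get(&key)
    }
}

impl Table {
    pub fn insert_contact_info(&mut self, key: u32, value: ContactInfo) {
        self.contact_infos.insert(key, value);
    }

    pub fn insert_lowest_slot(&mut self, key: u32, value: LowestSlot) {
        self.lowest_slots.insert(key, value);
    }

    /// Generic lookup: the type parameter picks the implementation.
    pub fn get<'a, V: Entry<'a>>(&'a self, key: u32) -> Option<V> {
        V::get_entry(self, key)
    }
}
```

Call-sites then read like the ones quoted in the commit message, for example `table.get::<&ContactInfo>(key)` or `table.get::<&LowestSlot>(key)`.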
carllin ce467bea20
Add frozen hashes and marking DuplicateConfirmed in blockstore to state machine (#18648) 2021-07-18 17:04:25 -07:00
behzad nouri e316586516 excludes private ip addresses 2021-07-16 20:05:48 -06:00
Jeff Biseda ae5ad5cf9b
sendmmsg cleanup #18589
Rationalize usage of sendmmsg(2). Skip packets which failed to send and track failures.
2021-07-16 14:36:49 -07:00
Jack May ca71ca3d6d
Accumulate consumed units (#18714) 2021-07-16 12:40:12 -07:00
Justin Starry d166b9856a
Move transaction sanitization earlier in the pipeline (#18655)
* Move transaction sanitization earlier in the pipeline

* Renamed HashedTransaction to SanitizedTransaction

* Implement deref for sanitized transaction

* bring back process_transactions test method

* Use sanitized transactions for cost model calculation
2021-07-15 22:51:27 -05:00
carllin 8a846b048e
Add AncestorHashesRepair type (#18681) 2021-07-15 19:29:53 -07:00
Trent Nelson 3a85b77bb5 hijack secp256k1 enablement feature plumbing for libsecp256k1 upgrade 2021-07-15 18:43:55 +00:00
Trent Nelson 568660b402 Revert "Remove feature switch for secp256k1 program (#18467)"
This reverts commit fd574dcb3b.
2021-07-15 18:43:55 +00:00
sakridge 0f8bcf65af
Add voting service (#18552) 2021-07-15 16:35:51 +02:00
behzad nouri cf31afdd6a
makes CrdsGossip thread-safe (#18615) 2021-07-14 22:27:17 +00:00
Michael Vines b30b32300d `solana-validator set-identity` now works for voting validators 2021-07-14 09:42:35 -07:00
Michael Vines 62d864559f Tower cleanup: reduce fn visibility, remove unnecessary new_with_key() 2021-07-14 09:42:35 -07:00
sakridge 7f2254225e
Move entry/poh to own crate to speed up poh bench build (#18225) 2021-07-14 14:16:29 +02:00
behzad nouri c90af3cd63
removes id from push_lowest_slot args (#18645)
push_lowest_slot cannot sign the new crds-value unless the id (pubkey)
argument passed-in is the same pubkey as in ClusterInfo::keypair(), in
which case the id argument is redundant:
https://github.com/solana-labs/solana/blob/bb41cf346/gossip/src/cluster_info.rs#L824-L845

Additionally, the lookup is done with self.id(), but insert is done with
the id argument, which is logically a bug.
2021-07-13 22:32:59 +00:00
Tao Zhu 350baece21
Explicitly sanitize program id indexes before usage
1. check transaction has valid program_id before using it to avoid possible panic;
2. change calculate_cost function signature to return Result;
3. add CostModelError enum, update return type from Result<_, str> to Result<_, CostModelError>
2021-07-13 17:29:22 -05:00
Michael Vines 4098af3b5b Record vote account commission with voting/staking rewards and surface in RPC 2021-07-12 15:09:44 -07:00
carllin 175083c4c1
Add updated duplicate broadcast test (#18506) 2021-07-10 22:22:07 -07:00
Jack May e9ace3a0d5
cost model nits (#18528) 2021-07-09 12:55:31 -07:00
Justin Starry fd574dcb3b
Remove feature switch for secp256k1 program (#18467)
* Remove feature switch for secp256k1 program

* fix tests
2021-07-09 10:08:03 -05:00
carllin 4d3e301ee4
Introduce slot dumping to ReplayStage (#18160) 2021-07-08 19:07:32 -07:00
Tao Zhu b6dff12923
update ledger tool to restore cost table from blockstore (#18489)
* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test
2021-07-07 23:44:51 -05:00
Michael Vines 1e0942e900 Rename ClusterInfo::send_vote to ClusterInfo::send_transaction 2021-07-07 15:51:14 -07:00
jbiseda a86ced0bac
generate deterministic seeds for shreds (#17950)
* generate shred seed from leader pubkey

* clippy

* clippy

* review

* review 2

* fmt

* review

* check

* review

* cleanup

* fmt
2021-07-07 08:21:12 -07:00
behzad nouri a0551b4054
persists repair-peers cache across repair service loops (#18400)
The repair-peers cache is reset each time repair service loop runs,
and so computed repeatedly for the same slots:
https://github.com/solana-labs/solana/blob/d2b07dca9/core/src/repair_service.rs#L275

This commit uses an LRU cache to persist repair-peers for each slot.
In addition to LRU eviction rules, in order to avoid re-using outdated
data, each entry also has a 10-second TTL.
2021-07-07 14:12:09 +00:00
behzad nouri 04787be8b1
encapsulates turbine peers computations of broadcast & retransmit stages (#18238)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly the same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) require feature
gating to keep the cluster in sync.

Current implementation is scattered over several public methods and
exposes too many implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
2021-07-07 00:35:25 +00:00
Justin Starry 100fabf469
Remove feature switch for demoting sysvar write locks (#18373) 2021-07-06 21:22:22 +00:00
Tao Zhu 0e039b4094
Aggregate cost_model into cost_tracker (#18374)
* aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock.

* Simplified code, removed unused functions

* review fixes
2021-07-06 15:41:25 +00:00
Michael Vines d5c2c72360 Rename Tower::lockouts to Tower::vote_state 2021-07-02 18:35:49 -07:00
Tao Zhu 7cd6224caf
log warning when channel send fails (#18391) 2021-07-02 19:04:09 +00:00
carllin 0eca92de18
Make set roots an iterator (#18357) 2021-07-01 20:02:40 -07:00
Michael Vines b6792a3328 Add ability to change the validator identity at runtime 2021-07-01 17:50:04 -07:00
Brooks Prumo 45d54b1fc6
Add SnapshotArchiveInfo and refactor functions in snapshot_utils (#18232) 2021-07-01 12:20:56 -05:00
Tao Zhu 5e424826ba
Persist cost table to blockstore (#18123)
* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to the compaction exclusion list alongside TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.
2021-07-01 11:32:41 -05:00
Brooks Prumo 89a3e4f91e
Move SnapshotConfig into its own module (#18331)
Also move ArchiveFormat to snapshot_utils, and do not
reexport SnapshotVersion.
2021-07-01 08:55:26 -05:00
sakridge 8d9a6deda4
Add repair number per slot (#18082) 2021-06-30 18:20:07 +02:00
Trent Nelson 02b14caa5f test-validator: hold rent constant with `--slots-per-epoch` 2021-06-30 00:46:12 -06:00
carllin 68c87469c3
Cleanup ReplayStage tests (#18241) 2021-06-28 20:19:42 -07:00
Tao Zhu 9d6f1ebef4
investigate system performance test degradation (#17919)
* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocations when calculating cost, which
is in the hot path - executed per transaction.

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initiate to default

* address potential deadlock by acquiring locks one at a time
2021-06-28 21:34:04 -05:00
sakridge 5d08bf9aa3
More detailed voting timings in replay stage (#18229) 2021-06-26 17:32:08 +02:00
Trent Nelson d269975784 Revert "Clean up build warning"
This reverts commit 17a173ebb5.
2021-06-24 19:57:52 -06:00
Michael Vines 314102cb54 Remove redundant JsonRpcConfig::identity_pubkey field 2021-06-22 17:20:11 -07:00
sakridge e808f34b0b
Add batch stats (#18096) 2021-06-22 15:23:26 +02:00
Michael Vines 3b1517237c Clean up argument names 2021-06-21 21:29:52 -07:00
Michael Vines 84b9de8c18 Shredder no longer holds a keypair 2021-06-21 21:29:52 -07:00
Michael Vines 2435ea3ad8 Remove redundant ReplayStageConfig::my_pubkey field 2021-06-21 21:29:52 -07:00
Michael Vines 51a0007001 serve_repair: Remove internal ContactInfo field duplication 2021-06-21 17:23:49 -07:00
behzad nouri 598093b5db adds shred-version to ip-echo-server response
When starting a validator, the node initially joins gossip with
shred_version = 0, until it adopts the entrypoint's shred-version:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417

Depending on the load on the entrypoint, adopting the entrypoint's
shred-version through gossip is sometimes very slow, and causes
several problems in gossip because we have to partially support
shred_version == 0 which is a source of leaking crds values from one
cluster to another. e.g. see
https://github.com/solana-labs/solana/pull/17899
and the other linked issues there.

In order to remove shred_version == 0 from gossip, this commit adds
shred-version to ip-echo-server response. Once the entrypoints are
updated, on validator start-up, if --expected_shred_version is not
specified we will obtain shred-version from the entrypoint using
ip-echo-server.
2021-06-21 19:37:16 +00:00
Jeff Washington (jwash) ec2f930475
use process.accounts_db_test_hash_calculation for debug_verify hash (#18053) 2021-06-21 10:20:27 -05:00
Michael Vines 4a12c715a3 Drop Error suffix from enum values to avoid the enum_variant_names clippy lint 2021-06-18 23:02:13 +00:00
Alexander Meißner 789f33e8db chore: cargo fmt 2021-06-18 10:42:46 -07:00
Alexander Meißner 6514096a67 chore: cargo +nightly clippy --fix -Z unstable-options 2021-06-18 10:42:46 -07:00
Tyera Eulberg d0511de9a6
chore: bump trees from 0.2.1 to 0.4.2 (#18052)
* chore: bump trees from 0.2.1 to 0.4.2 (#18041)

Bumps [trees](https://github.com/oooutlk/trees) from 0.2.1 to 0.4.2.
- [Release notes](https://github.com/oooutlk/trees/releases)
- [Commits](https://github.com/oooutlk/trees/commits)

---
updated-dependencies:
- dependency-name: trees
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Accommodate field & type changes

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-06-17 22:45:09 +00:00
Lijun Wang 071b1ee3e5
Removed pub from some functions which are actually private to improve encapsulation (#18030)
Remove the pub marker to improve encapsulation. Readability improvement only, no functional impact.
2021-06-17 10:14:21 -07:00
Michael Vines fa04531c7a Extricate RpcCompletedSlotsService from RetransmitStage 2021-06-16 16:20:35 -07:00
Trent Nelson 5bc6c89adc validator: run poh speed test earlier in start up 2021-06-16 21:27:08 +00:00
behzad nouri 161838655c
removes port-based forwarding logic from turbine retransmit (#17716)
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
https://github.com/solana-labs/solana/issues/6672
https://github.com/solana-labs/solana/pull/7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
https://github.com/solana-labs/solana/pull/17351
show that cases where the index does not match the port are very
rare, and therefore this change should be safe.
2021-06-15 13:19:41 +00:00
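
The critical-path rule described in #17716 above can be sketched as follows. This is a minimal illustration, not the code from the commit: it assumes a node counts as being on the critical path when its index in the shuffled cluster is a multiple of the fanout, and the names `Targets`, `retransmit_targets` and the `fanout` parameter are hypothetical.

```rust
/// Which peers a node forwards a packet to (hypothetical type for illustration).
#[derive(Debug, PartialEq, Eq)]
enum Targets {
    NeighborsAndChildren,
    ChildrenOnly,
}

/// Decides the forwarding set from the node's index in the shuffled cluster
/// rather than from the socket the packet arrived on.
///
/// Assumption: a node is on the critical path when its index is a multiple
/// of `fanout`. `fanout` must be greater than zero.
fn retransmit_targets(my_index: usize, fanout: usize) -> Targets {
    assert!(fanout > 0, "fanout must be non-zero");
    if my_index % fanout == 0 {
        Targets::NeighborsAndChildren
    } else {
        Targets::ChildrenOnly
    }
}
```

Because the decision depends only on the locally computed index, a sender cannot change a node's forwarding behavior by choosing which port it sends to.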
carllin ccc013e134
Handle removing slots during account scans (#17471) 2021-06-14 21:04:01 -07:00
sakridge eeee75c5be
Don't use pinned memory when unnecessary (#17832)
There have been reports of excessive GPU memory usage and errors
from cudaHostRegister. In some cases pinning is
not required.
2021-06-14 16:10:04 +02:00
sakridge 0feac57cb0
Don't store votes unless we are leader soon (#17803) 2021-06-11 18:29:05 +02:00
carllin c8535be0e1
Port unconfirmed duplicate tracking logic from ProgressMap to ForkChoice (#17779) 2021-06-11 03:09:57 -07:00
carllin afafa624a3
Account for duplicate before a bank is frozen or replayed (#17866) 2021-06-10 22:28:23 -07:00
Lijun Wang 269d995832
Make account shrink configurable #17544 (#17778)
1. Added an enum with two options for measuring space usage: total accounts usage, or an individual store's shrink ratio. Validator CLI options: --accounts-shrink-optimize-total-space and --accounts-shrink-ratio
2. Added code for selecting candidates based on total usage in a separate function select_candidates_by_total_usage
3. Added unit tests for the new functions
4. The default implementation keeps the 0.8 shrink ratio with --accounts-shrink-optimize-total-space set to true

Fixes #17544
2021-06-09 21:21:32 -07:00
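
The two selection modes listed in #17778 above might look roughly like the sketch below. It is an assumption-laden illustration, not the commit's implementation: the types `Store` and `ShrinkMode`, the function `select_candidates`, and the "shrink the sparsest stores first until the overall alive ratio reaches the target" strategy are all hypothetical.

```rust
/// One append-only account store (hypothetical shape for illustration).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Store {
    id: u32,
    alive_bytes: u64,
    total_bytes: u64,
}

/// The two ways of measuring space usage (hypothetical enum).
enum ShrinkMode {
    /// Shrink every store whose own alive ratio is below the threshold.
    IndividualStore(f64),
    /// Shrink the sparsest stores until the overall alive ratio reaches the target.
    TotalSpace(f64),
}

fn alive_ratio(alive: u64, total: u64) -> f64 {
    if total == 0 { 1.0 } else { alive as f64 / total as f64 }
}

/// Returns the ids of the stores chosen for shrinking.
fn select_candidates(stores: &[Store], mode: ShrinkMode) -> Vec<u32> {
    match mode {
        ShrinkMode::IndividualStore(ratio) => stores
            .iter()
            .filter(|s| alive_ratio(s.alive_bytes, s.total_bytes) < ratio)
            .map(|s| s.id)
            .collect(),
        ShrinkMode::TotalSpace(ratio) => {
            // Sparsest stores first: they reclaim the most space per shrink.
            let mut sorted = stores.to_vec();
            sorted.sort_by(|a, b| {
                alive_ratio(a.alive_bytes, a.total_bytes)
                    .total_cmp(&alive_ratio(b.alive_bytes, b.total_bytes))
            });
            let alive: u64 = stores.iter().map(|s| s.alive_bytes).sum();
            let mut total: u64 = stores.iter().map(|s| s.total_bytes).sum();
            let mut picked = Vec::new();
            for s in sorted {
                if alive_ratio(alive, total) >= ratio {
                    break;
                }
                // After a shrink, only the alive bytes of the store remain.
                total -= s.total_bytes.saturating_sub(s.alive_bytes);
                picked.push(s.id);
            }
            picked
        }
    }
}
```

With three stores at 10%, 90% and 70% alive, the per-store mode at 0.8 picks two stores, while the total-space mode stops after the sparsest one because that single shrink already lifts the overall ratio above 0.8.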
Tao Zhu ae27fcbcda
replay stage feed back program cost (#17731)
* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized as an empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with a fixed capacity as a security measure; when its limit is reached, programs that are both old and rarely seen are pushed out to make room for new programs.
2021-06-09 17:10:59 -05:00
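
The fixed-capacity table from #17731 above can be illustrated with the sketch below. It is not the commit's `ExecuteCostTable`: the type `BoundedCostTable`, its fields, and the eviction ranking (fewest occurrences first, then oldest last update) are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Per-program record (hypothetical fields for illustration).
struct Entry {
    cost_us: u64,
    occurrences: u64,
    last_seen: u64,
}

/// A cost table that never grows beyond `capacity` programs.
struct BoundedCostTable {
    capacity: usize,
    clock: u64,
    entries: HashMap<String, Entry>,
}

impl BoundedCostTable {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { capacity, clock: 0, entries: HashMap::new() }
    }

    /// Records the latest execution cost (in microseconds) for a program.
    fn upsert(&mut self, program: &str, cost_us: u64) {
        self.clock += 1;
        if let Some(entry) = self.entries.get_mut(program) {
            entry.cost_us = cost_us;
            entry.occurrences += 1;
            entry.last_seen = self.clock;
            return;
        }
        if self.entries.len() >= self.capacity {
            self.evict_one();
        }
        let entry = Entry { cost_us, occurrences: 1, last_seen: self.clock };
        self.entries.insert(program.to_string(), entry);
    }

    /// Assumed ranking: evict the program seen least often, breaking ties
    /// by the oldest update.
    fn evict_one(&mut self) {
        let victim = self
            .entries
            .iter()
            .min_by_key(|(_, e)| (e.occurrences, e.last_seen))
            .map(|(key, _)| key.clone());
        if let Some(key) = victim {
            self.entries.remove(&key);
        }
    }

    fn get(&self, program: &str) -> Option<u64> {
        self.entries.get(program).map(|e| e.cost_us)
    }
}
```

Bounding the table keeps memory use constant even if an attacker submits transactions that touch an unbounded number of distinct programs.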
Justin Starry 050bb5446d
Add local cluster tests that broadcast duplicate slots (#13995)
* Add duplicate node local cluster test

* fix clippy

* remove dupe test
2021-06-09 15:01:48 -07:00
Michael Vines e5e7390d44 Wrap long lines 2021-06-08 12:05:29 -07:00
Tyera Eulberg 544b3c0d17
Create solana-poh and move remaining rpc modules to solana-rpc (#17698)
* Create solana-poh crate

* Move BigTableUploadService to solana-ledger

* Add solana-rpc to workspace

* Move dependencies to solana-rpc

* Move remaining rpc modules to solana-rpc

* Single use statement solana-poh

* Single use statement solana-rpc
2021-06-04 09:23:06 -06:00
sakridge f97ce2cd7e
Per-program id timings (#17554) 2021-06-04 16:04:31 +02:00
behzad nouri be957f25c9
adds fallback logic if retransmit multicast fails (#17714)
In retransmit-stage, based on the packet.meta.seed and resulting
children/neighbors, each packet is sent to a different set of peers:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457

However, current code errors out as soon as a multicast call fails,
which will skip all the remaining packets:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can exacerbate packet loss in turbine.

This commit:
  * keeps iterating over the retransmit packets in the for loop even if
    some intermediate sends fail.
  * adds a fallback to UdpSocket::send_to if multicast fails.

Recent discord chat:
https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733
2021-06-04 12:16:37 +00:00
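
The two behaviors described in #17714 above (keep iterating after a failed send, and fall back to `UdpSocket::send_to`) can be sketched as below. This is an illustration only: `retransmit_all` is a hypothetical function, and the batched multicast call is passed in as a closure because its real signature is not shown in the commit message.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};

/// Sends every packet to its own set of peers and returns the number of
/// individual fallback sends that failed.
///
/// `multicast` stands in for the batched send used by the retransmit stage
/// (hypothetical signature). A failure there no longer aborts the loop.
fn retransmit_all<F>(
    socket: &UdpSocket,
    packets: &[(Vec<u8>, Vec<SocketAddr>)],
    mut multicast: F,
) -> usize
where
    F: FnMut(&UdpSocket, &[u8], &[SocketAddr]) -> io::Result<()>,
{
    let mut failed = 0;
    for (payload, peers) in packets {
        if multicast(socket, payload, peers).is_ok() {
            continue;
        }
        // Fallback: one plain send_to per peer; count errors instead of
        // returning early so the remaining packets are still sent.
        for peer in peers {
            if socket.send_to(payload, peer).is_err() {
                failed += 1;
            }
        }
    }
    failed
}
```

Returning a failure count rather than an error lets the caller report the failures as a metric while the rest of the batch still goes out.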
Tyera Eulberg 3a647c4bea
Rename ValidatorExit and move to sdk (#17728) 2021-06-04 03:06:13 +00:00
carllin 96ba2edfeb
Switch EpochSlots to be frozen slots, not completed slots (#17168) 2021-06-03 00:20:00 +00:00
carllin bbcdf073ba
Support out of band dumping of unrooted slots in AccountsDb (#17269)
* Accounts dumping logic

* Add test for interaction between cache flush and remove_unrooted_slot()

* Update comments

* Rename

* renaming

* Add more comments

* Renaming

* Fixup test and bad check
2021-06-02 09:51:10 +00:00
Tao Zhu b000d490ce
Cost Model to limit transactions which are not parallelizable (#16694)
* Add the following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account access costs are weighted differently between read and write, signed and non-signed.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker's atomic try_add operation, which serves as a safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* add a bencher for the new banking_stage with a max cost limit, so that the cost model is hit consistently during bench iterations
2021-06-01 09:16:17 -05:00
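
The check-then-add behavior described in #16694 above can be sketched as follows. It is a simplified illustration under stated assumptions: `BlockCostTracker`, `try_add` as written here, and `split_by_limit` are hypothetical, costs are plain `u64` values, and a `Mutex` stands in for whatever sharing mechanism the real CostTracker uses.

```rust
use std::sync::Mutex;

/// Tracks the total cost admitted into the current block (hypothetical type).
struct BlockCostTracker {
    limit: u64,
    used: Mutex<u64>,
}

impl BlockCostTracker {
    fn new(limit: u64) -> Self {
        Self { limit, used: Mutex::new(0) }
    }

    /// Atomically adds `cost` if the block stays within its limit.
    /// Returns the new total on success, or the unchanged total on rejection.
    fn try_add(&self, cost: u64) -> Result<u64, u64> {
        let mut used = self.used.lock().unwrap();
        match used.checked_add(cost) {
            Some(total) if total <= self.limit => {
                *used = total;
                Ok(total)
            }
            _ => Err(*used),
        }
    }
}

/// Splits transaction indices into (admitted, unprocessed). Unprocessed
/// transactions would be buffered and retried in a later block.
fn split_by_limit(tracker: &BlockCostTracker, costs: &[u64]) -> (Vec<usize>, Vec<usize>) {
    let mut admitted = Vec::new();
    let mut unprocessed = Vec::new();
    for (index, cost) in costs.iter().enumerate() {
        match tracker.try_add(*cost) {
            Ok(_) => admitted.push(index),
            Err(_) => unprocessed.push(index),
        }
    }
    (admitted, unprocessed)
}
```

Holding the lock across both the comparison and the update is what makes `try_add` safe to call from several banking threads: two threads cannot both pass the check and then jointly exceed the limit.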
Ryo Onodera 1f97b2365f
Avoid full-range compactions with periodic filtered b.g. ones (#16697)
* Update rocksdb to v0.16.0

* Promote the infrequent and important log to info!

* Force background compaction by ttl without manual compaction

* Fix test

* Support no compaction mode in test_ledger_cleanup_compaction

* Fix comment

* Make compaction_interval customizable

* Avoid major compaction with periodic filtering...

* Address lazy_static, special cfs and range check

* Clean up a bit and add comment

* Add comment

* More comments...

* Config code cleanup

* Add comment

* Use .conflicts_with()

* Nullify unneeded delete_range ops for special CFs

* Some clean ups

* Clarify the locking intention

* Ensure special CFs' consistency with PurgeType::CompactionFilter

* Fix comment

* Fix bad copy paste

* Fix various types...

* Don't use tuples

* Add a unit test for compaction_filter

* Fix typo...

* Remove flag and just use new behavior always

* Fix wrong condition negation...

* Doc. about no set_last_purged_slot in purge_slots

* Write a test and fix off-by-one bug....

* Apply suggestions from code review

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Follow up to github review suggestions

* Fix line-wrapping

* Fix conflict

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-05-28 16:42:56 +09:00
Tyera Eulberg ab581dafc2
Add block height to ConfirmedBlock structs (#17523)
* Add BlockHeight CF to blockstore

* Rename CacheBlockTimeService to be more general

* Cache block-height using service

* Fixup previous proto mishandling

* Add block_height to block structs

* Add block-height to solana block

* Fallback to BankForks if block time or block height are not yet written to Blockstore

* Add docs

* Review comments
2021-05-26 22:16:16 -06:00
Michael Vines 9541411c15 Plumb transaction-level rewards (aka "rent debits") into the `getTransaction` RPC method 2021-05-27 03:05:05 +00:00
carllin 52dccc656a
Purge slots greater than new last index (#16071) 2021-05-26 16:12:57 -07:00
Michael Vines cbce440af4 simulateTransaction can now return accounts modified by the simulation 2021-05-26 14:20:23 -07:00