Commit Graph

2866 Commits

Author SHA1 Message Date
Tao Zhu 7aa1fb4e24 1. Persist to blockstore less frequently;
2. reduce alpha for EMA to 1 percent to have roughly 200 data points for estimatio
2022-02-08 16:18:23 -06:00
Tao Zhu 6587dbfa47 use EMA in place of Welford 2022-02-08 16:18:23 -06:00
Tao Zhu a25ac1c988 - estimate a program cost as 2 standard deviation above mean
- replaced get_average / get_mode with get_default to assign max units to unknown program
2022-02-08 16:18:23 -06:00
Tao Zhu e52e48076e
bench should update leader schedule cache (#22991) 2022-02-08 02:28:28 +00:00
Ashwin Sekar 5acf0f6331
Add feature gate for new vote instruction and plumb through replay (#21683)
* Add feature gate for new vote instruction and plumb through replay

Add tower versions

* Add check for slot hashes history

* Update is_recent check to exclude voting on hard fork root slot

* Move tower rollback test to flaky and ignore it until #22551 lands
2022-02-07 14:06:19 -08:00
behzad nouri 27aaf9df85
removes VoteTracker::new in favor of VoteTracker::default (#22941)
VoteTracker::new does not need a bank and is so redundant:
https://github.com/solana-labs/solana/blob/5a230f418/core/src/cluster_info_vote_listener.rs#L103-L107
2022-02-04 19:01:59 +00:00
sakridge 5a230f418d
Add quic port for accepting transactions (#22753)
using quinn library

streamer: Sign TLS cert with validator identity key

Handle multiple incoming chunks
2022-02-04 15:27:09 +01:00
Tao Zhu 4bec182b32
Allow buffered packets be consumed if bank is active, regardless leader schedule (#22913) 2022-02-03 21:29:41 +00:00
Justin Starry 60af1a4cce
Refactor: Add trait for loading addresses (#22903) 2022-02-03 11:00:12 +00:00
carllin bd1850df25
Return actual committed transactions from process_transactions() (#22802) 2022-02-03 03:56:36 -05:00
Trent Nelson c62f9839a2 test-validator-bin: reinstate full rpc method set 2022-02-03 02:43:03 +00:00
Ikko Ashimine 58a70d76a3
fix typo in broadcast_duplicates_run.rs (#22888)
Creat -> Create
2022-02-02 12:29:14 -07:00
behzad nouri dccbddad80
adds reverse lookup index to cluster-nodes (#22892)
retransmit has to exclude slot leader from set of nodes for each shred; 
which currently requires a linear scan:
https://github.com/solana-labs/solana/blob/e3b137066/core/src/cluster_nodes.rs#L238-L242

This commit adds a reverse lookup index to avoid linear scan.
2022-02-02 19:27:50 +00:00
behzad nouri e3b137066d
caches WeightedShuffle struct in ClusterNodes (#22877)
Instead of reconstructing WeightedShuffle struct for each shred
broadcast or retransmit, we can use the same struct with minimal
mutations.
2022-02-02 15:12:26 +00:00
Trent Nelson eac4a6df68 rpc: use minimal mode by default 2022-02-01 19:00:06 -07:00
behzad nouri 45e09664b8
removes Rng field from WeightedShuffle struct (#22850) 2022-02-01 15:27:23 +00:00
behzad nouri 604ca9316c
includes zero weighted entries in WeightedShuffle (#22829)
Current WeightedShuffle implementation excludes zero weighted entries
from the shuffle:
https://github.com/solana-labs/solana/blob/13e631dcf/gossip/src/weighted_shuffle.rs#L29-L30

Though mathematically this might make more sense, for our use-cases
(turbine specifically), this results in less efficient code:
https://github.com/solana-labs/solana/blob/13e631dcf/core/src/cluster_nodes.rs#L409-L430

This commit changes the implementation so that zero weighted indices are
also included in the shuffle but appear only at the end after non-zero
weighted indices.
2022-01-31 16:23:50 +00:00
Justin Starry 220aa6ada0
Fix poh recorder initialization on startup (#22755) 2022-01-28 14:21:15 +08:00
Michael Vines 331b953551 Add vote account address to vote subscription 2022-01-27 08:22:29 -08:00
Justin Starry d9c259a231
Set the correct root in block commitment cache initialization (#22750)
* Set the correct root in block commitment cache initialization

* clean up test

* bump
2022-01-27 00:48:00 +08:00
sakridge 2e56c59bcb
Handle already discarded packets in gpu sigverify path (#22680) 2022-01-24 14:35:47 +01:00
Justin Starry 1240217a73
Refactor: Rename variables and helper method to `PohRecorder` (#22676)
* Refactor: Rename leader_first_tick_height field

* Refactor: add `PohRecorder::slot_for_tick_height` helper

* Refactor: Add type for poh leader status
2022-01-23 10:28:50 +08:00
anatoly yakovenko d6011ba14d
Dedup bloom filter is too slow (#22607)
* Faster dedup

* use ahash

* fixup

* single threaded

* use duration type

* remove the count

* fixup
2022-01-21 20:23:48 -07:00
Michael Vines 6d5bbca630 Pacify clippy 2022-01-21 19:12:57 -08:00
Michael Vines ce4f7601af Avoid unstable_name_collisions warning 2022-01-21 19:12:57 -08:00
sakridge 38b02bbcc0
Handle already discarded packets in discard_excess_packets (#22594) 2022-01-21 17:22:50 +01:00
Jeff Biseda e7777281d6
regularly report network limits (#22563) 2022-01-20 12:38:42 -08:00
Trent Nelson cca3dbc76d system-monitor-service: support percentages from bigger numbers 2022-01-20 09:51:23 +00:00
Justin Starry 7f20c6149e
Refactor: move simple vote parsing to runtime (#22537) 2022-01-20 10:39:21 +08:00
anatoly yakovenko d343713f61
Optimize packet dedup (#22571)
* Use bloom filter to dedup packets

* dedup first

* Update bloom/src/bloom.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* fixup

* fixup

* fixup

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-01-19 13:58:20 -08:00
behzad nouri dcf44d2523
improves sigverify discard_excess_packets performance (#22577)
As shown by the added benchmark, current code does worse if there is a
spam address plus a lot of unique addresses.

on current master:
test bench_packet_discard_many_senders  ... bench:   1,997,960 ns/iter (+/- 103,715)
test bench_packet_discard_mixed_senders ... bench:  14,256,116 ns/iter (+/- 534,865)
test bench_packet_discard_single_sender ... bench:   1,306,809 ns/iter (+/- 61,992)

with this commit:
test bench_packet_discard_many_senders  ... bench:   1,644,025 ns/iter (+/- 83,715)
test bench_packet_discard_mixed_senders ... bench:   1,089,789 ns/iter (+/- 86,324)
test bench_packet_discard_single_sender ... bench:     955,234 ns/iter (+/- 55,953)
2022-01-19 18:10:02 +00:00
buffalu 650882217c
Add PacketBatch packet_indexes stat (#22564)
* collect stats on packet batch indicies

* cleanup

* cleanup

* cleanup

* change name
2022-01-19 08:13:07 +00:00
anatoly yakovenko e616a7ebfc
Track discard time of excess packets in sigverify (#22554)
* discard time histogram

* closer to the if

* update
2022-01-18 15:09:39 -07:00
Michael Vines 8dc6f9f589 Remove unused mut 2022-01-18 12:10:31 -08:00
sakridge 49443406fd
Use VecDeque instead of Vec in sigverify stage (#22538)
avoid bad performance of remove(0) for a single sender
2022-01-17 18:37:05 +01:00
anatoly yakovenko 2d94e6e5d3
metrics for generate new bank forks (#22492)
* metrics for generate new bank forks

* fixed

* Apply suggestions from code review

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* --fixup

* fixup!

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-01-17 09:59:47 -07:00
Tao Zhu a724fa2347
Add hidden cli option to allow validator reports replayed transaction cost metrics (#22369)
* add hidden cli option to allow validator reports replayed transaction cost detail metrics

* Update validator/src/main.rs

Co-authored-by: Michael Vines <mvines@gmail.com>

* - rebase master, using unbounded instead of channel; dowgrade to datapoint_trace

* removed cli arg, prefer log at trace

Co-authored-by: Michael Vines <mvines@gmail.com>
2022-01-15 00:31:21 +00:00
Tao Zhu 1309a9cea0
Add estimated and actual block cost units metrics (#22326)
* - report cost details for transactions selected to be packed into block;
- report estimated execution units packed into block, and actual units and time after execution

* revert reporting per-transaction details

* rollup transaction cost details (eg signature cost, wirte lock, data cost and execution costs) into block stats

* change naming from units to cu, use struct to replace tuple
2022-01-14 23:44:18 +00:00
Tao Zhu 9c9f2dd5bd port counting vote CUs to block cost (#22477) 2022-01-14 10:50:29 -06:00
Justin Starry f804ccdece
Store address table lookups in blockstore and bigtable (#22402) 2022-01-14 15:24:41 +08:00
carllin 4ab7d6c23e
Filter out outdated slots (#22450)
* Filter out outdated slots

* Fixup error
2022-01-13 19:51:00 -05:00
Tao Zhu 6614727be8 downgrade individual per-program-timing to trace to reduce writes to influx 2022-01-12 18:52:13 -06:00
Tyera Eulberg 637e366b18
Prevent rent-paying account creation (#22292)
* Fixup typo

* Add new feature

* Add new TransactionError

* Add framework for checking account state before and after transaction processing

* Fail transactions that leave new rent-paying accounts

* Only check rent-state of writable tx accounts

* Review comments: combine process_result success behavior; log and metrics before feature activation

* Fix tests that assume rent-exempt accounts are okay

* Remove test no longer relevant

* Remove native/sysvar special case

* Move metrics submission to report legacy->legacy rent paying transitions as well
2022-01-11 11:32:25 -07:00
Jeff Biseda 8b66625c95
convert std::sync::mpsc to crossbeam_channel (#22264) 2022-01-11 02:44:46 -08:00
steviez 5f1f4dcbdd
Use struct to pass all Tpu sockets as one argument to Tpu::new() (#21965)
Tpu::new() now matches Tvu::new() in having struct to reduce argument
list. Additionally, Rust supports partial moves, so there is no need to
clone the Tvu sockets out of Node object.
2022-01-10 11:29:48 -06:00
Ashwin Sekar eeec1ce2ad
Add local cluster test to repro slot hash expiry bug (#21873) 2022-01-10 00:58:21 -05:00
Justin Starry 52d12cc802
Add runtime support for address table lookups (#22223)
* Add support for address table lookups in runtime

* feedback

* feedback
2022-01-07 11:59:09 +08:00
Trent Nelson 390ef0fbcd Consolidate process instruction execution timings to own struct 2022-01-06 03:56:46 -07:00
Trent Nelson 848b6dfbdd Add metrics for executor creation 2022-01-06 03:56:46 -07:00
Carl Lin b25e4a200b Add execute metrics 2022-01-06 03:56:46 -07:00
Trent Nelson 7d32909e17 move `ExecuteTimings` from `runtime::bank` to `program_runtime::timings` 2022-01-06 03:56:46 -07:00
Justin Starry 45458e7139
Refactor: Improve type safety and readability of transaction execution (#22215)
* Refactor Bank::load_and_execute_transactions

* Refactor: improve type safety of TransactionExecutionResult

* Add enum for extra type safety in execution results

* feedback
2022-01-05 10:15:15 +08:00
behzad nouri 4b24499916 removes total-size from return value of recv_mmsg 2022-01-04 21:06:59 +00:00
behzad nouri 01a096adc8 adds bitflags to Packet.Meta
Instead of a separate bool type for each flag, all the flags can be
encoded in a type-safe bitflags encoded in a single u8:
https://github.com/solana-labs/solana/blob/d6ec103be/sdk/src/packet.rs#L19-L31
2022-01-04 13:53:40 +00:00
behzad nouri 73a7741c49 uses std::net::IpAddr type for Packet.Meta.addr 2022-01-04 13:53:40 +00:00
sakridge 2486e21ffe
Lower vote-only-mode to 400 (#22210) 2022-01-04 12:49:14 +01:00
Jeff Biseda ca8fef5855
retransmit consecutive leader blocks (#22157) 2022-01-04 00:24:16 -08:00
Yueh-Hsuan Chiang e8b7f96a89
Add struct BlockstoreOptions (#22121) 2022-01-03 18:30:45 -10:00
carllin 005592998d
Fix bug, add error specific timings (#22225) 2022-01-03 16:33:54 -05:00
behzad nouri 69d71f8f86
removes epoch_authorized_voters from VoteTracker (#22192)
https://github.com/solana-labs/solana/pull/22169
verifies authorized-voter early on in vote-listener pipeline; and so
VoteTracker no longer needs to maintain and check for epoch authorized
voters.
2022-01-03 21:07:47 +00:00
Jeff Biseda 0e4ede46d1
work around rust 39364 for stats_reporter_sender (#22227) 2022-01-03 11:46:02 -08:00
carllin d06e6c7425
Count compute units even when transaction errors (#22182) 2021-12-30 21:21:42 -05:00
Jeff Biseda 95dfcc546a
bypass retransmission for slots without propagated stats (#22176) 2021-12-30 16:07:34 -08:00
behzad nouri c0c6038654
checks for authorized voter early on in the vote-listener pipeline (#22169)
Before votes are verified that they are signed by the authorized voter,
they might be dropped in verified-vote-packets code. If there are
enough many spam votes from unauthorized voters, this may potentially
drop valid votes but keep the false ones.
https://github.com/solana-labs/solana/blob/57986f982/core/src/verified_vote_packets.rs#L165-L168
2021-12-30 15:03:14 +00:00
carllin 33d0b5e011
Revert "Count compute units even when transaction errors (#22059)" (#22174)
This reverts commit eaa8c67bde.
2021-12-30 02:42:32 -05:00
Lijun Wang f14928a970
Stream additional block metadata via plugin (#22023)
* Stream additional block metadata through plugin
blockhash, block_height, block_time, rewards are streamed
2021-12-29 15:12:01 -08:00
Justin Starry b1d9a2e60e
Don't forward packets received from TPU forwards port (#22078)
* Don't forward packets received from TPU forwards port

* Add banking stage test
2021-12-29 19:34:31 +01:00
carllin eaa8c67bde
Count compute units even when transaction errors (#22059) 2021-12-28 17:05:11 -05:00
carllin f061059e45
Prevent log spam (#22148) 2021-12-28 17:04:48 -05:00
Tao Zhu 3d6ab96587 push live packets straight to buffer, leader only process packets from buffer 2021-12-28 15:21:24 -06:00
Yueh-Hsuan Chiang b89cd8cd1a
Avoid cloning Vec<Entry> when calling entries_to_test_shreds() (#22093) 2021-12-24 12:32:43 -08:00
Justin Starry 93c776ce19
Refactor packet deduplication and harden bench test (#22080) 2021-12-22 23:05:10 -06:00
Tao Zhu dd80a525ef
Leader QoS service metrics (#21708)
* - qos_service metrics tagged with leader thread ids to separate gossip/tpu votes and transactions;
- qos_service metrics is reported with bank slot;
- replaced timer-based reporting with signal via channel; removed async report test as qos_service now lives within a thread

* - add tpu live packets (eg, not buffered packets) states to qos metrics reporting
2021-12-22 21:39:59 +00:00
behzad nouri 4d62f03297
uses enum instead of trait for VoteTransaction (#22019)
Box<dyn Trait> involves runtime dispatch, has significant overhead and
is slow. It also requires hacky boilerplate code for implementing Clone
or other basic traits:
https://github.com/solana-labs/solana/blob/e92a81b74/programs/vote/src/vote_state/mod.rs#L70-L102

Only limited known types can be VoteTransaction and they are all defined
in the same crate. So using a trait here only adds overhead.
https://github.com/solana-labs/solana/blob/e92a81b74/programs/vote/src/vote_state/mod.rs#L125-L165
https://github.com/solana-labs/solana/blob/e92a81b74/programs/vote/src/vote_state/mod.rs#L221-L264
2021-12-22 14:25:46 +00:00
Tao Zhu 9c5d82557a skip reporting all-zero stats 2021-12-21 16:20:36 -06:00
behzad nouri 65d59f4ef0
tracks erasure coding shreds' indices explicitly (#21822)
The indices for erasure coding shreds are tied to data shreds:
https://github.com/solana-labs/solana/blob/90f41fd9b/ledger/src/shred.rs#L921

However with the upcoming changes to erasure schema, there will be more
erasure coding shreds than data shreds and we can no longer infer coding
shreds indices from data shreds.

The commit adds constructs to track coding shreds indices explicitly.
2021-12-19 22:37:55 +00:00
behzad nouri 7476dfeec0
removes Select in favor of recv_timeout/try_iter (#21981)
crossbeam_channel::Select::ready_timeout might return with success spuriously.
2021-12-18 17:39:07 +00:00
Jeff Biseda 3fe942ab30
new net-stats require a new table (#21996) 2021-12-18 00:13:16 -08:00
carllin 7f6fb6937a
Ensure AncestorHashesSerice selects an open port (#21919) 2021-12-18 00:44:01 -05:00
Jeff Biseda 97a1fa10a6
streamer send destination metrics for repair, gossip (#21564) 2021-12-17 15:21:05 -08:00
segfaultdoctor 76098dd42a
RPC Block Subscription (#21787)
* add stuff

* compiling

* add notify block

* wip

* feat: add blockSubscribe pubsub method

* address PR comments

Co-authored-by: Lucas B <buffalu@jito.network>
Co-authored-by: Zano <segfaultdoctor@protonmail.com>
2021-12-17 16:03:09 -07:00
behzad nouri 89d66c3210
removes next_shred_index from return value of entries to shreds api (#21961)
next-shred-index is already readily available from returned data shreds.
The commit simplifies the api for upcoming changes to erasure coding
schema which will require explicit tracking of indices for coding shreds
as well as data shreds.
2021-12-17 15:01:55 +00:00
Jeff Biseda 7ec39f5a1e
time based retransmit in replay_stage (#21498) 2021-12-17 05:44:40 -08:00
carllin 385efae4b3
Remove need to send bank in retransmit request from ReplayStage (#21943)
* Remove need to send bank in retransmitter
2021-12-16 21:11:01 -05:00
Justin Starry 6ff0be6a82
Clean up demote program write lock feature (#21949)
* Clean up demote program write lock feature

* fix test
2021-12-16 17:27:22 -05:00
carllin cb395abff7
Fix subtraction overflow (#21871) 2021-12-14 14:24:22 -05:00
behzad nouri 8d980f07ba
uses Option<Slot> for SlotMeta.parent_slot (#21808)
SlotMeta.parent_slot for the head of a detached chain of slots is
unknown and that is indicated by u64::MAX which lacks type-safety:
https://github.com/solana-labs/solana/blob/6c108c8fc/ledger/src/blockstore_meta.rs#L203-L205

The commit changes the type to Option<Slot>. Backward compatibility is
maintained by customizing serde serialize/deserialize implementations.
2021-12-14 18:57:11 +00:00
behzad nouri 4ceb2689f5
adds ShredId uniquely identifying each shred (#21820) 2021-12-14 17:34:02 +00:00
Jeff Washington (jwash) 90f41fd9b7
use cost model to limit new account creation (#21369)
* use cost model to limit new account creation

* handle every system instruction

* remove &

* simplify match

* simplify match

* add datapoint for account data size

* add postgres error handling

* handle accounts:unlock_accounts
2021-12-12 14:57:18 -06:00
behzad nouri e08139f949
uses Option<u64> for SlotMeta.last_index (#21775)
SlotMeta.last_index may be unknown and current code is using u64::MAX to
indicate that:
https://github.com/solana-labs/solana/blob/6c108c8fc/ledger/src/blockstore_meta.rs#L169-L174

This lacks type-safety and can introduce bugs if not always checked for
Several instances of slot_meta.last_index + 1 are also subject to
overflow.

This commit updates the type to Option<u64>. Backward compatibility is
maintained by customizing serde serialize/deserialize implementations.
2021-12-11 14:47:20 +00:00
Justin Starry 254ef3e7b6
Rename Packets to PacketBatch (#21794) 2021-12-11 09:44:15 -05:00
behzad nouri 8063273d09
adds more sanity checks to shreds (#21675) 2021-12-09 16:43:57 +00:00
Ashwin Sekar f0acf7681e
Add vote instructions that directly update on chain vote state (#21531)
* Add vote state instructions

UpdateVoteState and UpdateVoteStateSwitch

* cargo tree

* extract vote state version conversion to common fn
2021-12-07 16:47:26 -08:00
behzad nouri cd17f63d81
adds back position field to coding-shred-header (#21600)
https://github.com/solana-labs/solana/pull/17004
removed position field from coding-shred-header because as it stands the
field is redundant and unused.
However, with the upcoming changes to erasure coding schema this field
will no longer be redundant and needs to be populated.
2021-12-05 14:42:09 +00:00
Jeff Biseda 9c6b95e1e1
fix distance calculation in get_closest_completion (#21601) 2021-12-03 22:36:46 -08:00
Justin Starry 1430b58a6d
Remove deprecated slow epoch boundary methods (#21568) 2021-12-03 17:59:10 +00:00
Michael Vines b8837c04ec Reformat imports to a consistent style for imports
rustfmt.toml configuration:
  imports_granularity = "One"
  group_imports = "One"
2021-12-03 09:19:13 -08:00
Alexander Meißner b78f5b6032
Refactor: Cleanup InstructionProcessor (#21404)
* Moves create_message(), native_invoke() and process_cross_program_instruction()
from the InstructionProcessor to the InvokeContext so that they can have a useful "self" parameter.

* Moves InstructionProcessor into InvokeContext and Bank.

* Moves ExecuteDetailsTimings into its own file.

* Moves Executor into invoke_context.rs

* Moves PreAccount into its own file.

* impl AbiExample for BuiltinPrograms
2021-12-01 08:54:42 +01:00
Michael Vines ba9dfa0d22 Remove frozen account support 2021-11-29 08:38:11 -08:00
Tao Zhu 9edfc5936d
Refactor accounts.rs with Justin's comments to improve lock accounts (#21406)
with results code path.
- fix a bug that could unlock accounts that weren't locked
- add test to the refactored function
- skip enumerating transaction accounts if qos results is an error
- add #[must_use] annotation
- avoid clone error in results
- add qos error code to unlock_accounts match statement
- remove unnecessary AbiExample
2021-11-23 21:17:55 +00:00
Lijun Wang c29838fce1
Accountsdb plugin transaction part 3: Transaction Notifier (#21374)
The TransactionNotifierInterface interface for notifying transactions.
Changes to transaction_status_service to notify the notifier of the transaction data.
Interface to query the plugin's interest in transaction data
2021-11-23 09:55:53 -08:00
Tao Zhu 2602e7c3bc
Fix flaky test (#21402)
* the async test is flaky on ci

* fix unstable test by increasing stats repoting time
2021-11-23 09:47:17 -06:00
behzad nouri dd338b6c9f
changes Shred::parent return type to Option<Slot> (#21370)
Shred::parent can return garbage if the struct fields are invalid:
https://github.com/solana-labs/solana/blob/8a50b6302/ledger/src/shred.rs#L446-L453

The commit adds more sanity checks and changes the return type to Option<Slot>.
2021-11-23 14:45:26 +00:00
Tao Zhu cd5a39ee43
the async test is flaky on ci (#21365) 2021-11-22 18:16:20 -06:00
Jeff Washington (jwash) 87831e7f8d
start system monitor earlier in validator so we get memory stats at startup (#21372) 2021-11-22 14:37:17 -06:00
sakridge f31ca8ba8c
Report cluster slots size (#21380) 2021-11-22 17:47:58 +01:00
Jeff Biseda 2ed7e3af89
prioritize slot repairs for unknown last index and close to completion (#21070) 2021-11-19 19:17:30 -08:00
sakridge 0bda0c3e0c
Add bank drop service (#21322) 2021-11-19 17:20:18 +01:00
behzad nouri 48dfdfb4d5 changes Blockstore::is_shred_duplicate arg type to ShredType 2021-11-19 14:16:39 +00:00
behzad nouri 57057f8d39 uses enum for shred type
Current code is using u8 which does not have any type-safety and can
contain invalid values:
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L167

Checks for invalid shred-types are scattered through the code:
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/blockstore.rs#L849-L851
https://github.com/solana-labs/solana/blob/66fa062f1/ledger/src/shred.rs#L346-L348

The commit uses enum for shred type with #[repr(u8)]. Backward
compatibility is maintained by implementing Serialize and Deserialize
compatible with u8, and adding a test to assert that.
2021-11-19 14:16:39 +00:00
carllin b30c94ce55
ClusterInfoVoteListener send only missing votes to BankingStage (#20873) 2021-11-18 15:20:41 -08:00
Tao Zhu 0ca255220e
- Encapsulate QoS Service metrics reporting within QosServioce, so client (#21191)
code (eg banking_stage) doesn't need to worry about it.
- Remove dead cost_* stats from banking_stage, clean up call path.
2021-11-18 15:35:30 -06:00
Lijun Wang 89c45a57f8
Refactor slot status notification to decouple from accounts notifications (#21308)
Problem

Slot status can be used of in other scenarios in addition to account information such as transactions, blocks. The current implementation is too tightly coupled.

Summary of Changes

Decouple the slot status notification from accounts notification. Created a new slot status notification module.
2021-11-17 17:11:38 -08:00
Jeff Biseda d5de0c8e12
add --no-os-network-stats-reporting option (#21296) 2021-11-16 10:26:03 -08:00
sakridge 398af132a5
More set_root metrics (#21286) 2021-11-15 16:28:18 -07:00
Jeff Washington (jwash) f2bd9947cc
mem stats: rescale from kb to bytes (#21282) 2021-11-15 14:42:41 -06:00
Jeff Washington (jwash) f8dcb2f38b
report mem stats (#21258) 2021-11-13 00:59:41 +00:00
Michael Keleti b0ca335463
Rename "trusted" to "known" in `validators/` (#21197)
* Replaced trusted with known validator

* Format Convention
2021-11-12 11:57:55 -07:00
Tao Zhu 11153e1f87
refactor cost calculation (#21062)
* - cache calculated transaction cost to allow sharing;
- atomic cost tracking op;
- only lock accounts for transactions eligible for current block;
- moved qos service and stats reporting to its own model;
- add cost_weight default to neutral (as 1), vote has zero weight;

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Update core/src/qos_service.rs

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Update core/src/qos_service.rs

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-11-12 01:04:53 -06:00
Ivan Mironov c78f474373 Add validator option to change niceness of snapshot packager thread 2021-11-04 17:16:46 -06:00
Alexander Meißner 7200c5106e
Replaces MockInvokeContext by ThisInvokeContext in tests (#20881)
* Replaces MockInvokeContext by ThisInvokeContext in BpfLoader, SystemInstructionProcessor, CLIs, ConfigProcessor, StakeProcessor and VoteProcessor.

* Finally, removes MockInvokeContext, MockComputeMeter and MockLogger.

* Adjusts assert_instruction_count test.

* Moves ThisInvokeContext to the program-runtime crate.
2021-11-04 21:47:32 +01:00
Justin Starry 140a5f633d
Simplify replay vote tracking by using packet metadata (#21112) 2021-11-03 09:02:48 +00:00
steviez e6280fc1fa
Add additional checks for should_retransmit_and_persist() (#20672)
Add additional checks to should_retransmit_and_persist()

- Check invalid shred index
- Update cases that check if node was leader
- Some comments and variable rename for clarity
2021-11-03 02:01:07 -05:00
sakridge a8d78e89d3
Move test-validator to own module to reduce core dependencies (#20658)
* Move test-validator to own module to reduce core dependencies

* Fix a few TestValidator paths

* Use solana_test_validator crate for solana_test_validator bin

* Move client int tests to separate crate

Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-10-29 01:27:07 +00:00
Justin Starry 036d7fcc81
Clean up sanitized tx creation for tests (#21006) 2021-10-27 18:09:16 +01:00
sakridge 261dd96ae3
Swap banking stage vote channels (#20987) 2021-10-26 21:20:31 +02:00
behzad nouri 1297a13586
adds metrics tracking crds writes and votes (#20953) 2021-10-26 13:02:30 +00:00
Jeff Washington (jwash) 43ea579f63
add cli for --accounts-hash-num-passes (#20827) 2021-10-25 09:45:46 -05:00
Tao Zhu c2bfce90b3
- cost_tracker is data member of a bank, it can report metrics when bank is frozen (#20802)
- removed cost_tracker_stats and histogram
- move stats reporting outside of bank freeze
2021-10-24 22:19:23 -05:00
behzad nouri 5e1cf39c74
adds metrics for number of outgoing shreds in retransmit stage (#20882) 2021-10-24 13:12:27 +00:00
Michael Vines 350bb561eb Clippy 2021-10-23 08:21:20 +00:00
Jack May bfbbc53dac
Divorce the runtime from FeeCalculator (#20737) 2021-10-22 14:32:40 -07:00
Justin Starry 735016661b
Report timing info for stakes cache updates from txs (#20856) 2021-10-22 12:49:02 -04:00
Tao Zhu 71d0bd4605
Add counter for dropped duplicated packets, fix dropped_packets_count (#20834) 2021-10-21 02:56:48 +00:00
Trent Nelson fe098b5ddc rpc-send-tx-svc: add with_config constructor 2021-10-20 13:43:27 -06:00
Jeff Washington (jwash) 95e91a4863
disable gossip publish of snapshots when using filler accts (#20824) 2021-10-20 18:07:29 +00:00
Tao Zhu 7496b5784b
- make cost_tracker a member of bank, remove shared instance from TPU; (#20627)
- decouple cost_model from cost_tracker; allowing one cost_model
  instance being shared within a validator;
- update cost_model api to calculate_cost(&self...)->transaction_cost
2021-10-19 14:37:33 -05:00
Jeff Biseda 4cac66244d
report udp stats from validator (#20587) 2021-10-15 15:11:11 -07:00
carllin 44ff30b65b
Retry `SampleNotDuplicateConfirmed` decisions in AncestorHashesService (#20240) 2021-10-15 11:40:03 -07:00
behzad nouri 0f03971c3c
adds counters for errors in window-service run_insert (#20670) 2021-10-15 14:13:26 +00:00
behzad nouri 0c0384ec32
revises turbine peers shuffling order (#20480)
Turbine randomly shuffles cluster nodes on a broadcast tree for each
shred. This requires knowing the stakes and nodes' contact-infos (from
gossip).

However gossip is subject to partitioning and propogation delays.
Additionally unstaked nodes may join and leave the cluster at any
moment, changing the cluster view from one node to another.

This commit:
* Always arranges the unstaked nodes at the bottom of turbine broadcast
  tree.
* Staked nodes are always included regardless of if their contact-info
  is available in gossip or not.
* Uses the unbiased WeightedShuffle construct for shuffling nodes.
2021-10-14 15:09:36 +00:00
sakridge 588168b99d
Add check for shred data header size (#20668) 2021-10-14 05:56:14 +02:00
Jack May da45be366a
Remove blockhash from fee calculation (#20641) 2021-10-13 13:10:58 -07:00
Tao Zhu 005d6863fd
- move cost tracker into bank, so each bank has its own cost tracker; (#20527)
- move related modules to runtime
2021-10-12 08:51:33 -05:00
Jeff Washington (jwash) a8e000a2a6
add filler accounts to bloat validator and predict failure (#20491)
* add filler accounts to bloat validator and predict failure

* assert no accounts match filler

* cleanup magic numbers

* panic if can't load from snapshot with filler accounts specified

* some renames

* renames

* into_par_iter

* clean filler accts, too
2021-10-11 12:46:27 -05:00
Michael Vines c16510152e Rework AVX/AVX2 detection again 2021-10-10 12:22:10 -07:00
carllin 838ff3b871
Separate out interrupted slots broadcast metrics (#20537) 2021-10-09 01:46:06 -07:00
Lijun Wang d621994fee
Accountsdb stream plugin improvement (#20419)
Support using connection pooling and use multiple threads to do Postgres db operations. The performance is improved from 1500 RPS to 40,000 RPS measured during validator start.

Support multiple plugins at the same time.
2021-10-08 20:06:58 -07:00
Brooks Prumo 5440c1d2e1
SnapshotPackagerService pushes incremental snapshot hashes to CRDS (#20442)
Now that CRDS supports incremental snapshot hashes,
SnapshotPackagerService needs to push 'em!

This commit does two main things:

1. SnapshotPackagerService now knows about incremental snapshot hashes,
   and will push SnapshotPackage::IncrementalSnapshot hashes to CRDS.
2. At startup, when loading from a full + incremental snapshot, the
   hashes need to be passed all the way to SnapshotPackagerService so it
   can push these starting hashes to CRDS.  Those values have been piped
   through.

Fixes #20441 and #20423
2021-10-08 15:14:56 -05:00
Tao Zhu 675fa6993b
- update const cost values with data collected by #19627 (#20314)
- update cost calculation to closely proposed fee schedule #16984
2021-10-08 14:48:50 -05:00
Tao Zhu 0ebd8c53ee
cost model to ignore vote transactions (#20510) 2021-10-07 12:49:07 -05:00
Tao Zhu 177a375479
Tpu vote 1.7 (#20187) (#20494)
* Add separate vote processing tpu port

* Add feature to send to tpu vote port

* Add vote rejecting sigverify mode

* use packet.meta.is_simple_vote_tx in place of deserialization

* consolidate code that identifies vote tx atcommon path for cpu and gpu

* new key for feature set

* banking forward tpu vote

* add tpu vote port to dockerfile and other review changes

* Simplify thread id compare

* fix a test; updated cluster_info ABI change

Co-authored-by: Tao Zhu <tao@solana.com>

Co-authored-by: sakridge <sakridge@gmail.com>
2021-10-07 09:38:23 +00:00
Michael Vines 7027d56064 Resolve nightly-2021-10-05 clippy complaints 2021-10-06 10:37:58 -07:00
Tao Zhu 03913f6661
add tx count and thread id to stats, each stat reports and resets when slot changes (#20451) 2021-10-06 00:09:19 -05:00
Justin Starry 129716f3f0
Optimize stakes cache and rewards at epoch boundaries (#20432)
* Optimize stakes cache and rewards at epoch boundaries

* Fetch from accounts db

* Add cli flag for disabling epoch boundary optimization
2021-10-06 00:53:26 -04:00
Tao Zhu 6ff508c643
add transaction cost histogram metrics (#20350) 2021-10-05 08:57:39 -05:00
Brooks Prumo 4cd50f5d45
Don't gossip more snapshot hashes than what we retain (#20379) 2021-10-01 15:59:45 -05:00
Lijun Wang fe97cb2ddf
AccountsDb plugin framework (#20047)
Summary of Changes

Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows

Data stores of connection strings/credentials to be configured,
Accounts with patterns to be streamed
PostgreSQL implementation of the streaming for different destination stores to be plugged in.

The code comprises 4 major parts:

accountsdb-plugin-intf: defines the plugin interface which concrete plugin should implement.
accountsdb-plugin-manager: manages the load/unload of plugins and provide interfaces which the validator can notify of accounts update to plugins.
accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL
The validator integrations: updated streamed right after snapshot restore and after account update from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
2021-09-30 14:26:17 -07:00
Jeff Biseda 3854cfaa00
Use batch_send in forward_buffered_packets (#20330) 2021-09-29 20:49:43 -07:00
sakridge 94668c95c2
Prune sigverify queue (#20331) 2021-09-30 05:41:05 +02:00
Brooks Prumo 3ea6a01254
Only gossip snapshot hashes for full snapshots (#20271) 2021-09-27 19:29:08 -05:00
Jeff Biseda 640e93187c
periodically report sigverify_stage stats (#19674) 2021-09-21 10:37:58 -07:00
sakridge 013e1d9d49
Limit transaction forwarding from banking_stage (#19940) 2021-09-21 08:49:41 -07:00
carllin e6b4dd3866
Add bank to banking stage regardless of if there is a working bank (#19855) 2021-09-17 16:55:53 -07:00
Pavel Strakhov 65227f44dc
Optimize RPC pubsub for multiple clients with the same subscription (#18943)
* reimplement rpc pubsub with a broadcast queue

* update tests for new pubsub implementation

* fix: fix review suggestions

* chore(rpc): add additional pubsub metrics

* integrate max subscriptions check into SubscriptionTracker to reduce locking

* separate subscription control from tracker

* limit memory usage of items in pubsub broadcast queue, improve error handling

* add more pubsub metrics

* add final count metrics to pubsub

* add metric for total number of subscriptions

* fix small review suggestions

* remove by_params from SubscriptionTracker and add node_progress_watchers map instead

* add subscription tracker tests

* add metrics for number of pubsub notifications as a counter

* ignore clippy lint in TokenCounter

* fix underflow in token counter

* reduce queue capacity in pubsub tests

* fix(rpc): fix test timeouts

* fix race in account subscription test

* Add RpcSubscriptions::new_for_tests

Co-authored-by: Pavel Strakhov <p.strakhov@iconic.vc>
Co-authored-by: Nikita Podoliako <n.podoliako@zubr.io>
Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-09-17 13:40:14 -06:00
sakridge dc69cc1ae4
Only allow votes when root distance gets too high (#19917) 2021-09-16 15:12:26 +02:00
Justin Starry ca3f147670
Add banking metrics for buffered and dropped packets (#19902) 2021-09-15 15:53:55 -05:00
Tao Zhu 67fa9945e1
Add few more metrics data points (#19624)
* Add slot, count and accumulated-units to per-program-timings for determining transaction cost elements

* correct the stats naming; fixes the dirty bit resetting
2021-09-15 09:49:49 -05:00
Justin Starry 34c1a9ac85
Report consumed_buffered_packets_count stat to metrics (#19900) 2021-09-15 14:19:39 +00:00
Tyera Eulberg c91519961c
Use f64 for stake math in get_stake_percent_in_gossip (#19895) 2021-09-14 23:36:30 -06:00
Michael 4ff50519ff
Add an info log to indicate the node has reached supermajority and print the active stake percentage (#19893) 2021-09-14 21:48:15 -06:00
Jeff Washington (jwash) b57e86abf2
cache account hash info (#19426)
* cache account hash info

* ledger_path -> accounts_hash_cache_path
2021-09-13 20:39:26 -05:00
carllin 87a7f00926
Track reset bank in PohRecorder (#19810) 2021-09-13 16:55:35 -07:00
Brooks Prumo 62c8bcf565
Add default() to SnapshotConfig (#19776) 2021-09-12 13:44:27 -05:00
Brooks Prumo 7aa5f6b833
Add CLI args for incremental snapshots (#19694)
Add `--incremental-snapshots` flag to enable incremental snapshots.
This will allow setting `--full-snapshot-interval-slots` and
`--incremental-snapshot-interval-slots`.

Also added `--maximum-incremental-snapshots-to-retain`.

Co-authored-by: Michael Vines <mvines@gmail.com>
2021-09-10 15:59:26 -05:00
Michael Vines 4386e09710 Reduce wait for supermajority threshold back to 80% 2021-09-09 21:17:35 -07:00
sakridge 3a8c678f62
Remove some copying (#19691) 2021-09-08 18:32:38 +02:00
Jeff Washington (jwash) 456bf15012
AccountsIndexConfig -> AccountsDbConfig (#19687) 2021-09-08 04:30:38 +00:00
Jeff Washington (jwash) d3f938f0cf
Remove Copy from AccountsIndexConfig. Not all types will support it (#19686) 2021-09-07 20:09:40 -05:00
Brooks Prumo a0552e5b46
Make startup aware of Incremental Snapshots (#19600) 2021-09-07 20:43:43 +00:00
behzad nouri 01a7ec8198
uses rayon thread-pool for retransmit-stage parallelization (#19486) 2021-09-07 15:15:01 +00:00
Brooks Prumo fe8ba81ce6
Rename to is_valid instead of is_invalid (#19670) 2021-09-07 09:31:54 -05:00
Brooks Prumo 9d9482b9d8
Plumb `maximum_incremental_snapshot_archives_to_retain` (#19640) 2021-09-06 18:01:56 -05:00
Sean Young d461a9ac10 verify_precompiles needs FeatureSet
Rather than pass in individual features, pass in the entire feature set
so that we can add the ed25519 program feature in a later commit.
2021-09-05 18:59:37 +01:00
Tyera Eulberg decec3cd8b
Demote write locks on transaction program ids (#19593)
* Add feature

* Demote write lock on program ids

* Fixup bpf tests

* Update MappedMessage::is_writable

* Comma nit

* Review comments
2021-09-04 03:05:30 +00:00
Brooks Prumo 1828579580
Pass SnapshotConfig to SnapshotPackagerService (#19616) 2021-09-03 21:42:32 +00:00
Brooks Prumo 5e25ee5ebe
Add maximum_incremental_snapshot_archives_to_retain to SnapshotConfig (#19612) 2021-09-03 20:21:32 +00:00
Brooks Prumo 7ab0aec61f
Rename maximum_full_snapshot_archives_to_retain (#19610)
To prepare for adding maximum_incremental_snapshot_archives_to_retain,
rename the current field in SnapshotConfig.
2021-09-03 11:28:10 -05:00
Brooks Prumo e9374d32a3
Revert "Make startup aware of Incremental Snapshots (#19550)" (#19599)
This reverts commit d45ced0a5d.
2021-09-02 19:14:41 -05:00
Brooks Prumo d45ced0a5d
Make startup aware of Incremental Snapshots (#19550) 2021-09-02 19:05:15 -05:00
Jeff Biseda 7a8eba10b2
add synchronization comment to handle_new_root (#19571) 2021-09-02 13:52:14 -07:00
Lijun Wang 8378e8790f
Accountsdb replication installment 2 (#19325)
This is the 2nd installment for the AccountsDb replication.

Summary of Changes

The basic google protocol buffer protocol for replicating updated slots and accounts. tonic/tokio is used for transporting the messages.

The basic framework of the client and server for replicating slots and accounts -- the persisting of accounts in the replica-side will be done at the next PR -- right now -- the accounts are streamed to the replica-node and dumped. Replication for information about Bank is also not done in this PR -- to be addressed in the next PR to limit the change size.

Functionality used by both the client and server side are encapsulated in the replica-lib crate.

There is no impact to the existing validator by default.

Tests:

Observe the confirmed slots replicated to the replica-node.
Observe the accounts for the confirmed slot are received at the replica-node side.
2021-09-01 14:10:16 -07:00
behzad nouri 6d9818b8e4
skips retransmit for shreds with unknown slot leader (#19472)
Shreds' signatures should be verified before they reach retransmit
stage, and if the leader is unknown they should fail signature check.
Therefore retransmit-stage can as well expect to know who the slot
leader is and otherwise just skip the shred.

Blockstore checking signature of recovered shreds before sending them to
retransmit stage:
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/blockstore.rs#L884-L930

Shred signature verifier:
https://github.com/solana-labs/solana/blob/4305d4b7b/core/src/sigverify_shreds.rs#L41-L57
https://github.com/solana-labs/solana/blob/4305d4b7b/ledger/src/sigverify_shreds.rs#L105
2021-09-01 15:44:26 +00:00
Brooks Prumo 1d5a8ebc6a
Revert "Add LastFullSnapshotSlot to SnapshotConfig (#19341)" (#19529)
This reverts commit 4d361af976.
2021-08-31 22:03:19 -05:00
Brooks Prumo fe9ee9134a
Make background services aware of incremental snapshots (#19401)
AccountsBackgroundService now knows about incremental snapshots.  It is
now also in charge of deciding if an AccountsPackage is destined to be a
SnapshotPackage or not (or just used by AccountsHashVerifier).

!!! New behavior changes !!!

Taking snapshots (both bank and archive) **MUST** succeed.

This is required because of how the last full snapshot slot is
calculated, which is used by AccountsBackgroundService when calling
`clean_accounts()`.

File system calls are now unwrapped and will result in a crash. As Trent told me:

>Well I think if a snapshot fails due to some IO error, it's very likely that the operator is going to have to intervene before it works.  We should exit error in this case, otherwise the validator might happily spin for several more hours, never successfully writing a complete snapshot, before something else brings it down.  This would leave the validator's last local snapshot many more slots behind than it would be had we exited outright and potentially force the operator to abandon ledger continuity in favor of a quick catchup

Other errors will set the `exit` flag to `true`, and the node will gracefully shutdown.

Fixes #19167 
Fixes #19168
2021-08-31 18:33:27 -05:00
Tyera Eulberg a3bef2e537
Fix shreds-to-hours/days estimations (#19477) 2021-08-30 13:16:06 -06:00
behzad nouri 8ad52fa095
implements copy-on-write for vote-accounts (#19362)
Bank::vote_accounts redundantly clones vote-accounts HashMap even though
an immutable reference will suffice:
https://github.com/solana-labs/solana/blob/95c998a19/runtime/src/bank.rs#L5174-L5186

This commit implements copy-on-write semantics for vote-accounts by
wrapping the underlying HashMap in Arc<...>.
2021-08-30 15:54:01 +00:00
carllin 84db04ce6c
Fix duplicate broadcast test (#19365) 2021-08-27 17:53:24 -07:00
Justin Starry 2d7f036afd
Add solana-program-runtime crate (#19438) 2021-08-27 00:30:36 +00:00
Brooks Prumo 6d939811e9
Name snapshots consistently (#19346)
#### Problem

Snapshot names are overloaded, and there are multiple terms that mean the same thing. This is confusing. Here's a list of ones in the codebase that I've found:

```
- snapshot_dir
- snapshots_dir
- snapshot_path
- snapshot_output_dir
- snapshot_package_output_path
- snapshot_archives_dir
```

#### Summary of Changes

For all the ones that are about the directory where snapshot archives are stored, ensure they are `snapshot_archives_dir`. For the ones about the (bank) snapshots directory, set to `bank_snapshots_dir`.


Co-authored-by: Michael Vines <mvines@gmail.com>
2021-08-21 15:41:03 -05:00
Brooks Prumo 4d361af976
Add LastFullSnapshotSlot to SnapshotConfig (#19341) 2021-08-20 17:06:53 +00:00
behzad nouri 1deb4add81
removes Slot from TransmitShreds (#19327)
An earlier version of the code was funneling through stakes along with
shreds to broadcast:
https://github.com/solana-labs/solana/blob/b67ffab37/core/src/broadcast_stage.rs#L127

This was changed to only slots as stakes computation was pushed further
down the pipeline in:
https://github.com/solana-labs/solana/pull/18971

However shreds themselves embody which slot they belong to. So pairing
them with slot is redundant and adds rooms for bugs should they become
inconsistent.
2021-08-20 13:48:33 +00:00
Trent Nelson e0bc5fa690 validator: Trusted validators are now called known validators 2021-08-19 22:43:49 -06:00
Jack May 3ec33e7d02
Fail secp256k1 if the instruction data looks incorrect (#19300) 2021-08-19 13:13:54 -07:00
Tao Zhu 4982dc20f9
replace function with const var for better readability (#19285) 2021-08-19 14:59:53 -05:00
Justin Starry c50b01cb60
Store versioned transactions in the ledger, disabled by default (#19139)
* Add support for versioned transactions, but disable by default

* merge conflicts

* trent's feedback

* bump Cargo.lock

* Fix transaction error encoding

* Rename legacy_transaction method

* cargo clippy

* Clean up casts, int arithmetic, and unused methods

* Check for duplicates in sanitized message conversion

* fix clippy

* fix new test

* Fix bpf conditional compilation for message module
2021-08-17 15:17:56 -07:00
Jeff Washington (jwash) 7c70f2158b
accounts_index_bins to AccountsIndexConfig (#19257)
* accounts_index_bins to AccountsIndexConfig

* rename param bins -> config

* rename BINS_FOR* to ACCOUNTS_INDEX_CONFIG_FOR*
2021-08-17 14:50:01 -05:00
Brooks Prumo f9986c66b8
Make SnapshotPackagerService aware of Incremental Snapshots (#19254)
Add a field to SnapshotPackage that is an enum for SnapshotType, so archive_snapshot_package() will do the right thing.

Fixes #19166
2021-08-17 13:01:59 -05:00
behzad nouri 7a8807b8bb retransmits shreds recovered from erasure codes
Shreds recovered from erasure codes have not been received from turbine
and have not been retransmitted to other nodes downstream. This results
in more repairs across the cluster which is slower.

This commit channels through recovered shreds to retransmit stage in
order to further broadcast the shreds to downstream nodes in the tree.
2021-08-17 13:44:10 +00:00
behzad nouri 3efccbffab sends shreds (instead of packets) to retransmit stage
Working towards channelling through shreds recovered from erasure codes
to retransmit stage.
2021-08-17 13:44:10 +00:00
behzad nouri 6e413331b5 removes erroneous uses of Arc<...> from retransmit stage 2021-08-17 13:44:10 +00:00
behzad nouri 8198a7eae1 adds packet/shred count stats to window-service
Adding back these metrics from the earlier commit which removed them
from retransmit stage.
2021-08-17 13:44:10 +00:00
behzad nouri bf437b0336 removes packet-count metrics from retransmit stage
Working towards sending shreds (instead of packets) to retransmit stage
so that shreds recovered from erasure codes are as well retransmitted.

Following commit will add these metrics back to window-service, earlier
in the pipeline.
2021-08-17 13:44:10 +00:00
behzad nouri 563aec0b4d
discards epoch-slots epochs ahead of the current root (#19256)
Cross cluster gossip contamination is causing cluster-slots hash map to
contain a lot of bogus values and consume too much memory:
https://github.com/solana-labs/solana/issues/17789

If a node is using the same identity key across clusters, then these
erroneous values might not be filtered out by shred-versions check,
because one of the variants of the contact-info will have matching
shred-version:
https://github.com/solana-labs/solana/issues/17789#issuecomment-896304969

The cluster-slots hash-map is bounded and trimmed at the lower end by
the current root. This commit also discards slots epochs ahead of the
root.
2021-08-17 13:13:28 +00:00
behzad nouri f33b7abffb
adds back cluster partitions to broadcast-duplicates (#19253)
An earlier version of this code was aiming to create a partition by
manipulating stakes, and setting some of them to zero:
https://github.com/solana-labs/solana/blob/cde146155/core/src/broadcast_stage/broadcast_duplicates_run.rs#L65-L116

https://github.com/solana-labs/solana/pull/18971
moved stakes computation further down the stream, and so that logic
could no longer live there. This commit adds back cluster partitions
by intercepting packets before send.
2021-08-16 22:24:30 +00:00
Michael Vines 3e5ba594e0 Revert `TestValidatorGenesis::start()` to v1.7.8 signature; add `TestValidatorGenesis::start_with_socket_addr_space()` 2021-08-16 06:37:23 +00:00
Michael Vines b15fa9fbd2 Add EtcdTowerStorage 2021-08-14 09:46:36 -07:00
carllin 22674000bd
Add EpochSlots frozen state transition (#19112) 2021-08-13 14:21:52 -07:00
Brooks Prumo 176036aa58
Rename AccountsPacakge to SnapshotPackage and AccountsPackagePre to AccountsPackage (#19231)
Renaming these types to better communicate their usages, which will
further diverge as incremental snapshot support is added.

With the new names, AccountsPacakge now refers to the type between
AccountsBackgroundProcess and AccountsHashVerifier, and SnapshotPackage
refers to the type between AccountsHashVerifier and
SnapshotPackagerService.
2021-08-13 16:08:09 -05:00
behzad nouri b64eeb7729 removes erroneous uses of &Arc<...> from window-service 2021-08-13 17:26:31 +00:00
behzad nouri d57398a959 removes repeated bank-forks locking in window-service
Window service is repeatedly locking bank-forks to look-up working-bank
for every single shred:
https://github.com/solana-labs/solana/blob/5fde4ee3a/core/src/window_service.rs#L597-L606

This commit updates shred_filter signature in recv_window so that where
we already obtain the lock on bank-forks, we can also look-up
working-bank once for all packets:
https://github.com/solana-labs/solana/blob/5fde4ee3a/core/src/window_service.rs#L256-L277
2021-08-13 17:26:31 +00:00
Jack May 0b50bb2b20
Deprecate FeeCalculator returning APIs (#19120) 2021-08-13 09:08:20 -07:00
behzad nouri 7a789e0763
filters for recent contact-infos when checking for live stake (#19204)
Contact-infos are saved to disk:
https://github.com/solana-labs/solana/blob/9dfeee299/gossip/src/cluster_info.rs#L1678-L1683

and restored on validator start-up:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L450

Staked nodes entries will not expire until an epoch after. So when the
validator checks for online stake it is erroneously picking up
contact-infos restored from disk, which breaks the entire
wait-for-supermajority logic:
https://github.com/solana-labs/solana/blob/9dfeee299/core/src/validator.rs#L1515-L1561

This commit adds an extra check for the age of contact-info entries and
filters out old ones.
2021-08-13 12:12:40 +00:00
Tao Zhu 414d904959
Reject blocks for costs above the max block cost (#18994)
* added realtime cost checking logic to reject block that would exceed max limit:
- defines max limits at block_cost_limits.rs
- right after each bath's execution, accumulate its cost and check again
  limit, return error if limit is exceeded

* update abi that changed due to adding additional TransactionError

* To avoid counting stats mltiple times, only accumulate execute-timing when a bank is completed

* gate it by a feature

* move cost const def into block_cost_limits.rs

* redefine the cost for signature and account access, removed signer part as it is not well defined for now

* check if per_program_timings of execute_timings before sending
2021-08-12 10:48:47 -05:00
Jeff Washington (jwash) e91988c977
cli for num account index bins (#19085) 2021-08-11 11:45:25 -05:00
Michael Vines 7ddda30126 `solana-test-validator` now uses FileTowerStorage 2021-08-11 00:20:46 -07:00
Michael Vines e9722474eb Move tower storage into its own module 2021-08-11 00:20:46 -07:00
Michael Vines d7ab510229 Move tower save into the VotingService 2021-08-11 00:20:46 -07:00
behzad nouri 00e5e12906 renames solana_runtime::vote_account::VoteAccount
Rename:
  VoteAccount    -> VoteAccountInner  # the private type
  ArcVoteAccount -> VoteAccount       # the public type
2021-08-10 22:54:17 +00:00
Brooks Prumo ccfa82461b
Pass SnapshotConfig to AccountsHashVerifier (#19154)
AccountsHashVerifier will need access to both the full and incremental
snapshot archive interval slots config values, which is in the
SnapshotConfig.

Also, cleanup some `Option<>` params and their references.
2021-08-10 14:02:34 -05:00
Brooks Prumo fd937548a0
Move SnapshotArchiveInfo and friends into its own module (#19114) 2021-08-08 07:57:06 -05:00
Brooks Prumo 00890957ee
Add snapshot_utils::bank_from_latest_snapshot_archives() (#18983)
While reviewing PR #18565, as issue was brought up to refactor some code
around verifying the bank after rebuilding from snapshots.  A new
top-level function has been added to get the latest snapshot archives
and load the bank then verify.  Additionally, new tests have been
written and existing tests have been updated to use this new function.

Fixes #18973

While resolving the issue, it became clear there was some additional
low-hanging fruit this change enabled.  Specifically, the functions
`bank_to_xxx_snapshot_archive()` now return their respective
`SnapshotArchiveInfo`.  And on the flip side,
`bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead
of separate paths and archive formats.  This bundling simplifies bank
rebuilding.
2021-08-06 20:16:06 -05:00
Michael Vines 397801a2d8 Extract tower storage details from Tower struct 2021-08-06 10:04:37 -07:00
behzad nouri e4be00fece falls back on working-bank if root-bank::epoch-staked-nodes is none
bank.get_leader_schedule_epoch(shred_slot)
is one epoch after epoch_schedule.get_epoch(shred_slot).

At epoch boundaries, shred is already one epoch after the root-slot. So
we need epoch-stakes 2 epochs ahead of the root. But the root bank only
has epoch-stakes for one epoch ahead, and as a result looking up epoch
staked-nodes from the root-bank fails.

To be backward compatible with the current master code, this commit
implements a fallback on working-bank if epoch staked-nodes obtained
from the root-bank is none.
2021-08-05 21:47:33 +00:00
behzad nouri eaf927cf49 allows only one thread to update cluster-nodes cache entry for an epoch
If two threads simultaneously call into ClusterNodesCache::get for the
same epoch, and the cache entry is outdated, then both threads recompute
cluster-nodes for the epoch and redundantly overwrite each other.

This commit wraps ClusterNodesCache entries in Arc<Mutex<...>>, so that
when needed only one thread does the computations to update the entry.
2021-08-05 21:47:33 +00:00
behzad nouri fb69f45f14 adds fallback & metric for when epoch staked-nodes are none 2021-08-05 21:47:33 +00:00
behzad nouri 50d0e830c9 unifies cluster-nodes computation & caching across turbine stages
Broadcast-stage is using epoch_staked_nodes based on the same slot that
shreds belong to:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

But retransmit-stage is using bank-epoch of the working-bank:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/retransmit_stage.rs#L272-L289

So the two are not consistent at epoch boundaries where some nodes may
have a working bank (or similarly a root bank) lagging other nodes. As a
result the node which obtains a packet may construct turbine broadcast
tree inconsistently with its parent node in the tree and so some packets
may fail to reach all nodes in the tree.
2021-08-05 21:47:33 +00:00
behzad nouri aa32738dd5 uses cluster-nodes cache in broadcast-stage
* Current caching mechanism does not update cluster-nodes when the epoch
  (and so epoch staked nodes) changes:
  https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344

* Additionally, the cache update has a concurrency bug in which the
  thread which does compare_and_swap may be blocked when it tries to
  obtain the write-lock on cache, while other threads will keep running
  ahead with the outdated cache (since the atomic timestamp is already
  updated).

In the new ClusterNodesCache, entries are keyed by epoch, and so if
epoch changes cluster-nodes will be recalculated. The time-to-live
eviction policy is also encapsulated and rigidly enforced.
2021-08-05 21:47:33 +00:00
behzad nouri 30bec3921e uses cluster-nodes cache in retransmit stage
The new cluster-nodes cache will:
  * ensure cluster-nodes are recalculated if the epoch (and so the epoch
    staked nodes) changes.
  * encapsulate time-to-live eviction policy.
2021-08-05 21:47:33 +00:00
behzad nouri ecc1c7957f implements cluster-nodes cache
Cluster nodes are cached keyed by the respective epoch from which stakes
are obtained, and so if epoch changes cluster-nodes will be recomputed.

A time-to-live eviction policy is enforced to refresh entries in case
gossip contact-infos are updated.
2021-08-05 21:47:33 +00:00
behzad nouri 44b11154ca sends slots (instead of stakes) through broadcast flow
Current broadcast code is computing stakes for each slot before sending
them down the channel:
https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228
https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349

Since the stakes are a function of epoch the slot belongs to (and so
does not necessarily change from one slot to another), forwarding the
slot itself would allow better caching downstream.

In addition we need to invalidate the cache if the epoch changes (which
the current code does not do), and that requires to know which slot (and
so epoch) current broadcasted shreds belong to:
https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344
2021-08-05 21:47:33 +00:00
Jeff Washington (jwash) e368f10973
add _for_tests to new_no_wallclock_throttle (#19086) 2021-08-05 14:50:25 -05:00
Jeff Washington (jwash) a9014ceceb
Bank::default_for_tests() (#19084) 2021-08-05 11:53:29 -05:00
behzad nouri 40914de811 updates cluster-slots with root-bank instead of root-slot + bank-forks
ClusterSlots::update is taking both root-slot and bank-forks only to
later lookup root-bank from bank-forks, which is redundant. Also
potentially by the time bank-forks is locked to obtain root-bank,
root-slot may have already changed and so be inconsistent with the
root-slot passed in as the argument.
https://github.com/solana-labs/solana/blob/6d95d679c/core/src/cluster_slots.rs#L32-L39
https://github.com/solana-labs/solana/blob/6d95d679c/core/src/cluster_slots.rs#L122
2021-08-05 14:43:06 +00:00
behzad nouri 2fc112edcf removes unused code from cluster-slots 2021-08-05 14:43:06 +00:00
Jeff Washington (jwash) bf16b0517c
add _for_tests to setup_bank_and_vote_pubkeys (#19060) 2021-08-05 08:43:35 -05:00
Jeff Washington (jwash) 14361906ca
for all tests, bank::new -> bank::new_for_tests (#19064) 2021-08-05 08:42:38 -05:00
Jeff Washington (jwash) 3280ae3e9f
add validator option --accounts-db-skip-shrink (#19028)
* add validator option --accounts-db-skip-shrink

* typo
2021-08-04 17:28:33 -05:00
Jeff Washington (jwash) 1ed12a07ab
introduce Bank::new_for_tests (#19062) 2021-08-04 15:06:57 -05:00
Brooks Prumo ca14475085
Add incremental_snapshot_archive_interval_slots to SnapshotConfig (#19026)
This commit also renames `snapshot_interval_slots` to
`full_snapshot_archive_interval_slots`, updates the comments on the
fields, and make appropriate updates where SnapshotConfig is used.
2021-08-04 14:40:20 -05:00
Trent Nelson 06a7a9e544 remove superfluous `collect()`s 2021-08-04 07:21:55 +00:00
carllin 03353d500f
Actively manage dead slots in AncestorHashesService (#18912) 2021-08-02 14:33:28 -07:00
behzad nouri 049fb0417f
allows sendmmsg api taking owned values (as well as references) (#18999)
Current signature of api in sendmmsg requires a slice of inner
references:
https://github.com/solana-labs/solana/blob/fe1ee4980/streamer/src/sendmmsg.rs#L130-L152

That forces the call-site to convert owned values to references even
though doing so is redundant and adds an extra level of indirection:
https://github.com/solana-labs/solana/blob/fe1ee4980/core/src/repair_service.rs#L291

This commit expands the api using AsRef and Borrow traits to allow
calling the method with owned values (as well as references like
before).
2021-07-30 20:58:49 +00:00
Tao Zhu 5d297ccf96
Cost model uses compute_unit to replace microsecond as cost unit (#18934)
* wip - cost_update_services to log both us and cu for each instruction to determine possible ratio

* replace microsecond with compute_unit as cost unit
2021-07-29 22:19:36 +00:00
Ryo Onodera da480bdb5f
Fix unstable retransmit-num_nodes (#18970) 2021-07-29 17:32:32 +00:00
behzad nouri d06dc6c8a6
shares cluster-nodes between retransmit threads (#18947)
cluster_nodes and last_peer_update are not shared between retransmit
threads, as each thread have its own value:
https://github.com/solana-labs/solana/blob/65ccfed86/core/src/retransmit_stage.rs#L476-L477

Additionally, with shared references, this code:
https://github.com/solana-labs/solana/blob/0167daa11/core/src/retransmit_stage.rs#L315-L328
has a concurrency bug where the thread which does compare_and_swap,
updates cluster_nodes much later after other threads have run with
outdated cluster_nodes for a while. In particular, the write-lock there
may block.
2021-07-29 16:20:15 +00:00
Trent Nelson 71f6d839f9 validator: remove disused cuda config argument 2021-07-29 03:08:52 +00:00
Trent Nelson 8ed0cd0fff validator: check target CPU features earlier 2021-07-29 03:08:52 +00:00
Trent Nelson c435f7b3e3 validator: add avx2 runtime check 2021-07-29 03:08:52 +00:00
Trent Nelson e641f257ef test-validator: move feature check earlier in startup 2021-07-29 03:08:52 +00:00
Trent Nelson 59641623d1 Improve check for Apple M1 silicon under Rosetta 2021-07-29 03:08:52 +00:00
Jeff Biseda 9255ae334d
drop outstanding_requests lock before sending repair requests (#18893) 2021-07-28 19:30:43 -07:00
sakridge 84e78316b1
Write helper for multithread update (#18808) 2021-07-29 03:16:36 +02:00
Jack May f1b9f97aef
remove avx error on macos (#18923) 2021-07-27 16:34:04 -07:00
carllin c0704d4ec9
Plumb signal from replay to ancestor hashes service (#18880) 2021-07-26 20:59:00 -07:00
carllin 1ee64afb12
Introduce AncestorHashesService (#18812) 2021-07-23 16:54:47 -07:00
behzad nouri d2d5f36a3c
adds validator flag to allow private ip addresses (#18850) 2021-07-23 15:25:03 +00:00
Ryo Onodera 611af87fdb
Really start caching by fixing swapped CAS... (#18842) 2021-07-23 10:17:19 +09:00
Brooks Prumo d1debcd971
Add incremental snapshot utils (#18504)
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks.  This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.

Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use a common code paths for the different snapshots.

Also of note, some renaming has happened:
  1. Snapshots are now either `full_` or `incremental_` throughout the
     codebase.  If not specified, the code applies to both.
  2. Bank snapshots now are called "bank snapshots"
     (before they were called "slot snapshots", "bank snapshots", or
      just "snapshots").  The one exception is within `Bank`, where they
     are still just "snapshots", because they are already "bank
     snapshots".
  3. Snapshot archives now have `_archive` in the code.  This
     should clear up an ambiguity between bank snapshots and snapshot
     archives.
2021-07-22 14:40:37 -05:00
behzad nouri 7d56fa8363
sends packets in batches from sigverify-stage (#18446)
sigverify-stage is breaking batches to single-item vectors before
sending them down the channel:
https://github.com/solana-labs/solana/blob/d451363dc/core/src/sigverify_stage.rs#L88-L92

Also simplifying window-service code, reducing number of nested branches.
2021-07-22 14:49:21 +00:00
Michael Vines 61865c0ee0 `solana-validator set-identity` now loads the tower file for the new identity 2021-07-21 22:22:08 -07:00
carllin 588c0464b8
Add sampling logic and DuplicateSlotRepairStatus module (#18721) 2021-07-21 11:15:08 -07:00
behzad nouri bbd22f06f4
implements generic lookups into gossip crds table (#18765)
This commit adds CrdsEntry trait which allows generic lookups into crds
table. For example to get ContactInfo or LowestSlot associated with a
Pubkey, the lookup code would be respectively:
   crds.get::<&ContactInfo>(pubkey)
   crds.get::<&LowestSlot>(pubkey)
2021-07-21 12:16:26 +00:00
carllin ce467bea20
Add frozen hashes and marking DuplicateConfirmed in blockstore to state machine (#18648) 2021-07-18 17:04:25 -07:00
behzad nouri e316586516 excludes private ip addresses 2021-07-16 20:05:48 -06:00
Jeff Biseda ae5ad5cf9b
sendmmsg cleanup #18589
Rationalize usage of sendmmsg(2). Skip packets which failed to send and track failures.
2021-07-16 14:36:49 -07:00
Jack May ca71ca3d6d
Accumulate consumed units (#18714) 2021-07-16 12:40:12 -07:00
Justin Starry d166b9856a
Move transaction sanitization earlier in the pipeline (#18655)
* Move transaction sanitization earlier in the pipeline

* Renamed HashedTransaction to SanitizedTransaction

* Implement deref for sanitized transaction

* bring back process_transactions test method

* Use sanitized transactions for cost model calculation
2021-07-15 22:51:27 -05:00
carllin 8a846b048e
Add AncestorHashesRepair type (#18681) 2021-07-15 19:29:53 -07:00
Trent Nelson 3a85b77bb5 hijack secp256k1 enablement feature plumbing for libsecp256k1 upgrade 2021-07-15 18:43:55 +00:00
Trent Nelson 568660b402 Revert "Remove feature switch for secp256k1 program (#18467)"
This reverts commit fd574dcb3b.
2021-07-15 18:43:55 +00:00
sakridge 0f8bcf65af
Add voting service (#18552) 2021-07-15 16:35:51 +02:00
behzad nouri cf31afdd6a
makes CrdsGossip thread-safe (#18615) 2021-07-14 22:27:17 +00:00
Michael Vines b30b32300d `solana-validator set-identity` now works for voting validators 2021-07-14 09:42:35 -07:00
Michael Vines 62d864559f Tower cleanup: reduce fn visibility, remove unnecessary new_with_key() 2021-07-14 09:42:35 -07:00
sakridge 7f2254225e
Move entry/poh to own crate to speed up poh bench build (#18225) 2021-07-14 14:16:29 +02:00
behzad nouri c90af3cd63
removes id from push_lowest_slot args (#18645)
push_lowest_slot cannot sign the new crds-value unless the id (pubkey)
argument passed-in is the same pubkey as in ClusterInfo::keypair(), in
which case the id argument is redundant:
https://github.com/solana-labs/solana/blob/bb41cf346/gossip/src/cluster_info.rs#L824-L845

Additionally, the lookup is done with self.id(), but insert is done with
the id argument, which is logically a bug.
2021-07-13 22:32:59 +00:00
Tao Zhu 350baece21
Explicitly sanitize program id indexes before usage
1. check transaction has valid program_id before using it to avoid possible panic;
2. change calculate_cost function signature to return Result;
3. add CostModelError enum, update return type from Result<_, str> to Result<_, CostModelError>
2021-07-13 17:29:22 -05:00
Michael Vines 4098af3b5b Record vote account commission with voting/staking rewards and surface in RPC 2021-07-12 15:09:44 -07:00
carllin 175083c4c1
Add updated duplicate broadcast test (#18506) 2021-07-10 22:22:07 -07:00
Jack May e9ace3a0d5
cost model nits (#18528) 2021-07-09 12:55:31 -07:00
Justin Starry fd574dcb3b
Remove feature switch for secp256k1 program (#18467)
* Remove feature switch for secp256k1 program

* fix tests
2021-07-09 10:08:03 -05:00
carllin 4d3e301ee4
Introduce slot dumping to ReplayStage (#18160) 2021-07-08 19:07:32 -07:00
Tao Zhu b6dff12923
update ledger tool to restore cost table from blockstore (#18489)
* update ledger tool to restore cost model from blockstore when compute-slot-cost

* Move initialize_cost_table into cost_model, so the function can be tested and shared between validator and ledger-tool

* refactor and simplify a test
2021-07-07 23:44:51 -05:00
Michael Vines 1e0942e900 Rename ClusterInfo::send_vote to ClusterInfo::send_transaction 2021-07-07 15:51:14 -07:00
jbiseda a86ced0bac
generate deterministic seeds for shreds (#17950)
* generate shred seed from leader pubkey

* clippy

* clippy

* review

* review 2

* fmt

* review

* check

* review

* cleanup

* fmt
2021-07-07 08:21:12 -07:00
behzad nouri a0551b4054
persists repair-peers cache across repair service loops (#18400)
The repair-peers cache is reset each time repair service loop runs,
and so computed repeatedly for the same slots:
https://github.com/solana-labs/solana/blob/d2b07dca9/core/src/repair_service.rs#L275

This commit uses an LRU cache to persists repair-peers for each slot.
In addition to LRU eviction rules, in order to avoid re-using outdated
data, each entry also has 10 seconds TTL.
2021-07-07 14:12:09 +00:00
behzad nouri 04787be8b1
encapsulates turbine peers computations of broadcast & retransmit stages (#18238)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.

Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
2021-07-07 00:35:25 +00:00
Justin Starry 100fabf469
Remove feature switch for demoting sysvar write locks (#18373) 2021-07-06 21:22:22 +00:00
Tao Zhu 0e039b4094
Aggregate cost_model into cost_tracker (#18374)
* * aggregate cost_model into cost_tracker, decouple it from banking_stage to prevent accidental deadlock. * Simplified code, removed unused functions

* review fixes
2021-07-06 15:41:25 +00:00
Michael Vines d5c2c72360 Rename Tower::lockouts to Tower::vote_state 2021-07-02 18:35:49 -07:00
Tao Zhu 7cd6224caf
log warning when channel send fails (#18391) 2021-07-02 19:04:09 +00:00
carllin 0eca92de18
Make set roots an iterator (#18357) 2021-07-01 20:02:40 -07:00
Michael Vines b6792a3328 Add ability to change the validator identity at runtime 2021-07-01 17:50:04 -07:00
Brooks Prumo 45d54b1fc6
Add SnapshotArchiveInfo and refactor functions in snapshot_utils (#18232) 2021-07-01 12:20:56 -05:00
Tao Zhu 5e424826ba
Persist cost table to blockstore (#18123)
* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.
2021-07-01 11:32:41 -05:00
Brooks Prumo 89a3e4f91e
Move SnapshotConfig into its own module (#18331)
Also move ArchiveFormat to snapshot_utils, and do not
reexport SnapshotVersion.
2021-07-01 08:55:26 -05:00
sakridge 8d9a6deda4
Add repair number per slot (#18082) 2021-06-30 18:20:07 +02:00
Trent Nelson 02b14caa5f test-validator: hold rent constant with `--slots-per-epoch` 2021-06-30 00:46:12 -06:00
carllin 68c87469c3
Cleanup ReplayStage tests (#18241) 2021-06-28 20:19:42 -07:00
Tao Zhu 9d6f1ebef4
investigate system performance test degradation (#17919)
* Add stats and counter around cost model ops, mainly:
- calculate transaction cost
- check transaction can fit in a block
- update block cost tracker after transactions are added to block
- replay_stage to update/insert execution cost to table

* Change mutex on cost_tracker to RwLock

* removed cloning cost_tracker for local use, as the metrics show clone is very expensive.

* acquire and hold locks for block of TXs, instead of acquire and release per transaction;

* remove redundant would_fit check from cost_tracker update execution path

* refactor cost checking with less frequent lock acquiring

* avoid many Transaction_cost heap allocation when calculate cost, which
is in the hot path - executed per transaction.

* create hashmap with new_capacity to reduce runtime heap realloc.

* code review changes: categorize stats, replace explicit drop calls, concisely initiate to default

* address potential deadlock by acquiring locks one at time
2021-06-28 21:34:04 -05:00
sakridge 5d08bf9aa3
More detailed voting timings in replay stage (#18229) 2021-06-26 17:32:08 +02:00
Trent Nelson d269975784 Revert "Clean up build warning"
This reverts commit 17a173ebb5.
2021-06-24 19:57:52 -06:00
Michael Vines 314102cb54 Remove redundant JsonRpcConfig::identity_pubkey field 2021-06-22 17:20:11 -07:00
sakridge e808f34b0b
Add batch stats (#18096) 2021-06-22 15:23:26 +02:00
Michael Vines 3b1517237c Clean up argument names 2021-06-21 21:29:52 -07:00
Michael Vines 84b9de8c18 Shredder no longer holds a keypair 2021-06-21 21:29:52 -07:00
Michael Vines 2435ea3ad8 Remove redundant ReplayStageConfig::my_pubkey field 2021-06-21 21:29:52 -07:00
Michael Vines 51a0007001 serve_repair: Remove internal ContactInfo field duplication 2021-06-21 17:23:49 -07:00
behzad nouri 598093b5db adds shred-version to ip-echo-server response
When starting a validator, the node initially joins gossip with
shred_verison = 0, until it adopts the entrypoint's shred-version:
https://github.com/solana-labs/solana/blob/9b182f408/validator/src/main.rs#L417

Depending on the load on the entrypoint, this adopting entrypoint
shred-version through gossip sometimes becomes very slow, and causes
several problems in gossip because we have to partially support
shred_version == 0 which is a source of leaking crds values from one
cluster to another. e.g. see
https://github.com/solana-labs/solana/pull/17899
and the other linked issues there.

In order to remove shred_version == 0 from gossip, this commit adds
shred-version to ip-echo-server response. Once the entrypoints are
updated, on validator start-up, if --expected_shred_version is not
specified we will obtain shred-version from the entrypoint using
ip-echo-server.
2021-06-21 19:37:16 +00:00
Jeff Washington (jwash) ec2f930475
user process.accounts_db_test_hash_calculation for debug_verify hash (#18053) 2021-06-21 10:20:27 -05:00
Michael Vines 4a12c715a3 Drop Error suffix from enum values to avoid the enum_variant_names clippy lint 2021-06-18 23:02:13 +00:00
Alexander Meißner 789f33e8db chore: cargo fmt 2021-06-18 10:42:46 -07:00
Alexander Meißner 6514096a67 chore: cargo +nightly clippy --fix -Z unstable-options 2021-06-18 10:42:46 -07:00
Tyera Eulberg d0511de9a6
chore: bump trees from 0.2.1 to 0.4.2 (#18052)
* chore: bump trees from 0.2.1 to 0.4.2 (#18041)

Bumps [trees](https://github.com/oooutlk/trees) from 0.2.1 to 0.4.2.
- [Release notes](https://github.com/oooutlk/trees/releases)
- [Commits](https://github.com/oooutlk/trees/commits)

---
updated-dependencies:
- dependency-name: trees
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Accommodate field & type changes

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-06-17 22:45:09 +00:00
Lijun Wang 071b1ee3e5
Removed pub from some functions which are actually private to improve encapsulation (#18030)
Remove the pub marker to improve encapsulation. Readability improvement only, no functional impact.
2021-06-17 10:14:21 -07:00
Michael Vines fa04531c7a Extricate RpcCompletedSlotsService from RetransmitStage 2021-06-16 16:20:35 -07:00
Trent Nelson 5bc6c89adc validator: run poh speed test earlier in start up 2021-06-16 21:27:08 +00:00
behzad nouri 161838655c
removes port-based forwarding logic from turbine retransmit (#17716)
Turbine retransmit logic is based on which socket it received the packet
from (i.e `packet.meta.forward`):
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can leave the cluster vulnerable to spoofing and selective
propagation of packets; see
https://github.com/solana-labs/solana/issues/6672
https://github.com/solana-labs/solana/pull/7774

This commit identifies if the node is on the "critical path" based on
its index in the shuffled cluster. If so, it forwards the packet to both
neighbors and children; otherwise, the packet is only forwarded to the
children.

The metrics added in
https://github.com/solana-labs/solana/pull/17351
shows that the number of times the index does not match the port is very
rare, and therefore this change should be safe.
2021-06-15 13:19:41 +00:00
carllin ccc013e134
Handle removing slots during account scans (#17471) 2021-06-14 21:04:01 -07:00
sakridge eeee75c5be
Don't use pinned memory when unnecessary (#17832)
Reports of excessive GPU memory usage and errors
from cudaHostRegister. There are some cases where pinning is
not required.
2021-06-14 16:10:04 +02:00
sakridge 0feac57cb0
Don't store votes unless we are leader soon (#17803) 2021-06-11 18:29:05 +02:00
carllin c8535be0e1
Port unconfirmed duplicate tracking logic from ProgressMap to ForkChoice (#17779) 2021-06-11 03:09:57 -07:00
carllin afafa624a3
Account for duplicate before a bank is frozen or replayed (#17866) 2021-06-10 22:28:23 -07:00
Lijun Wang 269d995832
Make account shrink configurable #17544 (#17778)
1. Added both options for measuring space usage using total accounts usage and for individual store shrink ratio using an enum. Validator CLI options: --accounts-shrink-optimize-total-space and --accounts-shrink-ratio
2. Added code for selecting candidates based on total usage in a separate function select_candidates_by_total_usage
3. Added unit tests for the new functions added
4. The default implementations is kept at 0.8 shrink ratio with --accounts-shrink-optimize-total-space set to true

Fixes #17544
2021-06-09 21:21:32 -07:00
Tao Zhu ae27fcbcda
replay stage feed back program cost (#17731)
* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized into empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.
2021-06-09 17:10:59 -05:00
Justin Starry 050bb5446d
Add local cluster tests that broadcast duplicate slots (#13995)
* Add duplicate node local cluster test

* fix clippy

* remove dupe test
2021-06-09 15:01:48 -07:00
Michael Vines e5e7390d44 Wrap long lines 2021-06-08 12:05:29 -07:00
Tyera Eulberg 544b3c0d17
Create solana-poh and move remaining rpc modules to solana-rpc (#17698)
* Create solana-poh crate

* Move BigTableUploadService to solana-ledger

* Add solana-rpc to workspace

* Move dependencies to solana-rpc

* Move remaining rpc modules to solana-rpc

* Single use statement solana-poh

* Single use statement solana-rpc
2021-06-04 09:23:06 -06:00
sakridge f97ce2cd7e
Per-program id timings (#17554) 2021-06-04 16:04:31 +02:00
behzad nouri be957f25c9
adds fallback logic if retransmit multicast fails (#17714)
In retransmit-stage, based on the packet.meta.seed and resulting
children/neighbors, each packet is sent to a different set of peers:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L421-L457

However, current code errors out as soon as a multicast call fails,
which will skip all the remaining packets:
https://github.com/solana-labs/solana/blob/708bbcb00/core/src/retransmit_stage.rs#L467-L470

This can exacerbate packets loss in turbine.

This commit:
  * keeps iterating over retransmit packets for loop even if some
    intermediate sends fail.
  * adds a fallback to UdpSocket::send_to if multicast fails.

Recent discord chat:
https://discord.com/channels/428295358100013066/689412830075551748/849530845052403733
2021-06-04 12:16:37 +00:00
Tyera Eulberg 3a647c4bea
Rename ValidatorExit and move to sdk (#17728) 2021-06-04 03:06:13 +00:00
carllin 96ba2edfeb
Switch EpochSlots to be frozen slots, not completed slots (#17168) 2021-06-03 00:20:00 +00:00
carllin bbcdf073ba
Support out of band dumping of unrooted slots in AccountsDb (#17269)
* Accounts dumping logic

* Add test for interaction between cache flush and remove_unrooted_slot()

* Update comments

* Rename

* renaming

* Add more comments

* Renaming

* Fixup test and bad check
2021-06-02 09:51:10 +00:00
Tao Zhu b000d490ce
Cost Model to limit transactions which are not parallelizeable (#16694)
* * Add following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account Access cost are weighted differently between read and write, signed and non-signed.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker atomically try_add operation, serves as safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations
2021-06-01 09:16:17 -05:00
Ryo Onodera 1f97b2365f
Avoid full-range compactions with periodic filtered b.g. ones (#16697)
* Update rocksdb to v0.16.0

* Promote the infrequent and important log to info!

* Force background compaction by ttl without manual compaction

* Fix test

* Support no compaction mode in test_ledger_cleanup_compaction

* Fix comment

* Make compaction_interval customizable

* Avoid major compaction with periodic filtering...

* Adress lazy_static, special cfs and range check

* Clean up a bit and add comment

* Add comment

* More comments...

* Config code cleanup

* Add comment

* Use .conflicts_with()

* Nullify unneeded delete_range ops for special CFs

* Some clean ups

* Clarify the locking intention

* Ensure special CFs' consistency with PurgeType::CompactionFilter

* Fix comment

* Fix bad copy paste

* Fix various types...

* Don't use tuples

* Add a unit test for compaction_filter

* Fix typo...

* Remove flag and just use new behavior always

* Fix wrong condition negation...

* Doc. about no set_last_purged_slot in purge_slots

* Write a test and fix off-by-one bug....

* Apply suggestions from code review

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Follow up to github review suggestions

* Fix line-wrapping

* Fix conflict

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-05-28 16:42:56 +09:00
Tyera Eulberg ab581dafc2
Add block height to ConfirmedBlock structs (#17523)
* Add BlockHeight CF to blockstore

* Rename CacheBlockTimeService to be more general

* Cache block-height using service

* Fixup previous proto mishandling

* Add block_height to block structs

* Add block-height to solana block

* Fallback to BankForks if block time or block height are not yet written to Blockstore

* Add docs

* Review comments
2021-05-26 22:16:16 -06:00
Michael Vines 9541411c15 Plumb transaction-level rewards (aka "rent debits") into the `getTransaction` RPC method 2021-05-27 03:05:05 +00:00
carllin 52dccc656a
Purge slots greater than new last index (#16071) 2021-05-26 16:12:57 -07:00
Michael Vines cbce440af4 simulateTransaction can now return accounts modified by the simulation 2021-05-26 14:20:23 -07:00
Tyera Eulberg 6abe089740
Add custom error for tx-history queries when node does not support (#17494) 2021-05-26 13:27:41 -06:00
Tyera Eulberg 9a5330b7eb
Move gossip modules into solana-gossip crate (#17352)
* Move gossip modules to solana-gossip

* Update Protocol abi digest due to move

* Move gossip benches and hook up CI

* Remove unneeded Result entries

* Single use statements
2021-05-26 09:15:46 -06:00
Tyera Eulberg e9bc1c6b07
Add last valid block height to rpc Fees (#17506)
* Add last_valid_block_height to fees rpc

* Add getBlockHeight rpc

* Update docs
2021-05-26 07:26:19 +00:00
Justin Starry 660d37aadf sigVerify conflicts with replace, add tests 2021-05-25 17:32:00 -07:00
Justin Starry e14f3eb529 rename flag 2021-05-25 17:32:00 -07:00
Justin Starry 96cef5260c Add a flag to simulateTransaction to use most recent blockhash 2021-05-25 17:32:00 -07:00
carllin d8bc56fa51
Refactor purge_slots_from_cache_and_store() and handle_reclaims() (#17319) 2021-05-24 13:51:17 -07:00
Tyera Eulberg 41ec1c8d50
Add blockstore-root-scan for api nodes on boot (#17402)
* Add blockstore-root-scan for api nodes on boot

* Ensure cluster-confirmed root and parents are set as root in blockstore in load_frozen_forks()

* Plumb rpc-scan-and-fix-roots validator flag
2021-05-24 13:24:47 -06:00
Michael Vines 30b60a976b Avoid ip_echo_server unwrap 2021-05-24 12:10:50 -07:00
behzad nouri e867d7f3b8
removes Crds::lookup and lookup_versioned (#17438) 2021-05-24 18:21:54 +00:00
sakridge a8dca3976b
Refactor genesis download/load/check functions (#17276)
* Refactor genesis ingest functions

* Consolidate genesis.bin/genesis.tar.bz2 references
2021-05-24 16:45:36 +02:00
behzad nouri 9d112cf41f
encapsulates purged values bookkeeping into crds module (#17265)
For all code paths (gossip push, pull, purge, etc) that remove or
override a crds value, it is necessary to record hash of values purged
from crds table, in order to exclude them from subsequent pull-requests;
otherwise the next pull request will likely return outdated values,
wasting bandwidth:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491

Currently this is done all over the place in multiple modules, and this
has caused bugs in the past where purged values were not recorded.

This commit encapsulated this bookkeeping into crds module, so that any
code path which removes or overrides a crds value, also records the hash
of purged value in-place.
2021-05-24 13:47:21 +00:00
behzad nouri 060332c704
indexes crds votes by insert order (#17340)
Crds::get_votes is scanning over all votes in the crds table only to
return those inserted since the given cursor:
https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds.rs#L250-L266

Having votes indexed by insert order avoids the table scan and will be
more efficient.
2021-05-24 13:35:01 +00:00
behzad nouri 5567305a5f
rolls back min number of bloom items for debug builds (#17420)
coverage ci builds are have become flaky presumably because of the
overhead added in https://github.com/solana-labs/solana/pull/17236
for very small test clusters.

This commit uses a smaller min number of bloom items condition on that
if debug assertions are enabled or not.

Previous attempt at fixing the flakiness:
https://github.com/solana-labs/solana/pull/17408
2021-05-23 16:50:19 +00:00
carllin 8664b2cc39
Fix bad assertion (#17401) 2021-05-22 20:18:13 -07:00
behzad nouri cf1acfb021 uses Duration type for gossip discover timeout 2021-05-22 19:17:36 +00:00
behzad nouri d6496376ce increases timeout duration for gossip discover 2021-05-22 19:17:36 +00:00
behzad nouri a7870cda8d records hash of timed-out pull responses
Gossip should record hash of pull responses which are timed out and
fail to insert:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L397-L400

so that they are excluded from the next pull request:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491

otherwise the next pull request will likely include the same timed out
values and waste bandwidth.
2021-05-22 17:02:24 +00:00
Nikita d41266e4e9
rpc: add context toggle to getProgramAccounts (#17399)
* fix(rpc): return context in get_program_accounts

* doc(rpc): document withContext flag

* fix(rpc): fix comment

Co-authored-by: Michael Vines <mvines@gmail.com>

* fix(rpc): fix doc

Co-authored-by: Michael Vines <mvines@gmail.com>

Co-authored-by: Michael Vines <mvines@gmail.com>
2021-05-22 07:12:21 +00:00
behzad nouri 9471ba61c5 removes redundant slots sort in push_epoch_slots 2021-05-21 19:17:15 +00:00
behzad nouri 9339a6f8f3 locks gossip only once in push_epoch_slots
push_epoch_slots is unlocking and locking again gossip when iterating
over epoch slot indices which is wasteful:
https://github.com/solana-labs/solana/blob/0486df02b/core/src/cluster_info.rs#L915-L929
2021-05-21 19:17:15 +00:00
behzad nouri ff0e623d30 removes the nested for loop from retransmit-stage
The code can be simplified by just flattening the vector of packets.
2021-05-21 17:10:56 +00:00
behzad nouri 71de021177 adds metric for turbine retransmit tree mismatch
In order to remove port-based forwarding logic in turbine, we need to
first track how often the turbine retransmit/broadcast trees mismatch
across nodes.
One consistency condition is that if the node is on the critical path
(i.e. the first node in each neighborhood), then we expect that the
packet arrives at tvu socket as opposed to tvu-forwards.

This commit adds a metric to track how often above condition is not met.
2021-05-21 17:10:56 +00:00
behzad nouri 2adce67260
extends crds values timeouts if stakes are unknown (#17261)
If stakes are unknown, then timeouts will be short, resulting in values
being purged from the crds table, and consequently higher pull-response
load when they are obtained again from gossip. In particular, this slows
down validator start where almost all values obtained from entrypoint
are immediately discarded.
2021-05-21 15:55:22 +00:00
behzad nouri 5e6b00fe98
prioritizes more recent values in pull responses (#17238)
On the receiving end, the outdated values are discarded, and they will
only waste bandwidth:
https://github.com/solana-labs/solana/blob/3f0480d06/core/src/crds_gossip_pull.rs#L385-L400

This is also exacerbating validator start, since the entrypoint is
returning old values in pull responses, and the validator immediately
discards those; resulting in huge delay until the validator obtains
contact-info of the entrypoint and is able to adopt shred-version and
fully start.
2021-05-21 14:07:46 +00:00
behzad nouri e8b35a4f7b
bumps up min number of bloom items in gossip pull requests (#17236)
When a validator starts, it has an (almost) empty crds table and it only
sends one pull-request to the entrypoint. The bloom filter in the
pull-request targets 10% false rate given the number of items. So, if
the `num_items` is very wrong, it makes a very small bloom filter with a
very high false rate:
https://github.com/solana-labs/solana/blob/2ae57c172/runtime/src/bloom.rs#L70-L80
https://github.com/solana-labs/solana/blob/2ae57c172/core/src/crds_gossip_pull.rs#L48

As a result, it is very unlikely that the validator obtains entrypoint's
contact-info in response. This exacerbates how long the validator will
loop on:
    > Waiting to adopt entrypoint shred version
https://github.com/solana-labs/solana/blob/ed51cde37/validator/src/main.rs#L390-L412

This commit increases the min number of bloom items when making gossip
pull requests. Effectively this will break the entrypoint crds table
into 64 shards, one pull-request for each, a larger bloom filter for
each shard, and increases the chances that the response will include
entrypoint's contact-info, which is needed for adopting shred version
and validator start.
2021-05-21 13:59:26 +00:00
behzad nouri 13b032b2d4
removes manual trait impl for contact-info (#17332)
The current implementations use only the id and disregard other fields,
in particular wallclock. This can lead to bugs where an outdated
contact-info shadows or overrides a current one because they compare
equal.
2021-05-19 20:56:10 +00:00
behzad nouri e7073ecab1
adds gossip metrics for number of staked nodes (#17330) 2021-05-19 19:25:21 +00:00
Tao Zhu 0781fe1b4f
Upgrade Rust to 1.52.0 (#17096)
* Upgrade Rust to 1.52.0
update nightly_version to newly pushed docker image
fix clippy lint errors
1.52 comes with grcov 0.8.0, include this version to script

* upgrade to Rust 1.52.1

* disabling Serum from downstream projects until it is upgraded to Rust 1.52.1
2021-05-19 09:31:47 -05:00
Tyera Eulberg 827355a6b1
Create solana-rpc crate and move subscriptions (#17320)
* Move non_circulating_supply to runtime

* Add solana-rpc crate and move max_slots

* Move subscriptions to solana-rpc

* Single use statements
2021-05-19 00:54:28 -06:00
behzad nouri f7b0184f81
patches flaky test_new_mark_creation_time (#17288) 2021-05-18 13:39:35 +00:00
Trent Nelson 67e6a3106f rpc: plumb shred_version through RpcContactInfo 2021-05-14 08:36:08 +00:00
Tyera Eulberg 27004f1b76
Return error for excluded secondary-index keys (#17193)
* Add runtime helpers to check secondary indexes for key

* Add custom rpc error

* Check secondary-index key inclusion in rpc

* Clone complete AccountSecondaryIndexes into rpc to avoid bank query
2021-05-13 21:04:21 +00:00
behzad nouri 0e646d10bb
prunes received-cache only once per unique owner's key (#17039) 2021-05-13 13:50:16 +00:00
behzad nouri 0aa7824884
retains one node-instance per pubkey (#17187)
crds table retains up to 32 node-instance values per each pubkey. This
is so because if there are multiple running instances of the same node,
then we want gossip to propagate node-instance values associated with
both instances, therefore the corresponding label/key includes the
randomly generated token in addition to the pubkey:
https://github.com/solana-labs/solana/blob/9c42a89a4/core/src/crds_value.rs#L448
https://github.com/solana-labs/solana/pull/14037

As a result, the number of such values per pubkey are effectively
unbounded, requiring custom mitigations implemented in:
https://github.com/solana-labs/solana/pull/14467
but still taking redundant extra memory and bandwidth.

This commit instead retains only one node-instance per pubkey by
extending crds values override logic. If a crds value is of type
node-instance, it will always override an existing one with the same key
if it has more recent starting timestamp (not wallclock). As a result,
gossip will always propagate the node-instance with more recent
timestamp. Since the check_duplicate logic will stop the node with older
timestamp, this change should preserve existing functionality.
2021-05-13 13:35:46 +00:00
Lijun Wang 9c42a89a43
Issue #17008 -- make snapshot archives to hold on to configurable. (#17158)
* purge_old_snapshot_archives is changed to take an extra argument 'maximum_snapshots_to_retain' to control the max number of latest snapshot archives to retain. Note the oldest snapshot is always retained as before and is not subjected to this new options.
* The validator and ledger-tool executables are modified with a CLI argument --maximum-snapshots-to-retain. And the options are propagated down the call chains. Their corresponding shell scripts were changed accordingly.
* SnapshotConfig is modified to have an extra field for the maximum_snapshots_to_retain
* Unit tests are developed to cover purge_old_snapshot_archives
2021-05-12 10:32:27 -07:00
Tyera Eulberg 6e9deaf1bd
Move block-time caching earlier (#17109)
* Require that blockstore block-time only be recognized slot, instead of root

* Move cache_block_time to after Bank freeze

* Single use statement

* Pass transaction_status_sender by reference

* Remove unnecessary slot-existence check before caching block time altogether

* Move block-time existence check into Blockstore::cache_block_time, Blockstore no longer needed in blockstore_processor helper
2021-05-10 13:14:56 -06:00
Jeff Washington (jwash) f39dda00e0
type AccountSecondaryIndexes = HashSet (#17108) 2021-05-10 14:22:48 +00:00
behzad nouri 81ad795d46 removes position field in coding-shred-header
CodingShredHeader.position is equal to
  ShredCommonHeader.index - ShredCommonHeader.fec_set_index
and is so redundant. The extra position field can add bugs if not
consistent with index and fec_set_index.
2021-05-10 13:20:56 +00:00
behzad nouri 22c02b917e reads gossip push messages off crds ordinal index
Having an ordinal index on crds values based on insert order allows to
efficiently filter values using a cursor. In particular
CrdsGossipPush::push_messages hash-map can be replaced with a cursor,
saving on the bookkeepings, purging, etc
2021-05-09 22:40:41 +00:00
behzad nouri dfa3e7a61c indexes crds values by their insert order 2021-05-09 22:40:41 +00:00
Michael Vines d6c076f1b6 getBlockProduction now correctly reports block production 2021-05-07 19:04:51 -07:00
behzad nouri fa86a335b0
implements cursor for gossip crds table queries (#16952)
VersionedCrdsValue.insert_timestamp is used for fetching crds values
inserted since last query:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298

So it is crucial that insert_timestamp does not go backward in time when
new values are inserted into the table. However std::time::SystemTime is
not monotonic, or due to workload, lock contention, thread scheduling,
etc, ... new values may be inserted with a stalled timestamp way in the
past. Additionally, reading system time for the above purpose is
inefficient/unnecessary.

This commit adds an ordinal index to crds values indicating their insert
order. Additionally, it implements a new Cursor type for fetching values
inserted since last query.
2021-05-06 14:04:17 +00:00
Michael Vines 9ba2c53b85 Add --tower argument to specify where tower files are persisted 2021-05-05 12:20:39 -07:00
Trent Nelson f17b80236f test-validator: Plumb --limit-ledger-size 2021-05-04 08:45:24 +00:00
carllin bc7e741514
Integrate gossip votes into switching threshold (#16973) 2021-05-04 00:51:42 -07:00
publish-docs.sh 6318705607 Add keys 2021-05-03 17:18:54 -07:00
publish-docs.sh b948a18841 Key rotation 2021-05-03 17:18:54 -07:00
publish-docs.sh b2778f34f5 Rotate keys 2021-05-03 17:18:54 -07:00
behzad nouri 7cea2c4466 validates gossip addresses before sending pull-requests
IP addresses need to be validated before sending packets to them.
This commit, sends a ping packet to nodes before any pull requests.
Pull requests are then only sent to the nodes which have responded with
the correct hash of their respective ping packet.
2021-05-03 18:21:06 +00:00
behzad nouri 2231017b35 uses Mutex instead of RwLock for ping_cache 2021-05-03 18:21:06 +00:00
behzad nouri a698e34744
patches local pending push messages processing (#16833)
process_push_messages writes local pending push messages to the crds
table, but it discards the return value:
https://github.com/solana-labs/solana/blob/cf779c63c/core/src/crds_gossip.rs#L96-L102

In order to exclude outdated values from the next pull-request, we need
to record the hash of values purged/overridden by the local push
messages, otherwise pull-responses will return outdated values back to
the node:
https://github.com/solana-labs/solana/blob/c1829dd00/core/src/crds_gossip_pull.rs#L447-L452

Additionally, gossip packets arrive and are processed out of order. So,
local pending push messages should be flushed *before* generating bloom
filters for pull-requests, preventing pull-responses returning the same
values back to the node itself. This requires flipping order of
generating pull and push messages:
https://github.com/solana-labs/solana/blob/cf779c63c/core/src/cluster_info.rs#L1757-L1762

Both above bugs cause redundant traffic and bandwidth waste in gossip
pull-responses.
2021-05-03 16:00:17 +00:00
Jeff Washington (jwash) 541aa5ad85
tests: lamports -> lamports() (#16982) 2021-05-03 10:45:54 -05:00
Justin Starry 8e561354d5
Improve readability of vote lockout processing (#16987)
* Improve readability of vote lockout processing

* clippy

* simplify comment

* feedback
2021-05-02 08:36:06 +00:00
carllin 5981399612
Distinguish max replayed and max observed vote (#16936) 2021-04-29 14:43:28 -07:00
Michael Vines 542d88929f Add getBlockProduction RPC method 2021-04-28 20:02:54 -07:00
carllin b5d30846d6
Retry latest vote if expired (#16735) 2021-04-28 11:46:16 -07:00
behzad nouri 25054bfd35
retains peer's contact-info when making pull requests (#16715)
ClusterInfo::new_pull_requests has to lookup contact-infos:
https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/cluster_info.rs#L1663-L1673

when it was already available when making pull requests:
https://github.com/solana-labs/solana/blob/a1ef2bd74/core/src/crds_gossip_pull.rs#L232
2021-04-28 13:19:12 +00:00
behzad nouri 1ac2a8cfa5
removes delayed crds inserts when upserting gossip table (#16806)
It is crucial that VersionedCrdsValue::insert_timestamp does not go
backward in time:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L67-L79

Otherwise methods such as get_votes and get_epoch_slots_since will
break, which will break their downstream flow, including vote-listener
and optimistic confirmation:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298

For that, Crds::new_versioned is intended to be called "atomically" with
Crds::insert_verioned (as the comment already says so):
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L126-L129

However, currently this is violated in the code. For example,
filter_pull_responses creates VersionedCrdsValues (with the current
timestamp), then acquires an exclusive lock on gossip, then
process_pull_responses writes those values to the crds table:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L2375-L2392

Depending on the workload and lock contention, the insert_timestamps may
well be in the past when these values finally are inserted into gossip.

To avoid such scenarios, this commit:
  * removes Crds::new_versioned and Crd::insert_versioned.
  * makes VersionedCrdsValue constructor private, only invoked in
    Crds::insert, so that insert_timestamp is populated right before
    insert.

This will improve insert_timestamp monotonicity as long as Crds::insert
is not called with a stalled timestamp. Following commits may further
improve this by calling timestamp() inside Crds::insert, and/or
switching to std::time::Instant which guarantees monotonicity.
2021-04-28 11:56:13 +00:00
behzad nouri b17d5eeaee
moves cluster-info metrics to a separate module (#16883) 2021-04-28 02:04:49 +00:00
behzad nouri b468ead1b1
uses current timestamp when flushing local pending push queue (#16808)
local_message_pending_push_queue is recording timestamps at the time the
value is created, and uses that when the pending values are flushed:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L321
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds_gossip.rs#L96-L102

which is then used as the insert_timestamp when inserting values in the
crds table:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds_gossip_push.rs#L183

The flushing may happen 100ms after the values are created (or even
later if there is a lock contention). This will cause non-monotone
insert_timestamps in the crds table (where time goes backward),
hindering the usability of insert_timestamps for other computations.

For example both ClusterInfo::get_votes and get_epoch_slots_since rely
on monotone insert_timestamps when values are inserted into the table:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298

This commit removes timestamps from local_message_pending_push_queue and
uses current timestamp when flushing the queue.
2021-04-28 00:15:11 +00:00
steviez bc31378797
Trim extra shred bytes in blockstore (#16602)
Strip the zero-padding off of data shreds before insertion into blockstore

Co-authored-by: Stephen Akridge <sakridge@gmail.com>
Co-authored-by: Nathan Hawkins <utsl@utsl.org>
2021-04-27 17:40:41 -05:00
behzad nouri 3b8d6b59fb
records hash of values purged by expired pull-responses (#16800)
process_pull_responses should record hash of values purged by expired
responses (as well as unexpired ones):
https://github.com/solana-labs/solana/blob/c1829dd00/core/src/crds_gossip_pull.rs#L385-L387

otherwise, these values are not excluded from following pull-requests
(from likely different nodes):
https://github.com/solana-labs/solana/blob/c1829dd00/core/src/crds_gossip_pull.rs#L447-L452

and would waste bandwidth should they be included in subsequent
pull-responses.
2021-04-27 12:06:49 +00:00
behzad nouri 0f3ac51cf1
limits to data_header.size when combining shreds' payloads (#16708)
Shredder::deshred is ignoring data_header.size when combining shreds' payloads:
https://github.com/solana-labs/solana/blob/37b8587d4/ledger/src/shred.rs#L940-L961

Also adding more sanity checks on the alignment of data shreds indices.
2021-04-27 12:04:44 +00:00
Michael Vines 59fc33635a Add getVoteAccounts RPC method parameter to restrict results to a single vote account 2021-04-27 04:27:15 +00:00
behzad nouri 9706512115
removes old runtime feature gates in gossip and turbine (#16633) 2021-04-26 17:12:02 +00:00
Jeff Washington (jwash) ca14c18998
owner -> owner() (#16782) 2021-04-23 22:49:47 +00:00
Michael Vines 63436cc2bf
Disable flaky test_poh_service (#16772) 2021-04-23 12:14:11 -05:00
behzad nouri 2c82f2154d
retains crds values if the origin is still active (#16576)
Local timestamps are updated for records associated with a pubkey if the
origin is still active:
https://github.com/solana-labs/solana/blob/c8ed14c64/core/src/crds.rs#L301-L311

However this is done inconsistently on some gossip paths (pull requests
and pull responses) but not all (e.g. push messages). Additionally
update_record_timestamp is inefficient since there can be ~800 values
associated with each pubkey.

This commit updates records timestamps only on contact-infos; and,
instead utilizes origin's timestamp when purging old values.
2021-04-23 15:14:49 +00:00
behzad nouri 03194145c0
removes first_coding_index from erasure recovery code (#16646)
first_coding_index is the same as the set_index and is so redundant:
https://github.com/solana-labs/solana/blob/37b8587d4/ledger/src/blockstore_meta.rs#L49-L60
2021-04-23 12:00:37 +00:00
Justin Starry 75b8434b76
Add TPU client for sending txs to the current leader tpu port (#16736)
* Add TPU client for sending txs to the current leader tpu port

* Update tpu_client.rs
2021-04-23 09:35:12 +08:00
Tyera Eulberg 636b5987af
Update getLeaderSchedule options (#16749) 2021-04-22 19:27:30 +00:00
Michael Vines 6004c0abf5 getLeaderSchedule now supports filtered results based on validator identity 2021-04-21 17:59:26 -07:00
Michael Vines 91b6888e15 verify_pubkey() now takes a ref 2021-04-21 14:43:49 -07:00
carllin 4c94f8933f
Ingest votes from gossip into fork choice (#16560) 2021-04-21 14:40:35 -07:00
Michael Vines a1ef2bd74d Ignore flaky test_pull_request_time_pruning 2021-04-21 12:07:36 -07:00
behzad nouri 37b8587d4e
expands number of erasure coding shreds in the last batch in slots (#16484)
Number of parity coding shreds is always less than the number of data
shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719

Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714

However the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.

This commit expands the number of coding shreds in the last FEC block in
slots to: 64 - number of data shreds; so that FEC blocks are always 64
data and parity coding shreds each.

As a consequence of this, the last FEC block has more parity coding
shreds than data shreds. So for some shred indices we will have a coding
shred but no data shreds. This should not cause any kind of overlapping
FEC blocks as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot will reset the shred index.
2021-04-21 12:47:50 +00:00
Tyera Eulberg 0924c2d070
Add port and gossip options to solana-test-validator (#16696) 2021-04-21 02:40:52 +00:00
Michael Vines 34addee882 getVoteAccounts: Limit the length of the `epoch_credits` array 2021-04-20 14:42:28 -07:00
sakridge 8e69dd42c1
Add non-default repair nonce values (#16512)
* Track outstanding nonces in repair

* Rework outstanding requests to use lru cache and randomize nonces

Co-authored-by: Carl <carl@solana.com>
2021-04-20 09:37:33 -07:00
behzad nouri bc90e04e64 uses current local timestamp when recording purged values
CrdsGossipPull.purged_values is meant to record recently purged values
so that they are excluded from imminent pull requests, until the entire
cluster have synced to the updated value:
https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds_gossip_pull.rs#L449-L454

However, VersionedCrdsValue.local_timestamp represents the local time
when the value was last updated, and given that crds values may have
different timeouts based on stake, it does not necessarily represent how
recently the value was purged:
https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds.rs#L75-L76

As such, recording current local timestamp when purging values is more
appropriate. Additionally, purge_purged assumes that the purge_values is
sorted in timestamps when draining the old ones; which is not true if
those timestamps are VersionedCrdsValue.local_timestamp:
https://github.com/solana-labs/solana/blob/c826cddbb/core/src/crds_gossip_pull.rs#L563-L571
2021-04-20 11:21:00 +00:00
Justin Starry a7e65c0034
RPC: use finalized as default pubsub commitment level (#16659)
* RPC: use finalized as default pubsub commitment level

* update docs

* Fix tests
2021-04-20 08:19:54 +00:00
Michael Vines c8b474cd0b Send votes to next leader's TPU instead of our TPU 2021-04-20 00:38:21 -07:00
Jeff Washington (jwash) 4aa753ff01
rename threads: 15 char limit (#16625) 2021-04-19 12:16:58 -05:00
Michael Vines b06e93fe5b Increase test timeout 2021-04-18 20:55:02 -07:00
Michael Vines a911ae00ba clippy 2021-04-18 20:55:02 -07:00
behzad nouri e405747409 Revert "Add limit and shrink policy for recycler (#15320)"
This reverts commit c2e8814dce.
2021-04-18 19:29:24 +00:00
Michael Vines 6907a2366e Remove unnecessary clone 2021-04-17 10:23:13 -07:00
Tyera Eulberg 974e6dd2c1
Deprecate "confirmed" RpcClient methods (#16520)
* Remove obsolete client methods

* Deprecate GetConfirmed client methods

* Rename Confirmed config structs, with appropriate deprecation

* Fixup client apps

* Map RpcRequest to deprecated when targeting older nodes
2021-04-15 17:00:14 -06:00
Tyera Eulberg 7dfb51c0b4
Cli: move airdrop to rpc requests (#16557)
* Add recent_blockhash to requestAirdrop

* Move tx confirmation to separate method

* Add RpcClient airdrop methods

* Request cli airdrop via RpcClient

* Pass optional faucet_addr into TestValidator and fix tests

* Update client/src/rpc_client.rs

Co-authored-by: Michael Vines <mvines@gmail.com>

Co-authored-by: Michael Vines <mvines@gmail.com>
2021-04-15 06:25:23 +00:00
behzad nouri d92721aab9
uses timeouts based on stake for filtering pull responses (#16549)
filter_pull_responses is using default timeout when discarding pull
responses (except for ContactInfo):
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/crds_gossip_pull.rs#L349-L350

But purging code uses timeouts based on stake:
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/cluster_info.rs#L1867-L1870

So the crds value will not be purged from the sender's table and will be
sent again over the next pull request.
2021-04-14 20:18:00 +00:00
behzad nouri f35a6a8be0
prioritizes contact-infos in pull responses (#16541)
Expired crds values where the contact-info does not exist are wasted:
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/crds_gossip_pull.rs#L353-L378
and then are sent again over the next pull-request.

Also, the stake of the first response (which can be anything) is used to
weight all pull-responses to a node, while the rest of responses can
have different stake.
https://github.com/solana-labs/solana/blob/f804ce63c/core/src/cluster_info.rs#L2231
2021-04-14 18:45:20 +00:00
carllin f0c150cfb9
Fix channel panic in tests (#16503)
* Fix channel panic

* Add exit signal to PohRecorder because Crossbeam doesnt drop objects inside dropped channel
2021-04-14 12:07:21 -05:00
Tyera Eulberg 37afa00ffb
Rpc: deprecate getConfirmed endpoints (#16502)
* Deprecate getConfirmed methods in rpc

* Add new methods to docs

* Move deprecated rpc methods to separate docs section

* Add note to docs about removal timing
2021-04-13 01:50:15 -06:00
Justin Starry 85eb37fab0
Merge pull request from GHSA-8v47-8c53-wwrc
* Track transaction check time separately from account loads

* banking packet process metrics

* Remove signature clone in status cache lookup

* Reduce allocations when converting packets to transactions

* Add blake3 hash of transaction messages in status cache

* Bug fixes

* fix tests and run fmt

* Address feedback

* fix simd tx entry verification

* Fix rebase

* Feedback

* clean up

* Add tests

* Remove feature switch and fall back to signature check

* Bump programs/bpf Cargo.lock

* clippy

* nudge benches

* Bump `BankSlotDelta` frozen ABI hash`

* Add blake3 to sdk/programs/Cargo.lock

* nudge bpf tests

* short circuit status cache checks

Co-authored-by: Trent Nelson <trent@solana.com>
2021-04-13 00:28:08 -06:00
Tyera Eulberg 70f3f7e679
Move obsolete rpc endpoints to separate api for removal (#16500)
* Move obsolete rpc methods to separate api for removal

* Remove obsolete method from docs

* Fix test using obs method
2021-04-12 20:33:40 -06:00
Michael Vines 2229b70c4e Add authorized-voter add/remove-all commands 2021-04-12 15:55:28 -07:00
Michael Vines 17a173ebb5 Clean up build warning 2021-04-12 15:55:28 -07:00
carllin dc7030ffaa
Allow fork choice to support multiple versions of a slot (#16266) 2021-04-12 01:00:59 -07:00
carllin 99b3aab703
Track gossip vote updates per hash for replay stage (#16421)
* Track gossip vote updates per hash for replay stage
2021-04-10 17:34:45 -07:00
Christian Drappi 54a04bac3d
Apple M1 compatibility (#16346)
Co-authored-by: Christian Drappi <christiandrappi@Christians-MacBook-Pro.local>
2021-04-09 17:21:01 -07:00
Tyera Eulberg 8bc0bdd40b
Fill in not-yet-finalized block-time if possible (#16460) 2021-04-09 20:25:47 +00:00
behzad nouri 22a18a68e3
stops consuming pinned vectors with a recycler (#16441)
If the vector is pinned and has a recycler, From<PinnedVec>
implementation of Vec should clone (instead of consuming) the underlying
vector so that the next allocation of a PinnedVec will recycle an
already pinned one.
2021-04-09 16:55:24 +00:00
François Garillot b08cff9e77
Simplify some pattern-matches (#16402)
When those match an exact combinator on Option / Result.

Tool-aided by [comby-rust](https://github.com/huitseeker/comby-rust).
2021-04-08 12:40:37 -06:00
Tyera Eulberg bb9d2fd07a
Cli: use get_inflation_rewards and limit epochs queried (#16408)
* Fix block-with-limit when not finalized blocks found

* Enable confirmed commitment in getInflationReward

* Use get_inflation_rewards in cli

* Line up rewards output

* Add range validator

* Change cli epoch arg -> num epochs

* Add solana inflation rewards subcommand

* Consolidate epoch rewards meta
2021-04-08 10:57:33 -06:00
Josh e501fa5f0b
Rpc: introduce get_inflation_reward rpc call (#16278)
* feat: introduce get_inflation_reward rpc call

* fix: style suggestions

* fix: more style changes and match how other rpc functions are defined

* feat: get reward for a single epoch

* feat: default to the most recent epoch

* fix: don't factor out get_confirmed_block

* style: introduce from impl for RpcEncodingConfigWrapper

* style: bring commitment into variable

* feat: support multiple pubkeys for get_inflation_reward

* feat: add get_inflation_reward to rpc client

* feat: return rewards in order

* fix: rename pubkeys to addresses

* docs: introduce jsonrpc docs for get_inflation_reward

* style: early return in map (not sure which is more idiomatic)

* fix: call the rpc client function args addresses as well

* fix: style

* fix: filter out only addresses we care about

* style: make this more idiomatic

* fix: change rpc client epoch to optional and include some docs edits

* feat: filter out rent rewards in get_inflation_reward

* feat: add option epoch config param to get_inflation_reward

* feat: rpc client get_inflation_reward takes epoch instead of config and some filter staking and voting rewards
2021-04-06 18:10:53 -07:00
carllin 1219842a96
No wallclock throttle tests (#16396) 2021-04-05 19:40:16 -07:00
Trent Nelson b71875df61 cluster-info: Get rid of some integer math while we're here 2021-04-06 00:09:37 +00:00
Trent Nelson b6b08706b9 cluster-info: Don't subtract non-shred spies from node count 2021-04-06 00:09:37 +00:00
Trent Nelson 7a2a39093d validator: Use a const for wait for supermajority threshold 2021-04-05 17:29:37 -06:00
Michael Vines b242f82696 Reduce test-validator ledger size 2021-04-05 08:37:29 -07:00
behzad nouri 701fc93343
patches bug in banking stage where buffered packets are never retained (#16276)
banking_stage::handle_forwarding is retaining buffered packets with
empty index, so nothing is held:
https://github.com/solana-labs/solana/blob/6f3926b64/core/src/banking_stage.rs#L520
2021-04-05 12:46:21 +00:00
sakridge 3429785d9b
Wait for 90 percent of stake before starting (#16340) 2021-04-03 14:21:20 -07:00
carllin 4e5ef6bce2
Add cluster state verifier logging (#16330)
* Add cluster state verifier logging

* Add duplicate-slots iterator to ledger tool
2021-04-02 21:48:44 -07:00
Tyera Eulberg da27acabcc
Rpc: enable getConfirmedSignaturesForAddress2 to return confirmed (not yet finalized) data (#16281)
* Update blockstore method to allow return of unfinalized signature

* Support confirmed sigs in getConfirmedSignaturesForAddress2

* Add deprecated comments

* Update docs

* Enable confirmed transaction-history in cli

* Return real confirmation_status; fill in not-yet-finalized block time if possible
2021-04-01 04:35:57 +00:00
Tyera Eulberg 18bd47dbe1
Rpc: fix getConfirmedTransaction slot (#16288)
* Fix transaction blockstore apis

* Update blockstore apis in rpc
2021-03-31 21:04:00 -06:00
behzad nouri 3f63ed9a72
removes OrderedIterator and transaction batch iteration order (#16153)
In TransactionBatch,
https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/transaction_batch.rs#L4-L11
lock_results[i] is aligned with transactions[iteration_order[i]]:
https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/bank.rs#L2414-L2424
https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/accounts.rs#L788-L817

However load_and_execute_transactions is iterating over
  lock_results[iteration_order[i]]
https://github.com/solana-labs/solana/blob/e50f59844/runtime/src/bank.rs#L2878-L2889
and then returning i as for the index of the retryable transaction.

If iteratorion_order is [1, 2, 0], and i is 0, then:
  lock_results[iteration_order[i]] = lock_results[1]
which corresponds to
  transactions[iteration_order[1]] = transactions[2]
so neither i = 0, nor iteration_order[i] = 1 gives the correct index for the
corresponding transaction (which is 2).

This commit removes OrderedIterator and transaction batch iteration order
entirely. There is only one place in blockstore processor which the
iteration order is not ordinal:
https://github.com/solana-labs/solana/blob/e50f59844/ledger/src/blockstore_processor.rs#L269-L271
It seems like, instead of using an iteration order, that can shuffle entry
transactions in-place.
2021-03-31 23:59:19 +00:00
sakridge 54c68ea83f
Drop write lock on sysvars (#15497)
* Drop write lock on sysvars

* adds env var for demoting sysvar write lock demotion

* moves demote logic to is_writable

* feature gates sysvar write lock demotion

* adds builtins to write lock demotion

* adds system program id to builtins

* adds Feature111...

* adds an abi-freeze test

* mvines set of builtin program keys

Co-authored-by: Michael Vines <mvines@gmail.com>

* update tests

* adds bpf loader keys

* Add test sysvar

* Plumb demote_sysvar to is_writable

* more plumbing of demote_sysvar_write_locks to is_writable

* patches test_program_bpf_instruction_introspection

* hard codes demote_sysvar_write_locks to false for serialization/encoding methods

* Revert "hard codes demote_sysvar_write_locks to false for serialization/encoding methods"

This reverts commit ae3e2d2e777437bddd753933097a210dcbc1b1fc.

* change the hardcoded ones to demote_sysvar_write_locks=true

* Use data_as_mut_slice

Co-authored-by: behzad nouri <behzadnouri@gmail.com>
Co-authored-by: Michael Vines <mvines@gmail.com>
2021-03-30 10:05:09 -07:00
Jeff Washington (jwash) 414c7070cb
poll checking for new record in poh service after every batch of hashes instead of busy waiting (#16167)
* poll waiting in poh service after every batch of hashes

* clippy
2021-03-30 10:34:21 -05:00
Jeff Washington (jwash) 5eff23db0c
TransactionRecorder uses unique channel so we can use Recv instead of RecvTimeout (#16195)
* time

* new channel each call

* new channel every time
2021-03-30 00:51:35 -05:00
Tyera Eulberg 3977ed5c82
Future-aware enum name 2021-03-29 14:58:05 -06:00
Tyera Eulberg 60ed8e2892
Rpc: enable getConfirmedBlocks and getConfirmedBlocksWithLimit to return confirmed (not yet finalized) data (#16161)
* Add commitment config capabilities

* Use rpc limit if no end_slot provided

* Limit to actually finalized blocks

* Support confirmed blocks in getConfirmedBlocks and getConfirmedBlocksWithLimit

* Update docs

* Add client plumbing

* Rename config enum
2021-03-29 12:41:31 -06:00
sakridge 60b4771fc6
Only print skipped leader slot message when the node is actually leader (#16156)
Also, check vote signature after the vote is signed
2021-03-26 17:45:53 -07:00
Tyera Eulberg 433f1ead1c
Rpc: enable getConfirmedBlock and getConfirmedTransaction to return confirmed (not yet finalized) data (#16142)
* Add Blockstore block and tx apis that allow unrooted responses

* Add TransactionStatusMessage, and send on bank freeze; also refactor TransactionStatusSender

* Track highest slot with tx-status writes complete

* Rename and unpub fn

* Add commitment to GetConfirmed input configs

* Support confirmed blocks in getConfirmedBlock

* Support confirmed txs in getConfirmedTransaction

* Update sigs-for-addr2 comment

* Enable confirmed block in cli

* Enable confirmed transaction in cli

* Review comments

* Rename blockstore method
2021-03-26 16:47:35 -06:00
Jeff Washington (jwash) 4f4cffbd03
Throttle PoH ticks by cumulative slot time (#16139)
* Throttle PoH ticks by cumulative slot time

* respond to pr feedback

* saturating sub

* updated comment
2021-03-26 18:54:16 +00:00
Jeff Washington (jwash) 06ac0fe9a3
increase timeout in TransactionRecorder.record (#16133) 2021-03-25 21:31:07 -05:00
sakridge b99ae8f334
Skip leader slots until a vote lands (#15607) 2021-03-25 18:54:51 -07:00
behzad nouri b041b55028
makes test_pull_request_time_pruning smaller (#16128) 2021-03-25 22:44:43 +00:00
sakridge 9b94741290
Fix test_replay_commitment_cache (#16131) 2021-03-25 21:16:39 +00:00
Justin Starry e817a6db00
Add timeout for local cluster partition tests (#16123)
* Add timeout for local cluster partition tests

* fix optimistic conf test logs

* Bump instruction count assertions
2021-03-25 13:27:07 -06:00
carllin 52703badfa
Setup ReplayStage confirmation scaffolding for duplicate slots (#9698) 2021-03-24 23:41:52 -07:00
Tyera Eulberg a8ef29df27
Support getBlockTime for unfinalized blocks (#16103) 2021-03-24 20:52:08 -06:00
sakridge 96ccc40f0a
Set ticks_per_slot higher for banking stage tests (#16094)
Tests are timing out because the bank hit the MaxTickHeight and
will not process the transactions.
2021-03-24 14:05:43 -07:00
Jeff Washington (jwash) f68860a643
poh record metrics (#16092) 2021-03-24 14:48:32 -05:00
behzad nouri a6c23648cb
limits CrdsGossipPull::pull_request_time size (#15793)
There is no pruning logic on CrdsGossipPull::pull_request_time
https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_pull.rs#L172-L174
potentially allowing this to take too much memory.

Additionally, CrdsGossipPush::last_pushed_to is pruning recent push
timestamps:
https://github.com/solana-labs/solana/blob/79ac1997d/core/src/crds_gossip_push.rs#L275-L279
instead of the older ones.

Co-authored-by: Nathan Hawkins <utsl@utsl.org>
2021-03-24 18:33:56 +00:00
behzad nouri 570fd3f810
makes turbine peer computation consistent between broadcast and retransmit (#14910)
get_broadcast_peers is using tvu_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/broadcast_stage.rs#L362-L370
which is potentially inconsistent with retransmit_peers:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1332-L1345

Also, the leader does not include its own contact-info when broadcasting
shreds:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/cluster_info.rs#L1324
but on the retransmit side, slot leader is removed only _after_ neighbors and
children are computed:
https://github.com/solana-labs/solana/blob/84e52b606/core/src/retransmit_stage.rs#L383-L384
So the turbine broadcast tree is different between the two stages.

This commit:
* Removes retransmit_peers. Broadcast and retransmit stages will use tvu_peers
  consistently.
* Retransmit stage removes slot leader _before_ computing children and
  neighbors.
2021-03-24 13:34:48 +00:00
Justin Starry e7fd7d46cf
rpc: add getSlotLeaders method (#16057) 2021-03-23 17:48:54 +00:00
behzad nouri 4f82b897bc
buffers data shreds to make larger erasure coded sets (#15849)
Broadcast stage batches up to 8 entries:
https://github.com/solana-labs/solana/blob/79280b304/core/src/broadcast_stage/broadcast_utils.rs#L26-L29
which will be serialized into some number of shreds and chunked into FEC
sets of at most 32 shreds each:
https://github.com/solana-labs/solana/blob/79280b304/ledger/src/shred.rs#L576-L597
So depending on the size of entries, FEC sets can be small, which may
aggravate loss rate.
For example 16 FEC sets of 2:2 data/code shreds each have higher loss
rate than one 32:32 set.

This commit broadcasts data shreds immediately, but also buffers them
until it has a batch of 32 data shreds, at which point 32 coding shreds
are generated and broadcasted.
2021-03-23 14:52:38 +00:00
Jeff Washington (jwash) 57ba86c821
eliminate lock on record (#15929)
* eliminate lock on record

* use same error as MaxHeightReached

* clippy

* review feedback

* refactor should_tick code

* pr feedback
2021-03-23 09:10:04 -05:00
Jeff Washington (jwash) 2fc609a294
add metric for ticks from poh_recorder.record (#16047) 2021-03-22 15:35:06 -05:00
Tyera Eulberg 2ec24d438f
Make getStakeActivation response consistent for undelegated accounts (#16038) 2021-03-19 14:54:56 -06:00
Jeff Washington (jwash) ddc758439e
metrics for poh_recorder.record (#15998) 2021-03-19 09:48:55 -05:00
Tyera Eulberg aa54c468ea
rpc: Add config options limiting getConfirmedBlock response data (#15970)
* Add new confirmed block struct

* Add RpcConfirmedBlockConfig options

* Configure block response based on new options

* Add client api, use in cli fetch_epoch_rewards

* Update docs

* Apply review suggestions
2021-03-18 17:58:20 +00:00
Michael Vines 04c99cf7ea Add --slots-per-epoch argument 2021-03-17 22:56:41 +00:00
carllin f548a04fae
Allow unbounded wallclock processing time in tests (#15961) 2021-03-17 15:48:50 -07:00
Michael Vines 59c19d9fbf Notice the user when the --mint, --bpf-program, or --clone arguments are ignored 2021-03-17 20:04:53 +00:00
Michael Vines 8a9b51952e Ignore flaky test_banking_stage_entries_only and test_banking_stage_entryfication 2021-03-17 11:28:56 -07:00
Jeff Washington (jwash) 40997d0aef
add metrics for tick producer and poh_recorder (#15931) 2021-03-17 10:38:26 -05:00
Jeff Washington (jwash) 5460fb10a2
drop poh lock after record (#15930) 2021-03-17 10:37:20 -05:00
Jeff Washington (jwash) efee8b62d7
a few missed set_data calls (#15846)
* a few missed set_data calls

* another set data call
2021-03-15 21:57:23 -05:00
Jeff Washington (jwash) c09ea2c314
More AccountSharedData construction (#15844)
* one more AccountSharedData construction

* one more construct
2021-03-15 19:27:17 -05:00
carllin c1ba265dd9
Wallclock BankingStage Throttle (#15731) 2021-03-15 17:11:15 -07:00
Tyera Eulberg 5b2da19c93
Rpc: support extended config for getConfirmedBlock (#15827)
* Add rpc confirmed-block config wrapper to support struct of extended config

* Update docs

* Make config wrapper generic and use in getConfirmedTransaction as well

* Update/clean confirmed-tx docs
2021-03-12 22:19:45 +00:00
behzad nouri f2865dfd63
requires stakes for propagating crds values through gossip (#15561) 2021-03-12 15:50:14 +00:00
Justin Starry 918d04e3f0
Add more slot update notifications (#15734)
* Add more slot update notifications

* fix merge

* Address feedback and add integration test

* switch to datapoint

* remove unused shred method

* fix clippy

* new thread for rpc completed slots

* remove extra constant

* fixes

* rely on channel closing

* fix check
2021-03-12 21:44:06 +08:00
Ryo Onodera 4bbeb9c033
Remove old feature: simple_capitalization (#15763)
* Remove old feature: simple_capitalization

* Fix another failing test in core

* Finish up test cleanup

* Further clean up a bit
2021-03-12 11:12:40 +09:00
Jeff Washington (jwash) 952c3bcbb7
AccountSharedData construction (#15790) 2021-03-11 18:09:04 -06:00
Jeff Washington (jwash) 1135ffd595
mut data refs as slice (#15782) 2021-03-10 15:28:03 -06:00
behzad nouri 56923c91bf
limits number of unique pubkeys in the crds table (#15539) 2021-03-10 20:46:05 +00:00
Jeff Washington (jwash) 52e54e1100
account.data -> data() (#15778) 2021-03-09 22:31:33 +00:00
Jeff Washington (jwash) 8a3135d17b
Account->AccountSharedData (#15691) 2021-03-09 15:06:07 -06:00
carllin 2bee9435f3
Add tracer key for tracing transaction path through the network (#15732) 2021-03-08 19:31:00 -08:00
carllin 331c45decf
Report datapoint on number of retransmit shreds (#15694) 2021-03-08 17:54:53 -08:00
sakridge d09112fa6d
PoH batch size calibration (#15717) 2021-03-05 16:01:21 -08:00
Michael Vines 4a3ab77baf Remove unused id field 2021-03-05 19:07:59 +00:00
Michael Vines 66b781eec3 Add 'unknown' health check state 2021-03-05 17:46:50 +00:00
Tyera Eulberg 7e65289729
Convert blockstore TransactionStatus column family to protobufs (#15733)
* Prevent panic if TransactionStatus can't be deserialized

* Convert Blockstore TransactionStatus column to protobuf

* Add compatability test
2021-03-05 09:05:35 -07:00
Michael Vines bd13262b42 Add validator startup process reporting before RPC is available 2021-03-05 08:03:36 -08:00
Michael Vines 24ab84936e Break up RPC API into three categories: minimal, full and admin 2021-03-04 16:39:44 -08:00
Jeff Washington (jwash) 34bebb7d09
report execution details in replay time (#15693) 2021-03-04 11:38:12 -06:00
Jeff Washington (jwash) be35c1c1b7
add execute detail timings (#15638) 2021-03-03 17:07:45 -06:00
behzad nouri 658951e680
sends only the latest vote of each validator to the banking stage (#15629) 2021-03-03 19:07:16 +00:00
carllin aacb28c453
Only report metrics every second (#15652) 2021-03-03 10:58:47 -08:00
sakridge 830be855dc
Forward and hold packets (#15634) 2021-03-03 10:23:05 -08:00
Tyera Eulberg 19ac79b5cc
Deprecate UiTokenAmount::ui_amount (#15616)
* Add TokenAmount::ui_amount_string

* Fixup solana-tokens

* Update docs
2021-03-02 22:51:41 -07:00
Tyera Eulberg a4f0033bd7
Remove ValidatorConfig derive Clone, and fix local-cluster tests (#15647)
* Remove ValidatorConfig derive Clone

* Add local-cluster ValidatorConfig helpers

* Fix benches
2021-03-03 04:21:30 +00:00
behzad nouri 0bd0084b0d
adds more metrics for tx counts and batch sizes (#15642) 2021-03-03 01:28:15 +00:00
behzad nouri 416ea38028
adds metrics for the size and number of batches in bank_send_loop (#15627) 2021-03-02 15:44:35 +00:00
Greg Fitzgerald 2463cc1e6a
Fix typos (#15610) 2021-03-02 06:36:49 -08:00
Michael Vines 640e36287e Move ValidatorExit into ValidatorConfig, making it accessible from the solana-validator crate 2021-03-01 16:49:56 -08:00
sakridge f1223fb783
Lower blockstore processor error severity (#15578) 2021-03-01 14:57:37 -08:00
carllin ae96ba3459
Plumb slot update pubsub notifications (#15488) 2021-02-28 23:29:11 -08:00
behzad nouri f7a049f87f
coalesces vote packets into one Packets (#15566) 2021-02-26 23:23:08 +00:00
sakridge 05409e51ce
Increase tpu coalescing and add parameter (#15536)
Should create larger entries on average
2021-02-26 09:15:45 -08:00
behzad nouri 5a9896706c
indexes epoch slots in crds table (#15459)
ClusterInfo::get_epoch_slots_since scans the entire crds table to obtain
epoch-slots inserted since a timestamp:
https://github.com/solana-labs/solana/blob/013daa8f4/core/src/cluster_info.rs#L1245-L1262
The alternative is to index epoch-slots in crds table ordered by their
insert timestamp.
2021-02-26 14:12:04 +00:00
Tyera Eulberg 1ad2c9f741
Revert "Make UiTokenAmount::ui_amount a String (#15447)" (#15542)
This reverts commit d14374bc9f.
2021-02-25 21:53:40 +00:00
Michael Vines 5b54aed1c0 Speed up getLeaderSchedule 2021-02-24 11:17:25 -08:00
Justin Starry 61ed980ac0
Fix received notifications for gossip signature subscriptions (#15506) 2021-02-24 16:59:22 +08:00
carllin c2e8814dce
Add limit and shrink policy for recycler (#15320) 2021-02-24 00:15:58 -08:00
Tyera Eulberg 52f2d425e5
Count if optimistically confirmed slot is already rooted (#15492) 2021-02-23 22:03:22 +00:00
sakridge 1b59b163dd
Add max retransmit and shred insert slot (#15475) 2021-02-23 13:06:33 -08:00
Michael Vines 4b0114b991 Limit the number of getProgramAccounts filters 2021-02-23 18:43:22 +00:00
Tyera Eulberg d14374bc9f
Make UiTokenAmount::ui_amount a String (#15447)
* Make UiTokenAmount::ui_amount a String

* Fixup solana-tokens

* Ignore spl downstream-project
2021-02-22 13:05:45 -07:00
Ryo Onodera 5ccaa6336a
Print original error from accounts dir remove (#15458) 2021-02-22 21:24:09 +09:00
Ivan Mironov 013daa8f47 RPC: Improve snapshot path sanitization 2021-02-20 13:06:07 -08:00
Michael Vines 5df36aec7d Pacify clippy 2021-02-19 20:08:41 -08:00
Michael Vines fd3b71a2c6 cargo fmt 2021-02-19 20:08:41 -08:00
behzad nouri aa3aac766f
adds metrics for inbound/outbound gossip packets counts (#15407) 2021-02-19 22:49:35 +00:00
Tyera Eulberg 170cb792eb
Return blockstore error if previous_blockhash cannot be determined (#15382)
* Return blockstore error if previous_blockhash cannot be determined

* Add require_previous_blockshash flag
2021-02-18 01:04:52 +00:00
Trent Nelson 7f7370c306 Re-allow clippy::integer_arithmetic at crate-level 2021-02-17 13:55:08 -07:00