Commit Graph

173 Commits

Author SHA1 Message Date
Tao Zhu c1d89ad749
forward packets by prioritization in desc order (#25406)
- Forward packets by prioritization in desc order
- Add support of cost-tracking by transaction requested compute units
- Hook up account buckets to forwarder
- Add metrics for forwardable batches count
- Remove redundant invalid packets filtering at end of slot since forwarder will do the same when batch forwardable packets
- Add bench test for forwarding
2022-07-05 23:24:58 -05:00
carllin ce39c14025
Add end-to-end replay slot metrics (#25752) 2022-07-05 13:58:51 -05:00
behzad nouri 1f0f5dc03e verifies shred-version in fetch stage
Shred versions are not verified until window-service where resources are
already wasted to sig-verify and deserialize shreds.
The commit verifies shred-version earlier in the pipeline in fetch stage.
2022-06-22 12:17:37 +00:00
behzad nouri d2afa6b418
moves packet-hasher out of the mutex (#26091)
Packet-hasher is not mutated across threads and does not need to be
wrapped in a mutex.
2022-06-21 16:29:27 +00:00
carllin bf8faa8a30
Report banking stage tracer metrics (#25620) 2022-06-09 00:25:37 -05:00
sakridge d71986cecf
Separate staked and un-staked on quic tpu port (#24339) 2022-04-16 10:54:22 +02:00
sakridge 1b7d1f78de
Implement QUIC connection warmup service for future leaders (#24054)
* Increase connection timeouts

* Bump quic connection cache to 1024

* Use constant for quic connection timeout and add warm cache service

* Fixes to QUIC warmup service

* fix check failure

* fixes after rebase

* fix timeout test

Co-authored-by: Pankaj Garg <pankaj@solana.com>
2022-04-15 12:09:24 -07:00
HaoranYi ba770832d0
Poh timing service (#23736)
* initial work for poh timing report service

* add poh_timing_report_service to validator

* fix comments

* clippy

* imrove test coverage

* delete record when complete

* rename shred full to slot full.

* debug logging

* fix slot full

* remove debug comments

* adding fmt trait

* derive default

* default for poh timing reporter

* better comments

* remove commented code

* fix test

* more test fixes

* delete timestamps for slot that are older than root_slot

* debug log

* record poh start end in bank reset

* report full to start time instead

* fix poh slot offset

* report poh start for normal ticks

* fix typo

* refactor out poh point report fn

* rename

* optimize delete - delete only when last_root changed

* change log level to trace

* convert if to match

* remove redudant check

* fix SlotPohTiming comments

* review feedback on poh timing reporter

* review feedback on poh_recorder

* add test case for out-of-order arrival of timing points and incomplete timing points

* refactor poh_timing_points into its own mod

* remove option for poh_timing_report service

* move poh_timing_point_sender to constructor

* clippy

* better comments

* more clippy

* more clippy

* add slot poh timing point macro

* clippy

* assert in test

* comments and display fmt

* fix check

* assert format

* revise comments

* refactor

* extrac send fn

* revert reporting_poh_timing_point

* align loggin

* small refactor

* move type declaration to the top of the module

* replace macro with constructor

* clippy: remove redundant closure

* review comments

* simplify poh timing point creation

Co-authored-by: Haoran Yi <hyi@Haorans-MacBook-Air.local>
2022-03-30 09:04:49 -05:00
Tao Zhu fd515097d8 leader qos part 2: add stage to find sender stake, set to packet meta 2022-03-17 19:31:28 -05:00
Stephen Akridge 976b138e76 Add tx weighting stage 2022-03-17 19:31:28 -05:00
Tao Zhu 35d1235ed0
- move `unprocessed_packet_batches` from `BankingStage` to its own (#23508)
module
- deserialize packets during receving and buffering
2022-03-10 18:47:46 +00:00
Yueh-Hsuan Chiang b8b7163b66
(Ledger Store) Report RocksDB Column Family Metrics (#22503)
This PR enables blockstore to periodically report RocksDB column family properties.
The reported properties are under blockstore_rocksdb_cfs, and the properties also
support group by operation on cf_name.
2022-03-05 16:13:03 -08:00
HaoranYi 8de88d0a55
Refactor packet_threshold adjustment code into its own struct (#23216)
* refactor packet_threshold adjustment code into own struct and add unittest for it

* fix a typo in error message

* code review feedbacks

* another code review feedback

* Update core/src/ancestor_hashes_service.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* share packet threshold with repair service (credit to carl)

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-03-02 09:09:06 -06:00
carllin 619335df1a
Add execute timings (#23097) 2022-02-17 01:14:32 -05:00
carllin 2f9e30a1f7
Introduce slot-specific packet metrics (#22906) 2022-02-11 03:07:45 -05:00
Ashwin Sekar 5acf0f6331
Add feature gate for new vote instruction and plumb through replay (#21683)
* Add feature gate for new vote instruction and plumb through replay

Add tower versions

* Add check for slot hashes history

* Update is_recent check to exclude voting on hard fork root slot

* Move tower rollback test to flaky and ignore it until #22551 lands
2022-02-07 14:06:19 -08:00
anatoly yakovenko d343713f61
Optimize packet dedup (#22571)
* Use bloom filter to dedup packets

* dedup first

* Update bloom/src/bloom.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* Update core/src/sigverify_stage.rs

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>

* fixup

* fixup

* fixup

Co-authored-by: Trent Nelson <trent.a.b.nelson@gmail.com>
2022-01-19 13:58:20 -08:00
Justin Starry 93c776ce19
Refactor packet deduplication and harden bench test (#22080) 2021-12-22 23:05:10 -06:00
Jeff Biseda 97a1fa10a6
streamer send destination metrics for repair, gossip (#21564) 2021-12-17 15:21:05 -08:00
Jeff Biseda 2ed7e3af89
prioritize slot repairs for unknown last index and close to completion (#21070) 2021-11-19 19:17:30 -08:00
sakridge 0bda0c3e0c
Add bank drop service (#21322) 2021-11-19 17:20:18 +01:00
Tao Zhu 11153e1f87
refactor cost calculation (#21062)
* - cache calculated transaction cost to allow sharing;
- atomic cost tracking op;
- only lock accounts for transactions eligible for current block;
- moved qos service and stats reporting to its own model;
- add cost_weight default to neutral (as 1), vote has zero weight;

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Update core/src/qos_service.rs

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

* Update core/src/qos_service.rs

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>

Co-authored-by: Tyera Eulberg <teulberg@gmail.com>
2021-11-12 01:04:53 -06:00
sakridge a8d78e89d3
Move test-validator to own module to reduce core dependencies (#20658)
* Move test-validator to own module to reduce core dependencies

* Fix a few TestValidator paths

* Use solana_test_validator crate for solana_test_validator bin

* Move client int tests to separate crate

Co-authored-by: Tyera Eulberg <tyera@solana.com>
2021-10-29 01:27:07 +00:00
Jeff Biseda 4cac66244d
report udp stats from validator (#20587) 2021-10-15 15:11:11 -07:00
Tao Zhu 005d6863fd
- move cost tracker into bank, so each bank has its own cost tracker; (#20527)
- move related modules to runtime
2021-10-12 08:51:33 -05:00
Tao Zhu 03913f6661
add tx count and thread id to stats, each stat reports and resets when slot changes (#20451) 2021-10-06 00:09:19 -05:00
Michael Vines e9722474eb Move tower storage into its own module 2021-08-11 00:20:46 -07:00
carllin 1ee64afb12
Introduce AncestorHashesService (#18812) 2021-07-23 16:54:47 -07:00
carllin 588c0464b8
Add sampling logic and DuplicateSlotRepairStatus module (#18721) 2021-07-21 11:15:08 -07:00
sakridge 0f8bcf65af
Add voting service (#18552) 2021-07-15 16:35:51 +02:00
carllin 4d3e301ee4
Introduce slot dumping to ReplayStage (#18160) 2021-07-08 19:07:32 -07:00
behzad nouri 04787be8b1
encapsulates turbine peers computations of broadcast & retransmit stages (#18238)
Broadcast stage and retransmit stage should arrange nodes on turbine
broadcast tree in exactly same order. Additionally any changes to this
ordering (e.g. updating how unstaked nodes are handled) requires feature
gating to keep the cluster in sync.

Current implementation is scattered out over several public methods and
exposes too much of implementation details (e.g. usize indices into
peers vector) which makes code changes and checking for feature
activations more difficult.

This commit encapsulates turbine peer computations into a new struct,
and only exposes two public methods, get_broadcast_peer and
get_retransmit_peers, for call-sites.
2021-07-07 00:35:25 +00:00
Tao Zhu 5e424826ba
Persist cost table to blockstore (#18123)
* Add `ProgramCosts` Column Family to blockstore, implement LedgerColumn; add `delete_cf` to Rocks
* Add ProgramCosts to compaction excluding list alone side with TransactionStatusIndex in one place: `excludes_from_compaction()`

* Write cost table to blockstore after `replay_stage` replayed active banks; add stats to measure persist time
* Deletes program from `ProgramCosts` in blockstore when they are removed from cost_table in memory
* Only try to persist to blockstore when cost_table is changed.
* Restore cost table during validator startup

* Offload `cost_model` related operations from replay main thread to dedicated service thread, add channel to send execute_timings between these threads;
* Move `cost_update_service` to its own module; replay_stage is now decoupled from cost_model.
2021-07-01 11:32:41 -05:00
Tao Zhu ae27fcbcda
replay stage feed back program cost (#17731)
* replay stage feeds back realtime per-program execution cost to cost model;

* program cost execution table is initialized into empty table, no longer populated with hardcoded numbers;

* changed cost unit to microsecond, using value collected from mainnet;

* add ExecuteCostTable with fixed capacity for security concern, when its limit is reached, programs with old age AND less occurrence will be pushed out to make room for new programs.
2021-06-09 17:10:59 -05:00
Tyera Eulberg 544b3c0d17
Create solana-poh and move remaining rpc modules to solana-rpc (#17698)
* Create solana-poh crate

* Move BigTableUploadService to solana-ledger

* Add solana-rpc to workspace

* Move dependencies to solana-rpc

* Move remaining rpc modules to solana-rpc

* Single use statement solana-poh

* Single use statement solana-rpc
2021-06-04 09:23:06 -06:00
Tao Zhu b000d490ce
Cost Model to limit transactions which are not parallelizeable (#16694)
* * Add following to banking_stage:
  1. CostModel as immutable ref shared between threads, to provide estimated cost for transactions.
  2. CostTracker which is shared between threads, tracks transaction costs for each block.

* replace hard coded program ID with id() calls

* Add Account Access Cost as part of TransactionCost. Account Access cost are weighted differently between read and write, signed and non-signed.

* Establish instruction_execution_cost_table, add function to update or insert instruction cost, unit tested. It is read-only for now; it allows Replay to insert realtime instruction execution costs to the table.

* add test for cost_tracker atomically try_add operation, serves as safety guard for future changes

* check cost against local copy of cost_tracker, return transactions that would exceed limit as unprocessed transaction to be buffered; only apply bank processed transactions cost to tracker;

* bencher to new banking_stage with max cost limit to allow cost model being hit consistently during bench iterations
2021-06-01 09:16:17 -05:00
Tyera Eulberg ab581dafc2
Add block height to ConfirmedBlock structs (#17523)
* Add BlockHeight CF to blockstore

* Rename CacheBlockTimeService to be more general

* Cache block-height using service

* Fixup previous proto mishandling

* Add block_height to block structs

* Add block-height to solana block

* Fallback to BankForks if block time or block height are not yet written to Blockstore

* Add docs

* Review comments
2021-05-26 22:16:16 -06:00
Tyera Eulberg 9a5330b7eb
Move gossip modules into solana-gossip crate (#17352)
* Move gossip modules to solana-gossip

* Update Protocol abi digest due to move

* Move gossip benches and hook up CI

* Remove unneeded Result entries

* Single use statements
2021-05-26 09:15:46 -06:00
Tao Zhu 0781fe1b4f
Upgrade Rust to 1.52.0 (#17096)
* Upgrade Rust to 1.52.0
update nightly_version to newly pushed docker image
fix clippy lint errors
1.52 comes with grcov 0.8.0, include this version to script

* upgrade to Rust 1.52.1

* disabling Serum from downstream projects until it is upgraded to Rust 1.52.1
2021-05-19 09:31:47 -05:00
Tyera Eulberg 827355a6b1
Create solana-rpc crate and move subscriptions (#17320)
* Move non_circulating_supply to runtime

* Add solana-rpc crate and move max_slots

* Move subscriptions to solana-rpc

* Single use statements
2021-05-19 00:54:28 -06:00
behzad nouri b17d5eeaee
moves cluster-info metrics to a separate module (#16883) 2021-04-28 02:04:49 +00:00
carllin 4c94f8933f
Ingest votes from gossip into fork choice (#16560) 2021-04-21 14:40:35 -07:00
sakridge 8e69dd42c1
Add non-default repair nonce values (#16512)
* Track outstanding nonces in repair

* Rework outstanding requests to use lru cache and randomize nonces

Co-authored-by: Carl <carl@solana.com>
2021-04-20 09:37:33 -07:00
carllin 52703badfa
Setup ReplayStage confirmation scaffolding for duplicate slots (#9698) 2021-03-24 23:41:52 -07:00
Justin Starry 918d04e3f0
Add more slot update notifications (#15734)
* Add more slot update notifications

* fix merge

* Address feedback and add integration test

* switch to datapoint

* remove unused shred method

* fix clippy

* new thread for rpc completed slots

* remove extra constant

* fixes

* rely on channel closing

* fix check
2021-03-12 21:44:06 +08:00
sakridge 1b59b163dd
Add max retransmit and shred insert slot (#15475) 2021-02-23 13:06:33 -08:00
Trent Nelson 7f7370c306 Re-allow clippy::integer_arithmetic at crate-level 2021-02-17 13:55:08 -07:00
behzad nouri b6f231b60e
removes locked pubkey references (#15152) 2021-02-08 02:07:00 +00:00
behzad nouri 6a3797e164
adds crds-value for broadcasting duplicate shreds through gossip (#14133)
In gossip, the header overhead we get from:
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/cluster_info.rs#L434-L435
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L31-L36
https://github.com/solana-labs/solana/blob/de9ac43eb/core/src/crds_value.rs#L73
already exceeds SIZE_OF_NONCE in shreds. We also need aditional
meta-data (wallclock, source pubkey, ...). Which means that given the
SHRED_PAYLOAD_SIZE, we cannot fit all these in PACKET_DATA_SIZE:
https://github.com/solana-labs/solana/blob/de9ac43eb/ledger/src/shred.rs#L80

On top of that, we need 2 shred payloads as the proof of duplicate. So
each DuplicateShred crds value includes only a chunk of the payload,
along with the meta-data to reconstruct the full payload from the chunks
on the receiving end.
2020-12-18 14:32:43 +00:00
sakridge d4a174fb7c
Partial shred deserialize cleanup and shred type differentiation (#14094)
* Partial shred deserialize cleanup and shred type differentiation in retransmit

* consolidate packet hashing logic
2020-12-15 16:50:40 -08:00