Commit Graph

49 Commits

Author SHA1 Message Date
Stephen Akridge 0c90e1eff6 Make entry_sender optional on window_service
window_service in replicator has no need to consume the the produced entries.
2019-01-09 15:15:47 -08:00
Stephen Akridge 491bca5e4b Remove ledger.rs
Split into entry.rs for entry-constructing functions and EntrySlice
trait and db_ledger.rs for ledger helper test functions.
2019-01-09 15:15:47 -08:00
Rob Walker bafd90807d
encapsulate meta_cf (#2335) 2019-01-08 11:41:55 -08:00
Pankaj Garg 91bd38504e
Use vote signer service in fullnode (#2009)
* Use vote signer service in fullnode

* Use native types for signature and pubkey, and address other review comments

* Start local vote signer if a remote service address is not provided

* Rebased to master

* Fixes after rebase
2019-01-05 12:57:52 -08:00
Pankaj Garg 00d310f86d
Remove some metrics datapoint, as it was causing excessive logging (#2287)
- 100 nodes test was bringing down the influx DB server
2019-01-03 09:25:11 -08:00
Michael Vines 034c5d0422 db_ledger now fully encapsulates rocksdb 2018-12-20 12:32:25 -08:00
Michael Vines 37d7ad819b Purge DB::destroy() usage 2018-12-20 10:38:03 -08:00
Rob Walker a65022aed7
DbLedger doesn't need to be mut, doesn't need an RwLock (#2215)
* DbLedger doesn't need to be mut, doesn't need an RwLock

* fix erasure cases
2018-12-18 15:18:57 -08:00
carllin 9720ac0019
Fix try_erasure() (#2185)
* Fix try_erasure bug

* Re-enable asserts in test_replicator_startup

* Add test for out of order process_blobs
2018-12-17 15:34:19 -08:00
Michael Vines 6ac466c0a4 Move src/logger.rs into logger/ crate to unify logging across the workspace 2018-12-14 13:10:43 -08:00
Rob Walker 4f48f1a850
add db_ledger genesis, rework to_blob(), to_blobs() (#2135) 2018-12-12 20:42:12 -08:00
carllin ae903f190e
Broadcast for slots (#2081)
* Insert blobs into db_ledger in broadcast stage to support leader to validator transitions

* Add transmitting real slots to broadcast stage

* Handle real slots instead of default slots in window

* Switch to dummy repair on slots and modify erasure to support leader rotation

* Shorten length of holding locks

* Remove logger from replicator test
2018-12-12 15:58:29 -08:00
Pankaj Garg 9243bc58db
Metrics for window repair (#2106)
* Metrics for window repair

- Also increase max repair length

* fix vote counters, and add repair window graph

* update per node graphs

* revert max repair length change
2018-12-11 15:43:41 -08:00
Greg Fitzgerald 5e703dc70a Free up the term 'replicate' for exclusive use in replicator
Also, align Sockets field names with ContactInfo.
2018-12-10 15:26:43 -07:00
Greg Fitzgerald a8d6c75a24 cargo +nightly fix --features=bpf_c,cuda,erasure,chacha --edition-idioms 2018-12-08 23:19:55 -07:00
Greg Fitzgerald ec5a8141eb cargo fix --edition 2018-12-08 23:19:55 -07:00
Greg Fitzgerald 0a83b17cdd
Upgrade to Rust 1.31.0 (#2052)
* Upgrade to Rust 1.31.0
* Upgrade nightly
* Fix all clippy warnings
* Revert relaxed version check and update
2018-12-07 20:01:28 -07:00
Greg Fitzgerald 97b1156a7a Rename Ncp to GossipService
And BroadcastStage to BroadcastService since it's not included in the
TPU pipeline.
2018-12-06 15:48:19 -07:00
carllin 57a384d6a0
Rocks db window service (#1888)
* Add db_window module for windowing functions from RocksDb

* Replace window with db_window functions in window_service

* Fix tests

* Make note of change in db_window

* Create RocksDb ledger in bin/fullnode

* Make db_ledger functions generic

* Add db_ledger to bin/replicator
2018-11-24 19:32:33 -08:00
Rob Walker 3d113611cc
remove Result<> return from ClusterInfo::new() (#1869)
strip Result<> for ClusterInfo::new()
2018-11-19 11:25:14 -08:00
Michael Vines d96a6b42a5 Move drone into its own crate 2018-11-16 20:42:21 -08:00
Michael Vines 6ac5700f2e Move metrics into its own crate 2018-11-16 15:10:07 -08:00
anatoly yakovenko a41254e18c
Add scalable gossip library (#1546)
* Cluster Replicated Data Store

Separate the data storage and merge strategy from the network IO boundary.
Implement an eager push overlay for transporting recent messages.

Simulation shows fast convergence with 20k nodes.
2018-11-15 13:23:26 -08:00
Rob Walker 6c10458b5b
leader slots in Blobs (#1732)
* add leader slot to Blobs
* remove get_X() methods in favor of X() methods for Blob
* add slot to get_scheduled_leader()
2018-11-07 13:18:14 -08:00
Rob Walker 13bfdde228
remove ledger tail code, WINDOW_SIZE begone (#1617)
* remove WINDOW_SIZE, use window.window_size()
* move ledger tail, redundant with ledger-based repair
2018-10-30 10:05:18 -07:00
Michael Vines e47fcb196b s/solana_program_interface/solana[_-]sdk/g 2018-10-25 12:31:45 -07:00
Pankaj Garg df9ccce5b2
Remove hostname() from calls to metrics as it's expensive operation (#1557) 2018-10-20 06:38:20 -07:00
carllin 0bd1412562
Switch leader scheduler to use PoH ticks instead of Entry height (#1519)
* Add PoH height to process_ledger()

* Moved broadcast_stage Leader Scheduling logic to use Poh height instead of entry_height

* Moved LeaderScheduler logic to PoH in ReplicateStage

* Fix Leader scheduling tests to use PoH instead of entry height

* Change is_leader detection in repair() to use PoH instead of entry height

* Add tests to LeaderScheduler for new functionality

* fix Entry::new and genesis block PoH counts

* Moved LeaderScheduler to PoH ticks

* Cleanup to resolve PR comments
2018-10-18 22:57:48 -07:00
Pankaj Garg 31e779d3f2
Added counters to track more metrics on dashboard (#1535)
- Total number of IP packets TX/RX from all nodes in the testnet
- Last consumed index on validator
- Last transmitted index on leader
2018-10-17 17:32:50 -07:00
Pankaj Garg f6c10d8a2e
Add channel pressure for validator TVU stages (#1509) 2018-10-16 12:54:23 -07:00
carllin 9931ac9780
Leader scheduler plumbing (#1440)
* Added LeaderScheduler module and tests

* plumbing for LeaderScheduler in Fullnode + tests. Add vote processing for active set to ReplicateStage and WriteStage

* Add LeaderScheduler plumbing for Tvu, window, and tests

* Fix bank and switch tests to use new LeaderScheduler

* move leader rotation check from window service to replicate stage

* Add replicate_stage leader rotation exit test

* removed leader scheduler from the window service and associated modules/tests

* Corrected is_leader calculation in repair() function in window.rs

* Integrate LeaderScheduler with write_stage for leader to validator transitions

* Integrated LeaderScheduler with BroadcastStage

* Removed gossip leader rotation from crdt

* Add multi validator, leader test

* Comments and cleanup

* Remove unneeded checks from broadcast stage

* Fix case where a validator/leader need to immediately transition on startup after reading ledger and seeing they are not in the correct role

* Set new leader in validator -> validator transitions

* Clean up for PR comments, refactor LeaderScheduler from process_entry/process_ledger_tail

* Cleaned out LeaderScheduler options, implemented LeaderScheduler strategy that only picks the bootstrap leader to support existing tests, drone/airdrops

* Ignore test_full_leader_validator_network test due to bug where the next leader in line fails to get the last entry before rotation (b/c it hasn't started up yet). Added a test test_dropped_handoff_recovery go track this bug
2018-10-10 16:49:41 -07:00
Greg Fitzgerald 95701114e3 Crdt -> ClusterInfo 2018-10-09 03:49:39 -06:00
Greg Fitzgerald 423e7ebc3f Pacify clippy 2018-09-27 16:21:12 -06:00
Pankaj Garg e10574c64d Remove recycler and it's usage
- The memory usage due to recycler was high, and incrementing with
  time.
2018-09-27 10:42:37 -06:00
jackcmay 9c47e022dc
break dependency of programs on solana core (#1371)
* break dependency of programs on Solana core
2018-09-27 07:49:26 -07:00
Greg Fitzgerald b7ae5b712a Move Pubkey into its own module 2018-09-26 20:40:40 -06:00
Stephen Akridge a5b28349ed Add max entry height to download for replicator 2018-09-26 09:57:22 -07:00
carllin e7383a7e66
Validator to leader (#1303)
* Add check in window_service to exit in checks for leader rotation, and propagate that service exit up to fullnode

* Added logic to shutdown Tvu once ReplicateStage finishes

* Added test for successfully shutting down validator and starting up leader

* Add test for leader validator interaction

* fix streamer to check for exit signal before checking socket again to prevent busy leaders from never returning

* PR comments - Rewrite make_consecutive_blobs() function, revert genesis function change
2018-09-25 15:41:29 -07:00
Rob Walker 304d63623f give replication some time to happen
fixes #1307
2018-09-24 23:57:09 -07:00
Rob Walker a51c2f193e fix Rob and Carl crossing wires 2018-09-21 21:37:25 -07:00
Rob Walker be31da3dce
lastidnotfound step 2: (#1300)
lastidnotfound step 2:
  * move "record stage", aka poh_service into banking stage
  * remove Entry.has_more, is incompatible with leader rotation
  * rewrite entry_next_hash in terms of Poh
  * simplify and unify transaction hashing (no embedded nulls)
  * register_last_entry from banking stage, fixes #1171 (w00t!)
  * new PoH doesn't generate empty ledger entries, so some fixes necessary in 
         multinode tests that rely on that (e.g. giving validators airdrops)
  * make window repair less patient, if we've been waiting for an answer, 
          don't be shy about most recent blobs
   * delete recorder and record stage
   * make more verbost  thin_client error reporting
   * more tracing in window (sigh)
2018-09-21 21:01:13 -07:00
carllin c50ac96f75
Moved deserialization of blobs to entries from replicate_stage to window_service (#1287) 2018-09-21 16:01:24 -07:00
Greg Fitzgerald 6073cd57fa Boot Recycler::recycle() 2018-09-20 17:08:51 -06:00
Anatoly Yakovenko 431692d9d0 Use a Drop trait to keep track of lifetimes for recycled objects.
* Move recycler instances to the point of allocation
* sinks no longer need to call `recycle`
* Remove the recycler arguments from all the apis that no longer need them
2018-09-19 16:59:42 -06:00
Pankaj Garg e142aafca9
Use multiple sockets for receiving blobs on validators (#1228)
* Use multiple sockets for receiving blobs on validators

- The blobs that are broadcasted by leader or retransmitted by peer
  validators are received on replicate_port
- Using reuse_addr/reuse_port, multiple sockets can be opened for
  the same port
- This allows the kernel to queue data to user space app on multiple
  socket queues, preventing over-running one queue
- This helps with reducing packets dropped due to queue over-runs

Fixes #1224

* Fixed failing tests
2018-09-14 16:56:06 -07:00
Michael Vines 4196cf43e8 cargo fmt 2018-09-14 16:37:49 -07:00
Greg Fitzgerald a093d5c809 Fix erasure build 2018-09-10 11:40:26 -06:00
Greg Fitzgerald fc64e1853c Initialize Window, not SharedWindow
Wrap with Arc<RwLock>> when/if needed, no earlier.
2018-09-10 11:40:26 -06:00
Greg Fitzgerald 7f669094de Split window into two modules 2018-09-10 11:40:26 -06:00