14 Commits

Author SHA1 Message Date
Wen a5229f989c
wen_restart: Ignore Gossip messages from my own pubkey. (#1678) 2024-06-15 10:50:32 -07:00
Wen 3f5e8352d3
wen_restart: Accept all vote types in the last vote. (#1602) 2024-06-04 19:14:21 -07:00
Wen 0c42df47fe
Generate snapshot after reaching agreement in wen_restart. (#1109)
* Generate snapshot after reaching agreement in wen_restart.

* Fix a bad merge and carry new_root_slot through HeaviestFork.

* Replace real snapshot service with fake one to avoid circular dependency.

* Remove circular dependency.

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Add extra newline.

* Fix constant name.

* Do not use &Arc<...>.

* Check return values in tests.

* Check more return values.

* Remove unnecessary rehash and comment on why new_root_bank is always present.

* Split trigger_eah_calculation_if_needed into separate function.

* Find base slot for incremental snapshot correctly, generate full snapshot
if base is not available.

* Switch to new send_eah_request_if_needed() interface.

* Always write full snapshot into our own directory.

* No need to specify snapshot under the snapshot dir.

* The normal_flow test doesn't need fake snapshot service if it doesn't
need to trigger EAH request.

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Add a test for generate_snapshot.

* Small fixes.

* Return error if the slot we picked is lower than any of the snapshot slots.
Always write incremental snapshot, which is faster.

* Add more tests for generate_snapshot.

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Update comments about set_root().

* Write directly into incremental snapshot dir and purge bank snapshots.

* Remove the loop, we should have the snapshot when the method exits.

* Change comments.

* Small fixes.

* Fix a bad merge.

* Remove unnecessary loop and error.

---------

Co-authored-by: Brooks <brooks@prumo.org>
2024-05-10 14:39:40 -07:00
Wen 01c4b03ab6
Send and Aggregate RestartHeaviestFork. (#699)
* Send and Aggregate RestartHeaviestFork.

* total_active_stake in my_heaviest_fork should always be the sum of the
stake of all the validators which sent me HeaviestFork.

* A few name changes and other small fixes.

* Move active_peers update to after stakes_map is updated.

* Only send out RestartHeaviestFork and write snapshots every 30 minutes.

* Proceed if 5% of the nodes disagree and log the disagreement if the
(slot, hash) chosen by us is not the majority choice.

* Make linter happy.

* Make linter happy.

* Add successful case.

* Add a few constants and methods.

* Account for 5% non_conforming when calculating exit threshold.

* Adding a few more logs.

* Fix tests to use 75% when aggregating HeaviestFork and a few bugs.

* Reuse adjusted_threhold_percent.
2024-04-25 23:10:04 -07:00
behzad nouri 443bb6c1dc
migrates to the new contact-info (#823)
The commit replaces (most) uses of LegacyContactInfo with the new ContactInfo.
2024-04-24 18:47:04 +00:00
Ashwin Sekar 499d36e354
vote: update benches and tests to TowerSync (#725) 2024-04-11 22:15:02 -07:00
Wen 312f725f1e
wen_restart: Find the bank hash of the heaviest fork, replay if necessary. (#420)
* Find the bank hash of the heaviest fork, replay if necessary.

* Make it more explicit how heaviest fork slot is selected.

* Use process_single_slot instead of process_blockstore_from_root, the latter
may re-insert banks already frozen.

* Put BlockstoreProcessError into the error message.

* Check that all existing blocks link to correct parent before replay.

* Use the default number of threads instead.

* Check whether block is full and other small fixes.

* Fix root_bank and move comments to function level.

* Remove the extra parent link check.
2024-04-07 16:17:52 -07:00
Wen 11aa06d24f
wen-restart: Find heaviest fork (#183)
* Pass the final result of LastVotedForkSlots aggregation to next
stage and find the heaviest fork we will Gossip to others.

* Change comments.

* Small fixes to address PR comments.

* Move correctness proof to SIMD.

* Fix a broken merge.

* Use blockstore to check parent slot of any block in FindHeaviestFork.

* Change error message.

* Add special message when first slot in the list doesn't link to root.
2024-03-20 19:38:46 -07:00
Wen 5591db7801
Wen_restart: check block full using blockstore (#250)
* Switch to blockstore.is_full() check because replay thread isn't active.

* Use make_chaining_slot_entries and add first_parent to the method.
Small style fixes.

* Switch to blockstore.is_full() check because replay thread isn't active.
2024-03-14 20:45:03 -07:00
Wen f5a3f2476a
wen_restart: replace get_aggregate_result() with more methods (#254)
* Replace AggregateResult with more methods.

* Rename slots_to_repair() to slots_to_repair_iter().
2024-03-14 19:01:17 -07:00
Wen bfe44d95f4
Wen restart aggregate last voted fork slots (#33892)
* Push and aggregate RestartLastVotedForkSlots.

* Fix API and lint errors.

* Reduce clutter.

* Put my own LastVotedForkSlots into the aggregate.

* Write LastVotedForkSlots aggregate progress into local file.

* Fix typo and name constants.

* Fix flaky test.

* Clarify the comments.

* - Use constant for wait_for_supermajority
- Avoid waiting after first shred when repair is in wen_restart

* Fix delay_after_first_shred and remove loop in wen_restart.

* Read wen_restart slots inside the loop instead.

* Discard turbine shreds while in wen_restart in windows insert rather than
shred_fetch_stage.

* Use the new Gossip API.

* Rename slots_to_repair_for_wen_restart and a few others.

* Rename a few more and list all states.

* Pipe exit down to aggregate loop so we can exit early.

* Fix import of RestartLastVotedForkSlots.

* Use the new method to generate test bank.

* Make linter happy.

* Use new bank constructor for tests.

* Fix a bad merge.

* - add new const for wen_restart
- fix the test to cover more cases
- add generate_repairs_for_slot_not_throtted_by_tick and
  generate_repairs_for_slot_throtted_by_tick to make it readable

* Add initialize and put the main logic into a loop.

* Change aggregate interface and other fixes.

* Add failure tests and tests for state transition.

* Add more tests and add ability to recover from written records in
last_voted_fork_slots_aggregate.

* Various name changes.

* We don't really care what type of error is returned.

* Wait on expected progress message in proto file instead of sleep.

* Code reorganization and cleanup.

* Make linter happy.

* Add WenRestartError.

* Split WenRestartErrors into separate errors per state.

* Revert "Split WenRestartErrors into separate errors per state."

This reverts commit 4c920cb8f8d492707560441912351cca779129f6.

* Use individual functions when testing for failures.

* Move initialization errors into initialize().

* Use anyhow instead of thiserror to generate backtrace for error.

* Add missing Cargo.lock.

* Add error log when last_vote is missing in the tower storage.

* Change error log info.

* Change test to match exact error.
2024-03-01 18:52:47 -08:00
Wen 295d610f43
We need to publish solana-wen-restart so we can publish 1.18.0 later. (#33662) 2023-10-11 11:27:09 -07:00
Wen 2d5496a564
Fix wen_restart proto compilation: (#33596)
* Fix wen_restart proto compilation:
- should recompile when proto changes
- no need for customization

* There is only one proto file, no need for loop.
2023-10-09 10:51:44 -07:00
Wen 630feeddf2
Add wen_restart module (#33344)
* Add wen_restart module:
- Implement reading LastVotedForkSlots from blockstore.
- Add proto file to record the intermediate results.
- Also link wen_restart into validator.
- Move recreation of tower outside replay_stage so we can get last_vote.

* Update lock file.

* Fix linter errors.

* Fix dependency order.

* Update wen_restart explanation and small fixes.

* Generate tower outside tvu.

* Update validator/src/cli.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/protos/wen_restart.proto

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/build.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Rename proto directory.

* Rename InitRecord to MyLastVotedForkSlots, add imports.

* Update wen-restart/Cargo.toml

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Move prost-build dependency to project toml.

* No need to continue if the distance between slot and last_vote is
already larger than MAX_SLOTS_ON_VOTED_FORKS.

* Use 16k slots instead of 81k slots, a few more wording changes.

* Use AncestorIterator which does the same thing.

* Update Cargo.lock

* Update Cargo.lock

---------

Co-authored-by: Tyera <teulberg@gmail.com>
2023-10-06 15:04:37 -07:00