Commit Graph

5 Commits

Author SHA1 Message Date
Wen 0c42df47fe
Generate snapshot after reaching agreement in wen_restart. (#1109)
* Generate snapshot after reaching agreement in wen_restart.

* Fix a bad merge and carry new_root_slot through HeaviestFork.

* Replace real snapshot service with fake one to avoid circular dependency.

* Remove circular dependency.

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Add extra newline.

* Fix constant name.

* Do not use &Arc<...>.

* Check return values in tests.

* Check more return values.

* Remove unnecessary rehash and comment on why new_root_bank is always present.

* Split trigger_eah_calculation_if_needed into separate function.

* Find base slot for incremental snapshot correctly, generate full snapshot
if base is not available.

* Switch to new send_eah_request_if_needed() interface.

* Always write full snapshot into our own directory.

* No need to specify snapshot under the snapshot dir.

* The normal_flow test doesn't need fake snapshot service if it doesn't
need to trigger EAH request.

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Add a test for generate_snapshot.

* Small fixes.

* Return error if the slot we picked is lower than any of the snapshot slots.
Always write incremental snapshot, which is faster.

* Add more tests for generate_snapshot.

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Brooks <brooks@prumo.org>

* Update comments about set_root().

* Write directly into incremental snapshot dir and purge bank snapshots.

* Remove the loop, we should have the snapshot when the method exits.

* Change comments.

* Small fixes.

* Fix a bad merge.

* Remove unnecessary loop and error.

---------

Co-authored-by: Brooks <brooks@prumo.org>
2024-05-10 14:39:40 -07:00
Wen 01c4b03ab6
Send and Aggregate RestartHeaviestFork. (#699)
* Send and Aggregate RestartHeaviestFork.

* total_active_stake in my_heaviest_fork should always be the sum of the
stake of all the validators which sent me HeaviestFork.

* A few name changes and other small fixes.

* Move active_peers update to after stakes_map is updated.

* Only send out RestartHeaviestFork and write snapshots every 30 minutes.

* Proceed if 5% of the nodes disagree and log the disagreement if the
(slot, hash) chosen by us is not the majority choice.

* Make linter happy.

* Make linter happy.

* Add successful case.

* Add a few constants and methods.

* Account for 5% non_conforming when calculating exit threshold.

* Adding a few more logs.

* Fix tests to use 75% when aggregating HeaviestFork and a few bugs.

* Reuse adjusted_threhold_percent.
2024-04-25 23:10:04 -07:00
Wen 11aa06d24f
wen-restart: Find heaviest fork (#183)
* Pass the final result of LastVotedForkSlots aggregation to next
stage and find the heaviest fork we will Gossip to others.

* Change comments.

* Small fixes to address PR comments.

* Move correctness proof to SIMD.

* Fix a broken merge.

* Use blockstore to check parent slot of any block in FindHeaviestFork

* Change error message.

* Add special message when first slot in the list doesn't link to root.
2024-03-20 19:38:46 -07:00
Wen bfe44d95f4
Wen restart aggregate last voted fork slots (#33892)
* Push and aggregate RestartLastVotedForkSlots.

* Fix API and lint errors.

* Reduce clutter.

* Put my own LastVotedForkSlots into the aggregate.

* Write LastVotedForkSlots aggregate progress into local file.

* Fix typo and name constants.

* Fix flaky test.

* Clarify the comments.

* - Use constant for wait_for_supermajority
- Avoid waiting after first shred when repair is in wen_restart

* Fix delay_after_first_shred and remove loop in wen_restart.

* Read wen_restart slots inside the loop instead.

* Discard turbine shreds while in wen_restart in window insert rather than
shred_fetch_stage.

* Use the new Gossip API.

* Rename slots_to_repair_for_wen_restart and a few others.

* Rename a few more and list all states.

* Pipe exit down to aggregate loop so we can exit early.

* Fix import of RestartLastVotedForkSlots.

* Use the new method to generate test bank.

* Make linter happy.

* Use new bank constructor for tests.

* Fix a bad merge.

* - add new const for wen_restart
- fix the test to cover more cases
- add generate_repairs_for_slot_not_throtted_by_tick and
  generate_repairs_for_slot_throtted_by_tick to make it readable

* Add initialize and put the main logic into a loop.

* Change aggregate interface and other fixes.

* Add failure tests and tests for state transition.

* Add more tests and add ability to recover from written records in
last_voted_fork_slots_aggregate.

* Various name changes.

* We don't really care what type of error is returned.

* Wait on expected progress message in proto file instead of sleep.

* Code reorganization and cleanup.

* Make linter happy.

* Add WenRestartError.

* Split WenRestartErrors into separate errors per state.

* Revert "Split WenRestartErrors into separate errors per state."

This reverts commit 4c920cb8f8d492707560441912351cca779129f6.

* Use individual functions when testing for failures.

* Move initialization errors into initialize().

* Use anyhow instead of thiserror to generate backtrace for error.

* Add missing Cargo.lock.

* Add error log when last_vote is missing in the tower storage.

* Change error log info.

* Change test to match exact error.
2024-03-01 18:52:47 -08:00
Wen 630feeddf2
Add wen_restart module (#33344)
* Add wen_restart module:
- Implement reading LastVotedForkSlots from blockstore.
- Add proto file to record the intermediate results.
- Also link wen_restart into validator.
- Move recreation of tower outside replay_stage so we can get last_vote.

* Update lock file.

* Fix linter errors.

* Fix dependencies order.

* Update wen_restart explanation and small fixes.

* Generate tower outside tvu.

* Update validator/src/cli.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/protos/wen_restart.proto

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/build.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Rename proto directory.

* Rename InitRecord to MyLastVotedForkSlots, add imports.

* Update wen-restart/Cargo.toml

Co-authored-by: Tyera <teulberg@gmail.com>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Tyera <teulberg@gmail.com>

* Move prost-build dependency to project toml.

* No need to continue if the distance between slot and last_vote is
already larger than MAX_SLOTS_ON_VOTED_FORKS.

* Use 16k slots instead of 81k slots, a few more wording changes.

* Use AncestorIterator which does the same thing.

* Update Cargo.lock

* Update Cargo.lock

---------

Co-authored-by: Tyera <teulberg@gmail.com>
2023-10-06 15:04:37 -07:00