---
title: Repair Service
---

## Repair Service

The RepairService is in charge of retrieving missing shreds that failed to be
delivered by primary communication protocols like Turbine. It manages the
protocols described in the `Repair Request Protocols` and
`Repair Response Protocol` sections below.

## Challenges

1\) Validators can fail to receive particular shreds due to network failures

2\) Consider a scenario where Blockstore contains the set of slots {1, 3, 5}.
Blockstore then receives shreds for some slot 7, where each shred `b` has
`b.parent == 6`, so the parent-child relation 6 -> 7 is stored in
Blockstore. However, there is no way to chain slots 6 and 7 to any of the
existing banks in Blockstore, and thus the `Shred Repair` protocol will not
repair these slots. If these slots happen to be part of the main chain, this
will halt replay progress on this node.
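
The scenario in challenge 2 can be made concrete with a minimal sketch. The `chains_to_root` helper and the `parent_of` map are hypothetical names rather than Blockstore APIs, and the sketch assumes that slot 1 is the root and that slots 1, 3, and 5 chain together, which the scenario does not state explicitly.

```rust
use std::collections::HashMap;

/// Hypothetical helper: walks parent links from `slot` and reports whether
/// the chain reaches `root`. A `None` parent means the parent is not known.
fn chains_to_root(parent_of: &HashMap<u64, Option<u64>>, root: u64, mut slot: u64) -> bool {
    loop {
        if slot == root {
            return true;
        }
        match parent_of.get(&slot) {
            // A parent always has a smaller slot number, so the walk terminates.
            Some(Some(parent)) if *parent < slot => slot = *parent,
            // Unknown slot, unknown parent, or malformed link: the chain is broken.
            _ => return false,
        }
    }
}

/// Builds the scenario from challenge 2 (assumed chain: 1 <- 3 <- 5, root = 1).
fn challenge_two_scenario() -> HashMap<u64, Option<u64>> {
    HashMap::from([
        (1, None), // root
        (3, Some(1)),
        (5, Some(3)),
        (6, None), // known only as the parent of 7; no shreds received
        (7, Some(6)),
    ])
}
```

In this sketch, slot 5 chains to the root, while slots 6 and 7 do not, so a repair walk that starts at the root never visits them.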

## Repair-related primitives

Epoch Slots:
Each validator advertises separately on gossip the various parts of an
`Epoch Slots`:

- The `stash`: An epoch-long compressed set of all completed slots.
- The `cache`: The run-length encoding (RLE) of the latest `N` completed
  slots starting from some slot `M`, where `N` is the number of slots
  that will fit in an MTU-sized packet.

`Epoch Slots` in gossip are updated every time a validator receives a
complete slot within the epoch. Completed slots are detected by blockstore
and sent over a channel to RepairService. Note that by the time a slot `X` is
complete, the epoch schedule must exist for the epoch that contains slot `X`,
because WindowService rejects shreds for unconfirmed epochs.

Every `N/2` completed slots, the oldest `N/2` slots are moved from the
`cache` into the `stash`. The base value `M` for the RLE should also
be updated.
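
The `stash`/`cache` split can be sketched as follows. This is an illustration under stated assumptions, not the actual gossip data structure: `EpochSlotsSketch`, `rle_encode`, and `mark_complete` are hypothetical names, the stash is kept as a plain set rather than a compressed one, and the rotation is assumed to trigger once the cache holds `N` slots (which in steady state happens every `N/2` completed slots).

```rust
use std::collections::BTreeSet;

/// Run-length encodes completed slots starting at `base` (the `M` above).
/// Runs alternate between completed and missing slots, beginning with a
/// completed run (which has length 0 if `base` itself is not complete).
fn rle_encode(base: u64, completed: &BTreeSet<u64>) -> Vec<u64> {
    let last = match completed.range(base..).next_back() {
        Some(&slot) => slot,
        None => return Vec::new(),
    };
    let mut runs = Vec::new();
    let mut in_completed_run = true;
    let mut len = 0u64;
    for slot in base..=last {
        if completed.contains(&slot) == in_completed_run {
            len += 1;
        } else {
            runs.push(len);
            in_completed_run = !in_completed_run;
            len = 1;
        }
    }
    runs.push(len);
    runs
}

/// Hypothetical container for the two advertised parts.
struct EpochSlotsSketch {
    stash: BTreeSet<u64>, // uncompressed here for simplicity
    cache: BTreeSet<u64>,
    base: u64,       // `M`
    capacity: usize, // `N`
}

impl EpochSlotsSketch {
    fn new(base: u64, capacity: usize) -> Self {
        Self { stash: BTreeSet::new(), cache: BTreeSet::new(), base, capacity }
    }

    /// Records a completed slot; once the cache holds `N` slots, moves the
    /// oldest `N/2` into the stash and advances the RLE base.
    fn mark_complete(&mut self, slot: u64) {
        self.cache.insert(slot);
        let half = self.capacity / 2;
        if half > 0 && self.cache.len() >= self.capacity {
            for _ in 0..half {
                if let Some(oldest) = self.cache.pop_first() {
                    self.stash.insert(oldest);
                }
            }
            if let Some(&first) = self.cache.first() {
                self.base = first;
            }
        }
    }
}
```

With `N = 4`, completing slots 1 through 4 moves slots 1 and 2 into the stash, leaves slots 3 and 4 in the cache, and sets the base to 3.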

## Repair Request Protocols

The repair protocol makes a best-effort attempt to progress the forking
structure of Blockstore.

The following protocol strategies address the above challenges:

1. Shred Repair \(Addresses Challenge \#1\): This is the most basic repair
   protocol, with the purpose of detecting and filling "holes" in the ledger.
   Blockstore tracks the latest root slot. RepairService will then periodically
   iterate every fork in blockstore starting from the root slot, sending repair
   requests to validators for any missing shreds. It will send at most some `N`
   repair requests per iteration. Shred repair should prioritize repairing
   forks based on the leader's fork weight. Validators should only send repair
   requests to validators who have marked that slot as completed in their
   EpochSlots. Validators should prioritize repairing shreds in each slot
   that they are responsible for retransmitting through turbine. Validators can
   compute which shreds they are responsible for retransmitting because the
   seed for turbine is based on leader id, slot, and shred index.

   Note: Validators will only accept shreds within the current verifiable
   epoch \(epoch the validator has a leader schedule for\).

2. Preemptive Slot Repair \(Addresses Challenge \#2\): The goal of this
   protocol is to discover the chaining relationship of "orphan" slots that do not
   currently chain to any known fork. Shred repair should prioritize repairing
   orphan slots based on the leader's fork weight.

   - Blockstore will track the set of "orphan" slots in a separate column family.
   - RepairService will periodically make `Orphan` requests for each of
     the orphans in blockstore.

     - `Orphan(orphan)` request: `orphan` is the orphan slot whose parents the
       requestor wants to know.
     - `Orphan(orphan)` response: The highest shreds for each of the first `N`
       parents of the requested `orphan`.

     On receiving the responses `p`, where `p` is some shred in a parent slot,
     validators will:

     - Insert an empty `SlotMeta` in blockstore for `p.slot` if it doesn't
       already exist.
     - If `p.slot` does exist, update the parent of `p` based on `parents`,
       the parent slots returned in the response.

     Note: Once these empty slots are added to blockstore, the
     `Shred Repair` protocol should attempt to fill those slots.

     Note: Validators will only accept responses containing shreds within the
     current verifiable epoch \(epoch the validator has a leader schedule
     for\).
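
The fork walk in strategy 1 (Shred Repair) can be sketched as follows. This is a simplified illustration, not the RepairService implementation: `SlotInfo`, `RepairRequest`, and `generate_repairs` are hypothetical names, the walk is breadth-first and ignores fork weight, and a slot whose last shred index is unknown is only checked up to the highest shred received so far.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Hypothetical per-slot bookkeeping; not the real `SlotMeta` layout.
struct SlotInfo {
    received: BTreeSet<u64>, // shred indexes already stored
    last_index: Option<u64>, // index of the slot's final shred, if known
    children: Vec<u64>,      // slots that chain to this one
}

#[derive(Debug, PartialEq)]
struct RepairRequest {
    slot: u64,
    shred_index: u64,
}

/// Walks every fork starting at `root` and collects requests for missing
/// shreds, stopping once `max_requests` (the `N` above) have been generated.
fn generate_repairs(
    slots: &HashMap<u64, SlotInfo>,
    root: u64,
    max_requests: usize,
) -> Vec<RepairRequest> {
    let mut requests = Vec::new();
    let mut queue = VecDeque::from([root]);
    while let Some(slot) = queue.pop_front() {
        let Some(info) = slots.get(&slot) else { continue };
        // Upper bound of the range to scan for holes in this slot.
        let upper = info.last_index.or_else(|| info.received.last().copied());
        if let Some(upper) = upper {
            for shred_index in 0..=upper {
                if info.received.contains(&shred_index) {
                    continue;
                }
                if requests.len() >= max_requests {
                    return requests;
                }
                requests.push(RepairRequest { slot, shred_index });
            }
        }
        queue.extend(info.children.iter().copied());
    }
    requests
}
```

A fuller version would order the queue by fork weight and filter target validators by their EpochSlots, as described above; the sketch omits both to keep the hole-detection logic visible.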

Validators should try to send orphan requests to validators who have marked that
orphan as completed in their EpochSlots. If no such validators exist, then
randomly select a validator in a stake-weighted fashion.
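
The peer selection rule above can be sketched as follows. The names `Peer`, `weighted_pick`, and `pick_orphan_peer` are hypothetical, and two details are assumptions: the caller supplies `roll` as a uniformly random number (so the sketch needs no random number crate), and when several validators have marked the orphan as completed, one of them is also chosen by stake weight.

```rust
use std::collections::HashSet;

/// Hypothetical view of a validator as seen through gossip.
struct Peer {
    name: String,
    stake: u64,
    completed_slots: HashSet<u64>, // slots advertised as completed in EpochSlots
}

/// Picks a candidate with probability proportional to its stake.
/// `roll` is assumed to be a uniformly random value supplied by the caller.
fn weighted_pick<'a>(candidates: &[&'a Peer], roll: u64) -> Option<&'a Peer> {
    let total: u64 = candidates.iter().map(|p| p.stake).sum();
    if total == 0 {
        return None;
    }
    let mut target = roll % total;
    for &peer in candidates {
        if target < peer.stake {
            return Some(peer);
        }
        target -= peer.stake;
    }
    None
}

/// Prefers validators that marked `orphan` as completed; otherwise falls
/// back to a stake-weighted choice over all known validators.
fn pick_orphan_peer<'a>(peers: &'a [Peer], orphan: u64, roll: u64) -> Option<&'a Peer> {
    let completed: Vec<&Peer> = peers
        .iter()
        .filter(|p| p.completed_slots.contains(&orphan))
        .collect();
    if completed.is_empty() {
        let all: Vec<&Peer> = peers.iter().collect();
        weighted_pick(&all, roll)
    } else {
        weighted_pick(&completed, roll)
    }
}
```

Taking `roll` modulo the total stake introduces a small bias when the total does not divide the range of `roll` evenly; a production implementation would likely use a proper weighted sampler.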

## Repair Response Protocol

When a validator receives a request for a shred `S`, they respond with the
shred if they have it.

When a validator receives a shred through a repair response, they check
`EpochSlots` to see if <= `1/3` of the network has marked this slot as
completed. If so, they resubmit this shred through its associated turbine
path, but only if this validator has not retransmitted this shred before.
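
The retransmit decision can be sketched as follows. `RetransmitFilter` and `should_retransmit` are hypothetical names, and the sketch assumes that "`1/3` of the network" is measured by stake; the text above does not say whether the fraction is by stake or by validator count.

```rust
use std::collections::HashSet;

/// Hypothetical filter that remembers which repaired shreds this validator
/// has already retransmitted, keyed by (slot, shred index).
#[derive(Default)]
struct RetransmitFilter {
    seen: HashSet<(u64, u64)>,
}

impl RetransmitFilter {
    /// Returns true if the repaired shred should be resubmitted through
    /// turbine: at most 1/3 of the (assumed stake-weighted) network has
    /// marked the slot as completed, and the shred was not sent before.
    fn should_retransmit(
        &mut self,
        slot: u64,
        shred_index: u64,
        completed_stake: u64,
        total_stake: u64,
    ) -> bool {
        // Integer form of completed_stake / total_stake <= 1/3.
        let sparse = (completed_stake as u128) * 3 <= total_stake as u128;
        // `insert` returns false when the key was already present.
        sparse && self.seen.insert((slot, shred_index))
    }
}
```

Because the check short-circuits, a shred is only recorded as retransmitted when the `1/3` condition actually holds, so it remains eligible if the condition is re-evaluated later.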