Closes #697.
Per https://github.com/ZcashFoundation/zebra/issues/697#issuecomment-662742971:
The response to a getblocks message is an inv message with the hashes of the
following blocks. However, inv messages are also sent unsolicited to gossip new
blocks across the network. Normally, this wouldn't be a problem, because for
every other request we filter only for the messages that are relevant to us.
But because the response to a getblocks message is an inv, the network layer
doesn't (and can't) distinguish between the response inv and the unsolicited
inv.
However, there is a mitigation we can apply. Our sync algorithm has two phases:
(1) "ObtainTips", which gets a set of tips to chase down, and (2) "ExtendTips",
which is called repeatedly to extend those tips as far as possible. The
unsolicited inv messages have length 1, but when extending tips we expect to
get more than one hash, so we can reject length-1 responses in ExtendTips in
order to ignore these messages. This way we automatically ignore gossip
messages during initial block sync (while we're extending a tip), but we don't
ignore length-1 responses while trying to obtain tips (while querying the
network for new tips).
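A minimal sketch of that filter, with placeholder types (not the actual syncer code):

```rust
/// Placeholder hash type for this sketch; Zebra has its own block hash type.
#[derive(Debug, Clone)]
pub struct BlockHash(pub [u8; 32]);

/// Illustrative ExtendTips response filter: a single-hash inv is almost
/// certainly unsolicited block gossip rather than a response to our
/// getblocks request, so drop it while extending tips.
pub fn filter_extend_tips_response(hashes: Vec<BlockHash>) -> Option<Vec<BlockHash>> {
    if hashes.len() == 1 {
        None
    } else {
        Some(hashes)
    }
}
```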
Unfortunately, since the Batch wrapper was changed to have a generic error
type, when wrapping it in another Service, nothing constrains the error type,
so we have to specify it explicitly to avoid an inference hole. This is pretty
unergonomic -- from the compiler error message it's very unintuitive that the
right fix is to change `Batch::new` to `Batch::<_, _, SomeError>::new`.
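A stand-alone illustration of the inference hole (this is not the tower-batch code, just the same shape of problem: a generic error parameter that no constructor argument constrains):

```rust
use std::marker::PhantomData;

/// Stand-in for a wrapper whose error type parameter is not constrained by
/// any constructor argument, mirroring the Batch situation.
struct Wrapper<S, E> {
    inner: S,
    _err: PhantomData<E>,
}

impl<S, E> Wrapper<S, E> {
    fn new(inner: S) -> Self {
        Wrapper { inner, _err: PhantomData }
    }
}

fn main() {
    // let w = Wrapper::new("service"); // error: type annotations needed for `E`
    // The unintuitive fix is an explicit turbofish for the error type:
    let w = Wrapper::<_, std::io::Error>::new("service");
    let _ = w.inner;
}
```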
The options are:
1. roll back the changes that make the error type generic, so that the error
type is a concrete type;
2. keep the error type generic but hardcode the error in the default
constructor and add an additional code path that allows overriding the
error.
However, there's a further issue with generic errors: the error type must be
Clone. This requirement comes from the fact that multiple Batch handles have to
share access to errors generated by the inner Batch worker, so there's no way
to work around it. And since almost all error types aren't Clone, there are
fairly few error types that we could actually swap in.
This suggests that with option (2) we would be maintaining extra code to allow
generic errors, but with generic bounds restrictive enough to make generic
error types impractical to use. For this reason I think that (1) is the better
option.
When the connection sees the client_rx channel close, it knows it will never
get any more requests and should terminate. But instead of terminating, it
marked itself as errored, and the error path tries to pull all the outstanding
client requests from the channel in order to fail them before shutting down.
This results in reading from a closed channel, causing a panic. Instead, we now
return cleanly rather than failing (since the channel is closed, we know there
are no outstanding requests).
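A minimal sketch of the fixed event-loop shape (the struct and method names here are illustrative, not Zebra's exact internals):

```rust
use futures::channel::mpsc;
use futures::StreamExt;

struct ClientRequest;

struct Connection {
    client_rx: mpsc::Receiver<ClientRequest>,
}

impl Connection {
    async fn run(mut self) {
        loop {
            match self.client_rx.next().await {
                Some(request) => self.handle_request(request).await,
                // The channel is closed: every sender has been dropped, so no
                // request can be outstanding. Return cleanly instead of taking
                // the error path, which would try to drain requests from the
                // closed channel and panic.
                None => return,
            }
        }
    }

    async fn handle_request(&mut self, _request: ClientRequest) {
        // ... process the request and talk to the peer ...
    }
}
```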
Closes #617.
Closes #698.
The remaining work on the syncer is alluded to in a new comment:
1. Correctly constructing a block locator object (see the sketch below)
2. Detecting when we've stopped making progress syncing and restarting obtain_tips.
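For context on the first item, a block locator is conventionally built from dense hashes near the tip and exponentially sparser entries back toward genesis; a rough sketch of that height selection (not Zebra's implementation) is:

```rust
/// Rough sketch of conventional block-locator height selection (not Zebra's
/// implementation): the most recent heights densely, then exponentially
/// sparser heights back toward genesis, which is always included.
fn block_locator_heights(tip_height: u64) -> Vec<u64> {
    let mut heights = Vec::new();
    let mut step: u64 = 1;
    let mut height = tip_height as i128;

    while height > 0 {
        heights.push(height as u64);
        // After the ten most recent heights, double the step each time.
        if heights.len() >= 10 {
            step *= 2;
        }
        height -= step as i128;
    }
    heights.push(0); // genesis

    heights
}

fn main() {
    println!("{:?}", block_locator_heights(1_000));
}
```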
This fixes a bug introduced when we added heartbeat support. Recall that we
handle the Bitcoin connection state machine on a per-peer basis. Each
connection has a task created from the `Connection` struct, and a `Client:
tower::Service` "frontend" that passes requests to it via a channel. In the
`Connection` event loop, the connection checks whether the request channel has
been closed, indicating no further requests from the `Client`, in which case it
shuts itself down and cleans up resources. This occurs when all of the senders
have been dropped.
However, this behavior broke when we introduced heartbeat support, because we
spawned an additional task to send heartbeat messages along the request
channel. This meant that instead of having a single sender, dropped by the
`Client`, we had two senders: the `Client` and the "shadow client" task that
generates heartbeat messages. So when the `Client` was dropped, there was still
a live sender and the connection was not closed. To fix this, the `Client` now
uses a `oneshot` to shut down its corresponding heartbeat task, which closes
all senders.
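A minimal sketch of that shutdown wiring (the names and the heartbeat interval are illustrative, not Zebra's exact definitions):

```rust
use tokio::sync::{mpsc, oneshot};
use tokio::time::{interval, Duration};

enum Request {
    Heartbeat,
    // ... other peer requests ...
}

struct Client {
    request_tx: mpsc::Sender<Request>,
    // Dropping the Client drops this sender, which completes the heartbeat
    // task's shutdown receiver and stops the "shadow client".
    shutdown_tx: Option<oneshot::Sender<()>>,
}

fn spawn_heartbeat(
    request_tx: mpsc::Sender<Request>,
    mut shutdown_rx: oneshot::Receiver<()>,
) {
    tokio::spawn(async move {
        let mut ticker = interval(Duration::from_secs(60));
        loop {
            tokio::select! {
                // Fires when the Client signals shutdown or is dropped, so the
                // heartbeat sender goes away and the connection sees its
                // request channel close.
                _ = &mut shutdown_rx => break,
                _ = ticker.tick() => {
                    if request_tx.send(Request::Heartbeat).await.is_err() {
                        break; // the connection has gone away
                    }
                }
            }
        }
    });
}
```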
* Add checkpoint list generation scripts
* Limit the checkpoint block data size
* Limit the checkpoint height gap
* Add Mainnet and Testnet checkpoint lists
* Parse hard-coded checkpoint lists
The lists were generated using the following limits:
- at most 256 MB of block data between checkpoints, and
- at most 2000 blocks between checkpoints.
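A minimal sketch of parsing such a list, assuming a plain-text `height hash` per line format (the exact on-disk format and types are Zebra's; this is only illustrative):

```rust
use std::collections::BTreeMap;

/// Illustrative parser for a `height hash` per line checkpoint list.
fn parse_checkpoint_list(text: &str) -> Result<BTreeMap<u32, String>, String> {
    let mut checkpoints = BTreeMap::new();
    for line in text.lines().filter(|l| !l.trim().is_empty()) {
        let mut fields = line.split_whitespace();
        let height: u32 = fields
            .next()
            .ok_or("missing height")?
            .parse()
            .map_err(|e| format!("bad height: {e}"))?;
        let hash = fields.next().ok_or("missing hash")?.to_string();
        checkpoints.insert(height, hash);
    }
    Ok(checkpoints)
}

fn main() {
    // Placeholder hash, not a real block hash.
    let list = "0 0000000000000000000000000000000000000000000000000000000000000000\n";
    println!("{:?}", parse_checkpoint_list(list));
}
```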
* Add skeleton of eventual zebra book
* reorg sections
* restore file and reorg book a little
* try setting up a firebase deployment
* allow firebase ci to work on test
* download mdbook
* fix book path
* use newer version of mdbook
* remove event hook for book branch pre merge
* Apply suggestions from code review
Co-authored-by: Henry de Valence <hdevalence@hdevalence.ca>