This helps prevent overloading the network with too many concurrent
block requests. On a fast network, we're likely to still have enough
room to saturate our bandwidth. In the worst case, with 2MB blocks,
downloading 50 blocks concurrently is 100MB of queued downloads. If we
need to download this in 20 seconds to avoid peer connection timeouts,
the implied worst-case minimum speed is 5MB/s. In practice, this
minimum speed will likely be much lower.
This change explicitly documents cancellation contracts for our Tower services,
and tries to correct a bug in the implementation of the CheckpointVerifier,
which duplicates information from the state service but did not ensure that it
would be kept in sync.
This change has two benefits:
* reduces conflicts with the sled refactor and any replacement
* allows the function to be called independently for testing
`check_contextual_validity` mistakenly used the new block's hash to try
to get the parent block from the state. This caused a panic, because the
new block isn't in the state yet.
Use `StateService::chain` to get the parent block, because we'll be
using `chain` for difficulty adjustment contextual verification anyway.
* Add internal iterator API for accessing relevant chain blocks
* get blocks from all chains in non_finalized state
* Impl FusedIterator for service::Iter
* impl ExactSizedIterator for service::Iter
* let size_hint find heights in side chains
Co-authored-by: teor <teor@riseup.net>
* Add transcript test for requests while state is empty
* Add happy path test for each query once the state is populated
* let populate logic handle out of order blocks
* Add a maximum queued height metric to the finalized state
And rename all the finalized state metrics to contain "finalized".
* Use i32 and -1 instead of Option<Height>
Co-authored-by: Jane Lusby <jlusby42@gmail.com>
This reverts commit 656bd24ba7.
The Hedge middleware keeps a pair of histograms, writing into one in the
current time interval and reading from the previous time interval's
data. This means that the reverted change resulted in doubling all
block downloads until after at least the second measurement interval
(which means that the time measurements are also incorrect, as they're
operating under double the network load...)
* Create and mount persistent disk to store zebrad state, update runner container config to use
* Enable checkpoint sync in zebrad image config
* Lower state memory cache from 500MB to 50MB
* Upgrade host to n2-standard-4
* Bump zebrad-cache disk size to 100GB
* Copy zebrad as the tests are compiled with a hardcoded path to it
* Rename all debug binaries for easy invocation
* Name state cache disk, use the correct path to binaries
* Create volume and all that jazz on instance creation
Otherwise there's a lot of on-instance commands to do that is just handled by this shortcut.
* Explicitly mount the state cache and cleanup test instance
* Wait for zebra-test container to start then attach
* Always clean up even if the tests step fails
* Keep fast sleep but only print 'waiting' once