For all code paths (gossip push, pull, purge, etc.) that remove or
override a crds value, it is necessary to record the hash of the value
purged from the crds table, in order to exclude it from subsequent
pull-requests; otherwise the next pull request will likely return
outdated values, wasting bandwidth:
https://github.com/solana-labs/solana/blob/ed51cde37/core/src/crds_gossip_pull.rs#L486-L491
Currently this bookkeeping is scattered across multiple modules, and
this has caused bugs in the past where purged values were not recorded.
This commit encapsulates the bookkeeping in the crds module, so that any
code path which removes or overrides a crds value also records the hash
of the purged value in place.
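A minimal sketch of the idea, with illustrative types and field names
rather than the actual crds API:

    use std::collections::HashMap;

    struct Hash([u8; 32]);

    struct Crds {
        table: HashMap<String, (Hash, Vec<u8>)>, // label -> (hash, serialized value)
        // Hashes of values purged or overridden, with the purge timestamp,
        // so they can be excluded from subsequent pull-requests.
        purged: Vec<(Hash, u64)>,
    }

    impl Crds {
        fn remove(&mut self, label: &str, now: u64) {
            if let Some((hash, _value)) = self.table.remove(label) {
                // Recorded here, in place, so callers cannot forget to do it.
                self.purged.push((hash, now));
            }
        }
    }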
It is crucial that VersionedCrdsValue::insert_timestamp does not go
backward in time:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L67-L79
Otherwise methods such as get_votes and get_epoch_slots_since will
break, which will break their downstream flow, including vote-listener
and optimistic confirmation:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1197-L1215
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L1274-L1298
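To illustrate why, consider a hypothetical simplification of a
cursor-style query such as get_votes (not the actual implementation):
the caller passes the largest insert_timestamp it has already observed,
and only newer values are returned. If insert_timestamp ever goes
backward, a freshly inserted value can be skipped forever.

    fn get_votes_since(values: &[(u64, Vec<u8>)], since: u64) -> (Vec<&Vec<u8>>, u64) {
        let mut max_ts = since;
        let mut votes = Vec::new();
        for (insert_timestamp, vote) in values {
            // A value inserted with a timestamp <= `since` is never returned.
            if *insert_timestamp > since {
                votes.push(vote);
                max_ts = max_ts.max(*insert_timestamp);
            }
        }
        (votes, max_ts)
    }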
For that, Crds::new_versioned is intended to be called "atomically" with
Crds::insert_versioned (as the comment already says):
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/crds.rs#L126-L129
However, currently this is violated in the code. For example,
filter_pull_responses creates VersionedCrdsValues (with the current
timestamp), then acquires an exclusive lock on gossip, then
process_pull_responses writes those values to the crds table:
https://github.com/solana-labs/solana/blob/ec37a843a/core/src/cluster_info.rs#L2375-L2392
Depending on the workload and lock contention, the insert_timestamps may
well be in the past by the time these values are finally inserted into
gossip.
To avoid such scenarios, this commit:
* removes Crds::new_versioned and Crds::insert_versioned.
* makes VersionedCrdsValue constructor private, only invoked in
Crds::insert, so that insert_timestamp is populated right before
insert.
This will improve insert_timestamp monotonicity as long as Crds::insert
is not called with a stale timestamp. Follow-up commits may further
improve this by calling timestamp() inside Crds::insert, and/or
switching to std::time::Instant, which guarantees monotonicity.
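A sketch of the intended shape, with placeholder types and fields rather
than the real crds data structures:

    use std::collections::HashMap;

    struct CrdsValue {
        label: String,
        // ... actual fields elided ...
    }

    struct VersionedCrdsValue {
        value: CrdsValue,
        insert_timestamp: u64,
    }

    impl VersionedCrdsValue {
        // Private constructor: only reachable from Crds::insert, so the
        // insert_timestamp is taken immediately before the table write.
        fn new(value: CrdsValue, now: u64) -> Self {
            Self {
                value,
                insert_timestamp: now,
            }
        }
    }

    #[derive(Default)]
    struct Crds {
        table: HashMap<String, VersionedCrdsValue>,
    }

    impl Crds {
        pub fn insert(&mut self, value: CrdsValue, now: u64) {
            let versioned = VersionedCrdsValue::new(value, now);
            // Overwrite/compare logic elided; the point is that the timestamp
            // is assigned and the write happens in the same call, under the
            // same lock.
            self.table.insert(versioned.value.label.clone(), versioned);
        }
    }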
The number of parity coding shreds is always less than the number of
data shreds in FEC blocks:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L719
Data shreds are batched in chunks of 32 shreds each:
https://github.com/solana-labs/solana/blob/6907a2366/ledger/src/shred.rs#L714
However, the very last batch of data shreds in a slot can be small, in
which case the loss rate can be exacerbated.
This commit expands the number of coding shreds in a slot's last FEC
block to 64 minus the number of data shreds, so that every FEC block
contains 64 data and parity coding shreds in total.
As a consequence, the last FEC block has more parity coding shreds than
data shreds, so for some shred indices we will have a coding shred but
no data shred. This should not cause any kind of overlapping FEC blocks
as in:
https://github.com/solana-labs/solana/pull/10095
since this is done only for the very last batch in a slot, and the next
slot will reset the shred index.
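A sketch of the resulting parity count per FEC block; the constant and
function names here are illustrative, not the actual shred.rs API:

    const DATA_SHREDS_PER_FEC_BLOCK: usize = 32;

    // Number of parity coding shreds to generate for a batch of data shreds.
    fn num_coding_shreds(num_data: usize, is_last_in_slot: bool) -> usize {
        if is_last_in_slot {
            // Expand the last FEC block so that data + coding always totals
            // 2 * DATA_SHREDS_PER_FEC_BLOCK = 64, even if the final data
            // batch is small.
            2 * DATA_SHREDS_PER_FEC_BLOCK - num_data
        } else {
            // For full batches, the coding count does not exceed the data
            // count; shown here as 1:1 for simplicity.
            num_data
        }
    }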
* Track transaction check time separately from account loads
* banking packet process metrics
* Remove signature clone in status cache lookup
* Reduce allocations when converting packets to transactions
* Add blake3 hash of transaction messages in status cache
* Bug fixes
* fix tests and run fmt
* Address feedback
* fix simd tx entry verification
* Fix rebase
* Feedback
* clean up
* Add tests
* Remove feature switch and fall back to signature check
* Bump programs/bpf Cargo.lock
* clippy
* nudge benches
* Bump `BankSlotDelta` frozen ABI hash
* Add blake3 to sdk/programs/Cargo.lock
* nudge bpf tests
* short circuit status cache checks
Co-authored-by: Trent Nelson <trent@solana.com>
In several places in the gossip code, the entire crds table is scanned
only to extract nodes' contact infos. Currently on mainnet, the crds
table has ~70k entries, while there are only ~470 nodes, so the full
table scan is inefficient. Instead, we can maintain an index of just the
nodes' contact infos.
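A minimal sketch of keeping such a secondary index alongside the table,
so that node queries do not scan every entry; the names and types are
illustrative:

    use std::collections::{HashMap, HashSet};

    #[derive(Default)]
    struct Crds {
        table: HashMap<String, Vec<u8>>, // label -> serialized value (placeholder)
        nodes: HashSet<String>,          // labels of ContactInfo values only
    }

    impl Crds {
        fn insert(&mut self, label: String, value: Vec<u8>, is_contact_info: bool) {
            if is_contact_info {
                self.nodes.insert(label.clone());
            }
            self.table.insert(label, value);
        }

        // Iterate over contact infos without scanning the ~70k-entry table.
        fn get_nodes(&self) -> Vec<&Vec<u8>> {
            self.nodes
                .iter()
                .filter_map(|label| self.table.get(label))
                .collect()
        }
    }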
Based on run-time profiles, the majority of the time in
new_pull_requests is spent building bloom filters, in hashing and
bit-vec ops.
This commit builds crds filters in parallel using rayon constructs. The
added benchmark shows ~5x speedup (4-core machine, 8 threads).
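A rough sketch of the parallelization pattern (rayon fold + reduce); the
real code adds value hashes into bloom filters rather than the plain
vectors used here:

    use rayon::prelude::*;

    // Partition value hashes into buckets keyed by their first mask_bits
    // bits, in parallel. Assumes 0 < mask_bits and a small bucket count.
    fn partition_hashes(hashes: &[u64], mask_bits: u32) -> Vec<Vec<u64>> {
        let num_buckets = 1usize << mask_bits;
        hashes
            .par_iter()
            .fold(
                || vec![Vec::new(); num_buckets],
                |mut buckets, hash| {
                    let index = (hash >> (64 - mask_bits)) as usize;
                    buckets[index].push(*hash);
                    buckets
                },
            )
            .reduce(
                || vec![Vec::new(); num_buckets],
                |mut left, right| {
                    // Merge the per-thread partial results.
                    for (l, r) in left.iter_mut().zip(right) {
                        l.extend(r);
                    }
                    left
                },
            )
    }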
* Save/restore Tower
* Avoid unwrap()
* Rebase cleanups
* Forcibly pass test
* Correct reconciliation of votes after validator resume
* d b g
* Add more tests
* fsync and fix test
* Add test
* Fix fmt
* Debug
* Fix tests...
* save
* Clarify error message and code cleaning around it
* Move most of code out of tower save hot codepath
* Proper comment for the lack of fsync on tower
* Clean up
* Clean up
* Simpler type alias
* Manage tower-restored ancestor slots without banks
* Add comment
* Extract long code blocks...
* Add comment
* Simplify returned tuple...
* Tweak too aggressive log
* Fix typo...
* Add test
* Update comment
* Improve test to require non-empty stray restored slots
* Measure tower save and dump all tower contents
* Log adjust and add threshold related assertions
* cleanup adjust
* Properly lower stray restored slots priority...
* Rust fmt
* Fix test....
* Clarify comments a bit and add TowerError::TooNew
* Further clean-up around TowerError
* Truly create ancestors by excluding last vote slot
* Add comment for stray_restored_slots
* Add comment for stray_restored_slots
* Use BTreeSet
* Consider root_slot into post-replay adjustment
* Tweak logging
* Add test for stray_restored_ancestors
* Reorder some code
* Better names for unit tests
* Add frozen_abi to SavedTower
* Fold long lines
* Tweak stray ancestors and too old slot history
* Re-adjust error condition of too old slot history
* Test normal ancestors is checked before stray ones
* Fix conflict, update tests, adjust behavior a bit
* Fix test
* Address review comments
* Last touch!
* Immediately after creating cleaning pr
* Revert stray slots
* Revert comment...
* Report error as metrics
* Revert not to panic! and ignore unfixable test...
* Normalize lockouts.root_slot more strictly
* Add comments for panic! and more assertions
* Proper initialize root without vote account
* Clarify code and comments based on review feedback
* Fix rebase
* Further simplify based on assured tower root
* Reorder code for more readability
Co-authored-by: Michael Vines <mvines@gmail.com>
filter_crds_values checks every crds filter against every hash value:
https://github.com/solana-labs/solana/blob/ee646aa7/core/src/crds_gossip_pull.rs#L432
which can be inefficient if the filter's bit-mask only matches a small
portion of the entire crds table.
This commit shards crds values into separate tables based on the first
shard_bits bits of their hash. Given a (mask, mask_bits) filter,
filtering crds can then be done by inspecting only the relevant shards.
If CrdsFilter.mask_bits <= shard_bits, then precisely the crds values
which match the (mask, mask_bits) bit pattern are traversed.
If CrdsFilter.mask_bits > shard_bits, then approximately only
1/2^shard_bits of the crds values are inspected.
Benchmarking on a GCE cluster of 20 nodes, I see ~10% improvement in the
generate_pull_responses metric; with larger clusters, the crds table and
2^mask_bits are both larger, so the impact should be more significant.
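A simplified sketch of the sharding idea for the mask_bits <= shard_bits
case; the real CrdsShards type indexes entries of the crds table rather
than the raw hashes used here:

    use std::collections::HashSet;

    struct Shards {
        shard_bits: u32,
        // shards[s] holds the hashes whose first shard_bits bits equal s.
        shards: Vec<HashSet<u64>>,
    }

    impl Shards {
        fn new(shard_bits: u32) -> Self {
            Self {
                shard_bits,
                shards: vec![HashSet::new(); 1usize << shard_bits],
            }
        }

        fn insert(&mut self, hash: u64) {
            let shard = (hash >> (64 - self.shard_bits)) as usize;
            self.shards[shard].insert(hash);
        }

        // All hashes whose first mask_bits bits equal `prefix`, for
        // mask_bits <= shard_bits: only the matching shards are visited
        // instead of the entire table.
        fn find(&self, prefix: u64, mask_bits: u32) -> impl Iterator<Item = u64> + '_ {
            let start = (prefix << (self.shard_bits - mask_bits)) as usize;
            let end = start + (1usize << (self.shard_bits - mask_bits));
            self.shards[start..end].iter().flatten().copied()
        }
    }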
* Make Message::new_with_payer the default constructor
* Remove Transaction::new_[un]signed_instructions
These guess the fee-payer instead of stating it explicitly
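A usage sketch of the explicit-payer style; exact constructor signatures
may differ across SDK versions, and the program and account keys below
are placeholders:

    use solana_sdk::{
        instruction::{AccountMeta, Instruction},
        message::Message,
        pubkey::Pubkey,
    };

    fn build_message(program_id: Pubkey, payer: Pubkey, account: Pubkey) -> Message {
        let instruction = Instruction::new_with_bincode(
            program_id,
            &(), // instruction data (placeholder)
            vec![AccountMeta::new(account, false)],
        );
        // The fee-payer is stated explicitly instead of being guessed from
        // the instructions' account keys.
        Message::new(&[instruction], Some(&payer))
    }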