record_labels returns all the possible labels for a record identified by
a pubkey, used in updating timestamp of crds values:
https://github.com/solana-labs/solana/blob/1792100e2/core/src/crds_value.rs#L560-L577https://github.com/solana-labs/solana/blob/1792100e2/core/src/crds.rs#L240-L251
The code relies on CrdsValueLabel to be limited to a small deterministic
set of possible values for a fixed pubkey. As we expand crds values to
include duplicate shreds, this limits what the duplicate proofs can be
keyed by in the table.
In addition the computation of these labels is inefficient and will
become more so as duplicate shreds and more types of crds values are
added. An alternative is to maintain an index of all crds values
associated with a pubkey.
If a node "a" receives instance-info from node "b1" it will override any
instance-info associated with "b1" pubkey in its crds table. This makes
it less likely that when "b1" receives crds values from "a" (either
through pull or push), it sees other instances of itself (because node
"a" discarded them when it received "b1" instance info).
In order for the crds table to contain all instance-info associated with
the same pubkey at the same time, we need to add the instance tokens to
the keys in the crds table (i.e. the CrdsValueLabel).
process_pull_requests acquires a write lock on crds table to update
records timestamp for each of the pull-request callers:
https://github.com/solana-labs/solana/blob/3087c9049/core/src/crds_gossip_pull.rs#L287-L300
However, pull-requests overlap a lot in callers and this function ends
up doing a lot of redundant duplicate work.
This commit obtains unique callers before acquiring an exclusive lock on
crds table.
Validator logs show that prune messages are dropped because they exceed
packet data size:
https://github.com/solana-labs/solana/blob/f25c969ad/perf/src/packet.rs#L90-L92
This can exacerbate gossip traffic by redundantly increasing push
messages across network. The workaround is to break prunes into smaller
chunks and send over in multiple messages.
In several places in gossip code, the entire crds table is scanned only
to filter out nodes' contact infos. Currently on mainnet, crds table is
of size ~70k, while there are only ~470 nodes. So the full table scan is
inefficient. Instead we may maintain an index of only nodes' contact
infos.
Current code only returns values which are expired based on the default
timeout. Example from the added unit test:
- value inserted at time 0
- pubkey specific timeout = 1
- default timeout = 3
Then at now = 2, the value is expired, but the function fails to return
the value because it compares with the default timeout.
* Accounts hash consistency halting
* Add option to inject account hash faults for testing.
Enable option in local cluster test to see that node halts.
* Update epoch slots to include all missing slots
* new test for compress/decompress
* address review comments
* limit cache based on size, instead of comparing roots
* Fix repair when most peers are incapable of serving requests
* Add a test for getting the lowest slot in blocktree
* Replace some more u64s with Slot
* vote array
wip
wip
wip
update
gossip index should match tower index
tests build
clippy
test index after expired vote
test
bank specific last vote sync time
* verify
* we are likely to see many more warnings about old votes now
* Insert data shreds in blocktree and database
* Integrate data shreds with rest of the code base
* address review comments, and some clippy fixes
* Fixes to some tests
* more test fixes
* ignore some local cluster tests
* ignore replicator local cluster tests
* Add signature to blob
* Change Signable trait to support returning references to signable data
* Add signing to broadcast
* Verify signatures in window_service
* Add testing for signatures to erasure
* Add RPC for getting current slot, consume RPC call in test_repairman_catchup for more deterministic results
* Add Epoch Slots to gossip
* Add new gossip structure to support Repair
* remove unnecessary clones
* Setup dummy fast repair in repair_service
* PR comments
* Rename userdata to data
Instead of saying "userdata", which is ambiguous and imprecise,
say "instruction data" or "account data".
Also, add `ProgramError::InvalidInstructionData`
Fixes#2761