record_labels returns all possible labels for a record identified by a
pubkey, and is used when updating the timestamps of crds values:
https://github.com/solana-labs/solana/blob/1792100e2/core/src/crds_value.rs#L560-L577
https://github.com/solana-labs/solana/blob/1792100e2/core/src/crds.rs#L240-L251
The code relies on CrdsValueLabel being limited to a small, deterministic
set of possible values for a fixed pubkey. As we expand crds values to
include duplicate shreds, this limits what the duplicate proofs can be
keyed by in the table.
In addition, the computation of these labels is inefficient and will
become more so as duplicate shreds and more types of crds values are
added. An alternative is to maintain an index of all crds values
associated with a pubkey, as sketched below.
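A minimal sketch of such a pubkey index (type names and layout here are
illustrative, not the actual crate API):

    use std::collections::{HashMap, HashSet};

    type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

    // Hypothetical index: for each pubkey, the positions of its values in
    // the crds table, so all records of a node can be found without
    // enumerating every possible CrdsValueLabel.
    #[derive(Default)]
    struct CrdsRecords {
        records: HashMap<Pubkey, HashSet<usize>>,
    }

    impl CrdsRecords {
        fn insert(&mut self, pubkey: Pubkey, index: usize) {
            self.records.entry(pubkey).or_default().insert(index);
        }

        fn remove(&mut self, pubkey: &Pubkey, index: usize) {
            if let Some(indices) = self.records.get_mut(pubkey) {
                indices.remove(&index);
                if indices.is_empty() {
                    self.records.remove(pubkey);
                }
            }
        }

        // All table positions associated with a pubkey.
        fn get<'a>(&'a self, pubkey: &Pubkey) -> impl Iterator<Item = usize> + 'a {
            self.records.get(pubkey).into_iter().flatten().copied()
        }
    }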
* Move bank drop to AccountsBackgroundService
* Send to ABS on drop instead, which protects against banks being dropped elsewhere (see the sketch below)
* Fix Abi
* test
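A minimal sketch of the drop-to-background pattern (Bank and the service
loop are stand-ins, not the actual runtime code):

    use std::sync::mpsc::{Receiver, Sender};

    struct Bank; // stand-in for the real Bank

    // Instead of running the (expensive) drop on whichever thread happens
    // to release the last reference, ship the bank to a background thread.
    struct BankWithDropCallback {
        bank: Option<Bank>,
        drop_sender: Sender<Bank>,
    }

    impl Drop for BankWithDropCallback {
        fn drop(&mut self) {
            if let Some(bank) = self.bank.take() {
                // If the service has exited, the send fails and the bank
                // is dropped right here as a fallback.
                let _ = self.drop_sender.send(bank);
            }
        }
    }

    // AccountsBackgroundService-like loop: the receiving end performs the
    // actual drops off the critical path.
    fn background_drops(receiver: Receiver<Bank>) {
        for bank in receiver {
            drop(bank);
        }
    }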
Co-authored-by: Carl Lin <carl@solana.com>
* feat: store pre / post token balances
* move helper functions into separate include
* move token balance functionality to transaction-status crate
* fix blockstore processor test
* fix bigtable legacy test
* add caching to decimals (sketched below)
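A minimal sketch of the decimals cache (the fetch callback is a
hypothetical stand-in for deserializing the mint account):

    use std::collections::HashMap;

    type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

    // Cache mint -> decimals so each mint account is inspected at most
    // once while recording pre/post token balances for a batch of
    // transactions.
    fn get_decimals_cached(
        mint: &Pubkey,
        cache: &mut HashMap<Pubkey, u8>,
        fetch: impl Fn(&Pubkey) -> Option<u8>, // e.g. read the mint account
    ) -> Option<u8> {
        if let Some(&decimals) = cache.get(mint) {
            return Some(decimals);
        }
        let decimals = fetch(mint)?;
        cache.insert(*mint, decimals);
        Some(decimals)
    }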
If a node "a" receives instance-info from node "b1", it will override
any instance-info associated with "b1"'s pubkey in its crds table. This
makes it less likely that when "b1" receives crds values from "a"
(either through pull or push), it sees other instances of itself
(because node "a" discarded them when it received "b1"'s instance-info).
In order for the crds table to contain all instance-info associated with
the same pubkey at the same time, we need to add the instance tokens to
the keys in the crds table (i.e. the CrdsValueLabel), as sketched below.
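A sketch of the key change (the token type and variant layout are
illustrative; other variants are elided):

    type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

    // Before: NodeInstance(Pubkey), so a new instance-info overrides any
    // previous one for the same node.
    // After: the random instance token participates in the key, so
    // multiple instance-infos for the same pubkey coexist in the table.
    #[derive(PartialEq, Eq, Hash)]
    enum CrdsValueLabel {
        // ... other variants elided ...
        NodeInstance(Pubkey, u64 /* instance token */),
    }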
* Adds a CLI option to the validator to enable just-in-time compilation of BPF.
* Refactor to use bpf_loader_program instead of feature_set to pass the JIT flag from the validator CLI to the executor.
Gossip and other places repeatedly de-serialize the vote-state stored in
vote accounts. Ideally, the first de-serialization should cache the
result.
This commit adds a new VoteAccount type which lazily de-serializes
VoteState from the Account data and caches the result internally, as
sketched below.
Serialize and Deserialize traits are manually implemented to match
existing code. So, despite changes to frozen_abi, this commit should be
backward compatible.
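A minimal sketch of the lazy caching (stand-in types; the real code
deserializes with bincode and also caches failures):

    use std::sync::RwLock;

    struct Account { data: Vec<u8> } // stand-in for the real Account
    #[derive(Clone)]
    struct VoteState; // stand-in for the real VoteState

    fn deserialize_vote_state(_data: &[u8]) -> Option<VoteState> {
        Some(VoteState) // stand-in for the expensive deserialization
    }

    struct VoteAccount {
        account: Account,
        vote_state: RwLock<Option<VoteState>>, // filled on first access
    }

    impl VoteAccount {
        fn vote_state(&self) -> Option<VoteState> {
            // Fast path: reuse the result of the first de-serialization.
            if let Some(state) = self.vote_state.read().unwrap().as_ref() {
                return Some(state.clone());
            }
            // Slow path: deserialize once and cache the result.
            let state = deserialize_vote_state(&self.account.data)?;
            *self.vote_state.write().unwrap() = Some(state.clone());
            Some(state)
        }
    }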
* runtime: Add `FeeCalculator` resolution method to `HashAgeKind`
* runtime: Plumb fee-collected accounts for failed nonce tx rollback
* runtime: Use fee-collected nonce/fee account for nonced TX error rollback
* runtime: Add test for failed nonced TX accounts rollback
* Fee payer test
* fixup: replace nonce account when it pays the fee
* fixup: nonce fee-payer collect test
* fixup: fixup: clippy/fmt for replace...
* runtime: Test for `HashAgeKind::fee_calculator()`
* Clippy
Co-authored-by: Trent Nelson <trent@solana.com>
process_pull_requests acquires a write lock on the crds table to update
record timestamps for each of the pull-request callers:
https://github.com/solana-labs/solana/blob/3087c9049/core/src/crds_gossip_pull.rs#L287-L300
However, pull-requests overlap heavily in callers, so this function ends
up doing a lot of redundant work.
This commit obtains the unique callers before acquiring an exclusive
lock on the crds table, as sketched below.
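A minimal sketch of the dedup-before-lock pattern (the table type is a
stand-in for the crds table):

    use std::collections::{HashMap, HashSet};
    use std::sync::RwLock;

    type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

    // Deduplicate callers before taking the exclusive lock, so the write
    // lock is held once per unique caller rather than once per request.
    fn update_record_timestamps(
        crds: &RwLock<HashMap<Pubkey, u64>>, // stand-in for the crds table
        callers: impl IntoIterator<Item = Pubkey>,
        now: u64,
    ) {
        let unique: HashSet<Pubkey> = callers.into_iter().collect();
        let mut table = crds.write().unwrap(); // single lock acquisition
        for caller in unique {
            table.insert(caller, now); // update_record_timestamp stand-in
        }
    }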
* Fix fragile tests in prep of stake rewrite pr
* Restore BOOTSTRAP_VALIDATOR_LAMPORTS where appropriate
* Further clean up
* Further clean up
* Align with other call site change
* Remove false warn!
* fix ci!
Validator logs show that prune messages are dropped because they exceed
the packet data size:
https://github.com/solana-labs/solana/blob/f25c969ad/perf/src/packet.rs#L90-L92
This can exacerbate gossip traffic by redundantly increasing push
messages across the network. The workaround is to break prunes into
smaller chunks and send them in multiple messages, as sketched below.
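A minimal sketch of the chunking (the chunk size constant is
illustrative; in practice it is derived from the packet size limit minus
the prune-message header):

    type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

    const MAX_PRUNES_PER_MESSAGE: usize = 64; // illustrative

    // One prune message per chunk, each guaranteed to fit in a packet.
    fn split_prunes(prunes: Vec<Pubkey>) -> Vec<Vec<Pubkey>> {
        prunes
            .chunks(MAX_PRUNES_PER_MESSAGE)
            .map(|chunk| chunk.to_vec())
            .collect()
    }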
* Add TestValidator::new_with_fees constructor, and warning for low bootstrap_validator_lamports
* Add logging to solana-tokens integration test to help catch low bootstrap_validator_lamports in the future
* Reasonable TestValidator mint_lamports
split_gossip_messages:
https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L1536-L1574
splits crds-values into chunks that fit into a gossip packet. However, it
uses a global upper bound on the header size across all protocols:
https://github.com/solana-labs/solana/blob/a97c04b40/core/src/cluster_info.rs#L90-L93
This can be wasteful, as the specific gossip protocol can have a smaller
header than this upper bound (e.g. Protocol::PushMessage is 170 bytes
smaller). Packing more crds-values into one gossip packet avoids the
overhead of separate packets and reduces the total number of bytes sent
over the wire.
This commit updates the splitting function to take a max-chunk-size
argument. At the call-site, this value is set based on the size of the
specific protocol over which the values are sent, as sketched below.
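A minimal sketch of size-aware splitting (values are represented by
their serialized sizes only):

    // Greedily pack values into chunks of at most max_chunk_size bytes,
    // where max_chunk_size = packet size limit - header size of the
    // specific protocol the values are sent over.
    fn split_gossip_messages(
        max_chunk_size: usize,
        value_sizes: Vec<usize>, // stand-in for serialized crds values
    ) -> Vec<Vec<usize>> {
        let mut chunks = Vec::new();
        let mut chunk = Vec::new();
        let mut chunk_size = 0;
        for size in value_sizes {
            // Close the current chunk if this value would overflow it; an
            // oversized value still gets a chunk of its own.
            if chunk_size + size > max_chunk_size && !chunk.is_empty() {
                chunks.push(std::mem::take(&mut chunk));
                chunk_size = 0;
            }
            chunk.push(size);
            chunk_size += size;
        }
        if !chunk.is_empty() {
            chunks.push(chunk);
        }
        chunks
    }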
In several places in the gossip code, the entire crds table is scanned
just to extract the nodes' contact-infos. Currently on mainnet, the crds
table holds ~70k values while there are only ~470 nodes, so the full
table scan is inefficient. Instead, we can maintain an index of only the
nodes' contact-infos.
The --rpc-pubsub-enable-vote-subscription flag may be used to enable the
vote subscription.
The current vote subscription is problematic because it emits a
notification for *every* vote, so hundreds per second in a real cluster.
Critically, it is also missing information about *who* is voting,
rendering all those notifications practically useless.
Until these two issues can be resolved, the vote subscription is not
much more than a potential DoS vector.
* Discard pre hard fork persisted tower if hard-forking
* Relax config.require_tower
* Add cluster test
* nits
* Remove unnecessary check
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
Co-authored-by: Carl Lin <carl@solana.com>
* Fix slow/stuck unstaking due to toggling in epoch
* nits
* nits
* Add stake_program_v2 feature status check to cli
Co-authored-by: Tyera Eulberg <tyera@solana.com>
Packet::from_data ignores serialization errors:
https://github.com/solana-labs/solana/blob/d08c3232e/sdk/src/packet.rs#L42-L48
This is likely never useful: the packet is still sent over the wire,
consuming bandwidth, but at the receiving end it will either fail to
deserialize or be invalid.
This commit propagates the errors out of the function to the call-site,
allowing the call-site to handle them, as sketched below.
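A minimal sketch of the fix (packet layout and the size constant are
illustrative):

    use std::io::Cursor;

    const PACKET_DATA_SIZE: usize = 1232; // illustrative

    struct Packet {
        data: [u8; PACKET_DATA_SIZE],
        size: usize,
    }

    impl Packet {
        // Propagate serialization errors (including data that does not
        // fit in a packet) instead of ignoring them and queuing a bogus
        // packet for sending.
        fn from_data<T: serde::Serialize>(data: &T) -> Result<Self, bincode::Error> {
            let mut packet = Packet {
                data: [0u8; PACKET_DATA_SIZE],
                size: 0,
            };
            let mut wr = Cursor::new(&mut packet.data[..]);
            bincode::serialize_into(&mut wr, data)?;
            packet.size = wr.position() as usize;
            Ok(packet)
        }
    }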
* Fix tower/blockstore unsync due to external causes
* Add and clean up long comments
* Clean up test
* Comment about warped_slot_history
* Run test_future_tower with master-only/master-slave
* Update comments about false leader condition
https://hackerone.com/reports/991106
> It’s possible to use UDP gossip protocol to amplify DDoS attacks. An attacker
> can spoof IP address in UDP packet when sending PullRequest to the node.
> There's no any validation if provided source IP address is not spoofed and
> the node can send much larger PullResponse to victim's IP. As I checked,
> PullRequest is about 290 bytes, while PullResponse is about 10 kB. It means
> that amplification is about 34x. This way an attacker can easily perform DDoS
> attack both on Solana node and third-party server.
>
> To prevent it, need for example to implement ping-pong mechanism similar as
> in Ethereum: Before accepting requests from remote client needs to validate
> his IP. Local node sends Ping packet to the remote node and it needs to reply
> with Pong packet that contains hash of matching Ping packet. Content of Ping
> packet is unpredictable. If hash from Pong packet matches, local node can
> remember IP where Ping packet was sent as correct and allow further
> communication.
>
> More info:
> https://github.com/ethereum/devp2p/blob/master/discv4.md#endpoint-proof
> https://github.com/ethereum/devp2p/blob/master/discv4.md#wire-protocol
The commit adds a PingCache, which maintains records of remote nodes
which have returned a valid response to a ping message, as well as
in-flight ping messages pending a pong response from the remote node.
When handling pull-requests, those from addresses which have not passed
the ping-pong check are filtered out, and ping packets are additionally
added for addresses which need to be (re)verified. A sketch of the cache
follows.
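A minimal sketch of the cache (in the real protocol the pong carries a
hash of the ping token; matching a raw token here stands in for that):

    use std::collections::HashMap;
    use std::net::SocketAddr;
    use std::time::{Duration, Instant};

    struct PingCache {
        ttl: Duration, // how long a verified address stays valid
        verified: HashMap<SocketAddr, Instant>, // addr -> last valid pong
        pending: HashMap<SocketAddr, u64>, // addr -> in-flight ping token
    }

    impl PingCache {
        // Returns (check_passed, should_send_ping).
        fn check(&mut self, addr: SocketAddr, token: u64, now: Instant) -> (bool, bool) {
            let verified = matches!(
                self.verified.get(&addr),
                Some(&t) if now.saturating_duration_since(t) < self.ttl
            );
            let ping = !verified && !self.pending.contains_key(&addr);
            if ping {
                self.pending.insert(addr, token); // matched by the pong
            }
            (verified, ping)
        }

        fn add_pong(&mut self, addr: SocketAddr, token: u64, now: Instant) {
            if self.pending.get(&addr) == Some(&token) {
                self.pending.remove(&addr);
                self.verified.insert(addr, now);
            }
        }
    }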
mark_pull_request_creation_time requires an exclusive lock on gossip:
https://github.com/solana-labs/solana/blob/16944e218/core/src/cluster_info.rs#L1547-L1548
The current code redundantly marks each peer once for each request.
There are at most only 2 unique peers, whereas there are hundreds of
requests for each, so the lock is acquired hundreds of times longer than
necessary.
ClusterInfo::process_packets handles incoming packets in a thread_pool:
https://github.com/solana-labs/solana/blob/87311cce7/core/src/cluster_info.rs#L2118-L2134
However, run-time profiling shows that the threads are not well utilized
and a lot of the processing is done sequentially.
This commit redistributes the work done in parallel, as sketched below.
Testing on a gce cluster shows a 20%+ improvement in processing gossip
packets, with much smaller variance.
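A minimal sketch of the redistribution (raw byte vectors stand in for
gossip packets):

    use rayon::prelude::*;

    // Flatten all packets across batches into one parallel iterator so
    // work is spread evenly over the pool, instead of parallelizing per
    // batch (batches can be tiny, leaving most threads idle).
    fn process_packets(batches: &[Vec<Vec<u8>>], thread_pool: &rayon::ThreadPool) {
        thread_pool.install(|| {
            batches.par_iter().flatten().for_each(|packet| {
                // deserialize and handle the gossip message in this packet
                let _ = packet.len();
            });
        });
    }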
* Follow up to persistent tower
* Ignore for now...
* Hard-code validator identities for easy reasoning
* Add a test for opt. conf violation without tower
* Fix compile with rust < 1.47
* Remove unused method
* Move more assert tweaks to the assert pr
* Add comments
* Clean up
* Clean the test addressing various review comments
* Clean up a bit
* introduce store program logs in blockstore / bigtable
* fix test, transaction logs created for successful transactions
* fix test for legacy bincode implementation around log_messages
* only api nodes should record logs
* truncate transaction logs to 100KB (sketched below)
* refactor log truncate for improved coverage
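A minimal sketch of the truncation (the constant name is illustrative):

    const MAX_LOG_BYTES: usize = 100 * 1000; // 100KB

    // Truncate to the byte limit, backing up to a char boundary so the
    // result stays valid UTF-8, and mark the log as truncated.
    fn truncate_log(mut log: String) -> String {
        if log.len() > MAX_LOG_BYTES {
            let mut end = MAX_LOG_BYTES;
            while !log.is_char_boundary(end) {
                end -= 1;
            }
            log.truncate(end);
            log.push_str(" <truncated>");
        }
        log
    }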
crds_gossip tests start large networks which, with large thread-pools,
exhaust system resources and cause failures in ci tests:
https://buildkite.com/solana-labs/solana/builds/31953
The commit limits the size of thread-pools in the tests, as sketched
below.
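For example, a bounded pool can be built explicitly instead of relying
on rayon's default of one thread per core for every simulated node:

    use rayon::ThreadPoolBuilder;

    // A small fixed-size pool for tests that spin up many gossip nodes.
    fn test_thread_pool() -> rayon::ThreadPool {
        ThreadPoolBuilder::new()
            .num_threads(2) // bounded, instead of num_cpus per node
            .build()
            .unwrap()
    }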
* Include post balance information for rewards
* Add post-balance to stored Reward struct
* Handle extended Reward in bigtable
Co-authored-by: Michael Vines <mvines@gmail.com>
Run-time profiles show that the majority of new_pull_requests' time is
spent building bloom filters, in hashing and bit-vec ops.
This commit builds the crds filters in parallel using rayon constructs.
The added benchmark shows a ~5x speedup (4-core machine, 8 threads).
Current code only returns values which are expired based on the default
timeout. Example from the added unit test:
- value inserted at time 0
- pubkey specific timeout = 1
- default timeout = 3
Then at now = 2 the value is expired, but the function fails to return
it because it compares against the default timeout (see the sketch
below).
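A minimal sketch of the corrected check, matching the example above:

    use std::collections::HashMap;

    type Pubkey = [u8; 32]; // stand-in for solana_sdk::pubkey::Pubkey

    // Prefer the pubkey-specific timeout; fall back to the default only
    // for pubkeys with no explicit entry. With the example above
    // (inserted at 0, pubkey timeout 1, now = 2) this returns true.
    fn is_expired(
        timeouts: &HashMap<Pubkey, u64>,
        default_timeout: u64,
        pubkey: &Pubkey,
        inserted_at: u64,
        now: u64,
    ) -> bool {
        let timeout = *timeouts.get(pubkey).unwrap_or(&default_timeout);
        now.saturating_sub(inserted_at) >= timeout
    }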
* Add service to track the most recent optimistically confirmed bank
* Plumb service into ClusterInfoVoteListener and ReplayStage
* Clean up test
* Use OptimisticallyConfirmedBank in RPC
* Remove superfluous notifications from RpcSubscriptions
* Use crossbeam to avoid mpsc recv_timeout panic
* Review comments
* Remove superfluous last_checked_slots, but pass in OptimisticallyConfirmedBank for complete correctness
* Add blockstore column to store performance sampling data
* introduce timer and write performance metrics to blockstore
* introduce getRecentPerformanceSamples rpc
* only run on rpc nodes enabled with transaction history
* add unit tests for get_recent_performance_samples
* remove RpcResponse from rpc call
* refactor to use Instant::now and elapsed for timer
* switch to root bank and ensure subtraction does not go negative
* Add PerfSamples to purge/compaction
* remove duplicate constants
Co-authored-by: Tyera Eulberg <tyera@solana.com>
* Save/restore Tower
* Avoid unwrap()
* Rebase cleanups
* Forcibly pass test
* Correct reconciliation of votes after validator resume
* d b g
* Add more tests
* fsync and fix test
* Add test
* Fix fmt
* Debug
* Fix tests...
* save
* Clarify error message and code cleaning around it
* Move most of code out of tower save hot codepath
* Proper comment for the lack of fsync on tower
* Clean up
* Clean up
* Simpler type alias
* Manage tower-restored ancestor slots without banks
* Add comment
* Extract long code blocks...
* Add comment
* Simplify returned tuple...
* Tweak too aggressive log
* Fix typo...
* Add test
* Update comment
* Improve test to require non-empty stray restored slots
* Measure tower save and dump all tower contents
* Log adjust and add threshold related assertions
* cleanup adjust
* Properly lower stray restored slots priority...
* Rust fmt
* Fix test....
* Clarify comments a bit and add TowerError::TooNew
* Further clean-up around TowerError
* Truly create ancestors by excluding last vote slot
* Add comment for stray_restored_slots
* Use BTreeSet
* Consider root_slot into post-replay adjustment
* Tweak logging
* Add test for stray_restored_ancestors
* Reorder some code
* Better names for unit tests
* Add frozen_abi to SavedTower
* Fold long lines
* Tweak stray ancestors and too old slot history
* Re-adjust error condition of too old slot history
* Test normal ancestors is checked before stray ones
* Fix conflict, update tests, adjust behavior a bit
* Fix test
* Address review comments
* Last touch!
* Immediately after creating cleaning pr
* Revert stray slots
* Revert comment...
* Report error as metrics
* Revert not to panic! and ignore unfixable test...
* Normalize lockouts.root_slot more strictly
* Add comments for panic! and more assertions
* Properly initialize root without vote account
* Clarify code and comments based on review feedback
* Fix rebase
* Further simplify based on assured tower root
* Reorder code for more readability
Co-authored-by: Michael Vines <mvines@gmail.com>
filter_crds_values checks every crds filter against every hash value:
https://github.com/solana-labs/solana/blob/ee646aa7/core/src/crds_gossip_pull.rs#L432
which can be inefficient if the filter's bit-mask only matches a small
portion of the entire crds table.
This commit shards crds values into separate tables based on the first
shard_bits bits of their hash prefix. Given a (mask, mask_bits) filter,
filtering crds can be done by inspecting only the relevant shards.
If CrdsFilter.mask_bits <= shard_bits, then precisely only the crds
values which match (mask, mask_bits) bit pattern are traversed.
If CrdsFilter.mask_bits > shard_bits, then approximately only
1/2^shard_bits of crds values are inspected.
Benchmarking on a gce cluster of 20 nodes, I see ~10% improvement in the
generate_pull_responses metric; with larger clusters, the crds table and
2^mask_bits are both larger, so the impact should be more significant.
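A minimal sketch of the shard lookup (the SHARD_BITS value is
illustrative; mask is assumed to have zeros below its top mask_bits
bits):

    const SHARD_BITS: u32 = 12; // illustrative

    // Shard index: the first SHARD_BITS bits of the value's 64-bit hash
    // prefix.
    fn shard_index(hash_prefix: u64) -> usize {
        (hash_prefix >> (64 - SHARD_BITS)) as usize
    }

    // Shards which can contain values matching a (mask, mask_bits) filter.
    fn matching_shards(mask: u64, mask_bits: u32) -> std::ops::Range<usize> {
        let first = (mask >> (64 - SHARD_BITS)) as usize;
        if mask_bits <= SHARD_BITS {
            // Every value in these 2^(SHARD_BITS - mask_bits) shards
            // matches the filter's bit pattern exactly.
            first..first + (1 << (SHARD_BITS - mask_bits))
        } else {
            // Only one shard can match; ~1/2^SHARD_BITS of all values are
            // inspected, each checked against the remaining mask bits.
            first..first + 1
        }
    }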
* Check bank capitalization
* Simplify and unify capitalization calculation
* Improve and add tests
* Avoid overflow and inhibit automatic restart
* Fix test
* Tweak checked sum for cap. and add tests (see the sketch below)
* Fix broken build after merge conflicts
* Rename to ClusterType
* Rename confusing method
* Clarify comment
* Verify cap. in rent and inflation tests
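The checked-sum idea, as a minimal sketch:

    // Sum lamport balances with overflow detection instead of silently
    // wrapping; None lets the caller abort and report rather than
    // continue with a bogus capitalization.
    fn checked_total(balances: impl IntoIterator<Item = u64>) -> Option<u64> {
        balances.into_iter().try_fold(0u64, u64::checked_add)
    }

For example, checked_total([1, 2, 3]) yields Some(6), while any
overflowing sum yields None.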
Co-authored-by: Stephen Akridge <sakridge@gmail.com>
* Add blockstore column to cache block times
* Add method to cache block time
* Add service to cache block time
* Update rpc getBlockTime to use new method, and refactor blockstore slightly
* Return block_time with confirmed block, if available
* Add measure and warning to cache-block-time
* Use an untagged RpcSignatureResult enum to avoid breaking downstream consumers of current signature subscriptions (see the sketch below)
* Clean up client duplication
* Clippy
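A minimal sketch of the untagged approach (variant and field names are
illustrative, not the actual RPC types):

    use serde::{Deserialize, Serialize};

    // With #[serde(untagged)] each variant serializes as its bare
    // content, so existing subscribers keep receiving the old JSON shape
    // while a new variant can be added alongside it.
    #[derive(Serialize, Deserialize)]
    #[serde(untagged)]
    enum RpcSignatureResult {
        ProcessedSignature { err: Option<String> }, // old shape, unchanged
        ReceivedSignature(String),                  // newly added shape
    }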