--tpu-enable-udp is introduced. When this flag is on, receiving and forwarding transactions over UDP is enabled.
Except for a few tests that are hard-coded to send transactions over UDP, most tests now run with the UDP-based TPU disabled.
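A minimal sketch of how such a flag could be parsed and used to gate the UDP path, assuming clap 2-style argument parsing; the wiring and gating below are illustrative, not the actual validator CLI code:

```rust
use clap::{App, Arg};

fn main() {
    // Illustrative clap 2-style definition of the flag described above.
    let matches = App::new("validator")
        .arg(
            Arg::with_name("tpu_enable_udp")
                .long("tpu-enable-udp")
                .takes_value(false)
                .help("Enable UDP for receiving and forwarding transactions"),
        )
        .get_matches();

    let tpu_enable_udp = matches.is_present("tpu_enable_udp");
    if tpu_enable_udp {
        // Only here would the UDP transaction receive/forward sockets be spawned.
    }
}
```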
* Plumb priority_fee_cache into rpc
* Add PrioritizationFeeCache api
* Add getRecentPrioritizationFees rpc endpoint (an example request is sketched after this list)
* Use MAX_TX_ACCOUNT_LOCKS to limit input keys
* Remove unused cache apis
* Map fee data by slot, and make rpc account inputs optional
* Add priority_fee_cache to rpc test framework, and add test
* Add endpoint to jsonrpc docs
* Update docs/src/developing/clients/jsonrpc-api.md
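A sketch of what a request to the new endpoint looks like, built with serde_json; only the JSON shape is shown (no transport), and the pubkey is a placeholder:

```rust
use serde_json::json;

fn main() {
    // getRecentPrioritizationFees takes an optional list of account keys
    // (capped at MAX_TX_ACCOUNT_LOCKS) and returns per-slot fee data.
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "getRecentPrioritizationFees",
        // Placeholder pubkey; omit the list to query fees without an account filter.
        "params": [["CxELquR1gPP8wHe33gZ4QxqGB3sZ9RSwsJ2KshVewkFY"]],
    });
    println!("{}", serde_json::to_string_pretty(&request).unwrap());
    // The response is an array of { "slot": ..., "prioritizationFee": ... }
    // entries, one per recent slot held in the cache.
}
```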
In kin-sim, we found that the bounded channel can halt the accounts
background services. As the number of accounts grows, pruning and
cleaning take longer, which leads to longer intervals between prunes of
dead bank slots. With 1.7B accounts we exceed the 10K bounded-channel
threshold, which halts the accounts background services. Without
pruning, the node eventually runs out of memory.
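A miniature illustration of the failure mode, assuming crossbeam-channel; the capacities here are toy values, not the real 10K threshold:

```rust
use crossbeam_channel::{bounded, unbounded, TrySendError};

fn main() {
    // A bounded channel back-pressures the producer once the consumer
    // (pruning/cleaning) falls behind; send() on a full channel blocks,
    // which is the halt described above. try_send() makes that visible.
    let (bounded_tx, _bounded_rx) = bounded::<u64>(2);
    bounded_tx.try_send(1).unwrap();
    bounded_tx.try_send(2).unwrap();
    assert!(matches!(bounded_tx.try_send(3), Err(TrySendError::Full(_))));

    // An unbounded channel never blocks the producer. Unbounded memory growth
    // is the trade-off, accepted here because the alternative is an OOM from
    // pruning never running.
    let (tx, _rx) = unbounded::<u64>();
    for i in 0..100_000u64 {
        tx.send(i).unwrap();
    }
}
```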
* Check overflow on vote tx compaction boundary
Check for overflow during the conversion between VoteStateUpdate and
CompactVoteStateUpdate.
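A hedged sketch of the kind of checked conversion involved; the field widths and names are assumptions for illustration, not the actual VoteStateUpdate/CompactVoteStateUpdate layout:

```rust
/// Computes a slot offset from the root for the compact representation,
/// returning None instead of silently wrapping on overflow.
fn compact_offset(root: u64, slot: u64) -> Option<u16> {
    // checked_sub guards against slot < root; try_from guards against the
    // offset not fitting the (assumed) compact integer width.
    let offset = slot.checked_sub(root)?;
    u16::try_from(offset).ok()
}

fn main() {
    assert_eq!(compact_offset(100, 150), Some(50));
    assert_eq!(compact_offset(150, 100), None); // would underflow
    assert_eq!(compact_offset(0, 1u64 << 40), None); // does not fit in u16
}
```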
* Try removing clippy suppress
Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use camel case; it's more character-dense than snake or kebab case (example below)
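An example applying the tenets with the standard library's thread builder; the thread name is illustrative, not one taken from the codebase:

```rust
use std::thread::Builder;

fn main() {
    // "solShredFetch": 13 characters, "sol" prefix, camel case.
    let handle = Builder::new()
        .name("solShredFetch".to_string())
        .spawn(|| {
            // worker body
        })
        .unwrap();
    handle.join().unwrap();
}
```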
* Create a new function cleanup_accounts_paths, a trivial change
* Remove account files asynchronously
* Update and simplify the implementation after the validator test runs.
* Fixes after testing on the dev device
* Discard tokio; use a thread instead
* Fix comment formatting
* Fix config type to pass the GitHub test
* Fix failing tests; handle the case of a non-existent path
* Final cleanup, addressing the review comments
Avoided OsString.
Made the function more generic with "impl AsRef<Path>" (see the sketch below).
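A sketch of the resulting shape, assuming the straightforward standard-library approach; names and error handling are illustrative, not the validator's actual code:

```rust
use std::{
    fs,
    path::{Path, PathBuf},
    thread::{self, JoinHandle},
};

/// Removes an account directory on a background thread so the caller is not
/// blocked on filesystem work. Generic over `impl AsRef<Path>` as noted above.
fn remove_dir_async(path: impl AsRef<Path>) -> JoinHandle<()> {
    // Take ownership so the path can move into the thread.
    let path: PathBuf = path.as_ref().to_path_buf();
    thread::Builder::new()
        .name("solRmAccounts".to_string()) // illustrative name
        .spawn(move || {
            // Handle the non-existent-path case mentioned above.
            if path.exists() {
                if let Err(err) = fs::remove_dir_all(&path) {
                    eprintln!("failed to remove {}: {err}", path.display());
                }
            }
        })
        .expect("failed to spawn background removal thread")
}
```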
Co-authored-by: Jeff Washington <jeff.washington@solana.com>
In order to maintain backward compatibility, for now the responding node
will hash the token both with and without domain so that the other node
will accept the response regardless of its upgrade status.
Once the cluster has upgraded to the new code, we will remove the legacy
domain = false case.
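A sketch of the dual hashing, assuming solana_sdk's hash helpers; the domain-prefix constant is an assumption here, not necessarily the exact bytes used in gossip:

```rust
use solana_sdk::hash::{self, Hash};

// Assumed domain-separation prefix, for illustration only.
const PING_PONG_HASH_PREFIX: &[u8] = b"SOLANA_PING_PONG";

fn hash_ping_token(token: &[u8; 32], domain: bool) -> Hash {
    if domain {
        hash::hashv(&[PING_PONG_HASH_PREFIX, &token[..]])
    } else {
        hash::hash(&token[..])
    }
}

/// While the cluster is mid-upgrade, the responder computes both variants so
/// the requester can match whichever one its version expects.
fn make_pong_hashes(token: &[u8; 32]) -> (Hash, Hash) {
    (hash_ping_token(token, true), hash_ping_token(token, false))
}
```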
A change included in
https://github.com/solana-labs/solana/pull/20480
was that when the root node in the turbine broadcast tree is down, the
leader broadcasts the shred to all nodes in the first layer.
The intention was to mitigate the impact of dead nodes on shred
propagation, because if the root node is down, the entire cluster
misses that shred.
On the other hand, if x% of stake is down, this causes a 200*x% + 1
packets-per-shred ratio at the broadcast stage (e.g. with 5% of stake
down, the leader sends ~11 packets per shred on average instead of 1),
which might contribute to line-rate saturation and packet drops.
To avoid this bandwidth saturation issue, this commit reverts that logic
and always broadcasts shreds from the leader only to the root node.
As before, we rely on erasure codes to recover shreds lost due to
staked nodes being offline.
Given the 32:32 erasure recovery schema, the current implementation
requires exactly 32 data shreds to generate coding shreds for the batch
(except for the final erasure batch in each slot).
As a result, when serializing ledger entries to data shreds, if the
number of data shreds is not a multiple of 32, the coding shreds for the
last batch cannot be generated until there are more data shreds to
complete the batch to 32 data shreds. This adds latency in generating
and broadcasting coding shreds.
In addition, with Merkle variants for shreds, data shreds cannot be
signed and broadcasted until coding shreds are also generated. As a
result *both* code and data shreds will be delayed before broadcast if
we still require exactly 32 data shreds for each batch.
This commit instead always generates and broadcasts coding shreds as
soon as any number of data shreds is available. When serializing entries
to shreds:
* if the number of resulting data shreds is less than 32, then more
coding shreds will be generated so that the resulting erasure batch
has the same recovery probabilities as a 32:32 batch.
* if the number of data shreds is more than 32, then the data shreds are
split uniformly into erasure batches with _at least_ 32 data shreds in
each batch. Each erasure batch will have the same number of code and
data shreds.
For example:
* If there are 19 data shreds, 27 coding shreds are generated. The
resulting 19(data):27(code) erasure batch has the same recovery
probabilities as a 32:32 batch.
* If there are 107 data shreds, they are split into 3 batches of 36:36,
36:36 and 35:35 data:code shreds each.
A consequence of this change is that coding and data shred indices will
no longer align, as there will be more coding shreds than data shreds
(not only in the last batch in each slot but also in the intermediate
ones).
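A sketch of the data-shred batching rule described above; the constant and helper are illustrative, and the extra coding-shred counts for small batches (e.g. 27 for 19 data shreds) come from a table in the real shred code that is not reproduced here:

```rust
const DATA_SHREDS_PER_FEC_BLOCK: usize = 32;

/// Splits `num_data` data shreds into erasure batches so that each batch holds
/// at least 32 data shreds and batch sizes differ by at most one. Batches of
/// 32 or more get an equal number of coding shreds; smaller (final) batches get
/// proportionally more coding shreds to match 32:32 recovery odds.
fn split_into_batches(num_data: usize) -> Vec<usize> {
    if num_data <= DATA_SHREDS_PER_FEC_BLOCK {
        return vec![num_data];
    }
    let num_batches = num_data / DATA_SHREDS_PER_FEC_BLOCK; // floor
    let base = num_data / num_batches;
    let rem = num_data % num_batches;
    // `rem` batches of size `base + 1`, the rest of size `base`.
    (0..num_batches)
        .map(|i| if i < rem { base + 1 } else { base })
        .collect()
}

fn main() {
    assert_eq!(split_into_batches(19), vec![19]); // paired with 27 coding shreds
    assert_eq!(split_into_batches(107), vec![36, 36, 35]); // 36:36, 36:36, 35:35
}
```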
This change sets the receive_window for non-staked nodes to 1 * PACKET_DATA_SIZE, and maps staked nodes' connection receive_window between 1.2 * PACKET_DATA_SIZE and 10 * PACKET_DATA_SIZE based on their stake.
The change is based on a Quinn library change that supports per-connection receive_window tuning on the server side. quinn-rs/quinn#1393
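One possible shape of the stake-to-window mapping, as a sketch; the linear ramp and constants below are assumptions matching the description, not necessarily the exact curve in the PR:

```rust
// solana_sdk::packet::PACKET_DATA_SIZE, restated here for a self-contained sketch.
const PACKET_DATA_SIZE: u64 = 1232;

fn receive_window(stake: u64, total_stake: u64) -> u64 {
    if stake == 0 || total_stake == 0 {
        // Non-staked peers get the minimum window of 1 * PACKET_DATA_SIZE.
        return PACKET_DATA_SIZE;
    }
    // Staked peers ramp from 1.2x toward 10x PACKET_DATA_SIZE with stake share.
    let ratio = 1.2 + (10.0 - 1.2) * (stake as f64 / total_stake as f64);
    (ratio * PACKET_DATA_SIZE as f64) as u64
}
```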
#### Problem
LedgerCleanupService requires compactions to propagate & digest range-delete tombstones
to eventually reclaim disk space.
#### Summary of Changes
This PR makes LedgerCleanupService::cleanup_ledger delete any file whose slot range is
older than the lowest_cleanup_slot. This allows us to reclaim disk space more often with
fewer IOps. Experimental results on mainnet validators show that the PR effectively
reduces ledger disk size by 33% to 40%.
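The decision itself reduces to a slot-range comparison; the struct below is hypothetical (the real code reads slot ranges from the underlying file properties) and only sketches which files qualify for deletion:

```rust
/// Hypothetical summary of one ledger data file and the slots it covers.
struct FileRange {
    name: String,
    min_slot: u64,
    max_slot: u64,
}

/// A file is safe to drop once every slot it covers is at or below
/// lowest_cleanup_slot; no compaction pass is needed to reclaim its space.
fn files_to_delete(files: &[FileRange], lowest_cleanup_slot: u64) -> Vec<&str> {
    files
        .iter()
        .filter(|file| file.max_slot <= lowest_cleanup_slot)
        .map(|file| file.name.as_str())
        .collect()
}
```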
* Keypair: implement clone()
This was not implemented upstream in ed25519-dalek to force everyone to
think twice before creating another copy of a potentially sensitive
private key in memory.
See https://github.com/dalek-cryptography/ed25519-dalek/issues/76
However, there are now 9 instances of
Keypair::from_bytes(&keypair.to_bytes())
in the solana codebase and it would be preferable to have a function.
In particular, this also comes up when writing programs and can cause
users either to start messing with lifetimes or to discover the
from_bytes() workaround themselves.
This patch opts not to implement the Clone trait. This avoids automatic
use, preserving some of the original "let developers think twice about
this" intention; a sketch of the replaced workaround follows below.
* Use Keypair::clone
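The byte round-trip being replaced, for reference; `clone_keypair` is just a named version of the workaround quoted above, and call sites can now use the inherent `keypair.clone()` instead:

```rust
use solana_sdk::signature::Keypair;

// The pre-existing workaround: serialize to 64 bytes and parse them back.
// With the inherent method, this becomes `keypair.clone()` without Keypair
// ever implementing the Clone trait.
fn clone_keypair(keypair: &Keypair) -> Keypair {
    Keypair::from_bytes(&keypair.to_bytes()).expect("round-trip of a valid keypair")
}
```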
Prior to this change, long-running commands like `solana-ledger-tool
verify` would OOM due to AccountsDb cleanup not happening.
Co-authored-by: Michael Vines <mvines@gmail.com>
* Concurrent replay slots
* Split out concurrent and single bank replay paths
* Sub function processing of replay results for readability
* Add feature switch for concurrent replay
* Only take the latest vote for each validator in gossip
Since the new vote updates are no longer incremental, there is no value
in storing intermediate votes (see the sketch after this list).
* Address pr feedback
* Handle potential downgrade path, FullTowerVote -> Incremental
* Rename sent to bank -> gossip slot
* Handle downgrade case properly
* Only downgrade for newer votes and feature flag, ignore incremental votes otherwise
* Update test
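A minimal sketch of "keep only the latest vote per validator"; the `(Pubkey, slot, vote)` tuple shape and generic vote type are assumptions for illustration:

```rust
use solana_sdk::pubkey::Pubkey;
use std::collections::HashMap;

/// Retains only the newest vote (by slot) seen for each validator.
fn latest_votes<V>(votes: Vec<(Pubkey, u64, V)>) -> HashMap<Pubkey, (u64, V)> {
    let mut latest: HashMap<Pubkey, (u64, V)> = HashMap::new();
    for (validator, slot, vote) in votes {
        let is_newer = latest
            .get(&validator)
            .map_or(true, |(existing_slot, _)| *existing_slot < slot);
        if is_newer {
            // Older votes are intermediate state; with full-tower updates they
            // carry no extra information, so only the newest slot is kept.
            latest.insert(validator, (slot, vote));
        }
    }
    latest
}
```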
* Use client certs in QUIC to get peer's stake
* fixes to cert processing
* integrate the code
* clippy
* more cleanup
* sort cargo deps
* test fixes
* info -> debug