* clap-utils: Refactor compute_unit_price into compute_budget
* clap-utils: Validate compute unit price as a u64
* clap-utils: Add compute unit limit arg
* clap-v3-utils: Add compute unit price and limit helpers
* Add deprecation on `pub use` even though it isn't triggered
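The u64 validation mentioned above can be sketched as the kind of parser a
clap validator would wrap (a hypothetical helper; the actual arg definitions
live in clap-utils):

```rust
// Hypothetical stand-in for the clap-side validator: accept a compute unit
// price only if it parses as a u64 (micro-lamports per compute unit).
fn parse_compute_unit_price(value: &str) -> Result<u64, String> {
    value
        .parse::<u64>()
        .map_err(|err| format!("invalid compute unit price '{value}': {err}"))
}
```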
Previously, entry verification had a dedicated threadpool used to verify
PoH hashes as well as to perform some basic transaction verification via
Bank::verify_transaction(). It should also be noted that the entry
verification code can offload to a GPU if one is present.
Regardless of whether a GPU is present, some of the verification must be
done on the CPU. Moreover, CPU entry verification and transaction execution
are serial operations: entry verification completes before transaction
execution begins.
So, tx execution and entry verification are not competing for CPU cycles
at the same time and can use the same pool.
One exception to the above statement is that if someone is using the
feature to replay forks in parallel, then hypothetically, different
forks may end up competing for the same resources at the same time.
However, that was already true, since the existing pools were shared across
replay of multiple forks. So this change doesn't alter that case much, but
it will reduce overhead in the single-fork case, which is the vast majority
of the time.
#### Problem
TieredStorage::drop() currently panics when it fails to delete the
underlying file, including on io::ErrorKind::NotFound, in order to raise
awareness of possible storage resource leaks. But sometimes a TieredStorage
(or AccountsFile in general) instance is created and then dropped without
any file ever being created. This causes false alarms, including in unit
tests.
#### Summary of Changes
This PR excludes NotFound when reporting storage leaks in
TieredStorage::drop().
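A minimal sketch of the changed behavior, with an illustrative type standing
in for TieredStorage (names are assumptions, not the actual implementation):
deletion failures still panic to surface leaks, but NotFound is treated as
benign.

```rust
use std::{fs, io::ErrorKind, path::PathBuf};

// Illustrative stand-in for TieredStorage: remove the backing file on drop
// and panic on failure to surface storage leaks, but ignore NotFound since
// the instance may have been dropped before any file was created.
struct StorageFile {
    path: PathBuf,
}

impl Drop for StorageFile {
    fn drop(&mut self) {
        if let Err(err) = fs::remove_file(&self.path) {
            if err.kind() != ErrorKind::NotFound {
                panic!("failed to remove {}: {err}", self.path.display());
            }
        }
    }
}
```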
* save progress
* rename threads handler
* added writer for txs
* after extracting structure to handle tx confirmations
* extract LogWriter
* Replace pair TimestampedTransaction with struct
* add compute_unit_price to TimestampedTransaction
* add cu_price to LogWriter
* add block time to the logs
* Fix warnings
* add comments and restructure code
* some small improvements
* Renamed conformation_processing.rs to log_transaction_service.rs
* address numerous PR comments
* split LogWriter into two structs
* simplify code of LogWriters
* extract process_blocks
* specify commitment in LogTransactionService
* break thread loop if receiver happens to be dropped
* update start_slot when processing blocks
* address pr comments
* fix clippy error
* minor changes
* fix ms problem
* fix bug with time in clear transaction map
This is a port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c
https://github.com/anza-xyz/agave/pull/185 implemented weighted shuffle
using a binary tree. Though a binary tree has asymptotically better
performance than a Fenwick tree, it has worse cache locality, resulting in
smaller improvements and, in particular, a slower WeightedShuffle::new.
In order to improve cache locality and reduce the overheads of
traversing the tree, this commit instead uses a generalized N-ary tree
with fanout of 16, showing significant improvements in both
WeightedShuffle::new and WeightedShuffle::shuffle.
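One way to picture the fanout-16 layout (illustrative sketch, not the actual
WeightedShuffle internals): each node stores the cumulative weights of its
16 children contiguously, so the inverse-query scan at a node touches only a
couple of cache lines.

```rust
const FANOUT: usize = 16;

// One descent step at a fanout-16 node. `sums[i]` is the total weight of
// children 0..=i (unused tail slots repeat the node's total). Given `val`
// in [0, node total), return the child to descend into and the residual
// value to carry down.
fn descend(sums: &[u64; FANOUT], mut val: u64) -> (usize, u64) {
    for (child, &s) in sums.iter().enumerate() {
        if val < s {
            if child > 0 {
                val -= sums[child - 1];
            }
            return (child, val);
        }
    }
    unreachable!("val must be less than the node's total weight")
}
```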
With 4000 weights:
N-ary tree (fanout 16):
test bench_weighted_shuffle_new ... bench: 36,244 ns/iter (+/- 243)
test bench_weighted_shuffle_shuffle ... bench: 149,082 ns/iter (+/- 1,474)
Binary tree:
test bench_weighted_shuffle_new ... bench: 58,514 ns/iter (+/- 229)
test bench_weighted_shuffle_shuffle ... bench: 269,961 ns/iter (+/- 16,446)
Fenwick tree:
test bench_weighted_shuffle_new ... bench: 39,413 ns/iter (+/- 179)
test bench_weighted_shuffle_shuffle ... bench: 364,771 ns/iter (+/- 2,078)
The improvements become even more significant as there are more items to
shuffle. With 20_000 weights:
N-ary tree (fanout 16):
test bench_weighted_shuffle_new ... bench: 200,659 ns/iter (+/- 4,395)
test bench_weighted_shuffle_shuffle ... bench: 941,928 ns/iter (+/- 26,492)
Binary tree:
test bench_weighted_shuffle_new ... bench: 881,114 ns/iter (+/- 12,343)
test bench_weighted_shuffle_shuffle ... bench: 1,822,257 ns/iter (+/- 12,772)
Fenwick tree:
test bench_weighted_shuffle_new ... bench: 276,936 ns/iter (+/- 14,692)
test bench_weighted_shuffle_shuffle ... bench: 2,644,713 ns/iter (+/- 49,252)
#### Problem
#72 introduced AccountsFile::TieredStorage and, with it, a file-type check
when opening an accounts file to determine whether it is a tiered-storage
or an append-vec. But until tiered-storage is enabled, this check is
unnecessary.
#### Summary of Changes
Remove the accounts-file type check and simply assume everything is an
append-vec in AccountsFile::new_from_file().
#### Problem
AccountsFile currently doesn't have an implementation for TieredStorage.
To enable AccountsDB tests for the TieredStorage, we need AccountsFile
to support TieredStorage.
#### Summary of Changes
This PR implements AccountsFile::TieredStorage, a thin wrapper between
AccountsFile and TieredStorage.
#### Problem
The TieredStorage has not yet implemented the AccountsFile::capacity()
API.
#### Summary of Changes
Implement the capacity() API for TieredStorage and limit the file size to
16GB, the same as the append-vec file.
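As a sketch (constant name and exact 1024-based arithmetic are assumptions
for illustration), the cap mirrors the append-vec maximum:

```rust
// Illustrative: tiered-storage files share the append-vec's 16GB size cap.
pub const MAX_TIERED_STORAGE_FILE_SIZE: u64 = 16 * 1024 * 1024 * 1024;
```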
This is a partial port of firedancer's implementation of weighted shuffle:
https://github.com/firedancer-io/firedancer/blob/3401bfc26/src/ballet/wsample/fd_wsample.c
Though Fenwick trees use less space, inverse queries require an additional
O(log n) factor for the binary search, resulting in overall O(n log^2 n)
performance for weighted shuffle.
This commit instead uses a binary tree where each node contains the sum of
all weights in its left sub-tree. The weights themselves are implicitly
stored at the leaves. Inverse queries and updates to the tree can all be
done in O(log n), resulting in an overall O(n log n) weighted shuffle
implementation.
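The structure described above can be sketched as follows (hypothetical
names, not the actual WeightedShuffle internals): internal nodes store the
weight sum of their left subtree, weights sit implicitly at the leaves, and
both the inverse query and the weight-removal update walk one root-to-leaf
path in O(log n).

```rust
// Sketch of a sum-over-left-subtree binary tree for sampling without
// replacement. Node 1 is the root; node i has children 2i and 2i + 1.
struct LeftSumTree {
    num_leaves: usize, // number of weights, rounded up to a power of two
    tree: Vec<u64>,    // tree[i] = total weight in node i's left subtree
    weights: Vec<u64>, // leaf weights (zero-padded)
}

impl LeftSumTree {
    fn new(ws: &[u64]) -> Self {
        let num_leaves = ws.len().next_power_of_two();
        let mut weights = vec![0u64; num_leaves];
        weights[..ws.len()].copy_from_slice(ws);
        let mut tree = vec![0u64; num_leaves];
        // Recursively compute subtree sums, recording each left-subtree sum.
        fn build(node: usize, lo: usize, hi: usize, ws: &[u64], tree: &mut [u64]) -> u64 {
            if hi - lo == 1 {
                return ws[lo];
            }
            let mid = (lo + hi) / 2;
            let left = build(2 * node, lo, mid, ws, tree);
            let right = build(2 * node + 1, mid, hi, ws, tree);
            tree[node] = left;
            left + right
        }
        if num_leaves > 1 {
            build(1, 0, num_leaves, &weights, &mut tree);
        }
        Self { num_leaves, tree, weights }
    }

    // Inverse query: the leaf k whose prefix-sum interval contains `val`,
    // i.e. sum(weights[..k]) <= val < sum(weights[..=k]). O(log n).
    fn find(&self, mut val: u64) -> usize {
        let (mut node, mut lo, mut hi) = (1, 0, self.num_leaves);
        while hi - lo > 1 {
            let mid = (lo + hi) / 2;
            if val < self.tree[node] {
                node = 2 * node; // descend left
                hi = mid;
            } else {
                val -= self.tree[node]; // skip the left subtree's weight
                node = 2 * node + 1; // descend right
                lo = mid;
            }
        }
        lo
    }

    // Zero out a sampled leaf so it cannot be drawn again. O(log n).
    fn remove(&mut self, leaf: usize) {
        let w = std::mem::take(&mut self.weights[leaf]);
        let (mut node, mut lo, mut hi) = (1, 0, self.num_leaves);
        while hi - lo > 1 {
            let mid = (lo + hi) / 2;
            if leaf < mid {
                self.tree[node] -= w; // leaf is inside this left subtree
                node = 2 * node;
                hi = mid;
            } else {
                node = 2 * node + 1;
                lo = mid;
            }
        }
    }
}
```

On top of find and remove, a weighted shuffle repeatedly draws a uniform
value in [0, remaining total weight), maps it to a leaf, and removes that
leaf's weight.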
Based on benchmarks, this results in a 24% improvement in
WeightedShuffle::shuffle:
Fenwick tree:
test bench_weighted_shuffle_new ... bench: 36,686 ns/iter (+/- 191)
test bench_weighted_shuffle_shuffle ... bench: 342,625 ns/iter (+/- 4,067)
Binary tree:
test bench_weighted_shuffle_new ... bench: 59,131 ns/iter (+/- 362)
test bench_weighted_shuffle_shuffle ... bench: 260,194 ns/iter (+/- 11,195)
Though WeightedShuffle::new is now slower, it can generally be cached and
reused, as in Turbine:
https://github.com/anza-xyz/agave/blob/b3fd87fe8/turbine/src/cluster_nodes.rs#L68
Additionally, the new code has better asymptotic performance. For example,
with 20_000 weights WeightedShuffle::shuffle is 31% faster:
Fenwick tree:
test bench_weighted_shuffle_new ... bench: 255,071 ns/iter (+/- 9,591)
test bench_weighted_shuffle_shuffle ... bench: 2,466,058 ns/iter (+/- 9,873)
Binary tree:
test bench_weighted_shuffle_new ... bench: 830,727 ns/iter (+/- 10,210)
test bench_weighted_shuffle_shuffle ... bench: 1,696,160 ns/iter (+/- 75,271)
#### Problem
TieredStorage::file_size() essentially supports AccountsFile::len(),
but its API is inconsistent with AccountsFile's.
#### Summary of Changes
Rename TieredStorage::file_size() to len() so that it shares the same API
as AccountsFile.
#### Test Plan
Build
Existing unit-tests.
#### Problem
The current implementation of TieredStorage::file_size() requires a syscall
to obtain the file size.
#### Summary of Changes
Add a len() API to TieredStorageReader, and have HotStorageReader
implement it using Mmap::len().
#### Test Plan
Update existing unit-test to also verify HotStorageReader::len().
#### Problem
The current AppendVecId actually refers to an accounts file id.
#### Summary of Changes
Rename AppendVecId to AccountsFileId.
#### Test Plan
Build
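A sketch of what such a rename typically looks like (the underlying integer
type and the backwards-compatibility alias are assumptions for illustration,
not necessarily part of this change):

```rust
// The id identifies a generic accounts file, not just an append-vec.
pub type AccountsFileId = u32;

// Hypothetical backwards-compatibility alias for downstream callers.
#[deprecated(note = "use AccountsFileId instead")]
pub type AppendVecId = AccountsFileId;
```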
* add set compute units arg for program deploy
* update master changes
* remove duplicates
* fixes and tests
* remove extra lines
* feedback
* Use simulation to determine compute units consumed
* feedback
---------
Co-authored-by: NagaprasadVr <nagaprasadvr246@gmail.com>
* Fix: deploy program on last slot of epoch during environment change
* solana-runtime: deploy at last epoch slot test
* disable deployment of sol_alloc_free
* Move tx-batch-constructor to its own function
* use new_from_cache
---------
Co-authored-by: Alessandro Decina <alessandro.d@gmail.com>