program-runtime: double program cache size (#3481)
The cache is currently getting thrashed and programs are getting
reloaded pretty much at every single slot. Double the cache size, which
makes reloading happen only due to random eviction sometimes picking a
popular entry.
The JIT code size with the new cache size is about 800MB.
This change reduces JIT time 15x.
(cherry picked from commit fb4adda5a8a59d7eafe942157260c023b962925a)
Co-authored-by: Alessandro Decina <alessandro.d@gmail.com>
* Refactor cost tracking (#1954)
* Refactor and additional metrics for cost tracking (#1888)
* Refactor and add metrics:
- Combine remove_* and update_* functions to reduce locking on cost-tracker and iteration.
- Add method to calculate executed transaction cost by directly using actual execution cost and loaded accounts size;
- Wire up a histogram to report loaded accounts size;
- Report time of block limits checking;
- Move account counters from ExecuteDetailsTimings to ExecuteAccountsDetails;
* Move committed transactions adjustment into its own function
* remove histogram for loaded accounts size due to performance impact
(cherry picked from commit f8630a3522)
* rename cost_tracker.account_data_size to better describe its purpose, which is to track per-block new account allocation
---------
Co-authored-by: Tao Zhu <82401714+tao-stones@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
* Refactor and additional metrics for cost tracking (#1888)
* Refactor and add metrics:
- Combine remove_* and update_* functions to reduce locking on cost-tracker and iteration.
- Add method to calculate executed transaction cost by directly using actual execution cost and loaded accounts size;
- Wire up a histogram to report loaded accounts size;
- Report time of block limits checking;
- Move account counters from ExecuteDetailsTimings to ExecuteAccountsDetails;
* Move committed transactions adjustment into its own function
(cherry picked from commit c3fadacf69)
* rename cost_tracker.account_data_size to better describe its purpose, which is to track per-block new account allocation
---------
Co-authored-by: Tao Zhu <82401714+tao-stones@users.noreply.github.com>
Co-authored-by: Tao Zhu <tao@solana.com>
* Rename ComputeBudget::max_invoke_stack_height to max_instruction_stack_depth
The new name is consistent with the existing
ComputeBudget::max_instruction_trace_length.
Also expose compute_budget::MAX_INSTRUCTION_DEPTH.
* bpf_loader: use an explicit thread-local pool for stack and heap memory
Use a fixed thread-local pool to hold stack and heap memory. This
mitigates the long-standing issue of jemalloc causing TLB shootdowns to
serve such frequent large allocations.
Because we need 1 stack and 1 heap region per instruction, and the
current max instruction nesting is hardcoded to 5, the pre-allocated
size is (MAX_STACK + MAX_HEAP) * 5 * NUM_THREADS. With the current
limits that's about 2.5MB per thread. Note that this is memory that
would eventually get allocated anyway; we're just pre-allocating it now.
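
A minimal sketch of the thread-local pool idea described above, not the actual bpf_loader code. The names `MemoryPool`, `with_regions` and `preallocated_bytes_per_thread` are hypothetical, and the 256 KiB values for MAX_STACK and MAX_HEAP are assumptions chosen only so that the total matches the "about 2.5MB per thread" figure. Regions are zeroed when returned to the pool, so a later instruction never sees stale data.

```rust
use std::cell::RefCell;

// Assumed limits for illustration only; chosen so that the total matches
// the "about 2.5MB per thread" figure in the commit message.
const MAX_STACK: usize = 256 * 1024;
const MAX_HEAP: usize = 256 * 1024;
const MAX_INSTRUCTION_DEPTH: usize = 5;

/// Hypothetical fixed-capacity pool of stack and heap regions.
struct MemoryPool {
    stacks: Vec<Box<[u8]>>,
    heaps: Vec<Box<[u8]>>,
}

fn zeroed(len: usize) -> Box<[u8]> {
    vec![0u8; len].into_boxed_slice()
}

impl MemoryPool {
    /// Pre-allocate one stack and one heap region per nesting level.
    fn new() -> Self {
        MemoryPool {
            stacks: (0..MAX_INSTRUCTION_DEPTH).map(|_| zeroed(MAX_STACK)).collect(),
            heaps: (0..MAX_INSTRUCTION_DEPTH).map(|_| zeroed(MAX_HEAP)).collect(),
        }
    }

    /// Take one region of each kind, allocating only if the pool is empty.
    fn take(&mut self) -> (Box<[u8]>, Box<[u8]>) {
        let stack = self.stacks.pop().unwrap_or_else(|| zeroed(MAX_STACK));
        let heap = self.heaps.pop().unwrap_or_else(|| zeroed(MAX_HEAP));
        (stack, heap)
    }

    /// Zero the regions and return them, never growing past the fixed capacity.
    fn put(&mut self, mut stack: Box<[u8]>, mut heap: Box<[u8]>) {
        stack.fill(0);
        heap.fill(0);
        if self.stacks.len() < MAX_INSTRUCTION_DEPTH {
            self.stacks.push(stack);
        }
        if self.heaps.len() < MAX_INSTRUCTION_DEPTH {
            self.heaps.push(heap);
        }
    }
}

thread_local! {
    // One pool per thread, so no locking is needed.
    static POOL: RefCell<MemoryPool> = RefCell::new(MemoryPool::new());
}

/// Run `f` with a stack and a heap region borrowed from this thread's pool.
/// The pool borrow is released before `f` runs, so nested calls are fine.
fn with_regions<R>(f: impl FnOnce(&mut [u8], &mut [u8]) -> R) -> R {
    let (mut stack, mut heap) = POOL.with(|p| p.borrow_mut().take());
    let result = f(&mut *stack, &mut *heap);
    POOL.with(|p| p.borrow_mut().put(stack, heap));
    result
}

/// (MAX_STACK + MAX_HEAP) * 5, i.e. the per-thread part of the formula above.
fn preallocated_bytes_per_thread() -> usize {
    (MAX_STACK + MAX_HEAP) * MAX_INSTRUCTION_DEPTH
}
```

With these assumed limits, `preallocated_bytes_per_thread()` evaluates to 2,621,440 bytes, which is 2.5 MiB.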
* programs/sbf: add test for stack/heap zeroing
Add TEST_STACK_HEAP_ZEROED which tests that stack and heap regions are
zeroed across reuse from the memory pool.
* Adjust replay-related metrics for unified scheduler
* Fix grammar
* Don't compute slowest for unified scheduler
* Rename to is_unified_scheduler_enabled
* Hoist uses to top of file
* Conditionally disable replay-slot-end-to-end-stats
* Remove the misleading fairly balanced text
* local program cache: add `modified_entries` field
* use `modified_entries` for modified program cache
* invoke context: make `program_cache_for_tx_batch` mutable
* invoke context: unify local program cache instances
* remove `find_program_in_cache` alias
* put most AbiExample derivations behind a cfg_attr
* feature gate all `extern crate solana_frozen_abi_macro;`
* use cfg_attr wherever we were deriving both AbiExample and AbiEnumVisitor
* fix cases where AbiEnumVisitor was still being derived unconditionally
* fix a case where AbiExample was derived unconditionally
* fix more cases where both AbiEnumVisitor and AbiExample were derived unconditionally
* two more cases where AbiExample and AbiEnumVisitor were unconditionally derived
* fix remaining unconditional derivations of AbiEnumVisitor
* fix cases where AbiExample is the first thing derived
* fix most remaining unconditional derivations of AbiExample
* move all `frozen_abi(digest =` behind cfg_attr
* replace incorrect cfg with cfg_attr
* fix one more unconditionally derived AbiExample
* feature gate AbiExample impls
* add frozen-abi feature to required Cargo.toml files
* make frozen-abi features activate recursively
* fmt
* add missing feature gating
* fix accidentally changed digest
* activate frozen-abi in relevant test scripts
* don't activate solana-program's frozen-abi in sdk dev-dependencies
* update to handle AbiExample derivation on new AppendVecFileBacking enum
* revert toml formatting
* remove unused frozen-abi entries from address-lookup-table Cargo.toml
* remove toml references to solana-address-lookup-table-program/frozen-abi
* update lock file
* remove no-longer-used generic param
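
The cfg_attr pattern used throughout the frozen-abi changes above can be sketched as follows. This is only an illustration: `Debug` stands in for the real `AbiExample` / `AbiEnumVisitor` derives so the snippet compiles without extra crates, and `FeeSketch` and `frozen_abi_enabled` are hypothetical names.

```rust
// The extra derive is applied only when the `frozen-abi` Cargo feature is on.
// `Debug` is a stand-in for the ABI derive macros (an assumption of this sketch).
#[cfg_attr(feature = "frozen-abi", derive(Debug))]
#[derive(Clone, PartialEq)]
pub struct FeeSketch {
    pub lamports: u64,
}

/// Reports at compile time whether the feature was enabled for this build.
pub fn frozen_abi_enabled() -> bool {
    cfg!(feature = "frozen-abi")
}
```

Builds without the feature do not expand the gated derive at all, which is the reason for putting it behind cfg_attr instead of deriving unconditionally.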
program-runtime: modify sysvar cache impl
In preparation for sol_sysvar_get, rework how SysvarCache stores data internally.
We now store account data directly, but also keep StakeHistory and SlotHashes as objects.
These object representations can be removed after native programs are converted to BPF.
* Removes direct usage of ProgramCache::entries in get_entries_to_load().
* Adds enum to switch between index implementations.
* Inserts match self.index wherever necessary.
* Adds reproducer to test_feature_activation_loaded_programs_epoch_transition.
* Differentiate entries by environment instead of adjusting the effective slot.
* Fixes tests of assign_program().
* Fixes env order in test_feature_activation_loaded_programs_recompilation_phase().
* Turns test_load_program_effective_slot() into test_load_program_environment().
* Adds comments inside ProgramCache::assign_program().
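
The "enum to switch between index implementations" plus "match self.index" approach from the list above might look roughly like this. It is a sketch under stated assumptions: the variant names, the `CacheSketch` type, and the use of plain integers for program keys and slots are all hypothetical and do not reflect the actual ProgramCache layout.

```rust
use std::collections::{BTreeMap, HashMap};

// Plain integers stand in for program addresses and slots (assumption).
type ProgramKey = u64;
type Slot = u64;

/// Hypothetical index variants; each one owns its own storage.
enum IndexImplementation {
    V1 { entries: HashMap<ProgramKey, Vec<Slot>> },
    V2 { entries: BTreeMap<ProgramKey, Vec<Slot>> },
}

struct CacheSketch {
    index: IndexImplementation,
}

impl CacheSketch {
    /// Record a deployment slot for `key`, dispatching on the active index.
    fn insert(&mut self, key: ProgramKey, slot: Slot) {
        match &mut self.index {
            IndexImplementation::V1 { entries } => entries.entry(key).or_default().push(slot),
            IndexImplementation::V2 { entries } => entries.entry(key).or_default().push(slot),
        }
    }

    /// Look up the recorded slots without touching the storage directly.
    fn deployment_slots(&self, key: ProgramKey) -> &[Slot] {
        match &self.index {
            IndexImplementation::V1 { entries } => {
                entries.get(&key).map(Vec::as_slice).unwrap_or(&[])
            }
            IndexImplementation::V2 { entries } => {
                entries.get(&key).map(Vec::as_slice).unwrap_or(&[])
            }
        }
    }
}
```

Callers go through methods such as `deployment_slots` and never reach into `entries`, which is what lets the storage be swapped behind the enum.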
* program cache: reduce contention
Before this change we used to take the write lock to extract(). This
means that even in the ideal case (all programs are already cached),
the cache was contended by all batches and all operations were
serialized.
With this change we now take the write lock only when we store a new
entry in the cache, and take the read lock to extract(). This means
that in the common case where most/all programs are cached, there is no
contention and all batches progress in parallel.
This improves node replay perf by 20-25% on current mnb traffic.
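
The locking change described above can be illustrated with a small sketch built on `std::sync::RwLock`. The types `ProgramCacheSketch` and `SharedCache`, the integer keys, and the byte-vector "programs" are assumptions made for the example, not the real program cache API.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical cache contents: program key -> compiled program bytes.
struct ProgramCacheSketch {
    entries: HashMap<u64, Arc<Vec<u8>>>,
}

struct SharedCache {
    inner: RwLock<ProgramCacheSketch>,
}

impl SharedCache {
    fn new() -> Self {
        SharedCache {
            inner: RwLock::new(ProgramCacheSketch { entries: HashMap::new() }),
        }
    }

    /// Read lock only: many batches can extract concurrently.
    /// Returns the cached entries plus the keys that still need loading.
    fn extract(&self, keys: &[u64]) -> (Vec<(u64, Arc<Vec<u8>>)>, Vec<u64>) {
        let cache = self.inner.read().unwrap();
        let mut found = Vec::new();
        let mut missing = Vec::new();
        for &key in keys {
            match cache.entries.get(&key) {
                Some(program) => found.push((key, Arc::clone(program))),
                None => missing.push(key),
            }
        }
        (found, missing)
    }

    /// Write lock: taken only when a new entry is stored.
    fn store(&self, key: u64, program: Vec<u8>) {
        let mut cache = self.inner.write().unwrap();
        cache.entries.insert(key, Arc::new(program));
    }
}
```

When every requested program is already cached, `extract` never takes the write lock, so concurrent callers do not serialize on each other.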
* ProgramCache: remove SecondLevel structure
* Adds get_entries_to_load().
* Differentiates account not found and unloaded in match_missing().
* Orders assert_eq!(match_missing(), false) in front of cache.extract().
* Makes cargo clippy happy.
* Adds test_extract_nonexistent().
* Move Stats::submit to solana-runtime
* fmt
* rename Stats to LoadedProgramStats
* move submit_loaded_programs_stats to bank::metrics and rename it to report_loaded_programs_stats
* add new log method to LoadedProgramStats and move some code there
* fix unused import
* reload stats in LoadedProgramStats::log
* remove old import
* Only the verifier can cause FailedVerification, everything else is Closed
* Removes the environments parameter from load_program_accounts().
* cargo fmt
* Simplify invocation of deployed program
* Attempt to invoke a program before it is deployed
* Attempt to invoke a buffer before it is used in a deployment
* Escalates Option return value of load_program_accounts() to load_program_with_pubkey().
* Review feedback
* Fix: deploy program on last slot of epoch during environment change
* solana-runtime: deploy at last epoch slot test
* disable deployment of sol_alloc_free
* Move tx-batch-constructor to its own function
* use new_from_cache
---------
Co-authored-by: Alessandro Decina <alessandro.d@gmail.com>
RuntimeConfig doesn't use anything SVM specific and logically belongs
in program runtime rather than SVM. This change moves the definition
of RuntimeConfig struct from the SVM crate to program-runtime and
adjusts `use` statements accordingly.