program-runtime: double program cache size (#3481)
The cache is currently getting thrashed and programs are being
reloaded at nearly every slot. Double the cache size, so that reloads
happen only when random eviction occasionally picks a popular entry.
The JIT code size with the new cache size is about 800MB.
This change reduces JIT time by 15x.
(cherry picked from commit fb4adda5a8a59d7eafe942157260c023b962925a)
Co-authored-by: Alessandro Decina <alessandro.d@gmail.com>
* make declare_id a declarative macro
* remove old comment
* deprecate program_declare_id
* deprecate pubkey macro
* put deprecation on the re-export of the pubkey macro
* replace pubkey! with from_str_const in this repo
* fmt
* remove unused import
* Revert "remove unused rustc_version dep from wen-restart (wrong branch)"
This reverts commit 60dbddd03ab3330fc3d86d36ed77ad2babf0afc6.
* avoid wen-restart changes again
* fmt
* fix deprecation text
* make declare_deprecated_id a declarative macro
* put back the deprecation on the re-export of the pubkey macro
* fmt
* don't deprecate the pubkey macro, but make it a declarative macro
* update deprecation note
* re-export the new pubkey macro in solana-sdk (with deprecation) instead of the old one
* replace cfg(RUSTC_WITH_SPECIALIZATION) with cfg(feature = "frozen-abi")
* remove the build scripts for the two frozen-abi crates
* remove all rustc_version deps
* remove a rustc_version dep that I missed
* fix duplicate lines in Cargo.toml files
* remove build.rs from instruction crate
* remove rustc_version from instruction crate
* remove no-longer-needed check-cfg entries
* update lock file after rebase
* Splits transfer authority and finalize into two instructions.
* Adds next-version-forwarding to finalization.
* Makes loader-v4 a program runtime v1 loader.
* bump rust to 1.80
* bump nightly version to 2024-07-21
* bump rust stable to 1.80.1, nightly to 2024-08-08
* clippy: macro_metavars_in_unsafe
* fix unexpected tag
* run anchor downstream test with their master
* add no-entrypoint into workspace level lint
* use correct llvm path for coverage test
* add cfg(feature = "frozen-abi") to build.rs
* only depend on rustc_version when frozen-abi feature is activated
* remove extraneous dirs that snuck in from another branch
* update perf/build.rs as it's different from the standard build script
* use symlink for svm/build.rs
* remove unused build dep rustc_version from wen-restart
* fmt Cargo.toml
refactor: move process_compute_budget_instructions from solana_compute_budget to solana_runtime_transaction, to break circular dependencies. Issue #2169
* Refactor and additional metrics for cost tracking (#1888)
* Refactor and add metrics:
- Combine remove_* and update_* functions to reduce locking on cost-tracker and iteration.
- Add method to calculate executed transaction cost by directly using actual execution cost and loaded accounts size;
- Wireup histogram to report loaded accounts size;
- Report time of block limits checking;
- Move account counters from ExecuteDetailsTimings to ExecuteAccountsDetails;
* Move committed transactions adjustment into its own function
* remove histogram for loaded accounts size due to performance impact
* Rename ComputeBudget::max_invoke_stack_height to max_instruction_stack_depth
The new name is consistent with the existing
ComputeBudget::max_instruction_trace_length.
Also expose compute_budget::MAX_INSTRUCTION_DEPTH.
* bpf_loader: use an explicit thread-local pool for stack and heap memory
Use a fixed thread-local pool to hold stack and heap memory. This
mitigates the long-standing issue of jemalloc causing TLB shootdowns
when serving such frequent large allocations.
Because we need 1 stack and 1 heap region per instruction, and the
current max instruction nesting is hardcoded to 5, the pre-allocated
size is (MAX_STACK + MAX_HEAP) * 5 * NUM_THREADS. With the current
limits that's about 2.5MB per thread. Note that this is memory that
would eventually get allocated anyway; we're just pre-allocating it now.
* programs/sbf: add test for stack/heap zeroing
Add TEST_STACK_HEAP_ZEROED which tests that stack and heap regions are
zeroed across reuse from the memory pool.
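The pool described above can be sketched roughly as follows (a hypothetical illustration, not the actual bpf_loader implementation; `REGION_SIZE` and `POOL_DEPTH` are placeholder constants, not the real MAX_STACK/MAX_HEAP limits): buffers are pre-allocated per thread, handed out per instruction, and zeroed when returned so no state leaks across reuse.

```rust
use std::cell::RefCell;

const REGION_SIZE: usize = 4096; // illustrative, not the real limit
const POOL_DEPTH: usize = 5;     // max instruction nesting depth

thread_local! {
    // One stack and one heap region per nesting level, allocated up front.
    static POOL: RefCell<Vec<Vec<u8>>> = RefCell::new(
        (0..POOL_DEPTH * 2).map(|_| vec![0u8; REGION_SIZE]).collect()
    );
}

fn acquire_region() -> Vec<u8> {
    POOL.with(|p| p.borrow_mut().pop())
        .expect("nesting deeper than pre-allocated pool")
}

fn release_region(mut buf: Vec<u8>) {
    // Zero across reuse, the property exercised by TEST_STACK_HEAP_ZEROED.
    buf.fill(0);
    POOL.with(|p| p.borrow_mut().push(buf));
}
```

Because the pool is thread-local, acquire/release never touch the global allocator on the hot path, which is what avoids the jemalloc-driven TLB shootdowns.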
* Adjust replay-related metrics for unified scheduler
* Fix grammar
* Don't compute slowest for unified scheduler
* Rename to is_unified_scheduler_enabled
* Hoist uses to top of file
* Conditionally disable replay-slot-end-to-end-stats
* Remove the misleading "fairly balanced" text
* local program cache: add `modified_entries` field
* use `modified_entries` for modified program cache
* invoke context: make `program_cache_for_tx_batch` mutable
* invoke context: unify local program cache instances
* remove `find_program_in_cache` alias
* put most AbiExample derivations behind a cfg_attr
* feature gate all `extern crate solana_frozen_abi_macro;`
* use cfg_attr wherever we were deriving both AbiExample and AbiEnumVisitor
* fix cases where AbiEnumVisitor was still being derived unconditionally
* fix a case where AbiExample was derived unconditionally
* fix more cases where both AbiEnumVisitor and AbiExample were derived unconditionally
* two more cases where AbiExample and AbiEnumVisitor were unconditionally derived
* fix remaining unconditional derivations of AbiEnumVisitor
* fix cases where AbiExample is the first thing derived
* fix most remaining unconditional derivations of AbiExample
* move all `frozen_abi(digest =` behind cfg_attr
* replace incorrect cfg with cfg_attr
* fix one more unconditionally derived AbiExample
* feature gate AbiExample impls
* add frozen-abi feature to required Cargo.toml files
* make frozen-abi features activate recursively
* fmt
* add missing feature gating
* fix accidentally changed digest
* activate frozen-abi in relevant test scripts
* don't activate solana-program's frozen-abi in sdk dev-dependencies
* update to handle AbiExample derivation on new AppendVecFileBacking enum
* revert toml formatting
* remove unused frozen-abi entries from address-lookup-table Cargo.toml
* remove toml references to solana-address-lookup-table-program/frozen-abi
* update lock file
* remove no-longer-used generic param
program-runtime: modify sysvar cache impl
in preparation for sol_sysvar_get, rework how SysvarCache stores data internally
we now store raw account data directly, except that StakeHistory and SlotHashes are also stored as objects
these object representations can be removed after native programs are converted to bpf
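A minimal sketch of that layout (field names and the placeholder object types are illustrative, not the real SysvarCache): each sysvar is kept as raw account bytes so sol_sysvar_get can serve them directly, while StakeHistory and SlotHashes additionally keep a deserialized form for the native programs that still consume them.

```rust
// Placeholder object types standing in for the real sysvar structs.
#[allow(dead_code)]
struct StakeHistory(Vec<(u64, u64)>);
#[allow(dead_code)]
struct SlotHashes(Vec<(u64, [u8; 32])>);

// Hypothetical layout: raw bytes for every sysvar, plus object forms
// for the two that native programs still read as structs.
#[derive(Default)]
struct SysvarCache {
    clock: Option<Vec<u8>>,
    rent: Option<Vec<u8>>,
    stake_history: Option<Vec<u8>>,
    slot_hashes: Option<Vec<u8>>,
    // Removable once native programs are converted to BPF.
    stake_history_obj: Option<StakeHistory>,
    slot_hashes_obj: Option<SlotHashes>,
}
```

The object fields are deliberately redundant with the byte fields; dropping them later only removes code, never data.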
* Removes direct usage of ProgramCache::entries in get_entries_to_load().
* Adds enum to switch between index implementations.
* Inserts match self.index wherever necessary.
* Adds reproducer to test_feature_activation_loaded_programs_epoch_transition.
* Differentiate entries by environment instead of adjusting the effective slot.
* Fixes tests of assign_program().
* Fixes env order in test_feature_activation_loaded_programs_recompilation_phase().
* Turns test_load_program_effective_slot() into test_load_program_environment().
* Adds comments inside ProgramCache::assign_program().
* program cache: reduce contention
Before this change we used to take the write lock to extract(). This
means that even in the ideal case (all programs are already cached),
the cache was contended by all batches and all operations were
serialized.
With this change we now take the write lock only when we store a new
entry in the cache, and take the read lock to extract(). This means
that in the common case where most/all programs are cached, there is no
contention and all batches progress in parallel.
This improves node replay perf by 20-25% on current mnb traffic.
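The locking change above can be sketched with a plain RwLock (a hypothetical stand-in; the real ProgramCache and its extract() signature differ): lookups take only the read lock, so batches whose programs are all cached proceed in parallel, and the write lock is taken only when a newly loaded entry is stored.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Illustrative read-mostly cache: read lock on the hot path,
// write lock only on the miss path.
struct ProgramCache {
    entries: RwLock<HashMap<u64, String>>,
}

impl ProgramCache {
    fn new() -> Self {
        Self { entries: RwLock::new(HashMap::new()) }
    }

    // Common case: shared read lock, no contention between batches.
    fn extract(&self, key: u64) -> Option<String> {
        self.entries.read().unwrap().get(&key).cloned()
    }

    // Miss path: exclusive write lock held only while storing.
    fn store(&self, key: u64, program: String) {
        self.entries.write().unwrap().insert(key, program);
    }
}
```

Since RwLock allows any number of concurrent readers, the serialization described above disappears whenever the working set is already cached.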
* ProgramCache: remove SecondLevel structure
* Adds get_entries_to_load().
* Differentiates account not found and unloaded in match_missing().
* Orders assert_eq!(match_missing(), false) in front of cache.extract().
* Makes cargo clippy happy.
* Adds test_extract_nonexistent().