* Rename ComputeBudget::max_invoke_stack_height to max_instruction_stack_depth
The new name is consistent with the existing
ComputeBudget::max_instruction_trace_length.
Also expose compute_budget::MAX_INSTRUCTION_DEPTH.
* bpf_loader: use an explicit thread-local pool for stack and heap memory
Use a fixed thread-local pool to hold stack and heap memory. This
mitigates the long-standing issue of jemalloc causing TLB shootdowns to
serve such frequent large allocations.
Because we need 1 stack and 1 heap region per instruction, and the
current max instruction nesting is hardcoded to 5, the pre-allocated
size is (MAX_STACK + MAX_HEAP) * 5 per thread, or that amount times
NUM_THREADS in total. With the current limits that's about 2.5MB per
thread. Note that this is memory that would eventually get allocated
anyway; we're just pre-allocating it up front.
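A minimal sketch of the pooling idea and the size arithmetic above. This
is not the actual bpf_loader code: MemoryPool, take_regions and
return_regions are hypothetical names, and the 256 KiB values for
MAX_STACK and MAX_HEAP are assumed only because they reproduce the
roughly 2.5MB per-thread figure.

```rust
use std::cell::RefCell;

// Assumed limits, for illustration only; the real values come from the
// compute budget and may differ.
const MAX_STACK: usize = 256 * 1024;
const MAX_HEAP: usize = 256 * 1024;
const MAX_INSTRUCTION_DEPTH: usize = 5;

/// Hypothetical fixed-size pool: one stack and one heap region per
/// instruction nesting level, allocated once per thread.
struct MemoryPool {
    stacks: Vec<Box<[u8]>>,
    heaps: Vec<Box<[u8]>>,
}

impl MemoryPool {
    fn new() -> Self {
        let region = |size: usize| vec![0u8; size].into_boxed_slice();
        Self {
            stacks: (0..MAX_INSTRUCTION_DEPTH).map(|_| region(MAX_STACK)).collect(),
            heaps: (0..MAX_INSTRUCTION_DEPTH).map(|_| region(MAX_HEAP)).collect(),
        }
    }
}

thread_local! {
    // Each thread lazily builds its own pool on first use.
    static POOL: RefCell<MemoryPool> = RefCell::new(MemoryPool::new());
}

/// Bytes pre-allocated per thread: (MAX_STACK + MAX_HEAP) * depth.
fn preallocated_bytes_per_thread() -> usize {
    (MAX_STACK + MAX_HEAP) * MAX_INSTRUCTION_DEPTH
}

/// Take a (stack, heap) pair from this thread's pool, or None if all
/// nesting levels are already in use.
fn take_regions() -> Option<(Box<[u8]>, Box<[u8]>)> {
    POOL.with(|pool| {
        let mut pool = pool.borrow_mut();
        if pool.stacks.is_empty() || pool.heaps.is_empty() {
            return None;
        }
        Some((pool.stacks.pop()?, pool.heaps.pop()?))
    })
}

/// Hand a (stack, heap) pair back to this thread's pool for reuse.
fn return_regions(stack: Box<[u8]>, heap: Box<[u8]>) {
    POOL.with(|pool| {
        let mut pool = pool.borrow_mut();
        pool.stacks.push(stack);
        pool.heaps.push(heap);
    });
}
```

With the assumed limits, preallocated_bytes_per_thread() evaluates to
(256 KiB + 256 KiB) * 5 = 2,621,440 bytes, i.e. 2.5 MiB. After the first
use on a thread, taking and returning regions never touches the global
allocator.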
* programs/sbf: add test for stack/heap zeroing
Add TEST_STACK_HEAP_ZEROED which tests that stack and heap regions are
zeroed across reuse from the memory pool.
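The property under test can be sketched as follows. This is an
illustrative stand-in, not the TEST_STACK_HEAP_ZEROED program itself:
RegionPool and its methods are hypothetical, and zeroing on return
(rather than on the next acquire) is an assumption made for the sketch.

```rust
/// Hypothetical pool of reusable byte regions.
struct RegionPool {
    free: Vec<Box<[u8]>>,
}

impl RegionPool {
    fn new(count: usize, size: usize) -> Self {
        Self {
            free: (0..count)
                .map(|_| vec![0u8; size].into_boxed_slice())
                .collect(),
        }
    }

    /// Hand out a region, or None if the pool is exhausted.
    fn get(&mut self) -> Option<Box<[u8]>> {
        self.free.pop()
    }

    /// Return a region to the pool, clearing it first so the next user
    /// cannot observe data left behind by the previous one.
    fn put(&mut self, mut region: Box<[u8]>) {
        region.fill(0);
        self.free.push(region);
    }
}

/// Dirty a region, return it, take it again, and report whether every
/// byte of the reused region is zero.
fn reused_region_is_zeroed() -> bool {
    let mut pool = RegionPool::new(1, 64);
    let mut region = pool.get().expect("pool starts with one region");
    region.fill(0xAA); // simulate a program writing to its stack or heap
    pool.put(region);
    let reused = pool.get().expect("region was returned to the pool");
    reused.iter().all(|&byte| byte == 0)
}
```

With a single-region pool the second get() is guaranteed to hand back
the same buffer, so the check exercises reuse rather than a fresh
allocation.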