* Add fully-reproducible online tracer for banking
* Don't use eprintln!()...
* Update programs/sbf/Cargo.lock...
* Remove meaningless assert_eq
* Group test-only code under aptly named mod
* Remove needless overflow handling in receive_until
* Delay stat aggregation now that it's possible
* Use Cow to avoid needless heap allocs
* Properly consume metrics action as soon as hold
* Trace UnprocessedTransactionStorage::len() instead
* Loosen joining API over type safety for ReplayStage
* Introduce hash event to override these when simulating
* Use serde_with/serde_as instead of hacky workaround
* Update another Cargo.lock...
* Add detailed comment for Packet::buffer serialize
* Rename sender_overhead_minimized_receiver_loop()
* Use type inference for TraceError
* Another minor rename
* Retire now useless ForEach to simplify code
* Use type alias as much as possible
* Properly translate and propagate tracing errors
* Clarify --enable-banking-trace with better naming
* Consider unclean (signal-based) node restarts..
* Tweak logging and cli
* Remove Bank events as it's not needed anymore
* Make tpu own banking tracer thread
* Reduce diff a bit..
* Use latest serde_with
* Finally use the published rolling-file crate
* Make test code change more consistent
* Revive dead and non-terminating test code path...
* Dispose of batches early now that it's possible
* Split off thread handle very early at ::new()
* Tweak message for TooSmallDirByteLimit
* Remove excessive indirection
* Remove needless pub from ::channel()
* Clarify test comments
* Avoid needless event creation if tracer is disabled
* Write tests around file rotation and spill-over
* Remove unneeded PathBuf::clone()s...
* Introduce inner struct instead of tuple...
* Remove unused enum BankStatus...
* Avoid .unwrap() for the case of disabled tracer...
* introduce workspace.package
* introduce workspace.dependencies
* read version from root Cargo.toml
* pass check when version = { workspace = true }
* don't bump version when version = { workspace = true }
* include workspace Cargo.toml when bumping version
* programs/sbf: use workspace inheritance
* fix cargo version bump to ignore programs/sbf/Cargo.toml
Currently, the cleanup service counts the number of shreds in the
database by iterating the entire SlotMeta column and reading the number
of received shreds for each slot. This gives us a fairly accurate count
at the expense of performing a good amount of IO.
Instead of counting the individual slots, use the live_files()
rust-rocksdb entrypoint that we expose in Blockstore. This API allows us
to get the number of entries (shreds) in the data shred column family by
reading file metadata. This is much more efficient from an IO perspective.
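
A minimal sketch of the idea, assuming the rust-rocksdb crate; the
"data_shred" column family name and the error handling here are
assumptions, not the Blockstore's exact code:

```rust
use rocksdb::DB;

fn count_data_shreds(db: &DB) -> Result<u64, rocksdb::Error> {
    // live_files() exposes per-SST-file metadata, so no column scan is needed.
    let total = db
        .live_files()?
        .iter()
        .filter(|file| file.column_family_name == "data_shred")
        .map(|file| file.num_entries.saturating_sub(file.num_deletions))
        .sum();
    Ok(total)
}
```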
* Increase turbine propagation const
The value is used as a delay threshold for issuing shred repairs, and analysis shows we are overly aggressive in requesting repairs. Shreds show up via turbine before the repair completes the vast majority of the time.
* Use Duration type for MAX_TURBINE_PROPAGATION
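
A minimal sketch of the shape of that change; the constant name, value,
and helper below are illustrative rather than the actual repair code:

```rust
use std::time::Duration;

// Illustrative threshold; the real constant and value live in the repair code.
const MAX_TURBINE_PROPAGATION_DELAY: Duration = Duration::from_millis(100);

fn should_request_repair(time_since_shred_expected: Duration) -> bool {
    // Only fall back to repair once turbine has had enough time to deliver.
    time_since_shred_expected > MAX_TURBINE_PROPAGATION_DELAY
}
```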
Store non-vote transaction counts that are now recorded by the banks
into the `blockstore`.
`SamplePerformanceService` now populates `PerfSampleV2` with counts from
the banks.
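
A rough sketch of the bookkeeping; the struct fields and the Snapshot
helper below approximate PerfSampleV2 and the bank counters rather than
quoting their exact definitions:

```rust
// Field names approximate the blockstore's PerfSampleV2, not its exact definition.
struct PerfSampleV2 {
    num_transactions: u64,
    num_non_vote_transactions: u64,
    num_slots: u64,
    sample_period_secs: u16,
}

// Hypothetical snapshot of bank counters taken at each sampling tick.
struct Snapshot {
    transaction_count: u64,
    non_vote_transaction_count: u64,
    slot: u64,
}

fn sample(prev: &Snapshot, curr: &Snapshot, sample_period_secs: u16) -> PerfSampleV2 {
    PerfSampleV2 {
        num_transactions: curr.transaction_count.saturating_sub(prev.transaction_count),
        num_non_vote_transactions: curr
            .non_vote_transaction_count
            .saturating_sub(prev.non_vote_transaction_count),
        num_slots: curr.slot.saturating_sub(prev.slot),
        sample_period_secs,
    }
}
```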
{verify,sign}_shreds_gpu need to point to offsets within the packets for
the signed data. For merkle shreds, this signed data is the merkle root
of the erasure batch, which would necessitate embedding the merkle roots
in the shreds' payload.
However, this is wasteful and reduces the shreds' capacity to store data,
because the merkle root can already be recovered from the encoded merkle
proof.
Instead of pointing to offsets within the shreds' payload, this commit
recovers merkle roots from the merkle proofs and stores them in an
allocated buffer. {verify,sign}_shreds_gpu then point to offsets within
this new buffer for the respective signed data.
This unblocks removing merkle roots from the shreds' payload, which would
free capacity to send more data with each shred.
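
A hedged sketch of that approach, assuming the sha2 crate and a
simplified (leaf hash, leaf index, proof) representation of a shred
instead of the real shred layout:

```rust
use sha2::{Digest, Sha256};

/// Walk a merkle proof from a leaf up to the root, hashing sibling pairs.
fn recover_root(leaf: [u8; 32], leaf_index: usize, proof: &[[u8; 32]]) -> [u8; 32] {
    let mut node = leaf;
    let mut index = leaf_index;
    for sibling in proof {
        let (left, right) = if index % 2 == 0 {
            (&node, sibling)
        } else {
            (sibling, &node)
        };
        let mut hasher = Sha256::new();
        hasher.update(left);
        hasher.update(right);
        node.copy_from_slice(&hasher.finalize());
        index /= 2;
    }
    node
}

/// Recover one root per shred into a single flat buffer and record per-shred
/// offsets that the GPU sigverify code could then point at for the signed data.
fn collect_signed_data(shreds: &[([u8; 32], usize, Vec<[u8; 32]>)]) -> (Vec<u8>, Vec<usize>) {
    let mut buffer = Vec::with_capacity(shreds.len() * 32);
    let mut offsets = Vec::with_capacity(shreds.len());
    for (leaf, index, proof) in shreds {
        offsets.push(buffer.len());
        buffer.extend_from_slice(&recover_root(*leaf, *index, proof));
    }
    (buffer, offsets)
}
```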
The commit adds an associated SignedData type to the Shred trait so that
merkle and legacy shreds can return different types from the signed_data
method.
This would allow legacy shreds to point to a section of the shred
payload, whereas merkle shreds would compute and return the merkle root.
Ultimately this would allow removing the merkle root from the shreds'
binary layout.
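
A minimal sketch of the associated-type idea; the trait, structs, and
method bodies below are illustrative stand-ins, not the actual shred
module definitions:

```rust
trait SignableShred<'a> {
    // Legacy shreds can use a borrowed slice of their payload; merkle shreds
    // an owned 32-byte root.
    type SignedData: AsRef<[u8]>;
    fn signed_data(&'a self) -> Self::SignedData;
}

struct LegacyShred {
    payload: Vec<u8>,
}

struct MerkleShred {
    payload: Vec<u8>,
}

impl<'a> SignableShred<'a> for LegacyShred {
    type SignedData = &'a [u8];
    fn signed_data(&'a self) -> Self::SignedData {
        // Legacy shreds sign a range of the payload; the skipped prefix length
        // here is illustrative.
        &self.payload[self.payload.len().min(64)..]
    }
}

impl<'a> SignableShred<'a> for MerkleShred {
    type SignedData = [u8; 32];
    fn signed_data(&'a self) -> Self::SignedData {
        // A real implementation would compute the merkle root of the erasure
        // batch from the payload; a zeroed placeholder keeps the sketch short.
        let _ = &self.payload;
        [0u8; 32]
    }
}
```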
Merkle shreds within the same erasure batch have the same merkle root.
The root of the merkle tree is signed. So either the signatures match
or one fails sigverify, and the comparison of merkle roots is redundant.
If data is empty, make_shreds_from_data will now return one data shred
with empty data. This preserves invariants verified in tests regardless
of data size.
We currently use the is_connected field to signal to ReplayStage that a
slot has replayable updates. It was discovered that this functionality
is effectively broken, and that is_connected is never true. In order to
convey this information to ReplayStage more effectively, we need extra
state, so this PR changes the existing bool to bitflags with two bits.
From a compatibility standpoint, the is_connected bool was already
occupying one byte in the serialized SlotMeta in blockstore. Thus, the
change from a bool to bitflags still "fits" in that one-byte allotment.
If a client downgrades software and uses the same ledger, deserializing
the bitflags back into a bool could fail if the new bit is set. As such,
this PR introduces the second bit field but does not set it anywhere.
Once the cluster has widely adopted a software version containing this
PR, a subsequent change that actually sets and uses the new field can be
introduced.
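
A hedged sketch of the two-bit layout, assuming the bitflags crate; the
flag names and values below are illustrative:

```rust
use bitflags::bitflags;

bitflags! {
    struct ConnectedFlags: u8 {
        // Same byte that the old `is_connected: bool` occupied when serialized.
        const CONNECTED = 0b0000_0001;
        // Introduced but deliberately never set yet, so that downgraded
        // software deserializing this byte as a bool still sees 0 or 1.
        const PARENT_CONNECTED = 0b0000_0010;
    }
}

fn main() {
    let mut flags = ConnectedFlags::empty();
    assert!(!flags.contains(ConnectedFlags::CONNECTED));
    flags.insert(ConnectedFlags::CONNECTED);
    // As long as the reserved bit stays unset, the serialized byte remains a
    // valid bool for older software reading the same ledger.
    assert_eq!(flags.bits(), 1);
}
```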