Use bulk insertion to Postgres at startup to reduce time taken for initial snapshot restore for postgres plugin. Avoid duplicate writes of accounts at startup. Doing account plugin notification and indexing in parallel.
Improved error handling for postgres plugin to show the real db issues for debug purpose
Added more metrics for postgres plugin.
Refactored plugin centric code out to a sub module from accounts_db and added unit tests
* add filler accounts to bloat validator and predict failure
* assert no accounts match filler
* cleanup magic numbers
* panic if can't load from snapshot with filler accounts specified
* some renames
* renames
* into_par_iter
* clean filler accts, too
Summary of Changes
Create a plugin mechanism in the accounts update path so that accounts data can be streamed out to external data stores (be it Kafka or Postgres). The plugin mechanism allows
Data stores of connection strings/credentials to be configured,
Accounts with patterns to be streamed
PostgreSQL implementation of the streaming for different destination stores to be plugged in.
The code comprises 4 major parts:
accountsdb-plugin-intf: defines the plugin interface which concrete plugin should implement.
accountsdb-plugin-manager: manages the load/unload of plugins and provide interfaces which the validator can notify of accounts update to plugins.
accountsdb-plugin-postgres: the concrete plugin implementation for PostgreSQL
The validator integrations: updated streamed right after snapshot restore and after account update from transaction processing or other real updates.
The plugin is optionally loaded on demand by new validator CLI argument -- there is no impact if the plugin is not loaded.
When reconstructing the AccountsDb, if the storages came from full and
incremental snapshots generated on different nodes, it's possible that
the AppendVec IDs could overlap/have duplicates, which would cause the
reconstruction to fail.
This commit handles this issue by unconditionally remapping the
AppendVec ID for every AppendVec.
Fixes#17088
This commit adds high-level functions for creating and loading-from
incremental snapshots, plus all low-level functions required to perform
those tasks. This commit **does not** add taking incremental snapshots
as part of a running validator, nor starting up a node with an
incremental snapshot; just laying ground work.
Additionally, `snapshot_utils` and `serde_snapshot` have been
refactored to use a common code paths for the different snapshots.
Also of note, some renaming has happened:
1. Snapshots are now either `full_` or `incremental_` throughout the
codebase. If not specified, the code applies to both.
2. Bank snapshots now are called "bank snapshots"
(before they were called "slot snapshots", "bank snapshots", or
just "snapshots"). The one exception is within `Bank`, where they
are still just "snapshots", because they are already "bank
snapshots".
3. Snapshot archives now have `_archive` in the code. This
should clear up an ambiguity between bank snapshots and snapshot
archives.
1. Added both options for measuring space usage using total accounts usage and for individual store shrink ratio using an enum. Validator CLI options: --accounts-shrink-optimize-total-space and --accounts-shrink-ratio
2. Added code for selecting candidates based on total usage in a separate function select_candidates_by_total_usage
3. Added unit tests for the new functions added
4. The default implementations is kept at 0.8 shrink ratio with --accounts-shrink-optimize-total-space set to true
Fixes#17544
* Re-use accounts_db stores
Creating files and dropping mmap areas can be expensive
* Add test for storage finder
Can encounter an infinite loop when the store is too small, but
smaller than the normal store size.
* Fix storage finding
* Check for strong_count == 1
* try_recycle helper
* Move bank (de)serialisation logic from bank and snapshot_utils to serde_snapshot.
Add sanity assertions between genesis config and bank fields on deserialisation.
Atomically update atomic bool in quote_for_specialization_detection().
Use same genesis config when restoring snapshots in test cases.
* Tidy up namings and duplicate structs to version
* Apply struct renames to tests
* Update abi hashes
Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
* Multi-version snapshot support
* rustfmt
* Remove CLI options and runtime support for selection output snapshot version.
Address some clippy complaints.
* Muzzle clippy type complexity warning.
Despite clippy's suggestion, it is not currently possible to create type aliases
for traits and so everything within the 'Box<...>' cannot be type aliased.
This then leaves creating full blown traits, and either implementing
said traits by closure (somehow) or moving the closures into new structs
implementing said traits which seems a bit of a palaver.
Alternatively it is possible to define and use the type alias 'type ResultBox<T> = Result<Box<T>>'
which does seems rather pointless and not a great reduction in complexity but is enough to keep clippy happy.
In the end I simply went with squelching the clippy warning.
* Remove now unused Serialize/Deserialize trait implementations for AccountStorageEntry and AppendVec
* refactor versioned de/serialisers
* rename serde_utils to serde_snapshot
* move call to accounts_db.generate_index() back down to context_accountsdb_from_stream()
* update version 1.1.1 to 1.2.0
remove nested use of serialize_bytes
* cleanups
* Add back measurement of account storage entry serialization.
Remove construction of Vec and HashMap temporaries during serialization.
* consolidate serialisation test cases into serde_snapshot.
clean up leakage of implementation details in serde_snapshot.
* move short term / legacy snapshot code into child module
* add serialize_iter_as_tuple
* preliminary integration of following commit
commit 6d58b73c47294bfb93465d5a83cd2175660b6e6d
Author: Ryo Onodera <ryoqun@gmail.com>
Date: Wed May 20 14:02:02 2020 +0900
Confine snapshot 1.1 relic to versioned codepath
* refactored serde_snapshot, rustfmt
legacy accounts_db format now "owns" both leading u64s, legacy bank_rc format has none
* reduce type complexity (clippy)