solana

Commit Graph

Author	SHA1	Message	Date
Brooks Prumo	fd937548a0	Move SnapshotArchiveInfo and friends into its own module (#19114)	2021-08-08 07:57:06 -05:00
Brooks Prumo	00890957ee	Add snapshot_utils::bank_from_latest_snapshot_archives() (#18983) While reviewing PR #18565, an issue was brought up to refactor some code around verifying the bank after rebuilding from snapshots. A new top-level function has been added to get the latest snapshot archives and load the bank then verify. Additionally, new tests have been written and existing tests have been updated to use this new function. Fixes #18973 While resolving the issue, it became clear there was some additional low-hanging fruit this change enabled. Specifically, the functions `bank_to_xxx_snapshot_archive()` now return their respective `SnapshotArchiveInfo`. And on the flip side, `bank_from_snapshot_archives()` now takes `SnapshotArchiveInfo`s instead of separate paths and archive formats. This bundling simplifies bank rebuilding.	2021-08-06 20:16:06 -05:00
Michael Vines	397801a2d8	Extract tower storage details from Tower struct	2021-08-06 10:04:37 -07:00
behzad nouri	e4be00fece	falls back on working-bank if root-bank::epoch-staked-nodes is none bank.get_leader_schedule_epoch(shred_slot) is one epoch after epoch_schedule.get_epoch(shred_slot). At epoch boundaries, shred is already one epoch after the root-slot. So we need epoch-stakes 2 epochs ahead of the root. But the root bank only has epoch-stakes for one epoch ahead, and as a result looking up epoch staked-nodes from the root-bank fails. To be backward compatible with the current master code, this commit implements a fallback on working-bank if epoch staked-nodes obtained from the root-bank is none.	2021-08-05 21:47:33 +00:00
behzad nouri	eaf927cf49	allows only one thread to update cluster-nodes cache entry for an epoch If two threads simultaneously call into ClusterNodesCache::get for the same epoch, and the cache entry is outdated, then both threads recompute cluster-nodes for the epoch and redundantly overwrite each other. This commit wraps ClusterNodesCache entries in Arc<Mutex<...>>, so that when needed only one thread does the computations to update the entry.	2021-08-05 21:47:33 +00:00
behzad nouri	fb69f45f14	adds fallback & metric for when epoch staked-nodes are none	2021-08-05 21:47:33 +00:00
behzad nouri	50d0e830c9	unifies cluster-nodes computation & caching across turbine stages Broadcast-stage is using epoch_staked_nodes based on the same slot that shreds belong to: https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228 https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349 But retransmit-stage is using bank-epoch of the working-bank: https://github.com/solana-labs/solana/blob/19bd30262/core/src/retransmit_stage.rs#L272-L289 So the two are not consistent at epoch boundaries where some nodes may have a working bank (or similarly a root bank) lagging other nodes. As a result the node which obtains a packet may construct turbine broadcast tree inconsistently with its parent node in the tree and so some packets may fail to reach all nodes in the tree.	2021-08-05 21:47:33 +00:00
behzad nouri	aa32738dd5	uses cluster-nodes cache in broadcast-stage * Current caching mechanism does not update cluster-nodes when the epoch (and so epoch staked nodes) changes: https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344 * Additionally, the cache update has a concurrency bug in which the thread which does compare_and_swap may be blocked when it tries to obtain the write-lock on cache, while other threads will keep running ahead with the outdated cache (since the atomic timestamp is already updated). In the new ClusterNodesCache, entries are keyed by epoch, and so if epoch changes cluster-nodes will be recalculated. The time-to-live eviction policy is also encapsulated and rigidly enforced.	2021-08-05 21:47:33 +00:00
behzad nouri	30bec3921e	uses cluster-nodes cache in retransmit stage The new cluster-nodes cache will: * ensure cluster-nodes are recalculated if the epoch (and so the epoch staked nodes) changes. * encapsulate time-to-live eviction policy.	2021-08-05 21:47:33 +00:00
behzad nouri	ecc1c7957f	implements cluster-nodes cache Cluster nodes are cached keyed by the respective epoch from which stakes are obtained, and so if epoch changes cluster-nodes will be recomputed. A time-to-live eviction policy is enforced to refresh entries in case gossip contact-infos are updated.	2021-08-05 21:47:33 +00:00
behzad nouri	44b11154ca	sends slots (instead of stakes) through broadcast flow Current broadcast code is computing stakes for each slot before sending them down the channel: https://github.com/solana-labs/solana/blob/049fb0417/core/src/broadcast_stage/standard_broadcast_run.rs#L208-L228 https://github.com/solana-labs/solana/blob/0cf52e206/core/src/broadcast_stage.rs#L342-L349 Since the stakes are a function of the epoch the slot belongs to (and so do not necessarily change from one slot to another), forwarding the slot itself would allow better caching downstream. In addition we need to invalidate the cache if the epoch changes (which the current code does not do), and that requires knowing which slot (and so epoch) the currently broadcast shreds belong to: https://github.com/solana-labs/solana/blob/19bd30262/core/src/broadcast_stage/standard_broadcast_run.rs#L332-L344	2021-08-05 21:47:33 +00:00
Jeff Washington (jwash)	e368f10973	add _for_tests to new_no_wallclock_throttle (#19086)	2021-08-05 14:50:25 -05:00
Jeff Washington (jwash)	a9014ceceb	Bank::default_for_tests() (#19084)	2021-08-05 11:53:29 -05:00
behzad nouri	40914de811	updates cluster-slots with root-bank instead of root-slot + bank-forks ClusterSlots::update is taking both root-slot and bank-forks only to later lookup root-bank from bank-forks, which is redundant. Also potentially by the time bank-forks is locked to obtain root-bank, root-slot may have already changed and so be inconsistent with the root-slot passed in as the argument. https://github.com/solana-labs/solana/blob/6d95d679c/core/src/cluster_slots.rs#L32-L39 https://github.com/solana-labs/solana/blob/6d95d679c/core/src/cluster_slots.rs#L122	2021-08-05 14:43:06 +00:00
behzad nouri	2fc112edcf	removes unused code from cluster-slots	2021-08-05 14:43:06 +00:00
Jeff Washington (jwash)	bf16b0517c	add _for_tests to setup_bank_and_vote_pubkeys (#19060)	2021-08-05 08:43:35 -05:00
Jeff Washington (jwash)	14361906ca	for all tests, bank::new -> bank::new_for_tests (#19064)	2021-08-05 08:42:38 -05:00
Jeff Washington (jwash)	3280ae3e9f	add validator option --accounts-db-skip-shrink (#19028) * add validator option --accounts-db-skip-shrink * typo	2021-08-04 17:28:33 -05:00
Jeff Washington (jwash)	1ed12a07ab	introduce Bank::new_for_tests (#19062)	2021-08-04 15:06:57 -05:00
Brooks Prumo	ca14475085	Add incremental_snapshot_archive_interval_slots to SnapshotConfig (#19026) This commit also renames `snapshot_interval_slots` to `full_snapshot_archive_interval_slots`, updates the comments on the fields, and makes appropriate updates where SnapshotConfig is used.	2021-08-04 14:40:20 -05:00
Trent Nelson	06a7a9e544	remove superfluous `collect()`s	2021-08-04 07:21:55 +00:00
carllin	03353d500f	Actively manage dead slots in AncestorHashesService (#18912)	2021-08-02 14:33:28 -07:00
behzad nouri	049fb0417f	allows sendmmsg api taking owned values (as well as references) (#18999) Current signature of api in sendmmsg requires a slice of inner references: https://github.com/solana-labs/solana/blob/fe1ee4980/streamer/src/sendmmsg.rs#L130-L152 That forces the call-site to convert owned values to references even though doing so is redundant and adds an extra level of indirection: https://github.com/solana-labs/solana/blob/fe1ee4980/core/src/repair_service.rs#L291 This commit expands the api using AsRef and Borrow traits to allow calling the method with owned values (as well as references like before).	2021-07-30 20:58:49 +00:00
Tao Zhu	5d297ccf96	Cost model uses compute_unit to replace microsecond as cost unit (#18934) * wip - cost_update_services to log both us and cu for each instruction to determine possible ratio * replace microsecond with compute_unit as cost unit	2021-07-29 22:19:36 +00:00
Ryo Onodera	da480bdb5f	Fix unstable retransmit-num_nodes (#18970)	2021-07-29 17:32:32 +00:00
behzad nouri	d06dc6c8a6	shares cluster-nodes between retransmit threads (#18947) cluster_nodes and last_peer_update are not shared between retransmit threads, as each thread has its own value: https://github.com/solana-labs/solana/blob/65ccfed86/core/src/retransmit_stage.rs#L476-L477 Additionally, with shared references, this code: https://github.com/solana-labs/solana/blob/0167daa11/core/src/retransmit_stage.rs#L315-L328 has a concurrency bug where the thread which does compare_and_swap updates cluster_nodes much later, after other threads have run with outdated cluster_nodes for a while. In particular, the write-lock there may block.	2021-07-29 16:20:15 +00:00
Trent Nelson	71f6d839f9	validator: remove disused cuda config argument	2021-07-29 03:08:52 +00:00
Trent Nelson	8ed0cd0fff	validator: check target CPU features earlier	2021-07-29 03:08:52 +00:00
Trent Nelson	c435f7b3e3	validator: add avx2 runtime check	2021-07-29 03:08:52 +00:00
Trent Nelson	e641f257ef	test-validator: move feature check earlier in startup	2021-07-29 03:08:52 +00:00
Trent Nelson	59641623d1	Improve check for Apple M1 silicon under Rosetta	2021-07-29 03:08:52 +00:00
Jeff Biseda	9255ae334d	drop outstanding_requests lock before sending repair requests (#18893)	2021-07-28 19:30:43 -07:00
sakridge	84e78316b1	Write helper for multithread update (#18808)	2021-07-29 03:16:36 +02:00
Jack May	f1b9f97aef	remove avx error on macos (#18923)	2021-07-27 16:34:04 -07:00
carllin	c0704d4ec9	Plumb signal from replay to ancestor hashes service (#18880)	2021-07-26 20:59:00 -07:00
carllin	1ee64afb12	Introduce AncestorHashesService (#18812)	2021-07-23 16:54:47 -07:00
behzad nouri	d2d5f36a3c	adds validator flag to allow private ip addresses (#18850)	2021-07-23 15:25:03 +00:00
Ryo Onodera	611af87fdb	Really start caching by fixing swapped CAS... (#18842)	2021-07-23 10:17:19 +09:00
Brooks Prumo	d1debcd971	Add incremental snapshot utils (#18504) This commit adds high-level functions for creating and loading-from incremental snapshots, plus all low-level functions required to perform those tasks. This commit does not add taking incremental snapshots as part of a running validator, nor starting up a node with an incremental snapshot; just laying ground work. Additionally, `snapshot_utils` and `serde_snapshot` have been refactored to use common code paths for the different snapshots. Also of note, some renaming has happened: 1. Snapshots are now either `full_` or `incremental_` throughout the codebase. If not specified, the code applies to both. 2. Bank snapshots now are called "bank snapshots" (before they were called "slot snapshots", "bank snapshots", or just "snapshots"). The one exception is within `Bank`, where they are still just "snapshots", because they are already "bank snapshots". 3. Snapshot archives now have `_archive` in the code. This should clear up an ambiguity between bank snapshots and snapshot archives.	2021-07-22 14:40:37 -05:00
behzad nouri	7d56fa8363	sends packets in batches from sigverify-stage (#18446) sigverify-stage is breaking batches to single-item vectors before sending them down the channel: https://github.com/solana-labs/solana/blob/d451363dc/core/src/sigverify_stage.rs#L88-L92 Also simplifying window-service code, reducing number of nested branches.	2021-07-22 14:49:21 +00:00
Michael Vines	61865c0ee0	`solana-validator set-identity` now loads the tower file for the new identity	2021-07-21 22:22:08 -07:00
carllin	588c0464b8	Add sampling logic and DuplicateSlotRepairStatus module (#18721)	2021-07-21 11:15:08 -07:00
behzad nouri	bbd22f06f4	implements generic lookups into gossip crds table (#18765) This commit adds CrdsEntry trait which allows generic lookups into crds table. For example to get ContactInfo or LowestSlot associated with a Pubkey, the lookup code would be respectively: crds.get::<&ContactInfo>(pubkey) crds.get::<&LowestSlot>(pubkey)	2021-07-21 12:16:26 +00:00
carllin	ce467bea20	Add frozen hashes and marking DuplicateConfirmed in blockstore to state machine (#18648)	2021-07-18 17:04:25 -07:00
behzad nouri	e316586516	excludes private ip addresses	2021-07-16 20:05:48 -06:00
Jeff Biseda	ae5ad5cf9b	sendmmsg cleanup #18589 Rationalize usage of sendmmsg(2). Skip packets which failed to send and track failures.	2021-07-16 14:36:49 -07:00
Jack May	ca71ca3d6d	Accumulate consumed units (#18714)	2021-07-16 12:40:12 -07:00
Justin Starry	d166b9856a	Move transaction sanitization earlier in the pipeline (#18655) * Move transaction sanitization earlier in the pipeline * Renamed HashedTransaction to SanitizedTransaction * Implement deref for sanitized transaction * bring back process_transactions test method * Use sanitized transactions for cost model calculation	2021-07-15 22:51:27 -05:00
carllin	8a846b048e	Add AncestorHashesRepair type (#18681)	2021-07-15 19:29:53 -07:00
Trent Nelson	3a85b77bb5	hijack secp256k1 enablement feature plumbing for libsecp256k1 upgrade	2021-07-15 18:43:55 +00:00
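
Commits ecc1c7957f and eaf927cf49 describe a cache keyed by epoch, with time-to-live eviction and each entry wrapped in Arc<Mutex<...>> so that only one thread recomputes an outdated entry. The following is a minimal sketch of that idea, not the code from those commits: `EpochCache`, `Entry`, and the `compute` closure are hypothetical names, and a plain `HashMap` stands in for whatever container the real implementation uses.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::time::{Duration, Instant};

type Epoch = u64;
// Each entry has its own lock; `None` means "not computed yet".
type Entry<T> = Arc<Mutex<Option<(Instant, Arc<T>)>>>;

// Hypothetical cache: values are keyed by epoch and expire after `ttl`.
struct EpochCache<T> {
    ttl: Duration,
    entries: Mutex<HashMap<Epoch, Entry<T>>>,
}

impl<T> EpochCache<T> {
    fn new(ttl: Duration) -> Self {
        Self {
            ttl,
            entries: Mutex::new(HashMap::new()),
        }
    }

    fn get(&self, epoch: Epoch, compute: impl FnOnce() -> T) -> Arc<T> {
        // The map lock is held only for this statement, long enough to
        // fetch or create the entry for this epoch.
        let entry: Entry<T> = self.entries.lock().unwrap().entry(epoch).or_default().clone();
        // The per-entry lock serializes threads asking for the same epoch:
        // the first one computes, the others wait and then see a fresh value.
        let mut slot = entry.lock().unwrap();
        if let Some((created, value)) = &*slot {
            if created.elapsed() < self.ttl {
                return Arc::clone(value);
            }
        }
        let value = Arc::new(compute());
        *slot = Some((Instant::now(), Arc::clone(&value)));
        value
    }
}
```

Because a new epoch maps to a new key, a change of epoch forces a recomputation without any explicit invalidation step, while the time-to-live covers updates that happen within an epoch.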
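
Commit bbd22f06f4 mentions lookups of the form crds.get::<&ContactInfo>(pubkey), where the requested type selects which data is returned. One way such a trait-driven lookup could be arranged is sketched below. This is an illustration only: `Table`, `TableEntry`, the string keys, and the fields of the two value types are all assumptions made for the example and are not taken from the gossip code.

```rust
use std::collections::HashMap;

// Hypothetical value types stored per key.
struct ContactInfo {
    port: u16,
}
struct LowestSlot(u64);

// Hypothetical table holding one map per value type.
#[derive(Default)]
struct Table {
    contact_infos: HashMap<String, ContactInfo>,
    lowest_slots: HashMap<String, LowestSlot>,
}

// A type implementing this trait knows how to find itself in the table.
trait TableEntry<'a>: Sized {
    fn get_entry(table: &'a Table, key: &str) -> Option<Self>;
}

impl<'a> TableEntry<'a> for &'a ContactInfo {
    fn get_entry(table: &'a Table, key: &str) -> Option<Self> {
        table.contact_infos.get(key)
    }
}

impl<'a> TableEntry<'a> for &'a LowestSlot {
    fn get_entry(table: &'a Table, key: &str) -> Option<Self> {
        table.lowest_slots.get(key)
    }
}

impl Table {
    // The requested type `V` picks the implementation, so a single `get`
    // method serves every kind of entry.
    fn get<'a, V: TableEntry<'a>>(&'a self, key: &str) -> Option<V> {
        V::get_entry(self, key)
    }
}
```

With this arrangement, `table.get::<&ContactInfo>("alice")` and `table.get::<&LowestSlot>("alice")` go through the same method but read from different maps.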

2136 Commits
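
Commit 049fb0417f says the sendmmsg api was widened with the AsRef and Borrow traits so that callers can pass owned values as well as references. The sketch below shows how such bounds let one function accept both. `describe_batch` is a hypothetical helper that only reports payload lengths and destinations; it sends nothing and is not the signature from that commit.

```rust
use std::borrow::Borrow;
use std::net::SocketAddr;

// Hypothetical helper: accepts (payload, destination) pairs whose parts may
// be owned values or references, and returns (payload length, destination).
fn describe_batch<S, T>(packets: &[(T, S)]) -> Vec<(usize, SocketAddr)>
where
    S: Borrow<SocketAddr>,
    T: AsRef<[u8]>,
{
    packets
        .iter()
        .map(|(data, addr)| {
            // Both conversions are no-ops for references and cheap borrows
            // for owned values, so no extra indirection is forced on callers.
            let bytes: &[u8] = data.as_ref();
            let dest: &SocketAddr = addr.borrow();
            (bytes.len(), *dest)
        })
        .collect()
}
```

The same function can be called with a `Vec<(Vec<u8>, SocketAddr)>` or with a `Vec<(&[u8], &SocketAddr)>`, which is the flexibility the commit message describes.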