Remove recursive read lock that could deadlock Blockstore (#30203)
This deadlock could only occur on nodes that call Blockstore::get_rooted_block(). Regular validators don't call this function; RPC nodes and nodes that have BigTableUploadService enabled do.

Blockstore::get_rooted_block() grabs a read lock on lowest_cleanup_slot right at the start to check whether the block has been cleaned up, and to ensure it doesn't get cleaned up during execution. As part of the callstack of get_rooted_block(), Blockstore::get_completed_ranges() gets called, which also grabs a read lock on lowest_cleanup_slot. If LedgerCleanupService attempts to grab a write lock between the two read lock calls, we could hit a deadlock if priority is given to the write lock request: the second read request waits behind the pending writer, while the writer waits for the first read lock to be released.

This change removes the call to get the read lock in get_completed_ranges(). The lock was only held for the scope of this function, which is a single rocksdb read, and is thus not needed.

This does mean that a different error will be returned if the requested slot was below lowest_cleanup_slot. Previously, a BlockstoreError::SlotCleanedUp would have been returned; the rocksdb error will be bubbled up now.

Note that callers of get_rooted_block() will still get the SlotCleanedUp error when appropriate because get_rooted_block() grabs the lock. If the slot is unavailable, it will return immediately. If the slot is available, get_rooted_block() holding the lock means the slot will remain available.
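The locking pattern described above can be sketched in isolation. This is a toy model, not the Blockstore code: the `Store` type, its `get_block()` and `get_meta()` methods, and the `StoreError` variants are hypothetical stand-ins for get_rooted_block(), get_completed_ranges(), and BlockstoreError. It assumes `std::sync::RwLock`, whose documentation warns that `read()` may panic or deadlock when the current thread already holds the lock. The outer function takes the read lock once and holds it for the whole call; the inner helper does not lock again.

```rust
use std::collections::BTreeMap;
use std::sync::RwLock;

// Hypothetical error type standing in for BlockstoreError.
#[derive(Debug, PartialEq)]
enum StoreError {
    SlotCleanedUp,
    SlotMissing,
}

// Hypothetical toy store; the real Blockstore is backed by rocksdb.
struct Store {
    lowest_cleanup_slot: RwLock<u64>,
    slots: BTreeMap<u64, String>,
}

impl Store {
    // Inner helper: a single lookup that takes no lock. If it re-acquired
    // the read lock while the caller already held one, a writer queued in
    // between could block this second read forever.
    fn get_meta(&self, slot: u64) -> Result<&String, StoreError> {
        self.slots.get(&slot).ok_or(StoreError::SlotMissing)
    }

    // Outer entry point: takes the read lock once and keeps the guard
    // alive until the function returns, so cleanup cannot advance past
    // the requested slot while the call is in progress.
    fn get_block(&self, slot: u64) -> Result<String, StoreError> {
        let lowest = self.lowest_cleanup_slot.read().unwrap();
        if slot < *lowest {
            return Err(StoreError::SlotCleanedUp);
        }
        let meta = self.get_meta(slot)?;
        Ok(meta.clone())
    }
}
```

In this sketch, calling the helper directly for a cleaned-up slot yields the lookup error rather than `SlotCleanedUp`, which mirrors the change in error behavior noted above; going through `get_block()` still reports `SlotCleanedUp`.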
parent 2dbe36e82a
commit 328b674edc
@@ -2900,8 +2900,6 @@ impl Blockstore {
         slot: Slot,
         start_index: u64,
     ) -> Result<(CompletedRanges, Option<SlotMeta>)> {
-        let _lock = self.check_lowest_cleanup_slot(slot)?;
-
         let slot_meta_cf = self.db.column::<cf::SlotMeta>();
         let slot_meta = slot_meta_cf.get(slot)?;
         if slot_meta.is_none() {