Remove recursive read lock that could deadlock Blockstore (#30203)

This deadlock could only occur on nodes that call
Blockstore::get_rooted_block(). Regular validators don't call this
function, RPC nodes and nodes that have BigTableUploadService enabled
do.

Blockstore::get_rooted_block() grabs a read lock on lowest_cleanup_slot
right at the start to check if the block has been cleaned up, and to
ensure it doesn't get cleaned up during execution. As part of the
callstack of get_rooted_block(), Blockstore::get_completed_ranges() will
get called, which also grabs a read lock on lowest_cleanup_slot.

If LedgerCleanupService attempts to grab a write lock between the read
lock calls, we could hit a deadlock if priority is given to the write
lock request in this scenario.

This change removes the call to get the read lock in
get_completed_ranges(). The lock is only held for the scope of this
function, which is a single rocksdb read and thus not needed. This does
mean that a different error will be returned if the requested slot was
below lowest_cleanup_slot. Previously, a BlockstoreError::SlotCleanedUp
would have been thrown; the rocksdb error will be bubbled up now. Note
that callers of get_rooted_block() will still get the SlotCleanedUp
error when appropriate because get_rooted_block() grabs the lock. If the
slot is unavailable, it will return immediately. If the slot is
available, get_rooted_block() holding the lock means the slot will
remain available.
This commit is contained in:
steviez 2023-02-13 17:33:24 -06:00 committed by GitHub
parent 2dbe36e82a
commit 328b674edc
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 0 additions and 2 deletions

View File

@ -2900,8 +2900,6 @@ impl Blockstore {
slot: Slot,
start_index: u64,
) -> Result<(CompletedRanges, Option<SlotMeta>)> {
let _lock = self.check_lowest_cleanup_slot(slot)?;
let slot_meta_cf = self.db.column::<cf::SlotMeta>();
let slot_meta = slot_meta_cf.get(slot)?;
if slot_meta.is_none() {