DragonFlyBSD Kernel Audit
โ† dashboard
DF-0047

mtx_wait_link lock-leak race: chain can grant lock during mtx_delete_link window, caller returns error despite holding the lock (permanent deadlock)

Field       Value
ID          DF-0047
Status      new
Severity    Medium
CVSS 3.1    CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:H
CWE         CWE-362 Race Condition; CWE-667 Improper Locking
File        sys/kern/kern_mutex.c
Lines       1002-1028 (mtx_wait_link), 925-941 (mtx_delete_link default)
Area        kern
Confidence  likely
Discovered  2026-06-29
Reported    pending

Summary

When mtx_wait_link's tsleep() returns a non-zero error (EINTR from a PCATCH signal, or EWOULDBLOCK from a timeout) and link->state is still MTX_LINK_LINKED_* at the unlocked switch read (:1002), the code calls mtx_delete_link (:1012). Between that read and mtx_delete_link's acquisition of MTX_LINKSPIN, a concurrent mtx_chain_link_ex/sh (triggered by another CPU's _mtx_unlock) can grant the lock to this exact link: it removes the link, sets mtx->mtx_owner = link->owner and link->state = MTX_LINK_ACQUIRED, and issues the wakeup. mtx_delete_link then sees ACQUIRED and hits its default "no change" case (:935). But mtx_wait_link never re-checks link->state after mtx_delete_link returns: it falls through to the default (:1014-1016, preserving the non-zero error), overwrites state to IDLE (:1023), and returns EINTR/EWOULDBLOCK even though the mutex is now exclusively held by the caller's thread. The caller, seeing a non-zero return, does not call mtx_unlock, so the lock is permanently leaked and every subsequent acquisition deadlocks.

Root cause

sys/kern/kern_mutex.c:1002-1028:

switch(link->state) {                       /* :1002  UNLOCKED read */
case MTX_LINK_ACQUIRED:
case MTX_LINK_CALLEDBACK:
    error = 0;
    break;
case MTX_LINK_ABORTED:
    error = ENOLCK;
    break;
case MTX_LINK_LINKED_EX:
case MTX_LINK_LINKED_SH:
    mtx_delete_link(mtx, link);             /* :1012  may race chain grant */
    /* fall through */
default:
    if (error == 0)                         /* :1015  no re-check of state */
        error = EWOULDBLOCK;
    break;
}
link->state = MTX_LINK_IDLE;                /* :1023  clobbers ACQUIRED */
return error;                               /* :1028  returns non-zero despite owning lock */

mtx_delete_link correctly handles seeing ACQUIRED (default "no change" :935-937, since the chain already removed the link), but mtx_wait_link does not re-check link->state afterward. tsleep can return EINTR even when the wakeup was also called (a pending signal sets error=EINTR).
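
The interleaving described above can be replayed deterministically in user space. The following is a minimal sketch, not kernel code: every `model_*` name, the reduced state enum, and the owner values 7 and 42 are hypothetical stand-ins, and the racing grant is injected by a flag at the point where the other CPU would win the race. It only mirrors the transitions this report attributes to mtx_chain_link_ex, mtx_delete_link, and the tail of mtx_wait_link.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's MTX_LINK_* states. */
enum model_state { MODEL_IDLE, MODEL_LINKED_EX, MODEL_ACQUIRED, MODEL_ABORTED };

struct model_link { enum model_state state; int owner; };
struct model_mtx  { int owner; struct model_link *exlink; };

/* Chain grant as described in the report: unlink, set owner, mark ACQUIRED. */
static void model_chain_grant(struct model_mtx *m)
{
    struct model_link *l = m->exlink;

    if (l == NULL)
        return;
    m->exlink = NULL;
    m->owner = l->owner;
    l->state = MODEL_ACQUIRED;
}

/* Unlinks a still-LINKED link; any other state is the "no change" default. */
static void model_delete_link(struct model_mtx *m, struct model_link *l)
{
    if (l->state == MODEL_LINKED_EX && m->exlink == l)
        m->exlink = NULL;
}

/*
 * Tail of the wait path after the sleep returned sleep_error.
 * grant_in_window != 0 injects the racing grant between the unlocked
 * state read and the delete, which is the window this finding is about.
 */
static int model_wait_tail(struct model_mtx *m, struct model_link *l,
                           int sleep_error, int grant_in_window)
{
    int error = sleep_error;

    switch (l->state) {                 /* unlocked read */
    case MODEL_ACQUIRED:
        error = 0;
        break;
    case MODEL_ABORTED:
        error = ENOLCK;
        break;
    case MODEL_LINKED_EX:
        if (grant_in_window)
            model_chain_grant(m);       /* other CPU wins the race here */
        model_delete_link(m, l);        /* sees ACQUIRED, changes nothing */
        /* fall through */
    default:
        if (error == 0)
            error = EWOULDBLOCK;
        break;
    }
    l->state = MODEL_IDLE;              /* clobbers ACQUIRED */
    return error;
}

/* Returns 1 if the waiter got an error back while owning the mutex. */
int model_replay_leak(int grant_in_window)
{
    struct model_link l = { MODEL_LINKED_EX, 42 };  /* waiter id 42 */
    struct model_mtx m = { 7, &l };                 /* held by id 7 */
    int error = model_wait_tail(&m, &l, EINTR, grant_in_window);

    assert(l.state == MODEL_IDLE);
    return error != 0 && m.owner == l.owner;
}
```

Calling `model_replay_leak(1)` replays the racing grant and reports the leak (error returned, waiter recorded as owner); `model_replay_leak(0)` replays the ordinary interrupted wait, where the link is simply unlinked and ownership does not change.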

Threat model & preconditions

  • Attacker position: any local user on a code path that takes a mutex with PCATCH or a timeout. The primary in-tree caller is the NFS client socket lock (mtx_lock_ex_link with slpflag=PCATCH, slptimeo=2*hz at sys/vfs/nfs/nfs_socket.c:2184). A local user doing NFS I/O who receives a signal (Ctrl-C / SIGINT) at the moment the socket lock is being released by another thread can hit the race.
  • Privileges gained or impact: permanent kernel deadlock (local DoS). The leaked lock deadlocks all subsequent operations on that NFS mount, requiring a reboot to clear. Not remote on its own (needs a local signal target); the NFS path is reachable from any local user with NFS access.
  • Required config or capabilities: a contended mtx_lock_*_link caller with PCATCH/timeout (NFS); a local signal target.
  • Reachability: contended mutex acquisition + signal/timeout racing the unlock/chain-grant on another CPU.

Proof of concept (sketch)

  1. Local user mounts NFS and issues concurrent reads that contend on the NFS socket lock.
  2. One thread blocks in mtx_wait_link inside nfs_rcvlock/nfs_sndlock (PCATCH).
  3. Send SIGINT to the process. tsleep returns EINTR.
  4. The lock holder releases; _mtx_unlock → mtx_chain_link_ex grants the lock to the signaled thread during its mtx_delete_link window.
  5. mtx_wait_link returns EINTR; the NFS code treats it as not-acquired and does not mtx_unlock.
  6. The socket lock is now permanently held (MTX_EXCLUSIVE|1); all future NFS send/receive on that mount deadlocks. Repeat 1-3 to raise hit probability.

Impact

Permanent kernel deadlock (local DoS) via a lock-leak race on PCATCH/timeout mutex acquisitions. Medium (AC:H = the race window; A:H = a full permanent deadlock).

Suggested fix

Re-check link->state for ACQUIRED after mtx_delete_link returns in the LINKED case, and return success so the caller unlocks:

--- a/sys/kern/kern_mutex.c
+++ b/sys/kern/kern_mutex.c
@@ -1010,6 +1010,18 @@
    case MTX_LINK_LINKED_EX:
    case MTX_LINK_LINKED_SH:
        mtx_delete_link(mtx, link);
+       /*
+        * mtx_chain_link_{ex,sh}() may have granted us the lock
+        * (state -> ACQUIRED) while we were spinning on LINKSPIN
+        * inside mtx_delete_link().  If so we now own the lock and
+        * MUST return success so the caller unlocks it; otherwise
+        * the lock is silently leaked, deadlocking all future
+        * acquisitions.
+        */
+       if (link->state == MTX_LINK_ACQUIRED) {
+           error = 0;
+           break;
+       }
        /* fall through */
    default:
        if (error == 0)
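
The effect of the patched branch can be written as a small decision function and checked in user space. This is a hedged sketch under stated assumptions, not the kernel change itself: the `sketch_*` names and the reduced state enum are hypothetical, and the `after_delete` argument stands for link->state as re-read once mtx_delete_link has returned.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the kernel's MTX_LINK_* states. */
enum sketch_state { SKETCH_IDLE, SKETCH_LINKED, SKETCH_ACQUIRED, SKETCH_ABORTED };

/*
 * Result of the LINKED branch with the re-check applied.
 * after_delete: state re-read after the delete call returned.
 * sleep_error:  value the interrupted or timed-out sleep returned.
 */
int sketch_linked_branch_result(enum sketch_state after_delete, int sleep_error)
{
    if (after_delete == SKETCH_ACQUIRED)
        return 0;                       /* we own the lock: report success */
    return sleep_error ? sleep_error : EWOULDBLOCK;
}

/*
 * Hypothetical caller that unlocks only on a zero return.
 * Returns 1 if the lock would still be held afterwards (a leak).
 */
int sketch_caller_leaks(enum sketch_state after_delete, int sleep_error)
{
    int held = (after_delete == SKETCH_ACQUIRED);   /* the grant made us owner */

    if (sketch_linked_branch_result(after_delete, sleep_error) == 0)
        held = 0;                       /* critical section, then unlock */
    return held;
}

/* Exercises the cases discussed in this finding. */
void sketch_self_check(void)
{
    assert(sketch_linked_branch_result(SKETCH_ACQUIRED, EINTR) == 0);
    assert(sketch_linked_branch_result(SKETCH_LINKED, EINTR) == EINTR);
    assert(sketch_linked_branch_result(SKETCH_LINKED, 0) == EWOULDBLOCK);
    assert(sketch_caller_leaks(SKETCH_ACQUIRED, EINTR) == 0);
}
```

With the re-check in place, the racing-grant case returns 0, so a caller that unlocks on success no longer leaves the lock held; the ordinary interrupted and timed-out cases keep their original error values.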

Timeline

  • 2026-06-29 Discovered during automated file-by-file audit of sys/kern/kern_mutex.c.
  • pending Reported to DragonFlyBSD security contact.