mount_get_by_nc returns struct mount without a hold -> use-after-free via cache_fullpath racing dounmount
| Field | Value |
|---|---|
| ID | DF-0044 |
| Status | new |
| Severity | Medium |
| CVSS 3.1 | CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:L/I:L/A:H |
| CWE | CWE-416 Use After Free; CWE-820 Missing Synchronization |
| File | sys/kern/vfs_mount.c (mount_get_by_nc); caller sys/kern/vfs_cache.c (cache_fullpath) |
| Lines | 1235-1248 (mount_get_by_nc), 413-424 (vfs_getvfs contrast) |
| Area | kern |
| Confidence | likely |
| Discovered | 2026-06-29 |
| Reported | pending |
Summary
mount_get_by_nc() looks up a mount by namecache handle under
mountlist_token (shared) but returns the mp without calling
mount_hold(mp), unlike the otherwise-identical vfs_getvfs()
(:413-424) which mount_holds before releasing the token. Its sole caller,
cache_fullpath() (sys/kern/vfs_cache.c:5214), dereferences the returned
pointer (new_mp->mnt_ncmounton at :5224) after the token has been released
and never mount_drop()s it. A concurrent dounmount() that frees the mount
(mount_drop -> kfree) turns that dereference into a use-after-free. The
caller's _cache_hold on the ncp pins the namecache node but not the mount
struct, so nothing prevents the free.
Root cause
sys/kern/vfs_mount.c:1235-1248:
    struct mount *
    mount_get_by_nc(struct namecache *ncp)
    {
        struct mount *mp = NULL;

        lwkt_gettoken_shared(&mountlist_token);
        TAILQ_FOREACH(mp, &mountlist, mnt_list) {
            if (ncp == mp->mnt_ncmountpt.ncp)
                break;
        }
        lwkt_reltoken(&mountlist_token);    /* :1245 mp not held */
        return (mp);                        /* :1247 unheld pointer */
    }
Contrast vfs_getvfs (:413-424), whose comment (:410-411) states "the
returned mp is held and the caller is expected to drop it via mount_drop()", and
which does if (mp) mount_hold(mp); (:420-421) before the token release. The
caller cache_fullpath then derefs new_mp->mnt_ncmounton
(sys/kern/vfs_cache.c:5224) and follows nch.ncp into _cache_hold
(:5228), all inside the window that opens when the token is released. The
free side is dounmount: after mountlist_remove and cache_zero, it spins
until mnt_refs == 0, then mount_drop(mp) -> kfree.
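The hold contract can be illustrated with a small userland model. This is a sketch only: model_mount, model_insert, model_lookup, model_drop, and model_unmount are hypothetical names, a pthread mutex stands in for mountlist_token, a single refs counter stands in for the kernel's hold accounting, and a freed flag replaces the real kfree so the outcome can be observed safely.

```c
/* Userland model only; none of these names are DragonFly kernel APIs. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct model_mount {
    int key;                    /* stands in for mnt_ncmountpt.ncp */
    int refs;                   /* stands in for the hold count */
    int freed;                  /* set instead of calling free() */
    struct model_mount *next;
};

static pthread_mutex_t model_lock = PTHREAD_MUTEX_INITIALIZER;
static struct model_mount *model_list;

/* Link mp into the list; the list itself owns one reference. */
static void
model_insert(struct model_mount *mp, int key)
{
    pthread_mutex_lock(&model_lock);
    mp->key = key;
    mp->refs = 1;
    mp->freed = 0;
    mp->next = model_list;
    model_list = mp;
    pthread_mutex_unlock(&model_lock);
}

/* Release one reference; the last release marks the object freed. */
static void
model_drop(struct model_mount *mp)
{
    pthread_mutex_lock(&model_lock);
    assert(mp->refs > 0);
    if (--mp->refs == 0)
        mp->freed = 1;
    pthread_mutex_unlock(&model_lock);
}

/*
 * Look up by key. hold != 0 models vfs_getvfs (reference taken before
 * the lock is released); hold == 0 models mount_get_by_nc as audited.
 */
static struct model_mount *
model_lookup(int key, int hold)
{
    struct model_mount *mp;

    pthread_mutex_lock(&model_lock);
    for (mp = model_list; mp != NULL; mp = mp->next) {
        if (mp->key == key)
            break;
    }
    if (mp != NULL && hold)
        mp->refs++;
    pthread_mutex_unlock(&model_lock);
    return (mp);
}

/* Unlink mp and release the list's reference (the unmount side). */
static void
model_unmount(struct model_mount *mp)
{
    struct model_mount **pp;

    pthread_mutex_lock(&model_lock);
    for (pp = &model_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == mp) {
            *pp = mp->next;
            break;
        }
    }
    pthread_mutex_unlock(&model_lock);
    model_drop(mp);
}
```

With hold = 0 the object is marked freed as soon as the unmount side releases the list's reference, while the caller still has the pointer in hand. With hold = 1 it survives until the caller's own model_drop.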
Threat model & preconditions
- Attacker position: the deref side is reached by any local user who reads /proc/$pid/map for a vnode whose path crosses a mountpoint; that read is the only path that runs cache_fullpath(guess=1) (sys/vfs/procfs/procfs_map.c:181). Other path-resolution entry points (getcwd, readlink, kern.proc.pathname, and so on) do not take the guess=1 path and never reach mount_get_by_nc. The free side is unmount: with vfs.usermount=1 an unprivileged user who owns a mount can mount+unmount their own filesystem and thus drive both sides; otherwise it races a privileged unmount (automounter / removable media / a privileged daemon).
- Privileges gained or impact: kernel use-after-free. Reading mp->mnt_ncmounton from freed/reallocated M_MOUNT memory yields garbage nch.ncp / nch.mount pointers that are then dereferenced (_cache_hold), causing a kernel panic (local DoS when the race is won) and/or disclosure of recycled slab memory.
- Required config or capabilities: none for the deref side; the free side is unprivileged if vfs.usermount=1, else races a privileged unmount.
- Reachability: path resolution across a mountpoint (via /proc/$pid/map) racing an unmount of that mountpoint.
Proof of concept
PoC source: findings/poc/DF-0044/mount_uaf.c
A thread cycles mount+MNT_FORCE umount of a tmpfs the attacker owns (free
side); the main thread resolves paths across the mountpoint to drive
cache_fullpath (deref side).
Build & run (unprivileged with vfs.usermount=1; disposable VM)
    cc -pthread -o mount_uaf findings/poc/DF-0044/mount_uaf.c
    ./mount_uaf /tmp/df0044
Expected output
Kernel panic in _cache_hold on a bogus ncp pointer (DoS), or recycled-slab
info disclosure.
Impact
Kernel UAF reachable by an unprivileged user (fully unprivileged with
vfs.usermount=1, else racing a privileged unmount). Medium (AC:H = race
window; the UAF is a DoS / potential info-leak).
Recommended fix
mount_hold the result in mount_get_by_nc (matching vfs_getvfs), and
mount_drop it in the cache_fullpath caller:
    --- a/sys/kern/vfs_mount.c
    +++ b/sys/kern/vfs_mount.c
    @@ -1243,6 +1243,8 @@
                 if (ncp == mp->mnt_ncmountpt.ncp)
                         break;
             }
    +        if (mp)
    +                mount_hold(mp);
             lwkt_reltoken(&mountlist_token);
             return (mp);
and in cache_fullpath (sys/kern/vfs_cache.c), mount_drop(new_mp) after the
deref is finished (the caller must release the hold it now acquires).
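A minimal sketch of the caller-side pairing, assuming the lookup now returns a held mp. The types and the hold_count field below are simplified stand-ins rather than the kernel definitions, lookup_held and cross_mountpoint are hypothetical helpers modelling one loop iteration of cache_fullpath, and this is not the actual patch. The point it shows is that the held pointer is tracked separately from new_mp, so it is dropped exactly once whether or not the ncp == mp->mnt_ncmountpt.ncp branch overrides the lookup result.

```c
/* Userland stand-ins for illustration; not the kernel's definitions. */
#include <assert.h>
#include <stddef.h>

struct mount;
struct namecache { int unused; };

struct nchandle {
    struct namecache *ncp;
    struct mount *mount;
};

struct mount {
    struct nchandle mnt_ncmountpt;
    struct nchandle mnt_ncmounton;
    int hold_count;             /* stand-in for the kernel hold count */
};

static void
mount_hold(struct mount *mp)
{
    mp->hold_count++;
}

static void
mount_drop(struct mount *mp)
{
    assert(mp->hold_count > 0);
    mp->hold_count--;
}

/* Models the fixed lookup: the result (if any) is returned held. */
static struct mount *
lookup_held(struct mount *found)
{
    if (found != NULL)
        mount_hold(found);
    return (found);
}

/*
 * One mountpoint-crossing step. 'found' is what the lookup would
 * return for ncp. Returns 1 and fills *out if a mount was crossed.
 */
static int
cross_mountpoint(struct namecache *ncp, struct mount *mp,
    struct mount *found, struct nchandle *out)
{
    struct mount *held_mp = lookup_held(found);
    struct mount *new_mp = held_mp;
    int crossed = 0;

    if (ncp == mp->mnt_ncmountpt.ncp)
        new_mp = mp;                    /* override; held_mp still tracked */
    if (new_mp != NULL) {
        *out = new_mp->mnt_ncmounton;   /* copy while the hold is in place */
        crossed = 1;
    }
    if (held_mp != NULL)
        mount_drop(held_mp);            /* balances lookup_held on every path */
    return (crossed);
}
```

Dropping at a single point after the copy keeps the hold and drop visibly balanced; a real patch may prefer to drop earlier on the override path.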
References
- sys/kern/vfs_mount.c:1235-1248: mount_get_by_nc (no hold).
- sys/kern/vfs_mount.c:413-424: vfs_getvfs (correct hold pattern).
- sys/kern/vfs_cache.c:5214,5224,5228: the caller's deref window.
- CWE-416 Use After Free; CWE-820.
Timeline
- 2026-06-29 Discovered during automated file-by-file audit of sys/kern/vfs_mount.c.
- pending Reported to DragonFlyBSD security contact.
PoC verification
Evidence pack
findings/poc/DF-0044 · 12 files
| File | Type | Description | Size |
|---|---|---|---|
| mount_uaf.c | trigger-source | unprivileged PoC: cycler threads mount+unmount a tmpfs the binary sits inside, while reader threads read /proc/self/map (driving vn_fullpath guess=1 -> cache_fullpath -> mount_get_by_nc -> deref of new_mp->mnt_ncmounton) | 6.0 KB |
| mount_uaf_root.c | trigger-source | root-only variant using MNT_FORCE for a faster free-side cycle (defeated by allproc_scan SIGKILL of the test process itself) | 2.9 KB |
| build.sh | build-script | cc -pthread -O2 -o mount_uaf mount_uaf.c (and the root variant) | 297 B |
| run.sh | run-script | sets up the cycled mount, hosts the binary inside, runs the unprivileged variant | 967 B |
| build.log | build-log | final successful build of both variants on the guest | 554 B |
| run.log | run-log | decisive 60s unprivileged run: 288137 deref iters, 2 successful free cycles, no panic, guest stayed up | 2.1 KB |
| panic.txt | panic-signature | grep -iE 'fatal trap\|panic:\|Stopped at\|db> ' dfbsd-qemu/boot.log -> EMPTY. Includes the kernel's own race-detection messages as evidence the race was exercised. | 2.7 KB |
| env.txt | environment | uname, cc version, vfs.usermount, kernel config (X86_64_GENERIC, non-INVARIANTS) | 598 B |
| fix.diff | suggested-fix | git-apply-able unified diff: add mount_hold(mp) in mount_get_by_nc (matching vfs_getvfs) + matching mount_drop in the cache_fullpath caller | 1.7 KB |
| VERDICT.md | verdict | full narrative: code-level proof, why the race is too tight on this kernel, classification | 8.0 KB |
| README.md | readme | PoC overview, build/run instructions, expected output | 2.9 KB |
| manifest.json | manifest | this catalog | 4.1 KB |
DF-0044 - PoC
mount_uaf.c: race between path resolution across a mountpoint and unmount of
that mountpoint, intended to trigger the UAF in cache_fullpath via
mount_get_by_nc's unheld return.
mount_uaf_root.c: root-only variant that uses MNT_FORCE for a faster
free-side cycle (used during verification; not the unprivileged vector).
The bug (code-level confirmed on master DEV)
mount_get_by_nc (sys/kern/vfs_mount.c:1235-1248) returns mp without
mount_hold (token released :1245, return :1247), unlike its sibling
vfs_getvfs (:413-424, which mount_holds at :420-421 and documents the
hold contract in the comment at :410-411). mountlist_scan
(:756, :784) also mount_holds before dropping mountlist_token. Thus
mount_get_by_nc is the lone outlier and breaks the file's own invariant.
The sole caller, cache_fullpath (sys/kern/vfs_cache.c:5214), dereferences
new_mp->mnt_ncmounton (:5224) after the token has been released and never
calls mount_drop(new_mp). A concurrent dounmount
(sys/kern/vfs_syscalls.c:1040 removes from mountlist, :1066-1069 zeroes
mnt_ncmounton, :1108-1117 waits for mnt_refs==0 then mount_drop -> kfree)
frees mp, producing a use-after-free on the deref at :5224.
Reachability
- Deref side: the ONLY caller of cache_fullpath(guess=1) is sys/vfs/procfs/procfs_map.c:181 (vn_fullpath(p, vp, ..., 1) when reading /proc/$pid/map). So the deref is driven by reading /proc/$pid/map for a vnode whose path traverses a mountpoint. The other cache_fullpath callers (__getcwd, kern.proc.pathname, kern.proc.cwd, kern_proc_filedesc, kern_exec, etc.) use guess=0 and do not invoke mount_get_by_nc.
- Free side: unmount. With vfs.usermount=1 an unprivileged user owns a mount they created and drives both sides (mount+umount their own FS); otherwise the free side races a privileged unmount (automounter / removable media / privileged daemon).
Build & run (unprivileged with vfs.usermount=1; disposable VM)
    # as root: enable user mounts and prepare a directory maxx owns
    sysctl vfs.usermount=1
    mkdir -p /tmp/df0044/m && chown maxx:maxx /tmp/df0044/m && chmod 0755 /tmp/df0044/m
    # as maxx: build the PoC, host it inside the mount, and run it from there
    cc -pthread -O2 -o mount_uaf mount_uaf.c
    mount -t tmpfs dummy /tmp/df0044/m   # initial mount to host the binary
    chmod 0755 /tmp/df0044/m
    cp mount_uaf /tmp/df0044/m/
    cd /tmp/df0044/m && ./mount_uaf /tmp/df0044/m 60
The PoC pins a tmpfs over its own CWD (so /proc/self/map's text-vp path
traverses the cycled mountpoint), then has cycler threads mount+unmount the
path (free side) while reader threads repeatedly open and read
/proc/self/map (deref side).
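The deref side amounts to a tight loop of open + read + close on the procfs map file. A minimal sketch of one iteration follows; read_fully is a hypothetical helper and is not taken from mount_uaf.c, and the 64 KiB buffer size is an assumption (whether procfs accepts a smaller buffer for a large map is not something this sketch relies on).

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path, read it to EOF, and close it.
 * Returns the number of bytes read, or -1 on open/read failure.
 */
static long
read_fully(const char *path)
{
    char buf[65536];
    long total = 0;
    ssize_t n;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return (-1);
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;
    close(fd);
    return (n < 0 ? -1 : total);
}
```

A reader thread would call read_fully("/proc/self/map") in a loop until the run's time budget expires; the return value only matters for counting iterations.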
Expected output (bug present)
A kernel panic in _cache_hold on a bogus ncp pointer, or a slab/UAF
signature, when the race is won. Realistically the race is extremely tight
on a non-INVARIANTS kernel; see VERDICT.md.
DF-0044 - VERDICT
Verdict: NOT REPRODUCED (race too tight to win on the non-INVARIANTS master kernel), but the code-level bug is CONFIRMED.
The missing mount_hold in mount_get_by_nc is real and present on master
DEV (X86_64_GENERIC, v6.5.0.1712.g89e6a-DEVELOPMENT, built 2026-06-29).
A multi-minute, multi-strategy stress campaign could not produce a kernel
panic, but the code-level proof below shows the UAF exists, and the fix is
warranted as defense-in-depth (the function violates the file's own
documented invariant and is the lone outlier among its siblings).
Code-level proof (every line cited from the audited sys/ tree)
1. mount_get_by_nc returns mp without a hold
sys/kern/vfs_mount.c:1235-1248:
    struct mount *
    mount_get_by_nc(struct namecache *ncp)
    {
        struct mount *mp = NULL;

        lwkt_gettoken_shared(&mountlist_token);    /* :1240 */
        TAILQ_FOREACH(mp, &mountlist, mnt_list) {
            if (ncp == mp->mnt_ncmountpt.ncp)
                break;
        }
        lwkt_reltoken(&mountlist_token);    /* :1245 mp not held */
        return (mp);                        /* :1247 unheld pointer */
    }
mount_hold(mp) is never called. Contrast the otherwise-identical
vfs_getvfs (:413-424), whose comment at :410-411 states the contract
("the returned mp is held and the caller is expected to drop it via
mount_drop()") and which executes if (mp) mount_hold(mp); at :420-421
before releasing the token at :422:
    struct mount *
    vfs_getvfs(fsid_t *fsid)
    {
        struct mount *mp;

        lwkt_gettoken_shared(&mountlist_token);    /* :418 */
        mp = mount_rb_tree_RB_LOOKUP_FSID(&mounttree, fsid);
        if (mp)
            mount_hold(mp);                        /* :420-421 */
        lwkt_reltoken(&mountlist_token);           /* :422 */
        return (mp);
    }
mountlist_scan (:756, :784) also mount_holds each mount before dropping
the token across its callback. mount_get_by_nc is the only function in the
file that drops the token while still holding an unprotected mp pointer.
The hold is what prevents dounmount's final kfree(mp, M_MOUNT) (see
sys/kern/vfs_mount.c:393-405) from freeing the struct while a caller is
still inside the dereference window.
2. The sole caller derefs the unheld mp after the token is dropped
sys/kern/vfs_cache.c:5213-5230:
    if (guess && (ncp->nc_flag & NCF_ISMOUNTPT)) {
        new_mp = mount_get_by_nc(ncp);    /* :5214 -- unheld */
    }
    if (ncp == mp->mnt_ncmountpt.ncp) {
        new_mp = mp;
    }
    if (new_mp) {
        nch = new_mp->mnt_ncmounton;      /* :5224 -- DEREF */
        _cache_drop(ncp);
        ncp = nch.ncp;
        if (ncp)
            _cache_hold(ncp);
        mp = nch.mount;
        continue;
    }
The function returns to cache_fullpath with the mountlist_token already
released (:1245) and new_mp is dereferenced at :5224 to copy
mnt_ncmounton. No mount_hold was taken on new_mp and no mount_drop
exists in the caller. The hold on the ncp (_cache_hold(ncp) at :5228)
pins the namecache node but not the mount struct.
3. dounmount frees mp after the token has been released
    mountlist_remove(mp);                    /* :1040 -- removes from mountlist */
    ...
    if (mp->mnt_ncmounton.ncp != NULL) {     /* :1066 */
        nch = mp->mnt_ncmounton;
        cache_zero(&mp->mnt_ncmounton);      /* :1069 -- zeroes mnt_ncmounton */
        cache_clrmountpt(&nch);
        cache_drop(&nch);
    }
    ...
    while (mp->mnt_refs > 0) {               /* :1110 -- drain refs (may tsleep) */
        cache_unmounting(mp);
        wakeup(mp);
        tsleep(&mp->mnt_refs, 0, "umntrwait", hz / 10 + 1);
        cache_clearmntcache(mp);
    }
    lwkt_reltoken(&mp->mnt_token);
    mount_drop(mp);                          /* :1117 -- kfree(mp, M_MOUNT) */
The kfree at the end (via mount_drop at sys/kern/vfs_mount.c:399-405)
frees the very struct the unheld pointer in cache_fullpath may still be
about to dereference at :5224. This is the UAF.
Why it is hard to demonstrate as a panic
For the UAF to manifest as observable corruption or a panic, three things must all happen in a window of microseconds:
1. mount_get_by_nc finds mp before mountlist_remove takes it off the list, and the unmount then gets past mountlist_remove AND past the mnt_refs==0 drain AND past mount_drop -> kfree before the caller's deref at :5224.
2. The slab memory has been reallocated and overwritten with attacker-controlled or garbage data (otherwise the freed memory still reads as zeroed mnt_ncmounton from the cache_zero at :1069, and the if (ncp) check at :5227 makes the loop exit cleanly, with no crash).
3. The freed/reallocated mnt_ncmounton.ncp is a non-NULL garbage pointer that _cache_hold then dereferences.
On a non-INVARIANTS X86_64_GENERIC kernel the deref at :5224 is a single
(register-width) memory read and finishes in nanoseconds; the unmount's
kfree is gated by a tsleep-bounded mnt_refs==0 drain that typically
takes milliseconds. In practice the deref almost always reads the
zeroed-but-still-mapped mnt_ncmounton ({NULL, NULL}) and the loop exits
benignly. Additionally the kernel's nlookup path (sys/kern/vfs_nlookup.c:1056)
prints nlookup: warning umount race avoided and bails out on
MNTK_UNMOUNT-flagged mounts, which takes some of the racing lookups out of the race.
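The timescale of the drain follows from the tsleep timeout in the dounmount excerpt, hz / 10 + 1 ticks. A small sketch of that arithmetic, assuming hz is the kernel tick rate in ticks per second; the helper names are hypothetical, and the hz value used in the note below is an assumption about the guest, not a figure taken from env.txt.

```c
#include <assert.h>

/* Timeout passed to tsleep() in the drain loop, in ticks: hz / 10 + 1. */
static int
drain_timeout_ticks(int hz)
{
    assert(hz > 0);
    return (hz / 10 + 1);
}

/* Convert a tick count to milliseconds for a given tick rate. */
static double
ticks_to_ms(int ticks, int hz)
{
    assert(hz > 0);
    return (1000.0 * ticks / hz);
}
```

Assuming hz = 100, drain_timeout_ticks gives 11 ticks, which ticks_to_ms converts to 110 ms per sleep. That sleep only happens while mnt_refs is still non-zero; if the count is already zero the loop body never runs and the free follows immediately.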
Verification campaign (all on the booted master DEV guest)
vfs.usermount=1 was enabled as root; the unprivileged maxx user owns the
mounts and drives both sides. Three strategies were attempted:
- Unprivileged variant (mount_uaf.c): binary hosted inside a tmpfs overmounted+unmounted by cycler threads while reader threads read /proc/self/map. 60-second run: 288,137 deref-side vn_fullpath iterations, 2 successful free-side cycles (over-mount+unmount), 6 failed. No panic. (run.log)
- Root variant (mount_uaf_root.c): same race, root only, with MNT_FORCE for faster free-side teardown. Defeated by sys/kern/vfs_syscalls.c:931-947: MNT_FORCE calls allproc_scan(unmount_allproc_cb) which SIGINT/SIGKILLs processes whose p_textnch / CWD / open fds are on the mount, killing the test process itself.
- Inspecting boot.log for the panic signature across all runs: only unmount ... forcing unmount and nlookup: warning umount race avoided messages appear; no fatal trap, panic:, or Stopped at. (boot.log excerpt in panic.txt: empty.)
Verdict classification
The realistic outcome of the missing hold is:
- Most of the time: the deref at :5224 reads zeroed mnt_ncmounton and the loop exits benignly (no crash, no corruption).
- Occasionally: the deref reads stale-but-still-valid mnt_ncmounton from the freed-but-not-yet-reallocated slab (also benign).
- Rarely: if heap grooming forces the slab to be reallocated with attacker data between kfree and the read, the deref hands a garbage ncp to _cache_hold, which crashes (DoS) or leaks recycled slab memory (info disclosure). This is the theoretical worst case but I could not trigger it on this kernel.
So: the bug is real and the fix is warranted (it removes a clear
invariant violation and eliminates a genuine, if narrow, UAF window), but
on this kernel it could not be turned into an observable panic within a
reasonable time budget, even with significant hammering. An INVARIANTS
build (with DEBUG=-g slab poisoning / M_USE_WAIT / KKASSERT on
mnt_refs) would likely turn this into a reliable panic.
Recommended fix
fix.diff adds mount_hold(mp) in mount_get_by_nc (matching vfs_getvfs)
and the matching mount_drop(new_mp) in the cache_fullpath caller. The fix
matches the proposal in findings/DF-0044-mount-get-by-nc-unheld-uaf-cache-fullpath.md,
with the additional caller-side mount_drop (which the finding markdown
mentions but does not show as a diff).
Confirmed kernel references
- sys/kern/vfs_mount.c:1235
- sys/kern/vfs_mount.c:1240
- sys/kern/vfs_mount.c:1245
- sys/kern/vfs_mount.c:1247
- sys/kern/vfs_mount.c:413
- sys/kern/vfs_mount.c:420
- sys/kern/vfs_mount.c:421
- sys/kern/vfs_mount.c:756
- sys/kern/vfs_mount.c:784
- sys/kern/vfs_mount.c:399
- sys/kern/vfs_mount.c:401
- sys/kern/vfs_mount.c:403
- sys/kern/vfs_cache.c:5213
- sys/kern/vfs_cache.c:5214
- sys/kern/vfs_cache.c:5224
- sys/kern/vfs_cache.c:5227
- sys/kern/vfs_syscalls.c:1040
- sys/kern/vfs_syscalls.c:1066
- sys/kern/vfs_syscalls.c:1069
- sys/kern/vfs_syscalls.c:1108
- sys/kern/vfs_syscalls.c:1110
- sys/kern/vfs_syscalls.c:1117
- sys/kern/vfs_syscalls.c:931
- sys/kern/vfs_syscalls.c:946
- sys/kern/vfs_syscalls.c:988
- sys/kern/vfs_nlookup.c:1056
- sys/vfs/procfs/procfs_map.c:181
Detail
Exploit chain
None observed. Theoretical chain (not demonstrated): (1) the system runs with vfs.usermount=1 (an administrator setting), or the attacker races a privileged unmount; (2) the attacker drives vn_fullpath(guess=1) via /proc/$pid/map reads for a vnode whose path traverses a mountpoint that is simultaneously being unmounted; (3) the attacker wins the narrow race where the deref at vfs_cache.c:5224 reads mnt_ncmounton from freed-then-reallocated M_MOUNT slab memory; (4) the garbage ncp is fed to _cache_hold, which dereferences it -> kernel panic (DoS) or, with heap grooming to land a controlled ncp in the recycled slab, controlled kernel-memory read via the nchandle the caller proceeds with. The race window (microsecond deref vs millisecond kfree gated by mnt_refs drain) makes step 3 extremely unlikely on a non-INVARIANTS kernel -- I could not win it in ~300k deref iterations across multiple 60-90s runs.
Evidence (decisive lines)
findings/poc/DF-0044/ holds the full evidence pack. Decisive bytes from run.log: 'DF-0044: deref=288137 free_ok=2 free_fail=6 / DF-0044: still alive -- race not won this run'. panic.txt is EMPTY of fatal trap/panic/Stopped-at signatures (the race was exercised -- boot.log shows 'nlookup: warning umount race avoided' and 'unmount ... forcing unmount' -- but never crashed). VERDICT.md carries the line-by-line code-level proof. fix.diff applies cleanly with git apply.
PoC changes
Rewrote findings/poc/DF-0044/mount_uaf.c: original PoC used readlink() which does NOT drive cache_fullpath(guess=1) (only sys/vfs/procfs/procfs_map.c:181 calls vn_fullpath with guess=1). New PoC hosts the binary inside a tmpfs the cycler overmounts+unmounts, while reader threads repeatedly open and read /proc/self/map to drive vn_fullpath(p, p_textvp, ..., 1) -> cache_fullpath -> mount_get_by_nc -> deref. Had to pass a proper tmpfs_mount_info struct with ta_root_uid=getuid() so the unprivileged user actually owns the mount and can unmount it (otherwise tmpfs_statfs overwrites mnt_stat.f_owner with the root_uid default of 0, and sys_unmount then denies with EPERM). Added mount_uaf_root.c as a root variant with MNT_FORCE for a faster free-side cycle (defeated by allproc_scan SIGKILL of the test process). Added build.sh, run.sh, VERDICT.md, README.md, env.txt, build.log, run.log, panic.txt, fix.diff, manifest.json.
Verified recommended fix
fix.diff: (1) in sys/kern/vfs_mount.c mount_get_by_nc, add 'if (mp) mount_hold(mp);' before lwkt_reltoken to match the hold contract of vfs_getvfs and mountlist_scan; (2) in sys/kern/vfs_cache.c cache_fullpath, track the held mp separately and mount_drop() it both when overridden by the ncp==mp->mnt_ncmountpt.ncp branch and after the deref of new_mp->mnt_ncmounton completes. Supersedes the finding proposal (which showed only the mount_hold half) by additionally implementing the caller-side mount_drop that the finding markdown mentions but does not show as a diff.
Verdict
Code-level bug CONFIRMED but the race could not be turned into an observable panic on the master DEV X86_64_GENERIC kernel. mount_get_by_nc (sys/kern/vfs_mount.c:1235-1248) returns mp WITHOUT calling mount_hold(mp) -- token released at :1245, return at :1247 -- unlike every sibling in the same file: vfs_getvfs (:413-424) mount_hold()s at :420-421 and documents the contract at :410-411, and mountlist_scan mount_hold()s at :756/:784 before dropping the token. The sole caller cache_fullpath (sys/kern/vfs_cache.c:5214) then dereferences new_mp->mnt_ncmounton at :5224 after the token is dropped and never mount_drop()s. dounmount (sys/kern/vfs_syscalls.c:1040,1066-1069,1108-1117) removes mp from mountlist, zeroes mnt_ncmounton, drains mnt_refs, then mount_drop->kfree. A 60-second unprivileged stress run with vfs.usermount=1 produced 288,137 vn_fullpath(guess=1) deref iterations against the cycled mount and 2 successful free-side cycles -- no panic, no slab complaints. The deref at :5224 is a single memory read finishing in nanoseconds, while dounmount's kfree is gated by an mnt_refs drain that takes milliseconds; in every observed cycle the deref read either still-valid or zeroed-but-not-yet-freed memory. The kernel's own race-detection logic (nlookup 'warning umount race avoided' at vfs_nlookup.c:1056, and dounmount's freeok=0 skip at vfs_syscalls.c:988) competes away the remaining racing lookups. An INVARIANTS build (slab poisoning / KKASSERT on mnt_refs) would likely turn this into a reliable panic. The bug is real, the missing mount_hold is the file's lone invariant violation, and the fix is warranted as defense-in-depth.