DragonFlyBSD Kernel Audit
โ† dashboard
DF-0044

mount_get_by_nc returns struct mount without a hold -> use-after-free via cache_fullpath racing dounmount

Field Value
ID DF-0044
Status new
Severity Medium
CVSS 3.1 CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:L/I:L/A:H
CWE CWE-416 Use After Free; CWE-820 Missing Synchronization
File sys/kern/vfs_mount.c (mount_get_by_nc); caller sys/kern/vfs_cache.c (cache_fullpath)
Lines 1235-1248 (mount_get_by_nc), 413-424 (vfs_getvfs contrast)
Area kern
Confidence likely
Discovered 2026-06-29
Reported pending

Summary

mount_get_by_nc() looks up a mount by namecache handle under a shared hold of mountlist_token, but returns the mp without calling mount_hold(mp). The otherwise-identical vfs_getvfs() (:413-424) takes the hold before releasing the token. The sole caller, cache_fullpath() (sys/kern/vfs_cache.c:5214), dereferences the returned pointer (new_mp->mnt_ncmounton at :5224) after the token has been released and never calls mount_drop() on it. A concurrent dounmount() that frees the mount (mount_drop -> kfree) turns that dereference into a use-after-free. The caller's _cache_hold on the ncp pins the namecache node but not the mount struct, so nothing prevents the free.

Root cause

sys/kern/vfs_mount.c:1235-1248:

struct mount *
mount_get_by_nc(struct namecache *ncp)
{
    struct mount *mp = NULL;
    lwkt_gettoken_shared(&mountlist_token);
    TAILQ_FOREACH(mp, &mountlist, mnt_list) {
        if (ncp == mp->mnt_ncmountpt.ncp)
            break;
    }
    lwkt_reltoken(&mountlist_token);   /* :1245  mp not held */
    return (mp);                       /* :1247  unheld pointer */
}

Contrast vfs_getvfs (:413-424), whose comment (:410-411) states "the returned mp is held and the caller is expected to drop it via mount_drop()", and which executes if (mp) mount_hold(mp); (:420-421) before the token release. The caller cache_fullpath then dereferences new_mp->mnt_ncmounton (sys/kern/vfs_cache.c:5224) and follows nch.ncp into _cache_hold (:5228), both inside the window that opens when the token is released. The free side is dounmount: after mountlist_remove and cache_zero, it waits until mnt_refs == 0, then calls mount_drop(mp) -> kfree.
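The difference between the two lookups can be sketched in a single-threaded userspace model. This is only an illustration of the ordering, not kernel code: struct model_mount, the token flag, and every model_* / lookup_* name are hypothetical, and the model assumes that the last drop is what frees the object (recorded here as a flag instead of a real free).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mount; not the kernel type. */
struct model_mount {
	int holds;	/* count managed by model_hold()/model_drop() */
	int freed;	/* set instead of calling free(), so tests can inspect it */
};

static int token_held;	/* stand-in for mountlist_token */

static void token_get(void) { assert(!token_held); token_held = 1; }
static void token_rel(void) { assert(token_held); token_held = 0; }

static void
model_hold(struct model_mount *mp)
{
	mp->holds++;
}

/* Assumption: dropping the last hold frees the object. */
static void
model_drop(struct model_mount *mp)
{
	assert(mp->holds > 0);
	if (--mp->holds == 0)
		mp->freed = 1;
}

/* Shape of mount_get_by_nc() as quoted above: no hold is taken. */
static struct model_mount *
lookup_unheld(struct model_mount *found)
{
	token_get();
	/* list walk elided; `found` is whatever the walk matched */
	token_rel();
	return (found);
}

/* Shape of vfs_getvfs(): the hold is taken before the token is released. */
static struct model_mount *
lookup_held(struct model_mount *found)
{
	token_get();
	if (found != NULL)
		model_hold(found);
	token_rel();
	return (found);
}
```

With the unheld lookup, the unmount side's final drop frees the object while the caller still has the pointer. With the held lookup, the free is deferred until the caller drops its own hold.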

Threat model & preconditions

  • Attacker position: the deref side is reached by any local user who reads /proc/$pid/map for a vnode whose path crosses a mountpoint. sys/vfs/procfs/procfs_map.c:181 is the only caller that runs cache_fullpath with guess=1; getcwd-style callers use guess=0 and never reach mount_get_by_nc. The free side is unmount: with vfs.usermount=1 an unprivileged user who owns a mount can mount+unmount their own filesystem and thus drive both sides; otherwise it races a privileged unmount (automounter / removable media / a privileged daemon).
  • Privileges gained or impact: kernel use-after-free. Reading mp->mnt_ncmounton from freed/reallocated M_MOUNT memory yields garbage nch.ncp/nch.mount pointers that are then dereferenced (_cache_hold), causing a kernel panic (local DoS when the race is won) and/or disclosure of recycled slab memory.
  • Required config or capabilities: none for the deref side; the free side is unprivileged if vfs.usermount=1, else races a privileged unmount.
  • Reachability: a /proc/$pid/map read whose path resolution crosses a mountpoint, racing an unmount of that mountpoint.

Proof of concept

PoC source: findings/poc/DF-0044/mount_uaf.c

A thread cycles mount+MNT_FORCE umount of a tmpfs the attacker owns (free side); the main thread resolves paths across the mountpoint to drive cache_fullpath (deref side).

Build & run (unprivileged with vfs.usermount=1; disposable VM)

cc -pthread -o mount_uaf findings/poc/DF-0044/mount_uaf.c
./mount_uaf /tmp/df0044

Expected output

Kernel panic in _cache_hold on a bogus ncp pointer (DoS), or recycled-slab info disclosure.

Impact

Kernel UAF reachable by an unprivileged user (fully unprivileged with vfs.usermount=1, else racing a privileged unmount). Medium (AC:H = race window; the UAF is a DoS / potential info-leak).

Suggested fix

Call mount_hold on the result in mount_get_by_nc (matching vfs_getvfs), and mount_drop it in the cache_fullpath caller:

--- a/sys/kern/vfs_mount.c
+++ b/sys/kern/vfs_mount.c
@@ -1243,6 +1243,8 @@
        if (ncp == mp->mnt_ncmountpt.ncp)
            break;
    }
+   if (mp)
+       mount_hold(mp);
    lwkt_reltoken(&mountlist_token);
    return (mp);

In cache_fullpath (sys/kern/vfs_cache.c), add mount_drop(new_mp) once the deref is finished; the caller must release the hold it now acquires.

Timeline

  • 2026-06-29 Discovered during automated file-by-file audit of sys/kern/vfs_mount.c.
  • pending Reported to DragonFlyBSD security contact.

PoC verification

Evidence pack

findings/poc/DF-0044 · 12 files
File · Type · Description · Size
mount_uaf.c · trigger-source · unprivileged PoC: cycler threads mount+unmount a tmpfs the binary sits inside, while reader threads read /proc/self/map (driving vn_fullpath guess=1 -> cache_fullpath -> mount_get_by_nc -> deref of new_mp->mnt_ncmounton) · 6.0 KB
mount_uaf_root.c · trigger-source · root-only variant using MNT_FORCE for a faster free-side cycle (defeated by allproc_scan SIGKILL of the test process itself) · 2.9 KB
build.sh · build-script · cc -pthread -O2 -o mount_uaf mount_uaf.c (and the root variant) · 297 B
run.sh · run-script · sets up the cycled mount, hosts the binary inside, runs the unprivileged variant · 967 B
build.log · build-log · final successful build of both variants on the guest · 554 B
run.log · run-log · decisive 60s unprivileged run: 288137 deref iters, 2 successful free cycles, no panic, guest stayed up · 2.1 KB
panic.txt · panic-signature · grep -iE 'fatal trap|panic:|Stopped at|db> ' dfbsd-qemu/boot.log -> EMPTY. Includes the kernel's own race-detection messages as evidence the race was exercised. · 2.7 KB
env.txt · environment · uname, cc version, vfs.usermount, kernel config (X86_64_GENERIC, non-INVARIANTS) · 598 B
fix.diff · suggested-fix · git-apply-able unified diff: add mount_hold(mp) in mount_get_by_nc (matching vfs_getvfs) + matching mount_drop in the cache_fullpath caller · 1.7 KB
VERDICT.md · verdict · full narrative: code-level proof, why the race is too tight on this kernel, classification · 8.0 KB
README.md · readme · PoC overview, build/run instructions, expected output · 2.9 KB
manifest.json · manifest · this catalog · 4.1 KB
README.md (readme): PoC overview, build/run instructions, expected output

DF-0044: PoC

mount_uaf.c: race between path-resolution across a mountpoint and unmount of that mountpoint, intended to trigger the UAF in cache_fullpath via mount_get_by_nc's unheld return.

mount_uaf_root.c: root-only variant that uses MNT_FORCE for a faster free-side cycle (used during verification; not the unprivileged vector).

The bug (code-level confirmed on master DEV)

mount_get_by_nc (sys/kern/vfs_mount.c:1235-1248) returns mp without mount_hold (token released :1245, return :1247), unlike its sibling vfs_getvfs (:413-424, which mount_holds at :420-421 and documents the hold contract in the comment at :410-411). mountlist_scan (:756, :784) also mount_holds before dropping mountlist_token. Thus mount_get_by_nc is the lone outlier and breaks the file's own invariant.

The sole caller, cache_fullpath (sys/kern/vfs_cache.c:5214), dereferences new_mp->mnt_ncmounton (:5224) after the token has been released and never calls mount_drop(new_mp). A concurrent dounmount (sys/kern/vfs_syscalls.c:1040 removes from mountlist, :1066-1069 zeroes mnt_ncmounton, :1108-1117 waits for mnt_refs==0 then mount_drop -> kfree) frees mp, producing a use-after-free on the deref at :5224.

Reachability

  • Deref side: the ONLY caller of cache_fullpath(guess=1) is sys/vfs/procfs/procfs_map.c:181 (vn_fullpath(p, vp, ..., 1) when reading /proc/$pid/map). So the deref is driven by reading /proc/$pid/map for a vnode whose path traverses a mountpoint. Most other cache_fullpath callers (__getcwd, kern.proc.pathname, kern.proc.cwd, kern_proc_filedesc, kern_exec, etc.) use guess=0 and do not invoke mount_get_by_nc.
  • Free side: unmount. With vfs.usermount=1 an unprivileged user owns a mount they created and drives both sides (mount+umount their own FS); otherwise the free side races a privileged unmount (automounter / removable media / privileged daemon).

Build & run (unprivileged with vfs.usermount=1; disposable VM)

# as root: enable user mounts and prepare a directory maxx owns
sysctl vfs.usermount=1
mkdir -p /tmp/df0044/m && chown maxx:maxx /tmp/df0044/m && chmod 0755 /tmp/df0044/m

# as maxx: build the PoC, host it inside the mount, and run it from there
cc -pthread -O2 -o mount_uaf mount_uaf.c
mount -t tmpfs dummy /tmp/df0044/m   # initial mount to host the binary
chmod 0755 /tmp/df0044/m
cp mount_uaf /tmp/df0044/m/
cd /tmp/df0044/m && ./mount_uaf /tmp/df0044/m 60

The PoC pins a tmpfs over its own CWD (so /proc/self/map's text-vp path traverses the cycled mountpoint), then has cycler threads mount+unmount the path (free side) while reader threads repeatedly open and read /proc/self/map (deref side).
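A minimal sketch of what one deref-side iteration could look like is shown here. It is not the contents of mount_uaf.c: the function name drain_file_once is hypothetical, and it simply opens a path, reads it to end-of-file, and reports how many bytes it consumed. On the target, the path would be /proc/self/map.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Open `path`, read it to EOF, and return the number of bytes read,
 * or -1 if the open or any read fails. Hypothetical helper; each
 * successful call on /proc/self/map is one deref-side iteration.
 */
static long
drain_file_once(const char *path)
{
	char buf[4096];
	long total = 0;
	ssize_t n;
	int fd;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return (-1);
	while ((n = read(fd, buf, sizeof(buf))) > 0)
		total += n;
	close(fd);
	return (n < 0 ? -1 : total);
}
```

A reader thread would call this in a loop and count the non-negative returns. Failures are expected while the mountpoint is mid-cycle, so they should be counted rather than treated as fatal.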

Expected output (bug present)

A kernel panic in _cache_hold on a bogus ncp pointer, or a slab/UAF signature, when the race is won. Realistically the race is extremely tight on a non-INVARIANTS kernel; see VERDICT.md.

VERDICT.md (verdict): full narrative with code-level proof, why the race is too tight on this kernel, and classification

DF-0044: VERDICT

Verdict: NOT REPRODUCED (race too tight to win on the non-INVARIANTS master kernel), but the code-level bug is CONFIRMED.

The missing mount_hold in mount_get_by_nc is real and present on master DEV (X86_64_GENERIC, v6.5.0.1712.g89e6a-DEVELOPMENT, built 2026-06-29). A multi-minute, multi-strategy stress campaign could not produce a kernel panic, but the code-level proof below shows the UAF exists, and the fix is warranted as defense-in-depth (the function violates the file's own documented invariant and is the lone outlier among its siblings).

Code-level proof (every line cited from the audited sys/ tree)

1. mount_get_by_nc returns mp without a hold

sys/kern/vfs_mount.c:1235-1248:

struct mount *
mount_get_by_nc(struct namecache *ncp)
{
        struct mount *mp = NULL;

        lwkt_gettoken_shared(&mountlist_token);            /* :1240 */
        TAILQ_FOREACH(mp, &mountlist, mnt_list) {
                if (ncp == mp->mnt_ncmountpt.ncp)
                        break;
        }
        lwkt_reltoken(&mountlist_token);                    /* :1245  mp not held */
        return (mp);                                       /* :1247  unheld pointer */
}

mount_hold(mp) is never called. Contrast the otherwise-identical vfs_getvfs (:413-424), whose comment at :410-411 states the contract ("the returned mp is held and the caller is expected to drop it via mount_drop()") and which executes if (mp) mount_hold(mp); at :420-421 before releasing the token at :422:

struct mount *
vfs_getvfs(fsid_t *fsid)
{
        struct mount *mp;

        lwkt_gettoken_shared(&mountlist_token);             /* :418 */
        mp = mount_rb_tree_RB_LOOKUP_FSID(&mounttree, fsid);
        if (mp)
                mount_hold(mp);                             /* :420-421 */
        lwkt_reltoken(&mountlist_token);                    /* :422 */
        return (mp);
}

mountlist_scan (:756, :784) also mount_holds each mount before dropping the token across its callback. mount_get_by_nc is the only function in the file that drops the token while still holding an unprotected mp pointer. The hold is what prevents dounmount's final kfree(mp, M_MOUNT) (see sys/kern/vfs_mount.c:393-405) from freeing the struct while a caller is still inside the dereference window.

2. The sole caller derefs the unheld mp after the token is dropped

sys/kern/vfs_cache.c:5213-5230:

if (guess && (ncp->nc_flag & NCF_ISMOUNTPT)) {
        new_mp = mount_get_by_nc(ncp);                      /* :5214 -- unheld */
}
if (ncp == mp->mnt_ncmountpt.ncp) {
        new_mp = mp;
}
if (new_mp) {
        nch = new_mp->mnt_ncmounton;                        /* :5224 -- DEREF */
        _cache_drop(ncp);
        ncp = nch.ncp;
        if (ncp)
                _cache_hold(ncp);
        mp = nch.mount;
        continue;
}

The function returns to cache_fullpath with the mountlist_token already released (:1245) and new_mp is dereferenced at :5224 to copy mnt_ncmounton. No mount_hold was taken on new_mp and no mount_drop exists in the caller. The hold on the ncp (_cache_hold(ncp) at :5228) pins the namecache node but not the mount struct.

3. dounmount frees mp after the token has been released

sys/kern/vfs_syscalls.c:

mountlist_remove(mp);                       /* :1040 -- removes from mountlist */
...
if (mp->mnt_ncmounton.ncp != NULL) {        /* :1066 */
        nch = mp->mnt_ncmounton;
        cache_zero(&mp->mnt_ncmounton);     /* :1069 -- zeroes mnt_ncmounton */
        cache_clrmountpt(&nch);
        cache_drop(&nch);
}
...
while (mp->mnt_refs > 0) {                  /* :1110 -- drain refs (may tsleep) */
        cache_unmounting(mp);
        wakeup(mp);
        tsleep(&mp->mnt_refs, 0, "umntrwait", hz / 10 + 1);
        cache_clearmntcache(mp);
}
lwkt_reltoken(&mp->mnt_token);
mount_drop(mp);                             /* :1117 -- kfree(mp, M_MOUNT) */

The kfree at the end (via mount_drop at sys/kern/vfs_mount.c:399-405) frees the very struct the unheld pointer in cache_fullpath may still be about to dereference at :5224. This is the UAF.

Why it is hard to demonstrate as a panic

For the UAF to manifest as observable corruption or a panic, three things must all happen in a window of microseconds:

  1. mount_get_by_nc finds mp on the mountlist, and before the caller reaches the deref at :5224 the unmount gets past mountlist_remove, past the mnt_refs==0 drain, and past mount_drop -> kfree.
  2. The slab memory has been reallocated and overwritten with attacker-controlled or garbage data (otherwise the freed memory still reads as the zeroed mnt_ncmounton left by cache_zero at :1069, and the if (ncp) check at :5227 makes the loop exit cleanly, with no crash).
  3. The freed/reallocated mnt_ncmounton.ncp is a non-NULL garbage pointer that _cache_hold then dereferences.

On a non-INVARIANTS X86_64_GENERIC kernel the deref at :5224 is a two-pointer struct copy (nch.ncp and nch.mount) that finishes in nanoseconds; the unmount's kfree is gated by a tsleep-bounded mnt_refs==0 drain that typically takes milliseconds. In practice the deref almost always reads the zeroed-but-still-mapped mnt_ncmounton ({NULL, NULL}) and the loop exits benignly. In addition, the kernel's nlookup path (sys/kern/vfs_nlookup.c:1056) prints "nlookup: warning umount race avoided" and bails out on MNTK_UNMOUNT-flagged mounts, which takes some of the racing lookups out of contention.

Verification campaign (all on the booted master DEV guest)

vfs.usermount=1 was enabled as root; the unprivileged maxx user owns the mounts and drives both sides. Three strategies were attempted:

  1. Unprivileged variant (mount_uaf.c): binary hosted inside a tmpfs overmounted+unmounted by cycler threads while reader threads read /proc/self/map. 60-second run: 288,137 deref-side vn_fullpath iterations, 2 successful free-side cycles (over-mount+unmount), 6 failed. No panic. (run.log)
  2. Root variant (mount_uaf_root.c): same race, root only, with MNT_FORCE for faster free-side teardown. Defeated by sys/kern/vfs_syscalls.c:931-947: MNT_FORCE calls allproc_scan(unmount_allproc_cb) which SIGINT/SIGKILLs processes whose p_textnch / CWD / open fds are on the mount, killing the test process itself.
  3. Inspecting boot.log for the panic signature across all runs: only "unmount ... forcing unmount" and "nlookup: warning umount race avoided" messages appear; no fatal trap, panic:, or Stopped at. (boot.log excerpt in panic.txt; it contains no panic signatures.)
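The run.log counts give a rough feel for how tight the race is. The sketch below computes the expected number of free events that land inside a deref window, using the crude model frees × (derefs per second) × window length. The window length is purely an assumption (it was not measured), and the function name is hypothetical. The model ignores the mnt_refs drain that sits between mountlist_remove and kfree, so if anything it is optimistic for the attacker.

```c
#include <assert.h>

/*
 * Expected number of frees that fall inside some deref window, under a
 * crude uniform-arrival model. `window_s` is an assumed window length
 * in seconds, not a measured value.
 */
static double
expected_overlaps(double derefs, double seconds, double frees, double window_s)
{
	assert(seconds > 0.0);
	return (frees * (derefs / seconds) * window_s);
}
```

Plugging in the decisive run (288137 derefs and 2 successful frees in 60 seconds) with an assumed 100 ns window gives roughly 0.001 expected overlaps per run. That is in line with no panic being observed, and it does not yet account for the reallocation condition.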

Verdict classification

The realistic outcome of the missing hold is:

  • Most of the time: the deref at :5224 reads zeroed mnt_ncmounton and the loop exits benignly (no crash, no corruption).
  • Occasionally: the deref reads stale-but-still-valid mnt_ncmounton from the freed-but-not-yet-reallocated slab (also benign).
  • Rarely: if heap grooming forces the slab to be reallocated with attacker data between kfree and the read, the deref hands a garbage ncp to _cache_hold, which crashes (DoS) or leaks recycled slab memory (info disclosure). This is the theoretical worst case but I could not trigger it on this kernel.

So: the bug is real and the fix is warranted (it removes a clear invariant violation and eliminates a genuine, if narrow, UAF window), but on this kernel it could not be turned into an observable panic within a reasonable time budget, even with significant hammering. An INVARIANTS build (with DEBUG=-g slab poisoning / M_USE_WAIT / KKASSERT on mnt_refs) would likely turn this into a reliable panic.
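Why poisoning would change the observable outcome can be illustrated with a small userspace model. Everything here is hypothetical (the struct, the poison byte, and the function names); it only shows that a zeroed stale handle passes the NULL check quietly, while a poisoned one would be followed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two-pointer handle copied at :5224. */
struct model_nchandle {
	void *ncp;
	void *mount;
};

/* Hypothetical poison byte; real debug allocators pick their own pattern. */
#define MODEL_POISON_BYTE 0xA5

/* What a poisoning allocator would do to the object at free time. */
static void
model_poison_on_free(void *obj, size_t len)
{
	memset(obj, MODEL_POISON_BYTE, len);
}

/*
 * The guard from the quoted loop: a NULL ncp skips _cache_hold,
 * anything else is passed on to be dereferenced.
 */
static int
stale_read_would_be_followed(const struct model_nchandle *nch)
{
	return (nch->ncp != NULL);
}
```

In the model, a handle zeroed before the free (as cache_zero does at :1069) is not followed, so the stale read goes unnoticed. The same handle after poisoning is followed, which in the kernel would mean _cache_hold dereferencing a recognizable garbage pointer.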

fix.diff adds mount_hold(mp) in mount_get_by_nc (matching vfs_getvfs) and the matching mount_drop(new_mp) in the cache_fullpath caller. The fix matches the proposal in findings/DF-0044-mount-get-by-nc-unheld-uaf-cache-fullpath.md, with the additional caller-side mount_drop (which the finding markdown mentions but does not show as a diff).

Confirmed kernel references

Detail

Exploit chain

none observed. Theoretical chain (not demonstrated): (1) vfs.usermount=1 is already enabled, or the attacker races a privileged unmount; (2) the attacker drives vn_fullpath(guess=1) via /proc/$pid/map reads for a vnode whose path traverses a mountpoint that is simultaneously being unmounted; (3) the attacker wins the narrow race where the deref at vfs_cache.c:5224 reads mnt_ncmounton from freed-then-reallocated M_MOUNT slab memory; (4) the garbage ncp is fed to _cache_hold which dereferences it -> kernel panic (DoS) or, with heap grooming to land a controlled ncp in the recycled slab, controlled kernel-memory read via the nchandle the caller proceeds with. The race window (nanosecond-scale deref vs millisecond-scale kfree gated by the mnt_refs drain) makes step 3 extremely unlikely on a non-INVARIANTS kernel -- I could not win it in ~300k deref iterations across multiple 60-90s runs.

Evidence (decisive lines)

findings/poc/DF-0044/ holds the full evidence pack. Decisive bytes from run.log: 'DF-0044: deref=288137  free_ok=2  free_fail=6 / DF-0044: still alive -- race not won this run'. panic.txt is EMPTY of fatal trap/panic/Stopped-at signatures (the race was exercised -- boot.log shows 'nlookup: warning umount race avoided' and 'unmount ... forcing unmount' -- but never crashed). VERDICT.md carries the line-by-line code-level proof. fix.diff applies cleanly with git apply.

PoC changes

Rewrote findings/poc/DF-0044/mount_uaf.c: original PoC used readlink() which does NOT drive cache_fullpath(guess=1) (only sys/vfs/procfs/procfs_map.c:181 calls vn_fullpath with guess=1). New PoC hosts the binary inside a tmpfs the cycler overmounts+unmounts, while reader threads repeatedly open and read /proc/self/map to drive vn_fullpath(p, p_textvp, ..., 1) -> cache_fullpath -> mount_get_by_nc -> deref. Had to pass a proper tmpfs_mount_info struct with ta_root_uid=getuid() so the unprivileged user actually owns the mount and can unmount it (otherwise tmpfs_statfs overwrites mnt_stat.f_owner with the root_uid default of 0, and sys_unmount then denies with EPERM). Added mount_uaf_root.c as a root variant with MNT_FORCE for a faster free-side cycle (defeated by allproc_scan SIGKILL of the test process). Added build.sh, run.sh, VERDICT.md, README.md, env.txt, build.log, run.log, panic.txt, fix.diff, manifest.json.

Verified recommended fix

fix.diff: (1) in sys/kern/vfs_mount.c mount_get_by_nc, add 'if (mp) mount_hold(mp);' before lwkt_reltoken to match the hold contract of vfs_getvfs and mountlist_scan; (2) in sys/kern/vfs_cache.c cache_fullpath, track the held mp separately and mount_drop() it both when overridden by the ncp==mp->mnt_ncmountpt.ncp branch and after the deref of new_mp->mnt_ncmounton completes. Supersedes the finding proposal (which showed only the mount_hold half) by additionally implementing the caller-side mount_drop that the finding markdown mentions but does not show as a diff.
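The caller-side bookkeeping described in (2) can be sketched as a userspace model. This is one plausible shape under stated assumptions, not the contents of fix.diff: all names are hypothetical, on_value stands in for mnt_ncmounton, and the lookup helper is assumed to return its result already held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mount. */
struct model_mount {
	int holds;	/* balance of model_hold()/model_drop() calls */
	int on_value;	/* stands in for mnt_ncmounton; non-negative here */
};

static void
model_hold(struct model_mount *mp)
{
	mp->holds++;
}

static void
model_drop(struct model_mount *mp)
{
	assert(mp->holds > 0);
	mp->holds--;
}

/* Stand-in for a fixed lookup: the result comes back held, or NULL. */
static struct model_mount *
model_get_held(struct model_mount *found)
{
	if (found != NULL)
		model_hold(found);
	return (found);
}

/*
 * One mountpoint-crossing step. `guessed` is what the lookup would find,
 * `cur` is the current mount, and `at_cur_root` is nonzero when the
 * override branch applies. Returns the copied value, or -1 if no
 * crossing happens.
 */
static int
model_cross_step(struct model_mount *guessed, struct model_mount *cur,
    int at_cur_root)
{
	struct model_mount *held_mp = model_get_held(guessed);
	struct model_mount *new_mp = held_mp;
	int out = -1;

	if (at_cur_root) {
		/* Override branch: the guessed mount is not used, drop it. */
		if (held_mp != NULL) {
			model_drop(held_mp);
			held_mp = NULL;
		}
		new_mp = cur;
	}
	if (new_mp != NULL)
		out = new_mp->on_value;	/* the copy that needs protection */
	if (held_mp != NULL)
		model_drop(held_mp);	/* drop only after the copy is done */
	return (out);
}
```

Tracking held_mp separately from new_mp keeps the drop paired with the hold even when the override branch replaces new_mp, so the hold count returns to its starting value on every path.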

Verdict

Code-level bug CONFIRMED, but the race could not be turned into an observable panic on the master DEV X86_64_GENERIC kernel.

mount_get_by_nc (sys/kern/vfs_mount.c:1235-1248) returns mp WITHOUT calling mount_hold(mp) -- token released at :1245, return at :1247 -- unlike every sibling in the same file: vfs_getvfs (:413-424) mount_hold()s at :420-421 and documents the contract at :410-411, and mountlist_scan mount_hold()s at :756/:784 before dropping the token. The sole caller cache_fullpath (sys/kern/vfs_cache.c:5214) then dereferences new_mp->mnt_ncmounton at :5224 after the token is dropped and never mount_drop()s. dounmount (sys/kern/vfs_syscalls.c:1040,1066-1069,1108-1117) removes mp from mountlist, zeroes mnt_ncmounton, drains mnt_refs, then mount_drop->kfree.

A 60-second unprivileged stress run with vfs.usermount=1 produced 288,137 vn_fullpath(guess=1) deref iterations against the cycled mount and 2 successful free-side cycles -- no panic, no slab complaints. The deref at :5224 is a two-pointer struct copy finishing in nanoseconds, while dounmount's kfree is gated by an mnt_refs drain that takes milliseconds; in every observed cycle the deref read either still-valid or zeroed-but-not-yet-freed memory. The kernel's own race-detection logic (nlookup 'warning umount race avoided' at vfs_nlookup.c:1056, and dounmount's freeok=0 skip at vfs_syscalls.c:988) takes the remaining racing lookups out of contention.

An INVARIANTS build (slab poisoning / KKASSERT on mnt_refs) would likely turn this into a reliable panic. The bug is real, the missing mount_hold is the file's lone invariant violation, and the fix is warranted as defense-in-depth.