⬢ DragonFlyBSD Kernel Audit
DF-0033

Unsynchronized fdtol->fdl_refcount++ / list splice in rfork fdshare path (UAF via refcount race)

Field Value
ID DF-0033
Status new
Severity Medium
CVSS 3.1 CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:H/I:H/A:H
CWE CWE-362 Race Condition; CWE-416 Use After Free
File sys/kern/kern_fork.c (inc); sys/kern/kern_descrip.c (dec/unlink)
Lines 568-569 (inc), kern_descrip.c:2675 (dec), 3359-3362 (splice)
Area kern
Confidence likely
Discovered 2026-06-29
Reported pending

Summary

In the fdshare branch of fork1() (neither RFFDG nor RFCFDG, e.g. rfork(RFPROC|RFTHREAD)), the filedesc-to-leader node is shared: kern_fork.c:568-569 does fdtol = p1->p_fdtol; fdtol->fdl_refcount++; under the forking proc's p_token. The matching decrement in fdfree() (kern_descrip.c:2675) runs under the shared filedesc's fd_spin. Two peers that share both p_fd and p_fdtol therefore hold different locks while mutating the same refcount word → a lost-update race. A lost increment lets fdfree() free fdtol (kern_descrip.c:2676) while other peers still hold p_fdtol pointing at it → use-after-free on M_FILEDESC_TO_LEADER memory. Additionally, filedesc_to_leader_alloc() (kern_descrip.c:3359-3362) splices fdtol into the shared fdl_next/fdl_prev list under no lock (self-admitted "NOT MPSAFE" at :3343), enabling concurrent list corruption.

Root cause

sys/kern/kern_fork.c:568-569:

if ((flags & RFTHREAD) != 0) {
    fdtol = p1->p_fdtol;
    fdtol->fdl_refcount++;          /* under p1->p_token; NOT fd_spin */
} else {
    fdtol = filedesc_to_leader_alloc(p1->p_fdtol, p2);   /* unlocked splice */
}

The decrement/unlink side (fdfree) takes fdp->fd_spin. Because all fdtol sharers share the same p_fd (hence the same fd_spin), the correct serialization lock is fd_spin, which is not held on the increment/splice side in fork1.
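The lost update described above can be replayed deterministically in userland. The sketch below is a model only: struct cpu_rmw, rmw_load, rmw_store and replay_interleaving are hypothetical names, and the two "CPUs" are simulated by ordering the load and store halves of a non-atomic ++/-- by hand.

```c
#include <assert.h>

/* Hypothetical model of one CPU's non-atomic ++ or -- on a plain int:
 * a load into a private register, then a store of register + delta. */
struct cpu_rmw {
    int reg;    /* value this CPU loaded from the shared word */
};

static void rmw_load(struct cpu_rmw *c, const int *word)
{
    c->reg = *word;
}

static void rmw_store(const struct cpu_rmw *c, int *word, int delta)
{
    *word = c->reg + delta;
}

/* Replay one increment (fork side) racing one decrement (exit side).
 * Both sides load before either stores, so the later store wins.
 * A serialized run would leave the word unchanged at `start`. */
int replay_interleaving(int start, int fork_stores_last)
{
    int word = start;
    struct cpu_rmw forker, exiter;

    rmw_load(&forker, &word);
    rmw_load(&exiter, &word);
    if (fork_stores_last) {
        rmw_store(&exiter, &word, -1);
        rmw_store(&forker, &word, +1);   /* overwrites the decrement */
    } else {
        rmw_store(&forker, &word, +1);
        rmw_store(&exiter, &word, -1);   /* overwrites the increment */
    }
    return word;
}
```

With start = 2, the lost-increment order leaves the word at 1 and the lost-decrement order leaves it at 3; neither matches the serialized result of 2.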

Threat model & preconditions

  • Attacker position: any unprivileged local user using rfork(RFPROC|RFTHREAD) to create peers sharing the fd table, then concurrently forking from one peer while another exits (or forks).
  • Privileges gained or impact: a lost increment drives fdl_refcount below the true reference count; when a sharer exits and fdfree() sees fdl_refcount == 0, it kfrees fdtol while other peers still reference it → UAF (kernel memory corruption / controlled free of an attacker-influenced slab object). A lost decrement leaks the node. The unlocked list splice can additionally corrupt the circular fdl list, yielding memory corruption in closef()/do_dup().
  • Required config or capabilities: none; default kernel. Trigger is narrow (rfork fdshare + concurrent peer fork/exit) → AC:H.
  • Reachability: rfork(RFPROC|RFTHREAD) peers + concurrent fork/exit.
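The impact bullet above can be made concrete with a small bookkeeping model that tracks the recorded refcount separately from the number of peers that really hold the pointer. This is a userland sketch under stated assumptions: struct node_model, model_share and model_release are hypothetical names, and "lost" stands in for an increment that was overwritten by a racing decrement.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared node; not the kernel structure. */
struct node_model {
    int  recorded;   /* what the refcount word says        */
    int  holders;    /* how many peers really point at it  */
    bool freed;      /* set once the count reaches zero    */
};

/* A new sharer appears. `lost` models an increment that never landed. */
void model_share(struct node_model *n, bool lost)
{
    n->holders++;
    if (!lost)
        n->recorded++;
}

/* A sharer exits. Returns true if this release freed the node while
 * other peers still hold a pointer to it (the dangling case). */
bool model_release(struct node_model *n)
{
    n->holders--;
    if (--n->recorded == 0)
        n->freed = true;
    return n->freed && n->holders > 0;
}
```

Starting from one holder, adding two sharers with one lost increment and then releasing twice frees the node while one peer still references it.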

Proof of concept

PoC source: findings/poc/DF-0033/fdtol_race.c

Build & run (unprivileged, disposable VM)

cc -o fdtol_race findings/poc/DF-0033/fdtol_race.c
./fdtol_race

Expected output

Intermittent kernel memory corruption / panic in fdfree()'s fdl list walk or the next fork's fdtol deref (UAF).

Impact

Refcount race → UAF reachable by an unprivileged user via rfork fdshare + concurrent peer fork/exit. Medium (narrow race window, but UAF = potential corruption/LPE).

Suggested fix

Pair the refcount mutation with the lock already used on the free side (fd_spin), and lock the fdl list splice:

--- a/sys/kern/kern_fork.c
+++ b/sys/kern/kern_fork.c
@@ -568,2 +568,6 @@
        fdtol = p1->p_fdtol;
-       fdtol->fdl_refcount++;
+       /* fdl_refcount is mutated under the shared fd table's spinlock
+        * on the decrement side (fdfree), so match it here. */
+       spin_lock(&p1->p_fd->fd_spin);
+       fdtol->fdl_refcount++;
+       spin_unlock(&p1->p_fd->fd_spin);

Additionally, filedesc_to_leader_alloc() (kern_descrip.c:3346-3368) must take fd_spin (or a dedicated fdtol lock) around the fdl_next/fdl_prev splice. A more thorough fix converts fdl_refcount to an atomic/refcount_t and adds a dedicated lock for the fdl list.
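As a rough illustration of the "atomic refcount" direction mentioned above, here is a userland sketch using C11 <stdatomic.h>. It is not kernel code: struct ref_model, ref_init, ref_acquire and ref_release are hypothetical names, and a kernel patch would use the kernel's own atomic primitives rather than C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; only the counter is modeled. */
struct ref_model {
    atomic_int refs;
};

/* The creator holds the first reference. */
void ref_init(struct ref_model *r)
{
    atomic_init(&r->refs, 1);
}

/* Take another reference. Only valid while the caller already holds
 * one, as the forking process does for its own node. */
void ref_acquire(struct ref_model *r)
{
    atomic_fetch_add_explicit(&r->refs, 1, memory_order_relaxed);
}

/* Drop a reference. Returns true when the caller dropped the last
 * one and is therefore responsible for freeing the object. */
bool ref_release(struct ref_model *r)
{
    return atomic_fetch_sub_explicit(&r->refs, 1,
                                     memory_order_acq_rel) == 1;
}
```

An atomic counter removes the lost update on the count itself, but it does not protect the fdl_next/fdl_prev links; those still need a lock held across the whole splice or unlink.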

References

Timeline

  • 2026-06-29 Discovered during automated file-by-file audit of sys/kern/kern_fork.c.
  • pending Reported to DragonFlyBSD security contact.

PoC verification

Evidence pack

findings/poc/DF-0033 · 10 files
File Type Description Size
fdtol_race.c trigger-source controlled racer: bounded-concurrency rfork(RFPROC|RFTHREAD) peers + slab pressure to drive the fdl_refcount lost-update race and reclaim the freed slot 4.2 KB
build.sh build-script cc -O2 -o fdtol_race fdtol_race.c 190 B
run.sh run-script looped short bursts until panic or 12 rounds (race is intermittent) 748 B
build.log build-log final successful build, full output 68 B
run.log run-log sample 15s run (non-panic sample; panics land in panic.txt/boot.log) 266 B
panic.txt panic-signature three distinct kernel panics from unprivileged maxx, incl. stack through filedesc_to_leader_alloc<-fork1<-sys_rfork 4.5 KB
env.txt environment uname, cc version, sysctls, VM config 689 B
VERDICT.md verdict full narrative: mechanism, evidence, exploit-chain characterization, fix rationale 9.3 KB
fix.diff suggested-fix git-apply-able: take shared fd_spin around fdl_refcount++ and around filedesc_to_leader_alloc splice in fork1 fdshare branch 1.0 KB
README.md readme human-facing build/run/expected + bug summary 2.8 KB
README.md readme human-facing build/run/expected + bug summary

DF-0033 PoC

fdtol_race.c: unsynchronized fdtol->fdl_refcount++ (under per-proc p_token) racing fdl_refcount-- (under the shared fd_spin) in the rfork(RFPROC|RFTHREAD) fdshare path → lost update → premature kfree(fdtol) → UAF on M_FILEDESC_TO_LEADER. Plus an unlocked fdl list splice in filedesc_to_leader_alloc (self-admitted "NOT MPSAFE").

Status: REPRODUCED

Three distinct kernel panics from unprivileged user maxx (see VERDICT.md and panic.txt):

  1. panic: filedesc_to_refcount botch: fdl_refcount=0 (the KASSERT at kern_descrip.c:2627 catching the refcount underflow directly).
  2. panic: BADFREE2 (slab double-free of the prematurely-freed fdtol).
  3. panic: memory chunk … is already allocated! (slab corruption on the next kmalloc; stack: chunk_mark_allocated ← _kmalloc ← filedesc_to_leader_alloc ← fork1 ← sys_rfork, the exact cited path).

The race is intermittent (CVSS AC:H); reproducibility needs several short bursts of hammering.

The bug (confirmed, line-level)

fork1() fdshare branch (sys/kern/kern_fork.c):

:324  lwkt_gettoken(&p1->p_token);        /* per-PROC token */
…
:568  fdtol = p1->p_fdtol;
:569  fdtol->fdl_refcount++;              /* ONLY p_token held */

fdfree() (sys/kern/kern_descrip.c):

:2622 spin_lock(&fdp->fd_spin);           /* shared fd-table spinlock */
…
:2675 fdtol->fdl_refcount--;              /* ONLY fd_spin held */

p1->p_token is per-process; peers sharing p_fd hold different p_tokens, so the increment and decrement do not share a serialization lock for the same int word → lost update. fdl_refcount is a plain int (filedesc.h:110).
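The "no common lock" argument can be expressed as a lockset intersection: two code paths are serialized on a word only if the sets of locks they hold overlap. The sketch below is a toy model; the enum values and the common_locks helper are hypothetical names, not kernel identifiers.

```c
#include <assert.h>

/* Hypothetical bit per lock instance in a two-peer scenario. */
enum lock_bit {
    LK_P_TOKEN_A = 1 << 0,   /* peer A's per-process token   */
    LK_P_TOKEN_B = 1 << 1,   /* peer B's per-process token   */
    LK_FD_SPIN   = 1 << 2    /* spinlock of the shared table */
};

/* Locks held by both paths. Zero means nothing serializes them. */
unsigned common_locks(unsigned held_by_first, unsigned held_by_second)
{
    return held_by_first & held_by_second;
}
```

In this model the unpatched fork path in A holds LK_P_TOKEN_A and the exit path in B holds LK_P_TOKEN_B | LK_FD_SPIN, so the intersection is empty; once the fork path also takes LK_FD_SPIN, the intersection is LK_FD_SPIN.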

Build & run (unprivileged, disposable VM)

./build.sh                    # cc -O2 -o fdtol_race fdtol_race.c
./run.sh                      # looped bursts until panic or 12 rounds
# or directly:
./fdtol_race <secs> <peers>   # e.g. ./fdtol_race 20 10

Run as the unprivileged user (e.g. maxx, uid 1001). A panic drops the guest into DDB and lands in dfbsd-qemu/boot.log on the host (serial console).

Expected output (bug present)

A kernel panic, one of:

  • panic: filedesc_to_refcount botch: fdl_refcount=0
  • panic: BADFREE2
  • panic: memory chunk <addr> is already allocated!

Files

File Purpose
fdtol_race.c controlled racer (bounded concurrency + slab pressure)
build.sh / run.sh exact build/run
build.log / run.log full logs
panic.txt the three panic signatures + stacks from boot.log
env.txt guest uname, compiler, sysctls
VERDICT.md full narrative: mechanism, evidence, exploit-chain characterization
fix.diff git-apply-able fix: take fd_spin around ++ and the splice
manifest.json machine-readable catalog
VERDICT.md verdict full narrative: mechanism, evidence, exploit-chain characterization, fix rationale

DF-0033 VERDICT: REPRODUCED (local-unprivileged kernel panic / UAF)

Field Value
Verdict REPRODUCED
Impact panic: reproducible local-unprivileged kernel DoS; underlying primitive is a UAF on M_FILEDESC_TO_LEADER (64-byte slab zone)
Confidence certain (three distinct kernel panics from unprivileged maxx, one with a stack through the exact cited functions; plus an airtight line-level lock-mismatch proof)
Tested on DragonFly 6.5-DEVELOPMENT v6.5.0.1712.g89e6a-DEVELOPMENT (master DEV build)
Attempts ~8 build/run iterations; the race is intermittent (CVSS AC:H) and typically needs several 20–60s bursts to hit

Mechanism (confirmed in sys/, every hop cited)

The finding's claim is correct and the locking has not been fixed on master.

  1. fdtol->fdl_refcount is a plain int (sys/sys/filedesc.h:110). No atomics.
  2. Increment side, sys/kern/kern_fork.c:
       • :324 lwkt_gettoken(&p1->p_token), taken at the top of fork1().
       • :563 the RFTHREAD branch is entered for rfork(RFPROC|RFTHREAD) (fdshare).
       • :568 fdtol = p1->p_fdtol;
       • :569 fdtol->fdl_refcount++; ← mutated under p1->p_token only.
       • p1->p_token is held until :727.
  3. Decrement side, sys/kern/kern_descrip.c:
       • :2622 spin_lock(&fdp->fd_spin), the shared fd-table spinlock.
       • :2675 fdtol->fdl_refcount--; ← mutated under fd_spin only.
  4. p1->p_token is per-process. Two peers A and B that share p_fd/p_fdtol (created by rfork(RFPROC|RFTHREAD)) hold different p_tokens (A.p_token ≠ B.p_token). lwkt tokens do not serialize across processes that do not share the same token. So:
       • fork-in-A holds A.p_token;
       • exit-in-B (fdfree via exit1, kern_exit.c:382) holds B.p_token plus the shared fd_spin;
       • neither lock is common to both sides for the fdl_refcount word.
     The only lock that is genuinely common to all sharers is fd_spin, and the increment side does not take it. ++/-- on a plain int is a classic read/modify/write: concurrent ++/-- from two CPUs is a lost update.
  5. Consequence. A lost increment drives fdl_refcount below the true reference count. When some peer later exits, fdfree decrements, sees fdl_refcount == 0, unlinks the node from the circular fdl list and kfree(fdtol, M_FILEDESC_TO_LEADER) (kern_descrip.c:2676-2686) while other peers still have p_fdtol pointing at it → use-after-free. The next rfork in a surviving peer dereferences p1->p_fdtol (dangling) and bumps fdl_refcount in freed memory; or takes the non-RFTHREAD branch and calls filedesc_to_leader_alloc(old=p1->p_fdtol, ...), which reads old->fdl_next/old->fdl_prev from the freed slot and writes through them, corrupting whatever now occupies that slab slot.
  6. Unlocked list splice. sys/kern/kern_descrip.c:3342-3368 filedesc_to_leader_alloc() is self-admitted "NOT MPSAFE" (:3343) and splices the shared fdl_next/fdl_prev list (:3358-3362) under no lock. With the fix's fd_spin held around the call site, the splice becomes serialized against fdfree's list walk.
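To see why an unlocked splice into a circular doubly linked list is unsafe, the insert can be split into its read and write halves and replayed by hand. This is a generic userland model, not the kernel's source: struct lnode and the ring_*/splice_* helpers are hypothetical names, and the insert follows the usual insert-after pattern.

```c
#include <assert.h>
#include <stddef.h>

struct lnode {
    struct lnode *next, *prev;
};

/* A one-element ring points at itself. */
void ring_init(struct lnode *n) { n->next = n->prev = n; }

/* Read half of insert-after: snapshot the successor of `old`. */
struct lnode *splice_read(struct lnode *old) { return old->next; }

/* Write half: link `n` between `old` and the earlier snapshot. */
void splice_write(struct lnode *old, struct lnode *n, struct lnode *snap)
{
    n->next = snap;
    n->prev = old;
    old->next = n;
    snap->prev = n;
}

/* Standard unlink through the node's own links. */
void ring_unlink(struct lnode *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* Nodes reachable by following next from head (bounded walk). */
int ring_len(const struct lnode *head)
{
    int len = 1;
    for (const struct lnode *p = head->next; p != head && len < 64; p = p->next)
        len++;
    return len;
}
```

If two inserters both call splice_read before either calls splice_write, the second write overwrites the first and one node is silently left out of the ring; when that orphan later unlinks itself through its stale links, it drops the other node as well. Holding one lock across both halves gives the expected three-node ring.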

Evidence (three panics, unprivileged maxx)

All three are in panic.txt. Summary:

  • Panic A: "panic: filedesc_to_refcount botch: fdl_refcount=0" in fdfree. This is the KASSERT(fdtol->fdl_refcount > 0, …) at kern_descrip.c:2627-2629 firing: the kernel's own invariant check catching the refcount underflow that the lost-update race produces. Stack: fdfree → exit1 → sigexit → postsig → userret.

  • Panic B: "panic: BADFREE2" in _kfree. Slab double-free detection: after the premature kfree(fdtol), a dangling p_fdtol reference drove a second free of the same M_FILEDESC_TO_LEADER object. The slab allocator's bookkeeping became inconsistent, so a later kfree in sysctl_kern_proc_args (collateral) trips BADFREE2.

  • Panic C: "panic: memory chunk … is already allocated!", slab corruption detected on the next allocation out of M_FILEDESC_TO_LEADER. The stack is the smoking gun, walking the exact functions cited in the finding: chunk_mark_allocated ← _kmalloc ← filedesc_to_leader_alloc ← fork1 ← sys_rfork. This is the full chain: refcount race → premature free → dangling write into the freed 64-byte-zone slot → corrupted slab bitmap → next kmalloc panics.

The race is intermittent (AC:H). Across the verification, panic A fired after ~50 s of hammering; panics B and C each fired within a handful of 20 s bursts.

Exploit chain (characterization; LPE not demonstrated)

The primitive is memory corruption (UAF), so per the audit methodology the chain is characterized even though root was not achieved in this session.

  • Object / zone. struct filedesc_to_leader is 40 bytes (int×3 + ptr×3, sys/sys/filedesc.h:109-117). DragonFly's slab rounds allocation size up to a power of two (powerof2_size, sys/kern/kern_slaballoc.c:776-786), so fdtol lands in the 64-byte chunk zone (kmalloc(sizeof(struct filedesc_to_leader)=40) → 64).
  • Write primitive. After the premature free, a surviving peer's next non-RFTHREAD rfork calls filedesc_to_leader_alloc(old=p1->p_fdtol), which executes old->fdl_next->fdl_prev = old->fdl_prev (kern_descrip.c:3362). If the attacker reclaims the freed 64-byte slot with a crafted object, the attacker controls old->fdl_next (write address) and old->fdl_prev (write value) → a single arbitrary-pointer-sized write. Additionally old->fdl_next = fdtol (:3361) writes a known kernel pointer into a controlled offset.
  • Victim objects (64-byte zone). Candidate victims would be any kmalloc of ≤64 bytes containing an attacker-interesting field: a function pointer, a uid, a struct ucred * / struct file * pointer, a refcount. Grooming would spray the 64-byte zone (sockets, pipes, small kinfo structs) to place such a victim adjacent to the freed fdtol slot.
  • How far it got / what blocks root. The write primitive is real but gated behind a non-deterministic race (AC:H): the attacker must first win the refcount lost-update, then reclaim the slot, then trigger the splice, all in the right order across two CPUs. On this INVARIANTS kernel the KASSERT (Panic A) catches the underflow before the corruption phase, masking the exploitable path. On a production (non-INVARIANTS) kernel the KASSERT is compiled out and the UAF proceeds silently to the arbitrary write; turning that into uid=0 requires slab grooming + a chosen victim object + a deterministic race trigger, which is substantial exploit development and was not completed in this session. The demonstrated, reproducible ceiling here is local-unprivileged kernel panic (DoS), which is already a High-impact outcome for a default-config kernel.
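The size-class arithmetic in the first bullet (40 bytes rounding up to the 64-byte zone) can be checked with a few lines of userland C. This is an assumption-laden sketch: struct fdtol_model only mirrors the quoted field counts (three ints, three pointers) with made-up field names, and round_up_pow2 is a simple stand-in, not the kernel's powerof2_size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout with three ints and three pointers. On an LP64
 * target this is 12 bytes of ints, 4 of padding, 24 of pointers. */
struct fdtol_model {
    int   refcount;
    int   wakeup;
    int   holdcount;
    void *leader;
    void *prev;
    void *next;
};

/* Smallest power of two that is >= n (for n >= 1). */
size_t round_up_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

On a 64-bit build sizeof(struct fdtol_model) is 40 and round_up_pow2(40) is 64, matching the zone named in the text; any object of 33 to 64 bytes would land in the same size class under this rounding.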

A maintainer should treat the LPE ceiling as plausible-but-unproven; the DoS is proven.

PoC changes (vs. the seeded fdtol_race.c)

The seeded PoC compiled and ran but fork-bombed into kern.maxprocperuid within seconds, wedging ssh before the race had time to fire. Rewrote it as a controlled racer:

  • Bounded concurrency: each peer does rfork(RFPROC|RFTHREAD) + child _exit(0) and the parent-peer reaps with waitpid, so the live process count stays well under the per-uid cap (the original looped fork() forever and orphaned everything).
  • Added slab pressure (open("/dev/null")/close churn in both peer and child) so that, once a premature free occurs, the freed 64-byte slot is likely reclaimed and the next deref hits clobbered memory → visible panic (this is exactly what produced Panic C).
  • The parent also runs rfork(RFPROC|RFTHREAD)+_exit to add a third contender for the refcount word (more cross-CPU ++/-- overlap).
  • SIGALRM-bounded runtime; the run scripts loop short bursts because the race is intermittent.
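The bounded-concurrency idea in the first bullet (spawn, let the child exit, reap before spawning again) is a general pattern for keeping the live process count flat. The sketch below shows only that pattern, using plain fork() as a portable stand-in; spawn_and_reap is a hypothetical helper and is not taken from fdtol_race.c.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Spawn `rounds` short-lived children one at a time, reaping each
 * before starting the next, so at most one child of this caller is
 * alive at any moment. Returns the number of children reaped. */
int spawn_and_reap(int rounds)
{
    int reaped = 0;

    for (int i = 0; i < rounds; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;              /* hit a limit: stop, do not retry */
        if (pid == 0)
            _exit(0);           /* child exits immediately */

        int status;
        if (waitpid(pid, &status, 0) == pid)
            reaped++;           /* child is gone before the next fork */
    }
    return reaped;
}
```

Because every child is reaped before the next one is created, the loop cannot accumulate orphans or run into a per-user process cap the way an unbounded fork loop does.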

Build/run: ./build.sh && ./run.sh (or ./fdtol_race <secs> <peers>).

Why this is not a false positive

The lock mismatch is structural, not a reviewer oversight:

  • The increment and decrement sides are guarded by two different, non-mutually-held locks (p1->p_token is per-proc; fd_spin is shared).
  • No atomic_t, no atomic_add_int, no common spinlock protects the word.
  • filedesc_to_leader_alloc's own comment (kern_descrip.c:3343) "NOT MPSAFE" corroborates that the splice was known-unsafe.
  • Three independent kernel panics reproduce from unprivileged userland, one with a stack through the exact cited call chain.

The fix is fix.diff in this folder (git-apply-able, verified). It takes the shared fd_spin (the same lock fdfree already uses for the decrement and list walk) around both the fdl_refcount++ and the filedesc_to_leader_alloc() splice in fork1's fdshare branch. This matches the finding markdown's proposal (the markdown sketched the spin_lock around the ++; this diff additionally wraps the else-branch filedesc_to_leader_alloc call, which is the unlocked splice the markdown also flagged). A more thorough follow-up would convert fdl_refcount/fdl_holdcount to atomic_t and add a dedicated fdl list lock, but the minimal correct fix is the fd_spin pairing.

Confirmed kernel references

Detail

Exploit chain

Primitive is a UAF on M_FILEDESC_TO_LEADER (struct is 40 bytes -> 64-byte slab zone via powerof2_size). After the premature free, a surviving peer's next non-RFTHREAD rfork calls filedesc_to_leader_alloc(old=p1->p_fdtol) which executes old->fdl_next->fdl_prev = old->fdl_prev (kern_descrip.c:3362): if the attacker reclaims the freed 64-byte slot with a crafted object, this yields a single arbitrary pointer-sized write (controlled write-address via old->fdl_next, controlled value via old->fdl_prev). LPE ceiling is plausible on a non-INVARIANTS production kernel (where the KASSERT that currently masks the corruption is compiled out) via 64-byte-zone grooming against a victim object carrying a function pointer or ucred/uid, but the write is gated behind a non-deterministic race (AC:H) and root was not achieved in this session. Demonstrated, reproducible ceiling: local-unprivileged kernel panic (DoS) on the default INVARIANTS kernel. Characterized in VERDICT.md; no separate exploit.c shipped since the deterministic-race+grooming chain was not completed.

Evidence (decisive lines)

findings/poc/DF-0033/panic.txt holds all three panic signatures+stacks; VERDICT.md has the full line-level mechanism; manifest.json catalogs the pack. Decisive bytes: 'panic: filedesc_to_refcount botch: fdl_refcount=0' (KASSERT at kern_descrip.c:2627 fired), and 'panic: memory chunk 0xfffff800461052ad is already allocated!' with stack 'chunk_mark_allocated <- _kmalloc <- filedesc_to_leader_alloc+0x23 <- fork1+0xbe9 <- sys_rfork+0x43' -- the exact cited path. fdtol_race.c is the controlled racer; build.sh/run.sh reproduce.

PoC changes

Rewrote fdtol_race.c: the seeded version fork-bombed into kern.maxprocperuid and wedged ssh before the race could fire. v2 bounds concurrency (each peer rfork(RFPROC|RFTHREAD)+child _exit, reaped with waitpid), adds slab pressure (open/close /dev/null churn) so a prematurely-freed fdtol slot is reclaimed and the next deref hits clobbered memory (this is what produced panic C), and adds a SIGALRM-bounded runtime plus a third contender via the parent loop. Added build.sh, run.sh (looped bursts since the race is intermittent), env.txt, panic.txt, VERDICT.md, manifest.json, and fix.diff.

Verified recommended fix

fix.diff (git-apply-able, verified clean) takes the shared p1->p_fd->fd_spin -- the same lock fdfree already uses for the decrement and list walk -- around both the fdl_refcount++ (kern_fork.c:569) and the filedesc_to_leader_alloc() list splice (kern_fork.c:575) in fork1's fdshare branch. Matches the finding markdown's proposal (which sketched the spin_lock around the ++) and additionally wraps the else-branch unlocked splice the markdown also flagged. A more thorough follow-up would convert fdl_refcount/fdl_holdcount to atomic_t and add a dedicated fdl list lock.

Verdict

REPRODUCED. The lock-mismatch claim is airtight and unfixed on master: fdl_refcount is a plain int (sys/sys/filedesc.h:110), incremented at kern_fork.c:569 under only p1->p_token (acquired :324) which is PER-PROCESS, while decremented at kern_descrip.c:2675 under the SHARED fd_spin (acquired :2622). Peers sharing p_fd/p_fdtol hold different p_tokens, so the ++/-- share no serialization lock -> lost update -> premature kfree(fdtol) at kern_descrip.c:2686 while peers still reference it -> UAF. Confirmed by three distinct kernel panics from unprivileged user maxx: (A) 'panic: filedesc_to_refcount botch: fdl_refcount=0' -- the KASSERT at kern_descrip.c:2627 catching the refcount underflow directly; (B) 'panic: BADFREE2' -- slab double-free of the prematurely-freed fdtol; (C) 'panic: memory chunk ... is already allocated!' with stack chunk_mark_allocated<-_kmalloc<-filedesc_to_leader_alloc<-fork1<-sys_rfork, i.e. the exact cited call chain, slab corruption from the dangling write into the freed 64-byte-zone slot. filedesc_to_leader_alloc (kern_descrip.c:3342-3368) also splices the shared fdl list under no lock ('NOT MPSAFE' :3343), as the finding states.