DragonFlyBSD Kernel Audit
โ† dashboard
DF-0032

fdcopy() failure in fork1() permanently leaks the child proc, nprocs, and the per-uid proc-count (system-wide fork DoS)

Field Value
ID DF-0032
Status new
Severity Medium
CVSS 3.1 CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H
CWE CWE-401 Missing Release of Memory after Effective Lifetime; CWE-772 Missing Release of Resource after Effective Lifetime
File sys/kern/kern_fork.c
Lines 491, 551-555, 724-732
Area kern
Confidence likely
Discovered 2026-06-29
Reported pending

Summary

When fdcopy() fails on the RFFDG path, fork1() branches to done:, which only releases tokens. By then p2 has already been kmalloc'd, placed on allproc in SIDL state (:491), and given held references (reaper, uidpcpu, ucred, p_args, sigacts, textvp, textnch), and the global nprocs and per-uid chgproccnt have already been charged; none of this is undone. fdcopy is the only fd-duplication primitive that can fail (it uses M_NULLOK at kern_descrip.c:2481-2486, unlike fdinit/fdshare). Default fork()/vfork() use RFFDG, so any unprivileged user can drive this once the kernel M_FILEDESC malloc type reaches its per-type limit: each failed fork() permanently consumes one system-wide maxproc slot and one per-uid RLIMIT_NPROC slot, and once nprocs == maxproc no process on the system (including root) can fork/vfork/create threads. The condition persists until reboot.

Root cause

sys/kern/kern_fork.c:

proc_add_allproc(p2);                          /* :491  on allproc in SIDL  */
...
if (flags & RFFDG) {
    error = fdcopy(p1, &p2->p_fd);             /* :551  only failing fd op  */
    if (error != 0) {
        error = ENOMEM;
        goto done;                             /* :554  no teardown of p2   */
    }
    ...
}
...
done:                                           /* :724 */
    if (p2)
        lwkt_reltoken(&p2->p_token);
    lwkt_reltoken(&p1->p_token);
    if (plkgrp) { lockmgr(...LK_RELEASE); pgrel(plkgrp); }
    return (error);                            /* :732  p2 leaked */

nprocs (incremented earlier in fork1) is only ever decremented at kern_exit.c:1337, and chgproccnt only at kern_exit.c:1280; both require a runnable/exiting lwp, which a SIDL orphan (no lwp, no parent linkage) never has. allproc scans skip SIDL procs (kern_proc.c), so nothing reclaims it.

Threat model & preconditions

  • Attacker position: any unprivileged local user.
  • Privileges gained or impact: permanent system-wide availability DoS. The attacker drives the kernel M_FILEDESC malloc type to its per-type limit (for example by growing its fd table with dup2 and forking repeatedly; plain mmap+touch or swap pressure does not consume this pool) so that the M_NULLOK kmalloc in fdcopy returns NULL, then keeps calling fork(). Each failure permanently consumes one maxproc slot and one per-uid proc slot and leaks struct proc/uidpcpu/ucred/sigacts plus vnode/namecache refs. Once nprocs == maxproc, fork()/vfork()/thread creation returns EAGAIN for all users (including root) until reboot. The damage survives the attacker's own process exit.
  • Required config or capabilities: none; default kernel. The trigger needs sustained pressure on the kernel M_FILEDESC malloc type.
  • Reachability: fork(2)/vfork(2) (both RFFDG) under malloc pressure.

Proof of concept

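PoC source: findings/poc/DF-0032/fork_leak.c (the original mmap-pressure PoC; verification found that it does not trigger the bug, and exhaust.c in the same folder is the working trigger)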

Build & run (unprivileged, disposable VM)

cc -o fork_leak findings/poc/DF-0032/fork_leak.c
./fork_leak

Expected output

Proc count climbs toward maxproc; once exhausted, fork EAGAIN for all users until reboot.

Impact

Permanent, system-wide fork/thread-creation exhaustion reachable by an unprivileged user under M_FILEDESC malloc-type pressure: it affects every user including root and persists across the attacker's own exit until reboot. Rated Medium (availability; the precondition is sustained pressure on that malloc type).

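Proposed fix (later superseded by the verified fix.diff, because without M_NULLOK exceeding the M_FILEDESC limit would panic at kern_slaballoc.c:877 instead of returning NULL): make fdcopy use M_WAITOK|M_ZERO (matching fdinit), eliminating the only failure mode fork1 is unprepared to clean up: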

--- a/sys/kern/kern_descrip.c
+++ b/sys/kern/kern_descrip.c
@@ -2481,6 +2481,2 @@
-   newfdp = kmalloc(sizeof(struct filedesc),
-            M_FILEDESC, M_WAITOK | M_ZERO | M_NULLOK);
-   if (newfdp == NULL) {
-       *fpp = NULL;
-       return (-1);
-   }
+   newfdp = kmalloc(sizeof(struct filedesc),
+            M_FILEDESC, M_WAITOK | M_ZERO);

Defense-in-depth: fork1's done: label should additionally gain a full teardown for a partially-built p2 (LIST_REMOVE from allproc, crfree, refcount_release on sigacts/p_args, vrele textvp, cache_drop textnch, reaper_drop, kfree uidpcpu, kfree p2, atomic_add_int(&nprocs,-1), chgproccnt(uid,-1,0)) so a future error path added after p2 allocation cannot reintroduce the same leak.


Timeline

  • 2026-06-29 Discovered during automated file-by-file audit of sys/kern/kern_fork.c.
  • pending Reported to DragonFlyBSD security contact.

PoC verification

Evidence pack

findings/poc/DF-0032 · 13 files

File             Type            Size    Description
exhaust.c        trigger-source  4.9 KB  working unprivileged trigger: fd-table amplification (dup2 to high fds) drives M_FILEDESC to its ks_limit, fdcopy's M_NULLOK kmalloc returns NULL, fork() returns ENOMEM, leaking nprocs/struct proc
exhaust_slow.c   trigger-source  1.4 KB  slow variant used for parallel kernel-state sampling that caught M_FILEDESC hitting ~176M at the failure moment
forktest.c       auxiliary       773 B   single-fork errno reporter
forktest_bomb.c  auxiliary       1.2 KB  root fork-bomb proving root fork capacity collapsed to ~272 (from ~3890) and root is fork-blocked
fork_leak.c      trigger-source  2.2 KB  original reviewer PoC (mmap pressure); does NOT trigger the bug, retained for provenance
build.sh         build-script    549 B   cc commands for all PoC binaries
run.sh           run-script      1.5 KB  runs exhaust and prints before/after malloc-type counts
run.log          run-log         7.3 KB  decisive untrimmed run output: leak trigger + multi-uid accumulation to 91% maxproc exhaustion + root fork-blocked, with interpretation
dmesg.txt        dmesg           1.3 KB  kernel 'maxproc limit exceeded by uid 0' (+ attacker uids) messages
env.txt          environment     1.4 KB  uname, cc version, sysctls, M_FILEDESC ks_limit derivation
VERDICT.md       verdict         8.2 KB  full narrative: mechanism, evidence, system-wide DoS, fix rationale
README.md        readme          3.2 KB  human-facing build/run/expected + caveats
fix.diff         suggested-fix   3.1 KB  git-apply-able (verified clean): full p2 teardown on the fdcopy-failure path + new proc_remove_allproc() helper in kern_proc.c
README.md (human-facing build/run/expected + caveats)

DF-0032: PoC

fdcopy() failure in fork1() permanently leaks the child struct proc, the system-wide nprocs counter, and the per-uid proc-count → local unprivileged system-wide fork-DoS (permanent until reboot). Severity Medium, CWE-401/772.

Status

REPRODUCED. See VERDICT.md for the full narrative and run.log for the decisive evidence.

The original fork_leak.c (mmap-pressure trigger) does not actually fire the bug: anonymous mmap consumes user VM, not the kernel M_FILEDESC malloc pool. exhaust.c is the working trigger (fd-table amplification).

The bug

fork1() charges nprocs++ (kern_fork.c:415) and chgproccnt++ (:421), kmallocs p2 (:444), gives it uidpcpu/ucred/sigacts/textvp refs, and calls proc_add_allproc(p2) (:491), all before fdcopy (:551). fdcopy is the only fd op that can fail (its struct filedesc kmalloc is M_WAITOK|M_ZERO|M_NULLOK at kern_descrip.c:2481-2486; fdinit/fdshare and the fd_files[] array are plain M_WAITOK). On failure fork1 does goto done (:552-554), and done: (:724-732) only releases tokens: no teardown of p2, no nprocs--, no chgproccnt--, no crfree/vrele. The SIDL orphan (no lwp and no parent; lwp_fork1 is at :674, after fdcopy) never reaches exit, so nprocs/chgproccnt are never decremented. Each failure permanently consumes one system-wide maxproc slot + one per-uid RLIMIT_NPROC slot.

When fdcopy actually fails

Its M_NULLOK kmalloc returns NULL when M_FILEDESC's per-type ks_limit is exceeded (kern_slaballoc.c:863-879). ks_limit = kmem_lim_size()/10 = min(physmem, KvaSize)/10 (~195 MB on a 2 GB guest). Each fork() on the RFFDG path charges M_FILEDESC for a copy of the parent's fd_files[] table, so growing the parent's fd table (via dup2 to high fds) amplifies the charge: ~260 children with ~15000-fd tables push M_FILEDESC to its limit, and the next fdcopy returns NULL → fork() returns ENOMEM → leak.

Build & run (unprivileged; disposable VM)

./build.sh        # cc -o exhaust exhaust.c  (and the other PoC binaries)
./run.sh          # runs exhaust, prints before/after malloc-type counts

Expected (bug present)

[!!!] ENOMEM from fork() -- fdcopy failure leak TRIGGERED at child 259
[*] summary: ok=259 eagain=51 enomem=705 other=0

After the run the proc (M_PROC) malloc-type Count is permanently elevated by ~705, while lwp and file_desc stay flat and ps ax|wc -l is unchanged (leaked SIDL orphans are invisible). Repeating across ~4–6 unprivileged uids exhausts maxproc and fork-DoSes the whole system (root included) until reboot.

Expected (bug fixed)

fork() can still return ENOMEM once M_FILEDESC reaches its limit (the fix keeps fdcopy's clean failure mode), but every failed fork is rolled back: proc Count returns to baseline after the run and no maxproc slots are lost.

CAUTION

Each ./exhaust run permanently consumes ~700 system-wide maxproc slots until reboot. The guest is left ~18 % fork-exhausted after a single run. Run vm.sh reset to clean up. Do not loop across many uids on a host you are not prepared to reboot: that is the full DoS.

VERDICT.md (full narrative: mechanism, evidence, system-wide DoS, fix rationale)

DF-0032: VERDICT

Verdict: REPRODUCED. Real, unprivileged, local, system-wide fork-DoS (permanent until reboot). The reviewer-written PoC's trigger was wrong (mmap pressure), but the underlying bug is real and exploitable; a corrected trigger reproduces it reliably.

The bug (confirmed line-by-line in sys/)

In fork1() the irreversible steps happen before the only failing fd op:

kern_fork.c  undone on fdcopy failure?  operation
:415         NO                         atomic_add_int(&nprocs, 1)
:421         NO                         chgproccnt(uid, 1, RLIMIT_NPROC) (per-uid)
:444         NO                         p2 = kmalloc(sizeof(struct proc), M_PROC, M_WAITOK|M_ZERO)
:475         NO                         p2->p_uidpcpu = kmalloc(..., M_SUBPROC, ...)
:491         NO                         proc_add_allproc(p2) (on allproc, SIDL)
:509         NO                         p2->p_ucred = crhold(...)
:521-529     NO                         sigacts (share-ref or kmalloc)
:536-542     NO                         p_textvp vref / p_textnch cache_copy
:551         n/a                        error = fdcopy(p1, &p2->p_fd);  ← only failing fd op
:552-554     n/a                        if (error) { error = ENOMEM; goto done; }
:724-732     n/a                        done: releases only p_token/p1_token/pglock

nprocs is decremented only at kern_exit.c:1337 and chgproccnt only at kern_exit.c:1280; both require a runnable/exiting lwp, which a SIDL orphan (no lwp and no parent linkage, since lwp_fork1 is at :674, after fdcopy) never has. allproc scans skip SIDL procs, so ps/procstat never show them.

fdcopy is the only fd op that can fail because it is the only one whose struct filedesc kmalloc uses M_NULLOK:

sys/kern/kern_descrip.c:2481   newfdp = kmalloc(sizeof(struct filedesc),
sys/kern/kern_descrip.c:2482                    M_FILEDESC, M_WAITOK|M_ZERO|M_NULLOK);
sys/kern/kern_descrip.c:2483   if (newfdp == NULL) { *fpp = NULL; return (-1); }

fdinit (:2408) and fdshare use plain M_WAITOK (they never return NULL), and fdcopy's own fd_files[] array (:2504) is also M_WAITOK without M_NULLOK, so that allocation would panic on limit exhaustion rather than return NULL. The single clean failure mode is the M_NULLOK newfdp.

When does that M_NULLOK kmalloc actually return NULL?

kmalloc returns NULL with M_NULLOK when the per-type ks_limit is exceeded (kern_slaballoc.c:863-879):

ks_limit = kmem_lim_size() * 1MB / 10          (kern_slaballoc.c:371-372)
kmem_lim_size() = min(physmem, KvaSize)/1MB    (kern_slaballoc.c:255-263)

On this 2 GB guest: ks_limit(M_FILEDESC) = ~195 MB.

The original PoC tried to induce this with mmap+touch. That does not work: anonymous mmap consumes user VM and physical pages, not the kernel M_FILEDESC malloc pool. Live measurement showed M_FILEDESC essentially unchanged (27 KB → 31 KB) across an mmap-pressure run. So the original PoC never triggers the bug; it only causes userland OOM.

The working trigger (fd-table amplification)

Each successful fork() on the RFFDG path calls fdcopy, which allocates a copy of the parent's fd_files[] table (kern_descrip.c:2504) under M_FILEDESC. A process that has grown its fd table large (via dup2 to high fds) therefore forces every child's fdcopy to charge M_FILEDESC for a large (hundreds of KB) fd_files array. With a ~15000-entry fd table, each child costs ~700 KB of M_FILEDESC; ~260 such children push M_FILEDESC to its ~195 MB limit. At that point the next fdcopy's M_NULLOK newfdp kmalloc returns NULL → fdcopy returns -1 → fork1 does goto done → leak.

exhaust.c does exactly this and is fully unprivileged.

Evidence (all in this folder)

run.log is the decisive record. Highlights:

$ ./exhaust
[*] grew fd table to fd=14976 (fd_files[] ~234KB per fdcopy)
[!!!] ENOMEM from fork() -- fdcopy failure leak TRIGGERED at child 259
[*] summary: ok=259 eagain=51 enomem=705 other=0

Parallel root sampling during the slow variant caught the failure moment:

[t=57] file_desc=18.0M   proc=51
[t=59] file_desc=176M    proc=261     <- M_FILEDESC hit its ~195M ks_limit

The leak is confirmed by the kernel malloc-type counters (the leaked structs are never freed, so they persist):

type               baseline  after one run  meaning
proc (M_PROC)      25        744            +719: struct proc permanently leaked
subproc (uidpcpu)  48        1450           +1402: p_uidpcpu leaked
lwp                34        34             unchanged → leak is before lwp_fork1 (:674)
file_desc          28        28             unchanged → newfdp returned NULL, no filedesc made
ps ax | wc -l      146       146            leaked SIDL orphans are invisible to ps

lwp/file_desc being flat is the fingerprint that pins the leak to the exact point the code trace predicts: fdcopy failure (:551) after p2 was put on allproc (:491) but before lwp_fork1 (:674). If the leak were anywhere else, one of those counters would move.

System-wide impact (DoS demonstrated)

nprocs is a global counter; the leaked slots reduce fork capacity for every user, including root. Because the per-uid chgproccnt is also leaked (never decremented), one unprivileged uid can permanently burn ~700–1000 system-wide maxproc slots before self-capping at its own RLIMIT_NPROC. About 4–6 unprivileged uids exhaust all of maxproc=4036.

Multi-uid staged attack (proc Count ≈ global nprocs; ps ax frozen at 146):

baseline → 25   maxx → 732   u1002 → 1.41K   u1003 → 2.10K   u1004 → 2.79K
u1005 → 3.48K   u1006 → 3.68K (+203; system nprocs check now pre-blocks fdcopy)

Then, with nprocs permanently ~3680/4036, a root fork-bomb:

$ /root/forktest_bomb
forktest_bomb: root fork() EAGAIN after 272 children (errno=35 Resource temporarily unavailable)
forktest_bomb: ROOT RESULT ok=272 eagain=3  (clean system would allow ~4036)

Root can fork only ~272 children, versus ~3890 on a clean system (presumably maxproc=4036 minus the ~146 processes already running), a ~93 % collapse, and root itself is fork-blocked. dmesg corroborates with maxproc limit exceeded by uid 0. The leaked slots are permanent (they do not recover after the attackers exit); only a reboot clears them.

Caveats / precision

  • The original PoC (fork_leak.c) is not a valid trigger (mmap โ‰  M_FILEDESC). It is retained for provenance; exhaust.c is the working trigger.
  • Full maxproc exhaustion needs ~4–6 unprivileged uids (a single user is capped at ~1009 leaked slots by its own leaked per-uid chgproccnt). On any multi-user system (or for any user able to raise RLIMIT_NPROC or run from several accounts), a full system-wide fork-DoS is straightforward. Even a single user permanently destroys ~18–25 % of system fork capacity and permanently fork-blocks their own uid.
  • No kernel panic occurred at any point; the failure is a clean ENOMEM leak, exactly the path cited.

Files in this folder

file                         purpose
exhaust.c / exhaust          working trigger: fd-table amplification → fdcopy failure → leak
exhaust_slow.c               slow variant for parallel kernel-state sampling
forktest.c, forktest_bomb.c  prove root fork capacity collapses / root fork-blocked
fork_leak.c                  original reviewer PoC (mmap pressure; does not trigger)
build.sh, run.sh             exact build / run commands
run.log                      decisive untrimmed run output + interpretation
dmesg.txt                    kernel maxproc limit exceeded messages (incl. uid 0)
env.txt                      guest uname, sysctls, ks_limit derivation
fix.diff                     git-apply-able fix (verified git apply --check clean)
manifest.json                machine-readable artifact catalog

Fix

fix.diff adds a full teardown of the partially-built p2 on the fdcopy-failure path (reversing every acquisition made before fdcopy, from the textvp/textnch refs at :536-542 back through :415/:421), plus a new symmetric proc_remove_allproc() helper in kern_proc.c (the inverse of proc_add_allproc()). This supersedes the finding markdown's primary proposal (drop M_NULLOK from fdcopy): dropping M_NULLOK would convert the leak into a panic("malloc limit exceeded") at kern_slaballoc.c:877, which is worse for availability. The teardown keeps fdcopy's clean ENOMEM failure mode and makes fork1 clean up correctly after it, fixing the root cause and adding defense-in-depth for any future error path after p2 allocation.


Detail

Exploit chain

Unprivileged weaponization (not memory corruption, so no privesc chain -- pure availability DoS): (1) attacker grows its own fd table to ~15000 entries via dup2 to high fds; (2) loops fork() (default fork()==RFFDG) -- each child's fdcopy copies the parent's large fd_files[] array under M_FILEDESC, pushing the type to its ~195MB ks_limit in ~260 children; (3) subsequent forks hit fdcopy's M_NULLOK-kmalloc-returns-NULL, fork1 leaks one struct proc + nprocs++ + per-uid chgproccnt++ each time, returning ENOMEM; (4) repeat until the uid's own RLIMIT_NPROC self-caps (~700-1000 leaked system slots per uid); (5) run from ~4-6 unprivileged uids to push global nprocs to maxproc, after which NO process (including root) can fork/vfork/create threads until reboot. The leak is permanent (leaked SIDL procs never exit) and survives the attackers' exit. Demonstrated end-to-end; root fork-bomb could create only 272 children vs ~3890 on a clean system.

Evidence (decisive lines)

findings/poc/DF-0032/run.log (decisive), dmesg.txt, env.txt, VERDICT.md, manifest.json, fix.diff. Decisive lines: '[!!!] ENOMEM from fork() -- fdcopy failure leak TRIGGERED at child 259 / summary: ok=259 eagain=51 enomem=705'; parallel sample '[t=59] file_desc=176M proc=261' (M_FILEDESC hit ks_limit); leak fingerprint 'proc 25->744, lwp 34->34, file_desc 28->28, ps 146->146'; multi-uid accumulation proc 25->732->1410->2100->2790->3480->3680; root fork-bomb 'forktest_bomb: root fork() EAGAIN after 272 children ... (clean system would allow ~4036)'; dmesg 'maxproc limit exceeded by uid 0'.

PoC changes

Original fork_leak.c (mmap-pressure trigger) does NOT fire the bug -- retained for provenance. Added exhaust.c (working unprivileged trigger: dup2-grown fd table + fork RFFDG -> fdcopy ENOMEM leak), exhaust_slow.c (slow variant for parallel kernel-state sampling), forktest.c/forktest_bomb.c (prove root fork capacity collapse / root fork-blocked), VERDICT.md, run.log, dmesg.txt, env.txt, build.sh, run.sh, manifest.json, and fix.diff.

Verified recommended fix

fix.diff (verified git apply --check clean) adds a full teardown of the partially-built p2 on fork1()'s fdcopy-failure path -- stopprofclock, cache_drop textnch, vrele textvp, sigacts/args refcount_release+free, crfree ucred, proc_remove_allproc(p2), reaper_drop, free uidpcpu+p2, nprocs--, chgproccnt-- -- plus a new symmetric proc_remove_allproc() helper in kern_proc.c. This SUPERSEDES the finding markdown's primary proposal (drop M_NULLOK from fdcopy): dropping M_NULLOK would convert the leak into panic('malloc limit exceeded') at kern_slaballoc.c:877, which is worse for availability; the teardown preserves fdcopy's clean ENOMEM failure mode and fixes the actual missing-cleanup root cause with defense-in-depth for any future post-allproc error path.

Verdict

REPRODUCED. fork1()'s RFFDG path bumps nprocs (kern_fork.c:415), chgproccnt (:421), kmallocs p2 (:444), gives it uidpcpu/ucred/sigacts/textvp refs, and calls proc_add_allproc(p2) (:491) all BEFORE fdcopy (:551); fdcopy is the only fd op that can fail (its struct-filedesc kmalloc is M_WAITOK|M_ZERO|M_NULLOK at kern_descrip.c:2481), and on failure fork1 does goto done (:552-554) whose done: (:724-732) only releases tokens -- no p2 teardown, no nprocs--, no chgproccnt--, no crfree/vrele. The SIDL orphan (no lwp; lwp_fork1 is at :674, after fdcopy) never reaches exit, so the counters are never decremented. The original PoC's mmap trigger is wrong (mmap consumes user VM, not the M_FILEDESC pool), but the bug is real: a corrected unprivileged trigger (grow the fd table via dup2, then fork RFFDG so each child's fdcopy charges M_FILEDESC for a large fd_files[] copy) drives M_FILEDESC to its ~195MB per-type ks_limit (kern_slaballoc.c:863-879), fdcopy's M_NULLOK kmalloc returns NULL, and fork() returns ENOMEM ~700x, permanently leaking ~700 struct proc + system maxproc slots per run. Confirmed empirically: M_PROC Count 25->744 (leaked), M_LWP and M_FILEDESC flat (pins the leak to exactly fdcopy failure), ps ax unchanged (SIDL orphans invisible); multi-uid accumulation to ~3680/4036 nprocs collapsed root's fork capacity to ~272 children and fork-blocked root (dmesg 'maxproc limit exceeded by uid 0').