fdcopy() failure in fork1() permanently leaks the child proc, nprocs, and the per-uid proc-count (system-wide fork DoS)
| Field | Value |
|---|---|
| ID | DF-0032 |
| Status | new |
| Severity | Medium |
| CVSS 3.1 | CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:N/I:N/A:H |
| CWE | CWE-401 Missing Release of Memory after Effective Lifetime; CWE-772 |
| File | sys/kern/kern_fork.c |
| Lines | 491, 551-555, 724-732 |
| Area | kern |
| Confidence | likely |
| Discovered | 2026-06-29 |
| Reported | pending |
Summary
When fdcopy() returns failure on the RFFDG path, fork1() branches to
done:, which only releases tokens. But p2 was already kmalloc'd, placed on
allproc in SIDL state (:491), given held references (reaper, uidpcpu,
ucred, p_args, sigacts, textvp, textnch), and the global nprocs and
per-uid chgproccnt were already charged. None of this is undone.
fdcopy is the only fd-duplication primitive that can fail (it uses
M_NULLOK at kern_descrip.c:2481-2486, unlike fdinit/fdshare). Default
fork()/vfork() use RFFDG, so any unprivileged user can drive this under
kernel malloc pressure: each failed fork() permanently consumes one
system-wide maxproc slot and one per-uid RLIMIT_NPROC slot, and once
nprocs == maxproc no process on the system (including root) can fork,
vfork, or create threads, and this persists until reboot.
Root cause
    proc_add_allproc(p2);                  /* :491 on allproc in SIDL */
    ...
    if (flags & RFFDG) {
        error = fdcopy(p1, &p2->p_fd);     /* :551 only failing fd op */
        if (error != 0) {
            error = ENOMEM;
            goto done;                     /* :554 no teardown of p2 */
        }
        ...
    }
    ...
    done:                                  /* :724 */
        if (p2)
            lwkt_reltoken(&p2->p_token);
        lwkt_reltoken(&p1->p_token);
        if (plkgrp) { lockmgr(...LK_RELEASE); pgrel(plkgrp); }
        return (error);                    /* :732 p2 leaked */
nprocs (incremented earlier in fork1) is only ever decremented at
kern_exit.c:1337, and chgproccnt only at kern_exit.c:1280; both require
a runnable/exiting lwp, which a SIDL orphan (no lwp, no parent linkage) never
has. allproc scans skip SIDL procs (kern_proc.c), so nothing reclaims it.
Threat model & preconditions
- Attacker position: any unprivileged local user.
- Privileges gained or impact: permanent system-wide availability DoS. The attacker induces kernel malloc pressure (large mmap+touch, swap exhaustion) so that the M_NULLOK kmalloc in fdcopy returns NULL, then loops fork(): each failure permanently consumes one maxproc slot + one per-uid proc slot and leaks struct proc/uidpcpu/ucred/sigacts and vnode/namecache refs. Once nprocs == maxproc, fork()/vfork()/thread-creation returns EAGAIN for all users (including root) until reboot. The effect survives the attacker's own process exit.
- Required config or capabilities: none; default kernel. The trigger needs sustained memory pressure.
- Reachability: fork(2)/vfork(2) (both RFFDG) under malloc pressure.
Proof of concept
PoC source: findings/poc/DF-0032/fork_leak.c
Build & run (unprivileged, disposable VM)
    cc -o fork_leak findings/poc/DF-0032/fork_leak.c
    ./fork_leak
Expected output
Proc count climbs toward maxproc; once it is exhausted, fork() returns
EAGAIN for all users until reboot.
Impact
Permanent, system-wide fork/thread-creation exhaustion reachable by an unprivileged user under memory pressure. It affects every user including root and persists across the attacker's own exit until reboot. Rated Medium (availability; the precondition is sustained memory pressure).
Recommended fix
Make fdcopy use M_WAITOK|M_ZERO (matching fdinit), eliminating the only
failure mode fork1 is unprepared to clean up:
--- a/sys/kern/kern_descrip.c
+++ b/sys/kern/kern_descrip.c
@@ -2481
- newfdp = kmalloc(sizeof(struct filedesc),
- M_FILEDESC, M_WAITOK | M_ZERO | M_NULLOK);
- if (newfdp == NULL) {
- *fpp = NULL;
- return (-1);
- }
+ newfdp = kmalloc(sizeof(struct filedesc),
+ M_FILEDESC, M_WAITOK | M_ZERO);
Defense-in-depth: fork1's done: label should additionally gain a full
teardown for a partially-built p2 (LIST_REMOVE from allproc, crfree,
refcount_release on sigacts/p_args, vrele textvp, cache_drop textnch,
reaper_drop, kfree uidpcpu, kfree p2, atomic_add_int(&nprocs,-1),
chgproccnt(uid,-1,0)) so a future error path added after p2 allocation
cannot reintroduce the same leak.
References
- sys/kern/kern_fork.c:491,551-555,724-732: leak path.
- sys/kern/kern_descrip.c:2481-2486: fdcopy M_NULLOK (only failing fd op).
- sys/kern/kern_descrip.c:2408: fdinit uses M_WAITOK|M_ZERO (pattern to match).
- sys/kern/kern_exit.c:1280,1337: the only nprocs/chgproccnt decrements.
- CWE-401; CWE-772.
Timeline
- 2026-06-29 Discovered during automated file-by-file audit of sys/kern/kern_fork.c.
- pending Reported to DragonFlyBSD security contact.
PoC verification
Evidence pack
findings/poc/DF-0032 · 13 files

| File | Type | Description | Size |
|---|---|---|---|
| exhaust.c | trigger-source | working unprivileged trigger: fd-table amplification (dup2 to high fds) drives M_FILEDESC to its ks_limit, fdcopy's M_NULLOK kmalloc returns NULL, fork() returns ENOMEM, leaking nprocs/struct-proc | 4.9 KB |
| exhaust_slow.c | trigger-source | slow variant used for parallel kernel-state sampling that caught M_FILEDESC hitting ~176M at the failure moment | 1.4 KB |
| forktest.c | auxiliary | single-fork errno reporter | 773 B |
| forktest_bomb.c | auxiliary | root fork-bomb proving root fork capacity collapsed to ~272 (from ~3890) and root is fork-blocked | 1.2 KB |
| fork_leak.c | trigger-source | original reviewer PoC (mmap pressure); does NOT trigger the bug, retained for provenance | 2.2 KB |
| build.sh | build-script | cc commands for all PoC binaries | 549 B |
| run.sh | run-script | runs exhaust and prints before/after malloc-type counts | 1.5 KB |
| run.log | run-log | decisive untrimmed run output: leak trigger + multi-uid accumulation to 91% maxproc exhaustion + root fork-blocked, with interpretation | 7.3 KB |
| dmesg.txt | dmesg | kernel 'maxproc limit exceeded by uid 0' (+ attacker uids) messages | 1.3 KB |
| env.txt | environment | uname, cc version, sysctls, M_FILEDESC ks_limit derivation | 1.4 KB |
| VERDICT.md | verdict | full narrative: mechanism, evidence, system-wide DoS, fix rationale | 8.2 KB |
| README.md | readme | human-facing build/run/expected + caveats | 3.2 KB |
| fix.diff | suggested-fix | git-apply-able (verified clean): full p2 teardown on the fdcopy-failure path + new proc_remove_allproc() helper in kern_proc.c | 3.1 KB |
DF-0032: PoC
fdcopy() failure in fork1() permanently leaks the child struct proc, the
system-wide nprocs counter, and the per-uid proc-count: a local unprivileged
system-wide fork-DoS (permanent until reboot). Severity Medium, CWE-401/772.
Status
REPRODUCED. See VERDICT.md for the full narrative and run.log for the
decisive evidence.
The original fork_leak.c (mmap-pressure trigger) does not actually fire the bug: anonymous mmap consumes user VM, not the kernel M_FILEDESC malloc pool. exhaust.c is the working trigger (fd-table amplification).
The bug
fork1() charges nprocs++ (kern_fork.c:415) and chgproccnt++ (:421),
kmalloc's p2 (:444), gives it uidpcpu/ucred/sigacts/textvp refs,
and calls proc_add_allproc(p2) (:491), all before fdcopy (:551). fdcopy
is the only fd op that can fail (its struct filedesc kmalloc is
M_WAITOK|M_ZERO|M_NULLOK at kern_descrip.c:2481-2486; fdinit/fdshare
and the fd_files[] array are plain M_WAITOK). On failure fork1 does
goto done (:552-554) and done: (:724-732) only releases tokens: no
teardown of p2, no nprocs--, no chgproccnt--, no crfree/vrele. The
SIDL orphan (no lwp, no parent; lwp_fork1 is at :674, after fdcopy) never
reaches exit, so nprocs/chgproccnt are never decremented. Each failure
permanently consumes one system-wide maxproc slot + one per-uid
RLIMIT_NPROC slot.
When fdcopy actually fails
Its M_NULLOK kmalloc returns NULL when M_FILEDESC's per-type ks_limit is
exceeded (kern_slaballoc.c:863-879). ks_limit = kmem_lim_size()/10 =
min(physmem, KvaSize)/10 (~195 MB on a 2 GB guest). Each fork() on the
RFFDG path charges M_FILEDESC for a copy of the parent's fd_files[] table,
so growing the parent's fd table (via dup2 to high fds) amplifies the
charge: ~260 children with ~15000-fd tables push M_FILEDESC to its limit, and
the next fdcopy returns NULL -> fork() returns ENOMEM -> leak.
Build & run (unprivileged; disposable VM)
    ./build.sh   # cc -o exhaust exhaust.c (and the other PoC binaries)
    ./run.sh     # runs exhaust, prints before/after malloc-type counts
Expected (bug present)
    [!!!] ENOMEM from fork() -- fdcopy failure leak TRIGGERED at child 259
    [*] summary: ok=259 eagain=51 enomem=705 other=0
After the run the proc (M_PROC) malloc-type Count is permanently elevated by
~705, while lwp and file_desc stay flat and ps ax|wc -l is unchanged
(leaked SIDL orphans are invisible). Repeating across ~4-6 unprivileged uids
exhausts maxproc and fork-DoSes the whole system (root included) until reboot.
Expected (bug fixed)
fork() no longer returns ENOMEM (the per-type limit is never reached because
the failed allocations are rolled back, and/or M_FILEDESC no longer climbs
because the leak is gone); proc Count returns to baseline after the run.
CAUTION
Each ./exhaust run permanently consumes ~700 system-wide maxproc slots
until reboot. The guest is left ~18 % fork-exhausted after a single run. Run
vm.sh reset to clean up. Do not loop across many uids on a host you are
not prepared to reboot: that is the full DoS.
DF-0032: VERDICT
Verdict: REPRODUCED. Real, unprivileged, local, system-wide fork-DoS (permanent until reboot). The reviewer-written PoC's trigger was wrong (mmap pressure), but the underlying bug is real and exploitable; a corrected trigger reproduces it reliably.
The bug (confirmed line-by-line in sys/)
In fork1() the irreversible steps happen before the only failing fd op:
| kern_fork.c | operation | undone on fdcopy failure? |
|---|---|---|
| :415 | atomic_add_int(&nprocs, 1) | NO |
| :421 | chgproccnt(uid, 1, RLIMIT_NPROC) (per-uid) | NO |
| :444 | p2 = kmalloc(sizeof(struct proc), M_PROC, M_WAITOK\|M_ZERO) | NO |
| :475 | p2->p_uidpcpu = kmalloc(..., M_SUBPROC, ...) | NO |
| :491 | proc_add_allproc(p2) (on allproc, SIDL) | NO |
| :509 | p2->p_ucred = crhold(...) | NO |
| :521-529 | sigacts (share-ref or kmalloc) | NO |
| :536-542 | p_textvp vref / p_textnch cache_copy | NO |
| :551 | error = fdcopy(p1, &p2->p_fd); <- only failing fd op | - |
| :552-554 | if (error) { error = ENOMEM; goto done; } | - |
| :724-732 | done: releases only p_token/p1_token/pglock | - |
nprocs is decremented only at kern_exit.c:1337 and chgproccnt only at
kern_exit.c:1280; both require a runnable/exiting lwp, which a SIDL orphan
(no lwp, no parent linkage; lwp_fork1 is at :674, after fdcopy) never
has. allproc scans skip SIDL procs, so ps/procstat never show them.
fdcopy is the only fd op that can fail because it is the only one whose
struct filedesc kmalloc uses M_NULLOK:
    sys/kern/kern_descrip.c:2481  newfdp = kmalloc(sizeof(struct filedesc),
    sys/kern/kern_descrip.c:2482      M_FILEDESC, M_WAITOK|M_ZERO|M_NULLOK);
    sys/kern/kern_descrip.c:2483  if (newfdp == NULL) { *fpp = NULL; return (-1); }
fdinit (:2408) and fdshare use plain M_WAITOK (cannot fail), and
fdcopy's own fd_files[] array (:2504) is M_WAITOK (no M_NULLOK), so it
would panic on limit exhaustion rather than return NULL. The single clean
failure mode is the M_NULLOK newfdp.
When does that M_NULLOK kmalloc actually return NULL?
kmalloc returns NULL with M_NULLOK when the per-type ks_limit is
exceeded (kern_slaballoc.c:863-879):
    ks_limit        = kmem_lim_size() * 1MB / 10     (kern_slaballoc.c:371-372)
    kmem_lim_size() = min(physmem, KvaSize)/1MB      (kern_slaballoc.c:255-263)
On this 2 GB guest: ks_limit(M_FILEDESC) = ~195 MB.
The original PoC tried to induce this with mmap+touch. That does not work:
anonymous mmap consumes user VM and physical pages, not the kernel
M_FILEDESC malloc pool. Live measurement showed M_FILEDESC unchanged
(27 KB -> 31 KB) across an mmap-pressure run. So the original PoC never triggers
the bug; it only causes userland OOM.
The working trigger (fd-table amplification)
Each successful fork() on the RFFDG path calls fdcopy, which allocates a
copy of the parent's fd_files[] table (kern_descrip.c:2504) under
M_FILEDESC. A process that has grown its fd table large (via dup2 to high
fds) therefore forces every child's fdcopy to charge M_FILEDESC for a large
(~hundreds-of-KB) fd_files array. With a ~15000-entry fd table, each child
costs ~700 KB of M_FILEDESC; ~260 such children push M_FILEDESC to its
~195 MB limit. At that point the next fdcopy's M_NULLOK newfdp kmalloc
returns NULL -> fdcopy returns -1 -> fork1 does goto done -> leak.
exhaust.c does exactly this and is fully unprivileged.
Evidence (all in this folder)
run.log is the decisive record. Highlights:
    $ ./exhaust
    [*] grew fd table to fd=14976 (fd_files[] ~234KB per fdcopy)
    [!!!] ENOMEM from fork() -- fdcopy failure leak TRIGGERED at child 259
    [*] summary: ok=259 eagain=51 enomem=705 other=0
Parallel root sampling during the slow variant caught the failure moment:
    [t=57] file_desc=18.0M proc=51
    [t=59] file_desc=176M proc=261   <- M_FILEDESC hit its ~195M ks_limit
The leak is confirmed by the kernel malloc-type counters (the leaked structs are never freed, so they persist):
| type | baseline | after one run | meaning |
|---|---|---|---|
| proc (M_PROC) | 25 | 744 | +719 struct proc permanently leaked |
| subproc (uidpcpu) | 48 | 1450 | +1402 p_uidpcpu leaked |
| lwp | 34 | 34 | unchanged: leak is before lwp_fork1 (:674) |
| file_desc | 28 | 28 | unchanged: newfdp returned NULL, no filedesc made |
| ps ax \| wc -l | 146 | 146 | leaked SIDL orphans are invisible to ps |
lwp/file_desc being flat is the fingerprint that pins the leak to the
exact point the code trace predicts: fdcopy failure (:551) after p2
was put on allproc (:491) but before lwp_fork1 (:674). If the leak
were anywhere else, one of those counters would move.
System-wide impact (DoS demonstrated)
nprocs is a global counter; the leaked slots reduce fork capacity for
every user, including root. Because the per-uid chgproccnt is also leaked
(never decremented), one unprivileged uid can permanently burn ~700-1000
system-wide maxproc slots before self-capping at its own RLIMIT_NPROC.
~4-6 unprivileged uids exhaust all of maxproc=4036.
Multi-uid staged attack (proc Count ≈ global nprocs; ps ax frozen at 146):

    baseline -> 25
    maxx     -> 732
    u1002    -> 1.41K
    u1003    -> 2.10K
    u1004    -> 2.79K
    u1005    -> 3.48K
    u1006    -> 3.68K (+203; system nprocs check now pre-blocks fdcopy)
Then, with nprocs permanently ~3680/4036, a root fork-bomb:
    $ /root/forktest_bomb
    forktest_bomb: root fork() EAGAIN after 272 children (errno=35 Resource temporarily unavailable)
    forktest_bomb: ROOT RESULT ok=272 eagain=3 (clean system would allow ~4036)
Root can fork only ~272 children (vs ~3890 on a clean system), a ~93 %
collapse, and is itself fork-blocked. dmesg corroborates with
maxproc limit exceeded by uid 0. The leaked slots are permanent (they do
not recover after the attackers exit); only a reboot clears them.
Caveats / precision
- The original PoC (fork_leak.c) is not a valid trigger (mmap != M_FILEDESC). It is retained for provenance; exhaust.c is the working trigger.
- Full maxproc exhaustion needs ~4-6 unprivileged uids (a single user is capped at ~1009 leaked slots by its own leaked per-uid chgproccnt). On any multi-user system (or for any user able to raise RLIMIT_NPROC / run from several accounts) full system-wide fork-DoS is straightforward. Even a single user permanently destroys ~18-25 % of system fork capacity and permanently fork-blocks their own uid.
- No kernel panic occurred at any point; the failure is a clean ENOMEM leak, exactly the path cited.
Files in this folder
| file | purpose |
|---|---|
| exhaust.c / exhaust | working trigger: fd-table amplification -> fdcopy failure -> leak |
| exhaust_slow.c | slow variant for parallel kernel-state sampling |
| forktest.c, forktest_bomb.c | prove root fork capacity collapses / root fork-blocked |
| fork_leak.c | original reviewer PoC (mmap pressure; does not trigger) |
| build.sh, run.sh | exact build / run commands |
| run.log | decisive untrimmed run output + interpretation |
| dmesg.txt | kernel maxproc limit exceeded messages (incl. uid 0) |
| env.txt | guest uname, sysctls, ks_limit derivation |
| fix.diff | git-apply-able fix (verified git apply --check clean) |
| manifest.json | machine-readable artifact catalog |
Fix
fix.diff adds a full teardown of the partially-built p2 on the fdcopy-failure
path (reversing every acquisition from :491 back through :415/:421), plus a
new symmetric proc_remove_allproc() helper in kern_proc.c (the inverse of
proc_add_allproc()). This supersedes the finding markdown's primary proposal
(drop M_NULLOK from fdcopy): dropping M_NULLOK would convert the leak into a
panic("malloc limit exceeded") at kern_slaballoc.c:877 (worse for
availability). The teardown keeps fdcopy's clean ENOMEM failure mode and
makes fork1 correctly clean up after it, fixing the root cause and adding
defense-in-depth for any future error path after p2 allocation.
Confirmed kernel references
- sys/kern/kern_fork.c:415
- sys/kern/kern_fork.c:421
- sys/kern/kern_fork.c:444
- sys/kern/kern_fork.c:475
- sys/kern/kern_fork.c:491
- sys/kern/kern_fork.c:551
- sys/kern/kern_fork.c:552
- sys/kern/kern_fork.c:554
- sys/kern/kern_fork.c:674
- sys/kern/kern_fork.c:724
- sys/kern/kern_fork.c:732
- sys/kern/kern_descrip.c:2481
- sys/kern/kern_descrip.c:2486
- sys/kern/kern_descrip.c:2504
- sys/kern/kern_slaballoc.c:863
- sys/kern/kern_slaballoc.c:873
- sys/kern/kern_exit.c:1280
- sys/kern/kern_exit.c:1337
Detail
Exploit chain
Unprivileged weaponization (not memory corruption, so no privesc chain; pure availability DoS):
1. The attacker grows its own fd table to ~15000 entries via dup2 to high fds.
2. It loops fork() (default fork() == RFFDG). Each child's fdcopy copies the parent's large fd_files[] array under M_FILEDESC, pushing the type to its ~195MB ks_limit in ~260 children.
3. Subsequent forks hit fdcopy's M_NULLOK kmalloc returning NULL; fork1 leaks one struct proc + nprocs++ + per-uid chgproccnt++ each time, returning ENOMEM.
4. This repeats until the uid's own RLIMIT_NPROC self-caps (~700-1000 leaked system slots per uid).
5. Running from ~4-6 unprivileged uids pushes global nprocs to maxproc, after which NO process (including root) can fork/vfork/create threads until reboot.

The leak is permanent (leaked SIDL procs never exit) and survives the attackers' exit. Demonstrated end-to-end; root fork-bomb could create only 272 children vs ~3890 on a clean system.
Evidence (decisive lines)
findings/poc/DF-0032/run.log (decisive), dmesg.txt, env.txt, VERDICT.md, manifest.json, fix.diff. Decisive lines: '[!!!] ENOMEM from fork() -- fdcopy failure leak TRIGGERED at child 259 / summary: ok=259 eagain=51 enomem=705'; parallel sample '[t=59] file_desc=176M proc=261' (M_FILEDESC hit ks_limit); leak fingerprint 'proc 25->744, lwp 34->34, file_desc 28->28, ps 146->146'; multi-uid accumulation proc 25->732->1410->2100->2790->3480->3680; root fork-bomb 'forktest_bomb: root fork() EAGAIN after 272 children ... (clean system would allow ~4036)'; dmesg 'maxproc limit exceeded by uid 0'.
PoC changes
Original fork_leak.c (mmap-pressure trigger) does NOT fire the bug -- retained for provenance. Added exhaust.c (working unprivileged trigger: dup2-grown fd table + fork RFFDG -> fdcopy ENOMEM leak), exhaust_slow.c (slow variant for parallel kernel-state sampling), forktest.c/forktest_bomb.c (prove root fork capacity collapse / root fork-blocked), VERDICT.md, run.log, dmesg.txt, env.txt, build.sh, run.sh, manifest.json, and fix.diff.
Verified recommended fix
fix.diff (verified git apply --check clean) adds a full teardown of the partially-built p2 on fork1()'s fdcopy-failure path -- stopprofclock, cache_drop textnch, vrele textvp, sigacts/args refcount_release+free, crfree ucred, proc_remove_allproc(p2), reaper_drop, free uidpcpu+p2, nprocs--, chgproccnt-- -- plus a new symmetric proc_remove_allproc() helper in kern_proc.c. This SUPERSEDES the finding markdown's primary proposal (drop M_NULLOK from fdcopy): dropping M_NULLOK would convert the leak into panic('malloc limit exceeded') at kern_slaballoc.c:877, which is worse for availability; the teardown preserves fdcopy's clean ENOMEM failure mode and fixes the actual missing-cleanup root cause with defense-in-depth for any future post-allproc error path.
Verdict
REPRODUCED. fork1()'s RFFDG path bumps nprocs (kern_fork.c:415), chgproccnt (:421), kmallocs p2 (:444), gives it uidpcpu/ucred/sigacts/textvp refs, and calls proc_add_allproc(p2) (:491) all BEFORE fdcopy (:551); fdcopy is the only fd op that can fail (its struct-filedesc kmalloc is M_WAITOK|M_ZERO|M_NULLOK at kern_descrip.c:2481), and on failure fork1 does goto done (:552-554) whose done: (:724-732) only releases tokens -- no p2 teardown, no nprocs--, no chgproccnt--, no crfree/vrele. The SIDL orphan (no lwp; lwp_fork1 is at :674, after fdcopy) never reaches exit, so the counters are never decremented. The original PoC's mmap trigger is wrong (mmap consumes user VM, not the M_FILEDESC pool), but the bug is real: a corrected unprivileged trigger (grow the fd table via dup2, then fork RFFDG so each child's fdcopy charges M_FILEDESC for a large fd_files[] copy) drives M_FILEDESC to its ~195MB per-type ks_limit (kern_slaballoc.c:863-879), fdcopy's M_NULLOK kmalloc returns NULL, and fork() returns ENOMEM ~700x, permanently leaking ~700 struct proc + system maxproc slots per run. Confirmed empirically: M_PROC Count 25->744 (leaked), M_LWP and M_FILEDESC flat (pins the leak to exactly fdcopy failure), ps ax unchanged (SIDL orphans invisible); multi-uid accumulation to ~3680/4036 nprocs collapsed root's fork capacity to ~272 children and fork-blocked root (dmesg 'maxproc limit exceeded by uid 0').