DragonFlyBSD Kernel Audit
DF-0079

Unprivileged local DoS via u_int truncation of iov_len in /dev/null and /dev/zero write (infinite kernel loop)

Field Value
ID DF-0079
Status new
Severity Medium
CVSS 3.1 CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:N/I:N/A:H
CWE CWE-835 Loop with Unreachable Exit Condition; CWE-197 Integer Truncation
File sys/kern/kern_memio.c
Lines 292-382
Area kern (/dev/mem, /dev/null, /dev/zero drivers)
Confidence certain
Discovered 2026-06-30
Reported pending

Summary

In mmrw(), the per-iteration byte count c is declared u_int (32-bit) (kern_memio.c:225), while iov->iov_len is size_t (64-bit on amd64). The /dev/null write path assigns c = iov->iov_len; directly with no clamp (:298); /dev/zero does the same (:364).

When the caller supplies a length whose low 32 bits are zero (e.g. exactly 2³²), c truncates to 0. After the switch, the bookkeeping at :379-382 performs iov->iov_len -= c (subtracting 0) and uio->uio_resid -= c (subtracting 0), leaving the loop state unchanged. The while (uio->uio_resid > 0 && error == 0) predicate at :232 is still true, so mmrw spins forever in kernel context.
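The truncation can be checked in isolation with a small userland sketch. It is only a model of the assignment described above, not kernel code: the helper names truncated_count and makes_no_progress are hypothetical, and fixed-width types stand in for u_int and the amd64 size_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `c = iov->iov_len` with c declared u_int:
 * only the low 32 bits of the 64-bit length survive. */
uint32_t truncated_count(uint64_t iov_len)
{
    return (uint32_t)iov_len;
}

/* True when a non-zero request yields a zero per-iteration count,
 * i.e. the loop bookkeeping would subtract nothing. */
bool makes_no_progress(uint64_t iov_len)
{
    return iov_len != 0 && truncated_count(iov_len) == 0;
}
```

With these helpers, makes_no_progress(0x100000000ULL) is true, while an ordinary 4096-byte request reports false.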

The upper-layer sys_write path does not clamp nbyte below 2³²: it only rejects (ssize_t)nbyte < 0 (sys_generic.c:336-337), so iov_len = 2³² passes through. Worse, on the /dev/null write path no uiomove/copyin is ever issued, so the user buffer pointer is never validated — the attacker can pass an arbitrary (even unmapped) address.
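A minimal sketch of why the sign check alone lets 2³² through, assuming a 64-bit ssize_t as on amd64. The function name sys_write_rejects is hypothetical and models only the single comparison the finding cites, not the rest of sys_write.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the check `(ssize_t)nbyte < 0` on a 64-bit platform.
 * The unsigned-to-signed conversion wraps on common compilers, so
 * only lengths with the top bit set are rejected. */
bool sys_write_rejects(uint64_t nbyte)
{
    return (int64_t)nbyte < 0;
}
```

Under that assumption every length from 1 up to 2⁶³ − 1 is accepted, which includes 2³² and every other value whose low 32 bits are zero.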

/dev/null is created mode 0666 (:842) and /dev/zero mode 0666 (:847), so this is reachable by any local unprivileged user. A single syscall wedges one kernel thread indefinitely; N syscalls wedge N CPUs.

Recommended fix

Widen c to size_t throughout mmrw (and adjust the (int)c casts passed to uiomove), or clamp c to a bounded value in the /dev/null and /dev/zero write cases:

--- a/sys/kern/kern_memio.c
+++ b/sys/kern/kern_memio.c
@@ case 2:              /* /dev/null */
    if (uio->uio_rw == UIO_READ)
        return (0);
-   c = iov->iov_len;
+   c = min(iov->iov_len, PAGE_SIZE);
    break;

Apply the same clamp to case 12 (/dev/zero write, :363-366) and case 1 (/dev/kmem, :264). The cleanest fix is to make c size_t everywhere.
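The cost of the per-case clamp can be estimated with a userland model. This is a sketch under stated assumptions: a 4 KiB page size, and a hypothetical min_u64 helper that compares at full 64-bit width. Whether the kernel's own min() compares at that width is not confirmed by this finding and should be checked before relying on the clamp.

```c
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: 4 KiB pages */

/* Hypothetical 64-bit-wide minimum; not the kernel's min(). */
uint64_t min_u64(uint64_t a, uint64_t b)
{
    return a < b ? a : b;
}

/* Counts loop iterations needed to drain `resid` bytes when each
 * iteration consumes at most one page. */
uint64_t clamped_iterations(uint64_t resid)
{
    uint64_t iterations = 0;

    while (resid > 0) {
        uint32_t c = (uint32_t)min_u64(resid, MODEL_PAGE_SIZE);
        resid -= c;             /* c is always >= 1 here, so this terminates */
        iterations++;
    }
    return iterations;
}
```

In this model a 2³²-byte write terminates but takes 2²⁰ (1,048,576) trips through the loop, whereas widening c drains the same request in one.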

Proof of concept

See findings/poc/DF-0079/. A one-line write(fd, buf, 0x100000000ULL) to /dev/null pegs a CPU forever.

Timeline

  • 2026-06-30 Discovered during automated file-by-file audit of sys/kern/kern_memio.c.
  • pending Reported to DragonFlyBSD security contact.

PoC verification

Evidence pack

findings/poc/DF-0079 · 11 files
File | Type | Description | Size
df0079.c | trigger-source | Minimal unprivileged trigger: write(/dev/null, (void*)0x1, 0x100000000). Fork-N mode to wedge N CPUs. | 2.7 KB
watch_df0079.sh | observer | Serial-console watcher: polls pgrep df0079 and writes ps/top to /dev/ttyd0 (survives reset). | 730 B
build.sh | build-script | cc -o df0079 df0079.c | 196 B
run.sh | run-script | Runs ./df0079 [N]; documents the DoS and observation guidance. | 1.2 KB
build.log | build-log | Clean build output + /dev/null & /dev/zero perms (0666) + vulnerable source excerpt. | 567 B
run.log | run-log | Host-side DoS timeline (ssh unreachable t+1s) + ps excerpt from decisive run. | 2.6 KB
serial_wedge_capture.txt | run-log | Per-iteration ps/top from serial watcher: pid 852, UID 1001, STAT R0, cputime 0.50s -> 20.56s, 50% system (one core wedged). | 4.6 KB
env.txt | environment | uname, cc, ncpu, device perms, vulnerable line citations. | 1.8 KB
fix.diff | suggested-fix | One-line root-cause fix: widen u_int c -> size_t c at kern_memio.c:225 (closes /dev/null, /dev/zero AND /dev/kmem). | 268 B
VERDICT.md | verdict | Full narrative: mechanism (path:line each hop), reproduced evidence, exploit ceiling, fix rationale. | 6.6 KB
README.md | readme | Human summary: status, root cause, build/run, expected result, evidence index. | 3.7 KB

DF-0079 PoC — /dev/null (and /dev/zero) infinite kernel loop DoS

Status: REPRODUCED (trivial unprivileged local full-system DoS)

A single write(fd, buf, (size_t)1<<32) to world-writable /dev/null (mode 0666) by an unprivileged user pegs one CPU at 100% in kernel context forever. The write() syscall never returns; the process is unkillable from userspace; only a reboot recovers. Forking N copies (one per CPU) wedges all cores.

Root cause (verified in audited master DEV sys/kern/kern_memio.c)

In mmrw(), the per-iteration byte count c is declared u_int (32-bit) at kern_memio.c:225, while iov->iov_len is size_t (64-bit). The /dev/null write path assigns c = iov->iov_len; directly with no clamp (:298); /dev/zero write does the same (:364).

When the caller passes a length whose low 32 bits are zero (e.g. exactly 2³² = 0x100000000), c truncates to 0. After the switch, the bookkeeping at :379-382 performs iov->iov_len -= c (subtracting 0) and uio->uio_resid -= c (subtracting 0), leaving the loop state unchanged. The while (uio->uio_resid > 0 && error == 0) predicate at :232 is still true, so mmrw spins forever in kernel context.

The early if (iov->iov_len == 0) { ... continue; } guard at :234 does NOT trip, because it compares the full 64-bit iov_len (= 2³², not 0).

The upper-layer sys_write does not clamp nbyte below 2³²: it only rejects (ssize_t)nbyte < 0 (sys_generic.c:336), so iov_len = 2³² passes through. Worse, on the /dev/null write path no uiomove/copyin is ever issued, so the user buffer pointer is never validated — the attacker can pass an arbitrary (even unmapped) address; the PoC passes (void *)0x1.

Build & run

cc -o df0079 df0079.c          # or: ./build.sh
./df0079                       # pegs 1 CPU forever; or ./run.sh
./df0079 4                     # fork 4 copies to wedge 4 CPUs

Run as any local user: /dev/null and /dev/zero are mode 0666 (kern_memio.c:842/:847). No privilege required.

Expected result (on a vulnerable kernel)

Each invocation calls write(/dev/null, buf, 0x100000000) and never returns. The kernel thread spins in mmrw at :232. Observation (serial console, since ssh itself gets starved) shows the wedged process in state R0 (running on CPU, not blocked) with cputime climbing ~1.18 s per wall second (= 100% of one core) indefinitely, and top reporting one CPU fully in sys. On a 2-CPU guest a single wedge typically makes the box unresponsive to ssh within ~1 s (the wedged CPU also services the network IRQ). Recovery is a hard reset only.

Verified evidence (in this folder)

  • run.log: host-side DoS timeline (ssh unreachable t+1 s) + ps excerpt.
  • serial_wedge_capture.txt: per-iteration ps/top from a serial-console watcher (pid 852, UID 1001, STAT R0, cputime 0.50 s → 20.56 s).
  • build.log: clean build + /dev/null and /dev/zero perms + source excerpt.
  • env.txt: guest uname, cc, ncpu, device perms, vulnerable lines.
  • VERDICT.md: full narrative + path:line mechanism + fix.
  • fix.diff: one-line root-cause fix: widen u_int c → size_t c.
  • watch_df0079.sh: serial-console observer (writes ps/top to /dev/ttyd0).
  • manifest.json: artifact catalog.

Notes

  • The same bug affects /dev/zero write (:364) and /dev/kmem (:264, root-only). The fix in fix.diff (widen c to size_t) closes all three.
  • The wedged process cannot be killed (SIGKILL/SIGTERM are never delivered: the thread is in an unyielding kernel loop with no signal-check point), so the only recovery is a reboot.

DF-0079 — VERDICT

Verdict: REPRODUCED — trivial unprivileged local full-system DoS

A single write(/dev/null, buf, (size_t)1<<32) by an unprivileged user (uid 1001, not in wheel) wedges one CPU at 100% in kernel context forever. The syscall never returns, the process is unkillable from userspace, and on a small guest the whole machine becomes unreachable within ~1 second. Forking N copies wedges N cores. Recovery is a hard reset only.

Mechanism (every hop cited path:line, confirmed in audited master DEV)

  1. sys_write (sys/kern/sys_generic.c:336) only rejects (ssize_t)nbyte < 0. nbyte = 2³² is positive as a 64-bit ssize_t, so it is accepted. sys_generic.c:340 sets aiov.iov_len = uap->nbyte (= 2³²) and :344 sets auio.uio_resid = uap->nbyte (= 2³²). No clamping anywhere.

  2. The write reaches mmwrite → mmrw (sys/kern/kern_memio.c:222). The per-iteration byte count is declared u_int c; (32-bit) at :225.

  3. mmrw enters while (uio->uio_resid > 0 && error == 0) at :232. The early guard if (iov->iov_len == 0) { ... continue; } at :234 does NOT fire because the full 64-bit iov_len is 2³², not 0.

  4. For /dev/null (minor 2), case 2: at :292 returns early on read (:296-297) and on write executes c = iov->iov_len; at :298, a direct size_t → u_int truncation. With iov_len = 0x100000000 the low 32 bits are zero, so c = 0. There is no uiomove/copyin on this path (the user buffer pointer (void*)0x1 is never dereferenced), so no EFAULT rescues us.

  5. After break at :299, control falls to the bookkeeping at :377-382: iov->iov_base += c (+= 0), iov->iov_len -= c (-= 0 → still 2³²), uio->uio_offset += c (+= 0), uio->uio_resid -= c (-= 0 → still 2³²).

  6. The while predicate at :232 is still true (uio_resid == 2³² > 0, error == 0). Steps 3–5 repeat with zero net change to the loop state. mmrw spins forever in kernel context on the calling CPU.

  7. The mem cdev is D_MPSAFE | D_QUICK (:85) and mmrw holds no lock while spinning, so the wedge is a pure unyielding tight kernel loop, not a lockup. It never blocks, never calls lwkt_yield/uiomove/tsleep, and never reaches a signal-check point, so:
       • the thread is stuck in state R (running) consuming ~100% of one CPU;
       • it cannot be preempted or signalled (SIGKILL never takes effect);
       • if that CPU also services the network IRQ (vtnet), sshd is starved and the guest becomes unreachable within ~1 s.
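Steps 3 to 6 can be replayed in userland with a bounded model of the loop. This is an illustrative sketch, not the kernel source: the function name null_write_iterations and the max_iter cap are hypothetical, a single iovec is assumed, and the cap exists only so the model terminates where the kernel loop would not.

```c
#include <stdint.h>

/* Replays the /dev/null write path for one iovec of `nbyte` bytes.
 * Returns the number of iterations executed; a return value equal
 * to max_iter means the request never drained. */
uint64_t null_write_iterations(uint64_t nbyte, uint64_t max_iter)
{
    uint64_t iov_len = nbyte;
    uint64_t uio_resid = nbyte;
    uint64_t iters = 0;

    while (uio_resid > 0 && iters < max_iter) {
        if (iov_len == 0)               /* step 3: guard compares all 64 bits */
            break;                      /* single-iovec model: nothing left */
        uint32_t c = (uint32_t)iov_len; /* step 4: u_int truncation */
        iov_len -= c;                   /* step 5: bookkeeping */
        uio_resid -= c;
        iters++;
    }
    return iters;
}
```

Calling null_write_iterations(1ULL << 32, 1000) runs into the cap, while a 4096-byte request finishes in one iteration. Within this model a length such as 2³² + 5 also stalls: the first iteration consumes the low 32 bits and leaves exactly 2³² behind.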

/dev/null and /dev/zero are created mode 0666 (:842/:847), so the attack is reachable by any local user. /dev/zero write (case 12, :363-365) has the identical c = iov->iov_len truncation and the same infinite loop. (/dev/kmem :264 has it too but is root-only.)

Reproduced evidence

  • serial_wedge_capture.txt: a serial-console watcher (writing ps/top to /dev/ttyd0, which lands in the host boot.log and survives reset) captured the wedged process across 18 iterations. Decisive excerpt:

        PID PPID STAT UID  %CPU TIME    COMMAND
        852 1    R0   1001 0.0  0:00.50 ./df0079 (t+0.0s)
        852 1    R0   1001 0.0  0:01.68 ./df0079 (t+0.7s)
        ...
        852 1    R0   1001 0.0  0:20.56 ./df0079 (t+19.5s)
        CPU states: 0.0% user, 0.0% nice, 50.0% system, 0.0% interrupt, 50.0% idle

    STAT R0 = running on CPU (not blocked); UID 1001 = unprivileged maxx; cputime grows ~1.18 s per wall-second = 100% of one of two CPUs; residual never drains. (50% system / 50% idle = one CPU fully wedged; the fork-N variant takes the rest.)
  • run.log: host-side timeline: a single detached ./df0079 (one process) made the guest unresponsive to ssh within 1 second; ssh stayed unreachable for the full 10 s observation window. No kernel panic (clean starvation, per boot.log). Only vm.sh reset recovered the guest. Reproduced across three independent runs (each requiring a reset).
  • build.log, env.txt: clean build; /dev/null & /dev/zero both crw-rw-rw- (0666); the vulnerable u_int c; decl confirmed at :225.

Exploit chain / weaponization

Not a memory-corruption primitive (there is no corruption, only an infinite loop), so there is no escalation chain. The realistic impact ceiling is reliable unprivileged local full-system denial of service:

  • One write() per CPU pegs every core at 100% in-kernel and never returns.
  • The wedged threads are unkillable (no signal-check point), so even an administrator cannot recover short of a reboot.
  • Trivially scriptable: a one-line attacker (python -c 'import os; os.write(os.open("/dev/null",1),b"x"*(1<<32))' or the C PoC) run by any user, or auto-started at login by a compromised low-priv account, takes the whole machine down indefinitely.

Fix (in fix.diff)

Root-cause, one-line fix: widen the per-iteration byte count from 32-bit to 64-bit so c = iov->iov_len cannot truncate:

--- a/sys/kern/kern_memio.c
+++ b/sys/kern/kern_memio.c
@@ -222,7 +222,7 @@
 mmrw(cdev_t dev, struct uio *uio, int flags)
 {
    int o;
-   u_int c;
+   size_t      c;

With c being size_t, c = iov->iov_len (= 2³²) no longer truncates; the bookkeeping at :380/:382 subtracts the full 2³², draining uio_resid to 0 in a single iteration, so the while predicate at :232 exits and the write returns normally. This closes /dev/null (:298), /dev/zero (:364) and /dev/kmem (:264) in one change. All existing function-argument uses of c (uiomove/read_random/add_buffer_randomness_src at :253/:289/:309/:314/:319/:328/:354/:372) operate on values already bounded by min(…, PAGE_SIZE), so widening c to size_t compiles cleanly and changes no bounded-path behaviour.
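The effect of the widening can be sanity-checked with the same kind of bounded userland model. Again this is a sketch rather than the patched kernel code: widened_iterations and max_iter are hypothetical names, and uint64_t stands in for size_t on amd64.

```c
#include <stdint.h>

/* Replays the /dev/null write bookkeeping with a 64-bit `c`.
 * Returns the number of iterations needed to drain `nbyte` bytes,
 * capped at max_iter. */
uint64_t widened_iterations(uint64_t nbyte, uint64_t max_iter)
{
    uint64_t iov_len = nbyte;
    uint64_t uio_resid = nbyte;
    uint64_t iters = 0;

    while (uio_resid > 0 && iters < max_iter) {
        uint64_t c = iov_len;   /* models `size_t c`: no truncation */
        iov_len -= c;
        uio_resid -= c;
        iters++;
    }
    return iters;
}
```

In this model every non-zero length, including 2³², drains in exactly one iteration, which matches the single-iteration behaviour described above.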

This supersedes the finding markdown's per-case c = min(iov->iov_len, PAGE_SIZE) clamp proposal: that also works for /dev/null and /dev/zero but would need separate edits at three sites and would not fix the latent /dev/kmem (:264) truncation. Widening c is the single-change root-cause fix the finding itself names as "cleanest".

PoC changes made during verification

  • df0079.c: rewritten with an explanatory header citing every relevant kern_memio.c line, a trigger() helper, and a fork-N mode. The core trigger is unchanged from the original PoC: write(fd, (void*)0x1, 0x100000000ULL).
  • Added watch_df0079.sh (serial-console observer), build.sh, run.sh, env.txt, VERDICT.md, this README.md, fix.diff, manifest.json, and the full logs (build.log, run.log, serial_wedge_capture.txt).

Confirmed kernel references

Detail

Exploit chain

Not a memory-corruption class (no corruption, only an unyielding tight kernel loop) so there is no escalation chain -- the impact ceiling is reliable unprivileged local full-system denial of service. Weaponization is trivial and unprivileged: any local user runs write(open("/dev/null",1), ptr, (size_t)1<<32) once per CPU (or the PoC's fork-N mode) to peg every core in-kernel; because the wedged threads never reach a signal-check point they cannot be killed even by root, so the machine is down until reboot. A compromised low-priv account could auto-trigger this at login for permanent DoS.

Evidence (decisive lines)

Decisive proof is in serial_wedge_capture.txt (serial-console watcher writing ps/top to /dev/ttyd0 -> host boot.log, which survives reset): pid 852, PPID 1, UID 1001 (unprivileged maxx), STAT R0, COMMAND ./df0079, cputime 0:00.50 -> 0:20.56 (~1.18s CPU per 1.0s wall = 100% of one core), top 'CPU states: 50.0% system, 50.0% idle' on a 2-CPU guest. run.log has the host-side timeline: a single detached ./df0079 made ssh UNREACHABLE at t=1s and stayed unreachable through t=10s (only vm.sh reset recovered). build.log shows clean build + /dev/null & /dev/zero are crw-rw-rw- (0666). env.txt records uname/cc/ncpu/perms and the vulnerable source lines. VERDICT.md has the full path:line mechanism walkthrough.

PoC changes

Rewrote df0079.c with an explanatory header citing every relevant kern_memio.c line, a trigger() helper, and a fork-N mode (core trigger unchanged: write(fd,(void*)0x1,0x100000000ULL)). Added watch_df0079.sh (serial-console observer that polls pgrep df0079 and dumps ps/top to /dev/ttyd0 so evidence survives the wedge+reset), build.sh, run.sh, env.txt, VERDICT.md, README.md, fix.diff, manifest.json, and full logs (build.log, run.log, serial_wedge_capture.txt).

Verified recommended fix

In sys/kern/kern_memio.c:225 widen the per-iteration byte count from u_int c; to size_t c; (one-line root-cause fix in fix.diff, validated git-apply-able) so c = iov->iov_len at :298/:364/:264 cannot truncate; a 2^32 write then drains uio_resid in one iteration and the loop exits normally. This supersedes the finding markdown's per-case c = min(iov->iov_len, PAGE_SIZE) clamp (which also works for /dev/null & /dev/zero but needs three edits and leaves the latent /dev/kmem :264 truncation) -- widening c is the single-change fix the finding itself names as 'cleanest' and closes all three sites.

Verdict

REPRODUCED decisively. The cited path in sys/kern/kern_memio.c is unchanged on master DEV: mmrw() declares the per-iteration byte count as 32-bit u_int c; (:225); the /dev/null write case does c = iov->iov_len; (:298) and /dev/zero write does the same (:364) with no clamp, so a 64-bit iov_len of exactly 2^32 (low 32 bits == 0) truncates c to 0; the bookkeeping at :380/:382 then subtracts 0 from iov_len/uio_resid, leaving the while (uio->uio_resid > 0) predicate (:232) true forever (the :234 guard compares the full 64-bit iov_len==2^32, not 0, so it doesn't trip). sys_write only rejects (ssize_t)nbyte<0 (sys_generic.c:336) so 2^32 passes; /dev/null & /dev/zero are mode 0666 (:842/:847). A single unprivileged write() by uid 1001 (maxx) was observed in state R0 (running on CPU, never blocking) with cputime climbing ~1.18s/s indefinitely (0.50s->20.56s over 18 serial-console samples), top showing one core fully in sys, and the guest became unreachable to ssh within ~1s on a 2-CPU VM. Reproduced across three independent runs, each requiring a hard reset; no panic (pure CPU starvation).