DragonFlyBSD Kernel Audit
DF-0017

Unbounded recursion in kdmsg_simulate_failure overflows the kernel thread stack (remote DoS)

Field Value
ID DF-0017
Status new
Severity High
CVSS 3.1 CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
CWE CWE-674 Uncontrolled Recursion; CWE-787 Out-of-bounds Write
File sys/kern/kern_dmsg.c
Lines 1321-1351, 917, 1255, 555
Area kern
Confidence certain
Discovered 2026-06-29
Reported pending

Summary

kdmsg_simulate_failure() recurses without any depth limit through the kdmsg_state subq child tree. The DMSG protocol lets a peer build an arbitrarily deep parent→child chain by sending CREATE messages whose circuit field references the previously-created state's msgid. Triggering cleanup — a DELETE on the chain root, or simply closing the connection — drives unbounded recursion that overflows the 16 KB LWKT kernel thread stack, panicking or corrupting the kernel. The kernel does not verify DMSG CRCs on receive, so a peer that can reach a DMSG link (HAMMER2 cluster network via the userland relay daemon, or locally via DIOCRECLUSTER on a disk device node) can forge the triggering messages.

Root cause

sys/kern/kern_dmsg.c:1321-1351, kdmsg_simulate_failure:

void
kdmsg_simulate_failure(kdmsg_state_t *state, int meto, int error)
{
    kdmsg_state_t *substate;
    kdmsg_state_hold(state);
    if (meto)
        kdmsg_state_abort(state);
again:
    TAILQ_FOREACH(substate, &state->subq, entry) {
        if (substate->flags & KDMSG_STATE_ABORTING)
            continue;
        state->scan = substate;
        kdmsg_simulate_failure(substate, 1, error);   /* :1346 unbounded recursion */
        if (state->scan != substate)
            goto again;
    }
    kdmsg_state_drop(state);
}

The deep chain is built on the receive path: the CREATE case selects a parent state pstate from the attacker-supplied msg->any.head.circuit and links the new state as its child (:917 TAILQ_INSERT_TAIL(&pstate->subq, state, entry)). There is no depth/nesting counter anywhere in struct kdmsg_state or the CREATE path.

The recursion is triggered by:

  • kdmsg_state_cleanuprx at sys/kern/kern_dmsg.c:1255, which calls kdmsg_simulate_failure(msg->state, 0, DMSG_ERR_LOSTLINK) when a state with a non-empty subq receives a DELETE.
  • the write-thread teardown at sys/kern/kern_dmsg.c:555 on connection close.

The LWKT thread stack is UPAGES * PAGE_SIZE = 4 * 4096 = 16384 bytes (sys/sys/thread.h, sys/cpu/x86_64/include/param.h). At roughly 50–64 bytes per combined frame, about 250 nesting levels suffice to overflow; an attacker can trivially create thousands.
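
As a rough check of the figures above, the following minimal sketch computes how many nested calls fit in a stack of a given size. The macro and function names are illustrative, not kernel identifiers, and the per-frame sizes are the report's estimates rather than measured values.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the text; names are illustrative, not kernel macros. */
#define PAGE_SIZE_BYTES   4096u
#define UPAGES_COUNT      4u
#define LWKT_STACK_BYTES  (UPAGES_COUNT * PAGE_SIZE_BYTES)   /* 16384 */

/*
 * Number of nested calls that fit in `stack_bytes` when each level
 * consumes `frame_bytes` and `in_use` bytes are already taken by the
 * frames below the recursion.  Returns 0 if no level fits.
 */
static size_t
max_nesting(size_t stack_bytes, size_t in_use, size_t frame_bytes)
{
    assert(frame_bytes > 0);
    if (in_use >= stack_bytes)
        return 0;
    return (stack_bytes - in_use) / frame_bytes;
}
```

With an empty 16384-byte stack, max_nesting() gives 256 levels at 64 bytes per frame and 327 levels at 50 bytes per frame; any bytes already in use lower the threshold further.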

Threat model & preconditions

  • Attacker position: a DMSG peer. Reachable via (a) the userland hammer2 relay daemon carrying cluster network traffic (HAMMER2 clustering in use), which exposes the link to a malicious or compromised cluster peer or a network MITM, since LNK_AUTH is unimplemented; or (b) locally via the DIOCRECLUSTER ioctl on a disk device node (typically requires root/operator).
  • Privileges gained or impact: guaranteed kernel stack overflow -> panic (full-system DoS). On configurations without an effective kernel-stack guard page, the stack overflow is also a kernel-memory-corruption primitive with code-execution potential.
  • Required config or capabilities: a reachable DMSG link. For the network vector, HAMMER2 clustering must be in use.
  • Reachability: forge CREATE messages to build the chain, then a DELETE on the root (or close the connection). CRC is not verified on receive.

Proof of concept

PoC source: findings/poc/DF-0017/kdmsg_stackoverflow.c

Builds N chained CREATE messages (circuit = previous msgid) then a root DELETE, writing them to a supplied connected DMSG fd.

Build & run

cc -o kdmsg_stackoverflow findings/poc/DF-0017/kdmsg_stackoverflow.c
# attach a connected fd to a DMSG iocom (DIOCRECLUSTER on a disk device,
#  or speak the relay protocol to the hammer2 daemon), then:
./kdmsg_stackoverflow <fd> [depth]

Expected output

Kernel panic from a stack overflow / double-fault deep in the recursion (or a stack-guard hit) once the root DELETE (or connection close) drives the unbounded recursion.

Impact

For HAMMER2-clustered deployments, a malicious peer (or network MITM, given no receive-side CRC and unimplemented auth) can crash the kernel at will, which is a remote denial of service. The defect is deterministic (no race). Rated High.

Cap circuit nesting depth in the CREATE path and, as defense-in-depth, convert the recursive traversals in kdmsg_simulate_failure (:1321) and kdmsg_state_dying to iterative (explicit-stack) walks.

--- a/sys/kern/kern_dmsg.c
+++ b/sys/kern/kern_dmsg.c
@@ -56,6 +56,8 @@

 #include <sys/dmsg.h>

+#define DMSG_MAX_CIRCUIT_DEPTH 32
+
 RB_GENERATE(kdmsg_state_tree, kdmsg_state, rbnode, kdmsg_state_cmp);
@@ -899,6 +901,14 @@
        msg->state = state;     /* inherits freerd ref */
        state->parent = pstate;
+       if (pstate != &iocom->state0 &&
+           pstate->depth >= DMSG_MAX_CIRCUIT_DEPTH) {
+           kdio_printf(iocom, 1, "circuit nesting too deep (%d)\n",
+                   pstate->depth);
+           error = EINVAL;
+           break;
+       }
+       state->depth = (pstate == &iocom->state0) ? 0 : pstate->depth + 1;
        KKASSERT(state->iocom == iocom);

This requires adding a depth field to struct kdmsg_state in sys/sys/dmsg.h. (Defensive follow-up: rewrite the two recursive tree walks iteratively so a wide/unexpected tree cannot overflow the stack.)
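
To make the boundary behaviour of the cap concrete, here is a minimal userland model of the rule in the patch above. struct model_state and model_link_child() are hypothetical stand-ins, not kernel code; only DMSG_MAX_CIRCUIT_DEPTH and the depth field come from the patch. The model performs the check before linking the child; whether the kernel patch needs the same ordering depends on error-path cleanup that is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DMSG_MAX_CIRCUIT_DEPTH 32   /* same value as in the patch */

/* Reduced stand-in for struct kdmsg_state: only what the check needs. */
struct model_state {
    struct model_state *parent;
    int depth;                      /* the proposed new field */
};

/*
 * Link `child` under `parent` using the patch's rule: children of the
 * root (state0) start at depth 0; any other parent already at the cap
 * rejects the CREATE with EINVAL and leaves `child` untouched.
 */
static int
model_link_child(struct model_state *root, struct model_state *parent,
    struct model_state *child)
{
    assert(root != NULL && parent != NULL && child != NULL);
    if (parent != root && parent->depth >= DMSG_MAX_CIRCUIT_DEPTH)
        return EINVAL;
    child->parent = parent;
    child->depth = (parent == root) ? 0 : parent->depth + 1;
    return 0;
}
```

Under this rule a linear chain can hold at most 33 states (depths 0 through 32); the 34th CREATE in the chain is rejected.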

Timeline

  • 2026-06-29 Discovered during automated file-by-file audit of sys/kern/kern_dmsg.c.
  • pending Reported to DragonFlyBSD security contact.

PoC verification

Evidence pack

findings/poc/DF-0017 (10 files)
File | Type | Description | Size
trigger.c | trigger-source | self-contained reproducer: disk open + socketpair + DIOCRECLUSTER setup + forged CREATE/DELETE chain -> stack overflow | 6.0 KB
kdmsg_stackoverflow.c | trigger-source | original wire-format builder (retained; expects caller-supplied fd) | 3.6 KB
build.sh | build-script | cc -o trigger trigger.c -lpthread | 336 B
run.sh | run-script | frees disk iocom (pkill hammer2) then runs ./trigger 300 | 2.8 KB
VERDICT.md | verdict | full narrative: mechanism, setup, evidence, reachability, fix | 8.8 KB
README.md | readme | how to build/run/interpret | 3.0 KB
build.log | build-log | final successful build, full output + env | 1.0 KB
run.log | run-log | control (depth=5, no panic) + panic (depth=300, double fault) decisive runs | 4.0 KB
panic.txt | panic-signature | double-fault panic from boot.log (2 reproducible runs) | 2.8 KB
env.txt | environment | uname, cc, sysctl, LWKT stack size, disk perms, hammer2 daemon | 2.1 KB
README.md readme how to build/run/interpret

DF-0017: PoC (REPRODUCED)

Unbounded recursion in kdmsg_simulate_failure() / kdmsg_state_dying() (sys/kern/kern_dmsg.c:1346 / :1428) overflows the 16 KB LWKT kernel thread stack via a deep DMSG circuit-nesting chain. See VERDICT.md for the full analysis.

Status

REPRODUCED: kernel stack-overflow DoS (double-fault panic) on DragonFly master DEV v6.5.0.1712.g89e6a-DEVELOPMENT. The earlier inconclusive result (no connected DMSG iocom fd could be obtained) is resolved: trigger.c performs the disk-open + socketpair + DIOCRECLUSTER setup itself.

Files

  • trigger.c: self-contained reproducer (the one to use). Opens /dev/vbd0, builds a socketpair, attaches one end to the kernel disk DMSG iocom via DIOCRECLUSTER, writes N chained CREATE messages (circuit nesting) + a root DELETE. Includes a drain thread.
  • kdmsg_stackoverflow.c: the original wire-format builder (retained as the minimal trigger reference; it expects the caller to supply the fd).
  • build.sh / run.sh: one-command build + run (run.sh also frees the disk iocom by killing the boot-time hammer2 daemon; see below).
  • VERDICT.md: full narrative + evidence.
  • build.log, run.log, panic.txt, env.txt: untrimmed logs.

Build (on the DragonFly guest, as root)

./build.sh        # cc -o trigger trigger.c -lpthread

Run (as root)

# (once, to capture the panic on a headless guest)
echo 'console="comconsole"' >> /boot/loader.conf && reboot

./run.sh          # default depth 300; frees the disk iocom, then fires
# or:  ./run.sh 300

Why run.sh kills the hammer2 daemon

On this image the userland hammer2 cluster daemon (process name "hammer2") connects every disk iocom at boot via DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898) and relays peer DMSG traffic (TCP 987) into the kernel. That leaves each disk iocom's reader blocked in fp_read(), which deadlocks a follow-on DIOCRECLUSTER in kdmsg_iocom_reconnect() (kern_dmsg.c:141). Killing the daemon breaks the pipes, the readers exit, and a fresh DIOCRECLUSTER succeeds. The root fs (hammer2 on vbd0s1d) has its own kernel iocom and is unaffected. For this reason run.sh runs pkill -9 -x hammer2 first.

Expected output

  • Bug present: kernel panic with Fatal double fault (total stack exhaustion, page-aligned rsp); the guest freezes in DDB. Run vm.sh reset to recover.
  • Control (./run.sh 5): no panic, trigger exits 0 (shallow chain).
  • Fixed kernel (depth cap): trigger exits 0, no panic at any depth.

Reachability / impact

  • Local (DIOCRECLUSTER): needs to open a raw disk node (/dev/vbd0 is root:operator crw-r-----); unprivileged users are denied. -> root/operator.
  • Remote (HAMMER2 cluster relay, TCP 987): LNK_AUTH unimplemented, receive-side CRC not checked -> a network peer/MITM can forge the chain. -> unauthenticated DoS for HAMMER2-clustered deployments.
  • No guard page on the LWKT stack -> also an (uncontrolled) kernel memory-corruption primitive; realistically reliable impact = DoS.
VERDICT.md verdict full narrative: mechanism, setup, evidence, reachability, fix

DF-0017: VERDICT

Status: REPRODUCED (kernel stack-overflow DoS; also an uncontrolled kernel memory-corruption primitive). Prior inconclusive resolved: the iocom-fd setup gap is solved and the bug fires on master DEV.

One-line

Unbounded recursion in kdmsg_simulate_failure() / kdmsg_state_dying() overflows the 16 KB LWKT kernel thread stack when a DMSG peer builds a deep circuit-nesting chain. A 300-deep chain deterministically panics the kernel with a double fault (total stack exhaustion); a 5-deep chain (control) does not. Reachable locally via DIOCRECLUSTER (root/operator) and remotely via the unauthenticated HAMMER2 cluster relay (TCP 987).

The bug, confirmed in source

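struct kdmsg_state (sys/sys/dmsg.h:735) has a subq child list and no depth/nesting counter anywhere. Two routines walk that tree recursively with no depth bound:

  • kdmsg_simulate_failure() (kern_dmsg.c:1321), with the recursive call at :1346.
  • kdmsg_state_dying() (kern_dmsg.c:1421), with the recursive call at :1428.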

The deep chain is built on the receive path. The CREATE case (kern_dmsg.c:850 case DMSGF_CREATE:) selects a parent state pstate from the attacker-supplied msg->any.head.circuit (:868-879, RB_FIND by msgid in staterd_tree) and links the new state as its child at :917 TAILQ_INSERT_TAIL(&pstate->subq, state, entry). There is no depth check. So a peer sends CREATE #1 with circuit=0 (child of state0), CREATE #2 with circuit=1 (child of state #1), ... CREATE #N with circuit=N-1, building an N-deep linear chain.
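
The chain construction described above can be modelled in a few lines of userland C. This is a toy model under stated assumptions: struct chain_model and its helpers are hypothetical, msgids are assumed to be assigned sequentially from 1, and nothing here reflects the DMSG wire format or the kernel's red-black tree lookup. It only shows that nesting depth grows by one per CREATE when each message names the previous msgid as its circuit.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 1024

/* Toy model: parent_of[msgid] holds the circuit named by that CREATE. */
struct chain_model {
    unsigned parent_of[MAX_STATES + 1];  /* index 0 stands in for state0 */
    unsigned count;                      /* number of CREATEs accepted */
};

/* Accept a CREATE with msgid count+1 whose circuit names an existing state. */
static int
model_create(struct chain_model *m, unsigned circuit)
{
    if (m->count >= MAX_STATES || circuit > m->count)
        return -1;                       /* table full or unknown parent */
    m->count++;
    m->parent_of[m->count] = circuit;
    return 0;
}

/* Nesting level of a state: number of parent links up to state0. */
static unsigned
model_depth(const struct chain_model *m, unsigned msgid)
{
    unsigned depth = 0;

    assert(msgid <= m->count);
    while (msgid != 0) {
        msgid = m->parent_of[msgid];     /* parent index is always smaller */
        depth++;
    }
    return depth;
}

/* Build the linear chain: CREATE #i carries circuit = i - 1. */
static unsigned
build_linear_chain(struct chain_model *m, unsigned n)
{
    m->count = 0;
    for (unsigned i = 1; i <= n; i++) {
        if (model_create(m, i - 1) != 0)
            break;
    }
    return model_depth(m, m->count);
}
```

In this model build_linear_chain(&m, 300) yields a depth of 300, while the same number of CREATEs all naming circuit 0 would stay at depth 1, which is why the chain shape matters and not the state count.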

The recursion is triggered by teardown:

  • kdmsg_state_cleanuprx(), sys/kern/kern_dmsg.c:1236. When a state with a non-empty subq receives a DELETE (:1247), it calls kdmsg_simulate_failure(msg->state, 0, DMSG_ERR_LOSTLINK) at :1255.
  • Write-thread teardown, sys/kern/kern_dmsg.c:547-555. On connection close it calls kdmsg_simulate_failure(&iocom->state0, 0, DMSG_ERR_LOSTLINK) at :555 for any leftover states.

Inside kdmsg_simulate_failure(state, meto=1, ...) each level also calls kdmsg_state_abort(state) (:1336) which calls kdmsg_state_dying(state) (:1368) β€” itself an unbounded recursive walk over the remaining chain. So the overflow is driven by both recursive functions.

The receive path does not verify the DMSG CRCs: kdmsg_iocom_thread_rd() (kern_dmsg.c:326-394) checks only the magic (:343) and sizes; hdr_crc/aux_crc are computed only on transmit (:2009-2012). So forged wire messages are accepted. LNK_AUTH is unimplemented (no kernel-layer auth on cluster links).

LWKT kernel thread stack = LWKT_THREAD_STACK = UPAGES*PAGE_SIZE = 4*4096 = 16384 bytes (sys/sys/thread.h:472, sys/cpu/x86_64/include/param.h:126). It has no guard page (kmem_alloc_stack in sys/vm/vm_extern.h:131-136 is just kmem_alloc1(..|KM_STACK) with no guard mapping), so the overflow corrupts adjacent kernel memory before double-faulting.

Setup (the prior blocker, solved)

A DMSG iocom fd is attached to the kernel disk-iocom parser by opening a raw disk device node and issuing DIOCRECLUSTER (sys/sys/diskslice.h:99, struct disk_ioc_recluster { int fd; }). The kernel does holdfp(curthread, recl->fd, -1) (subr_diskiocom.c:118) to obtain the struct file * and passes it to kdmsg_iocom_reconnect() (subr_diskiocom.c:141). The kernel reader thread then parses whatever is written to the other end of that fd as DMSG wire messages.

The PoC (trigger.c) is self-contained: it opens /dev/vbd0, builds an AF_UNIX SOCK_STREAM socketpair, issues DIOCRECLUSTER with one end, and writes the forged CREATE/DELETE messages to the other end (a drain thread absorbs kernel replies so the writer never blocks).

The reconnect-deadlock wrinkle. On this guest the userland hammer2 cluster daemon (pid 68, hammer2: hammer2 autoconn_thread, listens on TCP 987) connects every disk iocom at boot via DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898), leaving each disk iocom's reader blocked in fp_read() on a pipe to the daemon. A follow-on DIOCRECLUSTER then deadlocks inside kdmsg_iocom_reconnect() (kern_dmsg.c:141, while (msgrd_td || msgwr_td)) because the stuck reader never wakes to notice KILLRX. Killing the daemon (pkill -9 -x hammer2) breaks the pipes, the readers get EOF and exit (msgrd_td -> NULL), and a fresh DIOCRECLUSTER then succeeds. The hammer2 root fs has its own kernel iocom (hmp->iocom) and is unaffected by killing the userland daemon; this was verified (root fs stays rw, ssh stays up). run.sh performs this kill as a documented setup step.

Evidence (decisive)

Run as root on DragonFly v6.5.0.1712.g89e6a-DEVELOPMENT (X86_64_GENERIC), kernel console switched to serial (console="comconsole") so the panic is captured in dfbsd-qemu/boot.log.

  • Control, ./trigger 5: trigger exits 0, guest stays up, dmesg clean. Shallow chain, no overflow. (Proves the panic is depth-driven, not an artefact of the DIOCRECLUSTER setup.)
  • Panic, ./trigger 300: guest freezes; serial console shows:

DOUBLE FAULT
Fatal double fault
rip = 0xffffffff806564d4
rsp = 0xfffff800ab38f000 (page-aligned == rbp: total stack exhaustion)
panic: double fault
dblfault_handler() at dblfault_handler+0x10c
dblfault_handler() at dblfault_handler+0x10c
Stopped at Debugger+0x7c: movb $0,0xbd77f9(%rip)
db>

Reproduced twice (identical signature; only the exhausted-stack address differs). Full logs: run.log, panic.txt.

A double fault with a page-aligned rsp is the canonical signature of a kernel thread stack overflow on x86: the recursion exhausts the 16 KB stack, the stack pointer runs off the allocation, and the next push/fault finds no usable stack to dispatch even the page-fault handler -> double fault -> panic. The trace shows only dblfault_handler() because the original kdmsg frames are destroyed by the stack exhaustion (no recoverable frame to walk). There is no guard page, so the overflow is also a memory-corruption primitive that reliably manifests as a DoS.
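
The page-alignment observation is easy to check mechanically. The sketch below assumes the 4096-byte x86_64 page size quoted earlier; is_page_aligned() and X86_64_PAGE_SIZE are illustrative names, not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X86_64_PAGE_SIZE 4096u   /* PAGE_SIZE on x86_64, as quoted in the text */

/* The mask trick below only works for power-of-two page sizes. */
static_assert((X86_64_PAGE_SIZE & (X86_64_PAGE_SIZE - 1)) == 0,
    "page size must be a power of two");

/* True when `addr` sits exactly on a page boundary. */
static bool
is_page_aligned(uint64_t addr)
{
    return (addr & (uint64_t)(X86_64_PAGE_SIZE - 1)) == 0;
}
```

Applied to the captured registers, the rsp value 0xfffff800ab38f000 has its low 12 bits clear and is page-aligned, while the rip value 0xffffffff806564d4 is not.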

Exploit chain

Not developed to root. The primitive is an uncontrolled kernel stack overflow into adjacent kernel memory (no guard page). Converting it to reliable code execution would require: (a) controlling the slab/heap layout adjacent to a chosen LWKT thread stack to place a victim object (function pointer / ucred *) at the overflow offset, and (b) surviving long enough past the overflow to dereference the corrupted object before the double-fault. The overflow happens inside a deep kernel recursion, so the double-fault lands almost immediately, making heap-grooming extremely fragile. The realistically reliable, defensible impact is the DoS (deterministic kernel panic). The original trigger PoC (kdmsg_stackoverflow.c) is retained as the minimal wire-format builder; trigger.c is the self-contained reproducer (setup + trigger).

Privilege / reachability note

  • Local vector (DIOCRECLUSTER): requires opening a raw disk node (/dev/vbd0 etc.), which is root:operator crw-r-----. Unprivileged maxx (uid 1001, not in operator) is denied (Permission denied confirmed). So the local vector needs root or operator.
  • Remote vector (HAMMER2 cluster relay): the hammer2 daemon listens on TCP 987 and relays peer DMSG traffic into the kernel disk iocom. LNK_AUTH is unimplemented and receive-side CRC is not checked, so a network peer (or MITM) can forge the chain-building CREATE messages. For HAMMER2-clustered deployments this is a remote, unauthenticated DoS, which is why the finding is rated High.

What changed vs the original PoC

The original kdmsg_stackoverflow.c only built the wire-format messages and expected the caller to supply a "connected DMSG fd", which it never showed how to obtain, so it could not run (inconclusive). The new trigger.c is fully self-contained: it performs the disk open + socketpair + DIOCRECLUSTER setup itself, adds a drain thread to avoid reply-backpressure deadlock, and drives both trigger paths (root DELETE via cleanuprx, plus connection-close via the write-thread teardown). build.sh / run.sh make it one-command reproducible (run.sh also performs the documented hammer2-daemon kill needed to free the disk iocom for a local reconnect on this guest image).

(unchanged from the finding) Cap circuit nesting depth in the CREATE path (kern_dmsg.c:850 case) by adding a depth field to struct kdmsg_state and rejecting pstate->depth >= DMSG_MAX_CIRCUIT_DEPTH; and, as defense-in-depth, convert the recursive subq walks in kdmsg_simulate_failure (:1321) and kdmsg_state_dying (:1421) to iterative (explicit-stack) traversals so a wide/unexpected tree cannot overflow the stack.
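
One possible shape for the iterative traversal suggested above is sketched here as a userland model. It is not the kernel change: struct node is a hypothetical reduced layout using plain pointers instead of the TAILQ subq, and the sketch swaps the explicit stack for backtracking through parent pointers (the CREATE path already sets state->parent). It also ignores locking, reference counts, and the list mutation that the real code handles with state->scan.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for a kdmsg_state tree node (hypothetical layout). */
struct node {
    struct node *parent;
    struct node *first_child;
    struct node *next_sibling;
    int aborted;
};

/* Attach `child` as the first child of `parent`. */
static void
node_add_child(struct node *parent, struct node *child)
{
    child->parent = parent;
    child->next_sibling = parent->first_child;
    parent->first_child = child;
}

/*
 * Mark every node in the subtree rooted at `root` as aborted without
 * recursing.  The walk descends into the first child when there is one
 * and climbs back through parent pointers when a subtree is finished,
 * so stack use is constant regardless of tree depth.
 * Returns the number of nodes visited.
 */
static size_t
abort_subtree_iterative(struct node *root)
{
    struct node *cur = root;
    size_t visited = 0;

    assert(root != NULL);
    for (;;) {
        cur->aborted = 1;                /* pre-order visit */
        visited++;
        if (cur->first_child != NULL) {  /* descend first */
            cur = cur->first_child;
            continue;
        }
        /* Leaf: climb until a sibling exists or we are back at root. */
        while (cur != root && cur->next_sibling == NULL)
            cur = cur->parent;
        if (cur == root)
            return visited;
        cur = cur->next_sibling;
    }
}
```

Because the walk keeps only a single cursor, a chain of 100000 nodes is processed with the same stack footprint as a chain of 5.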

Confirmed kernel references

Detail

Exploit chain

trigger: open /dev/vbd0 (root) + socketpair + DIOCRECLUSTER(subr_diskiocom.c:116) to attach a kernel DMSG iocom reader to the socketpair -> primitive: write N CREATE msgs with circuit=msgid(i-1) to build an N-deep kdmsg_state child chain (kern_dmsg.c:917), then a DELETE on the chain root drives kdmsg_state_cleanuprx (kern_dmsg.c:1255) -> kdmsg_simulate_failure, an unbounded recursive subq walk (kern_dmsg.c:1346) -> outcome: 16 KB LWKT stack overflow -> double-fault panic (DoS). No root-escalation chain developed: the overflow is into uncontrolled adjacent kernel heap (no guard page) and the double-fault lands inside the deep recursion, making heap-grooming for code-exec extremely fragile; reliable impact is the DoS.

Evidence (decisive lines)

PANIC (depth=300, twice): 'DOUBLE FAULT / Fatal double fault / rsp=0xfffff800ab38f000 (page-aligned==rbp) / panic: double fault / dblfault_handler() at dblfault_handler+0x10c / Stopped at Debugger+0x7c / db>' captured on serial console (boot.log). CONTROL (depth=5): trigger exits 0, guest stays up, dmesg clean -- no panic. Full logs in run.log/panic.txt. The double-fault trace shows only dblfault_handler because the original kdmsg frames are destroyed by total stack exhaustion; the mechanism is conclusively tied to the recursion via source (no depth field; recursive calls at kern_dmsg.c:1346 and :1428).

PoC changes

Replaced the non-running scaffold with a self-contained trigger.c that performs the disk-open + socketpair + DIOCRECLUSTER setup itself (the original kdmsg_stackoverflow.c only built wire bytes and expected a caller-supplied fd, so it could never run). Added a drain thread to avoid kernel-writer backpressure deadlock, a depth CLI arg (default 300), and stderr progress markers. Added build.sh/run.sh (run.sh kills the boot-time hammer2 cluster daemon to free the disk iocom -- required because the daemon pre-connects every disk iocom at boot via DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898), deadlocking kdmsg_iocom_reconnect at kern_dmsg.c:141). Kept kdmsg_stackoverflow.c as the minimal wire-format reference. Wrote VERDICT.md, README.md, manifest.json, and full untrimmed logs.

Verified recommended fix

Add a depth field to struct kdmsg_state and cap circuit nesting at 32 in the CREATE path (reject over-deep CREATEs with EINVAL); bounds the recursion that overflows the kernel stack. See findings/poc/DF-0017/fix.diff.

Verdict

REPRODUCED on DragonFly master DEV v6.5.0.1712.g89e6a-DEVELOPMENT. I solved the prior 'inconclusive' iocom-fd blocker: a self-contained trigger.c opens /dev/vbd0, builds an AF_UNIX socketpair, and attaches one end to the kernel disk DMSG iocom via DIOCRECLUSTER (subr_diskiocom.c:116/141), then writes a 300-deep circuit-nested CREATE chain + a root DELETE. Driving teardown hits the unbounded recursion in kdmsg_simulate_failure (kern_dmsg.c:1346) and kdmsg_state_dying (kern_dmsg.c:1428) -- struct kdmsg_state (dmsg.h:735) has no depth field -- which exhausts the 16 KB LWKT thread stack (thread.h:472) and panics with a deterministic 'Fatal double fault' (rsp page-aligned == rbp; total stack exhaustion). A control run at depth=5 does NOT panic, proving the failure is depth/recursion-driven; the panic reproduced twice identically. No guard page on the LWKT stack (vm_extern.h:131) so it is also an uncontrolled kernel memory-corruption primitive, but the reliable impact is the DoS. Local vector needs root/operator (/dev/vbd0 is root:operator, maxx is denied); the remote HAMMER2-relay vector is unauthenticated (LNK_AUTH unimplemented, receive-side CRC not verified at kern_dmsg.c:343).