Unbounded recursion in kdmsg_simulate_failure overflows the kernel thread stack (remote DoS)
| Field | Value |
|---|---|
| ID | DF-0017 |
| Status | new |
| Severity | High |
| CVSS 3.1 | CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H |
| CWE | CWE-674 Uncontrolled Recursion; CWE-787 Out-of-bounds Write |
| File | sys/kern/kern_dmsg.c |
| Lines | 1321-1351, 917, 1255, 555 |
| Area | kern |
| Confidence | certain |
| Discovered | 2026-06-29 |
| Reported | pending |
Summary
kdmsg_simulate_failure() recurses without any depth limit through the
kdmsg_state subq child tree. The DMSG protocol lets a peer build an
arbitrarily deep parent-to-child chain by sending CREATE messages whose circuit
field references the previously created state's msgid. Triggering cleanup (a
DELETE on the chain root, or simply closing the connection) drives unbounded
recursion that overflows the 16 KB LWKT kernel thread stack, panicking or
corrupting the kernel. The kernel does not verify DMSG CRCs on receive, so a
peer that can reach a DMSG link (HAMMER2 cluster network via the userland
relay daemon, or locally via DIOCRECLUSTER on a disk device node) can forge
the triggering messages.
Root cause
sys/kern/kern_dmsg.c:1321-1351, kdmsg_simulate_failure:
void
kdmsg_simulate_failure(kdmsg_state_t *state, int meto, int error)
{
	kdmsg_state_t *substate;
	kdmsg_state_hold(state);
	if (meto)
		kdmsg_state_abort(state);
again:
	TAILQ_FOREACH(substate, &state->subq, entry) {
		if (substate->flags & KDMSG_STATE_ABORTING)
			continue;
		state->scan = substate;
		kdmsg_simulate_failure(substate, 1, error);	/* :1346 unbounded recursion */
		if (state->scan != substate)
			goto again;
	}
	kdmsg_state_drop(state);
}
The deep chain is built on the receive path: the CREATE case selects a parent
state pstate from the attacker-supplied msg->any.head.circuit and links the
new state as its child (:917 TAILQ_INSERT_TAIL(&pstate->subq, state, entry)).
There is no depth/nesting counter anywhere in struct kdmsg_state or the
CREATE path.
The recursion is triggered by:
- kdmsg_state_cleanuprx at sys/kern/kern_dmsg.c:1255, which calls kdmsg_simulate_failure(msg->state, 0, DMSG_ERR_LOSTLINK) when a state with a non-empty subq receives a DELETE.
- the write-thread teardown at sys/kern/kern_dmsg.c:555 on connection close.
The LWKT thread stack is UPAGES * PAGE_SIZE = 4 * 4096 = 16384 bytes
(sys/sys/thread.h, sys/cpu/x86_64/include/param.h). At roughly 50 to 64 bytes
per combined frame, about 250 nesting levels suffice to overflow; an attacker
can trivially create thousands.
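The arithmetic behind that estimate can be checked in a few lines of C. This is only a back-of-the-envelope sketch: `levels_to_overflow` and `lwkt_stack_bytes` are hypothetical helpers, and the 50 and 64 byte frame sizes are the rough per-level figures quoted above, not measured values.

```c
#include <stddef.h>

/* Values quoted in the report for x86_64: UPAGES = 4, PAGE_SIZE = 4096. */
#define SKETCH_UPAGES     4
#define SKETCH_PAGE_SIZE  4096

/* Size of one LWKT thread stack under the values above. */
size_t
lwkt_stack_bytes(void)
{
	return (size_t)SKETCH_UPAGES * SKETCH_PAGE_SIZE;
}

/*
 * Hypothetical helper: number of recursion levels that fit in a stack
 * of stack_bytes when each level consumes frame_bytes. One level more
 * than this overflows. Stack already in use when the recursion starts
 * is ignored, so the real threshold is lower.
 */
size_t
levels_to_overflow(size_t stack_bytes, size_t frame_bytes)
{
	return stack_bytes / frame_bytes;
}
```

With 64-byte frames the helper gives 256 levels and with 50-byte frames 327, which is consistent with the rough "about 250" figure; the actual threshold also depends on how much stack is already consumed when teardown begins.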
Threat model & preconditions
- Attacker position: a DMSG peer. Reachable via (a) the userland hammer2 relay daemon carrying cluster network traffic (HAMMER2 clustering in use), meaning a malicious or compromised cluster peer or a network MITM, since LNK_AUTH is unimplemented; or (b) locally via the DIOCRECLUSTER ioctl on a disk device node (typically requires root/operator).
- Privileges gained or impact: guaranteed kernel stack overflow -> panic (full-system DoS). On configurations without an effective kernel-stack guard page, the stack overflow is also a kernel-memory-corruption primitive with code-execution potential.
- Required config or capabilities: a reachable DMSG link. For the network vector, HAMMER2 clustering must be in use.
- Reachability: forge CREATE messages to build the chain, then a DELETE on the root (or close the connection). CRC is not verified on receive.
Proof of concept
PoC source: findings/poc/DF-0017/kdmsg_stackoverflow.c
Builds N chained CREATE messages (circuit = previous msgid) then a root
DELETE, writing them to a supplied connected DMSG fd.
Build & run
cc -o kdmsg_stackoverflow findings/poc/DF-0017/kdmsg_stackoverflow.c
# attach a connected fd to a DMSG iocom (DIOCRECLUSTER on a disk device,
# or speak the relay protocol to the hammer2 daemon), then:
./kdmsg_stackoverflow <fd> [depth]
Expected output
Kernel panic from a stack overflow / double-fault deep in the recursion (or a stack-guard hit) once the root DELETE (or connection close) drives the unbounded recursion.
Impact
For HAMMER2-clustered deployments, a malicious peer (or network MITM, given no receive-side CRC and unimplemented auth) can crash the kernel at will: a remote denial of service. The defect is deterministic (no race). Rated High.
Recommended fix
Cap circuit nesting depth in the CREATE path and, as defense-in-depth, convert
the recursive traversals in kdmsg_simulate_failure (:1321) and
kdmsg_state_dying to iterative (explicit-stack) walks.
--- a/sys/kern/kern_dmsg.c
+++ b/sys/kern/kern_dmsg.c
@@ -56,6 +56,8 @@
 #include <sys/dmsg.h>
+#define DMSG_MAX_CIRCUIT_DEPTH	32
+
 RB_GENERATE(kdmsg_state_tree, kdmsg_state, rbnode, kdmsg_state_cmp);
@@ -899,6 +901,14 @@
 		msg->state = state;	/* inherits freerd ref */
 		state->parent = pstate;
+		if (pstate != &iocom->state0 &&
+		    pstate->depth >= DMSG_MAX_CIRCUIT_DEPTH) {
+			kdio_printf(iocom, 1, "circuit nesting too deep (%d)\n",
+				    pstate->depth);
+			error = EINVAL;
+			break;
+		}
+		state->depth = (pstate == &iocom->state0) ? 0 : pstate->depth + 1;
 		KKASSERT(state->iocom == iocom);
This requires adding a depth field to struct kdmsg_state in
sys/sys/dmsg.h. (Defensive follow-up: rewrite the two recursive tree walks
iteratively so that stack use stays bounded regardless of tree shape.)
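A minimal sketch of what a non-recursive walk could look like, under stated assumptions: `struct sketch_state` is a simplified stand-in for kdmsg_state (plain child and sibling pointers instead of the TAILQ), `abort_subtree_iterative` is a hypothetical name, and the sketch climbs through parent pointers rather than keeping an explicit stack, so it needs no allocation. It does not model the kernel's locking, reference counting, or the state->scan restart logic, all of which a real conversion would have to preserve.

```c
#include <stddef.h>

/* Simplified stand-in for kdmsg_state: parent link plus child list. */
struct sketch_state {
	struct sketch_state *parent;
	struct sketch_state *first_child;
	struct sketch_state *next_sibling;
	int aborted;
};

/*
 * Mark every descendant of root as aborted without recursion.
 * Descend to the first child while one exists; at a leaf, step to the
 * next sibling, climbing through parents until one has a sibling or we
 * are back at root. Stack use is constant regardless of tree depth.
 * Returns the number of states visited (root itself is not counted).
 */
size_t
abort_subtree_iterative(struct sketch_state *root)
{
	struct sketch_state *cur = root->first_child;
	size_t visited = 0;

	while (cur != NULL) {
		cur->aborted = 1;	/* pre-order visit */
		visited++;
		if (cur->first_child != NULL) {
			cur = cur->first_child;
			continue;
		}
		/* Leaf: climb until a sibling exists or we reach root. */
		while (cur != root && cur->next_sibling == NULL)
			cur = cur->parent;
		cur = (cur == root) ? NULL : cur->next_sibling;
	}
	return visited;
}
```

Because the only per-walk storage is the `cur` pointer and a counter, a chain of any length costs the same stack as a chain of length one.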
References
- sys/kern/kern_dmsg.c:1346 - unbounded recursion in kdmsg_simulate_failure.
- sys/kern/kern_dmsg.c:917 - child linkage on CREATE (circuit nesting).
- sys/kern/kern_dmsg.c:1255 / :555 - trigger paths.
- sys/sys/thread.h - LWKT_THREAD_STACK = 16 KB.
- CWE-674 Uncontrolled Recursion; CWE-787 Out-of-bounds Write (stack).
Timeline
- 2026-06-29 Discovered during automated file-by-file audit of sys/kern/kern_dmsg.c.
- pending Reported to DragonFlyBSD security contact.
PoC verification
Evidence pack
findings/poc/DF-0017 · 10 files
| File | Type | Description | Size |
|---|---|---|---|
| trigger.c | trigger-source | self-contained reproducer: disk open + socketpair + DIOCRECLUSTER setup + forged CREATE/DELETE chain -> stack overflow | 6.0 KB |
| kdmsg_stackoverflow.c | trigger-source | original wire-format builder (retained; expects caller-supplied fd) | 3.6 KB |
| build.sh | build-script | cc -o trigger trigger.c -lpthread | 336 B |
| run.sh | run-script | frees disk iocom (pkill hammer2) then runs ./trigger 300 | 2.8 KB |
| VERDICT.md | verdict | full narrative: mechanism, setup, evidence, reachability, fix | 8.8 KB |
| README.md | readme | how to build/run/interpret | 3.0 KB |
| build.log | build-log | final successful build, full output + env | 1.0 KB |
| run.log | run-log | control (depth=5, no panic) + panic (depth=300, double fault) decisive runs | 4.0 KB |
| panic.txt | panic-signature | double-fault panic from boot.log (2 reproducible runs) | 2.8 KB |
| env.txt | environment | uname, cc, sysctl, LWKT stack size, disk perms, hammer2 daemon | 2.1 KB |
DF-0017: PoC (REPRODUCED)
Unbounded recursion in kdmsg_simulate_failure() / kdmsg_state_dying()
(sys/kern/kern_dmsg.c:1346 / :1428) overflows the 16 KB LWKT kernel
thread stack via a deep DMSG circuit-nesting chain. See VERDICT.md for
the full analysis.
Status
REPRODUCED: kernel stack-overflow DoS (double-fault panic) on
DragonFly master DEV v6.5.0.1712.g89e6a-DEVELOPMENT. The prior
inconclusive result (could not obtain a connected DMSG iocom fd) is
resolved: trigger.c performs the disk-open + socketpair + DIOCRECLUSTER
setup itself.
Files
- trigger.c - self-contained reproducer (the one to use). Opens /dev/vbd0, builds a socketpair, attaches one end to the kernel disk DMSG iocom via DIOCRECLUSTER, writes N chained CREATE messages (circuit nesting) + a root DELETE. Includes a drain thread.
- kdmsg_stackoverflow.c - the original wire-format builder (retained as the minimal trigger reference; it expects the caller to supply the fd).
- build.sh / run.sh - one-command build + run (run.sh also frees the disk iocom by killing the boot-time hammer2 daemon; see below).
- VERDICT.md - full narrative + evidence.
- build.log, run.log, panic.txt, env.txt - untrimmed logs.
Build (on the DragonFly guest, as root)
./build.sh # cc -o trigger trigger.c -lpthread
Run (as root)
# (once, to capture the panic on a headless guest)
echo 'console="comconsole"' >> /boot/loader.conf && reboot
./run.sh        # default depth 300; frees the disk iocom, then fires
# or: ./run.sh 300
Why run.sh kills the hammer2 daemon
On this image the userland hammer2 cluster daemon (process name "hammer2") connects
every disk iocom at boot via DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898)
and relays peer DMSG traffic (TCP 987) into the kernel. That leaves each
disk iocom's reader blocked in fp_read(), which deadlocks a follow-on
DIOCRECLUSTER in kdmsg_iocom_reconnect() (kern_dmsg.c:141). Killing
the daemon breaks the pipes, the readers exit, and a fresh
DIOCRECLUSTER succeeds. The root fs (hammer2 on vbd0s1d) has its own
kernel iocom and is unaffected. run.sh does pkill -9 -x hammer2 first.
Expected output
- Bug present: kernel panic with Fatal double fault (total stack exhaustion, page-aligned rsp); the guest freezes in DDB. Run vm.sh reset to recover.
- Control (./run.sh 5): no panic, trigger exits 0 (shallow chain).
- Fixed kernel (depth cap): trigger exits 0, no panic at any depth.
Reachability / impact
- Local (DIOCRECLUSTER): needs to open a raw disk node (/dev/vbd0 is root:operator crw-r-----); unprivileged users are denied. Requires root/operator.
- Remote (HAMMER2 cluster relay, TCP 987): LNK_AUTH unimplemented, receive-side CRC not checked, so a network peer/MITM can forge the chain. Unauthenticated DoS for HAMMER2-clustered deployments.
- No guard page on the LWKT stack, so this is also an (uncontrolled) kernel memory-corruption primitive; realistically reliable impact = DoS.
DF-0017: VERDICT
Status: REPRODUCED (kernel stack-overflow DoS; also an uncontrolled
kernel memory-corruption primitive). The prior inconclusive result is
resolved: the iocom-fd setup gap is solved and the bug fires on master DEV.
One-line
Unbounded recursion in kdmsg_simulate_failure() /
kdmsg_state_dying() overflows the 16 KB LWKT kernel thread stack when a
DMSG peer builds a deep circuit-nesting chain. A 300-deep chain
deterministically panics the kernel with a double fault (total stack
exhaustion); a 5-deep chain (control) does not. Reachable locally via
DIOCRECLUSTER (root/operator) and remotely via the unauthenticated
HAMMER2 cluster relay (TCP 987).
The bug, confirmed in source
struct kdmsg_state (sys/sys/dmsg.h:735) has a subq child list and
no depth/nesting counter anywhere. Two routines walk that tree
recursively with no depth bound:
- kdmsg_simulate_failure() - sys/kern/kern_dmsg.c:1321, recursive call at :1346: kdmsg_simulate_failure(substate, 1, error);
- kdmsg_state_dying() - sys/kern/kern_dmsg.c:1421, recursive call at :1428: kdmsg_state_dying(scan);
The deep chain is built on the receive path. The CREATE case
(kern_dmsg.c:850 case DMSGF_CREATE:) selects a parent state pstate
from the attacker-supplied msg->any.head.circuit (:868-879,
RB_FIND by msgid in staterd_tree) and links the new state as its child
at :917 TAILQ_INSERT_TAIL(&pstate->subq, state, entry). There is
no depth check. So a peer sends CREATE #1 with circuit=0 (child of
state0), CREATE #2 with circuit=1 (child of state #1), ... CREATE #N
with circuit=N-1, building an N-deep linear chain.
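The msgid/circuit pattern described above can be sketched as follows. This is an illustration only: `struct sketch_create` is a hypothetical two-field stand-in for the real wire header in sys/sys/dmsg.h (which carries more fields, including the magic and CRCs), and `plan_chain` is a hypothetical helper, not code from the PoC.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the two header fields that matter here.
 * Not the real DMSG wire layout.
 */
struct sketch_create {
	uint64_t msgid;		/* id of the state this CREATE opens */
	uint64_t circuit;	/* msgid of the parent state; 0 selects state0 */
};

/*
 * Fill out[0..n-1] so that each CREATE names the previous one as its
 * parent: msgid 1 under state0, msgid 2 under 1, ..., msgid n under n-1.
 * The result is a linear chain n states deep.
 */
void
plan_chain(struct sketch_create *out, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		out[i].msgid = (uint64_t)i + 1;
		out[i].circuit = (uint64_t)i;
	}
}
```

Every entry costs the peer one small message, while each one adds a level to the recursion that teardown later performs.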
The recursion is triggered by teardown:
- kdmsg_state_cleanuprx() - sys/kern/kern_dmsg.c:1236. When a state with a non-empty subq receives a DELETE (:1247), it calls kdmsg_simulate_failure(msg->state, 0, DMSG_ERR_LOSTLINK) at :1255.
- Write-thread teardown - sys/kern/kern_dmsg.c:547-555. On connection close it calls kdmsg_simulate_failure(&iocom->state0, 0, DMSG_ERR_LOSTLINK) at :555 for any leftover states.
Inside kdmsg_simulate_failure(state, meto=1, ...) each level also calls
kdmsg_state_abort(state) (:1336) which calls kdmsg_state_dying(state)
(:1368), which is itself an unbounded recursive walk over the remaining chain.
So the overflow is driven by both recursive functions.
The receive path does not verify the DMSG CRCs:
kdmsg_iocom_thread_rd() (kern_dmsg.c:326-394) checks only the magic
(:343) and sizes; hdr_crc/aux_crc are computed only on transmit
(:2009-2012). So forged wire messages are accepted. LNK_AUTH is
unimplemented (no kernel-layer auth on cluster links).
LWKT kernel thread stack = LWKT_THREAD_STACK = UPAGES*PAGE_SIZE =
4*4096 = 16384 bytes (sys/sys/thread.h:472,
sys/cpu/x86_64/include/param.h:126). It has no guard page
(kmem_alloc_stack in sys/vm/vm_extern.h:131-136 is just
kmem_alloc1(..|KM_STACK) with no guard mapping), so the overflow
corrupts adjacent kernel memory before double-faulting.
Setup (the prior blocker, solved)
A DMSG iocom fd is attached to the kernel disk-iocom parser by opening a
raw disk device node and issuing DIOCRECLUSTER
(sys/sys/diskslice.h:99, struct disk_ioc_recluster { int fd; }). The
kernel does holdfp(curthread, recl->fd, -1) (subr_diskiocom.c:118)
to obtain the struct file * and passes it to kdmsg_iocom_reconnect()
(subr_diskiocom.c:141). The kernel reader thread then parses whatever
is written to the other end of that fd as DMSG wire messages.
The PoC (trigger.c) is self-contained: it opens /dev/vbd0, builds an
AF_UNIX SOCK_STREAM socketpair, issues DIOCRECLUSTER with one end,
and writes the forged CREATE/DELETE messages to the other end (a drain
thread absorbs kernel replies so the writer never blocks).
The reconnect-deadlock wrinkle. On this guest the userland hammer2
cluster daemon (pid 68, hammer2: hammer2 autoconn_thread, listens on
TCP 987) connects every disk iocom at boot via DIOCRECLUSTER
(sbin/hammer2/cmd_service.c:898), leaving each disk iocom's reader
blocked in fp_read() on a pipe to the daemon. A follow-on
DIOCRECLUSTER then deadlocks inside kdmsg_iocom_reconnect()
(kern_dmsg.c:141, while (msgrd_td || msgwr_td)) because the stuck
reader never wakes to notice KILLRX. Killing the daemon
(pkill -9 -x hammer2) breaks the pipes, the readers get EOF and exit
(msgrd_td -> NULL), and a fresh DIOCRECLUSTER then succeeds. The
hammer2 root fs has its own kernel iocom (hmp->iocom) and is unaffected
by killing the userland daemon; this was verified (root fs stays rw, ssh stays
up). run.sh performs this kill as a documented setup step.
Evidence (decisive)
Run as root on DragonFly v6.5.0.1712.g89e6a-DEVELOPMENT (X86_64_GENERIC),
kernel console switched to serial (console="comconsole") so the panic
is captured in dfbsd-qemu/boot.log.
- Control, ./trigger 5: trigger exits 0, guest stays up, dmesg clean. Shallow chain, no overflow. (Proves the panic is depth-driven, not an artefact of the DIOCRECLUSTER setup.)
- Panic, ./trigger 300: guest freezes; serial console shows:
DOUBLE FAULT
Fatal double fault
rip = 0xffffffff806564d4
rsp = 0xfffff800ab38f000 (page-aligned == rbp: total stack exhaustion)
panic: double fault
dblfault_handler() at dblfault_handler+0x10c
dblfault_handler() at dblfault_handler+0x10c
Stopped at Debugger+0x7c: movb $0,0xbd77f9(%rip)
db>
Reproduced twice (identical signature; only the exhausted-stack
address differs). Full logs: run.log, panic.txt.
A double fault with a page-aligned rsp is the canonical signature of a
kernel thread stack overflow on x86: the recursion exhausts the 16 KB
stack, the stack pointer runs off the allocation, and the next push/fault
finds no usable stack to dispatch even the page-fault handler -> double
fault -> panic. The trace shows only dblfault_handler() because the
original kdmsg frames are destroyed by the stack exhaustion (no
recoverable frame to walk). There is no guard page, so the overflow is
also a memory-corruption primitive that reliably manifests as a DoS.
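The page-alignment observation is easy to check mechanically. The sketch below assumes the 4096-byte x86_64 page size quoted elsewhere in this report; `is_page_aligned` is a hypothetical helper, and the two addresses in the usage note are the rsp and rip values from the captured panic.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* x86_64 page size quoted in the report */

/* Returns 1 if addr sits exactly on a page boundary, else 0. */
int
is_page_aligned(uint64_t addr)
{
	return (addr & (SKETCH_PAGE_SIZE - 1)) == 0;
}
```

Applied to the panic above, the rsp value 0xfffff800ab38f000 is page-aligned while the rip value 0xffffffff806564d4 is not, which matches a stack pointer that has run exactly to the edge of its allocation.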
Exploit chain
Not developed to root. The primitive is an uncontrolled kernel stack
overflow into adjacent kernel memory (no guard page). Converting it to
reliable code execution would require: (a) controlling the slab/heap
layout adjacent to a chosen LWKT thread stack to place a victim object
(function pointer / ucred *) at the overflow offset, and (b) surviving
long enough past the overflow to dereference the corrupted object before
the double-fault. The overflow happens inside a deep kernel recursion,
so the double-fault lands almost immediately, making heap-grooming
extremely fragile. The realistically reliable, defensible impact is the
DoS (deterministic kernel panic). The original trigger PoC
(kdmsg_stackoverflow.c) is retained as the minimal wire-format builder;
trigger.c is the self-contained reproducer (setup + trigger).
Privilege / reachability note
- Local vector (DIOCRECLUSTER): requires opening a raw disk node (/dev/vbd0 etc.), which is root:operator crw-r-----. The unprivileged user maxx (uid 1001, not in operator) is denied (Permission denied confirmed). So the local vector needs root or operator.
- Remote vector (HAMMER2 cluster relay): the hammer2 daemon listens on TCP 987 and relays peer DMSG traffic into the kernel disk iocom. LNK_AUTH is unimplemented and receive-side CRC is not checked, so a network peer (or MITM) can forge the chain-building CREATE messages. For HAMMER2-clustered deployments this is a remote, unauthenticated DoS, which is why the finding is rated High.
What changed vs the original PoC
The original kdmsg_stackoverflow.c only built the wire-format messages
and expected the caller to supply a "connected DMSG fd", which it never
showed how to obtain, so it could not run (inconclusive). The new
trigger.c is fully self-contained: it performs the disk open +
socketpair + DIOCRECLUSTER setup itself, adds a drain thread to avoid
reply-backpressure deadlock, and drives both trigger paths (root DELETE
via cleanuprx, plus connection-close via the write-thread teardown).
build.sh / run.sh make it one-command reproducible (run.sh also
performs the documented hammer2-daemon kill needed to free the disk
iocom for a local reconnect on this guest image).
Recommended fix
(unchanged from the finding) Cap circuit nesting depth in the CREATE
path (kern_dmsg.c:850 case) by adding a depth field to
struct kdmsg_state and rejecting pstate->depth >= DMSG_MAX_CIRCUIT_DEPTH;
and, as defense-in-depth, convert the recursive subq walks in
kdmsg_simulate_failure (:1321) and kdmsg_state_dying (:1421) to
iterative (explicit-stack) traversals so that stack use stays bounded
regardless of how deep the tree is.
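The semantics of the proposed cap can be modelled in user space. This is a sketch under stated assumptions, not the kernel patch: `struct sketch_state` and `link_child` are hypothetical, `root` plays the role of iocom->state0, and cleanup of a rejected state (which the real CREATE path must handle) is not modelled.

```c
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_CIRCUIT_DEPTH 32	/* same value as the proposed cap */

/* Simplified stand-in for kdmsg_state with the proposed depth field. */
struct sketch_state {
	struct sketch_state *parent;
	int depth;
};

/*
 * Link child under parent, mirroring the proposed CREATE-path check.
 * Direct children of root get depth 0; every further level adds 1.
 * Returns 0 on success, or EINVAL when the parent already sits at the
 * cap, in which case child is left unlinked.
 */
int
link_child(struct sketch_state *root, struct sketch_state *parent,
    struct sketch_state *child)
{
	if (parent != root && parent->depth >= SKETCH_MAX_CIRCUIT_DEPTH)
		return EINVAL;
	child->parent = parent;
	child->depth = (parent == root) ? 0 : parent->depth + 1;
	return 0;
}
```

Under this rule a linear chain stops growing after 33 states (depths 0 through 32), well below the roughly 250 levels estimated to exhaust the 16 KB stack.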
Confirmed kernel references
- sys/kern/kern_dmsg.c:1346
- sys/kern/kern_dmsg.c:1321
- sys/kern/kern_dmsg.c:1428
- sys/kern/kern_dmsg.c:1421
- sys/kern/kern_dmsg.c:917
- sys/kern/kern_dmsg.c:868
- sys/kern/kern_dmsg.c:1255
- sys/kern/kern_dmsg.c:555
- sys/kern/kern_dmsg.c:343
- sys/sys/dmsg.h:735
- sys/sys/thread.h:472
- sys/vm/vm_extern.h:131
- sys/kern/subr_diskiocom.c:116
- sys/kern/subr_diskiocom.c:141
- sys/kern/kern_dmsg.c:141
Detail
Exploit chain
trigger: open /dev/vbd0 (root) + socketpair + DIOCRECLUSTER(subr_diskiocom.c:116) to attach a kernel DMSG iocom reader to the socketpair -> primitive: write N CREATE msgs with circuit=msgid(i-1) to build an N-deep kdmsg_state child chain (kern_dmsg.c:917), then a DELETE on the chain root drives kdmsg_state_cleanuprx (kern_dmsg.c:1255) -> kdmsg_simulate_failure, an unbounded recursive subq walk (kern_dmsg.c:1346) -> outcome: 16 KB LWKT stack overflow -> double-fault panic (DoS). No root-escalation chain developed: the overflow is into uncontrolled adjacent kernel heap (no guard page) and the double-fault lands inside the deep recursion, making heap-grooming for code-exec extremely fragile; reliable impact is the DoS.
Evidence (decisive lines)
PANIC (depth=300, twice): 'DOUBLE FAULT / Fatal double fault / rsp=0xfffff800ab38f000 (page-aligned==rbp) / panic: double fault / dblfault_handler() at dblfault_handler+0x10c / Stopped at Debugger+0x7c / db>' captured on serial console (boot.log). CONTROL (depth=5): trigger exits 0, guest stays up, dmesg clean -- no panic. Full logs in run.log/panic.txt. The double-fault trace shows only dblfault_handler because the original kdmsg frames are destroyed by total stack exhaustion; the mechanism is conclusively tied to the recursion via source (no depth field; recursive calls at kern_dmsg.c:1346 and :1428).
PoC changes
Replaced the non-running scaffold with a self-contained trigger.c that performs the disk-open + socketpair + DIOCRECLUSTER setup itself (the original kdmsg_stackoverflow.c only built wire bytes and expected a caller-supplied fd, so it could never run). Added a drain thread to avoid kernel-writer backpressure deadlock, a depth CLI arg (default 300), and stderr progress markers. Added build.sh/run.sh (run.sh kills the boot-time hammer2 cluster daemon to free the disk iocom -- required because the daemon pre-connects every disk iocom at boot via DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898), deadlocking kdmsg_iocom_reconnect at kern_dmsg.c:141). Kept kdmsg_stackoverflow.c as the minimal wire-format reference. Wrote VERDICT.md, README.md, manifest.json, and full untrimmed logs.
Verified recommended fix
Add a depth field to struct kdmsg_state and cap circuit nesting at 32 in the CREATE path (reject over-deep CREATEs with EINVAL); bounds the recursion that overflows the kernel stack. See findings/poc/DF-0017/fix.diff.
Verdict
REPRODUCED on DragonFly master DEV v6.5.0.1712.g89e6a-DEVELOPMENT. I solved the prior 'inconclusive' iocom-fd blocker: a self-contained trigger.c opens /dev/vbd0, builds an AF_UNIX socketpair, and attaches one end to the kernel disk DMSG iocom via DIOCRECLUSTER (subr_diskiocom.c:116/141), then writes a 300-deep circuit-nested CREATE chain + a root DELETE. Driving teardown hits the unbounded recursion in kdmsg_simulate_failure (kern_dmsg.c:1346) and kdmsg_state_dying (kern_dmsg.c:1428) -- struct kdmsg_state (dmsg.h:735) has no depth field -- which exhausts the 16 KB LWKT thread stack (thread.h:472) and panics with a deterministic 'Fatal double fault' (rsp page-aligned == rbp; total stack exhaustion). A control run at depth=5 does NOT panic, proving the failure is depth/recursion-driven; the panic reproduced twice identically. No guard page on the LWKT stack (vm_extern.h:131) so it is also an uncontrolled kernel memory-corruption primitive, but the reliable impact is the DoS. Local vector needs root/operator (/dev/vbd0 is root:operator, maxx is denied); the remote HAMMER2-relay vector is unauthenticated (LNK_AUTH unimplemented, receive-side CRC not verified at kern_dmsg.c:343).