# DF-0117 — VERDICT

**Status: NOT REPRODUCED** (the UAF as described is not reachable on this
master DEV build; the commented-out `kdmsg_state_hold/drop` is redundant
defensive code, not the missing half of an active UAF).

## One-line

The `diskiodone` completion path dereferences the `kdmsg_state_t *` stored in
`bio->bio_caller_info1.ptr` without an explicit state hold (the holds at
`sys/kern/subr_diskiocom.c:375/451/503/554` are commented out), but on the
audited kernel the state is **already** kept alive through `diskiodone` by
its two topology references (rbtree + parent-`subq`), which are only dropped
inside `kdmsg_state_cleanuptx()` *after* a DELETE-bearing kernel reply has
been produced — and that reply is produced by `diskiodone` **itself**.  So
the state cannot reach refcount 0 (and thus cannot be `kfree()`'d) until
after every `diskiodone` has already finished using it.  Empirically: ~2800
`diskiodone` completions fired during aggressive connection-teardown races
(three protocol modes, 256–1024 in-flight reads each, immediate socket
close) with **no panic, no page fault, no KKASSERT, no leaked slab
warning** — `blk_active` returns to 0 and the iocom teardown drains cleanly
every time.

## The claimed bug (and where the reasoning breaks down)

The finding asserts: peer sends DELETE / connection drops while disk I/O is
in flight → kdmsg core frees the state → `diskiodone` dereferences freed
state → UAF.  The commented-out `kdmsg_state_hold()` at the four dispatch
sites and `kdmsg_state_drop()` at `subr_diskiocom.c:661` are cited as the
fix.

Tracing the actual lifetime in `sys/kern/kern_dmsg.c`:

1. **State creation** (`kdmsg_state_msgrx`, CREATE case, `kern_dmsg.c:851`)
   gives the new state **two stable references**, both independent of any
   driver hold:
   - `kdmsg_state_hold(state); /* state on pstate->subq */` — `kern_dmsg.c:909`
   - `kdmsg_state_hold(state); /* state on rbtree */`       — `kern_dmsg.c:910`

2. **State freeing** (`kdmsg_state_free`, `kern_dmsg.c:1754`) only runs when
   `state->refs` hits 0 in `_kdmsg_state_drop` (`kern_dmsg.c:1748`).  Both
   topology refs are dropped together, **only inside
   `kdmsg_state_cleanuptx`** (`kern_dmsg.c:1670-1707`), and *only* when the
   message being cleaned up carries `DMSGF_DELETE` **and**
   `state->rxcmd` already carries `DMSGF_DELETE`:
   - `state->txcmd |= DMSGF_DELETE;`                          — `kern_dmsg.c:1672`
   - `RB_REMOVE(... staterd_tree, state);`                    — `kern_dmsg.c:1678-1683`
   - `if (TAILQ_EMPTY(&state->subq)) kdmsg_subq_delete(state);` — `kern_dmsg.c:1704-1705`
   - `kdmsg_state_drop(state); /* state on rbtree */`         — `kern_dmsg.c:1707`
   (`kdmsg_subq_delete` then drops the `subq` ref at `kern_dmsg.c:1301`.)

3. **A DELETE-bearing kernel reply is produced only by `diskiodone`**,
   specifically when `iost->eof` is set and the last in-flight I/O
   completes (`atomic_fetchadd_int(&iost->count,-1) == 1`):
   - `if (msg->any.head.cmd & DMSGF_DELETE) iost->eof = 1;` — `subr_diskiocom.c:378-379`
   - `if (iost->eof) { if (atomic_fetchadd_int(&iost->count, -1) == 1) cmd |= DMSGF_DELETE; }` — `subr_diskiocom.c:637-639`
   - `rmsg = kdmsg_msg_alloc(state, cmd, NULL, 0);`          — `subr_diskiocom.c:650`
   - `kdmsg_msg_write(rmsg);`                                 — `subr_diskiocom.c:662`

Putting (1)+(2)+(3) together: the state's refs can reach 0 **only after** a
DELETE-bearing reply has been transmitted/drained, and that reply exists
**only after** `diskiodone` has run.  `diskiodone` dereferences `state` at
`:580` (`kdmsg_state_t *state = bio->bio_caller_info1.ptr;`), `:582`
(`iost = state->any.any;`), `:635` (`state->txcmd`), `:650`/`:652`
(`kdmsg_msg_alloc(state,...)`, `state->iocom->mmsg`) — **all before**
`kdmsg_msg_write` at `:662`.  Even the synchronous free that can occur on the
DYING branch of `kdmsg_msg_write` (`kern_dmsg.c:1988-1991` →
`kdmsg_state_cleanuptx` → `kdmsg_state_free`) runs *inside* that `:662`
call, i.e. **after** every dereference made by that same `diskiodone`.  So a
single completion never touches a state that it has itself freed; the only
remaining window needs two completions on the same state (discussed below).
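Steps (1) to (3) and the ordering argument can be condensed into a small single-threaded userspace model.  This is a hypothetical sketch, not kernel code: `model_state`, `model_iost`, `MODEL_DELETE` and the `model_*` functions are illustrative stand-ins for `kdmsg_state_t`, the iost, `DMSGF_DELETE`, `kdmsg_state_cleanuptx` and `diskiodone`.  Atomics and locks are omitted, and the write at `:662` is assumed to reach `cleanuptx` synchronously (the DYING-branch case).  The `assert` on entry to `model_diskiodone` stands for the dereferences at `:580` to `:652`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the DF-0117 lifetime argument.
 * Names are illustrative only and are NOT the kdmsg kernel API.
 */
#define MODEL_DELETE 0x1u

struct model_state {
	int      refs;   /* models state->refs */
	unsigned rxcmd;  /* models the DELETE bit of state->rxcmd */
	unsigned txcmd;  /* models the DELETE bit of state->txcmd */
	bool     freed;  /* set where kdmsg_state_free() would run */
};

struct model_iost {
	int  count;      /* in-flight I/Os on this transaction */
	bool eof;        /* set when the received message carried DELETE */
};

static void model_drop(struct model_state *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0)
		s->freed = true;
}

/* (1) CREATE: two topology refs, subq + rbtree. */
static void model_create(struct model_state *s)
{
	s->refs = 2;
	s->rxcmd = 0;
	s->txcmd = 0;
	s->freed = false;
}

/* (2) cleanuptx: drop both topology refs only for a DELETE-bearing
 * outgoing message when rxcmd already has DELETE (assumes subq empty). */
static void model_cleanuptx(struct model_state *s, unsigned msgcmd)
{
	if ((msgcmd & MODEL_DELETE) && (s->rxcmd & MODEL_DELETE)) {
		s->txcmd |= MODEL_DELETE;
		model_drop(s);   /* subq ref, via kdmsg_subq_delete */
		model_drop(s);   /* rbtree ref */
	}
}

/* (3) diskiodone: all dereferences come first; the reply write, which
 * may reach cleanuptx and free the state, is the last step. */
static unsigned model_diskiodone(struct model_state *s, struct model_iost *io)
{
	unsigned cmd = 0;

	assert(!s->freed);                 /* models :580-:652 dereferences */
	if (io->eof && io->count-- == 1)   /* models :637-639 */
		cmd |= MODEL_DELETE;
	model_cleanuptx(s, cmd);           /* models kdmsg_msg_write at :662 */
	return cmd;
}
```

With `eof` set and three bios in flight, the state is freed only inside the third (last) completion, after that completion's own reads; without `eof`, no DELETE reply is ever produced and the state simply keeps its two topology refs.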

The connection-drop teardown path (`kdmsg_iocom_thread_wr`,
`kern_dmsg.c:547-557` → `kdmsg_simulate_failure(&state0, 0, ...)`) does
**not** force-free states either: it loops calling
`kdmsg_simulate_failure` every `hz/2`, which via `kdmsg_state_abort`
(`kern_dmsg.c:1355`) only *simulates receiving* a DELETE (setting
`rxcmd |= DELETE`); it never bypasses the `txcmd |= DELETE` requirement.
`kdmsg_iocom_uninit` (`kern_dmsg.c:264`) likewise waits for the reader and
writer threads to exit (`kern_dmsg.c:284`), and the writer thread does not exit until
the teardown loop has drained every state — which is gated on the in-flight
I/O completing.  No path short-circuits the refcount.
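A hypothetical single-threaded sketch of the teardown loop: each iteration of `model_teardown` stands for one `hz/2` tick of the writer thread calling `kdmsg_simulate_failure`, and `model_abort` only sets the DELETE bit in `rxcmd`, as `kdmsg_state_abort` does.  All `model_*` names are illustrative, and the model assumes that, once the simulated DELETE is in place, the last in-flight completion sends the DELETE reply that lets `cleanuptx` drop the two topology refs.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the connection-drop teardown loop. Names are
 * illustrative, not kernel API. Assumption: after the simulated DELETE,
 * the last in-flight completion sends a DELETE reply.
 */
#define MODEL_DELETE 0x1u

struct model_state {
	int      refs;      /* starts at 2: subq + rbtree */
	unsigned rxcmd;     /* DELETE bit of state->rxcmd */
	unsigned txcmd;     /* DELETE bit of state->txcmd */
	int      inflight;  /* bios not yet completed through diskiodone */
	bool     freed;
};

/* Stand-in for kdmsg_state_abort(): only simulates receiving DELETE. */
static void model_abort(struct model_state *s)
{
	s->rxcmd |= MODEL_DELETE;
}

/* One diskiodone; only the last one can produce txcmd |= DELETE. */
static void model_complete_one(struct model_state *s)
{
	assert(s->inflight > 0 && !s->freed);
	if (--s->inflight == 0 && (s->rxcmd & MODEL_DELETE)) {
		s->txcmd |= MODEL_DELETE;
		s->refs -= 2;               /* cleanuptx: subq + rbtree */
		s->freed = (s->refs == 0);
	}
}

/*
 * Writer-thread loop: each iteration is one hz/2 tick that calls the
 * simulated failure. Returns the tick on which the state was freed, or
 * -1 if it is still alive after max_ticks (the real loop keeps waiting).
 */
static int model_teardown(struct model_state *s, int completions_per_tick,
			  int max_ticks)
{
	for (int tick = 1; tick <= max_ticks; tick++) {
		model_abort(s);
		for (int i = 0; i < completions_per_tick && s->inflight > 0; i++)
			model_complete_one(s);
		if (s->freed)
			return tick;
	}
	return -1;
}
```

For example, with four bios in flight and one completion per tick, `model_teardown(&s, 1, 2)` returns -1 with `refs` still 2, however many times the failure is simulated; a further call frees the state on its second tick, when the last bio completes.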

The one narrow theoretical race that remains involves two `diskiodone`s on
two CPUs completing the same state, where the *last* one's synchronous
DYING-branch free in `kdmsg_msg_write` overlaps the *previous* one's window
between `:580` and `:662`.  It requires (a) `eof=1`, so that a DELETE reply
is produced at all, (b) the state already `DYING`, and (c) true cross-CPU
concurrency of two completions on one state; even then the window is a
handful of instructions, further serialized by lock acquisition at `:650`.
The PoC hammers exactly this
(many I/Os per state, immediate teardown, up to 1024 in-flight) and never
hits it.  Whatever the commented-out hold/drop was originally meant to
guard, on this build the rbtree+`subq` topology refs already provide the
lifetime guarantee, so the hold/drop is **redundant**, not a missing fix.
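For concreteness, the interleaving above can be forced in a deterministic model.  This is a hypothetical sketch that assumes conditions (a) to (c) by construction, so it says nothing about reachability (the PoC never hit it); it only shows how the refcount would evolve.  `with_hold` models uncommenting the hold at dispatch and the drop at `subr_diskiocom.c:661`; all names are illustrative, and any reference taken by the reply message itself is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical deterministic model of the cross-CPU window: completions
 * A and B on one state, eof=1, two bios in flight. The interleaving is
 * forced by construction; this is not evidence that it is reachable.
 */
struct model_state {
	int  refs;
	bool freed;
};

static void model_drop(struct model_state *s)
{
	assert(s->refs > 0);
	if (--s->refs == 0)
		s->freed = true;
}

/* Returns true if A would touch a freed state when it resumes. */
static bool model_race(bool with_hold)
{
	struct model_state s = { .refs = 2, .freed = false }; /* subq + rbtree */
	int count = 2;                 /* iost->count, eof already set */

	if (with_hold)
		s.refs += 2;           /* one hold per dispatched bio */

	/* A: fetchadd 2 -> 1, no DELETE; stalls between :580 and :662. */
	count--;

	/* B: fetchadd 1 -> 0, so B's reply carries DELETE. */
	bool b_last = (count-- == 1);
	assert(b_last);
	if (with_hold)
		model_drop(&s);        /* B drops its hold at :661 */
	model_drop(&s);                /* B's write, DYING branch: subq ref */
	model_drop(&s);                /* B's write, DYING branch: rbtree ref */

	/* A resumes and dereferences the state. */
	bool a_uaf = s.freed;
	if (with_hold && !s.freed)
		model_drop(&s);        /* A drops its hold at :661 */
	return a_uaf;
}
```

`model_race(false)` returns true (A would resume on a freed state) and `model_race(true)` returns false; that is the sense in which the commented-out pair would act as defense-in-depth if the window were ever reachable.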

## Setup used (the iocom attach, solved — reused from DF-0017)

1. `pkill -9 -x hammer2` — the userland hammer2 cluster daemon pre-connects
   every disk iocom at boot via `DIOCRECLUSTER`
   (`sbin/hammer2/cmd_service.c:898`), leaving each disk iocom's reader
   blocked in `fp_read()` on a pipe; a follow-on `DIOCRECLUSTER` then
   deadlocks in `kdmsg_iocom_reconnect()` (`kern_dmsg.c:141`) waiting for
   the stuck reader.  Killing the daemon breaks the pipes, the readers get
   EOF and exit (`msgrd_td → NULL`), and a fresh `DIOCRECLUSTER` succeeds.
   (The root-fs hammer2 mount has its own kernel iocom and is unaffected.)
2. `open("/dev/vbd0", O_RDWR)`.  The node is root:operator `crw-r-----`, so
   `O_RDWR` needs root (group `operator` has read-only access); unprivileged
   `maxx` cannot trigger the LOCAL vector.
3. `socketpair(AF_UNIX, SOCK_STREAM)`; one end drained by a pthread so the
   kernel writer never blocks.
4. `ioctl(diskfd, DIOCRECLUSTER, {fd=sv_kern})` (`subr_diskiocom.c:116/141`).
5. Write forged `DMSG_BLK_READ|CREATE[|DELETE]` wire messages
   (128-byte extended header; receive path does not verify CRC,
   `kern_dmsg.c:343`); `disk_rcvdmsg` (`subr_diskiocom.c:230`) routes each
   to `disk_blk_read`, which dispatches a real async `dev_dstrategy` on the
   root disk, storing `bio->bio_caller_info1.ptr = msg->state` with no hold.
6. Close the socket → reader EOF → writer-thread
   `kdmsg_simulate_failure(state0)` teardown racing the in-flight
   `diskiodone`s.
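Steps 3 and 6 are plain POSIX and can be sketched portably.  In this hypothetical sketch (`drain_ctx`, `drain_thread` and `attach_pair` are illustrative names, not taken from `trigger.c`), `sv[0]` is the end that would be handed to `DIOCRECLUSTER` in step 4, which is DragonFly-specific and omitted, and `sv[1]` is drained by a pthread; the forged messages of step 5 would be written to `sv[1]` from the main thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Portable sketch of step 3. sv[0] would go to DIOCRECLUSTER (omitted);
 * sv[1] is drained so the kernel writer never blocks on a full buffer.
 */
struct drain_ctx {
	int    fd;     /* userland end being drained */
	size_t bytes;  /* total bytes discarded (read only after join) */
};

static void *drain_thread(void *arg)
{
	struct drain_ctx *ctx = arg;
	char buf[4096];
	ssize_t n;

	/* Discard replies until EOF (peer end closed) or a read error. */
	while ((n = read(ctx->fd, buf, sizeof(buf))) > 0)
		ctx->bytes += (size_t)n;
	return NULL;
}

/* Create the pair and start draining sv[1]; returns 0 on success. */
static int attach_pair(int sv[2], struct drain_ctx *ctx, pthread_t *td)
{
	assert(ctx != NULL && td != NULL);
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	ctx->fd = sv[1];
	ctx->bytes = 0;
	if (pthread_create(td, NULL, drain_thread, ctx) != 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	return 0;
}
```

In this sketch, closing `sv[0]` plays the role of the peer going away: `read()` on `sv[1]` then returns 0, the drain loop ends, and the thread can be joined.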

## Evidence (decisive)

Guest: `DragonFly v6.5.0.1712.g89e6DEVELOPMENT` (master DEV, X86_64_GENERIC),
`comconsole` baked into the clean-install baseline so panics land in
`dfbsd-qemu/boot.log`.

| run log | mode | in-flight reads | pre-close delay | result |
|-----|------|-------|----------|--------|
| `run.log`     | 1 (CREATE\|DELETE, eof=1)            |  256 | 2000 µs | trigger exit 0, guest up, dmesg clean, `blk_active`→0 |
| `run.mode2.log` | 2 (CREATE then explicit DELETE)    |  512 |    0 µs | trigger exit 0, guest up, dmesg clean |
| `run.hipress.log` | 1 (eof=1)                         | 1024 |    0 µs | no panic, guest up, `boot.log` panic-marker count = 0 |

(Plus the initial mode-0 run of 256 reads, also clean.)  Across all runs,
~2800 `diskiodone` completions fired during active teardown races with **no
panic, no page fault, no KKASSERT, no slab warning**.  `debug.blk_active`
returns to 0 every time, confirming every dispatched bio completed through
`diskiodone` (the alleged UAF site) without faulting.

No `panic.txt` is written because there is no panic.

## Exploit chain

Not applicable — no memory-corruption primitive manifested, so there is no
chain to develop.  If the narrow cross-CPU race described above *were*
reachable (which this verification could not trigger), the primitive would
be a `kdmsg_state_t` UAF in a small `kmalloc` bucket; grooming + conversion
would follow the usual slab-adjacency path.  But that is speculative given
the non-reproduction.

## What changed vs the finding's PoC scaffold

The finding shipped **no PoC directory** (`findings/poc/DF-0117/` did not
exist).  This verification wrote a fully self-contained trigger
(`trigger.c`) plus `build.sh` / `run.sh` that reuse DF-0017's solved iocom
attach (hammer2 kill + `DIOCRECLUSTER` + socketpair + drain thread) and
drive real `BLK_READ` transactions through `diskiodone`, then race
connection teardown against the in-flight completions in three protocol
modes.  The trigger is a genuine attempt to make the bug fire; it does not.

## Recommendation

The commented-out `kdmsg_state_hold/drop` in `subr_diskiocom.c` should be
**either deleted** (as dead code that is redundant on this build) **or
uncommented** (as near-zero-cost defense-in-depth, which would likely also
cover the narrow cross-CPU window described above).  Either way, DF-0117 as
filed, "UAF on kdmsg state in diskiodone → reliable kernel panic /
exploitable memory corruption," is **not a live vulnerability on master
DEV**.  Downgrade from High.  (The iost-leak referenced as DF-0118 is a
separate question.)

## Reproduce

```
./dfbsd-qemu/vm.sh reset && ./dfbsd-qemu/vm.sh up 90          # fresh guest
ssh -F dfbsd-qemu/config dfbsd 'mkdir -p /root/poc/DF-0117'
scp -F dfbsd-qemu/config -r findings/poc/DF-0117/{trigger.c,build.sh,run.sh} \
    dfbsd:/root/poc/DF-0117/
# ./run.sh <mode> <reads> <preclose µs>; 1 256 2000 is the run.log configuration
ssh -F dfbsd-qemu/config dfbsd 'cd /root/poc/DF-0117 && ./build.sh && ./run.sh 1 256 2000'
# expect: trigger exits 0, guest stays up, no panic in dfbsd-qemu/boot.log
```
