DragonFlyBSD Kernel Audit
DF-0117

UAF on kdmsg state in diskiodone: state refcount not held across async I/O

Field Value
ID DF-0117
Status new
Severity High
CVSS 3.1 CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H
CWE CWE-416 Use After Free
File sys/kern/subr_diskiocom.c
Lines 372-662
Area kern
Confidence likely
Discovered 2026-06-30
Reported pending

Summary

In disk_blk_read/write/flush/freeblks, the kdmsg state pointer is stored in bio->bio_caller_info1.ptr and dispatched to dev_dstrategy for async I/O, but no reference is taken on the state: the kdmsg_state_hold() and matching kdmsg_state_drop() calls are commented out (:375, :451, :503, :554, :661). If the kdmsg peer sends a DELETE or the connection drops while I/O is in flight, the kdmsg core frees the state. When diskiodone fires on I/O completion, it dereferences the freed state (:580, :582), producing a use-after-free on a small kmalloc object.

Root cause

All four I/O dispatch sites follow the same pattern (:372-381):

bio->bio_caller_info1.ptr = msg->state;
/* kdmsg_state_hold(msg->state); */   // <-- COMMENTED OUT
...
dev_dstrategy(dp->d_rawdev, bio);

The completion handler (:577-662) dereferences it unconditionally:

kdmsg_state_t *state = bio->bio_caller_info1.ptr;  // :580
struct dios_io *iost = state->any.any;              // :582, UAF
...
kdmsg_msg_alloc(state, ...);                        // :650, UAF
...
/* kdmsg_state_drop(state); */                       // :661, COMMENTED OUT

The kdmsg core frees states when both rxcmd and txcmd carry DMSGF_DELETE (kern_dmsg.c:1678-1707). The peer controls when to send DELETE. If the I/O has not completed, the state is freed while the bio still holds a raw pointer to it. Connection drop (kdmsg_state_abort) also walks the state tree and drops all references.
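The hold-across-completion pattern that the commented-out calls would implement can be modeled in userspace. The sketch below is a minimal, single-threaded illustration only: ref_state, ref_hold, ref_drop, and ref_bio are hypothetical stand-ins, not the kernel's kdmsg API, and the model makes no claim about which other references the kdmsg core holds on a state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for kdmsg_state_t; not the kernel struct. */
struct ref_state {
    int   refs;   /* reference count */
    bool *freed;  /* set to true when the last reference is dropped */
};

/* Hypothetical stand-in for a bio that carries the state pointer. */
struct ref_bio {
    struct ref_state *caller_ptr;
};

static struct ref_state *ref_state_new(bool *freed_flag)
{
    struct ref_state *st = malloc(sizeof(*st));

    assert(st != NULL);
    st->refs = 1;             /* the creator's reference */
    st->freed = freed_flag;
    *freed_flag = false;
    return st;
}

static void ref_hold(struct ref_state *st)
{
    assert(st->refs > 0);
    st->refs++;
}

static void ref_drop(struct ref_state *st)
{
    assert(st->refs > 0);
    if (--st->refs == 0) {
        *st->freed = true;    /* record the free so callers can observe it */
        free(st);
    }
}

/* Dispatch: take a reference for the in-flight I/O before handing it off. */
static void ref_dispatch(struct ref_bio *bio, struct ref_state *st)
{
    bio->caller_ptr = st;
    ref_hold(st);
}

/* Completion: use the state, then release the reference taken at dispatch. */
static void ref_iodone(struct ref_bio *bio)
{
    struct ref_state *st = bio->caller_ptr;

    assert(st->refs > 0);     /* alive because of the hold in ref_dispatch */
    bio->caller_ptr = NULL;
    ref_drop(st);
}
```

In this model, dropping the creator's reference while an I/O is dispatched leaves the state allocated; it is released only when ref_iodone drops the reference taken in ref_dispatch.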

Threat model & preconditions

  • Attacker position: kdmsg cluster peer (authenticated via the cluster protocol), or local privileged user who issued DIOCRECLUSTER on a disk device node.
  • Impact: Use-after-free on kdmsg_state_t (small kmalloc object). Reliable kernel panic (DoS). Potentially exploitable for kernel code execution via slab grooming, since the attacker controls the timing between free and reuse.
  • Required config: kdmsg/iocom active (HAMMER2 cluster or DIOCRECLUSTER). Default kernel builds include the code.

Proof of concept

PoC source: findings/poc/DF-0117/

Build & run

# Requires a kdmsg/iocom connection to a target disk.
# 1. Establish connection (local: DIOCRECLUSTER with socket fd)
# 2. Open a BLK transaction
# 3. Send BLK_READ (no DELETE) to start I/O
# 4. Before I/O completes, send DELETE on the same state
# 5. Wait for diskiodone to fire on the freed state

Expected output

Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x...  (freed kdmsg_state_t memory)
panic: page fault

Impact

Remote/local kernel UAF. In a DragonFly cluster, any authenticated cluster peer can trigger this. The freed kdmsg_state_t is a small kmalloc object; its slab is predictable and can be groomed with attacker-controlled data between the free and the diskiodone dereference, making code execution plausible.

Suggested fix

Uncomment the hold/drop pair at all four dispatch sites and in the completion handler:

--- a/sys/kern/subr_diskiocom.c
+++ b/sys/kern/subr_diskiocom.c
@@ -372,7 +372,7 @@
        bio->bio_caller_info1.ptr = msg->state;
-       /* kdmsg_state_hold(msg->state); */
+       kdmsg_state_hold(msg->state);
@@ -451,7 +451,7 @@
        bio->bio_caller_info1.ptr = msg->state;
-       /* kdmsg_state_hold(msg->state); */
+       kdmsg_state_hold(msg->state);
@@ -503,7 +503,7 @@
        bio->bio_caller_info1.ptr = msg->state;
-       /* kdmsg_state_hold(msg->state); */
+       kdmsg_state_hold(msg->state);
@@ -554,7 +554,7 @@
        bio->bio_caller_info1.ptr = msg->state;
-       /* kdmsg_state_hold(msg->state); */
+       kdmsg_state_hold(msg->state);
@@ -661,7 +661,7 @@
-       /* kdmsg_state_drop(state); */
+       kdmsg_state_drop(state);

Additionally, free iost when its refcount hits zero in the eof path (see DF-0118), before dropping the state reference.

References

  • The commented-out hold/drop is a clear indication the developer knew the reference was needed but disabled it (possibly during debugging).
  • kdmsg_state_hold/kdmsg_state_drop are defined in sys/kern/kern_dmsg.c and manipulate state->refs.

Timeline

  • 2026-06-30 Discovered during automated audit.

PoC verification

Evidence pack

findings/poc/DF-0117 · 11 files

File             Type            Size    Description
trigger.c        trigger-source  7.6 KB  DIOCRECLUSTER iocom attach + forged DMSG BLK_READ wire messages + connection-teardown race (single-shot per process)
build.sh         build-script    391 B   cc -O2 -o trigger trigger.c -lpthread
run.sh           run-script      2.7 KB  pkill hammer2 + ./trigger <mode> <nreads> <preclose_us>; one mode per invocation (reset between modes)
build.log        build-log       67 B    final successful build, full output
run.log          run-log         337 B   decisive run: mode=1 (CREATE|DELETE eof=1), 256 reads, exit 0, no panic
run.mode2.log    run-log         416 B   mode=2 (CREATE then explicit DELETE), 512 reads, immediate close, exit 0
run.hipress.log  run-log         267 B   high-pressure mode=1, 1024 reads, immediate close, no panic, boot.log panic markers = 0
env.txt          environment     316 B   uname, cc version, post-run blk_active=0, no iocom threads, dmesg clean
VERDICT.md       verdict         9.8 KB  full narrative: line-by-line lifetime trace, why the UAF is not reachable, evidence table
README.md        readme          2.6 KB  summary, file list, build/run, expected output
manifest.json    manifest        2.8 KB  this catalog
README.md readme summary, file list, build/run, expected output

DF-0117: PoC (UAF on kdmsg state in diskiodone)

Finding: findings/DF-0117-diskiocom-uaf-kdmsg-state.md

Claim: in disk_blk_read/write/flush/freeblks the kdmsg state pointer is stored in bio->bio_caller_info1.ptr and dispatched to dev_dstrategy async with no kdmsg_state_hold (commented out at sys/kern/subr_diskiocom.c:375/451/503/554, drop at :661). If the peer sends DELETE or the connection drops while I/O is in flight, the kdmsg core frees the state and diskiodone (:577-662) dereferences freed memory → UAF.

Verdict: NOT REPRODUCED on master DEV. See VERDICT.md for the full line-by-line trace of why: the state's two topology references (rbtree + parent subq) are only dropped inside kdmsg_state_cleanuptx after a DELETE-bearing kernel reply is produced, and that reply is produced by diskiodone itself, so the state cannot be freed until after every diskiodone has finished using it. The commented-out hold/drop is redundant defensive code on this build, not the missing half of an active UAF. ~2800 diskiodone completions during aggressive teardown races (three protocol modes, 256-1024 in-flight reads, immediate socket close) produced no panic, no fault, no KKASSERT.

Files

file                             purpose
trigger.c                        self-contained trigger: DIOCRECLUSTER iocom attach + forged DMSG BLK_READ wire messages + connection-teardown race
build.sh                         cc -O2 -o trigger trigger.c -lpthread
run.sh                           kills hammer2 (frees the disk iocom), runs the trigger once for a given mode
build.log                        full untrimmed final build output
run.log                          decisive run (mode=1, 256 reads), full untrimmed output
run.mode2.log / run.hipress.log  extra stress runs (mode 2; 1024-read high-pressure)
env.txt                          guest uname, cc version, post-run blk_active / thread state
VERDICT.md                       full narrative: mechanism trace, why-not, evidence
manifest.json                    machine-readable catalog

Build & run (as root on the guest)

./build.sh
# each run consumes the disk iocom; reset the guest between modes
./run.sh <mode> <nreads> <preclose_us>
#   mode        0 = CREATE only (eof=0)
#               1 = CREATE|DELETE (eof=1)  -- exercises the DELETE-reply/free path
#               2 = CREATE then explicit DELETE msg on each state
#   nreads      default 256
#   preclose_us default 2000 (delay before socket close / teardown)

Expected on this kernel: trigger exits 0, guest stays up, dmesg clean, debug.blk_active returns to 0, no panic markers in dfbsd-qemu/boot.log. (A vulnerable kernel would panic in diskiodone / kdmsg_state_free / page-fault on freed slab memory.)

VERDICT.md verdict full narrative: line-by-line lifetime trace, why the UAF is not reachable, evidence table

DF-0117: VERDICT

Status: NOT REPRODUCED (the UAF as described is not reachable on this master DEV build; the commented-out kdmsg_state_hold/drop is redundant defensive code, not the missing half of an active UAF).

One-line

The diskiodone completion path dereferences the kdmsg_state_t * stored in bio->bio_caller_info1.ptr with no explicit state hold (sys/kern/subr_diskiocom.c:375/451/503/554 commented out), but on the audited kernel the state is already kept alive through diskiodone by its two topology references (rbtree + parent-subq), which are only dropped inside kdmsg_state_cleanuptx() after a DELETE-bearing kernel reply has been produced, and that reply is produced by diskiodone itself. So the state cannot reach refcount 0 (and thus cannot be kfree()'d) until after every diskiodone has already finished using it. Empirically: ~2800 diskiodone completions fired during aggressive connection-teardown races (three protocol modes, 256-1024 in-flight reads each, immediate socket close) with no panic, no page fault, no KKASSERT, no leaked slab warning; blk_active returns to 0 and the iocom teardown drains cleanly every time.

The claimed bug (and where the reasoning breaks down)

The finding asserts: peer sends DELETE / connection drops while disk I/O is in flight → kdmsg core frees the state → diskiodone dereferences freed state → UAF. The commented-out kdmsg_state_hold() at the four dispatch sites and kdmsg_state_drop() at subr_diskiocom.c:661 are cited as the fix.

Tracing the actual lifetime in sys/kern/kern_dmsg.c:

  1. State creation (kdmsg_state_msgrx, CREATE case, kern_dmsg.c:851) gives the new state two stable references, both independent of any driver hold:
     - kdmsg_state_hold(state); /* state on pstate->subq */ (kern_dmsg.c:909)
     - kdmsg_state_hold(state); /* state on rbtree */ (kern_dmsg.c:910)

  2. State freeing (kdmsg_state_free, kern_dmsg.c:1754) only runs when state->refs hits 0 in _kdmsg_state_drop (kern_dmsg.c:1748). Both topology refs are dropped together, only inside kdmsg_state_cleanuptx (kern_dmsg.c:1670-1707), and only when the message being cleaned up carries DMSGF_DELETE and state->rxcmd already carries DMSGF_DELETE:
     - state->txcmd |= DMSGF_DELETE; (kern_dmsg.c:1672)
     - RB_REMOVE(... staterd_tree, state); (kern_dmsg.c:1678-1683)
     - if (TAILQ_EMPTY(&state->subq)) kdmsg_subq_delete(state); (kern_dmsg.c:1704-1705)
     - kdmsg_state_drop(state); /* state on rbtree */ (kern_dmsg.c:1707)
     kdmsg_subq_delete then drops the subq ref at kern_dmsg.c:1301.

  3. A DELETE-bearing kernel reply is produced only by diskiodone, specifically when iost->eof is set and the last in-flight I/O completes (atomic_fetchadd_int(&iost->count,-1) == 1):
     - if (msg->any.head.cmd & DMSGF_DELETE) iost->eof = 1; (subr_diskiocom.c:378-379)
     - if (iost->eof) { if (atomic_fetchadd_int(&iost->count, -1) == 1) cmd |= DMSGF_DELETE; } (subr_diskiocom.c:637-639)
     - rmsg = kdmsg_msg_alloc(state, cmd, NULL, 0); (subr_diskiocom.c:650)
     - kdmsg_msg_write(rmsg); (subr_diskiocom.c:662)
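The "last completion emits DELETE" rule in step 3 can be sketched with C11 atomics, whose atomic_fetch_add returns the value before the add, as the trace implies for atomic_fetchadd_int. This is a userspace illustration under stated assumptions: io_iost, io_dispatch, io_complete, and the flag value are hypothetical, and both the increment at dispatch and the decrement when eof is clear are assumed rather than taken from the trace.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag value, chosen for this model only. */
#define IO_DMSGF_DELETE 0x1u

/* Hypothetical stand-in for struct dios_io: only the two fields used here. */
struct io_iost {
    atomic_int count;  /* I/Os still in flight on this transaction */
    bool       eof;    /* latched once a request carried DELETE */
};

/* Dispatch side: count one more in-flight I/O (assumed) and latch eof. */
static void io_dispatch(struct io_iost *iost, bool request_has_delete)
{
    atomic_fetch_add(&iost->count, 1);
    if (request_has_delete)
        iost->eof = true;
}

/*
 * Completion side: returns the flags for the reply message.
 * A previous value of 1 means this completion was the last one in flight;
 * only then, and only with eof latched, does the reply carry DELETE.
 */
static unsigned io_complete(struct io_iost *iost)
{
    unsigned cmd = 0;
    int prev = atomic_fetch_add(&iost->count, -1);

    assert(prev > 0);  /* a completion without a matching dispatch is a bug */
    if (iost->eof && prev == 1)
        cmd |= IO_DMSGF_DELETE;
    return cmd;
}
```

With three dispatched reads, the last of which latches eof, only the third completion returns the DELETE flag; the first two return 0.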

Putting (1)+(2)+(3) together: the state's refs can reach 0 only after a DELETE-bearing reply has been transmitted/drained, and that reply exists only after diskiodone has run. diskiodone dereferences state at :580 (kdmsg_state_t *state = bio->bio_caller_info1.ptr;), :582 (iost = state->any.any;), :635 (state->txcmd), :650/:652 (kdmsg_msg_alloc(state,...), state->iocom->mmsg), all before kdmsg_msg_write at :662. Even the synchronous free that can occur on the DYING branch of kdmsg_msg_write (kern_dmsg.c:1988-1991 → kdmsg_state_cleanuptx → kdmsg_state_free) runs inside that :662 call, i.e. after every diskiodone dereference. So there is no point at which diskiodone touches a freed state.
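The ordering argument above can be checked against a small single-threaded model. It is a deliberately simplified sketch, not the kernel logic: m_state and the m_* functions are hypothetical, the flag value is arbitrary, the subq-empty condition is omitted, and the model cannot represent concurrent completions on several CPUs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit for this model only. */
#define M_DELETE 0x1u

/* Hypothetical stand-in for kdmsg_state_t. */
struct m_state {
    int      refs;
    unsigned rxcmd;
    unsigned txcmd;
    bool     freed;
};

/* Creation: two topology references (parent subq + rbtree). */
static void m_create(struct m_state *st)
{
    st->refs = 2;
    st->rxcmd = 0;
    st->txcmd = 0;
    st->freed = false;
}

static void m_drop(struct m_state *st)
{
    assert(st->refs > 0);
    if (--st->refs == 0)
        st->freed = true;
}

/* Peer DELETE or simulated failure: only the receive side is marked. */
static void m_rx_delete(struct m_state *st)
{
    st->rxcmd |= M_DELETE;
}

/* Transmit cleanup: topology refs go only when both directions saw DELETE. */
static void m_cleanuptx(struct m_state *st, unsigned cmd)
{
    if ((cmd & M_DELETE) == 0)
        return;
    st->txcmd |= M_DELETE;
    if (st->rxcmd & M_DELETE) {
        m_drop(st);  /* subq reference */
        m_drop(st);  /* rbtree reference */
    }
}

/*
 * Completion: every use of the state happens before the reply is written.
 * Returns whether the state was still alive at the point of use.
 */
static bool m_iodone(struct m_state *st, bool last_io)
{
    bool alive_at_use = !st->freed;

    m_cleanuptx(st, last_io ? M_DELETE : 0);  /* models the reply write */
    return alive_at_use;
}
```

In the model, a receive-side DELETE alone leaves both references in place; the state is marked freed only inside the final completion, after that completion has already used it.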

The connection-drop teardown path (kdmsg_iocom_thread_wr, kern_dmsg.c:547-557 → kdmsg_simulate_failure(&state0, 0, ...)) does not force-free states either: it loops calling kdmsg_simulate_failure every hz/2, which via kdmsg_state_abort (kern_dmsg.c:1355) only simulates receiving a DELETE (setting rxcmd |= DELETE); it never bypasses the txcmd |= DELETE requirement. kdmsg_iocom_uninit (kern_dmsg.c:264) likewise waits for the reader and writer threads to exit (line 284), and the writer thread does not exit until the teardown loop has drained every state, which is gated on the in-flight I/O completing. No path short-circuits the refcount.

The narrow theoretical race that remains (two diskiodones on two CPUs on the same state, where the last one's synchronous DYING-branch free in kdmsg_msg_write overlaps the previous one's window between :580 and :662) requires (a) eof=1 so a DELETE reply is ever produced, (b) the state already DYING, and (c) true cross-CPU concurrency of two completions on one state. Even then the window is a handful of instructions with lock-acquisition serialization at :650. The PoC hammers exactly this (many I/Os per state, immediate teardown, up to 1024 in-flight) and never hits it. Whatever the commented-out hold/drop was originally meant to guard, on this build the rbtree+subq topology refs already provide the lifetime guarantee, so the hold/drop is redundant, not a missing fix.

Setup used (the iocom attach, solved; reused from DF-0017)

  1. pkill -9 -x hammer2: the userland hammer2 cluster daemon pre-connects every disk iocom at boot via DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898), leaving each disk iocom's reader blocked in fp_read() on a pipe; a follow-on DIOCRECLUSTER then deadlocks in kdmsg_iocom_reconnect() (kern_dmsg.c:141) waiting for the stuck reader. Killing the daemon breaks the pipes, the readers get EOF and exit (msgrd_td → NULL), and a fresh DIOCRECLUSTER succeeds. (The root-fs hammer2 mount has its own kernel iocom and is unaffected.)
  2. open("/dev/vbd0", O_RDWR) (root:operator crw-r-----; needs root/operator; unprivileged maxx cannot trigger the LOCAL vector).
  3. socketpair(AF_UNIX, SOCK_STREAM); one end drained by a pthread so the kernel writer never blocks.
  4. ioctl(diskfd, DIOCRECLUSTER, {fd=sv_kern}) (subr_diskiocom.c:116/141).
  5. Write forged DMSG_BLK_READ|CREATE[|DELETE] wire messages (128-byte extended header; receive path does not verify CRC, kern_dmsg.c:343); disk_rcvdmsg (subr_diskiocom.c:230) routes each to disk_blk_read which dispatches a real async dev_dstrategy on the root disk, storing bio->bio_caller_info1.ptr = msg->state with no hold.
  6. Close the socket → reader EOF → writer-thread kdmsg_simulate_failure(state0) teardown racing the in-flight diskiodones.
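Step 3 of the setup (a socketpair with one end drained by a thread) is portable POSIX and can be sketched on its own. The block below is an illustration, not an excerpt from trigger.c: drain_ctx, drain_main, and drain_start are hypothetical names, and the DragonFly-specific DIOCRECLUSTER ioctl and the forged DMSG messages are left out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper context; not taken from trigger.c. */
struct drain_ctx {
    int       fd;     /* user-side end of the socketpair */
    size_t    total;  /* bytes drained; read it only after pthread_join */
    pthread_t tid;
};

/* Read and discard until EOF or error so the peer's writer never blocks. */
static void *drain_main(void *arg)
{
    struct drain_ctx *ctx = arg;
    char buf[4096];
    ssize_t n;

    while ((n = read(ctx->fd, buf, sizeof(buf))) > 0)
        ctx->total += (size_t)n;
    return NULL;
}

/*
 * Create the pair and start the drain thread.  On success returns 0 and
 * stores in *peer_fd the end that a PoC would hand to the kernel.
 */
static int drain_start(struct drain_ctx *ctx, int *peer_fd)
{
    int sv[2];

    assert(ctx != NULL && peer_fd != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    ctx->fd = sv[0];
    ctx->total = 0;
    if (pthread_create(&ctx->tid, NULL, drain_main, ctx) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    *peer_fd = sv[1];
    return 0;
}
```

Closing the peer end delivers EOF to the drain thread, after which pthread_join returns and the drained byte count can be read safely. Build with -lpthread, as build.sh does for the trigger.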

Evidence (decisive)

Guest: DragonFly v6.5.0.1712.g89e6DEVELOPMENT (master DEV, X86_64_GENERIC), comconsole baked into the clean-install baseline so panics land in dfbsd-qemu/boot.log.

run              mode                             reads  preclose  result
run.log          1 (CREATE|DELETE, eof=1)         256    2000 µs   trigger exit 0, guest up, dmesg clean, blk_active→0
run.mode2.log    2 (CREATE then explicit DELETE)  512    0 µs      trigger exit 0, guest up, dmesg clean
run.hipress.log  1 (eof=1)                        1024   0 µs      no panic, guest up, boot.log panic-marker count = 0

(Plus the initial mode-0 run of 256 reads, also clean.) Across all runs, ~2800 diskiodone completions fired during active teardown races with no panic, no page fault, no KKASSERT, no slab warning. debug.blk_active returns to 0 every time, confirming every dispatched bio completed through diskiodone (the alleged UAF site) without faulting.

No panic.txt is written because there is no panic.

Exploit chain

Not applicable: no memory-corruption primitive manifested, so there is no chain to develop. If the narrow cross-CPU race described above were reachable (which this verification could not trigger), the primitive would be a kdmsg_state_t UAF in a small kmalloc bucket; grooming + conversion would follow the usual slab-adjacency path. But that is speculative given the non-reproduction.

What changed vs the finding's PoC scaffold

The finding shipped no PoC directory (findings/poc/DF-0117/ did not exist). This verification wrote a fully self-contained trigger (trigger.c) plus build.sh / run.sh that reuse DF-0017's solved iocom attach (hammer2 kill + DIOCRECLUSTER + socketpair + drain thread) and drive real BLK_READ transactions through diskiodone, then race connection teardown against the in-flight completions in three protocol modes. The trigger is a genuine attempt to make the bug fire; it does not.

Recommendation

The commented-out kdmsg_state_hold/drop in subr_diskiocom.c should be either deleted (as confirmed-redundant dead code) or uncommented (as defense-in-depth that costs nothing). Either way, DF-0117 as filed, "UAF on kdmsg state in diskiodone → reliable kernel panic / exploitable memory corruption," is not a live vulnerability on master DEV. Downgrade from High. (The iost-leak referenced as DF-0118 is a separate question.)

Reproduce

./dfbsd-qemu/vm.sh reset && ./dfbsd-qemu/vm.sh up 90          # fresh guest
ssh -F dfbsd-qemu/config dfbsd 'mkdir -p /root/poc/DF-0117'
scp -F dfbsd-qemu/config -r findings/poc/DF-0117/{trigger.c,build.sh,run.sh} \
    dfbsd:/root/poc/DF-0117/
ssh -F dfbsd-qemu/config dfbsd 'cd /root/poc/DF-0117 && ./build.sh && ./run.sh 1 256 2000'
# expect: trigger exits 0, guest stays up, no panic in dfbsd-qemu/boot.log

Confirmed kernel references

Detail

Evidence (decisive lines)

Decisive: run.log (mode=1 eof=1, 256 reads) -> '[done] 256 reads dispatched; no panic observed by userspace', TRIGGER_EXIT=0, guest up. run.mode2.log (mode=2, 512 reads + 512 DELETEs, immediate close) -> exit 0. run.hipress.log (mode=1, 1024 reads, immediate close) -> no panic; boot.log panic-marker grep = 0. env.txt: debug.blk_active=0, no vbd0-msg threads (teardown drained), dmesg clean (no kdmsg/fault/assert). No panic.txt because no panic occurred.

PoC changes

Finding shipped no PoC directory (findings/poc/DF-0117/ did not exist). Wrote a fully self-contained trigger.c plus build.sh/run.sh/VERDICT.md/README.md/manifest.json. The trigger reuses DF-0017's solved iocom-attach (hammer2 kill + DIOCRECLUSTER + AF_UNIX socketpair + drain thread) and drives real BLK_READ transactions through diskiodone, then races connection teardown against in-flight completions. Three race modes: 0=CREATE(eof=0), 1=CREATE|DELETE(eof=1, exercises the DELETE-reply/free path the finding claims frees the state), 2=CREATE then explicit DELETE. Single-shot per process to avoid the reconnect deadlock DF-0017 documented.

Verdict

NOT REPRODUCED. The finding shipped no PoC; I built a self-contained trigger (DIOCRECLUSTER iocom attach reused from DF-0017 + forged DMSG BLK_READ wire messages + connection-teardown race) that drove real disk I/O through diskiodone and raced it against state teardown in three protocol modes (eof=0, eof=1, CREATE+explicit-DELETE). ~2800 diskiodone completions fired during aggressive teardown races (256-1024 in-flight reads, immediate socket close) with no panic, no page fault, no KKASSERT, blk_active->0, clean iocom drain every time. Line-by-line trace of sys/kern/kern_dmsg.c shows why: the state's two topology refs (kdmsg_state_hold at :909 for parent subq, :910 for rbtree) are only dropped inside kdmsg_state_cleanuptx (:1707 + kdmsg_subq_delete :1301), and cleanuptx runs only after a DELETE-bearing kernel reply is produced -- which diskiodone itself produces (subr_diskiocom.c:637-639 -> :662). So the state cannot be kfree'd (refs cannot hit 0, _kdmsg_state_drop kern_dmsg.c:1748) until after every diskiodone has finished using it (:580/:582/:635/:650 all precede the :662 kdmsg_msg_write whose DYING-branch synchronous free at kern_dmsg.c:1988 is the earliest possible free). The commented-out kdmsg_state_hold/drop is redundant defensive code on this build, not the missing half of an active UAF.