UAF on kdmsg state in diskiodone: state refcount not held across async I/O
| Field | Value |
|---|---|
| ID | DF-0117 |
| Status | new |
| Severity | High |
| CVSS 3.1 | CVSS:3.1/AV:A/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H |
| CWE | CWE-416 Use After Free |
| File | sys/kern/subr_diskiocom.c |
| Lines | 372-662 |
| Area | kern |
| Confidence | likely |
| Discovered | 2026-06-30 |
| Reported | pending |
Summary
In disk_blk_read/write/flush/freeblks, the kdmsg state pointer is
stored in bio->bio_caller_info1.ptr and dispatched to dev_dstrategy
for async I/O, but no reference is taken on the state: the
kdmsg_state_hold() and matching kdmsg_state_drop() calls are
commented out (:375, :451, :503, :554, :661). If the kdmsg
peer sends a DELETE or the connection drops while I/O is in flight,
the kdmsg core frees the state. When diskiodone fires on I/O
completion, it dereferences the freed state (:580, :582),
producing a use-after-free on a small kmalloc object.
Root cause
All four I/O dispatch sites follow the same pattern (:372-381):
bio->bio_caller_info1.ptr = msg->state;
/* kdmsg_state_hold(msg->state); */ // <-- COMMENTED OUT
...
dev_dstrategy(dp->d_rawdev, bio);
The completion handler (:577-662) dereferences it unconditionally:
kdmsg_state_t *state = bio->bio_caller_info1.ptr; // :580
struct dios_io *iost = state->any.any; // :582 <-- UAF
...
kdmsg_msg_alloc(state, ...); // :650 <-- UAF
...
/* kdmsg_state_drop(state); */ // :661 <-- COMMENTED OUT
The kdmsg core frees states when both rxcmd and txcmd carry
DMSGF_DELETE (kern_dmsg.c:1678-1707). The peer controls when to
send DELETE. If the I/O has not completed, the state is freed while
the bio still holds a raw pointer to it. Connection drop
(kdmsg_state_abort) also walks the state tree and drops all
references.
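The hold/drop discipline the finding says is missing can be illustrated with a small userspace model. This is only a sketch: `struct state`, `struct request`, `dispatch()` and `complete()` are hypothetical stand-ins for `kdmsg_state_t`, `struct bio`, the `disk_blk_*` dispatch sites and `diskiodone`, and the counter is a plain `int` rather than the kernel's atomic `state->refs`.

```c
#include <stdlib.h>

/* Hypothetical userspace stand-in for kdmsg_state_t; not the kernel struct. */
struct state {
    int refs;    /* reference count (plain int: single-threaded model) */
    int *freed;  /* set to 1 when the last reference is dropped */
};

static void state_hold(struct state *st) { st->refs++; }

/* Drop one reference; free the object when the count reaches zero. */
static void state_drop(struct state *st)
{
    if (--st->refs == 0) {
        *st->freed = 1;
        free(st);
    }
}

/* Stand-in for struct bio: carries the raw state pointer to the completion. */
struct request {
    struct state *st;
};

/* Dispatch side: take a reference on behalf of the in-flight request. */
static void dispatch(struct request *rq, struct state *st)
{
    rq->st = st;
    state_hold(st);
}

/* Completion side: the state is live here because dispatch() holds a
 * reference; release that reference once the completion is done with it. */
static void complete(struct request *rq)
{
    struct state *st = rq->st;

    rq->st = NULL;
    state_drop(st);
}

/* Runs "dispatch, owner drops early, completion fires".
 * Returns 1 if the state was alive at completion and freed afterwards. */
static int demo_hold_across_async(void)
{
    int freed = 0;
    int alive_at_completion;
    struct request rq;
    struct state *st = malloc(sizeof(*st));

    if (st == NULL)
        return -1;
    st->refs = 1;            /* owner's reference */
    st->freed = &freed;

    dispatch(&rq, st);       /* refs == 2 */
    state_drop(st);          /* owner goes away early; refs == 1 */
    alive_at_completion = !freed;
    complete(&rq);           /* refs == 0, object freed here */

    return alive_at_completion && freed;
}
```

In this model the request's own reference is what keeps the object alive after the owner drops its reference. Without the `state_hold()` in `dispatch()`, the early `state_drop()` would free the object before `complete()` runs.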
Threat model & preconditions
- Attacker position: kdmsg cluster peer (authenticated via the
cluster protocol), or local privileged user who issued
DIOCRECLUSTER on a disk device node.
- Impact: Use-after-free on kdmsg_state_t (small kmalloc object). Reliable kernel panic (DoS). Potentially exploitable for kernel code execution via slab grooming, since the attacker controls the timing between free and reuse.
- Required config: kdmsg/iocom active (HAMMER2 cluster or
DIOCRECLUSTER). Default kernel builds include the code.
Proof of concept
PoC source: findings/poc/DF-0117/
Build & run
# Requires a kdmsg/iocom connection to a target disk.
# 1. Establish connection (local: DIOCRECLUSTER with socket fd)
# 2. Open a BLK transaction
# 3. Send BLK_READ (no DELETE) to start I/O
# 4. Before I/O completes, send DELETE on the same state
# 5. Wait for diskiodone to fire on the freed state
Expected output
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x... (freed kdmsg_state_t memory)
panic: page fault
Impact
Remote/local kernel UAF. In a DragonFly cluster, any authenticated
cluster peer can trigger this. The freed kdmsg_state_t is a small
kmalloc object; its slab is predictable and can be groomed with
attacker-controlled data between the free and the diskiodone
dereference, making code execution plausible.
Recommended fix
Uncomment the hold/drop pair in all four dispatch sites and the completion handler:
--- a/sys/kern/subr_diskiocom.c
+++ b/sys/kern/subr_diskiocom.c
@@ -372,7 +372,7 @@
bio->bio_caller_info1.ptr = msg->state;
- /* kdmsg_state_hold(msg->state); */
+ kdmsg_state_hold(msg->state);
@@ -451,7 +451,7 @@
bio->bio_caller_info1.ptr = msg->state;
- /* kdmsg_state_hold(msg->state); */
+ kdmsg_state_hold(msg->state);
@@ -503,7 +503,7 @@
bio->bio_caller_info1.ptr = msg->state;
- /* kdmsg_state_hold(msg->state); */
+ kdmsg_state_hold(msg->state);
@@ -554,7 +554,7 @@
bio->bio_caller_info1.ptr = msg->state;
- /* kdmsg_state_hold(msg->state); */
+ kdmsg_state_hold(msg->state);
@@ -661,7 +661,7 @@
- /* kdmsg_state_drop(state); */
+ kdmsg_state_drop(state);
Additionally, free iost when its refcount hits zero in the eof path
(see DF-0118), before dropping the state reference.
References
- The commented-out hold/drop is a clear indication the developer knew the reference was needed but disabled it (possibly during debugging).
- kdmsg_state_hold/kdmsg_state_drop are defined in sys/kern/kern_dmsg.c and manipulate state->refs.
Timeline
- 2026-06-30 Discovered during automated audit.
PoC verification
Evidence pack
findings/poc/DF-0117 · 11 files

| File | Type | Description | Size |
|---|---|---|---|
| trigger.c | trigger-source | DIOCRECLUSTER iocom attach + forged DMSG BLK_READ wire messages + connection-teardown race (single-shot per process) | 7.6 KB |
| build.sh | build-script | cc -O2 -o trigger trigger.c -lpthread | 391 B |
| run.sh | run-script | pkill hammer2 + ./trigger <mode> <nreads> <preclose_us>; one mode per invocation (reset between modes) | 2.7 KB |
| build.log | build-log | final successful build, full output | 67 B |
| run.log | run-log | decisive run: mode=1 (CREATE\|DELETE eof=1), 256 reads, exit 0, no panic | 337 B |
| run.mode2.log | run-log | mode=2 (CREATE then explicit DELETE), 512 reads, immediate close, exit 0 | 416 B |
| run.hipress.log | run-log | high-pressure mode=1, 1024 reads, immediate close, no panic, boot.log panic markers = 0 | 267 B |
| env.txt | environment | uname, cc version, post-run blk_active=0, no iocom threads, dmesg clean | 316 B |
| VERDICT.md | verdict | full narrative: line-by-line lifetime trace, why the UAF is not reachable, evidence table | 9.8 KB |
| README.md | readme | summary, file list, build/run, expected output | 2.6 KB |
| manifest.json | manifest | this catalog | 2.8 KB |
DF-0117: PoC (UAF on kdmsg state in diskiodone)
Finding: findings/DF-0117-diskiocom-uaf-kdmsg-state.md
Claim: in disk_blk_read/write/flush/freeblks the kdmsg state pointer is
stored in bio->bio_caller_info1.ptr and dispatched to dev_dstrategy
async with no kdmsg_state_hold (commented out at
sys/kern/subr_diskiocom.c:375/451/503/554, drop at :661). If the peer
sends DELETE or the connection drops while I/O is in flight, the kdmsg core
frees the state and diskiodone (:577-662) dereferences freed memory →
UAF.
Verdict: NOT REPRODUCED on master DEV. See VERDICT.md for the full
line-by-line trace of why: the state's two topology references (rbtree +
parent subq) are only dropped inside kdmsg_state_cleanuptx after a
DELETE-bearing kernel reply is produced, and that reply is produced by
diskiodone itself, so the state cannot be freed until after every
diskiodone has finished using it. The commented-out hold/drop is
redundant defensive code on this build, not the missing half of an active
UAF. ~2800 diskiodone completions during aggressive teardown races (three
protocol modes, 256-1024 in-flight reads, immediate socket close) produced
no panic, no fault, no KKASSERT.
Files
| file | purpose |
|---|---|
| trigger.c | self-contained trigger: DIOCRECLUSTER iocom attach + forged DMSG BLK_READ wire messages + connection-teardown race |
| build.sh | cc -O2 -o trigger trigger.c -lpthread |
| run.sh | kills hammer2 (frees the disk iocom), runs the trigger once for a given mode |
| build.log | full untrimmed final build output |
| run.log | decisive run (mode=1, 256 reads), full untrimmed output |
| run.mode2.log / run.hipress.log | extra stress runs (mode 2; 1024-read high-pressure) |
| env.txt | guest uname, cc version, post-run blk_active / thread state |
| VERDICT.md | full narrative: mechanism trace, why-not, evidence |
| manifest.json | machine-readable catalog |
Build & run (as root on the guest)
./build.sh
# each run consumes the disk iocom; reset the guest between modes
./run.sh <mode> <nreads> <preclose_us>
# mode 0 = CREATE only (eof=0)
#      1 = CREATE|DELETE (eof=1) -- exercises the DELETE-reply/free path
#      2 = CREATE then explicit DELETE msg on each state
# nreads default 256
# preclose_us default 2000 (delay before socket close / teardown)
Expected on this kernel: trigger exits 0, guest stays up, dmesg clean,
debug.blk_active returns to 0, no panic markers in dfbsd-qemu/boot.log.
(A vulnerable kernel would panic in diskiodone / kdmsg_state_free /
page-fault on freed slab memory.)
DF-0117: VERDICT
Status: NOT REPRODUCED (the UAF as described is not reachable on this
master DEV build; the commented-out kdmsg_state_hold/drop is redundant
defensive code, not the missing half of an active UAF).
One-line
The diskiodone completion path dereferences kdmsg_state_t *
bio->bio_caller_info1.ptr with no explicit state hold
(sys/kern/subr_diskiocom.c:375/451/503/554 commented out), but on the
audited kernel the state is already kept alive through diskiodone by
its two topology references (rbtree + parent-subq), which are only dropped
inside kdmsg_state_cleanuptx() after a DELETE-bearing kernel reply has
been produced, and that reply is produced by diskiodone itself. So
the state cannot reach refcount 0 (and thus cannot be kfree()'d) until
after every diskiodone has already finished using it. Empirically: ~2800
diskiodone completions fired during aggressive connection-teardown races
(three protocol modes, 256-1024 in-flight reads each, immediate socket
close) with no panic, no page fault, no KKASSERT, no leaked slab
warning; blk_active returns to 0 and the iocom teardown drains cleanly
every time.
The claimed bug (and where the reasoning breaks down)
The finding asserts: peer sends DELETE / connection drops while disk I/O is
in flight → kdmsg core frees the state → diskiodone dereferences freed
state → UAF. The commented-out kdmsg_state_hold() at the four dispatch
sites and kdmsg_state_drop() at subr_diskiocom.c:661 are cited as the
fix.
Tracing the actual lifetime in sys/kern/kern_dmsg.c:
1. State creation (kdmsg_state_msgrx, CREATE case, kern_dmsg.c:851) gives the new state two stable references, both independent of any driver hold:
   - kdmsg_state_hold(state); /* state on pstate->subq */ (kern_dmsg.c:909)
   - kdmsg_state_hold(state); /* state on rbtree */ (kern_dmsg.c:910)
2. State freeing (kdmsg_state_free, kern_dmsg.c:1754) only runs when state->refs hits 0 in _kdmsg_state_drop (kern_dmsg.c:1748). Both topology refs are dropped together, only inside kdmsg_state_cleanuptx (kern_dmsg.c:1670-1707), and only when the message being cleaned up carries DMSGF_DELETE and state->rxcmd already carries DMSGF_DELETE:
   - state->txcmd |= DMSGF_DELETE; (kern_dmsg.c:1672)
   - RB_REMOVE(... staterd_tree, state); (kern_dmsg.c:1678-1683)
   - if (TAILQ_EMPTY(&state->subq)) kdmsg_subq_delete(state); (kern_dmsg.c:1704-1705)
   - kdmsg_state_drop(state); /* state on rbtree */ (kern_dmsg.c:1707)

   (kdmsg_subq_delete then drops the subq ref at kern_dmsg.c:1301.)
3. A DELETE-bearing kernel reply is produced only by diskiodone, specifically when iost->eof is set and the last in-flight I/O completes (atomic_fetchadd_int(&iost->count,-1) == 1):
   - if (msg->any.head.cmd & DMSGF_DELETE) iost->eof = 1; (subr_diskiocom.c:378-379)
   - if (iost->eof) { if (atomic_fetchadd_int(&iost->count, -1) == 1) cmd |= DMSGF_DELETE; } (subr_diskiocom.c:637-639)
   - rmsg = kdmsg_msg_alloc(state, cmd, NULL, 0); (subr_diskiocom.c:650)
   - kdmsg_msg_write(rmsg); (subr_diskiocom.c:662)
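The eof/count rule quoted above (only the completion that sees the counter at 1 before decrementing adds DELETE to its reply) can be sketched in portable C11. This is a model under assumptions, not the kernel code: `struct io_track`, `MODEL_DELETE` and both functions are hypothetical names, C11 `atomic_fetch_add` stands in for `atomic_fetchadd_int`, and only the `eof` branch shown in the trace is modelled.

```c
#include <stdatomic.h>

#define MODEL_DELETE 0x1u   /* hypothetical flag standing in for DMSGF_DELETE */

/* Hypothetical stand-in for struct dios_io. */
struct io_track {
    atomic_int count;  /* I/Os still in flight */
    int eof;           /* set once a DELETE has been seen from the peer */
};

/* Returns the reply flags for one completion. Only the completion that
 * observes count == 1 before the decrement (the last one) adds DELETE. */
static unsigned complete_one(struct io_track *t)
{
    unsigned cmd = 0;

    if (t->eof) {
        /* atomic_fetch_add returns the value held before the addition */
        if (atomic_fetch_add(&t->count, -1) == 1)
            cmd |= MODEL_DELETE;
    }
    return cmd;
}

/* Completes `inflight` I/Os in sequence and counts DELETE-bearing replies. */
static int count_delete_replies(int inflight)
{
    struct io_track t;
    int deletes = 0;

    atomic_init(&t.count, inflight);
    t.eof = 1;
    for (int i = 0; i < inflight; i++) {
        if (complete_one(&t) & MODEL_DELETE)
            deletes++;
    }
    return deletes;
}
```

Because the fetch-and-add is atomic, exactly one completion can observe the value 1, so at most one reply per state carries DELETE regardless of how many I/Os were in flight.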
Putting (1)+(2)+(3) together: the state's refs can reach 0 only after a
DELETE-bearing reply has been transmitted/drained, and that reply exists
only after diskiodone has run. diskiodone dereferences state at
:580 (kdmsg_state_t *state = bio->bio_caller_info1.ptr;), :582
(iost = state->any.any;), :635 (state->txcmd), :650/:652
(kdmsg_msg_alloc(state,...), state->iocom->mmsg), all before
kdmsg_msg_write at :662. Even the synchronous free that can occur on
the DYING branch of kdmsg_msg_write (kern_dmsg.c:1988-1991 →
kdmsg_state_cleanuptx → kdmsg_state_free) runs inside that
:662 call, i.e. after every diskiodone dereference. So there is no
point at which diskiodone touches a freed state.
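The gating argument can be restated as a tiny state model. The sketch below is an illustration of the reasoning in this trace, not a transcription of kern_dmsg.c: `struct model_state`, `FLAG_DELETE` and the `model_*` functions are hypothetical, and only the ordering discussed here (receive-side DELETE first, transmit-side DELETE second) is covered.

```c
#include <stdbool.h>

#define FLAG_DELETE 0x1u   /* hypothetical stand-in for DMSGF_DELETE */

/* Hypothetical model of the two topology references described above. */
struct model_state {
    int refs;         /* 2 after creation: parent subq + rbtree */
    unsigned rxcmd;   /* flags seen from the peer */
    unsigned txcmd;   /* flags sent by the kernel side */
    bool freed;
};

static void model_create(struct model_state *s)
{
    s->refs = 2;
    s->rxcmd = 0;
    s->txcmd = 0;
    s->freed = false;
}

/* Peer DELETE, or a simulated one on connection drop: only marks rxcmd. */
static void model_rx_delete(struct model_state *s)
{
    s->rxcmd |= FLAG_DELETE;
}

/* Transmit-side cleanup for a reply carrying `cmd`. The topology refs are
 * released only when the reply carries DELETE and rxcmd already has it. */
static void model_cleanuptx(struct model_state *s, unsigned cmd)
{
    if (!(cmd & FLAG_DELETE))
        return;
    s->txcmd |= FLAG_DELETE;
    if ((s->rxcmd & FLAG_DELETE) && !s->freed) {
        s->refs -= 2;             /* rbtree ref + subq ref */
        if (s->refs == 0)
            s->freed = true;
    }
}
```

In this model a receive-side DELETE alone never frees the state. The free happens only inside the transmit-side cleanup of a DELETE-bearing reply, which in the traced code is issued from the completion handler itself.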
The connection-drop teardown path (kdmsg_iocom_thread_wr,
kern_dmsg.c:547-557 → kdmsg_simulate_failure(&state0, 0, ...)) does
not force-free states either: it loops calling
kdmsg_simulate_failure every hz/2, which via kdmsg_state_abort
(kern_dmsg.c:1355) only simulates receiving a DELETE (setting
rxcmd |= DELETE); it never bypasses the txcmd |= DELETE requirement.
kdmsg_iocom_uninit (kern_dmsg.c:264) likewise waits for the reader and
writer threads to exit (line 284), and the writer thread does not exit until
the teardown loop has drained every state, which is gated on the in-flight
I/O completing. No path short-circuits the refcount.
The narrow theoretical race that remains (two diskiodones on two CPUs on
the same state, where the last one's synchronous DYING-branch free in
kdmsg_msg_write overlaps the previous one's window between :580 and
:662) requires (a) eof=1 so a DELETE reply is ever produced, (b) the
state already DYING, and (c) true cross-CPU concurrency of two completions
on one state, and even then the window is a handful of instructions with
lock-acquisition serialization at :650. The PoC hammers exactly this
(many I/Os per state, immediate teardown, up to 1024 in-flight) and never
hits it. Whatever the commented-out hold/drop was originally meant to
guard, on this build the rbtree+subq topology refs already provide the
lifetime guarantee, so the hold/drop is redundant, not a missing fix.
Setup used (the iocom attach, solved; reused from DF-0017)
1. pkill -9 -x hammer2: the userland hammer2 cluster daemon pre-connects every disk iocom at boot via DIOCRECLUSTER (sbin/hammer2/cmd_service.c:898), leaving each disk iocom's reader blocked in fp_read() on a pipe; a follow-on DIOCRECLUSTER then deadlocks in kdmsg_iocom_reconnect() (kern_dmsg.c:141) waiting for the stuck reader. Killing the daemon breaks the pipes, the readers get EOF and exit (msgrd_td → NULL), and a fresh DIOCRECLUSTER succeeds. (The root-fs hammer2 mount has its own kernel iocom and is unaffected.)
2. open("/dev/vbd0", O_RDWR) (root:operator crw-r-----; needs root/operator; unprivileged maxx cannot trigger the LOCAL vector).
3. socketpair(AF_UNIX, SOCK_STREAM); one end drained by a pthread so the kernel writer never blocks.
4. ioctl(diskfd, DIOCRECLUSTER, {fd=sv_kern}) (subr_diskiocom.c:116/141).
5. Write forged DMSG_BLK_READ|CREATE[|DELETE] wire messages (128-byte extended header; receive path does not verify CRC, kern_dmsg.c:343); disk_rcvdmsg (subr_diskiocom.c:230) routes each to disk_blk_read which dispatches a real async dev_dstrategy on the root disk, storing bio->bio_caller_info1.ptr = msg->state with no hold.
6. Close the socket → reader EOF → writer-thread kdmsg_simulate_failure(state0) teardown racing the in-flight diskiodones.
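The socketpair-and-drain-thread part of the setup can be sketched with plain POSIX calls. This is not taken from trigger.c: `struct drain_ctx`, `drain_thread()` and `drain_demo()` are hypothetical, the DragonFly-specific DIOCRECLUSTER ioctl is deliberately left out, and a local writer stands in for the kernel end of the pair.

```c
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper state; not taken from trigger.c. */
struct drain_ctx {
    int fd;          /* end of the socketpair the thread reads from */
    size_t total;    /* bytes discarded so far */
};

/* Read and discard until EOF so the writer on the other end never blocks. */
static void *drain_thread(void *arg)
{
    struct drain_ctx *ctx = arg;
    char buf[4096];
    ssize_t n;

    while ((n = read(ctx->fd, buf, sizeof(buf))) > 0)
        ctx->total += (size_t)n;
    return NULL;
}

/* Creates the pair, starts the drain thread on sv[0], pushes `len` bytes
 * through sv[1], closes sv[1] to signal EOF, and returns the number of
 * bytes drained (or -1 on setup error). */
static long drain_demo(size_t len)
{
    int sv[2];
    pthread_t tid;
    struct drain_ctx ctx;
    char byte = 'x';

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    ctx.fd = sv[0];
    ctx.total = 0;
    if (pthread_create(&tid, NULL, drain_thread, &ctx) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        if (write(sv[1], &byte, 1) != 1)
            break;
    }
    close(sv[1]);               /* EOF for the drain thread */
    pthread_join(tid, NULL);    /* join also makes ctx.total safe to read */
    close(sv[0]);
    return (long)ctx.total;
}
```

Closing the writing end is what lets the drain thread see EOF and exit, which mirrors the "close the socket → reader EOF" step of the teardown race. On older toolchains the sketch needs `-lpthread`, as in build.sh.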
Evidence (decisive)
Guest: DragonFly v6.5.0.1712.g89e6DEVELOPMENT (master DEV, X86_64_GENERIC),
comconsole baked into the clean-install baseline so panics land in
dfbsd-qemu/boot.log.
| run | mode | reads | preclose | result |
|---|---|---|---|---|
| run.log | 1 (CREATE\|DELETE, eof=1) | 256 | 2000 µs | trigger exit 0, guest up, dmesg clean, blk_active→0 |
| run.mode2.log | 2 (CREATE then explicit DELETE) | 512 | 0 µs | trigger exit 0, guest up, dmesg clean |
| run.hipress.log | 1 (eof=1) | 1024 | 0 µs | no panic, guest up, boot.log panic-marker count = 0 |
(Plus the initial mode-0 run of 256 reads, also clean.) Across all runs,
~2800 diskiodone completions fired during active teardown races with no
panic, no page fault, no KKASSERT, no slab warning. debug.blk_active
returns to 0 every time, confirming every dispatched bio completed through
diskiodone (the alleged UAF site) without faulting.
No panic.txt is written because there is no panic.
Exploit chain
Not applicable: no memory-corruption primitive manifested, so there is no
chain to develop. If the narrow cross-CPU race described above were
reachable (which this verification could not trigger), the primitive would
be a kdmsg_state_t UAF in a small kmalloc bucket; grooming + conversion
would follow the usual slab-adjacency path. But that is speculative given
the non-reproduction.
What changed vs the finding's PoC scaffold
The finding shipped no PoC directory (findings/poc/DF-0117/ did not
exist). This verification wrote a fully self-contained trigger
(trigger.c) plus build.sh / run.sh that reuse DF-0017's solved iocom
attach (hammer2 kill + DIOCRECLUSTER + socketpair + drain thread) and
drive real BLK_READ transactions through diskiodone, then race
connection teardown against the in-flight completions in three protocol
modes. The trigger is a genuine attempt to make the bug fire; it does not.
Recommendation
The commented-out kdmsg_state_hold/drop in subr_diskiocom.c should be
either deleted (as confirmed-redundant dead code) or uncommented (as
defense-in-depth that costs nothing), but DF-0117 as filed, "UAF on
kdmsg state in diskiodone → reliable kernel panic / exploitable
memory corruption," is not a live vulnerability on master DEV. Downgrade
from High. (The iost-leak referenced as DF-0118 is a separate question.)
Reproduce
./dfbsd-qemu/vm.sh reset && ./dfbsd-qemu/vm.sh up 90 # fresh guest
ssh -F dfbsd-qemu/config dfbsd 'mkdir -p /root/poc/DF-0117'
scp -F dfbsd-qemu/config -r findings/poc/DF-0117/{trigger.c,build.sh,run.sh} \
dfbsd:/root/poc/DF-0117/
ssh -F dfbsd-qemu/config dfbsd 'cd /root/poc/DF-0117 && ./build.sh && ./run.sh 1 256 2000'
# expect: trigger exits 0, guest stays up, no panic in dfbsd-qemu/boot.log
Confirmed kernel references
- sys/kern/subr_diskiocom.c:375
- sys/kern/subr_diskiocom.c:451
- sys/kern/subr_diskiocom.c:503
- sys/kern/subr_diskiocom.c:554
- sys/kern/subr_diskiocom.c:580
- sys/kern/subr_diskiocom.c:582
- sys/kern/subr_diskiocom.c:637
- sys/kern/subr_diskiocom.c:650
- sys/kern/subr_diskiocom.c:661
- sys/kern/subr_diskiocom.c:662
- sys/kern/kern_dmsg.c:909
- sys/kern/kern_dmsg.c:910
- sys/kern/kern_dmsg.c:1670
- sys/kern/kern_dmsg.c:1707
- sys/kern/kern_dmsg.c:1748
- sys/kern/kern_dmsg.c:1988
Detail
Evidence (decisive lines)
Decisive: run.log (mode=1 eof=1, 256 reads) -> '[done] 256 reads dispatched; no panic observed by userspace', TRIGGER_EXIT=0, guest up. run.mode2.log (mode=2, 512 reads + 512 DELETEs, immediate close) -> exit 0. run.hipress.log (mode=1, 1024 reads, immediate close) -> no panic; boot.log panic-marker grep = 0. env.txt: debug.blk_active=0, no vbd0-msg threads (teardown drained), dmesg clean (no kdmsg/fault/assert). No panic.txt because no panic occurred.
PoC changes
Finding shipped no PoC directory (findings/poc/DF-0117/ did not exist). Wrote a fully self-contained trigger.c plus build.sh/run.sh/VERDICT.md/README.md/manifest.json. The trigger reuses DF-0017's solved iocom-attach (hammer2 kill + DIOCRECLUSTER + AF_UNIX socketpair + drain thread) and drives real BLK_READ transactions through diskiodone, then races connection teardown against in-flight completions. Three race modes: 0=CREATE(eof=0), 1=CREATE|DELETE(eof=1, exercises the DELETE-reply/free path the finding claims frees the state), 2=CREATE then explicit DELETE. Single-shot per process to avoid the reconnect deadlock DF-0017 documented.
Verdict
NOT REPRODUCED. The finding shipped no PoC; I built a self-contained trigger (DIOCRECLUSTER iocom attach reused from DF-0017 + forged DMSG BLK_READ wire messages + connection-teardown race) that drove real disk I/O through diskiodone and raced it against state teardown in three protocol modes (eof=0, eof=1, CREATE+explicit-DELETE). ~2800 diskiodone completions fired during aggressive teardown races (256-1024 in-flight reads, immediate socket close) with no panic, no page fault, no KKASSERT, blk_active->0, clean iocom drain every time. Line-by-line trace of sys/kern/kern_dmsg.c shows why: the state's two topology refs (kdmsg_state_hold at :909 for parent subq, :910 for rbtree) are only dropped inside kdmsg_state_cleanuptx (:1707 + kdmsg_subq_delete :1301), and cleanuptx runs only after a DELETE-bearing kernel reply is produced -- which diskiodone itself produces (subr_diskiocom.c:637-639 -> :662). So the state cannot be kfree'd (refs cannot hit 0, _kdmsg_state_drop kern_dmsg.c:1748) until after every diskiodone has finished using it (:580/:582/:635/:650 all precede the :662 kdmsg_msg_write whose DYING-branch synchronous free at kern_dmsg.c:1988 is the earliest possible free). The commented-out kdmsg_state_hold/drop is redundant defensive code on this build, not the missing half of an active UAF.