# DF-0107 — VERDICT: REPRODUCED (kernel panic, probabilistic DoS)

## One-line verdict

**REPRODUCED.** The missing `d_npartitions > MAXPARTITIONS32` guard in
`l32_setdisklabel` (`sys/kern/subr_disklabel32.c:264-265`) is confirmed in the
audited master DEV source, and the resulting unbounded `dkcksum32` walk
(`sys/sys/disklabel32.h:157`) panics the kernel via `DIOCSDINFO32` with
`d_npartitions=0xFFFF`. The panic is heap-layout dependent (probabilistic) but
reproduces cleanly on fresh boots.

## Mechanism (trigger → primitive → effect)

1. **Trigger.** Open a disk slice device `O_RDWR` (`/dev/vn0s0`; requires
   root: device nodes are `crw-r----- root:operator`, which gives group
   `operator` read access only, and the `maxx` test user is not in `operator`
   either). Issue
   `ioctl(fd, DIOCSDINFO32, &label)` with a `disklabel32` whose
   `d_magic = d_magic2 = DISKMAGIC32` and `d_npartitions = 0xFFFF`.

2. **Dispatch.** The `DIOCSDINFO32` handler `dsioctl()`
   (`subr_diskslice.c:561-609`) passes the user label through. The gate checks
   pass: `slice != WHOLE_DISK_SLICE` (vn0s0 is the compatibility slice,
   `slice=0`) and `part == WHOLE_SLICE_PART` (`part=255`); `FWRITE` is set
   (`:583`). The label is passed on as `lptmp.opaque = data` (`:608`), where
   `data` is the kernel-side copy described in step 3.

3. **Where the label buffer lives.** `mapped_ioctl()` copies the 404-byte label
   into a **heap** allocation (`sys/kern/sys_generic.c:674-676`) because
   `sizeof(struct disklabel32)=404 > STK_PARAMS=128`:
   ```c
   if ((com & IOC_VOID) == 0 && size > sizeof(ubuf.stkbuf))
       memp = kmalloc(size, M_IOCTLOPS, M_WAITOK);   /* <-- 404-byte heap buf */
   ```

4. **Missing guard (the bug).** `l32_setdisklabel` checks only the magics and
   the checksum — **no** `d_npartitions` bound (`subr_disklabel32.c:264-266`):
   ```c
   if (nlp->d_magic != DISKMAGIC32 || nlp->d_magic2 != DISKMAGIC32 ||
       dkcksum32(nlp) != 0)
       return (EINVAL);
   ```
   Contrast the read path `l32_readdisklabel` (`:225-226`) which DOES guard:
   ```c
   } else if (dlp->d_npartitions > MAXPARTITIONS32 || dkcksum32(dlp) != 0) {
   ```
   Because both magics match, neither of the first two `||` operands
   short-circuits the condition, so evaluation reaches `dkcksum32(nlp)`.

5. **OOB walk.** `dkcksum32` (`disklabel32.h:150-161`):
   ```c
   start = (u_int16_t *)lp;
   end   = (u_int16_t *)&lp->d_partitions[lp->d_npartitions];  /* UNBOUNDED */
   while (start < end) sum ^= *start++;
   ```
   With `d_npartitions=0xFFFF`, `end = &d_partitions[65535]`, which is
   `offsetof(d_partitions) + 65535*16 = 148 + 1048560 = 1048708` bytes from
   `lp`. The buffer is only 404 bytes, so the walk reads **1 048 304 bytes
   (just under 1 MiB) of kernel heap past the buffer**, XOR-folded into a
   16-bit sum.

6. **Effect: panic.** When the 1 MiB walk crosses an unmapped page, a page
   fault occurs. Empirically the faulting page is a **kernel thread-stack
   guard page** (the panic message names it explicitly):
   ```
   panic: vm_fault: fault on stack guard, addr: 0xfffff800ab221000
   trap 0xc (12) page fault, rip = l32_setdisklabel+0x57   (dkcksum32 inlined)
   dsioctl+0x721
   ```
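
The arithmetic in step 5 can be checked in userland with a mock of the label layout. The sketch below is an illustration under stated assumptions, not the kernel header: `mock_label32`, `mock_part32`, `cksum_end_offset`, and `cksum_overread` are hypothetical names, and the mock reproduces only the sizes quoted above (148 bytes before `d_partitions`, 16-byte entries, 16 entries, 404 bytes in total). Individual header fields are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16 entries, consistent with 148 + 16*16 = 404 from the report. */
#define MOCK_MAXPARTITIONS32 16

/* Hypothetical 16-byte partition entry; field layout is not modeled. */
struct mock_part32 {
    uint8_t raw[16];
};

/* Hypothetical stand-in for struct disklabel32: sizes only. */
struct mock_label32 {
    uint8_t header[148];   /* everything that precedes d_partitions */
    struct mock_part32 d_partitions[MOCK_MAXPARTITIONS32];
};

static_assert(sizeof(struct mock_label32) == 404,
              "mock must match the 404-byte label size");

/* Byte offset, from the label start, of the end pointer the checksum computes. */
static size_t cksum_end_offset(uint16_t npartitions)
{
    return offsetof(struct mock_label32, d_partitions) +
           (size_t)npartitions * sizeof(struct mock_part32);
}

/* Bytes read beyond the label buffer (0 when the walk stays in bounds). */
static size_t cksum_overread(uint16_t npartitions)
{
    size_t end = cksum_end_offset(npartitions);

    return end > sizeof(struct mock_label32)
               ? end - sizeof(struct mock_label32)
               : 0;
}
```

With a count of 16 the end offset lands exactly on the end of the 404-byte label, which is why only counts above `MAXPARTITIONS32` push the walk out of bounds.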

## Why it is probabilistic (not deterministic)

The `M_IOCTLOPS` `kmalloc(404)` buffer sits in the kernel malloc arena. The
1 MiB walk faults **iff** a thread-stack guard page (or other unmapped page)
lies within ~1 MiB forward of the buffer. On a **fresh boot** the arena is
tightly laid out and the guard page is close → the panic fires within a few
invocations (observed on run 2, run 3, and run 1, respectively, of separate
fresh-boot tests). After the heap is churned (many allocations), the buffer may land in a
region where the next 1 MiB is fully mapped → `dkcksum32` returns nonzero (no
fault) → `EINVAL`, no panic. The **bug executes every time** (the OOB read
always happens); only the *fault* is layout-dependent. This is consistent with
the INVARIANTS (but no SLAB_DEBUG/guard-page) kernel config
(`sys/config/X86_64_GENERIC`): there is no kmalloc redzone that would make the
fault deterministic.
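
The layout dependence amounts to an interval-overlap test, sketched below as a simplified model. It assumes 4 KiB pages and a single unmapped page, and `walk_hits_page` is a hypothetical helper, not kernel code. The walk length is the 1048708-byte distance from the label start to the computed end pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_WALK_LEN  1048708u  /* bytes from label start to the end pointer */
#define MOCK_PAGE_SIZE 4096u     /* assumption: 4 KiB pages */

/*
 * Returns true if the linear walk [buf, buf + MOCK_WALK_LEN) overlaps the
 * page starting at unmapped_page. Model only: one unmapped page, no wrap.
 */
static bool walk_hits_page(uint64_t buf, uint64_t unmapped_page)
{
    uint64_t walk_end = buf + MOCK_WALK_LEN;            /* one past last byte */
    uint64_t page_end = unmapped_page + MOCK_PAGE_SIZE; /* one past the page */

    return buf < page_end && unmapped_page < walk_end;
}
```

In this model a buffer placed 512 KiB below a guard page faults, while one placed 2 MiB below it completes the walk and falls through to the `EINVAL` path.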

## Confirmation / stress

- Reproduced **3×** on fresh-boot heaps, identical RIP (`l32_setdisklabel+0x57`
  = `0xffffffff80694ff7`), two distinct fault addresses
  (`0xfffff800ab1a8000`, `0xfffff800ab221000`) — both stack-guard pages,
  confirming the OOB read reaches varying kernel addresses (real OOB, not a
  cosmetic artifact).
- On a churned heap the same trigger returns `EINVAL` (no panic) — the OOB
  read still executes, just through mapped memory.

## Impact

**Local kernel panic (DoS)** by any principal with write access to a disk
device node: root by default, since the `crw-r----- root:operator` mode gives
group `operator` read access only. The XOR-folded 16-bit checksum result is
never copied back to userspace on this `IOC_IN` path, so there is **no
extractable information disclosure** — the realistic worst case is denial of
service. (The OOB read itself is a latent info-leak into an internal checksum,
but it is not observable by the attacker.)

## PoC changes

Authored from scratch (the finding shipped with no PoC folder). The trigger is
a self-contained C program that opens `/dev/vn0s0` and issues `DIOCSDINFO32`
with `d_npartitions=0xFFFF`; `build.sh`/`run.sh` wire up the `vnconfig` setup.
The first compile attempt succeeded with no source fixes needed (syscall
number, struct layout, and ioctl constants all match the audited headers).

## Recommended fix

Root cause: clamp `dkcksum32`'s end pointer in `sys/sys/disklabel32.h:157`:
```c
end = (u_int16_t *)&lp->d_partitions[MIN(lp->d_npartitions, MAXPARTITIONS32)];
```
This closes DF-0107 **and** sibling DF-0106 (and every other caller) at the
source. Defense-in-depth: also add the explicit
`nlp->d_npartitions > MAXPARTITIONS32` guard to `l32_setdisklabel`
(`subr_disklabel32.c:264`) mirroring `l32_readdisklabel:225`. The standalone
`fix.diff` implements the root-cause clamp; it **supersedes** the finding
markdown's two-site caller-only proposal by fixing the shared helper once.
Verified with `git apply --check` (clean).
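
The effect of the clamp and of the caller-side guard can be exercised in a userland model. This is a minimal sketch, not the contents of `fix.diff`: every `mock_`/`MOCK_` name and `walk_bytes` are hypothetical, the struct reproduces only the 148-byte header and 16 x 16-byte partition sizes, and placing `d_npartitions` at the end of the header is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_MAXPARTITIONS32 16
#define MOCK_MIN(a, b) ((a) < (b) ? (a) : (b))

struct mock_part32 { uint16_t words[8]; };   /* 16 bytes, layout not modeled */

struct mock_label32 {
    uint16_t header[73];        /* 146 bytes of header fields, not modeled */
    uint16_t d_npartitions;     /* simplification: real offset differs */
    struct mock_part32 d_partitions[MOCK_MAXPARTITIONS32];
};

static_assert(sizeof(struct mock_label32) == 404, "mock must be 404 bytes");

/* Bytes the checksum walk covers, with or without the clamp. */
static size_t walk_bytes(const struct mock_label32 *lp, int clamp)
{
    size_t n = lp->d_npartitions;

    if (clamp)
        n = MOCK_MIN(n, (size_t)MOCK_MAXPARTITIONS32);
    return offsetof(struct mock_label32, d_partitions) +
           n * sizeof(struct mock_part32);
}

/* Same loop shape as the quoted helper, but with a bounded end pointer. */
static uint16_t mock_dkcksum32_clamped(const struct mock_label32 *lp)
{
    const uint16_t *start = (const uint16_t *)lp;
    const uint16_t *end = (const uint16_t *)
        &lp->d_partitions[MOCK_MIN(lp->d_npartitions, MOCK_MAXPARTITIONS32)];
    uint16_t sum = 0;

    while (start < end)
        sum ^= *start++;
    return sum;
}

/* Caller-side guard mirroring the read path: bound check, then checksum. */
static int mock_label_acceptable(const struct mock_label32 *lp)
{
    if (lp->d_npartitions > MOCK_MAXPARTITIONS32)
        return 0;
    return mock_dkcksum32_clamped(lp) == 0;
}
```

In this model the clamp bounds the walk to the 404-byte label, but on its own it does not reject an oversized count: a label with `d_npartitions = 0xFFFF` whose first 404 bytes XOR to zero would still checksum clean. That is one reason to keep the explicit `d_npartitions > MAXPARTITIONS32` guard in the caller as well.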
