# DF-0315 — `wg_peer` use-after-free in `sys/net/wg/if_wg.c`

## Verdict: NOT REPRODUCED dynamically on this kernel — but a **real, reachable, code-confirmed UAF** (not a false positive)

The use-after-free described in DF-0315 is **genuinely present in the code** and
the `if_wg` module is **loaded and reachable** on the audited master DEV guest.
However, after racing destroy against the data plane in several configurations
(16.8M sends, see section 3), I was **unable to turn the destroy-race UAF into
an observable kernel panic** on this non-`INVARIANTS` `X86_64_GENERIC` kernel,
for two reasons.  First, the `wg_output` post-lookup window is too narrow, and
`wg_peer_destroy()` (which blocks on `taskqueue_drain`) too slow, for the
`kfree(peer)` to land inside it.  Second, freeing a large object on the slab
leaves the freed slot's relevant fields stale-but-valid (the slab free-list only
clobbers the first word), so even a hit reads valid-looking data and does not
fault.  This file is the full reasoning with `path:line` evidence.

---

## 1. The bug, confirmed line by line (read-only `sys/`)

`wg_peer` has **no refcount of its own**.  The data plane obtains a `struct
wg_peer *` and uses it **without holding `sc->sc_lock`**, while the control
plane frees the peer under `sc->sc_lock`.

* **Allocation / no peer refcount** — `sys/net/wg/if_wg.c:226-266` (`struct
  wg_peer`); created at `wg_peer_create` `:589` `kmalloc(sizeof(*peer), M_WG,
  M_WAITOK|M_ZERO)`.  Nothing counts references to the *peer* struct.
* **The data-plane entry** — `wg_output` `:2334`
  `peer = wg_aip_lookup(sc, af, ...)`.  `wg_aip_lookup` (`:854`) takes
  `sc->sc_aip_lock` SHARED only to do the radix lookup and then calls
  `noise_remote_ref(peer->p_remote)` at `:880` — i.e. it pins **`peer->p_remote`
  (the `noise_remote`), not `peer`** — and releases `sc_aip_lock` at `:884`
  before returning.  After the return, `wg_output` has a **raw `peer` pointer
  with no lock and no peer refcount**.
* **The unprotected window** — back in `wg_output`: `peer->p_endpoint...` is
  read at `:2342-2345`; `wg_queue_push_staged(&peer->p_stage_queue, pkt)` at
  `:2350`; `wg_peer_send_staged(peer)` at `:2351` (which itself dereferences
  `peer->p_sc`, `peer->p_stage_queue`, `peer->p_remote`, `peer->p_encrypt_serial`
  at `:2230-2269`); and finally `noise_remote_put(peer->p_remote)` at `:2352`
  (a UAF read of the freed `peer->p_remote` field).  **None of this holds
  `sc->sc_lock`.**  The same pattern recurs at the other cited sites:
  `wg_encrypt` `:1849-1874`, `wg_decrypt` `:1886-1940`, the `wg_input` data path
  `:2187-2194`, and the task handlers `wg_deliver_out` `:1991-2034` /
  `wg_deliver_in` `:2037+` (which receive the peer as a raw task argument).
* **The free** — `wg_peer_destroy` `:640` (`KKASSERT` sc_lock EXCLUSIVE at
  `:644`) does its teardown and then `kfree(peer, M_WG)` at `:692`.  It is
  reached from the `SIOCSWG` ioctl (`wg_ioctl_set` holds sc_lock EXCLUSIVE at
  `:2522`, calls `wg_peer_destroy` at `:2539` and `:2602`; `SIOCSWG` requires
  `SYSCAP_RESTRICTEDROOT` at `:2671`).  The `noise_remote` refcount held by the
  data plane keeps the *remote* alive but **does not keep `peer` alive**:
  `kfree(peer)` happens unconditionally at `:692` regardless of outstanding
  remote refs.

So: **`peer` can be `kfree()`d while `wg_output`/`wg_encrypt`/`wg_decrypt`/the
deliver tasks are still dereferencing it.**  That is a textbook heap UAF.
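
The lifetime mismatch can be replayed in a small userspace model. This is a sketch only: `model_remote`, `model_peer`, and the `model_*` functions are hypothetical stand-ins, not the kernel structures, and the free is represented by a flag so the model itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for noise_remote and wg_peer (not the kernel structs). */
struct model_remote {
    int refcnt;                     /* the only refcount in the picture */
};

struct model_peer {
    struct model_remote *p_remote;
    bool freed;                     /* stands in for kfree(peer, M_WG) */
};

/* Like wg_aip_lookup: hands back the raw peer but pins only peer->p_remote. */
static struct model_peer *model_lookup(struct model_peer *peer)
{
    peer->p_remote->refcnt++;
    return peer;
}

static void model_remote_put(struct model_remote *remote)
{
    assert(remote->refcnt > 0);
    remote->refcnt--;
}

/* Like wg_peer_destroy: frees the peer regardless of outstanding remote refs. */
static void model_destroy(struct model_peer *peer)
{
    model_remote_put(peer->p_remote);   /* drop the peer's own ref on the remote */
    peer->freed = true;                 /* unconditional "kfree" */
}

/*
 * Replays lookup followed by destroy. Returns true when the data plane is
 * left holding a freed peer even though its pin kept the remote alive.
 */
static bool model_race_leaves_dangling_peer(void)
{
    struct model_remote remote = { .refcnt = 1 };   /* ref owned by the peer */
    struct model_peer peer = { .p_remote = &remote, .freed = false };

    struct model_peer *p = model_lookup(&peer);     /* data plane */
    model_destroy(&peer);                           /* control plane */

    return p->freed && remote.refcnt == 1;
}
```

In this model `model_race_leaves_dangling_peer()` returns true: the remote's count is still 1 because of the data-plane pin, while the peer it was reached through is already gone.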

## 2. Reachability (confirmed)

```
kldstat: if_wg.ko (/boot/kernel/if_wg.ko)  -- already loaded on this guest
uname:    DragonFly dfbsd 6.5-DEVELOPMENT ... X86_64_GENERIC x86_64
```
`ifconfig wg0 create` works, peers/allowed-IPs/endpoints are configurable via
`SIOCSWG`, and bringing `wg0` up installs the connected route so a local user's
`sendto()` into the wg subnet reaches `wg_output`.  The cited paths are live
code, not behind a koption.

## 3. Why it did not manifest as a panic (the negative result)

I raced `wg_output` (senders flooding UDP into wg0) against `wg_peer_destroy`
(root `SIOCSWG` REPLACE_PEERS destroy/re-add loop), with **correct** heap
grooming: `kern_exec.c:602` allocates proc args via `kmalloc(sizeof(struct
pargs)+i, M_PARGS)`; a ~700-byte `argv` lands in the **same kmalloc-1024 zone
as `wg_peer` (~700 B)**, so freed peer slots *are* reclaimed with foreign bytes.

| Run | Config | Destroy/re-add iterations, sends | Result |
|-----|--------|----------------------------------|--------|
| baseline (no destroy) | NOEP, 4 senders, 4 socket-groomers | 6.09M sends | **stable** (guest up) |
| destroy race | NOEP, 4 senders, 6 exec-groomers (1024-bkt) | 22.7k destroy/readd, 16.8M sends | **stable** (guest up) |

The destroy race ran 16.8 million `wg_output` calls with the correct bucket
grooming and **never faulted**.  Root causes of the non-fault:

1. **Window is too narrow.**  In the only *stable* config (peer with **no
   endpoint**), `wg_output` returns `EHOSTUNREACH` at `:2346` immediately after
   the single `peer->p_endpoint` read at `:2342`.  The window between
   `wg_aip_lookup` (`:2334`) and the last peer deref is a handful of
   instructions.
2. **`wg_peer_destroy` is too slow to fit in that window.**  Destroy blocks on
   `taskqueue_drain` (`:669-670`) and runs five `callout_terminate`s before
   reaching `kfree` (`:692`); the `kfree` essentially never lands inside the
   microsecond-scale `wg_output` window.
3. **Slab behaviour on GENERIC.**  Even when a hit occurs, `kfree` of a
   ~700-byte object only overwrites the **first pointer** of the slab slot with
   the free-list link; the fields `wg_output` reads (`p_endpoint` at ~offset 140,
   `p_stage_queue` at ~offset 200+) stay stale-but-valid, so the dereference
   succeeds.  A panic needs the freed slot to be *reclaimed with foreign
   content* AND the reclaim to land inside the tiny window — possible in
   principle, not achieved in 16.8M sends.
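
Point 3 can be illustrated with a toy single-slot slab in userspace. This is a sketch under stated assumptions, not the DragonFly allocator: `toy_slab`, `toy_free`, and `toy_stale_read` are hypothetical, the slot size and field offset are the approximate figures quoted above, and the poison pattern is made up to contrast with a debug-style free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE   1024    /* bucket size as described in this report */
#define FIELD_OFF   140     /* approximate p_endpoint offset from this report */
#define POISON_BYTE 0xA5    /* hypothetical poison pattern */

/* The stale field must lie past the free-list link for the model to apply. */
static_assert(FIELD_OFF >= sizeof(void *), "field overlaps free-list link");
static_assert(FIELD_OFF + sizeof(uint32_t) <= SLOT_SIZE, "field outside slot");

struct toy_slab {
    unsigned char slot[SLOT_SIZE];
    void *free_head;
};

/*
 * Free the slot. Without poisoning, only the first word is overwritten
 * (with the free-list link); with poisoning, the whole slot is filled first.
 */
static void toy_free(struct toy_slab *s, bool poison)
{
    if (poison)
        memset(s->slot, POISON_BYTE, SLOT_SIZE);
    memcpy(s->slot, &s->free_head, sizeof(s->free_head));
    s->free_head = s->slot;
}

/* Store a marker at FIELD_OFF, free the slot, return what a stale reader sees. */
static uint32_t toy_stale_read(bool poison)
{
    struct toy_slab s;
    uint32_t marker = 0x12345678u;
    uint32_t seen;

    memset(&s, 0, sizeof(s));
    memcpy(s.slot + FIELD_OFF, &marker, sizeof(marker));
    toy_free(&s, poison);
    memcpy(&seen, s.slot + FIELD_OFF, sizeof(seen));
    return seen;
}
```

With `poison` false the stale read returns the original marker, which is why a hit on the race is silent; with `poison` true it returns `0xA5A5A5A5`, the kind of value that would fault if used as a pointer.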

Widening the window by giving the peer an endpoint (so `wg_output` runs the full
`wg_peer_send_staged` path) was blocked by a **separate** bug: under even
trivial UDP load into a wg interface whose peer has an endpoint, the kernel
panics in `wg_deliver_out -> wg_send -> udp_send` (`current process = Idle`)
within ~15-20 s, **with no peer destruction at all** — i.e. *not* the DF-0315
UAF.  See `endpoint_crash_separate_bug.txt`.  That crash is reproducible (4/4)
but is a distinct defect and should be filed separately.

## 4. What would make it fire

Any of the following should turn the silent UAF into an observable fault:
* an **`INVARIANTS`/`DEBUG`** kernel that poisons freed slab memory (the stale
  reads would then hit poison bytes and fault);
* a build with dedicated **use-after-free detection** (a lock-order checker
  such as `WITNESS` targets lock misuse rather than freed-memory access, so it
  would not catch this on its own); or
* racing the **taskqueue** data plane (`wg_encrypt`/`wg_decrypt`/the deliver
  tasks) against destroy.  These tasks hold the peer pointer for their whole
  run, and destroy does *not* drain the parallel encrypt/decrypt tasks.  This
  variant needs a completed handshake (keypair), which in turn needs a real
  responding peer.

## 5. Suggested fix (`fix.diff`, verified `git apply --check` clean)

Add a refcount to `wg_peer`, take it in the data-plane entry points
(`wg_aip_lookup`, `wg_encrypt`, `wg_decrypt`, the `wg_input` data path) and drop
it on exit, and have `wg_peer_destroy()` *detach* the peer (so no new refs are
taken) and then drop the table's reference via `wg_peer_put()`; the peer is
freed only when the last in-flight data-plane reference is released.  This
mirrors the peer-refcounting used by FreeBSD's `if_wg`.  The remaining
refinement (task-lifetime refs for `wg_deliver_out`/`wg_deliver_in`, which
receive the peer as a raw task argument) is noted in the diff comments; the
diff as shipped closes the primary local-attack vector (`wg_output` /
`wg_aip_lookup`) plus the inline encrypt/decrypt/input sites.
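
The detach-then-put scheme described above can be sketched in userspace C11. This is not the content of `fix.diff`: `rc_peer`, its fields, and the `rc_peer_*` functions are hypothetical names for illustration, and the single-threaded replay assumes that, in the kernel, the detached check and the reference increment happen under the lookup lock so they cannot race with detach.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct rc_peer {
    atomic_uint p_refcnt;       /* hypothetical: table ref + in-flight refs */
    atomic_bool p_detached;     /* hypothetical: set once destroy has begun */
};

static int rc_frees;            /* counts frees so callers can observe them */

static struct rc_peer *rc_peer_create(void)
{
    struct rc_peer *peer = calloc(1, sizeof(*peer));

    if (peer == NULL)
        return NULL;
    atomic_init(&peer->p_refcnt, 1);        /* the table's reference */
    atomic_init(&peer->p_detached, false);
    return peer;
}

/* Data-plane entry: refuse new references once the peer is detached. */
static bool rc_peer_ref(struct rc_peer *peer)
{
    if (atomic_load(&peer->p_detached))
        return false;
    atomic_fetch_add(&peer->p_refcnt, 1);
    return true;
}

/* Drop one reference; the last put frees the peer. */
static void rc_peer_put(struct rc_peer *peer)
{
    unsigned int old = atomic_fetch_sub(&peer->p_refcnt, 1);

    assert(old > 0);
    if (old == 1) {
        rc_frees++;
        free(peer);
    }
}

/* Control plane: detach so no new refs are taken, then drop the table's ref. */
static void rc_peer_destroy(struct rc_peer *peer)
{
    atomic_store(&peer->p_detached, true);
    rc_peer_put(peer);
}

/* Replays lookup, destroy, then data-plane exit, and checks the free is deferred. */
static bool rc_destroy_defers_free(void)
{
    struct rc_peer *peer = rc_peer_create();

    if (peer == NULL)
        return false;
    rc_frees = 0;

    bool pinned = rc_peer_ref(peer);        /* data plane takes its ref */
    rc_peer_destroy(peer);                  /* control plane destroys */
    int frees_after_destroy = rc_frees;     /* peer must still be alive */
    bool late_ref = rc_peer_ref(peer);      /* new refs must be refused */
    rc_peer_put(peer);                      /* data plane exits: last put */

    return pinned && frees_after_destroy == 0 && !late_ref && rc_frees == 1;
}
```

The point of the design is that `rc_peer_destroy()` never frees directly; it only gives up the table's reference, so whichever side drops the last reference performs the free.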

## 6. Reproduce

```
ssh dfbsd-maxx                # unprivileged build
  cd poc/DF-0315 && cc -O2 -pthread -o wgrace wgrace.c
ssh dfbsd                      # root run (SIOCSWG needs SYSCAP_RESTRICTEDROOT)
  cd ~maxx/poc/DF-0315 && env NOEP=1 NSEND=4 NGROOM=6 ./wgrace 55
```
On this kernel it exits cleanly (no panic, guest up).  To see the **separate**
wg-send crash instead: `env NOPRIV=1 NORACE=1 NSEND=1 NGROOM=0 SDELAY=2000
./wgrace 20` → panic in `udp_send+0x2cf` within ~20 s (NOT DF-0315).
