# DF-0590 — Verdict

**Verdict: NOT REPRODUCED via userspace ng_socket injection — code-confirmed real race with a reachability caveat.**

`status = not_reproduced`, `reproduced = 0`, `impact = none`, `confidence = certain`.

The legacy `ng_bridge` node has **genuinely no SMP serialization** of its shared
hashtable state — the finding's code-level claim is correct. However, the PoC's
chosen trigger vector (userspace `ng_socket` frame injection) **cannot open the
race window on this kernel** due to a netgraph-socket-layer serialization detail
the finding did not examine. The race *would* manifest in the realistic threat
model (a root-configured bridge connected to `ng_ether` on a multi-CPU NIC),
where incoming packets are dispatched per-flow across CPUs.

This file records both the rigorous code-level proof that the race is real and
the precise reason it cannot be triggered from userspace `ng_socket` on this
guest, with `path:line` citations at every hop.

---

## 1. The code-level race is REAL (confirmed by source trace)

### 1a. `NG_SEND_DATA` dispatches `rcvdata` INLINE on the caller's CPU

`sys/netgraph/netgraph/netgraph.h:224-228`:
```c
#define NG_SEND_DATA(error, hook, m, a)                 \
        do {                                            \
                (error) = ng_send_data((hook), (m), (a)); \
                (m) = NULL;                             \
                (a) = NULL;                             \
        } while (0)
```

`sys/netgraph/netgraph/ng_base.c:1677-1698`:
```c
int
ng_send_data(hook_p hook, struct mbuf *m, meta_p meta)
{
        int (*rcvdata)(hook_p, struct mbuf *, meta_p);
        int error;
        CHECK_DATA_MBUF(m);
        if (hook && (hook->flags & HK_INVALID) == 0) {
                rcvdata = hook->peer->node->type->rcvdata;
                if (rcvdata != NULL)
                        error = (*rcvdata)(hook->peer, m, meta);   /* <-- LINE 1687: direct call, no queue, no CPU migration */
                ...
```

There is **no `NG_NODE_FORCE_WRITER`** anywhere in `sys/netgraph/`:
```
$ grep -rn FORCE_WRITER sys/netgraph/   # (empty)
```
The macro simply does not exist in this tree. Legacy nodes that do not opt into
writer-serialization (and `ng_bridge` does not) get inline dispatch on whatever
CPU the sender runs on.

### 1b. `ng_bridge` installs NO lock

`sys/netgraph/bridge/ng_bridge.c` constructor (lines 297-336): allocates `priv`,
`callout_init`, allocates `tab`, sets defaults, `ng_make_node_common`,
`callout_reset`. **No `lockinit`, no `lwkt_token_init`, no `mtx_init`, no
`spinlock_init`:**
```
$ grep -n 'lockinit\|lockmgr\|lwkt_token\|mtx_init\|spinlock' sys/netgraph/bridge/ng_bridge.c
# (only the timeout's crit_enter/crit_exit at lines 940/945/1005)
```

The typestruct at `ng_bridge.c:275-284` registers `ng_bridge_rcvdata` for both
`rcvdata` and `rcvdataq` — there is no writer-queue indirection.

### 1c. `ng_bridge_rcvdata` mutates shared state with no protection

`ng_bridge.c:517-` (`ng_bridge_rcvdata`): pulls the ether header, looks up the
source MAC via `ng_bridge_get` (`:571`), and on a miss calls `ng_bridge_put`
(`:620`) which does **all** of:
- `SLIST_INSERT_HEAD(&priv->tab[bucket], hent, next)` (`:825`)
- `priv->numHosts++` (`:826`)
- `ng_bridge_rehash(priv)` (`:829`) — which `kfree(priv->tab)` (`:885`) and
  replaces the pointer (`:888`) when the table grows/shrinks.

None of this is under any lock or token. The fan-out path (`:636-708`) also
reads `priv->tab` and `priv->links[]` concurrently.

### 1d. The 1 Hz `ng_bridge_timeout` sweep is `crit_enter()`-only

`ng_bridge.c:932-1005`: `crit_enter()` at `:940`, then walks every bucket
(`:953`) removing stale entries (`priv->numHosts--` at `:969`), counts survivors
into a local `counter`, and at `:984` hits:
```c
KASSERT(priv->numHosts == counter,
    ("%s: hosts: %d != %d", __func__, priv->numHosts, counter));
```
`crit_enter()` on DragonFly blocks interrupts/preemption on the **current CPU
only**; it does **not** serialize against other CPUs. So a concurrent
`ng_bridge_put` on another CPU doing `priv->numHosts++` (and inserting into a
bucket the sweep has already passed) makes this KASSERT trip → panic.

### 1e. INVARIANTS is compiled into the running module

The guest kernel is `X86_64_GENERIC` (`sysctl kern.ident`), which has
`options INVARIANTS` at `sys/config/X86_64_GENERIC:56`. The `ng_bridge.ko`
module embeds the KASSERT strings (proof the `#ifdef INVARIANTS` blocks
compiled in):
```
$ strings /boot/kernel/ng_bridge.ko | grep -iE "hosts:|exists in table|nonexistent"
%s: hosts: %d != %d
%s: entry %s exists in table
%s: host %s on nonexistent link %d
```

So **if** a concurrent `ng_bridge_put` and the timeout sweep landed on different
CPUs, this kernel **would** panic with `ng_bridge_timeout: hosts: N != M`.

---

## 2. WHY the PoC cannot trigger it on this kernel

### 2a. `ng_socket` sends are serialized onto CPU 0

`sys/kern/uipc_socket.c:254-259` (port assignment in `socreate`):
```c
if (prp->pr_flags & PR_SYNC_PORT)
        so->so_port = &netisr_sync_port;
else if (prp->pr_initport != NULL)
        so->so_port = prp->pr_initport();
else
        so->so_port = netisr_cpuport(0);          /* <-- ng_socket hits this branch */
```

`sys/netgraph/socket/ng_socket.c:991-1006` (the `ng_socket` protosw):
```c
static struct protosw ngsw[] = {
    { .pr_type = SOCK_DGRAM, .pr_domain = &ngdomain, .pr_protocol = NG_CONTROL,
      .pr_flags = PR_ATOMIC | PR_ADDR /* | PR_RIGHTS */, ... },     /* no PR_SYNC_PORT */
    { .pr_type = SOCK_DGRAM, .pr_domain = &ngdomain, .pr_protocol = NG_DATA,
      .pr_flags = PR_ATOMIC | PR_ADDR, ... },                        /* no pr_initport */
};
```

No `PR_SYNC_PORT`, no `pr_initport` → **every** `ng_socket` gets
`so->so_port = netisr_cpuport(0)`. All `pru_send` dispatches (`sys/kern/uipc_msg.c:441`:
`lwkt_domsg(so->so_port, ...)`) therefore process on **CPU 0's netisr thread**,
one at a time.

### 2b. The bridge callout also runs on CPU 0

The bridge node is constructed inside `ng_mkpeer` which runs in the
control-socket's message handler — dispatched on CPU 0 (same port). The
constructor's `callout_reset(&priv->timer, hz, ng_bridge_timeout, node)`
(`ng_bridge.c:332`) arms the callout on the **current CPU = CPU 0**. The callout
re-arms itself on the same CPU each tick (`:950`). So the timeout sweep runs on
CPU 0.

### 2c. Both sides of the "race" are on CPU 0 → serialized

Because every `ngd_send` → `NG_SEND_DATA` → `ng_send_data` →
`ng_bridge_rcvdata` chain executes inline inside CPU 0's netisr thread, and the
timeout callout also runs on CPU 0, the two can **never** execute concurrently.
CPU 0's netisr is a single-threaded message processor; while the timeout sweep
holds `crit_enter()`, no `ngd_send` message is processed. The
concurrent-mutation window the finding describes **cannot open** from a
userspace `ng_socket` sender, no matter how many threads or sockets the PoC
spawns or how many frames it injects.

### 2d. Empirical confirmation

The PoC (after fixing backpressure with drain threads and using two independent
control/data socket pairs) injects **~1–3 million frames per 20–40 s** run across
two bridge hooks, with **zero** ENOBUFS failures, and the guest **never panics**
and **stays up** across multiple stress runs:
```
graph: ctlA:p0<->bridge:link0 , ctlB:p0<->bridge:link1
flooding 20 s on 2 injectors x 1 thread each + drains ...
[inj 0 cpu-1] sent 604003 frames (0 non-ENOBUFS fail)
[inj 1 cpu-1] sent 424987 frames (0 non-ENOBUFS fail)
flood complete: 1028990 total frames injected ...
RUN_EXIT=0     # guest still up, no panic, boot.log clean
```

This is the **expected** result given §2a–c: the flood hammers CPU 0's netisr
serially; the timeout takes its turn in the same queue. Millions of `ng_bridge_put`
calls land, but never concurrently with the sweep.

---

## 3. The race WOULD manifest via `ng_ether` (the realistic threat model)

The finding's stated impact scenario is "a root-configured bridge processing
frames from an untrusted/remote segment." In that topology the bridge is
connected to `ng_ether` hooks on real NICs, and incoming packets arrive via
`ether_input`, which dispatches **per-flow across CPUs**:

`sys/net/if_ethersubr.c:1189`:
```c
                netisr_handle(isr, m);
```
where the `isr` is selected by `netisr_hashport(m->m_pkthdr.hash)` — a per-flow
CPU hash. With SMP and multi-flow traffic, `ng_ether`'s `NG_SEND_DATA` to the
bridge's hook (at `ng_ether.c:270`) therefore calls `ng_bridge_rcvdata` on
**different CPUs concurrently**. That is the race window the finding describes,
and the INVARIANTS KASSERT at `ng_bridge.c:984` would trip in that deployment.

Reproducing that path from userspace on this single-vtnet VM would require
injecting multi-flow raw Ethernet traffic from outside the guest (the QEMU host
network), which is not feasible in this harness. The code-level proof in §1 plus
the dispatch-path confirmation in this section establish that the bug is **real
and ship-exploitable in the documented deployment scenario**, even though the
PoC's userspace vector cannot reach it on this kernel.

---

## 4. Privilege model — `PR:L` in the CVSS is overstated

The control socket is root-gated:
`sys/netgraph/socket/ng_socket.c:172-173`:
```c
if (caps_priv_check(ai->p_ucred,
            SYSCAP_RESTRICTEDROOT | __SYSCAP_NULLCRED) != 0)
        return (EPERM);
```
Verified on the guest: as unprivileged uid 1001 (`maxx`), the first
`socket(AF_NETGRAPH, SOCK_DGRAM, NG_CONTROL)` returns `EPERM` ("Operation not
permitted"). So an **unprivileged local user cannot construct the graph**. The
realistic attacker is either root already, or — far more importantly — a
**remote/untrusted host on a bridged segment** that controls only the frame
content, not the graph topology. The CVSS `PR:L` (low privilege) should be
`PR:H` (high/root) for the "user builds the graph" path; the realistic exposure
is frame-injection-into-a-root-configured-bridge, which is `PR:N` from the
attacker's perspective but `AC:H` (the race window is per-tick against a 1 Hz
callout).

---

## 5. Exploit chain

`none` — not a memory-corruption class the PoC could exercise on this kernel.
The theoretical primitives (UAF read in `ng_bridge_rehash` freeing `priv->tab`
while the timeout walks it; OOB write in a `GET_TABLE` copy loop racing the
flood) are real in the code but unreachable from this vector. Developing them
would require the `ng_ether`+SMP-traffic trigger, which is outside this harness.
The realistic impact ceiling for the confirmed race is **kernel panic / DoS**
on an INVARIANTS production bridge, with **potential heap corruption** on a
non-INVARIANTS build (the `kfree(priv->tab)` during rehash vs the timeout's
bucket walk is a genuine UAF).

---

## 6. PoC changes from the seeded draft

The seeded `race.c` was a sketch that the previous (aborted) runner had already
refined into a working graph-builder. This run made the following fixes to make
it honestly exercise the bridge at high rate (even though the race itself cannot
fire from this vector):

1. **Fixed `NG_DATA`/`NG_CONTROL` constant order** — DragonFly's
   `ng_socket.h:52-53` defines `NG_DATA=1, NG_CONTROL=2` (not the intuitive
   order). The original draft had them swapped, yielding `ENOTCONN`.
2. **Added a drain thread** on the data socket — without it, the bridge's
   broadcast fan-out fills the peer socket node's receive buffer within ~100
   frames and every subsequent `sendto` returns `ENOBUFS`, starving the flood
   (the seeded draft's 84-frames-per-8-seconds). With the drain, throughput
   rises to ~100 Kfps.
3. **Used two independent control/data socket pairs** (`ctlA`+`d0`,
   `ctlB`+`d1`) instead of one — a single netgraph socket node refuses a second
   data socket (`ng_connect_data` → `EADDRINUSE` at `ng_socket.c:664`), and a
   single data socket serializes `sendto` via `so_snd`. Two independent pairs
   remove that userspace-side serialization (though, per §2, the kernel-side
   `netisr_cpuport(0)` serialization remains and is the real blocker).
4. **Added a header comment** documenting the `netisr_cpuport(0)` finding so a
   future maintainer doesn't waste time trying to win the race from this vector.
5. **Added the `ng_name` helper** — `NGM_NAME` requires a fixed-size
   `struct ngm_name { char name[NG_NODESIZ=32]; }` (zero-padded), not a bare
   string; the seeded draft passed `strlen(name)+1` bytes and got `EINVAL`.
6. **Verified the privilege gate empirically** (EPERM as uid 1001) and the
   INVARIANTS-in-module fact (KASSERT strings present).

---

## 7. Recommended fix

`fix.diff` adds a per-node `struct lock` to `ng_bridge`'s `priv`, initializes it
in the constructor, and acquires it:
- **shared** (`LK_SHARED`) in `ng_bridge_get` (read-only lookup),
- **exclusive** (`LK_EXCLUSIVE`) in `ng_bridge_put` (insert + `numHosts++` +
  `rehash`),
- **exclusive** (`LK_EXCLUSIVE`) around the timeout sweep + `rehash`.

`LK_CANRECURSE` covers the `put` → `rehash` and `timeout` → `rehash` nested
acquires (rehash itself does not take the lock; it runs under the caller's held
lock). All return paths in the locked sections release the lock.

`git apply --check` passes against the read-only `sys/` tree. This fix
**supersedes** any vague "add FORCE_WRITER" proposal — `NG_NODE_FORCE_WRITER`
does not exist in this tree, so a lockmgr-based per-node lock is the correct,
minimal, DragonFly-idiomatic fix. The same fundamental issue (no serialization)
likely affects other legacy netgraph node types that mutate per-instance state
from `rcvdata` — flagged for the maintainer.
