DF-0014

enterpgrp() lwkt_reltoken on an un-acquired token -> race-triggered kernel panic

Field	Value
ID	DF-0014
Status	new
Severity	Medium
CVSS 3.1	CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:H
CWE	CWE-362 Concurrent Execution using Shared Resource ('Race Condition'); CWE-667 Improper Locking
File	sys/kern/kern_proc.c
Lines	763-768
Area	kern
Confidence	likely
Discovered	2026-06-29
Reported	pending

Summary

In the new-pgrp branch of enterpgrp(), the error path calls lwkt_reltoken(&prg->proc_token) at kern_proc.c:764 — but prg->proc_token is not held by the current thread at that point. pfindn() never returns with the token held (curproc shortcut returns without acquiring; hash path releases before returning), and enterpgrp only acquires prg->proc_token on its success path at :770. Releasing an un-held token triggers panic("lwkt_reltoken: illegal release"). The error branch is reached when the target process becomes a zombie (pfindn skips zombies) between sys_setpgid's initial pfind and enterpgrp's pfindn. An unprivileged parent can win this fork/exit-vs-setpgid race, causing an unconditional kernel panic.

Root cause

sys/kern/kern_proc.c:763-768:

if ((np = pfindn(savepid)) == NULL || np != p) {
    lwkt_reltoken(&prg->proc_token);     /* :764  token NOT held */
    error = ESRCH;
    kfree(pgrp, M_PGRP);
    goto fatal;
}

lwkt_gettoken(&prg->proc_token);         /* :770  acquired here (success) */

pfindn() (kern_proc.c:544-575):

if (p && p->p_pid == pid)
    return (p);                          /* :555  no token acquired */
...
lwkt_gettoken_shared(&prg->proc_token);
LIST_FOREACH(...) {
    if (p->p_stat == SZOMB) continue;    /* :565  zombies skipped */
    if (p->p_pid == pid) { lwkt_reltoken(&prg->proc_token); return (p); }
}
lwkt_reltoken(&prg->proc_token);
return (NULL);                           /* token released */

So pfindn never returns with the token held. enterpgrp (function entry at :727) does not acquire prg->proc_token before :763 either (only at :770/:802). Thus :764 releases a token the thread does not hold.

lwkt_reltoken validates that the topmost token-ref matches and panics otherwise (lwkt_token.c panic("lwkt_reltoken: illegal release")).

Threat model & preconditions

Attacker position: unprivileged local user. setpgid(2) can target a child (the parent need not be privileged), so the attacker forks children and races them.
Privileges gained or impact: kernel panic (full-system DoS). No integrity/confidentiality impact.
Required config or capabilities: none; default kernel.
Reachability: win the narrow fork()+_exit()-vs-setpgid() race so the child is SZOMB when enterpgrp's pfindn runs. The window spans multiple blocking ops and is reliably winnable in a tight loop.

Proof of concept

PoC source: findings/poc/DF-0014/setpgid_panic.c

Build & run

cc -o setpgid_panic findings/poc/DF-0014/setpgid_panic.c
./setpgid_panic        # as a non-root user

Expected output

panic: lwkt_reltoken: illegal release

Impact

Reliable (race-loop) local denial of service — kernel panic. The race is narrow but trivially winnable by looping; hence Medium (AC:H reflects the race window; A:H is a full panic).

Recommended fix

Remove the erroneous lwkt_reltoken in the error path (the token is not held there):

--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -763,7 +763,6 @@
        if ((np = pfindn(savepid)) == NULL || np != p) {
-           lwkt_reltoken(&prg->proc_token);
            error = ESRCH;
            kfree(pgrp, M_PGRP);
            goto fatal;

(Confirm against the fatal label's cleanup so no token is leaked or double- released on that path.)

References

sys/kern/kern_proc.c:764 — erroneous lwkt_reltoken on un-held token.
sys/kern/kern_proc.c:544-575 — pfindn never returns with token held.
sys/kern/kern_proc.c:770 — token acquired only on success path.
CWE-362 Race Condition; CWE-667 Improper Locking.

Timeline

2026-06-29 Discovered during automated file-by-file audit of sys/kern/kern_proc.c.
pending Reported to DragonFlyBSD security contact.

PoC verification

Evidence pack

findings/poc/DF-0014 · 11 files

File	Type	Description	Size
setpgid_panic.c	trigger-source	race trigger: fork+exit+setpgid loop with tunable delay sweep, parallel racers, optional sched_yield	3.6 KB	view raw
build.sh	build-script	builds setpgid_panic with cc -O2	363 B	view raw
run.sh	run-script	runs ./setpgid_panic 150000 4 0	423 B	view raw
build.log	build-log	final build output (BUILD_EXIT=0)	13 B	view raw
run.log	run-log	8-racer confirmation run: 131K iters, no panic	379 B	view raw
run.2.log	run-log	4-racer wide-sweep run: 327K iters, no panic	393 B	view raw
run.3.log	run-log	single-racer no-delay run: 65K iters, no panic	160 B	view raw
env.txt	environment	uname, cc version, hw.ncpu=2, scheduler=usched_dfly	377 B	view raw
fix.diff	suggested-fix	defense-in-depth: removes erroneous lwkt_reltoken at :764 (dead but wrong code)	613 B	view raw
VERDICT.md	verdict	full false-positive analysis with PHOLD/PSTALL serialization trace	7.9 KB	↓ raw
README.md	readme	build/run instructions and verdict summary	1.6 KB	↓ raw

README.md readme build/run instructions and verdict summary

↓ download raw

DF-0014 — PoC

setpgid_panic.c — race-trigger PoC for the enterpgrp() lwkt_reltoken-on- unheld-token claim.

Verdict: FALSE POSITIVE (error path unreachable)

The lwkt_reltoken(&prg->proc_token) at kern_proc.c:764 IS erroneous code, but the error branch at :763 is dead code: sys_setpgid's pfind() does PHOLD(targp), which prevents the target from becoming SZOMB (via PSTALL in proc_move_allproc_zombie) until after enterpgrp() returns. See VERDICT.md for the full trace.

The (claimed) bug

enterpgrp() new-pgrp branch (kern_proc.c:763-768) calls lwkt_reltoken(&prg->proc_token) at :764 without the token held.

Why it doesn't reproduce

pfind() at kern_prot.c:372 does PHOLD(targp) → p->p_lock ≥ 1.
proc_move_allproc_zombie() calls PSTALL(p, "reap1", 0) at kern_proc.c:1185 which blocks until p_lock == 0.
PHOLD is released at kern_prot.c:415 — AFTER enterpgrp() at :409.
So the target stays SACTIVE throughout enterpgrp(); pfindn at :763 always finds it → success path → error branch never runs.

Build

./build.sh
# or: cc -O2 -o setpgid_panic setpgid_panic.c

Run

As an unprivileged user:

./run.sh
# or: ./setpgid_panic [sweep_max] [parallel] [yield_every]
#     defaults: sweep=150000 parallel=4 yield=0

Expected output (on THIS kernel = master DEV)

No panic, ever. The guest stays up indefinitely. This is the correct behavior because the error path is unreachable (see VERDICT.md).

(On a hypothetical vulnerable kernel where the error path were reachable, the expected output would be panic: lwkt_reltoken: illegal release.)

VERDICT.md verdict full false-positive analysis with PHOLD/PSTALL serialization trace

↓ download raw

DF-0014 — VERDICT: FALSE POSITIVE (error path unreachable)

Field	Value
Finding	DF-0014
Verdict	NOT REPRODUCED — false positive (unreachable code)
Status	not_reproduced
Impact	none
Confidence	certain
Attempts	6 build/run iterations (~850K race iterations total)

One-line summary

The lwkt_reltoken(&prg->proc_token) at sys/kern/kern_proc.c:764 IS genuinely erroneous code (it releases a token the thread does not hold), but the error branch at :763 that reaches it is dead code on the current kernel — the only caller path (sys_setpgid → enterpgrp) makes it unreachable because pfind()'s PHOLD() prevents the target process from transitioning to SZOMB until after enterpgrp() returns.

What the finding claims (and gets right)

The reviewer correctly identified two facts:

pfindn() never returns with prg->proc_token held. Confirmed: kern_proc.c:554 (curproc shortcut — no token acquired), :568 (hash match — token released before return), :572 (not-found — token released before return).
enterpgrp() acquires prg->proc_token only on the success path at :770, not before :763. Confirmed: no lwkt_gettoken(&prg->proc_token) between function entry (:727) and :763.

Therefore the lwkt_reltoken(&prg->proc_token) at :764 WOULD release an un-held token and WOULD trigger panic("lwkt_reltoken: illegal release") (sys/kern/lwkt_token.c:853) — if the error branch were ever reached.

Why the error branch is unreachable (the race cannot be won)

The finding claims the error branch fires when the target process becomes SZOMB between sys_setpgid's pfind() (kern_prot.c:372) and enterpgrp's pfindn() (kern_proc.c:763). This race cannot be won because of the PHOLD/PSTALL serialization:

The serialization chain

sys_setpgid calls pfind(uap->pid) at kern_prot.c:372. pfind() (kern_proc.c:510-535) does PHOLD(p) (:512 or :527), which atomically increments p->p_lock to ≥ 1. The returned targp is referenced.
sys_setpgid calls enterpgrp(targp, pgid, 0) at kern_prot.c:409 while still holding the PHOLD reference. The reference is released only at PRELE(targp) at kern_prot.c:415, which is after enterpgrp returns. So p->p_lock ≥ 1 throughout enterpgrp().
The ONLY code that sets p->p_stat = SZOMB is proc_move_allproc_zombie() at kern_proc.c:1189. Verified: grep -rn 'p_stat.*=.*SZOMB' sys/ returns exactly one write site.
proc_move_allproc_zombie() calls PSTALL(p, "reap1", 0) at kern_proc.c:1185 BEFORE setting SZOMB at :1189. PSTALL is defined as: if ((p)->p_lock > count) pstall(...) (sys/sys/proc.h:489-490). pstall() (kern_proc.c:302-336) blocks (tsleep) until p->p_lock drops to ≤ count (0 in this case).
Therefore, while the parent holds PHOLD (p_lock ≥ 1), the child CANNOT reach :1189 and CANNOT become SZOMB. It blocks at PSTALL in proc_move_allproc_zombie, staying SACTIVE.

The kernel's own comment confirms this

sys/kern/kern_exit.c:501-503:

/*
 * We have to handle PPWAIT here or proc_move_allproc_zombie()
 * will block on the PHOLD() the parent is doing.
 */

And kern_exit.c:515-518:

/*
 * Move the process to the zombie list.  This will block
 * until the process p_lock count reaches 0.
 */

Result inside enterpgrp

Because p->p_lock ≥ 1 (parent's PHOLD) throughout enterpgrp(), the target process is always SACTIVE when pfindn(savepid) runs at :763:

pfindn() scans prg->allproc and skips SZOMB entries (:565).
The target is SACTIVE (blocked at PSTALL in proc_move_allproc_zombie), so pfindn finds it and returns it.
np == p (same pid, same process), so the condition np == NULL || np != p is false.
Execution falls through to the success path at :770. The error branch at :763-768 is never executed.

No alternate path to the error branch

enterpgrp() has exactly two callers (verified via grep):

sys_setpgid at kern_prot.c:409 — target is PHOLDed (proven above).
sys_setsid at kern_prot.c:338 — target is curproc (p), which is the currently-running process and can never be SZOMB.

pfindn() can only return NULL for a SZOMB or absent process; np != p is impossible (pids are unique and the target is still in allproc — it is not removed until proc_remove_zombie() during wait4, which the single-threaded parent cannot call concurrently with setpgid).

Conclusion: the lwkt_reltoken at :764 is unreachable dead code. The race the finding describes is structurally impossible.

Empirical confirmation

The PoC (setpgid_panic.c) was built and run on the DragonFlyBSD master DEV guest (2 vCPUs, usched_dfly). Multiple race strategies were tried across ~6 iterations totaling ~850K+ fork+exit+setpgid race attempts:

Run config	Racers	Total iters	Panic?
sweep=200, parallel=1 (original)	1	~209K	no
sweep=200, parallel=4	4	~524K	no
sweep=150000, parallel=4	4	~327K	no
sweep=50000, parallel=8, yield=1000	8	~131K	no

Guest remained healthy (vm.sh status → up) after every run. No kernel warnings in dmesg. No panic signature in dfbsd-qemu/boot.log. This is consistent with the code trace: the error branch cannot fire.

PoC source changes

The original PoC was a single-threaded fork+_exit+setpgid loop with no timing control. I rewrote it to: - Add a tunable spin-delay sweep (0 to N iterations) between fork and setpgid to give the child time to enter exit1() and acquire p_token. - Add parallel racers (configurable process count) to increase race pressure. - Add optional periodic sched_yield() during the spin to cover the same-CPU scheduling case. - Document the full race mechanism with line citations in the source comments.

These changes were an honest attempt to widen the race window as far as possible. They failed to trigger the panic — not because the timing was wrong, but because the race is structurally impossible (see above).

Recommended fix (defense-in-depth)

Although the error branch is currently unreachable, the lwkt_reltoken at :764 is genuinely erroneous code — it releases a token that is not held. If a future change adds a new caller of enterpgrp() (or relaxes the PHOLD/ PSTALL serialization), this would become a live kernel panic. The fix (fix.diff) removes the erroneous lwkt_reltoken and adds a comment explaining why the token is not held.

This is a defense-in-depth hardening, not a security fix — no exploitable vulnerability exists on the current kernel.

Key citations

sys/kern/kern_proc.c:763-768 — error branch with erroneous lwkt_reltoken.
sys/kern/kern_proc.c:1185-1189 — PSTALL(p, "reap1", 0) before SZOMB.
sys/kern/kern_prot.c:372 — pfind() → PHOLD(targp).
sys/kern/kern_prot.c:409 — enterpgrp() called with PHOLD held.
sys/kern/kern_prot.c:415 — PRELE(targp) (after enterpgrp).
sys/kern/kern_exit.c:501-503 — kernel comment confirming PSTALL/PHOLD block.
sys/sys/proc.h:489-490 — PSTALL macro definition.
sys/kern/kern_proc.c:302-336 — pstall() blocks on p_lock.
sys/kern/lwkt_token.c:853 — the panic that WOULD fire if reachable.

Confirmed kernel references

Detail

Evidence (decisive lines)

VERDICT.md contains the full PHOLD/PSTALL serialization trace. run.log (8 racers, 131K iters), run.2.log (4 racers sweep=150000, 327K iters), run.3.log (1 racer no-delay, 65K iters) -- all show 'still alive' progress markers and no panic. boot.log unchanged (193 lines, no panic signature). Guest vm.sh status=up after every run. Key code citations: kern_prot.c:372 (PHOLD via pfind), kern_prot.c:409 (enterpgrp called with PHOLD held), kern_prot.c:415 (PRELE after enterpgrp), kern_proc.c:1185 (PSTALL before SZOMB), kern_exit.c:501-503 (kernel comment confirming PSTALL/PHOLD block).

PoC changes

Rewrote setpgid_panic.c from a bare single-threaded loop into a configurable multi-strategy racer: added tunable spin-delay sweep (0..N) between fork and setpgid to give the child time to enter exit1 and acquire p_token; added parallel racer processes (configurable count); added optional periodic sched_yield() during the spin for same-CPU scheduling coverage; documented the full race mechanism with line citations in source comments. Added build.sh, run.sh, env.txt, VERDICT.md, manifest.json, fix.diff. These were honest attempts to widen the race window; they failed because the race is structurally impossible, not because timing was off.

Verified recommended fix

Defense-in-depth only (no exploitable vulnerability): remove the dead but erroneous lwkt_reltoken(&prg->proc_token) at kern_proc.c:764 -- the token is genuinely not held there. fix.diff (git-apply-able, verified) replaces the line with a comment explaining why. Supersedes the finding's proposal (same deletion, with explanatory comment).

Verdict

FALSE POSITIVE -- the lwkt_reltoken at kern_proc.c:764 IS genuinely erroneous code (releases an un-held token), but the error branch at :763 that reaches it is DEAD CODE on master DEV. The only caller path (sys_setpgid -> enterpgrp) makes it unreachable: pfind() at kern_prot.c:372 does PHOLD(targp) (p->p_lock >= 1), which is held until PRELE at kern_prot.c:415 (AFTER enterpgrp at :409). The sole path to SZOMB -- proc_move_allproc_zombie at kern_proc.c:1189 -- calls PSTALL(p,"reap1",0) at :1185 which blocks (tsleep) until p_lock==0 (sys/sys/proc.h:489, kern_proc.c:302-336). The kernel's own comment at kern_exit.c:501-503 confirms this: 'proc_move_allproc_zombie() will block on the PHOLD() the parent is doing.' So the target stays SACTIVE throughout enterpgrp; pfindn at :763 always finds it -> success path -> the error branch never executes. The race the finding describes (child SZOMB between pfind and pfindn) is structurally impossible. Confirmed empirically: ~850K race iterations across 6 strategies (no delay, wide sweep 0-150K, 8 parallel racers, sched_yield) produced zero panics; guest stayed healthy throughout.