enterpgrp() lwkt_reltoken on an un-acquired token -> race-triggered kernel panic
| Field | Value |
|---|---|
| ID | DF-0014 |
| Status | new |
| Severity | Medium |
| CVSS 3.1 | CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:H |
| CWE | CWE-362 Concurrent Execution using Shared Resource ('Race Condition'); CWE-667 Improper Locking |
| File | sys/kern/kern_proc.c |
| Lines | 763-768 |
| Area | kern |
| Confidence | likely |
| Discovered | 2026-06-29 |
| Reported | pending |
Summary
In the new-pgrp branch of enterpgrp(), the error path calls
lwkt_reltoken(&prg->proc_token) at kern_proc.c:764 โ but prg->proc_token
is not held by the current thread at that point. pfindn() never returns
with the token held (curproc shortcut returns without acquiring; hash path
releases before returning), and enterpgrp only acquires prg->proc_token on
its success path at :770. Releasing an un-held token triggers
panic("lwkt_reltoken: illegal release"). The error branch is reached when
the target process becomes a zombie (pfindn skips zombies) between
sys_setpgid's initial pfind and enterpgrp's pfindn. An unprivileged
parent can win this fork/exit-vs-setpgid race, causing an unconditional
kernel panic.
Root cause
if ((np = pfindn(savepid)) == NULL || np != p) {
lwkt_reltoken(&prg->proc_token); /* :764 token NOT held */
error = ESRCH;
kfree(pgrp, M_PGRP);
goto fatal;
}
lwkt_gettoken(&prg->proc_token); /* :770 acquired here (success) */
pfindn() (kern_proc.c:544-575):
if (p && p->p_pid == pid)
return (p); /* :555 no token acquired */
...
lwkt_gettoken_shared(&prg->proc_token);
LIST_FOREACH(...) {
if (p->p_stat == SZOMB) continue; /* :565 zombies skipped */
if (p->p_pid == pid) { lwkt_reltoken(&prg->proc_token); return (p); }
}
lwkt_reltoken(&prg->proc_token);
return (NULL); /* token released */
So pfindn never returns with the token held. enterpgrp (function entry at
:727) does not acquire prg->proc_token before :763 either (only at
:770/:802). Thus :764 releases a token the thread does not hold.
lwkt_reltoken validates that the topmost token-ref matches and panics
otherwise (lwkt_token.c panic("lwkt_reltoken: illegal release")).
Threat model & preconditions
- Attacker position: unprivileged local user.
setpgid(2)can target a child (the parent need not be privileged), so the attacker forks children and races them. - Privileges gained or impact: kernel panic (full-system DoS). No integrity/confidentiality impact.
- Required config or capabilities: none; default kernel.
- Reachability: win the narrow
fork()+_exit()-vs-setpgid()race so the child isSZOMBwhenenterpgrp'spfindnruns. The window spans multiple blocking ops and is reliably winnable in a tight loop.
Proof of concept
PoC source: findings/poc/DF-0014/setpgid_panic.c
Build & run
cc -o setpgid_panic findings/poc/DF-0014/setpgid_panic.c ./setpgid_panic # as a non-root user
Expected output
panic: lwkt_reltoken: illegal release
Impact
Reliable (race-loop) local denial of service โ kernel panic. The race is narrow but trivially winnable by looping; hence Medium (AC:H reflects the race window; A:H is a full panic).
Recommended fix
Remove the erroneous lwkt_reltoken in the error path (the token is not
held there):
--- a/sys/kern/kern_proc.c
+++ b/sys/kern/kern_proc.c
@@ -763,7 +763,6 @@
if ((np = pfindn(savepid)) == NULL || np != p) {
- lwkt_reltoken(&prg->proc_token);
error = ESRCH;
kfree(pgrp, M_PGRP);
goto fatal;
(Confirm against the fatal label's cleanup so no token is leaked or double-
released on that path.)
References
sys/kern/kern_proc.c:764โ erroneouslwkt_reltokenon un-held token.sys/kern/kern_proc.c:544-575โpfindnnever returns with token held.sys/kern/kern_proc.c:770โ token acquired only on success path.- CWE-362 Race Condition; CWE-667 Improper Locking.
Timeline
- 2026-06-29 Discovered during automated file-by-file audit of
sys/kern/kern_proc.c. - pending Reported to DragonFlyBSD security contact.
PoC verification
Evidence pack
findings/poc/DF-0014 ยท 11 files| File | Type | Description | Size | |
|---|---|---|---|---|
| setpgid_panic.c | trigger-source | race trigger: fork+exit+setpgid loop with tunable delay sweep, parallel racers, optional sched_yield | 3.6 KB | view raw |
| build.sh | build-script | builds setpgid_panic with cc -O2 | 363 B | view raw |
| run.sh | run-script | runs ./setpgid_panic 150000 4 0 | 423 B | view raw |
| build.log | build-log | final build output (BUILD_EXIT=0) | 13 B | view raw |
| run.log | run-log | 8-racer confirmation run: 131K iters, no panic | 379 B | view raw |
| run.2.log | run-log | 4-racer wide-sweep run: 327K iters, no panic | 393 B | view raw |
| run.3.log | run-log | single-racer no-delay run: 65K iters, no panic | 160 B | view raw |
| env.txt | environment | uname, cc version, hw.ncpu=2, scheduler=usched_dfly | 377 B | view raw |
| fix.diff | suggested-fix | defense-in-depth: removes erroneous lwkt_reltoken at :764 (dead but wrong code) | 613 B | view raw |
| VERDICT.md | verdict | full false-positive analysis with PHOLD/PSTALL serialization trace | 7.9 KB | โ raw |
| README.md | readme | build/run instructions and verdict summary | 1.6 KB | โ raw |
DF-0014 โ PoC
setpgid_panic.c โ race-trigger PoC for the enterpgrp() lwkt_reltoken-on-
unheld-token claim.
Verdict: FALSE POSITIVE (error path unreachable)
The lwkt_reltoken(&prg->proc_token) at kern_proc.c:764 IS erroneous code,
but the error branch at :763 is dead code: sys_setpgid's pfind() does
PHOLD(targp), which prevents the target from becoming SZOMB (via
PSTALL in proc_move_allproc_zombie) until after enterpgrp() returns.
See VERDICT.md for the full trace.
The (claimed) bug
enterpgrp() new-pgrp branch (kern_proc.c:763-768) calls
lwkt_reltoken(&prg->proc_token) at :764 without the token held.
Why it doesn't reproduce
pfind()atkern_prot.c:372doesPHOLD(targp)โp->p_lock โฅ 1.proc_move_allproc_zombie()callsPSTALL(p, "reap1", 0)atkern_proc.c:1185which blocks untilp_lock == 0.PHOLDis released atkern_prot.c:415โ AFTERenterpgrp()at:409.- So the target stays
SACTIVEthroughoutenterpgrp();pfindnat:763always finds it โ success path โ error branch never runs.
Build
./build.sh # or: cc -O2 -o setpgid_panic setpgid_panic.c
Run
As an unprivileged user:
./run.sh # or: ./setpgid_panic [sweep_max] [parallel] [yield_every] # defaults: sweep=150000 parallel=4 yield=0
Expected output (on THIS kernel = master DEV)
No panic, ever. The guest stays up indefinitely. This is the correct behavior because the error path is unreachable (see VERDICT.md).
(On a hypothetical vulnerable kernel where the error path were reachable, the
expected output would be panic: lwkt_reltoken: illegal release.)
DF-0014 โ VERDICT: FALSE POSITIVE (error path unreachable)
| Field | Value |
|---|---|
| Finding | DF-0014 |
| Verdict | NOT REPRODUCED โ false positive (unreachable code) |
| Status | not_reproduced |
| Impact | none |
| Confidence | certain |
| Attempts | 6 build/run iterations (~850K race iterations total) |
One-line summary
The lwkt_reltoken(&prg->proc_token) at sys/kern/kern_proc.c:764 IS genuinely
erroneous code (it releases a token the thread does not hold), but the error
branch at :763 that reaches it is dead code on the current kernel โ the
only caller path (sys_setpgid โ enterpgrp) makes it unreachable because
pfind()'s PHOLD() prevents the target process from transitioning to SZOMB
until after enterpgrp() returns.
What the finding claims (and gets right)
The reviewer correctly identified two facts:
-
pfindn()never returns withprg->proc_tokenheld. Confirmed:kern_proc.c:554(curproc shortcut โ no token acquired),:568(hash match โ token released before return),:572(not-found โ token released before return). -
enterpgrp()acquiresprg->proc_tokenonly on the success path at:770, not before:763. Confirmed: nolwkt_gettoken(&prg->proc_token)between function entry (:727) and:763.
Therefore the lwkt_reltoken(&prg->proc_token) at :764 WOULD release an
un-held token and WOULD trigger panic("lwkt_reltoken: illegal release")
(sys/kern/lwkt_token.c:853) โ if the error branch were ever reached.
Why the error branch is unreachable (the race cannot be won)
The finding claims the error branch fires when the target process becomes
SZOMB between sys_setpgid's pfind() (kern_prot.c:372) and enterpgrp's
pfindn() (kern_proc.c:763). This race cannot be won because of the
PHOLD/PSTALL serialization:
The serialization chain
-
sys_setpgidcallspfind(uap->pid)atkern_prot.c:372.pfind()(kern_proc.c:510-535) doesPHOLD(p)(:512or:527), which atomically incrementsp->p_lockto โฅ 1. The returnedtargpis referenced. -
sys_setpgidcallsenterpgrp(targp, pgid, 0)atkern_prot.c:409while still holding thePHOLDreference. The reference is released only atPRELE(targp)atkern_prot.c:415, which is afterenterpgrpreturns. Sop->p_lock โฅ 1throughoutenterpgrp(). -
The ONLY code that sets
p->p_stat = SZOMBisproc_move_allproc_zombie()atkern_proc.c:1189. Verified:grep -rn 'p_stat.*=.*SZOMB' sys/returns exactly one write site. -
proc_move_allproc_zombie()callsPSTALL(p, "reap1", 0)atkern_proc.c:1185BEFORE settingSZOMBat:1189.PSTALLis defined as:if ((p)->p_lock > count) pstall(...)(sys/sys/proc.h:489-490).pstall()(kern_proc.c:302-336) blocks (tsleep) untilp->p_lockdrops toโค count(0 in this case). -
Therefore, while the parent holds
PHOLD(p_lock โฅ 1), the child CANNOT reach:1189and CANNOT becomeSZOMB. It blocks atPSTALLinproc_move_allproc_zombie, stayingSACTIVE.
The kernel's own comment confirms this
/*
* We have to handle PPWAIT here or proc_move_allproc_zombie()
* will block on the PHOLD() the parent is doing.
*/
And kern_exit.c:515-518:
/*
* Move the process to the zombie list. This will block
* until the process p_lock count reaches 0.
*/
Result inside enterpgrp
Because p->p_lock โฅ 1 (parent's PHOLD) throughout enterpgrp(), the target
process is always SACTIVE when pfindn(savepid) runs at :763:
pfindn()scansprg->allprocand skipsSZOMBentries (:565).- The target is
SACTIVE(blocked atPSTALLinproc_move_allproc_zombie), sopfindnfinds it and returns it. np == p(same pid, same process), so the conditionnp == NULL || np != pis false.- Execution falls through to the success path at
:770. The error branch at:763-768is never executed.
No alternate path to the error branch
enterpgrp() has exactly two callers (verified via grep):
sys_setpgidatkern_prot.c:409โ target isPHOLDed (proven above).sys_setsidatkern_prot.c:338โ target iscurproc(p), which is the currently-running process and can never beSZOMB.
pfindn() can only return NULL for a SZOMB or absent process; np != p is
impossible (pids are unique and the target is still in allproc โ it is not
removed until proc_remove_zombie() during wait4, which the single-threaded
parent cannot call concurrently with setpgid).
Conclusion: the lwkt_reltoken at :764 is unreachable dead code. The race
the finding describes is structurally impossible.
Empirical confirmation
The PoC (setpgid_panic.c) was built and run on the DragonFlyBSD master DEV
guest (2 vCPUs, usched_dfly). Multiple race strategies were tried across ~6
iterations totaling ~850K+ fork+exit+setpgid race attempts:
| Run config | Racers | Total iters | Panic? |
|---|---|---|---|
| sweep=200, parallel=1 (original) | 1 | ~209K | no |
| sweep=200, parallel=4 | 4 | ~524K | no |
| sweep=150000, parallel=4 | 4 | ~327K | no |
| sweep=50000, parallel=8, yield=1000 | 8 | ~131K | no |
Guest remained healthy (vm.sh status โ up) after every run. No kernel
warnings in dmesg. No panic signature in dfbsd-qemu/boot.log. This is
consistent with the code trace: the error branch cannot fire.
PoC source changes
The original PoC was a single-threaded fork+_exit+setpgid loop with no
timing control. I rewrote it to:
- Add a tunable spin-delay sweep (0 to N iterations) between fork and
setpgid to give the child time to enter exit1() and acquire p_token.
- Add parallel racers (configurable process count) to increase race pressure.
- Add optional periodic sched_yield() during the spin to cover the
same-CPU scheduling case.
- Document the full race mechanism with line citations in the source comments.
These changes were an honest attempt to widen the race window as far as possible. They failed to trigger the panic โ not because the timing was wrong, but because the race is structurally impossible (see above).
Recommended fix (defense-in-depth)
Although the error branch is currently unreachable, the lwkt_reltoken at
:764 is genuinely erroneous code โ it releases a token that is not held. If a
future change adds a new caller of enterpgrp() (or relaxes the PHOLD/
PSTALL serialization), this would become a live kernel panic. The fix
(fix.diff) removes the erroneous lwkt_reltoken and adds a comment
explaining why the token is not held.
This is a defense-in-depth hardening, not a security fix โ no exploitable vulnerability exists on the current kernel.
Key citations
sys/kern/kern_proc.c:763-768โ error branch with erroneouslwkt_reltoken.sys/kern/kern_proc.c:1185-1189โPSTALL(p, "reap1", 0)beforeSZOMB.sys/kern/kern_prot.c:372โpfind()โPHOLD(targp).sys/kern/kern_prot.c:409โenterpgrp()called withPHOLDheld.sys/kern/kern_prot.c:415โPRELE(targp)(afterenterpgrp).sys/kern/kern_exit.c:501-503โ kernel comment confirming PSTALL/PHOLD block.sys/sys/proc.h:489-490โPSTALLmacro definition.sys/kern/kern_proc.c:302-336โpstall()blocks onp_lock.sys/kern/lwkt_token.c:853โ the panic that WOULD fire if reachable.
Confirmed kernel references
Detail
Evidence (decisive lines)
VERDICT.md contains the full PHOLD/PSTALL serialization trace. run.log (8 racers, 131K iters), run.2.log (4 racers sweep=150000, 327K iters), run.3.log (1 racer no-delay, 65K iters) -- all show 'still alive' progress markers and no panic. boot.log unchanged (193 lines, no panic signature). Guest vm.sh status=up after every run. Key code citations: kern_prot.c:372 (PHOLD via pfind), kern_prot.c:409 (enterpgrp called with PHOLD held), kern_prot.c:415 (PRELE after enterpgrp), kern_proc.c:1185 (PSTALL before SZOMB), kern_exit.c:501-503 (kernel comment confirming PSTALL/PHOLD block).
PoC changes
Rewrote setpgid_panic.c from a bare single-threaded loop into a configurable multi-strategy racer: added tunable spin-delay sweep (0..N) between fork and setpgid to give the child time to enter exit1 and acquire p_token; added parallel racer processes (configurable count); added optional periodic sched_yield() during the spin for same-CPU scheduling coverage; documented the full race mechanism with line citations in source comments. Added build.sh, run.sh, env.txt, VERDICT.md, manifest.json, fix.diff. These were honest attempts to widen the race window; they failed because the race is structurally impossible, not because timing was off.
Verified recommended fix
Defense-in-depth only (no exploitable vulnerability): remove the dead but erroneous lwkt_reltoken(&prg->proc_token) at kern_proc.c:764 -- the token is genuinely not held there. fix.diff (git-apply-able, verified) replaces the line with a comment explaining why. Supersedes the finding's proposal (same deletion, with explanatory comment).
Verdict
FALSE POSITIVE -- the lwkt_reltoken at kern_proc.c:764 IS genuinely erroneous code (releases an un-held token), but the error branch at :763 that reaches it is DEAD CODE on master DEV. The only caller path (sys_setpgid -> enterpgrp) makes it unreachable: pfind() at kern_prot.c:372 does PHOLD(targp) (p->p_lock >= 1), which is held until PRELE at kern_prot.c:415 (AFTER enterpgrp at :409). The sole path to SZOMB -- proc_move_allproc_zombie at kern_proc.c:1189 -- calls PSTALL(p,"reap1",0) at :1185 which blocks (tsleep) until p_lock==0 (sys/sys/proc.h:489, kern_proc.c:302-336). The kernel's own comment at kern_exit.c:501-503 confirms this: 'proc_move_allproc_zombie() will block on the PHOLD() the parent is doing.' So the target stays SACTIVE throughout enterpgrp; pfindn at :763 always finds it -> success path -> the error branch never executes. The race the finding describes (child SZOMB between pfind and pfindn) is structurally impossible. Confirmed empirically: ~850K race iterations across 6 strategies (no delay, wide sweep 0-150K, 8 parallel racers, sched_yield) produced zero panics; guest stayed healthy throughout.