Privileged Container Escape: Linux Capability Abuse and Host Device Access for Breakout
Theory
Why This Matters
In 2019, CVE-2019-5736 in runc demonstrated that container escape was possible without --privileged; containers started with --privileged, by contrast, have always been trivially escapable by design. Cluster compromises such as the 2018 Tesla AWS cryptojacking incident, which began with an exposed Kubernetes dashboard, show how quickly attackers reach container workloads, and red team engagements repeatedly find privileged containers deployed as CI/CD agents, monitoring tools, or system utilities that give an attacker immediate host access upon code execution inside the container. The --privileged flag is dangerous precisely because it looks like a simple configuration option while removing nearly every security boundary Docker provides.
Core Concept
The --privileged flag in Docker does several things at once: it grants the container every Linux capability the kernel defines (roughly 40 on current kernels, including CAP_SYS_ADMIN, CAP_NET_ADMIN, CAP_SYS_PTRACE, and CAP_SYS_MODULE), it disables the default seccomp and AppArmor profiles, removing syscall filtering, and it exposes all host devices under /dev. The combined effect is that a root process inside a privileged container is functionally equivalent to root on the host. The only remaining isolation is namespacing (separate mount, PID, and network namespaces, among others), and the PID and network namespaces can themselves be shared with the host via --pid=host --net=host.
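To make the capability grant concrete, here is a minimal Python sketch that decodes a CapEff bitmask (as found in /proc/self/status) and reports which of the four capabilities named above are set. The function names and sample masks are illustrative; the bit positions follow the kernel's capability numbering in linux/capability.h.

```python
from typing import List

# Bit positions from include/uapi/linux/capability.h for the four
# capabilities named in the text. Illustrative selection, not exhaustive.
HIGH_RISK_CAPS = {
    12: "CAP_NET_ADMIN",
    16: "CAP_SYS_MODULE",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
}


def high_risk_caps(cap_eff_hex: str) -> List[str]:
    """Return the high-risk capabilities set in a hex CapEff mask, by bit order."""
    mask = int(cap_eff_hex, 16)
    return [name for bit, name in sorted(HIGH_RISK_CAPS.items()) if mask & (1 << bit)]


def read_cap_eff(status_path: str = "/proc/self/status") -> str:
    """Read the CapEff field from a process status file (Linux only)."""
    with open(status_path) as fh:
        for line in fh:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise ValueError("CapEff not found in " + status_path)


if __name__ == "__main__":
    # Sample masks: Docker's default capability set, and a mask with the low
    # 38 bits set, as seen in a privileged container on kernels that define
    # 38 capabilities (newer kernels define more).
    samples = [("default", "00000000a80425fb"), ("privileged", "0000003fffffffff")]
    for label, mask in samples:
        print(label, high_risk_caps(mask))
```

Inside a container, high_risk_caps(read_cap_eff()) gives the same answer for the current process without requiring capsh to be installed.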
Escape technique — mounting the host disk:
Because a privileged container can access all host devices via /dev, the attacker mounts the host root disk partition directly, gaining full read-write access to the host filesystem without needing any container escape vulnerability.
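One quick way to confirm the device exposure described above is to list the block device nodes visible under /dev. The sketch below is illustrative and the helper name is hypothetical. A default container typically shows no host disks there, whereas a privileged container typically shows the host's disks and partitions.

```python
import os
import stat
from typing import List


def visible_block_devices(dev_dir: str = "/dev") -> List[str]:
    """Return sorted paths of block device nodes directly under dev_dir."""
    found = []
    for entry in os.scandir(dev_dir):
        try:
            mode = entry.stat(follow_symlinks=False).st_mode
        except OSError:
            continue  # node vanished or cannot be stat'ed; skip it
        if stat.S_ISBLK(mode):
            found.append(entry.path)
    return sorted(found)


if __name__ == "__main__":
    for path in visible_block_devices():
        print(path)
```

A non-empty result does not prove an escape on its own, but it shows that the mount step is worth attempting in an authorised test.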
Escape technique — loading a kernel module:
With CAP_SYS_MODULE, the attacker compiles and loads a malicious kernel module (rootkit) directly from inside the container, executing code in kernel space with no isolation whatsoever.
Escape technique — cgroup release_agent:
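The CAP_SYS_ADMIN capability allows mounting cgroup filesystems. A technique published by Felix Wilhelm in 2019 uses the cgroup v1 release_agent file, together with notify_on_release, to execute an arbitrary command on the host as root when the last process in a cgroup exits. CVE-2022-0492 is a related kernel flaw: a missing capability check allowed the same release_agent write from containers that lacked CAP_SYS_ADMIN in the initial namespace and were not confined by seccomp, AppArmor, or SELinux.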
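Because release_agent exists only in cgroup v1 hierarchies, one indicator worth checking during an assessment is whether a cgroup v1 filesystem is already mounted in the container. The following is a minimal sketch, assuming the standard /proc/mounts field layout; the function name and sample text are illustrative. An existing v1 mount is an indicator rather than a requirement, since a process holding CAP_SYS_ADMIN may be able to mount a hierarchy itself.

```python
from typing import List


def cgroup_v1_mounts(mounts_text: str) -> List[str]:
    """Return mount points whose filesystem type is cgroup (v1), not cgroup2."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == "cgroup":
            points.append(fields[1])
    return points


# Illustrative sample in /proc/mounts format (not taken from a real host).
SAMPLE_MOUNTS = (
    "cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0\n"
    "cgroup2 /sys/fs/cgroup/unified cgroup2 rw,nosuid,nodev,noexec,relatime 0 0\n"
)

if __name__ == "__main__":
    print(cgroup_v1_mounts(SAMPLE_MOUNTS))
```

In a live assessment, pass the contents of /proc/mounts read inside the container instead of the sample text.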
Legitimate use cases for --privileged are narrow: loading kernel modules for storage drivers and certain hardware testing scenarios are typical examples. Network packet capture, which is often cited as a reason, usually needs only CAP_NET_RAW (and sometimes CAP_NET_ADMIN) rather than the full flag. In almost all production deployments where --privileged is found, it was added to "fix" a capability error and never removed.
Detection: docker inspect --format='{{.HostConfig.Privileged}}' CONTAINER_ID returns true for privileged containers. The CIS Docker Benchmark check 5.4 explicitly prohibits privileged containers.
Technical Deep-Dive
# Detect all running privileged containers
docker ps -q | while read -r cid; do
    name=$(docker inspect "$cid" --format '{{.Name}}')
    priv=$(docker inspect "$cid" --format '{{.HostConfig.Privileged}}')
    caps=$(docker inspect "$cid" --format '{{.HostConfig.CapAdd}}')
    if [ "$priv" = "true" ]; then
        echo "PRIVILEGED: $name (ID: $cid)"
    elif [ "$caps" != "[]" ] && [ -n "$caps" ]; then
        echo "EXTRA_CAPS: $name CapAdd=$caps"
    fi
done
# Check a specific container's full security configuration
docker inspect target-container --format '{{json .HostConfig}}' |
python3 -c "
import sys, json
hc = json.load(sys.stdin)
print('Privileged:', hc.get('Privileged'))
print('CapAdd:', hc.get('CapAdd'))
print('CapDrop:', hc.get('CapDrop'))
print('SecurityOpt:', hc.get('SecurityOpt'))
print('PidMode:', hc.get('PidMode'))
print('NetworkMode:', hc.get('NetworkMode'))
print('ReadonlyRootfs:', hc.get('ReadonlyRootfs'))
"
# From inside a privileged container: verify escape is possible
# Check available capabilities
grep CapEff /proc/self/status
# Decode capability bitmask (capsh is part of the libcap tools and may need installing)
capsh --decode="$(awk '/^CapEff/ {print $2}' /proc/self/status)"
# ESCAPE: Mount host root disk (for authorised testing only)
# Find the host root device
fdisk -l 2>/dev/null | grep "Linux filesystem"
# Mount it (/dev/sda1 is an example; substitute the device found above)
mkdir -p /tmp/host && mount /dev/sda1 /tmp/host
ls /tmp/host/root/ # host root home directory
cat /tmp/host/etc/shadow # host shadow password file
# Run CIS Docker Benchmark
docker run --rm --net host --pid host --userns host --cap-add audit_control \
    -e DOCKER_CONTENT_TRUST=$DOCKER_CONTENT_TRUST \
    -v /etc:/etc:ro \
    -v /lib/systemd/system:/lib/systemd/system:ro \
    -v /usr/bin/containerd:/usr/bin/containerd:ro \
    -v /usr/bin/runc:/usr/bin/runc:ro \
    -v /usr/lib/systemd:/usr/lib/systemd:ro \
    -v /var/lib:/var/lib:ro \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    --label docker_bench_security \
    docker/docker-bench-security
# VULNERABLE Kubernetes pod spec
spec:
  containers:
    - name: monitoring-agent
      image: monitor:latest
      securityContext:
        privileged: true   # CRITICAL: remove this
# SECURE: use only required capabilities
spec:
  containers:
    - name: monitoring-agent
      image: monitor:latest
      securityContext:
        privileged: false
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        runAsNonRoot: true
        runAsUser: 1000
        capabilities:
          drop: ["ALL"]
          add: ["NET_RAW"]   # only if packet capture is required
Security Assessment Methodology
- Enumerate all privileged containers. Inspect every running container for HostConfig.Privileged: true. In Kubernetes, use kubectl get pods -A -o json | jq '.items[].spec.containers[].securityContext.privileged' to enumerate cluster-wide. This expression covers regular containers only, so query .spec.initContainers as well.
- Check for dangerous individual capabilities. Even without full --privileged, containers with CAP_SYS_ADMIN, CAP_SYS_PTRACE, CAP_SYS_MODULE, CAP_NET_ADMIN, or CAP_DAC_OVERRIDE added individually can be exploited. List added caps via docker inspect.
- Verify seccomp and AppArmor profiles. Check SecurityOpt in docker inspect. A privileged container disables both. On a non-privileged container the runtime's default profiles apply unless SecurityOpt overrides them, so look for entries such as seccomp=unconfined or apparmor=unconfined; these indicate the container has wider syscall access than necessary.
- Check for --pid=host or --net=host. In combination with --privileged or high-risk capabilities, these flags expose host processes and host network interfaces, enabling MITM attacks and credential extraction from host process memory.
- Demonstrate the escape in a safe test environment. Using the device mount technique or the cgroup release_agent technique, demonstrate that a process in the privileged container can achieve root on the host. Document the full chain.
- Remediate by removing --privileged and replacing it with only the specific capabilities required. Use capsh --print inside the container to see which capabilities the process currently holds, then determine which are actually needed by testing the workload with capabilities dropped. Drop all capabilities by default (--cap-drop=ALL) and add back only the required ones. Enable a read-only root filesystem and enforce seccomp profiles.
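The enumeration and checking steps above can be sketched as a small function over the HostConfig object returned by docker inspect. The field names are the ones used elsewhere in this card; the function name, the finding strings, and the set of capabilities flagged are illustrative assumptions, not a standard.

```python
from typing import Any, Dict, List

# Capabilities singled out by the methodology above. CapAdd may list them
# with or without the CAP_ prefix, so both forms are normalised.
RISKY_CAPS = {"SYS_ADMIN", "SYS_PTRACE", "SYS_MODULE", "NET_ADMIN", "DAC_OVERRIDE"}


def assess_host_config(hc: Dict[str, Any]) -> List[str]:
    """Return a list of findings for one container's HostConfig dict."""
    findings = []
    if hc.get("Privileged"):
        findings.append("privileged")
    for cap in hc.get("CapAdd") or []:
        name = cap.upper()
        if name.startswith("CAP_"):
            name = name[4:]
        if name in RISKY_CAPS or name == "ALL":
            findings.append("cap_add:" + name)
    for opt in hc.get("SecurityOpt") or []:
        # Entries may use "=" or the older ":" separator; normalise to "=".
        normalised = opt.replace(":", "=", 1)
        if normalised in ("seccomp=unconfined", "apparmor=unconfined"):
            findings.append("security_opt:" + normalised)
    if hc.get("PidMode") == "host":
        findings.append("pid=host")
    if hc.get("NetworkMode") == "host":
        findings.append("net=host")
    return findings


if __name__ == "__main__":
    # Hypothetical HostConfig fragment for demonstration.
    sample = {"Privileged": False, "CapAdd": ["SYS_ADMIN"], "PidMode": "host"}
    print(assess_host_config(sample))
```

To run it against a live container, load the JSON printed by docker inspect with --format '{{json .HostConfig}}' using json.load and pass the resulting dict to assess_host_config.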
Common Assessment Errors
- Treating Kubernetes Pod Security Standards as automatic protection. The baseline and restricted Pod Security Standards (PSS) profiles block privileged: true, but PSS must be enforced via admission control (Pod Security Admission or an OPA policy). Many clusters have PSS in warn mode, not enforce, so privileged pods still run.
- Overlooking init containers. Kubernetes init containers run before the main container and may have privileged: true for setup tasks. They deserve the same scrutiny as main containers: a privileged init container can modify the host before the main container starts.
- Missing containers with capabilities equivalent to privileged. CAP_SYS_ADMIN alone provides most of the attack surface of full --privileged. An assessment that only flags Privileged: true and misses CapAdd: [SYS_ADMIN] misses equivalently dangerous configurations.
- Not checking for disabled AppArmor/seccomp. Some containers have SecurityOpt: [seccomp:unconfined] or apparmor:unconfined set without full --privileged. These containers have unfiltered syscall access even though Privileged: false. Always check SecurityOpt.
- Assuming the container image's USER instruction prevents escalation. A non-root process in a privileged container starts with no effective capabilities, but the full capability set remains in its bounding set and seccomp and AppArmor stay disabled. Any path to UID 0 inside the container (a setuid binary, sudo, a writable root-owned script) immediately yields the ability to mount host disks and load kernel modules, so a USER instruction is not a substitute for removing --privileged.
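Several of the errors above come down to checking too few fields. The following minimal sketch walks both initContainers and containers in a pod object (as returned in the items array of kubectl get pods -A -o json) and flags privileged: true as well as high-risk capability additions. The function name, finding format, and capability set are illustrative assumptions.

```python
from typing import Any, Dict, List

# Kubernetes lists capabilities without the CAP_ prefix.
RISKY_CAPS = {"SYS_ADMIN", "SYS_PTRACE", "SYS_MODULE", "NET_ADMIN"}


def audit_pod(pod: Dict[str, Any]) -> List[str]:
    """Return findings for every container in a pod, init containers included."""
    spec = pod.get("spec") or {}
    pod_name = (pod.get("metadata") or {}).get("name", "<unnamed>")
    findings = []
    for kind in ("initContainers", "containers"):
        for container in spec.get(kind) or []:
            sc = container.get("securityContext") or {}
            label = f"{pod_name}/{container.get('name')} ({kind})"
            if sc.get("privileged"):
                findings.append(label + ": privileged")
            for cap in (sc.get("capabilities") or {}).get("add") or []:
                if cap.upper() in RISKY_CAPS or cap.upper() == "ALL":
                    findings.append(label + ": adds " + cap.upper())
    return findings


if __name__ == "__main__":
    # Hypothetical pod object for demonstration.
    sample_pod = {
        "metadata": {"name": "agent"},
        "spec": {
            "initContainers": [{"name": "setup", "securityContext": {"privileged": True}}],
            "containers": [{"name": "main"}],
        },
    }
    for finding in audit_pod(sample_pod):
        print(finding)
```

This sketch does not inspect ephemeral containers or pod-level settings such as hostPID and hostNetwork; a fuller audit would add those checks.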
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0053 | Knowledge of security risk management processes | Understanding that --privileged containers eliminate Docker's security model entirely — the risk is not container-level but host-level compromise |
| K0167 | Knowledge of system administration, network, and OS hardening techniques | Hardening container security contexts: dropping all capabilities, enforcing seccomp/AppArmor profiles, and using least-capability principle |
| S0073 | Skill in conducting vulnerability scans and recognizing vulnerabilities | Using docker inspect, kubectl get pods, and CIS Docker Benchmark scans to detect privileged containers and dangerous capability additions |
| T0144 | Conduct penetration testing as required for new or updated applications | Demonstrating host escape from privileged containers using device mounting and cgroup release_agent techniques during container security assessments |
| T0395 | Write code to address security vulnerabilities | Writing secure Kubernetes pod security contexts with privileged: false, capabilities.drop: [ALL], and explicit narrow capability additions |
Further Reading
- CIS Docker Benchmark, Section 5.4: Do not use privileged containers — Center for Internet Security (cisecurity.org)
- Understanding and Hardening Linux Containers — NCC Group Whitepaper, Capabilities section (research.nccgroup.com)
- Container Security — Liz Rice, Chapter 8: Linux Capabilities (O'Reilly Media)