
Runtime String Obfuscation Reversal: XOR/Base64 Decode Hook Extraction via Dynamic and Static Analysis

reverse_engineering Difficulty 1–5 30 min certifiable

Theory

Reverse Engineering Methodology

String obfuscation is applied to hide sensitive literals — flag check strings, API keys, error messages, decryption keys — from static analysis tools like strings(1) and IDA's string window. Four techniques dominate CTF binaries and real-world malware alike:

1. XOR encoding. Each byte of the plaintext string is XORed with a single-byte key, or with a multi-byte key applied cyclically (byte i uses key[i % key_len]). The encoded bytes are stored in the .data or .rodata section. A decode stub (typically a short loop) XORs each byte with the key at runtime and stores the result on the heap or stack. In disassembly, look for a loop that reads from a data array, XORs with a register or immediate, and writes to another location. The key is visible as the XOR immediate or as the value loaded into the register the XOR uses.

2. Base64 at runtime. The string is Base64-encoded and stored as a printable constant. A call to a decode function (usually a custom implementation, since the C standard library has no Base64 decoder) produces the plaintext at runtime. Static: search the binary for Base64-shaped string constants with grep -aoP '[A-Za-z0-9+/]{20,}={0,2}' target_binary. Dynamic: place a breakpoint on the custom decoder's return instruction and read the returned pointer (rax on x86-64).

3. Stack construction. The string is never stored in the binary as a contiguous sequence of bytes. Instead, the compiler (or obfuscator) emits a series of mov instructions that write individual bytes or 4-byte groups directly onto the stack: mov dword ptr [rbp-0x20], 0x67616c66 (spells "flag" in little-endian). Identification: a long run of mov [rbp-N], imm (or mov [rsp+N], imm) instructions before a function call that takes the stack address as an argument. Reassemble: sort the stores by stack offset, convert each immediate to bytes in little-endian order, and concatenate.

4. Encrypted string table. All strings are stored in a single blob encrypted with a symmetric cipher (often AES-128-ECB or ChaCha20). An index structure maps string IDs to offsets within the blob. A get_string(id) function decrypts and returns the requested string. Static: find the decryption function by looking for calls to known cipher primitives or for their constants (the AES S-box, or the "expand 32-byte k" string used by ChaCha20), then extract the key from that function. Dynamic: hook get_string with Frida and log every decrypted output.
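The static Base64 search in item 2 can be taken one step further by decoding each candidate and keeping only those that turn into readable text. The sketch below does that on a raw file image. The function name carve_base64, the 16-character minimum, and the 90% printable threshold are illustrative assumptions, not values taken from any particular tool; unpadded Base64 is deliberately skipped to keep it short.

```python
import base64
import binascii
import re

# Runs of 16+ Base64-alphabet characters followed by optional '=' padding.
B64_RUN = re.compile(rb"[A-Za-z0-9+/]{16,}={0,2}")


def _printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    ok = sum(32 <= b < 127 or b in (9, 10, 13) for b in data)
    return ok / len(data) if data else 0.0


def carve_base64(blob: bytes, min_printable: float = 0.9) -> list[tuple[int, bytes]]:
    """Return (file_offset, decoded_bytes) for Base64-shaped runs that decode to mostly printable text."""
    hits = []
    for m in B64_RUN.finditer(blob):
        token = m.group()
        if len(token) % 4:  # padded Base64 always has a length divisible by 4
            continue
        try:
            decoded = base64.b64decode(token, validate=True)
        except binascii.Error:
            continue  # Base64-shaped but not decodable (e.g. bad padding)
        if _printable_ratio(decoded) >= min_printable:
            hits.append((m.start(), decoded))
    return hits
```

Usage: open the target with open("target_binary", "rb"), pass f.read() to carve_base64, and review the hits; the printable filter discards most long identifiers that merely look like Base64.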
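The reassembly step for stack-constructed strings (item 3) can be automated from a disassembly listing. This sketch assumes Intel-syntax output as printed by objdump -d -M intel and handles only stores of immediates (BYTE/WORD/DWORD/QWORD PTR) relative to one base register; stores that go through a register, such as movabs rax, imm followed by mov [rbp-N], rax, are not covered. The helper name stack_string_from_objdump and the sample listing are hypothetical.

```python
import re

# Intel-syntax store of an immediate to a stack slot, e.g.
#   mov    DWORD PTR [rbp-0x20],0x67616c66
STORE = re.compile(
    r"mov\s+(BYTE|WORD|DWORD|QWORD)\s+PTR\s+\[(rbp|rsp)([+-]0x[0-9a-f]+)?\],(0x[0-9a-f]+)"
)
SIZES = {"BYTE": 1, "WORD": 2, "DWORD": 4, "QWORD": 8}


def stack_string_from_objdump(listing: str, base: str = "rbp") -> bytes:
    """Rebuild the bytes written by immediate stores relative to `base`, lowest address first."""
    mem: dict[int, int] = {}
    for line in listing.splitlines():
        m = STORE.search(line)
        if not m or m.group(2) != base:
            continue
        size = SIZES[m.group(1)]
        disp = int(m.group(3), 16) if m.group(3) else 0  # "-0x20" -> -32
        value = int(m.group(4), 16) & ((1 << (8 * size)) - 1)
        # x86 is little-endian: the low byte of the immediate lands at the lowest address.
        for i, byte in enumerate(value.to_bytes(size, "little")):
            mem[disp + i] = byte  # later stores overwrite earlier ones, as at runtime
    if not mem:
        return b""
    raw = bytes(mem.get(addr, 0) for addr in range(min(mem), max(mem) + 1))
    return raw.split(b"\x00", 1)[0]  # stop at the first NUL terminator


# Hypothetical listing in objdump's Intel-syntax format.
SAMPLE_LISTING = """
  401136:  c7 45 e0 66 6c 61 67    mov    DWORD PTR [rbp-0x20],0x67616c66
  40113d:  c7 45 e4 7b 73 74 34    mov    DWORD PTR [rbp-0x1c],0x3474737b
  401144:  66 c7 45 e8 63 6b       mov    WORD PTR [rbp-0x18],0x6b63
  40114a:  c6 45 ea 7d             mov    BYTE PTR [rbp-0x16],0x7d
  40114e:  c6 45 eb 00             mov    BYTE PTR [rbp-0x15],0x0
"""

print(stack_string_from_objdump(SAMPLE_LISTING))  # b'flag{st4ck}'
```

In practice, feed it the lines of a single function only, since unrelated functions reuse the same rbp offsets.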
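Once the key for an encrypted string table (item 4) is known, the index can be walked offline so that every entry is recovered, not only the IDs the program happens to request during one run. The layout below (a u32 count, then count pairs of u32 offset and u32 length, then the blob) is a hypothetical example, and real binaries vary. The decrypt callable is pluggable: the demo uses a repeating-key XOR purely as a stand-in, whereas an AES-128-ECB or ChaCha20 table would need the recovered key and a crypto library.

```python
import struct
from typing import Callable

# Hypothetical on-disk layout (little-endian):
#   u32 count | count x (u32 offset, u32 length) | encrypted blob
# Offsets are relative to the start of the blob.


def dump_string_table(table: bytes, decrypt: Callable[[bytes], bytes]) -> dict[int, bytes]:
    """Decrypt every entry in the table, keyed by string ID."""
    (count,) = struct.unpack_from("<I", table, 0)
    blob_start = 4 + 8 * count
    strings = {}
    for string_id in range(count):
        offset, length = struct.unpack_from("<II", table, 4 + 8 * string_id)
        start = blob_start + offset
        strings[string_id] = decrypt(table[start : start + length])
    return strings


def xor_stand_in(chunk: bytes, key: bytes = b"\x13\x37") -> bytes:
    """Demo-only stand-in cipher (XOR is its own inverse); not AES or ChaCha20."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(chunk))


def build_demo_table(plaintexts: list[bytes]) -> bytes:
    """Encrypt each entry independently and pack it into the hypothetical layout."""
    index, blob = b"", b""
    for p in plaintexts:
        enc = xor_stand_in(p)
        index += struct.pack("<II", len(blob), len(enc))
        blob += enc
    return struct.pack("<I", len(plaintexts)) + index + blob


table = build_demo_table([b"Wrong password", b"CTF{t4bl3_dump}"])
print(dump_string_table(table, xor_stand_in))
# {0: b'Wrong password', 1: b'CTF{t4bl3_dump}'}
```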

Technical Deep-Dive

```python
import struct


# Recover XOR-encoded strings: brute-force single-byte key
def xor_brute(encoded: bytes, expected_substr: bytes = b'{') -> list[tuple[int, bytes]]:
    """Try all 256 single-byte XOR keys; return candidates containing expected_substr."""
    results = []
    for key in range(256):
        decoded = bytes(b ^ key for b in encoded)
        if expected_substr in decoded:
            results.append((key, decoded))
    return results


# Multi-byte XOR key recovery using known plaintext (e.g., flag prefix "CTF{").
# Yields one key byte per known plaintext byte, so the whole key is recovered
# only when len(known_plain) >= key length.
def recover_xor_key(encoded: bytes, known_plain: bytes) -> bytes:
    return bytes(e ^ p for e, p in zip(encoded, known_plain))


# Reconstruct string from stack-construction mov immediates
def reconstruct_stack_string(moves: list[tuple[int, int]]) -> str:
    """
    moves: list of (stack_offset, dword_value) from disassembly, with offsets
    relative to the lowest-addressed slot, e.g. [(0, 0x67616c66), (4, 0x7d345f33)]
    Returns the string assembled in memory order ("flag3_4}" for the example).
    """
    max_off = max(off for off, _ in moves)
    buf = bytearray(max_off + 4)
    for offset, dword in sorted(moves):
        struct.pack_into("<I", buf, offset, dword)  # little-endian, as x86 stores it
    return buf.rstrip(b'\x00').decode("latin-1")  # drop trailing NUL padding
```
```shell
# Static: grep for Base64-shaped strings in the binary
strings target_binary | grep -P '^[A-Za-z0-9+/]{16,}={0,2}$'

# Frida: hook a decode function at a known address and log its return value.
# 0x401234 is a placeholder for the decoder's entry point (non-PIE binary assumed).
frida -f ./target_binary -l hook_decode.js
# hook_decode.js:
# Interceptor.attach(ptr("0x401234"), {
#   onLeave: function (retval) {
#     console.log("[+] decoded: " + retval.readUtf8String());
#   }
# });

# gdb: stop on the decoder's ret instruction and print the string rax points to.
# 0x401290 is a placeholder for that ret address. Single quotes keep the shell
# from expanding $rax; dprintf prints and continues on every hit.
gdb -batch \
  -ex 'dprintf *0x401290,"[+] decoded: %s\n",(char *)$rax' \
  -ex 'run' ./target

# Extract XOR key dynamically: break on the XOR inside the decode loop, print registers
# (0x401100 is a placeholder; inspect whichever registers that xor actually uses)
gdb -ex "break *0x401100" -ex "run" -ex "info registers rax rdx" ./target
```

Common Reversing Errors

1. Trusting strings output as complete. strings only finds contiguous printable sequences of at least a minimum length (default 4). XOR-encoded strings are invisible to it, and a stack-constructed string surfaces at best as scattered four-byte fragments of its immediates. Always supplement it with dynamic analysis.

2. Assuming single-byte XOR. If single-byte brute force produces no readable output, the key is probably multi-byte. Try key lengths 2–16 with the index of coincidence (IC) test used against Vigenère ciphers, or use known plaintext if the flag format is known (e.g., CTF{).

3. Hooking the wrong function. Frida hooks on library functions like strlen or malloc will not catch custom decoder implementations. Identify the decoder by finding the data flow from the encoded constant to the decoded output in the disassembly.

4. Missing encrypted string table entries. If the binary uses a get_string(id) pattern, capturing only the first call (for example, with a one-shot breakpoint) misses every later string. Use Frida's Interceptor.attach with onLeave logging, which fires on every invocation, to capture the strings decrypted throughout execution.

5. Endianness confusion in stack strings. x86 is little-endian. The dword 0x67616c66 is stored as the bytes 66 6c 61 67, spelling "flag"; reading the immediate's hex digits left to right (big-endian) produces "galf". Always convert stack-string immediates with struct.pack("<I", value) (little-endian).
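Error 2 suggests the index of coincidence for multi-byte XOR. A minimal sketch, assuming English-like plaintext: rank candidate key lengths by the mean IC of the ciphertext columns, then recover each key byte by maximizing the count of ASCII letters and spaces in its column. The reasoning is that, with the right length, every column is plaintext XORed with one constant byte, which only relabels byte values and so keeps the plaintext's high IC; wrong lengths mix several key bytes and lower it. The function names are illustrative, not from any library.

```python
from collections import Counter

# Scoring assumes English-like plaintext made of ASCII letters and spaces.
LETTERS_AND_SPACE = frozenset(b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ")


def index_of_coincidence(data: bytes) -> float:
    """Probability that two bytes picked at random (without replacement) are equal."""
    n = len(data)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(data).values()) / (n * (n - 1))


def rank_key_lengths(ct: bytes, max_len: int = 16) -> list[tuple[int, float]]:
    """Rank key lengths 2..max_len by the mean IC of the columns ct[i::k], best first."""
    scores = []
    for k in range(2, max_len + 1):
        columns = [ct[i::k] for i in range(k)]
        scores.append((k, sum(index_of_coincidence(c) for c in columns) / k))
    return sorted(scores, key=lambda s: s[1], reverse=True)


def _score(column: bytes, key_byte: int) -> int:
    """Number of bytes in the column that decode to a letter or space."""
    return sum((b ^ key_byte) in LETTERS_AND_SPACE for b in column)


def solve_columns(ct: bytes, key_len: int) -> bytes:
    """Recover each key byte as the one that decodes its column to the most letters/spaces."""
    key = bytearray()
    for i in range(key_len):
        column = ct[i::key_len]
        key.append(max(range(256), key=lambda k: _score(column, k)))
    return bytes(key)
```

Multiples of the true key length score as high as the true length, and sometimes higher, so try the smallest high-scoring candidates first and check each decryption by eye. If the flag format is known, the known-plaintext route in error 2 is usually quicker.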
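The endianness point in error 5 can be checked directly with Python's struct module. The same immediate packed little-endian gives the bytes x86 actually stores; packed big-endian it gives the reversed string.

```python
import struct

imm = 0x67616c66  # immediate from: mov dword ptr [rbp-0x20], 0x67616c66

in_memory = struct.pack("<I", imm)  # little-endian: how x86 lays the bytes out
misread = struct.pack(">I", imm)    # big-endian: reading the hex digits left to right

print(in_memory, misread)  # b'flag' b'galf'
```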
