Detecting XOR-Encoded Payloads in Memory Dumps: Entropy Analysis and Brute-Force Key Recovery

cloud_container_security Difficulty 1–5 30 min certifiable

Theory

Why This Matters

XOR encoding is one of the oldest and most persistent malware obfuscation techniques because it is trivial to implement, requires no external libraries, and is enough to defeat signature-based detection of static strings. The Emotet banking trojan family, active from 2014 until a law-enforcement takedown in January 2021 and back in circulation from November 2021, used single-byte XOR encoding to hide its C2 server list and configuration data within the unpacked process image in memory. Analysts who understood XOR encoding patterns were able to recover the C2 list from memory dumps within minutes using a brute-force script; those who looked only for plaintext strings found nothing. XOR analysis belongs in every memory analyst's repertoire.

Core Concept

XOR (exclusive-or) is a bitwise operation with the property that A XOR K XOR K = A — applying the same key twice recovers the original value. In single-byte XOR encoding, each byte of the plaintext is XORed with a fixed key byte: ciphertext[i] = plaintext[i] XOR key. To decode, apply the same operation: plaintext[i] = ciphertext[i] XOR key.
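The self-inverse property can be checked directly. The following is a minimal sketch; the helper name xor_single, the sample string, and the key 0x5A are illustrative assumptions, not values taken from any real sample.

```python
def xor_single(data: bytes, key: int) -> bytes:
    """XOR every byte of data with the same one-byte key."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


# Hypothetical plaintext and key, chosen only for illustration.
plaintext = b"http://example.test/gate.php"
key = 0x5A

ciphertext = xor_single(plaintext, key)   # encode
recovered = xor_single(ciphertext, key)   # the same call decodes

print(ciphertext.hex())
print(recovered)
```

Because encoding and decoding are the same operation, a single helper is enough for both directions.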

Detection heuristics for XOR-encoded data in a memory dump:

Key byte 0x00 is a no-op: XOR with 0x00 leaves the byte unchanged. Skip it in brute-force searches as it indicates unencoded data.
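Entropy profile: Single-byte XOR is a one-to-one relabelling of byte values, so the byte-level Shannon entropy of the encoded blob is exactly that of the plaintext. XOR-encoded text or configuration data therefore stays well below 7.0 bits/byte, while an XOR-encoded compressed or packed payload stays high. A multi-byte key mixes several relabellings and pushes entropy up, but usually not to the near-8.0 level of real encryption. A region with text-like entropy and no readable strings is as suspicious as a high-entropy one.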
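Null-byte key leakage: XORing 0x00 with a key byte yields the key byte itself. In a UTF-16LE string of ASCII-range characters every second byte is 0x00, so after single-byte XOR the key appears at every odd offset; with a multi-byte key, runs of null bytes in the plaintext spell out the key repeatedly. Scanning apparently random data for a byte value (or short sequence) that recurs at a fixed stride reveals the key.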
Known-plaintext attack: If the analyst knows or suspects the first few bytes of the plaintext (e.g., MZ for a PE header, http for a URL, { for a JSON config), XOR the suspected plaintext bytes with the observed ciphertext bytes to derive the key candidate.
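Two of the heuristics above, key leakage through UTF-16LE null bytes and known-plaintext key derivation, can be sketched in a few lines. This is an illustration under stated assumptions: the function names, the sample URL, and the key 0x5A are hypothetical, and the blob is assumed to be single-byte XOR.

```python
from collections import Counter
from typing import Optional, Tuple


def xor_single(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a one-byte key."""
    return bytes(b ^ key for b in data)


def utf16_key_candidate(blob: bytes) -> Tuple[int, float]:
    """Return the most common byte at odd offsets and its share of those offsets.

    For UTF-16LE text in the ASCII range, odd offsets hold 0x00 in the
    plaintext, so after single-byte XOR they hold the key.
    """
    odd = blob[1::2]
    if not odd:
        raise ValueError("blob too short")
    value, count = Counter(odd).most_common(1)[0]
    return value, count / len(odd)


def key_from_known_prefix(blob: bytes, prefix: bytes) -> Optional[int]:
    """Return the single-byte key implied by prefix, or None if the bytes disagree."""
    if not prefix or len(blob) < len(prefix):
        return None
    keys = {c ^ p for c, p in zip(blob, prefix)}
    return keys.pop() if len(keys) == 1 else None


# Hypothetical sample: a UTF-16LE URL encoded with key 0x5A.
sample = xor_single("http://example.test/gate.php".encode("utf-16-le"), 0x5A)

print(utf16_key_candidate(sample))
print(key_from_known_prefix(sample, "http".encode("utf-16-le")))
print(key_from_known_prefix(sample, b"MZ"))  # wrong guess: bytes disagree
```

Requiring every prefix byte to imply the same key is what separates a plausible guess from a coincidence; a prefix that yields several different key values points to a wrong guess or a multi-byte key.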

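Multi-byte XOR uses a key longer than 1 byte (e.g., a 4-byte key), applied cyclically: ciphertext[i] = plaintext[i] XOR key[i % key_length]. xortool automates key-length detection by counting coinciding bytes at each candidate shift, a measure closely related to the Index of Coincidence (IC), and then derives the key from an assumed most-frequent plaintext byte (supplied with -c, or brute-forced with -b).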
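The idea behind IC-based key-length detection can be sketched as follows. This is not xortool's implementation; it is a simplified illustration in which all function names, the sample text, and the 4-byte key are assumptions. The blob is split into columns by offset modulo the candidate length, and the candidate whose columns look most like single-key data scores highest.

```python
from collections import Counter


def xor_multi(data: bytes, key: bytes) -> bytes:
    """Apply a repeating multi-byte XOR key."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def index_of_coincidence(data: bytes) -> float:
    """Probability that two bytes drawn without replacement are equal."""
    n = len(data)
    if n < 2:
        return 0.0
    return sum(c * (c - 1) for c in Counter(data).values()) / (n * (n - 1))


def rank_key_lengths(blob: bytes, max_len: int = 16):
    """Return (length, mean column IC) pairs, best first, shorter length on ties."""
    scores = []
    for n in range(1, max_len + 1):
        columns = [blob[i::n] for i in range(n)]
        mean_ic = sum(index_of_coincidence(c) for c in columns) / n
        scores.append((n, mean_ic))
    return sorted(scores, key=lambda s: (-s[1], s[0]))


def guess_key(blob: bytes, key_len: int, most_common_plain: int = 0x20) -> bytes:
    """Guess each key byte by assuming the most frequent plaintext byte per column."""
    if not 0 < key_len <= len(blob):
        raise ValueError("key_len must be between 1 and len(blob)")
    return bytes(
        Counter(blob[i::key_len]).most_common(1)[0][0] ^ most_common_plain
        for i in range(key_len)
    )


# Hypothetical sample text and key, for illustration only.
text = (
    b"Memory forensics often begins with a hunch that something readable is "
    b"hiding behind a thin layer of encoding. A short repeating key leaves "
    b"statistical traces in the ciphertext, and those traces are what key "
    b"length estimation tries to pick up before any decoding is attempted."
)
key = bytes([0x3A, 0x7F, 0x21, 0xB4])
blob = xor_multi(text, key)

print(rank_key_lengths(blob)[:3])
print(guess_key(blob, 4).hex())
```

Multiples of the true key length tend to score about as well as the true length itself, so the shortest high-scoring candidate is usually the one to try first. The default most_common_plain of 0x20 (space) suits text; 0x00 is the more likely choice for binary payloads.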

Technical Deep-Dive

# Entropy analysis on a binary blob suspected of containing XOR data
ent suspicious_blob.bin
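# Shannon entropy near 8.0 bits/byte suggests encryption or compression
# Single-byte XOR leaves entropy unchanged, so encoded text still reads ~4-5 bits/byte;
# values in between are consistent with multi-byte XOR of structured data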

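# xortool: XOR key-length detection and key recovery
# Install: pip install xortool
xortool suspicious_blob.bin          # prints probable key lengths only
xortool -c 00 suspicious_blob.bin    # assume the most frequent plaintext byte is 0x00 (binaries); use -c 20 for text
xortool -b suspicious_blob.bin       # or brute-force every possible most-frequent byte
# With -c or -b, decoded candidates are written to xortool_out/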

# Brute-force single-byte XOR: try all 256 keys, print printable-looking results
python3 -c "
import sys
data = open('suspicious_blob.bin','rb').read()
if not data:
    sys.exit('suspicious_blob.bin is empty')
for key in range(1,256):
    decoded = bytes(b ^ key for b in data)
    printable = sum(0x20 <= b < 0x7f or b in (9,10,13) for b in decoded)
    ratio = printable / len(decoded)
    if ratio > 0.7:
        print(f'Key 0x{key:02x} ({ratio:.0%} printable): {decoded[:80]}')
"

# Full XOR brute-force with known-plaintext assist
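def xor_brute(data: bytes, known_prefix: bytes = b'') -> list:
    """Brute-force single-byte XOR, optionally using a known plaintext prefix."""
    results = []
    if not data:
        return results  # nothing to decode; also avoids division by zero below
    if known_prefix:
        # Each ciphertext/plaintext byte pair implies one key candidate.
        # zip() stops at the shorter input, so a short blob cannot raise IndexError.
        # Key 0x00 is dropped because it would mean the data is not encoded.
        candidates = {c ^ p for c, p in zip(data, known_prefix)} - {0}
    else:
        candidates = range(1, 256)

    for key in sorted(candidates):
        decoded = bytes(b ^ key for b in data)
        # Count printable ASCII plus tab, LF, and CR
        printable = sum(0x20 <= b < 0x7f or b in (9,10,13) for b in decoded)
        if printable / len(decoded) > 0.65:
            results.append((key, decoded))
    return results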

with open("suspicious_blob.bin", "rb") as f:
    blob = f.read()

# If we suspect the blob starts with "http"
for key, decoded in xor_brute(blob, known_prefix=b'http'):
    print(f"Key 0x{key:02x}: {decoded[:120]}")

# Multi-byte XOR decode with known key (after xortool identifies it)
def xor_multi(data: bytes, key: bytes) -> bytes:
    return bytes(data[i] ^ key[i % len(key)] for i in range(len(data)))

decoded = xor_multi(blob, key=b'\x3a\x7f\x21\xb4')
print(decoded[:200])

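# Volatility 2: dump suspicious process memory regions for XOR analysis
# malfind flags private memory regions with executable (typically RWX) protection,
# the usual home of injected or unpacked code; it does not decode XOR itself
mkdir -p ./malfind_out
vol.py -f memdump.raw --profile=Win7SP1x64 malfind --dump-dir=./malfind_out/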

# Run xortool on each dumped malfind region
for f in ./malfind_out/*.dmp; do
    echo "=== $f ==="
    xortool "$f" 2>/dev/null
done

Analytical Methodology

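Identify candidate regions in the memory dump. Use binwalk entropy analysis: binwalk -E memdump.raw. A spike in entropy at a specific offset indicates compressed, encrypted, or multi-byte XOR-encoded data at that location; single-byte XOR of text leaves entropy unchanged, so also note regions with text-like entropy that contain no readable strings.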
For each high-entropy region, extract the data with dd or by offset-seeking in Python. Run ent on the extracted blob to quantify entropy precisely.
Run xortool on the extracted blob: xortool blob.bin. xortool will display key-length candidates ranked by probability. Try the top 3 candidates with -l <length> together with -c or -b, and examine the decoded output for recognisable structure (PE header, HTTP strings, JSON, or human-readable text).
If xortool fails (blob is too short, or entropy is ambiguous), apply the known-plaintext approach. If you suspect the blob is a PE file, XOR bytes 0–1 of the blob with 0x4D 0x5A (the MZ header) to derive candidates for the first two key bytes; for a single-byte key, both results must agree.
Apply Volatility malfind to identify process memory regions with execute permissions that contain potential XOR-encoded payloads. Dump flagged regions and apply XOR analysis to each.
Once the key is identified, decode the full blob and run file, strings, and binwalk on the output to classify the decoded content. Import decoded PE files into a disassembler (Ghidra, IDA) for further analysis.
Search the wider memory dump for other occurrences of a multi-byte XOR key as a byte sequence. A key stored in memory (for example as a constant in the decoder routine) often sits near the encoded data. A single-byte key is too common a value for this search to be meaningful.
Document: memory offset of the encoded blob, key value (hex), key derivation method (brute-force, known-plaintext, xortool), decoded content classification, and any C2 indicators or configuration data extracted.
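The first two steps of the methodology, locating candidate regions and quantifying their entropy, can be approximated without external tools. The sketch below is an assumption-laden illustration: the function names, the 4096-byte window, and the 7.0 bits/byte threshold are all hypothetical choices, and the dump is a small synthetic buffer rather than a real memdump.raw.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in Counter(data).values())


def high_entropy_windows(dump: bytes, window: int = 4096, step: int = 4096,
                         threshold: float = 7.0):
    """Yield (offset, entropy) for each window at or above the threshold."""
    last_start = max(len(dump) - window + 1, 1)
    for offset in range(0, last_start, step):
        h = shannon_entropy(dump[offset:offset + window])
        if h >= threshold:
            yield offset, h


# Synthetic stand-in for a dump: zero pages around one uniformly distributed page.
dump = bytes(4096) + bytes(range(256)) * 16 + bytes(4096)

for offset, h in high_entropy_windows(dump):
    print(f"offset 0x{offset:08x}: {h:.2f} bits/byte")
```

Lowering the threshold widens the net to regions that may hold single-byte XOR of text, at the cost of more candidates to review. For a real dump, reading the file in chunks or memory-mapping it would avoid loading everything at once.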

Common Analytical Errors

Skipping entropy analysis and jumping to brute force: Entropy analysis first narrows the search to genuinely encoded regions. Running brute-force XOR against the whole dump wastes time and produces noisy output.
Assuming single-byte XOR: Many malware families use 2–8 byte keys. If single-byte brute force yields no printable results, use xortool to test multi-byte key lengths before concluding the data uses a different encoding.
Including XOR key 0x00 in brute-force loops: Key byte 0x00 produces no change, so the "decoded" output for key 0x00 equals the input and is not a valid decode. Explicitly exclude key 0x00 from brute-force loops.
Not verifying decoded output structure: A brute-force decode that produces 70% printable characters is a candidate, not a confirmed decode. Always verify the decoded output has expected structure (file magic, valid JSON, recognisable strings) before treating it as the plaintext.
Missing rolling XOR and ADD encoding variants: Some malware uses rolling XOR (each key byte depends on the previous ciphertext byte) or XOR combined with ADD. If single-byte and multi-byte XOR fail, consider these variants.
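The variants named in the last item differ between malware families, so the following is only a sketch of two hypothetical schemes: a rolling XOR in which each key byte is the previous ciphertext byte (starting from a seed), and an XOR followed by a byte-wise ADD. The function names and parameters are assumptions for illustration.

```python
def rolling_xor_encode(plain: bytes, seed: int) -> bytes:
    """Rolling XOR: key for byte i is ciphertext byte i-1 (seed for i = 0)."""
    out = bytearray()
    key = seed
    for p in plain:
        c = p ^ key
        out.append(c)
        key = c
    return bytes(out)


def rolling_xor_decode(cipher: bytes, seed: int) -> bytes:
    """Invert rolling_xor_encode using the same seed."""
    out = bytearray()
    key = seed
    for c in cipher:
        out.append(c ^ key)
        key = c
    return bytes(out)


def xor_add_encode(plain: bytes, xor_key: int, add_key: int) -> bytes:
    """XOR each byte with xor_key, then add add_key modulo 256."""
    return bytes(((p ^ xor_key) + add_key) & 0xFF for p in plain)


def xor_add_decode(cipher: bytes, xor_key: int, add_key: int) -> bytes:
    """Invert xor_add_encode: subtract first, then XOR."""
    return bytes(((c - add_key) & 0xFF) ^ xor_key for c in cipher)


# Hypothetical sample data and keys.
config = b'{"c2": "203.0.113.7:8080"}'
rolled = rolling_xor_encode(config, seed=0x41)
mixed = xor_add_encode(config, xor_key=0x5A, add_key=0x13)

print(rolling_xor_decode(rolled, seed=0x41))
print(xor_add_decode(mixed, xor_key=0x5A, add_key=0x13))
```

In this particular rolling scheme every plaintext byte after the first depends only on ciphertext, so decoding with a wrong seed corrupts just the first byte. The XOR plus ADD scheme has a 16-bit key space (256 x 256 combinations), which is still small enough to brute-force with the same printable-ratio test used for single-byte XOR.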

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0017	Knowledge of concepts and practices of processing digital forensic data	Understanding XOR encoding as a memory artifact pattern and applying entropy analysis to locate encoded regions
K0042	Knowledge of incident response and handling methodologies	Recovering adversary configuration data from XOR-encoded memory regions during malware analysis
K0187	Knowledge of file type abuse by adversaries for anomalous behavior	Recognising XOR as a common in-memory obfuscation technique used to hide C2 infrastructure and payloads
S0047	Skill in preserving evidence integrity according to standard operating procedures	Preserving raw encoded blobs and decoded outputs as separate hashed artifacts
T0049	Decrypt seized data using technical means	Applying brute-force and known-plaintext XOR decryption to recover adversary data from binary memory regions