Identifying Base-Encoded Chunk Queries in DNS Exfiltration Log Records

web_injection_logic Difficulté 1–5 30 min certifiable

Théorie

Why This Matters

Before mastering statistical analysis of large DNS datasets, an analyst must be able to recognise a single DNS exfiltration query and manually decode what it carries. Practical CTF challenges and real incident first-responder tasks both require this foundational skill: look at a short log extract, identify which queries are suspicious, extract the encoded payload from the subdomains, and recover the hidden data. This card builds that baseline competence.

Core Concept

In DNS exfiltration, an attacker encodes a secret (file content, credential, flag) as a series of short encoded strings, then queries them as subdomains of a domain they control. The authoritative nameserver for that domain receives each query, extracts the subdomain, and logs the data without the attacker ever opening a direct TCP connection to the victim network.

A minimal example using hex encoding:

Query 1: 48656c6c6f.evil.com   → decodes to "Hello"
Query 2: 576f726c64.evil.com   → decodes to "World"

Or base64:

Query 1: SGVsbG8=.evil.com
Query 2: V29ybGQ=.evil.com

At the beginner level, the challenge presents a small DNS log (10–30 queries) to a single suspicious domain. Your tasks: (1) identify which domain is the exfiltration target, (2) extract and order the subdomain payloads, (3) decode them to recover the flag.

Technical Deep-Dive

# Step 1: List all queries in a simple log (one FQDN per line)
cat dns.log | grep -v "^#" | awk '{print $NF}'

# Step 2: Filter queries to the suspected exfil domain
grep ".evil.com$" dns.log | awk '{print $NF}'

# Step 3: Strip the SLD suffix to isolate the subdomain payload
grep ".evil.com$" dns.log 
  | awk -F"." '{print $1}' 
  | sort   # or sort by sequence number embedded in prefix

# Step 4: Decode hex
echo "48656c6c6f576f726c64" | xxd -r -p

# Step 5: Decode base64 (handle URL-safe variant too)
echo "SGVsbG8=" | base64 -d
echo "SGVsbG8=" | tr '_-' '/+' | base64 -d   # URL-safe base64

# Step 6: If queries are numbered (e.g., 01-48656c6c6f.evil.com)
grep ".evil.com$" dns.log 
  | awk -F"." '{print $1}' 
  | sort 
  | sed 's/^[0-9]*-//' 
  | tr -d '
' 
  | xxd -r -p

# Python one-shot decode: extract, sort, concatenate, decode
import base64, re

log_lines = open("dns.log").readlines()
payloads = []
for line in log_lines:
    m = re.search(r'([a-zA-Z0-9+/=_-]+).evil.com', line)
    if m:
        payloads.append(m.group(1))

# Attempt hex decode
try:
    raw = bytes.fromhex('.'.join(payloads).replace('-','')).decode()
    print("HEX decoded:", raw)
except Exception:
    pass

# Attempt base64 decode
try:
    raw = base64.b64decode('.'.join(payloads) + '==').decode()
    print("B64 decoded:", raw)
except Exception:
    pass

Analytical Methodology

Open the DNS log. Identify all unique second-level domains queried. Most legitimate traffic resolves to well-known SLDs (google.com, microsoft.com, akamai.net). One unfamiliar SLD with many subdomains is your starting point.
Examine the subdomain labels of the suspect domain. Ask: do they look like human-readable words, or like encoded data (all hex characters, base64 alphabet, uniform length)?
Check for a sequence indicator embedded in the subdomain — often a numeric prefix like 01-, 02-, or a counter appended: data-1, data-2. Sort accordingly.
Concatenate the subdomain payloads in order. Try hex decode first (xxd -r -p). If the output is not printable, try base64. If still garbled, try base32 (base32 -d).
Inspect the decoded output for a flag string, recognisable file header, or plaintext content. Document the full decode chain.
Note the query timestamps and source IP. Even at beginner level, recording which internal host generated the exfiltration queries is part of a complete answer.

Common Analytical Errors

Sorting alphabetically instead of by sequence: Encoded chunks must be reassembled in the correct order. If subdomains include a sequence counter, always sort numerically, not lexicographically.
Missing the padding: base64 strings must have length divisible by 4. If your concatenated string is truncated, add = padding characters before decoding.
Confusing hex with base64: Hex uses only 0–9 and a–f; base64 uses A–Z, a–z, 0–9, +, /. Inspect the character set of the payload before choosing a decoder.
Forgetting URL-safe base64: Some tools emit URL-safe base64 (- and _ instead of + and /). Translate before decoding: tr '_-' '/+'.

NICE Framework Alignment

Code	Work Role Knowledge / Skill / Task	Relevance
K0046	Knowledge of intrusion detection methodologies	Recognising DNS query anomalies is a foundational detection skill
K0145	Knowledge of security event correlation tools	Correlating multiple DNS queries across a log to reconstruct a single exfiltration stream
K0187	Knowledge of file type abuse by adversaries	Exfiltrated data often includes files with stripped or obfuscated magic bytes
S0047	Skill in preserving evidence integrity	Log files must not be modified during extraction and decode operations
T0049	Decrypt seized data / analyze forensic artifacts	Decoding hex and base64 encoded payloads from DNS subdomain artifacts