Detecting DNS Exfiltration Through Entropy-Based Subdomain Anomaly Analysis

network_forensics_pcap Difficulty 1–5 30 min certifiable

Theory

Why This Matters

DNS tunnelling featured in the 2018 DNSpionage campaign and has appeared in APT toolkits from Winnti to OilRig. Because DNS is almost universally permitted through perimeter firewalls and often excluded from DLP inspection, it is an attractive covert channel. Recognising the distinctive signatures of data-over-DNS requires familiarity with entropy mathematics, label length constraints, and the volume patterns produced by automated exfiltration tools.

Core Concept

DNS exfiltration encodes stolen data into the subdomain labels of DNS queries directed at an attacker-controlled authoritative nameserver. The attacker's nameserver receives the queries and reassembles the data from the encoded subdomains. No direct TCP connection to the attacker is required — only recursive DNS resolution, which traverses the firewall.

Common encoding schemes include base64, base32, and hex. A typical query looks like: 4b6f6e74656e74.evil-c2.com. The attacker controls the authoritative NS for evil-c2.com and logs every query.
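To see how data ends up in a query name, here is a minimal Python sketch of the sending side. It hex-encodes a payload and splits it into labels of at most 63 characters under the evil-c2.com domain from the example above. The function name encode_queries and the one-chunk-per-query layout are illustrative assumptions; real tools typically add their own framing, such as sequence numbers.

```python
# Sketch of the sending side: hex-encode a payload into query names.
# Assumptions (illustrative): hex encoding, one chunk per query, no sequence numbers.

MAX_LABEL = 63   # RFC 1035 label limit in octets
MAX_NAME = 253   # 255-octet wire limit expressed as a dotted name without trailing dot


def encode_queries(payload: bytes, domain: str = "evil-c2.com",
                   label_len: int = MAX_LABEL) -> list[str]:
    """Split the hex form of payload into labels and append the attacker's domain."""
    if not 1 <= label_len <= MAX_LABEL:
        raise ValueError("label_len must be between 1 and 63")
    hexdata = payload.hex()
    queries = []
    for i in range(0, len(hexdata), label_len):
        qname = f"{hexdata[i:i + label_len]}.{domain}"
        if len(qname) > MAX_NAME:
            raise ValueError(f"query name too long: {qname}")
        queries.append(qname)
    return queries


if __name__ == "__main__":
    for q in encode_queries(b"Kontent"):
        print(q)  # 4b6f6e74656e74.evil-c2.com
```

Encoding b"Kontent" reproduces the example query above. Because hex doubles the payload size, a single 63-character label carries only about 31 bytes of data.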

Key indicators: high query volume to a single second-level domain (SLD); unusually long subdomain labels (RFC 1035 limits labels to 63 octets, and legitimate labels rarely exceed 30 characters); high Shannon entropy in subdomain labels (encoded data looks random, typically above 3.5 bits/character); and less common query types such as TXT or NULL, whose responses can carry much larger payloads than A/AAAA records.

Shannon entropy for a string: H = -Σ p(c) × log₂(p(c)), summed over each unique character c, where p(c) is the fraction of the string's characters equal to c. Random-looking encoded data approaches 4–5 bits/character; human-readable domain labels score 2–3.
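The formula can be checked with a short Python sketch. The helpers shannon_entropy and entropy_ceiling are illustrative names. The ceiling reflects two facts: a string drawn from k symbols cannot exceed log₂(k) bits/character (4 for hex, 5 for base32, 6 for base64), and a string of n characters cannot exceed log₂(n), which caps short labels regardless of encoding.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """H = -sum(p(c) * log2(p(c))) over the unique characters c in s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((k / n) * math.log2(k / n) for k in Counter(s).values())


def entropy_ceiling(alphabet_size: int, length: int) -> float:
    """Upper bound on H: limited by both the alphabet size and the string length."""
    return math.log2(min(alphabet_size, length))


# Worked example: "aaab" has p(a) = 3/4 and p(b) = 1/4,
# so H = -(0.75*log2(0.75) + 0.25*log2(0.25)), roughly 0.811 bits/char.
print(f"{shannon_entropy('aaab'):.3f}")        # 0.811
print(shannon_entropy("0123456789abcdef"))     # 4.0 (every hex digit exactly once)

# Ceilings for a maximum-length (63-character) label in each encoding.
for name, k in [("hex", 16), ("base32", 32), ("base64", 64)]:
    print(name, round(entropy_ceiling(k, 63), 2))
```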

Technical Deep-Dive

# Extract all DNS queries from a PCAP and print the length of each leftmost label
# (tshark separates -T fields output with tabs; "sub" is a built-in awk function, so the label goes in lbl)
tshark -r capture.pcap -Y "dns.flags.response == 0" \
  -T fields -e frame.time_epoch -e dns.qry.name \
  | awk -F'\t' '{split($2, a, "."); lbl = a[1]; print length(lbl), $1, $2}' \
  | sort -rn | head -30

# High-volume query count per SLD (naively the last two labels; suffixes such as co.uk need a public-suffix list)
tshark -r capture.pcap -Y "dns.flags.response == 0" \
  -T fields -e dns.qry.name \
  | awk -F. 'NF >= 2 {print tolower($(NF-1) "." $NF)}' \
  | sort | uniq -c | sort -rn | head -20
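
# Entropy calculator for DNS subdomain labels
# Input: dns_queries.txt, one query name per line, for example from:
#   tshark -r capture.pcap -Y "dns.flags.response == 0" -T fields -e dns.qry.name > dns_queries.txt
import math
from collections import Counter

def entropy(s):
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    total = len(s)
    return -sum((n / total) * math.log2(n / total) for n in Counter(s).values())

def parse_subdomain(fqdn):
    """Everything left of the last two labels (a naive SLD split)."""
    parts = fqdn.rstrip('.').split('.')
    return '.'.join(parts[:-2]) if len(parts) > 2 else ''

with open("dns_queries.txt") as fh:
    for line in fh:
        qname = line.strip()
        sub = parse_subdomain(qname)
        if not sub:
            continue
        h = entropy(sub.replace('.', ''))  # score label characters, not the separating dots
        label_len = max(len(label) for label in sub.split('.'))
        if h > 3.5 or label_len > 45:
            print(f"ALERT  entropy={h:.2f}  maxlabel={label_len}  {qname}")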

# Splunk: flag long DNS subdomains per source and domain (subdomain length is the anomaly signal here)
index=dns sourcetype=dns_logs (query_type=A OR query_type=TXT)
| rex field=query "^(?P<subdomain>.+?)\.(?P<dest_domain>[^.]+\.[^.]+)$"
| eval sub_len = len(subdomain)
| where sub_len > 40
| stats count dc(query) AS unique_queries BY src_ip dest_domain
| where count > 50
| sort -count

Analytical Methodology

  1. Pull DNS query logs for the investigation window and aggregate them by second-level domain (SLD). Flag any SLD receiving more than 100 queries per hour from internal hosts for further analysis.
  2. For flagged SLDs, extract all queried FQDNs. Measure label lengths. Any label exceeding 45 characters is strong evidence of encoded data, as legitimate labels are typically short and human-readable.
  3. Compute Shannon entropy for each subdomain portion. Scores above 3.5 bits/character suggest machine-encoded rather than human-readable content. Combine with label length to prioritise.
  4. Check the QTYPE distribution. Elevated volumes of TXT or NULL queries to a single domain are anomalous, as are MX queries for a domain with no corresponding mail infrastructure.
  5. Reconstruct the exfiltrated data: sort queries by timestamp (or by the tool's sequence numbers, if present), strip the SLD suffix, concatenate the subdomain values in order, then decode (base64 -d, xxd -r -p, or Python's base64.b32decode).
  6. Examine the reassembled bytes for file magic numbers (PDF: %PDF, ZIP: PK\x03\x04). Document the data type and estimated size in the incident report.
  7. Correlate the source IP against endpoint logs to identify the process generating the queries (Windows: Sysmon Event 22 DNS query; Linux: auditd or DNS resolver logs).
  8. Pivot to network: confirm there is no legitimate business use for the queried SLD. Verify domain registration date (new domains are high-risk).
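Steps 1 and 4 can be sketched in Python as follows. The sketch assumes queries have already been parsed into (epoch_seconds, qname, qtype) tuples, for example from tshark's frame.time_epoch, dns.qry.name and dns.qry.type fields. The 100-per-hour threshold comes from step 1, while triage and sld are hypothetical helper names; the two-label SLD split is naive and will mis-group domains under multi-part public suffixes.

```python
from collections import Counter, defaultdict

HOURLY_THRESHOLD = 100  # step 1: queries per SLD per hour


def sld(qname: str) -> str:
    """Naive SLD: the last two labels, lower-cased (wrong for suffixes like co.uk)."""
    return ".".join(qname.rstrip(".").lower().split(".")[-2:])


def triage(queries):
    """queries: iterable of (epoch_seconds, qname, qtype) tuples.

    Returns the SLDs that exceed HOURLY_THRESHOLD in any one-hour bucket,
    plus the QTYPE counts for each flagged SLD (step 4).
    """
    hourly = Counter()              # (sld, hour bucket) -> query count
    qtypes = defaultdict(Counter)   # sld -> Counter of query types
    for ts, qname, qtype in queries:
        domain = sld(qname)
        hourly[(domain, int(ts) // 3600)] += 1
        qtypes[domain][qtype] += 1
    flagged = sorted({d for (d, _hour), n in hourly.items() if n > HOURLY_THRESHOLD})
    return flagged, {d: dict(qtypes[d]) for d in flagged}


if __name__ == "__main__":
    # Hypothetical sample: 150 TXT queries to evil-c2.com within one hour.
    sample = [(1_700_000_000 + i, f"{i:04x}data.evil-c2.com", "TXT") for i in range(150)]
    sample.append((1_700_000_000, "www.example.com", "A"))
    print(triage(sample))  # (['evil-c2.com'], {'evil-c2.com': {'TXT': 150}})
```

Bucketing on whole clock hours keeps the sketch simple; a sliding one-hour window would also catch bursts that straddle a bucket boundary.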

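Steps 5 and 6 can likewise be sketched in Python. The sketch assumes the data sits in every label left of the exfiltration domain, that chunks carry no sequence numbers (so timestamp order is trusted), and that exact duplicate names are retransmissions. That last assumption would wrongly drop genuinely repeated chunks, which is one reason tools that number their chunks are easier to reassemble. The names reassemble, decode and file_type are hypothetical, and only hex and unpadded base32 are attempted; base64 is omitted for brevity.

```python
import base64

# File signatures checked in step 6.
MAGIC = {b"%PDF": "PDF", b"PK\x03\x04": "ZIP (also DOCX/XLSX/JAR)"}


def reassemble(queries, domain):
    """queries: iterable of (timestamp, qname) tuples for one exfiltration domain.

    Orders by timestamp, drops exact duplicate names (assumed retransmissions),
    strips the domain suffix and joins the remaining labels.
    """
    suffix = "." + domain.lower()
    seen, chunks = set(), []
    for _ts, qname in sorted(queries):
        name = qname.rstrip(".").lower()
        if not name.endswith(suffix) or name in seen:
            continue
        seen.add(name)
        chunks.append(name[: -len(suffix)].replace(".", ""))
    return "".join(chunks)


def decode(text):
    """Try hex, then unpadded base32 (upper-cased and re-padded); None if both fail."""
    try:
        return bytes.fromhex(text)
    except ValueError:
        pass
    try:
        padded = text.upper() + "=" * (-len(text) % 8)
        return base64.b32decode(padded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None


def file_type(data):
    """Match the leading bytes against the MAGIC table."""
    for magic, label in MAGIC.items():
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    payload = b"%PDF-1.7 sample"
    hexdata = payload.hex()
    # Hypothetical capture: 10-character hex chunks, plus one retransmission.
    queries = [(i, f"{hexdata[j:j + 10]}.evil-c2.com")
               for i, j in enumerate(range(0, len(hexdata), 10))]
    queries.append(queries[0])
    data = decode(reassemble(queries, "evil-c2.com"))
    print(file_type(data), len(data), "bytes")  # PDF 15 bytes
```

Hex is tried first because a valid hex string is rarely also meaningful base32; for very short inputs the two can be ambiguous, so confirm the result against the magic-number check.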
Common Analytical Errors

  • Relying on single indicators: High volume alone is insufficient — CDN-heavy applications generate high DNS volume. Always combine volume, entropy, and label length before escalating.
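  • Missing base32 encoding: base32 uses an alphabet of A–Z and 2–7, so its entropy ceiling is 5 bits/char against 6 for base64 (hex, with 16 symbols, caps at 4). On short labels the measured entropy sits well below these ceilings, so base32- and especially hex-encoded data can fall below naive thresholds. Adjust thresholds per alphabet and inspect visually.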
  • Not accounting for DNSSEC: NSEC3 records in DNSSEC-signed zones use 32-character Base32hex-encoded hashes as owner-name labels, and these score high on entropy. Exclude known DNSSEC infrastructure before flagging entropy or length anomalies.
  • Forgetting response data: The attacker may also send tasking back via DNS TXT responses. Capture and analyse DNS response records, not just queries.
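One way to act on the base32 advice is to choose the entropy threshold per alphabet, as in this Python sketch. The THRESHOLDS values and MIN_LEN are illustrative assumptions rather than tuned figures, and the alphabet check is a rough heuristic: hex is tested first because labels made only of a–f and 2–7 would match both patterns.

```python
import math
import re
from collections import Counter

# Illustrative per-alphabet thresholds in bits/char; tune against baseline traffic.
THRESHOLDS = {"hex": 3.0, "base32": 3.5, "other": 3.5}
MIN_LEN = 16  # shorter labels are capped near log2(length), so skip them


def entropy(s):
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((k / n) * math.log2(k / n) for k in Counter(s).values())


def alphabet(label):
    """Rough alphabet class. DNS names are case-insensitive, so compare lower-cased.

    Note: an all-letter label also lands in "base32", so this class still
    needs the volume and length checks before anyone escalates.
    """
    low = label.lower()
    if re.fullmatch(r"[0-9a-f]+", low):
        return "hex"
    if re.fullmatch(r"[a-z2-7]+", low):
        return "base32"
    return "other"


def suspicious(label):
    """True if a sufficiently long label exceeds its alphabet's threshold."""
    if len(label) < MIN_LEN:
        return False
    return entropy(label.lower()) > THRESHOLDS[alphabet(label)]


if __name__ == "__main__":
    print(alphabet("0123456789abcdef0123"), suspicious("0123456789abcdef0123"))  # hex True
```

As the first bullet above stresses, a hit here is one signal to combine with volume and label length, not grounds for escalation on its own.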

NICE Framework Alignment

Code | Work Role Knowledge / Skill / Task | Relevance
K0046 | Knowledge of intrusion detection methodologies | DNS exfiltration detection requires signature- and anomaly-based detection in parallel
K0145 | Knowledge of security event correlation tools | SIEM aggregation and entropy calculation applied across millions of DNS log records
K0187 | Knowledge of file type abuse by adversaries | Recovered exfiltrated files may be renamed or fragmented to avoid DLP detection
S0047 | Skill in preserving evidence integrity | Raw PCAP and DNS log preservation with verified checksums before any reassembly
T0049 | Decrypt seized data / analyze forensic artifacts | Decoding base64/hex-encoded subdomain fragments to reconstruct exfiltrated files
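For S0047, a minimal Python sketch that records SHA-256 checksums of the raw evidence before any reassembly. The script name hash_evidence.py used below is hypothetical; the output follows the two-space "digest  path" layout that GNU sha256sum -c can verify.

```python
import hashlib
import sys


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large PCAPs are not loaded into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(f"{sha256_file(path)}  {path}")
```

Run it as python3 hash_evidence.py capture.pcap dns_queries.txt > evidence.sha256, then later run sha256sum -c evidence.sha256 to confirm the evidence is unchanged.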

Further Reading

  • SANS ISC: "DNS as a Data Exfiltration Channel" handler diary series
  • iodine and dnscat2 tool documentation — understand attacker tooling to recognise its signatures
  • Palo Alto Unit 42: "DNS Tunneling in the Wild" (threat research report)
  • RFC 1035 §2.3.4 — DNS label size constraints (authoritative reference for length limits)
  • Elastic EQL: using sequence to correlate high-volume DNS queries with subsequent outbound connections
