Recovering Corrupted PCAP Files via Magic Byte Forensics and Partial Capture Reconstruction

network_forensics_pcap Difficulty 1–5 30 min certifiable

Theory

Why This Matters

During the 2016 Bangladesh Bank SWIFT heist investigation, forensic teams discovered that log files and PCAP captures from the compromised network had been deliberately corrupted by the attackers to destroy evidence. Partial recovery from the damaged captures — using pcapfix and manual hex repair — recovered enough packet data to reconstruct the timing of fraudulent SWIFT messages and identify the C2 infrastructure used. Corrupted PCAP files also arise legitimately from disk failures, capture buffer overflows, and improper shutdown of packet capture processes. An analyst who cannot repair or partially recover a damaged capture loses potentially irreplaceable evidence.

Core Concept

A libpcap PCAP file (the standard format for tools like tcpdump and Wireshark) has a fixed structure. A Global Header (24 bytes) is followed by zero or more Packet Records, each with a 16-byte Packet Header followed by the packet data bytes.

Global Header fields:
- Bytes 0–3: Magic Number. The value is 0xA1B2C3D4. On disk it appears as D4 C3 B2 A1 when the capture was written by a little-endian host (the common case) and as A1 B2 C3 D4 when written by a big-endian host. The magic describes the byte order of every multi-byte header field, not only the timestamps; a reader with the opposite byte order must swap each field. The variant 0xA1B23C4D marks nanosecond-resolution timestamps.
- Bytes 4–5: Major version (typically 2)
- Bytes 6–7: Minor version (typically 4)
- Bytes 8–11: GMT offset (thiszone, a signed value that is almost always 0)
- Bytes 12–15: Timestamp accuracy (sigfigs, almost always 0)
- Bytes 16–19: Snaplen, the maximum number of bytes captured per packet (commonly 65535)
- Bytes 20–23: Link type, the data link layer type (1 = Ethernet, 113 = Linux cooked capture)
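
As a worked example of this layout, the sketch below packs a classic header with Python's struct module and prints the resulting bytes. The function name build_global_header is illustrative only and does not belong to any library.

```python
import struct

def build_global_header(snaplen: int = 65535, link_type: int = 1,
                        little_endian: bool = True) -> bytes:
    """Pack a 24-byte classic pcap global header (microsecond timestamps)."""
    endian = "<" if little_endian else ">"
    # Field order: magic, major, minor, thiszone (signed), sigfigs, snaplen, link type
    return struct.pack(f"{endian}IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, snaplen, link_type)

header = build_global_header()
print(len(header))         # 24
print(header.hex(" ", 4))  # d4c3b2a1 02000400 00000000 00000000 ffff0000 01000000
```

Comparing a suspect header against this known-good byte string, field by field, is a quick way to spot which field was damaged.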

Packet Header fields (per packet, 16 bytes):
- Bytes 0–3: Timestamp seconds (UNIX epoch)
- Bytes 4–7: Timestamp microseconds or nanoseconds
- Bytes 8–11: Captured length (incl_len), the bytes stored in the file
- Bytes 12–15: Original length (orig_len), the bytes on the wire
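
The record header can be decoded the same way. The following is a minimal sketch, assuming the byte order and timestamp resolution have already been determined from the magic number; parse_record_header and its parameters are hypothetical names chosen for illustration.

```python
import struct
from datetime import datetime, timezone

def parse_record_header(raw: bytes, endian: str = "<", nanos: bool = False) -> dict:
    """Decode one 16-byte pcap packet record header."""
    if len(raw) < 16:
        raise ValueError(f"need 16 bytes, got {len(raw)}")
    ts_sec, ts_frac, incl_len, orig_len = struct.unpack_from(f"{endian}IIII", raw, 0)
    divisor = 1_000_000_000 if nanos else 1_000_000
    return {
        "time_utc": datetime.fromtimestamp(ts_sec, tz=timezone.utc).isoformat(),
        "fraction": ts_frac / divisor,   # sub-second part, in seconds
        "incl_len": incl_len,            # bytes stored in the file
        "orig_len": orig_len,            # bytes seen on the wire
        "sliced":   incl_len < orig_len, # True when snaplen cut the packet short
    }

# Synthetic record header: 60 bytes captured of a 60-byte frame
sample = struct.pack("<IIII", 1_700_000_000, 250_000, 60, 60)
print(parse_record_header(sample))
```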

Common corruption types: wrong magic number (the file is identified as the wrong type), snaplen set to 0 (some readers may reject the file), incl_len larger than orig_len or larger than snaplen (invalid), truncated packet record (the file ends mid-packet), and wrong link type (packets are dissected as the wrong layer-2 protocol).
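
Several of these conditions can be detected by walking the records in order and stopping at the first inconsistency. The sketch below is one way to do that on a file already read into memory; first_bad_record is a hypothetical helper, and the 262144-byte fallback used when snaplen is 0 is an assumption, not a value taken from the file.

```python
import struct

LITTLE = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")

def first_bad_record(data: bytes) -> dict:
    """Walk packet records and report where the file stops being consistent."""
    def report(good, offset, problem):
        return {"good_records": good, "offset": offset, "problem": problem}

    if data[:4] in LITTLE:
        endian = "<"
    elif data[:4] in BIG:
        endian = ">"
    else:
        return report(0, 0, "unknown magic")
    if len(data) < 24:
        return report(0, 0, "truncated global header")

    snaplen = struct.unpack_from(f"{endian}I", data, 16)[0]
    limit = snaplen or 262144  # assumption: generous cap when snaplen is 0
    offset, good = 24, 0
    while offset < len(data):
        if offset + 16 > len(data):
            return report(good, offset, "truncated packet header")
        _, _, incl_len, orig_len = struct.unpack_from(f"{endian}IIII", data, offset)
        if incl_len > orig_len:
            return report(good, offset, "incl_len > orig_len")
        if incl_len > limit:
            return report(good, offset, "incl_len exceeds snaplen")
        if offset + 16 + incl_len > len(data):
            return report(good, offset, "truncated packet data")
        offset += 16 + incl_len
        good += 1
    return report(good, offset, None)

# Synthetic capture: one complete record followed by one cut off mid-packet
hdr = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
rec = struct.pack("<IIII", 1, 0, 4, 4) + b"abcd"
print(first_bad_record(hdr + rec + rec[:18]))
```

The reported offset marks the end of the last consistent record, so everything before it can be kept as-is while repair effort concentrates on the bytes after it.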

Technical Deep-Dive

# Step 1: Inspect the PCAP global header with xxd
xxd -l 24 corrupted.pcap
# Expected output for a standard little-endian pcap:
# 00000000: d4c3 b2a1 0200 0400 0000 0000 0000 0000  ................
# 00000010: ffff 0000 0100 0000                      ........
# Magic: d4c3b2a1, Version: 2.4, Snaplen: 65535, Link: Ethernet(1)

# Step 2: Check file validity with capinfos
capinfos corrupted.pcap 2>&1

# Step 3: Attempt repair with pcapfix
pcapfix -d corrupted.pcap -o repaired.pcap
# -d enables deep inspection mode (slower but recovers more packets)

# Step 4: Validate the repaired file
tshark -r repaired.pcap -q 2>&1 | head -20
capinfos repaired.pcap

# Step 5: Count successfully recovered packets
tshark -r repaired.pcap -q -z io,stat,0 2>&1

# Step 6: Use editcap to split a large corrupted capture into segments
# (editcap reads records in order, so this preserves the intact packets that
# precede the damaged region even when whole-file repair fails)
editcap -c 1000 corrupted.pcap segment.pcap
# Produces segment_00000_<timestamp>.pcap, segment_00001_<timestamp>.pcap, etc.
# Test each segment with tshark separately

# Step 7: For pcapng format corruption, use the pcapng-specific pcapfix mode
pcapfix --pcapng corrupted.pcapng -o repaired.pcapng

import struct

# On-disk byte sequences for the four classic pcap magic values
PCAP_MAGIC_LE    = b"\xd4\xc3\xb2\xa1"  # little-endian, microseconds (most common)
PCAP_MAGIC_BE    = b"\xa1\xb2\xc3\xd4"  # big-endian, microseconds
PCAP_MAGIC_NS_LE = b"\x4d\x3c\xb2\xa1"  # little-endian, nanoseconds
PCAP_MAGIC_NS_BE = b"\xa1\xb2\x3c\x4d"  # big-endian, nanoseconds

def inspect_pcap_header(path: str) -> dict:
    """Read and validate a PCAP global header."""
    with open(path, "rb") as fh:
        raw = fh.read(24)

    if len(raw) < 24:
        return {"error": f"File too short for global header ({len(raw)} bytes)"}

    # The magic number selects both the byte order and the timestamp resolution
    magic = raw[:4]
    if magic == PCAP_MAGIC_LE:
        endian, ts = "<", "microseconds"
    elif magic == PCAP_MAGIC_BE:
        endian, ts = ">", "microseconds"
    elif magic == PCAP_MAGIC_NS_LE:
        endian, ts = "<", "nanoseconds"
    elif magic == PCAP_MAGIC_NS_BE:
        endian, ts = ">", "nanoseconds"
    else:
        return {"error": f"Unknown magic: {magic.hex()} (not a recognised pcap magic)"}

    # thiszone (GMT offset) is a signed 32-bit field, hence "i" rather than "I"
    major, minor, gmt_off, ts_acc, snaplen, link_type = struct.unpack_from(
        f"{endian}HHiIII", raw, 4
    )
    return {
        "magic":      magic.hex(),
        "endian":     "little" if endian == "<" else "big",
        "version":    f"{major}.{minor}",
        "timestamps": ts,
        "snaplen":    snaplen,
        "link_type":  link_type,
        "valid":      snaplen > 0 and link_type in (1, 113, 228, 127),
    }

def patch_snaplen(path: str, output: str, new_snaplen: int = 65535) -> None:
    """Fix a PCAP with snaplen=0 by writing a corrected global header."""
    with open(path, "rb") as fh:
        data = bytearray(fh.read())

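    magic = bytes(data[:4])
    # Both little-endian magics (microsecond and nanosecond) need "<";
    # testing only PCAP_MAGIC_LE would patch nanosecond captures as big-endian.
    endian = "<" if magic in (PCAP_MAGIC_LE, PCAP_MAGIC_NS_LE) else ">"
    struct.pack_into(f"{endian}I", data, 16, new_snaplen)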

    with open(output, "wb") as fh:
        fh.write(data)
    print(f"Patched snaplen to {new_snaplen} -> {output}")

info = inspect_pcap_header("corrupted.pcap")
print(info)
if not info.get("valid") and info.get("snaplen") == 0:
    patch_snaplen("corrupted.pcap", "patched.pcap")
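
The same approach extends to the other global-header fields, such as a wrong link type. The sketch below generalises the snaplen patch into a table of field offsets; patch_header_field and HEADER_FIELDS are hypothetical names. It works on an in-memory copy so the original evidence bytes are never modified.

```python
import struct

# (offset, struct code) for each patchable global-header field
HEADER_FIELDS = {
    "version_major": (4, "H"),
    "version_minor": (6, "H"),
    "thiszone":      (8, "i"),
    "sigfigs":       (12, "I"),
    "snaplen":       (16, "I"),
    "link_type":     (20, "I"),
}
LITTLE_ENDIAN_MAGICS = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
BIG_ENDIAN_MAGICS = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")

def patch_header_field(data: bytes, field: str, value: int) -> bytes:
    """Return a copy of a pcap image with one global-header field rewritten."""
    if len(data) < 24:
        raise ValueError("global header is truncated")
    magic = data[:4]
    if magic in LITTLE_ENDIAN_MAGICS:
        endian = "<"
    elif magic in BIG_ENDIAN_MAGICS:
        endian = ">"
    else:
        raise ValueError("unknown magic; repair bytes 0-3 first")
    offset, code = HEADER_FIELDS[field]
    patched = bytearray(data)  # copy, so the input stays untouched
    struct.pack_into(endian + code, patched, offset, value)
    return bytes(patched)

# Synthetic header with snaplen zeroed out, then repaired
damaged = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 0, 1)
repaired = patch_header_field(damaged, "snaplen", 65535)
print(repaired[16:20].hex())  # ffff0000
```

Refusing to patch when the magic is unrecognised is deliberate: without a trusted magic there is no way to know which byte order the new value should be written in.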

Analytical Methodology

Before opening the file in Wireshark, inspect the first 24 bytes with xxd -l 24 corrupted.pcap. Verify the magic number matches one of the four known libpcap magic values. An incorrect magic number means the file is not being read as the correct format — it may be a pcapng file, a different capture format, or genuine corruption.
Run capinfos corrupted.pcap to get a summary of file statistics. If capinfos reports errors, it will indicate the type of corruption: truncated global header, invalid packet header, snaplen of 0, or unknown link type.
Run pcapfix -d corrupted.pcap -o repaired.pcap. Classic pcap records carry no signature of their own, so pcapfix locates valid packet records within the corrupt file by checking candidate packet headers for plausibility (timestamps and lengths). The -d (deep scan) mode is slower but recovers more packets from heavily corrupted files.
Validate the repaired file with tshark -r repaired.pcap -q -z io,stat,0. This reads the entire file and reports the packet count and time range. Compare against expected values if known.
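If pcapfix fails on the full file, use editcap -c 1000 to split the original into 1000-packet segments and test each segment independently. Corruption often affects one region of the file. editcap reads records in order and typically stops at the first one it cannot parse, so this preserves the intact region before the damage; packets beyond the damaged region usually still need pcapfix or manual carving.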
For snaplen = 0 corruption (sometimes seen after an improperly terminated capture), use the Python patch_snaplen() function above to write a corrected snaplen value directly to bytes 16–19 of the global header. Use the capture's original snaplen if it is known; 65535 is a reasonable default when it is not.
After recovery, run tshark -r repaired.pcap -q -z io,phs to get a protocol hierarchy summary. This confirms which protocol layers are intact and gives an early overview of recovered traffic content.
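
When neither pcapfix nor editcap reaches the packets beyond a damaged region, a manual carve can follow the same plausibility idea. The sketch below is a simplified illustration and not pcapfix's actual algorithm: it slides forward one byte at a time until a header passes basic checks. carve_records, ts_min, and ts_max are hypothetical names; narrowing the timestamp window to the known capture period is what keeps false positives down.

```python
import struct

def carve_records(data: bytes, endian: str = "<", snaplen: int = 65535,
                  ts_min: int = 0, ts_max: int = 2**32 - 1) -> list:
    """Return (offset, record_bytes) for every plausible packet record."""
    def record_size(pos: int):
        """Total record size at pos, or None if the header is implausible."""
        if pos + 16 > len(data):
            return None
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack_from(
            f"{endian}IIII", data, pos)
        plausible = (ts_min <= ts_sec <= ts_max
                     and ts_frac < 1_000_000_000
                     and 0 < incl_len <= orig_len
                     and incl_len <= snaplen
                     and pos + 16 + incl_len <= len(data))
        return 16 + incl_len if plausible else None

    records, pos = [], 24  # skip the 24-byte global header
    while pos + 16 <= len(data):
        size = record_size(pos)
        if size is None:
            pos += 1  # resynchronise: slide forward one byte and retry
            continue
        records.append((pos, data[pos:pos + size]))
        pos += size
    return records

# Synthetic capture: good record, 7 bytes of junk, good record
hdr = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
rec = struct.pack("<IIII", 1_700_000_000, 0, 4, 4) + b"abcd"
image = hdr + rec + b"\xff" * 7 + rec
found = carve_records(image, ts_min=1_699_999_990, ts_max=1_700_000_010)
rebuilt = hdr + b"".join(body for _, body in found)
print([offset for offset, _ in found])  # [24, 51]
```

Carved output should be validated with tshark like any other repaired file, and the offsets of skipped regions are worth recording in the case notes.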

Common Analytical Errors

Opening the file in Wireshark before hex inspection: Wireshark's error messages for corrupted files are generic. xxd on the header immediately reveals the specific byte-level corruption type, enabling targeted repair rather than trial-and-error.
Confusing pcap and pcapng formats: pcapng files have a different block structure and begin with the Section Header Block type 0x0a0d0d0a rather than a pcap magic number. Treating a pcapng file as classic pcap will fail. pcapfix normally detects the format on its own, but when the first block is damaged, pass the --pcapng flag to force pcapng handling.
Discarding partially recovered files: A repaired PCAP missing 20% of packets is still forensically valuable. Never discard a partial recovery — even 100 packets from a 10,000-packet capture may contain the specific session or payload that answers the investigation's key question.
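Assuming byte-swapped files are corrupted: A PCAP whose first bytes read d4c3b2a1 was written by a little-endian host, and one that reads a1b2c3d4 was written by a big-endian host. Both are valid. In a raw hex dump the header fields of the unfamiliar byte order look reversed, but Wireshark, tshark, and capinfos handle byte order automatically; this is not corruption.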
Skipping the editcap segmentation approach: Large corrupted PCAPs often have corruption concentrated in one region. Analysts who only attempt whole-file repair can miss the chance to save the clean segments that precede the damage with editcap, and then to focus pcapfix or manual carving on the remainder.
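
Two of the errors above, format confusion and mistaking byte order for corruption, can be ruled out from the first 12 bytes of the file. The sketch below is a minimal classifier; identify_capture is a hypothetical helper, and it covers only classic pcap and pcapng, not other capture formats.

```python
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "pcap, little-endian, microsecond timestamps",
    b"\xa1\xb2\xc3\xd4": "pcap, big-endian, microsecond timestamps",
    b"\x4d\x3c\xb2\xa1": "pcap, little-endian, nanosecond timestamps",
    b"\xa1\xb2\x3c\x4d": "pcap, big-endian, nanosecond timestamps",
}
# Section Header Block type; the bytes are a palindrome, so byte order
# is not visible here and must be read from the byte-order magic instead.
PCAPNG_BLOCK_TYPE = b"\x0a\x0d\x0d\x0a"
# pcapng byte-order magic 0x1A2B3C4D, stored at bytes 8-11 of the block
PCAPNG_BYTE_ORDER = {
    b"\x4d\x3c\x2b\x1a": "little-endian",
    b"\x1a\x2b\x3c\x4d": "big-endian",
}

def identify_capture(first_bytes: bytes) -> str:
    """Classify a capture file from its first 12 bytes."""
    head = first_bytes[:4]
    if head in PCAP_MAGICS:
        return PCAP_MAGICS[head]
    if head == PCAPNG_BLOCK_TYPE:
        order = PCAPNG_BYTE_ORDER.get(first_bytes[8:12], "unknown byte order")
        return f"pcapng, {order} section"
    return "unknown: damaged header or a different capture format"

print(identify_capture(b"\xd4\xc3\xb2\xa1\x02\x00\x04\x00"))
print(identify_capture(b"\x0a\x0d\x0d\x0a\x1c\x00\x00\x00\x4d\x3c\x2b\x1a"))
```

In practice the input would come from reading the first 12 bytes of the file in binary mode before any repair tool is chosen.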

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0046	Knowledge of intrusion detection systems and methodologies	Understanding how PCAP capture integrity supports forensic validation of IDS alerts and traffic anomaly investigations
K0093	Knowledge of network protocols	Understanding libpcap PCAP file format structure: global header, packet record headers, timestamp encoding, link type values
K0221	Knowledge of OSI model and network layers	PCAP captures packets at any OSI layer depending on link type; understanding how snaplen and link type determine which layer-2/3/4 data is preserved in the capture
S0046	Skill in performing packet-level analysis	Using pcapfix, editcap, capinfos, xxd, and Python struct parsing to diagnose and repair corrupted PCAP files and validate recovered packet data
T0023	Collect intrusion artifacts for use in forensic analysis	Recovering and validating network traffic evidence from corrupted capture files, documenting recovery methodology and recovered packet counts for chain-of-custody purposes