Reconstructing FTP Data Exfiltration via Passive Mode Analysis and TCP Stream Extraction

network_forensics_pcap Difficulty 1–5 30 min certifiable

Theory

Why This Matters

FTP remains active in industrial and legacy enterprise environments where file transfer automation was built decades ago and never modernised. During the 2014 Sony Pictures breach, attackers exfiltrated terabytes of data using FTP sessions captured in post-incident PCAP from network taps. Investigators reconstructed the exact files transferred — including executive salary spreadsheets and unreleased films — directly from the packet capture using Wireshark's follow-stream capability. FTP carries credentials and file contents in cleartext, making PCAP-based reconstruction one of the most straightforward forensic tasks — if the analyst understands the control/data channel architecture.

Core Concept

FTP uses a two-channel architecture. The control channel (TCP port 21) carries commands and responses in cleartext ASCII. The data channel carries raw file contents and is established separately for each transfer, using one of two modes:

In active mode, the client sends a PORT command specifying the IP and port it is listening on. The server then initiates a TCP connection from port 20 to the client's specified address. Firewalls and NAT commonly break active mode.

In passive mode, the client sends a PASV command; the server responds with an IP and port for the client to connect to. The client initiates the data connection. Passive mode is standard in modern FTP clients.

Both modes produce an FTP control stream and a separate data TCP stream. In a PCAP, the analyst must follow both streams and correlate them by the IP:port pair negotiated in PORT/PASV. The data channel stream contains the raw file bytes — no FTP framing, no headers — making extraction straightforward.
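The endpoint negotiation described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of any tool: the helper name parse_endpoint and the sample addresses are hypothetical, and the sketch assumes the h1,h2,h3,h4,p1,p2 encoding shared by the PORT command and the 227 reply, where port = p1 × 256 + p2.

```python
import re

# Six comma-separated decimal numbers: h1,h2,h3,h4,p1,p2
_HOSTPORT = re.compile(r"(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})")


def parse_endpoint(line):
    """Return (ip, port) from a PORT command or a 227 reply, else None.

    Hypothetical helper for illustration only.
    """
    text = line.strip()
    if not (text.upper().startswith("PORT ") or text.startswith("227")):
        return None
    match = _HOSTPORT.search(text)
    if match is None:
        return None
    nums = [int(group) for group in match.groups()]
    if any(n > 255 for n in nums):
        return None  # every field must fit in one byte
    ip = ".".join(str(n) for n in nums[:4])
    port = nums[4] * 256 + nums[5]
    return ip, port


# Sample control-channel lines (made-up addresses)
print(parse_endpoint("227 Entering Passive Mode (192,168,1,10,195,149)"))
print(parse_endpoint("PORT 10,0,0,5,4,1"))
```

The returned pair is the address to filter on when looking for the matching data stream: in passive mode it is the server-side listener, in active mode the client-side listener.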

Technical Deep-Dive

# Display all FTP control and data traffic
tshark -r capture.pcap -Y "ftp || ftp-data" \
  -T fields -e frame.number -e ip.src -e ip.dst \
  -e tcp.srcport -e tcp.dstport -e ftp.request.command \
  -e ftp.response.code -e ftp.response.arg \
  -E header=y -E separator=","

# Extract FTP credentials from control channel
# (single quotes outside so the inner double quotes reach tshark intact)
tshark -r capture.pcap -Y 'ftp.request.command == "USER" || ftp.request.command == "PASS"' \
  -T fields -e ip.src -e ftp.request.command -e ftp.request.arg

# Identify PASV responses to find data channel IP:port pairs
tshark -r capture.pcap -Y "ftp.response.code == 227" \
  -T fields -e frame.number -e ip.src -e ftp.response.arg

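# Extract all FTP-DATA payloads to a file (raw bytes).
# This concatenates every data stream in the capture, so it is only
# valid when the capture holds a single transfer.
# tr strips the colon separators that some tshark versions print.
tshark -r capture.pcap -Y "ftp-data" \
  -T fields -e tcp.payload | tr -d ':' | xxd -r -p > recovered_file.bin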

# Alternatively: use tshark to follow a specific TCP stream by index
# First find the stream index for the data connection:
tshark -r capture.pcap -Y "ftp-data" -T fields -e tcp.stream | sort -u

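# Then extract that stream's raw payload. Keep only the hex lines:
# the follow output also prints Follow/Filter/Node header lines.
tshark -r capture.pcap -q \
  -z "follow,tcp,raw,<stream_index>" \
  | grep -E '^[[:space:]]*[0-9a-fA-F]+$' | xxd -r -p > recovered_file.bin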

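from collections import defaultdict

from scapy.all import IP, TCP, Raw, rdpcap

packets = rdpcap("capture.pcap")

# This script does not parse PASV/PORT; it dumps every large TCP stream
# so the analyst can match the output against the negotiated IP:port pairs.
# Payloads are grouped by (src, sport, dst, dport) 4-tuple, one entry per direction.
# Segments are keyed by sequence number so retransmissions are dropped and
# out-of-order segments are restored to order before reassembly.
# Overlapping segments and sequence-number wraparound are not handled.
segments = defaultdict(dict)
for pkt in packets:
    # IPv4 only: packets without an IP layer are skipped
    if pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw):
        key = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
        segments[key].setdefault(pkt[TCP].seq, bytes(pkt[Raw].load))

# Dump streams larger than 1 KB as candidate transferred files
for key, by_seq in segments.items():
    data = b"".join(by_seq[seq] for seq in sorted(by_seq))
    if len(data) > 1024:
        fname = f"stream_{key[0]}_{key[1]}_{key[2]}_{key[3]}.bin"
        with open(fname, "wb") as fh:
            fh.write(data)
        print(f"Wrote {len(data):,d} bytes -> {fname}")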

Analytical Methodology

1. Apply the display filter ftp || ftp-data in Wireshark. Review the control channel first: identify the sequence of USER, PASS, TYPE, PASV or PORT, RETR (download), and STOR (upload) commands. This establishes the timeline and the names of the transferred files.
2. For each PASV response (code 227), note the IP and port encoded in the response argument (format: h1,h2,h3,h4,p1,p2, where port = p1 × 256 + p2). For each PORT command, note the client-specified IP:port, which uses the same encoding.
3. Locate the corresponding data channel TCP stream by filtering on the negotiated IP:port pair (ip.addr == <ip> && tcp.port == <port>). Right-click any packet in the data stream → Follow → TCP Stream → Show data as Raw → Save As to extract the file bytes.
4. Identify the transferred file type using file recovered_file.bin or by inspecting the magic bytes (first 4–8 bytes). Common signatures: 50 4B 03 04, "PK" (ZIP); 1F 8B (gzip); 25 50 44 46, "%PDF" (PDF).
5. For multi-file transfers, repeat steps 2–4 for each RETR/STOR command. The file name is given in the RETR/STOR argument in the control channel.
6. Hash all recovered files with SHA-256 and document: file name from the FTP argument, file size in bytes, SHA-256 hash, timestamp of the first data packet, and source/destination IP:port.
7. If the data connection is spread across many TCP segments, verify reassembly: the recovered file size should match the byte count in the FTP SIZE response (code 213) or, where the server includes one, the byte count in the 150 or 226 reply.
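The identification, hashing, and size-check steps above could be combined into one documentation record per recovered file. The following is a minimal sketch under stated assumptions: the names MAGIC, identify, and evidence_record are hypothetical, the signature table covers only the three types mentioned in the methodology, and the sample bytes are synthetic. Timestamps and IP:port pairs are left out because they come from the capture, not from the file.

```python
import hashlib

# A few well-known file signatures (magic bytes); extend as needed.
MAGIC = {
    b"PK\x03\x04": "ZIP",
    b"\x1f\x8b": "gzip",
    b"%PDF": "PDF",
}


def identify(data):
    """Return a label for the first matching signature, or 'unknown'."""
    for signature, label in MAGIC.items():
        if data.startswith(signature):
            return label
    return "unknown"


def evidence_record(name, data, expected_size=None):
    """Build a documentation record for one recovered file.

    expected_size is the byte count seen on the control channel
    (for example in a SIZE reply), if one was observed.
    """
    return {
        "file_name": name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "type": identify(data),
        # None means no size was available to compare against
        "complete": None if expected_size is None else len(data) == expected_size,
    }


# Synthetic stand-in for bytes saved from a data channel stream
sample = b"%PDF-1.7\n" + b"\x00" * 100
print(evidence_record("report.pdf", sample, expected_size=109))
```

A record whose complete field is False should be kept and labelled as partial evidence, not discarded.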

Common Analytical Errors

Confusing control and data channel streams: The file bytes are only in the data channel stream. The control channel contains ASCII commands and responses. Extracting the control stream yields no file content.
Incorrectly computing the PASV port: The port is encoded as two decimal bytes (p1, p2). Port = p1 × 256 + p2. For example, (195,149) gives 195 × 256 + 149 = 50069. Misreading this is a common mistake that causes the analyst to follow the wrong stream.
Overlooking active mode in legacy captures: Modern FTP clients use PASV, but legacy industrial and embedded clients often use PORT (active mode). Check for PORT commands — the data connection will come from port 20 on the server, not from an ephemeral server port.
Missing partial transfers: A transfer interrupted by connection reset may produce an incomplete file. The size of the extracted data will differ from the SIZE response. Flag truncated files as partial evidence rather than discarding them.
Ignoring the TYPE command: FTP supports ASCII (TYPE A) and binary (TYPE I) modes. ASCII mode translates newline characters during transfer, corrupting binary files. If the extracted file appears corrupt, check whether TYPE A was active during the transfer.
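Several of these errors can be avoided by walking the control channel in order and recording, for every RETR/STOR, which transfer type and which data endpoint were in force. The sketch below illustrates that idea only: the function names decode and correlate and the sample session are hypothetical, and the code assumes every command succeeded (server error replies are not checked). ASCII is treated as the starting type because it is the protocol default until a TYPE command changes it.

```python
import re

_HOSTPORT = re.compile(r"(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3}),(\d{1,3})")


def decode(arg):
    """Decode h1,h2,h3,h4,p1,p2 into (ip, port); None if absent."""
    match = _HOSTPORT.search(arg)
    if match is None:
        return None
    nums = [int(group) for group in match.groups()]
    return ".".join(str(n) for n in nums[:4]), nums[4] * 256 + nums[5]


def correlate(control_lines):
    """Pair each RETR/STOR with the transfer type and data endpoint in force.

    control_lines: control-channel lines in capture order, with client
    commands and server replies interleaved.
    """
    transfer_type = "A"  # ASCII until a TYPE command says otherwise
    mode, endpoint = None, None
    transfers = []
    for raw in control_lines:
        verb, _, arg = raw.strip().partition(" ")
        verb = verb.upper()
        if verb == "TYPE":
            transfer_type = arg.strip().upper()[:1]
        elif verb == "PORT":
            mode, endpoint = "active", decode(arg)
        elif verb == "227":
            mode, endpoint = "passive", decode(arg)
        elif verb in ("RETR", "STOR"):
            transfers.append({
                "command": verb,
                "file": arg.strip(),
                "type": transfer_type,
                "ascii_risk": transfer_type == "A",
                "mode": mode,
                "endpoint": endpoint,
            })
            mode, endpoint = None, None  # next transfer negotiates again
    return transfers


# Made-up session for illustration
session = [
    "TYPE I", "200 Type set to I",
    "PASV", "227 Entering Passive Mode (10,0,0,2,195,149)",
    "RETR payroll.zip", "150 Opening data connection", "226 Transfer complete",
    "TYPE A", "200 Type set to A",
    "PORT 10,0,0,9,4,1", "200 PORT command successful",
    "STOR notes.bin",
]
for transfer in correlate(session):
    print(transfer)
```

In this sample the second transfer is flagged with ascii_risk because TYPE A was active, and its mode is active, so the data connection would be expected from server port 20 to 10.0.0.9:1025.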

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0046	Knowledge of intrusion detection systems and methodologies	Recognising FTP exfiltration patterns that network IDS signatures detect: cleartext credentials, large data transfers, unusual PORT/PASV sequences
K0093	Knowledge of network protocols	Understanding FTP active/passive mode mechanics, control/data channel separation, and command/response code semantics
K0221	Knowledge of OSI model and network layers	FTP spans layers 4 and 7; understanding how TCP stream reassembly enables file recovery from application-layer protocol
S0046	Skill in performing packet-level analysis	Following TCP streams, extracting raw data channel payloads, and correlating control/data channel packets using Wireshark and tshark
T0023	Collect intrusion artifacts for use in forensic analysis	Recovering transferred files from PCAP with full chain of custody: file name, size, hash, timestamp, and session context