Reconstructing SMTP Email Sessions and Extracting Attachments from Network Traffic Captures

osint_collection Difficulty 1–5 30 min certifiable

Theory

Why This Matters

The 2016 DNC breach — attributed to APT28 — used spearphishing emails delivered via SMTP as the initial access vector, but the lateral-movement phase also involved exfiltrating harvested credentials and internal documents through the victim organisation's own SMTP relay. Post-incident forensic analysts recovered the full email content, attachment data, and destination addresses by replaying captured SMTP traffic. Corporate email accounts are among the most valuable exfiltration channels available to attackers: they are permitted outbound through almost every firewall, they blend with legitimate high-volume traffic, and they support large attachments that can carry compressed archives. Recognising and dissecting SMTP traffic in a PCAP — including extracting BASE64-encoded credentials and reconstructing MIME attachments — is a required skill for network forensic analysts.

Core Concept

SMTP (Simple Mail Transfer Protocol) operates by default on TCP port 25 (server-to-server relay), port 587 (mail submission from authenticated clients — MSA), and port 465 (SMTPS — SMTP wrapped in TLS). The protocol is command/response: the client issues a command, the server responds with a three-digit numeric code.

The standard session sequence is: EHLO/HELO (identify the client), optionally STARTTLS (upgrade to TLS), AUTH (authenticate), MAIL FROM (envelope sender), RCPT TO (one or more recipients), DATA (begin the message, which ends with a line containing only a single "."), and QUIT.
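
To make the envelope part of that sequence concrete, here is a minimal sketch that pulls the sender and recipients out of client command lines. The sample lines, the addresses, and the function name parse_envelope are illustrative assumptions, not taken from a real capture.

```python
import re

# Hypothetical client-side lines from a followed TCP stream (illustrative only).
CLIENT_LINES = [
    "EHLO workstation.corp.example",
    "MAIL FROM:<alice@corp.example> SIZE=1024",
    "RCPT TO:<bob@partner.example>",
    "RCPT TO:<carol@partner.example>",
    "DATA",
    "QUIT",
]

# Envelope addresses are written inside angle brackets.
ADDR = re.compile(r"<([^>]*)>")


def parse_envelope(lines):
    """Return (sender, recipients) taken from MAIL FROM / RCPT TO commands."""
    sender, recipients = None, []
    for line in lines:
        upper = line.strip().upper()
        match = ADDR.search(line)
        if upper.startswith("MAIL FROM:") and match:
            sender = match.group(1)
        elif upper.startswith("RCPT TO:") and match:
            recipients.append(match.group(1))
        elif upper == "DATA":
            break  # what follows DATA is message content, not commands
    return sender, recipients


sender, recipients = parse_envelope(CLIENT_LINES)
print("Envelope sender:", sender)
print("Envelope recipients:", recipients)
```

Stopping at DATA matters: header lines inside the message can look like commands, and the envelope should only be read from the command phase.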

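AUTH LOGIN sends the username and password as separate base64-encoded values in response to server challenges. AUTH PLAIN sends a single base64-encoded string of the form authzid\0username\0password, where \0 is a NUL byte and the authorization identity (authzid) is usually empty. Base64 is an encoding, not encryption, so both are trivially decoded.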
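
An AUTH PLAIN token decodes to three NUL-separated fields (authorization identity, username, password). The sketch below shows one way to split it; the sample credentials are made up and the helper name decode_auth_plain is an assumption for illustration.

```python
import base64


def decode_auth_plain(token: str):
    """Split a base64 AUTH PLAIN token into (authzid, username, password)."""
    raw = base64.b64decode(token, validate=True)
    parts = raw.split(b"\x00")
    if len(parts) != 3:
        raise ValueError("not a well-formed AUTH PLAIN token")
    return tuple(p.decode("utf-8", errors="replace") for p in parts)


# Build a sample token from made-up credentials instead of hard-coding one.
sample = base64.b64encode(b"\x00alice@corp.example\x00SuperSecret123").decode("ascii")
print(sample)
print(decode_auth_plain(sample))
```

The first field is usually empty, which is why a decoded token often appears to start with an unprintable character when viewed in a terminal.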

The DATA section carries the full RFC 5322 message: headers (From:, To:, Subject:, Date:, Content-Type:, Content-Transfer-Encoding:) followed by the body. MIME (Multipurpose Internet Mail Extensions) encodes attachments and HTML bodies as base64 or quoted-printable within the DATA section. Each MIME part is delimited by a boundary string specified in the Content-Type: multipart/... header.
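
The structure described above can be seen by building a small multipart message with Python's standard email package and parsing it back. This is a sketch with invented addresses, subject, and attachment bytes; it only illustrates how the boundary and Content-Transfer-Encoding appear, not how any particular mail client formats messages.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Construct an illustrative message with one binary attachment.
msg = EmailMessage()
msg["From"] = "alice@corp.example"
msg["To"] = "bob@partner.example"
msg["Subject"] = "Quarterly report"
msg.set_content("See attached.")
msg.add_attachment(
    b"PK\x03\x04 demo bytes",
    maintype="application",
    subtype="zip",
    filename="report.zip",
)

# These are the bytes that would travel inside the DATA section.
wire_bytes = msg.as_bytes()

parsed = message_from_bytes(wire_bytes, policy=policy.default)
boundary = parsed.get_boundary()
print("Top-level type:", parsed.get_content_type())
print("Boundary:", boundary)

# Collect (content type, transfer encoding, filename) for each leaf part.
parts = [
    (p.get_content_type(), str(p.get("Content-Transfer-Encoding", "")), p.get_filename())
    for p in parsed.walk()
    if not p.is_multipart()
]
for entry in parts:
    print(entry)
```

In a followed TCP stream, each part starts on a line consisting of two hyphens followed by the boundary string, which is what the analyst looks for when separating parts by hand.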

STARTTLS (on port 587 or 25) upgrades the connection to TLS mid-session. Commands and data before the 220 STARTTLS acknowledgment are plaintext; everything after the TLS handshake is encrypted.
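
Whether a session actually switched to TLS can be read from the server's reply to the STARTTLS command. The following sketch classifies a plaintext transcript; the function name starttls_outcome and the sample lines are assumptions for illustration, and a 220 reply only shows that the server agreed to start TLS, not that the handshake completed.

```python
def starttls_outcome(lines):
    """Classify a transcript as 'not attempted', 'accepted', 'refused',
    or 'no reply captured' based on the reply to the STARTTLS command."""
    for i, line in enumerate(lines):
        # Match the client command only, not the "250-STARTTLS" capability line.
        if line.strip().upper() == "STARTTLS":
            for reply in lines[i + 1:]:
                reply = reply.strip()
                if reply[:3].isdigit():
                    return "accepted" if reply.startswith("220") else "refused"
            return "no reply captured"
    return "not attempted"


# Illustrative transcript in which the server declines TLS.
transcript = [
    "EHLO workstation.corp.example",
    "250-mail.corp.example",
    "250-STARTTLS",
    "250 AUTH LOGIN PLAIN",
    "STARTTLS",
    "454 4.7.0 TLS not available",
    "AUTH LOGIN",
]
print(starttls_outcome(transcript))
```

A "refused" result followed by AUTH commands in the same stream means the credentials travelled in the clear.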

Technical Deep-Dive

# View all SMTP commands and responses
tshark -r capture.pcap -Y "smtp" -T fields \
  -e frame.number -e frame.time_relative -e ip.src -e ip.dst \
  -e smtp.req.command -e smtp.req.parameter \
  -e smtp.response.code -e smtp.rsp.parameter \
  -E header=y -E separator="|"

# Extract AUTH LOGIN credentials (base64 encoded)
tshark -r capture.pcap -Y "smtp" -T fields \
  -e frame.time_relative -e smtp.req.command -e smtp.req.parameter \
  | grep -E "AUTH|^[[:space:]]" | head -20
# Decode base64: echo "dXNlcm5hbWU=" | base64 -d

# List all RCPT TO addresses in the capture
# (single quotes outside so the inner double quotes reach tshark intact)
tshark -r capture.pcap \
  -Y 'smtp.req.command == "RCPT"' \
  -T fields -e frame.time_relative \
  -e ip.src -e smtp.req.parameter

# Follow an SMTP session to read full email including headers and body
# Identify stream number with: tshark -r capture.pcap -Y smtp -T fields -e tcp.stream
# Then follow (replace the trailing 0 with the stream number; -q suppresses per-packet output):
tshark -r capture.pcap -q -z "follow,tcp,ascii,0" 2>/dev/null | head -100

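# Detect STARTTLS negotiation
# (220 is also the greeting banner code, so read the 220 that follows the STARTTLS request)
tshark -r capture.pcap \
  -Y 'smtp.req.command == "STARTTLS" or smtp.response.code == 220' \
  -T fields -e frame.number -e frame.time_relative \
  -e ip.src -e smtp.req.command -e smtp.response.code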

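# Python: decode AUTH LOGIN credentials from captured SMTP stream
import base64
import binascii
import re

# Paste the raw ASCII from Follow TCP Stream here
smtp_transcript = """
250-AUTH LOGIN PLAIN
AUTH LOGIN
334 VXNlcm5hbWU6
YWxpY2VAY29ycC5jb20=
334 UGFzc3dvcmQ6
U3VwZXJTZWNyZXQxMjM=
235 2.7.0 Authentication successful
"""

# Server replies start with a three-digit code followed by a space or hyphen.
# Matching on the first digit alone would also discard base64 that begins with 2-5.
REPLY = re.compile(r"^\d{3}[ -]")
COMMAND = re.compile(r"^(AUTH|EHLO|HELO|MAIL|RCPT|DATA|QUIT)( |$)", re.IGNORECASE)

for line in smtp_transcript.strip().splitlines():
    line = line.strip()
    if not line or REPLY.match(line) or COMMAND.match(line):
        continue
    try:
        # validate=True rejects lines that are not clean base64
        decoded = base64.b64decode(line, validate=True).decode("utf-8", errors="replace")
    except (binascii.Error, ValueError):
        continue  # not base64, so not a credential line
    print(f"BASE64 {line!r} => {decoded!r}")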

# Python: extract MIME attachment from DATA section
import email
import os
from email import policy

with open("email_data.eml", "rb") as fh:
    msg = email.message_from_bytes(fh.read(), policy=policy.default)
print(f"From: {msg['from']}")
print(f"Subject: {msg['subject']}")
for index, part in enumerate(msg.walk()):
    if part.get_content_disposition() == "attachment":
        # The filename is sender-controlled: strip any path and supply a fallback
        fname = os.path.basename(part.get_filename() or "") or f"attachment_{index}.bin"
        data = part.get_payload(decode=True) or b""
        with open(fname, "wb") as out:
            out.write(data)
        print(f"Saved attachment: {fname} ({len(data)} bytes)")

Analytical Methodology

Open the PCAP in Wireshark and apply display filter smtp. Note the source IPs — legitimate mail relays typically have reverse DNS entries; an internal host sending SMTP directly is unusual and warrants investigation.
Apply filter smtp.req.command == "AUTH" to locate authentication frames. Right-click → Follow → TCP Stream to read the full AUTH exchange. Copy the base64-encoded strings and decode them with base64 -d or Python.
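Apply filter smtp.req.command == "RCPT" to list all envelope recipients. A single session with many RCPT TO: lines is worth flagging, although on its own it can equally indicate bulk or spam mail; weigh it together with payload size and the source host before treating it as exfiltration.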
Apply filter smtp.req.command == "DATA" to locate message body transmission. Right-click → Follow → TCP Stream to read the full message including headers and body. Look for Content-Transfer-Encoding: base64 in MIME headers — the body below is base64-encoded.
For each email with attachments: locate the MIME boundary string in the Content-Type: multipart/mixed; boundary="XXX" header. In the TCP stream view, identify each MIME part between boundary markers. Base64-decode each attachment payload and save with its filename.
Export reassembled emails using NetworkMiner: the Files tab extracts MIME attachments automatically; the Messages tab presents full email content. Both provide MD5 hashes suitable for forensic reporting.
Use tshark -z "follow,tcp,ascii,N" for programmatic transcript extraction. Pipe output to the Python MIME parser for automated attachment recovery across many sessions.
If STARTTLS is present, identify the exact frame where TLS handshake begins. All commands and data before that frame are plaintext and available for analysis; commands after are encrypted.
Correlate SMTP source IP, AUTH username, sender address, recipient addresses, and attachment filenames with other PCAP events and endpoint logs to build a complete exfiltration narrative.
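
The transcript-to-parser step in the workflow above can be sketched as follows. This assumes the input contains only the client-to-server bytes of one stream with CRLF line endings (a mixed two-direction transcript would need the server replies removed first); the function name extract_data_payloads and the sample session are illustrative.

```python
from email import message_from_bytes, policy


def extract_data_payloads(client_bytes: bytes):
    """Yield the raw bytes of each message sent after a DATA command."""
    in_data, current = False, []
    for line in client_bytes.split(b"\r\n"):
        if not in_data:
            if line.strip().upper() == b"DATA":
                in_data, current = True, []
        elif line == b".":
            # A line holding only "." ends the message.
            yield b"\r\n".join(current) + b"\r\n"
            in_data = False
        else:
            # Undo dot-stuffing: the client doubles a leading "." on the wire.
            current.append(line[1:] if line.startswith(b".") else line)


# Made-up client-side bytes for a single short message.
SAMPLE = (
    b"MAIL FROM:<alice@corp.example>\r\n"
    b"RCPT TO:<bob@partner.example>\r\n"
    b"DATA\r\n"
    b"From: alice@corp.example\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    b"line one\r\n"
    b"..hidden dot\r\n"
    b".\r\n"
    b"QUIT\r\n"
)

payloads = list(extract_data_payloads(SAMPLE))
messages = [message_from_bytes(p, policy=policy.default) for p in payloads]
for m in messages:
    print("Subject:", m["subject"])
```

Each yielded payload can be written to an .eml file or passed straight to the attachment-extraction code, which is how the manual Follow TCP Stream step scales to many sessions.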

Common Analytical Errors

Missing the base64 decode step for AUTH: Analysts sometimes note "AUTH LOGIN" frames without decoding the challenge-response exchange. With AUTH LOGIN and AUTH PLAIN the credentials are base64-encoded, which is an encoding rather than encryption; they must be decoded explicitly before they are useful as evidence.
Treating STARTTLS presence as proof of encryption: STARTTLS is opportunistic — it only encrypts if both parties agree. A capture showing EHLO and AUTH commands after a STARTTLS attempt that received 454 TLS not available means the session fell back to plaintext. Verify whether the TLS handshake actually completed.
Ignoring ports 587 and 465: Port 25 carries relay traffic; ports 587 and 465 carry authenticated client submissions, where credential harvesting is most valuable. A capture filtered only on port 25 misses exfiltration submitted through mail clients. Note that port 465 is TLS from the first byte, so its content is readable only if the session keys are available.
Not correlating envelope addresses with message headers: The SMTP envelope MAIL FROM / RCPT TO addresses can differ from the From: and To: headers inside the message. Attackers frequently use spoofed header addresses while the envelope reveals the true routing. Always record both.
Overlooking attachment MIME type vs filename extension: An attachment declared as Content-Type: image/jpeg with filename photo.jpg that decodes to a ZIP archive or PE executable is a strong indicator of deliberate obfuscation. Always decode and identify by magic bytes, not by declared MIME type.
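
The magic-byte check from the last item can be sketched as a small lookup. The signature table below covers only a handful of common formats and is an illustrative assumption, not a complete file-type database; the names sniff and is_mismatch are hypothetical.

```python
# (leading bytes, label) pairs for a few well-known file formats.
MAGIC = [
    (b"PK\x03\x04", "zip"),
    (b"MZ", "pe-executable"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]

# Declared MIME types this sketch knows how to verify.
DECLARED = {
    "image/jpeg": "jpeg",
    "image/png": "png",
    "application/zip": "zip",
    "application/pdf": "pdf",
}


def sniff(data: bytes):
    """Return a format label based on leading bytes, or None if unknown."""
    for signature, label in MAGIC:
        if data.startswith(signature):
            return label
    return None


def is_mismatch(declared_type: str, data: bytes) -> bool:
    """True when a verifiable declared type disagrees with the content."""
    expected = DECLARED.get(declared_type.lower())
    return expected is not None and sniff(data) != expected


# An attachment declared as a JPEG whose bytes start like a ZIP archive.
print(sniff(b"PK\x03\x04rest-of-archive"))
print(is_mismatch("image/jpeg", b"PK\x03\x04rest-of-archive"))
```

Declared types outside the table are reported as no mismatch, so a result of False means "not contradicted by this check", not "verified".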

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0046	Knowledge of intrusion detection systems and methodologies	Recognising SMTP exfiltration patterns flagged by DLP and IDS: many recipients, large DATA payloads, AUTH from unusual source IPs
K0093	Knowledge of network protocols	Understanding the full SMTP command sequence, AUTH LOGIN/PLAIN base64 encoding, MIME structure, and STARTTLS negotiation
K0221	Knowledge of OSI model and network layers	Situating SMTP at the application layer (7), identifying its TCP transport connections at layer 4, and understanding TLS encapsulation
S0046	Skill in performing packet-level analysis	Using Wireshark smtp filters, Follow TCP Stream, tshark field extraction, and Python email library to extract credentials and reconstruct attachments
T0023	Collect intrusion artifacts for use in forensic analysis	Recovering email bodies, decoded credentials, and MIME attachment files from SMTP traffic as forensic exhibits with preserved hashes