SMTP Inbox OSINT: Mail Content Analysis, Sender Tracing and Inbox-Based Identity Discovery

web_auth_sessions Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Email header forensics is a foundational skill for threat intelligence analysts, incident responders, and fraud investigators. Law enforcement agencies use header analysis to trace phishing campaigns back to their originating infrastructure, correlating X-Originating-IP values with known malicious hosting providers and threat actor infrastructure. Corporate security teams use Received: chain analysis to detect when an internal mail relay has been compromised and is relaying attacker-controlled email. Investigative journalists use SPF and DKIM analysis to show that an email claiming to come from a specific organization was actually sent through different infrastructure, which can be critical evidence in disinformation investigations. Bounce messages matter as well: DSN (Delivery Status Notification) messages returned by internal mail servers have revealed the internal hostname structures of major financial institutions in documented penetration testing engagements.

Core Concept

The Received: header chain is the core forensic artifact of email header analysis. Each mail server that handles a message prepends a new Received: header documenting the handoff: the receiving server's name, the sending server's name as self-reported in the SMTP EHLO/HELO command, the sending server's IP address as observed on the connection, the protocol, and the timestamp. Because each header is prepended, reading the chain from bottom to top traces the message's path from origin to destination. The bottom-most Received: header records the first server that accepted the message and is usually the most intelligence-rich: it shows the IP address that submitted the message, before any downstream relay. Treat it with care, though. Only headers added by servers you trust are reliable, and a sender can insert forged Received: lines below the first genuine one.
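The bottom-up reading described above can be sketched in a few lines of shell. This is a minimal illustration, not a full parser: the sample headers, hostnames, and IPs are invented, the helper names (unfold_headers, origin_hop, origin_ip) are hypothetical, and only bracketed IPv4 literals are recognized.

```shell
#!/bin/sh
# Hypothetical sample headers: every hostname and IP below is made up.
SAMPLE=$(cat <<'EOF'
Received: from mail.victim.example (mail.victim.example [192.0.2.10])
    by mx.recipient.example with ESMTPS; Tue, 05 Mar 2024 10:00:05 +0000
Received: from mx-out.esp.example (mx-out.esp.example [192.0.2.20])
    by mail.victim.example with ESMTPS; Tue, 05 Mar 2024 10:00:03 +0000
Received: from mail.attacker.example ([198.51.100.5])
    by mx-in.esp.example with ESMTP; Tue, 05 Mar 2024 10:00:01 +0000
From: Accounts <billing@victim.example>
Subject: Invoice

Body text is ignored.
EOF
)

# Join folded continuation lines and stop at the blank line that ends the headers.
unfold_headers() {
  awk '
    /^$/     { exit }
    /^[ \t]/ { sub(/^[ \t]+/, " "); buf = buf $0; next }
             { if (buf != "") print buf; buf = $0 }
    END      { if (buf != "") print buf }
  '
}

# The last Received: header in file order is the earliest hop.
origin_hop() {
  unfold_headers | grep -i '^Received:' | tail -n 1
}

# First bracketed IPv4 literal in that hop (IPv6 is not handled in this sketch).
origin_ip() {
  origin_hop | grep -oE '\[[0-9]{1,3}(\.[0-9]{1,3}){3}\]' | head -n 1 | tr -d '[]'
}

printf '%s\n' "$SAMPLE" | origin_hop
printf 'origin IP: %s\n' "$(printf '%s\n' "$SAMPLE" | origin_ip)"
```

Unfolding first matters because a plain grep on a folded header returns only its first line, which often omits the "by" clause and the timestamp.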

X-Originating-IP is a non-standard header added by some Microsoft Exchange and Outlook installations that records the IP address of the mail client that submitted the message to the Exchange server. Before organizations deployed outbound email proxies consistently, this header leaked the sender's public IP address — including home IP addresses for executives working remotely. This header is still present in emails from organizations that have not explicitly stripped it.

Message-ID is a unique identifier in the format <unique-string@domain>. The domain component is set by whichever system generates the ID, usually the sending mail client or the first mail server to handle the message, and it often reflects an internal mail server hostname (e.g., <[email protected]>). When the visible From: address uses a public domain but the Message-ID domain is an internal hostname, the header reveals internal infrastructure that is not otherwise publicly disclosed.
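As a rough sketch of that comparison, the snippet below pulls the domain from a Message-ID and from the From: address and reports how they relate. The two header values are invented for illustration, the function names are hypothetical, and single-line (unfolded) headers are assumed.

```shell
#!/bin/sh
# Hypothetical sample: both header values are invented for illustration.
SAMPLE=$(cat <<'EOF'
From: Press Office <press@company.example>
Message-ID: <20240305100001.4F2A1@mail-relay-dc01.internal.company.example>
EOF
)

# Domain part of the first Message-ID: everything after the last "@", minus ">".
msgid_domain() {
  grep -i '^Message-ID:' | head -n 1 | sed -e 's/.*@//' -e 's/>.*$//' | tr 'A-Z' 'a-z'
}

# Domain part of the first From: address (single-line headers assumed).
from_domain() {
  grep -i '^From:' | head -n 1 | sed -e 's/.*@//' -e 's/[> ].*$//' | tr 'A-Z' 'a-z'
}

compare_domains() {  # $1 = Message-ID domain, $2 = From: domain
  case "$1" in
    "$2")   echo "same domain" ;;
    *".$2") echo "hostname under the From: domain" ;;
    *)      echo "different domain" ;;
  esac
}

mid=$(printf '%s\n' "$SAMPLE" | msgid_domain)
from=$(printf '%s\n' "$SAMPLE" | from_domain)
printf '%s vs %s: %s\n' "$mid" "$from" "$(compare_domains "$mid" "$from")"
```

A "hostname under the From: domain" result is the case worth recording in an infrastructure map; a "different domain" result more often points to a third-party sender.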

X-Mailer and User-Agent headers identify the mail client or application that generated the email (e.g., Microsoft Outlook 16.0.16827, Thunderbird 115.4.0, PHPMailer 6.6.5, Python smtplib). This fingerprints the sender's mail client, which has threat intelligence implications — a phishing email claiming to be from a corporate communications team but generated by PHPMailer is an indicator of a scripted campaign.
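A first-pass triage of these values can be as simple as a keyword match. The sketch below is illustrative only: classify_mailer is a hypothetical helper, and its keyword lists cover just the clients named above, so an "unknown" result means nothing more than "not in this short list".

```shell
#!/bin/sh
# Rough triage of an X-Mailer / User-Agent value. The keyword lists are
# illustrative, taken from the clients named above, and far from complete.
classify_mailer() {
  case "$(printf '%s' "$1" | tr 'A-Z' 'a-z')" in
    "")                              echo "absent" ;;
    *phpmailer*|*smtplib*|*swaks*)   echo "programmatic" ;;
    *outlook*|*thunderbird*)         echo "desktop client" ;;
    *)                               echo "unknown" ;;
  esac
}

for ua in "Microsoft Outlook 16.0.16827" "PHPMailer 6.6.5" "Thunderbird 115.4.0" ""; do
  printf '%-30s -> %s\n' "${ua:-(none)}" "$(classify_mailer "$ua")"
done
```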

SPF record intelligence: An SPF record lists every IP range and hostname authorized to send mail for the domain. The ip4: and ip6: mechanisms directly disclose IP address ranges assigned to the organization. The include: mechanisms reveal third-party email providers. The exists: mechanism can reveal internal DNS zones. A fully enumerated SPF chain provides a near-complete map of an organization's outbound email infrastructure.
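Recursive enumeration can be sketched as a small recursive function. To keep the example runnable offline, lookup_txt below is a stand-in that returns invented records; for live use its body would be replaced with a dig call, as noted in the comment. The function names and sample domains are assumptions, and the a:, mx:, and exists: mechanisms are deliberately left out.

```shell
#!/bin/sh
# Offline stand-in for DNS so the sketch runs anywhere. The records are made up.
# For live use, replace the body with:
#   dig +short TXT "$1" | sed -e 's/" "//g' -e 's/"//g'
lookup_txt() {
  case "$1" in
    company.example)        echo 'v=spf1 ip4:203.0.113.0/24 include:_spf.esp.example include:_spf.mail.example ~all' ;;
    _spf.esp.example)       echo 'v=spf1 ip4:198.51.100.0/25 include:_netblocks.esp.example -all' ;;
    _netblocks.esp.example) echo 'v=spf1 ip4:198.51.100.128/25 -all' ;;
    _spf.mail.example)      echo 'v=spf1 ip6:2001:db8:1::/48 ~all' ;;
  esac
}

# Print "<domain> <mechanism>" for every ip4:/ip6: term, following include: and
# redirect=. The body runs in a subshell so each recursion level has its own variables.
spf_walk() (
  set -f                      # no filename globbing while splitting the record
  domain=$1
  depth=${2:-0}
  # RFC 7208 limits SPF evaluation to 10 DNS-querying terms; reused here as a loop guard.
  [ "$depth" -gt 10 ] && return 0
  record=$(lookup_txt "$domain" | grep -i '^v=spf1' | head -n 1)
  [ -n "$record" ] || return 0
  for term in $record; do
    term=${term#[+?~-]}       # drop an optional qualifier such as "+" or "?"
    case "$term" in
      ip4:*|ip6:*) echo "$domain $term" ;;
      include:*)   spf_walk "${term#include:}" $((depth + 1)) ;;
      redirect=*)  spf_walk "${term#redirect=}" $((depth + 1)) ;;
    esac
  done
)

spf_walk company.example
```

Printing the domain next to each range keeps the provenance of every netblock, which is useful when separating the organization's own ranges from those of its providers.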

DKIM analysis: The DKIM selector field (in the DKIM-Signature: header) names the signing key record. Selectors often encode the mail provider (google for G Suite, mandrill for Mandrill/Mailchimp), the key rotation cycle, or the deployment year. Multiple DKIM signatures on a single email indicate it transited multiple signing systems — each signature's d= (domain) and s= (selector) fields map additional infrastructure components.
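The d= and s= tags combine into the DNS name that holds the public key. The sketch below splits a signature header into tags and builds that name; the signature value is a made-up placeholder, dkim_tag is a hypothetical helper, and it assumes an already unfolded header with no whitespace around the "=" signs.

```shell
#!/bin/sh
# Hypothetical, already-unfolded signature header; bh= and b= are placeholders.
SIG='DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=company.example; s=google; h=from:to:subject; bh=PLACEHOLDER; b=PLACEHOLDER'

# Print the value of one tag ($1) from a DKIM-Signature header read on stdin.
dkim_tag() {
  sed 's/^[^:]*: *//' | tr ';' '\n' | sed -e 's/^ *//' -e 's/ *$//' \
    | grep "^$1=" | head -n 1 | sed "s/^$1=//"
}

domain=$(printf '%s\n' "$SIG" | dkim_tag d)
selector=$(printf '%s\n' "$SIG" | dkim_tag s)

# The public key lives at <selector>._domainkey.<domain>.
printf 'signing domain: %s\n' "$domain"
printf 'selector:       %s\n' "$selector"
printf 'key lookup:     dig +short TXT %s._domainkey.%s\n' "$selector" "$domain"
```

For a message with several signatures, running the same extraction on each DKIM-Signature header yields one (domain, selector) pair per signing system.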

Bounce message forensics: When an email cannot be delivered, a mail server generates a DSN (Delivery Status Notification) message, or "bounce", and returns it to the envelope sender. DSN messages contain the original email headers, the delivery failure reason, and often the internal hostname of the mail server that rejected the message. Sending test emails to non-existent addresses at a target domain (within authorized scope) and analyzing the resulting bounce can reveal internal mail relay hostnames that are not visible in external DNS.
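The machine-readable part of a DSN uses per-message and per-recipient fields such as Reporting-MTA, Remote-MTA, Status, and Diagnostic-Code (RFC 3464). A minimal sketch for pulling the hostnames out of that part follows; the sample field values are invented and dsn_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical message/delivery-status part of a bounce; all values are made up.
DSN=$(cat <<'EOF'
Reporting-MTA: dns; mail-relay-internal-02.company.example
Final-Recipient: rfc822; nobody@company.example
Action: failed
Status: 5.1.1
Remote-MTA: dns; mbx01.corp.company.example
Diagnostic-Code: smtp; 550 5.1.1 User unknown
EOF
)

# Hostnames named in the MTA fields: strip the field name and the "dns;" type.
dsn_hosts() {
  grep -iE '^(Reporting-MTA|Remote-MTA):' | sed -e 's/^[^;]*; *//' | sort -u
}

printf '%s\n' "$DSN" | dsn_hosts
printf '%s\n' "$DSN" | grep -iE '^(Status|Diagnostic-Code):'
```

The free-text Diagnostic-Code line is worth reading by hand as well, since its wording often identifies the MTA software.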

MX record history on SecurityTrails or RiskIQ reveals changes in mail provider over time — organization migrations from self-hosted to Google Workspace or Microsoft 365, temporary mail server changes, and legacy infrastructure that may still be reachable.

Technical Deep-Dive

# 1. Extract the key headers from a saved message (save the email as .eml first)
# Note: grep prints only the first line of a folded (multi-line) header
grep -iE "^(Received|X-Originating-IP|Message-ID|X-Mailer|DKIM-Signature|Authentication-Results|Return-Path|Reply-To):" suspicious_email.eml | head -40

# 2. Parse Received chain (bottom-up for origin tracing)
# Typical chain from a phishing email:
# Received: from mail.victim.com by mx.recipient.com (top — last hop)
# Received: from mx-out.sendgrid.com by mail.victim.com (middle — ESP)
# Received: from [198.51.100.5] (helo=mail.attacker.biz) by mx-in.sendgrid.com (bottom — origin)
# => True origin IP: 198.51.100.5

# 3. X-Originating-IP leak detection
grep -i "^X-Originating-IP:" email.eml
# X-Originating-IP: 203.0.113.42
# Reveals: sender's public IP address (home/office)

# 4. Message-ID domain extraction for internal hostname discovery
grep -i "^Message-ID:" email.eml
# Message-ID: <[email protected]>
# Internal hostname: mail-relay-dc01.internal.company.com

# 5. SPF record full enumeration (recursive resolution)
dig +short TXT company.com | grep -i "v=spf1"
# "v=spf1 ip4:203.0.113.0/24 include:sendgrid.net include:_spf.google.com ~all"

# Recursively resolve each include: (and any includes nested inside it):
dig +short TXT sendgrid.net | grep -i "v=spf1"
dig +short TXT _spf.google.com | grep -i "v=spf1"
# Combined result: full IP map of authorized senders

# 6. DKIM signature analysis
# -A 2 also prints the folded continuation lines that follow the header's first line
grep -i -A 2 "^DKIM-Signature:" email.eml
# DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
#   d=company.com; s=google; h=from:to:subject:...
# Selector: s=google => suggests G Suite / Google Workspace (selector names are chosen by the sender)
# Retrieve the DKIM public key:
dig +short TXT google._domainkey.company.com
# "v=DKIM1; k=rsa; p=MIIBIjANBgkq..." (2048-bit RSA public key)

# 7. Bounce message forensics (send to non-existent address)
# (Within authorized scope only)
swaks --to [email protected] --from [email protected] \
  --server mail.company.com
# Analyze the rejection (shown in the swaks transcript, or in the Diagnostic-Code field of a later DSN):
# Diagnostic-Code: smtp; 550 5.1.1 The email account that you tried to reach
#   does not exist. mail-relay-internal-02.company.com
# => Internal hostname revealed: mail-relay-internal-02.company.com

# 8. MX record history lookup
# SecurityTrails CLI (illustrative syntax; check it against the client you have installed):
securitytrails domain company.com history/dns --type MX
# Example finding: MX pointed at self-managed Postfix before migrating to Exchange Online

Intelligence Collection Methodology

Header collection: For emails of interest, retrieve the full raw headers. In Gmail: three-dot menu → "Show original". In Outlook: File → Properties → Internet headers. Save as .eml for command-line analysis. Use grep to extract the fields of interest.
Received: chain tracing: Read the chain from bottom to top. Identify the bottom-most Received: header for the originating IP and EHLO hostname. Map the relay path through any ESPs. Compare timestamps across hops: allowing for ordinary clock skew and time zone offsets, a hop stamped well before the hop below it suggests a forged or manipulated header.
Originating IP intelligence: Extract the originating IP from the bottom Received: header or X-Originating-IP. Look up the IP in Shodan for hosted services, BGP.he.net for ASN and organization, and threat intelligence platforms (VirusTotal, AbuseIPDB) for known malicious associations.
Mail client fingerprinting: Extract X-Mailer, User-Agent, and MIME-Version headers. Identify the mail client, version, and whether the email was generated programmatically (PHPMailer, smtplib, swaks) versus by a desktop client. Programmatic generation is normal for bulk and transactional mail, but it is a phishing indicator when the message claims to come from an individual writing in a desktop client.
SPF full enumeration: Query SPF TXT records with dig and recursively resolve all include: mechanisms. Compile the complete authorized IP ranges and third-party providers into an infrastructure map.
DKIM selector analysis: Extract the s= (selector) and d= (domain) from DKIM-Signature headers. Query SELECTOR._domainkey.DOMAIN via dig to retrieve the public key. Note the key length and the selector name for provider fingerprinting.
Bounce message collection: Within authorized scope, send emails to likely non-existent addresses at the target domain using swaks. Analyze DSN bounce messages for internal relay hostnames, error message formats (revealing MTA software), and postmaster addresses.
MX history correlation: Query SecurityTrails or RiskIQ MX history for the target domain. Note past mail providers and the transition dates. Cross-reference with breach timelines — compromises during a mail provider migration period may have exploited transitional infrastructure.
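The timestamp check from the Received: chain tracing step can be automated roughly as follows. This sketch assumes GNU date (for its -d option), already-unfolded Received: headers in message order (newest first), and an arbitrary default tolerance of 300 seconds; the function name and sample chains are hypothetical.

```shell
#!/bin/sh
# Hypothetical unfolded chains, newest hop first, as they appear in a message.
GOOD=$(cat <<'EOF'
Received: from mail.victim.example by mx.recipient.example with ESMTPS; Tue, 05 Mar 2024 10:00:05 +0000
Received: from mx-out.esp.example by mail.victim.example with ESMTPS; Tue, 05 Mar 2024 10:00:03 +0000
Received: from mail.attacker.example by mx-in.esp.example with ESMTP; Tue, 05 Mar 2024 10:00:01 +0000
EOF
)
FORGED=$(cat <<'EOF'
Received: from mail.victim.example by mx.recipient.example with ESMTPS; Tue, 05 Mar 2024 10:00:05 +0000
Received: from mx-out.esp.example by mail.victim.example with ESMTPS; Tue, 05 Mar 2024 10:00:03 +0000
Received: from mail.attacker.example by mx-in.esp.example with ESMTP; Tue, 05 Mar 2024 11:30:00 +0000
EOF
)

# Returns 0 if no hop is stamped more than $1 seconds later than the hop above it.
check_received_order() (
  skew=${1:-300}            # tolerated clock skew in seconds (arbitrary default)
  prev=""
  status=0
  while IFS= read -r line; do
    ts=${line##*;}          # the timestamp follows the last ";"
    ts=${ts# }
    epoch=$(date -u -d "$ts" +%s 2>/dev/null) || { echo "unparseable: $ts"; status=1; continue; }
    if [ -n "$prev" ] && [ "$epoch" -gt $((prev + skew)) ]; then
      echo "out of order: hop stamped $ts is later than the hop above it"
      status=1
    fi
    prev=$epoch
  done
  return $status
)

printf '%s\n' "$GOOD"   | check_received_order && echo "first chain: timestamps consistent"
printf '%s\n' "$FORGED" | check_received_order || echo "second chain: flagged for review"
```

A flagged chain is a prompt for manual review rather than proof of forgery, since a badly configured relay clock produces the same pattern.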

Common Intelligence Collection Errors

Reading the Received: chain top-down: The topmost Received: header is added by the recipient's mail server and reveals nothing about the origin. Analysts who read the chain top-down misidentify the recipient's server as the sender's server and misdirect subsequent investigation.
Trusting the EHLO hostname in Received: headers: The EHLO hostname in Received: headers is self-reported by the connecting server. Attackers can set EHLO to any arbitrary value (e.g., mail.legitimate-company.com). Always verify the reported hostname against the connecting IP via reverse DNS lookup before treating it as reliable intelligence.
Missing SPF recursive resolution: An SPF record that includes include:sendgrid.net does not directly disclose IPs — the actual IP ranges are in SendGrid's SPF record, which itself may include sub-includes. Analysts who enumerate only the first level of SPF miss the majority of the authorized IP surface.
Ignoring DKIM signature domain mismatch: When the d= field in DKIM-Signature differs from the From: address domain, the email was signed by a different organization's key — which may indicate a legitimate third-party signing service or a spoofing attempt. Always compare d= with From: domain.
Treating bounce messages as exclusively negative: A 550 User does not exist bounce from a hardened mail server is still intelligence — the error format, the MTA software name, and any hostnames in the diagnostic message reveal internal infrastructure. Even a well-configured rejection discloses some data.
Not accounting for Received header stripping by mail gateways: Some organizations configure outbound gateways (Mimecast, Proofpoint) to strip internal Received: headers before delivery, replacing them with a single gateway-added header. Analysts who expect a full chain may miss this stripping and incorrectly conclude that the gateway is the origin server.
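The d= versus From: comparison mentioned above can be approximated with plain suffix matching. This is a simplification: DMARC's relaxed alignment compares organizational domains using the Public Suffix List, which this sketch does not consult, so results for domains under multi-label suffixes (co.uk and similar) should be checked by hand. The function name and sample domains are hypothetical.

```shell
#!/bin/sh
# Compare a From: domain ($1) with a DKIM d= domain ($2) by suffix only.
# Runs in a subshell so the working variables do not leak into the caller.
alignment() (
  from=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  d=$(printf '%s' "$2" | tr 'A-Z' 'a-z')
  case "$from" in
    "$d")   echo "aligned (exact)" ;;
    *".$d") echo "aligned (From is a subdomain of d=)" ;;
    *) case "$d" in
         *".$from") echo "aligned (d= is a subdomain of From)" ;;
         *)         echo "unaligned" ;;
       esac ;;
  esac
)

alignment company.example      company.example
alignment news.company.example company.example
alignment company.example      mail.company.example
alignment company.example      esp.example
```

An "unaligned" result covers both cases named in the list: a legitimate third-party signing service and a spoofing attempt. The d= domain's reputation and the SPF result decide which is more likely.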

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0058	Knowledge of network protocols	Analyzing SMTP protocol mechanics: Received: header construction, EHLO handshake, DSN generation, and DKIM/SPF DNS record structures
K0145	Knowledge of security assessment approaches	Applying systematic email header forensics methodology from origin IP tracing through SPF enumeration, DKIM analysis, and bounce forensics
K0272	Knowledge of network security architecture	Mapping mail infrastructure topology: MTA chains, ESP relay paths, internal relay hostnames, and gateway configurations from header analysis
K0427	Knowledge of encryption algorithms	Analyzing DKIM signing algorithms and RSA key lengths, and TLS usage indicators (such as ESMTPS) in Received: headers
S0040	Skill in identifying and extracting data of interest	Extracting originating IPs, internal hostnames, mail client fingerprints, and infrastructure maps from email header artifacts
T0569	Apply and utilize authorized cyber capabilities to achieve objectives	Using dig, swaks, grep, and SecurityTrails to conduct email header intelligence collection and bounce forensics within authorized scope