Corporate OSINT Chain: WHOIS, Website and SMTP Enumeration for Targeted Intelligence Gathering

forensic_file_artifacts Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Email infrastructure is simultaneously a primary attack vector and a rich intelligence source. Phishing campaigns, BEC fraud, and credential harvesting operations all depend on accurate email infrastructure mapping: knowing which mail server software the target runs, what authentication mechanisms are enforced, and which specific accounts exist. Domain intelligence analysts and threat investigators routinely reconstruct email infrastructure from public records to assess whether a domain belongs to a legitimate organisation or is a freshly registered phishing site. Most of the indicators that distinguish a legitimate corporate domain from a malicious impersonation (registration age, nameserver consistency, MX record stability) are accessible through WHOIS and passive DNS queries, and the SMTP banner can be read with a single connection to the mail server, all before any email is sent. This card builds the skills to map an organisation's email infrastructure from public sources.

Core Concept

The chain proceeds from domain registration metadata to mail server identification to SMTP-level enumeration.

WHOIS analysis yields registrar identity, registration and expiry dates, nameservers, and registrant data. Legitimacy indicators include: registration age greater than two years (newly registered domains are a phishing signal), a consistent nameserver operator (matching the stated organisation's hosting relationship), and expiry dates sufficiently far in the future (organisations that care about their domain tend to renew well in advance). Nameserver analysis reveals the DNS hosting provider, which often hints at the broader infrastructure stack: AWS Route 53 nameservers suggest AWS-hosted services; Cloudflare nameservers suggest Cloudflare CDN and DDoS protection.
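The age and expiry checks above reduce to date arithmetic. The following is a minimal sketch, assuming GNU date (for the -d option) and dates already trimmed to YYYY-MM-DD form; the function names and the 60-day expiry window are illustrative assumptions, while the 730-day threshold mirrors the two-year indicator in the text.

```shell
#!/usr/bin/env bash
# Sketch only: days_between and assess_domain_dates are hypothetical helpers.

# Whole days between two YYYY-MM-DD dates (requires GNU date).
days_between() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( (end - start) / 86400 ))
}

# assess_domain_dates CREATED EXPIRES TODAY
# Flags domains younger than 730 days or expiring within 60 days
# (the 60-day window is an assumed threshold).
assess_domain_dates() {
  local created=$1 expires=$2 today=$3 age left flags="none"
  age=$(days_between "$created" "$today") || return 1
  left=$(days_between "$today" "$expires") || return 1
  if [ "$age" -lt 730 ]; then flags="young"; fi
  if [ "$left" -lt 60 ]; then
    if [ "$flags" = "none" ]; then flags="expiring"; else flags="$flags,expiring"; fi
  fi
  echo "age_days=$age days_left=$left flags=$flags"
}
```

For example, assess_domain_dates 2024-01-01 2025-01-01 2024-06-01 reports a 152-day-old domain with the young flag set.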

MX record discovery is the primary step for mail server identification. dig MX targetcorp.com returns the domain's mail exchangers, each with a preference value (lower values are tried first). MX hostname patterns reveal the email provider: a hostname ending in mail.protection.outlook.com indicates Microsoft Exchange Online (Office 365); aspmx.l.google.com indicates Google Workspace; a company-specific hostname (e.g., mail.targetcorp.com) suggests self-hosted Exchange or another on-premise solution. SPF and DKIM records (both published as TXT records) further confirm the authorised sending infrastructure and identify any third-party email service providers (Mailchimp, SendGrid, HubSpot) that handle marketing traffic.
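The hostname-to-provider mapping described above can be sketched as a small classifier. This is an illustration under stated assumptions: classify_mx and classify_mx_lines are hypothetical helper names, and only the patterns named in the text (plus Google's alternate aspmx hosts) are covered, so anything else falls through to a generic label.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for labelling MX hostnames.

# classify_mx HOSTNAME -> provider label
classify_mx() {
  local host
  host=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  host=${host%.}   # drop the trailing root dot that dig prints
  case "$host" in
    *.mail.protection.outlook.com) echo "Microsoft Exchange Online" ;;
    aspmx.l.google.com|*.aspmx.l.google.com) echo "Google Workspace" ;;
    *) echo "other/self-hosted" ;;
  esac
}

# Reads "preference hostname" lines (the format of dig +short) on stdin
# and prints: preference <TAB> hostname <TAB> provider label
classify_mx_lines() {
  local prio host
  while read -r prio host; do
    [ -n "$host" ] || continue
    printf '%s\t%s\t%s\n' "$prio" "${host%.}" "$(classify_mx "$host")"
  done
}
```

In use, the output of dig MX targetcorp.com +short would be piped into classify_mx_lines; the generic label still needs manual review, since it covers both self-hosted servers and providers the sketch does not know about.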

SMTP banner analysis connects to the mail server on port 25 or 587 and reads the server's greeting. The banner contains the software name and version (Postfix, Microsoft ESMTP, Exim) and sometimes the internal hostname. The EHLO response lists supported extensions — STARTTLS indicates encryption support; AUTH LOGIN PLAIN indicates authentication methods; the absence of STARTTLS on an internet-facing relay is a significant misconfiguration indicator.

Website-hosted email harvesting extracts addresses from contact pages, team pages, press releases, and embedded PDFs. exiftool extracts metadata (including Author field) from PDFs. Google dorks (site:targetcorp.com filetype:pdf) locate all indexed PDFs. strings and grep extract email patterns from downloaded documents.
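The grep-based extraction mentioned above can be wrapped in a reusable filter. A minimal sketch, assuming text arrives on stdin (for example from pdftotext or strings); extract_emails is a hypothetical helper name, and the pattern is a pragmatic approximation rather than a full address grammar.

```shell
#!/usr/bin/env bash
# Sketch only: extract_emails is a hypothetical helper.

# Reads arbitrary text on stdin and prints unique, lowercased
# email-like strings, one per line.
extract_emails() {
  grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' \
    | tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sort -u
}
```

Lowercasing before deduplication means Alice.Smith@TargetCorp.com and alice.smith@targetcorp.com collapse to one entry, which keeps later merging of sources simple.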

Technical Deep-Dive

# Step 1: WHOIS domain analysis
whois targetcorp.com | grep -E "Registrar:|Creation Date:|Updated Date:|Expiry Date:|Name Server:"
# Assess: age, nameserver consistency, upcoming expiry (< 30 days = risk)

# Step 2: MX record discovery and provider identification
dig MX targetcorp.com +short
# => 10 targetcorp-com.mail.protection.outlook.com.  (Office 365)
# => 1 aspmx.l.google.com.                           (Google Workspace)
# => 10 mail.targetcorp.com.                         (self-hosted)

# SPF record — reveals all authorised sending infrastructure
dig TXT targetcorp.com +short | grep "v=spf1"
# => "v=spf1 include:spf.protection.outlook.com include:sendgrid.net ~all"
# => Confirmed: Office 365 + SendGrid for marketing email

# DKIM selector discovery (common selectors)
for sel in selector1 selector2 google k1 mail dkim default; do
  result=$(dig TXT "${sel}._domainkey.targetcorp.com" +short 2>/dev/null)
  [ -n "$result" ] && echo "DKIM selector '${sel}': ${result:0:80}..."
done

# Step 3: SMTP banner analysis
# Port 25 (SMTP relay — for inbound delivery)
(echo "EHLO probe.local"; sleep 2; echo "QUIT") | nc -w 5 mail.targetcorp.com 25
# Port 587 (submission — for authenticated client sending)
(echo "EHLO probe.local"; sleep 2; echo "QUIT") | nc -w 5 mail.targetcorp.com 587

# Step 4: Harvest email addresses from website
# Crawl the site to depth 2, download PDFs, and extract metadata + embedded addresses
wget -q -r -l 2 -A pdf "https://www.targetcorp.com" -P /tmp/target_pdfs/ 2>/dev/null
find /tmp/target_pdfs/ -name "*.pdf" -exec exiftool -Author -Creator -Producer {} \;
# Extract email addresses from PDF text:
find /tmp/target_pdfs/ -name "*.pdf" -exec pdftotext {} - \; 2>/dev/null \
  | grep -oE "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" | sort -u

# Google dork for indexed contact information:
# site:targetcorp.com (contact OR "email us" OR "@targetcorp.com")
# site:targetcorp.com filetype:pdf

# Step 5: SMTP VRFY and RCPT TO enumeration
TARGET_SMTP="mail.targetcorp.com"
# VRFY attempt:
(echo "EHLO test"; sleep 1; echo "VRFY alice.smith"; sleep 1; echo "QUIT") \
  | nc -w 5 "${TARGET_SMTP}" 25 | grep -E "^[0-9]{3}"
# RCPT TO with harvested addresses:
# Count only final reply lines ("250 ", not the "250-" continuation lines of EHLO).
# EHLO, MAIL FROM and RCPT TO each end with one, so three means the recipient was accepted.
while IFS= read -r email; do
  response=$(printf 'EHLO x\r\nMAIL FROM:<[email protected]>\r\nRCPT TO:<%s>\r\nQUIT\r\n' "$email" \
    | nc -w 4 "${TARGET_SMTP}" 25 2>/dev/null | grep -c "^250 ")
  [ "$response" -ge 3 ] && echo "CONFIRMED: $email"
  sleep 2
done < harvested_emails.txt

# Sample SMTP EHLO response (self-hosted Exchange):
# 220 mail.targetcorp.com Microsoft ESMTP MAIL Service ready
# 250-mail.targetcorp.com Hello [203.0.113.1]
# 250-SIZE 36700160
# 250-PIPELINING
# 250-DSN
# 250-ENHANCEDSTATUSCODES
# 250-STARTTLS
# 250-X-ANONYMOUSTLS
# 250-AUTH NTLM
# 250-X-EXPS GSSAPI NTLM
# 250 OK
# Intelligence: Exchange Server (NTLM auth), STARTTLS supported, no AUTH PLAIN
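
The intelligence line above can be derived mechanically from a saved EHLO transcript. A minimal sketch, assuming the raw server response (without the leading "# " used in the sample) is supplied on stdin; parse_ehlo is a hypothetical helper name, and only STARTTLS and the AUTH mechanism list are extracted.

```shell
#!/usr/bin/env bash
# Sketch only: parse_ehlo is a hypothetical helper.

# Reads an SMTP greeting/EHLO transcript on stdin and prints
# "starttls=<yes|no> auth=<mechanisms|none>".
parse_ehlo() {
  tr -d '\r' | awk '
    BEGIN { starttls = "no"; auth = "none" }
    { u = toupper($0) }
    u ~ /^250[- ]STARTTLS$/ { starttls = "yes" }
    # "250-AUTH " is 9 characters, so the mechanism list starts at 10.
    u ~ /^250[- ]AUTH[ =]/  { auth = substr(u, 10) }
    END { printf "starttls=%s auth=%s\n", starttls, auth }
  '
}
```

Applied to the sample response, this yields starttls=yes auth=NTLM; vendor extensions such as X-EXPS are deliberately ignored because they do not match the AUTH keyword.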

Intelligence Collection Methodology

Run whois on the target domain. Record registration date, expiry date, registrar, and nameservers. Calculate domain age. Flag domains younger than 18 months as potentially new infrastructure and domains expiring within 60 days as potentially abandoned.
Query MX records with dig MX targetcorp.com +short. Identify the email provider from MX hostname patterns. Query SPF (dig TXT targetcorp.com +short | grep v=spf1) to enumerate all authorised sending IP ranges and third-party service providers.
Attempt DKIM selector discovery using common selector names (selector1, selector2, google, k1, and similar). Any recovered DKIM public key confirms the signing domain and reveals the key length. RSA-1024 DKIM keys are considered weak under current guidance and may indicate neglected email security hygiene.
Connect to each discovered mail server on port 25 with netcat or telnet. Read the banner and issue EHLO your-domain.local. Document software, version, authentication mechanisms, and STARTTLS support.
Use Google dorks to discover all email addresses published on the target website: site:targetcorp.com "@targetcorp.com". Download all indexed PDFs and run exiftool to extract document metadata, particularly the Author and Creator fields.
Check the /contact, /about, /team, and /press pages manually for address disclosure. Also check HTML source for addresses embedded in mailto links or JavaScript.
Use theHarvester with -d targetcorp.com and -b google,bing,duckduckgo (the supported sources vary by version) to automate web-based email harvesting. Merge results with manually discovered addresses and deduplicate.
Probe each confirmed mail server with VRFY for listed usernames. If VRFY is disabled, use RCPT TO for addresses in the harvested list. Rate-limit to one probe every 2–3 seconds.
Correlate confirmed email addresses with HaveIBeenPwned and note breach exposure. Combine with LinkedIn role data to prioritise high-value targets.
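
RCPT TO results from the probing step are only meaningful relative to a control probe for an address that should not exist. The decision logic can be sketched as follows, assuming the three-digit reply codes have already been captured during an authorised assessment; interpret_rcpt is a hypothetical helper name and the code groupings are a simplification.

```shell
#!/usr/bin/env bash
# Sketch only: interpret_rcpt is a hypothetical helper.

# interpret_rcpt CONTROL_CODE TARGET_CODE
#   CONTROL_CODE: reply to RCPT TO for a deliberately fictional address
#   TARGET_CODE:  reply to RCPT TO for the address under test
interpret_rcpt() {
  local control=$1 target=$2
  # If the fictional address is accepted, the server is catch-all and
  # no individual result can be trusted.
  if [ "$control" = "250" ]; then
    echo "inconclusive"
    return 0
  fi
  case "$target" in
    250|251)     echo "accepted" ;;
    550|551|553) echo "rejected" ;;
    4??)         echo "deferred" ;;   # temporary failure, e.g. greylisting
    *)           echo "unknown" ;;
  esac
}
```

Treating 4xx replies as deferred rather than rejected avoids discarding valid addresses on servers that greylist or rate-limit unfamiliar senders.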

Common Intelligence Collection Errors

Ignoring SMTP banner version information: The 220 greeting and EHLO extension list frequently disclose the mail server software and version. An Exchange server on an unpatched version (visible in the banner's build number) may be vulnerable to documented CVEs. Never skip banner reading.
Treating Office 365 MX records as evidence of no internal infrastructure: Many organisations use Exchange Online Protection (EOP) as a front-end relay while running an on-premise Exchange server for mailbox hosting. The EOP MX record does not preclude a self-hosted endpoint reachable on the internal network — LDAP and OWA may still be accessible.
Not checking for catch-all SMTP configuration: Some mail servers accept RCPT TO for any address (*@targetcorp.com), returning 250 OK regardless of account existence. Before interpreting RCPT TO responses, send a probe to a clearly fictional address (e.g., [email protected]) and treat all responses as invalid if this probe also returns 250.
Missing email addresses in PDF metadata: Documents generated by Microsoft Office applications frequently embed the author's username or full name in metadata. Corporate template documents (annual reports, press releases) are particularly rich sources. Always run exiftool on every downloaded PDF before discarding it.
Overlooking the DMARC record as an indicator of enforcement posture: dig TXT _dmarc.targetcorp.com reveals whether DMARC is in monitor mode (p=none), quarantine, or reject. A p=none policy asks receivers to take no action on mail that fails DMARC, so spoofed emails claiming to be from the domain are more likely to be delivered: a significant phishing risk indicator and intelligence signal.
Treating SPF ~all (softfail) as equivalent to -all (hardfail): ~all asks receivers to accept mail from unauthorised sources but treat it as suspect; -all asks them to reject it. Organisations using ~all without an enforcing DMARC policy are more likely to have spoofed email from non-listed senders delivered, making them more susceptible to domain spoofing in phishing scenarios.
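
The SPF and DMARC distinctions in the last two items can be read directly from the TXT record strings. A minimal sketch, assuming each record is passed as a single string exactly as dig +short prints it; spf_all_qualifier and dmarc_policy are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for summarising SPF/DMARC posture.

# spf_all_qualifier 'v=spf1 ... ~all' -> hardfail|softfail|neutral|pass-all|no-all-mechanism
spf_all_qualifier() {
  local rec
  rec=$(printf '%s' "$1" | tr -d '"' | tr '[:upper:]' '[:lower:]')
  case " $rec " in
    *" -all "*)          echo "hardfail" ;;
    *" ~all "*)          echo "softfail" ;;
    *" ?all "*)          echo "neutral" ;;
    *" +all "*|*" all "*) echo "pass-all" ;;
    *)                   echo "no-all-mechanism" ;;
  esac
}

# dmarc_policy 'v=DMARC1; p=none; ...' -> none|quarantine|reject|missing
# Only the p= tag is read; sp= (subdomain policy) is ignored.
dmarc_policy() {
  printf '%s\n' "$1" | tr -d '" ' | tr '[:upper:]' '[:lower:]' | awk -F';' '
    { for (i = 1; i <= NF; i++) if ($i ~ /^p=/) { print substr($i, 3); found = 1 } }
    END { if (!found) print "missing" }'
}
```

A domain reporting softfail together with a DMARC policy of none or missing matches the weaker posture described above, while hardfail with reject is the strictest combination.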

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0058	Knowledge of network protocols	Deep analysis of SMTP (EHLO, VRFY, RCPT TO), DNS (MX, TXT/SPF/DKIM/DMARC), and their intelligence value in email infrastructure mapping
K0145	Knowledge of security assessment approaches	Applying a layered reconnaissance approach: passive WHOIS/DNS → banner analysis → active SMTP enumeration
K0272	Knowledge of network security architecture	Understanding the relationship between MX relays, SPF authorisation records, DKIM signing, and DMARC enforcement as components of email security architecture
K0427	Knowledge of encryption algorithms	Assessing STARTTLS deployment, DKIM key strength (RSA-1024 vs RSA-2048), and their implications for email confidentiality and authentication
S0040	Skill in identifying and extracting data of interest	Extracting email addresses from web pages, PDF metadata, and SMTP responses; correlating with breach and social intelligence
T0569	Apply and utilize authorized cyber capabilities to achieve objectives	Using dig, netcat, theHarvester, and exiftool within an authorised assessment to produce a complete email infrastructure intelligence picture