
Email-to-Pastebin OSINT Pivot: Address-Based Identity Tracing to Exposed Secret Discovery

forensic_file_artifacts Difficulty 1–5 30 min certifiable

Theory

Why This Matters

An email address is among the most durable digital identifiers a person possesses: more stable than a username, more linkable than a phone number, and required for registration on virtually every internet service. In January 2019, the Collection #1 dataset was posted to a hacking forum and subsequently widely distributed. It was a compilation of 773 million unique email addresses and 21 million unique passwords aggregated from thousands of prior breaches. Security teams and threat intelligence analysts used HaveIBeenPwned to determine which of their users' credentials were in scope, enabling targeted password reset campaigns. The same email-to-breach lookup used in that response, extended with Pastebin searches, is used daily by:

- penetration testers assessing an organisation's credential attack surface;
- fraud investigators tracing phishing campaign infrastructure;
- OSINT analysts building a subject dossier.

Understanding this pivot chain is a core competency for any intelligence collection role. The chain runs from a known or discovered email address, to credential exposure, to platform account enumeration.

Core Concept

Email address discovery begins before the pivot: hunter.io provides the email format pattern for a target organisation and validates individual addresses via SMTP probing. Email permutation generates candidate addresses from a known name and domain: first.last@, flast@, f.last@, firstname@, first_last@. Tools such as EmailHippo and NeverBounce perform bulk validation without sending messages, using SMTP RCPT TO probing and catch-all detection.
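The permutation step can be sketched as a small pattern expander. This is a minimal Python sketch, not hunter.io's own logic. It assumes format patterns written with the tokens {first}, {last}, {f} and {l}, in the style of the {first}.{last} pattern hunter.io reports in Phase 2 below. The function name candidate_emails is hypothetical.

```python
# Sketch: expand hunter.io-style format patterns into candidate addresses.
# Assumption: patterns use the tokens {first}, {last}, {f} (first initial)
# and {l} (last initial), as in "{first}.{last}" or "{f}{last}".
from __future__ import annotations

DEFAULT_PATTERNS = [
    "{first}.{last}",   # john.smith@
    "{f}{last}",        # jsmith@
    "{f}.{last}",       # j.smith@
    "{first}",          # john@
    "{first}_{last}",   # john_smith@
    "{last}.{first}",   # smith.john@
]


def candidate_emails(first: str, last: str, domain: str,
                     patterns: list[str] | None = None) -> list[str]:
    """Fill each pattern with the name tokens; return de-duplicated, lower-cased addresses."""
    first, last = first.strip().lower(), last.strip().lower()
    tokens = {"first": first, "last": last, "f": first[:1], "l": last[:1]}
    addresses: list[str] = []
    for pattern in patterns or DEFAULT_PATTERNS:
        address = f"{pattern.format(**tokens)}@{domain.strip().lower()}"
        if address not in addresses:
            addresses.append(address)
    return addresses


if __name__ == "__main__":
    # Once hunter.io has reported a pattern, pass only that pattern.
    print(candidate_emails("John", "Smith", "targetcorp.com", ["{first}.{last}"]))
    # Without a confirmed pattern, fall back to the full candidate set.
    print(candidate_emails("John", "Smith", "targetcorp.com"))
```

Passing only the confirmed pattern keeps the candidate list small. This avoids treating every permutation as a real address.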

A catch-all domain is one whose mail server is configured to accept email for any address at the domain, regardless of whether the mailbox exists. SMTP validation tools detect catch-all servers, typically by checking whether the server accepts a random address that almost certainly does not exist. They then flag every address at that domain as unverifiable. This is an important caveat when constructing a confirmed address list from permutations.
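Catch-all detection can be sketched with Python's standard smtplib. The idea is to ask the server to accept a random mailbox that almost certainly does not exist, and to treat acceptance as a catch-all signal. This is an illustrative sketch that assumes:

- the MX host is already known;
- outbound port 25 is reachable;
- probing is within an authorised scope;
- the sender address is a placeholder.

rcpt_code and is_catch_all are hypothetical helper names, and commercial validators may apply further heuristics.

```python
# Sketch: catch-all detection by probing a random local part with RCPT TO.
# Assumptions: the MX host is already known (e.g. from a DNS MX lookup),
# outbound port 25 is reachable, probing is within an authorised scope, and
# "probe@example.com" is a placeholder envelope sender.
from __future__ import annotations

import secrets
import smtplib
from typing import Callable


def rcpt_code(mx_host: str, address: str,
              mail_from: str = "probe@example.com", timeout: float = 10.0) -> int:
    """Run EHLO, MAIL FROM and RCPT TO, then return the RCPT TO reply code (no DATA is sent)."""
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as server:
        server.ehlo_or_helo_if_needed()
        server.mail(mail_from)
        code, _message = server.rcpt(address)
        return code


def is_catch_all(domain: str, probe: Callable[[str], int]) -> bool:
    """True if the server accepts a random 24-hex-character mailbox at the domain."""
    bogus_address = f"{secrets.token_hex(12)}@{domain}"
    return probe(bogus_address) in (250, 251)


# Usage (network access required):
# probe = lambda addr: rcpt_code("mail.targetcorp.com", addr)
# if is_catch_all("targetcorp.com", probe):
#     print("catch-all: mark every address at this domain as unverifiable")
```

The probe is passed in as a function so the decision logic can be exercised without a live mail server.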

Pastebin credential discovery from a known email follows the same dork pattern as username-based searches: site:pastebin.com "email@domain", plus a psbdmp.ws API query. Credential dumps typically contain the email in one of three formats:

- email:plaintext_password;
- email:hash;
- the email as one field of a comma-separated data leak record.

The surrounding context (other fields in the dump, the paste title, and the paste date) indicates which breach the record originated from.
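Once paste text has been downloaded, the three record shapes can be separated mechanically. The sketch below is a heuristic under stated assumptions. The delimiter set, the hash patterns and the function name classify_paste_lines are illustrative choices, not a standard parser, and ambiguous lines still need a human read.

```python
# Sketch: sort paste lines that mention a target email into the record shapes
# described above. Assumptions: pairs use ":", ";" or "|" after the email,
# anything else containing commas is treated as a CSV-style leak record, and
# the hash test only recognises hex digests of MD5/SHA-1/SHA-256 length and
# bcrypt strings. Real dumps vary, so review the "other" bucket by hand.
from __future__ import annotations

import re

HASH_RE = re.compile(
    r"^(?:[a-f0-9]{32}|[a-f0-9]{40}|[a-f0-9]{64}"
    r"|\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53})$",
    re.IGNORECASE,
)


def classify_paste_lines(paste_text: str, email: str) -> list[dict]:
    target = email.strip().lower()
    pair_re = re.compile(rf"^{re.escape(target)}[:;|](.+)$", re.IGNORECASE)
    records = []
    for raw in paste_text.splitlines():
        line = raw.strip()
        if target not in line.lower():
            continue  # line does not mention the subject's address
        pair = pair_re.match(line)
        if pair:
            secret = pair.group(1)
            kind = "email:hash" if HASH_RE.match(secret) else "email:plaintext"
            records.append({"format": kind, "secret": secret, "line": line})
        elif "," in line:
            records.append({"format": "csv_record", "secret": None, "line": line})
        else:
            records.append({"format": "other", "secret": None, "line": line})
    return records
```

Matching is case-insensitive because dumps often preserve whatever capitalisation the user typed at registration.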

HaveIBeenPwned (HIBP) is the most widely used public breach enumeration service. Its v3 breachedaccount endpoint requires an API key. When called with truncateResponse=false, it returns the breach name, date, and data classes for each confirmed breach a given email appears in. dehashed.com is a paid service with more comprehensive coverage, including partial password hashes and plaintext passwords from smaller breaches that HIBP does not cover. Together, these services answer two questions: was this address compromised, and if so, what credential material is in circulation?
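A HIBP response is easier to triage once it is reduced to a few fields. The sketch assumes an untruncated v3 breachedaccount response that has already been parsed into Python objects. It reads only the Name, BreachDate, AddedDate and DataClasses fields, and summarise_breaches is a hypothetical helper name.

```python
# Sketch: condense an untruncated HIBP v3 breachedaccount response offline.
# Assumes the JSON (e.g. from the Phase 3 curl call) is already parsed into a
# list of breach objects; only Name, BreachDate, AddedDate and DataClasses are
# read. AddedDate is when HIBP loaded the breach, often long after BreachDate.
from __future__ import annotations

from datetime import date, datetime


def summarise_breaches(breaches: list[dict]) -> list[dict]:
    rows = []
    for breach in breaches:
        breach_day = date.fromisoformat(breach["BreachDate"])
        # AddedDate is an ISO 8601 timestamp ending in "Z"; normalise for older Pythons.
        added_day = datetime.fromisoformat(breach["AddedDate"].replace("Z", "+00:00")).date()
        rows.append({
            "name": breach["Name"],
            "breach_date": breach_day.isoformat(),
            "days_until_indexed": (added_day - breach_day).days,
            "passwords_exposed": "Passwords" in breach.get("DataClasses", []),
        })
    # ISO dates sort lexically, so this lists the newest incidents first.
    return sorted(rows, key=lambda row: row["breach_date"], reverse=True)
```

Sorting newest-first puts recent exposure at the top of the list. The days_until_indexed field makes the gap between an incident and its appearance in HIBP explicit.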

Holehe takes a different approach: rather than checking breach databases, it tests whether an email address is registered on each of 120+ platforms. It submits the address to endpoints such as sign-up, login, or account-recovery forms and observes the response. A platform that replies "we'll send a reset link to that address" or "this email is already in use" confirms registration. Holehe's documentation states that the target is not alerted. The result maps the subject's digital footprint across the services registered with that email.
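Holehe is normally run from the command line, so one practical way to feed its results into the next step is to parse its console output. This sketch relies on two assumptions:

- lines carry the [+] and [-] prefixes shown in the Phase 5 example below, plus [x] for rate-limited checks;
- the first token after the prefix is a domain.

parse_holehe_output and run_holehe are hypothetical names, and the output format should be checked against the installed holehe release.

```python
# Sketch: turn holehe console output into lists of platforms for manual review.
# Assumption: with --no-color, holehe prints one result per line prefixed with
# "[+]" (email used), "[-]" (not used) or "[x]" (rate limited), followed by
# the site's domain. Verify this against the holehe version you run.
from __future__ import annotations

import subprocess

PREFIXES = {"[+]": "used", "[-]": "not_used", "[x]": "rate_limited"}


def parse_holehe_output(text: str) -> dict[str, list[str]]:
    buckets: dict[str, list[str]] = {name: [] for name in PREFIXES.values()}
    for raw in text.splitlines():
        line = raw.strip()
        for prefix, bucket in PREFIXES.items():
            if line.startswith(prefix):
                tokens = line[len(prefix):].split()
                # Skip legend or summary lines whose first token is not a domain.
                if tokens and "." in tokens[0]:
                    buckets[bucket].append(tokens[0])
                break
    return buckets


def run_holehe(email: str) -> dict[str, list[str]]:
    """Run the holehe CLI (must be installed and on PATH) and parse its output."""
    result = subprocess.run(["holehe", email, "--no-color"],
                            capture_output=True, text=True, check=False)
    return parse_holehe_output(result.stdout)
```

Rate-limited sites are kept in their own bucket so they can be re-checked later, rather than being read as "not registered".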

smtp-user-enum directly probes a mail server to confirm whether a given email address has a valid mailbox, using SMTP VRFY, EXPN, or RCPT TO commands. This is active reconnaissance and should only be performed within an authorised scope.
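The commands that smtp-user-enum relies on are exposed directly by Python's smtplib: verify() sends VRFY, and mail() and rcpt() send MAIL FROM and RCPT TO. A single-address check can therefore be sketched as below. It illustrates the protocol exchange and is not a replacement for the tool. The MX host, the placeholder sender and the helper names interpret and probe_mailbox are assumptions. Run it only within an authorised scope.

```python
# Sketch: query one address with VRFY and RCPT TO, two of the checks
# smtp-user-enum supports. Assumptions: mx_host is the domain's mail exchanger,
# port 25 is reachable, "probe@example.com" is a placeholder sender, and
# probing is authorised. No DATA command is sent, so no message is delivered.
from __future__ import annotations

import smtplib


def interpret(code: int) -> str:
    """Map an SMTP reply code (RFC 5321) to a rough mailbox verdict."""
    if code in (250, 251):
        return "exists"            # accepted (unless the domain is catch-all)
    if code == 252:
        return "unverifiable"      # server will not confirm, but would accept mail
    if code in (550, 553):
        return "does not exist"
    if code in (500, 502, 504):
        return "not supported"     # e.g. VRFY disabled by the administrator
    if 400 <= code < 500:
        return "temporary"         # greylisting or throttling; retry later
    return "unknown"


def probe_mailbox(mx_host: str, address: str,
                  mail_from: str = "probe@example.com") -> dict[str, str]:
    with smtplib.SMTP(mx_host, 25, timeout=10) as server:
        server.ehlo_or_helo_if_needed()
        vrfy_code, _ = server.verify(address)   # VRFY <address>
        server.mail(mail_from)                  # MAIL FROM:<mail_from>
        rcpt_code, _ = server.rcpt(address)     # RCPT TO:<address>
    return {"VRFY": interpret(vrfy_code), "RCPT": interpret(rcpt_code)}
```

Many servers disable VRFY, which is why RCPT TO is usually the more informative of the two replies.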

Technical Deep-Dive

# Phase 1: Email permutation generation
# Known name: John Smith, domain: targetcorp.com
python3 -c "
name_first = 'john'
name_last = 'smith'
domain = 'targetcorp.com'
patterns = [
    f'{name_first}.{name_last}@{domain}',
    f'{name_first[0]}{name_last}@{domain}',
    f'{name_first[0]}.{name_last}@{domain}',
    f'{name_first}@{domain}',
    f'{name_first}_{name_last}@{domain}',
    f'{name_last}.{name_first}@{domain}',
]
for p in patterns:
    print(p)
"

# Phase 2: discover the domain's email format with hunter.io
curl -s "https://api.hunter.io/v2/domain-search?domain=targetcorp.com&api_key=YOUR_KEY" \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(d['data']['pattern'])"
# Output: {first}.{last}  =>  john.smith@targetcorp.com is the likely address format

# Phase 3: breach enumeration for the discovered email with the HaveIBeenPwned API
EMAIL="john.smith@targetcorp.com"
curl -s -H "hibp-api-key: YOUR_API_KEY" -H "User-Agent: OSINT/1.0" \
  "https://haveibeenpwned.com/api/v3/breachedaccount/${EMAIL}?truncateResponse=false" \
  | python3 -m json.tool
# Returns: [{"Name": "Adobe", "BreachDate": "2013-10-04", ..., "DataClasses": ["Email addresses", "Password hints", "Passwords", "Usernames"]}, ...]

# Phase 4: Pastebin search for the email
googler --count 20 "site:pastebin.com \"${EMAIL}\""
# Also query psbdmp.ws:
curl -s "https://psbdmp.ws/api/search/${EMAIL}" | python3 -m json.tool

# Phase 5: platform registration enumeration from the email with Holehe
holehe "${EMAIL}" --only-used --no-color
# [+] twitter.com   (email registered)
# [+] github.com    (email registered)
# Without --only-used, unregistered sites are also listed, e.g. [-] instagram.com

# Phase 6: smtp-user-enum (active recon, authorised scope only)
smtp-user-enum -M RCPT -u "${EMAIL}" -t mail.targetcorp.com
# RCPT TO response: 250 OK => mailbox exists (unless the domain is catch-all)
# RCPT TO response: 550 => mailbox does not exist
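
# Batch email breach summary: given a list of emails, summarise breach exposure
import collections
import json
import time
from urllib.parse import quote

import requests

HIBP_KEY = "YOUR_HIBP_API_KEY"
HEADERS  = {"hibp-api-key": HIBP_KEY, "User-Agent": "OSINT/1.0"}

# Example candidates from the Phase 1 permutations; replace with your validated list
emails = [
    "john.smith@targetcorp.com",
    "jsmith@targetcorp.com",
    "j.smith@targetcorp.com",
]

breach_summary = collections.defaultdict(list)

for email in emails:
    r = requests.get(
        # HIBP expects the account to be URL-encoded in the path
        f"https://haveibeenpwned.com/api/v3/breachedaccount/{quote(email)}",
        headers=HEADERS, params={"truncateResponse": "false"}, timeout=30,
    )
    if r.status_code == 200:
        for breach in r.json():
            name = breach["Name"]
            classes = breach["DataClasses"]
            breach_summary[email].append({"breach": name, "data_classes": classes})
            if "Passwords" in classes:
                print(f"[HIGH] {email} in {name}: PASSWORDS exposed")
    elif r.status_code == 404:
        print(f"[CLEAN] {email}")
    elif r.status_code == 401:
        # A missing or invalid key must not be mistaken for "no breaches"
        raise SystemExit("HIBP returned 401: check the API key")
    else:
        print(f"[ERROR] {email}: HTTP {r.status_code}")
    time.sleep(1.6)  # pause between requests; adjust to your API key's rate limit

print(json.dumps(breach_summary, indent=2))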

Intelligence Collection Methodology

  1. Establish the seed email or discover candidate addresses: if the email is unknown, run hunter.io against the target domain to obtain the email format pattern. Generate permutations from the target's full name using the confirmed pattern.
  2. Validate candidate emails without sending messages: use hunter.io's email-verifier endpoint (https://api.hunter.io/v2/email-verifier) or EmailHippo for bulk SMTP validation. Flag catch-all domains and exclude them from further analysis.
  3. Submit all validated emails to the HaveIBeenPwned API: for each email, record the breach names, dates, and data classes. Prioritise emails that appear in breaches whose data classes include Passwords, and breaches with recent dates.
  4. Query psbdmp.ws and run a Google dork for each confirmed email: site:pastebin.com "email@domain". Download and analyse every returned paste for credential pairs, associated usernames, or context about the breach source.
  5. Run Holehe against confirmed emails to enumerate platform registrations: holehe email@domain --only-used. Each confirmed registration is a new intelligence node to investigate further.
  6. Check dehashed.com (paid) for more comprehensive breach coverage: dehashed includes smaller breach compilations, partial password hashes, and username associations not indexed by HIBP.
  7. Correlate discovered usernames back into the username enumeration workflow: any username found in Holehe results or paste content should be run through Sherlock for additional platform discovery.
  8. Document each breach finding with data class severity: distinguish between password hash exposure (medium severity — cracking required) and plaintext password exposure (high severity — immediately actionable for credential stuffing).
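Step 8's severity split can be expressed as a small grading function. This is a heuristic sketch: hash formats are guessed from length and prefix only. A 32-hex-character value is consistent with MD5 but does not prove it, and any value that matches no hash pattern is assumed to be plaintext. grade_credential is a hypothetical name.

```python
# Sketch of step 8: grade an exposed password field by how directly it can be
# reused. Hash formats are guessed from shape only: hex length or the bcrypt
# "$2a$/$2b$/$2y$" prefix. A value matching no pattern is assumed plaintext.
from __future__ import annotations

import re

HASH_PATTERNS = [
    ("bcrypt", re.compile(r"^\$2[aby]\$\d{2}\$[./A-Za-z0-9]{53}$")),
    ("sha256-like", re.compile(r"^[a-fA-F0-9]{64}$")),
    ("sha1-like", re.compile(r"^[a-fA-F0-9]{40}$")),
    ("md5-like", re.compile(r"^[a-fA-F0-9]{32}$")),
]


def grade_credential(value: str) -> tuple[str, str]:
    """Return (likely_format, severity) for one exposed password field."""
    candidate = value.strip()
    for label, pattern in HASH_PATTERNS:
        if pattern.match(candidate):
            return label, "medium"   # cracking required before the password can be reused
    return "plaintext", "high"       # immediately actionable for credential stuffing


for sample in ["Summer2019!", "5f4dcc3b5aa765d61d8327deb882cf99"]:
    print(sample, grade_credential(sample))
```

A plaintext password that happens to be 32 hex characters would be misgraded as a hash, so borderline values deserve a manual check.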

Common Intelligence Collection Errors

  • Using email permutations without validating against a confirmed format: generating all possible permutations and treating unvalidated emails as confirmed addresses wastes time and produces false intelligence. Always confirm the email format via hunter.io or SMTP validation before treating a permuted address as real.
  • Ignoring catch-all detection when validating emails: SMTP validation tools that probe a catch-all server receive 250 OK for every address whether or not the mailbox exists. Reporting all addresses at a catch-all domain as validated is a systematic false-positive error.
  • Treating a HIBP breach date as the date credentials were first used maliciously: breach data is often aggregated from multiple sources and circulated for months or years before HIBP indexes it. BreachDate records when the incident occurred, and the separate AddedDate field records when HIBP loaded the data. Neither tells you when credential stuffing using those credentials began.
  • Querying HIBP without an API key and receiving 401 errors: HIBP requires an API key for the breached-account endpoint. Attempting to use the API without authentication or with an expired key produces 401 responses that may be mistaken for "no breaches found." Always confirm API key validity before running batch queries.
  • Not checking psbdmp.ws after Pastebin search returns no results: Pastebin's own search is limited and inconsistent. Google's site: operator indexes only a fraction of Pastebin content. psbdmp.ws independently archives pastes and frequently contains content that neither Pastebin search nor Google returns.
  • Confusing email registration confirmation from Holehe with account ownership: Holehe confirms that an email was used to register an account, not that the subject actively uses the account or that the account contains relevant intelligence. Always proceed to manually inspect any Holehe-confirmed account for active content.
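The 401 and rate-limit pitfalls above can be guarded against in code by treating every response other than 200 or 404 as an error, rather than as an empty result. The sketch uses only Python's standard library. The status meanings follow the HIBP v3 documentation as summarised in the comments. check_account, interpret_status and the retry policy are hypothetical choices.

```python
# Sketch: an HIBP lookup that fails loudly instead of reading errors as "clean".
# Status meanings per the HIBP v3 API documentation: 200 breaches found,
# 404 not found in any breach, 401 key missing or invalid, 429 rate limit
# exceeded (with a Retry-After header in seconds). Anything else is an error.
from __future__ import annotations

import json
import time
import urllib.error
import urllib.request
from urllib.parse import quote

API = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def interpret_status(code: int) -> str:
    return {200: "breached", 404: "clean", 401: "auth_error",
            429: "rate_limited"}.get(code, "error")


def check_account(email: str, api_key: str, max_retries: int = 3) -> tuple[str, list]:
    request = urllib.request.Request(
        API + quote(email) + "?truncateResponse=false",
        headers={"hibp-api-key": api_key, "User-Agent": "OSINT/1.0"},
    )
    for _ in range(max_retries):
        try:
            with urllib.request.urlopen(request, timeout=30) as response:
                return "breached", json.load(response)
        except urllib.error.HTTPError as err:
            status = interpret_status(err.code)
            if status == "clean":
                return "clean", []
            if status == "rate_limited":
                time.sleep(float(err.headers.get("Retry-After", "2")))
                continue
            if status == "auth_error":
                raise RuntimeError("HIBP returned 401: fix the API key before trusting any result") from err
            raise RuntimeError(f"Unexpected HIBP status {err.code}") from err
    raise RuntimeError("Still rate limited after all retries")
```

Raising on 401 stops a batch run immediately, so an expired key cannot silently produce a column of "clean" results.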

NICE Framework Alignment

Code | Knowledge/Skill/Task Statement | How This Card Develops It
K0058 | Knowledge of network protocols | Understanding SMTP RCPT TO probing used by smtp-user-enum and hunter.io, and how HTTP account-recovery flows are exploited by Holehe for email-to-platform enumeration
K0145 | Knowledge of security assessment approaches | Applying the email-to-breach-to-platform pivot as a structured OSINT assessment methodology using HIBP, Holehe, psbdmp.ws, and hunter.io
K0272 | Knowledge of network security architecture | Recognising how email address reuse across services, breach database indexing, and paste site archiving combine to create a persistent credential exposure surface
K0427 | Knowledge of encryption algorithms | Interpreting the password formats (MD5, SHA-1, bcrypt, plaintext) found in credential dumps and dehashed.com results, together with the data classes returned by HIBP, to assess the immediate exploitability of exposed credentials
S0040 | Skill in identifying and extracting data of interest | Extracting email:password pairs from Pastebin pastes, enumerating platform registrations via Holehe, and correlating breach data classes with risk severity
T0569 | Apply and utilize authorized cyber capabilities to achieve objectives | Executing the email pivot chain (permutation generation, SMTP validation, HIBP check, Holehe enumeration, psbdmp.ws search) within an authorised OSINT collection mandate

Further Reading

  • HaveIBeenPwned API Documentation — Troy Hunt, haveibeenpwned.com/API/v3
  • Open Source Intelligence Techniques, 10th Edition — Michael Bazzell, Chapter 8: Email Addresses (IntelTechniques)
  • Collection #1 Breach Analysis — Troy Hunt, troyhunt.com (January 2019 blog post)
