Social Media Identity Operations OSINT: Detecting Coordinated Inauthentic Behavior and Sockpuppet Networks
Theory
Why This Matters
Building a subject dossier from open sources is a foundational skill in intelligence analysis, investigative journalism, and corporate due diligence. In December 2020, Bellingcat and its partners published an investigation implicating a team of FSB officers in the Novichok poisoning of Alexei Navalny. The attribution rested on correlating travel and telephone records, much of it obtained from Russia's grey market in leaked data rather than from strictly public sources, with open-source material such as social media profiles. Private investigators, HR background-check firms, and social engineering red teams use the same methodology: start from a name, accumulate linked identifiers, and build a profile that reveals far more than any single data source would suggest. For CTF intelligence challenges, mastering this workflow (name to email to breach data to username to platform profiles to phone to reverse image search) is the equivalent of ROP chain construction in binary exploitation: individual primitives that combine into complete attribution.
Core Concept
A subject dossier is a structured collection of personally identifiable information (PII) and associated digital artefacts aggregated from multiple open sources. The core principle is identifier pivoting: each confirmed data point becomes the seed for the next search. A name yields an email; an email yields breach data and platform accounts; a username yields additional profiles; a profile photo yields reverse-image matches revealing alternate accounts; a phone number yields carrier data and potentially a physical location.
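The pivoting principle can be sketched as a breadth-first expansion over identifiers. This is only an illustration: the lookup functions, the `PIVOTS` table, and the canned example.com data are hypothetical stand-ins for real tool or API calls.

```python
from collections import deque

# Hypothetical lookups returning canned data; a real one would call a tool or API.
def lookup_email_from_name(value):
    return [("email", "j.smith@example.com")] if value == "John Smith" else []

def lookup_username_from_email(value):
    return [("username", "jsmith99")] if value == "j.smith@example.com" else []

# Maps an identifier type to the lookups that can pivot from it.
PIVOTS = {
    "name": [lookup_email_from_name],
    "email": [lookup_username_from_email],
}

def pivot(seed, max_depth=3):
    """Breadth-first expansion: each newly found identifier seeds the next search."""
    seen = {seed}
    queue = deque([(seed, 0)])
    trail = []  # (parent, child) pairs recording how each identifier was reached
    while queue:
        (kind, value), depth = queue.popleft()
        if depth >= max_depth:
            continue
        for lookup in PIVOTS.get(kind, []):
            for found in lookup(value):
                if found not in seen:
                    seen.add(found)
                    trail.append(((kind, value), found))
                    queue.append((found, depth + 1))
    return trail

trail = pivot(("name", "John Smith"))
for parent, child in trail:
    print(f"{parent[0]}:{parent[1]} -> {child[0]}:{child[1]}")
```

Keeping the `trail` of parent and child pairs means every identifier in the dossier can be traced back to the data point that produced it.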
Hunter (hunter.io) is a widely used service for professional email discovery. Given a company domain, it infers the organisation's email address format (first.last@, f.last@, flast@) from publicly indexed email addresses, and that pattern can then be applied to enumerate likely staff addresses. It also checks whether a given address appears deliverable, using SMTP probing among other signals, and returns a confidence score.
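As a rough illustration of pattern inference, the sketch below picks the address format that reproduces the most known samples and then applies it to a new name. The function names, the candidate list, and the example.com samples are assumptions made for this example; it is not how hunter.io is implemented.

```python
def apply_pattern(pattern, first, last, domain):
    """Expand a pattern such as '{first}.{last}' or '{f}{last}' into an address."""
    first, last = first.lower(), last.lower()
    local = (pattern
             .replace("{first}", first)
             .replace("{last}", last)
             .replace("{f}", first[0])
             .replace("{l}", last[0]))
    return f"{local}@{domain}"

def infer_pattern(known, candidates=("{first}.{last}", "{f}.{last}", "{f}{last}", "{first}")):
    """Return the candidate that reproduces the most (first, last, email) samples."""
    def hits(pattern):
        return sum(
            apply_pattern(pattern, first, last, email.split("@")[1]) == email.lower()
            for first, last, email in known
        )
    best = max(candidates, key=hits)
    return best if hits(best) > 0 else None

# Illustrative samples only.
known = [
    ("Jane", "Doe", "jane.doe@example.com"),
    ("Raj", "Patel", "raj.patel@example.com"),
]
pattern = infer_pattern(known)
print(pattern)
print(apply_pattern(pattern, "John", "Smith", "example.com"))
```

A candidate produced this way is only a guess until it is verified, so it should be recorded with a lower confidence than an address observed directly.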
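Sherlock performs username enumeration across 300+ platforms in a single run. It is an efficient way to determine which services a subject has registered under a known alias, enabling profile aggregation in minutes rather than hours. Each hit still needs manual review, because a matching username on a site does not by itself show that the same person controls the account.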
Reverse image search exploits the fact that profile photos are reused across platforms. Running an avatar image through Google Images (drag-and-drop), TinEye (perceptual hash matching), and Yandex Images (often reported by practitioners to be the strongest of the three at matching faces) frequently surfaces alternate accounts, older profiles, and instances where the image was scraped or shared. Yandex is particularly effective at finding Eastern European platform registrations and older social profiles.
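To make the idea of perceptual hash matching concrete, here is a minimal average-hash sketch in pure Python. It is an assumption-laden illustration, not the algorithm any of these engines actually uses (those are proprietary). It also assumes the image has already been converted to a small grayscale grid, which in practice would be done with an imaging library.

```python
def average_hash(pixels):
    """pixels: 2D list of grayscale values, already downscaled (e.g. to 8x8).
    Returns a bit string with 1 wherever a pixel is brighter than the mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming(a, b):
    """Number of differing bits; a small distance suggests the same picture."""
    return sum(x != y for x, y in zip(a, b))

# Tiny 2x2 grids stand in for downscaled images.
original = [[10, 200], [220, 30]]
recompressed = [[14, 190], [215, 41]]   # same picture with slight noise
different = [[200, 10], [30, 220]]

print(hamming(average_hash(original), average_hash(recompressed)))  # 0
print(hamming(average_hash(original), average_hash(different)))     # 4
```

Because the hash depends on relative brightness rather than exact bytes, a resized or recompressed copy of an avatar can still match, which is what makes cross-platform photo reuse detectable.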
Holehe checks whether a given email address is registered on 120+ online services by attempting account recovery flows, which leak a "yes/no" registration response without requiring authentication. This is a legally grey technique — the account recovery endpoint is public — but is widely used in authorised OSINT engagements.
Chain of custody in OSINT refers to the documented record of how each piece of intelligence was obtained: the exact tool or query, the source URL, the timestamp, and any transformations applied. Without this documentation, the intelligence cannot be independently verified and may be inadmissible in legal proceedings or corporate investigations.
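A custody record for a single finding might look like the following sketch. The field names and the `custody_record` helper are assumptions chosen to mirror the elements listed above. The SHA-256 digest of the saved artefact is an addition that lets a reviewer confirm the evidence has not changed since collection.

```python
import hashlib
import json
from datetime import datetime, timezone

def custody_record(tool, query, source_url, raw_bytes, transformations=()):
    """Build one log entry describing how a piece of intelligence was obtained."""
    return {
        "tool": tool,
        "query": query,
        "source_url": source_url,
        "collected_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),  # fingerprint of the saved artefact
        "transformations": list(transformations),
    }

# Illustrative values only.
record = custody_record(
    tool="sherlock",
    query="target_alias",
    source_url="https://example.com/profiles/target_alias",
    raw_bytes=b"<html>saved page content</html>",
    transformations=["saved as HTML"],
)
print(json.dumps(record, indent=2))
```

Appending one such record per finding to a write-once log gives a reviewer enough information to repeat each query and compare the result.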
Technical Deep-Dive
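# Phase 1: Name-to-email discovery via the hunter.io REST API (v2)
# Domain search: returns indexed addresses and the inferred address pattern
curl -G "https://api.hunter.io/v2/domain-search" \
  --data-urlencode "domain=targetcorp.com" \
  --data-urlencode "limit=100" \
  --data-urlencode "api_key=YOUR_KEY"
# Sample output (summarised from the JSON response):
# [email protected] (confidence: 94%, position: CEO)
# [email protected] (confidence: 87%, position: CTO)
# Pattern: {first}.{last}@targetcorp.com
# Direct email verification (SMTP probe):
curl -G "https://api.hunter.io/v2/email-verifier" \
  --data-urlencode "email=[email protected]" \
  --data-urlencode "api_key=YOUR_KEY"
# {"result": "deliverable", "score": 99, "mx_records": true}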
# Phase 2: Username enumeration across 300+ platforms
python3 sherlock/sherlock.py "target_alias" \
  --output target_alias_results.txt \
  --timeout 10 \
  --print-found
# [+] GitHub: https://github.com/target_alias
# [+] HackerNews: https://news.ycombinator.com/user?id=target_alias
# [+] LinkedIn: https://www.linkedin.com/in/target_alias
# Phase 3: Email-to-platform registration check via Holehe
holehe [email protected] --only-used
# [+] twitter.com - [email protected] is registered
# [+] instagram.com - [email protected] is registered
# [-] facebook.com - [email protected] is NOT registered
# Phase 4: Reverse image search (manual upload; only the metadata check runs locally)
# Save profile photo locally first, then submit to multiple engines:
exiftool profile_photo.jpg # check EXIF metadata (camera, GPS, software)
# Submit to: images.google.com (drag), tineye.com (upload), yandex.com/images (upload)
# Phase 5: Phone number OSINT
# numverify API — carrier, line type, location
# Digits only: a literal "+" in a query string is decoded as a space
curl "https://apilayer.net/api/validate?access_key=KEY&number=15551234567&country_code=US"
# {"valid":true,"carrier":"Verizon","line_type":"mobile","location":"New York"}
# Truecaller — crowdsourced caller ID (requires web scraping or app API reverse engineering)
# Dossier builder — structured output from multiple sources
import datetime
import json
from urllib.parse import quote

import requests


def build_dossier(name, email, username, domain):
    dossier = {
        "subject": name,
        # Timezone-aware UTC timestamp (datetime.utcnow() is deprecated)
        "collected_at": datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "identifiers": {"email": email, "username": username, "domain": domain},
        "sources": [],
    }

    # Hunter.io email pattern detection
    r = requests.get(
        "https://api.hunter.io/v2/domain-search",
        params={"domain": domain, "api_key": "YOUR_KEY", "limit": 10},
        timeout=10,
    )
    if r.status_code == 200:
        pattern = r.json().get("data", {}).get("pattern", "unknown")
        dossier["sources"].append({"tool": "hunter.io", "finding": f"email_pattern={pattern}"})

    # HIBP breach check (the account must be URL-encoded; 404 means no breach found)
    hibp_r = requests.get(
        f"https://haveibeenpwned.com/api/v3/breachedaccount/{quote(email)}",
        headers={"hibp-api-key": "YOUR_HIBP_KEY", "User-Agent": "OSINT/1.0"},
        timeout=10,
    )
    if hibp_r.status_code == 200:
        breaches = [b["Name"] for b in hibp_r.json()]
        dossier["sources"].append({"tool": "haveibeenpwned", "finding": breaches})

    return dossier


result = build_dossier("John Smith", "[email protected]", "jsmith99", "targetcorp.com")
print(json.dumps(result, indent=2))
Intelligence Collection Methodology
- Anchor on a seed identifier — Begin with the strongest known identifier: full legal name, primary email address, or a confirmed username. Record the source and confidence level of this anchor before proceeding.
- Email discovery via hunter.io — Submit the target organisation domain to hunter.io to retrieve the email format pattern and validate candidate addresses. For individuals not tied to an organisation, attempt common email permutations (first.last, flast, firstname) at major providers (gmail, protonmail, outlook).
- Breach data correlation via HaveIBeenPwned — Submit all confirmed email addresses to the HIBP API. Note which breaches each address appears in; the breach name, date, and data classes inform what credential material may be circulating and which passwords to prioritise in a social engineering assessment.
- Username enumeration via Sherlock — Run Sherlock against all confirmed or inferred aliases. Review each confirmed platform hit manually to extract additional PII: bio information, linked accounts, post history, and location tags.
- LinkedIn and professional network mapping — Search LinkedIn for the subject by name, email, and company. Extract: current and past employers, team members, connections, skills, and any published contact information. Note mutual connections that may be leveraged in social engineering scenarios.
- Reverse image search on all available photos — Download profile images from confirmed platform accounts. Submit each to Google Images, TinEye, and Yandex Images. Document any new platforms or accounts discovered through image matching.
- Phone number OSINT — If a phone number is recovered from any source, submit it to numverify for carrier and line-type data. Search it in Truecaller's database via the API or web interface for a crowdsourced caller ID name.
- Compile and link the dossier in Maltego — Create entity nodes for each confirmed identifier and draw relationship links. Use the OSINT Framework taxonomy (people > social networks > email > username > phone > images) to ensure systematic coverage. Add evidence annotations including source URLs and collection timestamps.
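The linking and coverage steps can be prototyped before moving to a graph tool. The sketch below is an assumption-laden stand-in rather than a Maltego integration: `DossierGraph`, the simplified `CATEGORIES` list, and the example.com values are all hypothetical.

```python
from collections import defaultdict

# Simplified, hypothetical category list loosely following the taxonomy named above.
CATEGORIES = ["email", "username", "phone", "image"]

class DossierGraph:
    def __init__(self):
        self.nodes = {}                # (kind, value) -> evidence annotations
        self.edges = defaultdict(set)  # node -> set of linked nodes

    def add_entity(self, kind, value, source_url, collected_at):
        node = (kind, value)
        self.nodes[node] = {"source_url": source_url, "collected_at": collected_at}
        return node

    def link(self, a, b):
        # Only confirmed entities may be linked, so every edge has evidence at both ends.
        for node in (a, b):
            if node not in self.nodes:
                raise KeyError(f"unknown entity: {node}")
        self.edges[a].add(b)
        self.edges[b].add(a)

    def uncovered(self, categories):
        """Categories for which no entity has been recorded yet."""
        present = {kind for kind, _ in self.nodes}
        return [c for c in categories if c not in present]

graph = DossierGraph()
email = graph.add_entity("email", "j.smith@example.com",
                         "https://example.com/team", "2024-01-01T12:00:00Z")
alias = graph.add_entity("username", "jsmith99",
                         "https://example.com/profiles/jsmith99", "2024-01-01T12:05:00Z")
graph.link(email, alias)
print(graph.uncovered(CATEGORIES))  # ['phone', 'image']
```

The `uncovered` result doubles as a to-do list: any category it returns has not yet been examined for this subject.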
Common Intelligence Collection Errors
- Conflating digital profiles with confirmed real-world identity: A username appearing on 12 platforms is strong evidence of a single actor but is not confirmation that the actor is the named individual. Identity confirmation requires a convergence of biometric matches (face, voice), government-record corroboration, or direct confirmation from the subject.
- Skipping EXIF analysis on recovered images: Images can retain GPS coordinates, camera model, software version, and creation timestamp in EXIF metadata. Major social platforms strip most of this on upload, but images taken from personal sites, forums, and shared files often still carry it. Running exiftool takes seconds, and the metadata it exposes has been the decisive find in numerous investigations.
- Using Holehe without understanding its legal boundary: Holehe exploits account-recovery flows that are public by design but may violate Terms of Service of some platforms. In jurisdictions with broad computer fraud statutes, automated account enumeration against a platform without authorisation may expose the analyst to legal risk. Always confirm that the engagement scope authorises this technique.
- Building a dossier without chain-of-custody documentation: OSINT findings used in legal proceedings, HR investigations, or law enforcement referrals must be reproducible. Failing to log the exact query, tool version, source URL, and timestamp makes findings challengeable and potentially inadmissible.
- Treating LinkedIn data as current and accurate: LinkedIn profiles are self-reported and frequently outdated or strategically curated. Employment history, job titles, and skill endorsements should be cross-referenced against company websites, press releases, and other corroborating sources before being treated as fact.
- Not using the OSINT Framework taxonomy as a checklist: Analysts often pursue the leads that are easiest to follow rather than systematically covering all data categories. Working through the taxonomy category by category ensures that phone, images, public records, and domain data are all examined, not just the most obvious social media hits.
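For the EXIF point in the list above, GPS values usually need converting before they can be plotted. The sketch below parses the degrees, minutes, and seconds text that exiftool prints by default and returns signed decimal degrees. The `dms_to_decimal` helper and the sample coordinates are illustrative assumptions.

```python
import re

# Matches text such as: 40 deg 26' 46.32" N
_DMS = re.compile(r"""(\d+)\s*deg\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""")

def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS.search(text)
    if not match:
        raise ValueError(f"unrecognised GPS value: {text!r}")
    degrees, minutes, seconds, ref = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative in decimal notation.
    return -value if ref in "SW" else value

lat = dms_to_decimal('40 deg 26\' 46.32" N')
lon = dms_to_decimal('79 deg 58\' 56.00" W')
print(round(lat, 4), round(lon, 4))
```

The resulting pair can be pasted into any mapping service. Record the original EXIF string alongside the converted value so the transformation stays documented.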
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0058 | Knowledge of network protocols | Understanding SMTP probing used by hunter.io for email validation and HTTP account-recovery flows exploited by Holehe for platform enumeration |
| K0145 | Knowledge of security assessment approaches | Applying the structured dossier-building methodology — seed identifier to pivot chain to compiled profile — as a repeatable intelligence assessment workflow |
| K0272 | Knowledge of network security architecture | Recognising how professional network APIs, breach databases, and social platform account-recovery endpoints combine to create an unintended intelligence exposure surface |
| K0427 | Knowledge of encryption algorithms | Using the data classes and breach descriptions returned by HIBP to distinguish plaintext password exposure from bcrypt or SHA-1 hashed passwords when assessing the real-world risk level of each breach |
| S0040 | Skill in identifying and extracting data of interest | Extracting PII, professional history, and platform registrations from heterogeneous open sources including LinkedIn, hunter.io, HIBP, and reverse image search results |
| T0569 | Apply and utilize authorized cyber capabilities to achieve objectives | Executing a full dossier-construction pipeline — email discovery, breach correlation, username enumeration, image search, phone lookup — within an authorised OSINT collection mandate |
Further Reading
- Open Source Intelligence Techniques, 10th Edition — Michael Bazzell (IntelTechniques)
- We Are Bellingcat: Global Crime, Online Sleuths, and the Bold Future of News — Eliot Higgins (Bloomsbury Publishing)
- The Investigator's OSINT Handbook — SANS Institute, Reading Room (sans.org)