Full Corporate Breach Simulation: Five-Service OSINT Chain from Reconnaissance to Data Exfiltration
Theory
Why This Matters
Intelligence analysts supporting red-team engagements must be able to construct a complete pre-breach intelligence picture using only open sources. The reason is not that active scanning is prohibited, but that passive intelligence collected before any active action determines the quality of targeting and minimises operational exposure. This five-step chain mirrors the pre-compromise reconnaissance phase described in documented APT campaigns: domain infrastructure is mapped, email addresses are harvested, LinkedIn provides the social graph, GitHub exposes accidentally committed credentials, and Shodan/Censys identify the weakest technical entry point. Ransomware affiliate groups and corporate espionage operations described in public threat intelligence reports commonly follow a functionally similar chain. Analysts who understand the chain can both execute it offensively and build detection logic that identifies when it is being run against their own organisation.
Core Concept
A five-service OSINT chain is a structured intelligence collection sequence where each step's output becomes the next step's input, progressively narrowing from broad organisational context to specific exploitable weaknesses. The chain is designed to be entirely passive: no packets are sent to target systems at any stage.
Step 1 — WHOIS/Domain: The domain registration record provides registrant name, email, organisation, nameservers, and registration timeline. These seed all subsequent steps.
Step 2 — Email enumeration: Hunter.io aggregates email addresses from public web pages, PDFs, and press releases, and infers the organisation's email format (e.g., first.last@targetcorp.com). theHarvester automates harvesting across search engines, LinkedIn, and DNS records. Combined, they produce a candidate email list and a confirmed format pattern.
Step 3 — LinkedIn employee directory: The organisation's LinkedIn company page lists all employees with public profiles. Cross-referencing names against the email format confirmed in Step 2 produces a high-confidence list of valid corporate email addresses — phishing targets or credential stuffing candidates.
Step 4 — GitHub org secrets scan: Development teams frequently commit credentials, API keys, and internal hostnames to public GitHub repositories before realising the error. Even after deletion, the data persists in git history. Tools such as truffleHog and gitleaks scan repository history for high-entropy strings and known credential patterns.
Step 5 — Shodan/Censys exposed service discovery: With the organisation's IP ranges inferred from WHOIS ASN data and DNS A-record resolution, Shodan and Censys identify exposed services — unpatched software versions, default credentials, anonymous-bind LDAP, exposed Elasticsearch, or internet-facing RDP — constituting the weakest technical entry point.
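The high-entropy heuristic mentioned in Step 4 can be illustrated with a short Python sketch. This is not how truffleHog or gitleaks are implemented; it only shows the underlying idea, that random-looking tokens score higher on Shannon entropy than ordinary words. The threshold, the token pattern, and the function names are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical threshold and token pattern, chosen for illustration only.
ENTROPY_THRESHOLD = 4.0
TOKEN_RE = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s: str) -> float:
    """Return the Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    total = len(s)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_high_entropy_tokens(text: str, threshold: float = ENTROPY_THRESHOLD) -> list:
    """Return tokens of 20+ characters whose entropy meets the threshold."""
    return [t for t in TOKEN_RE.findall(text) if shannon_entropy(t) >= threshold]


if __name__ == "__main__":
    # Made-up sample content: one random-looking token, one repetitive token.
    sample = 'api_key = "q8Zr2LmX0vT7cYb4NwK9sHd3Jf6PgA1e"\nname = "aaaaaaaaaaaaaaaaaaaaaaaa"'
    print(find_high_entropy_tokens(sample))
```

Entropy alone produces false positives (hashes, UUIDs, minified assets), which is one reason the scanning tools combine it with known credential patterns.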
Correlation across all five steps produces the attack surface report: the phishing targets most likely to click (from LinkedIn seniority/role), the credential most likely to work (from breach correlation), and the service most likely to be vulnerable (from Shodan version data).
Technical Deep-Dive
# Step 1: WHOIS — extract registrant email, org, nameservers
whois targetcorp.com | grep -E "Registrant|Name Server|Updated Date|Creation Date"
# Step 2a: theHarvester — email and subdomain harvesting
theHarvester -d targetcorp.com -b google,bing,linkedin,hunter -l 500 -f harvest_output
# Results in harvest_output.json: emails[], hosts[], ips[]
# Step 2b: Hunter.io domain search (requires API key)
curl -s "https://api.hunter.io/v2/domain-search?domain=targetcorp.com&api_key=$HUNTER_API_KEY" \
  | jq -r '.data.emails[].value' > hunter_emails.txt
# Also extract email format:
curl -s "https://api.hunter.io/v2/domain-search?domain=targetcorp.com&api_key=$HUNTER_API_KEY" \
  | jq -r '.data.pattern'
# => "{first}.{last}"
# Step 3: LinkedIn — generate email list from employee names
# (Manual: collect names from LinkedIn company page, then apply format)
python3 - <<'PY'
# Apply the confirmed email format to each (first, last) name pair
names = [("Alice", "Smith"), ("Bob", "Jones"), ("Carol", "Williams")]
pattern = "{first}.{last}@targetcorp.com"
for first, last in names:
    print(pattern.format(first=first.lower(), last=last.lower()))
PY
# Step 4: GitHub org secrets scan with truffleHog
trufflehog github --org=targetcorp --json 2>/dev/null |
  jq -r 'select(.Verified==true) | "\(.DetectorName) => \(.SourceMetadata.Data.Github.link)"'
# gitleaks on a cloned repo
git clone https://github.com/targetcorp/public-repo /tmp/targetrepo
gitleaks detect --source /tmp/targetrepo --report-format json --report-path /tmp/leaks.json
jq '.[] | {RuleID, File, Secret}' /tmp/leaks.json
# Step 5: Shodan/Censys exposed service discovery
shodan search --fields ip_str,port,product,version,vulns \
  'org:"Target Corporation" has_vuln:true' | head -30
# Censys via CLI (pip install censys)
censys search 'autonomous_system.organization:"Target Corporation" AND services.port:3389' \
  --index-type hosts --fields ip,services.port,services.software.product
{
  "attack_surface_summary": {
    "phishing_targets": ["[email protected] (CISO dept, LinkedIn)", "[email protected] (IT Admin)"],
    "breach_correlated_credentials": ["[email protected]: password found in 2023 breach corpus"],
    "github_leaks": ["AWS_ACCESS_KEY_ID in targetcorp/infra-scripts commit a3f8c2d (now deleted but in history)"],
    "weakest_service": "RDP on 203.0.113.47:3389 running Windows Server 2012 R2 (EOL, BlueKeep-era patch level)"
  }
}
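A report with this shape could be assembled from the per-step findings roughly as follows. This is a minimal sketch, not part of any tool in the chain: the function name, the `label` and `cve_count` keys, and the rule of ranking services by CVE count are all assumptions made for illustration.

```python
import json


def build_attack_surface_summary(phishing_targets, breach_hits, github_leaks, services):
    """Combine per-step findings into one report dictionary.

    `services` is a list of dicts with the hypothetical keys "label" and
    "cve_count"; the service with the most CVEs is reported as weakest.
    """
    weakest = max(services, key=lambda s: s["cve_count"], default=None)
    return {
        "attack_surface_summary": {
            "phishing_targets": sorted(set(phishing_targets)),  # de-duplicated
            "breach_correlated_credentials": list(breach_hits),
            "github_leaks": list(github_leaks),
            "weakest_service": weakest["label"] if weakest else None,
        }
    }


if __name__ == "__main__":
    # Placeholder inputs; real values would come from the five steps.
    report = build_attack_surface_summary(
        phishing_targets=["alice.smith@targetcorp.com", "bob.jones@targetcorp.com"],
        breach_hits=[],
        github_leaks=["AWS_ACCESS_KEY_ID in targetcorp/infra-scripts commit a3f8c2d"],
        services=[
            {"label": "RDP on 203.0.113.47:3389", "cve_count": 3},
            {"label": "HTTP on 203.0.113.47:80", "cve_count": 1},
        ],
    )
    print(json.dumps(report, indent=2))
```

Ranking by CVE count is a deliberately crude stand-in; an analyst would normally weigh exploitability and exposure as well.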
Intelligence Collection Methodology
- Begin with `whois` on the primary domain. Record registrant email, organisation string, nameservers (which often reveal the DNS or hosting provider), and creation date (age indicates infrastructure maturity). Use the registrant email to seed reverse WHOIS on SecurityTrails.
- Run theHarvester across all major search backends (`-b google,bing,linkedin,hunter,duckduckgo`). Save output in JSON (`-f`) for programmatic processing. Merge discovered emails with Hunter.io results.
- From Hunter.io's `pattern` field, confirm the organisation's email format. Apply the format to every full name visible on the LinkedIn company page. This produces a candidate address list that also covers employees who do not appear in theHarvester results.
- Check every email address from the merged list against the HaveIBeenPwned API (`GET https://haveibeenpwned.com/api/v3/breachedaccount/{email}`, which requires an API key in the `hibp-api-key` header) to identify breach exposure and the most recent breach date.
- Enumerate the target's GitHub organisation (`https://github.com/orgs/ORGNAME/repositories`). Run truffleHog across all public repositories with `--since-commit HEAD~500` to scan recent history. Run gitleaks for pattern-based detection of API keys, private keys, and connection strings.
- Resolve the organisation's primary domain to its IP and perform an ASN lookup (`whois -h whois.cymru.com " -v <IP>"`). Use the returned ASN CIDR to constrain Shodan searches: `net:<CIDR> org:"Target Corporation"`.
- In Shodan, filter for high-value exposed services: `has_vuln:true` (known CVEs in banner), `port:3389` (RDP), `port:445` (SMB), `port:9200` (Elasticsearch), `port:27017` (MongoDB). Record service versions and correlate with public CVE databases.
- Synthesise findings into the attack surface report: map each phishing target to their LinkedIn role, note any breach-compromised credentials, document GitHub leaks with commit hash and file path, and list the top three exploitable services with CVE references.
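The merge step in the methodology above (combining theHarvester and Hunter.io results) might look like the following sketch. It assumes both sources have already been loaded into Python lists, for example from the `emails` array of theHarvester's JSON output and from `hunter_emails.txt`; the function name `merge_emails` is hypothetical.

```python
def merge_emails(*sources, domain):
    """Merge email lists, lower-casing and keeping only addresses at `domain`."""
    merged = set()
    suffix = "@" + domain.lower()
    for source in sources:
        for raw in source:
            email = raw.strip().lower()
            # Keep well-formed addresses on the target domain only.
            if email.endswith(suffix) and email.count("@") == 1:
                merged.add(email)
    return sorted(merged)


if __name__ == "__main__":
    # Placeholder lists standing in for the two tools' outputs.
    harvester = ["Alice.Smith@targetcorp.com ", "press@other-vendor.com"]
    hunter = ["alice.smith@targetcorp.com", "bob.jones@targetcorp.com"]
    for address in merge_emails(harvester, hunter, domain="targetcorp.com"):
        print(address)
```

Normalising case and whitespace before de-duplicating avoids counting the same mailbox twice when the two sources format it differently.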
Common Intelligence Collection Errors
- Trusting a single email format across the entire organisation: Acquisitions, regional offices, and legacy systems frequently produce multiple email format patterns within the same company. Always validate format guesses against Hunter.io's confidence scores and cross-check at least five confirmed addresses before generating bulk lists.
- Overlooking organisation members' personal repositories: Public org repositories are not the only exposure surface. Enumerate organisation members and check each member's personal public repositories. Developers often push sensitive code to personal repos and later transfer them, leaving the original history intact.
- Failing to check historical WHOIS for expired registrant emails: When a company changes domain registrars or privacy settings, the historical registrant email may be an abandoned personal address that has been re-registered by a third party. Sending phishing simulation emails to such an address discloses engagement details to an outside party; always verify that the address is current before using it in any test.
- Treating Shodan vulnerability flags as definitive: The `has_vuln:true` Shodan filter is based on banner version strings, not active exploitation testing. Version string spoofing and minor patch revisions can generate both false positives and false negatives. Always correlate Shodan version data with the vendor's CVE list and confirm patch status through other means.
- Stopping GitHub scanning at the default branch: Non-default branches, tags, and other refs (such as pull-request refs, which can retain commits from deleted branches) are missed by tools that only scan the checked-out default branch. Use `git clone --mirror` to fetch every ref the remote advertises before running gitleaks or truffleHog.
- Not correlating breach dates with employment dates: A credential found in a 2019 breach may belong to an employee who left in 2020, in which case the account is likely disabled. Always cross-reference breach date against the LinkedIn employment timeline before prioritising a credential as a live target.
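The last check in the list above, comparing breach date with employment dates, can be expressed as a small triage function. This is a sketch under stated assumptions: the function name, the two priority labels, and the decision rules are illustrative, and the employment dates are whatever an analyst has recorded from public profiles.

```python
from datetime import date
from typing import Optional


def credential_priority(breach_date: date,
                        employment_start: date,
                        employment_end: Optional[date]) -> str:
    """Classify a breach-exposed credential by the employee's tenure.

    Returns "low" when the employee has left or the breach predates their
    employment, and "review" when a current employee was exposed in post.
    """
    if employment_end is not None:
        return "low"  # departed: the account is likely disabled
    if breach_date < employment_start:
        return "low"  # breach predates this employment
    return "review"  # current employee, breach during tenure


if __name__ == "__main__":
    # The example from the list: 2019 breach, employee left in 2020.
    print(credential_priority(date(2019, 6, 1), date(2017, 1, 1), date(2020, 3, 1)))
```

A "review" result only means the credential deserves a closer look; it says nothing about whether the password is still valid.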
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0058 | Knowledge of network protocols | Using DNS resolution, ASN WHOIS, and SMTP validation as intelligence-gathering mechanisms within the five-step chain |
| K0145 | Knowledge of security assessment approaches | Structuring reconnaissance as a progressive, evidence-led chain where each step is gated on the previous step's output |
| K0272 | Knowledge of network security architecture | Mapping the full attack perimeter — domains, email infrastructure, code repositories, and exposed network services — into a unified model |
| K0427 | Knowledge of encryption algorithms | Interpreting TLS certificate data in Shodan/Censys results to identify infrastructure relationships and certificate reuse patterns |
| S0040 | Skill in identifying and extracting data of interest | Correlating email addresses, breach records, GitHub leaks, and Shodan service data into an actionable attack surface report |
| T0569 | Apply and utilize authorized cyber capabilities to achieve objectives | Executing the full five-step passive OSINT chain as the intelligence preparation phase of an authorised red-team engagement |
Further Reading
- Red Team Development and Operations — Joe Vest & James Tubberville, Chapter 4: Intelligence Preparation (Self-published)
- Open Source Intelligence Techniques, 9th Edition — Michael Bazzell, Chapter 7: Email Addresses (IntelTechniques)
- Hunting Cyber Criminals — Vinny Troia, Chapter 3: Correlation and Attribution (Wiley)