Corporate Digital Footprint OSINT: Passive Reconnaissance and External Attack Surface Mapping

forensic_file_artifacts Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Before a threat actor sends a single packet to a corporate network, they spend hours — sometimes days — building a complete picture of the target organisation's exposed surface using entirely passive means. Nation-state intrusion teams, ransomware affiliates, and corporate espionage operators all follow the same playbook: collect every publicly available data point, correlate it into a coherent map, and identify the weakest entry point before committing to any active action. Journalists investigating corporate misconduct use the same methodology to uncover shell companies, shadow domains, and undisclosed infrastructure. Fraud investigators trace domains and hosting relationships to connect fictitious storefronts to known criminal networks. Understanding this process from the attacker's perspective is prerequisite knowledge for any analyst designing defensive monitoring or conducting an authorised red-team assessment.

Core Concept

A corporate digital footprint is the complete collection of internet-facing assets attributable to an organisation — registered domains, subdomains, IP ranges, mail servers, web applications, and exposed services. Attack surface mapping is the systematic process of discovering and cataloguing that footprint using only publicly available data sources.

The pivot chain begins with the organisation's legal name. Reverse WHOIS queries, which search registration records by organisation name or registrant email rather than by domain, surface the domains registered under that identity, including forgotten test environments, acquired subsidiaries, and marketing microsites that receive no maintenance. Coverage is never complete: domains registered through privacy services or under redacted records will not match. Platforms such as SecurityTrails and DomainTools maintain historical WHOIS snapshots, which can reveal the registrant of domains that have since been privacy-masked.

Certificate transparency (CT) logs are an authoritative, near-real-time source of subdomain intelligence. Publicly trusted CAs submit the certificates they issue to public CT logs, because major browsers such as Chrome and Safari require proof of logging. crt.sh exposes those logs through a searchable interface: querying %.targetcorp.com returns the logged certificates for any subdomain of the target domain, including staging environments that were never intended to be publicly reachable. Hosts covered only by a wildcard certificate or by a private CA do not appear by name. Amass automates CT log queries alongside API-based enumeration and optional DNS brute-forcing; subfinder covers the passive sources only.
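As a minimal sketch of the clean-up that usually follows a CT log query, the function below lowercases certificate names, strips wildcard prefixes, and keeps only names at or under the target domain. The name normalize_ct_names is hypothetical and not part of crt.sh or any tool named above; it assumes one certificate name per line on standard input.

```shell
# normalize_ct_names DOMAIN
# Hypothetical helper: reads certificate names (one per line) on stdin and
# prints the unique hostnames that equal DOMAIN or end in ".DOMAIN".
normalize_ct_names() {
  domain=$1
  tr 'A-Z' 'a-z' \
    | sed -e 's/[[:space:]]*$//' -e 's/^\*\.//' \
    | awk -v d="$domain" '
        {
          n = length($0); m = length(d)
          # keep exact matches and true subdomains; reject look-alikes
          if ($0 == d || (n > m && substr($0, n - m) == ("." d))) print
        }' \
    | LC_ALL=C sort -u
}

# Example with inline sample data (no network access needed)
printf '%s\n' '*.targetcorp.com' 'WWW.TargetCorp.com' \
  'staging.api.targetcorp.com' 'nottargetcorp.com' \
  | normalize_ct_names targetcorp.com
```

The suffix check matters because a plain substring match would also accept look-alike names such as nottargetcorp.com, which belong to someone else.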

Technology stack fingerprinting via Wappalyzer (browser extension or CLI) and BuiltWith identifies CMS platforms, JavaScript frameworks, CDN providers, analytics vendors, and WAF signatures from HTTP headers, HTML source, and DNS records. Each identified technology narrows the exploit surface: a WordPress installation implies plugin vulnerabilities; a specific WAF implies bypass techniques; a specific CDN implies origin IP disclosure risks.
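To illustrate the header-based part of fingerprinting, here is a small sketch that maps a handful of well-known response headers to technology hints. The function name fingerprint_headers and the four-rule signature set are assumptions made for illustration; Wappalyzer and BuiltWith rely on far larger rule sets.

```shell
# fingerprint_headers
# Hypothetical helper: reads raw HTTP response headers on stdin (for example
# the output of `curl -sI URL`) and prints "category<TAB>evidence" lines.
fingerprint_headers() {
  tr -d '\r' | awk '
    {
      i = index($0, ":")
      if (i == 0) next                      # status line or blank line
      name = tolower(substr($0, 1, i - 1))  # header names are case-insensitive
      value = substr($0, i + 1)
      sub(/^[ \t]+/, "", value)
      if (name == "server")            print "server\t" value
      else if (name == "x-powered-by") print "runtime\t" value
      else if (name == "cf-ray")       print "cdn\tCloudflare"
      else if (name == "x-amz-cf-id")  print "cdn\tAmazon CloudFront"
    }'
}

# Example with inline sample headers (no network access needed)
printf '%s\r\n' 'HTTP/1.1 200 OK' 'Server: Apache/2.4.57' \
  'X-Powered-By: PHP/8.1.2' 'X-Amz-Cf-Id: EXAMPLE' '' \
  | fingerprint_headers
```

Header values are easy to spoof or strip, so findings of this kind are best treated as leads to confirm against other sources.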

Job postings are an underappreciated intelligence source. A posting for "Senior AWS Security Engineer with experience in GuardDuty and Macie" confirms the cloud provider and the specific security tooling in use, and it hints at which tooling is absent or understaffed. Postings for "Palo Alto NGFW Administrator" reveal the firewall vendor and product family. The company's LinkedIn page shows total headcount and, in some views, a breakdown by function without requiring a connection to any employee, which can surface understaffed security teams.
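Pulling technology references out of posting text can be sketched with a keyword filter. The helper name extract_tech_terms and its keyword list are illustrative assumptions; a real technology matrix would use a much longer, curated list.

```shell
# extract_tech_terms
# Hypothetical helper: reads job-posting text on stdin and prints each matched
# technology keyword once, lowercased. The keyword list is illustrative only.
extract_tech_terms() {
  grep -oiwE 'aws|azure|gcp|guardduty|macie|palo alto|ngfw|splunk|sentinel' \
    | tr 'A-Z' 'a-z' \
    | LC_ALL=C sort -u
}

# Example using the posting titles quoted above
printf '%s\n' \
  'Senior AWS Security Engineer with experience in GuardDuty and Macie' \
  'Palo Alto NGFW Administrator' \
  | extract_tech_terms
```

The -w flag restricts matches to whole words, so a word like "laws" does not register as a hit for "aws".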

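Shodan's org: filter (org:"Target Corporation") returns the indexed IP addresses whose network allocation is registered to that organisation name. It does not match on banner content, so assets hosted in cloud or colocation ranges appear under the provider's name and have to be found through the hostname:, ssl:, or net: filters instead. Results include open ports, service banners, TLS certificate details, and in some cases default credential indicators.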

Technical Deep-Dive

# Step 1: Reverse WHOIS: find domains whose WHOIS organisation matches the target
# (SecurityTrails REST API via curl; requires API key in SECURITYTRAILS_API_KEY env var)
# The jq path assumes the documented response shape; inspect the raw JSON with `jq .` if it returns nothing
curl -s "https://api.securitytrails.com/v1/domains/list" \
  -H "APIKEY: $SECURITYTRAILS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"filter":{"whois_organization":"Target Corporation"}}' \
  | jq -r '.records[].hostname'

# Step 2: Certificate transparency via crt.sh (no auth required)
# %25 is the URL-encoded % wildcard; wildcard entries are reduced to their base name
curl -s "https://crt.sh/?q=%25.targetcorp.com&output=json" \
  | jq -r '.[].name_value' | sed 's/^\*\.//' | sort -u > ct_subdomains.txt

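# Step 3: Subdomain enumeration with amass
amass enum -passive -d targetcorp.com -o amass_passive.txt
# -active and -brute generate traffic toward the target's DNS servers and hosts,
# so this command is no longer passive; run it only within an authorised scope
amass enum -active -d targetcorp.com -brute -w /usr/share/wordlists/subdomains-top1million.txt \
  -o amass_active.txt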

# Step 4: Merge and resolve all discovered subdomains
cat ct_subdomains.txt amass_passive.txt amass_active.txt | sort -u > all_subdomains.txt
# Resolve to IPs with dnsx (-a queries A records, -resp prints the answers)
dnsx -l all_subdomains.txt -a -resp -o resolved.txt

# Step 5: Shodan org search (requires shodan CLI: pip install shodan)
shodan search --fields ip_str,port,transport,org,product,version \
  'org:"Target Corporation"' > shodan_results.txt

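# Step 6: Technology fingerprinting via Wappalyzer CLI
# (the wappalyzer npm package may no longer be maintained upstream; substitute a maintained fork if the install fails)
npm install -g wappalyzer
wappalyzer https://www.targetcorp.com --pretty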

# Step 7: Identify exposed admin panels on resolved subdomains
# This probe sends HTTP requests to the target, so it is active rather than passive
while IFS= read -r host; do
  for path in /admin /wp-admin /phpmyadmin /manager/html /_ah/admin /console; do
    code=$(curl -sk -o /dev/null --max-time 10 -w "%{http_code}" "https://${host}${path}")
    # 000 means the connection failed; 404 means the path does not exist
    [ "$code" != "404" ] && [ "$code" != "000" ] && echo "${host}${path} => $code"
  done
done < <(awk '{print $1}' resolved.txt)

# Sample shodan_results.txt excerpt:
203.0.113.45   443  tcp  Target Corporation  nginx  1.18.0
203.0.113.46   8080 tcp  Target Corporation  Apache Tomcat  9.0.45
203.0.113.47   389  tcp  Target Corporation  OpenLDAP  (anonymous bind)
203.0.113.48   3389 tcp  Target Corporation  Microsoft Terminal Services
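Rows like these can be triaged mechanically. The sketch below labels each row with the service family conventionally bound to its port and flags services that rarely belong on the public internet. The helper name label_ports and the review list are assumptions, and the input is assumed to be tab-separated, which is what the shodan CLI emits by default.

```shell
# label_ports
# Hypothetical helper: reads tab-separated Shodan rows (ip, port, ...) on stdin
# and prints ip, port, conventional service family, and a review flag.
label_ports() {
  awk -F '\t' '
    BEGIN {
      OFS = "\t"
      svc[21] = "FTP";    svc[22] = "SSH";  svc[23] = "Telnet"
      svc[25] = "SMTP";   svc[80] = "HTTP"; svc[389] = "LDAP"
      svc[443] = "HTTPS"; svc[445] = "SMB"; svc[3389] = "RDP"
      svc[8080] = "HTTP-alt"; svc[8443] = "HTTPS-alt"
      svc[9200] = "Elasticsearch"; svc[27017] = "MongoDB"
      # services that rarely belong on the public internet (assumed policy)
      n = split("21 23 389 445 3389 9200 27017", r, " ")
      for (i = 1; i <= n; i++) review[r[i]] = 1
    }
    {
      family = ($2 in svc) ? svc[$2] : "unknown"
      flag = ($2 in review) ? "REVIEW" : "-"
      print $1, $2, family, flag
    }'
}

# Example with two rows modelled on the excerpt above
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  203.0.113.45 443 tcp 'Target Corporation' nginx 1.18.0 \
  203.0.113.48 3389 tcp 'Target Corporation' 'Microsoft Terminal Services' '' \
  | label_ports
```

A port number only suggests a service family; the banner and product fields remain the evidence for what is actually listening.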

Intelligence Collection Methodology

Seed the investigation with the legal entity name. Search Companies House (UK), SEC EDGAR (US), or the relevant national company registry to confirm the exact registered name — minor spelling variations affect reverse WHOIS results.
Run reverse WHOIS on SecurityTrails and DomainTools using the organisation name and the primary domain's registrant email. Record every discovered domain in a tracking spreadsheet with registration date and registrar.
For each discovered domain, query crt.sh (https://crt.sh/?q=%25.domain.com&output=json, where %25 is the URL-encoded % wildcard) and pipe results through jq to extract unique name_value entries. Wildcard entries (*.sub.domain.com) indicate active subdomain use worth brute-forcing.
Run amass in passive mode against all discovered root domains. Follow with active brute-force using a curated wordlist. Feed results into dnsx for live resolution, and keep the unresolved names in a separate list for later review rather than discarding them.
Submit each resolved IP to Shodan (shodan host <IP>) and run the org: filter for the organisation name. Note open ports 21, 22, 23, 25, 80, 389, 443, 445, 3389, 8080, 8443, 9200, 27017 — each implies a specific service family.
Run Wappalyzer or BuiltWith against all live web applications. Record CMS, framework, CDN, and WAF findings. Tag each host by technology category.
Review the company's LinkedIn page: note total employee count, department headcounts, and recent job postings. Copy verbatim technology references from postings into the technology matrix.
Cross-reference job posting technology requirements against Shodan service versions. A posting for a specific version of a product alongside an exposed service running that product confirms the target's actual environment.
Automate correlation in a single recon-ng workspace: load recon/domains-hosts/google_site_web and recon/hosts-ports/shodan_ip (modules load <path> in recon-ng 5.x, use <path> in 4.x) so that hosts and ports from both sources land in the same database.
Produce the final attack surface report: domains → subdomains → resolved IPs → open ports → technologies → weakest entry points (outdated software, exposed admin panels, anonymous-bind LDAP, RDP exposed to internet).

Common Intelligence Collection Errors

Stopping at the primary domain: Organisations routinely expose more attack surface through subsidiary domains, recently acquired companies, and legacy marketing microsites than through their main domain. Reverse WHOIS and CT log queries against all discovered registrant emails are mandatory before declaring the domain inventory complete.
Treating unresolved subdomains as inactive: DNS NXDOMAIN responses do not mean a service is gone — the host may be referenced only in internal DNS. Unresolved CT log entries warrant investigation via direct IP scanning of the organisation's known netblocks.
Missing historical infrastructure in current Shodan results: Shodan's index reflects the state at last scan time, which may be days or weeks stale. Use Shodan's before: and after: date filters and cross-reference with Censys to identify services that appeared and disappeared — possible indicators of incident response or infrastructure rotation.
Ignoring certificate SANs: A TLS certificate's Subject Alternative Names field often lists internal hostnames, staging environments, and IP addresses that never appear in public DNS. Always extract and investigate SANs from every certificate returned by Shodan or crt.sh.
Overlooking job posting temporal signals: A sudden cluster of security engineering postings in a specific domain (cloud, SIEM, endpoint) signals a recent gap — potentially a team departure or a recent security incident driving urgent hiring. These signals should be correlated with breach notification databases and news searches.
Confusing CDN IPs with origin IPs: Many targets sit behind Cloudflare, Akamai, or Fastly. Shodan results for the primary domain may return CDN edge IPs rather than the application server. Use DNS history (SecurityTrails historical records) and certificate CN/SAN mismatches to locate the actual origin IP ranges.
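For the SAN point above, a short sketch of turning a certificate's Subject Alternative Name text into a list that can be fed back into the subdomain inventory. The helper name parse_san is hypothetical; the input format is the text that the openssl x509 command prints for the subjectAltName extension.

```shell
# parse_san
# Hypothetical helper: reads the text that `openssl x509 -noout -ext
# subjectAltName` prints and emits one DNS name or IP address per line.
parse_san() {
  grep -E 'DNS:|IP Address:' \
    | tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/^DNS://' -e 's/^IP Address://' \
    | grep -v '^$' \
    | LC_ALL=C sort -u
}

# Live usage (assumption: OpenSSL 1.1.1 or later, and the host is in scope):
#   openssl s_client -connect host:443 -servername host </dev/null 2>/dev/null \
#     | openssl x509 -noout -ext subjectAltName | parse_san

# Example with inline sample text (no network access needed)
printf '%s\n' 'X509v3 Subject Alternative Name: ' \
  '    DNS:www.targetcorp.com, DNS:staging.internal.targetcorp.com, IP Address:203.0.113.45' \
  | parse_san
```

Fetching a certificate with s_client opens a TLS connection to the host, so the live form counts as active contact; certificates already returned by crt.sh or Shodan can be parsed without touching the target.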

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0058	Knowledge of network protocols	Interpreting DNS records (A, MX, NS, TXT), TLS certificate structure (including SANs), and HTTP headers as intelligence sources
K0145	Knowledge of security assessment approaches	Applying passive-first reconnaissance doctrine: exhausting open-source sources before any active probing
K0272	Knowledge of network security architecture	Understanding how subdomains, IP ranges, and exposed services compose an organisation's attack surface
K0427	Knowledge of encryption algorithms	Reading TLS certificate fields (CN, SAN, issuer, validity) to infer infrastructure topology and certificate management practices
S0040	Skill in identifying and extracting data of interest	Correlating WHOIS records, CT log entries, Shodan banners, and job postings into a unified attack surface model
T0569	Apply and utilize authorized cyber capabilities to achieve objectives	Executing the full passive reconnaissance chain — reverse WHOIS, CT enumeration, Shodan, Wappalyzer — within an authorised assessment scope