Git Commit-to-Pastebin OSINT Pivot: Repository Secret Discovery Chained to Paste Platform Intelligence
Theory
Why This Matters
Git commits are stamped with an identity (a name and an email address) that most developers set once and forget. Unless a commit is cryptographically signed, this identity is self-asserted rather than verified, but it is embedded in every public commit and creates a permanent, searchable trail connecting developers to their code, their accounts, and their mistakes. Threat intelligence analysts use git commit identities to attribute malware repositories to specific individuals. Fraud investigators use the same technique to link anonymous shell company websites (built on public GitHub repositories) to their actual operators. Bug bounty researchers have discovered live API keys by pivoting from a commit email to a Pastebin paste where the same developer shared a code snippet including credentials. Red-team operators use the chain to build complete developer identity profiles (email, social accounts, personal repositories, and breach-exposed passwords) as the intelligence foundation for targeted spear-phishing against high-privilege developers. This card teaches the complete pivot chain from git commit metadata to correlated intelligence across GitHub, breach databases, and paste sites.
Core Concept
The commit-to-Pastebin chain is a multi-pivot intelligence sequence that begins with the immutable identity records embedded in git commits and progressively correlates them across platforms to produce a complete developer intelligence profile.
Git commit metadata extraction collects the author email, author name, committer email (which may differ from the author in rebase/merge workflows), and commit timestamp from every commit in a repository's history. The --format option of git log provides fine-grained control over which fields are extracted (%ae and %an for the author email and name, %ce for the committer email, %ai for the author timestamp). For organisation-wide collection, clone every public repository and merge the resulting contributor lists.
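As a minimal sketch of the extraction step, the helper below assumes tab-separated git log output (%x09 emits a tab) so that names containing spaces survive parsing. The function name contributor_summary, the field order, and the choice to fold email case are illustrative assumptions, not part of the original workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise contributors from tab-separated lines
# produced by: git log --all --format='%ae%x09%an'
# Prints "commits<TAB>author_email<TAB>first_seen_name", most active first.
contributor_summary() {
  awk -F '\t' '
    { email = tolower($1) }                 # assumption: case variants are one address
    !(email in name) { name[email] = $2 }   # remember the first name seen per email
    { commits[email]++ }                    # count commits per email
    END { for (e in commits) printf "%d\t%s\t%s\n", commits[e], e, name[e] }
  ' | sort -t "$(printf '\t')" -k1,1nr -k2,2
}
```

Inside a mirrored clone it could be used as: git log --all --format='%ae%x09%an' | contributor_summary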
GitHub username discovery from email exploits GitHub's user search API, which accepts email addresses as search terms: GET /search/users?q=<email>+in:email. When a GitHub user has set their email to public, this resolves the email directly to a GitHub username. Without a public email the lookup has to run in the other direction. The events API (/users/<username>/events/public) lists a known user's public push events, so it can confirm a candidate username but cannot start from an email. To go from email to username, search public commits whose author email matches (for example with the commit search qualifier author-email:<email>) and read the account GitHub has linked to each matching commit.
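One case may need no API call at all: commits made with a GitHub no-reply address embed the username in the address itself. The sketch below assumes the two no-reply forms GitHub documents (ID+username@users.noreply.github.com and the older username@users.noreply.github.com). The function name username_from_noreply is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: if a commit email is a GitHub no-reply address,
# print the embedded username; otherwise print nothing and return 1.
username_from_noreply() {
  local email=$1 localpart
  case $email in
    *@users.noreply.github.com) ;;   # only handle GitHub no-reply addresses
    *) return 1 ;;
  esac
  localpart=${email%@users.noreply.github.com}
  printf '%s\n' "${localpart#*+}"    # drop the numeric "ID+" prefix when present
}
```

A username recovered this way is still only a candidate and should be confirmed against the profile endpoint before it is recorded.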
HaveIBeenPwned correlation checks the commit email against the HIBP breach index. A developer who used a work email for commits may have that email in a credential breach — the breach date, breach source, and data classes determine the intelligence value.
Pastebin intelligence exploits the common developer behaviour of pasting code snippets, configuration files, and debug output to Pastebin or similar services (hastebin, paste.ee, dpaste). psbdmp.ws indexes Pastebin pastes and exposes a search API. Google dorks (site:pastebin.com "[email protected]" or site:pastebin.com "github.com/username") surface indexed pastes. Pastes containing credentials are particularly valuable — developers frequently paste connection strings or API key examples that include live values.
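Commit timestamp analysis for timezone inference applies the same statistical methodology as social media posting analysis: the distribution of commit hours, normalised to UTC, tends to cluster around the developer's working day and so suggests a local timezone, which can support geolocation for attribution purposes. The UTC offset that git records in each timestamp is a more direct signal, but it reflects the committing machine's settings and can be changed by the developer, so treat both as indicators rather than proof.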
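The normalisation step can be sketched as follows. The helper assumes input lines in the %ai layout (YYYY-MM-DD HH:MM:SS +HHMM) and subtracts the recorded offset to obtain the UTC hour. The function name utc_hour_histogram is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `git log --format=%ai` lines on stdin and prints
# "count hour" pairs, busiest first, with each hour normalised to UTC.
utc_hour_histogram() {
  awk '{
    split($2, t, ":")                                        # t[1]=hour, t[2]=minute
    sign = (substr($3, 1, 1) == "-") ? -1 : 1
    off  = sign * (substr($3, 2, 2) * 60 + substr($3, 4, 2)) # offset in minutes
    m = (t[1] * 60 + t[2] - off) % 1440                      # local time minus offset = UTC
    if (m < 0) m += 1440                                     # wrap across midnight
    printf "%02d\n", int(m / 60)
  }' | sort | uniq -c | sort -k1,1nr -k2,2 | awk '{print $1, $2}'
}
```

For example: git log --all --format=%ai | utc_hour_histogram | head -5. A contributor whose busiest UTC hours run from about 08 to 16 is consistent with a European working day, but the same pattern also fits a night-shift worker elsewhere.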
Technical Deep-Dive
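# Step 1: Extract all contributor emails from repository history
git clone --mirror https://github.com/targetorg/target-repo /tmp/trepo
cd /tmp/trepo
# sort -u removes exact duplicate lines only; because the timestamp is
# included, each contributor still appears once per commit.
git log --all --format="%ae %an %ai" | sort -u > contributors.txt
# Output: email name timestamp
# Example: [email protected] Alice Smith 2023-06-15 14:32:11 +0100
# Committer emails can differ from author emails; collect them as well.
git log --all --format="%ce" | sort -u > committer_emails.txt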
# Step 2: GitHub username discovery from commit email
# Method A: GitHub user search API (requires token)
# Take the first field (the email) and deduplicate so each address is queried once.
awk '{print $1}' contributors.txt | sort -u | while read -r email; do
  result=$(curl -s -G -H "Authorization: Bearer $GH_TOKEN" \
      --data-urlencode "q=${email} in:email" \
      "https://api.github.com/search/users" \
    | jq -r '.items[0].login // "NOT_FOUND"')
  echo "${email} => GitHub: ${result}"
  sleep 2  # authenticated search requests are limited to roughly 30 per minute
done
# Method B: Parse GitHub push events for email-to-username mapping
# (Search public push events where commit author email matches)
# For each discovered GitHub username:
GH_USER="alice-smith-dev"
curl -s "https://api.github.com/users/${GH_USER}" \
  | jq '{login, name, email, bio, location, company, public_repos, created_at}'
# Step 3: HaveIBeenPwned API check on all commit emails
HIBP_KEY="your-hibp-api-key"
awk '{print $1}' contributors.txt | sort -u | while read -r email; do
  enc=$(jq -rn --arg e "$email" '$e|@uri')  # URL-encode the account for the path
  breaches=$(curl -s -H "hibp-api-key: $HIBP_KEY" \
      -H "User-Agent: OSINT-Research" \
      "https://haveibeenpwned.com/api/v3/breachedaccount/${enc}?truncateResponse=false" \
    | jq -r '.[].Name' 2>/dev/null | tr '\n' ',')
  [ -n "$breaches" ] && echo "BREACHED ${email}: ${breaches}"
  sleep 1.5  # pace requests; the actual limit depends on the HIBP subscription tier
done
# Step 4: Pastebin search via psbdmp.ws API
GH_USER_EMAIL="[email protected]"
curl -s "https://psbdmp.ws/api/v3/search/${GH_USER_EMAIL}" \
  | jq -r '.data[] | "https://pastebin.com/\(.id) \(.tags)"'
# Google dork approach (manual):
# site:pastebin.com "[email protected]"
# site:pastebin.com "alice-smith-dev"
# site:pastebin.com "targetcorp.com" password
# site:pastebin.com "targetcorp.com" API_KEY
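# Step 5: Timestamp analysis for timezone inference
# Hour-of-day histogram normalised to UTC (format-local renders dates in $TZ).
# The original awk -F"T" split never matched because %ai output contains no "T".
TZ=UTC git log --all --date=format-local:%H --format="%ad" |
  sort | uniq -c | sort -rn | head -5
# Alternatively, use timezone offset in git timestamps:
git log --all --format="%ai" | awk '{print $3}' | sort | uniq -c | sort -rn
# Output: +0530 = IST, +0100 = BST, -0500 = EST
# (one offset can map to several zones, e.g. +0100 is also CET)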
# Step 6: Linked social account discovery from GitHub profile
curl -s "https://api.github.com/users/${GH_USER}" \
  | jq '{blog, twitter_username, location, company}'
# blog field often contains personal website or LinkedIn URL
# twitter_username directly links to Twitter/X account
# Step 7: Personal repository enumeration for additional secrets
mkdir -p /tmp/personal_repos
curl -s -H "Authorization: Bearer $GH_TOKEN" \
    "https://api.github.com/users/${GH_USER}/repos?per_page=100&sort=updated" \
  | jq -r '.[].clone_url' | while read -r repo; do
    name=$(basename "$repo" .git)  # strip the .git suffix for directory and report names
    git clone --quiet "$repo" "/tmp/personal_repos/${name}" 2>/dev/null
    gitleaks detect --source "/tmp/personal_repos/${name}" \
      --report-format json --report-path "/tmp/leaks_${name}.json" 2>/dev/null
    count=$(jq 'length' "/tmp/leaks_${name}.json" 2>/dev/null)
    # Default to 0 so a missing or unreadable report does not break the test
    [ "${count:-0}" -gt 0 ] && echo "LEAKS in $repo: $count findings"
done
{
"developer_profile": {
"commit_email": "[email protected]",
"commit_name": "Alice Smith",
"github_username": "alice-smith-dev",
"github_profile": {"location": "London, UK", "company": "Target Corporation", "twitter": "alice_codes"},
"breach_exposure": ["LinkedInScrape (2021)", "ExposedForums2023 (passwords)"],
"pastebin_hits": ["https://pastebin.com/aBcD1234 (contains DB connection string, 2022-08)"],
"inferred_timezone": "UTC+0100 (BST)",
"personal_repos_with_leaks": ["alice-smith-dev/homelab-configs (AWS key in history)"]
}
}
Intelligence Collection Methodology
- Begin with one or more target repository URLs. Clone each with --mirror to capture full history including all branches and tags. Run git log --all --format="%ae %an %ai" to extract all contributor emails, names, and timestamps.
- Deduplicate the contributor list. Prioritise emails matching the target organisation's domain. Note any personal domain emails (Gmail, Outlook) used alongside corporate emails. Where the author name matches, these likely belong to the same individual, but confirm the link before merging the records.
- Query the GitHub user search API for each email address. Record confirmed GitHub usernames. For each username, retrieve the full profile: location, company, linked website, Twitter handle. Record all cross-platform links explicitly.
- Run HaveIBeenPwned API checks on all collected emails (both corporate and personal). For any breach hit, note breach name, date, and whether passwords were included in the breach data class.
- Search psbdmp.ws for each email address and each GitHub username. Review all returned paste URLs manually, prioritising any paste with tokens, passwords, or connection strings in the tags or content preview.
- Run Google dorks for each email and username against site:pastebin.com, site:gist.github.com, and site:hastebin.com. Save all identified paste URLs for manual review.
- Clone all public repositories of each discovered GitHub user. Run gitleaks against each. Any findings in personal repositories are particularly valuable, because they often represent pre-employment code or side projects where the developer was less cautious.
- Analyse commit timestamps across all repositories for each contributor. Compute the hourly distribution and identify the peak 3-hour window. Cross-reference with the timezone offset field in git timestamps (%ai format includes +HHMM).
- Correlate Pastebin credentials with the target organisation's infrastructure: a database connection string in a paste that references an IP address in the organisation's Shodan footprint is a critical finding requiring immediate escalation.
- Produce the final developer intelligence report: one section per identified contributor, listing confirmed email(s), GitHub username, social accounts, breach exposure, paste findings, timezone, and all discovered credentials with their inferred scope.
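The prioritisation in the second step can be sketched as a small classifier. The function name classify_emails, the labels, and the short list of webmail domains are illustrative assumptions. The target domain is passed as an argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: label each email on stdin relative to a target domain.
# Usage: classify_emails targetcorp.com < emails.txt
classify_emails() {
  local org=$1
  awk -v org="$org" '
    BEGIN {
      # Assumed, non-exhaustive list of personal webmail domains
      n = split("gmail.com outlook.com hotmail.com yahoo.com proton.me", p, " ")
      for (i = 1; i <= n; i++) personal[p[i]] = 1
    }
    {
      email = tolower($1); dom = email; sub(/^.*@/, "", dom)
      # Match the organisation domain itself or any subdomain of it
      if (dom == org || substr(dom, length(dom) - length(org)) == ("." org)) label = "corporate"
      else if (dom == "users.noreply.github.com") label = "noreply"
      else if (dom in personal) label = "personal"
      else label = "other"
      print label "\t" email
    }'
}
```

Feeding it the first column of the contributor list gives a quick triage order: corporate addresses first, then personal addresses that share an author name with a corporate one.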
Common Intelligence Collection Errors
- Only searching the default branch for commit emails: Feature branches, release branches, and commits that survive only under non-branch refs (for example pull-request refs left behind after a force-push) contain author emails not present in the default branch history. Always use git log --all with a mirrored clone.
- Missing committer email when it differs from author email: Git distinguishes between the author (who wrote the change) and the committer (who applied it to the repository). In CI/CD systems and rebase workflows, the committer may be a bot account (github-actions[bot]) while the author email is the human developer. Both should be extracted (%ae for author, %ce for committer).
- Assuming psbdmp.ws index is complete: psbdmp.ws indexes only a portion of Pastebin pastes, namely those that are public and not deleted before indexing. Many paste sites offer no searchable index at all. Google dorks provide access to search-engine-indexed pastes but miss pastes created after the last crawl or set to unlisted.
- Not validating Pastebin paste dates against employment timeline: A paste containing credentials from 2018 when the developer was a student may be for a personal project with no relevance to the current target. Always note paste creation date and correlate with LinkedIn employment history before treating a paste as live intelligence.
- Overlooking gist repositories: GitHub Gists are separate from regular repositories and have separate search interfaces. A developer may publish sensitive configuration snippets as gists while maintaining clean public repositories. Search gist.github.com/<username> directly and via a site:gist.github.com <email> Google dork.
- Failing to check organisation membership and external contributions via the GitHub API: Even when a developer's personal repositories are clean, their public organisation memberships and their contributions to other owners' public repositories are visible through the API. Private repositories and their collaborator lists are not exposed to unauthorised callers, but the names of the public organisations and repositories alone can reveal sensitive project names or internal infrastructure naming conventions.
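The author/committer distinction can be checked with a sketch like the one below. It assumes tab-separated input from git log --all --format='%ae%x09%ce'. The function name committer_mismatches and the two automation patterns it filters are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "author_email<TAB>committer_email" lines and
# prints each distinct pair where the two differ, skipping committers that
# look like automation rather than a person.
committer_mismatches() {
  awk -F '\t' '
    $1 == $2 { next }                                 # same identity, nothing to add
    $2 ~ /\[bot\]@|^noreply@github\.com$/ { next }   # assumed bot and web-UI patterns
    !seen[$1 FS $2]++ { print $1 "\t" $2 }            # emit each pair once
  '
}
```

Pairs that remain often point to a maintainer who applied someone else's patch, which adds a second human identity to the contributor list.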
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0058 | Knowledge of network protocols | Using git's network protocol, the GitHub REST API, and the HaveIBeenPwned API as intelligence collection channels |
| K0145 | Knowledge of security assessment approaches | Applying a structured multi-pivot chain that converts git commit metadata into a comprehensive developer identity and credential intelligence profile |
| K0272 | Knowledge of network security architecture | Correlating developer identity, credential exposure, and paste site intelligence with the target organisation's known infrastructure footprint |
| K0427 | Knowledge of encryption algorithms | Identifying cryptographic key material (API keys, private keys, OAuth tokens) in paste site content and assessing their cryptographic validity |
| S0040 | Skill in identifying and extracting data of interest | Extracting contributor emails from git history, correlating them across GitHub, HIBP, and paste sites, and identifying live credentials |
| T0569 | Apply and utilize authorized cyber capabilities to achieve objectives | Executing the commit-to-Pastebin chain within an authorised developer OSINT phase to identify credential exposure and support attack surface assessment |
Further Reading
- Open Source Intelligence Techniques, 9th Edition — Michael Bazzell, Chapter 12: Online Communities and Code Repositories (IntelTechniques)
- Hacking APIs — Corey Ball, Chapter 4: Finding APIs and Their Attack Surface (No Starch Press)
- The Web Application Hacker's Handbook, 2nd Edition — Stuttard & Pinto, Chapter 4: Mapping the Application (Wiley)