Git Repository History Secret Recovery: Identifying Deleted Credentials via Commit Log Forensics
Theory
Why This Matters
Credentials committed to version control are among the most frequently exploited initial-access vectors documented in public breach disclosures. The fundamental property of git that makes this dangerous is also what makes it valuable as a version control system: every change is permanently recorded. A developer who commits an AWS access key and then deletes it in the next commit has not removed the key — they have added a deletion record. Anyone who clones the repository and examines the full history recovers the original credential in seconds. This pattern has resulted in some of the most damaging cloud infrastructure compromises on record: entire AWS accounts drained, production databases exfiltrated, and signing keys extracted from CI/CD pipelines. Threat intelligence analysts mapping an organisation's attack surface must check all public repositories. Security engineers must understand the tools and patterns to protect their own infrastructure. This card covers both sides.
Core Concept
Git history secret discovery exploits the immutable, content-addressed nature of git's object model. Every commit object references a tree (a snapshot of the repository's files at that point), its parent commit or commits, and metadata (author, committer, timestamps and message). When a file is modified, the old version remains accessible via the parent commit's tree. When a file is deleted, it remains accessible via the last commit that contained it. Deleting a file therefore does not remove it from history; it creates a new commit that records the deletion.
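Git's content addressing can be observed directly. The minimal Python sketch below assumes a repository using the default SHA-1 object format (repositories created with `--object-format=sha256` hash differently). Git names each file version by hashing a short header plus the raw bytes, and a commit's tree refers to files only by these names. A later commit that deletes the file simply stops listing the name; the blob, and every older tree that lists it, are left untouched.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID git assigns to a file's content (default SHA-1 object format)."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Same bytes as: printf 'test content\n' | git hash-object --stdin
    print(git_blob_id(b"test content\n"))
```

The printed ID matches what `git hash-object` reports for the same bytes. You can use that ID to check whether a particular historical file version exists in a clone: `git cat-file -e <id>` exits 0 if the object is present.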
Three categories of secrets are most commonly found in git history. Hardcoded credentials include database passwords, SMTP authentication strings and admin passwords embedded directly in configuration files or application source. API keys and tokens include cloud provider keys (AWS access key IDs in the `AKIA...` format, GCP service account JSON files, Azure client secrets), third-party service keys (Stripe, Twilio, SendGrid) and OAuth tokens. Private keys and certificates include RSA and EC private keys (PEM-encoded, with headers such as `-----BEGIN RSA PRIVATE KEY-----`, `-----BEGIN EC PRIVATE KEY-----` or the generic PKCS#8 `-----BEGIN PRIVATE KEY-----`), SSH private keys (`-----BEGIN OPENSSH PRIVATE KEY-----` in the current OpenSSH format) and the private keys behind TLS certificates.
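The PEM header label names the key format, so an exposed private key can usually be triaged from that line alone. The sketch below is a simplified lookup covering an illustrative subset of common labels, not a complete list. The function and table names are hypothetical. For PKCS#8 and OpenSSH keys the algorithm is stored inside the encoded body, so confirming it requires a tool such as `openssl pkey` (PKCS#8) or `ssh-keygen -y -f` (OpenSSH).

```python
import re

# Illustrative subset of PEM "BEGIN" labels for private keys and what they imply
PEM_LABELS = {
    "RSA PRIVATE KEY": "RSA (PKCS#1)",
    "EC PRIVATE KEY": "EC (SEC 1)",
    "DSA PRIVATE KEY": "DSA",
    "OPENSSH PRIVATE KEY": "OpenSSH (algorithm inside the encoded body)",
    "PRIVATE KEY": "PKCS#8 (algorithm inside the encoded body)",
    "ENCRYPTED PRIVATE KEY": "PKCS#8, password-encrypted",
}
BEGIN_RE = re.compile(r"-----BEGIN ([A-Z0-9 #]+)-----")


def classify_pem(text):
    """Return (label, key_type) for each private-key PEM header found in text."""
    return [(label, PEM_LABELS[label])
            for label in BEGIN_RE.findall(text)
            if label in PEM_LABELS]  # certificates and other labels are ignored
```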
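truffleHog scans every commit for secrets. Its current major version (v3) relies on hundreds of provider-specific detectors built from keyword and regex patterns, and it can verify candidates against the provider's API. Earlier versions relied mainly on Shannon entropy analysis. gitleaks applies a rule-based engine with a large built-in ruleset covering over 150 secret types. git-secrets (from AWS Labs) applies configurable pattern matching with support for custom rules. It is designed mainly as a pre-commit hook, but `git secrets --scan-history` scans existing history.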
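Rule-based engines such as gitleaks pair each regex with a short list of keywords. The regex only runs when a keyword is present, which keeps large scans fast. The following is a minimal sketch of that keyword-prefilter idea. The three rules and all names are hypothetical simplifications for illustration, not gitleaks' actual rule definitions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str
    keywords: tuple      # cheap lowercase substring prefilter
    pattern: re.Pattern  # precise match, run only after a keyword hit


# Hypothetical, simplified rules; real rulesets are far more precise
RULES = (
    Rule("aws-access-key-id", ("akia", "asia"),
         re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")),
    Rule("github-pat", ("ghp_",), re.compile(r"\bghp_[A-Za-z0-9]{36}\b")),
    Rule("private-key", ("private key",),
         re.compile(r"-----BEGIN[A-Z ]*PRIVATE KEY-----")),
)


def scan_line(line):
    """Return (rule_id, match) pairs; each regex runs only if a keyword is present."""
    lowered = line.lower()
    hits = []
    for rule in RULES:
        if any(k in lowered for k in rule.keywords):
            hits.extend((rule.rule_id, m.group(0)) for m in rule.pattern.finditer(line))
    return hits
```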
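The `git log -p` command (patch format) outputs every commit with its full diff, which makes it the most direct way to search history manually. Adding `--all` walks every ref (all branches, tags and other refs under `refs/`) plus HEAD, rather than only the current branch. It does not include orphaned commits that no ref points to any more, for example after a force-push. In a local repository those can still be found with `git fsck --unreachable` or through the reflog (`git log -g`), but a fresh clone never receives them. Combining these commands with grep enables targeted searches for specific patterns.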
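The same triage can be scripted so that each hit keeps its commit and file. The Python sketch below parses `git log -p` output and attributes every added line to the commit and file it came from. The function names are hypothetical. `grep_history` reads the whole log into memory, which is fine for triage but not for very large repositories.

```python
import re
import subprocess

COMMIT_RE = re.compile(r"^commit ([0-9a-f]{40,64})")  # SHA-1 or SHA-256 IDs


def added_lines(patch_text):
    """Yield (commit, path, text) for every added line in `git log -p` output."""
    commit = path = None
    in_hunk = False
    for line in patch_text.splitlines():
        m = COMMIT_RE.match(line)
        if m:
            commit, path, in_hunk = m.group(1), None, False
        elif line.startswith("diff --git "):
            path, in_hunk = None, False
        elif not in_hunk and line.startswith("+++ "):
            target = line[4:]
            # "+++ b/<path>" names the new file; "+++ /dev/null" means deleted
            if target == "/dev/null":
                path = None
            else:
                path = target[2:] if target.startswith("b/") else target
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            yield commit, path, line[1:]


def grep_history(repo, pattern):
    """Run `git log --all -p` in repo; return added lines matching pattern."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--all", "-p", "--no-color",
         "--src-prefix=a/", "--dst-prefix=b/"],
        capture_output=True, text=True, errors="replace", check=True,
    )
    rx = re.compile(pattern, re.IGNORECASE)
    return [hit for hit in added_lines(result.stdout) if rx.search(hit[2])]
```

Example call: `grep_history("/tmp/target-repo-mirror", r"password|secret|api_key|token")`. This returns `(commit, path, text)` tuples, so each candidate can be passed straight to `git show <commit>:<path>`.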
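git filter-repo is the correct tool for permanently removing secrets from history; the git project's own documentation recommends it in place of git filter-branch. It rewrites the repository's entire history, removing specified file paths or content patterns from every commit. After rewriting, the cleaned history must be force-pushed and all collaborators must re-clone, because existing clones (and forks) retain the old history. Rewriting does not un-leak a credential that has already been pushed, so the exposed secret must also be revoked and rotated.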
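For the content-pattern mode, filter-repo's `--replace-text` option reads a file of expressions, one per line. Each line is treated as literal text by default and may end in `==>` followed by replacement text; without that suffix, matches are replaced with `***REMOVED***`. A small helper with the hypothetical name `replace_text_file` can build that file from a list of discovered secret values. Ordering the longest literals first is a defensive choice, made on the assumption that expressions are applied in file order.

```python
def replace_text_file(secrets, replacement="***REMOVED***"):
    """Return contents for `git filter-repo --replace-text FILE`.

    Emits one `literal==>replacement` line per unique, non-empty secret,
    longest first in case one secret is a substring of another.
    """
    lines = []
    for secret in sorted({s for s in secrets if s}, key=lambda s: (-len(s), s)):
        if "\n" in secret:
            # The expressions file is line-based, so a multi-line secret (such as
            # a PEM body) cannot be one literal; remove such files by path instead
            raise ValueError("multi-line secret: use --path <file> --invert-paths")
        lines.append(f"{secret}==>{replacement}\n")
    return "".join(lines)
```

Write the result to a file kept outside the repository, run `git filter-repo --replace-text <that file>` on a fresh clone, and delete the file afterwards, since it contains the secrets themselves.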
Technical Deep-Dive
```bash
# Step 1: Mirror-clone the repository with all refs
# (every branch, tag and other advertised ref, e.g. GitHub's refs/pull/*)
git clone --mirror https://github.com/targetorg/target-repo /tmp/target-repo-mirror
cd /tmp/target-repo-mirror

# Step 2: Manual grep across all history (fast initial triage)
# Keep added lines only ("+"), dropping the "+++ b/<file>" diff headers
git log --all -p \
  | grep -iE "(password|passwd|secret|api_key|apikey|token|private_key|credentials)" \
  | grep "^+" | grep -v "^+++" | head -50

# Search for specific file paths that commonly hold secrets
# (prints hash, date, author email and subject, then the files touched)
git log --all --full-history --format="%H %ad %ae %s" --name-status -- \
  ".env" "*.pem" "*.key" "config/database.yml" \
  "application.properties" "secrets.json" "credentials.json"

# View the content of a specific historical file version
# (if <commit-hash> is the commit that deleted the file, use <commit-hash>^:.env)
git show <commit-hash>:.env

# Step 3: truffleHog (v3): detector-based scanning of full history,
# reporting only credentials the provider confirmed as live
trufflehog git file:///tmp/target-repo-mirror --json --only-verified 2>/dev/null \
  | jq -r 'select(.Verified == true)
      | "\(.DetectorName): \(.Raw[:60])\n  File: \(.SourceMetadata.Data.Git.file)\n  Commit: \(.SourceMetadata.Data.Git.commit)"'

# Step 4: gitleaks: rule-based scanning of full history
# Do not add --no-git here: it scans files on disk instead of commit history,
# and a bare mirror has no working tree
gitleaks detect --source /tmp/target-repo-mirror \
  --report-format json --report-path /tmp/leaks.json
jq '.[] | {RuleID, File, Commit, Secret: .Secret[:40]}' /tmp/leaks.json

# Step 5: Find exactly which commit introduced a secret
# Scenario: gitleaks found an AWS key in commit a3f8c2d; find its FIRST introduction.
# Quickest: the pickaxe (-S) lists commits that add or remove the string,
# and --reverse puts the oldest first
git log --all --reverse -S "<leaked-key>" --format="%H %ad %ae %s" | head -1

# Alternative: git bisect. It needs a working tree, so use a normal clone,
# and it assumes the key stays present from its introduction up to a3f8c2d
git clone /tmp/target-repo-mirror /tmp/target-repo-work && cd /tmp/target-repo-work
git bisect start
git bisect bad a3f8c2d                 # commit where the secret exists
git bisect good <older-clean-commit>
# Exit 1 (bad) if tracked files contain the prefix, 0 (good) otherwise
git bisect run sh -c 'git grep -q "AKIA" && exit 1 || exit 0'
git bisect reset                       # bisect prints the first bad commit before this

# Step 6: Post-discovery: purge secret from history (DESTRUCTIVE, for remediation)
# Revoke and rotate the credential first: rewriting history does not un-leak it
# Install: pip install git-filter-repo
git filter-repo --path .env --invert-paths    # remove .env from all history
# OR redact a specific string everywhere (bash process substitution):
git filter-repo --replace-text <(echo "AKIAIOSFODNN7EXAMPLE==>REDACTED_KEY")
# Then force-push the rewritten refs; every collaborator must re-clone
```
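Large gitleaks reports often repeat the same secret across many commits. The following is a minimal Python sketch for collapsing the Step 4 report (`/tmp/leaks.json`) into one entry per distinct secret per rule. It assumes the report is a JSON array of objects carrying the `RuleID`, `Secret`, `Commit` and `File` fields used in the jq command above. The function names are illustrative, and only a short prefix of each secret is kept so that full credentials are not copied into working notes.

```python
import json
from collections import defaultdict


def summarise_gitleaks(findings, keep=8):
    """Group findings by (RuleID, Secret); list every commit:file location."""
    groups = defaultdict(set)
    for f in findings:
        groups[(f["RuleID"], f["Secret"])].add(f"{f['Commit'][:7]}:{f['File']}")
    return [
        {"rule": rule, "secret": secret[:keep] + "...", "locations": sorted(locs)}
        for (rule, secret), locs in sorted(groups.items())
    ]


def load_report(path="/tmp/leaks.json"):
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Example call: `for entry in summarise_gitleaks(load_report()): print(entry)`.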
High-entropy string detector (supplement for manual review):

```python
# Usage: git log --all -p > history.patch && python3 entropy_scan.py history.patch
# (entropy_scan.py is just an example file name for this script)
import math
import re
import sys
from collections import Counter


def entropy(s):
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((k / n) * math.log2(k / n) for k in Counter(s).values())


# Read git log -p output and flag high-entropy tokens on added lines
with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
    for line in f:
        if line.startswith("+") and not line.startswith("+++"):
            # Extract candidate tokens of 20 or more characters
            for tok in re.findall(r"[A-Za-z0-9+/=_-]{20,}", line):
                h = entropy(tok)
                if h > 4.5:  # threshold for likely secrets (bits per character)
                    print(f"High entropy ({h:.2f}): {tok[:50]}")
```
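The 4.5 bits-per-character threshold used by the detector interacts with token length and alphabet size. Let $p_c$ be the relative frequency of character $c$ in an $n$-character token that uses $k$ distinct characters drawn from an alphabet $\Sigma$. Shannon entropy can never exceed the logarithm of the number of distinct symbols:

```latex
\begin{aligned}
H(s) &= -\sum_{c} p_c \log_2 p_c \;\le\; \log_2 k \;\le\; \log_2 \min(n, |\Sigma|) \\
n = 22 &: \log_2 22 \approx 4.46 < 4.5, \qquad n = 23 : \log_2 23 \approx 4.52 > 4.5 \\
|\Sigma| = 16 \text{ (hex)} &: H(s) \le \log_2 16 = 4 < 4.5
\end{aligned}
```

At this threshold, tokens shorter than 23 characters and hexadecimal secrets of any length can never be flagged, however random they are. A separate, lower threshold is needed for hex-encoded keys.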
Intelligence Collection Methodology
- Identify all public repositories associated with the target organisation: browse `https://github.com/orgs/ORGNAME/repositories` and enumerate every public repo. Also run GitHub code searches scoped with the `org:ORGNAME` qualifier, and search for the organisation's name to find related repositories held under personal accounts.
- For each repository, perform an initial triage with `git log --all --full-history -- .env "*.pem" "*.key" "config/secrets*"` to check whether any sensitive file paths ever existed in history.
- Run truffleHog in `--only-verified` mode first. Verified findings are confirmed live credentials: the tool has contacted the provider's API and confirmed the key is still active. These require immediate escalation in an authorised assessment.
- Run gitleaks for broad pattern coverage. Review the JSON report for any rules matching AWS, GCP, Azure, GitHub, Stripe, Twilio and database connection string patterns.
- For any confirmed finding, use `git log --all -p -- <file>` to see the full history of the file containing the secret. Note the commit hash, author email, commit timestamp and commit message.
- Use `git show <hash>:<filepath>` to retrieve the exact content of the secret-containing file at that commit (use `<hash>^:<filepath>` if that commit deleted the file). Copy the credential value for correlation with discovered infrastructure.
- Cross-reference discovered credential types with Shodan findings for the target organisation: an AWS access key combined with Shodan evidence of AWS-hosted services suggests which environment the key may access.
- Search GitHub code search for the specific key prefix or pattern (`AKIA` for AWS access key IDs, `ghp_` for GitHub personal access tokens) combined with `org:ORGNAME` to check for cross-repository exposure. Code search covers only each repository's current default-branch contents, not its history, so it complements the history scans rather than replacing them.
- Document all findings: repository URL, commit hash, file path, secret type, first-seen commit date, author identity, and whether the secret appears to be rotated or still active. This constitutes the credential intelligence section of the attack surface report.
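The documentation step lends itself to a fixed record type. The following minimal Python sketch uses hypothetical class and field names that mirror the items listed in the final bullet. It deliberately omits the secret value itself, and `to_csv` drops exact duplicate rows before export.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class SecretFinding:
    repo_url: str
    commit: str
    file_path: str
    secret_type: str
    first_seen: str   # ISO-8601 commit date, e.g. from git log --format=%aI
    author: str       # "Name <email>" as recorded in the commit
    status: str       # e.g. "active", "rotated", "unknown"


def to_csv(findings):
    """Render findings as CSV text, sorted, with exact duplicates removed."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(SecretFinding)])
    writer.writeheader()
    for finding in sorted(set(findings), key=lambda x: (x.repo_url, x.first_seen, x.commit)):
        writer.writerow(asdict(finding))
    return buf.getvalue()
```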
Common Intelligence Collection Errors
- Scanning only the default branch: A plain `git clone` downloads all branches and tags but checks out only the default branch, and `git log` without `--all` walks only that branch. Secrets committed to feature or release branches are therefore invisible unless `--all` is used. `git clone --mirror` also fetches every other ref the server advertises; on GitHub this includes pull-request refs under `refs/pull/`, which can retain commits that are no longer on any branch. Commits orphaned by a force-push and referenced by no ref are not fetched by any clone. Always clone with `--mirror` and scan with `--all` for comprehensive history coverage.
- Trusting a deleted file means the secret is gone: This is the most common developer misconception about git. A file deleted in commit B was fully present in its parent commit A and is recoverable from any clone with `git show A:<filename>`. Only a history rewrite such as `git filter-repo`, followed by a force-push, removes content from every commit, and even then only from copies that have not already been cloned or forked.
- Not checking forks of public repositories: When a repository is forked before a secret is removed from history, the fork retains the full original history, including the secret. GitHub forks are independent repositories, so their history is not affected by changes to the upstream repo. Search for forks before concluding exposure is remediated.
- Dismissing unverified truffleHog findings: truffleHog's `--only-verified` flag suppresses every finding it cannot confirm as live. This includes keys that have been rotated and are no longer accepted by the provider's API, as well as secret types that have no verifier. These findings still represent a disclosure event and may be valuable intelligence about historical infrastructure (which cloud account, which service was integrated), so re-run without the flag to review them.
- Ignoring commit author emails as intelligence artifacts: Every commit in the scanned history contains an author name and email. Collecting all unique author emails from a repository's history (`git log --all --format="%ae" | sort -u`) produces a list of contributors, including contractors and former employees whose addresses remain in history after their accounts are revoked. These fields come from each committer's local git configuration and are not verified, so treat them as leads rather than confirmed identities.
- Failing to check CI/CD configuration files for secret references: Files such as `.github/workflows/*.yml`, `.travis.yml`, `Jenkinsfile` and `.circleci/config.yml` frequently reference environment variable names for secrets. Even when the secrets themselves are stored in CI/CD secret vaults, variable names can reveal which credentials exist (e.g., `AWS_PROD_ACCESS_KEY_ID` indicates a production AWS integration).
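The CI/CD check can be partly automated by pulling secret-like variable names out of configuration files. The following minimal Python sketch matches GitHub Actions `${{ secrets.NAME }}` expressions plus shell-style `$NAME` / `${NAME}` references. The keyword list used for the shell-style references is a hypothetical heuristic, not an exhaustive rule, and the function name is illustrative.

```python
import re

# GitHub Actions expression syntax: ${{ secrets.NAME }}
GHA_SECRET_RE = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")
# Shell-style $NAME or ${NAME} whose name suggests a credential (heuristic list)
ENV_SECRET_RE = re.compile(
    r"\$\{?([A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIALS)[A-Z0-9_]*)\}?"
)


def secret_variable_names(config_text):
    """Return sorted, unique secret-like variable names referenced in a CI config."""
    names = set(GHA_SECRET_RE.findall(config_text))
    names.update(ENV_SECRET_RE.findall(config_text))
    return sorted(names)
```

Because files such as `.github/workflows/*.yml` are ordinary tracked files, running this over each historical version (`git show <hash>:<path>`) also reveals integrations that have since been removed.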
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0058 | Knowledge of network protocols | Understanding git's network protocol and object model as the mechanism that makes historical secret recovery possible |
| K0145 | Knowledge of security assessment approaches | Applying a systematic multi-tool scanning methodology (truffleHog + gitleaks + manual grep) with explicit triage and escalation logic |
| K0272 | Knowledge of network security architecture | Correlating discovered credentials (cloud keys, database connection strings) with the target's identified cloud and network infrastructure |
| K0427 | Knowledge of encryption algorithms | Identifying PEM-encoded private keys, distinguishing RSA from EC key types, and assessing the cryptographic impact of exposed keys |
| S0040 | Skill in identifying and extracting data of interest | Extracting credentials, API keys, and private keys from git history using entropy analysis, regex patterns, and targeted path searches |
| T0569 | Apply and utilize authorized cyber capabilities to achieve objectives | Using truffleHog, gitleaks, and git commands within an authorised repository security review to identify credential exposure |