Git Repository History Secret Recovery: Identifying Deleted Credentials via Commit Log Forensics

web_auth_sessions Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Credentials committed to version control are among the most frequently exploited initial-access vectors documented in public breach disclosures. The fundamental property of git that makes this dangerous is also what makes it valuable as a version control system: every change is permanently recorded. A developer who commits an AWS access key and then deletes it in the next commit has not removed the key — they have added a deletion record. Anyone who clones the repository and examines the full history recovers the original credential in seconds. This pattern has resulted in some of the most damaging cloud infrastructure compromises on record: entire AWS accounts drained, production databases exfiltrated, and signing keys extracted from CI/CD pipelines. Threat intelligence analysts mapping an organisation's attack surface must check all public repositories. Security engineers must understand the tools and patterns to protect their own infrastructure. This card covers both sides.

Core Concept

Git history secret discovery exploits the fact that git's object model keeps every committed snapshot. Every commit object references a tree (a snapshot of file contents at that point), zero or more parent commits, and metadata such as author, committer, and message. When a file is modified, the old version remains accessible via the parent commit's tree. When a file is deleted, it remains accessible via the last commit that contained it. Deletion therefore does not remove anything from git history; it creates a new commit that records the deletion.
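The object model described above can be sketched with a toy content-addressed store. This is a simplified illustration, not git's real storage format: the names put, commit, read_file, and history are hypothetical helpers, and the objects are hashed as JSON instead of git's binary encoding.

```python
import hashlib
import json

objects = {}  # object id -> stored object (toy stand-in for .git/objects)

def put(obj):
    """Store any JSON-serialisable object under the SHA-1 of its encoding."""
    data = json.dumps(obj, sort_keys=True)
    oid = hashlib.sha1(data.encode()).hexdigest()
    objects[oid] = obj
    return oid

def commit(files, parent, message):
    """Snapshot files (path -> content) and link the snapshot to parent."""
    tree = {path: put(content) for path, content in files.items()}
    return put({"tree": put(tree), "parent": parent, "message": message})

def read_file(commit_id, path):
    """Return the content of path as of commit_id, or None if absent."""
    tree = objects[objects[commit_id]["tree"]]
    blob_id = tree.get(path)
    return objects[blob_id] if blob_id is not None else None

def history(commit_id):
    """Yield commit ids from commit_id back to the root commit."""
    while commit_id is not None:
        yield commit_id
        commit_id = objects[commit_id]["parent"]

# Commit a secret, then "delete" it in a follow-up commit.
c1 = commit({".env": "AWS_KEY=AKIAIOSFODNN7EXAMPLE", "app.py": "print('hi')"}, None, "add config")
c2 = commit({"app.py": "print('hi')"}, c1, "remove .env")

print(read_file(c2, ".env"))                           # None: absent from the latest snapshot
print([read_file(c, ".env") for c in history(c2)])     # the older commit still yields the key
```

The second commit only adds a new snapshot that lacks .env; the blob written by the first commit is never touched, which is why walking the parent chain recovers it.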

Three categories of secrets are most commonly found in git history. Hardcoded credentials include database passwords, SMTP authentication strings, and hardcoded admin passwords embedded directly in configuration files or application source. API keys and tokens include cloud provider keys (AWS AKIA... format, GCP service account JSON, Azure client secrets), third-party service keys (Stripe, Twilio, SendGrid), and OAuth tokens. Private keys and certificates include RSA/EC private keys (PEM-encoded, beginning with -----BEGIN RSA PRIVATE KEY-----), SSH private keys, and TLS certificate private keys.
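The three categories map naturally onto pattern rules. The sketch below is illustrative only: the pattern set is a small assumption chosen for this card (dedicated scanners ship far larger, tuned rulesets), and classify is a hypothetical helper name.

```python
import re

# Illustrative patterns only; not a substitute for a maintained ruleset.
PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 upper-case letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # Classic GitHub personal access tokens: "ghp_" followed by 36 alphanumerics.
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    # PEM private key headers (RSA, EC, OpenSSH, or unlabelled PKCS#8).
    "private_key_pem": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # Assignments such as DB_PASSWORD="..." or passwd: ...
    "hardcoded_password": re.compile(r"(?i)(?:password|passwd)\s*[:=]\s*['\"]?[^\s'\"]{4,}"),
}

def classify(line):
    """Return the names of every pattern that matches the line."""
    return [name for name, rx in PATTERNS.items() if rx.search(line)]

print(classify("AWS_KEY=AKIAIOSFODNN7EXAMPLE"))
print(classify('DB_PASSWORD="hunter2hunter2"'))
print(classify("-----BEGIN RSA PRIVATE KEY-----"))
```

Prefix-based rules like these catch short, structured keys that an entropy threshold alone tends to miss.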

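Three tools automate the search. truffleHog scans every commit for secrets: version 2 relied on entropy analysis and regex patterns, while version 3 (the trufflehog git syntax used in this card) matches per-provider detector patterns and can verify a candidate against the provider's API. gitleaks applies a rule-based engine with a large built-in ruleset covering over 150 secret types. git-secrets (an AWS Labs tool) applies configurable pattern matching with support for custom rules, and is designed primarily as a commit hook that blocks secrets before they enter history.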

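The git log -p command (patch format) outputs every commit with its full diff, which is the most direct way to search history manually. git log --all walks every ref (branches, tags, and remote-tracking refs) rather than only the current branch. It does not reach orphaned commits that no ref points to; those require git fsck --unreachable or the reflog of a clone that once held them. Combining these with grep enables targeted searches for specific patterns.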
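A plain grep over git log -p output loses track of which commit and file each hit came from. The following sketch keeps that context while parsing the patch text. It assumes the default git log -p layout ("commit <hash>" headers and "+++ b/<path>" file markers); added_secret_lines and SECRET_RX are hypothetical names, and the keyword list simply mirrors the one used in this card.

```python
import re

SECRET_RX = re.compile(r"(?i)(password|passwd|secret|api_?key|token|private_key|credentials)")

def added_secret_lines(log_text):
    """Yield (commit, path, line) for added lines that match SECRET_RX."""
    commit = path = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            # Commit messages are indented, so this only matches real headers.
            commit = line.split()[1]
        elif line.startswith("+++ "):
            # "+++ b/path" names the post-change file; "/dev/null" marks a deletion.
            target = line[4:]
            path = target[2:] if target.startswith("b/") else target
        elif line.startswith("+") and SECRET_RX.search(line):
            yield commit, path, line[1:].strip()

sample = """commit 1111111111111111111111111111111111111111
Author: Dev <dev@example.com>

    add config

diff --git a/.env b/.env
new file mode 100644
--- /dev/null
+++ b/.env
@@ -0,0 +1,2 @@
+DB_PASSWORD=hunter2
+DEBUG=true
"""

for hit in added_secret_lines(sample):
    print(hit)
```

In practice the input would come from a file produced by git log --all -p. An added line whose own content begins with "++ " would be mistaken for a file marker, which is acceptable for triage but worth knowing.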

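git filter-repo is the recommended tool for permanently removing secrets from history; the git documentation itself discourages the older git filter-branch in its favour. It rewrites the repository's entire history, removing specified file paths or content patterns from every commit. After rewriting, the new history must be force-pushed and all collaborators must re-clone, because existing clones and forks retain the old history. Rewriting does not revoke anything: a credential that was ever pushed should be treated as compromised and rotated.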
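When several secrets must be redacted at once, the expressions passed to git filter-repo --replace-text can be generated rather than typed by hand. The sketch below only builds the file contents, one literal==>replacement expression per line as in the card's own example; build_replacements and the REDACTED_n placeholder scheme are assumptions made for illustration.

```python
def build_replacements(secrets, placeholder="REDACTED"):
    """Return expressions-file text: one 'secret==>placeholder_N' line per unique secret."""
    lines = []
    for index, secret in enumerate(sorted(set(secrets)), start=1):
        if "==>" in secret or "\n" in secret:
            # Such values would be ambiguous in this line-based format.
            raise ValueError("secret cannot be expressed as a single literal line")
        lines.append(f"{secret}==>{placeholder}_{index}")
    return "\n".join(lines) + "\n"

text = build_replacements(["AKIAIOSFODNN7EXAMPLE", "hunter2hunter2", "AKIAIOSFODNN7EXAMPLE"])
print(text, end="")
```

The output would be written to a file and passed as git filter-repo --replace-text <file>. Numbered placeholders keep it visible in the rewritten history which distinct secret used to sit at each location.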

Technical Deep-Dive

# Step 1: Clone the full repository with all refs
git clone --mirror https://github.com/targetorg/target-repo /tmp/target-repo-mirror
cd /tmp/target-repo-mirror

# Step 2: Manual grep across all history (fast initial triage)
# --follow is omitted: it only works for a single file path, not for "."
git log --all -p \
  | grep -iE "(password|passwd|secret|api_key|apikey|token|private_key|credentials)" \
  | grep "^+" | grep -v "^+++" | head -50

# Search for specific file paths that commonly hold secrets:
git log --all --full-history -- ".env" "*.pem" "*.key" "config/database.yml" \
  "application.properties" "secrets.json" "credentials.json" \
  | grep -E "^commit"

# View the content of a specific historical file version:
# git log --all --full-history -- .env returns commit hashes
# (if the hash is the commit that deleted the file, read its parent: <commit-hash>^:.env)
git show <commit-hash>:.env

# Step 3: truffleHog: detector pattern scanning of full history, verified results only
# (recent releases may spell the flag --results=verified; check trufflehog git --help)
trufflehog git file:///tmp/target-repo-mirror \
  --json --only-verified 2>/dev/null \
  | jq -r 'select(.Verified == true)
      | "\(.DetectorName): \(.Raw[:60])\n  File: \(.SourceMetadata.Data.Git.file)\n  Commit: \(.SourceMetadata.Data.Git.commit)"'

# Step 4: gitleaks: rule-based scanning of the commit history
# Do not pass --no-git here: it scans the directory as plain files and skips history.
gitleaks detect --source /tmp/target-repo-mirror \
  --report-format json --report-path /tmp/leaks.json
jq '.[] | {RuleID, File, Commit, Secret: .Secret[:40]}' /tmp/leaks.json

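# Step 5: Find exactly which commit introduced a secret (git bisect)
# Scenario: gitleaks found AWS key in commit a3f8c2d; want to find FIRST introduction
# bisect needs a working tree, so use a regular clone rather than the bare mirror:
git clone /tmp/target-repo-mirror /tmp/target-repo-work && cd /tmp/target-repo-work
git bisect start
git bisect bad a3f8c2d  # commit where secret exists
git bisect good <older-clean-commit>
# bisect run treats exit 0 as "good" and exit 1 as "bad", so the test must FAIL
# when the secret is present. git grep searches tracked files only (not .git):
git bisect run sh -c '! git grep -q "AKIA"'
# Bisect reports the exact introducing commit; then restore the original HEAD:
git bisect reset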

# Step 6: Post-discovery — purge secret from history (DESTRUCTIVE — for remediation)
# Install: pip install git-filter-repo
git filter-repo --path .env --invert-paths  # remove .env from all history
# OR to redact a specific string:
git filter-repo --replace-text <(echo "AKIAIOSFODNN7EXAMPLE==>REDACTED_KEY")

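# High-entropy string detector (supplement for manual review), Python:
# Usage: git log --all -p > history.patch; python3 entropy_scan.py history.patch
import math
import re
import sys
from collections import Counter

TOKEN_RX = re.compile(r"[A-Za-z0-9+/=_-]{20,}")  # candidate tokens of 20+ characters
# Entropy of an n-character string is at most log2(n), so with a 4.5 threshold only
# tokens of 23+ characters can qualify; shorter keys (e.g. 20-character AWS access
# key IDs) need prefix/regex rules instead.
THRESHOLD = 4.5  # bits per character; threshold for likely secrets

def entropy(s):
    """Shannon entropy of s in bits per character."""
    if not s:
        return 0.0
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())

# Read git log -p output and flag high-entropy tokens on added lines
with open(sys.argv[1], errors="replace") as f:  # tolerate non-UTF-8 bytes in diffs
    for line in f:
        if line.startswith("+") and not line.startswith("+++"):
            for tok in TOKEN_RX.findall(line):
                score = entropy(tok)
                if score > THRESHOLD:
                    print(f"High entropy ({score:.2f}): {tok[:50]}")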
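The JSON report written by gitleaks in Step 4 can also be triaged outside jq. The sketch below groups findings by rule and masks each secret before it is displayed. It relies only on the RuleID, File, Commit, and Secret fields already used in the jq filter above; summarise and the masking scheme are assumptions made for illustration.

```python
import json
from collections import defaultdict

def summarise(report_json, keep=4):
    """Group findings by RuleID; keep only the first `keep` characters of each secret."""
    grouped = defaultdict(list)
    for finding in json.loads(report_json):
        secret = finding.get("Secret", "")
        masked = secret[:keep] + "*" * max(len(secret) - keep, 0)
        grouped[finding.get("RuleID", "unknown")].append(
            (finding.get("Commit", "")[:7], finding.get("File", ""), masked)
        )
    return dict(grouped)

# Hypothetical report content, shaped like the fields used in Step 4.
sample_report = json.dumps([
    {"RuleID": "aws-access-token", "File": ".env",
     "Commit": "a3f8c2d0000000000000000000000000000000aa", "Secret": "AKIAIOSFODNN7EXAMPLE"},
    {"RuleID": "aws-access-token", "File": "deploy.sh",
     "Commit": "b7e1f9a0000000000000000000000000000000bb", "Secret": "AKIAIOSFODNN7EXAMPLE"},
])

for rule, hits in summarise(sample_report).items():
    print(rule, hits)
```

Masking at this stage keeps live credential values out of terminals, tickets, and shared notes while still leaving enough of the prefix to identify the key type.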

Intelligence Collection Methodology

Identify all public repositories associated with the target organisation: browse https://github.com/orgs/ORGNAME/repositories and enumerate all public repos. Also search GitHub for the organisation name using GitHub code search: org:ORGNAME.
For each repository, perform an initial triage with git log --all --full-history -- .env "*.pem" "*.key" "config/secrets*" to check whether any sensitive file paths ever existed in history. Quote every glob so that git, not the shell, expands it.
Run truffleHog in --only-verified mode first. Verified findings are confirmed live credentials — the tool has contacted the provider's API and confirmed the key is active. These require immediate escalation in an authorised assessment.
Run gitleaks for broad pattern coverage. Review the output JSON for any rules matching AWS, GCP, Azure, GitHub, Stripe, Twilio, and database connection string patterns.
For any confirmed finding, use git log --all -p -- <file> to see the full history of the file containing the secret. Note the commit hash, author email, commit timestamp, and commit message.
Use git show <hash>:<filepath> to retrieve the exact content of the secret-containing file at the time of the commit. Copy the credential value for correlation with discovered infrastructure.
Cross-reference discovered credential types with Shodan findings for the target organisation: an AWS access key combined with Shodan evidence of AWS-hosted services suggests which environment the key may access.
Search GitHub's code search for the specific key prefix or pattern (AKIA for AWS access keys, ghp_ for GitHub PATs) combined with org:ORGNAME to check for cross-repository exposure.
Document all findings: repository URL, commit hash, file path, secret type, first-seen commit date, author identity, and whether the secret appears to be rotated or still active. This constitutes the credential intelligence section of the attack surface report.
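The documentation step above lists the fields each finding should carry. One way to keep those records consistent is a small typed structure, sketched below. The Finding class, make_finding, and to_report are hypothetical names, and storing a SHA-256 digest in place of the raw secret is a design assumption made here so that the report itself does not become a second leak.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Finding:
    repo_url: str
    commit: str
    path: str
    secret_type: str
    first_seen: str      # ISO 8601 date of the introducing commit
    author_email: str
    status: str          # e.g. "active", "rotated", "unknown"
    secret_sha256: str   # digest only; the raw value stays out of the report

def make_finding(repo_url, commit, path, secret_type, first_seen,
                 author_email, status, secret):
    """Build a Finding, replacing the raw secret with its SHA-256 digest."""
    digest = hashlib.sha256(secret.encode()).hexdigest()
    return Finding(repo_url, commit, path, secret_type, first_seen,
                   author_email, status, digest)

def to_report(findings):
    """Serialise findings to JSON, dropping exact duplicates and sorting by date."""
    unique = sorted(set(findings), key=lambda f: (f.first_seen, f.repo_url, f.commit))
    return json.dumps([asdict(f) for f in unique], indent=2)

example = make_finding("https://example.invalid/org/repo", "a3f8c2d", ".env",
                       "aws_access_key_id", "2023-01-05", "dev@example.com",
                       "unknown", "AKIAIOSFODNN7EXAMPLE")
print(to_report([example, example]))
```

Because the digest is deterministic, the same key found in two repositories produces the same secret_sha256 value, which supports the cross-repository correlation step without exposing the key.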

Common Intelligence Collection Errors

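Scanning only the default branch: a plain git clone checks out only the default branch, and a plain git log walks only the checked-out history. Other branches are fetched as remote-tracking refs but are skipped unless --all is passed, and refs outside the branch and tag namespaces (for example GitHub's refs/pull/* pull-request refs) are not fetched at all without --mirror. Commits left unreachable by a force-push are not transferred by either kind of clone. Always clone with --mirror and scan with --all for comprehensive history scanning.
Assuming a deleted file means the secret is gone: this is the most common developer misconception about git. A file deleted in commit B was fully present in commit A and is recoverable from any clone with git show A:<filename>. Only a history rewrite (git filter-repo) removes the content from all commits, and only in repositories that adopt the rewritten history.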
Not checking forks of public repositories: When a repository is forked before a secret is removed from history, the fork retains the full original history including the secret. GitHub forks are independent repositories — their history is not affected by changes to the upstream repo. Search for forks before concluding exposure is remediated.
Dismissing unverified truffleHog findings: truffleHog's --only-verified flag suppresses every finding that could not be confirmed against the provider API, which includes keys that have been rotated and may include secret types for which no verification check exists. These still represent a disclosure event and may be valuable intelligence about historical infrastructure (which cloud account, which service was integrated).
Ignoring commit author emails as intelligence artifacts: every commit in the scanned history contains an author name and email. Collecting all unique author emails from a repository's history (git log --all --format="%ae" | sort -u) produces a list of everyone who authored a commit, including contractors and past employees whose corporate email access has been revoked but whose association with the organisation is thereby confirmed. Some entries will be personal or noreply addresses, depending on each contributor's git configuration.
Failing to check CI/CD configuration files for secret references: Files such as .github/workflows/*.yml, .travis.yml, Jenkinsfile, and .circleci/config.yml frequently reference environment variable names for secrets. Even when the secrets themselves are stored in CI/CD secret vaults, variable names can reveal what credentials exist (e.g., AWS_PROD_ACCESS_KEY_ID confirms a production AWS integration).
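For the CI/CD point above, secret names can be pulled out of GitHub Actions workflow files mechanically. The sketch below matches the ${{ secrets.NAME }} expression syntax; referenced_secrets is a hypothetical helper, the workflow text is invented for illustration, and other CI systems use different syntaxes that this pattern does not cover.

```python
import re

# Matches GitHub Actions expressions such as ${{ secrets.AWS_PROD_ACCESS_KEY_ID }}.
SECRET_REF = re.compile(r"\$\{\{\s*secrets\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")

def referenced_secrets(workflow_text):
    """Return the sorted, de-duplicated secret names referenced in a workflow file."""
    return sorted(set(SECRET_REF.findall(workflow_text)))

# Invented workflow fragment for demonstration.
workflow = """
jobs:
  deploy:
    steps:
      - run: ./deploy.sh
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_PROD_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_PROD_SECRET_ACCESS_KEY }}
          NPM_TOKEN: ${{secrets.NPM_TOKEN}}
      - run: ./notify.sh
        env:
          KEY: ${{ secrets.AWS_PROD_ACCESS_KEY_ID }}
"""

print(referenced_secrets(workflow))
```

Running this over every historical version of a workflow file (retrieved with git show <hash>:<filepath>) shows when each integration was added or retired, even though the secret values themselves never appear.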

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0058	Knowledge of network protocols	Understanding git's network protocol and object model as the mechanism that makes historical secret recovery possible
K0145	Knowledge of security assessment approaches	Applying a systematic multi-tool scanning methodology (truffleHog + gitleaks + manual grep) with explicit triage and escalation logic
K0272	Knowledge of network security architecture	Correlating discovered credentials (cloud keys, database connection strings) with the target's identified cloud and network infrastructure
K0427	Knowledge of encryption algorithms	Identifying PEM-encoded private keys, distinguishing RSA from EC key types, and assessing the cryptographic impact of exposed keys
S0040	Skill in identifying and extracting data of interest	Extracting credentials, API keys, and private keys from git history using entropy analysis, regex patterns, and targeted path searches
T0569	Apply and utilize authorized cyber capabilities to achieve objectives	Using truffleHog, gitleaks, and git commands within an authorised repository security review to identify credential exposure