Markdown rendering XSS

web_auth_sessions Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Markdown rendering XSS is a persistent source of vulnerabilities in collaborative and developer-facing platforms. In 2019, a researcher earned a $10,000 bounty on HackerOne by demonstrating stored XSS on the GitLab platform through a Markdown link injection that bypassed server-side sanitisation. In 2020, a mutation XSS (mXSS) bypass against DOMPurify (CVE-2020-26870, fixed in version 2.0.17) allowed script execution even though the sanitiser appeared to clean the input, because the sanitised markup produced a different DOM tree when the browser parsed it again on insertion. OWASP A03:2021 (Injection) covers HTML injection in rendered Markdown as a concrete instance of the class. Converting Markdown to HTML without sanitisation is the direct equivalent of innerHTML = userInput: both place user-controlled HTML in the DOM.

Core Concept

Markdown is a lightweight markup language. Markdown-to-HTML converters transform user-written Markdown syntax into HTML. The security concern arises in two distinct scenarios: converters that pass raw HTML through unchanged (most Markdown implementations accept inline HTML by design) and converters that produce unexpected HTML from benign-looking Markdown syntax.

The violated invariant is that user-controlled input must never reach a browser DOM as unescaped HTML. Markdown processing introduces multiple injection vectors:

Link injection: Markdown links are written as [text](url). If the URL is not validated, [click me](javascript:alert(1)) renders as <a href="javascript:alert(1)">click me</a> — a classic javascript: URL XSS vector.

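Image injection: Markdown images are written as ![alt](url). An attacker can inject ![x](x onerror=alert(1)). A parser that interpolates the URL into the tag without quoting or escaping it emits <img alt=x src=x onerror=alert(1)>, and the onerror handler runs when the image fails to load. Parsers that do quote attributes can be attacked the same way with a URL containing a double quote, for example ![x](x"onerror="alert(1)), if the quote is not escaped.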

Raw HTML passthrough: the CommonMark specification allows raw HTML blocks and inline HTML embedded in Markdown. Libraries such as marked.js pass raw HTML through by default; the mangle and headerIds options are unrelated to this behaviour. The only protection is the sanitize: true option in older releases (deprecated, and absent from recent ones) or an external sanitiser. An attacker who knows raw HTML is passed through simply injects <script>alert(1)</script> directly.

Mutation XSS (mXSS): DOMPurify is one of the most widely deployed client-side HTML sanitisers. mXSS describes a class of bypass in which the sanitiser's output string is harmless as the sanitiser parsed it, but the browser builds a different DOM tree when that string is parsed again on insertion (for example via innerHTML), re-introducing the dangerous content through parser quirks. mXSS bypasses are version-specific and require careful analysis of HTML parser edge cases.

CommonMark spec vs library implementation gaps: different Markdown libraries implement the CommonMark spec with varying fidelity. Edge cases around HTML entity encoding, nested inline code spans, and mixed delimiter handling produce different HTML output across libraries, and some of those outputs allow injection that a tester targeting a different library would miss.
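For the link-injection vector, the check that matters is a scheme allowlist applied to the final href value. The following is a minimal Python sketch of that check, not a replacement for a maintained sanitiser. The helper name is_safe_href and the allowlist contents are assumptions made for illustration, and the input is assumed to be the href after HTML entities have already been decoded.

```python
import re

# Assumption: only these schemes are acceptable in rendered links.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# A URL scheme is a letter followed by letters, digits, "+", "-" or ".".
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

# Leading characters a browser discards before parsing a URL (0x00 to 0x20).
_LEADING_JUNK = "".join(chr(code) for code in range(0x21))


def is_safe_href(href: str) -> bool:
    """Return True if href has no scheme or uses an allowlisted scheme."""
    # Browsers remove tabs and newlines anywhere in a URL, so
    # "java\tscript:" is still treated as "javascript:".
    cleaned = re.sub(r"[\t\n\r]", "", href).lstrip(_LEADING_JUNK)
    match = _SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme: relative path, query, or fragment
    return match.group(1).lower() in ALLOWED_SCHEMES
```

Checking the scheme against an allowlist, rather than searching for the string "javascript", is what makes mixed-case and whitespace-padded variants fail the same way as the plain payload.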

Technical Deep-Dive

<!-- Link injection — javascript: URI -->
[click me](javascript:alert(document.domain))

<!-- Image injection — onerror event handler -->
![x](x onerror=alert(document.domain))

<!-- Raw HTML passthrough (works when library allows HTML) -->
<script>alert(document.domain)</script>
<img src=x onerror=alert(document.domain)>
<svg onload=alert(document.domain)>

<!-- Nested code block edge case (library-dependent) -->
`[x](javascript:alert(1))`
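The image payload above only succeeds against a renderer that builds tags by string concatenation. A minimal Python sketch of the difference follows; render_image_naive and render_image_escaped are hypothetical helpers written for this illustration and are not functions of any Markdown library.

```python
import html


def render_image_naive(alt: str, url: str) -> str:
    # VULNERABLE: values are concatenated into unquoted attributes,
    # so a space in the URL starts a new attribute.
    return f"<img alt={alt} src={url}>"


def render_image_escaped(alt: str, url: str) -> str:
    # Safer: quote every attribute and escape the characters that
    # could terminate it (&, <, >, " and ').
    safe_alt = html.escape(alt, quote=True)
    safe_url = html.escape(url, quote=True)
    return f'<img alt="{safe_alt}" src="{safe_url}">'


print(render_image_naive("x", "x onerror=alert(1)"))
print(render_image_escaped("x", 'x" onerror="alert(1)'))
```

In the escaped version the payload stays inside the src value, so the browser requests an odd-looking URL instead of registering an event handler. Escaping alone does not stop javascript: URLs, which still need a scheme check.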

// marked.js with DOMPurify, bundled for the browser
const marked = require('marked');
const DOMPurify = require('dompurify');

// VULNERABLE: marked passes raw HTML through and nothing sanitises the result
const html = marked.parse(userMarkdown);
document.getElementById('output').innerHTML = html;

// SAFE: register the href hook first, then sanitise the converted HTML.
// A hook only affects sanitize() calls made after it has been added.
DOMPurify.addHook('afterSanitizeAttributes', (node) => {
  if ('href' in node) {
    // Keep absolute http(s) and mailto links only; relative links are dropped too
    if (!/^(https?:|mailto:)/i.test(node.getAttribute('href'))) {
      node.removeAttribute('href');
    }
  }
});

const html_raw = marked.parse(userMarkdown);
const html_clean = DOMPurify.sanitize(html_raw, {
  ALLOWED_TAGS: ['p', 'b', 'i', 'em', 'strong', 'a', 'ul', 'ol', 'li',
                 'h1', 'h2', 'h3', 'blockquote', 'code', 'pre'],
  ALLOWED_ATTR: ['href', 'title'],
  ALLOW_DATA_ATTR: false
});
document.getElementById('output').innerHTML = html_clean;

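# Server-side: Python markdown rendering with bleach sanitisation
# Note: bleach is deprecated upstream; the same allowlist approach applies to its successors
import markdown
import bleach

# ALLOWED_TAGS is a list in bleach < 6.0 and a frozenset in 6.0+, so normalise to a set
ALLOWED_TAGS = set(bleach.sanitizer.ALLOWED_TAGS) | {
    'p', 'pre', 'code', 'h1', 'h2', 'h3', 'h4', 'blockquote'
}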
ALLOWED_ATTRS = {
    **bleach.sanitizer.ALLOWED_ATTRIBUTES,
    'a': ['href', 'title'],
}
ALLOWED_PROTOCOLS = ['http', 'https', 'mailto']

def render_markdown(user_input):
    # Convert Markdown to HTML
    raw_html = markdown.markdown(user_input, extensions=['fenced_code'])
    # Sanitise with bleach — strips disallowed tags and javascript: URLs
    safe_html = bleach.clean(
        raw_html,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRS,
        protocols=ALLOWED_PROTOCOLS,
        strip=True
    )
    return safe_html

# Testing mXSS — DOMPurify bypass payloads (version-specific examples)
# These demonstrate the concept; actual bypasses change with library versions

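# Classic mXSS: the markup changes meaning between sanitisation and rendering
# Sanitiser's parse: the <img> text sits inside a title attribute, so it looks inert
# Browser's parse on insertion: the attribute breaks open and <img src=x onerror=alert(1)> becomes an element
# Conceptual example (relies on <noscript> being parsed differently with scripting on and off):
# <noscript><p title="</noscript><img src=x onerror=alert(1)>">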

# Use mXSS payload research sources:
# - PortSwigger XSS Cheat Sheet (portswigger.net/web-security/cross-site-scripting/cheat-sheet)
# - Cure53 DOMPurify changelog and bypass disclosures
# - Browser-specific mutation testing with innerHTML assignment

Security Assessment Methodology

Identify Markdown rendering surfaces — Locate all features that accept and render Markdown: comment boxes, issue trackers, README editors, blog post editors, profile bios, and chat systems.
Test link injection — Submit [test](javascript:alert(document.domain)). Check whether the rendered anchor's href attribute contains the javascript: URI. Click the link in a test browser to confirm execution.
Test image injection — Submit ![x](x onerror=alert(1)). Check the rendered <img> tag for the onerror attribute. Load the page in a browser to trigger execution.
Test raw HTML passthrough — Submit <script>alert(1)</script> and <img src=x onerror=alert(1)> directly. Check the rendered HTML to determine whether raw HTML is passed through unmodified.
Identify the Markdown library — From JavaScript bundle analysis, package.json, or error messages, determine which Markdown library is in use (marked, commonmark, showdown, remark, mistune). Check the library version and review its changelog for known XSS issues.
Test mXSS payloads — For renders that use DOMPurify, test current mXSS bypass payloads from the PortSwigger XSS cheat sheet. mXSS bypasses are browser and DOMPurify version-specific; test in multiple browsers.
Test stored XSS persistence — Submit a Markdown payload into a stored field (profile bio, comment, issue description). Navigate to that content as a different user to confirm cross-user XSS execution.
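The checks in the link, image, and raw HTML steps all come down to inspecting the rendered HTML for the same few constructs. A minimal Python sketch that automates that inspection on a captured response body is shown here. The class and function names, and the lists of URL attributes and schemes, are assumptions for illustration; the sketch uses the standard-library parser, which does not reproduce browser parsing quirks, so it cannot detect mXSS.

```python
from html.parser import HTMLParser

# Assumption: attributes and schemes worth flagging in a first pass.
URL_ATTRS = {"href", "src", "action", "formaction", "xlink:href"}
BAD_SCHEMES = ("javascript:", "data:", "vbscript:")


class RenderedHtmlAudit(HTMLParser):
    """Collect script elements, event handlers, and risky URL schemes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.findings = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.findings.append((tag, "element", ""))
        for name, value in attrs:
            value = value or ""
            if name.startswith("on"):
                # Event-handler attribute such as onerror or onload
                self.findings.append((tag, name, value))
            elif name in URL_ATTRS:
                # Conservative: drop every space and control character
                # before comparing, which may over-report slightly.
                normalised = "".join(ch for ch in value if ch > " ").lower()
                if normalised.startswith(BAD_SCHEMES):
                    self.findings.append((tag, name, value))


def audit_rendered_html(markup: str):
    """Return a list of (tag, attribute, value) findings."""
    parser = RenderedHtmlAudit()
    parser.feed(markup)
    parser.close()
    return parser.findings
```

A finding from this audit still has to be confirmed in a browser, since the presence of an attribute in the response does not prove that it executes.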

Defensive Countermeasure — Apply a strict HTML sanitiser to the rendered HTML output, not to the Markdown input, because the dangerous markup only exists after conversion. On the server side, use bleach (Python) or OWASP Java HTML Sanitizer with an explicit allowlist of permitted HTML tags and attributes. On the client side, pass the rendered HTML through DOMPurify with a narrow ALLOWED_TAGS set before inserting it into the DOM. Allowlist the URI schemes permitted in href and src attributes (for example http, https, and mailto) so that javascript: and data: URIs are rejected. Keep the Markdown library and sanitiser updated, as mXSS bypasses are patched in new releases. Where the Markdown parser offers a switch for raw HTML passthrough, turn it off as well. Do not rely on sanitize: true in marked.js: the option is deprecated and absent from recent releases, so use a separate sanitiser instead.
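To make the allowlist principle concrete, here is a minimal Python sketch that sanitises rendered HTML using only the standard library. It is an illustration of the idea and an assumption-laden one: the tag, attribute, and scheme lists are arbitrary choices, the class and function names are hypothetical, and the standard-library parser does not match browser parsing, so a maintained sanitiser remains the right tool in production.

```python
import html
from html.parser import HTMLParser

# Assumption: a deliberately small allowlist chosen for illustration.
ALLOWED_TAGS = {"p", "em", "strong", "a", "ul", "ol", "li",
                "code", "pre", "blockquote"}
ALLOWED_ATTRS = {"a": {"href", "title"}}
ALLOWED_SCHEMES = ("http://", "https://", "mailto:")
DROP_WITH_CONTENT = {"script", "style"}


class AllowlistSanitiser(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unknown tags are dropped, their text is kept
        kept = []
        for name, value in attrs:
            value = value or ""
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            # Strict: only absolute allowlisted URLs survive, relative ones do not
            if name == "href" and not value.strip().lower().startswith(ALLOWED_SCHEMES):
                continue
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(html.escape(data, quote=False))


def sanitise(markup: str) -> str:
    parser = AllowlistSanitiser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

The design choice worth noting is that nothing is ever removed by pattern: output is rebuilt from scratch, and only constructs named in the allowlist are written back.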

Common Assessment Errors

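Only testing script injection — <script> tags are blocked by most sanitisers, and a <script> element inserted through innerHTML does not execute in any case. The more productive vectors are javascript: links, event handlers such as onerror on image tags, and raw HTML that reaches the DOM via passthrough.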
Not distinguishing server-side from client-side rendering — If Markdown is rendered server-side and the HTML is sent to the browser, the fix requires server-side sanitisation. If rendered client-side with innerHTML, the fix requires DOMPurify in the browser. The location of rendering determines the correct remediation.
Skipping mXSS testing — mXSS bypasses against DOMPurify are documented in the project's release notes and security advisories, maintained by Cure53. An application using an outdated DOMPurify version may be bypassed even though a sanitiser is present.
Missing javascript: URI testing — Testers who focus on HTML tag injection miss that javascript: URIs in href attributes execute when the link is clicked. Always test link injection separately.
Assuming CommonMark compliance means safety — CommonMark compliance describes parsing fidelity, not security. A CommonMark-compliant parser may still allow raw HTML passthrough depending on configuration.
Not testing with stored payloads — Markdown fields are commonly stored in a database and rendered to all users who view the content. Confirm stored XSS by logging out and viewing the rendered content as a different user.
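For the outdated-sanitiser error, a quick first check is whether the declared DOMPurify dependency predates the fix for CVE-2020-26870 (version 2.0.17). The following Python sketch reads a package.json document; the function names are hypothetical. It inspects only the lowest version named in the specifier, which is an assumption: a range such as ^2.0.8 may resolve to a newer release, so the lock file or the shipped bundle is the authoritative source.

```python
import json
import re

# DOMPurify releases before 2.0.17 are affected by CVE-2020-26870.
FIXED_IN = (2, 0, 17)


def parse_version(spec: str):
    """Extract (major, minor, patch) from a specifier such as '^2.0.8'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", spec)
    if match is None:
        return None  # tags such as "latest" carry no version number
    return tuple(int(part) for part in match.groups())


def declared_dompurify_predates_fix(package_json_text: str) -> bool:
    """Return True if the declared dompurify version is below FIXED_IN."""
    manifest = json.loads(package_json_text)
    deps = {**manifest.get("dependencies", {}),
            **manifest.get("devDependencies", {})}
    spec = deps.get("dompurify")
    if spec is None:
        return False  # DOMPurify is not a declared dependency
    version = parse_version(spec)
    return version is not None and version < FIXED_IN
```

A True result is a prompt to verify the resolved version and then test version-appropriate payloads, not proof of exploitability.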

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0009	Knowledge of application vulnerabilities	Develops understanding of Markdown injection vectors (link, image, raw HTML, mXSS) as an HTML injection class
K0070	Knowledge of system and application security threats and vulnerabilities	Covers Markdown library differences, DOMPurify mXSS, and allowlist-based sanitisation strategy
S0001	Skill in conducting vulnerability scans and recognizing vulnerabilities in security systems	Trains systematic testing of link injection, image injection, raw HTML passthrough, and mXSS payload application
S0044	Skill in mimicking threat behaviors	Builds adversarial skill in identifying Markdown library configuration weaknesses and crafting stored XSS payloads
T0028	Task: Identify systemic security issues based on vulnerability and configuration data	Covers Markdown rendering pipeline review, sanitiser configuration assessment, and mXSS library version auditing