Forging Session Tokens via Weak PRNG: Exploiting Insufficient Entropy in Identifiers

crypto_symmetric_kdf Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Predictable session token generation has been responsible for session hijacking attacks across a wide range of production applications. In 2009, researchers demonstrated that PHP's mt_rand() function, seeded with a value derived from time(), produced session tokens that could be predicted by an attacker who knew the approximate time of account creation — enabling password reset token prediction with fewer than 1 000 seed guesses. CVE-2019-11358 (jQuery) is a different class, but the most directly relevant real-world case is CVE-2020-7699 (node-uuid before 8.3.0): the library fell back to Math.random() on environments without a CSPRNG, producing predictable UUIDs. In Java, java.util.Random seeded with System.currentTimeMillis() produces predictable streams; the Debian OpenSSL PRNG bug of 2008 (CVE-2008-0166) seeded the entire SSL PRNG with only the process ID — approximately 32 768 possible seeds for all keys generated on Debian/Ubuntu systems between 2006 and 2008, resulting in a completely enumerable key space for millions of deployed RSA and DSA keys.

Core Concept

A pseudorandom number generator (PRNG) is an algorithm that produces a deterministic sequence of values from a seed. If the seed is recoverable or guessable, every subsequent output is predictable. The security of any token, key, or nonce derived from a PRNG therefore depends entirely on the entropy of the seed and the computational irreversibility of the PRNG's state transition function.

Non-cryptographic PRNGs in common use:

C rand() / random(): typically a linear congruential generator (LCG) with 32-bit state. Seeded with time(NULL) (1-second resolution), the entire seed space for a one-hour window is 3 600 values — trivially exhaustible.
Python random module (Mersenne Twister, MT19937): a 32-bit word PRNG with a 19 937-bit internal state. MT is designed for statistical quality, not cryptographic unpredictability. Given 624 consecutive 32-bit outputs, the full internal state can be recovered exactly using a technique called state untwisting (implemented in the untwister tool). Once the state is known, all past and future outputs are predictable. Python's random.random(), random.randint(), and random.getrandbits() all use MT19937.
Java java.util.Math.random() and java.util.Random: a linear congruential generator with 48-bit state. The next output is fully determined from the previous output: next = (seed * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1). Two consecutive nextLong() calls expose the internal 48-bit seed exactly.
JavaScript Math.random(): implementation-specific (V8 uses xorshift128+), but explicitly not cryptographically secure per the ECMAScript specification.

Entropy sources and their limitations:

time(NULL) — 1-second resolution. Any attacker who knows the approximate time of token generation can enumerate the seed space.
getpid() — typically 15–17 bits (Linux PID maximum 32 768 by default). Directly enumerable.
Startup timestamp — fixed for the lifetime of a process; all tokens from that process share the same PRNG seed.
Low-entropy VM first-boot — virtual machines and containers often have impoverished entropy pools during initial startup because hardware RNG sources are not available. The /dev/urandom entropy pool may be seeded with deterministic values on the first boot.

MT19937 state recovery (untwisting): the Mersenne Twister outputs each word by applying a temper transform to internal state words. The temper transform is invertible: given a 32-bit output, the pre-temper state word can be recovered exactly. After collecting 624 consecutive 32-bit outputs, the full 624-word internal state is known, and the random module's future outputs are fully predictable.

CSPRNGs derive output from hardware entropy via the operating system kernel. os.urandom() and secrets in Python, /dev/urandom on Linux (which is non-blocking since Linux 5.6 and seeded from the hardware RNG and multiple entropy sources), BCryptGenRandom on Windows, and NIST SP 800-90A DRBG constructions (CTR_DRBG, Hash_DRBG, HMAC_DRBG) all provide cryptographically secure randomness. These are computationally indistinguishable from true randomness under polynomial-time adversaries.

Technical Deep-Dive

# Weak PRNG analysis and MT19937 state recovery
# Prerequisites: pip install untwister (or use manual state recovery below)

import random, time, os, struct

# --- WEAK: Python random seeded with timestamp ---
def generate_weak_token_v1(timestamp: int) -> str:
    """Simulates a server that seeds random with a timestamp."""
    random.seed(timestamp)
    return hex(random.getrandbits(128))[2:]

# Attack: brute-force seed over plausible window (e.g., last 60 seconds)
def crack_seed_bruteforce(observed_token: str,
                           window_seconds: int = 60) -> int | None:
    now = int(time.time())
    for t in range(now - window_seconds, now + 1):
        random.seed(t)
        candidate = hex(random.getrandbits(128))[2:]
        if candidate == observed_token:
            return t
    return None

# --- WEAK: C-style LCG prediction (Java Random equivalent) ---
JAVA_MULT  = 0x5DEECE66D
JAVA_ADD   = 0xB
JAVA_MASK  = (1 << 48) - 1

def java_random_next(seed: int, bits: int) -> tuple:
    """Single Java Random step; returns (next_seed, output)."""
    next_seed = (seed * JAVA_MULT + JAVA_ADD) & JAVA_MASK
    return next_seed, next_seed >> (48 - bits)

def java_random_next_long(seed: int) -> tuple:
    """Java nextLong(): two 32-bit steps combined."""
    s1, hi = java_random_next(seed,  32)
    s2, lo = java_random_next(s1,    32)
    return s2, (hi << 32) + lo

def crack_java_random_from_two_longs(long1: int, long2: int) -> int | None:
    """
    Given two consecutive nextLong() outputs, recover the internal seed.
    Java nextLong() uses bits 47..16 from two consecutive 48-bit states.
    Brute-force the lower 16 bits of the first state (2^16 = 65536 guesses).
    """
    hi32 = long1 >> 32
    for lower16 in range(1 << 16):
        candidate_seed = ((hi32 << 16) | lower16) ^ JAVA_MULT   # undo XOR-based init
        # Forward one step and check hi32 of long2 matches
        s1, _ = java_random_next(candidate_seed, 32)
        s2, _ = java_random_next(s1, 32)
        recovered_long = ((candidate_seed >> (48 - 32)) << 32) | (s1 >> (48 - 32))
        if recovered_long == long1:
            # Verify long2
            _, next_long = java_random_next_long(s1)
            if next_long == long2:
                return s1
    return None

# --- MT19937 state untwisting (manual, no external tool required) ---
def untemper(y: int) -> int:
    """Invert the MT19937 temper transform to recover the pre-temper state word."""
    # Invert right shift XOR
    y ^= y >> 18
    y ^= (y << 15) & 0xEFC60000
    # Invert left shift XOR (iterative)
    tmp = y
    for _ in range(6):
        tmp = y ^ ((tmp << 7) & 0x9D2C5680)
    y = tmp
    # Invert right shift XOR
    tmp = y
    for _ in range(3):
        tmp = y ^ (tmp >> 11)
    return tmp & 0xFFFFFFFF

def recover_mt_state(outputs_32bit: list) -> list:
    """
    Recover the full 624-word MT19937 state from 624 consecutive 32-bit outputs.
    Call random.getrandbits(32) exactly 624 times to collect outputs.
    """
    assert len(outputs_32bit) == 624, "Need exactly 624 outputs"
    return [untemper(y) for y in outputs_32bit]

# --- CORRECT: CSPRNG usage ---
def generate_secure_token(length_bytes: int = 32) -> str:
    """Use os.urandom (CSPRNG) for all security-sensitive tokens."""
    return os.urandom(length_bytes).hex()

import secrets
def generate_secure_token_v2() -> str:
    """secrets module: explicitly CSPRNG, Python 3.6+."""
    return secrets.token_urlsafe(32)   # 32 bytes → 43 URL-safe base64 chars

# untwister: recover MT19937 seed or state from observed outputs
# https://github.com/bishopfox/untwister
docker run --rm -i bishopfox/untwister 
    -r python -o <observed_token_1> -o <observed_token_2> ... -d 2

# For seed brute-force over a time window:
# untwister -r php -s <start_timestamp> -e <end_timestamp> -o <token>

# hashcat can also brute-force LCG-based tokens:
# hashcat -a 3 -m 14900 <token_hash> ?d?d?d?d?d?d?d?d?d?d
# (replace mode 14900 with the appropriate algorithm for the token format)

Cryptanalysis Methodology

Identify the token generation mechanism — Read the challenge source code or reverse-engineer the token format. Look for calls to random.random(), random.randint(), rand(), Math.random(), or custom PRNG implementations. Check whether the token length suggests a specific output size (e.g., 15 decimal digits = Java nextLong() trimmed, 32 hex chars = 128-bit MT19937 output).
Determine the seed space — Identify all inputs to the PRNG seed: timestamp precision (milliseconds vs seconds), PID range, startup time, or a combination. Calculate the total enumerable seed space.
Brute-force the seed — For time(NULL)-seeded C/PHP: enumerate t - 3600 to t + 60 (one-hour window, sub-second generation assumed negligible). For PID-seeded PRNGs: enumerate 1 to 32 768. For each candidate seed, reproduce the token generation logic and compare against the observed token.
Collect MT19937 outputs for state recovery — If the server leaks 624 or more consecutive 32-bit values from the same MT19937 instance (e.g., random numbers in an API response), apply recover_mt_state() or the untwister tool to reconstruct the full state and predict all future outputs.
Recover Java Random from two consecutive longs — Call crack_java_random_from_two_longs(long1, long2) with two consecutive nextLong() outputs. The 65 536-candidate brute-force runs in milliseconds.
Predict the target token — Once the PRNG state or seed is known, reproduce the exact generation sequence to compute the target token (password reset code, CSRF token, session identifier). Verify against an observed token before submitting.

Secure Implementation Note — Use os.urandom() or the secrets module for all security-sensitive token generation in Python. secrets.token_urlsafe(32) generates 32 bytes of CSPRNG output encoded as URL-safe base64 — suitable for session tokens, API keys, and password reset links. For standards-compliant applications, use NIST SP 800-90A CTR_DRBG or HMAC_DRBG (available via cryptography.hazmat.primitives.ciphers.algorithms and related DRBG interfaces). On embedded and cloud-VM deployments, verify entropy pool readiness before generating keys at startup — use getrandom(GRND_RANDOM) on Linux 3.17+ which blocks until the kernel's CSPRNG is seeded from hardware entropy.

Common Cryptanalysis Errors

Confusing random with secrets in Python — random is not a CSPRNG; secrets is. Both are importable standard library modules, and the names are easily confused. Always verify which module is used in the target code.
Using floating-point output instead of integer output for MT19937 state recovery — random.random() returns a float derived from two consecutive 32-bit MT19937 words; random.getrandbits(32) returns a single 32-bit word. State recovery requires 624 consecutive 32-bit words; collecting 624 floats and treating them as single words produces an incorrect state.
Ignoring entropy exhaustion on first boot — A VM that generates tokens before the kernel entropy pool is seeded may produce predictable outputs even from os.urandom() on very old Linux kernels (pre-3.17). This is rare in modern environments but relevant for IoT and embedded challenges.
Assuming a long token is unguessable without checking its generator — A 128-bit token generated from a 32-bit seed is only as strong as the 32-bit seed. Token length does not determine security; the entropy of the generator's seed does.
Collecting non-consecutive MT19937 outputs — The state recovery algorithm requires 624 consecutive 32-bit outputs from the same MT19937 instance. Outputs that skip state steps (e.g., interleaved with other random calls) do not reconstruct the state correctly.
Not accounting for Python's MT19937 using getrandbits(k) vs random() — When a challenge uses random.randint(a, b), the number of MT19937 words consumed per call depends on the range size. Collecting outputs without knowing the exact consumption pattern leads to misaligned state recovery attempts.

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0018	Knowledge of encryption algorithms and their weaknesses	Explains why non-cryptographic PRNGs (MT19937, LCG) are insecure as entropy sources for security tokens
K0019	Knowledge of cryptography and cryptographic key management	Covers CSPRNG selection, entropy pool requirements, and the NIST SP 800-90A DRBG standard
K0305	Knowledge of authentication, encryption, and digital signature schemes	Connects PRNG weakness to predictable session token and password reset token vulnerabilities
S0138	Skill in using public-key infrastructure tools	Trains MT19937 state recovery, LCG seed brute-force, and untwister tool usage for token prediction