Linear Congruential Generator Cryptanalysis: Parameter Recovery and State Prediction

encoding_crypto_classical Difficulté 1–5 30 min certifiable

Théorie

Why This Matters

In 2008 the Debian OpenSSL random-number-generator vulnerability (CVE-2008-0166) demonstrated that a predictable PRNG in a cryptographic context leads directly to full private key recovery — but that bug involved a broken seeding mechanism. A more structurally relevant incident occurred across multiple web frameworks and language runtimes throughout the 2010s: session tokens, CSRF tokens, password-reset links, and two-factor backup codes generated with rand() (glibc) or random.randint() (Python, backed by Mersenne Twister) were recoverable by an attacker who could observe a small number of outputs. The PHP mt_rand() state-recovery attack (fully public since 2011) allows an attacker with two observed outputs to predict all past and future values, enabling account takeover via predictable password-reset tokens. Understanding the LCG attack is the entry point to the broader class of PRNG-exploitation attacks.

Core Concept

A Linear Congruential Generator (LCG) produces a pseudorandom sequence by the recurrence: Xₙ₊₁ = (a · Xₙ + c) mod m, where m is the modulus (determines output range), a is the multiplier, c is the increment, and X₀ is the seed. The output sequence is entirely determined by the tuple (m, a, c, X₀). For the glibc rand() function, the standard parameters are m = 2³¹, a = 1103515245, c = 12345; the function returns bits 30–16 of Xₙ (upper 15 bits).

The violated invariant is unpredictability: an LCG with known (or discoverable) parameters is a deterministic linear map over ℤ/mℤ. Given two consecutive full outputs Xₙ and Xₙ₊₁, the attacker can solve for a and c simultaneously: Xₙ₊₁ − c ≡ a · Xₙ (mod m), yielding a ≡ (Xₙ₊₁ − c) · Xₙ⁻¹ (mod m) if c is known, or using the system of two equations from three consecutive outputs to recover both a and c via modular arithmetic.

If the parameters (m, a, c) are known (e.g., glibc, Java java.util.Random) but only partial outputs are observed (e.g., only the high bits of Xₙ are returned), the truncated LCG attack applies. This is harder and requires a lattice reduction approach (LLL algorithm) over the lattice defined by the linear constraints. Given k truncated outputs where each reveals the top b bits of Xₙ, the attack constructs a lattice whose shortest vector encodes the unknown low-order bits.

The cryptanalytic preconditions are: - Full-output LCG with unknown parameters: 3 consecutive outputs suffice to recover (a, c) via simultaneous modular linear equations. - Full-output LCG with known parameters: 1 output suffices to predict all subsequent values. - Truncated LCG with known parameters: approximately ceil(m / 2^b) consecutive outputs needed for LLL to succeed; in practice, 8–20 outputs for 32-bit moduli with 16-bit truncation.

Python's random module uses the Mersenne Twister (MT19937), not an LCG. MT has a 19937-bit state; given 624 consecutive 32-bit outputs from random.getrandbits(32), the full internal state can be reconstructed using the MT state-recovery attack (xgcd of the tempering transform). This is distinct from but analogous to LCG recovery.

Technical Deep-Dive

from math import gcd

def recover_lcg_params(outputs: list[int]) -> tuple[int, int, int]:
    """
    Recover LCG parameters (m, a, c) from consecutive full outputs.
    Requires at least 3 outputs. Uses the Knuth method:
      diffs[i] = X[i+1] - X[i]
      m = gcd of (diffs[2]*diffs[0] - diffs[1]^2) over multiple triples
    """
    diffs = [outputs[i+1] - outputs[i] for i in range(len(outputs)-1)]
    # m is a divisor of gcd of all (d[i+2]*d[i] - d[i+1]^2)
    candidates = [abs(diffs[i+2]*diffs[i] - diffs[i+1]**2)
                  for i in range(len(diffs)-2)]
    m = candidates[0]
    for c in candidates[1:]:
        m = gcd(m, c)
    # Remove small factors from m to recover the true modulus
    while m % 2 == 0 and m > 1:
        if m // 2 in candidates or all(c % (m//2) == 0 for c in candidates):
            m //= 2
        else:
            break
    # Recover a: a ≡ diffs[1] * modinv(diffs[0], m) (mod m)
    a = (diffs[1] * pow(diffs[0], -1, m)) % m
    # Recover c: c ≡ outputs[1] - a*outputs[0] (mod m)
    c = (outputs[1] - a * outputs[0]) % m
    return m, a, c

def predict_next(x: int, a: int, c: int, m: int) -> int:
    return (a * x + c) % m

# glibc rand() with known parameters — predict token from 2 observations
GLIBC_M = 2**31
GLIBC_A = 1103515245
GLIBC_C = 12345

def glibc_predict(observed_output: int) -> list[int]:
    """Given one glibc rand() output (top 15 bits of state), predict next 10."""
    # observed = (state >> 16) & 0x7fff, so state high bits known
    # Brute-force low 16 bits (65536 candidates)
    results = []
    for low in range(1 << 16):
        state = (observed_output << 16) | low
        next_state = (GLIBC_A * state + GLIBC_C) % GLIBC_M
        next_output = (next_state >> 16) & 0x7fff
        results.append(next_output)
    # In practice, filter by observing a second output
    return results

Cryptanalysis Methodology

Identify the PRNG context — Determine the language and runtime. C rand() → likely glibc LCG with known parameters. Java java.util.Random → known LCG (m=2⁴⁸, a=0x5DEECE66D, c=11). Python random → Mersenne Twister. PHP mt_rand() → MT19937 variant.
Collect consecutive outputs — Gather observed values from the target: token values, sequence numbers, nonces, random IDs. Ensure they are consecutive (no dropped outputs between observations).
Check for known-parameter LCG — Look up the runtime's PRNG parameters. If known, a single output plus knowledge of the output bit-width allows full state recovery by brute-forcing the masked bits.
Recover unknown parameters from 3+ full outputs — Use the recover_lcg_params() function above (Knuth method). Verify by predicting the 4th observed output and confirming it matches.
Predict future outputs — Iterate the recurrence forward to generate the sequence of future values. Convert to the target format (token, seed for a keyed operation, session ID) and test against the live application.
For truncated LCG — Use SageMath's LLL implementation. Construct the lattice matrix encoding the LCG constraints and observed top-bit values. Run L.LLL() and extract the shortest vector to recover full state values.
For Mersenne Twister (PHP/Python) — Collect 624 outputs of getrandbits(32). Use the mt19937 state-recovery tool (e.g., randcrack Python library) to reconstruct the internal state and predict subsequent outputs.

Secure Implementation Note — Never use rand(), random(), Math.random(), or any general-purpose PRNG for cryptographic purposes. Use a CSPRNG: os.urandom() in Python, crypto.randomBytes() in Node.js, /dev/urandom on Linux, CryptGenRandom on Windows, or secrets.token_bytes() in Python 3. These are backed by the OS entropy pool and are computationally unpredictable.

Common Cryptanalysis Errors

Assuming outputs are consecutive when they are not — If the application generates other random values between the observed outputs (e.g., for other users), the indices do not match, and direct parameter recovery fails. Confirm that all observed values come from the same uninterrupted sequence.
Confusing LCG output with seed — Some implementations return the seed directly; others return a transformed subset of bits. Failing to account for the bit-masking (glibc returns bits 30:16) produces wrong state candidates.
Applying LCG analysis to Mersenne Twister — MT is not an LCG and does not satisfy the linear recurrence over ℤ/mℤ. The Knuth parameter-recovery method will produce garbage on MT outputs. Identify the PRNG before choosing the attack.
Not verifying the recovered modulus — The Knuth GCD method can return a multiple of the true modulus. Always verify the recovered (m, a, c) tuple by checking that predict_next(X[i], a, c, m) == X[i+1] for all observed consecutive pairs.
Stopping after parameter recovery without exploiting the prediction — The CTF flag or token may be generated several steps ahead of the observed outputs. Iterate the recurrence forward enough steps to generate the target value.
Forgetting signed-vs-unsigned output — Some LCG implementations return signed integers (range [−2³⁰, 2³⁰−1]) while others return unsigned. An off-by-one error in the sign bit shifts the recovered state and breaks all subsequent predictions.

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0018	Knowledge of encryption algorithms	Establishes the LCG recurrence and distinguishes CSPRNGs from general-purpose PRNGs in cryptographic contexts
K0019	Knowledge of cryptography and cryptographic key management concepts	Demonstrates that predictable seeding or linear recurrences compromise all derived key material
K0305	Knowledge of data concealment	Contextualises token and nonce generation as a security-critical PRNG application
S0138	Skill in using public-key encryption and PKI	Builds mathematical reasoning about linear algebra over modular rings, prerequisite for lattice-based cryptanalysis