Linear Congruential Generator Cryptanalysis: Parameter Recovery and State Prediction
Théorie
Why This Matters
In 2008 the Debian OpenSSL random-number-generator vulnerability (CVE-2008-0166) demonstrated that a predictable PRNG in a cryptographic context leads directly to full private key recovery — but that bug involved a broken seeding mechanism. A more structurally relevant incident occurred across multiple web frameworks and language runtimes throughout the 2010s: session tokens, CSRF tokens, password-reset links, and two-factor backup codes generated with rand() (glibc) or random.randint() (Python, backed by Mersenne Twister) were recoverable by an attacker who could observe a small number of outputs. The PHP mt_rand() state-recovery attack (fully public since 2011) allows an attacker with two observed outputs to predict all past and future values, enabling account takeover via predictable password-reset tokens. Understanding the LCG attack is the entry point to the broader class of PRNG-exploitation attacks.
Core Concept
A Linear Congruential Generator (LCG) produces a pseudorandom sequence by the recurrence: Xₙ₊₁ = (a · Xₙ + c) mod m, where m is the modulus (determines output range), a is the multiplier, c is the increment, and X₀ is the seed. The output sequence is entirely determined by the tuple (m, a, c, X₀). For the glibc rand() function, the standard parameters are m = 2³¹, a = 1103515245, c = 12345; the function returns bits 30–16 of Xₙ (upper 15 bits).
The violated invariant is unpredictability: an LCG with known (or discoverable) parameters is a deterministic linear map over ℤ/mℤ. Given two consecutive full outputs Xₙ and Xₙ₊₁, the attacker can solve for a and c simultaneously: Xₙ₊₁ − c ≡ a · Xₙ (mod m), yielding a ≡ (Xₙ₊₁ − c) · Xₙ⁻¹ (mod m) if c is known, or using the system of two equations from three consecutive outputs to recover both a and c via modular arithmetic.
If the parameters (m, a, c) are known (e.g., glibc, Java java.util.Random) but only partial outputs are observed (e.g., only the high bits of Xₙ are returned), the truncated LCG attack applies. This is harder and requires a lattice reduction approach (LLL algorithm) over the lattice defined by the linear constraints. Given k truncated outputs where each reveals the top b bits of Xₙ, the attack constructs a lattice whose shortest vector encodes the unknown low-order bits.
The cryptanalytic preconditions are: - Full-output LCG with unknown parameters: 3 consecutive outputs suffice to recover (a, c) via simultaneous modular linear equations. - Full-output LCG with known parameters: 1 output suffices to predict all subsequent values. - Truncated LCG with known parameters: approximately ceil(m / 2^b) consecutive outputs needed for LLL to succeed; in practice, 8–20 outputs for 32-bit moduli with 16-bit truncation.
Python's random module uses the Mersenne Twister (MT19937), not an LCG. MT has a 19937-bit state; given 624 consecutive 32-bit outputs from random.getrandbits(32), the full internal state can be reconstructed using the MT state-recovery attack (xgcd of the tempering transform). This is distinct from but analogous to LCG recovery.
Technical Deep-Dive
from math import gcd
def recover_lcg_params(outputs: list[int]) -> tuple[int, int, int]:
"""
Recover LCG parameters (m, a, c) from consecutive full outputs.
Requires at least 3 outputs. Uses the Knuth method:
diffs[i] = X[i+1] - X[i]
m = gcd of (diffs[2]*diffs[0] - diffs[1]^2) over multiple triples
"""
diffs = [outputs[i+1] - outputs[i] for i in range(len(outputs)-1)]
# m is a divisor of gcd of all (d[i+2]*d[i] - d[i+1]^2)
candidates = [abs(diffs[i+2]*diffs[i] - diffs[i+1]**2)
for i in range(len(diffs)-2)]
m = candidates[0]
for c in candidates[1:]:
m = gcd(m, c)
# Remove small factors from m to recover the true modulus
while m % 2 == 0 and m > 1:
if m // 2 in candidates or all(c % (m//2) == 0 for c in candidates):
m //= 2
else:
break
# Recover a: a ≡ diffs[1] * modinv(diffs[0], m) (mod m)
a = (diffs[1] * pow(diffs[0], -1, m)) % m
# Recover c: c ≡ outputs[1] - a*outputs[0] (mod m)
c = (outputs[1] - a * outputs[0]) % m
return m, a, c
def predict_next(x: int, a: int, c: int, m: int) -> int:
return (a * x + c) % m
# glibc rand() with known parameters — predict token from 2 observations
GLIBC_M = 2**31
GLIBC_A = 1103515245
GLIBC_C = 12345
def glibc_predict(observed_output: int) -> list[int]:
"""Given one glibc rand() output (top 15 bits of state), predict next 10."""
# observed = (state >> 16) & 0x7fff, so state high bits known
# Brute-force low 16 bits (65536 candidates)
results = []
for low in range(1 << 16):
state = (observed_output << 16) | low
next_state = (GLIBC_A * state + GLIBC_C) % GLIBC_M
next_output = (next_state >> 16) & 0x7fff
results.append(next_output)
# In practice, filter by observing a second output
return results
Cryptanalysis Methodology
- Identify the PRNG context — Determine the language and runtime. C
rand()→ likely glibc LCG with known parameters. Javajava.util.Random→ known LCG (m=2⁴⁸, a=0x5DEECE66D, c=11). Pythonrandom→ Mersenne Twister. PHPmt_rand()→ MT19937 variant. - Collect consecutive outputs — Gather observed values from the target: token values, sequence numbers, nonces, random IDs. Ensure they are consecutive (no dropped outputs between observations).
- Check for known-parameter LCG — Look up the runtime's PRNG parameters. If known, a single output plus knowledge of the output bit-width allows full state recovery by brute-forcing the masked bits.
- Recover unknown parameters from 3+ full outputs — Use the
recover_lcg_params()function above (Knuth method). Verify by predicting the 4th observed output and confirming it matches. - Predict future outputs — Iterate the recurrence forward to generate the sequence of future values. Convert to the target format (token, seed for a keyed operation, session ID) and test against the live application.
- For truncated LCG — Use SageMath's LLL implementation. Construct the lattice matrix encoding the LCG constraints and observed top-bit values. Run
L.LLL()and extract the shortest vector to recover full state values. - For Mersenne Twister (PHP/Python) — Collect 624 outputs of
getrandbits(32). Use the mt19937 state-recovery tool (e.g., randcrack Python library) to reconstruct the internal state and predict subsequent outputs.
Secure Implementation Note — Never use
rand(),random(),Math.random(), or any general-purpose PRNG for cryptographic purposes. Use a CSPRNG:os.urandom()in Python,crypto.randomBytes()in Node.js,/dev/urandomon Linux,CryptGenRandomon Windows, orsecrets.token_bytes()in Python 3. These are backed by the OS entropy pool and are computationally unpredictable.
Common Cryptanalysis Errors
- Assuming outputs are consecutive when they are not — If the application generates other random values between the observed outputs (e.g., for other users), the indices do not match, and direct parameter recovery fails. Confirm that all observed values come from the same uninterrupted sequence.
- Confusing LCG output with seed — Some implementations return the seed directly; others return a transformed subset of bits. Failing to account for the bit-masking (glibc returns bits 30:16) produces wrong state candidates.
- Applying LCG analysis to Mersenne Twister — MT is not an LCG and does not satisfy the linear recurrence over ℤ/mℤ. The Knuth parameter-recovery method will produce garbage on MT outputs. Identify the PRNG before choosing the attack.
- Not verifying the recovered modulus — The Knuth GCD method can return a multiple of the true modulus. Always verify the recovered (m, a, c) tuple by checking that predict_next(X[i], a, c, m) == X[i+1] for all observed consecutive pairs.
- Stopping after parameter recovery without exploiting the prediction — The CTF flag or token may be generated several steps ahead of the observed outputs. Iterate the recurrence forward enough steps to generate the target value.
- Forgetting signed-vs-unsigned output — Some LCG implementations return signed integers (range [−2³⁰, 2³⁰−1]) while others return unsigned. An off-by-one error in the sign bit shifts the recovered state and breaks all subsequent predictions.
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0018 | Knowledge of encryption algorithms | Establishes the LCG recurrence and distinguishes CSPRNGs from general-purpose PRNGs in cryptographic contexts |
| K0019 | Knowledge of cryptography and cryptographic key management concepts | Demonstrates that predictable seeding or linear recurrences compromise all derived key material |
| K0305 | Knowledge of data concealment | Contextualises token and nonce generation as a security-critical PRNG application |
| S0138 | Skill in using public-key encryption and PKI | Builds mathematical reasoning about linear algebra over modular rings, prerequisite for lattice-based cryptanalysis |
Further Reading
- The Art of Computer Programming, Volume 2: Seminumerical Algorithms (Section 3.2) — Donald Knuth, Addison-Wesley, 3rd edition 1997
- Reconstructing Truncated Integer Variables Satisfying Linear Congruences — Alan Frieze et al., SIAM Journal on Computing, 1988
- How to Crack a Linear Congruential Generator — Peter Gutmann, Usenet sci.crypt, archived reference
Challenge Lab
Renforcez votre apprentissage avec un défi généré basé sur la compétence de cette carte.