Factoring Weak RSA Primes via Fermat Factorisation and Pollard p-1 Method

cloud_container_security Difficulty 1–5 30 min certifiable

Theory

Why This Matters

In 2012, Nadia Heninger, Zakir Durumeric, and colleagues scanned the entire IPv4 address space and collected RSA public keys from TLS and SSH servers. Among 7.7 million distinct 1024-bit RSA moduli, they found that 0.2% shared a prime factor with at least one other modulus — a catastrophic failure. The root cause was entropy starvation at boot time: embedded devices (routers, firewalls, VPN appliances) generated RSA keys during initial boot when the operating system's entropy pool was nearly empty, causing multiple devices to derive the same prime p despite using different moduli n = p·q. A single GCD computation across any two affected moduli immediately factored both keys. This event, together with contemporaneous work by Lenstra et al. ("Ron was wrong, Whit is right"), demonstrated that low-entropy prime generation was not a theoretical concern but a widespread production failure affecting hundreds of thousands of deployed devices.

Core Concept

RSA security rests on the integer factorisation problem: given n = p·q, recovering p and q is computationally infeasible when both primes are large, randomly chosen, and independent. This assumption fails in several distinct ways when prime generation uses a weak or repeating pseudorandom number generator (PRNG).

Shared-prime GCD attack (Heninger et al.): if two moduli n_1 = p·q_1 and n_2 = p·q_2 share the prime p (because both devices derived p from the same PRNG state), then gcd(n_1, n_2) = p directly. Given p, compute q_1 = n_1 / p and q_2 = n_2 / p, then derive both private keys. This attack is O(k^2) naively but O(k log k) with a product-tree batch GCD (Bernstein 2004), making it feasible against millions of keys in minutes.

Fermat factorisation exploits the case p ≈ q (primes close together). Write n = p·q = ((p+q)/2)^2 - ((p-q)/2)^2. If p and q are close, a = ceil(sqrt(n)) is slightly larger than sqrt(n), and a^2 - n = b^2 for a small b. Iterating a = a+1, checking whether a^2 - n is a perfect square, converges rapidly when |p - q| is small. Deliberately choosing close primes was a documented flaw in early smart-card implementations.

Pollard's p-1 algorithm factors n quickly when p-1 is B-smooth (all prime factors of p-1 are ≤ B). If a key generation implementation drew primes from a restricted set or used a deterministic sieve with low smoothness bound, Pollard p-1 with a modest B recovers p in minutes. Tools: msieve, yafu, PARI/GP.

FactorDB lookup: many CTF challenge moduli are pre-factored in the FactorDB public database. This should always be the first check.

The attack precondition for all variants is some form of reduced randomness during prime generation: entropy starvation, PRNG reuse, overly narrow prime search space, or deliberately weak parameters chosen for the challenge.

Technical Deep-Dive

# Prerequisites: pip install gmpy2 pycryptodome requests
import gmpy2, math

# --- Attack 1: Shared-prime GCD across a list of moduli ---
def batch_gcd_attack(moduli):
    """
    Naive O(k^2) batch GCD. For production scale use Bernstein product-tree.
    Returns list of (n_i, factor) for any factored modulus.
    """
    results = []
    for i, n_i in enumerate(moduli):
        for j, n_j in enumerate(moduli):
            if i >= j:
                continue
            g = int(gmpy2.gcd(n_i, n_j))
            if g > 1 and g != n_i:
                results.append((n_i, g, n_i // g))
                results.append((n_j, g, n_j // g))
    return results

# Example: load moduli from challenge files
# moduli = [int(line.strip(), 16) for line in open("moduli.txt")]
# hits = batch_gcd_attack(moduli)
# for n, p, q in hits:
#     print(f"Factored: {n}
  p={p}
  q={q}")

# --- Attack 2: Fermat factorisation (p ≈ q) ---
def fermat_factor(n, max_iter=1_000_000):
    a = int(gmpy2.isqrt(n)) + 1
    for _ in range(max_iter):
        b2 = a * a - n
        b, exact = gmpy2.iroot(b2, 2)
        if exact:
            p, q = int(a - b), int(a + b)
            assert p * q == n
            return p, q
        a += 1
    return None

# --- Attack 3: Recover RSA private key from (p, q, e) ---
def recover_private_key(p, q, e):
    phi = (p - 1) * (q - 1)
    d = int(gmpy2.invert(e, phi))
    return d

# --- FactorDB quick lookup (requires internet access during challenge) ---
def factordb_query(n):
    import urllib.request, json
    url = f"http://factordb.com/api?query={n}"
    with urllib.request.urlopen(url) as r:
        data = json.loads(r.read())
    # data['factors'] is list of [factor_string, exponent] pairs
    return data

# yafu command-line factorisation
yafu "factor(<n>)"

# msieve (for larger numbers with multiple algorithms)
msieve -v -nf msieve.dat <n>

# RsaCtfTool combining multiple attacks
python3 RsaCtfTool.py -n <N> -e <E> --uncipher <C> 
    --attack fermat --attack factordb --attack smallfactor

Cryptanalysis Methodology

FactorDB lookup — Submit n to factordb.com or query the API programmatically. Many CTF moduli are pre-computed and factored in seconds.
Batch GCD against all provided moduli — If the challenge provides multiple keys (PEM files, JSON key arrays), extract all moduli and run pairwise GCD. A single shared prime factors multiple keys simultaneously.
Fermat factorisation — Try fermat_factor(n) with at least 10^6 iterations. Converges immediately when |p - q| < n^(1/4) approximately.
Pollard p-1 — Run yafu factor(n) or msieve. These tools automatically apply Pollard p-1, Pollard rho, elliptic-curve method (ECM), and quadratic sieve in sequence.
RsaCtfTool sweep — Run python3 RsaCtfTool.py -n N -e E --uncipher C --attack all for an automated multi-algorithm sweep including wiener, fermat, novelty primes, and more.
Recover private key and decrypt — Once p and q are known, compute phi = (p-1)(q-1), d = gmpy2.invert(e, phi), then m = pow(c, d, n).

Secure Implementation Note — Always generate RSA primes using a cryptographically secure random number generator seeded from a high-entropy source (os.urandom in Python, /dev/urandom on Linux, BCryptGenRandom on Windows). Ensure p and q differ by at least n^(1/4) (guaranteed by any standards-compliant prime generation routine). On embedded systems, delay key generation until adequate entropy is available and use a hardware RNG if possible. Use library-provided key generation (e.g., Crypto.PublicKey.RSA.generate(2048)) rather than implementing prime generation manually.

Common Cryptanalysis Errors

Skipping FactorDB — Many CTF moduli are already factored. Skipping this lookup wastes time on more complex attacks unnecessarily.
Not collecting all moduli before running GCD — The shared-prime attack requires comparing moduli against each other. Testing a single modulus in isolation never reveals a shared factor.
Stopping Fermat at too few iterations — Setting max_iter too low misses weakly close primes. Run at least 10^6 iterations; for CTF challenges the primes are often deliberately close.
Confusing phi(n) with lambda(n) — For standard RSA decryption both work, but some implementations use Carmichael's totient lambda(n) = lcm(p-1, q-1). If d computed from phi fails, try lambda.
Ignoring multi-prime RSA — Some challenges use n = p·q·r (three primes). The private key derivation changes: phi = (p-1)(q-1)(r-1). Always check bit-length vs prime count assumptions.
Assuming factorisation means the challenge is done — After factoring, you still must correctly reconstruct the private key and decrypt/sign. Padding stripping and encoding conversions are common stumbling blocks.

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0018	Knowledge of encryption algorithms and their weaknesses	Explains how entropy failures in prime generation undermine RSA's factoring hardness assumption
K0019	Knowledge of cryptography and cryptographic key management	Demonstrates the entropy requirements for secure RSA key generation across diverse deployment environments
K0305	Knowledge of authentication, encryption, and digital signature schemes	Connects weak key generation to practical compromise of authentication systems
S0138	Skill in using public-key infrastructure tools	Trains use of gmpy2, yafu, msieve, RsaCtfTool, and FactorDB for real-world factorisation attacks
T0259	Use cryptanalysis tools to recover plaintext from ciphertext	Provides complete methodology from modulus collection through private-key recovery and decryption