Attacking Weak Key Derivation Functions: Dictionary Attacks on Under-Iterated Password Hashing

cloud_container_security Difficulty 1–5 30 min certifiable

Theory

Why This Matters

In 2012, the LinkedIn password database breach exposed approximately 6.5 million SHA-1-unsalted password hashes. Within 24 hours, more than 5 million had been cracked by the public. The root failure was using a raw cryptographic hash as a password storage mechanism — fast by design, which made it catastrophically weak under GPU-accelerated brute force. The same failure class appeared in the Adobe breach of 2013 (153 million accounts, SHA-1 with salt misuse) and the RockYou leak of 2009 (32 million plaintext passwords that seeded nearly every wordlist in use today). From a key derivation perspective, CVE-2023-36661 documented a production application that derived AES-256 keys using SHA-256(password) directly — the 256-bit key offered no security beyond the entropy of the password itself, typically fewer than 40 bits for human-chosen passwords. A billion SHA-256 operations per second on commodity GPU hardware means a 10-character lowercase password space is exhausted in under a minute.

Core Concept

A Key Derivation Function (KDF) is a cryptographic primitive that transforms a secret input (password, passphrase, or shared secret) into a cryptographic key of the required size. The security requirements for a KDF differ fundamentally depending on whether the input is a high-entropy secret (e.g., a Diffie-Hellman shared value) or a low-entropy password (human-chosen).

For high-entropy inputs, HKDF (HMAC-based Extract-and-Expand KDF, RFC 5869) is the correct primitive. HKDF operates in two phases: Extract computes a pseudorandom key from the input key material and an optional salt; Expand derives multiple output key material blocks of any desired length. HKDF is computationally cheap by design — the input already has high entropy, so speed is a feature, not a liability.

For low-entropy passwords, a memory-hard or time-hard KDF is mandatory. The threat model is an offline attacker who has obtained the KDF output (e.g., from a database breach) and attempts to invert it via brute force. Three KDFs address this threat:

bcrypt applies a modified Blowfish cipher in a cost-parameterised loop. The work factor (cost parameter) determines the number of iterations as 2^cost; each increment doubles the computation time. The recommended minimum work factor as of 2023 is 12 (approximately 250 ms on a modern CPU per hash). bcrypt truncates passwords at 72 bytes and uses a 16-byte salt generated per-password to defeat rainbow tables (precomputed hash-chain lookups). bcrypt's maximum throughput on GPU hardware is approximately 100 000 hashes per second at cost 12, compared to billions per second for raw SHA-256.

Argon2id won the Password Hashing Competition in 2015 and is recommended by NIST SP 800-63B (2017) and RFC 9106 (2021). It is memory-hard: deriving a single output requires allocating a configurable amount of memory (the memory cost parameter, in kibibytes), which forces the attacker to provision equivalent memory for each parallel crack attempt — a hardware cost that GPU farms cannot amortise as easily as pure computation. Argon2id (the recommended variant) combines the side-channel resistance of Argon2i with the GPU-resistance of Argon2d. Minimum parameters per RFC 9106: memory = 19 MiB, iterations = 2, parallelism = 1.

PBKDF2-HMAC-SHA256 (PKCS#5 v2.1, RFC 8018) applies an HMAC-SHA256 pseudorandom function iteratively, with the iteration count governing the time cost. NIST SP 800-132 (2023 revision) mandates a minimum of 600 000 iterations for SHA-256. PBKDF2 is not memory-hard and therefore more vulnerable to GPU acceleration than Argon2id or bcrypt, but it is FIPS 140-3 compliant and widely required in regulated environments.

The critical distinction: using SHA-256(password) as a KDF provides a 256-bit key, but only as many bits of security as the password entropy. A 10-character alphanumeric password has approximately 59 bits of entropy — and an attacker can search the entire space at SHA-256 throughput of ~10 billion attempts per second on a single GPU.

Technical Deep-Dive

# KDF comparison: weak vs correct implementations
# Prerequisites: pip install argon2-cffi bcrypt cryptography

import hashlib, hmac, os, time

password = b"correct-horse-battery"
salt     = os.urandom(16)   # must be unique per password, stored with output

# --- WEAK: raw SHA-256 as KDF (brute-forceable at ~10^10 H/s on GPU) ---
def weak_kdf_sha256(password: bytes, salt: bytes) -> bytes:
    return hashlib.sha256(salt + password).digest()   # AES-256 key: 32 bytes

# --- WEAK: SHA-256 with single iteration PBKDF2 ---
def weak_pbkdf2(password: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations=1)

# --- CORRECT: PBKDF2-HMAC-SHA256 at NIST SP 800-132 (2023) minimum ---
def strong_pbkdf2(password: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac(
        "sha256", password, salt,
        iterations=600_000,   # NIST 2023 minimum
        dklen=32              # 256-bit AES key
    )

# --- CORRECT: bcrypt for password storage (not for deriving arbitrary-length keys) ---
import bcrypt
def bcrypt_hash(password: bytes) -> bytes:
    return bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

def bcrypt_verify(password: bytes, stored_hash: bytes) -> bool:
    return bcrypt.checkpw(password, stored_hash)

# --- CORRECT: Argon2id (recommended for new systems) ---
from argon2 import PasswordHasher
ph = PasswordHasher(
    time_cost=2,        # RFC 9106 minimum
    memory_cost=19456,  # 19 MiB
    parallelism=1,
    hash_len=32,
    salt_len=16
)
argon2_hash  = ph.hash(password.decode())
is_valid     = ph.verify(argon2_hash, password.decode())

# --- CORRECT: HKDF for high-entropy input (e.g., ECDH shared secret) ---
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes

def hkdf_derive(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
    """HKDF-SHA256: suitable for high-entropy IKM only (not passwords)."""
    return HKDF(
        algorithm=hashes.SHA256(), length=length,
        salt=salt, info=info
    ).derive(ikm)

# --- ATTACK: hashcat modes ---
# hashcat -a 0 -m 10900 <pbkdf2_hash_file> rockyou.txt   # PBKDF2-SHA256
# hashcat -a 0 -m  3200 <bcrypt_hash_file> rockyou.txt   # bcrypt
# hashcat -a 0 -m 13723 <argon2id_file>    rockyou.txt   # Argon2id

Cryptanalysis Methodology

Identify the KDF — Extract the hash format from the challenge. bcrypt hashes begin with $2b$ or $2a$ . Argon2id hashes begin with $argon2id$ . PBKDF2 outputs are typically base64-encoded with an iteration count and salt in a structured format. Raw SHA-256 or SHA-1 outputs are plain hex strings with no iteration data.
Test for rainbow table vulnerability — If no salt is present (same plaintext always produces the same hash), query online databases (CrackStation, hashes.com) before attempting local cracking. Many CTF challenge hashes are pre-indexed.
Determine hashcat mode — Mode 0 for raw MD5, mode 100 for raw SHA-1, mode 1400 for raw SHA-256, mode 10900 for PBKDF2-HMAC-SHA256, mode 3200 for bcrypt, mode 13723 for Argon2id. Run hashcat --example-hashes to confirm format expectations.
Run dictionary + rule attack — For weak KDFs: hashcat -a 0 -m <mode> hash.txt rockyou.txt -r best64.rule. For very weak or unsalted hashes, exhaustion is often sub-second.
Estimate feasibility for strong KDFs — bcrypt cost 12 limits GPU throughput to approximately 100 000 H/s; Argon2id with 19 MiB memory limits GPU throughput to under 1 000 H/s. If the challenge uses these, the intended path is almost certainly not brute-force — look for the key material or a logic flaw instead.
Derive the key and decrypt — Once the password is recovered, re-run the KDF with the recovered password and stored salt to regenerate the exact key, then decrypt the challenge ciphertext.

Secure Implementation Note — Use Argon2id (RFC 9106) for all new password hashing and password-based key derivation. Configure memory ≥ 19 MiB, iterations ≥ 2, parallelism = 1, and store the full encoded output including salt and parameters. For FIPS-compliant environments use PBKDF2-HMAC-SHA256 with ≥ 600 000 iterations (NIST SP 800-132, 2023). For deriving keys from high-entropy shared secrets (ECDH, DH), use HKDF-SHA256 (RFC 5869). Never use raw SHA-256, SHA-1, or MD5 as a KDF for password-derived keys.

Common Cryptanalysis Errors

Confusing key length with key security — A SHA-256(password) output is 256 bits wide, but its security is bounded by the entropy of the password, not the output width. Reporting "256-bit key" without noting the KDF weakness overstates the security.
Forgetting to include the salt in the KDF call — Regenerating the key after cracking the password requires reproducing the exact KDF invocation, including the stored salt. A missing or wrong salt produces a completely different key.
Using wrong hashcat mode — PBKDF2-HMAC-SHA256 (mode 10900) and raw SHA-256 (mode 1400) produce different outputs from the same password. Misidentifying the KDF type leads to no cracks even against the correct wordlist.
Attempting brute force against bcrypt cost 14+ — Beyond cost 14, GPU throughput drops below approximately 10 000 H/s, making dictionary attacks against long or complex passwords infeasible in a competition timeframe. Pivot to finding the password or key through other means.
Ignoring iteration count in PBKDF2 challenges — The iteration count is required to verify or reproduce the output. It is typically stored alongside the salt and hash; read the encoding carefully.
Overlooking HKDF in DH-based challenges — When the challenge combines a Diffie-Hellman exchange with symmetric encryption, the shared secret is usually passed through HKDF before use as an AES key. Extracting the DH shared secret is only half the work; the KDF expansion step must also be reproduced.

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0018	Knowledge of encryption algorithms and their weaknesses	Explains why fast hash functions fail as KDFs and how memory-hard KDFs raise the attacker cost
K0019	Knowledge of cryptography and cryptographic key management	Covers parameter selection (work factor, memory, iterations) and the role of salts in KDF security
K0305	Knowledge of authentication, encryption, and digital signature schemes	Connects KDF weakness to practical password storage failures and authentication bypass
S0138	Skill in using public-key infrastructure tools	Trains hashcat mode selection, KDF output format identification, and key regeneration from recovered passwords
T0259	Use cryptanalysis tools to recover plaintext from ciphertext	Provides end-to-end methodology from KDF identification through password cracking and key reconstruction