Attacking Weak Key Derivation Functions: Dictionary Attacks on Under-Iterated Password Hashing
Theory
Why This Matters
In 2012, the LinkedIn password database breach exposed approximately 6.5 million SHA-1-unsalted password hashes. Within 24 hours, more than 5 million had been cracked by the public. The root failure was using a raw cryptographic hash as a password storage mechanism — fast by design, which made it catastrophically weak under GPU-accelerated brute force. The same failure class appeared in the Adobe breach of 2013 (153 million accounts, SHA-1 with salt misuse) and the RockYou leak of 2009 (32 million plaintext passwords that seeded nearly every wordlist in use today). From a key derivation perspective, CVE-2023-36661 documented a production application that derived AES-256 keys using SHA-256(password) directly — the 256-bit key offered no security beyond the entropy of the password itself, typically fewer than 40 bits for human-chosen passwords. A billion SHA-256 operations per second on commodity GPU hardware means a 10-character lowercase password space is exhausted in under a minute.
Core Concept
A Key Derivation Function (KDF) is a cryptographic primitive that transforms a secret input (password, passphrase, or shared secret) into a cryptographic key of the required size. The security requirements for a KDF differ fundamentally depending on whether the input is a high-entropy secret (e.g., a Diffie-Hellman shared value) or a low-entropy password (human-chosen).
For high-entropy inputs, HKDF (HMAC-based Extract-and-Expand KDF, RFC 5869) is the correct primitive. HKDF operates in two phases: Extract computes a pseudorandom key from the input key material and an optional salt; Expand derives multiple output key material blocks of any desired length. HKDF is computationally cheap by design — the input already has high entropy, so speed is a feature, not a liability.
For low-entropy passwords, a memory-hard or time-hard KDF is mandatory. The threat model is an offline attacker who has obtained the KDF output (e.g., from a database breach) and attempts to invert it via brute force. Three KDFs address this threat:
bcrypt applies a modified Blowfish cipher in a cost-parameterised loop. The work factor (cost parameter) determines the number of iterations as 2^cost; each increment doubles the computation time. The recommended minimum work factor as of 2023 is 12 (approximately 250 ms on a modern CPU per hash). bcrypt truncates passwords at 72 bytes and uses a 16-byte salt generated per-password to defeat rainbow tables (precomputed hash-chain lookups). bcrypt's maximum throughput on GPU hardware is approximately 100 000 hashes per second at cost 12, compared to billions per second for raw SHA-256.
Argon2id won the Password Hashing Competition in 2015 and is recommended by NIST SP 800-63B (2017) and RFC 9106 (2021). It is memory-hard: deriving a single output requires allocating a configurable amount of memory (the memory cost parameter, in kibibytes), which forces the attacker to provision equivalent memory for each parallel crack attempt — a hardware cost that GPU farms cannot amortise as easily as pure computation. Argon2id (the recommended variant) combines the side-channel resistance of Argon2i with the GPU-resistance of Argon2d. Minimum parameters per RFC 9106: memory = 19 MiB, iterations = 2, parallelism = 1.
PBKDF2-HMAC-SHA256 (PKCS#5 v2.1, RFC 8018) applies an HMAC-SHA256 pseudorandom function iteratively, with the iteration count governing the time cost. NIST SP 800-132 (2023 revision) mandates a minimum of 600 000 iterations for SHA-256. PBKDF2 is not memory-hard and therefore more vulnerable to GPU acceleration than Argon2id or bcrypt, but it is FIPS 140-3 compliant and widely required in regulated environments.
The critical distinction: using SHA-256(password) as a KDF provides a 256-bit key, but only as many bits of security as the password entropy. A 10-character alphanumeric password has approximately 59 bits of entropy — and an attacker can search the entire space at SHA-256 throughput of ~10 billion attempts per second on a single GPU.
Technical Deep-Dive
# KDF comparison: weak vs correct implementations
# Prerequisites: pip install argon2-cffi bcrypt cryptography
import hashlib, hmac, os, time
password = b"correct-horse-battery"
salt = os.urandom(16) # must be unique per password, stored with output
# --- WEAK: raw SHA-256 as KDF (brute-forceable at ~10^10 H/s on GPU) ---
def weak_kdf_sha256(password: bytes, salt: bytes) -> bytes:
return hashlib.sha256(salt + password).digest() # AES-256 key: 32 bytes
# --- WEAK: SHA-256 with single iteration PBKDF2 ---
def weak_pbkdf2(password: bytes, salt: bytes) -> bytes:
return hashlib.pbkdf2_hmac("sha256", password, salt, iterations=1)
# --- CORRECT: PBKDF2-HMAC-SHA256 at NIST SP 800-132 (2023) minimum ---
def strong_pbkdf2(password: bytes, salt: bytes) -> bytes:
return hashlib.pbkdf2_hmac(
"sha256", password, salt,
iterations=600_000, # NIST 2023 minimum
dklen=32 # 256-bit AES key
)
# --- CORRECT: bcrypt for password storage (not for deriving arbitrary-length keys) ---
import bcrypt
def bcrypt_hash(password: bytes) -> bytes:
return bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))
def bcrypt_verify(password: bytes, stored_hash: bytes) -> bool:
return bcrypt.checkpw(password, stored_hash)
# --- CORRECT: Argon2id (recommended for new systems) ---
from argon2 import PasswordHasher
ph = PasswordHasher(
time_cost=2, # RFC 9106 minimum
memory_cost=19456, # 19 MiB
parallelism=1,
hash_len=32,
salt_len=16
)
argon2_hash = ph.hash(password.decode())
is_valid = ph.verify(argon2_hash, password.decode())
# --- CORRECT: HKDF for high-entropy input (e.g., ECDH shared secret) ---
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives import hashes
def hkdf_derive(ikm: bytes, salt: bytes, info: bytes, length: int) -> bytes:
"""HKDF-SHA256: suitable for high-entropy IKM only (not passwords)."""
return HKDF(
algorithm=hashes.SHA256(), length=length,
salt=salt, info=info
).derive(ikm)
# --- ATTACK: hashcat modes ---
# hashcat -a 0 -m 10900 <pbkdf2_hash_file> rockyou.txt # PBKDF2-SHA256
# hashcat -a 0 -m 3200 <bcrypt_hash_file> rockyou.txt # bcrypt
# hashcat -a 0 -m 13723 <argon2id_file> rockyou.txt # Argon2id
Cryptanalysis Methodology
- Identify the KDF — Extract the hash format from the challenge. bcrypt hashes begin with
$2b$or$2a$. Argon2id hashes begin with$argon2id$. PBKDF2 outputs are typically base64-encoded with an iteration count and salt in a structured format. Raw SHA-256 or SHA-1 outputs are plain hex strings with no iteration data. - Test for rainbow table vulnerability — If no salt is present (same plaintext always produces the same hash), query online databases (CrackStation, hashes.com) before attempting local cracking. Many CTF challenge hashes are pre-indexed.
- Determine hashcat mode — Mode 0 for raw MD5, mode 100 for raw SHA-1, mode 1400 for raw SHA-256, mode 10900 for PBKDF2-HMAC-SHA256, mode 3200 for bcrypt, mode 13723 for Argon2id. Run
hashcat --example-hashesto confirm format expectations. - Run dictionary + rule attack — For weak KDFs:
hashcat -a 0 -m <mode> hash.txt rockyou.txt -r best64.rule. For very weak or unsalted hashes, exhaustion is often sub-second. - Estimate feasibility for strong KDFs — bcrypt cost 12 limits GPU throughput to approximately 100 000 H/s; Argon2id with 19 MiB memory limits GPU throughput to under 1 000 H/s. If the challenge uses these, the intended path is almost certainly not brute-force — look for the key material or a logic flaw instead.
- Derive the key and decrypt — Once the password is recovered, re-run the KDF with the recovered password and stored salt to regenerate the exact key, then decrypt the challenge ciphertext.
Secure Implementation Note — Use Argon2id (RFC 9106) for all new password hashing and password-based key derivation. Configure memory ≥ 19 MiB, iterations ≥ 2, parallelism = 1, and store the full encoded output including salt and parameters. For FIPS-compliant environments use PBKDF2-HMAC-SHA256 with ≥ 600 000 iterations (NIST SP 800-132, 2023). For deriving keys from high-entropy shared secrets (ECDH, DH), use HKDF-SHA256 (RFC 5869). Never use raw SHA-256, SHA-1, or MD5 as a KDF for password-derived keys.
Common Cryptanalysis Errors
- Confusing key length with key security — A SHA-256(password) output is 256 bits wide, but its security is bounded by the entropy of the password, not the output width. Reporting "256-bit key" without noting the KDF weakness overstates the security.
- Forgetting to include the salt in the KDF call — Regenerating the key after cracking the password requires reproducing the exact KDF invocation, including the stored salt. A missing or wrong salt produces a completely different key.
- Using wrong hashcat mode — PBKDF2-HMAC-SHA256 (mode 10900) and raw SHA-256 (mode 1400) produce different outputs from the same password. Misidentifying the KDF type leads to no cracks even against the correct wordlist.
- Attempting brute force against bcrypt cost 14+ — Beyond cost 14, GPU throughput drops below approximately 10 000 H/s, making dictionary attacks against long or complex passwords infeasible in a competition timeframe. Pivot to finding the password or key through other means.
- Ignoring iteration count in PBKDF2 challenges — The iteration count is required to verify or reproduce the output. It is typically stored alongside the salt and hash; read the encoding carefully.
- Overlooking HKDF in DH-based challenges — When the challenge combines a Diffie-Hellman exchange with symmetric encryption, the shared secret is usually passed through HKDF before use as an AES key. Extracting the DH shared secret is only half the work; the KDF expansion step must also be reproduced.
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0018 | Knowledge of encryption algorithms and their weaknesses | Explains why fast hash functions fail as KDFs and how memory-hard KDFs raise the attacker cost |
| K0019 | Knowledge of cryptography and cryptographic key management | Covers parameter selection (work factor, memory, iterations) and the role of salts in KDF security |
| K0305 | Knowledge of authentication, encryption, and digital signature schemes | Connects KDF weakness to practical password storage failures and authentication bypass |
| S0138 | Skill in using public-key infrastructure tools | Trains hashcat mode selection, KDF output format identification, and key regeneration from recovered passwords |
| T0259 | Use cryptanalysis tools to recover plaintext from ciphertext | Provides end-to-end methodology from KDF identification through password cracking and key reconstruction |
Further Reading
- Percival, C. and Josefsson, S. (2016). The scrypt Password-Based Key Derivation Function — RFC 7914, IETF
- Biryukov, A., Dinu, D., and Khovratovich, D. (2021). Argon2 Memory-Hard Function for Password Hashing and Proof-of-Work — RFC 9106, IETF
- NIST SP 800-132 (2023). Recommendation for Password-Based Key Derivation — National Institute of Standards and Technology
Challenge Lab
Reinforce your learning with a hands-on generated challenge based on this card's competency.