Archive bomb (simulated)

web_injection_logic Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Archive bombs are a class of denial-of-service (DoS) attack that exploits the mismatch between the compressed size of a file and its expanded size. The canonical example, 42.zip (often called simply "the zip bomb"), is a 42-kilobyte file that expands to approximately 4.5 petabytes when every nested layer is fully extracted. Any application that accepts archive uploads and extracts them without enforcing expansion limits is vulnerable to resource exhaustion: extraction can consume all available disk space, memory, or CPU, causing application downtime, host instability, or cascading failures. Real-world incidents include web application firewalls exhausting memory while scanning decompressed HTTP bodies, antivirus engines hanging on decompression, and mail servers crashing on receipt of archive attachments. Multiple libarchive CVEs involve resource exhaustion during archive handling.

Core Concept

A zip bomb exploits the fact that the DEFLATE compression algorithm achieves very high compression ratios on repetitive data. A file consisting entirely of zero bytes compresses at roughly 1000:1, close to DEFLATE's ceiling of about 1032:1. A recursive zip bomb (the classic 42.zip design) gets past that ceiling by nesting archives: the outer archive contains 16 archives, each of those contains 16 more, and so on for 5 layers, with one large, highly compressible file at the bottom of every branch. The compressed size is tiny; the expanded size grows exponentially with nesting depth (16^5, about one million, copies of the bottom-layer file).
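The two figures in this paragraph can be checked with a few lines of Python. This is a minimal sketch: it measures the DEFLATE ratio on a buffer of zeros with the standard zlib module, then multiplies out the commonly reported 42.zip layout. The layout constants (16 archives per layer, 5 layers, a bottom file of 4 GiB minus one byte) are assumptions taken from public descriptions of 42.zip, not values measured here.

```python
import zlib

# Part 1: how well does DEFLATE compress pure repetition?
raw = b"\x00" * (10 * 1024 * 1024)      # 10 MiB of zero bytes
packed = zlib.compress(raw, 9)          # zlib wraps a DEFLATE stream
ratio = len(raw) / len(packed)
print(f"{len(raw)} bytes -> {len(packed)} bytes (about {ratio:.0f}:1)")

# Part 2: why nesting beats the single-stream ceiling.
# Assumed 42.zip layout (from public descriptions):
FANOUT = 16                             # archives per layer
LAYERS = 5                              # nesting depth
BOTTOM_FILE_BYTES = 4 * 1024**3 - 1     # one file per bottom-layer archive

bottom_files = FANOUT ** LAYERS
total_bytes = bottom_files * BOTTOM_FILE_BYTES
print(f"{bottom_files} bottom-layer files, {total_bytes / 1e15:.1f} PB in total")
```

A single DEFLATE stream cannot do much better than the first ratio, which is why the classic design relies on nesting to multiply it.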

A non-recursive zip bomb (introduced by David Fifield in 2019) is more dangerous in practice because it does not depend on the extractor unpacking nested archives: a single pass of extraction is enough. It uses overlapping entries, in which many ZIP central directory entries resolve to the same compressed data in the file. In Fifield's published example, a 10 MB file holding tens of thousands of entries that share one DEFLATE stream of highly repetitive data expands to 281 TB on extraction. This technique works against zip extractors that process entries one by one without checking whether their byte ranges overlap.
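One way to spot overlapping entries before extraction is to compare the byte ranges that the central directory claims for each entry. The sketch below is illustrative only: `entry_spans` and `find_overlaps` are hypothetical helper names, and the end of each range is a lower bound (fixed local header plus compressed data, ignoring the variable-length name and extra fields), so it flags definite overlaps rather than proving an archive is clean.

```python
import io
import zipfile

LOCAL_HEADER_FIXED = 30  # fixed-size part of a ZIP local file header, in bytes

def entry_spans(zf: zipfile.ZipFile):
    """Return (start, minimum_end, name) for every entry, sorted by start offset."""
    spans = []
    for info in zf.infolist():
        start = info.header_offset
        minimum_end = start + LOCAL_HEADER_FIXED + info.compress_size
        spans.append((start, minimum_end, info.filename))
    return sorted(spans)

def find_overlaps(spans):
    """Return pairs of entry names whose byte ranges overlap.

    Because the spans are sorted by start offset, any overlap in the
    archive shows up between at least one pair of neighbours.
    """
    overlaps = []
    for (s1, e1, n1), (s2, _e2, n2) in zip(spans, spans[1:]):
        if s2 < e1:
            overlaps.append((n1, n2))
    return overlaps

# A normally built archive has no overlapping entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"alpha" * 100)
    zf.writestr("b.txt", b"bravo" * 100)
with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    clean_result = find_overlaps(entry_spans(zf))

# Synthetic spans: two entries that claim the same offset.
shared_result = find_overlaps([(0, 100, "x"), (0, 100, "y")])
print(clean_result, shared_result)
```

Rejecting any archive for which `find_overlaps` returns a non-empty list is a cheap pre-check; it complements, and does not replace, a byte limit enforced during extraction.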

Detection strategies:

Check uncompressed size before extraction — The ZIP central directory header contains the uncompressed size field (4 bytes). A sum of all uncompressed sizes exceeding a threshold (e.g., 500 MB) allows rejection before any extraction occurs. Note: this field is attacker-controlled and may be falsified; it is a hint, not a guarantee.
Compression ratio threshold — If compressed_size / uncompressed_size < 0.02 (i.e., compression ratio > 50x), treat the archive as suspicious. Legitimate archives rarely exceed 10:1 overall.
Byte limit on extraction — Enforce a hard limit on total bytes written during extraction (e.g., 100 MB). Abort and delete partial output if the limit is reached.
Entry count limit — Limit the number of entries processed (e.g., 1000). Non-recursive bombs rely on large entry counts.
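Extraction time limit — Enforce a CPU time limit on the extraction process, for example with RLIMIT_CPU (the kernel delivers SIGXCPU when the soft limit is reached), or run a watchdog that kills the worker after a wall-clock timeout.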

A quine archive is an archive whose extracted content is itself (the same archive). It demonstrates the possibility of infinite recursive extraction but does not cause disk exhaustion in a single extraction step.
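Both recursive bombs and quine archives are defused by the same two guards: a cap on nesting depth and a byte budget shared across all levels. The following is a minimal in-memory sketch of that idea. `ArchiveLimitError`, `total_decompressed_bytes`, and the default limits are hypothetical names and values chosen for illustration; a production scanner would stream to disk rather than hold payloads in memory.

```python
import io
import zipfile

class ArchiveLimitError(Exception):
    """Raised when nesting depth or the shared byte budget is exceeded."""

def total_decompressed_bytes(data: bytes, max_depth: int = 3,
                             max_total: int = 50 * 1024 * 1024) -> int:
    """Walk nested ZIPs and return the bytes decompressed across all levels."""
    total = 0
    stack = [(data, 1)]                      # (archive bytes, depth)
    while stack:
        blob, depth = stack.pop()
        with zipfile.ZipFile(io.BytesIO(blob)) as zf:
            for info in zf.infolist():
                with zf.open(info) as src:
                    # Never read more than one byte past the remaining budget.
                    payload = src.read(max_total - total + 1)
                total += len(payload)
                if total > max_total:
                    raise ArchiveLimitError("byte budget exceeded")
                if zipfile.is_zipfile(io.BytesIO(payload)):
                    if depth >= max_depth:
                        raise ArchiveLimitError("nesting too deep")
                    stack.append((payload, depth + 1))
    return total

# Two-level example: outer.zip -> inner.zip -> a.txt
inner = io.BytesIO()
with zipfile.ZipFile(inner, "w") as z:
    z.writestr("a.txt", b"hi" * 10)
outer = io.BytesIO()
with zipfile.ZipFile(outer, "w") as z:
    z.writestr("inner.zip", inner.getvalue())

print(total_decompressed_bytes(outer.getvalue(), max_depth=2))
```

A quine archive would hit the depth cap after `max_depth` levels no matter how small each level is, which is why the depth check is kept separate from the byte budget.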

Technical Deep-Dive

# ── Detection: analyse a ZIP before extraction ────────────────────────────
import zipfile

MAX_UNCOMPRESSED_BYTES = 100 * 1024 * 1024   # 100 MB limit
MAX_COMPRESSION_RATIO  = 50                    # reject if ratio > 50x
MAX_ENTRY_COUNT        = 1000

def safe_zip_check(zip_path: str) -> dict:
    """
    Inspect a ZIP archive for archive bomb indicators.
    Returns a dict with findings. Does NOT extract any data.
    """
    findings = {"safe": True, "reason": None}

    with zipfile.ZipFile(zip_path, "r") as zf:
        infos = zf.infolist()

        # Check entry count
        if len(infos) > MAX_ENTRY_COUNT:
            findings["safe"] = False
            findings["reason"] = f"Entry count {len(infos)} exceeds limit {MAX_ENTRY_COUNT}"
            return findings

        total_compressed   = sum(i.compress_size for i in infos)
        total_uncompressed = sum(i.file_size     for i in infos)

        # Check total uncompressed size (may be falsified in central dir)
        if total_uncompressed > MAX_UNCOMPRESSED_BYTES:
            findings["safe"] = False
            findings["reason"] = (
                f"Declared uncompressed size {total_uncompressed/1e6:.1f} MB "
                f"exceeds {MAX_UNCOMPRESSED_BYTES/1e6:.0f} MB limit"
            )
            return findings

        # Check compression ratio
        if total_compressed > 0:
            ratio = total_uncompressed / total_compressed
            if ratio > MAX_COMPRESSION_RATIO:
                findings["safe"] = False
                findings["reason"] = (
                    f"Compression ratio {ratio:.0f}x exceeds threshold {MAX_COMPRESSION_RATIO}x"
                )
                return findings

    return findings   # {"safe": True, "reason": None}

# ── Byte-limited extraction ───────────────────────────────────────────────
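import os
import pathlib

def safe_extract(zip_path: str, dest: str) -> None:
    """Extract a ZIP with a byte-limit guard against runtime bombs."""
    total_written = 0
    dest_root = pathlib.Path(dest).resolve()
    dest_root.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path, "r") as zf:
        for member in zf.infolist():
            # Path traversal check (see Card 4: Zip Slip). Compare whole path
            # components: a plain string-prefix test would accept
            # "/tmp/out-evil" for a destination of "/tmp/out".
            dest_path = (dest_root / member.filename).resolve()
            if dest_root not in dest_path.parents:
                raise ValueError(f"Path traversal blocked: {member.filename}")

            # Directory entries carry no data; create them and move on
            if member.is_dir():
                dest_path.mkdir(parents=True, exist_ok=True)
                continue
            dest_path.parent.mkdir(parents=True, exist_ok=True)

            # Byte-limited write: count decompressed bytes as they are produced
            exceeded = False
            with zf.open(member) as src, open(dest_path, "wb") as dst:
                while True:
                    chunk = src.read(65536)
                    if not chunk:
                        break
                    total_written += len(chunk)
                    if total_written > MAX_UNCOMPRESSED_BYTES:
                        exceeded = True
                        break
                    dst.write(chunk)
            if exceeded:
                # Delete the partial file only after it has been closed
                os.remove(dest_path)
                raise ValueError(
                    f"Extraction aborted: exceeded {MAX_UNCOMPRESSED_BYTES} bytes"
                )

if __name__ == "__main__":
    print(safe_zip_check("suspicious.zip"))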

# ── Create a test recursive zip bomb (for controlled lab use only) ─────────
# Layer structure: file -> z1 -> z2 -> z3 (3 levels)
python3 - <<'EOF'
import zipfile, io

# Innermost: 1 MB of zeros compressed to ~1 KB
zeros_1mb = b"\x00" * (1024 * 1024)

buf3 = io.BytesIO()
with zipfile.ZipFile(buf3, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr("zeros.bin", zeros_1mb)
layer3 = buf3.getvalue()

buf2 = io.BytesIO()
with zipfile.ZipFile(buf2, "w", zipfile.ZIP_DEFLATED) as z:
    for i in range(16):
        z.writestr(f"l3_{i}.zip", layer3)
layer2 = buf2.getvalue()

with zipfile.ZipFile("bomb.zip", "w", zipfile.ZIP_DEFLATED) as z:
    for i in range(16):
        z.writestr(f"l2_{i}.zip", layer2)

import os
print(f"bomb.zip size: {os.path.getsize('bomb.zip')/1024:.1f} KB")
print(f"Declared expansion: {16*16*1:.0f} MB (3-level demo)")
EOF

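# Inspect declared sizes and per-entry compression with unzip -v
unzip -v bomb.zip | tail -5
# "Length" is the declared uncompressed size and "Cmpr" the per-entry ratio;
# compare with ls -lh bomb.zip. Only the outer layer is listed: nested
# archives are not expanded by the listing.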

Security Assessment Methodology

Locate archive upload endpoints — Identify all endpoints that accept .zip, .tar, .gz, .bz2, .7z, .jar, or .war uploads. Include automatic archive fetching from URLs (update endpoints, plugin installers).
Submit a known zip bomb — Upload a well-known test bomb such as 42.zip, or a purpose-built 3-layer recursive bomb. Observe server behavior: does the request time out? Does the server return an error? Does disk usage spike?
Test uncompressed size declared vs actual — Craft a ZIP with a falsified uncompressed size in the central directory (set to 1 byte) but actual content of 100 MB. Observe whether the server enforces extraction limits or relies on the declared size.
Test entry count limits — Submit a ZIP with 5000 empty entries. If the server processes all entries, per-entry rate limits or entry count caps are absent.
Measure response time and resource indicators — Compare response time between a normal upload and the bomb upload. Server-side timeouts, 503 responses, and memory error messages are indicators of unprotected extraction.
Check for nested archive handling — Submit a 3-level nested ZIP. If the server recursively extracts inner archives, verify that recursion depth limits are enforced.
Document impact — Record compression ratio, declared vs actual size, server response, and estimated resource consumption for the report.
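Two of the test files described in the steps above (a falsified declared size and a high entry count) can be generated in memory for lab use. This is a sketch under stated assumptions: `zip_with_falsified_size` and `zip_with_empty_entries` are hypothetical helper names, the field offsets follow the standard ZIP layout, and only the central directory copy of the size is patched, so the local file header still carries the true value.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"        # end-of-central-directory signature
EOCD_CD_OFFSET_FIELD = 16       # position of the central directory offset in the EOCD
CD_UNCOMPRESSED_FIELD = 24      # position of the uncompressed size in a central header

def zip_with_falsified_size(payload: bytes, declared: int = 1) -> bytes:
    """Build a one-entry ZIP, then overwrite its declared uncompressed size."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.bin", payload)
    raw = bytearray(buf.getvalue())
    eocd = raw.rfind(EOCD_SIG)
    (cd_start,) = struct.unpack_from("<I", raw, eocd + EOCD_CD_OFFSET_FIELD)
    struct.pack_into("<I", raw, cd_start + CD_UNCOMPRESSED_FIELD, declared)
    return bytes(raw)

def zip_with_empty_entries(count: int) -> bytes:
    """Build a ZIP containing `count` zero-length files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for i in range(count):
            zf.writestr(f"empty_{i}.txt", b"")
    return buf.getvalue()

falsified = zip_with_falsified_size(b"\x00" * 100_000, declared=1)
with zipfile.ZipFile(io.BytesIO(falsified)) as zf:
    declared_size = zf.infolist()[0].file_size

many = zip_with_empty_entries(5000)
with zipfile.ZipFile(io.BytesIO(many)) as zf:
    entry_count = len(zf.infolist())

print(declared_size, entry_count)
```

Extractors differ in whether they trust the central directory or the local header, so the server's reaction to the first file shows which field, if any, its limit is based on.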

Defensive Countermeasure — Enforce extraction limits at three levels: (1) reject archives where the declared total uncompressed size exceeds a configured threshold (e.g., 500 MB); (2) enforce a hard byte limit during extraction using a byte-counting wrapper around the decompressor; (3) limit entry count to a reasonable maximum (e.g., 1000). Extract archives in a sandboxed subprocess with a per-file size cap (ulimit -f) and a CPU time limit (ulimit -t); because ulimit -f bounds each file rather than total disk usage, pair it with a filesystem quota or a dedicated size-limited volume. When using libarchive, apply the same byte counting around the data-read loop (archive_read_data or archive_read_data_block) rather than expecting the library to cap expansion for you.
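The subprocess limits mentioned above can be applied from Python with the standard resource module, which sets the same kernel limits that ulimit -t and ulimit -f control. This is a POSIX-only sketch; `run_limited` and its default values are hypothetical and would need tuning for a real extraction worker.

```python
import resource
import subprocess
import sys

def run_limited(cmd, cpu_seconds=10, max_file_bytes=100 * 1024 * 1024,
                wall_timeout=30):
    """Run cmd in a child process with CPU-time and per-file-size limits."""
    def apply_limits():
        # Runs in the child between fork and exec.
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_FSIZE, (max_file_bytes, max_file_bytes))

    return subprocess.run(cmd, preexec_fn=apply_limits,
                          timeout=wall_timeout, capture_output=True)

# Example: a harmless child runs normally under the limits.
result = run_limited([sys.executable, "-c", "print('extracted')"])
print(result.returncode, result.stdout.decode().strip())
```

A child that writes past `max_file_bytes` fails with a file-too-large error or is terminated by a signal, and one that burns more than `cpu_seconds` of CPU is killed; in both cases the parent sees a non-zero return code. The wall-clock `timeout` covers workers that block without using CPU.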

Common Assessment Errors

Using 42.zip without understanding recursive vs non-recursive — 42.zip is a recursive bomb: it only detonates if the target unpacks nested archives, and many modern extractors and scanners either do not recurse or cap recursion depth. Use both recursive and non-recursive (overlapping-entry) test files to cover both cases.
Assuming the declared size in the central directory is reliable — It is attacker-controlled. An extractor that enforces limits only based on declared size can be bypassed by falsifying it to 0. Always enforce limits during actual byte extraction.
Forgetting other archive formats — Formats such as tar.gz, bzip2, 7z, and RAR all support analogous compression bombs. If the application accepts multiple formats, test each independently.
Testing only on the upload endpoint — Some applications extract archives server-side as part of processing pipelines (e.g., loading a plugin, processing a report template). These pipeline steps may lack the upload size limits applied to the initial upload.
Not verifying server-side resource consumption — A server that returns a 200 response after "processing" a bomb may have truncated extraction early (a good sign) or may be asynchronously processing it (a bad sign). Verify actual disk usage server-side if possible.
Conflating zip bomb with Zip Slip — These are distinct vulnerabilities. A zip bomb exploits decompression resource limits; Zip Slip exploits path traversal. An archive can contain both. Test for both simultaneously.
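The byte-limit principle carries over to formats other than ZIP. As one illustration for gzip, the standard zlib module lets the caller cap how much output a single decompress call may produce. `bounded_gunzip` is a hypothetical helper name, and the sketch handles a single gzip member only.

```python
import gzip
import zlib

def bounded_gunzip(data: bytes, limit: int) -> bytes:
    """Decompress one gzip member, refusing to produce more than `limit` bytes."""
    decompressor = zlib.decompressobj(wbits=31)   # 31 selects the gzip container
    # Ask for at most limit + 1 bytes: one extra byte is enough to prove
    # that the stream is larger than allowed.
    out = decompressor.decompress(data, limit + 1)
    if len(out) > limit:
        raise ValueError(f"gzip stream expands beyond {limit} bytes")
    return out

small = gzip.compress(b"hello")
print(bounded_gunzip(small, 100))

bomb_like = gzip.compress(b"\x00" * 10_000_000)
try:
    bounded_gunzip(bomb_like, 1_000_000)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The decompressor stops as soon as the cap is reached, so memory use stays bounded by `limit` rather than by the size the stream would expand to.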

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0009	Knowledge of application vulnerabilities	Explains the recursive and non-recursive zip bomb mechanisms and detection strategies
K0070	Knowledge of system and application security threats and vulnerabilities	Connects archive bombs to real-world DoS incidents in AV engines and WAFs
S0001	Skill in conducting vulnerability scans and recognizing vulnerabilities in security systems	Trains systematic resource exhaustion testing across archive format variants
S0044	Skill in mimicking threat behaviors to test defenses	Develops ability to craft multi-layer test bombs and analyze server responses
T0028	Conduct and support authorized penetration testing on enterprise networks	Provides a stepwise methodology for archive bomb assessment with resource impact documentation
T0591	Perform penetration testing as required for new or updated applications	Frames archive extraction limit testing as a required upload endpoint assessment step