Advanced Multi-Layer Encoding with Compression: gzip/zlib Layer Identification and Programmatic Decoding

reverse_engineering Difficulty 1–5 30 min certifiable

Theory

Reverse Engineering Methodology

Advanced layered encoding challenges extend the basic model by introducing modifications that break naive decoders: non-standard alphabets that replace the canonical Base64 charset, reversed strings, URL-safe variants, and compressed payloads embedded mid-stack. Understanding each variant requires recognising both the structural fingerprint and the behavioural signature that distinguishes it from the canonical form.

Non-standard alphabets. Standard Base64 uses the 64 symbols A-Za-z0-9+/ plus = as padding. Custom alphabets replace some or all of these characters. Common substitutions seen in CTFs: +→-, /→_ (URL-safe / Base64url, RFC 4648 §5); +→., /→_ (seen in some web-application variants); completely custom orderings where the alphabet is permuted. When a Base64-shaped blob fails to decode, first list the characters that fall outside the standard set, since a simple substitution shows up immediately. For a fully permuted alphabet, build the translation table by frequency analysis: match the most frequent symbols in the blob to the symbols that are most frequent in standard Base64 encodings of comparable plaintext.
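As a first triage step before any frequency analysis, the blob's character set can be compared against the known alphabets. The sketch below is illustrative only: the function names classify_alphabet and foreign_chars are hypothetical, and it assumes the blob contains nothing but the encoded symbols and optional = padding.

```python
import string

# The two alphabets named in the text, without the '=' padding symbol.
STANDARD = set(string.ascii_letters + string.digits + "+/")
URLSAFE = set(string.ascii_letters + string.digits + "-_")


def foreign_chars(blob: str) -> list[str]:
    """Return the symbols in the blob that are not in the standard alphabet."""
    chars = set(blob.strip().strip("="))
    return sorted(chars - STANDARD)


def classify_alphabet(blob: str) -> str:
    """Guess which alphabet a Base64-shaped blob uses (hypothetical helper)."""
    chars = set(blob.strip().strip("="))
    if chars <= (STANDARD & URLSAFE):
        return "ambiguous"   # letters and digits only: both decoders agree
    if chars <= STANDARD:
        return "standard"
    if chars <= URLSAFE:
        return "urlsafe"
    return "custom"          # needs a translation table before decoding


if __name__ == "__main__":
    for candidate in ("SGVsbG8=", "a+b/", "a-b_", "a.b_"):
        print(candidate, classify_alphabet(candidate), foreign_chars(candidate))
```

A "custom" verdict with only one or two foreign characters usually points to a simple substitution; a blob whose letters and digits look normal but still decodes to noise suggests a permuted alphabet instead.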

Reversed Base64. The entire encoded string is reversed before delivery. To identify it, reverse the blob and check whether the result is valid Base64 (correct length, valid charset, = padding only at the end). When padding is present, a reversed Base64 string has its = characters at the beginning rather than the end. An unpadded blob gives no such hint, so decode both orientations and compare the output for plausibility.
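The reversal check can be sketched with a strict decoder that is tried in both orientations. This is a minimal illustration, assuming padded standard Base64; decode_maybe_reversed is a hypothetical name, and for unpadded input the forward attempt may succeed and return garbage, so the result still needs a plausibility check.

```python
import base64
import binascii


def decode_maybe_reversed(blob: str):
    """Try strict Base64 decoding forward, then on the reversed string.

    Returns (orientation, decoded_bytes), or None if neither orientation
    is valid. validate=True rejects misplaced padding, which is what
    exposes a reversed, padded blob.
    """
    s = blob.strip()
    for orientation, candidate in (("forward", s), ("reversed", s[::-1])):
        try:
            return orientation, base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue
    return None


if __name__ == "__main__":
    print(decode_maybe_reversed("ZmxhZw=="))   # padded, forward
    print(decode_maybe_reversed("==wZhxmZ"))   # same blob, reversed
```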

URL-safe Base64 (Base64url). Uses - instead of + and _ instead of /. Padding = is often omitted, so restore it before decoding. Python: base64.urlsafe_b64decode(s + '=' * (-len(s) % 4)).
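A small helper makes the padding repair explicit. This is a sketch under the assumption that the input is otherwise well-formed Base64url; the name b64url_decode is hypothetical.

```python
import base64


def b64url_decode(s: str) -> bytes:
    """Decode Base64url text whose '=' padding may have been stripped."""
    s = s.strip()
    # -len(s) % 4 is the number of '=' needed to reach a multiple of 4.
    return base64.urlsafe_b64decode(s + "=" * (-len(s) % 4))


if __name__ == "__main__":
    # Bytes chosen so the encoding contains both URL-safe symbols.
    token = base64.urlsafe_b64encode(b"\xfb\xff").rstrip(b"=").decode()
    print(token, b64url_decode(token))
```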

Compressed layers. Gzip and zlib compression applied mid-stack create a layer that looks like binary noise. Magic byte signatures:
- gzip: 1F 8B, normally followed by 08 (the deflate method byte), so a gzip stream almost always starts with 1F 8B 08.
- zlib (deflate with header): 78 01 (low compression), 78 9C (default compression), 78 DA (best compression).
- bzip2: 42 5A 68 (BZh).
- zstd: 28 B5 2F FD.

When a decoded layer starts with these bytes, decompress before continuing to peel.
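The signatures above can be confirmed by compressing a sample with the standard library and looking at the leading bytes. The sketch below also includes a generic zlib header test based on RFC 1950 (method nibble 8, window size nibble at most 7, and the 16-bit header divisible by 31), which catches variants a fixed table might miss. The payload and the name looks_like_zlib are assumptions for illustration.

```python
import bz2
import gzip
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """RFC 1950 header check on the first two bytes (CMF and FLG)."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (cmf & 0x0F) == 8 and (cmf >> 4) <= 7 and ((cmf << 8) | flg) % 31 == 0


sample = b"flag{example}" * 8  # hypothetical payload

# Leading bytes produced by each compressor for the same input.
headers = {
    "gzip": gzip.compress(sample)[:3],
    "zlib level 1": zlib.compress(sample, 1)[:2],
    "zlib level 6": zlib.compress(sample, 6)[:2],
    "zlib level 9": zlib.compress(sample, 9)[:2],
    "bzip2": bz2.compress(sample)[:3],
}

if __name__ == "__main__":
    for name, head in headers.items():
        print(f"{name:14} {head.hex(' ')}")
```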

Technical Deep-Dive

import base64, zlib, gzip, io, binascii

# Leading bytes that identify each compression format
MAGIC_BYTES = {
    b'\x1f\x8b': 'gzip',
    b'\x78\x01': 'zlib-low',
    b'\x78\x9c': 'zlib-default',
    b'\x78\xda': 'zlib-best',
    b'BZh':      'bzip2',
}

def detect_compression(data: bytes) -> str | None:
    for magic, name in MAGIC_BYTES.items():
        if data[:len(magic)] == magic:
            return name
    return None

def decompress_layer(data: bytes, kind: str) -> bytes:
    if kind == 'gzip':
        return gzip.decompress(data)
    if kind.startswith('zlib'):
        return zlib.decompress(data)
    if kind == 'bzip2':
        import bz2; return bz2.decompress(data)
    raise ValueError(f"Unknown compression: {kind}")

def b64_custom_decode(blob: str, custom_alpha: str) -> bytes:
    """Translate custom alphabet to standard Base64, then decode."""
    STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    if len(custom_alpha) != 64:
        raise ValueError("Custom alphabet must be exactly 64 characters")
    table = str.maketrans(custom_alpha, STANDARD)
    normalised = blob.translate(table)
    padding = '=' * (-len(normalised) % 4)
    return base64.b64decode(normalised + padding)

def smart_peel(blob) -> list[tuple[str, object]]:
    """Peel a blob that may be str (encoded) or bytes (binary/compressed)."""
    layers = []
    current = blob
    for _ in range(30):
        # If bytes, check for compression
        if isinstance(current, bytes):
            kind = detect_compression(current)
            if kind:
                try:
                    current = decompress_layer(current, kind)
                    layers.append((f"decompress:{kind}", current))
                    continue
                except Exception:
                    pass
            # Try to decode as UTF-8 for further string-based peeling
            try:
                current = current.decode("utf-8").strip()
            except UnicodeDecodeError:
                break   # true binary, stop
        # String-based peeling
        if isinstance(current, str):
            s = current.strip()
            if not s:
                break   # nothing left to peel
            # Leading '=' means the padding sits at the wrong end, so try the
            # reversed string first; otherwise try the forward string first.
            order = [('', s), ('-reversed', s[::-1])]
            if s.startswith('='):
                order.reverse()
            for suffix, cand in order:
                # Base64url is normalised to the standard alphabet before decoding
                variant = 'base64url' if ('-' in cand or '_' in cand) else 'base64'
                cand = cand.replace('-', '+').replace('_', '/')
                try:
                    decoded = base64.b64decode(cand + '=' * (-len(cand) % 4), validate=True)
                except (binascii.Error, ValueError):
                    continue
                layers.append((variant + suffix, decoded))
                current = decoded
                break
            else:
                break   # neither orientation is valid Base64: stop peeling
    return layers

# Identify gzip / zlib magic bytes in a hex dump
xxd encoded_layer.bin | head -4
# Look for: 1f8b (gzip) or 789c / 78da (zlib)

# Decompress gzip layer extracted from a multi-layer blob
python3 -c "
import base64, gzip, sys
data = base64.b64decode(open('layer.b64').read())
print(gzip.decompress(data).decode())
"

# URL-safe Base64 decode at the shell
python3 -c "import base64,sys; print(base64.urlsafe_b64decode(sys.argv[1]+'==').decode())" "$BLOB"

Common Reversing Errors

1. Not checking for reversed input. A reversed Base64 string is syntactically valid ASCII but will decode to garbage. If Base64 decoding produces non-printable output yet the charset looks correct, try reversing first.

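2. Using standard Base64 decoders on URL-safe input. In its default mode, base64.b64decode silently discards - and _ characters, so it returns corrupted output or fails with binascii.Error: Incorrect padding; with validate=True it raises binascii.Error as soon as it meets them. Use urlsafe_b64decode whenever the blob contains - or _.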

3. Ignoring binary layers mid-stack. Analysts expect each layer to be printable. A gzip layer in the middle produces raw bytes that must be decompressed, not decoded as text. Inspect the first few bytes whenever a decode step yields bytes.

4. Forgetting omitted padding in Base64url. RFC 4648 §5 permits omission of = padding. Always add padding = '=' * (-len(s) % 4) before decoding.

5. Frequency-analysis failure on short payloads. Custom alphabet recovery via frequency analysis requires the payload to be long enough (hundreds of characters). For short blobs, test whether the alphabet is simply a rotation of the standard one (there are only 64 possible cyclic shifts to brute-force), using printable-ASCII output as the oracle.

6. Missing zlib header variants. Analysts learn 78 9C but miss 78 01 (low compression) and 78 DA (best compression), as well as the less common 78 5E. Check for every zlib header variant when the default signature does not match.
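The rotation brute force mentioned in item 5 can be sketched as follows. It assumes the custom alphabet is a cyclic shift of the standard one and that the plaintext is printable ASCII; brute_force_rotation is a hypothetical name, and more than one shift may pass the printable test on very short inputs.

```python
import base64
import binascii
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
PRINTABLE = set(string.printable.encode())  # byte values accepted by the oracle


def brute_force_rotation(blob: str) -> list[tuple[int, bytes]]:
    """Try all 64 cyclic shifts; keep those that decode to printable ASCII."""
    body = blob.strip().rstrip("=")
    hits = []
    for shift in range(64):
        custom = STANDARD[shift:] + STANDARD[:shift]
        # Map each symbol of the rotated alphabet back to the standard one.
        candidate = body.translate(str.maketrans(custom, STANDARD))
        candidate += "=" * (-len(candidate) % 4)
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue
        if decoded and all(byte in PRINTABLE for byte in decoded):
            hits.append((shift, decoded))
    return hits


if __name__ == "__main__":
    # Build a test blob encoded with the alphabet rotated by 13 positions.
    rotated = STANDARD[13:] + STANDARD[:13]
    plain = base64.b64encode(b"flag{rotated_alphabet}").decode().rstrip("=")
    blob = plain.translate(str.maketrans(STANDARD, rotated))
    for shift, text in brute_force_rotation(blob):
        print(shift, text)
```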

Challenge Lab

Reinforce your learning with a hands-on generated challenge based on this card's competency.