
Bacon cipher

log_analysis_siem Difficulty 1–5 30 min certifiable

Theory

Why This Matters

The Bacon cipher was designed by Francis Bacon in 1605 not primarily as a cipher but as a steganographic system — a method of hiding the very existence of a secret message inside an innocent-looking carrier text. Bacon's original application used two typefaces (roman and italic) to mark letters as A or B, allowing a message to be concealed in any printed text. This property — hiding data in plain sight through typographic variation — makes it a recurring pattern in CTF challenges that involve formatted documents, HTML with mixed font weights, or PDF files with inconsistent styling. Understanding Bacon's mechanism is also foundational to understanding modern binary-symbol steganography, including zero-width character encodings and whitespace steganography.

Core Concept

In the Bacon cipher, each letter of the plaintext alphabet is represented as a five-character sequence drawn from a two-symbol alphabet, conventionally A and B. In Bacon's original 24-letter variant, I and J share one code, and U and V share another (following Renaissance practice). The modern 26-letter variant assigns a unique 5-bit code to each letter from A to Z.

The mapping is positional: A=AAAAA (00000), B=AAAAB (00001), C=AAABA (00010), … Z=BBAAB (11001) in the 26-letter variant. This is equivalent to writing each letter's alphabet index (A=0 … Z=25) as a 5-bit binary number with A=0 and B=1; the six codes from BBABA (11010) to BBBBB (11111) are unused. Any two-symbol alphabet can substitute for A/B: bold vs. normal text, uppercase vs. lowercase, dots vs. dashes, 0 vs. 1. In Bacon's original steganographic use, the cover text had to be at least five times as long (in characters) as the hidden message, because each hidden letter is spread across five cover characters.
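
The positional mapping and the five-to-one cover requirement can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names encode_bacon26 and hide_in_case are hypothetical, and the case convention (lowercase = A, uppercase = B) is only one of the many possible carriers.

```python
import string

def encode_bacon26(plaintext: str) -> str:
    """Encode letters as 5-symbol A/B groups (26-letter variant)."""
    groups = []
    for ch in plaintext.upper():
        if ch in string.ascii_uppercase:
            index = ord(ch) - ord("A")        # A=0 ... Z=25
            bits = format(index, "05b")       # e.g. 5 -> "00101"
            groups.append(bits.replace("0", "A").replace("1", "B"))
    return "".join(groups)

def hide_in_case(cover: str, symbols: str) -> str:
    """Carry A/B symbols in letter case (assumed: lowercase=A, uppercase=B).

    The cover is truncated once every symbol has been placed.
    """
    if sum(ch.isalpha() for ch in cover) < len(symbols):
        raise ValueError("cover needs at least five letters per hidden letter")
    out = []
    pos = 0
    for ch in cover:
        if pos == len(symbols):
            break
        if ch.isalpha():
            out.append(ch.upper() if symbols[pos] == "B" else ch.lower())
            pos += 1
        else:
            out.append(ch)                    # spaces and punctuation carry nothing
    return "".join(out)

print(encode_bacon26("FLAG"))
print(hide_in_case("hello world", encode_bacon26("F")))
```

Hiding the single letter F already consumes five cover letters, which is why the carrier must be at least five times as long as the message.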

Technical Deep-Dive

26-letter Bacon alphabet (A=0, B=1 in 5-bit binary):
A AAAAA  B AAAAB  C AAABA  D AAABB  E AABAA
F AABAB  G AABBA  H AABBB  I ABAAA  J ABAAB
K ABABA  L ABABB  M ABBAA  N ABBAB  O ABBBA
P ABBBB  Q BAAAA  R BAAAB  S BAABA  T BAABB
U BABAA  V BABAB  W BABBA  X BABBB  Y BBAAA
Z BBAAB

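24-letter variant: I=J=ABAAA, U=V=BAABB. Because J and V have no codes of their own, K through T sit one code lower than in the table above (K=ABAAB … T=BAABA) and W through Z sit two codes lower (W=BABAA … Z=BABBB).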
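
A 24-letter table can be generated the same way as the 26-letter one by numbering a 24-symbol alphabet. The sketch below assumes the usual convention that J and V are dropped and read as I and U; the names ALPHABET24, BACON24 and decode_bacon24 are illustrative only.

```python
# 24 symbols: J and V are folded into I and U (assumed convention)
ALPHABET24 = "ABCDEFGHIKLMNOPQRSTUWXYZ"

def build_bacon24() -> dict:
    """Map each 5-symbol A/B code to its letter in the 24-letter variant."""
    table = {}
    for index, letter in enumerate(ALPHABET24):
        code = format(index, "05b").replace("0", "A").replace("1", "B")
        table[code] = letter
    return table

BACON24 = build_bacon24()

def decode_bacon24(symbols: str) -> str:
    """Decode a clean A/B stream; unknown or incomplete groups become '?'."""
    groups = [symbols[i:i + 5] for i in range(0, len(symbols), 5)]
    return "".join(BACON24.get(g, "?") for g in groups)

# FLAG in the 24-letter variant: L moves from ABABB to ABABA
print(decode_bacon24("AABABABABAAAAAAAABBA"))
```

Only letters after I change codes, so short messages drawn from A to I decode identically under both variants.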

Example encode of "FLAG":
F = AABAB
L = ABABB
A = AAAAA
G = AABBA
Concatenated: AABABABABBAAAAAAABBA
BACON26 = {
    "AAAAA": "A", "AAAAB": "B", "AAABA": "C", "AAABB": "D",
    "AABAA": "E", "AABAB": "F", "AABBA": "G", "AABBB": "H",
    "ABAAA": "I", "ABAAB": "J", "ABABA": "K", "ABABB": "L",
    "ABBAA": "M", "ABBAB": "N", "ABBBA": "O", "ABBBB": "P",
    "BAAAA": "Q", "BAAAB": "R", "BAABA": "S", "BAABB": "T",
    "BABAA": "U", "BABAB": "V", "BABBA": "W", "BABBB": "X",
    "BBAAA": "Y", "BBAAB": "Z"
}

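def decode_bacon(text: str, sym_a: str = "A", sym_b: str = "B") -> str:
    # Accept lowercase input when the symbols themselves are uppercase letters
    if sym_a.isupper() and sym_b.isupper():
        text = text.upper()
    # Map both symbols in one pass so swapped assignments (sym_a="B", sym_b="A")
    # work; spaces, newlines and other delimiters are dropped
    mapping = {sym_a: "A", sym_b: "B"}
    symbols = "".join(mapping[ch] for ch in text if ch in mapping)
    # Split into 5-symbol groups; unknown or incomplete groups decode to "?"
    groups = [symbols[i:i+5] for i in range(0, len(symbols), 5)]
    return "".join(BACON26.get(g, "?") for g in groups)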

# Numeric 0/1 variant
def decode_bacon_binary(bitstring: str) -> str:
    return decode_bacon(bitstring, sym_a="0", sym_b="1")

print(decode_bacon("AABABABABBAAAAAAABBA"))  # FLAG
print(decode_bacon_binary("00101010110000000110"))  # FLAG

# Steganographic extraction: bold=B, normal=A in HTML
from bs4 import BeautifulSoup

def extract_bacon_from_html(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    bits = []
    # Walk the text nodes so each character inherits its node's styling
    for node in soup.find_all(string=True):
        # Bold if any ancestor is <b> or <strong>
        symbol = "B" if node.find_parent(["b", "strong"]) else "A"
        # Count letters only; assumes spaces and punctuation carry no signal
        bits.extend(symbol for ch in node if ch.isalpha())
    return decode_bacon("".join(bits))

Analytical Methodology

  1. Identify the two-symbol alphabet. If input contains only A and B (or 0 and 1, or two consistently used symbols) arranged in groups of 5, suspect Bacon. Verify that total character count is divisible by 5.
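  2. Test both 24-letter and 26-letter variants. Decode with 26-letter first. If the message reads correctly up to I but later letters look shifted by one place (or by two after U), switch to 24-letter (I=J, U=V). A ? under the 26-letter table means a group above BBAAB, which neither variant uses; in that case suspect reversed symbols or misaligned groups instead.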
  3. Check for symbol reversal. If decoding with A=0, B=1 yields nonsense, try A=1, B=0 (reversed assignment). Some CTF authors invert the standard.
  4. Look for typographic steganography in formatted files. Open PDFs or Word documents in a text editor or parser. Extract per-character font weight, case, or colour. Map bold/italic → B, normal → A, then decode.
  5. Use CyberChef. The "Bacon Cipher Decode" operation handles both the 24-letter and 26-letter variants via its "Alphabet" option; the "Translation" option selects the symbol scheme (such as A/B or 0/1).
  6. Verify output length. A Bacon-encoded message of N plaintext characters requires exactly 5N symbols. If the symbol stream length is not a multiple of 5, either padding is present or the encoding is not pure Bacon.
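
Steps 2, 3 and 6 can be combined into a small brute-force helper that decodes one A/B stream under every alphabet and symbol assignment. This is a hedged sketch: decode_with and candidates are hypothetical names, and the input is assumed to be already normalised to the characters A and B.

```python
from itertools import product

ALPHABETS = {
    "26-letter": "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
    "24-letter": "ABCDEFGHIKLMNOPQRSTUWXYZ",   # J and V folded into I and U
}

def decode_with(symbols: str, alphabet: str, invert: bool = False) -> str:
    """Decode an A/B stream; invert=True reads A as 1 and B as 0."""
    out = []
    usable = len(symbols) - len(symbols) % 5   # ignore an incomplete last group
    for i in range(0, usable, 5):
        group = symbols[i:i + 5]
        bits = "".join("1" if (s == "B") != invert else "0" for s in group)
        index = int(bits, 2)
        out.append(alphabet[index] if index < len(alphabet) else "?")
    return "".join(out)

def candidates(symbols: str) -> dict:
    """Return every variant/assignment combination for manual inspection."""
    results = {}
    for (name, alphabet), invert in product(ALPHABETS.items(), (False, True)):
        label = f"{name}, {'A=1/B=0' if invert else 'A=0/B=1'}"
        results[label] = decode_with(symbols, alphabet, invert)
    return results

stream = "AABABABABBAAAAAAABBA"
if len(stream) % 5:
    print("warning: length is not a multiple of 5")
for label, text in candidates(stream).items():
    print(f"{label}: {text}")
```

For this stream only the 26-letter, A=0/B=1 row reads as FLAG; the other rows contain shifted letters or ? marks, which is the pattern to look for when choosing a variant.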

Common Analytical Errors

  • Confusing 24-letter and 26-letter variants. Decoding with the wrong variant shifts letters after I by one position and letters after U by two, producing consistently garbled output that might be mistaken for a Caesar shift.
  • Missing typographic steganography. When Bacon is embedded in formatted text, the message is invisible to copy-paste — the A/B signal is in formatting attributes, not character values. Always inspect source formatting when a document challenge yields no obvious encoding in the raw text.
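  • Treating the 5-bit groups as standard binary. AAAAB decodes to the letter B, not to the number 1 or to a byte value. The bit pattern matches binary counting, but the value is an index into the alphabet (a 24-letter alphabet in the original variant), so reading the stream as numeric or ASCII data gives the wrong output.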
  • Incorrect symbol identification in custom variants. A challenge may use any two symbols (e.g., . and o, or 0 and O). Failure to identify which symbol maps to A vs. B is the primary failure mode. Test both assignments.
  • Off-by-one in group boundary detection. If the encoded stream has whitespace or delimiter characters, stripping them before grouping is essential. Failing to normalise whitespace shifts all subsequent group boundaries.
  • Assuming the cover text has semantic meaning. In Bacon steganography, the carrier text is a dummy — it can be any text of sufficient length. Spending time analysing cover text meaning is unproductive.
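
Two of the errors above, misidentified symbols and unstripped delimiters, can be guarded against before any decoding happens. The following is a minimal sketch under stated assumptions: normalise_stream and to_ab are hypothetical helpers, and only whitespace is treated as a delimiter by default because characters such as dashes may themselves be carrier symbols.

```python
import string
from collections import Counter

def normalise_stream(raw: str, ignore: str = string.whitespace):
    """Strip delimiters and return (stream, (symbol_1, symbol_2))."""
    stream = "".join(ch for ch in raw if ch not in ignore)
    counts = Counter(stream)
    if len(counts) != 2:
        raise ValueError(f"expected 2 distinct symbols, found {len(counts)}")
    if len(stream) % 5 != 0:
        raise ValueError(f"length {len(stream)} is not a multiple of 5")
    first, second = sorted(counts)            # deterministic order only
    return stream, (first, second)

def to_ab(stream: str, sym_a: str, sym_b: str) -> str:
    """Translate both symbols in one pass so swapped assignments are safe."""
    return stream.translate(str.maketrans({sym_a: "A", sym_b: "B"}))

raw = "..o.o .o.oo\n..... ..oo."
stream, (s1, s2) = normalise_stream(raw)
# The order returned says nothing about which symbol means A; test both.
print(to_ab(stream, s1, s2))
print(to_ab(stream, s2, s1))
```

Each printed stream can then be passed to a Bacon decoder; only one of the two assignments should yield readable text.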

NICE Framework Alignment

Code | Knowledge/Skill/Task Statement | How This Card Develops It
K0018 | Knowledge of encryption algorithms used to protect data during transmission | Grounds learner in a historically significant binary substitution scheme that precedes modern binary encoding
K0019 | Knowledge of cryptography and key management concepts | Introduces steganographic concealment as a conceptual precursor to modern covert channel techniques
K0305 | Knowledge of encryption standards and various encryption algorithms | Contextualises Bacon within the evolution from classical ciphers to binary encoding standards
S0138 | Skill in using defensive coding practices | Develops careful boundary and variant handling in decoder implementations
T0212 | Perform penetration testing as required to evaluate information security | Builds skill in recognising hidden-data patterns in document files and formatted text

Further Reading

  • The Advancement of Learning and New Atlantis — Francis Bacon (1605), containing the original cipher description
  • The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography — Simon Singh, Fourth Estate
  • Disappearing Cryptography: Information Hiding Steganography and Watermarking — Peter Wayner, Morgan Kaufmann
