Bacon cipher
Theory
Why This Matters
The Bacon cipher was designed by Francis Bacon in 1605 not primarily as a cipher but as a steganographic system — a method of hiding the very existence of a secret message inside an innocent-looking carrier text. Bacon's original application used two typefaces (roman and italic) to mark letters as A or B, allowing a message to be concealed in any printed text. This property — hiding data in plain sight through typographic variation — makes it a recurring pattern in CTF challenges that involve formatted documents, HTML with mixed font weights, or PDF files with inconsistent styling. Understanding Bacon's mechanism is also foundational to understanding modern binary-symbol steganography, including zero-width character encodings and whitespace steganography.
Core Concept
In the Bacon cipher, each letter of the plaintext alphabet is represented as a five-character sequence drawn from a two-symbol alphabet, conventionally A and B. In Bacon's original 24-letter variant, I and J share one code, and U and V share another (following Renaissance practice). The modern 26-letter variant assigns a unique 5-bit code to each letter from A to Z.
The mapping is positional: A=AAAAA (00000), B=AAAAB (00001), C=AAABA (00010), … Z=BBAAB (11001) in the 26-letter variant. Each code is the letter's zero-based alphabet index written as a 5-bit binary number, with A=0 and B=1. Five bits give 32 combinations, so the six codes from BBABA (11010) to BBBBB (11111) are unused. Any two-symbol alphabet can substitute for A/B: bold vs. normal text, uppercase vs. lowercase, dots vs. dashes, 0 vs. 1. In Bacon's original steganographic use, the cover text had to contain at least five times as many characters as the hidden message, because each cover character carries only one A/B symbol and each hidden letter needs five of them.
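The positional mapping can be sketched in a few lines of Python. The function name `encode_bacon26` and its parameters are assumptions made for this illustration, not part of any library. The sketch covers only the 26-letter variant, expects single-character symbols, and skips anything that is not a letter.

```python
import string

def encode_bacon26(plaintext: str, sym_a: str = "A", sym_b: str = "B") -> str:
    """Encode letters as 5-symbol groups (26-letter variant); non-letters are skipped."""
    # Translate both bit values in one pass so any symbol pair works,
    # including pairs that are themselves "0" and "1" in swapped order
    table = str.maketrans("01", sym_a + sym_b)
    groups = []
    for ch in plaintext.upper():
        if ch in string.ascii_uppercase:
            index = ord(ch) - ord("A")       # A=0 ... Z=25
            bits = format(index, "05b")      # e.g. 5 -> "00101"
            groups.append(bits.translate(table))
    return " ".join(groups)

print(encode_bacon26("FLAG"))          # AABAB ABABB AAAAA AABBA
print(encode_bacon26("Z", "0", "1"))   # 11001
```

The groups are joined with spaces for readability; a real carrier would normally drop the separators.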
Technical Deep-Dive
26-letter Bacon alphabet (A=0, B=1 in 5-bit binary):
A AAAAA B AAAAB C AAABA D AAABB E AABAA
F AABAB G AABBA H AABBB I ABAAA J ABAAB
K ABABA L ABABB M ABBAA N ABBAB O ABBBA
P ABBBB Q BAAAA R BAAAB S BAABA T BAABB
U BABAA V BABAB W BABBA X BABBB Y BBAAA
Z BBAAB
24-letter variant: I=J=ABAAA, U=V=BAABB (codes after I shift down by one and codes after U by two, so K=ABAAB and Z=BABBB)
Example encode of "FLAG":
F = AABAB
L = ABABB
A = AAAAA
G = AABBA
Concatenated: AABABABABBAAAAAAABBA
BACON26 = {
    "AAAAA": "A", "AAAAB": "B", "AAABA": "C", "AAABB": "D",
    "AABAA": "E", "AABAB": "F", "AABBA": "G", "AABBB": "H",
    "ABAAA": "I", "ABAAB": "J", "ABABA": "K", "ABABB": "L",
    "ABBAA": "M", "ABBAB": "N", "ABBBA": "O", "ABBBB": "P",
    "BAAAA": "Q", "BAAAB": "R", "BAABA": "S", "BAABB": "T",
    "BABAA": "U", "BABAB": "V", "BABBA": "W", "BABBB": "X",
    "BBAAA": "Y", "BBAAB": "Z",
}

def decode_bacon(text: str, sym_a: str = "A", sym_b: str = "B") -> str:
    """Decode a 26-letter Bacon stream; groups with no mapping become '?'."""
    # Keep only the two carrier symbols, which also drops spaces and delimiters.
    # Matching is case-sensitive so that pairs such as "o"/"O" stay distinct;
    # pass text.upper() for lowercase a/b input.
    symbols = "".join(ch for ch in text if ch in (sym_a, sym_b))
    # Translate both symbols in one pass so a reversed assignment
    # (sym_a="B", sym_b="A") is not clobbered by chained replace() calls
    symbols = symbols.translate(str.maketrans(sym_a + sym_b, "AB"))
    groups = [symbols[i:i+5] for i in range(0, len(symbols), 5)]
    return "".join(BACON26.get(g, "?") for g in groups)

# Numeric 0/1 variant
def decode_bacon_binary(bitstring: str) -> str:
    return decode_bacon(bitstring, sym_a="0", sym_b="1")

print(decode_bacon("AABABABABBAAAAAAABBA"))  # FLAG
print(decode_bacon_binary("00101010110000000110"))  # FLAG
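# Steganographic extraction: bold=B, normal=A in HTML
from bs4 import BeautifulSoup

def extract_bacon_from_html(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    bits = ""
    # Walk the text nodes instead of searching per character, so each
    # character is classified by the tags that actually enclose it
    for node in soup.find_all(string=True):
        # Bold if any ancestor is <b> or <strong> (an assumption about the markup)
        is_bold = node.find_parent(["b", "strong"]) is not None
        for char in node:
            if not char.isspace():
                bits += "B" if is_bold else "A"
    return decode_bacon(bits)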
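Letter case is another common carrier and needs no HTML parser. The sketch below assumes lowercase = A (0) and uppercase = B (1), uses only ASCII letters as carriers, and requires the caller to know how many hidden letters to read. `hide_in_case` and `extract_from_case` are hypothetical helper names used only for this illustration.

```python
import string

ALPHABET = string.ascii_uppercase

def hide_in_case(secret: str, cover: str) -> str:
    """Carry `secret` in the letter case of `cover` (26-letter variant)."""
    bits = "".join(
        format(ALPHABET.index(c), "05b") for c in secret.upper() if c in ALPHABET
    )
    carriers = sum(ch in string.ascii_letters for ch in cover)
    if carriers < len(bits):
        raise ValueError("cover text needs at least 5 letters per hidden letter")
    out, used = [], 0
    for ch in cover:
        if ch in string.ascii_letters and used < len(bits):
            out.append(ch.upper() if bits[used] == "1" else ch.lower())
            used += 1
        else:
            out.append(ch)  # punctuation, spaces, and surplus letters pass through
    return "".join(out)

def extract_from_case(stego: str, n_letters: int) -> str:
    """Read the first `n_letters` hidden letters back out of the letter case."""
    bits = "".join(
        "1" if ch.isupper() else "0" for ch in stego if ch in string.ascii_letters
    )[: 5 * n_letters]
    values = [int(bits[i:i + 5], 2) for i in range(0, len(bits), 5)]
    return "".join(ALPHABET[v] if v < 26 else "?" for v in values)

stego = hide_in_case("FLAG", "the quick brown fox jumps over the lazy dog")
print(stego)                        # thE qUiCk BRown fox jUMps over the lazy dog
print(extract_from_case(stego, 4))  # FLAG
```

Copying the carrier text preserves case, so this form of hiding survives copy-paste, unlike font-weight or colour signals.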
Analytical Methodology
- Identify the two-symbol alphabet. If input contains only A and B (or 0 and 1, or two consistently used symbols) arranged in groups of 5, suspect Bacon. Verify that total character count is divisible by 5.
- Test both 24-letter and 26-letter variants. Decode with 26-letter first. If the output contains unexpected `?` characters or the message looks garbled near I/J or U/V positions, switch to 24-letter (merge I=J and U=V).
- Check for symbol reversal. If decoding with A=0, B=1 yields nonsense, try A=1, B=0 (reversed assignment). Some CTF authors invert the standard.
- Look for typographic steganography in formatted files. Open PDFs or Word documents in a text editor or parser. Extract per-character font weight, case, or colour. Map bold/italic → B, normal → A, then decode.
- Use CyberChef. The "Bacon Cipher Decode" operation selects the symbol pair (such as A/B or 0/1) through its "Translation" option and switches between the 24-letter and 26-letter alphabets through its "Alphabet" option.
- Verify output length. A Bacon-encoded message of N plaintext characters requires exactly 5N symbols. If the symbol stream length is not a multiple of 5, either padding is present or the encoding is not pure Bacon.
Common Analytical Errors
- Confusing 24-letter and 26-letter variants. Letters up to I decode identically in both variants. With the wrong variant, letters after I are shifted by one position and letters after U by two, producing a partially garbled output that might be mistaken for a Caesar shift.
- Missing typographic steganography. When Bacon is embedded in formatted text, the message is invisible to copy-paste — the A/B signal is in formatting attributes, not character values. Always inspect source formatting when a document challenge yields no obvious encoding in the raw text.
- Treating the 5-bit groups as standard binary. AAAAB decodes to the letter B (alphabet index 1), not to the number 1 or to a byte value. Analysts who regroup the stream into 8-bit bytes or read it as ASCII will get the wrong output even though each group is a valid binary number.
- Incorrect symbol identification in custom variants. A challenge may use any two symbols (e.g., `.` and `o`, or `0` and `O`). Failure to identify which symbol maps to A vs. B is the primary failure mode. Test both assignments.
- Off-by-one in group boundary detection. If the encoded stream has whitespace or delimiter characters, stripping them before grouping is essential. Failing to normalise whitespace shifts all subsequent group boundaries.
- Assuming the cover text has semantic meaning. In Bacon steganography, the carrier text is a dummy — it can be any text of sufficient length. Spending time analysing cover text meaning is unproductive.
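The variant-confusion error in the first item is easy to reproduce. The sketch below encodes with the 24-letter alphabet and decodes with both; the helper names `encode` and `decode` and the sample word are assumptions chosen for illustration.

```python
import string

ALPHABET_26 = string.ascii_uppercase
ALPHABET_24 = "ABCDEFGHIKLMNOPQRSTUWXYZ"   # J folded into I, V folded into U

def encode(text: str, alphabet: str) -> str:
    """Return the bit string for `text` using the index of each letter in `alphabet`."""
    folded = text.upper()
    if len(alphabet) == 24:
        folded = folded.replace("J", "I").replace("V", "U")
    return "".join(format(alphabet.index(c), "05b") for c in folded if c in alphabet)

def decode(bits: str, alphabet: str) -> str:
    """Map each 5-bit group back to a letter; '?' marks an index outside `alphabet`."""
    values = (int(bits[i:i + 5], 2) for i in range(0, len(bits), 5))
    return "".join(alphabet[v] if v < len(alphabet) else "?" for v in values)

stream = encode("HELLOWORLD", ALPHABET_24)
print(decode(stream, ALPHABET_24))  # HELLOWORLD
print(decode(stream, ALPHABET_26))  # HEKKNUNQKD
```

H, E, and D come through intact, L, O, and R each drop by one letter, and W drops by two. That mix of correct and shifted letters is the signature of a variant mismatch; a Caesar shift would move every letter by the same amount.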
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0018 | Knowledge of encryption algorithms used to protect data during transmission | Grounds learner in a historically significant binary substitution scheme that precedes modern binary encoding |
| K0019 | Knowledge of cryptography and key management concepts | Introduces steganographic concealment as a conceptual precursor to modern covert channel techniques |
| K0305 | Knowledge of encryption standards and various encryption algorithms | Contextualises Bacon within the evolution from classical ciphers to binary encoding standards |
| S0138 | Skill in using defensive coding practices | Develops careful boundary and variant handling in decoder implementations |
| T0212 | Perform penetration testing as required to evaluate information security | Builds skill in recognising hidden-data patterns in document files and formatted text |
Further Reading
- The Advancement of Learning and New Atlantis, Francis Bacon. The Advancement of Learning (1605) introduces the cipher; the full biliteral alphabet is set out in the expanded Latin edition, De Augmentis Scientiarum (1623)
- The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography, Simon Singh, Fourth Estate
- Disappearing Cryptography: Information Hiding, Steganography and Watermarking, Peter Wayner, Morgan Kaufmann