Baudot encoding
Theory
Why This Matters
Baudot code, invented by Émile Baudot in 1870, was standardised as International Telegraph Alphabet No. 1 (ITA1); its successor, the Murray-derived International Telegraph Alphabet No. 2 (ITA2), was the dominant encoding for electromechanical teleprinters (teletypes) through most of the twentieth century. ITA2 underpins RTTY (Radio Teletype) transmissions still audible on shortwave frequencies, and 5-bit teleprinter code was used as a pre-encryption layer before cipher machines in both World Wars: notably, the Lorenz SZ40/42 cipher machine operated on ITA2 streams. In CTF challenges, Baudot/ITA2 appears as 5-bit binary sequences, sometimes disguised as octal (00–37) or decimal (0–31) values. The key analytical difficulty is the shift-state dependency: the same 5-bit code produces a different character depending on whether the decoder is currently in LETTERS or FIGURES mode.
Core Concept
ITA2 uses 5 bits to encode characters, giving 32 possible code points (0–31). Because 32 code points are insufficient for letters, digits, and punctuation together, ITA2 uses two shift states: LTRS (code 31, 11111) switches the decoder into letters mode, and FIGS (code 27, 11011) switches it into figures mode. In letters mode, code 00011 (decimal 3, written MSB first) decodes to "A"; in figures mode, the same code decodes to "-". This roughly doubles the character set, to about 50 printable characters plus control codes.
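The shift mechanism can be illustrated with a minimal sketch. It uses only a three-entry excerpt of each page; the names `LETTERS_PAGE`, `FIGURES_PAGE`, and `decode_codes` are illustrative, and the sketch assumes the stream starts in letters mode.

```python
# Three-entry excerpt of each ITA2 page, keyed by 5-bit code (MSB first).
LETTERS_PAGE = {0b00011: "A", 0b00001: "E", 0b10000: "T"}
FIGURES_PAGE = {0b00011: "-", 0b00001: "3", 0b10000: "5"}
LTRS_SHIFT, FIGS_SHIFT = 0b11111, 0b11011


def decode_codes(codes):
    """Decode a list of integer codes, switching page on each shift code."""
    page = LETTERS_PAGE  # assumption: the stream starts in letters mode
    out = []
    for code in codes:
        if code == LTRS_SHIFT:
            page = LETTERS_PAGE
        elif code == FIGS_SHIFT:
            page = FIGURES_PAGE
        else:
            out.append(page.get(code, "?"))  # "?" marks codes outside this excerpt
    return "".join(out)


# Code 00011 appears three times but prints differently after each shift.
print(decode_codes([0b00011, FIGS_SHIFT, 0b00011, LTRS_SHIFT, 0b00011]))  # A-A
```

The shift codes themselves never produce output; they only change which page the following codes are looked up in.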
Special codes include: NULL (00000), DEL/RUBOUT (11111 in some variants), CR (01000 = carriage return), LF (00010 = line feed), SPACE (00100). The five bits are conventionally transmitted LSB first in hardware, but CTF problems often present them MSB first — this distinction is a common source of confusion.
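Switching between the two bit-order conventions amounts to reversing each 5-bit group. The helpers below are a hedged sketch; `reverse_group` and `reverse_stream` are illustrative names, and the input is assumed to be a string of `0`/`1` characters with optional whitespace.

```python
def reverse_group(code: int, width: int = 5) -> int:
    """Reverse the bit order of a `width`-bit code (MSB-first <-> LSB-first)."""
    return int(format(code, f"0{width}b")[::-1], 2)


def reverse_stream(bits: str) -> str:
    """Reverse every complete 5-bit group in a string of '0'/'1' characters."""
    clean = "".join(bits.split())
    usable = len(clean) - len(clean) % 5  # drop a trailing partial group
    return " ".join(clean[i:i + 5][::-1] for i in range(0, usable, 5))


print(format(reverse_group(0b00011), "05b"))  # 11000
print(reverse_stream("00011 00001 11011"))    # 11000 10000 11011
```

Palindromic codes such as LTRS (11111), FIGS (11011), and SPACE (00100) are unchanged by the reversal, which is why a wrong bit order can still yield output with plausible word spacing.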
Technical Deep-Dive
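ITA2 table (5-bit codes written MSB first; the normative table is in ITU-T S.1). Code 11011 shifts to figures and code 11111 shifts to letters. The FIGS entries for F, G, and H (!, &, #) are national-use positions shown with common US-TTY values; US-TTY also prints ; instead of = for V.

| Code | LTRS | FIGS | Code | LTRS | FIGS |
|---|---|---|---|---|---|
| 00000 | NUL | NUL | 10000 | T | 5 |
| 00001 | E | 3 | 10001 | Z | + |
| 00010 | LF | LF | 10010 | L | ) |
| 00011 | A | - | 10011 | W | 2 |
| 00100 | SPC | SPC | 10100 | H | # |
| 00101 | S | ' | 10101 | Y | 6 |
| 00110 | I | 8 | 10110 | P | 0 |
| 00111 | U | 7 | 10111 | Q | 1 |
| 01000 | CR | CR | 11000 | O | 9 |
| 01001 | D | WRU (ENQ, "who are you") | 11001 | B | ? |
| 01010 | R | 4 | 11010 | G | & |
| 01011 | J | BELL | 11011 | FIGS | FIGS |
| 01100 | N | , | 11100 | M | . |
| 01101 | F | ! | 11101 | X | / |
| 01110 | C | : | 11110 | V | = |
| 01111 | K | ( | 11111 | LTRS | LTRS |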
```python
# ITA2 decode with shift-state tracking (codes written MSB first)
LTRS = {
    0b00001: "E", 0b00011: "A", 0b00100: " ", 0b00101: "S",
    0b00110: "I", 0b00111: "U", 0b01000: "\r", 0b01001: "D",
    0b01010: "R", 0b01011: "J", 0b01100: "N", 0b01101: "F",
    0b01110: "C", 0b01111: "K", 0b10000: "T", 0b10001: "Z",
    0b10010: "L", 0b10011: "W", 0b10100: "H", 0b10101: "Y",
    0b10110: "P", 0b10111: "Q", 0b11000: "O", 0b11001: "B",
    0b11010: "G", 0b11100: "M", 0b11101: "X", 0b11110: "V",
    0b00010: "\n",
}
# "\x05" is ENQ (WRU, "who are you") and "\a" is BELL.
# "!", "&" and "#" fill national-use positions (US-TTY values shown).
FIGS = {
    0b00001: "3", 0b00011: "-", 0b00100: " ", 0b00101: "'",
    0b00110: "8", 0b00111: "7", 0b01000: "\r", 0b01001: "\x05",
    0b01010: "4", 0b01011: "\a", 0b01100: ",", 0b01101: "!",
    0b01110: ":", 0b01111: "(", 0b10000: "5", 0b10001: "+",
    0b10010: ")", 0b10011: "2", 0b10100: "#", 0b10101: "6",
    0b10110: "0", 0b10111: "1", 0b11000: "9", 0b11001: "?",
    0b11010: "&", 0b11100: ".", 0b11101: "/", 0b11110: "=",
    0b00010: "\n",
}
LTRS_SHIFT, FIGS_SHIFT = 0b11111, 0b11011


def decode_ita2(bits: str) -> str:
    """Decode a string of '0'/'1' characters (whitespace ignored) as ITA2."""
    bits = "".join(bits.split())
    # Ignore a trailing partial group rather than decoding it as a short code.
    usable = len(bits) - len(bits) % 5
    groups = [int(bits[i:i + 5], 2) for i in range(0, usable, 5)]
    result, mode = [], LTRS  # assume the stream starts in letters mode
    for code in groups:
        if code == LTRS_SHIFT:
            mode = LTRS
        elif code == FIGS_SHIFT:
            mode = FIGS
        elif code in mode:  # NUL (00000) is unmapped and produces no output
            result.append(mode[code])
    return "".join(result)

# CyberChef: "Baudot" operation
#   Settings: ITA2, MSB/LSB first (match challenge convention)
# dcode.fr: "Baudot Code" decoder (handles ITA1 and ITA2)
```
Analytical Methodology
- Recognise 5-bit groupings. Input is typically a sequence of 5-bit binary strings (e.g., `00001 11111 00101`), octal values 00–37 or decimal values 0–31, or a continuous binary stream whose length is divisible by 5. Distinguish from Bacon cipher (also 5-bit) by the presence of shift codes (11111, 11011) and the ITA2 character distribution.
- Determine bit order. Try an MSB-first decode first (most common in CTF problems). If the output is garbled but contains correct-looking fragments, try LSB-first (matching hardware convention).
- Track shift states. A correct ITA2 decoder must carry shift state across the entire message. Stateless per-character lookup is incorrect and will mis-decode all characters after a FIGS shift. Always implement or use a stateful decoder.
- Use CyberChef Baudot operation. Select ITA2 variant; choose MSB or LSB first; confirm output is printable text. If output contains unexpected control characters (BEL, ENQ), these may be FIGS-mode punctuation mappings — cross-check the full ITA2 table.
- Distinguish ITA1 from ITA2. ITA1 (original Baudot, also called CCITT-1) has a different code table. If ITA2 decoding produces mostly wrong characters but the structure is clearly 5-bit, try ITA1.
- Verify with known prefixes. If the flag format is known (e.g., `CTF{`), encode its letters in ITA2 (the brace itself has no ITA2 code point) and compare against the beginning of the challenge stream to confirm the bit-order and shift-state assumptions.
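The known-prefix check can be sketched as follows. The helper names `encode_letters` and `matching_bit_orders` are illustrative, the table covers only the letters page (MSB first), and the search looks for the prefix anywhere in the stream because leading shift or NUL codes may precede it.

```python
# Letters-page codes, written MSB first.
LETTER_CODES = {
    "A": 0b00011, "B": 0b11001, "C": 0b01110, "D": 0b01001, "E": 0b00001,
    "F": 0b01101, "G": 0b11010, "H": 0b10100, "I": 0b00110, "J": 0b01011,
    "K": 0b01111, "L": 0b10010, "M": 0b11100, "N": 0b01100, "O": 0b11000,
    "P": 0b10110, "Q": 0b10111, "R": 0b01010, "S": 0b00101, "T": 0b10000,
    "U": 0b00111, "V": 0b11110, "W": 0b10011, "X": 0b11101, "Y": 0b10101,
    "Z": 0b10001,
}


def encode_letters(text: str) -> list:
    """Encode the letters of `text` as 5-bit strings; other characters are skipped."""
    return [format(LETTER_CODES[ch], "05b") for ch in text.upper() if ch in LETTER_CODES]


def matching_bit_orders(bits: str, prefix: str) -> list:
    """Return the bit orders under which `prefix` occurs as a run of groups."""
    clean = "".join(bits.split())
    usable = len(clean) - len(clean) % 5
    groups = [clean[i:i + 5] for i in range(0, usable, 5)]
    want = encode_letters(prefix)
    hits = []
    for name, seq in (("MSB-first", groups), ("LSB-first", [g[::-1] for g in groups])):
        if any(seq[i:i + len(want)] == want for i in range(len(seq) - len(want) + 1)):
            hits.append(name)
    return hits


print(matching_bit_orders("11111 01110 10000 01101", "CTF{"))  # ['MSB-first']
```

An empty result suggests the prefix is absent, is encoded in a different table such as ITA1, or is preceded by another transformation.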
Common Analytical Errors
- Ignoring LTRS/FIGS shift codes. Treating shift codes as printable characters rather than mode switches produces garbled output. Every ITA2 decoder must handle these two control codes specially.
- Confusing MSB and LSB bit order. Hardware transmission is LSB-first, but CTF presentations are usually MSB-first. Silently assuming the wrong order reverses the bits of every group, so all but the palindromic codes (such as 00100, 11011, and 11111) decode to the wrong character.
- Mistaking ITA1 for ITA2. The two standards have different code tables. ITA2 is far more common in CTF but ITA1 appears in challenges themed around early telegraphy. The main table differences are in punctuation and special characters.
- Conflating Baudot with Bacon cipher. Both use 5-bit groups. The distinguishing features are: Baudot has only 32 distinct codes (0–31) with many in the control range; Bacon uses only A/B and maps directly to letters without shift states.
- Dropping leading zeros in 5-bit groups. The binary representation of code 1 is `00001`, not `1`. Parsing that treats each group as a variable-length integer will misalign all subsequent group boundaries.
- Forgetting NULL codes. Code 00000 is NULL and produces no output. If it appears in the stream, it must still be read as a full 5-bit group and then discarded; removing those bits before grouping misaligns everything after them.
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0018 | Knowledge of encryption algorithms used to protect data during transmission | Connects learner to the teleprinter encoding layer that preceded cipher machines in historical communications security |
| K0019 | Knowledge of cryptography and key management concepts | Establishes understanding of stateful encoding, directly analogous to stream cipher state machines |
| K0305 | Knowledge of encryption standards and various encryption algorithms | Positions ITA2 within the history of telegraph and telex encoding standards |
| S0138 | Skill in using defensive coding practices | Develops careful stateful decoder implementation with explicit shift-state tracking |
| T0212 | Perform penetration testing as required to evaluate information security | Builds recognition of teleprinter encoding patterns in RTTY signal captures and historical cipher artifacts |
Further Reading
- ITU-T Recommendation S.1 — International Telegraph Alphabet No. 2 — International Telecommunication Union
- Seizing the Enigma: The Race to Break the German U-Boats' Codes — David Kahn, Houghton Mifflin
- The Hut Six Story: Breaking the Enigma Codes — Gordon Welchman, McGraw-Hill