Baudot encoding

log_analysis_siem Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Baudot code was invented by Émile Baudot in 1870. Its Murray-derived successor, standardised as the International Telegraph Alphabet No. 2 (ITA2) and still commonly called "Baudot", was the dominant encoding for electromechanical teleprinters (teletypes) through most of the twentieth century. It underpins RTTY (Radio Teletype) transmissions still audible on shortwave frequencies, and it served as the pre-encryption layer for teleprinter cipher machines; notably, the Lorenz SZ40/42 cipher machine of the Second World War operated on ITA2 streams. In CTF challenges, Baudot/ITA2 appears as 5-bit binary sequences, sometimes disguised as decimal (0–31) or octal (0–37) values. The key analytical difficulty is the shift-state dependency: the same 5-bit code produces a different character depending on whether the decoder is currently in LETTERS or FIGURES mode.

Core Concept

ITA2 uses 5 bits to encode characters, giving 32 possible code points (0–31). Because 32 code points are insufficient for letters, digits, and punctuation together, ITA2 uses two shift states: LTRS (code 31, 11111) switches the decoder into letters mode, and FIGS (code 27, 11011) switches it into figures mode. In letters mode, code 00011 (decimal 3) decodes to "A"; in figures mode, the same code decodes to "-". This effectively doubles the character set to roughly 50 printable characters plus control codes.
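
A minimal sketch of the shift-state dependency, using a deliberately tiny table that holds only the A/- and E/3 pairs from the ITA2 table. The names `MINI_LTRS`, `MINI_FIGS`, and `decode_demo` are illustrative, and starting in letters mode is an assumption rather than something the standard guarantees for every stream.

```python
# Tiny illustrative tables: just two code points, written MSB first.
MINI_LTRS = {0b00001: "E", 0b00011: "A"}
MINI_FIGS = {0b00001: "3", 0b00011: "-"}
LTRS_SHIFT, FIGS_SHIFT = 0b11111, 0b11011


def decode_demo(codes):
    """Decode a list of integer codes, carrying the shift state along."""
    table, out = MINI_LTRS, []  # assumption: the decoder starts in letters mode
    for code in codes:
        if code == LTRS_SHIFT:
            table = MINI_LTRS
        elif code == FIGS_SHIFT:
            table = MINI_FIGS
        else:
            out.append(table.get(code, "?"))  # "?" marks codes outside the mini table
    return "".join(out)


# Code 00011 appears three times but decodes differently around the shifts.
print(decode_demo([0b00011, 0b11011, 0b00011, 0b00001, 0b11111, 0b00011]))  # A-3A
```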

Special codes include NULL (00000), CR (01000, carriage return), LF (00010, line feed), and SPACE (00100); these decode identically in both modes. On paper tape, 11111 (LTRS) also served as DEL/RUBOUT, because punching all five holes overwrites any character. The five bits are conventionally transmitted LSB first in hardware, but CTF problems often present them MSB first; this distinction is a common source of confusion.
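
Switching between the two bit orders is just a per-group reversal. The sketch below assumes the stream is a string of 0/1 characters with optional whitespace; `reverse_group` and `reverse_stream` are hypothetical helper names.

```python
def reverse_group(group: str) -> str:
    """Reverse the bit order of one 5-bit group given as a string of 0s and 1s."""
    if len(group) != 5 or set(group) - {"0", "1"}:
        raise ValueError(f"not a 5-bit group: {group!r}")
    return group[::-1]


def reverse_stream(bits: str) -> str:
    """Reverse every complete 5-bit group in a stream; whitespace is ignored."""
    clean = "".join(bits.split())
    usable = len(clean) - len(clean) % 5  # drop a trailing partial group
    groups = [clean[i:i + 5] for i in range(0, usable, 5)]
    return " ".join(reverse_group(g) for g in groups)


# E (00001) and A (00011) change; FIGS (11011) and LTRS (11111) are palindromes.
print(reverse_stream("00001 00011 11011 11111"))  # 10000 11000 11011 11111
```

Because both shift codes read the same in either direction, spotting 11011 or 11111 in a stream confirms nothing about the bit order; only the non-palindromic codes do.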

Technical Deep-Dive

ITA2 Table (5-bit code written MSB first: LTRS mode / FIGS mode):
Code   LTRS  FIGS     Code   LTRS  FIGS
00000  NUL   NUL      10000  T     5
00001  E     3        10001  Z     +
00010  LF    LF       10010  L     )
00011  A     -        10011  W     2
00100  SPC   SPC      10100  H     #
00101  S     APOS     10101  Y     6
00110  I     8        10110  P     0
00111  U     7        10111  Q     1
01000  CR    CR       11000  O     9
01001  D     WRU      11001  B     ?
01010  R     4        11010  G     &
01011  J     BEL      11011  FIGS  FIGS   ← shift to figures
01100  N     ,        11100  M     .
01101  F     !        11101  X     /
01110  C     :        11110  V     ;
01111  K     (        11111  LTRS  LTRS   ← shift to letters
(WRU = "who are you", i.e. ENQ. Some FIGS assignments vary by national variant; the authoritative table is in ITU-T S.1.)
# ITA2 decode with shift-state tracking (codes written MSB first)
LTRS = {
    0b00001: "E", 0b00011: "A", 0b00100: " ", 0b00101: "S",
    0b00110: "I", 0b00111: "U", 0b01000: "\r", 0b01001: "D",
    0b01010: "R", 0b01011: "J", 0b01100: "N", 0b01101: "F",
    0b01110: "C", 0b01111: "K", 0b10000: "T", 0b10001: "Z",
    0b10010: "L", 0b10011: "W", 0b10100: "H", 0b10101: "Y",
    0b10110: "P", 0b10111: "Q", 0b11000: "O", 0b11001: "B",
    0b11010: "G", 0b11100: "M", 0b11101: "X", 0b11110: "V",
    0b00010: "\n",
}
# "\x05" is ENQ (WHO ARE YOU) and "\a" is BEL. Some figure assignments
# differ between national variants; check the challenge's table.
FIGS = {
    0b00001: "3", 0b00011: "-", 0b00100: " ", 0b00101: "'",
    0b00110: "8", 0b00111: "7", 0b01000: "\r", 0b01001: "\x05",
    0b01010: "4", 0b01011: "\a", 0b01100: ",", 0b01101: "!",
    0b01110: ":", 0b01111: "(", 0b10000: "5", 0b10001: "+",
    0b10010: ")", 0b10011: "2", 0b10100: "#", 0b10101: "6",
    0b10110: "0", 0b10111: "1", 0b11000: "9", 0b11001: "?",
    0b11010: "&", 0b11100: ".", 0b11101: "/", 0b11110: ";",
    0b00010: "\n",
}
LTRS_SHIFT, FIGS_SHIFT = 0b11111, 0b11011

def decode_ita2(bits: str) -> str:
    """Decode a string of 0/1 characters (whitespace ignored), MSB first."""
    clean = "".join(bits.split())
    # Ignore a trailing partial group rather than misreading it as a short code.
    usable = len(clean) - len(clean) % 5
    groups = [int(clean[i:i + 5], 2) for i in range(0, usable, 5)]
    result, mode = [], LTRS  # assume the stream starts in letters mode
    for code in groups:
        if code == LTRS_SHIFT:
            mode = LTRS
        elif code == FIGS_SHIFT:
            mode = FIGS
        elif code in mode:  # NUL (00000) and unmapped codes are dropped
            result.append(mode[code])
    return "".join(result)
# CyberChef: "Baudot" operation
# Settings: ITA2, MSB/LSB first (match challenge convention)
# dcode.fr: "Baudot Code" decoder — handles ITA1 and ITA2

Analytical Methodology

  1. Recognise 5-bit groupings. Input is typically a sequence of 5-bit binary strings (e.g., 00001 11111 00101), decimal values 0–31 (octal 0–37), or a continuous binary stream whose length is divisible by 5. Distinguish from Bacon cipher (also 5-bit) by the presence of shift codes (11111, 11011), which Bacon never uses, and the ITA2 character distribution.
  2. Determine bit order. Try MSB-first decode first (most common in CTF problems). If output is garbled but contains correct-looking fragments, try LSB-first (matching hardware convention).
  3. Track shift states. A correct ITA2 decoder must carry shift state across the entire message. Stateless per-character lookup is incorrect and will mis-decode every character that follows a FIGS shift until the next LTRS shift. Always implement or use a stateful decoder.
  4. Use CyberChef Baudot operation. Select ITA2 variant; choose MSB or LSB first; confirm output is printable text. If output contains unexpected control characters (BEL, ENQ), these may be genuine FIGS-mode assignments (BEL shares a code with J, ENQ with D) rather than decoding errors; cross-check the full ITA2 table.
  5. Distinguish ITA1 from ITA2. ITA1 (original Baudot, also called CCITT-1) has a different code table. If ITA2 decoding produces mostly wrong characters but the structure is clearly 5-bit, try ITA1.
  6. Verify with known prefixes. If the flag format is known (e.g., CTF{), encode its letters in ITA2 and compare against the beginning of the challenge stream to confirm the bit order and shift state assumptions. ITA2 has no lowercase letters or braces, so only the letter portion of the prefix can match directly.
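
The known-prefix check in step 6 can be sketched as a crib search over both bit orders. This is an illustration under stated assumptions: `LETTERS`, `encode_letters`, and `find_crib` are hypothetical names, the crib contains letters only, and matches are only tried on 5-bit boundaries.

```python
# Letters-mode ITA2 table indexed by MSB-first code. "_" marks codes that are
# not plain letters (NUL, LF, space, CR, FIGS, LTRS).
LETTERS = "_E_A_SIU_DRJNFCKTZLWHYPQOBG_MXV_"
ENCODE = {ch: code for code, ch in enumerate(LETTERS) if ch != "_"}


def encode_letters(text):
    """Return the MSB-first 5-bit group for each letter of text."""
    groups = []
    for ch in text.upper():
        if ch not in ENCODE:
            raise ValueError(f"{ch!r} has no letters-mode ITA2 code")
        groups.append(format(ENCODE[ch], "05b"))
    return groups


def find_crib(stream, crib):
    """Return (bit_order, group_index) for every aligned match of crib."""
    clean = "".join(stream.split())
    groups = encode_letters(crib)
    hits = []
    for order in ("msb", "lsb"):
        pattern = "".join(g if order == "msb" else g[::-1] for g in groups)
        for start in range(0, len(clean) - len(pattern) + 1, 5):
            if clean[start:start + len(pattern)] == pattern:
                hits.append((order, start // 5))
    return hits


# A stream that opens with LTRS followed by "CTF" in MSB-first order.
print(find_crib("11111 01110 10000 01101", "CTF"))  # [('msb', 1)]
```

A hit in only one bit order is good evidence for that order. A crib made entirely of palindromic codes would match in both, so choose one that contains asymmetric letters such as T or F.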

Common Analytical Errors

  • Ignoring LTRS/FIGS shift codes. Treating shift codes as printable characters rather than mode switches produces garbled output. Every ITA2 decoder must handle these two control codes specially.
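  • Confusing MSB and LSB bit order. Hardware transmission is LSB-first, but CTF presentations are usually MSB-first. Assuming the wrong order maps every non-palindromic code to a different code point, while palindromic codes such as LTRS (11111), FIGS (11011), and SPACE (00100) look identical in both orders.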
  • Mistaking ITA1 for ITA2. The two standards have different code tables. ITA2 is far more common in CTF but ITA1 appears in challenges themed around early telegraphy. The main table differences are in punctuation and special characters.
  • Conflating Baudot with Bacon cipher. Both use 5-bit groups. The distinguishing features are: Baudot uses all 32 codes (0–31), including control and shift codes; Bacon uses only two symbols (A/B), needs at most 26 of the 32 patterns, and maps directly to letters without shift states.
  • Dropping leading zeros in 5-bit groups. Binary representation of code 1 is 00001, not 1. Incorrect parsing that treats each group as a variable-length integer will misalign all subsequent group boundaries.
  • Forgetting NULL codes. Code 00000 is NULL and produces no output. If it appears in the stream, it must still be read as a full 5-bit group and then discarded; removing its bits before grouping misaligns every group after it.
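
Two of the errors above, dropped leading zeros and Baudot/Bacon confusion, can be guarded against when the input is first normalised. The sketch below uses hypothetical helpers `to_groups` and `rules_out_bacon`, and assumes the common 26-letter Bacon alphabet that counts upward from 00000 for A to 11001 for Z.

```python
def to_groups(tokens: str, base: int = 2) -> list:
    """Turn whitespace-separated tokens into zero-padded 5-bit groups.

    Use base=2 for binary tokens, base=8 for octal, base=10 for decimal.
    """
    groups = []
    for tok in tokens.split():
        value = int(tok, base)
        if not 0 <= value <= 31:
            raise ValueError(f"{tok!r} is outside the 5-bit range 0-31")
        groups.append(format(value, "05b"))  # restores any dropped leading zeros
    return groups


def rules_out_bacon(groups: list) -> bool:
    """True if any group is a pattern the 26-letter Bacon alphabet never uses."""
    return any(int(g, 2) > 25 for g in groups)


print(to_groups("1 31 5", base=10))                # ['00001', '11111', '00101']
print(rules_out_bacon(to_groups("1 37 5", base=8)))  # True, because 37 octal is 11111
```

A `False` result from `rules_out_bacon` does not prove the input is Bacon; it only means no group contradicts that reading.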

NICE Framework Alignment

Code | Knowledge/Skill/Task Statement | How This Card Develops It
K0018 | Knowledge of encryption algorithms used to protect data during transmission | Connects learner to the teleprinter encoding layer that preceded cipher machines in historical communications security
K0019 | Knowledge of cryptography and key management concepts | Establishes understanding of stateful encoding, directly analogous to stream cipher state machines
K0305 | Knowledge of encryption standards and various encryption algorithms | Positions ITA2 within the history of telegraph and telex encoding standards
S0138 | Skill in using defensive coding practices | Develops careful stateful decoder implementation with explicit shift-state tracking
T0212 | Perform penetration testing as required to evaluate information security | Builds recognition of teleprinter encoding patterns in RTTY signal captures and historical cipher artifacts

Further Reading

  • ITU-T Recommendation S.1 — International Telegraph Alphabet No. 2 — International Telecommunication Union
  • Seizing the Enigma: The Race to Break the German U-Boats' Codes — David Kahn, Houghton Mifflin
  • The Hut Six Story: Breaking the Enigma Codes — Gordon Welchman, McGraw-Hill
