Morse Code Decoding: Timing Analysis, Delimiter Identification and Transcription Methodology

log_analysis_siem Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Morse code has appeared in covert communications contexts since its invention in the 1830s, and it remains a fixture in CTF challenges because it occupies a middle ground between "immediately obvious cipher" and "requires specialist knowledge." Real-world relevance includes embedded firmware that blinks error codes in Morse, clandestine radio operators transmitting intelligence, and steganographic audio tracks in malware samples. CTF authors frequently layer Morse inside other encodings (binary substitution, image metadata, WAV audio) precisely because analysts unfamiliar with the scheme waste time on the wrong decoding path. Understanding Morse recognition signatures prevents that time loss.

Core Concept

Each letter and digit in Morse is represented as a sequence of short signals (dots, written as .) and long signals (dashes, written as -). Timing is defined in dot units:

- A dot lasts one unit and a dash lasts three.
- The gap between elements inside one character lasts one unit.
- The gap between letters lasts three units.
- The gap between words lasts seven units.

Written transcriptions mirror these gaps. Letters are separated by a single space, and words by a forward slash (/) or a triple space. The standard in use today is International Morse Code (ITU-R M.1677), which covers the 26 Latin letters, 10 digits, and punctuation. American Morse (used historically on landline telegraph) represents several letters and digits differently. Notable examples are the letter L (.-.. in International, a single long dash in American) and the digit 5 (..... in International, --- in American).
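To make the unit-based timing concrete, the sketch below converts text into a list of (is_tone, duration) runs using the 1/3 element ratios and the 1/3/7 gap ratios. The function name `to_timing` and the small `CODES` subset are illustrative assumptions, not part of any standard tool. A common sanity check is the reference word PARIS: including its trailing word gap, it spans exactly 50 units.

```python
# Minimal sketch: encode text as (is_tone, duration_in_dot_units) runs.
# CODES is an illustrative subset covering only the letters used below.
CODES = {"P": ".--.", "A": ".-", "R": ".-.", "I": "..", "S": "...", "O": "---"}

def to_timing(text: str) -> list[tuple[bool, int]]:
    runs: list[tuple[bool, int]] = []
    for w_idx, word in enumerate(text.upper().split()):
        if w_idx:
            runs.append((False, 7))          # gap between words: 7 units
        for l_idx, letter in enumerate(word):
            if l_idx:
                runs.append((False, 3))      # gap between letters: 3 units
            for e_idx, element in enumerate(CODES[letter]):
                if e_idx:
                    runs.append((False, 1))  # gap inside a letter: 1 unit
                runs.append((True, 1 if element == "." else 3))  # dot 1, dash 3
    return runs

# "PARIS" plus its trailing 7-unit word gap is the classic 50-unit reference word
print(sum(d for _, d in to_timing("PARIS")) + 7)  # 50
```

Because one word is 50 units, a speed of W words per minute gives a dot unit of 60000 / (50 × W) = 1200 / W milliseconds. At 20 WPM, for example, one unit is 60 ms.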

In CTF challenges, Morse frequently appears in four surface forms:

1. Literal dot-dash text.
2. 0/1 substitutions, where 0 represents dots and 1 represents dashes (or vice versa).
3. Audio WAV files, where short tones are dots and long tones are dashes.
4. Visual flashing patterns in video or image sequences.

Prosigns (procedural signs) are sent as a single run of elements with no letter gaps. The most recognisable is SOS (...---...), the international distress signal. Transcriptions often mark prosigns with angle brackets or an overbar, for example <SOS>.
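A first-pass triage of the two text-based surface forms can be automated by checking the alphabet and the length of each space-separated group. The helper `classify_surface_form` and its labels are hypothetical. The 1–6 group-length bound reflects that International Morse letters and digits use at most five elements, and common punctuation uses six. Binary ASCII, by contrast, tends to arrive in uniform 7- or 8-bit groups.

```python
import re

def classify_surface_form(text: str) -> str:
    """Hypothetical triage helper for text-based Morse surface forms."""
    stripped = text.strip()
    if not stripped:
        return "empty"
    # Split into symbol groups on whitespace and "/" word markers
    groups = [g for g in re.split(r"[\s/]+", stripped) if g]
    if re.fullmatch(r"[.\-/\s]+", stripped):
        return "dot-dash"
    if re.fullmatch(r"[01/\s]+", stripped):
        # Uniform 7- or 8-symbol groups look like binary ASCII bytes
        if all(len(g) in (7, 8) for g in groups):
            return "binary-ascii"
        # Morse letters, digits and common punctuation use 1-6 elements
        if all(1 <= len(g) <= 6 for g in groups):
            return "morse-binary"
        return "binary-unknown"
    return "other"

print(classify_surface_form("0110 01 000 000"))    # morse-binary
print(classify_surface_form("01010000 01000001"))  # binary-ascii
```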

Technical Deep-Dive

International Morse Code (letters and digits; punctuation omitted):
A .-    B -...  C -.-.  D -..   E .     F ..-.
G --.   H ....  I ..    J .---  K -.-   L .-..
M --    N -.    O ---   P .--.  Q --.-  R .-.
S ...   T -     U ..-   V ...-  W .--   X -..-
Y -.--  Z --..

Digits:
0 -----  1 .----  2 ..---  3 ...--  4 ....-
5 .....  6 -....  7 --...  8 ---..  9 ----.

Example: .--. .- ... ...
Decoded: P A S S → PASS (illustrative word only, not a flag)

SOS prosign: ...---...  (no internal letter spacing)

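import re

MORSE_TABLE = {
    ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E",
    "..-.": "F", "--.": "G", "....": "H", "..": "I", ".---": "J",
    "-.-": "K", ".-..": "L", "--": "M", "-.": "N", "---": "O",
    ".--.": "P", "--.-": "Q", ".-.": "R", "...": "S", "-": "T",
    "..-": "U", "...-": "V", ".--": "W", "-..-": "X", "-.--": "Y",
    "--..": "Z", ".----": "1", "..---": "2", "...--": "3",
    "....-": "4", ".....": "5", "-....": "6", "--...": "7",
    "---..": "8", "----.": "9", "-----": "0"
}

def decode_morse(text: str) -> str:
    # Normalise the 0/1 substitution variant (0 = dot, 1 = dash) to ./- first;
    # swap the two replacements for challenges that use the opposite assignment
    text = text.replace("0", ".").replace("1", "-")
    # Word separators: "/" (with or without surrounding spaces) or 3+ spaces
    words = re.split(r"\s*/\s*|\s{3,}", text.strip())
    # Letter separators: any remaining whitespace; unknown groups become "?"
    return " ".join(
        "".join(MORSE_TABLE.get(code, "?") for code in word.split())
        for word in words
        if word
    )

# Example usage
print(decode_morse(".--. .- ... ..."))            # PASS
print(decode_morse(".... .. / - .... . .-. ."))   # HI THERE
print(decode_morse("0110 01 000 000"))            # PASS (0/1 variant)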

# CyberChef equivalent: the "From Morse Code" operation with
# Letter delimiter = Space and Word delimiter = Forward slash
# Online: gchq.github.io/CyberChef (search for "From Morse Code")

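# Audio Morse: recover on/off tone runs from a WAV file (scipy + numpy; aubio also works)
import numpy as np
import scipy.io.wavfile as wav
rate, data = wav.read("signal.wav")                    # sample rate (Hz), samples
if data.ndim > 1:
    data = data[:, 0]                                    # keep the first channel only
energy = np.abs(data.astype(float) - data.mean())        # remove DC offset (e.g. 8-bit WAVs)
win = max(1, rate // 200)                                # ~5 ms moving-average window
envelope = np.convolve(energy, np.ones(win) / win, mode="same")
tone_on = envelope > 0.5 * envelope.max()                # crude fixed threshold
edges = np.flatnonzero(np.diff(tone_on.astype(int))) + 1
runs = np.diff(np.concatenate(([0], edges, [len(tone_on)])))  # run lengths in samples
# runs alternate on/off starting from tone_on[0]; divide by rate for seconds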

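Turning measured runs into symbols is the core timing-analysis step. The sketch below makes several assumptions:

- The runs arrive as (is_tone, duration) pairs in any consistent unit, such as samples or seconds.
- The dot unit can be estimated as the shortest tone or interior gap.
- Each run can be classified at the midpoints between the nominal ratios. Tones under 2 units are dots. Gaps of 2 to 5 units separate letters, and longer gaps separate words.

The function name `runs_to_morse` and these thresholds are illustrative choices. The unit estimate fails for a message made of one dash-only letter (such as T), and noisy recordings may need clustering of durations instead.

```python
def runs_to_morse(runs: list[tuple[bool, float]]) -> str:
    """Convert (is_tone, duration) runs into dot/dash text with " " and " / " separators."""
    # Drop leading/trailing silence, which carries no symbol information
    while runs and not runs[0][0]:
        runs = runs[1:]
    while runs and not runs[-1][0]:
        runs = runs[:-1]
    if not runs:
        return ""
    # Estimate the dot unit as the shortest remaining run (tone or interior gap)
    unit = min(d for _, d in runs)
    out = []
    for is_tone, d in runs:
        ratio = d / unit
        if is_tone:
            out.append("." if ratio < 2 else "-")   # dot = 1 unit, dash = 3
        elif ratio >= 5:
            out.append(" / ")                       # word gap = 7 units
        elif ratio >= 2:
            out.append(" ")                         # letter gap = 3 units
        # gaps under 2 units are intra-character and emit nothing
    return "".join(out)

# Slightly jittered timings (in samples) for "A T / E"
runs = [(False, 500), (True, 100), (False, 100), (True, 300), (False, 300),
        (True, 310), (False, 700), (True, 95), (False, 400)]
print(runs_to_morse(runs))  # .- - / .
```

The resulting text can be passed to a text decoder such as `decode_morse`.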
Analytical Methodology

Identify the surface form. Does the input contain literal . and - characters with spaces? If so, confirm the alphabet is restricted to ., -, /, and space. If it contains only 0 and 1, consider a Morse/binary substitution. If it is a WAV file, inspect the waveform for short/long tones.
Confirm separators. Locate word boundaries: a / or triple space between groups. If no word boundaries are marked, decode the whole sequence as a single word first, then look for plausible word breaks in the decoded text (or, for audio, for gaps of roughly seven dot units).
Check variant. Compare decoded output against expected flag format. If output is garbled, try reversing dot/dash assignments (some challenges swap them), or attempt American Morse for letters that differ (L, O, R, etc.).
Use CyberChef. Apply the "From Morse Code" operation and set its Letter delimiter and Word delimiter options to match the input (for example Space and Forward slash).
Handle binary encodings. Replace 0 → . and 1 → - (or the reverse) before decoding. A 0/1 string whose space-separated groups vary between one and six symbols, rather than forming uniform 7- or 8-bit bytes, is far more likely to be Morse than binary ASCII.
Validate output. Confirm decoded text matches expected flag format or readable English. Unrecognised ? characters in the output indicate an incorrect variant or separator assumption.
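The variant checks above can be combined into one pass: try both 0/1 assignments and a dot/dash swap, then rank the readings by their count of unmatched `?` groups. The sketch below uses its own compact table. The names `decode`, `swap` and `rank_variants` are hypothetical.

Most short dot/dash groups are valid in either orientation, so the `?` count often ties between readings. The sketch keeps tied candidates in their listed order rather than guessing. The final choice still rests on the expected flag format or readable English.

```python
import re

# Letters A-Z then digits 0-9, in the same order as the codes below
SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
CODES = (
    ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. "
    "... - ..- ...- .-- -..- -.-- --.. "
    "----- .---- ..--- ...-- ....- ..... -.... --... ---.. ----."
).split()
TABLE = dict(zip(CODES, SYMBOLS))

def decode(text: str) -> str:
    # "/" or 3+ spaces separate words; single spaces separate letters
    words = re.split(r"\s*/\s*|\s{3,}", text.strip())
    return " ".join("".join(TABLE.get(g, "?") for g in w.split()) for w in words if w)

def swap(text: str, a: str, b: str) -> str:
    # Exchange every a with b (and b with a) in one pass
    return text.translate(str.maketrans({a: b, b: a}))

def rank_variants(text: str) -> list[tuple[int, str, str]]:
    candidates = {
        "as given": text,
        "0=dot, 1=dash": text.replace("0", ".").replace("1", "-"),
        "1=dot, 0=dash": text.replace("1", ".").replace("0", "-"),
        "dots/dashes swapped": swap(text, ".", "-"),
    }
    scored = []
    for name, candidate in candidates.items():
        output = decode(candidate)
        scored.append((output.count("?"), name, output))
    # Stable sort: fewest "?" first; ties keep the listed order
    return sorted(scored, key=lambda item: item[0])

for misses, name, output in rank_variants("0110 01 000 000"):
    print(f"{misses} unmatched  {name:20} {output}")
```

For this input, both 0/1 assignments decode without a single `?` (PASS and XNOO). This illustrates why the `?` count is a filter, not a verdict.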

Common Analytical Errors

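Assuming International Morse when American Morse is intended. Frequent ? outputs or implausible letters for valid-looking groups suggest the wrong standard, so check the American Morse mappings. Many letters (including L, O, R, and Z) and most digits differ. Some American characters, such as C, O, and R, also contain internal spaces that an International decoder will misread as letter breaks.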
Misidentifying 0/1 assignment direction. If binary-looking Morse decodes to gibberish, reverse the substitution: try 1=dot, 0=dash instead of the natural reading.
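Treating prosigns as letter sequences. SOS (...---...) sent without letter gaps is a single prosign, conventionally transcribed as <SOS>. A table-driven decoder will not find the nine-element group among its letters and will emit ? (or drop it). A decoder that infers its own letter breaks may instead split it into unrelated letters. Check unmatched long groups against known prosigns before concluding the input is corrupt.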
Missing the word separator. When / is absent and only spaces are used, decoders may fail to find word boundaries. Manually insert / at apparent word breaks or inspect for triple-space sequences.
Conflating Morse audio pitch variants. Multiple carrier frequencies in a WAV do not by themselves indicate multiple encodings: Morse information lies in on/off timing (duration ratios), not in pitch. Analysts sometimes discard high-frequency audio, assuming it is noise rather than signal.
Over-relying on automated decoders for edge cases. CyberChef and dcode.fr handle standard ITU Morse cleanly but may silently drop or mis-map punctuation and prosigns. Always verify decoded punctuation manually against the Morse table.
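The prosign and punctuation pitfalls above can be reduced by consulting a second table whenever a group is missing from the letter table. The sketch below assumes a small, illustrative prosign set (SOS ...---..., AR .-.-., SK ...-.-) rendered in angle brackets. `LETTERS` is deliberately truncated, and the function name is hypothetical.

One genuine ambiguity is worth knowing: in ITU punctuation, AR's code .-.-. is also the + sign. A decoder must therefore pick one reading, which is one way punctuation gets "mis-mapped".

```python
# Illustrative subset only: a few letters plus a few common prosigns
LETTERS = {".": "E", "...": "S", "---": "O", "-": "T", ".-": "A"}
PROSIGNS = {"...---...": "<SOS>", ".-.-.": "<AR>", "...-.-": "<SK>"}

def decode_with_prosigns(text: str) -> str:
    out = []
    for word in text.split("/"):
        letters = []
        for group in word.split():
            # Letter table first, then prosigns; "?" marks a genuinely unknown group
            letters.append(LETTERS.get(group) or PROSIGNS.get(group, "?"))
        out.append("".join(letters))
    return " ".join(out)

print(decode_with_prosigns("...---..."))     # <SOS>
print(decode_with_prosigns("... --- ..."))   # SOS
print(decode_with_prosigns(".- / ...-.-"))   # A <SK>
```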

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0018	Knowledge of encryption algorithms used to protect data during transmission	Grounds learner in pre-digital signal encoding schemes that underpin understanding of classical cipher evolution
K0019	Knowledge of cryptography and key management concepts	Establishes understanding of keyless substitution codes as a baseline before key-based ciphers
K0305	Knowledge of encryption standards and various encryption algorithms	Contextualises Morse within the broader landscape of encoding vs. encryption standards
S0138	Skill in using defensive coding practices	Reinforces writing explicit, validated decoders rather than assuming input format
T0212	Perform penetration testing as required to evaluate information security	Develops the recognition and decoding skills used when evaluating covert channel indicators in pen-test targets