Analyzing WebSocket PCAP Captures via HTTP Upgrade Detection and Frame Payload Extraction

osint_collection Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Modern malware increasingly uses WebSocket connections for command-and-control (C2) communication because, to standard firewall rules, WebSocket traffic is nearly indistinguishable from ordinary web traffic: it typically runs over TCP port 80 or 443, begins as an HTTP request, and is permitted through most corporate proxies. The Mekotio banking trojan (2020), several RAT families, and multiple post-exploitation frameworks including Covenant and Havoc use WebSocket-based C2 channels precisely for this reason. When a network forensics analyst encounters a long-lived TCP connection on port 80 or 443 carrying small bidirectional messages at irregular intervals, WebSocket C2 is high on the differential. Identifying, dissecting, and extracting WebSocket frame content is a required skill for any analyst investigating advanced persistent threats or modern malware.

Core Concept

WebSocket is a full-duplex, persistent TCP channel initiated via an HTTP Upgrade handshake. The protocol is defined in RFC 6455.

Upgrade handshake: The client sends an HTTP GET with headers Upgrade: websocket, Connection: Upgrade, and Sec-WebSocket-Key: <base64_16bytes>. The server responds with 101 Switching Protocols and Sec-WebSocket-Accept: <SHA1(key + magic_guid) base64>. After the 101 response, the TCP connection carries WebSocket frames directly — no more HTTP.
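
The accept value can be recomputed offline to check that a 101 response really answers a given upgrade request. The following is a minimal sketch: the GUID is the constant fixed by RFC 6455 and the sample key is the one used in the RFC, while the function name `expected_accept` is only illustrative.

```python
import base64
import hashlib

# Constant GUID from RFC 6455; it is appended to the client's key before hashing.
WS_MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server should send."""
    material = (sec_websocket_key.strip() + WS_MAGIC_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(material).digest()).decode("ascii")

# Sample key from RFC 6455, section 1.3
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

If the value computed from the captured Sec-WebSocket-Key does not match the Sec-WebSocket-Accept header in the response, the request and response may have been paired incorrectly, or the server may not be a compliant WebSocket endpoint.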

Frame structure: each frame has a 2–14 byte header (2 bytes minimum, plus up to 8 bytes of extended length and a 4-byte masking key) containing: FIN bit (1 = last fragment), RSV1-3 (extension bits, normally 0), opcode (4 bits: 0x0 = continuation, 0x1 = text, 0x2 = binary, 0x8 = close, 0x9 = ping, 0xA = pong), MASK bit (1 = payload is XOR-masked, as required for client→server frames), 7-bit payload length (values 126 and 127 signal a 16-bit or 64-bit extended length field), and an optional 4-byte masking key (if MASK=1). The payload data follows the header.
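
One way to internalise the header layout is to serialise a frame by hand and compare it with bytes seen in a capture. The sketch below is illustrative only: `build_ws_frame` is a hypothetical helper, not part of any library, and it always leaves RSV1-3 at zero.

```python
import struct

def build_ws_frame(opcode: int, payload: bytes,
                   mask_key: bytes = b"", fin: bool = True) -> bytes:
    """Serialise one WebSocket frame; pass a 4-byte mask_key to emulate a client."""
    if mask_key and len(mask_key) != 4:
        raise ValueError("mask_key must be exactly 4 bytes")
    b0 = (0x80 if fin else 0x00) | (opcode & 0x0F)  # FIN + opcode, RSV1-3 = 0
    mask_bit = 0x80 if mask_key else 0x00
    n = len(payload)
    if n < 126:                                     # length fits in 7 bits
        header = bytes([b0, mask_bit | n])
    elif n < (1 << 16):                             # 126 => 16-bit extended length
        header = bytes([b0, mask_bit | 126]) + struct.pack("!H", n)
    else:                                           # 127 => 64-bit extended length
        header = bytes([b0, mask_bit | 127]) + struct.pack("!Q", n)
    if mask_key:
        header += mask_key
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return header + payload

# The two "Hello" text frames given as examples in RFC 6455, section 5.7
print(build_ws_frame(0x1, b"Hello").hex())
print(build_ws_frame(0x1, b"Hello", bytes.fromhex("37fa213d")).hex())
```

Frames built this way are also convenient test input for a parser before pointing it at real capture data.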

Client-to-server masking: per RFC 6455, all frames sent by the client MUST be masked. The masking key is a random 4-byte value included in the frame header. The payload is XOR-decoded: decoded[i] = masked[i] XOR masking_key[i % 4]. Server-to-client frames are NEVER masked. This means in a PCAP, client frames appear as binary noise until unmasked; server frames are directly readable.
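
Because only client frames are masked, the MASK bit doubles as a direction indicator when IP addresses are ambiguous (for example behind a proxy). A small sketch, assuming both endpoints follow RFC 6455; a non-compliant implant could break this rule, and `frame_direction` is a hypothetical name.

```python
def frame_direction(frame: bytes) -> str:
    """Infer who sent a frame from the MASK bit in the second header byte."""
    if len(frame) < 2:
        raise ValueError("need at least the 2-byte base header")
    return "client->server" if frame[1] & 0x80 else "server->client"

# Example frames from RFC 6455, section 5.7 (masked and unmasked "Hello")
print(frame_direction(bytes.fromhex("818537fa213d7f9f4d5158")))
print(frame_direction(bytes.fromhex("810548656c6c6f")))
```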

C2 and exfiltration indicators: persistent connection with no HTTP requests after the 101 response; irregular low-volume message bursts (command/response pattern); binary frames carrying packed or encrypted data; DNS requests to unusual domains immediately before the WebSocket connection.
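
The timing and volume indicators above can be summarised per stream before any payload is read. The sketch below assumes the analyst has already collected (timestamp, payload length) pairs for one stream, for instance from the frame.time_relative and websocket.payload_length fields. The function name and the returned keys are illustrative, and no detection threshold is implied.

```python
from statistics import mean, pstdev

def summarize_frames(frames):
    """Summarise one stream; frames is a list of (timestamp_seconds, payload_length)."""
    times = sorted(t for t, _ in frames)
    sizes = [n for _, n in frames]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = mean(gaps) if gaps else 0.0
    return {
        "frames": len(frames),
        "duration_s": (times[-1] - times[0]) if times else 0.0,
        "mean_payload": mean(sizes) if sizes else 0.0,
        "mean_gap_s": mean_gap,
        # Coefficient of variation of the gaps: close to 0 means metronomic
        # beaconing, larger values mean irregular bursts.
        "gap_cv": (pstdev(gaps) / mean_gap) if mean_gap else 0.0,
    }

print(summarize_frames([(0.0, 10), (10.0, 10), (20.0, 10)]))
```

The summary is a triage aid only: a long duration with a small mean payload is worth a closer look, but legitimate chat or telemetry applications can produce similar numbers.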

Technical Deep-Dive

# Identify WebSocket upgrade handshakes
tshark -r capture.pcap \
  -Y 'http.upgrade == "websocket"' \
  -T fields -e frame.number -e frame.time_relative \
  -e ip.src -e ip.dst -e tcp.dstport \
  -e http.host -e http.request.uri

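# View all WebSocket frames after upgrade
# Field names can differ between Wireshark versions; list the ones your build
# supports with: tshark -G fields | grep -i websocket
tshark -r capture.pcap -Y "websocket" -T fields \
  -e frame.number -e frame.time_relative -e ip.src -e ip.dst \
  -e websocket.opcode -e websocket.payload_length \
  -e websocket.mask -e websocket.text \
  -E header=y -E separator="|"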

# Filter only text frames (opcode 0x1)
tshark -r capture.pcap \
  -Y "websocket.opcode == 1" \
  -T fields -e frame.time_relative -e ip.src -e websocket.text

# Follow the full WebSocket stream (upgrade + all frames)
# First find the TCP stream number:
tshark -r capture.pcap -Y 'http.upgrade == "websocket"' \
  -T fields -e tcp.stream
# Then follow it (replace 4 with the stream number printed above):
tshark -r capture.pcap -q -z "follow,tcp,ascii,4" 2>/dev/null | head -60

# Python: unmask client-to-server WebSocket frames from raw bytes
from scapy.all import rdpcap, TCP, Raw

OPCODES = {0: "continuation", 1: "text", 2: "binary", 8: "close", 9: "ping", 10: "pong"}

def unmask_websocket(masked_payload: bytes, masking_key: bytes) -> bytes:
    """XOR each byte with the corresponding masking key byte (cyclic)."""
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(masked_payload))

# Example: parse a raw WebSocket frame from a byte buffer
def parse_ws_frame(data: bytes):
    """Parse the frame at the start of data; return its payload, or None if truncated."""
    if len(data) < 2:
        return None
    b0, b1 = data[0], data[1]
    fin    = (b0 & 0x80) >> 7
    opcode = b0 & 0x0F
    masked = (b1 & 0x80) >> 7
    plen   = b1 & 0x7F

    offset = 2
    if plen == 126:                      # 16-bit extended length
        plen = int.from_bytes(data[2:4], "big")
        offset = 4
    elif plen == 127:                    # 64-bit extended length
        plen = int.from_bytes(data[2:10], "big")
        offset = 10

    mask_key = b""
    if masked:
        mask_key = data[offset:offset + 4]
        if len(mask_key) < 4:            # header cut off at the segment boundary
            return None
        offset += 4

    # If the frame spans several TCP segments, only the bytes present are returned.
    payload = data[offset:offset + plen]
    if masked:
        payload = unmask_websocket(payload, mask_key)

    print(f"FIN={fin} op={OPCODES.get(opcode, opcode)} masked={masked} "
          f"len={plen} payload={payload[:80]!r}")
    return payload

# Use with Scapy raw TCP payload extraction.
# Heuristic only: it assumes every frame starts at a TCP segment boundary and
# treats any segment with RSV bits clear and a known opcode as a frame.
# Drop the RSV test if an extension such as permessage-deflate sets RSV1.
for pkt in rdpcap("capture.pcap"):
    if pkt.haslayer(Raw) and pkt.haslayer(TCP):
        data = bytes(pkt[Raw])
        # RSV == 0 also skips the HTTP handshake ('G' = 0x47 and 'H' = 0x48 set 0x40)
        if len(data) >= 2 and (data[0] & 0x70) == 0 and (data[0] & 0x0F) in OPCODES:
            parse_ws_frame(data)

Analytical Methodology

  1. Open the PCAP in Wireshark. Apply display filter http.upgrade == "websocket" to identify all WebSocket upgrade handshakes. Note the target host (http.host), URI (http.request.uri), and timestamp of each upgrade.
  2. For each identified WebSocket upgrade, note the TCP stream number. Apply filter websocket to see all WebSocket frames. Wireshark automatically dissects frames after the 101 Switching Protocols response.
  3. In the WebSocket frame list, examine the websocket.opcode column: opcode 1 (text) frames are immediately readable in the Info column; opcode 2 (binary) frames require hex inspection; opcode 8 is connection close.
  4. Apply filter websocket.text to read all text-frame payloads directly — these often contain JSON command/response structures in C2 traffic.
  5. For client-to-server binary frames (masked), note that Wireshark automatically unmasks them using the masking key from the frame header. The websocket.payload field in the dissection shows the unmasked content.
  6. Right-click any WebSocket frame → Follow → TCP Stream to view the complete session: HTTP upgrade handshake at the top, followed by all frame content in stream order.
  7. For binary frames carrying opaque payloads, export raw bytes via tshark and process with the Python unmasking script. Analyse the unmasked payload with the file command, binwalk, or python-magic for format identification.
  8. Use tcpdump (-A -s 0) to capture live WebSocket traffic for comparison against suspected C2 patterns: irregular bursts with consistent binary payload sizes often indicate C2 heartbeats or command delivery.
  9. Correlate the WebSocket destination IP/hostname with threat intelligence feeds and DNS query history in the same PCAP. A WebSocket connection to a recently registered domain or a known C2 IP is strong evidence of malicious use.
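
Frames do not have to line up with TCP segments, so steps 6 and 7 are more reliable when the parser walks a reassembled byte stream instead of individual packets. The sketch below assumes the caller supplies the bytes of one direction of the connection with the HTTP handshake already removed; `iter_ws_frames` is a hypothetical helper.

```python
def iter_ws_frames(stream: bytes):
    """Yield (fin, opcode, payload) for each complete frame in a byte stream."""
    pos = 0
    while pos + 2 <= len(stream):
        b0, b1 = stream[pos], stream[pos + 1]
        masked = bool(b1 & 0x80)
        plen = b1 & 0x7F
        ext = 2 if plen == 126 else 8 if plen == 127 else 0
        offset = pos + 2
        if offset + ext + (4 if masked else 0) > len(stream):
            break                                  # header is truncated
        if ext:
            plen = int.from_bytes(stream[offset:offset + ext], "big")
            offset += ext
        key = b""
        if masked:
            key = stream[offset:offset + 4]
            offset += 4
        if offset + plen > len(stream):
            break                                  # payload is truncated
        payload = stream[offset:offset + plen]
        if masked:                                 # client frames: undo the XOR mask
            payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
        yield bool(b0 & 0x80), b0 & 0x0F, payload
        pos = offset + plen                        # next frame starts right after

# Masked "Hello" text frame (RFC 6455 example) followed by an empty ping
client_bytes = bytes.fromhex("818537fa213d7f9f4d5158") + bytes.fromhex("8900")
for fin, opcode, payload in iter_ws_frames(client_bytes):
    print(fin, opcode, payload)
```

Stopping at the first truncated frame is a deliberate choice: once a length field cannot be trusted, every later offset would be wrong.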

Common Analytical Errors

  • Filtering only on port 80 or 443: WebSocket can run on any port — malware frequently uses non-standard ports to avoid proxy inspection. Filter on the websocket dissector protocol rather than port.
  • Assuming client-frame binary noise means encryption: Client→server frames are always masked (RFC 6455 requirement), making them appear as random bytes. This is NOT encryption — it is a simple XOR operation with a visible key in the frame header. Wireshark unmasks automatically; raw hex view before Wireshark dissection shows the masked form.
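  • Missing fragmented messages: Large WebSocket messages may be split into multiple frames. The first fragment carries the real opcode (text or binary) with FIN=0, later fragments use the continuation opcode (0x0), and only the final fragment has FIN=1. Reassembling the logical message requires concatenating the first frame's payload with the payloads of all its continuation frames, in order.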
  • Ignoring the HTTP headers in the upgrade request: The Origin:, Host:, Cookie:, and custom extension headers in the HTTP Upgrade request often reveal application context — the cookie may contain a session token, and the Host may disambiguate between legitimate and malicious use of the same IP.
  • Overlooking ping/pong frames as C2 heartbeats: Opcode 0x9 (ping) and 0xA (pong) frames are normally keepalives, but some implants use them as covert-channel heartbeats. The usual baseline is an empty payload; regular ping/pong frames that carry a non-empty, changing payload may be encoding data.
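
The fragmentation rule from the list above can be sketched as a small reassembler. It assumes frames have already been parsed into (fin, opcode, payload) tuples for one direction of a stream; the function name `reassemble_messages` is hypothetical.

```python
def reassemble_messages(frames):
    """Join fragmented data frames; frames is an iterable of (fin, opcode, payload)."""
    messages = []
    current_opcode, parts = None, []
    for fin, opcode, payload in frames:
        if opcode >= 0x8:
            # Control frames (close, ping, pong) are never fragmented and may
            # appear between the fragments of a data message.
            messages.append((opcode, payload))
            continue
        if opcode != 0x0:
            current_opcode, parts = opcode, []   # first fragment sets the type
        elif current_opcode is None:
            continue                             # continuation with no start: skip
        parts.append(payload)
        if fin:                                  # FIN=1 closes the logical message
            messages.append((current_opcode, b"".join(parts)))
            current_opcode, parts = None, []
    return messages

# "Hello" split into two fragments, with a ping interleaved between them
print(reassemble_messages([(False, 1, b"Hel"), (True, 9, b""), (True, 0, b"lo")]))
```

Control frames are returned alongside data messages so that ping and pong payloads stay visible for the heartbeat check described in the last bullet.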

NICE Framework Alignment

Code | Knowledge/Skill/Task Statement | How This Card Develops It
K0046 | Knowledge of intrusion detection systems and methodologies | Recognising WebSocket-based C2 patterns (persistent connections, irregular frame bursts, binary payloads) and how IDS rules detect them
K0093 | Knowledge of network protocols | Understanding WebSocket's HTTP upgrade handshake, frame structure, masking mechanism, and opcodes at the protocol level
K0221 | Knowledge of OSI model and network layers | Relating WebSocket's application-layer framing to its underlying TCP stream at layer 4 and HTTP origin at layer 7
S0046 | Skill in performing packet-level analysis | Using Wireshark WebSocket dissector, tshark field extraction, and Python frame parsing to read and unmask WebSocket payloads
T0023 | Collect intrusion artifacts for use in forensic analysis | Extracting WebSocket command/response sequences, upgrade headers, and binary payloads as forensic artifacts of C2 communication

Further Reading

  • RFC 6455: The WebSocket Protocol — Fette & Melnikov (IETF)
  • The Web Application Hacker's Handbook, 2nd Edition — Stuttard & Pinto, Chapter 13: Attacking Other Users (Wiley) — WebSocket security context
  • Malware Traffic Analysis — Brad Duncan (malware-traffic-analysis.net) — case studies of WebSocket-based C2 in PCAP format

Challenge Lab

Reinforce your learning with a hands-on generated challenge based on this card's competency.