Analyzing WebSocket Data Exfiltration via Frame Demasking, Payload Extraction and Protocol Reconstruction
Theory
Why This Matters
WebSocket-based data exfiltration was reportedly identified in the 2022 Okta breach investigation, where threat actors used persistent WebSocket connections through compromised support tooling to exfiltrate session data. Because WebSocket traffic shares ports 80 and 443 with HTTP and HTTPS and begins as a legitimate HTTP upgrade, web proxies and firewalls frequently permit it without deep inspection. An analyst who can identify WebSocket upgrades, follow the binary frame stream, and decode the XOR-masked client payloads can recover exfiltrated data that would otherwise look like opaque binary traffic. For wss:// connections this assumes the TLS layer has already been decrypted, for example with a session key log or at an intercepting proxy.
Core Concept
WebSocket (RFC 6455) begins as an HTTP/1.1 connection. The client sends an HTTP Upgrade request:
```
GET /ws HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
```
The server responds with HTTP/1.1 101 Switching Protocols. After the 101, the TCP connection carries raw WebSocket frames — no longer HTTP.
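A compliant 101 response also carries a `Sec-WebSocket-Accept` header derived from the client's `Sec-WebSocket-Key` (RFC 6455 Section 4.2.2). Recomputing it is one way to confirm that a captured upgrade is a real WebSocket handshake rather than a lookalike. The sketch below is illustrative only; the function name `expected_accept` is a hypothetical helper, not part of any library.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server should send."""
    material = (sec_websocket_key.strip() + WS_GUID).encode("ascii")
    digest = hashlib.sha1(material).digest()
    return base64.b64encode(digest).decode("ascii")


# Key taken from the sample upgrade request above
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

If the value in the captured response does not match, the endpoint is not following the WebSocket handshake and the rest of the stream may not be RFC 6455 framing at all.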
WebSocket frame structure (RFC 6455 Section 5.2):
| Bit / position | Field |
|---|---|
| 0 | FIN (1 = final fragment) |
| 1–3 | RSV1–RSV3 (reserved; RSV1=1 indicates per-message compression when the permessage-deflate extension was negotiated) |
| 4–7 | Opcode: 0=continuation, 1=text, 2=binary, 8=close, 9=ping, 10=pong |
| 8 | MASK (1 = payload is masked; mandatory for client→server frames) |
| 9–15 | Payload length (7 bits; 126 = actual length is in the next 2 bytes; 127 = actual length is in the next 8 bytes) |
| After the length field | Masking key: 4 bytes (present only if MASK=1) |
| After the masking key | Payload: masked or unmasked data |
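To make the bit layout concrete, the following sketch splits the first two bytes of a frame into the fields listed in the table. The helper name `parse_header_bytes` and the dictionary keys are assumptions chosen for illustration.

```python
def parse_header_bytes(byte0: int, byte1: int) -> dict:
    """Split the first two bytes of a WebSocket frame into header fields."""
    return {
        "fin": (byte0 >> 7) & 1,
        "rsv1": (byte0 >> 6) & 1,
        "rsv2": (byte0 >> 5) & 1,
        "rsv3": (byte0 >> 4) & 1,
        "opcode": byte0 & 0x0F,
        "mask": (byte1 >> 7) & 1,
        # 7-bit length; 126 or 127 means an extended length field follows
        "len7": byte1 & 0x7F,
    }


# 0x81 0x85: final text frame, masked, 5-byte payload
print(parse_header_bytes(0x81, 0x85))
# 0xC2 0x7E: final binary frame with RSV1 set, unmasked, 16-bit length follows
print(parse_header_bytes(0xC2, 0x7E))
```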
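Masking: client→server frames are always masked (RFC 6455 mandates this). Server→client frames are never masked. The masking operation is `payload[i] ^ masking_key[i % 4]`; because XOR is its own inverse, applying it again with the same key recovers the original bytes. The key travels in cleartext in the frame header, so masking provides no confidentiality.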
Exfiltrated data travels in the payload of binary (opcode 2) or text (opcode 1) frames from client to server, masked. Recovering the plaintext requires extracting the 4-byte masking key and XOR-decoding each payload byte.
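As a worked example of the unmasking step, the sketch below decodes the masked "Hello" text frame given as an example in RFC 6455 Section 5.7 (masking key `37 fa 21 3d`, masked payload `7f 9f 4d 51 58`). The function name `unmask` is a hypothetical helper.

```python
def unmask(masked_payload: bytes, masking_key: bytes) -> bytes:
    """XOR each payload byte with the masking key byte at index i mod 4."""
    if len(masking_key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ masking_key[i % 4] for i, b in enumerate(masked_payload))


key = bytes.fromhex("37fa213d")
masked = bytes.fromhex("7f9f4d5158")

print(unmask(masked, key))  # b'Hello'

# XOR is symmetric: masking the plaintext again yields the wire bytes
print(unmask(b"Hello", key).hex())  # 7f9f4d5158
```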
Technical Deep-Dive
```bash
# Identify WebSocket upgrade in a PCAP
tshark -r capture.pcap \
  -Y 'http.upgrade == "websocket" or http.response.code == 101' \
  -T fields -e frame.number -e frame.time_relative \
  -e ip.src -e ip.dst -e http.request.uri \
  -e http.upgrade -e http.response.code

# Display all WebSocket frames after the upgrade
tshark -r capture.pcap -Y "websocket" \
  -T fields \
  -e frame.number -e frame.time_relative \
  -e ip.src -e ip.dst \
  -e websocket.fin -e websocket.opcode \
  -e websocket.mask -e websocket.masking_key \
  -e websocket.payload \
  -E header=y

# Follow the TCP stream containing the WebSocket (stream index identified first)
tshark -r capture.pcap -Y 'http.upgrade == "websocket"' \
  -T fields -e tcp.stream | head -1
# Then: tshark -r capture.pcap -q -z "follow,tcp,ascii,<stream_index>"
```
```python
#!/usr/bin/env python3
"""
Decode WebSocket frames from a raw TCP stream (the bytes after the HTTP 101).
Handles masking and multi-byte length fields. Fragments are reported frame by
frame (FIN bit and opcode are printed); they are not reassembled here.
"""
import struct


def decode_ws_frame(data: bytes):
    """
    Parse one WebSocket frame from bytes.
    Returns (fin, opcode, masked, payload_bytes, bytes_consumed),
    or None if the buffer does not hold a complete frame.
    """
    if len(data) < 2:
        return None
    byte0, byte1 = data[0], data[1]
    fin = (byte0 >> 7) & 1
    opcode = byte0 & 0x0F
    masked = (byte1 >> 7) & 1
    length = byte1 & 0x7F
    offset = 2

    # Extended payload length: 126 -> 16-bit, 127 -> 64-bit (network byte order)
    if length == 126:
        if len(data) < offset + 2:
            return None
        length = struct.unpack(">H", data[offset:offset + 2])[0]
        offset += 2
    elif length == 127:
        if len(data) < offset + 8:
            return None
        length = struct.unpack(">Q", data[offset:offset + 8])[0]
        offset += 8

    # The 4-byte masking key is present only when the MASK bit is set
    mask_key = b""
    if masked:
        if len(data) < offset + 4:
            return None
        mask_key = data[offset:offset + 4]
        offset += 4

    if len(data) < offset + length:
        return None
    raw_payload = data[offset:offset + length]
    if masked:
        payload = bytes(b ^ mask_key[i % 4] for i, b in enumerate(raw_payload))
    else:
        payload = raw_payload
    return fin, opcode, masked, payload, offset + length


OPCODE_NAMES = {0: "continuation", 1: "text", 2: "binary",
                8: "close", 9: "ping", 10: "pong"}


def process_ws_stream(raw_bytes: bytes, direction: str = "client->server"):
    """Process a sequence of WebSocket frames from a one-directional byte stream."""
    pos = 0
    while pos < len(raw_bytes):
        result = decode_ws_frame(raw_bytes[pos:])
        if result is None:
            break  # truncated or incomplete trailing frame
        fin, opcode, masked, payload, consumed = result
        opname = OPCODE_NAMES.get(opcode, f"opcode-{opcode}")
        try:
            text = payload.decode("utf-8")
        except UnicodeDecodeError:
            text = payload.hex()
        print(f"[{direction}] FIN={fin} op={opname} mask={masked} "
              f"len={len(payload)} payload={text[:120]}")
        pos += consumed


if __name__ == "__main__":
    # Example: read raw TCP stream bytes (after HTTP 101) from a file.
    # The file should hold one direction only; interleaving both directions
    # breaks frame alignment.
    with open("ws_stream.bin", "rb") as f:
        process_ws_stream(f.read())
```
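The script above prints each frame on its own. Reassembling fragmented messages could be sketched as follows, assuming the frames have already been parsed and unmasked into `(fin, opcode, payload)` tuples for a single direction. The function name `reassemble_messages` and the tuple layout are assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

CONTROL_OPCODES = {8, 9, 10}  # close, ping, pong: never fragmented


def reassemble_messages(
    frames: Iterable[Tuple[int, int, bytes]]
) -> Iterator[Tuple[int, bytes]]:
    """
    Yield (opcode, message_bytes) for each complete data message.
    `frames` holds (fin, opcode, unmasked_payload) tuples from ONE
    direction, in capture order.
    """
    current_opcode: Optional[int] = None
    parts: List[bytes] = []
    for fin, opcode, payload in frames:
        if opcode in CONTROL_OPCODES:
            continue  # control frames may sit between fragments; skip them
        if opcode != 0:
            # A text or binary frame starts a new message
            current_opcode = opcode
            parts = [payload]
        elif current_opcode is not None:
            parts.append(payload)  # continuation of the open message
        else:
            continue  # continuation without a start (capture began mid-message)
        if fin:
            yield current_opcode, b"".join(parts)
            current_opcode, parts = None, []


frames = [
    (0, 1, b"Hel"),        # text, first fragment
    (1, 9, b""),           # ping interleaved between fragments
    (0, 0, b"lo "),        # continuation
    (1, 0, b"world"),      # final continuation
    (1, 2, b"\x00\x01"),   # unfragmented binary message
]
for opcode, message in reassemble_messages(frames):
    print(opcode, message)
```

Each payload must be unmasked with its own frame's key before it is joined, since the key changes from frame to frame.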
```bash
# Extract only client-to-server masked frames and batch-decode.
# Caution: depending on the Wireshark version, the dissector may already
# expose unmasked bytes in websocket.payload; XOR-ing those again re-masks
# the data. Check one frame by hand before trusting the batch output.
tshark -r capture.pcap \
  -Y "websocket and websocket.mask == 1" \
  -T fields -e websocket.masking_key -e websocket.payload \
  | python3 -c "
import sys
for line in sys.stdin:
    parts = line.strip().split(chr(9))
    if len(parts) < 2:
        continue
    mask = bytes.fromhex(parts[0].replace(':', ''))
    payload_hex = parts[1].replace(':', '')
    if not payload_hex:
        continue
    raw = bytes.fromhex(payload_hex)
    decoded = bytes(b ^ mask[i % 4] for i, b in enumerate(raw))
    try:
        print(decoded.decode('utf-8'))
    except UnicodeDecodeError:
        print(decoded.hex())
"
```
Analytical Methodology
- Apply the Wireshark filter `http.upgrade == "websocket"` to locate the upgrade request and the HTTP 101 Switching Protocols response. The request carries the URI path of the WebSocket endpoint; a non-standard path (e.g., `/telemetry`, `/sync`, `/update`) on an unexpected domain is an exfiltration indicator.
- Note the TCP stream index of the WebSocket connection. Apply the filter `websocket`, or follow the TCP stream after the 101 response, to see all subsequent WebSocket frames.
- For each WebSocket frame, examine the opcode: text frames (1) carry UTF-8 encoded data; binary frames (2) carry arbitrary bytes. Exfiltrated files typically use binary frames; commands often use text/JSON frames.
- Identify the FIN bit pattern: FIN=1 on a text or binary frame means the frame is a complete message. FIN=0 indicates a fragmented message continued in subsequent continuation frames (opcode 0), the last of which has FIN=1. Reassemble fragmented messages before attempting to interpret them.
- For client→server frames (MASK=1): extract the 4-byte masking key from the frame header. XOR-decode the payload: `payload[i] ^ mask_key[i % 4]`. The decoded bytes are the actual message content.
- For server→client frames (MASK=0): payload is already plaintext; no decoding is required.
- Examine decoded payloads for structure: JSON objects suggest application-level protocol messages; binary blobs may be file chunks (look for magic bytes), compressed data (zlib magic `78 9c`), or base64-encoded content.
- Measure frame timing and size patterns: regular large binary frames at fixed intervals suggest automated data exfiltration; variable-size frames with short intervals suggest interactive command-and-control.
Common Analytical Errors
- Filtering on port 80 or 443 for WebSocket: WebSocket shares the HTTP and HTTPS ports and is identified by the protocol upgrade, not the port. Use the `websocket` display filter or search for HTTP 101 responses, not port numbers.
- Forgetting fragmentation: WebSocket messages can span multiple frames. A large file may be chunked into fragments with FIN=0, terminated by a final fragment with FIN=1. Analysing each frame independently yields partial data; always collect all fragments for a message before interpreting it.
- Applying the mask key across fragment boundaries incorrectly: Each WebSocket frame has its own independent masking key, even within a fragmented message. Do not reuse the first frame's masking key for subsequent continuation frames.
- Overlooking WebSocket compression (permessage-deflate): If RSV1=1 in the first fragment of a message, the message is deflate-compressed. Unmask every fragment, reassemble the message, then inflate it as raw deflate (window bits -15). The sender strips the trailing `00 00 ff ff` bytes, so append them first and use a streaming decompressor such as `zlib.decompressobj(wbits=-15)`; a one-shot `zlib.decompress(payload, -15)` typically fails because the stream has no final block. Unless the handshake negotiated `client_no_context_takeover` (or `server_no_context_takeover` for the other direction), reuse the same decompressor for every message in that direction.
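Decompressing a permessage-deflate message can be sketched as below. Per RFC 7692 the sender removes the trailing `00 00 ff ff` of each compressed message, so the sketch appends it before inflating with a streaming raw-deflate decompressor. The helper name `inflate_message` is an assumption, and the round trip at the end only simulates a sender with Python's own `zlib`; it is not captured traffic.

```python
import zlib

DEFLATE_TAIL = b"\x00\x00\xff\xff"  # stripped by the sender (RFC 7692)


def inflate_message(message: bytes, inflater=None) -> bytes:
    """
    Inflate one reassembled, unmasked permessage-deflate message.
    Pass a shared `inflater` to keep the compression context across
    messages (needed unless no-context-takeover was negotiated).
    """
    if inflater is None:
        inflater = zlib.decompressobj(wbits=-15)  # raw deflate, no zlib header
    return inflater.decompress(message + DEFLATE_TAIL)


# Simulated sender: raw deflate, sync flush, then drop the 4-byte tail
deflater = zlib.compressobj(wbits=-15)
compressed = deflater.compress(b'{"user": "alice"}') + deflater.flush(zlib.Z_SYNC_FLUSH)
wire_payload = compressed[:-4]

print(inflate_message(wire_payload))  # b'{"user": "alice"}'
```

When context takeover is in effect, create one `zlib.decompressobj(wbits=-15)` per direction and pass it to every call, in message order; skipping a message will corrupt all later output.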
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0046 | Knowledge of intrusion detection systems and methodologies | WebSocket exfiltration evades content-inspection IDS; pattern-based and behavioral detection are required |
| K0093 | Knowledge of network protocols | WebSocket RFC 6455 frame format, masking algorithm, fragmentation, and opcode semantics |
| K0221 | Knowledge of OSI model and network layers | WebSocket begins as HTTP (layer 7) and transitions to a raw framing protocol over TCP (layer 4) |
| S0046 | Skill in performing packet-level analysis | Parsing binary WebSocket frame headers, extracting masking keys, and XOR-decoding payload bytes from PCAP |
| T0023 | Collect intrusion artifacts for use in forensic analysis | Decoded WebSocket payloads are forensic artifacts establishing what data was exfiltrated and through which endpoint |
Further Reading
- RFC 6455: The WebSocket Protocol — Section 5 (Data Framing) for complete frame format specification
- Wireshark Wiki: WebSocket dissector documentation
- OWASP: Testing WebSockets (WSTG-CLNT-10) — attack patterns that produce forensic artifacts
- PortSwigger Web Security Academy: WebSocket security vulnerabilities — understanding attacker tooling
- SANS: "WebSocket Forensics" (FOR508 network traffic module)