Analyzing WebSocket PCAP Captures via HTTP Upgrade Detection and Frame Payload Extraction
Theory
Why This Matters
Modern malware increasingly uses WebSocket connections for command-and-control (C2) communication because, to standard firewall rules, WebSocket traffic is nearly indistinguishable from ordinary web traffic: it typically runs over TCP port 80 or 443, begins as an HTTP request, and is permitted through most corporate proxies. The Mekotio banking trojan (2020), several RAT families, and multiple post-exploitation frameworks including Covenant and Havoc use WebSocket-based C2 channels precisely for this reason. When a network forensics analyst encounters a persistent, long-lived TCP connection on port 80 or 443 carrying small bidirectional messages at irregular intervals, WebSocket C2 should be high on the list of hypotheses. Identifying, dissecting, and extracting WebSocket frame content is therefore a core skill for any analyst investigating advanced persistent threats or modern malware.
Core Concept
WebSocket is a full-duplex, persistent TCP channel initiated via an HTTP Upgrade handshake. The protocol is defined in RFC 6455.
Upgrade handshake: The client sends an HTTP GET with headers Upgrade: websocket, Connection: Upgrade, and Sec-WebSocket-Key: <base64_16bytes>. The server responds with 101 Switching Protocols and Sec-WebSocket-Accept: <SHA1(key + magic_guid) base64>. After the 101 response, the TCP connection carries WebSocket frames directly — no more HTTP.
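The Sec-WebSocket-Accept derivation can be checked by hand when validating a handshake seen in a capture. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and the GUID is the fixed value defined in RFC 6455.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 appends to the client's key before hashing.
WS_MAGIC_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def expected_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value a compliant server should send."""
    material = (sec_websocket_key.strip() + WS_MAGIC_GUID).encode("ascii")
    return base64.b64encode(hashlib.sha1(material).digest()).decode("ascii")

def handshake_is_consistent(key: str, accept: str) -> bool:
    """True if the observed accept header matches the observed key."""
    return expected_accept(key) == accept.strip()

# Sample key/accept pair taken from RFC 6455, section 1.3.
print(expected_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A mismatch between the two headers suggests the server is not a standards-compliant WebSocket endpoint, or that the request and response being compared belong to different streams.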
Frame structure: each frame has a 2–14 byte header containing: FIN bit (1 = final fragment of a message), RSV1-3 (extension bits, normally 0), opcode (4 bits: 0x0 = continuation, 0x1 = text, 0x2 = binary, 0x8 = close, 0x9 = ping, 0xA = pong), MASK bit (1 = payload is XOR-masked, as in all client→server frames), 7-bit payload length (the values 126 and 127 signal a 16-bit or 64-bit extended length field), and an optional 4-byte masking key (present if MASK=1). The payload data follows the header.
Client-to-server masking: per RFC 6455, all frames sent by the client MUST be masked. The masking key is a random 4-byte value included in the frame header. The payload is XOR-decoded: decoded[i] = masked[i] XOR masking_key[i % 4]. Server-to-client frames are NEVER masked. This means in a PCAP, client frames appear as binary noise until unmasked; server frames are directly readable.
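As a worked example of the masking rule, the sketch below decodes the masked single-frame "Hello" text message given as a sample in RFC 6455 (section 5.7). The helper name is illustrative, and the function only handles frames whose payload length fits in the 7-bit field.

```python
def decode_small_frame(frame: bytes) -> dict:
    """Decode a frame with a 7-bit payload length (0-125 bytes)."""
    fin = frame[0] >> 7
    opcode = frame[0] & 0x0F
    masked = frame[1] >> 7
    length = frame[1] & 0x7F
    if length > 125:
        raise ValueError("extended lengths are not handled in this sketch")
    offset = 2
    key = b""
    if masked:
        key = frame[offset:offset + 4]
        offset += 4
    payload = frame[offset:offset + length]
    if masked:
        # decoded[i] = masked[i] XOR masking_key[i % 4]
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return {"fin": fin, "opcode": opcode, "masked": masked,
            "key": key, "payload": payload}

# RFC 6455 samples: the same text message, masked (client) and unmasked (server).
client_frame = bytes.fromhex("818537fa213d7f9f4d5158")
server_frame = bytes.fromhex("810548656c6c6f")

print(decode_small_frame(client_frame)["payload"])  # b'Hello'
print(decode_small_frame(server_frame)["payload"])  # b'Hello'
```

The masked payload bytes (7f 9f 4d 51 58) look like noise in a hex dump, yet the key that reverses them (37 fa 21 3d) sits in the four bytes immediately before.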
C2 and exfiltration indicators: persistent connection with no HTTP requests after the 101 response; irregular low-volume message bursts (command/response pattern); binary frames carrying packed or encrypted data; DNS requests to unusual domains immediately before the WebSocket connection.
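The timing and volume indicators can be summarised per TCP stream before judging them. This is a rough sketch, assuming the input is a list of (relative time in seconds, payload length) pairs for one stream, such as the values of frame.time_relative and the WebSocket payload length exported with tshark. The function name and output keys are hypothetical, and no threshold here is authoritative; the numbers only support an analyst's judgement.

```python
from statistics import mean, pstdev

def summarize_stream(frames):
    """frames: iterable of (relative_time_seconds, payload_length) for one stream."""
    frames = sorted(frames)
    if len(frames) < 3:
        return None  # too few frames to say anything about timing
    times = [t for t, _ in frames]
    sizes = [n for _, n in frames]
    gaps = [b - a for a, b in zip(times, times[1:])]
    duration = times[-1] - times[0]
    avg_gap = mean(gaps)
    return {
        "frames": len(frames),
        "duration_s": duration,
        "total_bytes": sum(sizes),
        "bytes_per_s": sum(sizes) / duration if duration else 0.0,
        "mean_gap_s": avg_gap,
        # Coefficient of variation: near 0 means clockwork-regular gaps,
        # larger values mean irregular bursts.
        "gap_cv": pstdev(gaps) / avg_gap if avg_gap else 0.0,
    }

print(summarize_stream([(0.0, 20), (10.0, 20), (20.0, 40), (30.0, 40)]))
```

A long duration combined with a very low bytes-per-second figure matches the "persistent, low-volume" pattern described above; the gap statistics help separate fixed-interval heartbeats from operator-driven bursts.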
Technical Deep-Dive
    # Identify WebSocket upgrade handshakes
    # (single quotes protect the inner double quotes from the shell)
    tshark -r capture.pcap \
      -Y 'http.upgrade == "websocket"' \
      -T fields -e frame.number -e frame.time_relative \
      -e ip.src -e ip.dst -e tcp.dstport \
      -e http.host -e http.request.uri

    # View all WebSocket frames after upgrade
    # websocket.payload prints the unmasked payload as hex. Field names can differ
    # between Wireshark versions; list them with: tshark -G fields | grep websocket
    tshark -r capture.pcap -Y "websocket" -T fields \
      -e frame.number -e frame.time_relative -e ip.src -e ip.dst \
      -e websocket.opcode -e websocket.payload_length \
      -e websocket.mask -e websocket.payload \
      -E header=y -E separator="|"

    # Filter only text frames (opcode 0x1)
    tshark -r capture.pcap \
      -Y "websocket.opcode == 1" \
      -T fields -e frame.time_relative -e ip.src -e websocket.payload

    # Follow the full WebSocket stream (upgrade + all frames)
    # First find the TCP stream number:
    tshark -r capture.pcap -Y 'http.upgrade == "websocket"' \
      -T fields -e tcp.stream

    # Then follow it (replace 4 with the stream number printed above;
    # -q suppresses the per-packet listing):
    tshark -r capture.pcap -q -z "follow,tcp,ascii,4" 2>/dev/null | head -60
    # Python: unmask client-to-server WebSocket frames from raw bytes
    from scapy.all import rdpcap, TCP, Raw

    def unmask_websocket(masked_payload: bytes, masking_key: bytes) -> bytes:
        """XOR each byte with the corresponding masking key byte (cyclic)."""
        return bytes(b ^ masking_key[i % 4] for i, b in enumerate(masked_payload))

    # Example: parse a raw WebSocket frame from a byte buffer
    def parse_ws_frame(data: bytes):
        if len(data) < 2:
            return None
        b0, b1 = data[0], data[1]
        fin = (b0 & 0x80) >> 7
        opcode = b0 & 0x0F
        masked = (b1 & 0x80) >> 7
        plen = b1 & 0x7F
        offset = 2
        if plen == 126:      # 16-bit extended payload length
            plen = int.from_bytes(data[2:4], "big")
            offset = 4
        elif plen == 127:    # 64-bit extended payload length
            plen = int.from_bytes(data[2:10], "big")
            offset = 10
        mask_key = b""
        if masked:
            mask_key = data[offset:offset + 4]
            offset += 4
        if len(data) < offset:
            return None      # header truncated within this segment
        # The slice is shorter than plen if the frame spans several TCP segments.
        payload = data[offset:offset + plen]
        if masked:
            payload = unmask_websocket(payload, mask_key)
        opcodes = {0: "continuation", 1: "text", 2: "binary",
                   8: "close", 9: "ping", 10: "pong"}
        print(f"FIN={fin} op={opcodes.get(opcode, opcode)} masked={masked} "
              f"len={plen} payload={payload[:80]!r}")
        return payload

    # Use with Scapy raw TCP payload extraction.
    # Caveat: this is a per-packet heuristic. It assumes every frame starts at the
    # beginning of a TCP segment, and it also matches unrelated traffic whose first
    # byte happens to carry a valid opcode in its low nibble. Restrict it to the
    # TCP stream identified from the upgrade handshake where possible.
    for pkt in rdpcap("capture.pcap"):
        if pkt.haslayer(Raw) and pkt.haslayer(TCP):
            data = bytes(pkt[Raw].load)
            if len(data) >= 2 and (data[0] & 0x0F) in (0, 1, 2, 8, 9, 10):
                parse_ws_frame(data)
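Parsing one packet at a time misses frames that share a TCP segment or span several segments. A more robust approach is to reassemble one direction of the stream first and then walk the frames in order. The sketch below assumes `stream` already holds the reassembled post-handshake bytes for a single direction; `iter_ws_frames` is an illustrative name, not part of any library.

```python
def iter_ws_frames(stream: bytes):
    """Yield (fin, opcode, payload) for each complete frame in a byte stream."""
    pos = 0
    while pos + 2 <= len(stream):
        b0, b1 = stream[pos], stream[pos + 1]
        fin, opcode = b0 >> 7, b0 & 0x0F
        masked, plen = b1 >> 7, b1 & 0x7F
        header = 2
        ext = {126: 2, 127: 8}.get(plen, 0)   # size of the extended length field
        if ext:
            if pos + header + ext > len(stream):
                break                          # truncated header
            plen = int.from_bytes(stream[pos + 2:pos + 2 + ext], "big")
            header += ext
        key = b""
        if masked:
            key = stream[pos + header:pos + header + 4]
            header += 4
        end = pos + header + plen
        if end > len(stream):
            break                              # truncated frame at end of capture
        payload = stream[pos + header:end]
        if masked:
            payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
        yield fin, opcode, payload
        pos = end

# Three frames back to back: masked text, unmasked text, empty ping.
demo = (bytes.fromhex("818537fa213d7f9f4d5158")
        + bytes.fromhex("810548656c6c6f")
        + b"\x89\x00")
for frame in iter_ws_frames(demo):
    print(frame)
```

Because the generator stops at the first incomplete frame, a capture that ends mid-frame produces a shorter result rather than garbage.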
Analytical Methodology
- Open the PCAP in Wireshark. Apply display filter `http.upgrade == "websocket"` to identify all WebSocket upgrade handshakes. Note the target host (http.host), URI (http.request.uri), and timestamp of each upgrade. If the session is wss:// (TLS, usually on port 443), the handshake is only visible after the capture has been decrypted, for example with a TLS key log file.
- For each identified WebSocket upgrade, note the TCP stream number. Apply filter `websocket` to see all WebSocket frames. Wireshark automatically dissects frames after the 101 Switching Protocols response.
- In the WebSocket frame list, examine the websocket.opcode column: opcode 1 (text) frames are immediately readable in the Info column; opcode 2 (binary) frames require hex inspection; opcode 8 is connection close.
- Apply filter `websocket.opcode == 1` to isolate text frames and read their payloads directly. These often contain JSON command/response structures in C2 traffic.
- For client-to-server binary frames (masked), note that Wireshark automatically unmasks them using the masking key from the frame header. The `websocket.payload` field in the dissection shows the unmasked content.
- Right-click any WebSocket frame → Follow → TCP Stream to view the complete session: HTTP upgrade handshake at the top, followed by all frame content in stream order.
- For binary frames carrying opaque payloads, export the payload bytes via tshark. If you export raw TCP bytes rather than the dissected `websocket.payload` field, unmask client frames with the Python unmasking script first. Analyse the unmasked payload with the `file` command, `binwalk`, or Python `magic` for format identification.
- Use tcpdump (`-A -s 0`) to capture live WebSocket traffic for comparison against suspected C2 patterns: irregular bursts with consistent binary payload sizes often indicate C2 heartbeats or command delivery.
- Correlate the WebSocket destination IP/hostname with threat intelligence feeds and DNS query history in the same PCAP. A WebSocket connection to a recently registered domain or known C2 IP strongly supports a finding of malicious use.
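The export-and-identify step can be approximated without external tools. The sketch below assumes the payloads were exported as one hex string per line (for example from a tshark `-T fields` export of the payload field; depending on the Wireshark version the hex may be colon-separated, so colons are stripped). The signature table is a small illustrative sample, not a replacement for `file` or `binwalk`, and all names are hypothetical.

```python
import math
from collections import Counter

# A few well-known leading byte signatures (illustrative, far from complete).
MAGIC = {
    b"MZ": "Windows PE executable",
    b"\x7fELF": "ELF executable",
    b"PK\x03\x04": "ZIP archive",
    b"\x1f\x8b": "gzip stream",
    b"{": "JSON object (likely)",
}

def payloads_from_hex_lines(lines):
    """Convert hex-encoded lines (optionally colon-separated) to bytes."""
    for line in lines:
        cleaned = line.strip().replace(":", "")
        if cleaned:
            yield bytes.fromhex(cleaned)

def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; values near 8 suggest compression or encryption."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in Counter(data).values())

def triage(payload: bytes) -> str:
    for magic, name in MAGIC.items():
        if payload.startswith(magic):
            return name
    return f"unknown (entropy {shannon_entropy(payload):.2f} bits/byte)"

for payload in payloads_from_hex_lines(["7b22636d64223a226c73227d", "1f:8b:08:00"]):
    print(triage(payload))
```

High entropy on its own does not prove encryption, since compressed data scores similarly; treat the figure as a prompt for closer inspection.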
Common Analytical Errors
- Filtering only on port 80 or 443: WebSocket can run on any port, and malware frequently uses non-standard ports to avoid proxy inspection. Filter on the `websocket` dissector protocol rather than on port.
- Assuming client-frame binary noise means encryption: Client→server frames are always masked (RFC 6455 requirement), making them appear as random bytes. This is NOT encryption; it is a simple XOR operation with the key visible in the frame header. Wireshark unmasks automatically; the raw hex view shows the masked form.
- Missing fragmented messages: Large WebSocket messages may be split into multiple frames. The first fragment carries the message opcode (text or binary) with `FIN=0`, later fragments are continuation frames (opcode 0x0), and only the final fragment has `FIN=1`. Reassembling the logical message requires concatenating the first frame's payload with the payloads of all its continuation frames, in order.
- Ignoring the HTTP headers in the upgrade request: The `Origin:`, `Host:`, `Cookie:`, and custom extension headers in the HTTP Upgrade request often reveal application context. The cookie may contain a session token, and the Host may disambiguate between legitimate and malicious use of the same IP.
- Overlooking ping/pong frames as C2 heartbeats: Opcode 0x9 (ping) and 0xA (pong) frames are normally protocol-level keepalives, but some implants use them as covert channel heartbeats. Keepalive pings usually carry an empty or constant payload, so regular ping/pong frames with varying non-empty payloads may be encoding data and deserve inspection.
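The fragmentation rule can be sketched as follows, assuming the frames of one direction are already available in order as (fin, opcode, payload) tuples. The function name is illustrative. Control frames (opcodes 0x8 and above) may legally appear between fragments and are passed through unchanged.

```python
def reassemble_messages(frames):
    """Join fragmented data frames into logical (opcode, message) pairs."""
    messages = []
    current_opcode = None   # opcode of the message being assembled
    parts = []
    for fin, opcode, payload in frames:
        if opcode >= 0x8:
            # Control frames are never fragmented but can be interleaved.
            messages.append((opcode, payload))
            continue
        if opcode != 0x0:
            current_opcode, parts = opcode, [payload]   # first (or only) fragment
        elif current_opcode is None:
            continue   # continuation with no start: capture began mid-message
        else:
            parts.append(payload)
        if fin:
            messages.append((current_opcode, b"".join(parts)))
            current_opcode, parts = None, []
    return messages

frames = [
    (0, 0x1, b"Hel"),    # text message starts, FIN=0
    (1, 0x9, b""),       # ping interleaved between fragments
    (0, 0x0, b"lo "),    # continuation
    (1, 0x0, b"C2"),     # final continuation, FIN=1
    (1, 0x2, b"\x01"),   # unfragmented binary message
]
print(reassemble_messages(frames))
# [(9, b''), (1, b'Hello C2'), (2, b'\x01')]
```

Client and server fragments belong to separate messages, so each direction of the stream should be reassembled on its own.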
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0046 | Knowledge of intrusion detection systems and methodologies | Recognising WebSocket-based C2 patterns: persistent connections, irregular frame bursts, binary payloads — and how IDS rules detect them |
| K0093 | Knowledge of network protocols | Understanding WebSocket's HTTP upgrade handshake, frame structure, masking mechanism, and opcodes at the protocol level |
| K0221 | Knowledge of OSI model and network layers | Relating WebSocket's application-layer framing to its underlying TCP stream at layer 4 and HTTP origin at layer 7 |
| S0046 | Skill in performing packet-level analysis | Using Wireshark WebSocket dissector, tshark field extraction, and Python frame parsing to read and unmask WebSocket payloads |
| T0023 | Collect intrusion artifacts for use in forensic analysis | Extracting WebSocket command/response sequences, upgrade headers, and binary payloads as forensic artifacts of C2 communication |
Further Reading
- RFC 6455: The WebSocket Protocol — Fette & Melnikov (IETF)
- The Web Application Hacker's Handbook, 2nd Edition — Stuttard & Pinto, Chapter 13: Attacking Other Users (Wiley) — WebSocket security context
- Malware Traffic Analysis — Brad Duncan (malware-traffic-analysis.net) — case studies of WebSocket-based C2 in PCAP format