Reconstructing HTTP Sessions via Multi-Request Correlation, Credential and Object Recovery

network_forensics_pcap Difficulty 1–5 30 min certifiable

Theory

Why This Matters

HTTP session reconstruction is a staple of breach investigations. The 2013 Target breach illustrates the problem: public reporting describes the BlackPOS malware staging stolen card data on an internal drop server, so the attacker's transfers were buried in legitimate retail traffic. When such a transfer channel is HTTP, recovering it means reassembling multi-packet TCP streams, identifying session cookies, and extracting POST body content from a noisy PCAP. Session reconstruction separates relevant attacker activity from background noise by following the logical user or process session rather than individual packets.

Core Concept

HTTP session reconstruction is the process of grouping individual HTTP request/response pairs into a single logical session representing one browser or application's interaction with a server. Sessions are bounded by session cookies (Set-Cookie → Cookie headers), persistent TCP connections (Connection: keep-alive), or TLS session IDs.
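
As a rough illustration of cookie-based grouping, the sketch below buckets already-extracted request records by the value of one session cookie. The record shape (dicts with "path" and "cookie" keys), the helper names, and the cookie name PHPSESSID are assumptions made for the example, not something the capture or any tool dictates.

```python
from collections import defaultdict

def parse_cookie_header(header: str) -> dict:
    """Split a raw Cookie header into name/value pairs (no quoting rules handled)."""
    pairs = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:
            pairs[name] = value
    return pairs

def group_by_session_cookie(requests: list, cookie_name: str = "PHPSESSID") -> dict:
    """Group request records by the value of one session cookie.

    `requests` is a list of dicts carrying a raw "cookie" header string;
    this record shape is hypothetical.
    """
    sessions = defaultdict(list)
    for req in requests:
        cookies = parse_cookie_header(req.get("cookie", ""))
        key = cookies.get(cookie_name, "(no session cookie)")
        sessions[key].append(req)
    return dict(sessions)

# Made-up sample records standing in for extracted requests
sample = [
    {"path": "/login", "cookie": ""},
    {"path": "/account", "cookie": "PHPSESSID=abc123; theme=dark"},
    {"path": "/export", "cookie": "theme=dark; PHPSESSID=abc123"},
    {"path": "/", "cookie": "PHPSESSID=zzz999"},
]
grouped = group_by_session_cookie(sample)
for session_id, reqs in grouped.items():
    print(session_id, [r["path"] for r in reqs])
```

Requests sent before the server issues the cookie land in their own bucket, so the login request usually has to be linked to the session by address and timing instead.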

The analysis workflow has three stages. First, stream identification: HTTP/1.1 keep-alive reuses a single TCP connection for multiple requests. Each request/response pair must be identified within the stream by request method, path, and response code. Second, object extraction: images, scripts, documents, and archive files embedded in HTTP responses can be recovered from the PCAP as complete binary objects. Third, credential and token recovery: login POST bodies (username=X&password=Y or JSON), session cookie values, and Bearer tokens transmitted in headers are often the highest-value artifacts.
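
The third stage can be sketched as a small parser for recovered POST bodies. This is a minimal example, assuming the body and its Content-Type have already been extracted as strings; the function name and the SENSITIVE_KEYS list are illustrative choices, and real applications use many other field names.

```python
import json
from urllib.parse import parse_qs

# Illustrative field names only; extend for the application under investigation
SENSITIVE_KEYS = {"username", "user", "email", "password", "passwd", "token"}

def extract_credentials(body: str, content_type: str = "") -> dict:
    """Return credential-like fields from a form-encoded or JSON POST body."""
    fields = {}
    if "json" in content_type.lower():
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            return {}
        if isinstance(data, dict):
            fields = {k: str(v) for k, v in data.items()}
    else:
        # parse_qs percent-decodes values and returns a list per key
        fields = {k: v[0] for k, v in parse_qs(body).items()}
    return {k: v for k, v in fields.items() if k.lower() in SENSITIVE_KEYS}

form_hit = extract_credentials("username=alice&password=s3cret%21&remember=1")
json_hit = extract_credentials(
    '{"email": "a@example.com", "password": "pw", "client": "web"}',
    "application/json",
)
print(form_hit)
print(json_hit)
```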

Tools: Wireshark Export HTTP Objects, tshark -z http,tree, NetworkMiner (automatic object extraction and session grouping), and Zeek (produces http.log with per-request metadata).

Technical Deep-Dive

# Summarise all HTTP transactions: method, host, URI, response code, size
tshark -r capture.pcap -Y "http.request || http.response" \
  -T fields \
  -e frame.time_relative \
  -e ip.src \
  -e ip.dst \
  -e http.request.method \
  -e http.host \
  -e http.request.uri \
  -e http.response.code \
  -e http.content_length \
  -E header=y -E separator=","

# Extract all HTTP POST bodies (credentials, file uploads, exfil data)
tshark -r capture.pcap -Y 'http.request.method == "POST"' \
  -T fields -e http.host -e http.request.uri -e http.file_data

# List all Set-Cookie and Cookie headers for session tracking
tshark -r capture.pcap \
  -Y "http.set_cookie || http.cookie" \
  -T fields \
  -e ip.src -e ip.dst \
  -e http.set_cookie -e http.cookie

# Extract HTTP objects non-interactively (all MIME types)
mkdir -p http_objects
tshark -r capture.pcap --export-objects http,http_objects/
ls -lh http_objects/

# NetworkMiner (if installed): full automatic extraction
# mono NetworkMiner.exe -r capture.pcap
# Extracts to AssembledFiles/ directory with MIME type detection

from scapy.all import rdpcap, TCP, Raw, IP
import re

def extract_http_sessions(pcap_path: str) -> list[dict]:
    """Extract HTTP request metadata from a PCAP file."""
    packets = rdpcap(pcap_path)
    sessions: list[dict] = []

    for pkt in packets:
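        # Only IPv4 TCP segments with payload; pkt[IP] below needs an IP layer
        if not (pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw)):
            continue
        # Per-packet triage only: requests split across segments are missed
        text = bytes(pkt[Raw].load).decode("utf-8", errors="ignore")

        # Match HTTP request lines
        match = re.match(r'(GET|POST|PUT|DELETE|PATCH|HEAD) (\S+) HTTP/[\d.]+\r\n', text)
        if not match:
            continue

        method, path = match.group(1), match.group(2)
        # Header names are case-insensitive; values run to the end of the line
        host    = re.search(r'Host: ([^\r\n]+)', text, re.IGNORECASE)
        cookie  = re.search(r'Cookie: ([^\r\n]+)', text, re.IGNORECASE)
        # Everything after the blank line that ends the headers is the body
        body_m  = re.search(r'\r\n\r\n(.*)', text, re.DOTALL)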

        sessions.append({
            "src":    pkt[IP].src,
            "dst":    pkt[IP].dst,
            "method": method,
            "host":   host.group(1) if host else "",
            "path":   path,
            "cookie": cookie.group(1) if cookie else "",
            "body":   (body_m.group(1)[:200] if body_m else ""),
        })

    return sessions

for sess in extract_http_sessions("capture.pcap"):
    if sess["method"] == "POST":
        print(f"POST {sess['host']}{sess['path']}")
        if sess["body"]:
            print(f"  Body: {sess['body']!r}")
        if sess["cookie"]:
            print(f"  Cookie: {sess['cookie'][:80]}")
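
The per-packet parser above misses any request whose headers or body span more than one TCP segment. A minimal sketch of the missing reassembly step follows, assuming the segments for one direction of one flow have already been collected as (sequence number, payload) tuples. The function name is hypothetical, and the sketch ignores sequence-number wraparound and stops at the first gap.

```python
def reassemble(segments: list) -> bytes:
    """Rebuild one direction of a TCP flow from (seq, payload) tuples."""
    stream = bytearray()
    next_seq = None
    for seq, payload in sorted(segments, key=lambda s: s[0]):
        if not payload:
            continue
        if next_seq is None:
            next_seq = seq
        if seq + len(payload) <= next_seq:
            continue  # retransmission of bytes already in the stream
        if seq > next_seq:
            break  # gap: bytes missing from the capture, stop here
        # Keep only the part of the payload that has not been seen yet
        stream += payload[next_seq - seq:]
        next_seq = seq + len(payload)
    return bytes(stream)

# Build sample segments with consistent sequence numbers, then reorder them
parts = [b"GET /a HTTP", b"/1.1\r\nHost: ", b"example.test\r\n\r\n"]
segs, seq = [], 1000
for part in parts:
    segs.append((seq, part))
    seq += len(part)
shuffled = [segs[2], segs[0], segs[0], segs[1]]  # out of order, one retransmit

rebuilt = reassemble(shuffled)
print(rebuilt)
```

Running the request-line and header regexes over the rebuilt stream rather than over single packets is what lets them see a request that was split mid-header.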

Analytical Methodology

Apply display filter http in Wireshark to isolate all HTTP traffic. Use Statistics → HTTP → Requests to get an overview of all hosts and URIs requested in the capture.
Identify session cookies: filter http.set_cookie to find Set-Cookie headers. Note the cookie name and value. Then filter http.cookie contains "<cookie_value>" to group all subsequent requests belonging to that session.
Inspect all POST requests: filter http.request.method == "POST". For each, view the full payload using Follow → TCP Stream. Look for form-encoded credentials (username=, password=, email=), JSON login bodies, and data exfiltration payloads.
Extract HTTP objects: in Wireshark, File → Export Objects → HTTP. This extracts all HTTP response bodies (images, scripts, archives, documents) as individual files. Run file * on the extracted directory to identify types by magic bytes.
In NetworkMiner, load the PCAP and examine the Files tab for automatically extracted objects and the Sessions tab for grouped request sequences. The Credentials tab surfaces cleartext credentials from HTTP Basic Auth and form POST bodies automatically.
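For chunked transfer encoding, verify reassembly: Wireshark handles chunk reassembly automatically, but tshark --export-objects may miss partial chunks. Chunked responses carry no Content-Length header, so confirm that the stream ends with the terminating zero-length chunk. For non-chunked responses, compare the transferred body size against Content-Length to detect truncation, keeping in mind that an exported file can be larger than Content-Length when the body was gzip-compressed on the wire.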
Reconstruct the browsing timeline: export the HTTP request list to CSV and sort by frame.time_relative. Build a chronological narrative of what the client requested and what the server returned, noting any redirect chains (301/302 → destination).
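
The chunked-encoding check in the steps above can be sketched as a small decoder that also reports whether the terminating zero-length chunk was seen. This is an illustration under stated assumptions: the input is the raw response body taken from a reassembled stream, the function name is hypothetical, and trailer headers after the last chunk are ignored.

```python
def decode_chunked(body: bytes) -> tuple[bytes, bool]:
    """Decode a chunked HTTP body. Returns (data, complete)."""
    data = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            return bytes(data), False  # size line missing: truncated
        # Chunk size is hex; anything after ';' is a chunk extension
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        try:
            size = int(size_field.decode("ascii"), 16)
        except ValueError:
            return bytes(data), False  # not a valid chunk-size line
        if size == 0:
            return bytes(data), True  # terminating chunk reached
        start = line_end + 2
        chunk = body[start:start + size]
        data += chunk
        if len(chunk) < size:
            return bytes(data), False  # capture ended mid-chunk
        pos = start + size + 2  # skip the chunk data and its trailing CRLF

full = decode_chunked(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
cut = decode_chunked(b"5\r\nhello\r\n6\r\n wor")
print(full)
print(cut)
```

A False flag on an object that an export tool wrote out without complaint is a sign that the file on disk is incomplete.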

Common Analytical Errors

Analysing packets instead of streams: HTTP rides on a TCP byte stream, so an individual packet may hold only part of a request or response. Follow the TCP stream or use tshark field extraction; treat per-packet parsing of raw bytes as rough triage at most, because it misses anything split across segments.
Missing persistent connection reuse: HTTP/1.1 keep-alive sends multiple requests over one TCP connection, one after another. This is sequential reuse, not HTTP/2-style multiplexing: the second request follows the first response on the same stream. Without tracking Content-Length or chunked encoding boundaries, body bytes are easily attributed to the wrong request/response pair.
Overlooking compression: HTTP responses often use gzip (Content-Encoding: gzip). Wireshark decompresses automatically; tshark --export-objects also decompresses. If processing raw bytes with Python, decompress with zlib.decompress(data, 16 + zlib.MAX_WBITS) before parsing.
Ignoring redirect chains: An initial GET to a login page may redirect through two or three 302 responses before reaching the authenticated session. The session cookie may be set at any point in the chain. Trace all redirect hops.
Focusing only on GET requests: Data exfiltration and credential submission commonly use POST or PUT, so filtering exclusively on GET can miss the most forensically significant requests.
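
Two of the errors above, lost message boundaries on a reused connection and unhandled gzip, can be illustrated together. The sketch below splits a server-to-client byte stream into responses using Content-Length and decompresses gzip bodies. It assumes every response carries a valid Content-Length, so chunked bodies and responses without a body are out of scope, and the function name is hypothetical.

```python
import gzip
import zlib

def split_responses(stream: bytes) -> list:
    """Split a server-to-client stream into (headers, body) pairs."""
    messages = []
    pos = 0
    while pos < len(stream):
        head_end = stream.find(b"\r\n\r\n", pos)
        if head_end == -1:
            break  # incomplete header block at the end of the capture
        head_lines = stream[pos:head_end].decode("latin-1").split("\r\n")
        headers = {"status": head_lines[0]}
        for line in head_lines[1:]:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
        # Assumption: Content-Length is present and numeric
        length = int(headers.get("content-length", "0"))
        body_start = head_end + 4
        body = stream[body_start:body_start + length]
        if headers.get("content-encoding") == "gzip":
            # 16 + MAX_WBITS tells zlib to expect a gzip wrapper
            body = zlib.decompress(body, 16 + zlib.MAX_WBITS)
        messages.append((headers, body))
        pos = body_start + length  # the next response starts right here
    return messages

# Two responses back to back on one connection; the first is gzip-compressed
compressed = gzip.compress(b"secret report")
resp1 = (b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n"
         b"Content-Length: " + str(len(compressed)).encode() + b"\r\n\r\n" + compressed)
resp2 = b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"

msgs = split_responses(resp1 + resp2)
for headers, body in msgs:
    print(headers["status"], body)
```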

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0046	Knowledge of intrusion detection systems and methodologies	Identifying exfiltration patterns and credential theft in HTTP sessions that web application firewalls and IDS rules detect
K0093	Knowledge of network protocols	Understanding HTTP/1.1 request/response mechanics, keep-alive connection reuse, cookie session management, and MIME object encoding
K0221	Knowledge of OSI model and network layers	HTTP operates at layer 7; session reconstruction requires understanding how layer-4 TCP stream reassembly underlies layer-7 session analysis
S0046	Skill in performing packet-level analysis	Using Wireshark Export Objects, tshark field extraction, NetworkMiner session grouping, and Python stream parsing to reconstruct HTTP sessions
T0023	Collect intrusion artifacts for use in forensic analysis	Recovering session cookies, credential submissions, exfiltrated data payloads, and transferred files as structured forensic artifacts