Reconstructing HTTP Sessions via Multi-Request Correlation, Credential and Object Recovery
Theory
Why This Matters
Consider a point-of-sale intrusion in which the malware's exfiltration component sends stolen card data to an internal drop server in HTTP POST bodies. To recover the exact sequence of those requests, the analyst must reassemble multi-packet TCP streams, identify session cookies, and extract POST body content from a noisy PCAP dominated by legitimate retail traffic. Session reconstruction separates attacker activity from that background noise by following the logical user or process session rather than individual packets.
Core Concept
HTTP session reconstruction is the process of grouping individual HTTP request/response pairs into a single logical session representing one browser or application's interaction with a server. Sessions are bounded by session cookies (Set-Cookie → Cookie headers), persistent TCP connections (Connection: keep-alive), or TLS session IDs.
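Cookie-based grouping can be sketched in a few lines. This is a minimal sketch, not a complete tool. It assumes each request has already been reduced to a dict whose `cookie` key holds the raw `Cookie` request header, and it uses a hypothetical session cookie name, `SESSIONID`. The helper names `session_key` and `group_by_session` are illustrative. Python's standard `http.cookies.SimpleCookie` parses the header.

```python
from collections import defaultdict
from http.cookies import CookieError, SimpleCookie
from typing import Optional

# Hypothetical session cookie name; real applications use names such as
# PHPSESSID or JSESSIONID, or an application-specific name.
SESSION_COOKIE = "SESSIONID"


def session_key(cookie_header: str, name: str = SESSION_COOKIE) -> Optional[str]:
    """Return the value of one named cookie from a raw Cookie request header."""
    jar = SimpleCookie()
    try:
        jar.load(cookie_header)
    except CookieError:
        return None  # malformed header: treat the request as session-less
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


def group_by_session(requests: list) -> dict:
    """Group request records by session cookie value, preserving capture order.

    Each record is assumed to be a dict with a 'cookie' key holding the raw
    Cookie header ('' when absent). Requests that lack the session cookie are
    filed under '<no-session>'.
    """
    sessions = defaultdict(list)
    for req in requests:
        key = session_key(req.get("cookie", "")) or "<no-session>"
        sessions[key].append(req)
    return dict(sessions)
```

The pre-login requests that carry no cookie yet land in `<no-session>`. They can often be attached to a session afterwards by matching the `Set-Cookie` response that issued the value.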
The analysis workflow has three stages.

First, stream identification. HTTP/1.1 keep-alive reuses a single TCP connection for multiple requests, and the server answers them in the order they were sent. Each request/response pair must therefore be delimited within the stream, using Content-Length or chunked-encoding boundaries, and labelled by request method, path, and response code.

Second, object extraction. Images, scripts, documents, and archive files carried in HTTP response bodies can be recovered from the PCAP as complete binary objects.

Third, credential and token recovery. Login POST bodies (username=X&password=Y or JSON), session cookie values, and Bearer tokens transmitted in Authorization headers are often the highest-value artifacts.
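Stream identification can be illustrated with a sketch that splits each direction of one keep-alive connection into messages and pairs them by position. The sketch makes several assumptions:

- You already have the reassembled client-to-server and server-to-client bytes for a single TCP connection, for example saved as raw data from Follow TCP Stream.
- Every message is framed by Content-Length or has no body.
- Chunked bodies, close-delimited responses, and bodiless responses such as HEAD, 204, and 304 are not handled.

The names `split_messages` and `pair_transactions` are illustrative.

```python
import re

HEADER_END = b"\r\n\r\n"
# Header names are case-insensitive; (?m) lets ^ match at each header line.
CONTENT_LENGTH = re.compile(rb"(?im)^content-length:[ \t]*(\d+)")


def split_messages(stream: bytes) -> list:
    """Split one direction of a reassembled HTTP/1.1 stream into (head, body) pairs.

    Sketch only: every message is assumed to be framed by Content-Length or to
    have no body. Chunked bodies, responses delimited by connection close, and
    bodiless responses (HEAD, 204, 304) are not handled.
    """
    messages = []
    pos = 0
    while pos < len(stream):
        end = stream.find(HEADER_END, pos)
        if end == -1:
            break  # incomplete header block: the capture ends mid-message
        head = stream[pos:end]
        match = CONTENT_LENGTH.search(head)
        length = int(match.group(1)) if match else 0
        body_start = end + len(HEADER_END)
        messages.append((head, stream[body_start:body_start + length]))
        pos = body_start + length  # the next message starts right after the body
    return messages


def pair_transactions(client_stream: bytes, server_stream: bytes) -> list:
    """Pair requests with responses by position: HTTP/1.1 answers requests in order."""
    pairs = []
    for (req_head, _), (resp_head, _) in zip(split_messages(client_stream),
                                             split_messages(server_stream)):
        request_line = req_head.split(b"\r\n", 1)[0].decode("latin-1")
        status_line = resp_head.split(b"\r\n", 1)[0].decode("latin-1")
        pairs.append((request_line, status_line))
    return pairs
```

`zip` stops at the shorter list, so a final request that never received a response (common when a capture ends early) simply appears without a pair.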
Tools: Wireshark Export HTTP Objects, tshark -z http,tree, NetworkMiner (automatic object extraction and session grouping), and Zeek (produces http.log with per-request metadata).
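Zeek's http.log already records which connection each transaction belongs to (`uid`) and its position on a keep-alive connection (`trans_depth`). Re-grouping the log by those two columns therefore recovers per-connection request sequences. This is a minimal sketch with the following assumptions:

- The log uses Zeek's default tab-separated format with a `#fields` header line, not the JSON output some deployments enable.
- Values are left as strings.
- The helper names are hypothetical.

```python
from collections import defaultdict


def read_zeek_log(lines) -> list:
    """Parse a tab-separated Zeek log (e.g. http.log) into a list of dicts.

    Column names come from the '#fields' header line; other '#' metadata
    lines are skipped. Values stay as strings ('-' marks an unset field).
    """
    fields = []
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]  # drop the '#fields' token itself
        elif line and not line.startswith("#"):
            rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def transactions_by_connection(rows: list) -> dict:
    """Group http.log rows by connection uid, ordered by trans_depth."""
    conns = defaultdict(list)
    for row in rows:
        conns[row["uid"]].append(row)
    for txs in conns.values():
        # trans_depth is the transaction's position on a keep-alive connection
        txs.sort(key=lambda r: int(r["trans_depth"]))
    return dict(conns)
```

Typical usage would be `with open("http.log") as fh: conns = transactions_by_connection(read_zeek_log(fh))`. Each value in `conns` is then the ordered list of requests made over one TCP connection.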
Technical Deep-Dive
```bash
# Summarise all HTTP transactions: time, endpoints, method, host, URI, response code, size
# (-E quote=d double-quotes each field so URIs containing commas do not break the CSV)
tshark -r capture.pcap -Y "http.request || http.response" \
  -T fields \
  -e frame.time_relative \
  -e ip.src \
  -e ip.dst \
  -e http.request.method \
  -e http.host \
  -e http.request.uri \
  -e http.response.code \
  -e http.content_length \
  -E header=y -E separator="," -E quote=d

# Extract all HTTP POST bodies (credentials, file uploads, exfil data)
tshark -r capture.pcap -Y 'http.request.method == "POST"' \
  -T fields -e http.host -e http.request.uri -e http.file_data

# List all Set-Cookie and Cookie headers for session tracking
tshark -r capture.pcap \
  -Y "http.set_cookie || http.cookie" \
  -T fields \
  -e ip.src -e ip.dst \
  -e http.set_cookie -e http.cookie

# Extract HTTP objects non-interactively (all MIME types); -q suppresses the per-packet summary
mkdir -p http_objects
tshark -r capture.pcap -q --export-objects http,http_objects/
ls -lh http_objects/

# NetworkMiner (if installed): full automatic extraction
# mono NetworkMiner.exe -r capture.pcap
# Extracts to AssembledFiles/ directory with MIME type detection
```
```python
import re

from scapy.all import IP, TCP, Raw, rdpcap


def extract_http_sessions(pcap_path: str) -> list[dict]:
    """Extract HTTP request metadata from a PCAP file.

    Works packet by packet, so it only sees requests whose header block fits
    in a single TCP segment, and bodies are cut off at the segment boundary.
    """
    packets = rdpcap(pcap_path)
    sessions: list[dict] = []
    for pkt in packets:
        if not (pkt.haslayer(IP) and pkt.haslayer(TCP) and pkt.haslayer(Raw)):
            continue
        # errors="ignore" means decoding never raises; undecodable bytes are dropped
        text = bytes(pkt[Raw]).decode("utf-8", errors="ignore")
        # Match HTTP request lines, e.g. "POST /login HTTP/1.1\r\n"
        match = re.match(r"(GET|POST|PUT|DELETE|PATCH|HEAD) (\S+) HTTP/[\d.]+\r\n", text)
        if not match:
            continue
        method, path = match.group(1), match.group(2)
        # The blank line (CRLF CRLF) separates the header block from the body
        headers, _, body = text.partition("\r\n\r\n")
        # Header names are case-insensitive; (?m) anchors ^ at each header line
        host = re.search(r"(?im)^Host:[ \t]*([^\r\n]+)", headers)
        cookie = re.search(r"(?im)^Cookie:[ \t]*([^\r\n]+)", headers)
        sessions.append({
            "src": pkt[IP].src,
            "dst": pkt[IP].dst,
            "method": method,
            "host": host.group(1) if host else "",
            "path": path,
            "cookie": cookie.group(1) if cookie else "",
            "body": body[:200],
        })
    return sessions


if __name__ == "__main__":
    for sess in extract_http_sessions("capture.pcap"):
        if sess["method"] == "POST":
            print(f"POST {sess['host']}{sess['path']}")
            if sess["body"]:
                print(f"  Body: {sess['body']!r}")
            if sess["cookie"]:
                print(f"  Cookie: {sess['cookie'][:80]}")
```
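The script above keeps each request's body. The credential and token recovery stage can then be sketched as a pass over those bodies and over Authorization header values. The Authorization value is not captured by the script and would have to be extracted separately, for example from tshark's `http.authorization` field.

This is a minimal sketch with the following assumptions:

- `CREDENTIAL_FIELDS` is a hypothetical, deliberately short list of field names. Real login forms vary widely.
- Only form-encoded and flat JSON bodies are handled.

```python
import base64
import binascii
import json
from urllib.parse import parse_qs

# Hypothetical field names; extend for the application under investigation.
CREDENTIAL_FIELDS = {"username", "user", "email", "login", "password", "passwd", "pass"}


def credentials_from_body(body: str, content_type: str = "") -> dict:
    """Pull likely credential fields from a form-encoded or JSON request body."""
    if "json" in content_type.lower():
        try:
            data = json.loads(body)
        except json.JSONDecodeError:
            return {}
        if not isinstance(data, dict):
            return {}
        return {k: str(v) for k, v in data.items() if k.lower() in CREDENTIAL_FIELDS}
    # parse_qs undoes percent-encoding and '+' for form-encoded bodies
    form = parse_qs(body, keep_blank_values=True)
    return {k: v[0] for k, v in form.items() if k.lower() in CREDENTIAL_FIELDS}


def credentials_from_authorization(header_value: str) -> dict:
    """Decode an Authorization header value using the Basic or Bearer scheme."""
    scheme, _, value = header_value.strip().partition(" ")
    if scheme.lower() == "basic":
        try:
            decoded = base64.b64decode(value, validate=True).decode("utf-8", errors="replace")
        except binascii.Error:
            return {}  # not valid base64
        user, _, password = decoded.partition(":")  # Basic credentials are user:password
        return {"username": user, "password": password}
    if scheme.lower() == "bearer":
        return {"bearer_token": value.strip()}
    return {}
```

Keeping the body and header paths separate mirrors how the artifacts appear on the wire. Form and JSON logins travel in the request body, while Basic credentials and Bearer tokens ride in a header on every request.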
Analytical Methodology
- Apply the display filter `http` in Wireshark to isolate all HTTP traffic. Use Statistics → HTTP → Requests for an overview of every host and URI requested in the capture.
- Identify session cookies: filter `http.set_cookie` to find `Set-Cookie` headers, and note each cookie's name and value. Then filter `http.cookie contains "<cookie_value>"` to group all subsequent requests belonging to that session.
- Inspect all POST requests: filter `http.request.method == "POST"`. For each, view the full payload using Follow → TCP Stream. Look for form-encoded credentials (`username=`, `password=`, `email=`), JSON login bodies, and data exfiltration payloads.
- Extract HTTP objects: in Wireshark, choose File → Export Objects → HTTP. This saves every HTTP response body (images, scripts, archives, documents) as an individual file. Run `file *` on the export directory to identify types by magic bytes.
- Use NetworkMiner: load the PCAP and examine the Files tab for automatically extracted objects and the Sessions tab for grouped request sequences. The Credentials tab automatically surfaces cleartext credentials from HTTP Basic Auth and form POST bodies.
- Verify reassembly of chunked transfer encoding. Wireshark reassembles chunked bodies automatically while its HTTP reassembly preferences are enabled (the default), but gaps in the capture leave exported objects truncated or missing.
  - Where a response carries a `Content-Length` header, compare the exported file size against it. The header counts the encoded (possibly compressed) bytes, so a decompressed export will not match it exactly.
  - A chunked response carries no `Content-Length` at all. Check instead that the stream ends with the terminating zero-length chunk.
- Reconstruct the browsing timeline: export the HTTP request list to CSV and sort by `frame.time_relative`. Build a chronological narrative of what the client requested and what the server returned, noting any redirect chains (301/302 → destination).
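The truncation check for chunked bodies can be sketched as a decoder that reports whether the terminating zero-length chunk was reached. This is a minimal sketch with the following assumptions:

- The input is the raw, still-chunked response body, everything after the header block.
- Chunk extensions are discarded.
- Trailer fields are ignored.

The function name `dechunk` is illustrative.

```python
def dechunk(body: bytes) -> tuple:
    """Decode a chunked transfer-coded body.

    Returns (payload, complete): complete is True only if the terminating
    zero-length chunk was found, which is the completeness signal for a
    response that has no Content-Length header.
    """
    out = bytearray()
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            return bytes(out), False  # chunk-size line cut off by the capture
        # The chunk-size line is hex, optionally followed by ';extensions'
        size_field = body[pos:line_end].split(b";", 1)[0].strip()
        try:
            size = int(size_field, 16)
        except ValueError:
            return bytes(out), False  # not a valid chunk-size line
        if size == 0:
            return bytes(out), True  # last-chunk reached; trailers are ignored
        start = line_end + 2
        chunk = body[start:start + size]
        out += chunk
        if len(chunk) < size:
            return bytes(out), False  # chunk data truncated mid-way
        pos = start + size + 2  # skip the CRLF that closes each chunk's data
```

A `False` flag means the exported object should be treated as partial. Look for dropped segments or a capture that stopped before the transfer finished.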
Common Analytical Errors
- Analysing packets instead of streams: HTTP runs over a TCP byte stream, so a single packet may hold only part of a request or response. Follow the TCP stream or use tshark field extraction, which works on reassembled messages, rather than parsing individual packet payloads for HTTP content.
- Ignoring message boundaries on persistent connections: HTTP/1.1 keep-alive sends multiple requests over one TCP connection, and the server returns responses in the same order as the requests. Without tracking Content-Length or chunked-encoding boundaries, one message's body runs into the next message's headers and request/response pairs are mismatched.
- Overlooking compression: HTTP responses often use gzip (`Content-Encoding: gzip`). Wireshark decompresses automatically, and tshark `--export-objects` also decompresses. If processing raw bytes with Python, decompress with `zlib.decompress(data, 16 + zlib.MAX_WBITS)` before parsing.
- Ignoring redirect chains: an initial GET to a login page may redirect through two or three 302 responses before reaching the authenticated session. The session cookie may be set at any point in the chain, so trace every redirect hop.
- Focusing only on GET requests: HTTP exfiltration typically uses POST or PUT, because request bodies can carry large payloads. Filtering exclusively on GET misses the most forensically significant requests.
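Tracing every redirect hop can be sketched as a walk from a starting URL through 3xx `Location` headers. The walk records which hop set a cookie. This is a minimal sketch with the following assumptions:

- Each transaction is a hypothetical record dict with `url`, `status`, and optional `location` and `set_cookie` keys, built from whichever extraction method you used.
- Only the first response seen for each URL is followed.
- `max_hops` guards against redirect loops.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def trace_redirects(transactions: list, start_url: str, max_hops: int = 10) -> list:
    """Follow a redirect chain through time-ordered transaction records.

    Each record is assumed to be a dict with 'url' and 'status' keys plus
    optional 'location' and 'set_cookie' keys (hypothetical layout).
    """
    by_url = {}
    for tx in transactions:
        by_url.setdefault(tx["url"], tx)  # keep the first response seen for each URL
    chain = []
    url = start_url
    while url in by_url and len(chain) < max_hops:
        tx = by_url[url]
        chain.append(tx)
        if tx["status"] not in REDIRECT_CODES or not tx.get("location"):
            break  # end of the chain: a non-redirect response
        url = urljoin(url, tx["location"])  # Location may be relative
    return chain
```

Filtering the returned chain for records with `set_cookie` shows the hop at which the session was established, which is often not the first or last request in the chain.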
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0046 | Knowledge of intrusion detection systems and methodologies | Identifying exfiltration patterns and credential theft in HTTP sessions that web application firewalls and IDS rules detect |
| K0093 | Knowledge of network protocols | Understanding HTTP/1.1 request/response mechanics, keep-alive multiplexing, cookie session management, and MIME object encoding |
| K0221 | Knowledge of OSI model and network layers | HTTP operates at layer 7; session reconstruction requires understanding how layer-4 TCP stream reassembly underlies layer-7 session analysis |
| S0046 | Skill in performing packet-level analysis | Using Wireshark Export Objects, tshark field extraction, NetworkMiner session grouping, and Python stream parsing to reconstruct HTTP sessions |
| T0023 | Collect intrusion artifacts for use in forensic analysis | Recovering session cookies, credential submissions, exfiltrated data payloads, and transferred files as structured forensic artifacts |
Further Reading
- The Web Application Hacker's Handbook, Dafydd Stuttard & Marcus Pinto (Wiley): HTTP protocol mechanics and session management fundamentals
- Network Forensics: Tracking Hackers Through Cyberspace, Sherri Davidoff & Jonathan Ham (Prentice Hall)
- Wireshark Wiki, "Export Objects": guide to HTTP object extraction