Reconstructing SIP Call Dialogs and Extracting RTP Stream Parameters for VoIP Forensics
Theory
Why This Matters
A 2019 FBI public advisory documented a wave of VoIP fraud cases in which attackers used ARP spoofing to intercept SIP REGISTER traffic on hotel and enterprise networks, extracted SIP Digest authentication material from the resulting captures, and recovered passwords through offline cracking. The compromised SIP accounts were then resold for international toll fraud that generated millions of dollars in charges. In corporate espionage cases, call recordings have been reconstructed entirely from RTP streams captured during network forensics engagements. The VoIP infrastructure of any organisation running an on-premises or cloud PBX is a target for eavesdropping, fraud, and denial of service. Analysts must be able to dissect SIP signalling, extract authentication material, and reconstruct audio streams from PCAP evidence.
Core Concept
SIP (Session Initiation Protocol) is a text-based application-layer protocol (port 5060 UDP/TCP, port 5061 TLS) that establishes, modifies, and terminates VoIP sessions. It is architecturally similar to HTTP: requests have a method and headers; responses have a three-digit status code.
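Because SIP is line-oriented text, the request/response distinction can be read straight off the first line of a message. The following is a minimal sketch; the function name `classify_start_line` is hypothetical and the parsing is deliberately simplified (it does not validate methods or URIs).

```python
def classify_start_line(line: str) -> dict:
    """Classify the first line of a SIP message as a request or a response."""
    parts = line.strip().split(" ", 2)
    if len(parts) >= 2 and parts[0].startswith("SIP/"):
        # Response: SIP-Version SP Status-Code SP Reason-Phrase
        reason = parts[2] if len(parts) == 3 else ""
        return {"kind": "response", "status": int(parts[1]), "reason": reason}
    if len(parts) == 3 and parts[2].startswith("SIP/"):
        # Request: Method SP Request-URI SP SIP-Version
        return {"kind": "request", "method": parts[0], "uri": parts[1]}
    raise ValueError(f"not a SIP start line: {line!r}")


print(classify_start_line("INVITE sip:bob@example.com SIP/2.0"))
print(classify_start_line("SIP/2.0 401 Unauthorized"))
```

The same split works for any method (REGISTER, BYE, and so on), which is why a single `sip` display filter surfaces every message type.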
The call flow for a basic call: INVITE (caller initiates) → 100 Trying → 180 Ringing → 200 OK (callee answers) → ACK (caller confirms) → [media flows via RTP] → BYE (either party ends) → 200 OK. The INVITE body normally carries an SDP (Session Description Protocol) offer listing the caller's codecs, media IP, and RTP port; the callee's SDP answer, usually in the 200 OK, supplies the other side, so both messages are needed to identify both directions of the audio.
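To make the SDP negotiation concrete, the sketch below pulls the media address, port, and codec mappings out of an SDP body. It assumes a simple audio-only SDP; the function name `parse_sdp` and the sample values are illustrative, not taken from any capture.

```python
def parse_sdp(body: str) -> list:
    """Return one dict per m= line with address, port, formats and rtpmap."""
    session_addr = None
    media = []
    for raw in body.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue
        key, value = line[0], line[2:]
        if key == "c":
            # c=<nettype> <addrtype> <connection-address>
            addr = value.split()[2]
            if media:
                media[-1]["address"] = addr  # media-level override
            else:
                session_addr = addr          # session-level default
        elif key == "m":
            # m=<media> <port> <proto> <fmt> ...
            fields = value.split()
            media.append({
                "type": fields[0],
                "port": int(fields[1].split("/")[0]),
                "proto": fields[2],
                "formats": fields[3:],
                "address": session_addr,
                "rtpmap": {},
            })
        elif key == "a" and value.startswith("rtpmap:") and media:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>
            pt, encoding = value[len("rtpmap:"):].split(" ", 1)
            media[-1]["rtpmap"][pt] = encoding
    return media


sample_sdp = """
v=0
o=alice 2890844526 2890844526 IN IP4 10.0.0.5
s=-
c=IN IP4 10.0.0.5
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""
for stream in parse_sdp(sample_sdp):
    print(stream["address"], stream["port"], stream["rtpmap"])
```

The address and port printed here are where this endpoint expects to receive RTP, so they should appear as the destination of the other party's audio packets.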
SIP REGISTER is used by phones to register with the PBX. SIP Digest Authentication follows HTTP Digest: the server challenges with 401 Unauthorized containing WWW-Authenticate: Digest realm="...", nonce="...". The client responds with Authorization: Digest username="...", realm="...", nonce="...", uri="...", response="<MD5_hash>". When no qop is negotiated, the response is computed as MD5(MD5(user:realm:pass):nonce:MD5(method:uri)); with qop=auth the nonce count, client nonce, and qop value are also hashed in, as MD5(HA1:nonce:nc:cnonce:qop:HA2). Because every input except the password is visible in plaintext, the captured exchange can be attacked offline with hashcat mode 11400; recovery succeeds only if the password is within reach of the wordlist or mask used.
RTP (Real-time Transport Protocol) carries the actual audio. It flows on the UDP port pair negotiated in SDP (typically ephemeral high ports). Wireshark can play back decoded RTP audio directly.
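The stream parameters an analyst cares about (payload type, sequence number, timestamp, SSRC) all sit in the 12-byte fixed RTP header. A minimal sketch of decoding it from a raw UDP payload follows; `parse_rtp_header` is a hypothetical helper and the sample packet is synthetic.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header from a UDP payload."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = b0 >> 6
    if version != 2:
        raise ValueError(f"unexpected RTP version {version}")
    return {
        "version": version,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,  # 0 = PCMU, 8 = PCMA (static assignments)
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Synthetic packet: version 2, payload type 0, 160 bytes of dummy audio
sample_packet = struct.pack("!BBHII", 0x80, 0, 1, 160, 0xDEADBEEF) + b"\xff" * 160
header = parse_rtp_header(sample_packet)
print(header["payload_type"], header["sequence"], hex(header["ssrc"]))
```

Grouping packets by SSRC and ordering them by sequence number is the basis for stream reconstruction; the payload type tells you which codec to decode with.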
Technical Deep-Dive
# List all SIP methods and response codes in the capture
# (field names can differ between Wireshark versions; list them with: tshark -G fields)
tshark -r capture.pcap -Y "sip" -T fields \
  -e frame.number -e frame.time_relative -e ip.src -e ip.dst \
  -e sip.CSeq.method -e sip.Status-Code -e sip.reason-phrase \
  -E header=y -E separator="|"
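# Extract SIP Digest Authentication fields for hashcat (tab-separated output)
# The response value is exposed as sip.auth.digest.response in current Wireshark builds;
# confirm the names on your version with: tshark -G fields | grep sip.auth
tshark -r capture.pcap -Y "sip.Authorization" -T fields \
  -e frame.time_relative -e ip.src \
  -e sip.auth.username -e sip.auth.realm \
  -e sip.auth.nonce -e sip.auth.uri \
  -e sip.auth.digest.response -e sip.CSeq.method \
  -E header=y -E separator=/t
# hashcat -m 11400 does not accept these fields as a plain colon-separated list.
# Each captured response must be rewritten as a single $sip$*...* line
# (see the hashcat example_hashes wiki for the exact layout).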
# Find all REGISTER requests (credential usage)
# sip.Method matches requests only; sip.CSeq.method would also match their responses
tshark -r capture.pcap -Y 'sip.Method == "REGISTER"' \
  -T fields -e frame.time_relative \
  -e ip.src -e sip.from.user -e sip.to.user
# Identify RTP streams and their parameters
# (the payload type field is rtp.p_type in current Wireshark builds; verify with tshark -G fields)
tshark -r capture.pcap -Y "rtp" -T fields \
  -e frame.time_relative -e ip.src -e ip.dst \
  -e udp.srcport -e udp.dstport \
  -e rtp.ssrc -e rtp.p_type -e rtp.seq \
  | sort -u -k6,6 | head -20   # column 6 is the SSRC; unique SSRC = unique RTP stream
# Extract SDP bodies from INVITE transactions to find negotiated media ports
# (matching on CSeq method returns both the offer in the INVITE and the answer in the 200 OK)
tshark -r capture.pcap -Y 'sip.CSeq.method == "INVITE" and sdp' \
  -T fields -e sip.from.user -e sip.to.user \
  -e sdp.connection_info.address -e sdp.media.port -e sdp.media.format
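# Hashcat SIP Digest format (mode 11400), one line per captured response:
# $sip$*server*client*username*realm*method*uri_scheme*uri_host*uri_port*nonce*cnonce*nc*qop*MD5*response
# Leave cnonce, nc, and qop empty when the challenge did not request qop.
# Example crack command:
# hashcat -m 11400 digest_hashes.txt wordlist.txt
#
# Building the hash string from tshark output (placeholders in <>), REGISTER without qop:
# $sip$*<server_ip>*<client_ip>*<username>*<realm>*REGISTER*sip*<uri_host>**<nonce>****MD5*<response>
# See hashcat example_hashes wiki for exact field ordering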
# Python: check a candidate password against captured SIP Digest fields
import hashlib

# Reconstruct what the SIP Digest hash covers (for verification).
# This covers the case where no qop was negotiated.
def sip_digest(user, realm, password, nonce, method, uri):
    ha1 = hashlib.md5(f"{user}:{realm}:{password}".encode()).hexdigest()
    ha2 = hashlib.md5(f"{method}:{uri}".encode()).hexdigest()
    response = hashlib.md5(f"{ha1}:{nonce}:{ha2}".encode()).hexdigest()
    return response

# Verify a captured response matches a candidate password (illustrative values)
user, realm, nonce, method, uri = "alice", "corp.com", "abc123", "REGISTER", "sip:corp.com"
candidate_password = "Password1"
computed = sip_digest(user, realm, candidate_password, nonce, method, uri)
print(f"Computed: {computed}")
# Compare with the response value captured in the Authorization header
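The single-candidate check extends naturally to a small wordlist, and to captures where the server requested qop=auth. The sketch below is an assumption-laden illustration rather than a replacement for hashcat: the function names are hypothetical, the "captured" values are synthetic, and qop=auth-int (which also hashes the message body) is not handled.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode()).hexdigest()


def digest_response(user, realm, password, nonce, method, uri,
                    qop="", nc="", cnonce=""):
    """Compute the Digest response, with or without qop=auth."""
    ha1 = md5_hex(f"{user}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    if qop:
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def find_password(captured: dict, candidates):
    """Return the first candidate whose response matches, else None."""
    fields = {k: v for k, v in captured.items() if k != "response"}
    target = captured["response"].lower()
    for password in candidates:
        if digest_response(password=password, **fields) == target:
            return password
    return None


# Synthetic "capture": the response is generated here from a known password
captured = {
    "user": "alice", "realm": "corp.com", "nonce": "abc123",
    "method": "REGISTER", "uri": "sip:corp.com",
    "qop": "auth", "nc": "00000001", "cnonce": "0a4f113b",
}
captured["response"] = digest_response(password="Password1", **captured)

print(find_password(captured, ["letmein", "Password1", "hunter2"]))
```

A pure-Python loop like this is only practical for confirming a suspected password or testing a handful of candidates; large wordlists belong in hashcat.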
Analytical Methodology
- Open the PCAP in Wireshark. Apply display filter `sip` to see all SIP signalling. Scan the Info column for REGISTER, INVITE, and 401 Unauthorized; these are the high-value frames.
- Navigate to Telephony → VoIP Calls. Wireshark populates a list of all detected calls with caller, callee, start time, duration, and state. Select a call and click Flow to see the SIP message ladder diagram.
- Select any call in VoIP Calls and click Player to extract and play back the associated RTP audio streams. This reconstructs the actual conversation as audio.
- Navigate to Telephony → RTP → RTP Streams to see all RTP streams with SSRC, codec, packet count, and jitter statistics. This is useful for identifying recording-quality streams vs degraded ones.
- Apply filter `sip.Authorization` to locate Digest Authentication frames. Expand the SIP Authorization header in the packet details pane to read the `username`, `realm`, `nonce`, `uri`, and `response` fields.
- Run tshark with the Digest extraction command above to export all authentication fields to TSV. Format them for hashcat mode 11400 and run offline against a wordlist.
- Navigate to Telephony → SIP Statistics for a summary of all SIP methods, response codes, and call counts — useful for rapid triage of large PCAPs.
- For each `INVITE`, follow the SDP body to identify the negotiated RTP ports. Correlate with the RTP stream list to confirm which UDP flows carry audio for which call.
- Document findings: for each call, record caller URI, callee URI, call start/end timestamps, duration, codec, whether authentication occurred, and any extracted credential material.
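The per-call record described in the last step can also be assembled outside Wireshark by grouping SIP messages on their Call-ID. The sketch below assumes rows already exported from tshark as `(time, call_id, method, status, cseq_method)` tuples, with `method` empty for responses and `status` empty for requests; that layout and the function name `summarise_dialogs` are hypothetical.

```python
from collections import defaultdict


def summarise_dialogs(rows):
    """Group SIP messages by Call-ID and derive a coarse call state."""
    dialogs = defaultdict(lambda: {"start": None, "answered": None,
                                   "ended": None, "state": "unknown"})
    for t, call_id, method, status, cseq_method in sorted(rows):
        d = dialogs[call_id]
        if method == "INVITE" and d["start"] is None:
            d["start"], d["state"] = t, "calling"
        elif status and cseq_method == "INVITE":
            code = int(status)
            if 200 <= code < 300 and d["answered"] is None:
                d["answered"], d["state"] = t, "answered"
            elif code >= 300 and d["answered"] is None:
                d["state"] = f"failed ({code})"
        elif method == "BYE":
            d["ended"], d["state"] = t, "completed"
    return dict(dialogs)


rows = [
    (0.0, "a1", "INVITE", "", "INVITE"),
    (0.1, "a1", "", "100", "INVITE"),
    (0.5, "a1", "", "180", "INVITE"),
    (2.0, "a1", "", "200", "INVITE"),
    (2.1, "a1", "ACK", "", "ACK"),
    (30.0, "a1", "BYE", "", "BYE"),
    (30.1, "a1", "", "200", "BYE"),
    (5.0, "b2", "INVITE", "", "INVITE"),
    (5.2, "b2", "", "486", "INVITE"),
]
for call_id, info in summarise_dialogs(rows).items():
    print(call_id, info)
```

Talk time for a completed call is `ended - answered`. A 401 or 407 on the first INVITE is reported as a failure until the authenticated retry is answered, so review "failed" entries against the authentication findings before recording them.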
Common Analytical Errors
- Treating UDP as unanalysable: SIP uses UDP by default; analysts accustomed to TCP streams may overlook UDP-based SIP. Wireshark's SIP dissector handles UDP transparently, so apply filter `sip` regardless of transport.
- Confusing SIP port 5060 with RTP: SIP signalling on 5060 and RTP audio on high ports are entirely separate flows. Filtering only on port 5060 captures call control but misses all audio content. RTP streams use the ports negotiated in SDP, which vary per call.
- Not extracting all Digest fields for hashcat: Hashcat mode 11400 requires multiple fields from the Authorization header. Missing the `method` or `uri` field renders the hash uncrackable. Always extract all fields as a unit.
- Ignoring re-INVITE and UPDATE messages: Call transfers, hold/resume operations, and codec renegotiations generate additional SIP messages mid-call. These contain updated SDP and may reference new RTP ports; failing to track them causes incomplete RTP stream mapping.
- Missing encrypted SIP on port 5061: TLS-encrypted SIP appears as TLS handshake traffic on port 5061. Without key material the SIP content is inaccessible, but the connection volume, source, and destination still indicate the number of calls and parties.
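Two of the errors above (separating signalling from media, and losing track of ports after a re-INVITE) come down to keeping every SDP-advertised endpoint per call and matching RTP destinations against that set. A minimal sketch follows, assuming the endpoints and flows were already extracted; the tuple layouts and the name `map_rtp_to_calls` are hypothetical, and NAT between the capture point and the endpoints would break the exact address match.

```python
def map_rtp_to_calls(sdp_endpoints, rtp_flows):
    """Map each RTP SSRC to the Call-ID whose SDP advertised its destination.

    sdp_endpoints: (call_id, ip, port) for every SDP seen, including re-INVITEs
    rtp_flows:     (dst_ip, dst_port, ssrc) for every RTP stream
    """
    by_endpoint = {(ip, port): call_id for call_id, ip, port in sdp_endpoints}
    # An SDP address/port is where that party receives media, so it is
    # compared with the destination of each RTP flow. None means unmatched.
    return {ssrc: by_endpoint.get((dst_ip, dst_port))
            for dst_ip, dst_port, ssrc in rtp_flows}


sdp_endpoints = [
    ("a1", "10.0.0.5", 49170),   # offer in the initial INVITE
    ("a1", "10.0.0.9", 30000),   # answer in the 200 OK
    ("a1", "10.0.0.5", 49180),   # new port announced in a re-INVITE
]
rtp_flows = [
    ("10.0.0.9", 30000, 0x1111),
    ("10.0.0.5", 49170, 0x2222),
    ("10.0.0.5", 49180, 0x3333),
    ("192.0.2.7", 40000, 0x4444),  # no SDP seen for this destination
]
for ssrc, call_id in map_rtp_to_calls(sdp_endpoints, rtp_flows).items():
    print(hex(ssrc), call_id)
```

Streams that map to `None` deserve attention: they may belong to a call whose signalling was encrypted or not captured, or they may indicate media sent to an unexpected destination.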
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0046 | Knowledge of intrusion detection systems and methodologies | Recognising VoIP fraud and eavesdropping signatures: anomalous REGISTER rates, Digest auth cracking potential, unexpected RTP destinations |
| K0093 | Knowledge of network protocols | Understanding SIP call flow (INVITE/200 OK/ACK/BYE), SDP media negotiation, SIP Digest Authentication, and RTP stream identification |
| K0221 | Knowledge of OSI model and network layers | Situating SIP at the application layer over UDP/TCP and RTP at the application layer over UDP, with both relying on layer 4 transport |
| S0046 | Skill in performing packet-level analysis | Using Wireshark Telephony menus, VoIP call graph, RTP stream player, and tshark SIP field extraction to reconstruct calls and extract credentials |
| T0023 | Collect intrusion artifacts for use in forensic analysis | Preserving SIP Digest authentication hashes, call metadata, and reconstructed RTP audio as forensic evidence of VoIP eavesdropping or fraud |
Further Reading
- Hacking Exposed Unified Communications & VoIP, 2nd Edition — Mark Collier & David Endler, Chapters 7–9: SIP Attacks and Forensics (McGraw-Hill)
- RFC 3261: SIP: Session Initiation Protocol — Rosenberg et al. (IETF)
- Wireshark Network Analysis, 2nd Edition — Laura Chappell, Chapter 20: VoIP Analysis (Protocol Analysis Institute)