Correlating SQLi, XSS, LFI and RCE Attack Patterns Across Web Server Access Logs

network_forensics_pcap Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Web server access logs are the primary evidence source in application-layer attack investigations. The 2021 Log4Shell exploitation wave was confirmed at thousands of organisations by retrospective analysis of access logs showing ${jndi:ldap://...} strings in User-Agent headers. SQL injection, XSS, LFI, and remote code execution attempts all leave characteristic signatures in the URI and query string recorded in Apache and Nginx access logs, and in the POST body where body logging is enabled. Analysts who can efficiently parse large log files, identify attack patterns, correlate attacker IPs with outcome codes, and reconstruct the attack timeline are essential to every web incident response.

Core Concept

Apache/Nginx Combined Log Format (the dominant log format) records one request per line:

192.168.1.5 - admin [15/Mar/2024:10:23:44 +0000] "GET /search?q=test HTTP/1.1" 200 1234 "http://example.com" "Mozilla/5.0"

Fields: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" where %h=client IP, %l=identd logname (usually -), %u=auth user, %t=time, %r=request line, %>s=final status code, %b=response bytes.
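As a rough illustration of the field layout above, the sketch below splits the sample line into named fields with a regular expression. The names COMBINED_RE and parse_combined are hypothetical, and the pattern assumes an unmodified Combined Log Format with no spaces in the %u field; a customised LogFormat would need a different pattern.

```python
import re

# Combined Log Format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Quoted fields allow backslash-escaped characters such as \" inside them.
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)")?'
)

def parse_combined(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The request line is normally "METHOD URI PROTOCOL"; malformed requests
    # (for example scanner garbage) may not split into three parts.
    parts = fields["request"].split(" ")
    fields["method"] = parts[0] if len(parts) == 3 else None
    fields["uri"] = parts[1] if len(parts) == 3 else fields["request"]
    # %b logs "-" when no response body bytes were sent.
    fields["bytes"] = 0 if fields["bytes"] == "-" else int(fields["bytes"])
    return fields

sample = ('192.168.1.5 - admin [15/Mar/2024:10:23:44 +0000] '
          '"GET /search?q=test HTTP/1.1" 200 1234 "http://example.com" "Mozilla/5.0"')
record = parse_combined(sample)
print(record["ip"], record["user"], record["method"], record["uri"], record["status"])
# prints: 192.168.1.5 admin GET /search?q=test 200
```

Matching the quoted request line as a single unit, rather than counting whitespace-separated fields, keeps the status and bytes fields aligned even when the request itself is malformed.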

Attack signatures visible in the request line:

SQL injection: UNION, SELECT, OR 1=1, ', --, ; DROP, 0x, SLEEP(, BENCHMARK( in URI or query string. URL-encoded variants: %27 ('), %20 (space), %2B (+).
XSS: <script>, alert(, onerror=, onload=, javascript:, <img src=x, %3Cscript%3E.
LFI (Local File Inclusion): ../, ..%2F, ....//, /etc/passwd, /proc/self, %2Fetc%2Fpasswd, php://filter.
RCE: cmd=, exec=, system=, passthru=, shell_exec=, ; id, | whoami, `id` (backtick command substitution), Log4Shell ${jndi:.

HTTP status codes reveal attack success: 200 (success — possible hit), 403 (forbidden — blocked), 404 (not found — probing), 500 (server error — may indicate injection or crash), 302 (redirect — may indicate authentication bypass).
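The status-code interpretation above can be expressed as a small lookup, shown here as a minimal sketch. The function name triage_status, the label wording, and the sample requests are illustrative assumptions; the labels are starting points for review, not verdicts.

```python
def triage_status(status):
    """Map an HTTP status code (str or int) to a triage label."""
    code = int(status)
    labels = {
        200: "possible hit",
        302: "redirect: check for authentication bypass",
        403: "blocked",
        404: "probing",
        500: "server error: possible injection or crash",
    }
    # Anything outside the five codes discussed above is left for manual review.
    return labels.get(code, "review manually")

# Hypothetical (status, uri) pairs already flagged by a pattern search.
flagged = [
    ("404", "/admin.php"),
    ("200", "/item?id=1%20UNION%20SELECT%201"),
    ("500", "/item?id=1%27"),
]
for status, uri in flagged:
    print(status, triage_status(status), uri)
```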

Technical Deep-Dive

# Identify SQL injection attempts in Apache access log
# (\s is a GNU grep extension; broad keywords such as "from" and "select" need manual review)
grep -iE "(union|select|from|where|having|drop|insert|update|delete|or(\s|%20|\+)+1=1|sleep\(|benchmark\(|0x[0-9a-f]+|%27|%2527)" access.log \
  | awk '{print $1, $7, $9}' \
  | sort | uniq -c | sort -rn | head -20

# Identify LFI attempts
grep -iE "(\.\./|\.\.%2[Ff]|%252[Ff]|etc/passwd|proc/self|php://filter|zip://)" access.log \
  | awk '{print $1, $7, $9}' | head -20

# Identify XSS attempts
grep -iE "(<script|alert\(|onerror=|onload=|javascript:|%3Cscript|%3c)" access.log \
  | awk '{print $1, $7, $9}' | head -20

# Identify RCE attempts (command injection)
# Single quotes stop the shell from expanding the backticks and ${...}
grep -iE '(cmd=|exec=|system=|passthru=|shell_exec=|\|\s*whoami|;\s*id|`id`|\$\{jndi:|%24%7Bjndi)' access.log \
  | awk '{print $1, $7, $9}' | head -20

# Find successful attacks (HTTP 200) from attacker IP
# An exact match on field 1 avoids treating the dots in the IP as regex wildcards
ATTACKER_IP="192.168.1.100"
awk -v ip="$ATTACKER_IP" '$1 == ip && $9 == "200" {print $7}' access.log \
  | sort | uniq -c | sort -rn

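# Timeline of attacker activity
# (access logs are already in approximately chronological order; a plain sort
#  would order dd/Mon/yyyy timestamps lexically and scramble multi-day timelines)
awk -v ip="$ATTACKER_IP" '$1 == ip {print $4, $7, $9}' access.log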

# awk: parse Combined Log Format fields by position
# Field positions: 1=IP 4=timestamp 6=method 7=URI 9=status 10=bytes
# (positions shift if the request line is malformed or contains raw spaces)
awk '
{
  ip = $1
  ts = substr($4, 2)       # strip leading "[" from timestamp
  method = substr($6, 2)   # strip leading quote
  uri = $7
  status = $9
  bytes = $10
  # awk has no /regex/i flag, so lowercase the URI before matching
  if (status ~ /^[245]/ && tolower(uri) ~ /union|select|0x[0-9a-f]/) {
    print ip, ts, status, uri
  }
}
' access.log | head -30

# GoAccess: interactive log analysis (generate HTML report)
# goaccess access.log --log-format=COMBINED -o report.html

# Count attack types per source IP
# (arrays of arrays require gawk 4.0 or later)
gawk '
{
  ip = $1; uri = tolower($7)
  if (uri ~ /union|select/) attacks[ip]["sqli"]++
  if (uri ~ /\.\.\//)       attacks[ip]["lfi"]++
  if (uri ~ /script/)       attacks[ip]["xss"]++
  if (uri ~ /cmd=|exec=/)   attacks[ip]["rce"]++
}
END {
  for (ip in attacks) {
    printf "%s:", ip
    for (t in attacks[ip]) printf " %s=%d", t, attacks[ip][t]
    printf "\n"
  }
}
' access.log

# Splunk SPL: web attack detection and triage
# (src_ip assumes a CIM-style field alias; the stock access_combined
#  extraction usually names the client address field clientip)
# index=web sourcetype=access_combined
# | rex field=_raw "\"(?P<method>GET|POST|PUT|DELETE) (?P<uri>\S+) HTTP"
# | eval attack_type=case(
#     match(uri, "(?i)UNION|SELECT|0x[0-9a-f]+|sleep\("), "SQLi",
#     match(uri, "(?i)\.\./|etc/passwd|php://"), "LFI",
#     match(uri, "(?i)<script|alert\(|onerror="), "XSS",
#     match(uri, "(?i)cmd=|exec=|jndi:"), "RCE",
#     true(), "Other"
#   )
# | where attack_type != "Other"
# | stats count dc(uri) AS unique_uris values(uri) AS sample_uris
#          values(status) AS statuses
#   BY src_ip attack_type
# | sort -count

#!/usr/bin/env python3
"""
Parse Apache Combined Log Format, classify attack attempts,
and output a timeline with success/failure correlation.
"""
import re, sys
from collections import defaultdict
from datetime import datetime

LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<uri>\S+) \S+" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

# The URI field never contains raw whitespace, so "OR 1=1" is matched
# with %20 or + as the separator as well as \s.
PATTERNS = {
    "SQLi": re.compile(r"UNION|SELECT|OR(?:\s|%20|\+)+1=1|sleep\(|benchmark\(|0x[0-9a-fA-F]+|%27|--", re.I),
    "LFI":  re.compile(r"\.\./|\.\.%2[Ff]|etc/passwd|proc/self|php://filter", re.I),
    "XSS":  re.compile(r"<script|alert\(|onerror=|onload=|javascript:|%3[Cc]script", re.I),
    "RCE":  re.compile(r"cmd=|exec=|system=|passthru=|\|\s*whoami|;\s*id|\$\{jndi:", re.I),
}

findings = defaultdict(list)

for line in sys.stdin:
    m = LOG_RE.match(line)
    if not m: continue
    ip, ts_str, method, uri, status, _ = (
        m.group(g) for g in ("ip","time","method","uri","status","bytes"))

    try:
        ts = datetime.strptime(ts_str[:20], "%d/%b/%Y:%H:%M:%S")
    except Exception:
        ts = None

    for attack_type, pattern in PATTERNS.items():
        if pattern.search(uri):
            findings[ip].append({
                "ts": ts, "type": attack_type,
                "uri": uri[:120], "status": status
            })

for ip, events in sorted(findings.items(), key=lambda x: -len(x[1])):
    successes = [e for e in events if e["status"] == "200"]
    print(f"\n[{ip}] {len(events)} attacks, {len(successes)} returned 200")
    for e in sorted(events, key=lambda x: x["ts"] or datetime.min):
        marker = "HIT " if e["status"] == "200" else "    "
        print(f"  {marker}{e['ts']} [{e['type']}] {e['status']} {e['uri']}")

Analytical Methodology

Determine the log format and time range. Identify the total number of log lines, unique source IPs, and request volume. Tools like wc -l, awk, or GoAccess provide a quick overview before deep analysis.
Run pattern searches for each attack category (SQLi, LFI, XSS, RCE) using grep with case-insensitive and URL-encoded variant matching. Collect the set of source IPs that generated each attack type.
For each attacker IP, extract all their requests and sort by timestamp to reconstruct the attack timeline. Identify the first suspicious request (reconnaissance phase) and the last (exfiltration or persistence).
Correlate each attack request with its HTTP status code: 200 responses to injection attempts indicate the server processed the malicious input (possible success); 404/403 indicate failed probing; 500 errors may indicate a crash caused by the injection.
Pay special attention to POST requests: SQL injection and RCE attempts in POST bodies may not appear fully in the URI field, depending on log configuration. If the log does not capture POST bodies, note this limitation in the report.
Identify scanning sequences: rapid sequential requests to incremental paths (/admin, /admin.php, /admin.html) or sequential parameter values (id=1, id=2, id=3) indicate automated scanning. These precede targeted exploitation.
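Extract the User-Agent strings used by the attacker. Many automated tools (sqlmap and Nikto, for example) send distinctive default UAs unless the operator overrides them, so a generic browser UA does not rule out tooling. Unusual or absent UAs from the same IP as attack traffic corroborate tool usage.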
Build the final incident summary: attacker IP(s), attack types attempted, first and last seen timestamps, successful requests (200 responses to attack patterns), estimated data exfiltrated (bytes field for suspicious 200 responses), and recommended IOCs for blocking.
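The final summary step above can be sketched as a simple aggregation over already-classified events. This is an illustrative outline only: the function name summarise, the event dictionary keys, and the sample events (using a documentation-range IP) are assumptions, and the byte total is just the sum of the bytes field for 200 responses, which is an upper bound on exfiltrated data rather than a measurement.

```python
from collections import defaultdict
from datetime import datetime

def summarise(events):
    """Group classified events by source IP and compute summary fields."""
    by_ip = defaultdict(list)
    for e in events:
        by_ip[e["ip"]].append(e)
    summary = {}
    for ip, evs in by_ip.items():
        ordered = sorted(evs, key=lambda e: e["ts"])
        hits = [e for e in ordered if e["status"] == "200"]
        summary[ip] = {
            "attack_types": sorted({e["type"] for e in ordered}),
            "first_seen": ordered[0]["ts"],
            "last_seen": ordered[-1]["ts"],
            "attempts": len(ordered),
            "hits": len(hits),
            "hit_bytes": sum(e["bytes"] for e in hits),
        }
    return summary

# Hypothetical events, as a classifier might emit them.
events = [
    {"ip": "203.0.113.7", "ts": datetime(2024, 3, 15, 10, 23, 44),
     "type": "SQLi", "status": "200", "bytes": 5120},
    {"ip": "203.0.113.7", "ts": datetime(2024, 3, 15, 10, 21, 2),
     "type": "LFI", "status": "404", "bytes": 196},
    {"ip": "203.0.113.7", "ts": datetime(2024, 3, 15, 10, 25, 9),
     "type": "SQLi", "status": "200", "bytes": 20480},
]
summary = summarise(events)
for ip, s in summary.items():
    print(ip, s["attack_types"], s["first_seen"], s["last_seen"],
          s["attempts"], s["hits"], s["hit_bytes"])
```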

Common Analytical Errors

Relying only on URL patterns without checking POST bodies: Many web attacks, especially SQLi and RCE, submit payloads in POST request bodies. Standard access logs only record the URI, not the POST body. If the log does not include body content, acknowledge the gap and recommend application-level logging.
False positives from URL-encoded legitimate content: Search terms, filenames, and base64-encoded parameters can accidentally match attack patterns. Always review raw log lines for flagged entries — a search query containing "selection" should not be classified as SQLi.
Missing URL double-encoding: Attackers encode payloads twice (%252f for /) to bypass WAF and log inspection. Add double-encoded variants to your grep patterns, and use a URL decoder on suspicious URIs before classifying.
Ignoring non-200 status codes for LFI: A successful LFI often returns 200, but the application may also return 500 if the included file causes a PHP error. Search for 500 responses alongside 200 responses when investigating LFI attempts — both indicate the injection reached the application layer.
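Two of the errors above, double encoding and keyword false positives, can be reduced by decoding before matching and by anchoring keywords on word boundaries. The sketch below is one possible approach, not a complete detector; fully_decode, SQL_KEYWORD, and the example URIs are hypothetical names and data.

```python
import re
from urllib.parse import unquote

def fully_decode(uri, max_rounds=5):
    """Percent-decode repeatedly until the string stops changing."""
    for _ in range(max_rounds):
        decoded = unquote(uri)
        if decoded == uri:
            break
        uri = decoded
    return uri

# \b requires a word boundary, so "selection" does not match "select".
SQL_KEYWORD = re.compile(r"\b(union|select)\b", re.I)

print(fully_decode("/view?file=..%252f..%252fetc%252fpasswd"))
# prints: /view?file=../../etc/passwd

for uri in ("/search?q=selection", "/item?id=1%2520UNION%2520SELECT%25201"):
    print(uri, bool(SQL_KEYWORD.search(fully_decode(uri))))
```

Note that unquote leaves + characters untouched; unquote_plus from the same module converts them to spaces if form-style encoding is expected. The max_rounds cap is an arbitrary safety limit.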

NICE Framework Alignment

Code	Knowledge/Skill/Task Statement	How This Card Develops It
K0046	Knowledge of intrusion detection methodologies	Web attack pattern detection in access logs is a core WAF and SIEM log-based detection capability
K0145	Knowledge of security event correlation tools	Splunk SPL `rex` and `stats` for correlating attack patterns with outcome codes across millions of log lines
K0187	Knowledge of file type abuse by adversaries	LFI attacks target application-readable files; RCE may deliver malicious scripts or binaries via web parameters
S0047	Skill in preserving evidence integrity	Access logs must be exported and checksummed before analysis; originals preserved for chain-of-custody
T0049	Decrypt seized data / analyze forensic artifacts	URL-decoding double-encoded attack payloads and reconstructing attacker intent from obfuscated log entries
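
For the evidence-integrity row (S0047), a checksum of the exported log can be recorded before analysis begins. The following is a minimal sketch using SHA-256 from the Python standard library; the function name sha256_file and the chunk size are illustrative choices, and the hash algorithm your organisation requires may differ.

```python
import hashlib

def sha256_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Chunked reads keep memory use flat on multi-gigabyte logs.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling sha256_file on the exported copy of access.log before analysis and again afterwards, and recording both digests in the case notes, shows that the working copy was not altered.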