Detecting Log Injection Attacks Through CRLF Forensics and Entry Authenticity Analysis

web_injection_logic Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Log injection is the offensive counterpart to log deletion: rather than removing evidence, the attacker inserts fake log entries to mislead investigators, create alibi events, or pollute anomaly-detection baselines. A classic attack embeds carriage-return and line-feed characters (\r\n, or a bare \n) in a user-controlled input field such as a username, HTTP header, or form parameter. The logging system then splits a single event into two or more entries, and the injected portion appears as a legitimate log line. An analyst who does not recognise injected entries may reconstruct a false incident timeline or exonerate a malicious account.

Core Concept

Log injection exploits the fact that text-based log formats delimit records with newline characters. If an application logs user-supplied input without sanitisation, an attacker can craft input containing \n or \r\n to insert arbitrary content into the log stream.

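Example: a web application logs failed logins as:

2024-03-15 10:23:44 AUTH_FAIL user=admin src=192.168.1.5

If the username field is unsanitised, an attacker who submits the username

admin\n2024-03-15 10:23:45 AUTH_SUCCESS user=root src=10.0.0.1

(where \n is a literal line feed) causes the log to contain two entries: the real failure, and a fabricated success for root from an internal IP. Because the logger still appends the genuine src field after the injected text, the real entry is cut off after user=admin and the fabricated entry ends with a second src= field, unless the attacker pads the payload to absorb it.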
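The mechanism can be reproduced in a few lines of Python. This is a minimal sketch, not any particular application's logger: the function names format_auth_fail_unsafe, format_auth_fail_safe, and neutralise are hypothetical, and the record layout simply mirrors the example above.

```python
def format_auth_fail_unsafe(ts: str, user: str, src: str) -> str:
    """Build the record by naive string interpolation (vulnerable)."""
    return f"{ts} AUTH_FAIL user={user} src={src}\n"


def neutralise(value: str) -> str:
    """Escape backslash, CR and LF so a field cannot terminate a record."""
    return (value.replace("\\", "\\\\")
                 .replace("\r", "\\r")
                 .replace("\n", "\\n"))


def format_auth_fail_safe(ts: str, user: str, src: str) -> str:
    """Same layout, but every user-influenced field is neutralised first."""
    return f"{ts} AUTH_FAIL user={neutralise(user)} src={neutralise(src)}\n"


# Attacker-supplied username containing a literal line feed
payload = "admin\n2024-03-15 10:23:45 AUTH_SUCCESS user=root src=10.0.0.1"

unsafe = format_auth_fail_unsafe("2024-03-15 10:23:44", payload, "192.168.1.5")
safe = format_auth_fail_safe("2024-03-15 10:23:44", payload, "192.168.1.5")

for record in unsafe.splitlines():
    print("UNSAFE:", record)   # the single event appears as two records
for record in safe.splitlines():
    print("SAFE:  ", record)   # one record; the line feed is shown as \n
```

Escaping only prevents the record from being split. The neutralised username still contains spaces, so a whitespace-delimited parser would see extra tokens; quoting fields or using a structured format is likely the sturdier choice.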

Indicators of injected entries:
- Source IP anomalies: injected entries claim internal IPs (127.0.0.1, RFC-1918) that are impossible from the external-facing service in question
- Out-of-sequence timestamps: injected entries may have timestamps that are identical to, slightly before, or slightly after the triggering real event
- Improbable event combinations: a failed login immediately followed by a success from the same source within the same second
- Field count mismatches: injected content may not perfectly replicate the log format, producing entries with wrong field counts or missing required fields
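The "improbable event combination" indicator can be checked mechanically. The sketch below assumes the example format used in this lesson and flags any AUTH_SUCCESS that directly follows an AUTH_FAIL within one second. The function name fail_then_success and the sample lines are illustrative. It deliberately does not compare the src field, because a split record may have lost it; that is a simplification, and on a busy server such pairs can also occur legitimately.

```python
import re
from datetime import datetime, timedelta

ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<event>AUTH_FAIL|AUTH_SUCCESS) "
    r"user=(?P<user>\S+)"
)


def fail_then_success(lines, window=timedelta(seconds=1)):
    """Yield (line_no, text) for each AUTH_SUCCESS that directly follows
    an AUTH_FAIL whose timestamp is within `window`."""
    prev = None  # (timestamp, event) of the previous parsable line
    for line_no, line in enumerate(lines, 1):
        m = ENTRY.match(line)
        if not m:
            prev = None  # unparsable line breaks the adjacency
            continue
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        event = m["event"]
        if (prev and prev[1] == "AUTH_FAIL" and event == "AUTH_SUCCESS"
                and abs(ts - prev[0]) <= window):
            yield line_no, line.rstrip("\n")
        prev = (ts, event)


# Illustrative sample: lines 1-2 are what the injection example produces
sample = [
    "2024-03-15 10:23:44 AUTH_FAIL user=admin",
    "2024-03-15 10:23:45 AUTH_SUCCESS user=root src=10.0.0.1 src=192.168.1.5",
    "2024-03-15 10:30:02 AUTH_FAIL user=alice src=203.0.113.7",
    "2024-03-15 10:31:40 AUTH_SUCCESS user=alice src=203.0.113.7",
]
suspects = list(fail_then_success(sample))
for line_no, text in suspects:
    print(f"[L{line_no}] FAIL then SUCCESS within 1s: {text}")
```

A hit is a lead for manual review, not proof of injection.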

Technical Deep-Dive

# Detect entries with anomalous source IPs in an external-facing service log
# (should only contain external IPs; flag RFC-1918 and loopback).
# Assumes the last field is src=<ip>, as in the example format above.
awk '{sub(/^src=/, "", $NF); print $NF}' access.log \
  | grep -E "^(127\.|10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)" \
  | sort | uniq -c | sort -rn

# Detect duplicate timestamps (two events in the exact same second: a possible injection indicator)
awk '{print $1, $2}' auth.log \
  | sort | uniq -d | head -20

# Count fields per line; injected entries may have the wrong field count
awk '{print NF}' auth.log \
  | sort -n | uniq -c | sort -rn | head -10
# Most lines should have the same field count; outliers are suspicious.
# List the outliers with line numbers (5 = field count of the example format):
awk 'NF != 5 {print NR": "$0}' auth.log

# Parse structured log and detect entries inconsistent with expected format
import re
from datetime import datetime

PATTERN = re.compile(
    r'(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) '
    r'(?P<level>\w+) '
    r'user=(?P<user>\S+) '
    r'src=(?P<src>\S+)'
)

# Loopback (127/8) plus the RFC 1918 private ranges
RFC1918 = re.compile(
    r'^(127\.|10\.|192\.168\.|172\.(1[6-9]|2[0-9]|3[01])\.)')

prev_ts = None
with open("auth.log") as log:
    for lineno, line in enumerate(log, 1):
        # fullmatch also rejects trailing tokens such as a second src= field
        m = PATTERN.fullmatch(line.strip())
        if not m:
            print(f"[L{lineno}] FORMAT MISMATCH: {line.rstrip()}")
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        src = m.group("src")
        if RFC1918.match(src):
            print(f"[L{lineno}] INTERNAL SRC in external log: {src} ({m.group('ts')})")
        if prev_ts and ts < prev_ts:
            print(f"[L{lineno}] TIMESTAMP REGRESSION: {ts} < {prev_ts}")
        prev_ts = ts

# Detect carriage returns left behind by CRLF injection. grep matches line by
# line with the newline stripped, so search for \r rather than the \r\n pair.
grep -nP '\r' auth.log && echo "CR found: possible injection artifact"
# Or:
cat -A auth.log | grep -F '^M'   # cat -A renders \r as ^M
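The same check can be done in Python by reading the log as bytes, which avoids the universal-newline translation that text mode applies and that would otherwise hide a stray \r. This is a minimal sketch; the function name find_carriage_returns and the sample data are illustrative.

```python
def find_carriage_returns(data: bytes):
    """Return (line_no, raw_line) for every LF-delimited line that
    contains a carriage-return byte."""
    hits = []
    for line_no, raw in enumerate(data.split(b"\n"), 1):
        if b"\r" in raw:
            hits.append((line_no, raw))
    return hits


# In-memory sample; for a real file use: data = open(path, "rb").read()
sample = (
    b"2024-03-15 10:23:44 AUTH_FAIL user=admin\r\n"
    b"2024-03-15 10:23:45 AUTH_SUCCESS user=root src=10.0.0.1 src=192.168.1.5\n"
    b"2024-03-15 10:24:10 AUTH_FAIL user=bob src=198.51.100.9\n"
)
hits = find_carriage_returns(sample)
for line_no, raw in hits:
    print(f"[L{line_no}] CR byte present: {raw!r}")
```

If the logger itself terminates every record with CRLF, every line will be reported; in that case only lines where the \r appears somewhere other than the final byte are of interest.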

Analytical Methodology

Determine the expected log format for the service: field count, field types, valid value ranges, and which fields contain user-supplied data. Consult application documentation or a known-good log sample.
Parse every entry against the expected format. Flag entries that do not conform: wrong field count, unexpected field values, or fields containing characters invalid for their type.
Examine source IP fields for RFC-1918 or loopback addresses in logs from externally facing services. Unless a reverse proxy or load balancer legitimately presents an internal address, external requests cannot originate from these ranges.
Check for timestamp regressions or exact duplicates. Injected entries inserted mid-stream often share a timestamp with the triggering real event or have a timestamp slightly earlier than the preceding genuine entry.
Look for improbable event sequences: an authentication failure and success from the same source within the same second, or a logout event before the corresponding login event.
If the log system supports structured output (JSON, CEF), validate each entry against the schema. Injected newlines that create malformed JSON are detectable with a strict JSON parser.
Cross-reference suspicious entries against network-level logs (firewall, proxy). An injected "AUTH_SUCCESS" entry from 10.0.0.1 is strongly contradicted if firewall logs show no connection from that IP at that time.
Document all suspected injected entries: line number, injected content, detection method, and the real entry it was embedded in. This is critical for reconstructing the true timeline.
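The cross-referencing step can be sketched as a simple join between parsed authentication events and network-level connection records. Everything here is an assumption made for illustration: the function name uncorroborated, the tuple layouts, the 5-second tolerance, and the sample firewall records do not come from any real product's export format.

```python
from datetime import datetime, timedelta


def uncorroborated(auth_events, connections, tolerance=timedelta(seconds=5)):
    """Return auth events with no connection record from the same source IP
    within `tolerance` of the event's timestamp."""
    by_src = {}
    for ts, src in connections:
        by_src.setdefault(src, []).append(ts)
    missing = []
    for ts, src, text in auth_events:
        seen = by_src.get(src, [])
        if not any(abs(ts - c) <= tolerance for c in seen):
            missing.append((ts, src, text))
    return missing


t = datetime.fromisoformat
auth_events = [  # (timestamp, claimed source IP, remainder of the entry)
    (t("2024-03-15 10:23:45"), "10.0.0.1", "AUTH_SUCCESS user=root"),
    (t("2024-03-15 10:24:10"), "198.51.100.9", "AUTH_FAIL user=bob"),
]
connections = [  # hypothetical firewall records: (timestamp, source IP)
    (t("2024-03-15 10:23:44"), "192.168.1.5"),
    (t("2024-03-15 10:24:09"), "198.51.100.9"),
]
flagged = uncorroborated(auth_events, connections)
for ts, src, text in flagged:
    print(f"no matching connection: {ts} {src} {text}")
```

A missing match is corroborating evidence rather than proof: clock skew between devices, or traffic that never crosses the firewall, can produce the same result.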

Common Analytical Errors

Treating all internal-IP log entries as injected: Some applications legitimately log internal addresses, for example a reverse proxy or load-balancer hop when the original client address is carried in X-Forwarded-For. Understand the application architecture before flagging internal IPs as injection.
Missing multi-line injection: An attacker who injects multiple newlines can create several fake entries from one real request. Do not stop at the first anomalous entry — scan the entire region around each suspicious entry.
Overlooking JSON-escaped injection: If the log format is JSON and the application correctly JSON-escapes user input, a literal newline becomes \n and does not split entries. In this case injection appears differently: look for unexpectedly long field values containing encoded newlines.
Confusing injection with log corruption: Log files can be corrupted by disk errors, log rotation races, or encoding issues, producing malformed entries that superficially resemble injection. Corroborate with other evidence before attributing to an attacker.
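For JSON-lines logs, the JSON-escaped case above can be surfaced by decoding each record and inspecting the string values. The sketch below uses Python's standard json module; the function name suspicious_json_fields, the max_len threshold of 256, and the sample records are assumptions chosen for illustration, and only top-level string fields are inspected.

```python
import json


def suspicious_json_fields(lines, max_len=256):
    """Yield (line_no, key, reason) for malformed records and for string
    values that contain CR/LF or exceed `max_len` characters."""
    for line_no, line in enumerate(lines, 1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # A raw newline that split a record leaves invalid JSON behind
            yield line_no, None, "not valid JSON"
            continue
        if not isinstance(record, dict):
            yield line_no, None, "not a JSON object"
            continue
        for key, value in record.items():
            if not isinstance(value, str):
                continue
            if "\n" in value or "\r" in value:
                yield line_no, key, "embedded newline"
            elif len(value) > max_len:
                yield line_no, key, "unusually long value"


sample = [
    '{"ts": "2024-03-15 10:23:44", "event": "AUTH_FAIL", "user": "admin"}',
    '{"ts": "2024-03-15 10:23:50", "event": "AUTH_FAIL", '
    '"user": "admin\\n2024-03-15 10:23:45 AUTH_SUCCESS user=root"}',
]
findings = list(suspicious_json_fields(sample))
for line_no, key, reason in findings:
    print(f"[L{line_no}] {key}: {reason}")
```

A finding here shows an attempted injection that the escaping defeated, which is still worth recording because it identifies the request and the source that made the attempt.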

NICE Framework Alignment

Code	Work Role Knowledge / Skill / Task	Relevance
K0046	Knowledge of intrusion detection methodologies	Log injection detection requires format-aware parsing beyond simple keyword matching
K0145	Knowledge of security event correlation tools	Cross-referencing suspicious log entries against network-level logs is a SIEM correlation technique
K0187	Knowledge of file type abuse by adversaries	Adversaries craft malicious input specifically to manipulate log file structure
S0047	Skill in preserving evidence integrity	Identifying injected entries prevents false evidence from contaminating an investigation
T0049	Decrypt seized data / analyze forensic artifacts	Parsing and validating structured log formats to distinguish authentic from fabricated entries