Verifying Cryptographic Log Integrity by Detecting SHA-256 Hash Chain Breaks

web_injection_logic Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Standard syslog provides no integrity protection: any entry can be deleted or modified without leaving an obvious mark. Hash-chain logging addresses this by embedding a cryptographic hash of each entry into the next, forming a tamper-evident chain analogous to blockchain or certificate transparency logs. Systems configured with syslog-ng's $(hash) template macro or custom hash-chain implementations provide investigators with a verifiable log record. When an attacker deletes or modifies entries, the chain breaks at the point of tampering — pinpointing exactly which records were altered.

Core Concept

In hash-chain logging, each log entry includes a chain hash field computed as SHA-256(previous_chain_hash + current_log_content). The first entry uses a fixed seed (e.g., all-zero hash or a per-boot nonce). To verify integrity, an analyst recomputes the chain hash for every entry and compares it to the stored value. The first entry where the computed hash diverges from the stored hash marks the tamper point.

Properties of a valid chain: - Every entry's stored hash equals SHA-256(prev_hash || entry_content) - A deletion shifts all subsequent computed hashes, producing a break at the deleted entry's successor - An insertion also breaks the chain at the inserted position - Modification of any field in any entry breaks the chain at that exact entry

This makes hash-chain logs append-only verifiable: once the chain is intact, any subsequent modification is detectable. The challenge artifact typically provides a log file where most entries are valid but a small number have been deleted or modified, and the task is to write a verifier and report the first broken link.

Technical Deep-Dive

#!/usr/bin/env python3
"""
Hash-chain log verifier.
Expected log format (tab-separated):
  SEQUENCE_NUM  TIMESTAMP  LEVEL  MESSAGE  CHAIN_HASH
where CHAIN_HASH = sha256(prev_hash + SEQ + TIMESTAMP + LEVEL + MESSAGE)
"""
import hashlib, sys, csv

SEED = "0" * 64   # initial previous hash (64 hex chars = 256-bit zero)

def compute_hash(prev_hash: str, seq: str, ts: str, level: str, msg: str) -> str:
    data = prev_hash + seq + ts + level + msg
    return hashlib.sha256(data.encode()).hexdigest()

def verify_chain(logfile: str) -> None:
    prev_hash = SEED
    with open(logfile, newline="") as fh:
        reader = csv.reader(fh, delimiter=" ")
        for row in reader:
            if not row or row[0].startswith("#"):
                continue
            seq, ts, level, msg, stored_hash = row[:5]
            expected = compute_hash(prev_hash, seq, ts, level, msg)
            if expected != stored_hash:
                print(f"CHAIN BREAK at seq={seq} ts={ts}")
                print(f"  Expected : {expected}")
                print(f"  Stored   : {stored_hash}")
                print(f"  Previous hash used: {prev_hash}")
                # continue verifying using stored hash to find further breaks
                # (use expected to propagate what the chain SHOULD have been)
            prev_hash = stored_hash  # follow stored chain to find all breaks

if __name__ == "__main__":
    verify_chain(sys.argv[1])

# Quick shell verification for simple newline-hashed logs
# where each line ends with :<sha256_of_previous_line_hash+current_line_content>
prev="0000000000000000000000000000000000000000000000000000000000000000"
line_num=0
while IFS= read -r line; do
    line_num=$((line_num + 1))
    # Extract stored hash (last 64 chars after final colon)
    stored="${line##*:}"
    content="${line%:*}"
    computed=$(printf "%s%s" "$prev" "$content" | sha256sum | awk '''{print $1}''')
    if [ "$computed" != "$stored" ]; then
        echo "BREAK at line $line_num"
    fi
    prev="$stored"
done < chain.log

Analytical Methodology

Read the challenge log file. Identify the hash-chain format: locate the field containing the chain hash (often the last column or a dedicated chain_hash: field), and determine the hash input construction (which fields are concatenated, in which order, with what separator).
Identify the seed value: the initial previous hash for the first entry. Challenges typically document this, or it appears as a header comment in the log. Common seeds: all-zero 64-char hex string, or the SHA-256 of the literal string "GENESIS".
Implement a verifier (Python is fastest for challenges) that iterates entries in order, recomputes the expected hash using the documented formula, and compares to the stored value.
Run the verifier. Record the sequence number and timestamp of the first broken link — this is the tampered entry or the entry immediately after a deleted entry.
Determine whether the break indicates a deletion (entries jump in sequence number) or a modification (sequence numbers are continuous but hash fails). Deletion shifts all subsequent hashes; modification breaks exactly at the altered entry.
If modifications are detected, identify which fields were changed by comparing the failed entry against surrounding entries for logical consistency (e.g., timestamp regression, impossible IP address).
Report: number of intact entries, first break location, tamper type (deletion/modification), and the content of the broken entry.

Common Analytical Errors

Using the stored hash as prev_hash after a break: After a break, you must decide whether to continue verification using the stored hash (to find further breaks in the remaining chain) or the computed hash (to model what the chain would look like with tampering reverted). Use the stored hash to traverse the actual file; document that subsequent entries may reflect a re-anchored chain.
Ignoring encoding: Hashes may be stored as lowercase hex, uppercase hex, or base64. Normalise before comparison to avoid false positives from encoding mismatches.
Overlooking the hash input separator: Some implementations concatenate fields with no separator; others use |, , or . Incorrect separator choice will produce a false break at entry 1. Verify the construction formula carefully before coding.
Not checking for extra whitespace: Log entries written by different tools may include trailing spaces, CRLF line endings, or BOM characters. Strip and normalise before hashing.

NICE Framework Alignment

Code	Work Role Knowledge / Skill / Task	Relevance
K0046	Knowledge of intrusion detection methodologies	Hash-chain verification is a tamper-detection methodology at the log infrastructure level
K0145	Knowledge of security event correlation tools	Sequence-number and hash-field correlation across a log file is a SIEM-adjacent analytic technique
K0187	Knowledge of file type abuse by adversaries	Attackers may manipulate log file formats to insert plausible-looking fake entries
S0047	Skill in preserving evidence integrity	Hash-chain logs provide mathematical proof of integrity for court-admissible log evidence
T0049	Decrypt seized data / analyze forensic artifacts	Cryptographic hash verification is a direct application of artifact analysis to log files