Verifying Cryptographic Log Integrity by Detecting SHA-256 Hash Chain Breaks
Theory
Why This Matters
Standard syslog provides no integrity protection: any entry can be deleted or modified without leaving an obvious mark. Hash-chain logging addresses this by embedding a cryptographic hash of each entry into the next, forming a tamper-evident chain analogous to a blockchain or a Certificate Transparency log. Systems that chain entries with syslog-ng's hashing template functions (for example $(sha256)) or with a custom hash-chain implementation give investigators a verifiable log record. When an attacker deletes or modifies entries, verification fails at the point of tampering, which localizes the first altered or missing record.
Core Concept
In hash-chain logging, each log entry includes a chain hash field computed as SHA-256(previous_chain_hash + current_log_content). The first entry uses a fixed seed (e.g., all-zero hash or a per-boot nonce). To verify integrity, an analyst recomputes the chain hash for every entry and compares it to the stored value. The first entry where the computed hash diverges from the stored hash marks the tamper point.
Properties of a valid chain:
- Every entry's stored hash equals SHA-256(prev_hash || entry_content)
- A deletion shifts all subsequent computed hashes, producing a break at the deleted entry's successor
- An insertion also breaks the chain at the inserted position
- Modification of any field in any entry breaks the chain at that exact entry
This makes hash-chain logs append-only verifiable: once an entry has been chained, altering or removing it later is detectable unless every subsequent hash is recomputed as well. The challenge artifact typically provides a log file where most entries are valid but a small number have been deleted or modified, and the task is to write a verifier and report the first broken link.
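The chain construction and the effect of tampering can be illustrated with a minimal sketch. It assumes an all-zero seed and plain concatenation with no separator; the helper names chain_hash, build_chain, and first_break are illustrative and do not come from any logging tool.

```python
import hashlib

SEED = "0" * 64  # assumed seed: 256-bit zero, hex-encoded


def chain_hash(prev_hash: str, content: str) -> str:
    """SHA-256 over the previous chain hash followed by the entry content."""
    return hashlib.sha256((prev_hash + content).encode("utf-8")).hexdigest()


def build_chain(entries):
    """Return (content, chain_hash) pairs for a list of log lines."""
    chained, prev = [], SEED
    for content in entries:
        prev = chain_hash(prev, content)
        chained.append((content, prev))
    return chained


def first_break(chained):
    """Return the index of the first entry whose stored hash fails, or None."""
    prev = SEED
    for i, (content, stored) in enumerate(chained):
        if chain_hash(prev, content) != stored:
            return i
        prev = stored  # follow the stored chain
    return None


log = build_chain(["boot", "login alice", "sudo su", "logout alice"])
intact = first_break(log)

modified = list(log)
modified[2] = ("sudo ls", log[2][1])  # content changed, stored hash left as-is
deleted = log[:1] + log[2:]           # entry at index 1 removed

print(intact, first_break(modified), first_break(deleted))  # prints: None 2 1
```

The modified copy breaks at the altered entry itself, while the copy with a deletion breaks at the entry that now follows the gap.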
Technical Deep-Dive
#!/usr/bin/env python3
"""
Hash-chain log verifier.

Expected log format (tab-separated):
    SEQUENCE_NUM TIMESTAMP LEVEL MESSAGE CHAIN_HASH
where CHAIN_HASH = sha256(prev_hash + SEQ + TIMESTAMP + LEVEL + MESSAGE)
"""
import csv
import hashlib
import sys

SEED = "0" * 64  # initial previous hash (64 hex chars = 256-bit zero)


def compute_hash(prev_hash: str, seq: str, ts: str, level: str, msg: str) -> str:
    # Fields are concatenated with no separator, matching the formula above.
    data = prev_hash + seq + ts + level + msg
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def verify_chain(logfile: str) -> None:
    prev_hash = SEED
    with open(logfile, newline="", encoding="utf-8") as fh:
        # Tab-delimited; QUOTE_NONE keeps quote characters in MESSAGE untouched.
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if not row or row[0].startswith("#"):
                continue  # skip blank lines and header comments
            if len(row) < 5:
                print(f"MALFORMED entry (expected 5 fields): {row}")
                continue
            seq, ts, level, msg, stored_hash = row[:5]
            stored_hash = stored_hash.strip().lower()  # hexdigest() is lowercase
            expected = compute_hash(prev_hash, seq, ts, level, msg)
            if expected != stored_hash:
                print(f"CHAIN BREAK at seq={seq} ts={ts}")
                print(f"  Expected : {expected}")
                print(f"  Stored   : {stored_hash}")
                print(f"  Previous hash used: {prev_hash}")
            # Follow the stored chain so that each later break is reported at
            # the entry where it occurs. Carrying `expected` forward instead
            # would flag every entry after the first break.
            prev_hash = stored_hash


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} LOGFILE")
    verify_chain(sys.argv[1])
# Quick shell verification for simple newline-hashed logs
# where each line ends with :<sha256_of_previous_line_hash+current_line_content>
prev="0000000000000000000000000000000000000000000000000000000000000000"
line_num=0
while IFS= read -r line; do
    line_num=$((line_num + 1))
    # Extract stored hash (last 64 chars after final colon)
    stored="${line##*:}"
    content="${line%:*}"
    computed=$(printf "%s%s" "$prev" "$content" | sha256sum | awk '{print $1}')
    if [ "$computed" != "$stored" ]; then
        echo "BREAK at line $line_num"
    fi
    prev="$stored"
done < chain.log
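Before running a verifier against a challenge artifact, it helps to test it on a log that is known to be valid. The following sketch writes such a log in the tab-separated format described in the Python verifier's docstring. The function name write_sample_log, the sample entries, and the file name are made up for illustration, and the all-zero seed is assumed.

```python
import hashlib
import os
import tempfile

SEED = "0" * 64  # assumed all-zero seed


def write_sample_log(path, entries):
    """Write (timestamp, level, message) tuples as a valid hash-chained log.

    Returns the final chain hash.
    """
    prev = SEED
    with open(path, "w", encoding="utf-8", newline="") as fh:
        for seq, (ts, level, msg) in enumerate(entries, start=1):
            data = prev + str(seq) + ts + level + msg  # no separator
            prev = hashlib.sha256(data.encode("utf-8")).hexdigest()
            fh.write("\t".join([str(seq), ts, level, msg, prev]) + "\n")
    return prev


# Hypothetical sample entries; any strings without tabs or newlines work.
sample = [
    ("2024-01-01T00:00:00Z", "INFO", "service started"),
    ("2024-01-01T00:00:05Z", "WARN", "failed login for root"),
    ("2024-01-01T00:00:09Z", "INFO", "session opened for alice"),
]
path = os.path.join(tempfile.mkdtemp(), "sample_chain.log")
tail_hash = write_sample_log(path, sample)
print(path, tail_hash)
```

A verifier should report no breaks on this file. Editing one message or deleting one line by hand afterwards gives a quick test of break detection.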
Analytical Methodology
- Read the challenge log file. Identify the hash-chain format: locate the field containing the chain hash (often the last column or a dedicated chain_hash: field), and determine the hash input construction (which fields are concatenated, in which order, with what separator).
- Identify the seed value: the initial previous hash for the first entry. Challenges typically document this, or it appears as a header comment in the log. Common seeds: all-zero 64-char hex string, or the SHA-256 of the literal string "GENESIS".
- Implement a verifier (Python is fastest for challenges) that iterates entries in order, recomputes the expected hash using the documented formula, and compares to the stored value.
- Run the verifier. Record the sequence number and timestamp of the first broken link — this is the tampered entry or the entry immediately after a deleted entry.
- Determine whether the break indicates a deletion (entries jump in sequence number) or a modification (sequence numbers are continuous but the hash fails). A deletion breaks the chain at the entry that follows the gap; a modification breaks it exactly at the altered entry.
- If modifications are detected, identify which fields were changed by comparing the failed entry against surrounding entries for logical consistency (e.g., timestamp regression, impossible IP address).
- Report: number of intact entries, first break location, tamper type (deletion/modification), and the content of the broken entry.
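The classification and reporting steps above can be sketched as follows. This is one possible approach, not a standard tool: the function names audit and make_rows are hypothetical, sequence numbers are assumed to start at 1 and increase by one, and the hash input is assumed to be the same no-separator concatenation used by the verifier.

```python
import hashlib

SEED = "0" * 64  # assumed all-zero seed


def audit(rows):
    """Classify chain breaks in (seq, ts, level, msg, stored_hash) tuples."""
    prev_hash, prev_seq = SEED, 0  # assumes numbering starts at 1
    intact, breaks = 0, []
    for seq, ts, level, msg, stored in rows:
        stored = stored.lower()
        data = prev_hash + seq + ts + level + msg
        expected = hashlib.sha256(data.encode("utf-8")).hexdigest()
        if expected == stored:
            intact += 1
        else:
            # A jump in sequence number suggests missing entries; otherwise
            # the entry itself (or its stored hash) was altered.
            kind = "deletion" if int(seq) != prev_seq + 1 else "modification"
            breaks.append({"seq": seq, "ts": ts, "type": kind, "entry": msg})
        prev_hash, prev_seq = stored, int(seq)  # follow the stored chain
    return {
        "intact": intact,
        "first_break": breaks[0] if breaks else None,
        "breaks": breaks,
    }


def make_rows(messages):
    """Build a valid chain of test rows (illustrative helper)."""
    rows, prev = [], SEED
    for i, msg in enumerate(messages, start=1):
        seq, ts, level = str(i), f"2024-01-01T00:00:{i:02d}Z", "INFO"
        data = prev + seq + ts + level + msg
        prev = hashlib.sha256(data.encode("utf-8")).hexdigest()
        rows.append((seq, ts, level, msg, prev))
    return rows


rows = make_rows(["a", "b", "c", "d", "e", "f"])
tampered = list(rows)
tampered[1] = rows[1][:3] + ("B",) + rows[1][4:]  # modify the message of seq 2
del tampered[3]                                    # delete seq 4
report = audit(tampered)
print(report["intact"], [(b["seq"], b["type"]) for b in report["breaks"]])
# prints: 3 [('2', 'modification'), ('5', 'deletion')]
```

The sequence-gap heuristic only works when the attacker did not renumber the remaining entries. If they did, the break still appears, but it will be classified as a modification.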
Common Analytical Errors
- Not choosing deliberately which hash to carry forward after a break: After a break, you must decide whether to continue verification using the stored hash (to find further breaks in the remaining chain) or the computed hash (to model what the chain would look like with the tampering reverted). Use the stored hash to traverse the actual file; carrying the computed hash forward makes every later entry fail and hides where the file is still internally consistent. Document that entries after a break may reflect a re-anchored chain.
- Ignoring encoding: Hashes may be stored as lowercase hex, uppercase hex, or base64. Normalise both sides to a single encoding (for example lowercase hex) before comparison to avoid false breaks caused by encoding mismatches.
- Overlooking the hash input separator: Some implementations concatenate fields with no separator; others use |, a comma, or
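The encoding and separator pitfalls above can be handled with two small helpers, sketched here under stated assumptions. The names normalise_digest and find_separator are hypothetical, the candidate separator list covers only the cases named in the text and should be extended per challenge, and the separator is assumed to be placed between the previous hash and the first field as well.

```python
import base64
import binascii
import hashlib

CANDIDATE_SEPARATORS = ["", "|", ","]  # illustrative guesses; extend as needed


def normalise_digest(stored: str) -> str:
    """Return a stored SHA-256 digest as lowercase hex.

    Accepts 64-character hex in either case, or standard base64 of the
    32 raw digest bytes.
    """
    s = stored.strip()
    if len(s) == 64:
        try:
            return bytes.fromhex(s).hex()
        except ValueError:
            pass  # not hex; fall through to base64
    try:
        raw = base64.b64decode(s, validate=True)
    except (binascii.Error, ValueError):
        raise ValueError(f"unrecognised digest encoding: {stored!r}") from None
    if len(raw) != 32:
        raise ValueError(f"not a SHA-256 digest: {stored!r}")
    return raw.hex()


def find_separator(prev_hash, fields, stored):
    """Return the first candidate separator that reproduces the stored hash."""
    target = normalise_digest(stored)
    for sep in CANDIDATE_SEPARATORS:
        data = sep.join([prev_hash, *fields])
        if hashlib.sha256(data.encode("utf-8")).hexdigest() == target:
            return sep
    return None


# Demo: an entry hashed with "|" and stored as base64.
seed = "0" * 64
fields = ["1", "2024-01-01T00:00:00Z", "INFO", "service started"]
digest = hashlib.sha256("|".join([seed, *fields]).encode("utf-8")).digest()
stored_b64 = base64.b64encode(digest).decode("ascii")

print(repr(find_separator(seed, fields, stored_b64)))  # prints: '|'
```

Running find_separator against the first entry of a log, where the previous hash is the known seed, is usually enough to confirm the input construction before verifying the rest of the chain.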