
XLSX hidden cell forensics (white-on-white)

network_forensics_pcap Difficulty 1–5 30 min certifiable

Theory

Why This Matters

Spreadsheet forensics has surfaced in major legal proceedings and corporate investigations. In the Enron scandal, forensic accountants recovered significant evidence from Excel files where rows had been hidden rather than deleted — hidden rows are preserved in the file but invisible to casual viewers. More recently, whistleblower cases and GDPR compliance audits have involved enumerating "very hidden" worksheets in Excel files whose existence was denied by the document creator. For security analysts, Microsoft Office documents submitted during phishing simulations, malware analysis, or challenge competitions routinely use OOXML's rich metadata and visibility control features to conceal data that is only accessible to an analyst who understands the underlying XML format.

Core Concept

OOXML (Office Open XML), the format underlying .xlsx, .docx, and .pptx files, is a ZIP archive containing a collection of XML files and supporting assets. The file [Content_Types].xml at the archive root declares all content types; xl/workbook.xml lists all sheets; xl/worksheets/sheet1.xml (and similarly numbered files) contains the cell data for each sheet.
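
As a quick illustration of the ZIP layout described above, the sketch below builds a tiny in-memory package and lists its parts with Python's standard zipfile module. The helper names build_demo_package and list_parts are hypothetical, and the XML bodies are placeholders, not a workbook that Excel would open.

```python
import io
import zipfile


def build_demo_package() -> bytes:
    """Build a made-up package that only mimics the OOXML part names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("xl/workbook.xml", "<workbook/>")
        z.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
        z.writestr("xl/sharedStrings.xml", "<sst/>")
    return buf.getvalue()


def list_parts(package: bytes) -> list:
    """Return the sorted names of every part stored in the ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        return sorted(z.namelist())


for part in list_parts(build_demo_package()):
    print(part)
```

For a real file, passing the path to zipfile.ZipFile("challenge.xlsx") directly gives the same listing without the in-memory buffer.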

A hidden row is declared with the hidden="1" attribute on the <row> element in the sheet XML: <row r="5" hidden="1">. Similarly, a hidden column is declared on a <col> element in the <cols> section. These rows and columns are present in the XML and their cell values are intact; only the rendering is suppressed. Worksheet visibility is controlled by the state attribute on the <sheet> element in workbook.xml: state="hidden" makes the sheet hidden via the normal Format > Sheet > Hide menu (easily unhidden by a user), while state="veryHidden" sets a visibility that cannot be changed through the Excel UI — only via VBA, XML editing, or a programmatic parser.
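
The attributes described above can be read with nothing more than the standard library. The following is a minimal sketch, assuming the XML has already been extracted to strings; the function names and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Made-up sample documents, trimmed to the elements discussed above.
WORKBOOK_XML = """<workbook
 xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
 xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
 <sheets>
  <sheet name="Summary" sheetId="1" r:id="rId1"/>
  <sheet name="Notes" sheetId="2" state="hidden" r:id="rId2"/>
  <sheet name="Vault" sheetId="3" state="veryHidden" r:id="rId3"/>
 </sheets>
</workbook>"""

SHEET_XML = """<worksheet
 xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
 <cols><col min="3" max="3" hidden="1"/></cols>
 <sheetData>
  <row r="1"><c r="A1"><v>1</v></c></row>
  <row r="5" hidden="1"><c r="A5"><v>42</v></c></row>
 </sheetData>
</worksheet>"""


def sheet_states(workbook_xml: str) -> dict:
    """Map sheet name to its state; a missing attribute means visible."""
    root = ET.fromstring(workbook_xml)
    return {s.get("name"): s.get("state", "visible")
            for s in root.findall(".//x:sheet", NS)}


def hidden_rows(sheet_xml: str) -> list:
    """Return the row numbers whose <row> element carries hidden=1/true."""
    root = ET.fromstring(sheet_xml)
    return [int(r.get("r")) for r in root.findall(".//x:row", NS)
            if r.get("hidden") in ("1", "true")]


def hidden_columns(sheet_xml: str) -> list:
    """Return (min, max) column index ranges flagged as hidden."""
    root = ET.fromstring(sheet_xml)
    return [(int(c.get("min")), int(c.get("max")))
            for c in root.findall(".//x:col", NS)
            if c.get("hidden") in ("1", "true")]


print(sheet_states(WORKBOOK_XML))
print(hidden_rows(SHEET_XML), hidden_columns(SHEET_XML))
```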

Cell values in OOXML are stored in one of two ways: inline as the <v> element value (for numbers and some strings), or as an index into the shared strings table (xl/sharedStrings.xml) for string values. A cell whose <v> contains 7 and whose t (type) attribute is s means: look up index 7 in the shared strings table. This indirection is commonly missed when analysts search spreadsheets with grep — they find the index integer rather than the actual string content.
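
To make the indirection concrete, here is a small sketch that resolves t="s" cells against the shared strings table. It assumes both XML parts are available as strings; the sample data and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"x": MAIN}

# Made-up shared strings table; the second entry is split into rich-text runs.
SST_XML = f"""<sst xmlns="{MAIN}">
 <si><t>Quarter</t></si>
 <si><r><t>CTF{{</t></r><r><t>example}}</t></r></si>
</sst>"""

# Made-up sheet: A1 points at shared string index 1, B1 holds a plain number.
SHEET_XML = f"""<worksheet xmlns="{MAIN}"><sheetData>
 <row r="1">
  <c r="A1" t="s"><v>1</v></c>
  <c r="B1"><v>3.14</v></c>
 </row>
</sheetData></worksheet>"""


def load_shared_strings(sst_xml: str) -> list:
    """Return one string per <si>, joining all <t> runs inside it."""
    root = ET.fromstring(sst_xml)
    return ["".join(t.text or "" for t in si.iter(f"{{{MAIN}}}t"))
            for si in root.findall("x:si", NS)]


def resolve_cells(sheet_xml: str, shared: list) -> dict:
    """Map cell reference to value, following shared-string indexes."""
    root = ET.fromstring(sheet_xml)
    values = {}
    for c in root.iter(f"{{{MAIN}}}c"):
        v = c.find("x:v", NS)
        if v is None or v.text is None:
            continue
        if c.get("t") == "s":
            values[c.get("r")] = shared[int(v.text)]  # index into the table
        else:
            values[c.get("r")] = v.text
    return values


strings = load_shared_strings(SST_XML)
print(resolve_cells(SHEET_XML, strings))
```

Joining every <t> run matters because a string with mixed formatting is stored as several runs, and reading only the first run would truncate it.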

Formula cells with t="str" store the formula result as <v> but the formula itself in <f>. A formula like =CONCATENATE(A1,B1) may assemble the flag from cells that are individually innocuous.
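
A short sketch of pulling both the formula text and its cached result out of a sheet follows. The sample XML and the function name formula_cells are made up; note that the <f> element stores the formula without the leading equals sign.

```python
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
NS = {"x": MAIN}

# Made-up sheet: C1 concatenates two individually innocuous cells.
SHEET_XML = f"""<worksheet xmlns="{MAIN}"><sheetData>
 <row r="1">
  <c r="A1" t="inlineStr"><is><t>CTF{{a</t></is></c>
  <c r="B1" t="inlineStr"><is><t>b}}</t></is></c>
  <c r="C1" t="str"><f>CONCATENATE(A1,B1)</f><v>CTF{{ab}}</v></c>
 </row>
</sheetData></worksheet>"""


def formula_cells(sheet_xml: str) -> list:
    """Return (reference, formula, cached_result) for every formula cell."""
    root = ET.fromstring(sheet_xml)
    found = []
    for c in root.iter(f"{{{MAIN}}}c"):
        f = c.find("x:f", NS)
        if f is None:
            continue
        v = c.find("x:v", NS)
        cached = v.text if v is not None else None  # None: never calculated
        found.append((c.get("r"), f.text or "", cached))
    return found


for ref, formula, cached in formula_cells(SHEET_XML):
    print(ref, formula, cached)
```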

Technical Deep-Dive

import zipfile

import openpyxl
from lxml import etree

# --- Method 1: openpyxl (handles hidden rows/columns/sheets) ---
# data_only=True returns cached formula results instead of formula strings
wb = openpyxl.load_workbook("challenge.xlsx", data_only=True)

for ws_name in wb.sheetnames:
    ws = wb[ws_name]
    print(f"\nSheet: {ws_name!r}  state={ws.sheet_state!r}")
    # Check for hidden rows
    for row in ws.iter_rows():
        row_dim = ws.row_dimensions.get(row[0].row)
        is_hidden = bool(row_dim and row_dim.hidden)
        for cell in row:
            if cell.value is not None:
                print(f"  [{cell.coordinate}] hidden_row={is_hidden} "
                      f"value={cell.value!r}")

# --- Method 2: raw XML inspection (catches veryHidden sheets) ---
with zipfile.ZipFile("challenge.xlsx") as z:
    # List every file in the archive
    for name in z.namelist():
        print(name)

    # Parse workbook.xml for sheet states
    wb_xml = etree.parse(z.open("xl/workbook.xml"))
    ns = {"x": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
    for sheet in wb_xml.findall(".//x:sheet", ns):
        print(dict(sheet.attrib))   # includes name, sheetId, state, r:id

    # Parse sharedStrings.xml (absent when the workbook has no string cells)
    if "xl/sharedStrings.xml" in z.namelist():
        ss_xml = etree.parse(z.open("xl/sharedStrings.xml"))
        # Join every <t> run so rich-text entries are not truncated
        strings = ["".join(t.text or "" for t in si.findall(".//x:t", ns))
                   for si in ss_xml.findall(".//x:si", ns)]
        print("Shared strings:", strings[:20])

# --- Shell: unzip the xlsx and grep all XML for flag patterns ---
unzip -o challenge.xlsx -d xlsx_extracted/
# -E enables alternation with |; without it grep treats | as a literal character
grep -rE "CTF|flag|hidden|veryHidden" xlsx_extracted/ --include="*.xml"

# Check workbook.xml for veryHidden sheets
grep -iE "veryHidden|state=" xlsx_extracted/xl/workbook.xml

# Check all sheet XML for hidden row/column attributes
grep -l "hidden" xlsx_extracted/xl/worksheets/*.xml

Analytical Methodology

  1. Unzip the XLSX: unzip challenge.xlsx -d extracted/. List all files: find extracted/ -type f. Note all XML files in xl/; each worksheets/sheetN.xml corresponds to a worksheet.
  2. Check workbook.xml for hidden/veryHidden sheets: grep -i "state" extracted/xl/workbook.xml. Both hidden and veryHidden values indicate suppressed sheets whose cell data is fully present.
  3. Parse each sheet XML for hidden rows/columns: grep -n "hidden" extracted/xl/worksheets/sheet*.xml. Lines with hidden="1" on <row> or <col> elements contain invisible data.
  4. Read the shared strings table: open extracted/xl/sharedStrings.xml. String cell values reference this table by index. Any interesting text (Base64 strings, flag fragments) will appear here if cells contain strings.
  5. Enumerate all cell values programmatically: use openpyxl with data_only=True to iterate every cell in every sheet, printing coordinate, value, and whether the row/column is hidden.
  6. Check document properties: extracted/docProps/core.xml (creator, lastModifiedBy, description) and extracted/docProps/app.xml (application name, company) may contain flag fragments or encoding hints.
  7. Distinguish hidden from deleted: truly deleted rows are absent from the XML. Hidden rows are present with hidden="1". If you can see the row in the XML but not in Excel, it is hidden, not deleted.
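
The document properties check in step 6 can be scripted as well. The sketch below is one possible approach, assuming core.xml has been read into a string; the function name core_properties and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up docProps/core.xml content for illustration.
CORE_XML = """<cp:coreProperties
 xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
 <dc:creator>analyst</dc:creator>
 <cp:lastModifiedBy>unknown</cp:lastModifiedBy>
 <dc:description>see sheet Vault</dc:description>
</cp:coreProperties>"""


def core_properties(core_xml: str) -> dict:
    """Map each property's local name (namespace stripped) to its text."""
    root = ET.fromstring(core_xml)
    props = {}
    for child in root:
        local_name = child.tag.rsplit("}", 1)[-1]  # drop the {namespace} part
        if child.text and child.text.strip():
            props[local_name] = child.text.strip()
    return props


for key, value in core_properties(CORE_XML).items():
    print(f"{key}: {value}")
```

Stripping the namespace keeps the helper independent of which prefix a given producer uses for each property.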

Common Analytical Errors

  • Opening in Excel/LibreOffice and eyeballing: a spreadsheet with hidden rows and veryHidden sheets appears completely normal to a casual viewer. Never trust the rendered view; always inspect the underlying XML.
  • Grepping only the sheet XML for obvious strings: if the flag is a string cell, the sheet XML holds only its shared string index, so grep "CTF" on sheetN.xml finds nothing. Check sharedStrings.xml and correlate index numbers from cells.
  • Missing veryHidden sheets: wb.sheetnames in openpyxl includes hidden and veryHidden sheets, but wb.active returns a single sheet and the tab bar in the UI omits suppressed sheets. Always iterate wb.sheetnames explicitly.
  • data_only=False returning formula strings: with data_only=False (the default), openpyxl returns the formula string (e.g., =A1) rather than the computed result. Use data_only=True to get the cached value from the last calculation; it is None if no spreadsheet application ever calculated and saved the file.
  • Forgetting row 1 may be hidden: many challenges hide the header row (row 1) which contains the flag. Programmatic iteration starts at row 1 regardless of visibility; make sure you are not skipping it.
  • Ignoring embedded objects: XLSX files can contain OLE objects, images, or other embedded files in xl/media/, xl/drawings/, and xl/embeddings/. If the cell search yields nothing, check these directories.
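
For the last point, a minimal sketch of listing embedded parts straight from the archive is shown here. The helper names and the demo archive are made up, and the prefix list is an assumption that covers only the most common locations.

```python
import io
import zipfile

# Assumed set of directories worth checking; extend as needed.
EMBED_PREFIXES = ("xl/media/", "xl/drawings/", "xl/embeddings/")


def build_demo_package() -> bytes:
    """Build a made-up archive with one image and one embedded object."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
        z.writestr("xl/media/image1.png", b"\x89PNG demo")
        z.writestr("xl/embeddings/oleObject1.bin", b"demo")
    return buf.getvalue()


def embedded_parts(package: bytes) -> list:
    """Return sorted (name, size) pairs for parts under the embed prefixes."""
    with zipfile.ZipFile(io.BytesIO(package)) as z:
        return sorted((info.filename, info.file_size)
                      for info in z.infolist()
                      if info.filename.startswith(EMBED_PREFIXES)
                      and not info.is_dir())


for name, size in embedded_parts(build_demo_package()):
    print(f"{name} ({size} bytes)")
```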

NICE Framework Alignment

Code | Knowledge/Skill/Task Statement | How This Card Develops It
K0082 | Knowledge of file format standards and document metadata | Teaches OOXML ZIP structure, workbook.xml sheet states, shared strings table, and hidden cell XML attributes
K0118 | Knowledge of file format structures and forensic artefacts | Connects OOXML internals (cell types, formula caching, row dimensions) to forensic observables
S0065 | Skill in identifying and extracting data of forensic interest from file artifacts | Practises multi-layer extraction: unzip → parse XML → correlate shared strings → enumerate hidden rows
T0048 | Task: Perform file system forensic analysis | Applies forensic enumeration methodology to OOXML documents to recover concealed data

Further Reading

  • ECMA-376: Office Open XML File Formats Standard — Ecma International (5th Edition, 2021)
  • Digital Forensics with Open Source Tools — Cory Altheide & Harlan Carvey (Syngress)
  • Spreadsheet Forensics: Recovering Evidence from Excel Files — SANS Digital Forensics and Incident Response
  • openpyxl Documentation — Eric Gazoni & Charlie Clark (openpyxl.readthedocs.io)
