XLSX hidden cell forensics (white-on-white)
Theory
Why This Matters
Spreadsheet forensics has surfaced in major legal proceedings and corporate investigations. In the Enron scandal, forensic accountants recovered significant evidence from Excel files where rows had been hidden rather than deleted — hidden rows are preserved in the file but invisible to casual viewers. More recently, whistleblower cases and GDPR compliance audits have involved enumerating "very hidden" worksheets in Excel files whose existence was denied by the document creator. For security analysts, Microsoft Office documents submitted during phishing simulations, malware analysis, or challenge competitions routinely use OOXML's rich metadata and visibility control features to conceal data that is only accessible to an analyst who understands the underlying XML format.
Core Concept
OOXML (Office Open XML), the format underlying .xlsx, .docx, and .pptx files, is a ZIP archive containing a collection of XML files and supporting assets. The file [Content_Types].xml at the archive root declares all content types; xl/workbook.xml lists all sheets; xl/worksheets/sheet1.xml (and similarly numbered files) contains the cell data for each sheet.
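Because the container is an ordinary ZIP archive, its parts can be enumerated with a standard library alone. The following is a minimal sketch; `list_parts` and `make_sample` are hypothetical helper names, and the in-memory sample holds placeholder XML (it is not a valid workbook) so the example runs without a real file.

```python
import io
import zipfile


def list_parts(xlsx):
    """Return (name, uncompressed size) for every member of the ZIP container.

    `xlsx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as z:
        return [(info.filename, info.file_size) for info in z.infolist()]


def make_sample():
    """Build a tiny stand-in package in memory (placeholder XML only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", "<Types/>")
        z.writestr("xl/workbook.xml", "<workbook/>")
        z.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, size in list_parts(make_sample()):
        print(f"{size:>6}  {name}")
```

In practice you would pass the path of the file under examination, for example `list_parts("challenge.xlsx")`.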
A hidden row is declared with the hidden="1" attribute on the <row> element in the sheet XML: <row r="5" hidden="1">. Similarly, a hidden column is declared on a <col> element in the <cols> section. These rows and columns are present in the XML and their cell values are intact; only the rendering is suppressed. Worksheet visibility is controlled by the state attribute on the <sheet> element in workbook.xml: state="hidden" makes the sheet hidden via the normal Format > Sheet > Hide menu (easily unhidden by a user), while state="veryHidden" sets a visibility that cannot be changed through the Excel UI — only via VBA, XML editing, or a programmatic parser.
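As an illustration of these attributes, the sketch below reads sheet states and hidden rows and columns from XML strings. It uses the standard-library `xml.etree.ElementTree` rather than lxml, purely to keep the example dependency-free. The XML samples and the function names are invented for this example.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Invented sample fragments, shaped like xl/workbook.xml and a sheet part
WORKBOOK_XML = """<workbook
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
  <sheets>
    <sheet name="Summary" sheetId="1" r:id="rId1"/>
    <sheet name="Notes" sheetId="2" state="hidden" r:id="rId2"/>
    <sheet name="Secret" sheetId="3" state="veryHidden" r:id="rId3"/>
  </sheets>
</workbook>"""

SHEET_XML = """<worksheet
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <cols><col min="3" max="4" hidden="1"/></cols>
  <sheetData>
    <row r="1"><c r="A1"><v>1</v></c></row>
    <row r="5" hidden="1"><c r="A5"><v>42</v></c></row>
  </sheetData>
</worksheet>"""

TRUE_VALUES = ("1", "true")  # the hidden attribute is an XML boolean


def sheet_states(workbook_xml):
    """Map sheet name -> state; a missing state attribute means visible."""
    root = ET.fromstring(workbook_xml)
    return {s.get("name"): s.get("state", "visible")
            for s in root.iterfind(".//x:sheet", NS)}


def hidden_rows(sheet_xml):
    """Return the row numbers of <row> elements flagged as hidden."""
    root = ET.fromstring(sheet_xml)
    return [int(r.get("r")) for r in root.iterfind(".//x:row", NS)
            if r.get("hidden") in TRUE_VALUES]


def hidden_col_ranges(sheet_xml):
    """Return (min, max) column index ranges of hidden <col> elements."""
    root = ET.fromstring(sheet_xml)
    return [(int(c.get("min")), int(c.get("max")))
            for c in root.iterfind(".//x:col", NS)
            if c.get("hidden") in TRUE_VALUES]


if __name__ == "__main__":
    print(sheet_states(WORKBOOK_XML))
    print("hidden rows:", hidden_rows(SHEET_XML))
    print("hidden column ranges:", hidden_col_ranges(SHEET_XML))
```

Column ranges are 1-based indices, so the range (3, 4) in the sample corresponds to columns C and D.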
String cell values in OOXML are usually not stored in the cell itself. A cell's <v> element holds the value directly for numbers, booleans, and cached formula results, but for a shared-string cell (type attribute t="s") it holds an index into the shared strings table (xl/sharedStrings.xml). A cell whose t attribute is s and whose <v> contains 7 means: look up index 7 (zero-based) in the shared strings table. Less commonly, a string is embedded in the cell as an inline string (t="inlineStr", with the text inside an <is> element). The shared-strings indirection is commonly missed when analysts grep only the sheet XML: they find the index integer rather than the actual string content.
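The lookup described above can be sketched as follows. This is an illustrative example under stated assumptions, not a complete parser: the XML samples and the helper names `load_shared_strings` and `resolve_cells` are invented, and the standard-library ElementTree is used so the block has no dependencies.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Invented samples shaped like xl/sharedStrings.xml and a sheet part
SHARED_XML = """<sst
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <si><t>Item</t></si>
  <si><t>Total</t></si>
  <si><r><t>fl</t></r><r><t>ag{demo}</t></r></si>
</sst>"""

SHEET_XML = """<worksheet
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>0</v></c>
      <c r="B1"><v>7</v></c>
    </row>
    <row r="2">
      <c r="A2" t="s"><v>2</v></c>
      <c r="B2" t="inlineStr"><is><t>inline text</t></is></c>
    </row>
  </sheetData>
</worksheet>"""


def joined_text(element):
    """Concatenate every <t> below element (rich text is split into runs;
    phonetic runs, if present, would be included as well)."""
    return "".join(t.text or "" for t in element.iterfind(".//x:t", NS))


def load_shared_strings(sst_xml):
    """Return the shared strings table as a list indexed from zero."""
    root = ET.fromstring(sst_xml)
    return [joined_text(si) for si in root.iterfind("x:si", NS)]


def resolve_cells(sheet_xml, strings):
    """Map cell reference -> displayed text, following t="s" indices."""
    cells = {}
    for c in ET.fromstring(sheet_xml).iterfind(".//x:c", NS):
        kind = c.get("t", "n")
        raw = c.findtext("x:v", default=None, namespaces=NS)
        if kind == "s" and raw is not None:
            cells[c.get("r")] = strings[int(raw)]  # index into the table
        elif kind == "inlineStr":
            inline = c.find("x:is", NS)
            cells[c.get("r")] = joined_text(inline) if inline is not None else ""
        else:
            cells[c.get("r")] = raw  # raw value kept as a string
    return cells


if __name__ == "__main__":
    table = load_shared_strings(SHARED_XML)
    for ref, value in resolve_cells(SHEET_XML, table).items():
        print(ref, repr(value))
```

Note that B1 holds the literal number 7, while a cell with t="s" and the same <v> would mean the eighth shared string; the type attribute is what distinguishes the two.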
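Formula cells store the formula text in <f> and the last computed (cached) result in <v>; t="str" marks a formula whose cached result is a string. A formula like =CONCATENATE(A1,B1) may assemble the flag from cells that are individually innocuous, so inspect both the formula and its cached value.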
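A short sketch of pulling both the formula and its cached result out of the sheet XML is shown below. The sample XML and the name `formula_cells` are assumptions made for illustration; the standard-library ElementTree is used to keep the block self-contained.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Invented sample: C1 concatenates two shared-string cells
SHEET_XML = """<worksheet
    xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1" t="s"><v>0</v></c>
      <c r="B1" t="s"><v>1</v></c>
      <c r="C1" t="str"><f>CONCATENATE(A1,B1)</f><v>flag{demo}</v></c>
    </row>
  </sheetData>
</worksheet>"""


def formula_cells(sheet_xml):
    """Return (cell ref, formula, cached value) for every cell with an <f>.

    The XML stores the formula without the leading "=", so it is added back
    here. An <f> element can be empty (for example in shared formulas), in
    which case only "=" is returned for the formula.
    """
    found = []
    for c in ET.fromstring(sheet_xml).iterfind(".//x:c", NS):
        f = c.find("x:f", NS)
        if f is not None:
            cached = c.findtext("x:v", default=None, namespaces=NS)
            found.append((c.get("r"), "=" + (f.text or ""), cached))
    return found


if __name__ == "__main__":
    for ref, formula, cached in formula_cells(SHEET_XML):
        print(f"{ref}: {formula} -> {cached!r}")
```

Reading the raw XML this way yields the formula and the cached value in one pass, whereas openpyxl exposes one or the other depending on the data_only setting.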
Technical Deep-Dive
import zipfile

import openpyxl
from lxml import etree

# --- Method 1: openpyxl (handles hidden rows/columns/sheets) ---
# data_only=True returns cached formula results instead of formula strings
wb = openpyxl.load_workbook("challenge.xlsx", data_only=True)
for ws_name in wb.sheetnames:
    ws = wb[ws_name]
    print(f"\nSheet: {ws_name!r} state={ws.sheet_state!r}")
    # Check for hidden rows
    for row in ws.iter_rows():
        row_dim = ws.row_dimensions.get(row[0].row)
        is_hidden = bool(row_dim and row_dim.hidden)
        for cell in row:
            if cell.value is not None:
                print(f"  [{cell.coordinate}] hidden_row={is_hidden} "
                      f"value={cell.value!r}")

# --- Method 2: raw XML inspection (catches veryHidden sheets) ---
with zipfile.ZipFile("challenge.xlsx") as z:
    # List every member of the archive
    for name in z.namelist():
        print(name)
    # Parse workbook.xml for sheet states
    wb_xml = etree.parse(z.open("xl/workbook.xml"))
    ns = {"x": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
    for sheet in wb_xml.findall(".//x:sheet", ns):
        print(sheet.attrib)  # includes name, sheetId, state, r:id
    # Parse sharedStrings.xml (the part may be absent if there are no strings)
    if "xl/sharedStrings.xml" in z.namelist():
        ss_xml = etree.parse(z.open("xl/sharedStrings.xml"))
        # Join every <t> run so rich-text strings are not truncated
        strings = ["".join(t.text or "" for t in si.iterfind(".//x:t", ns))
                   for si in ss_xml.findall(".//x:si", ns)]
        print("Shared strings:", strings[:20])
# Unzip the xlsx and grep all XML for flag patterns
unzip -o challenge.xlsx -d xlsx_extracted/
grep -rE "CTF|flag|hidden|veryHidden" xlsx_extracted/ --include="*.xml"
# Check workbook.xml for veryHidden sheets
grep -iE "veryHidden|state=" xlsx_extracted/xl/workbook.xml
# Check all sheet XML for hidden row/column attributes
grep -l "hidden" xlsx_extracted/xl/worksheets/*.xml
Analytical Methodology
- Unzip the XLSX: unzip challenge.xlsx -d extracted/. List all files: find extracted/ -type f. Note all XML files in xl/; each worksheets/sheetN.xml corresponds to a worksheet.
- Check workbook.xml for hidden/veryHidden sheets: grep -i "state" extracted/xl/workbook.xml. Both hidden and veryHidden values indicate suppressed sheets whose cell data is fully present.
- Parse each sheet XML for hidden rows/columns: grep -n "hidden" extracted/xl/worksheets/sheet*.xml. Lines with hidden="1" on <row> or <col> elements mark rows and columns whose data is present but not rendered.
- Read the shared strings table: Open extracted/xl/sharedStrings.xml. String cell values reference this table by index. Any interesting text (Base64 strings, flag fragments) will appear here if cells contain strings.
- Enumerate all cell values programmatically: Use openpyxl with data_only=True to iterate every cell in every sheet, printing coordinate, value, and whether the row/column is hidden.
- Check document properties: extracted/docProps/core.xml (creator, lastModifiedBy, description) and extracted/docProps/app.xml (application name, company) may contain flag fragments or encoding hints.
- Distinguish hidden from deleted: Truly deleted rows are absent from the XML. Hidden rows are present with hidden="1". If you can see the row in the XML but not in Excel, it is hidden, not deleted.
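One detail worth checking during the steps above is which archive part actually backs each sheet name. The file number in sheetN.xml does not have to match the sheet's position or sheetId, because the link goes through the r:id attribute and xl/_rels/workbook.xml.rels. The sketch below resolves that mapping. `sheet_parts` and `make_sample` are hypothetical names, and the sample package contains only the two parts the sketch reads, with a placeholder relationship Type.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def sheet_parts(xlsx):
    """Return (sheet name, state, archive path) for each <sheet> entry."""
    with zipfile.ZipFile(xlsx) as z:
        rels = ET.fromstring(z.read("xl/_rels/workbook.xml.rels"))
        wb = ET.fromstring(z.read("xl/workbook.xml"))
    targets = {r.get("Id"): r.get("Target")
               for r in rels.iterfind(f"{{{PKG_REL}}}Relationship")}
    result = []
    for s in wb.iterfind(f".//{{{MAIN}}}sheet"):
        target = targets[s.get(f"{{{DOC_REL}}}id")]
        # Targets are normally relative to xl/; a leading "/" means archive root
        if target.startswith("/"):
            path = target.lstrip("/")
        else:
            path = posixpath.normpath(posixpath.join("xl", target))
        result.append((s.get("name"), s.get("state", "visible"), path))
    return result


def make_sample():
    """Invented two-sheet package in which names and file numbers differ."""
    workbook = (
        f'<workbook xmlns="{MAIN}" xmlns:r="{DOC_REL}"><sheets>'
        '<sheet name="Summary" sheetId="1" r:id="rId1"/>'
        '<sheet name="Secret" sheetId="2" state="veryHidden" r:id="rId2"/>'
        '</sheets></workbook>'
    )
    rels = (
        f'<Relationships xmlns="{PKG_REL}">'
        '<Relationship Id="rId1" Type="placeholder" Target="worksheets/sheet2.xml"/>'
        '<Relationship Id="rId2" Type="placeholder" Target="worksheets/sheet1.xml"/>'
        '</Relationships>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("xl/workbook.xml", workbook)
        z.writestr("xl/_rels/workbook.xml.rels", rels)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, state, path in sheet_parts(make_sample()):
        print(f"{name!r:12} {state:12} {path}")
```

With the sample data, the veryHidden sheet named Secret lives in sheet1.xml, which is the file an analyst might otherwise assume to be the first visible tab.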
Common Analytical Errors
- Opening in Excel/LibreOffice and eyeballing: A spreadsheet with hidden rows and veryHidden sheets appears completely normal to a casual viewer. Never trust the rendered view; always inspect the underlying XML.
- Grepping only for obvious strings: If the flag is stored as a shared string, the sheet XML holds only an index integer, so grep "CTF" against the sheet files finds nothing. Check sharedStrings.xml and correlate index numbers from cells.
- Missing veryHidden sheets: openpyxl lists all sheets in wb.sheetnames, including veryHidden ones, but wb.active returns only a single sheet and the tab bar in the UI omits hidden ones. Always iterate wb.sheetnames explicitly.
- data_only=False returning formula strings: If data_only=False (openpyxl's default), openpyxl returns the formula string (e.g., =A1) rather than the computed result. Use data_only=True to get the cached value that was last computed; if no cached value was saved in the file, openpyxl returns None.
- Forgetting row 1 may be hidden: Many challenges hide the header row (row 1) which contains the flag. Programmatic iteration starts at row 1 regardless of visibility; make sure you are not skipping it.
- Ignoring embedded objects: XLSX files can contain images, OLE objects, or other embedded files, typically under xl/media/, xl/embeddings/, and xl/drawings/. If the cell search yields nothing, check these directories.
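For the last point, a quick way to see whether a workbook carries non-cell payloads is to list the archive members under the usual payload directories along with their leading bytes. This is a minimal sketch: the directory list is an assumption about common locations (xl/embeddings/ is included because embedded OLE objects are typically stored there), and `payload_parts` and `make_sample` are hypothetical names.

```python
import io
import zipfile

# Assumed common locations for images, embedded objects, and drawing parts
PAYLOAD_DIRS = ("xl/media/", "xl/embeddings/", "xl/drawings/")


def payload_parts(xlsx):
    """Return (name, size, first 8 bytes as hex) for members in PAYLOAD_DIRS."""
    with zipfile.ZipFile(xlsx) as z:
        return [(info.filename, info.file_size, z.read(info)[:8].hex())
                for info in z.infolist()
                if info.filename.startswith(PAYLOAD_DIRS) and not info.is_dir()]


def make_sample():
    """Invented package with one image and one OLE-style embedded object."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("xl/worksheets/sheet1.xml", "<worksheet/>")
        z.writestr("xl/media/image1.png", b"\x89PNG\r\n\x1a\n" + b"demo")
        z.writestr("xl/embeddings/oleObject1.bin",
                   b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"demo")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, size, magic in payload_parts(make_sample()):
        print(f"{size:>6}  {magic}  {name}")
```

The leading bytes help confirm that a file's content matches its extension; a member named .png that does not begin with the PNG signature deserves a closer look.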
NICE Framework Alignment
| Code | Knowledge/Skill/Task Statement | How This Card Develops It |
|---|---|---|
| K0082 | Knowledge of file format standards and document metadata | Teaches OOXML ZIP structure, workbook.xml sheet states, shared strings table, and hidden cell XML attributes |
| K0118 | Knowledge of file format structures and forensic artefacts | Connects OOXML internals (cell types, formula caching, row dimensions) to forensic observables |
| S0065 | Skill in identifying and extracting data of forensic interest from file artifacts | Practises multi-layer extraction: unzip → parse XML → correlate shared strings → enumerate hidden rows |
| T0048 | Task: Perform file system forensic analysis | Applies forensic enumeration methodology to OOXML documents to recover concealed data |
Further Reading
- ECMA-376: Office Open XML File Formats Standard — Ecma International (5th Edition, 2021)
- Digital Forensics with Open Source Tools — Cory Altheide & Harlan Carvey (Syngress)
- Spreadsheet Forensics: Recovering Evidence from Excel Files — SANS Digital Forensics and Incident Response
- openpyxl Documentation — Eric Gazoni & Charlie Clark (openpyxl.readthedocs.io)