# What Are Hidden Metadata Risks in Unredacted Documents?

> Word and PDF files embed author names, tracked changes, and hidden text recoverable after visual redactions. Metadata stripping is required for true redaction.

- **Author:** Neetusha
- **Published:** 2026-06-22
- **URL:** https://www.redactifyai.com/answers/hidden-metadata-risks-unredacted-documents/

---

A document that looks properly redacted on screen can still carry sensitive information in its metadata and hidden content. Word documents can store the creator's name, the name of the last person who saved the file, the authors of tracked revisions, deleted text, and comments. PDFs can carry title, author, subject, keywords, and creation software in document properties. Scanned PDFs may have an OCR text layer that is invisible in the rendered view but fully readable by anyone who extracts it. Applying visual redactions without stripping this material leaves the underlying content accessible.

## The four metadata risks

**1. Author and editor names in document properties**

Microsoft Word stores the document creator's name and the name of the last person who saved the file in the document properties, and it records an author name on every tracked revision and comment. The property values appear under File > Info, and the revision authors appear in the markup. A document shared as a "clean" version may therefore reveal the names of attorneys, paralegals, or staff who worked on it.
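The property check can also be scripted. The following is a minimal sketch, assuming the file is a standard `.docx` package with a `docProps/core.xml` part; the function name `read_core_authors` is hypothetical, and the code reads only the two name fields discussed above.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in the OOXML package format.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_authors(docx_path):
    """Return the creator and last-modified-by names stored in a .docx file.

    Hypothetical helper for illustration. Accepts a path or a file-like object.
    Raises KeyError if the package has no docProps/core.xml part.
    """
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "last_modified_by": root.findtext(
            "cp:lastModifiedBy", default="", namespaces=NS
        ),
    }
```

Calling `read_core_authors("agreement.docx")` (an illustrative file name) returns a dictionary with both names, or empty strings where a field is absent. It does not list the authors recorded on individual tracked revisions; those live in the document body.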

**2. Tracked changes and comments**

Deleted text in a Word document with tracked changes is not removed from the file. It is marked as deleted and hidden in the Simple Markup and No Markup views, but it reappears as soon as a reader switches the Review tab to All Markup, and it stays in the file until the change is accepted or rejected. Comments, including those marked as resolved, remain in the file unless specifically deleted. A settlement agreement with negotiation comments still in the file reveals the negotiation history to anyone who displays the markup.
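To show how little effort recovery takes, here is a minimal sketch that lists tracked deletions and comments straight from the `.docx` package without opening Word. It assumes the standard part names `word/document.xml` and `word/comments.xml`; the function names are hypothetical, and headers, footers, and footnotes are not scanned.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def find_tracked_deletions(docx_path):
    """Return (author, deleted_text) pairs for tracked deletions in the body."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    found = []
    for deletion in root.iter(f"{{{W}}}del"):
        author = deletion.get(f"{{{W}}}author", "")
        # Deleted run text is stored in w:delText elements, not w:t.
        text = "".join(n.text or "" for n in deletion.iter(f"{{{W}}}delText"))
        if text:
            found.append((author, text))
    return found


def find_comments(docx_path):
    """Return (author, comment_text) pairs, or [] if there is no comments part."""
    with zipfile.ZipFile(docx_path) as package:
        if "word/comments.xml" not in package.namelist():
            return []
        root = ET.fromstring(package.read("word/comments.xml"))
    found = []
    for comment in root.iter(f"{{{W}}}comment"):
        author = comment.get(f"{{{W}}}author", "")
        text = "".join(n.text or "" for n in comment.iter(f"{{{W}}}t"))
        found.append((author, text))
    return found
```

If either function returns a non-empty list for a file that is about to be shared, the file still contains revision history or comments.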

**3. PDF metadata fields**

A PDF can carry document properties: title, author, subject, keywords, the creator application and PDF producer (the software used to create it), and creation/modification dates. The same values may be duplicated in an embedded XMP metadata packet. A PDF exported from a Word file with the attorney's name as document author and the client matter number as title transmits that information to every recipient. Check this in Adobe Acrobat via File > Properties > Description.
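As a rough illustration, the sketch below scans raw PDF bytes for property entries written as plain literal strings, such as `/Author (Jane Doe)`. It is a heuristic under stated assumptions, not a PDF parser: it misses values stored in compressed object streams, hex or UTF-16 encoded strings, and XMP packets, and `/Title` may also match a bookmark title. The function name `scan_pdf_info` is hypothetical. A clean result from this scan does not prove the file is clean.

```python
import re

# Property keys commonly found in a PDF document information dictionary.
INFO_KEYS = (b"Title", b"Author", b"Subject", b"Keywords", b"Creator", b"Producer")


def scan_pdf_info(pdf_bytes):
    """Return the first literal-string value found for each property key.

    Heuristic only: handles escaped characters but not nested parentheses,
    and sees only entries that are stored uncompressed.
    """
    found = {}
    for key in INFO_KEYS:
        pattern = rb"/" + key + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key.decode("ascii")] = match.group(1).decode("latin-1")
    return found
```

A typical use would be `scan_pdf_info(open("export.pdf", "rb").read())`, with `export.pdf` standing in for the file under review.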

**4. Hidden text layers in scanned PDFs**

OCR software applied to a scanned document typically adds an invisible text layer aligned with the page image. This layer contains the text extracted from the scan. If a black box or white rectangle is drawn over the page after OCR, the text beneath it remains in the layer and can still be selected, searched, or extracted. The [NSA/CSS guidance on redacting with confidence](https://www.iad.gov/) warns against the same underlying mistake: covering content with a visual overlay hides it from view but does not remove it from the file.

## How to check your documents for metadata exposure

Three checks before sharing any document:

1. **Word**: File > Info > Check for Issues > Inspect Document. Run Document Inspector and remove all flagged categories before exporting to PDF.
2. **PDF**: File > Properties in Adobe Acrobat shows all document properties. The Redact tool (Tools > Redact > Sanitize Document) removes metadata, hidden layers, and embedded content.
3. **PDF inspector tools**: Third-party tools like PDFiD or pdf-parser expose the internal structure of a PDF file, including hidden streams and embedded content not visible through standard viewers.

The [NIST guidelines for media sanitization (SP 800-88)](https://csrc.nist.gov/publications/detail/sp/800-88/rev-1/final) address the broader principle: information is not removed until recovering it is infeasible for a knowledgeable adversary, not merely hidden from casual view.

## Why purpose-built redaction tools handle this by default

General-purpose redaction workflows (applying black boxes in Word, saving to PDF) do not include metadata stripping as a step. Purpose-built redaction tools include metadata sanitization as part of every export, not as an optional add-on. The output file contains only the content intended to be shared.

RedactifyAI strips document metadata including author fields, tracked changes, and embedded content as part of every redaction workflow, so the exported file does not carry forward any information from the source document beyond the visible, approved content.