What Is the Difference Between Redaction and Blackout?
Blackout and redaction look identical on screen or on paper: both produce a solid black rectangle over text. The difference is what happens to the underlying data. A blackout conceals text visually by placing an overlay on top of it, leaving the original characters fully intact in the file. Redaction permanently removes those characters from the document so they cannot be recovered from the file. In legal and compliance contexts, only redaction meets the standard for protecting sensitive information; blackout alone does not.
What a blackout does at the file level
In a PDF, a blackout is typically implemented as an annotation object: a black-filled rectangle that sits in a layer above the page content. The text content stream is untouched. Anyone with a PDF editor can select and delete that annotation to reveal the original text. Even without an editor, simply selecting the covered area with a cursor and pressing Ctrl+C in most PDF readers copies the hidden text to the clipboard, because the reader reads from the content stream rather than from the visual rendering.
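The following minimal Python sketch, using only the standard library, illustrates the principle. The page content and the annotation are hypothetical, hand-written fragments of uncompressed PDF syntax rather than a complete file, and extract_shown_text is an illustrative helper, not a real PDF parser. Real PDFs usually compress their streams and encode text in more varied ways, so this shows the idea only.

```python
from __future__ import annotations

import re

# Hypothetical page content stream: a single line of text.
PAGE_CONTENT = b"BT /F1 12 Tf 72 720 Td (SSN 123-45-6789) Tj ET\n"

# Hypothetical "blackout": a square annotation with a black interior color.
# It is stored as a separate object and never touches PAGE_CONTENT.
BLACKOUT_ANNOTATION = (
    b"<< /Type /Annot /Subtype /Square /Rect [70 715 210 731] /IC [0 0 0] >>"
)


def extract_shown_text(stream: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator."""
    matches = re.findall(rb"\(([^()]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


# Extraction reads the content stream, so the annotation makes no difference.
print(extract_shown_text(PAGE_CONTENT))  # ['SSN 123-45-6789']
```

The annotation only changes what is painted on screen. Anything that reads the content stream, including copy and paste, still sees the string.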
In Word documents, a blackout is most commonly implemented as a black text-highlight color or a black-filled shape placed over text. Both approaches leave every character in the document XML. Opening the .docx file as a ZIP archive and inspecting word/document.xml reveals the text immediately.
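The same check can be sketched in Python with the standard library. To stay self-contained, the example builds a tiny in-memory archive containing only word/document.xml with one run highlighted in black; it is not a complete .docx that Word would open, and the sample name and date are made up. The helper names are hypothetical.

```python
from __future__ import annotations

import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical document body: one run "blacked out" with black highlighting.
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W}"><w:body><w:p><w:r>'
    '<w:rPr><w:highlight w:val="black"/></w:rPr>'
    "<w:t>Jane Doe, DOB 1980-01-01</w:t>"
    "</w:r></w:p></w:body></w:document>"
)


def build_sample_archive() -> bytes:
    """Pack the XML into an in-memory ZIP at the path Word uses for body text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buf.getvalue()


def read_docx_text(docx_bytes: bytes) -> list[str]:
    """Return the contents of every w:t element, ignoring all formatting."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [t.text or "" for t in root.iter(f"{{{W}}}t")]


print(read_docx_text(build_sample_archive()))  # ['Jane Doe, DOB 1980-01-01']
```

The highlight lives in the run properties (w:rPr), while the characters live in w:t. Changing one does nothing to the other.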
Physical blackout with a marker faces a different version of the same problem. When a marked paper document is scanned, any OCR step attached to the scan reconstructs text from whatever the scanner captured, and thin or uneven ink can let the scanner read through the mark. For electronic files, the NSA's guidance on redacting documents converted from Word to PDF makes the parallel point: covering text with black rectangles or black highlighting does not remove the underlying information.
What redaction does at the file level
Proper redaction modifies the PDF content stream directly. The tool identifies the text rendering operators and character data for the targeted region, deletes those objects from the file structure, and writes a flat filled rectangle as part of the page content rather than as a floating annotation that can be removed. After this operation, the characters do not exist anywhere in the file. Running a text extraction tool such as pdftotext on the redacted region returns nothing because there is nothing to extract.
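As a rough sketch of that operation, the Python below works on a hypothetical, uncompressed content stream. The function redact_stream and its parameters are illustrative assumptions, not the method of any particular product. It handles only simple literal strings shown with the Tj operator, whereas a production tool must also deal with compressed streams, other text operators, font encodings, and text split across several operations.

```python
from __future__ import annotations

import re


def redact_stream(stream: bytes, secret: bytes, box: tuple[int, int, int, int]) -> bytes:
    """Delete every Tj operation whose string contains `secret`, then paint a box.

    `box` is (x, y, width, height) in page units for the black rectangle.
    """

    def drop_if_secret(match: re.Match) -> bytes:
        # Remove the whole "(...) Tj" operation when it carries the secret.
        return b"" if secret in match.group(1) else match.group(0)

    cleaned = re.sub(rb"\(([^()]*)\)\s*Tj", drop_if_secret, stream)
    x, y, w, h = box
    # The rectangle becomes part of the page content, not a removable annotation.
    return cleaned + b"0 g %d %d %d %d re f\n" % (x, y, w, h)


original = b"BT /F1 12 Tf 72 720 Td (SSN 123-45-6789) Tj ET\n"
redacted = redact_stream(original, b"123-45-6789", (70, 715, 140, 16))

print(b"123-45-6789" in original)  # True
print(b"123-45-6789" in redacted)  # False
```

The order matters: the characters are deleted first, and the rectangle is added only to mark where they used to be.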
A correctly redacted PDF also has its annotations flattened into the page content, so no separately removable object remains on top of the page. Metadata fields that could contain the deleted text, such as author comments or document properties, are also stripped as part of a complete redaction workflow.
Why the distinction matters legally
Courts and regulators treat blackout and redaction as categorically different. Rule 5.2 of the Federal Rules of Civil Procedure requires that parties filing documents in federal court protect certain private identifiers, such as Social Security numbers, birth dates, names of minors, and financial account numbers, and courts have sanctioned attorneys for submitting documents where covered text was recoverable. The requirement is for the information to be protected, not merely covered.
Under HIPAA, the Safe Harbor de-identification standard requires that the 18 enumerated identifiers be removed from protected health information, not visually obscured. A document with blacked-out names that can be recovered by copying does not meet that standard. Similarly, GDPR Article 4(5) defines pseudonymization, and Recital 26 treats data as anonymous only when the person can no longer be identified by means reasonably likely to be used. A blackout that is trivially reversible does not qualify as either.
In discovery and public records contexts, agencies and firms that disclose documents with recoverable blacked-out content can face the same consequences as if they had disclosed the content openly, because the information was in fact disclosed.
How courts and regulators have addressed blackout failures
Two high-profile cases illustrate the risk. In January 2019, Paul Manafort's legal team filed a court document with PDF black boxes over text that was still present in the content stream; news organizations recovered it immediately by copying and pasting. In 2006, AT&T's lawyers filed a brief with blacked-out text in Hepting v. AT&T, the Electronic Frontier Foundation's lawsuit over NSA surveillance, and the covered text was extracted by the same copy-paste method. Both incidents became public precisely because the attorneys used blackout rather than true redaction.
Choosing tools that do true redaction rather than blackout
Purpose-built redaction software, such as RedactifyAI, applies permanent content-stream deletion rather than visual overlays, and uses AI to identify PII categories such as names, addresses, Social Security numbers, and dates of birth before removing them. This is the distinction that separates defensible redaction from a blackout that creates legal exposure: the data is gone from the file, not merely hidden from view.
Before sharing any document externally, verify the redaction by selecting text from a redacted region, searching the document for a term that was redacted, and running the file through a text extraction tool. If any of these returns protected content, the document was blacked out rather than redacted and must be processed again with a tool that modifies the file structure.
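The search step lends itself to a small script. The sketch below assumes the text has already been extracted from the file, for example with pdftotext, and is passed in as a plain string; find_leaks and the sample text are hypothetical. The Social Security number pattern is a simple shape check for illustration, not a complete PII detector.

```python
import re

# Matches strings shaped like a US Social Security number (illustrative only).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_leaks(extracted_text, redacted_terms):
    """Return redacted terms and SSN-shaped strings still present in the text."""
    # Collapse whitespace and case so line breaks do not hide a match.
    normalized = " ".join(extracted_text.split()).lower()
    leaks = [
        term
        for term in redacted_terms
        if " ".join(term.split()).lower() in normalized
    ]
    for ssn in SSN_PATTERN.findall(extracted_text):
        if ssn not in leaks:
            leaks.append(ssn)
    return leaks


# Hypothetical extraction result: the name was removed, the SSN was only covered.
sample = "Patient: [REDACTED]\nSSN 123-45-6789 on file"
print(find_leaks(sample, ["Jane Doe"]))  # ['123-45-6789']
```

An empty list from a check like this is necessary but not sufficient: it covers only the terms and patterns it was given, so the manual select-and-copy test is still worth doing.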