How to Redact a PDF: The Complete Guide for 2026
PDF is the default format for sensitive documents. Court filings, medical records, financial statements, contracts, immigration paperwork -- if it contains personal data, it's probably a PDF. That ubiquity makes PDF redaction one of the most important data protection tasks any organization performs. It also makes it one of the most commonly botched.
Every year, government agencies, law firms, and corporations publish documents they believe are redacted. Recipients select the black boxes, paste into a text editor, and read every word that was supposed to be hidden. The problem isn't carelessness. It's that PDF redaction is genuinely harder than it looks, and the tools most people reach for don't actually do what they think.
This guide explains why PDFs are structurally difficult to redact, which common methods fail and why, and how to redact a PDF document so the sensitive data is actually gone.
Why PDFs are tricky to redact
To understand why redacting a PDF is harder than redacting a Word document, you need to understand what a PDF actually is under the hood.
A PDF is not a single layer of text on a page. It's a container format with multiple internal structures:
- Content streams. The actual text lives in encoded content streams, which are sequences of drawing instructions that tell a PDF renderer where to place each character. This is the text you see on screen and the text that must be removed during redaction.
- Annotations. Comments, highlights, form field labels, and markup exist as separate objects layered on top of content streams. Most "redaction" attempts add an annotation rather than modifying the content stream.
- Metadata. Every PDF carries document properties: author name, creation date, modification history, and software used. Embedded images can carry their own metadata as well, sometimes including GPS coordinates. This data persists even if you redact all visible text.
- Embedded fonts. PDFs often embed the fonts used in the document. Font subsets can contain character tables that reveal which characters appear in the document, even after visual masking.
- Form fields. Interactive PDFs with fillable fields store form data separately from the visible page content. Redacting the visible text doesn't touch the form field values stored in the document's AcroForm dictionary.
- Bookmarks and links. Navigation bookmarks and hyperlinks can contain text strings that reference redacted content.
- Incremental saves. PDFs support incremental saving, where new changes are appended to the end of the file without removing old data. A PDF that has been edited can still contain earlier versions of every object that changed, including the unredacted originals of edited pages, in its file history.
- Embedded attachments. PDFs can contain other files as attachments. These attached files are not affected by any redaction applied to the parent PDF's pages.
When you draw a black rectangle over text in a PDF reader, you're adding a new annotation object. The content stream containing the original text is untouched. The black box and the text coexist in the same file. Anyone who removes the annotation -- or simply copies the text underneath it -- gets the original content.
This is why drawing a black box over text in a PDF doesn't work. It's the digital equivalent of placing a sticky note over a word on a printed page. The word is still there.
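The incremental-save point above can be checked with a rough byte-level heuristic. The sketch below assumes that each save appends a new `%%EOF` marker, which is how incremental updates are laid out; the function names are illustrative, not part of any library. Linearized PDFs can legitimately contain two markers, and a marker could also appear inside binary stream data, so treat the count as a hint rather than proof.

```python
import re

def count_revisions(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each incremental save normally appends one."""
    return len(re.findall(rb"%%EOF", pdf_bytes))

def earlier_revisions(pdf_bytes: bytes) -> list[bytes]:
    """Return the file as it existed at each earlier %%EOF boundary."""
    ends = [m.end() for m in re.finditer(rb"%%EOF", pdf_bytes)]
    # Everything up to an earlier marker is a complete older version.
    return [pdf_bytes[:end] for end in ends[:-1]]
```

If `count_revisions` returns more than 1 for a file you are about to distribute, open one of the slices from `earlier_revisions` in a viewer and check whether it still shows the unredacted content.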
Methods that don't work (and why people think they do)
These approaches look like they work because the text is no longer visible on screen. But visibility and removal are two different things.
Drawing shapes or annotations in PDF readers
This is the most common mistake. Open a PDF in any reader, use the rectangle tool, set fill to black, place it over the sensitive text, and save. The document looks redacted. But the text stream is unchanged. Select the area, paste, and the text appears. Tools like pdftotext extract it instantly. For a detailed breakdown of how visual masking fails, see our guide on safe document redaction.
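To make the failure concrete, here is a toy illustration using a hand-written, uncompressed content stream. It is a sketch under simplifying assumptions: real PDFs usually compress their streams, may use hex strings or `TJ` arrays, and store a reader-drawn box as an annotation object rather than as drawing operators. The principle is the same in every case: the box is added, the text operand is not removed.

```python
import re

# One line of text, followed by a filled black rectangle over the same area.
CONTENT_STREAM = b"""BT
/F1 12 Tf
72 700 Td
(SSN: 123-45-6789) Tj
ET
0 0 0 rg
70 695 160 16 re
f
"""

def shown_text(stream: bytes) -> list[str]:
    """Return the string operand of every Tj (show text) operator."""
    # Simplification: assumes literal strings without escaped parentheses.
    return [m.decode("latin-1") for m in re.findall(rb"\((.*?)\)\s*Tj", stream)]

print(shown_text(CONTENT_STREAM))  # ['SSN: 123-45-6789']
```

The rectangle operators (`re`, `f`) paint over the text on screen, but the extractor never looks at them.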
Using highlight tools with black color
Some people use the highlight annotation tool and set the color to black, thinking this blacks out the text. It does visually -- in some viewers. But the text is fully intact, and some PDF viewers render highlights with partial transparency, meaning the text is visible even on screen. This method offers zero protection.
Printing to image and re-scanning
The logic here is: if you print the PDF to an image (or "flatten" it), the text layer disappears. This is partially true. A rasterized PDF doesn't contain searchable text. But there are problems:
- OCR recovery. Anyone with OCR software can extract text from the image, including the text under your black boxes if the boxes aren't fully opaque at print resolution.
- Metadata persistence. The image file still carries EXIF and XMP metadata from the conversion process, which can include the source file name, software, and timestamps.
- Quality loss. Every print-to-image cycle degrades quality. After one round trip, text may be fuzzy. After two, the content you meant to keep may be illegible too.
- No verification path. You can't programmatically verify that an image-based "redaction" is complete.
Using Mac Preview markup
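Mac Preview's markup tools (shapes, highlights, sketches) are annotation tools. They add visual objects on top of PDF content streams, so text covered with them is trivially recoverable. Apple has acknowledged this limitation, and recent macOS versions include a separate Redact command in Preview's Tools menu, but a rectangle drawn from the Markup toolbar is not the same thing. Markup-based masking remains a common source of failed redactions in legal contexts.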
Screenshot and paste approach
Taking a screenshot of the PDF, blacking out areas in an image editor, and pasting the result into a new document avoids some PDF-specific issues. But it creates new ones: the resulting document is an image with no text layer (making it non-compliant with accessibility requirements), quality is dependent on screen resolution, and the original file's metadata isn't addressed at all.
Methods that do work
True PDF redaction modifies or removes text from the content stream itself, then rewrites the file. After proper redaction, the sensitive text does not exist anywhere in the file.
Adobe Acrobat Pro's redaction tool
Adobe Acrobat Pro has a dedicated Redact tool (separate from its annotation and markup tools). The workflow is: mark areas for redaction, review the marks, then "Apply Redactions." The apply step rewrites the content stream, replacing marked text with redaction marks and removing the original characters. This works -- when the full workflow is completed.
The failure point is that many users mark text for redaction but never apply. The marks are just annotations until you explicitly apply them. Others use Acrobat's drawing tools instead of the Redact tool, which produces the annotation-only result described above. For more on how this goes wrong, read about Adobe redaction risks.
Dedicated redaction software with AI-powered detection
Purpose-built redaction tools handle the full pipeline: detecting sensitive content (often using AI to identify PII categories automatically), marking it, permanently removing it from the content stream, stripping metadata, and generating verification reports. These tools are designed to eliminate the human error that plagues manual workflows.
What "applying" a redaction means technically
When a redaction tool "applies" redactions, it performs several operations at the file level:
- Removes text objects from the content stream for the marked regions
- Removes or redacts associated font glyphs to prevent character-level recovery
- Draws a redaction mark (typically a black or white rectangle) in place of the removed content
- Rewrites the PDF without incremental save, eliminating any previous versions of the redacted pages
- Updates the cross-reference table so the file structure no longer references the removed objects
After this process, the original text is not recoverable by any means. This is fundamentally different from adding a layer on top of existing content.
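The first and third operations in the list above can be sketched on a toy content stream. This is an illustration only, not how any particular product works: it assumes an uncompressed stream where each `BT ... ET` block sets its position with a single `Td` operator, and it decides what to remove from that origin point alone. A real tool has to compute glyph bounding boxes, split strings that are only partly covered, handle fonts, and rewrite the cross-reference table. The names `apply_redaction`, `TEXT_BLOCK`, and `ORIGIN` are hypothetical.

```python
import re

TEXT_BLOCK = re.compile(rb"BT\b.*?\bET\b\n?", re.DOTALL)
ORIGIN = re.compile(rb"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td")

def apply_redaction(stream: bytes,
                    rect: tuple[float, float, float, float]) -> bytes:
    """Drop text blocks whose origin lies in rect, then draw a black mark."""
    x0, y0, x1, y1 = rect

    def drop_if_inside(match: re.Match) -> bytes:
        block = match.group(0)
        origin = ORIGIN.search(block)
        if origin:
            x, y = float(origin.group(1)), float(origin.group(2))
            if x0 <= x <= x1 and y0 <= y <= y1:
                return b""  # the text operators are removed, not covered
        return block

    cleaned = TEXT_BLOCK.sub(drop_if_inside, stream)
    # Fill colour black, rectangle path (x y width height), fill.
    mark = f"0 0 0 rg\n{x0:g} {y0:g} {x1 - x0:g} {y1 - y0:g} re\nf\n"
    return cleaned + mark.encode("ascii")
```

The difference from visual masking is the `return b""` line: after it runs, the string no longer exists in the output bytes, so there is nothing for copy-paste or a text extractor to find.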
Step-by-step: How to properly redact a PDF
Step 1: Identify what needs redacting
Before opening any tool, define your redaction scope. Common PII categories include:
- Names, Social Security numbers, dates of birth
- Financial account numbers, credit card numbers
- Addresses, phone numbers, email addresses
- Medical record numbers, diagnosis codes, treatment details
- Driver's license and passport numbers
- Biometric identifiers
- Any data your jurisdiction or court rules require to be removed
For compliance-driven redaction, map your categories to the applicable regulation. HIPAA's Safe Harbor method specifies 18 identifier types. GDPR covers any data that identifies or could identify a natural person. Court rules like FRCP 5.2 have their own specific list. For common mistakes in this area, see PII in PDFs.
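For the pattern-based categories, a first pass over extracted text can be sketched with regular expressions. The patterns below are simplified assumptions (US-style SSNs, a loose email pattern, 13 to 19 digit card numbers filtered by a Luhn checksum) and will miss names, addresses, and anything context-dependent. Use them to seed a review, not to replace one.

```python
import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}

def luhn_ok(number: str) -> bool:
    """Luhn checksum, used to discard digit runs that are not card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (category, matched text) pairs for each pattern hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group(0)
            if label == "card" and not luhn_ok(value):
                continue
            hits.append((label, value))
    return hits
```

Each hit still needs to be mapped to the regulation that applies to you; a regex match says what the string looks like, not whether your rules require it to be removed.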
Step 2: Use a tool with true redaction capability
Choose a PDF redaction tool that modifies content streams, not one that adds annotations. This means Adobe Acrobat Pro's Redact tool (not its drawing or comment tools), or dedicated redaction software. The markup and annotation tools in free PDF readers, such as Preview's Markup toolbar, Foxit Reader's basic mode, and browser-based PDF viewers, do not perform true redaction. For a detailed comparison, see our best redaction software comparison.
Step 3: Mark areas for redaction
Using your chosen tool, mark every instance of sensitive content. If your tool supports automated PII detection, run it first and review the results. Automated detection catches patterns humans miss -- a Social Security number on page 47, an email address in a footer, a name embedded in a URL.
Manual marking is still necessary for context-dependent redactions: a company name that's confidential in one document but public in another, or a dollar amount that's privileged in a specific context.
Step 4: Review all marks before applying
This is the quality gate. Before applying redactions, review every mark across every page:
- Are all required PII categories covered?
- Are there any false positives (marks on content that should remain visible)?
- Did automated detection miss anything in headers, footers, watermarks, or page margins?
- Are there references to redacted content elsewhere in the document (e.g., a name redacted on page 1 but mentioned again on page 12)?
This review step is critical because the next step is irreversible.
Step 5: Apply redactions
This is the permanent step. When you apply redactions, the tool rewrites the PDF, removing marked content from the content streams. This cannot be undone. The original text is gone from the file.
In Adobe Acrobat Pro, this is the "Apply Redactions" button. In dedicated tools, it's typically the "Redact" or "Apply" action. Make sure you see a confirmation that redactions have been applied, not just marked.
Step 6: Remove metadata
After applying redactions to visible content, clean the document's metadata:
- Document properties: Author name, title, subject, keywords, creation and modification dates
- Comments and annotations: Any remaining sticky notes, comments, or review markup
- Hidden layers: Some PDFs have optional content groups that contain non-visible layers
- Embedded JavaScript: Interactive PDFs may contain scripts that reference sensitive data
- Revision history: If the PDF uses incremental saves, previous page versions may exist in the file
- XMP metadata: Extended metadata blocks can contain detailed document history
Adobe Acrobat Pro has a "Remove Hidden Information" tool. Dedicated redaction software typically handles this automatically as part of the redaction workflow.
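A quick way to spot leftover document properties is to scan the raw bytes for the standard Info dictionary keys and for an XMP packet. This is a minimal sketch with clear limits: it only sees literal strings in uncompressed objects, it will not decode hex or UTF-16 values, and `/Title` can also match bookmark entries. The function name `leftover_metadata` is illustrative.

```python
import re

INFO_KEYS = (b"Author", b"Title", b"Subject", b"Keywords",
             b"Creator", b"Producer", b"CreationDate", b"ModDate")

def leftover_metadata(pdf_bytes: bytes) -> dict[str, str]:
    """Return Info-style keys (and an XMP flag) still visible in the file."""
    found = {}
    for key in INFO_KEYS:
        # Matches e.g. "/Author (Jane Doe)"; literal strings only.
        m = re.search(rb"/" + key + rb"\s*\((.*?)\)", pdf_bytes, re.DOTALL)
        if m:
            found[key.decode("ascii")] = m.group(1).decode("latin-1")
    if b"<x:xmpmeta" in pdf_bytes:
        found["XMP"] = "packet present"
    return found
```

An empty result does not prove the file is clean, because metadata can sit inside compressed object streams. A non-empty result does prove something was left behind.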
Step 7: Verify the redaction
Never trust the visual result alone. Run these verification tests:
- Copy-paste test. Open the redacted PDF, select the area where redacted content was, and paste into a text editor. You should get nothing, or only the redaction marker text.
- Text extraction test. Run pdftotext or a similar tool on the file and search the output for any content that should have been redacted.
- Search test. Use the PDF reader's Find function to search for specific terms you redacted (a name, a number). No results should appear.
- File size check. The redacted file should be the same size or smaller than the original. If it's significantly larger, content may have been added (annotations) rather than removed.
- Metadata inspection. Check document properties to confirm author names, revision history, and other metadata have been stripped.
For workflows with legal or regulatory stakes, see our guide on QA before court production.
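The text extraction and search tests can be scripted. The sketch below assumes Poppler's `pdftotext` is installed and on the PATH (passing `-` as the output file writes to stdout). The helper names and the whitespace-insensitive comparison are illustrative choices, not a standard.

```python
import subprocess

def extract_text(pdf_path: str) -> str:
    """Run pdftotext and return the extracted text."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def find_leaks(extracted: str, redacted_terms: list[str]) -> list[str]:
    """Return every redacted term that still appears in the extracted text."""
    # Ignore whitespace and case so a term split across lines still matches.
    squashed = "".join(extracted.split()).casefold()
    leaks = []
    for term in redacted_terms:
        key = "".join(term.split()).casefold()
        if key and key in squashed:
            leaks.append(term)
    return leaks
```

A typical call would be `find_leaks(extract_text("contract_2026_REDACTED.pdf"), ["Jane Doe", "123-45-6789"])`. Any non-empty result means the file should not be released.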
Step 8: Save as a new file
Save the redacted document as a new file. Do not overwrite the original. You need the original for your records (stored securely), and saving as a new file avoids any risk of incremental save preserving old content. Name the new file clearly -- for example, contract_2026_REDACTED.pdf -- so there's no confusion about which version is safe to distribute.
Special cases
Scanned PDFs
A scanned PDF is an image wrapped in a PDF container. There's no text layer, which means there's nothing for a text-based redaction tool to find or remove. You must run OCR (Optical Character Recognition) first to create a text layer and locate the sensitive content, then redact both that text layer and the underlying image pixels, then verify. Removing only the OCR text leaves the words visible in the image.
The risk with scanned documents is that redaction tools that skip OCR will report "no PII found" -- because they can't read the document at all. The text is there visually, and any recipient with OCR software can extract it.
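One way to catch this before detection runs is to flag pages whose extracted text is suspiciously short. The sketch below assumes the input came from `pdftotext`, which ends each page with a form feed character. The 20-character threshold and the function name are arbitrary assumptions to tune for your documents.

```python
def pages_without_text(extracted: str, min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers that look like images with no text layer."""
    pages = extracted.split("\f")
    # pdftotext terminates every page with \f, leaving one empty trailing item.
    if len(pages) > 1 and not pages[-1].strip():
        pages.pop()
    return [
        number
        for number, page in enumerate(pages, start=1)
        if len("".join(page.split())) < min_chars
    ]
```

Pages returned by this check should go through OCR before any "no PII found" result for them is trusted.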
PDFs with form fields
Interactive PDFs store form data in AcroForm or XFA dictionaries, separate from page content streams. If you redact text on the visible page but don't clear the form field data, the field values persist. Check your tool's ability to handle form fields explicitly, or flatten the form before redacting.
PDFs with embedded attachments
A PDF can contain other files as embedded attachments (stored as document-level attachments, file attachment annotations, or PDF portfolios). Redacting the parent PDF's pages does not touch these attachments. You must extract each one, redact it separately, and then either re-embed it or remove it.
Password-protected PDFs
Password protection controls who can open or edit a PDF. It does not change the underlying data structure. A PDF can have two passwords: a user (open) password needed to view the file, and an owner (permissions) password that restricts editing, printing, or copying. If you have the owner password, you can redact normally. If you can open the file but editing is restricted, you'll need to remove the restriction first, or use a tool that handles permission passwords transparently.
Verification checklist
Post-redaction verification tests
| Test | Method | Pass criteria |
|---|---|---|
| Copy-paste | Select redacted area, paste into text editor | No original text appears |
| Text extraction | Run pdftotext or similar tool, search output | No redacted content in extracted text |
| Keyword search | Use Find function for known redacted terms | Zero matches found |
| File size | Compare redacted file size to original | Same or smaller (not significantly larger) |
| Metadata check | Inspect document properties and XMP data | No author names, revision history, or sensitive properties |
| Form fields | Check for interactive form data in document | No sensitive values in form field dictionaries |
| Attachments | Check for embedded files in PDF | No unredacted attachments present |
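The last two rows of the table, along with leftover annotations and scripts, can be triaged with a byte-level scan for the PDF names that introduce those structures. This is a rough sketch: the names are standard PDF dictionary keys, but they can be hidden inside compressed object streams or written with hex escapes, so a clean scan is not proof of absence. The `MARKERS` table and function name are illustrative.

```python
MARKERS = {
    b"/AcroForm": "interactive form fields",
    b"/XFA": "XFA form data",
    b"/EmbeddedFiles": "embedded attachments",
    b"/JavaScript": "embedded JavaScript",
    b"/Annots": "annotations (comments, markup, unapplied redaction marks)",
    b"/OCProperties": "optional content groups (hidden layers)",
}

def residual_structures(pdf_bytes: bytes) -> list[str]:
    """List structure types whose marker names appear in the raw file."""
    return [label for marker, label in MARKERS.items() if marker in pdf_bytes]
```

Anything this reports deserves a manual look in a full PDF editor before the file is distributed.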
How RedactifyAI handles PDF redaction
RedactifyAI was built specifically to address the edge cases that trip up general-purpose tools. When you upload a PDF, the platform automatically detects whether it's a native or scanned document and runs OCR when needed, so scanned PDFs don't pass through with hidden text intact.
AI-powered detection identifies 25+ PII entity types across the full document, including headers, footers, form fields, and embedded text that manual reviewers routinely miss. Redactions are applied at the content stream level -- text objects are permanently removed, not masked with annotations. Metadata is stripped as part of every redaction workflow, not as an optional extra step.
For teams handling volume -- discovery productions, FOIA batches, patient record releases -- batch processing lets you redact hundreds of documents in a single run, with consistent rules applied across every page of every file. The result is a clean PDF where the redacted content is verifiably gone, not just hidden behind a black box.
If you're currently relying on manual PDF redaction or tools that leave you wondering whether the data is actually removed, you can try RedactifyAI free and run the verification tests above on the output.