How to Redact a PDF: The Complete Guide for 2026

PDF is the default format for sensitive documents. Court filings, medical records, financial statements, contracts, immigration paperwork -- if it contains personal data, it's probably a PDF. That ubiquity makes PDF redaction one of the most important data protection tasks any organization performs. It also makes it one of the most commonly botched.

Every year, government agencies, law firms, and corporations publish documents they believe are redacted. Recipients select the black boxes, paste into a text editor, and read every word that was supposed to be hidden. The problem isn't carelessness. It's that PDF redaction is genuinely harder than it looks, and the tools most people reach for don't do what their users assume they do.

This guide explains why PDFs are structurally difficult to redact, which common methods fail and why, and how to redact a PDF document so the sensitive data is actually gone.

Why PDFs are tricky to redact

To understand why redacting a PDF is harder than redacting a Word document, you need to understand what a PDF actually is under the hood.

A PDF is not a single layer of text on a page. It's a container format with multiple internal structures:

Content streams. The actual text lives in encoded content streams, which are sequences of drawing instructions that tell a PDF renderer where to place each character. This is the text you see on screen and the text that must be removed during redaction.
Annotations. Comments, highlights, form field labels, and markup exist as separate objects layered on top of content streams. Most "redaction" attempts add an annotation rather than modifying the content stream.
Metadata. A PDF can carry document properties: author name, creation and modification dates, and the software used. Embedded images may bring their own metadata, which sometimes includes GPS coordinates. This data persists even if you redact all visible text.
Embedded fonts. PDFs often embed the fonts used in the document. Font subsets can contain character tables that reveal which characters appear in the document, even after visual masking.
Form fields. Interactive PDFs with fillable fields store form data separately from the visible page content. Redacting the visible text doesn't touch the form field values stored in the document's AcroForm dictionary.
Bookmarks and links. Navigation bookmarks and hyperlinks can contain text strings that reference redacted content.
Incremental saves. PDFs support incremental saving, where new changes are appended to the end of the file without removing old data. A PDF that has been edited this way may still contain earlier versions of its pages, including the unredacted originals, in the appended history.
Embedded attachments. PDFs can contain other files as attachments. These attached files are not affected by any redaction applied to the parent PDF's pages.
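
As a rough illustration of the list above, here is a minimal Python sketch that counts name tokens in a PDF's raw bytes to hint at which of these structures are present. The function name scan_structures and the marker list are assumptions made for this example. The scan is only a heuristic: it assumes an unencrypted file, and names stored inside compressed object streams will not show up, so a zero count is not proof of absence.

```python
import re

# Name tokens that hint at structures a page-level redaction may not touch.
# Assumption: the file is unencrypted and these names are not hidden inside
# compressed object streams; a real PDF parser is needed for a firm answer.
MARKERS = {
    "annotations": rb"/Annots\b",
    "form_fields": rb"/AcroForm\b",
    "embedded_files": rb"/EmbeddedFiles?\b",
    "javascript": rb"/JavaScript\b",
    "xmp_metadata": rb"/Metadata\b",
    "info_dictionary": rb"/Info\b",
}


def scan_structures(pdf_bytes: bytes) -> dict[str, int]:
    """Count occurrences of each marker in the raw bytes of a PDF."""
    return {
        name: len(re.findall(pattern, pdf_bytes))
        for name, pattern in MARKERS.items()
    }
```

Any non-zero count is a prompt to check that structure explicitly after redacting the visible pages.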

When you draw a black rectangle over text in a PDF reader, you're adding a new annotation object. The content stream containing the original text is untouched. The black box and the text coexist in the same file. Anyone who removes the annotation -- or simply copies the text underneath it -- gets the original content.

This is why drawing a black box over text in a PDF doesn't work. It's the digital equivalent of placing a sticky note over a word on a printed page. The word is still there.
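
To make the sticky-note analogy concrete, the toy Python example below models a decoded content stream and a black square annotation as two separate byte strings. The sample values (including the made-up SSN) and the helper name shown_strings are hypothetical, and the parsing is deliberately simplified: it ignores escapes, hex strings, TJ arrays, and font encodings.

```python
import re

# A decoded page content stream: drawing instructions that place the text.
content_stream = b"BT /F1 12 Tf 72 700 Td (SSN: 123-45-6789) Tj ET"

# A black square annotation covering the same area. It is a separate object.
annotation = b"<< /Type /Annot /Subtype /Square /Rect [70 695 220 715] /IC [0 0 0] >>"


def shown_strings(stream: bytes) -> list[str]:
    """Return literal strings passed to the Tj (show text) operator.

    Simplified: ignores escapes, hex strings, TJ arrays, and font encodings.
    """
    matches = re.findall(rb"\(([^()]*)\)\s*Tj", stream)
    return [m.decode("latin-1") for m in matches]


# Adding the annotation to the page changes nothing in the content stream.
page_objects = [content_stream, annotation]
print(shown_strings(page_objects[0]))  # ['SSN: 123-45-6789']
```

The annotation sits next to the text in the file. It never replaces it.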

Methods that don't work (and why people think they do)

These approaches look like they work because the text is no longer visible on screen. But visibility and removal are two different things.

Drawing shapes or annotations in PDF readers

This is the most common mistake. Open a PDF in any reader, use the rectangle tool, set fill to black, place it over the sensitive text, and save. The document looks redacted. But the text stream is unchanged. Select the area, paste, and the text appears. Tools like pdftotext extract it instantly. For a detailed breakdown of how visual masking fails, see our guide on safe document redaction.

Using highlight tools with black color

Some people use the highlight annotation tool and set the color to black, thinking this blacks out the text. It does visually -- in some viewers. But the text is fully intact, and some PDF viewers render highlights with partial transparency, meaning the text is visible even on screen. This method offers zero protection.

Printing to image and re-scanning

The logic here is: if you print the PDF to an image (or "flatten" it), the text layer disappears. This is partially true. A rasterized PDF doesn't contain searchable text. But there are problems:

OCR recovery. Anyone with OCR software can extract text from the image, including the text under your black boxes if the boxes aren't fully opaque at print resolution.
Metadata persistence. The image file still carries EXIF and XMP metadata from the conversion process, which can include the source file name, software, and timestamps.
Quality loss. Every print-to-image cycle degrades quality. After one round trip, text may be fuzzy. After two, the content you meant to keep may be as illegible as the content you meant to hide.
No verification path. You can't programmatically verify that an image-based "redaction" is complete.

Using Mac Preview markup

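Mac Preview's markup tools (shapes, highlights, and drawings) are annotation tools. They add visual objects on top of PDF content streams, and the text underneath is trivially recoverable. Apple has acknowledged this limitation, but it remains a common source of failed redactions in legal contexts. Recent macOS versions of Preview also include a separate Redact tool under the Tools menu. That tool, not the markup shapes, is the one intended to remove content.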

Screenshot and paste approach

Taking a screenshot of the PDF, blacking out areas in an image editor, and pasting the result into a new document avoids some PDF-specific issues. But it creates new ones: the resulting document is an image with no text layer (making it non-compliant with accessibility requirements), quality is dependent on screen resolution, and the original file's metadata isn't addressed at all.

Methods that do work

True PDF redaction modifies or removes text from the content stream itself, then rewrites the file. After proper redaction, the sensitive text does not exist anywhere in the file.

Adobe Acrobat Pro's redaction tool

Adobe Acrobat Pro has a dedicated Redact tool (separate from its annotation and markup tools). The workflow is: mark areas for redaction, review the marks, then "Apply Redactions." The apply step rewrites the content stream, replacing marked text with redaction marks and removing the original characters. This works -- when the full workflow is completed.

The failure point is that many users mark text for redaction but never apply. The marks are just annotations until you explicitly apply them. Others use Acrobat's drawing tools instead of the Redact tool, which produces the annotation-only result described above. For more on how this goes wrong, read about Adobe redaction risks.

Dedicated redaction software with AI-powered detection

Purpose-built redaction tools handle the full pipeline: detecting sensitive content (often using AI to identify PII categories automatically), marking it, permanently removing it from the content stream, stripping metadata, and generating verification reports. These tools are designed to eliminate the human error that plagues manual workflows.

What "applying" a redaction means technically

When a redaction tool "applies" redactions, it performs several operations at the file level:

Removes text objects from the content stream for the marked regions
Removes or redacts associated font glyphs to prevent character-level recovery
Draws a redaction mark (typically a black or white rectangle) in place of the removed content
Rewrites the PDF without incremental save, eliminating any previous versions of the redacted pages
Updates the cross-reference table so the file structure no longer references the removed objects
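
The difference between removing content and covering it can be sketched with a toy model in Python. The TextRun and Rect classes and the apply_redactions function are hypothetical names for this illustration, not any tool's API. Real tools work on content stream operators and can remove individual glyphs; this sketch drops whole text runs for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRun:
    x: float
    y: float
    width: float
    height: float
    text: str


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def intersects(self, run: TextRun) -> bool:
        """True when the run's bounding box overlaps this rectangle."""
        return not (
            run.x + run.width <= self.x0
            or run.x >= self.x1
            or run.y + run.height <= self.y0
            or run.y >= self.y1
        )


def apply_redactions(
    runs: list[TextRun], marks: list[Rect]
) -> tuple[list[TextRun], list[str]]:
    """Drop every run that touches a mark and emit a black fill per mark."""
    kept = [r for r in runs if not any(m.intersects(r) for m in marks)]
    # "0 g" sets a black fill, "re" adds a rectangle path, "f" fills it.
    fills = [
        f"0 g {m.x0} {m.y0} {m.x1 - m.x0} {m.y1 - m.y0} re f" for m in marks
    ]
    return kept, fills
```

The key point is the first line of the function body: the marked text is absent from what gets written back, and the black rectangle is drawn where it used to be.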

After this process, the original text no longer exists in the file, so there is nothing left to recover from it. This is fundamentally different from adding a layer on top of existing content.

Step-by-step: How to properly redact a PDF

Step 1: Identify what needs redacting

Before opening any tool, define your redaction scope. Common PII categories include:

Names, Social Security numbers, dates of birth
Financial account numbers, credit card numbers
Addresses, phone numbers, email addresses
Medical record numbers, diagnosis codes, treatment details
Driver's license and passport numbers
Biometric identifiers
Any data your jurisdiction or court rules require to be removed

For compliance-driven redaction, map your categories to the applicable regulation. HIPAA's Safe Harbor method specifies 18 identifier types. GDPR covers any data that identifies or could identify a natural person. Court rules like FRCP 5.2 have their own specific list. For common mistakes in this area, see PII in PDFs.
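
For the pattern-based categories, a first pass can be sketched with regular expressions. The Python below is a minimal illustration, assuming simplified US-style formats. The pattern set and the names PATTERNS, luhn_ok, and find_pii are assumptions for this example. It will miss many real-world formats, and it cannot find names or other context-dependent data.

```python
import re

# Assumed, simplified US-centric patterns. Real detectors need many more
# formats plus context checks to keep false positives manageable.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "card": re.compile(r"\b(?:\d[ -]?){12,18}\d\b"),
}


def luhn_ok(number: str) -> bool:
    """Luhn checksum used by payment cards; filters random digit runs."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str) -> list[tuple[str, str]]:
    """Return (category, matched_text) pairs found in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            value = match.group()
            if label == "card" and not luhn_ok(value):
                continue
            hits.append((label, value))
    return hits
```

Treat the output as candidates for review, not as a complete inventory of what needs redacting.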

Step 2: Use a tool with true redaction capability

Choose a PDF redaction tool that modifies content streams, not one that adds annotations. This means Adobe Acrobat Pro's Redact tool (not its drawing or comment tools), or dedicated redaction software. The markup tools in free PDF readers such as Preview, Foxit Reader's basic mode, and browser-based PDF viewers do not perform true redaction. Recent macOS versions of Preview include a separate Redact tool; if you use it, run the verification tests in Step 7 on the output. For a detailed comparison, see our best redaction software comparison.

Step 3: Mark areas for redaction

Using your chosen tool, mark every instance of sensitive content. If your tool supports automated PII detection, run it first and review the results. Automated detection catches patterns humans miss -- a Social Security number on page 47, an email address in a footer, a name embedded in a URL.

Manual marking is still necessary for context-dependent redactions: a company name that's confidential in one document but public in another, or a dollar amount that's privileged in a specific context.

Step 4: Review all marks before applying

This is the quality gate. Before applying redactions, review every mark across every page:

Are all required PII categories covered?
Are there any false positives (marks on content that should remain visible)?
Did automated detection miss anything in headers, footers, watermarks, or page margins?
Are there references to redacted content elsewhere in the document (e.g., a name redacted on page 1 but mentioned again on page 12)?

This review step is critical because the next step is irreversible.
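
The last review question, a term marked on one page but missed on another, is easy to check mechanically. The sketch below assumes you already have the extracted text of each page and a list of (page number, term) marks. Both inputs and the function name unmarked_occurrences are hypothetical. It works at page granularity only, so a page where one of two occurrences is marked will still pass.

```python
def unmarked_occurrences(
    pages: list[str], marks: list[tuple[int, str]]
) -> list[tuple[int, str]]:
    """Return (page_number, term) pairs where a marked term appears unmarked.

    pages: extracted text per page; index 0 is page 1.
    marks: (page_number, term) pairs already marked for redaction.
    """
    marked = {(page, term.lower()) for page, term in marks}
    terms = sorted({term.lower() for _, term in marks})
    gaps = []
    for number, text in enumerate(pages, start=1):
        lowered = text.lower()
        for term in terms:
            if term in lowered and (number, term) not in marked:
                gaps.append((number, term))
    return gaps
```

An empty result means every page containing a marked term has at least one mark for it. It does not replace the page-by-page review.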

Step 5: Apply redactions

This is the permanent step. When you apply redactions, the tool rewrites the PDF, removing marked content from the content streams. This cannot be undone. The original text is gone from the file.

In Adobe Acrobat Pro, this is the "Apply Redactions" button. In dedicated tools, it's typically the "Redact" or "Apply" action. Make sure you see a confirmation that redactions have been applied, not just marked.

Step 6: Remove metadata

After applying redactions to visible content, clean the document's metadata:

Document properties: Author name, title, subject, keywords, creation and modification dates
Comments and annotations: Any remaining sticky notes, comments, or review markup
Hidden layers: Some PDFs have optional content groups that contain non-visible layers
Embedded JavaScript: Interactive PDFs may contain scripts that reference sensitive data
Revision history: If the PDF uses incremental saves, previous page versions may exist in the file
XMP metadata: Extended metadata blocks can contain detailed document history

Adobe Acrobat Pro has a "Remove Hidden Information" tool. Dedicated redaction software typically handles this automatically as part of the redaction workflow.
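
If you want a quick independent check that document properties are gone, a raw byte scan can catch the obvious cases. The Python sketch below is a heuristic under stated assumptions: the file is unencrypted, the values are stored as plain literal strings, and nothing is tucked into compressed object streams. The function names are made up for this example. Note that /Title also appears in bookmark entries, so a hit there may be a bookmark rather than a document property.

```python
import re

INFO_KEYS = (b"Author", b"Title", b"Subject", b"Keywords", b"Creator", b"Producer")


def leftover_properties(pdf_bytes: bytes) -> dict[str, str]:
    """Heuristic scan for non-empty literal-string properties in raw bytes.

    Misses hex or UTF-16 strings and anything in compressed object streams.
    """
    found: dict[str, str] = {}
    for key in INFO_KEYS:
        match = re.search(rb"/" + key + rb"\s*\(([^()]*)\)", pdf_bytes)
        if match and match.group(1).strip():
            found[key.decode("ascii")] = match.group(1).decode("latin-1")
    return found


def has_xmp_packet(pdf_bytes: bytes) -> bool:
    """XMP packets begin with an xpacket processing instruction."""
    return b"<?xpacket begin" in pdf_bytes
```

A clean result here is a good sign but not a guarantee. Confirm with your tool's own metadata report or a full PDF parser.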

Step 7: Verify the redaction

Never trust the visual result alone. Run these verification tests:

Copy-paste test. Open the redacted PDF, select the area where redacted content was, and paste into a text editor. You should get nothing, or only the redaction marker text.
Text extraction test. Run pdftotext or a similar tool on the file and search the output for any content that should have been redacted.
Search test. Use the PDF reader's Find function to search for specific terms you redacted (a name, a number). No results should appear.
File size check. Compare the redacted file's size to the original. A significantly larger file suggests content was added (annotations) rather than removed. Treat this as a hint, not proof, because a full rewrite can change file size for unrelated reasons.
Metadata inspection. Check document properties to confirm author names, revision history, and other metadata have been stripped.
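
The text extraction test is straightforward to script. The sketch below assumes Poppler's pdftotext is installed and on your PATH. The function names and the example file name are illustrative. Keep in mind that an empty extraction from a scanned, image-only PDF proves nothing, because there was no text layer to extract in the first place.

```python
import subprocess


def extract_text(pdf_path: str) -> str:
    """Extract text with Poppler's pdftotext; "-" sends output to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def find_leaks(extracted: str, redacted_terms: list[str]) -> list[str]:
    """Return the redacted terms that still appear in the extracted text."""
    # Collapse whitespace so line wraps do not hide a multi-word match.
    haystack = " ".join(extracted.lower().split())
    return [
        term for term in redacted_terms
        if " ".join(term.lower().split()) in haystack
    ]


# Example (needs pdftotext on PATH and a real file):
# leaks = find_leaks(extract_text("contract_2026_REDACTED.pdf"), ["Jane Doe"])
# An empty list means none of the listed terms were found in the text layer.
```

Feed it the exact list of terms you redacted, and treat any non-empty result as a failed redaction.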

For workflows with legal or regulatory stakes, see our guide on QA before court production.

Step 8: Save as a new file

Save the redacted document as a new file. Do not overwrite the original. You need the original for your records (stored securely), and saving as a new file avoids any risk of incremental save preserving old content. Name the new file clearly -- for example, contract_2026_REDACTED.pdf -- so there's no confusion about which version is safe to distribute.
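
One way to spot a file that was saved incrementally is to count its end-of-file markers, since each incremental update appends another one. The Python sketch below is a rough check with hypothetical function names. Linearized ("fast web view") files can also contain more than one marker, so a positive result means "look closer", not "old content is definitely present".

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count %%EOF markers; each incremental update appends another one."""
    return pdf_bytes.count(b"%%EOF")


def looks_incrementally_updated(pdf_bytes: bytes) -> bool:
    """Heuristic only: linearized files can also have more than one marker."""
    return count_eof_markers(pdf_bytes) > 1
```

To use it on a real file, read the bytes with pathlib.Path("contract_2026_REDACTED.pdf").read_bytes() and pass them in.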

Special cases

Scanned PDFs

A scanned PDF is an image wrapped in a PDF container. There's no text layer, which means there's nothing for a text-based redaction tool to find or remove. You must run OCR (Optical Character Recognition) first to create a text layer and locate the sensitive content, then redact both that text layer and the matching pixels in the page image, then verify.

The risk with scanned documents is that redaction tools that skip OCR will report "no PII found" -- because they can't read the document at all. The text is there visually, and any recipient with OCR software can extract it.
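
A simple guard against that false "no PII found" result is to check whether each page produced any readable text before trusting a detector's output. The Python sketch below assumes you have per-page extracted text. The function names and the min_chars threshold are assumptions to tune for your documents, and genuinely blank pages will also be flagged.

```python
def classify_pages(page_texts: list[str], min_chars: int = 20) -> list[str]:
    """Label each page "text" or "needs_ocr" by extracted text length.

    min_chars is an assumed threshold; blank pages are flagged too.
    """
    labels = []
    for text in page_texts:
        visible = "".join(text.split())  # drop all whitespace
        labels.append("text" if len(visible) >= min_chars else "needs_ocr")
    return labels


def safe_to_report_no_pii(page_texts: list[str]) -> bool:
    """Only trust a "no PII found" result if every page had readable text."""
    return bool(page_texts) and all(
        label == "text" for label in classify_pages(page_texts)
    )
```

If any page comes back as needs_ocr, run OCR on it before drawing conclusions about what the document contains.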

PDFs with form fields

Interactive PDFs store form data in the AcroForm dictionary (or, for XFA forms, in XML streams referenced from it), separate from page content streams. If you redact text on the visible page but don't clear the form field data, the field values persist. Check your tool's ability to handle form fields explicitly, or flatten the form before redacting.

PDFs with embedded attachments

A PDF can contain other files as embedded attachments (attached at the document level, through file attachment annotations, or as part of a PDF portfolio). Redacting the parent PDF's pages does not touch these attachments. You must extract them, redact each one separately, and then either re-embed the redacted versions or remove the attachments entirely.

Password-protected PDFs

Password protection controls who can open or edit a PDF. It does not remove anything from the file: once the document is decrypted, the content streams, metadata, and attachments are all still there. If you have the permissions (owner) password, you can redact normally. If you only have the open (user) password, editing is restricted, so you'll need to have the restriction removed first, or use a tool that handles permission passwords transparently.

Verification checklist

Post-redaction verification tests

Test	Method	Pass criteria
Copy-paste	Select redacted area, paste into text editor	No original text appears
Text extraction	Run pdftotext or similar tool, search output	No redacted content in extracted text
Keyword search	Use Find function for known redacted terms	Zero matches found
File size	Compare redacted file size to original	Same or smaller (not significantly larger)
Metadata check	Inspect document properties and XMP data	No author names, revision history, or sensitive properties
Form fields	Check for interactive form data in document	No sensitive values in form field dictionaries
Attachments	Check for embedded files in PDF	No unredacted attachments present

How RedactifyAI handles PDF redaction

RedactifyAI was built specifically to address the edge cases that trip up general-purpose tools. When you upload a PDF, the platform automatically detects whether it's a native or scanned document and runs OCR when needed, so scanned PDFs don't pass through with hidden text intact.

AI-powered detection identifies 25+ PII entity types across the full document, including headers, footers, form fields, and embedded text that manual reviewers routinely miss. Redactions are applied at the content stream level -- text objects are permanently removed, not masked with annotations. Metadata is stripped as part of every redaction workflow, not as an optional extra step.

For teams handling volume -- discovery productions, FOIA batches, patient record releases -- batch processing lets you redact hundreds of documents in a single run, with consistent rules applied across every page of every file. The result is a clean PDF where the redacted content is verifiably gone, not just hidden behind a black box.

If you're currently relying on manual PDF redaction or tools that leave you wondering whether the data is actually removed, you can try RedactifyAI free and run the verification tests above on the output.