Why Law Firms Keep Exposing PII in PDFs, and How to Fix It
Law firms handle some of the most sensitive information around: client names, financials, medical history, and confidential deal terms. Yet PII still leaks in court filings, discovery responses, and shared documents. It's rarely intentional. This article explains why it keeps happening and what to do about it.
Quick answer: Why can I still see redacted text in a PDF? Same topic, condensed to ~400 words.
The problem: PII that "looks" redacted but isn't
The most common failure isn't forgetting to redact. It's redacting in a way that only looks secure. Someone draws a black box over a social security number or client name, saves the PDF, and assumes the data is gone. It isn't. The text often remains in the file. Anyone who copies the "redacted" area, searches the document, or opens it in another tool can still see the content.
Courts have sanctioned parties for exactly this. Regulators and clients don't accept "we thought it was redacted" as an excuse, and bar counsel has cited ABA Model Rule 1.1 (competence, which Comment 8 extends to keeping up with relevant technology) in disciplinary matters tied to redaction failures. So the first fix is to treat redaction as permanent removal from the file, not visual masking. For the basics, see what is redaction and how to redact documents safely.
High-profile redaction failures in legal practice
These aren't hypothetical risks. Real cases have produced real consequences:
The Manafort filing (2019)
Lawyers for former Trump campaign chair Paul Manafort filed redacted pleadings in federal court in response to allegations that Manafort violated his plea agreement. The redacted sections were supposed to conceal confidential information about Manafort's ties to Russian intelligence-connected associates. The black bars were visual only, and journalists copied and pasted the hidden text within minutes, exposing previously unknown details about contacts with Konstantin Kilimnik. What should have been a routine filing became international news and raised questions about the legal team's technical competence.
Apple v. Samsung (2011)
A California federal district judge released an opinion in the Apple-Samsung patent dispute with blacked-out sections. The concealed text could be copied directly from the PDF. The court had to emergency-seal the document and post a corrected version hours later, but not before the hidden content had been captured and published.
Benghazi documents (2016)
House Democrats shared documents about the Benghazi investigation believing they were properly redacted. A PDF rendering issue allowed recipients to copy and reveal hidden information about a political adviser. The failure compromised what was intended to be a controlled disclosure.
Epstein investigation documents
Heavily redacted documents related to the Epstein investigation were released to protect identities and sensitive details. The redactions could be defeated with the same basic copy-paste technique, highlighting the systemic nature of visual masking failures.
NSA / New York Times (2014)
The New York Times published a leaked document outlining NSA operations. Redactions intended to protect an NSA agent's identity were visual only. Copy-paste revealed the name, creating a direct safety threat for an intelligence operative.
These cases share a common thread: black rectangles were overlaid on text without removing the underlying content. No sophisticated hacking was required. All it took was Ctrl+A, Ctrl+C, Ctrl+V.
Common ways PII stays exposed in law firm PDFs
1. Visual-only "redaction"
Covering text with a rectangle, highlighter, or white box in a PDF editor (without using a proper redaction workflow that removes the underlying text) leaves the data in the document. Many firms rely on Adobe's redaction tool, which can fail in exactly this way if not used correctly or if the document has complex structure.
The problem is particularly insidious because the document looks correct on screen. Without performing verification tests, there's no visual indication that the redaction failed. The only way to know is to test.
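To make the difference concrete, here is a minimal sketch using a toy page model rather than a real PDF library. The `ToyPage` class and both functions are hypothetical names invented for illustration; the only point is that drawing an overlay and removing text are different operations on different parts of the file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPage:
    """Toy stand-in for a PDF page: a text layer plus drawn overlays."""
    text_runs: list[str]
    overlays: list[str] = field(default_factory=list)

    def extract_text(self) -> str:
        # Copy-paste and search read the text layer only;
        # overlays drawn on top do not affect it.
        return " ".join(self.text_runs)


def mask_visually(page: ToyPage, secret: str) -> None:
    """Draw a black box over the secret; the text layer is untouched."""
    page.overlays.append(f"black box over {secret!r}")


def redact_permanently(page: ToyPage, secret: str) -> None:
    """Remove the secret from the text layer, then mark where it was."""
    page.text_runs = [run.replace(secret, "[REDACTED]") for run in page.text_runs]
    page.overlays.append("black box over redacted span")


masked = ToyPage(["SSN: 123-45-6789", "Client: Jane Doe"])
mask_visually(masked, "123-45-6789")
print("123-45-6789" in masked.extract_text())   # True: still recoverable

removed = ToyPage(["SSN: 123-45-6789", "Client: Jane Doe"])
redact_permanently(removed, "123-45-6789")
print("123-45-6789" in removed.extract_text())  # False: gone from the text layer
```

Both pages would look identical on screen, which is why the check has to run against extracted text rather than against what the viewer displays.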
2. Metadata and comments
PDFs carry author names, creation dates, revision history, and comments. Redacting the body text but leaving metadata means names, dates, or work product can still leak. Same for sticky notes, markups, or embedded comments that reference clients or strategy.
Common metadata leaks in law firm documents include:
- Author field: Reveals which attorney or paralegal created or modified the document
- Company field: Shows the firm name on documents meant to be anonymous
- Track changes: Word documents converted to PDF may retain revision history showing original unredacted text. This risk is even greater when redacting Word documents through a conversion workflow rather than natively.
- Comments: Internal notes like "redact this section per partner instructions" that reference the redaction strategy itself
- Creation and modification dates: Can reveal the timeline of document preparation
- Email headers: When PDFs are created from emails, header metadata can include sender, recipient, and routing information
3. Multiple versions and drafts
You redact the "final" version but send an older draft, or you redact one copy and another copy (e.g., from email or a shared drive) goes out unredacted. Version control and a single source of truth before release reduce this risk.
In modern law firms, a single document may exist in:
- The document management system (DMS)
- One or more email threads
- A shared drive or cloud folder
- A local workstation
- A practice management system like Clio
Each copy is a potential unredacted version waiting to leak. The fix: always redact from a designated "source of truth" and verify that the redacted version is the one being shared. If you use Clio, ensure your redaction tool preserves the original file when syncing back; overwriting the original creates a different kind of version problem.
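One way to confirm that the file going out is the approved redacted copy, and not a look-alike draft, is to compare cryptographic hashes. The sketch below assumes the firm records a SHA-256 digest at sign-off; the function names are illustrative and not tied to any DMS or Clio API.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_approved_copy(outgoing: Path, approved_digest: str) -> bool:
    """True only if the outgoing file is byte-identical to the approved redacted file."""
    return sha256_of(outgoing) == approved_digest
```

A matching digest only shows that the outgoing file is the one that was approved. It says nothing about whether the approved file was redacted correctly, so it complements verification rather than replacing it.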
4. Incomplete scope
You redact the main narrative but miss exhibits, footnotes, headers/footers, or form fields. Or you catch SSNs and birth dates but miss account numbers, addresses, or minor children's names that court rules (e.g., FRCP 5.2) require you to limit.
Areas commonly overlooked in law firm documents:
- Exhibit stamps and Bates numbers: May contain case identifiers or party names
- Headers and footers: Running headers often include case captions with party names
- Table of contents: May list section titles that reference specific individuals
- Hyperlinks: Embedded URLs that point to client portals, internal systems, or documents with revealing filenames
- Form fields: PDF forms with pre-filled or hidden field data
- Image-based text: Scanned signatures, letterheads, or exhibits with embedded text
- Cross-references: "See Exhibit B, Declaration of John Smith" in a document where John Smith is supposed to be redacted
5. OCR layers in scanned documents
Scanned documents present a unique challenge. When a paper document is scanned and OCR (optical character recognition) is applied, the PDF contains both an image layer and a text layer. Redacting the visible image (e.g., drawing a black box over a name in the scan) may not remove the corresponding text in the OCR layer. The text can still be searched, copied, and extracted even though it's visually hidden.
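The OCR problem can be framed as a geometry check: any word in the text layer whose bounding box sits under a redaction rectangle should have been removed. The sketch below assumes you can already obtain word boxes and redaction rectangles from your OCR or PDF tooling; the `Box` and `OcrWord` types and the coordinates are made up for illustration.

```python
from typing import NamedTuple


class Box(NamedTuple):
    x0: float
    y0: float
    x1: float
    y1: float


class OcrWord(NamedTuple):
    text: str
    box: Box


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles share any area."""
    return a.x0 < b.x1 and b.x0 < a.x1 and a.y0 < b.y1 and b.y0 < a.y1


def words_under_redactions(text_layer: list[OcrWord], redactions: list[Box]) -> list[str]:
    """Return text-layer words that sit beneath a redaction rectangle."""
    return [w.text for w in text_layer if any(overlaps(w.box, r) for r in redactions)]


text_layer = [
    OcrWord("Patient:", Box(50, 700, 110, 715)),
    OcrWord("Jane", Box(115, 700, 150, 715)),
    OcrWord("Doe", Box(155, 700, 185, 715)),
]
black_boxes = [Box(112, 698, 190, 717)]  # drawn over the name on the scanned image

print(words_under_redactions(text_layer, black_boxes))  # ['Jane', 'Doe']
```

A non-empty result means the image was covered but the OCR text underneath is still in the file.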
6. Rushing under deadline
Filing or production deadlines push people to skip verification. They apply redaction, save, and send without copy-paste or search tests. That's when hidden text and metadata slip through.
Time pressure amplifies every other risk factor. When a paralegal has 30 minutes before a filing deadline, verification is the first step that gets skipped, and it's the step that catches failures.
Why it happens: tools and process
Tools: General-purpose PDF editors aren't built for secure redaction. They may only hide text on screen, leave metadata intact, or behave inconsistently with complex PDFs. Purpose-built redaction tools are designed to remove data and clean metadata; they reduce the chance of "looks redacted but isn't."
The core issue is that PDF editors like Adobe Acrobat treat redaction as one feature among many. The full redaction workflow (mark, review, apply, clean metadata, verify) requires multiple steps that aren't enforced by the software. Skip a step, especially the critical "Apply Redactions" step, and you get visual masking, not redaction.
Process: Even with good tools, human error and inconsistency matter. If there's no standard workflow (what to redact, how to apply it, how to verify, who checks), mistakes multiply. Who needs redaction in a law firm? Everyone who touches filings or productions. So the process has to be clear and repeatable.
The human factor
Manual redaction has inherent limitations that no amount of training fully eliminates:
- Attention fatigue: Human accuracy drops over long review sessions. As an illustration, a reviewer who catches 95% of PII in the first hour may catch only 85% by the third hour.
- Pattern blindness: After seeing hundreds of SSNs, reviewers develop "pattern blindness" and start missing variations (e.g., SSNs formatted as XXX.XX.XXXX instead of XXX-XX-XXXX).
- Entity variations: A document may mention "John Robert Smith," "J.R. Smith," "Mr. Smith," and "the defendant," all referring to the same person. Manual reviewers frequently miss one or more variations.
- Contextual PII: Some identifying information is contextual. "The only female partner at the three-person firm in rural Wyoming" effectively identifies a person without using their name. Catching these requires judgment that's harder to apply consistently under pressure.
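Pattern blindness is the easiest of these limitations to offset with a simple automated pass. As a rough sketch, a single regular expression can flag SSN-shaped strings whether they use hyphens, dots, spaces, or no separator at all. The pattern below is an illustrative assumption, not a complete SSN validator, and a bare nine-digit match may be something else entirely, so treat hits as candidates for human review.

```python
import re

# Three digits, two digits, four digits, with the same separator used twice:
# hyphen, dot, space, or nothing. The lookarounds stop a match from starting
# or ending inside a longer run of digits.
SSN_PATTERN = re.compile(r"(?<!\d)\d{3}([-. ]?)\d{2}\1\d{4}(?!\d)")


def find_ssn_candidates(text: str) -> list[str]:
    """Return substrings that look like SSNs, regardless of separator style."""
    return [m.group(0) for m in SSN_PATTERN.finditer(text)]


sample = "SSN 123-45-6789, alt 123.45.6789, raw 123456789, phone 555-123-4567."
print(find_ssn_candidates(sample))
# ['123-45-6789', '123.45.6789', '123456789']
```

The phone number is skipped because its digit grouping (3-3-4) does not fit the 3-2-4 shape.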
Real-world consequences
- Court sanctions: Orders to refile, seal documents, or pay fees; in some cases, questions about competence or privilege. Courts have awarded attorney fees to opposing parties whose confidential information was exposed through inadequate redaction.
- Privilege waiver: Inadvertent disclosure of work product or attorney-client communications. Some courts treat copy-paste-recoverable content as a disclosure, potentially waiving privilege over the exposed material. The burden then shifts to the disclosing party to prove the disclosure was inadvertent and that reasonable steps were taken to prevent it.
- Regulatory and client fallout: Breach notification obligations under HIPAA, GDPR, or state privacy laws. Loss of client trust and potential malpractice claims. Law firms that handle healthcare, financial, or government clients face compounding regulatory exposure.
- Professional ethics consequences: State bar associations have issued opinions addressing attorneys' obligations to protect client data in electronic documents. Inadequate redaction can implicate duties of competence (Model Rule 1.1), confidentiality (Model Rule 1.6), and supervision (Model Rules 5.1 and 5.3).
- Malpractice liability: Clients whose PII is exposed through failed redaction may have grounds for malpractice claims. Professional liability insurance may not cover losses attributable to negligent data handling.
A single failed redaction can trigger a chain reaction: emergency motions to seal, refiling with corrected documents, breach notifications under state or federal law, and weeks of damage control with affected clients. One AmLaw 200 firm reportedly spent over $200,000 responding to a single redaction failure in a discovery production. Proper tooling and a repeatable process cost a small fraction of that.
How to avoid these mistakes
1. Use a method that removes data
Not just visual masking. Prefer tools that permanently remove or overwrite text and clean metadata. The tool should modify the PDF's content streams, not just add visual layers.
2. Verify every time
Copy-paste test, search for known identifiers, check metadata. Do this before filing or sending. Also open the document in a different PDF reader, since redaction failures sometimes appear differently across Adobe, Foxit, browser-based viewers, and other tools.
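The search test is easy to script once text has been extracted from the redacted output. The sketch below assumes the extraction has already been done by whatever PDF tool the firm uses; it only covers the comparison step, and the function names are illustrative.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so line breaks don't hide a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def surviving_identifiers(extracted_text: str, known_identifiers: list[str]) -> list[str]:
    """Return the known identifiers that still appear in text extracted from the redacted file."""
    haystack = normalize(extracted_text)
    return [item for item in known_identifiers if normalize(item) in haystack]


extracted = "Plaintiff [REDACTED] resides at 42 Elm\nStreet. Account 000123456."
must_be_gone = ["Jane Doe", "42 Elm Street", "000123456"]
print(surviving_identifiers(extracted, must_be_gone))
# ['42 Elm Street', '000123456']
```

An empty list is the only acceptable result before release. Anything else means the redaction missed content or was never applied.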
3. Include metadata and hidden content
Redaction isn't done until metadata and comments are cleaned and hidden layers are checked. Create a metadata checklist: author, company, comments, tracked changes, embedded files, form fields, bookmarks, and XMP data.
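A checklist like this can be turned into a small gate that refuses to pass a file until every field has been inspected and found empty. This is only a sketch: the key names are hypothetical and would need to be mapped to whatever your PDF tool actually reports.

```python
# Checklist fields taken from the list above; the key names are hypothetical.
CHECKLIST = [
    "author", "company", "comments", "tracked_changes",
    "embedded_files", "form_fields", "bookmarks", "xmp",
]


def metadata_findings(reported: dict[str, object]) -> list[str]:
    """Return checklist fields that still hold data, plus fields never reported."""
    findings = []
    for name in CHECKLIST:
        if name not in reported:
            findings.append(f"{name}: not checked")
        elif reported[name]:
            findings.append(f"{name}: still present")
    return findings


reported = {
    "author": "A. Associate",
    "company": "",
    "comments": [],
    "tracked_changes": [],
    "embedded_files": [],
    "form_fields": {"ssn": "123-45-6789"},
    "bookmarks": [],
}
print(metadata_findings(reported))
# ['author: still present', 'form_fields: still present', 'xmp: not checked']
```

Treating a missing field as a finding is deliberate: a field nobody looked at is not the same as a field that is clean.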
4. Standardize the workflow
Same steps for every matter: what to redact (per court rules and policy), how to apply, how to verify, who signs off. Document the workflow and train everyone who participates.
A standardized workflow should include:
- Identification: List all PII categories required by applicable rules and policies
- Preparation: Export/download the final version from your DMS
- Redaction: Apply using a method that permanently removes data
- Metadata cleaning: Strip all document metadata and hidden content
- Verification: Perform copy-paste, search, metadata, and cross-reader tests
- Documentation: Log who redacted, when, what categories, and verification results
- Approval: Second-person review before filing or sending
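The documentation and approval steps can be captured in a single structured record per document. The sketch below is one possible shape, not a prescribed schema; every field name is an assumption chosen to mirror the steps above.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class RedactionLogEntry:
    matter_id: str
    redacted_by: str
    approved_by: str
    categories: list[str]
    checks: dict[str, bool]  # e.g. copy_paste, search, metadata, cross_reader
    timestamp: str = ""

    def __post_init__(self) -> None:
        # Stamp the entry in UTC if the caller did not supply a time.
        if not self.timestamp:
            self.timestamp = datetime.now(timezone.utc).isoformat()

    def ready_for_release(self) -> bool:
        """All checks passed and a second person signed off."""
        second_person = bool(self.approved_by) and self.approved_by != self.redacted_by
        return second_person and bool(self.checks) and all(self.checks.values())


entry = RedactionLogEntry(
    matter_id="2024-0117",
    redacted_by="paralegal.a",
    approved_by="attorney.b",
    categories=["ssn", "birth_date", "minor_name", "account_number"],
    checks={"copy_paste": True, "search": True, "metadata": True, "cross_reader": True},
)
print(entry.ready_for_release())
print(json.dumps(asdict(entry), indent=2))
```

Requiring the approver to differ from the redactor enforces the second-person review in the record itself instead of leaving it to habit.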
5. Train the team
Everyone who prepares filings or productions should know how to redact documents safely and why "looks redacted" isn't enough. Training should cover not just the "how" but the "why." Showing real examples of failures and their consequences is more effective than abstract instructions.
Training should be recurring, not one-time. As tools, court rules, and document formats evolve, the team's knowledge needs to keep pace.
6. Consider AI-powered redaction tools
AI-powered redaction tools address the human limitations that cause most failures:
- The core advantage is catching what humans skip. An AI model scans for SSNs, account numbers, minor children's names, and dozens of other PII categories across every page of a filing, including exhibits, footnotes, and OCR layers, without the accuracy drop-off that hits a paralegal in hour three of a document review.
- Entity linking matters more in legal documents than almost any other context. When a complaint refers to "John Robert Smith," "Mr. Smith," "J.R. Smith," and "the defendant," AI connects those references so redacting one redacts all of them. Miss even one variation and opposing counsel has the name.
- Large productions become manageable. Instead of a paralegal working through a 500-document discovery set one file at a time, applying different judgment calls to each, AI applies the same detection policy across the entire set in minutes. The paralegal's time shifts to reviewing flagged items and exercising judgment on edge cases.
- Verification is built into the workflow rather than bolted on as an afterthought. The tool confirms that redacted content is actually removed from the PDF's content streams, not just covered with a visual layer, before the document leaves the firm.
- Every redaction action is logged: who ran it, when, what categories were targeted, and what was found. That audit trail is what you produce when a court or regulator asks how you handled PII in a matter.
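To show what linking references buys you, here is a deliberately simple stand-in: a hand-maintained alias table instead of an AI model. It is not how any particular product works, and it operates on plain text rather than on a PDF. The names and the `redact_entity` function are illustrative assumptions.

```python
import re

# Hand-built alias table standing in for model-driven entity linking.
ALIASES = {
    "John Robert Smith": ["John Robert Smith", "J.R. Smith", "Mr. Smith", "the defendant"],
}


def redact_entity(text: str, canonical: str, aliases: dict[str, list[str]]) -> str:
    """Replace every listed alias of one entity, trying the longest alias first."""
    variants = sorted(aliases[canonical], key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(v) for v in variants), re.IGNORECASE)
    return pattern.sub("[REDACTED]", text)


text = "John Robert Smith signed. Later, Mr. Smith said the defendant (J.R. Smith) left."
print(redact_entity(text, "John Robert Smith", ALIASES))
# [REDACTED] signed. Later, [REDACTED] said [REDACTED] ([REDACTED]) left.
```

A static table breaks down quickly: "the defendant" is ambiguous in a multi-defendant case, and new variants appear in every matter. That gap is what automated entity linking is meant to close, with a human still reviewing the edge cases.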
At an average paralegal rate of $150/hour, manually redacting a large production (10,000 pages) can easily reach $15,000 in labor alone, which works out to about 100 hours at roughly 100 pages per hour, before factoring in QC and rework. AI-powered tools compress that to a fraction of the cost while catching PII variations that manual review routinely misses.
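The labor figure is simple arithmetic, and it is worth rerunning with your own numbers. In the sketch below, the review speed of 100 pages per hour is not stated in the text; it is the rate implied by $15,000 at $150/hour for 10,000 pages.

```python
def manual_redaction_cost(pages: int, pages_per_hour: float, hourly_rate: float) -> float:
    """Labor cost of manual redaction, before QC and rework."""
    hours = pages / pages_per_hour
    return hours * hourly_rate


# 10,000 pages at an implied 100 pages/hour and $150/hour
print(manual_redaction_cost(10_000, 100, 150))  # 15000.0
```

Halving the review speed to 50 pages per hour, which is plausible for dense exhibits, doubles the figure to $30,000.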
For firms using Clio, redaction best practices for Clio users can help integrate this into your matter workflow. For a broader look at where AI detection excels and where human review is still needed, see AI vs manual redaction for law firms.
Summary
Why law firms keep exposing PII in PDFs: Usually because redaction is done visually (black boxes, highlighters) instead of by permanently removing data from the file, and because metadata, comments, OCR layers, and verification are skipped. The pattern is consistent across high-profile failures from Manafort to Apple v. Samsung, where visual masking was defeated by basic copy-paste.
Fix it by using tools and a process that remove data and clean metadata, then verify with copy-paste, search, and cross-reader tests before every release. For help picking the right solution, review our redaction software comparison covering seven tools with honest pros and cons. Standardize the workflow, train the team, and consider AI-powered tools that address the human limitations driving most failures. The cost of prevention is a fraction of the cost of a breach, sanctions, or malpractice claim.
Want to see the difference between visual masking and permanent removal? Redact a PDF for free and try the copy-paste test on the output. No account needed. For full production workflows with metadata cleanup and audit trails, sign up free or book a demo.
Frequently asked questions
What are the most common PII mistakes in legal PDFs?
Five recurring failures: using black highlighting or shapes instead of real redaction, forgetting to strip metadata (author names, edit history, software versions), missing PII in headers and footers, leaving identifiers in document properties, and overwriting the original file rather than saving a new redacted version. Each looks fine on screen but fails verification.
Why do law firms keep exposing PII in PDFs?
Three structural reasons. First, the default PDF tools (Preview on Mac, free Adobe Reader) have no real redaction feature. Second, the visual result of fake redaction is indistinguishable from real redaction without verification. Third, manual review at filing deadlines under time pressure leads to skipped verification steps. AI-based detection plus mandatory verification reduces all three.
Is metadata considered PII under court rules?
It can be. FRCP 5.2 specifies the five identifiers that must be redacted from filing content. Local rules and protective orders may extend this. PDF metadata that contains author names, firm identifiers, or edit history can constitute privileged or confidential information even if it's not explicitly listed in Rule 5.2. Strip metadata in every redaction workflow.
How do I prevent PII leaks in my firm's PDFs?
Use a redaction tool with AI-based PII detection rather than relying on visual review. Run mandatory verification (copy test, search test, cross-viewer test, text extraction) before any external sharing. Strip metadata in the same step as content redaction. Train staff to never use Highlight or Shape tools as redaction. Consider a checklist for each filing.