How to Redact Sensitive Information from a PDF Properly

Redacting a PDF is one of those tasks that looks easy and goes wrong in spectacular ways. Drawing a black rectangle over a name in Acrobat or Preview hides the name visually but leaves the original text in the document, recoverable in seconds by anyone with a free PDF reader. High-profile leaks at the United Nations, the U.S. Department of Justice, and Manafort's legal team have all involved precisely this mistake. Proper redaction permanently removes the text from the document, which is harder than it sounds and benefits from a tool built for the purpose.

A short history of failed PDF redaction

PDF redaction failures have been a recurring news story for over twenty years. In May 2005, the U.S. military published a report on the killing of Italian journalist Nicola Calipari in Iraq with sensitive sections "redacted" by black overlays; Italian journalists who downloaded the PDF were able to select and copy the underlying text within minutes. In 2009, the U.S. Department of Justice released a memo on enhanced interrogation with the same flaw. In 2019, Paul Manafort's legal team filed a court document whose black redaction boxes merely sat on top of the text; copying and pasting the blacked-out passages exposed details of his contacts with Konstantin Kilimnik. The same year, a confidential Boeing FAA filing about the 737 MAX MCAS system reached reporters in fully readable form because the redactions were just shapes.

The pattern is so consistent that the NSA published the guidance "Redacting with Confidence: How to Safely Publish Sanitized Reports Converted from Word to PDF" in late 2005, and Adobe and Foxit both shipped dedicated redaction modes in the years that followed. The core lesson: a redaction tool must delete the underlying text and replace it with an opaque mark; visually covering with a rectangle is never sufficient.

Why visually covering text fails

A PDF stores a page as a content stream: a sequence of drawing operators that place text, lines, rectangles, and images on the page. When you draw a black rectangle over a name in Acrobat, the PDF now contains both the text operator (writing the name) and the rectangle operator (drawing the box over it). The viewer renders both, in order, producing a page where the name is hidden visually. The text operator is still in the file, indexable, copyable, and recoverable by any PDF parser. Adobe's own Reader will let you select everything with Ctrl+A, copy it with Ctrl+C, and paste the hidden text into Notepad.
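
To make this concrete, here is a minimal Python sketch. The content stream is a hand-written toy (real streams are usually compressed and use font-specific encodings), and `extract_shown_text` is a hypothetical helper written for this illustration, not part of any PDF library.

```python
import re

# A toy, uncompressed content stream: the name is drawn first,
# then a filled black rectangle is painted over the same area.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 700 Td (Jane Doe) Tj ET
0 0 0 rg
70 695 60 16 re
f
"""

def extract_shown_text(stream: bytes) -> list[str]:
    """Return every literal string passed to the Tj (show text) operator."""
    # Simplification: ignores escaped parentheses, hex strings, and TJ arrays.
    found = re.findall(rb"\(([^()]*)\)\s*Tj", stream)
    return [raw.decode("latin-1") for raw in found]

print(extract_shown_text(CONTENT_STREAM))  # ['Jane Doe']
```

The rectangle operators (`re`, `f`) change what is painted on screen, but the `Tj` operator and its string are untouched, so a parser still finds the name.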

Form fields, comments, and metadata are stored in entirely separate dictionaries in the PDF and are not affected by visual overlays at all. A "redacted" PDF that still has the author's name in metadata, comments referencing the redacted text by name, or form field values containing the original data is just as leaky as one with text under a rectangle.
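
As a rough illustration of how visible such metadata is, the sketch below scans raw PDF bytes for plain-text entries of the document information dictionary. The helper name `find_info_entries` and the sample bytes are made up for this example. It only catches unencrypted literal strings, so it will miss XMP packets, hex or UTF-16 strings, and anything stored in compressed object streams.

```python
import re

# Keys from the document information dictionary that commonly leak identity.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title", b"Subject", b"Keywords")

def find_info_entries(pdf_bytes: bytes) -> dict[str, str]:
    """Find plain literal-string entries such as /Author (Jane Doe)."""
    pattern = rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(([^()]*)\)"
    return {
        key.decode("ascii"): value.decode("latin-1")
        for key, value in re.findall(pattern, pdf_bytes)
    }

sample = b"1 0 obj\n<< /Author (Jane Doe) /Producer (ExampleWriter 1.0) >>\nendobj\n"
print(find_info_entries(sample))
# {'Author': 'Jane Doe', 'Producer': 'ExampleWriter 1.0'}
```

An empty result from a scan like this is not proof that the metadata is gone; it is only a quick way to catch the most obvious leftovers.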

How a proper redaction tool works

A real redaction does four things:

  1. Removes the text content from the content stream at the redacted regions, so any future parser sees the redaction mark, not the original text.
  2. Removes any metadata that referenced the original content, including the document author, last editor, software, original filename, and any custom XMP metadata fields.
  3. Removes form fields, comments, and attachments that overlap or reference the redacted regions.
  4. Replaces the area with an opaque mark (usually a black rectangle, sometimes with a redaction reason like "[FOIA exemption b6]") drawn on top of the now-empty content.
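
Steps 1 and 4 can be sketched on a toy content stream. This is a deliberately simplified model, not how any particular tool is implemented: it assumes each text run is a self-contained `BT ... Td (...) Tj ET` block and treats a run as "inside" the redaction rectangle when its origin falls within it. Real redactors also have to handle text matrices, glyph widths, `TJ` arrays, font encodings, and runs that are only partly covered. `redact_stream` is a hypothetical name.

```python
import re

# Matches one simple text block: BT <font> <size> Tf <x> <y> Td (<text>) Tj ET
TEXT_BLOCK = re.compile(
    rb"BT\s+/\w+\s+[\d.]+\s+Tf\s+([\d.]+)\s+([\d.]+)\s+Td\s+\([^()]*\)\s*Tj\s+ET\s*"
)

def redact_stream(stream: bytes, rect: tuple[float, float, float, float]) -> bytes:
    """Delete text blocks whose origin lies inside rect (x, y, width, height),
    then append an opaque black rectangle covering the same area."""
    x0, y0, w, h = rect

    def drop_if_inside(match: re.Match) -> bytes:
        x, y = float(match.group(1)), float(match.group(2))
        inside = x0 <= x <= x0 + w and y0 <= y <= y0 + h
        return b"" if inside else match.group(0)

    cleaned = TEXT_BLOCK.sub(drop_if_inside, stream)
    mark = b"0 0 0 rg %g %g %g %g re f\n" % (x0, y0, w, h)
    return cleaned + mark

stream = (
    b"BT /F1 12 Tf 72 720 Td (Invoice for) Tj ET\n"
    b"BT /F1 12 Tf 72 700 Td (Jane Doe) Tj ET\n"
)
redacted = redact_stream(stream, (70, 695, 60, 16))
print(redacted.decode("latin-1"))
```

The order matters: the text operator is deleted first, and the black mark is drawn over an area that no longer contains anything to recover. Steps 2 and 3 (metadata, form fields, comments, attachments) live outside the content stream and are not covered by this sketch.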

Browser-based redaction tools built on pdf-lib or PDF.js can do all of this in JavaScript without uploading the file. The redacted PDF is rebuilt locally and offered as a download. Because the original never leaves your device, there is no server-side copy to leak, log, or retain.

How to redact a PDF, step by step

  1. Load the PDF. Drop the file onto the page. The tool reads it into memory and shows the first page for preview. Nothing is uploaded.
  2. Find what to redact. Use the text search to find names, account numbers, dates of birth, addresses, or any other recurring sensitive string. The tool highlights every occurrence.
  3. Mark redaction regions. Click and drag to draw a rectangle, or click "redact all matches" to apply the mark to every found instance at once.
  4. Optionally add a reason label. Regulated workflows (FOIA, GDPR Article 17, HIPAA) often require the redaction to be labelled with the legal basis. Type the label and it is drawn inside the rectangle.
  5. Apply the redaction. This is the key step: it permanently deletes the text under the rectangles from the content stream, strips metadata, and saves a new PDF with the marks burned in.
  6. Verify the result. Open the redacted PDF, try Ctrl+A then Ctrl+C and paste into a text editor. You should see the redaction labels (or nothing) where the original text was, never the original text itself.
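
The copy-and-paste check in step 6 can be complemented by a scripted smoke test. The sketch below is one possible approach using only the Python standard library: it searches the raw file and every stream that inflates with zlib for terms that should be gone. The function name `leaked_terms` and the sample bytes are assumptions for illustration. A clean result is not proof of a clean file, because text can also be stored as hex strings, in custom font encodings, or split across several operators.

```python
import re
import zlib

def leaked_terms(pdf_bytes: bytes, terms: list[str]) -> set[str]:
    """Return the terms still present in the raw file or in any stream
    that can be inflated with zlib (the FlateDecode filter)."""
    haystacks = [pdf_bytes]
    for match in re.finditer(rb"stream\r?\n(.*?)endstream", pdf_bytes, re.DOTALL):
        try:
            # decompressobj tolerates the end-of-line bytes before "endstream".
            haystacks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not zlib data; the raw scan above already covers it
    found = set()
    for term in terms:
        needle = term.encode("latin-1")  # assumes Latin-1 representable terms
        if any(needle in hay for hay in haystacks):
            found.add(term)
    return found

# Toy file fragment: the name only exists inside a compressed stream.
body = zlib.compress(b"BT /F1 12 Tf 72 700 Td (Jane Doe) Tj ET")
sample = b"1 0 obj\n<< /Filter /FlateDecode >>\nstream\n" + body + b"\nendstream\nendobj\n"
print(leaked_terms(sample, ["Jane Doe", "John Roe"]))  # {'Jane Doe'}
```

Run a check like this against the redacted output with the exact strings you removed; any hit means the redaction did not delete the underlying text.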

What to redact

The obvious cases are names, addresses, phone numbers, email addresses, and account numbers. The less obvious ones cause most of the real-world leaks:

| Category | What to look for |
| --- | --- |
| Direct identifiers | Names, addresses, phone numbers, email addresses, social security numbers |
| Financial | Account numbers, credit card numbers, IBAN, routing numbers, balances |
| Health | Diagnoses, medication, dates of treatment, patient ID, insurance numbers |
| Government | Case numbers, source identifiers, dates and times of operations, locations |
| Indirect identifiers | Job titles + employer + city (uniquely identifying), unique vehicle descriptions, distinctive medical conditions |
| Metadata | Document author, original filename, last editor, software version, total edit time |
| Comments | Reviewer comments, "Q: who is this person?" annotations, track changes |
| Form fields | Pre-filled values, even from earlier versions |
| Attachments | Embedded files referenced by the document |
| Image regions | Names on a screenshot, faces in a photo, license plates, addresses on an envelope |

The last row is especially important: a screenshot of a CRM showing a customer record, embedded in the PDF as a raster image, will not be redacted by text-layer tools. The image itself must be painted over.
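
Some of the direct-identifier and financial categories follow predictable formats, so a pattern search over the extracted text can flag candidates for review. The sketch below is an assumption-laden starting point: the three patterns (email, U.S. SSN format, card-like digit runs filtered by the Luhn checksum) and the names `PATTERNS`, `luhn_ok`, and `find_sensitive` are chosen for illustration. Names, addresses, and indirect identifiers have no fixed format and still need a human read-through.

```python
import re

PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 13 to 19 digits, optionally separated by single spaces or hyphens.
    "card_candidate": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}

def luhn_ok(number: str) -> bool:
    """Luhn checksum: filters out most digit runs that are not card numbers."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (label, matched text) pairs for every pattern hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            value = m.group(0)
            if label == "card_candidate" and not luhn_ok(value):
                continue
            hits.append((label, value))
    return hits

text = ("Contact jane.doe@example.com, SSN 123-45-6789, "
        "card 4111 1111 1111 1111, ref 1234 5678 9012 3456.")
for label, value in find_sensitive(text):
    print(label, value)
```

In this sample the reference number `1234 5678 9012 3456` is not reported because it fails the Luhn check, while the well-known test card number `4111 1111 1111 1111` passes.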

Common pitfalls

Alternative tools and workflows

| Tool | Strength | Watch out for |
| --- | --- | --- |
| Browser PDF redactor (this tool) | Local, no upload, free | Slower than native tools on very large PDFs |
| Adobe Acrobat Pro | Industry standard, batch-redaction, signed audit trail | Paid, processes locally but vendor lock-in |
| Foxit PhantomPDF | Cheaper than Adobe, similar feature set | Some redactions are subscription-tier |
| qpdf (CLI) | Powerful, scriptable, free | Not a true redaction tool, you must combine with pdftotext + sed for text removal |
| pdftk | Common for splits and merges | Does not include redaction; do not use it for sensitive removal |
| Print-to-PDF rasterisation | Removes text layer by design | Huge file size, loss of searchability, image-level traces may remain |
| Online "redaction" services | Quick UI | Upload to a third-party server; review their retention and privacy policy |

For a one-off legal filing or job application, the browser tool is the right answer. For batch redaction of hundreds of FOIA requests, Acrobat Pro or a scripted qpdf + pdftotext pipeline pays for itself. For redacting image-heavy scans, run OCR first and then redact the bounding boxes in both the OCR text layer and the underlying raster.

Verification checklist before sharing

Before you send a redacted PDF outside your team, run through this checklist:

Privacy and the redactor

The browser PDF redactor runs entirely in your device's memory. The file you drop is read by the File API, parsed by pdf-lib or PDF.js in JavaScript, re-rendered with the redactions applied, and offered back as a download. Nothing is uploaded, nothing is logged, nothing is cached server-side. For sensitive material (court filings, medical records, FOIA responses, breach notifications), that local-only flow is the difference between a redaction you control and a redaction you have to trust someone else to handle correctly. The whole tool can run offline once the page is loaded, which you can verify by disconnecting your network and redacting another file.

Frequently Asked Questions

Is drawing a black box over text in a PDF editor enough to redact it?

No. Drawing a black rectangle over text only hides the text visually. The underlying characters remain in the PDF and can be recovered by copying, by selecting the text under the rectangle, or by extracting the text layer with any PDF parser. Proper redaction removes the text from the document and replaces it with an opaque shape.

What kinds of information can be recovered from a poorly redacted PDF?

Text content (even if covered visually), embedded metadata (author, last editor, software, original filename), earlier versions of the content kept in the file by incremental saves, comments, form field values, attached files, and sometimes embedded page thumbnails that show the original page before the redaction overlay.

Does flattening a PDF redact it?

Flattening merges layers, annotations, and form fields into the static page content, but does not by itself remove the text under a drawn rectangle. That text remains in the content stream. You must explicitly delete the text, not just cover it.

How do I redact text that appears as part of an image (a scan)?

For scanned documents, run OCR first to detect the text positions, then redact those regions in the underlying image (not just the OCR layer). Some tools let you paint over the image with a solid colour at the redaction location, which is the correct approach for raster content.
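
Painting over raster content amounts to overwriting pixel values. The sketch below assumes an 8-bit grayscale image held as a flat, row-major `bytearray` and a bounding box already converted from OCR or page units to pixel coordinates; both are simplifying assumptions, and `paint_over` is a hypothetical helper. A real workflow would also re-encode the image and replace the original one in the PDF.

```python
def paint_over(pixels: bytearray, width: int, height: int,
               box: tuple[int, int, int, int]) -> None:
    """Overwrite the pixels inside box with black, in place.

    box is (left, top, right, bottom) in pixel coordinates,
    with right and bottom exclusive.
    """
    left, top, right, bottom = box
    # Clamp the box to the image so out-of-range boxes cannot wrap rows.
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    if left >= right:
        return
    for row in range(top, bottom):
        start = row * width + left
        pixels[start:start + (right - left)] = bytes(right - left)  # zeros = black

# A 4x3 all-white image; black out a 2x2 region starting at column 1, row 0.
image = bytearray([255] * 12)
paint_over(image, 4, 3, (1, 0, 3, 2))
print(list(image))
```

Because the pixel values themselves are replaced, nothing of the original region survives in the image data. It is usually worth padding the OCR box by a few pixels so ascenders, descenders, and compression halos are covered as well.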

What standards define proper PDF redaction?

The U.S. National Security Agency published "Redacting with Confidence" in 2005-2006, after several high-profile failed redactions led to leaks. Adobe's PDF Reference and the ISO 32000-1 PDF specification describe content streams in enough detail to confirm that visually covering text does not remove it. Many government agencies now require the use of dedicated redaction tools that destroy the underlying content, not just hide it.