How to Redact Sensitive Information from a PDF Properly
Redacting a PDF is one of those tasks that looks easy and goes wrong in spectacular ways. Drawing a black rectangle over a name in Acrobat or Preview hides the name visually but leaves the original text in the document, recoverable in seconds by anyone with a free PDF reader. High-profile leaks at the United Nations, the U.S. Department of Justice, and Manafort's legal team have all involved precisely this mistake. Proper redaction permanently removes the text from the document, which is harder than it sounds and benefits from a tool built for the purpose.
A short history of failed PDF redaction
PDF redaction failures have been a recurring news story for over twenty years. In May 2005, the U.S. military published a report on the killing of Italian journalist Nicola Calipari in Iraq with sensitive sections "redacted" by black overlays; Italian journalists who downloaded the PDF were able to select and copy the underlying text within minutes. In 2009, the U.S. Department of Justice released a memo on enhanced interrogation with the same flaw. In 2019, Paul Manafort's legal team filed a court document with bracketed black redactions that turned out to be transparent boxes, exposing details of his contacts with Konstantin Kilimnik. The same year, a confidential Boeing FAA filing about the 737 MAX MCAS system reached reporters in fully readable form because the redactions were just shapes.
The pattern is so consistent that the NSA published the guidance "Redacting with Confidence: How to Safely Publish Sanitized Reports Converted from Word to PDF" in late 2005, and Adobe and Foxit both shipped dedicated redaction modes in the years that followed. The core lesson: a redaction tool must delete the underlying text and replace it with an opaque mark; visually covering with a rectangle is never sufficient.
Why visually covering text fails
A PDF stores a page as a content stream: a sequence of drawing operators that place text, lines, rectangles, and images on the page. When you draw a black rectangle over a name in Acrobat, the PDF now contains both the text operator (writing the name) and the rectangle operator (drawing the box over it). The viewer renders both, in order, producing a page where the name is hidden visually. The text operator is still in the file, indexable, copyable, and recoverable by any PDF parser. Adobe's own Reader will let you select the hidden text with Ctrl+A and paste it into Notepad.
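To make this concrete, here is a minimal sketch in Python using only the standard library. The content stream is hand-written example data, not taken from any real document, and `literal_strings_shown` is a hypothetical helper that is deliberately naive: it ignores `TJ` arrays, hex strings, escapes, and font encodings. It only illustrates that the text operator is still present after the rectangle is drawn.

```python
import re

# Hypothetical, uncompressed page content stream.
# The text is drawn first, then a black rectangle is filled on top of it.
CONTENT_STREAM = b"""BT
/F1 12 Tf
72 700 Td
(Witness: Jane Example) Tj
ET
0 0 0 rg
70 695 150 16 re
f
"""


def literal_strings_shown(stream: bytes) -> list[str]:
    """Return the literal strings passed to the Tj (show text) operator."""
    return [m.decode("latin-1") for m in re.findall(rb"\(([^()]*)\)\s*Tj", stream)]


# The rectangle hides the name on screen, but the parser still finds it.
print(literal_strings_shown(CONTENT_STREAM))
```

Real PDFs usually compress their content streams, but decompression is a standard step for any parser, so compression offers no protection.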
Form fields, comments, and metadata are stored in entirely separate dictionaries in the PDF and are not affected by visual overlays at all. A "redacted" PDF that still has the author's name in metadata, comments referencing the redacted text by name, or form field values containing the original data is just as leaky as one with text under a rectangle.
How a proper redaction tool works
A real redaction does four things:
- Removes the text content from the content stream at the redacted regions, so any future parser sees the redaction mark, not the original text.
- Removes any metadata that referenced the original content, including the document author, last editor, software, original filename, and any custom XMP metadata fields.
- Removes form fields, comments, and attachments that overlap or reference the redacted regions.
- Replaces the area with an opaque mark (usually a black rectangle, sometimes with a redaction reason like "[FOIA exemption b6]") drawn on top of the now-empty content.
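The first and last items in this list can be sketched on a simplified content stream. This is an illustration under strong assumptions, not how any particular tool is implemented: it treats the `Td` operand pair as the text position, ignores transformation matrices, fonts, and glyph widths, and drops a whole `BT ... ET` text object when its origin falls inside a redaction rectangle. The names `redact_stream`, `inside`, and `Rect` are hypothetical. Metadata, form fields, comments, and attachments are not handled here.

```python
import re

Rect = tuple[float, float, float, float]  # x, y, width, height in page units

# Naive patterns: assume an uncompressed stream and text placed with Td only.
TEXT_OBJECT = re.compile(rb"BT\b.*?\bET\b\s*", re.DOTALL)
POSITION = re.compile(rb"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s+Td")


def inside(x: float, y: float, rect: Rect) -> bool:
    rx, ry, rw, rh = rect
    return rx <= x <= rx + rw and ry <= y <= ry + rh


def redact_stream(stream: bytes, rects: list[Rect]) -> bytes:
    """Delete text objects that start inside a rectangle, then draw black marks."""

    def keep_or_drop(match: re.Match) -> bytes:
        pos = POSITION.search(match.group(0))
        if pos and any(
            inside(float(pos.group(1)), float(pos.group(2)), r) for r in rects
        ):
            return b""  # the text operators are removed, not covered
        return match.group(0)

    cleaned = TEXT_OBJECT.sub(keep_or_drop, stream)
    # Opaque marks are appended so they are painted over the now-empty area.
    marks = b"".join(b"0 0 0 rg\n%g %g %g %g re\nf\n" % r for r in rects)
    return cleaned + marks
```

The order matters: the text is deleted first and the mark is drawn second, so removing the mark later reveals nothing.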
Browser-based redaction tools built on pdf-lib or PDF.js can do all of this in JavaScript without uploading the file. The redacted PDF is rebuilt locally and offered as a download, so the original never leaves your device and no third party ever holds a copy.
How to redact a PDF, step by step
- Load the PDF. Drop the file onto the page. The tool reads it into memory and shows the first page for preview. Nothing is sent to a server.
- Find what to redact. Use the text search to find names, account numbers, dates of birth, addresses, or any other recurring sensitive string. The tool highlights every occurrence.
- Mark redaction regions. Click and drag to draw a rectangle, or click "redact all matches" to apply the mark to every found instance at once.
- Optionally add a reason label. Government and compliance workflows (FOIA, GDPR Article 17, HIPAA) often require the redaction to be labelled with the legal basis. Type the label and it is drawn inside the rectangle.
- Apply the redaction. This is the key step: it permanently deletes the text under the rectangles from the content stream, strips metadata, and saves a new PDF with the marks burned in.
- Verify the result. Open the redacted PDF, try Ctrl+A then Ctrl+C and paste into a text editor. You should see the redaction labels (or nothing) where the original text was, never the original text itself.
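The search step in this workflow can be approximated with a few regular expressions. The sketch below assumes the page text has already been extracted to a plain string; the two patterns and the `find_matches` helper are illustrative choices, not the tool's actual rules. A real redactor would also need to map each character offset back to a rectangle on the page.

```python
import re

# Illustrative patterns only; real documents need patterns tuned to their content.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_matches(page_text: str, literals: list[str]) -> list[tuple[str, int, int]]:
    """Return (label, start, end) for every pattern hit and literal string hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        hits += [(label, m.start(), m.end()) for m in pattern.finditer(page_text)]
    for literal in literals:
        # Case-insensitive so "jane example" and "Jane Example" are both found.
        for m in re.finditer(re.escape(literal), page_text, re.IGNORECASE):
            hits.append(("literal", m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[1])
```

Returning every occurrence, rather than the first, is what makes a "redact all matches" action possible.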
What to redact
The obvious cases are names, addresses, phone numbers, email addresses, and account numbers. The less obvious ones cause most of the real-world leaks:
| Category | What to look for |
|---|---|
| Direct identifiers | Names, addresses, phone numbers, email addresses, social security numbers |
| Financial | Account numbers, credit card numbers, IBAN, routing numbers, balances |
| Health | Diagnoses, medication, dates of treatment, patient ID, insurance numbers |
| Government | Case numbers, source identifiers, dates and times of operations, locations |
| Indirect identifiers | Job titles + employer + city (uniquely identifying), unique vehicle descriptions, distinctive medical conditions |
| Metadata | Document author, original filename, last editor, software version, total edit time |
| Comments | Reviewer comments, "Q: who is this person?" annotations, track changes |
| Form fields | Pre-filled values, even from earlier versions |
| Attachments | Embedded files referenced by the document |
| Image regions | Names on a screenshot, faces in a photo, license plates, addresses on an envelope |
The last row is especially important: a screenshot of a CRM showing a customer record, embedded in the PDF as a raster image, will not be redacted by text-layer tools. The image itself must be painted over.
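Painting over an image means changing the pixels themselves. As a minimal sketch, assume a greyscale image held as a list of rows of integers and a bounding box given as `(left, top, right, bottom)` in pixel coordinates with exclusive right and bottom edges. Both the representation and the `paint_over` name are assumptions made for illustration; a real tool would decode and re-encode the embedded image.

```python
def paint_over(
    pixels: list[list[int]],
    box: tuple[int, int, int, int],
    fill: int = 0,
) -> list[list[int]]:
    """Return a copy of the image with every pixel inside the box set to fill."""
    left, top, right, bottom = box
    redacted = [row[:] for row in pixels]  # leave the caller's image untouched
    for y in range(max(top, 0), min(bottom, len(redacted))):
        row = redacted[y]
        for x in range(max(left, 0), min(right, len(row))):
            row[x] = fill  # the original pixel value is overwritten, not hidden
    return redacted
```

Because the pixel values are replaced, there is nothing underneath the mark to recover, unlike a rectangle drawn over an intact image.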
Common pitfalls
- Black-boxing in Word and re-exporting to PDF. The text under the box is still in the Word document and survives the export. Even if your visible page looks clean, the PDF content stream contains the original.
- Using highlight + change colour to black. The highlight is a comment annotation, not a content stream change. Anyone can remove the annotation to reveal the original.
- Forgetting metadata. A redacted memo with "Author: John Smith" in the document properties tells you who wrote it, even if every name in the body is redacted.
- Forgetting comments. Adobe and PDFelement reviewers often add comments referencing names that are being redacted in the body. Strip all comments.
- Forgetting form field history. Filled PDF forms store their values in the AcroForm dictionary, and a PDF saved incrementally appends changes to the end of the file without erasing what came before. Flattening removes the field, but earlier values may still be present in those older revisions.
- Not redacting images of text. OCR the document, identify the text bounding boxes, then paint over those regions in the image itself, not just the OCR text layer.
- Confusing print-to-PDF with redaction. Printing to PDF removes the text layer only if the print path actually rasterises the page; many print drivers keep the text as selectable text. When it does rasterise, the result is a much larger file with no searchability. It is a heavy-handed workaround, not a redaction.
- Sharing the original by mistake. Always rename the redacted file with a clear suffix (`-redacted.pdf`) so you cannot accidentally attach the original.
- Trusting visual review alone. The redacted PDF may look perfect and still leak. Always test by selecting all text and copying, by extracting metadata with `pdfinfo` or `exiftool`, and by checking the file size against the original.
- Permission-based "redaction". Locking a PDF with a password or restricting copy is not redaction. The data is still in the file and the restrictions are advisory; PDF password removers are a click away.
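One of these pitfalls, leftover history from earlier saves, can be hinted at with a very coarse check. Each incremental save of a PDF appends a new section that ends with an `%%EOF` marker, so more than one marker suggests that older revisions may still be in the file. The sketch below is a heuristic only, and `count_eof_markers` and `may_contain_earlier_revisions` are hypothetical names: some file layouts, such as linearized PDFs, can also contain more than one marker, and the byte sequence could in principle appear inside stream data.

```python
def count_eof_markers(pdf_bytes: bytes) -> int:
    """Count end-of-file markers; each incremental save normally adds one."""
    return pdf_bytes.count(b"%%EOF")


def may_contain_earlier_revisions(pdf_bytes: bytes) -> bool:
    """Heuristic hint, not proof: True when more than one marker is present."""
    return count_eof_markers(pdf_bytes) > 1
```

If the check returns True, rewriting the file from scratch (rather than saving incrementally) before sharing is the safer choice.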
Alternative tools and workflows
| Tool | Strength | Watch out for |
|---|---|---|
| Browser PDF redactor (this tool) | Local, no upload, free | Slower than native tools on very large PDFs |
| Adobe Acrobat Pro | Industry standard, batch redaction, signed audit trail | Paid; processes files locally, but ties you to one vendor |
| Foxit PhantomPDF (now Foxit PDF Editor) | Cheaper than Adobe, similar feature set | Some redaction features are limited to the subscription tier |
| qpdf (CLI) | Powerful, scriptable, free | Not a true redaction tool, you must combine with pdftotext + sed for text removal |
| pdftk | Common for splits and merges | Does not include redaction; do not use it for sensitive removal |
| Print-to-PDF rasterisation | Removes text layer by design | Huge file size, loss of searchability, image-level traces may remain |
| Online "redaction" services | Quick UI | Upload to a third-party server; review their retention and privacy policy |
For a one-off legal filing or job application, the browser tool is the right answer. For batch redaction of hundreds of FOIA requests, Acrobat Pro or a scripted qpdf + pdftotext pipeline pays for itself. For redacting image-heavy scans, run OCR first and then redact the bounding boxes in both the OCR text layer and the underlying raster.
Verification checklist before sharing
Before you send a redacted PDF outside your team, run through this checklist:
- Select all text (Ctrl+A) and copy into a text editor. The redacted strings should not appear.
- Open metadata: `exiftool file.pdf` or use a PDF viewer's properties dialog. Author, creator, last editor, original filename, custom XMP fields should all be empty or anonymous.
- Check comments and annotations explicitly. Acrobat's comment panel, Preview's annotation list, or `pdftotext -layout` will all surface them.
- Confirm the file size is materially smaller than the original (removed text and metadata should reduce size). A redacted PDF that is the same byte size as the original is suspicious.
- For documents with images: open in an image viewer and zoom in on the redacted areas. Some viewers reveal image content under overlays.
- For multi-page documents: spot-check at least the first, last, and three random middle pages. Search for the redacted names again to be sure no occurrence was missed.
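Part of this checklist can be scripted as a first pass. The sketch below uses only the Python standard library and makes several assumptions: it looks for the redacted strings in the raw bytes and in any stream that inflates cleanly with zlib, and it reports which common document information keys are present. It does not decode font encodings, hex strings, UTF-16 text, or XMP packets, so a clean result is not proof of a clean file, and a listed key may still hold an empty value. The helper names are hypothetical.

```python
import re
import zlib

# Naive stream finder; assumes "stream" and "endstream" keywords delimit the data.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
INFO_KEYS = (b"/Author", b"/Creator", b"/Producer", b"/Title")


def searchable_bytes(pdf_bytes: bytes) -> bytes:
    """Raw file bytes plus the contents of streams that inflate with zlib."""
    chunks = [pdf_bytes]
    for match in STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the compressed data.
            chunks.append(zlib.decompressobj().decompress(match.group(1)))
        except zlib.error:
            pass  # not a Flate stream, or uses a filter this sketch ignores
    return b"\n".join(chunks)


def leaked_terms(pdf_bytes: bytes, terms: list[str]) -> list[str]:
    """Return the terms that still appear somewhere in the searchable bytes."""
    haystack = searchable_bytes(pdf_bytes).lower()
    return [t for t in terms if t.lower().encode("latin-1", "replace") in haystack]


def metadata_keys_present(pdf_bytes: bytes) -> list[str]:
    """Return the document information keys found in the raw file."""
    return [key.decode() for key in INFO_KEYS if key in pdf_bytes]
```

A typical use is to read the redacted file with `Path("memo-redacted.pdf").read_bytes()` (a hypothetical filename), pass the list of names that were redacted to `leaked_terms`, and treat any non-empty result as a failed redaction. The manual checks above still apply.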
Privacy and the redactor
The browser PDF redactor runs entirely in your device's memory. The file you drop is read by the File API, parsed by pdf-lib or PDF.js in JavaScript, re-rendered with the redactions applied, and offered back as a download. Nothing is uploaded, nothing is logged, nothing is cached server-side. For sensitive material (court filings, medical records, FOIA responses, breach notifications), that local-only flow is the difference between a redaction you control and a redaction you have to trust someone else to handle correctly. The whole tool can run offline once the page is loaded, which you can verify by disconnecting your network and redacting another file.
Frequently Asked Questions
Is drawing a black box over text in a PDF editor enough to redact it?
No. Drawing a black rectangle over text only hides the text visually. The underlying characters remain in the PDF and can be recovered by copying, by selecting the text under the rectangle, or by extracting the text layer with any PDF parser. Proper redaction removes the text from the document and replaces it with an opaque shape.
What kinds of information can be recovered from a poorly redacted PDF?
Text content (even if covered visually), embedded metadata (author, last editor, software, original filename), earlier revisions if the PDF was saved incrementally, comments, form field values, attached files, and sometimes raster image previews that show the original page before the redaction overlay.
Does flattening a PDF redact it?
Flattening merges layers and removes form fields, but does not by itself remove the text under a drawn rectangle. The text content remains in the content stream. You must explicitly delete the text, not just cover it.
How do I redact text that appears as part of an image (a scan)?
For scanned documents, run OCR first to detect the text positions, then redact those regions in the underlying image (not just the OCR layer). Some tools let you paint over the image with a solid colour at the redaction location, which is the correct approach for raster content.
What standards define proper PDF redaction?
The U.S. National Security Agency published "Redacting with Confidence" in 2005-2006, after several high-profile failed redactions led to leaks. Adobe's PDF Reference and the ISO 32000-1 PDF specification describe content streams in enough detail to confirm that visually covering text does not remove it. The CIA, FBI, and most government agencies now require the use of dedicated redaction tools that destroy the underlying content, not just hide it.