PDF Metadata Editor, free
Edit PDF metadata: title, author, subject, keywords and more. Runs entirely in your browser.
What is PDF metadata?
PDF metadata is information about the document that does not appear in its visible content. It includes the title, author, subject, keywords, creation date and other properties. This information makes documents easier to organise, search and identify.
Why edit PDF metadata?
- Organisation · set consistent metadata across your documents for better filing and search.
- Professionalism · make sure your documents show the right author and title.
- SEO & discovery · keywords in the metadata help discoverability.
- Fixing properties · correct wrong or missing author, title or subject information.
Frequently asked questions
Does editing metadata change the PDF's content?
No. Only the metadata is modified. The PDF's content, pages and formatting stay exactly the same.
Can I edit the metadata of an encrypted PDF?
If the PDF is password-protected, you cannot edit its metadata with this tool. The file must be unlocked first.
What is the file size limit?
This tool supports PDFs up to 10 MB. Larger files may take longer to process.
What PDF metadata actually is
A PDF file can carry document-level metadata in two places at once. The original mechanism, present since PDF 1.0 (1993), is the Document Information Dictionary (called "DocInfo" or /Info): a key/value object referenced from the PDF trailer. PDF 1.4 (2001) added a second, richer mechanism: an XMP metadata stream, which is an XML packet (RDF/XML conforming to Adobe's Extensible Metadata Platform) embedded as a stream object and referenced from the document catalog. XMP became an open ISO standard in 2012 (ISO 16684-1).
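As a rough illustration of the two stores, the sketch below scans raw PDF bytes for the trailer's /Info reference and for signs of an XMP packet. The function name metadata_stores and the sample bytes are made up for this example, and the approach assumes an unencrypted file whose catalog is not packed into a compressed object stream; a real parser would walk the cross-reference table instead of pattern-matching.

```python
import re


def metadata_stores(pdf_bytes: bytes) -> dict:
    """Report which metadata stores are visible in raw PDF bytes (heuristic)."""
    # The trailer points at the Document Information Dictionary via "/Info n g R".
    has_info = re.search(rb"/Info\s+\d+\s+\d+\s+R", pdf_bytes) is not None
    # The catalog points at the XMP stream via "/Metadata n g R"; an
    # uncompressed packet also starts with an "<?xpacket begin" header.
    has_xmp_ref = re.search(rb"/Metadata\s+\d+\s+\d+\s+R", pdf_bytes) is not None
    has_xmp_packet = b"<?xpacket begin" in pdf_bytes
    return {"docinfo": has_info, "xmp": has_xmp_ref or has_xmp_packet}


# Hypothetical, heavily abbreviated PDF fragment (not a valid file).
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj << /Type /Catalog /Metadata 3 0 R >> endobj\n"
    b"2 0 obj << /Title (Quarterly report) /Author (A. Writer) >> endobj\n"
    b"trailer << /Root 1 0 R /Info 2 0 R >>\n"
)

print(metadata_stores(sample))
```

A file that reports both stores is one where the two can drift apart after an edit.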
The two stores are not the same and may disagree. Adobe's reference and the ISO 32000 standards both say XMP is preferred when present, and that DocInfo should be treated as a legacy mirror. In ISO 32000-2 (PDF 2.0), the older DocInfo dictionary is formally deprecated for everything except CreationDate and ModDate (which signature handlers still use). In practice, almost every reader (Adobe Acrobat, Foxit, Preview on macOS, browser viewers) reads DocInfo by default and only falls back to XMP for fields like copyright that DocInfo never supported.
The standard DocInfo fields are Title, Author, Subject, Keywords, Creator (the application that originated the document, e.g. "Microsoft Word"), Producer (the application that produced the actual PDF, e.g. "Adobe PDF Library 17.0"), CreationDate and ModDate (both in PDF date format, e.g. D:20240315093000-04'00'), and Trapped. XMP organises fields into namespaces: Dublin Core's dc:title, dc:creator, dc:rights and dc:language; XMP-MM's DocumentID, InstanceID and History editing log; PDF/A and PDF/UA conformance markers; and any custom namespaces a tool wants to add. This editor exposes the most-used DocInfo fields directly; XMP-only fields require a more specialised editor.
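The PDF date format used by CreationDate and ModDate is compact enough to parse by hand. The following is a minimal sketch, assuming the usual D:YYYYMMDDHHmmSS layout with an optional Z or +HH'mm' / -HH'mm' offset; parse_pdf_date is a name chosen for this example, not part of any library.

```python
import re
from datetime import datetime, timedelta, timezone

# Everything after the year is optional in the PDF date syntax.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([Zz+\-])(?:(\d{2})'?(?:(\d{2})'?)?)?)?$"
)


def parse_pdf_date(value: str) -> datetime:
    """Convert a PDF date string such as D:20240315093000-04'00' to a datetime."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, off_h, off_m = match.groups()
    tz = None  # no offset given: return a naive datetime
    if sign in ("Z", "z"):
        tz = timezone.utc
    elif sign in ("+", "-"):
        offset = timedelta(hours=int(off_h or 0), minutes=int(off_m or 0))
        tz = timezone(-offset if sign == "-" else offset)
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=tz,
    )


created = parse_pdf_date("D:20240315093000-04'00'")
modified = parse_pdf_date("D:20240315101500-04'00'")
print(created.isoformat())
print(modified - created)  # gap between creation and last modification
```

Once both timestamps are parsed, subtracting them gives the creation-to-modification gap directly.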
A short history
PDF began with John Warnock's 1991 internal Adobe memo (the "Camelot" paper) proposing a portable document format that preserved visual fidelity across devices. Adobe shipped PDF 1.0 with Acrobat 1.0 in 1993; the DocInfo dictionary was there from day one. Through the 1990s and early 2000s the format added encryption, hyperlinks, forms, JavaScript, transparency, tagged-PDF accessibility (PDF 1.4, 2001), and the XMP metadata mechanism (also PDF 1.4). PDF/A (the archival subset that mandates embedded XMP and forbids encryption) was ratified as ISO 19005-1 in 2005. Adobe transferred PDF to ISO in 2008, where PDF 1.7 became ISO 32000-1:2008. ISO 32000-2:2017 published PDF 2.0, with the major metadata change being the deprecation of DocInfo in favour of XMP. The 2020 revision and the PDF Association's free release of the spec in April 2023 mean the standard is now openly accessible.
The privacy problem: what PDFs leak
A PDF created by typical office software broadcasts substantially more about its provenance than most users realise. From a single PDF you can usually extract:
- Author's full name. Microsoft Word writes Author from the user's Office account or the registered Windows username at install time. LibreOffice writes the user's first/last name from the user-data settings. Pages on macOS uses the system "Full Name." A PDF saved from any of those inherits the embedded value automatically.
- An editing history. XMP's xmpMM:History records each save and conversion event with a timestamp, software name, and instance UUID, producing a partial revision log of the document.
- Software identification down to version and build. The Producer field typically reads like "Microsoft® Word for Microsoft 365" or "Adobe PDF Library 17.00.6" or "Skia/PDF m120" (Chrome's print-to-PDF). This can narrow down the workstation's software and patch level.
- Creation timestamp + modification timestamp + the gap between them. A 4-second gap suggests a print-to-PDF; a 45-minute gap suggests substantial editing. Combined with the author and software fields, the timestamps and their timezone offsets can help establish when, roughly where and by whom a document was authored.
- Embedded image EXIF. When an image carrying EXIF GPS coordinates is dragged into a Word or InDesign document and exported to PDF, the underlying image stream often retains the EXIF tags, including latitude and longitude. ExifTool will pull them out even from "embedded" images.
- Track-changes annotations. PDFs exported from Word with "Show Markup" enabled embed reviewer initials and timestamps in annotation streams (technically content rather than metadata, but often invisible until a reader expands the comments panel).
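To make the editing-history item in the list above concrete, here is a minimal sketch that lists xmpMM:History events from an XMP packet using only the standard library. The sample packet, the "Example Writer 1.0" agent and the history_events helper are invented for illustration; in a real PDF the packet first has to be extracted (and possibly decompressed) from its stream object.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xmpMM": "http://ns.adobe.com/xap/1.0/mm/",
    "stEvt": "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#",
}

# Hypothetical XMP packet with a two-entry history.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about=""
    xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
    xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#">
   <xmpMM:History>
    <rdf:Seq>
     <rdf:li rdf:parseType="Resource">
      <stEvt:action>created</stEvt:action>
      <stEvt:when>2024-03-15T09:30:00-04:00</stEvt:when>
      <stEvt:softwareAgent>Example Writer 1.0</stEvt:softwareAgent>
     </rdf:li>
     <rdf:li rdf:parseType="Resource">
      <stEvt:action>saved</stEvt:action>
      <stEvt:when>2024-03-15T10:15:00-04:00</stEvt:when>
      <stEvt:softwareAgent>Example Writer 1.0</stEvt:softwareAgent>
     </rdf:li>
    </rdf:Seq>
   </xmpMM:History>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def history_events(xmp_xml: str) -> list[dict]:
    """Return one dict per xmpMM:History entry found in the packet."""
    root = ET.fromstring(xmp_xml)
    events = []
    for item in root.findall(".//xmpMM:History/rdf:Seq/rdf:li", NS):
        event = {}
        for field in ("action", "when", "softwareAgent", "instanceID"):
            # Writers may use child elements or compact attributes on rdf:li.
            child = item.find(f"stEvt:{field}", NS)
            attr = item.get(f"{{{NS['stEvt']}}}{field}")
            text = child.text if child is not None else attr
            if text:
                event[field] = text.strip()
        events.append(event)
    return events


for event in history_events(SAMPLE_XMP):
    print(event)
```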
Notable real-world cases
- Manafort court filing (January 2019): Paul Manafort's defence attorneys filed a court document using PDF redaction rectangles drawn over text. The text itself was untouched in the content stream and was extracted within hours by reporters using basic copy-paste, exposing claims that Manafort had shared US polling data with a Russian intelligence-linked associate. The accompanying metadata also named the law-firm machine and software that produced it.
- UK government "dodgy dossier" (February 2003): the document "Iraq, Its Infrastructure of Concealment, Deception and Intimidation" was published as a Word file whose revision log named four government staff who had worked on it, while much of its text turned out to have been copied from a US-based graduate student's 2002 work. The Word document's hidden authorship trail tied the file to named individuals.
- TSA security manual (December 2009): TSA published a redacted version of its passenger-screening Standard Operating Procedures. The redactions were overlays drawn on top of the original text in a PDF; the underlying text was extractable. The full document, including the list of countries whose passport-holders received elevated screening, leaked.
- "Author: opposing-counsel firm name": repeated incidents at law firms where outgoing PDF briefs include the opposing-counsel firm name in the Author field, because someone copy-pasted from a discovery PDF into a new Word document and the destination document inherited the source's author. Many firms now require Word's "Document Inspector" or Acrobat's "Sanitize Document" before any external send.
Honest scope of this tool
This editor lets you view and overwrite the standard DocInfo fields. It is genuinely useful for cleaning up author names before sending a document externally, fixing wrong title metadata that's confusing your document-management system, or stripping a workstation fingerprint from a press release. It is not a complete sanitiser. Specifically:
- Image EXIF inside embedded photos may still carry GPS coordinates and camera details.
- Track-changes and reviewer comments stored as annotations are not removed.
- Hidden text under "redaction" rectangles is still extractable: drawing a black rectangle over text doesn't remove the text from the PDF's content stream. This is the most common source of accidental disclosure.
- The xmpMM:History editing log in the XMP stream is not necessarily cleared.
- Embedded font subsets can identify the originating workstation if unusual fonts were used.
- Printer tracking dots (yellow microdot patterns most colour laser printers embed) are part of the printed or scanned page content and are unaffected by metadata editing; the Reality Winner case (June 2017) drew wide attention to them.
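The redaction caveat in the list above is easy to demonstrate on a toy content stream. In this sketch, the stream, the regular expression and the helper names are all illustrative assumptions: it only handles simple literal strings shown with the Tj operator in an uncompressed stream, whereas real files usually compress streams and may use TJ arrays or hex strings.

```python
import re

# Hypothetical page content: one line of text, then a filled black rectangle
# drawn over the same area ("x y w h re" adds the path, "f" fills it).
CONTENT_STREAM = rb"""BT
/F1 12 Tf
72 700 Td
(Account number 12345) Tj
ET
0 g
70 695 160 16 re
f
"""

# A literal string operand followed by the Tj (show text) operator.
_SHOW_TEXT = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def shown_text(stream: bytes) -> list[str]:
    """Return the literal strings passed to Tj, regardless of what is drawn on top."""
    return [m.group(1).decode("latin-1") for m in _SHOW_TEXT.finditer(stream)]


def has_filled_rectangle(stream: bytes) -> bool:
    """True if a rectangle path is immediately filled (the typical 'black box')."""
    return re.search(rb"re\s+f\b", stream) is not None


print(has_filled_rectangle(CONTENT_STREAM))
print(shown_text(CONTENT_STREAM))
```

The rectangle and the text are independent drawing instructions, so covering one does nothing to the other; proper redaction has to delete the text operators themselves.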
For a complete sanitisation pass on a sensitive document, the right tools are Adobe Acrobat Pro's "Sanitize Document" command, the open-source cpdf command-line utility's -remove-metadata option, or ExifTool's -all= directive followed by a full rewrite of the file (for example with qpdf --linearize, because ExifTool edits PDFs by incremental update and the old values otherwise remain recoverable) and manual inspection. Sensitive workflows often re-create the document from extracted plain text rather than trying to scrub the original.
Tools to view metadata
- Adobe Acrobat: File → Properties. Shows the DocInfo fields and a separate "Additional Metadata" panel for the XMP packet.
- ExifTool (Phil Harvey): the command-line gold standard. exiftool file.pdf prints everything; exiftool -all= file.pdf hides all metadata, but because ExifTool edits PDFs by incremental update, the old values stay recoverable until the file is rewritten by another tool.
- pdfinfo (part of poppler-utils): quick CLI dump of DocInfo plus page-level details.
- PDF.js (the library Firefox uses to render PDFs): exposes metadata via doc.getMetadata() for browser-side reading.
- pdf-lib: the JavaScript library powering this tool's edit pass; exposes setTitle(), setAuthor(), etc., and writes a fully-conformant PDF back.
When you'd reach for this
- Cleaning up author/creator names before sending a document outside your organisation.
- Setting consistent title metadata for a batch of documents that will end up in a document-management system or library catalogue.
- Adding keywords for internal full-text-search systems that use them as a discovery boost.
- Fixing the wrong title when "save-as PDF" inherited a misleading filename.
- Asserting copyright / licence via the Author field and (for tools that handle XMP) the dc:rights field.
- Quick privacy sanitisation for routine documents; see the scope caveat above for high-stakes cases.
More questions
Why do my edits sometimes appear in DocInfo but not XMP (or vice versa)?
Because PDFs carry both stores and they can disagree. This editor writes to DocInfo (the field every reader inspects). XMP is updated for fields that have a clear DocInfo equivalent. Some viewers (Adobe Acrobat in particular) read XMP first; if you see "stale" metadata after editing, open the document with a different reader to confirm whether the issue is XMP-only or whether your reader is just caching the old version.
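One way to check whether the two stores disagree is to compare them field by field. The sketch below assumes both stores have already been read into plain dictionaries with normalised values (dates parsed to the same representation, dc:creator flattened to a single string); the mapping follows the correspondences commonly used for PDF/A, and find_mismatches is a hypothetical helper, not this tool's code.

```python
# DocInfo key -> XMP property usually treated as its equivalent.
DOCINFO_TO_XMP = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:description",
    "Keywords": "pdf:Keywords",
    "Creator": "xmp:CreatorTool",
    "Producer": "pdf:Producer",
    "CreationDate": "xmp:CreateDate",
    "ModDate": "xmp:ModifyDate",
}


def find_mismatches(docinfo: dict, xmp: dict) -> dict:
    """Return {DocInfo key: (docinfo value, xmp value)} for fields that differ."""
    mismatches = {}
    for info_key, xmp_key in DOCINFO_TO_XMP.items():
        info_value = docinfo.get(info_key)
        xmp_value = xmp.get(xmp_key)
        if info_value is None and xmp_value is None:
            continue  # field absent from both stores: nothing to compare
        if info_value != xmp_value:
            mismatches[info_key] = (info_value, xmp_value)
    return mismatches


# Hypothetical values: the title was edited in DocInfo only.
docinfo = {"Title": "Final report", "Author": "J. Smith"}
xmp = {"dc:title": "Draft report", "dc:creator": "J. Smith"}
print(find_mismatches(docinfo, xmp))
```

A field that shows up here is one where different readers may display different values.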
Will this tool break a digital signature?
Yes, almost always. A digital signature on a PDF protects the entire document including the metadata; modifying any byte breaks the signature's cryptographic verification. If you need to edit metadata on a signed PDF, you'll either need to remove the signature first (with the signer's permission), edit the metadata, and have it re-signed; or apply the metadata changes before signing in the original workflow.
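The reason is visible in how the signed digest is computed. A PDF signature dictionary carries a /ByteRange array of (offset, length) pairs covering the file except the signature value itself, and the digest is taken over those bytes. The sketch below uses made-up bytes and a plain SHA-256 in place of a full CMS signature, purely to show that a metadata edit inside the covered range changes the digest.

```python
import hashlib


def signed_digest(pdf_bytes: bytes, byte_range: list[int]) -> str:
    """SHA-256 over the (offset, length) pairs listed in a /ByteRange array."""
    digest = hashlib.sha256()
    for offset, length in zip(byte_range[0::2], byte_range[1::2]):
        digest.update(pdf_bytes[offset:offset + length])
    return digest.hexdigest()


# Hypothetical stand-in for a signed file; <SIGNATURE> marks the excluded gap.
original = b"%PDF-1.7 /Title (Contract v1) /Contents <SIGNATURE> trailing bytes %%EOF"
gap_start = original.index(b"<SIGNATURE>")
gap_end = gap_start + len(b"<SIGNATURE>")
byte_range = [0, gap_start, gap_end, len(original) - gap_end]

# Same-length metadata edit, so the byte range still lines up.
edited = original.replace(b"Contract v1", b"Contract v2")

print(signed_digest(original, byte_range) == signed_digest(edited, byte_range))
```

Only the bytes in the excluded gap can change without affecting the digest, which is why any rewrite of the metadata invalidates the signature.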
What about PDF/A archival files?
PDF/A files have additional XMP requirements (the pdfaid:part and pdfaid:conformance markers, plus required Dublin Core fields). Editing a PDF/A's DocInfo without updating the XMP packet may technically take the file out of PDF/A conformance. For archival workflows, use a PDF/A-aware editor such as Acrobat Pro and re-validate the result with a PDF/A validator such as veraPDF.
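Before editing, it can help to check whether a file claims PDF/A conformance at all. This is a minimal sketch that reads the pdfaid:part and pdfaid:conformance markers from an already-extracted XMP packet; the sample packet and the pdfa_claim helper are illustrative, and a declared marker is only a claim, not proof of conformance.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# Hypothetical packet declaring PDF/A-1b.
SAMPLE_PDFA_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>1</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>"""


def pdfa_claim(xmp_xml: str):
    """Return (part, conformance) declared in the packet, or None if absent."""
    root = ET.fromstring(xmp_xml)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        values = {}
        for name in ("part", "conformance"):
            qname = f"{{{PDFAID_NS}}}{name}"
            child = desc.find(qname)
            # Accept both the attribute form and the child-element form.
            values[name] = desc.get(qname) or (child.text if child is not None else None)
        if values["part"]:
            return values["part"], values["conformance"]
    return None


print(pdfa_claim(SAMPLE_PDFA_XMP))
```

If the function returns a claim, treat the file as archival and validate it again after any metadata change.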
How do I make a "completely anonymous" PDF?
For routine documents: edit the DocInfo here to clear identifying fields, then run the result through Acrobat's "Sanitize Document" or cpdf -remove-metadata. For high-stakes anonymisation (whistleblowing, journalism, legal disclosure): re-create the PDF from scratch on a different machine using only extracted plain text, with no images that came from the original. Print-and-rescan also works (the OCR layer of the rescanned PDF is freshly authored), at the cost of file size and image quality, although a colour laser printer may add its own tracking dots to the printout.
Does anything get sent to a server?
No. The PDF is parsed and rewritten by pdf-lib running locally in your browser; the modified file is downloaded straight to your device. Nothing about your PDF leaves the page, which is useful when the document contains internal author names, client information or confidential subject lines that you'd rather not upload to a third-party service. The pdf-lib library itself loads from a public CDN once with subresource-integrity verification, then is cached.