September 08, 2004

Clarifying Acrobat PDF Metadata Issues

It's easy to have a knee-jerk reaction to metadata issues, especially with the amount of FUD (Fear, Uncertainty, and Doubt) out there. While many professionals aim for the holy grail of complete metadata removal (or "cleaning"), I think a more informed approach is metadata management. Sometimes you want it, and sometimes you don't. Think of the usefulness and concurrent danger of "tracked changes" for a perfect example. Even so, many attorneys have adopted the quick fix of converting their Word and other document types to PDF before transmitting or sharing them. The widespread assumption is that PDF is a safe haven for transmitting metadata-free documents -- something that isn't necessarily true.

PDF for Lawyers has a good post which clarifies some of the issues raised in an interesting August 2004 Law Technology News article, "Metadata: Are You Protected," by Donna Payne & Bruce Lewis. (Free subscription required.) Donna and Bruce stated that "PDF files contain substantial metadata," and the print version contains a comparison table listing nearly 20 items of metadata that can exist in PDF files.

To get a balanced perspective, I highly recommend reading the LTN article first and then heading over to the PDF for Lawyers post, which clarifies things a bit:

As I understand it, the 'tracked changes' in Word do not ordinarily pass into a PDF file when the word processing document is converted. It can happen, but it takes unusual conditions. After reading the article, I asked Ms. Payne in an E-mail to explain to me how the 'tracked changes' would be passed into a PDF file and she gave me two examples.

First, if the person who converted the Word document attached the Word file into the PDF in its native format (Acrobat allows you to attach files into a PDF document). Okay, but how many people know about this feature and would want to use it if they did? She gave a couple of better examples of where the tracked changes could pass over: (1) if you have the tracked changes visible when you convert to PDF (yes, that would create a PDF with the tracked changes blatantly showing; so make sure you look over the resulting PDF file to verify what you are sending before you send it); (2) if you have your printing configuration in Word set to print 'tracked changes' along with the document (now this is something that could sneak up on you, although you can avoid it again by reading the resulting PDF file after you create it; or you can make sure that your default printing choice is set to not include the tracked changes).
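
For the technically inclined, the advice above to check the resulting PDF before sending it can be partly automated. The Python sketch below is a rough illustration, not a vetted tool. The function name scan_pdf_bytes is my own invention, and it only searches the raw bytes of the file for a few document-information keys (/Author, /Creator, /Producer) and for the /EmbeddedFiles name that PDF uses for file attachments. It does not actually parse the PDF, so anything stored in compressed streams or as XMP metadata will be missed.

```python
import re

# A few keys from the PDF document information dictionary.
INFO_KEYS = (b"Author", b"Creator", b"Producer")


def scan_pdf_bytes(data: bytes) -> dict:
    """Crude byte-level scan (hypothetical helper); does not parse or decompress the PDF."""
    found = {}
    for key in INFO_KEYS:
        # Matches simple literal strings such as "/Author (Jane Doe)".
        # Hex strings, escaped parentheses, and UTF-16 values are not handled.
        match = re.search(rb"/" + key + rb"\s*\(([^)]*)\)", data)
        if match:
            found[key.decode("ascii")] = match.group(1).decode("latin-1")
    return {
        "info": found,
        # /EmbeddedFiles is the name under which PDF lists attached files.
        "has_attachments": b"/EmbeddedFiles" in data,
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(scan_pdf_bytes(handle.read()))
```

A clean result here does not prove the file is clean. Treat it as a quick first check; opening the PDF and reading it over, as the quoted post suggests, remains the more reliable review.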

Subdued, there are indeed some known issues with the free Microsoft metadata remover, in terms of what it does and doesn't remove. I've made several posts about this, and the following link sums it up:

It's probably better than not using any metadata cleaner. However, before blindly relying on any metadata remover tool, I recommend having a clear understanding of exactly what it can and cannot do. Otherwise, users risk having a false sense of security.

Posted by: Jeff Beard at September 12, 2004 08:04 PM

Microsoft offers an extension to Word called Remove Hidden Data that strips out personal information and other "metadata." As far as I can tell it works. Gets rid of track changes, author, company, last-saved-by, etc.

Posted by: subduedcitizen at September 9, 2004 10:50 PM