January 06, 2004

New MS Office 2003/XP Add-in to Remove Hidden Data

Microsoft just published a free tool to remove hidden data (metadata) from files created in the following Office applications:

  • Microsoft Office Word 2003

  • Microsoft Office Excel 2003

  • Microsoft Office PowerPoint 2003

  • Microsoft Word 2002

  • Microsoft Excel 2002

  • Microsoft PowerPoint 2002

Microsoft's overview states: "With this add-in you can permanently remove hidden and collaboration data, such as change tracking and comments, from Word 2003/XP, Excel 2003/XP, and PowerPoint 2003/XP files." The installation includes a "readme" file that provides a complete list of the types of data the tool will help to remove.

Per MS, "you can run the Remove Hidden Data add-in on individual files from within your Office XP or Office 2003 application. Or, you can run Remove Hidden Data on multiple files at once from the command line."

Here's the big catch (you knew there had to be one): Currently, the only supported operating system for this add-in is Windows XP. Microsoft states that "[t]he Remove Hidden Data add-in has not been tested on Microsoft Windows 2000. Also, the add-in cannot be installed on Windows 98 or Windows Millennium Edition." While I'll resist the temptation to mention this appears to be yet another MS ploy to drive Win XP upgrades, I have to admit the thought crossed my mind. It could also be that MS wanted to release it as soon as they had a Win XP-ready add-in. Here's hoping they will support other Windows versions (but I'm also not holding my breath on this one).

Apparently this add-in is free to licensed users of these programs. Please note this is not a separate standalone program, so you must have the necessary Office program installed on Windows XP for the add-in to work. Microsoft's web page above also lists a number of helpful tips, such as saving to a new file so as to preserve any wanted items (e.g., Track Changes) in the original collaborated files.

I mentioned the readme file so that savvy users could compare its functionality to other metadata removers on the market. Although it's free, I strongly suggest that you make sure this tool removes everything you need it to remove. If it doesn't, then I recommend obtaining a program that will do the necessary job rather than rely upon this free utility. Otherwise, it could create a false sense of security, which when relied upon can cause many of the same problems as not using a metadata remover at all. Still, if you do not currently have a metadata remover and use the Office XP or Office 2003 suites, then using this add-in is probably better than the alternative.

On another note, while speaking at a recent legal technology conference, I was glad to attend a presentation by Donna Payne of Payne Consulting. She emphasized that metadata issues and improved metadata control provide at least one compelling reason to upgrade to either Office XP or 2003 from prior versions. Of course, she then "scared us straight" by demonstrating metadata issues about which MS was unaware until she showed them. Yikes.

Topic(s):   Electronic Discovery  |  Legal Technology  |  Privacy & Security
Posted by Jeff Beard
Comments

Clarification: "Your code" in the comment immediately after this one refers to the VBA snippet posted farther down by Arfa.

Posted by: Wells Anderson at January 11, 2004 08:45 PM

Your code only removes the history of tracked changes. The MS utility claims to remove the following, according to a webpage included with the application:

Comments. This data is removed automatically.
Previous authors and editors. This data is removed automatically.
User name. This data is removed automatically.
Personal summary information. This data is removed automatically.
Headers and footers. Except in command-line mode, you will be prompted when this data is found. By default, this data is left in place.
Revision marks. This data is removed automatically.
Deleted text. This data is removed automatically.
Hidden text. Except in command-line mode, you will be prompted when this data is found. By default, this data is removed.
Hidden rows/columns. Except in command-line mode, you will be prompted when this data is found. By default, this data is removed.
Pivot tables. Except in command-line mode, you will be prompted when this data is found. By default, some data will be removed from the pivot table.
Hyperlinks. Except in command-line mode, you will be prompted when this data is found. By default, this data is left in place.
Versions. This data is removed automatically.
Field codes. Except in command-line mode, you will be prompted when this data is found. By default, the field codes will be resolved and left in place.
Template name. This data is removed automatically.
VB Modules. Except in command-line mode, you will be prompted when this data is found. By default, this data is left in place.
File paths. Except in command-line mode, you will be prompted when this data is found. By default, this data is left in place.
Embedded objects. Except in command-line mode, you will be prompted when this data is found. By default, embedded objects that are Office documents will be removed, while other embedded objects will be left in place.
Note: For those embedded objects that are removed, the picture representing the embedded object visually remains in the document, but all the other data related to it is removed.
Hidden worksheets. Except in command-line mode, you will be prompted when this data is found. By default, this data is removed.
Custom views. Except in command-line mode, you will be prompted when this data is found. By default, this data is left in place.
SmartTags. These are turned off, but the appropriate text is left in place.
Links to external data sources. Except in command-line mode, you will be prompted when this data is found. By default, this data is removed.
Custom properties. Except in command-line mode, you will be prompted when this data is found. By default, this data is removed.
The ID number used to identify your document for the purpose of merging changes back into the original document. This data is removed automatically.
Printer paths (except as noted in Known issues). This data is removed automatically.
Routing slips. This data is removed automatically.
E-mail headers. This data is removed automatically.
Scenario comments. This data is removed automatically.
PowerPoint presentation notes. Except in command-line mode, you will be prompted when this data is found. By default, this data is removed.
PowerPoint off-slide content. By default, this data is left in place.
Unique identifiers (Office 97 documents only). This data is removed automatically.

Posted by: Wells Anderson at January 11, 2004 08:42 PM

What does the MS product obliterate that wouldn't be cleared by the following VBA:

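Sub AcceptAndDisableRevisions()
    ' Accept every tracked change so no revision history remains.
    ' (Revisions.AcceptAll and Document.AcceptAllRevisions do the same job; one call is enough.)
    If ActiveDocument.Revisions.Count >= 1 Then
        ActiveDocument.Revisions.AcceptAll
    End If
    ' Stop tracking, and hide revision marks on screen and in print.
    ActiveDocument.TrackRevisions = False
    ActiveDocument.ShowRevisions = False
    ActiveDocument.PrintRevisions = False
End Sub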

Posted by: Arfa at January 9, 2004 10:15 AM