A surprise with Emacs and Office 2007

I had a little surprise when I tried to open an Excel file from Emacs. I was using dired, a sort of file explorer inside Emacs. I expected one of two things to happen. Maybe Emacs would know to launch the file using Excel. Or maybe it would open the binary file as a bunch of gibberish. Instead I got a listing of the files packed inside the spreadsheet, as if I had opened a directory.


This was disorienting at first. Then I thought about how Office 2007 documents are zipped XML files. But how does dired know that my .xlsx file is a zip file? I suppose since Emacs is a Unix application at heart, it’s acting like a Unix application even though I was using it on Windows. It’s determining the file type by inspecting the file itself and ignoring the file extension.
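
Here’s a rough Python sketch of that kind of content-based detection. I don’t know exactly which check Emacs performs; this only illustrates the general Unix idea of reading a file’s first few bytes (its “magic number”) and looking for the standard zip signatures, so the .xlsx extension never enters into it. The name looks_like_zip is hypothetical.

```python
# Zip signatures ("magic numbers") that can appear at the very start of a file.
ZIP_SIGNATURES = (
    b"PK\x03\x04",  # local file header: a normal, non-empty archive
    b"PK\x05\x06",  # end-of-central-directory record: an empty archive
)


def looks_like_zip(path):
    """Guess whether a file is a zip archive from its first four bytes.

    The file name and extension are ignored entirely.
    """
    with open(path, "rb") as f:
        header = f.read(4)
    return header in ZIP_SIGNATURES
```

Python’s standard library also has zipfile.is_zipfile, which does a more thorough version of this check.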

(Office 2007 documents are not entirely XML. The data and styling directives are XML, but embedded files are just files. The spreadsheet in this example contains a large photo. That photo is a JPEG file that gets zipped up with all the XML files that make up the spreadsheet.)

So I learned that Emacs knows how to navigate inside zip files, and that a convenient way to poke around inside an Office 2007 file is to browse into it from Emacs.
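
If you want the same view outside Emacs, Python’s standard zipfile module can list the parts of an Office 2007 file directly, without renaming it. A minimal sketch; the function name list_parts is hypothetical.

```python
import zipfile


def list_parts(path):
    """Print the files packed inside an Office 2007 document, with their sizes.

    Works for .docx, .xlsx, and .pptx alike, since all are zip archives.
    Returns the list of entry names.
    """
    with zipfile.ZipFile(path) as zf:
        entries = zf.infolist()
    for info in entries:
        # file_size is the uncompressed size in bytes
        print(f"{info.file_size:>10}  {info.filename}")
    return [info.filename for info in entries]
```

Calling list_parts on a spreadsheet (say a hypothetical budget.xlsx) typically shows entries such as [Content_Types].xml, xl/workbook.xml, and xl/worksheets/sheet1.xml, with embedded pictures stored under xl/media/.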

Here’s another post that discusses Emacs and Office 2007, contrasting their interface designs: Clutter-discoverability trade-off

Bundled versus unbundled version history

The other day I said to a colleague that an advantage to LaTeX over Microsoft Word is that it’s easy to version LaTeX files because they’re just plain text. My colleague had the opposite view. He said that LaTeX was impossible to version because its files are just plain text. How could we interpret the same facts so differently?

I was thinking about checking files in and out of a version control system. With a text file, the version control system can tell you exactly how two versions differ. But with something like a Word document, the system will give an unhelpful message like “binary files differ.”
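
To make the plain-text advantage concrete, here’s a small sketch using Python’s difflib to produce a line-by-line unified diff of two versions of a LaTeX file. This is only an illustration of the idea; version control systems such as Git use their own diff implementations. The function name and default file names are hypothetical.

```python
import difflib


def text_diff(old, new, old_name="draft_v1.tex", new_name="draft_v2.tex"):
    """Return a unified diff showing exactly which lines changed."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=old_name,
            tofile=new_name,
        )
    )
```

Lines starting with “-” were removed, lines starting with “+” were added, and lines starting with a space are unchanged context. Nothing comparable is possible line by line for an opaque binary file.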

My colleague was thinking about using the change tracking features of Microsoft Word. He’s accustomed to seeing documents in isolation, such as a file attachment in an email. In that setting, a plain text file has no version history, but a Word document may.

I assumed version information would be external to the document. He assumed the version information would be bundled with the document. My view is typical of software developers. His is typical of everyone else.

These two approaches are analogous to functional programming versus object oriented programming. Version control systems have a functional view of files. The versioning functionality is unbundled from the file content, in part because the content (typically source code files) could be used by many different applications. Word provides a sort of object oriented versioning system, bundling versioning functionality with the data.

As with functional versus object oriented programming, there’s no “right” way to solve this problem, only approaches that work better in different contexts. I much prefer using a version control system to track changes to my files, but that approach won’t fly with people who don’t share a common version control system or don’t use version control at all.


Clutter-discoverability trade-off

There’s a tension between presenting a user an uncluttered interface and helping the user discover new features. This post will begin by discussing two extreme examples. On the cluttered but discoverable end of the spectrum is Microsoft Word 2007. On the uncluttered but also undiscoverable end is Emacs.

Microsoft added the ribbon toolbar to Office 2007 to make it easier to discover new features. Before that release, 90% of the feature requests the Office team received were for features that Office already supported. The functionality was there, but users couldn’t discover it.

Word 2007 ribbon

According to this report, the ribbon has been remarkably successful.

Data is showing that the redesign of Office really did reach this goal — Word 2007 and Excel 2007 users are now using four times as many features as they used in previous versions, and for PowerPoint, the increase in feature use is a factor of five.

Power users often dislike the ribbon, but most of the estimated half billion people who use Microsoft Office are not power users.

(By the way, you can collapse the ribbon with Control-F1. The ribbon will reappear when you click on a menu. On a small screen, say on a netbook, this could greatly increase your vertical real estate.)

In some ways Emacs may be the exact opposite of Microsoft Word. It has an enormous number of features, and yet it doesn’t feel cluttered. The downside is that discoverability in Emacs is pretty bad. The best way to discover Emacs features is to read the documentation. There are ways to discover features while using Emacs, but you have to be fairly deep into Emacs before you learn how to learn more.

Can you increase discoverability without adding clutter? Maybe if your design is not very good to begin with. But after some refinement it seems inevitable that you’ll have to decide whether you’re willing to increase clutter in order to increase discoverability.

One suggested compromise is to have interfaces adapt over time. Applications could start out with voluminous menus when discoverability is most important, then hide uncommonly used options over time, reducing clutter as users gain experience. Microsoft tried that approach in Office 2003 without much success. It sounded like a good idea, but changing menus scared novice users and annoyed advanced users.

A variation on this approach is to make controls visible based on context rather than based on frequency of use. People find this easier to understand.

The trade-off between discoverability and clutter may be a question of where you want your clutter, in the UI or in external documentation. I suppose I’d prefer the clutter in the UI for software I use rarely and in the documentation for software I use the most.


Did the MS Office ribbon work?

One of the major design goals for Microsoft Office 2007 was making features easier to discover. A study had shown that about 90% of the feature requests for Microsoft Office were for features already in the product. People just didn’t know what was already there.

A major part of Microsoft’s response was the “ribbon” interface. More controls are on display rather than being hidden behind a deep hierarchy of menus. According to Katherine Murray, the user interface changes achieved their goal.

Data is showing that the redesign of Office really did reach this goal — Word 2007 and Excel 2007 users are now using four times as many features as they used in previous versions, and for PowerPoint, the increase in feature use is a factor of five.

The quote above was taken from First Look: Microsoft Office 2010. I’d like to see more details, but the book is a sales brochure and not a statistical report. Still, if you take these figures at face value, it seems the ribbon and other user interface changes were very successful.

Many pundits hate the ribbon. But most of the 500 million people who use Microsoft Office are not pundits.

Office 2007 documents are zipped XML

Microsoft Office 2007 documents are zipped XML files. For example, you can change a Word document’s extension from .docx to .zip and unzip it. Apparently this isn’t widely known; most people I talk to are surprised when I mention this.

I’ve found a couple of uses for the zip/XML format. One is that you can unzip a document and grab all the embedded content. For example, .jpeg images are simply files that are zipped up into the Office document.
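
Grabbing the embedded content can also be scripted. The sketch below assumes the usual layout, where media files sit one level down in a folder named media (word/media/, xl/media/, ppt/media/); it doesn’t verify that assumption. extract_media is a hypothetical helper name.

```python
import os
import zipfile


def extract_media(path, out_dir):
    """Copy embedded media (JPEGs, PNGs, ...) out of an Office 2007 document.

    Assumes media sit in a "media" folder one level down, e.g. word/media/.
    Returns the paths of the files written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            if len(parts) != 3 or parts[1] != "media":
                continue  # an XML part or something else; skip it
            filename = parts[2]
            if filename in ("", ".", ".."):
                continue  # directory entry or suspicious name
            target = os.path.join(out_dir, filename)
            with open(target, "wb") as dst:
                dst.write(zf.read(name))
            written.append(target)
    return written
```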

Another use is that you can crack open a document’s underlying XML to search for something you can’t find via the user interface. You can unzip Office documents, tweak them, and zip them back up. I don’t recommend this, but I’ve done it when I was desperate. (Microsoft publishes APIs for manipulating Office files. Using the official APIs is safer and in the long run easier, but I haven’t looked into them.)
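
The searching half of that can be done without unzipping anything to disk. A minimal sketch, assuming the XML parts are UTF-8 encoded (the usual case); search_xml_parts is a hypothetical name. One caveat: Word may split a phrase across several text runs in the XML, so a search for a long phrase can miss text that a search for a short fragment would find.

```python
import re
import zipfile


def search_xml_parts(path, pattern):
    """Return (part name, matched text) pairs for a regex found in any XML part."""
    regex = re.compile(pattern)
    hits = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            # Only look at XML parts; skip embedded binaries such as JPEGs.
            if not name.endswith((".xml", ".rels")):
                continue
            # Assumes UTF-8; undecodable bytes are replaced rather than fatal.
            text = zf.read(name).decode("utf-8", errors="replace")
            hits.extend((name, m.group(0)) for m in regex.finditer(text))
    return hits
```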


I owe Microsoft Word an apology

I tried to use the Equation Editor in Microsoft Word years ago and hated it. It was hard to use and produced ugly output. I tried it again recently and was pleasantly surprised. I’m using Word 2007. I don’t remember what version I’d tried before.

I’ve long said that math written in Word is ugly, and it usually is. But the fault lies with users, like myself, not with Word. I realize now that the problem is that most people writing math in Word are not using the Equation Editor. LaTeX produces ugly math too when people do not use it correctly, though this happens less often.

Math typography is subtle. For example, mathematical symbols are set in an italic font that is not quite the same as the italic font used in prose. Also, word-like symbols such as “log” or “cos” are not set in italics. I imagine most people do not consciously notice these conventions (I never noticed them until I learned to use LaTeX), but they subconsciously notice when the conventions are violated. The conventions of math typography give clues that help readers distinguish, for example, the English indefinite article “a” from a variable named “a”, and the symbol for maximum from the product of variables “m”, “a”, and “x.”
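
For comparison, here is how LaTeX encodes these conventions. This is an illustrative fragment meant to sit inside a document body; it isn’t taken from Word.

```latex
% \max and \log are operator names: set upright, with proper spacing.
$\max(a, b)$, \quad $\log x$

% Without the backslash, TeX treats each letter as a variable,
% so "max" is set as the italic product of m, a, and x.
$max(a, b)$, \quad $log x$

% In prose, "a" is the article; in math mode, $a$ is an italic variable.
Take a number $a$.
```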

Microsoft’s Equation Editor typesets math correctly. Word documents usually do not, but only because folks usually do not use the Equation Editor. In the following example, I set the same equation three times: using ordinary text, using ordinary italic for the “x”, and finally using the Equation Editor.

screen shot of trig identity using MS Word

Note that the “x” in the third version is not the same as the italic “x” in the second version. The prose in this example is set in Calibri font and the Equation Editor uses Cambria Math font. Also, I did not tell Word to format “sin” and “cos” one way and “x” another or tell it what font to use; I simply typed sin^2 x + cos^2 x = 1 into the Equation Editor and it formatted the result as above. I haven’t used it much, but the Equation Editor seems to be more capable and easier to use than I thought.
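
The LaTeX input for the same identity is nearly the same as the sin^2 x + cos^2 x = 1 typed into the Equation Editor; the backslashes mark sin and cos as function names so they are set upright, while x is set in math italic automatically.

```latex
\[
  \sin^2 x + \cos^2 x = 1
\]
```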

Here are a few more examples of Equation Editor output.

examples of math using Word: Gaussian integral, Fourier series, quadratic equation

I still prefer using LaTeX for documents containing math symbols. I’ve used LaTeX for many years and I can typeset equations very quickly using it. But I’m glad to know that Word can typeset equations well and that the process is easier than I thought.

I tried out the Equation Editor because Bob Matthews suggested I try MathType, a third-party equation editor add-on for Microsoft Word. I haven’t tried MathType yet but from what I hear it produces even better output.

Related post: Contrasting Microsoft Word and LaTeX

Three ways to convert documents to PDF

Microsoft has an Office 2007 plug-in that lets you save documents as PDF files. This works for all Microsoft Office applications, not just Microsoft Word. The drawbacks are that it only works with Office 2007, not earlier versions of Office, and that it does not work with other document types.

Adobe Acrobat (not the free Adobe Reader) installs a printer driver that lets you convert any document to a PDF by “printing” it to their software. The advantage is that this works for any document type. However, if you’re starting with a Word 2007 document, Microsoft’s plug-in is much faster, maybe 10x faster.

If you don’t want to buy Adobe Acrobat, you could use PDF995. Like Adobe Acrobat, this installs a printer driver; you convert documents to PDF by choosing this software as your “printer.” PDF995 comes in two versions: a free version supported by advertising, and an advertising-free version for $9.95.

I would rank these methods in the order presented above. I’ve had the best experience with the Microsoft plug-in. The Acrobat printer driver is slow, but usually does a good job. The PDF995 printer driver works OK most of the time, but I had a few issues with it. It’s been a long time since I used it, but I think the problems had to do with unwanted footers and sometimes fonts in the PDF not matching the original fonts. I’m not sure now, but I think I’ve also had problems with the Acrobat printer driver.

If you want to make a PDF from a LaTeX document, use the pdflatex program that ships with LaTeX. I’ve never had any problems with it.
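
If you build PDFs from a script, a small wrapper keeps pdflatex from stopping to wait for input when it hits an error. A sketch, assuming pdflatex is on your PATH; the function names are hypothetical, while -interaction=nonstopmode and -halt-on-error are standard pdflatex options.

```python
import subprocess


def pdflatex_command(tex_file):
    """Build a pdflatex command line that won't stop to wait for input."""
    return [
        "pdflatex",
        "-interaction=nonstopmode",  # don't pause at errors waiting for input
        "-halt-on-error",            # stop at the first error
        tex_file,
    ]


def build_pdf(tex_file):
    """Run pdflatex; raises CalledProcessError if compilation fails."""
    subprocess.run(pdflatex_command(tex_file), check=True)
```

For a hypothetical paper.tex, build_pdf("paper.tex") writes paper.pdf to the current working directory.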

Update: See this post for notes on PDFCreator and pdftk.

Four ways to convert Excel tables to LaTeX

Gregor Gorjanc has a post on Excel and LaTeX that lists four ways to convert an Excel table into LaTeX. I’ve used two of the methods he lists: brute force and excel2latex. I recommend excel2latex. I used it frequently until I upgraded to Office 2007 and the plug-in quit working. The only bug I remember with it was that sometimes it would give you a warning saying it didn’t work, but it did; the LaTeX code you wanted was waiting for you on the Windows clipboard.

I plan to try out Gregor’s other two suggestions. Creating tables in Excel is far easier than doing so in LaTeX and I miss the functionality that excel2latex provided. Maybe there’s a way to use excel2latex with Excel 2007. If you know of a way, please leave a comment.
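
As a stopgap, here’s a sketch of a script that does a small part of what excel2latex did (it is not excel2latex’s code). It assumes you’ve saved the sheet from Excel as a UTF-8 CSV file, escapes LaTeX’s special characters, and emits a left-aligned tabular. All function names are hypothetical.

```python
import csv

# LaTeX special characters and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(cell):
    """Escape characters that have special meaning in LaTeX."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in str(cell))


def rows_to_tabular(rows, align=None):
    """Turn a list of rows into a LaTeX tabular (left-aligned by default)."""
    ncols = max(len(row) for row in rows)
    align = align or "l" * ncols
    lines = [r"\begin{tabular}{" + align + "}"]
    for row in rows:
        # Pad short rows with empty cells so every row has ncols entries.
        cells = [escape_latex(c) for c in row] + [""] * (ncols - len(row))
        lines.append(" & ".join(cells) + r" \\")
    lines.append(r"\end{tabular}")
    return "\n".join(lines)


def csv_to_tabular(path):
    """Read a sheet saved as CSV and convert it to a tabular."""
    # Assumes UTF-8; utf-8-sig also accepts a leading byte-order mark.
    with open(path, newline="", encoding="utf-8-sig") as f:
        rows = list(csv.reader(f))
    return rows_to_tabular(rows)
```

Pass an explicit align string such as "lr" if you want numeric columns right-aligned.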

Three ways to enter Unicode characters in Windows

Won currency symbol, U+20A9

Here are three approaches to entering Unicode characters in Windows. See the next post for entering Unicode characters in Linux.

Alt – x

In Microsoft Word you can insert Unicode characters by typing the hex value of the character then typing Alt-x. You can also see the Unicode value of a character by placing the cursor immediately after the character and pressing Alt-x. This also works in applications that use the Windows rich edit control such as WordPad and Outlook.
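
The two directions Alt-x covers (hex value to character, character to hex value) are easy to reproduce in Python, which can be handy when you’re working somewhere Alt-x isn’t available. A small sketch with hypothetical function names:

```python
def hex_to_char(hex_digits):
    """Hex code point to character, like typing 20A9 then Alt-x."""
    return chr(int(hex_digits, 16))


def char_to_hex(ch):
    """Character to its code point in U+ notation (the reverse direction)."""
    return f"U+{ord(ch):04X}"
```

For example, hex_to_char("20A9") gives the won sign, and char_to_hex applied to the won sign gives "U+20A9".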

Pros: Nothing to install or configure. You can see the numeric value before you turn it into a symbol. It’s handy to be able to go the opposite direction, looking up Unicode values for characters.

Cons: Does not work with many applications.

Alt – +

Another approach, which works with more applications, is as follows. First create a string value (type REG_SZ) named EnableHexNumpad under the registry key HKEY_CURRENT_USER\Control Panel\Input Method, set its value to 1, and reboot. Then you can enter Unicode symbols by holding down the Alt key, typing the plus sign on the numeric keypad, and then typing the character’s hex value. When you release the Alt key, the symbol will appear. This approach worked with most applications I tried, including Firefox and Safari, but not with Internet Explorer.
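
The registry edit can also be scripted with Python’s winreg module, which ships with Python on Windows. This sketch assumes the setting lives at HKEY_CURRENT_USER\Control Panel\Input Method, the location commonly documented for EnableHexNumpad; enable_hex_numpad is a hypothetical name, and you still need to reboot afterward.

```python
import sys

KEY_PATH = r"Control Panel\Input Method"  # under HKEY_CURRENT_USER (assumed location)
VALUE_NAME = "EnableHexNumpad"


def enable_hex_numpad():
    """Create the EnableHexNumpad string value (REG_SZ) and set it to "1".

    Windows only. A reboot is still needed before Alt-plus works.
    """
    if sys.platform != "win32":
        raise OSError("This registry setting only exists on Windows.")
    import winreg  # standard library, but only available on Windows

    # CreateKeyEx opens the key, creating it first if it doesn't exist.
    with winreg.CreateKeyEx(winreg.HKEY_CURRENT_USER, KEY_PATH, 0,
                            winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, VALUE_NAME, 0, winreg.REG_SZ, "1")
```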

Pros: Works with many applications. No software to install.

Cons: Requires a registry edit and a reboot. It’s awkward to hold down the Alt key while typing several other keys. You cannot see the numbers you’re typing. Doesn’t work with every application.

UnicodeInput

Another option is to install the UnicodeInput utility. This worked with every application I tried, including Internet Explorer. Once it’s installed, the window below pops up whenever you hold down the Alt key and type the plus sign on the numeric keypad. Type the numeric value of the character in the box, click the Send button, and the character will be inserted into the window that had focus when you pressed Alt-plus.

UnicodeInput screenshot

Pros: Works everywhere (as far as I’ve tried). The software is free. Easy to use.

Cons: Requires installing software.
