A surprise with Emacs and Office 2007

I had a little surprise when I tried to open an Excel file from Emacs. I was using dired, a sort of file explorer inside Emacs. I expected one of two things to happen. Maybe Emacs would know to launch the file using Excel. Or maybe it would open the binary file as a bunch of gibberish. Instead I got something like this:

screenshot

This was disorienting at first. Then I thought about how Office 2007 documents are zipped XML files. But how does dired know that my .xlsx file is a zip file? I suppose since Emacs is a Unix application at heart, it’s acting like a Unix application even though I was using it on Windows. It’s determining the file type by inspected the file itself and ignoring the file extension.

(Office 2007 documents are not entirely XML. The data and styling directives are XML, but embedded files are just files. The spreadsheet in this example contains a large photo. That photo is a JPEG file that gets zipped up with all the XML files that make up the spreadsheet.)

So I learned that Emacs knows how to navigate inside zip files, and that a convenient way to poke around inside an Office 2007 file is to browse into it from Emacs.

Here’s another post that discusses Emacs and Office 2007, contrasting their interface designs: Clutter-discoverability trade-off

Read More

Bundled versus unbundled version history

The other day I said to a colleague that an advantage to LaTeX over Microsoft Word is that it’s easy to version LaTeX files because they’re just plain text. My colleague had the opposite view. He said that LaTeX was impossible to version because its files are just plain text. How could we interpret the same facts so differently?

I was thinking about checking files in and out of a version control system. With a text file, the version control system can tell you exactly how two versions differ. But with something like a Word document, the system will give an unhelpful message like “binary files differ.”

My colleague was thinking about using the change tracking features of Microsoft Word. He’s accustomed to seeing documents in isolation, such as a file attachment in an email. In that setting, a plain text file has no version history, but a Word document may.

I assumed version information would be external to the document. He assumed the version information would be bundled with the document. My view is typical of software developers. His is typical of everyone else.

These two approaches are analogous to functional programming versus object oriented programming. Version control systems have a functional view of files. The versioning functionality is unbundled from the file content, in part because the content (typically source code files) could be used by many different applications. Word provides a sort of object oriented versioning system, bundling versioning functionality with the data.

As with functional versus object oriented programming, there’s no “right” way to solve this problem, only approaches that work better in different contexts. I much prefer using a version control system to track changes to my files, but that approach won’t fly with people who don’t share a common version control system or don’t use version control at all.

Related posts:

Contrasting Microsoft Word and LaTeX
Complexity of HTML and LaTeX
I owe Microsoft Word an apology

Read More

Clutter-discoverability trade-off

There’s a tension between presenting a user an uncluttered interface and helping the user discover new features. This post will begin by discussing two extreme examples. On the cluttered but discoverable end of the spectrum is Microsoft Word 2007. On the uncluttered but also undiscoverable end is Emacs.

Microsoft added the ribbon toolbar to Office 2007 to make it easier to discover new features. Before that release, 90% of the feature requests the Office team received were for features that Office already supported. The functionality was there, but users couldn’t discover it.

Word 2007 ribbon

According to this report, the ribbon has been remarkably successful.

Data is showing that the redesign of Office really did reach this goal — Word 2007 and Excel 2007 users are now using four times as many features as they used in previous versions, and for PowerPoint, the increase in feature use is a factor of five.

Power users often dislike the ribbon, but most of the estimated half billion people who use Microsoft Office are not power users.

(By the way, you can collapse the ribbon with Control-F1. The ribbon will reappear when you click on a menu. On a small screen, say on a netbook, this could greatly increase your vertical real estate.)

In some ways Emacs may be the exact opposite of Microsoft Word. It has an enormous number of features, and yet it doesn’t feel cluttered. The downside is that discoverability in Emacs is pretty bad. The best way to discover Emacs features is to read the documentation. There are ways to discover features while using Emacs, but you have to be fairly deep into Emacs before you learn how to learn more.

Can you increase discoverability without adding clutter? Maybe if your design is not very good to begin with. But after some refinement it seems inevitable that you’ll have to decide whether you’re willing to increase clutter in order to increase discoverability.

One suggested compromise is to have interfaces adapt over time. Applications could start out with voluminous menus when discoverability is most important, then hide uncommonly used options over time, reducing clutter as users gain experience. Microsoft tried that approach in Office 2003 without much success. It sounded like a good idea, but changing menus scared novice users and annoyed advanced users.

A variation on this approach is to make controls visible based on context rather than based on frequency of use. People find this easier to understand.

The trade-off between discoverability and clutter may be a question of where you want your clutter, in the UI or in external documentation. I suppose I’d prefer the clutter in the UI for software I use rarely and in the documentation for software I use the most.

Related posts:

Is helpful software really helpful?
Good user interface design: EpiPen
Bad user interface design: hotel showers

Read More

Did the MS Office ribbon work?

One of the major design goals for Microsoft Office 2007 was making features easier to discover. A study had shown that about 90% of the feature requests for Microsoft Office were for features already in the product. People just didn’t know what was already there.

A major part of Microsoft’s response was the “ribbon” interface. More controls are on display rather than being hidden behind a deep hierarchy of menus. According to Katherine Murray, the user interface changes achieved their goal.

Data is showing that the redesign of Office really did reach this goal — Word 2007 and Excel 2007 users are now using four times as many features as they used in previous versions, and for PowerPoint, the increase in feature use is a factor of five.

The quote above was taken from First Look: Microsoft Office 2010. I’d like to see more details, but the book is a sales brochure and not a statistical report. Still, if you take these figures at face value, it seems the ribbon and other user interface changes were very successful.

Many pundits hate the ribbon. But most of the 500 million people who use Microsoft Office are not pundits.

Read More

Office 2007 documents are zipped XML

Microsoft Office 2007 documents are zipped XML files. For example, you can change a Word document’s extension from .docx to .zip and unzip it. Apparently this isn’t widely known; most people I talk to are surprised when I mention this.

I’ve found a couple uses for the zip/XML format. One is that you can unzip a document and grab all the embedded content. For example, .jpeg images are simply files that are zipped up into the Office document.

Another use is that you can crack open a document’s underlying XML to search for something you can’t find via the user interface. You can unzip Office documents, tweak them, and zip them back up. I don’t  recommend this, but I’ve done it when I was desperate. (Microsoft publishes an API for manipulating Office files. Using the official APIs is safer and in the long run easier, but I haven’t looked into it.)


Related posts
:

I owe Microsoft Word an apology
Contrasting Microsoft Word and LaTeX

Read More

I owe Microsoft Word an apology

I tried to use the Equation Editor in Microsoft Word years ago and hated it. It was hard to use and produced ugly output. I tried it again recently and was pleasantly surprised. I’m using Word 2007. I don’t remember what version I’d tried before.

I’ve long said that math written in Word is ugly, and it usually is. But the fault lies with users, like myself, not with Word. I realize now that the problem is that most people writing math in Word are not using the Equation Editor. LaTeX produces ugly math too when people do not use it correctly, though this happens less often.

Math typography is subtle. For example, mathematical symbols are set in an italic font that is not quite the same as the italic font used in prose. Also, word-like symbols such as “log” or “cos” are not set in italics. I imagine most people do not consciously notice these conventions — I never noticed until I learned to use LaTeX — but subconsciously notice when the conventions are violated. The conventions of math typography give clues that help readers distinguish, for example, the English indefinite article “a” from a variable named “a” and to distinguish the symbol for maximum from the product of variables “m”, “a”, and “x.”

Microsoft’s Equation Editor typesets math correctly. Word documents usually do not, but only because folks usually do not use the Equation Editor. In the following example, I set the same equation three times: using ordinary text, using ordinary italic for the “x”, and finally using the Equation Editor.

screen shot of trig identity using MS Word

Note that the “x” in the third version is not the same as the italic “x” in the second version. The prose in this example is set in Calibri font and the Equation Editor uses Cambria Math font. Also, I did not tell Word to format “sin” and “cos” one way and “x” another or tell it what font to use; I simply typed sin^2 x + cos^2 x = 1 into the Equation Editor and it formatted the result as above. I haven’t used it much, but the Equation Editor seems to be more capable and easier to use than I thought.

Here are a few more examples of Equation Editor output.

examples of math using Word: Gaussian integral, Fourier series, quadratic equation

I still prefer using LaTeX for documents containing math symbols. I’ve used LaTeX for many years and I can typeset equations very quickly using it. But I’m glad to know that Word can typeset equations well and that the process is easier than I thought.

I tried out the Equation Editor because Bob Matthews suggested I try MathType, a third-party equation editor add-on for Microsoft Word. I haven’t tried MathType yet but from what I hear it produces even better output.

Related post: Contrasting Microsoft Word and LaTeX

Read More

Three ways to convert documents to PDF

Microsoft has an Office 2007 plug-in that lets you save documents as PDF files. This works for all Microsoft Office applications, not just Microsoft Word. The only drawback is that this only works for Office 2007, not earlier versions of Office, and does not work with other document types.

Adobe Acrobat (not the free Adobe Reader) installs a printer driver that lets you convert any document to a PDF by “printing” it to their software. The advantage is that this works for any document type. However, if you’re starting with a Word 2007 document, Microsoft’s plug-in is much faster, maybe 10x faster.

If you don’t want to buy Adobe Acrobat, you could use PDF995. Like Adobe Acrobat, this installs a printer driver; you convert documents to PDF by choosing this software as your “printer.” PDF995 comes in two versions: a free version supported by advertising, and an advertising-free version for $9.95.

I would rank these methods in the order presented above. I’ve had the best experience with the Microsoft plug-in. The Acrobat printer driver is slow, but usually does a good job. The PDF995 printer driver works OK most of the time, but I had a few issues with it. It’s been a long time since I used it, but I think the problems had to do with unwanted footers and sometimes fonts in the PDF not matching the original fonts. I’m not sure now, but I think I’ve also had problems with the Acrobat printer driver.

If you want to make a PDF from a LaTeX document, use the pdflatex program that ships with LaTeX. I’ve never had any problems with it.

Update: See this post for notes on PDFCreator and pdktk.

Read More

Four ways to convert Excel tables to LaTeX

Gregor Gorjanc has a post on Excel and LaTeX that lists four ways to convert and Excel table into LaTeX. I’ve used two of the methods he lists: brute force and excel2latex. I recommend excel2latex. I used it frequently until I upgraded to Office 2007 and the plug-in quit working. The only bug I remember with it was that sometimes it would give you a warning saying it didn’t work, but it did; the LaTeX code you wanted was waiting for you on the Windows clipboard.

I plan to try out Gregor’s other two suggestions. Creating tables in Excel is far easier than doing so in LaTeX and I miss the functionality that excel2latex provided. Maybe there’s a way to use excel2latex with Excel 2007. If you know of a way, please leave a comment.

Read More

Three ways to enter Unicode characters in Windows

Here are three approaches to entering Unicode characters in Windows. See the next post for entering Unicode characters in Linux.

(1) In Microsoft Word you can insert Unicode characters by typing the hex value of the character then typing Alt-x. You can also see the Unicode value of a character by placing the cursor immediately after the character and pressing Alt-x. This also works in applications that use the Windows rich edit control such as WordPad and Outlook.

Pros: Nothing to install or configure. You can see the numeric value before you turn it into a symbol. It’s handy to be able to go the opposite direction, looking up Unicode values for characters.

Cons: Does not work with many applications.

(2) Another approach which works with more applications is as follows. First create a registry key under HKEY_CURRENT_USER of type REG_SZ called EnableHexNumpad, set its value to 1, and reboot. Then you can enter Unicode symbols by holding down the Alt key and typing the plus sign on the numeric keypad followed by the character value. When you release the Alt key, the symbol will appear. This approach worked with most applications I tried, including Firefox and Safari, but did not with Internet Explorer.

Pros: Works with many applications. No software to install.

Cons: Requires a registry edit and a reboot. It’s awkward to hold down the Alt key while typing several other keys. You cannot see the numbers you’re typing. Doesn’t work with every application.

(3) Another option is to install the UnicodeInput utility. This worked with every application I tried, including Internet Explorer. Once installed, the window below pops up whenever you hold down the Alt key and type the plus sign on the numeric keypad. Type the numeric value of the character in the box, click the Send button, and the character will be inserted into the window that had focus when you clicked Alt-plus.

UnicodeInput screenshot

Pros: Works everywhere (as far as I’ve tried). The software is free. Easy to use.

Cons: Requires installing software.

Related links:

Entering Unicode characters in Linux
Unicode resources
Greek letters in HTML, XML, TeX, and Unicode

Read More

LaTeX and PowerPoint presentations

I use LaTeX for math documents and PowerPoint for presentations. When I need to make a math presentation, I can’t have everything I want in one environment. I usually go with PowerPoint.

Yesterday I tried the LaTeX Beamer package based on a friend’s recommendation. I believe I’ll switch to using this package as my default for math presentations. Here are my notes on my experience with Beamer.

Installation

Beamer is available from SourceForge. The installation instructions begin by saying “Put all files somewhere where TeX can find them.” This made me think Beamer would be another undocumented software package, but just a few words later the instructions point to a 224-page PDF manual with plenty of detail. However, I would recommend a couple minor corrections to the documentation.

  1. The manual says that if you want to install Beamer under MiKTeX, use the update wizard. But the update wizard will only update packages already installed. To install new packages with MiKTeX, use the Package Manager. (Command line mpm.exe or GUI mpm_mfc.exe.)
  2. The manual says to install latex-beamer, pgf, and xcolor. The Package Manager shows no latex-beamer package, but does show a beamer package.

The installation went smoothly overall. However, the MiKTeX Package Manager doesn’t let you know when packages have finished installing. You just have to assume when it quits giving new messages that it must be finished. At least that was my experience using the graphical version.

Using Beamer

I found Bruce Byfield’s introduction to Beamer helpful. The Beamer package is simple to use and well documented.

It’s nice to use real math typography rather than using PowerPoint hacks or pasting in LaTeX output as images. I also like animating bullet points simply by adding pause to the end of an enumerated item.

Inserting images

The biggest advantage that PowerPoint has over LaTeX is working with images. With PowerPoint you can:

  1. Paste images directly into your presentations.
  2. Edit files in place.
  3. Carry around your entire presentation as a single file.
  4. Include multiple image formats in a consistent way.

The last point may not seem like much until you’ve tried to figure out how to include images in LaTeX.

Read More

Accented letters in HTML, TeX, and MS Word

I frequently need to look up how to add diacritical marks to letters in HTML, TeX, and Microsoft Word, though not quite frequently enough to commit the information to my long-term memory. So today I wrote up a set of notes on adding accents for future reference. Here’s a chart summarizing the notes.

Accent HTML TeX Word
grave grave \` CTRL + `
acute acute \' CTRL + '
circumflex circ \^ CTRL + ^
tilde tidle \~ CTRL + SHIFT + ~
umlaut uml \" CTRL + SHIFT + :
cedilla cedil \c CTRL + ,
æ, Æ æ, Æ \ae, \AE CTRL + SHIFT + & + a or A
ø, Ø ø, Ø \o, \O CTRL + / + o or O
å, Å å, Å \aa, \AA CTRL + SHIFT + @ + a or A

The notes go into more details about how accents function in each environment and what limitations each has. For example, LaTeX will let you combine any accent with any letter, but MS Word and HTML only support letter/accent combinations that are common in spoken languages.

Read More

Contrasting Microsoft Word and LaTeX

Here’s an interesting graph from Marko Pinteric comparing Microsoft Word and Donald Knuth’s LaTeX.

comparing Word and Latex. Image by Marko Pinteric.

According to the graph, LaTeX becomes easier to use relative to Microsoft Word as the task becomes more complex. That matches my experience, though I’d add a few footnotes.

  1. Most people spend most of their time working with documents of complexity to the left of the cross over.
  2. Your first LaTeX document will take much longer to write than your first Word document.
  3. Word is much easier to use if you need to paste in figures.
  4. LaTeX documents look better, especially if they contain mathematics.

See Charles Petzold’s notes about the lengths he went to in order to produce is upcoming book in Word. I imagine someone of less talent and persistence than Petzold could not have pulled it off using Word, though they would have stood a better chance using LaTeX.

Before the 2007 version, Word documents were stored in an opaque binary format. This made it harder to compare two documents. A version control system, for example, could not diff two Word documents the same way it could diff two text files. It also made Word documents difficult to troubleshoot since you had no way to look beneath the WYSIWYG surface.

However, a Word 2007 document is a zip file containing a directory of XML files and embedded resources. You can change the extension of any Office 2007 file to .zip and unzip it, inspect and possibly change the contents, the re-zip it. This opens up many new possibilities.

I’ve written some notes that may be useful for people wanting to try out LaTeX on Windows.

Read More