I had a little surprise when I tried to open an Excel file from Emacs. I was using dired, a sort of file explorer inside Emacs. I expected one of two things to happen. Maybe Emacs would know to launch the file using Excel. Or maybe it would open the binary file as a bunch of gibberish. Instead I got something like this:
This was disorienting at first. Then I thought about how Office 2007 documents are zipped XML files. But how does dired know that my .xlsx file is a zip file? I suppose since Emacs is a Unix application at heart, it’s acting like a Unix application even though I was using it on Windows. It’s determining the file type by inspecting the file itself and ignoring the file extension.
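To make the idea of inspecting the file itself concrete, here is a minimal Python sketch of content-based detection. It is not a description of what Emacs actually does internally. It only checks whether the data begins with the four-byte signature that opens a typical non-empty zip archive. The names `looks_like_zip` and `make_sample_archive` are made up for this illustration, and the sample archive is built in memory rather than read from a real spreadsheet.

```python
import io
import zipfile

# Signature of a zip "local file header", which starts a typical non-empty archive.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Guess the file type from the leading bytes, ignoring any file name."""
    return data[:4] == ZIP_MAGIC


def make_sample_archive() -> bytes:
    """Build a tiny zip archive in memory (a stand-in for a real .xlsx file)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/workbook.xml", "<workbook/>")
    return buf.getvalue()


if __name__ == "__main__":
    print(looks_like_zip(make_sample_archive()))
    print(looks_like_zip(b"just some plain text"))
```

A check like this never looks at the extension, so a spreadsheet renamed to something else would still be recognized as a zip archive.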
(Office 2007 documents are not entirely XML. The data and styling directives are XML, but embedded files are just files. The spreadsheet in this example contains a large photo. That photo is a JPEG file that gets zipped up with all the XML files that make up the spreadsheet.)
So I learned that Emacs knows how to navigate inside zip files, and that a convenient way to poke around inside an Office 2007 file is to browse into it from Emacs.
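If you don’t have Emacs handy, the same kind of poking around can be sketched in Python with the standard `zipfile` module. The archive below is a made-up stand-in built in memory. Its member names only imitate the sort of layout you would see in a spreadsheet with an embedded photo, and `build_fake_workbook` and `list_members` are hypothetical helpers for this example.

```python
import io
import zipfile


def build_fake_workbook() -> bytes:
    """Create an in-memory zip with illustrative, made-up member names."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("xl/workbook.xml", "<workbook/>")
        # Embedded files are stored as ordinary members, not as XML.
        zf.writestr("xl/media/image1.jpeg", b"\xff\xd8\xff\xe0 placeholder bytes")
    return buf.getvalue()


def list_members(data: bytes) -> list[str]:
    """Return the names of the files stored inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()


if __name__ == "__main__":
    for name in list_members(build_fake_workbook()):
        print(name)
```

For a real document you could pass the file’s path straight to `zipfile.ZipFile` instead of building the bytes in memory.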
Here’s another post that discusses Emacs and Office 2007, contrasting their interface designs: Clutter-discoverability trade-off
8 thoughts on “A surprise with Emacs and Office 2007”
Emacs also deals swimmingly with tar archives, compressed or not, and even .deb files if you are using Debian/Ubuntu which provides a nice way to poke inside a package before installing it.
Having a text editor that opens all types of files will teach you a lot about their contents. All Office 2007 file formats are zipped files.
I learned that when I accidentally opened them with Total Commander, which is probably the best Norton Commander clone on the market, by now far surpassing the original.
To investigate the contents of, say, a .docx file, simply change the extension to .zip and open it with WinZip, 7Zip or whatever.
Not a surprise (in fact, it is more of a surprise to me that you just discovered that). I think you will now discover that the OASIS format does basically the same thing.
P.S. The OASIS format was a standard before the Microsoft one for Office 2007.
I seem to recall it also working on jar files.
” it’s acting like a Unix application […] determining the file type by inspected the file itself and ignoring the file extension.”
I’ve never seen a Windows application that doesn’t do this. Windows itself provides you the shortcut of associating a program with each filetype, but any sane application is going to actually inspect the file it’s given so that it can handle it correctly (or not at all). As a silly example, MS Paint can read a bitmap or PNG or JPEG even if the extension is wrong.