Deleting reproducible files in Emacs dired

Imagine you could list the contents of a directory from a command line, and then edit the text output to make things happen. That’s sorta how Emacs dired works. It’s kind of a cross between a bash shell and the Windows File Explorer. Why would you ever want to use such a bizarre hybrid?

One reason is to avoid context switching. If you’re editing a file, you can pop over to a new buffer that is your file manager, do what you need to do, then pop back, all without ever leaving Emacs.

Another reason is that, as with everything else in Emacs, a dired buffer is just text, and so the same editing commands can be used everywhere. (More on that here.)

Even though I use Emacs daily, and even though I can make a case for why dired is great, I don’t use it that much. Or rather, I don’t use that much of what it can do. The good that I would, I do not.

I was reviewing dired’s features and discovered something very handy: typing % & (dired-flag-garbage-files) will flag for deletion the files that can easily be created again. In particular, it will flag the byproducts of compiling a LaTeX file.

For example, the following is a screenshot of a dired buffer.

When I type % & it highlights the LaTeX temp files in red and flags them for deletion. (The D’s in the left column indicate files to be deleted.)

This doesn’t delete the files yet; it only marks them. If I then type x, the flagged files will be deleted.

In addition to the unneeded LaTeX files, it also highlighted a .bak file. However, it did not highlight the .o object file. I suppose the thought was that most people would manage C programs with a makefile. The class of files to flag is configurable, like everything else in Emacs: the pattern is stored in the variable dired-garbage-files-regexp.
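Here is a sketch that assumes you also want object files flagged. It replaces the pattern with one that adds .o to the usual LaTeX byproducts. The list of extensions is my own choice, modeled loosely on the default. Check C-h v dired-garbage-files-regexp to see what your Emacs uses before you replace it.

```elisp
(require 'dired)

;; Flag LaTeX byproducts, backup and patch leftovers, and C object files
;; when you type % & in dired. The extension list is an illustrative
;; choice, not necessarily the built-in default.
(setq dired-garbage-files-regexp
      (concat "\\."
              "\\(?:aux\\|bak\\|dvi\\|log\\|o\\|orig\\|rej\\|toc\\)"
              "\\'"))   ; anchor at the end of the file name
```

Because of the trailing \\' anchor, a name like notes.org is not caught by the .o alternative.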

Opening Windows files from bash and eshell

I often work in a sort of amphibious environment, using Unix software on Windows. As you can well imagine, this causes headaches. But I’ve found such headaches are generally more manageable than the headaches from alternatives I’ve tried.

On the Windows command line, you can type the name of a file and Windows will open the file with the default application associated with its file extension. For example, typing foo.docx and pressing Enter will open the file by that name using Microsoft Word, assuming that is your default application for .docx files.

Unix shells don’t work that way. The first thing you type at the command prompt must be a command, and foo.docx is not a command. The Windows command line generally works this way too, but it makes an exception for files with recognized extensions; the command is inferred from the extension and the file name is an argument to that command.

WSL bash

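When you’re running bash on Windows via WSL (Windows Subsystem for Linux), you can use the Windows command start, which opens a file with the application associated with its extension. Because start is built into the Windows command interpreter cmd.exe rather than being a separate program, you run it through cmd.exe /C. For example,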

    cmd.exe /C start foo.pdf

will open the file foo.pdf with your default PDF viewer.

You can also use start to launch applications without opening a particular file. For example, you could launch Word from bash with

    cmd.exe /C start winword.exe

Emacs eshell

Eshell is a shell written in Emacs Lisp. If you’re running Windows and you do not have access to WSL but you do have Emacs, you can run eshell inside Emacs for a Unix-like environment.

If you try running

    start foo.pdf

that will probably not work, because eshell does not use the Windows PATH environment variable.

I got around this by creating a Windows batch file named mystart.bat and putting it in my path. The batch file simply calls start with its argument:

    start %1

Now I can open foo.pdf from eshell with

    mystart foo.pdf

The solution above for bash

    cmd.exe /C start foo.pdf

also works from eshell.
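You can also skip the batch file. Eshell treats any Lisp function named eshell/NAME as a command called NAME. The sketch below defines a hypothetical wstart command that passes each file to w32-shell-execute. That function exists only in Windows builds of Emacs, and it asks Windows to open a document with its associated application. I haven't tested this beyond the basic idea. Converting slashes to backslashes is a precaution, not something I know to be required.

```elisp
(defun eshell/wstart (&rest files)
  "Open each of FILES with its default Windows application.
`eshell/wstart' is a hypothetical name; in Eshell it runs as `wstart'.
Needs a Windows build of Emacs, where `w32-shell-execute' is defined."
  (dolist (file files)
    ;; Hand Windows an absolute, backslash-separated file name.
    (w32-shell-execute "open"
                       (subst-char-in-string ?/ ?\\ (expand-file-name file)))))
```

With that in your config, typing wstart foo.pdf in Eshell should open foo.pdf in your default PDF viewer.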

(I just realized I said two contradictory things: that eshell does not use your path, and that it found a batch file in my path. I don’t know why the latter works. I keep my batch files in c:/bin, which is a Unix-like location, and maybe eshell looks there not because it’s in my Windows path but because it’s where it would expect my path to be based on Unix conventions. I’ve searched the eshell documentation, and I don’t see how to tell what it uses for a path.)
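One way to dig into this is to compare the directories Emacs itself sees. The sketch below uses a hypothetical function name, my-path-report. It collects three things:

- the entries of the PATH environment variable
- exec-path, the list Emacs searches when it runs a program as a subprocess
- whatever executable-find turns up for mystart

I haven't checked how closely Eshell's own lookup follows these, so treat this as a diagnostic rather than an answer.

```elisp
(defun my-path-report ()
  "Return an alist describing the search paths this Emacs sees.
`my-path-report' is a hypothetical helper name."
  (list
   ;; Directories in the PATH environment variable, split on `path-separator'.
   (cons 'PATH (parse-colon-path (getenv "PATH")))
   ;; Directories Emacs searches when it starts a subprocess.
   (cons 'exec-path exec-path)
   ;; Where Emacs finds mystart (e.g. mystart.bat on Windows), or nil.
   (cons 'mystart (executable-find "mystart"))))
```

In the *scratch* buffer, put the cursor after (my-path-report) and type C-j to insert the result.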

Org entities

This morning I found out that Emacs org-mode has its own markup entities, analogous to HTML entities or LaTeX commands. Often they’re identical to LaTeX commands. For example, \approx is the approximation symbol ≈, exactly as in LaTeX.

So what’s the advantage of org-entities? In fact, how does Emacs even know whether \approx is a LaTeX command or an org entity?

If you use the command C-c C-x \ , Emacs will show you the rendered version of each entity, i.e. ≈ rather than the command \approx. This applies to the whole buffer: all entities are displayed as symbols. Org entities are converted to symbols when you export the file to HTML or LaTeX, but this gives you a way to see the symbols before exporting.

Here’s something that’s possibly surprising, possibly useful. The symbol you see is for display only. If you copy it and paste it into another program, you’ll get the entity text, not the symbol. And if you type C-c C-x \ again, you’ll see the command again rather than the symbol. That’s why the full name of the command is org-toggle-pretty-entities, with “toggle” in the middle.

If you use set-input-method to enter symbols using LaTeX commands or HTML entities, as I described here, Emacs inserts the Unicode character itself, and the substitution is irreversible. Once you type the LaTeX command \approx or the corresponding HTML entity &asymp;, any knowledge of how that character was entered is lost. So org entities are useful when you want to see Unicode characters but want your source file to remain strictly ASCII.
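You may want the symbols shown every time you open an Org file, rather than toggling with C-c C-x \ each time. Org has a variable for the initial state. Below is a minimal sketch for your init file. As far as I know, the per-file equivalent is a line reading #+STARTUP: entitiespretty in the Org file itself.

```elisp
(require 'org)

;; Start Org buffers with entities such as \approx displayed as symbols.
;; This changes the display only; the file on disk keeps the ASCII text.
(setq org-pretty-entities t)
```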

Incidentally, there are org entities for Hebrew letters, but only the first four, presumably because these are the only ones used as mathematical symbols.

To see a list of org entities, use the command org-entities-help. Even if you never use org entities, the org entity documentation makes a nice reference for LaTeX commands and HTML entities. Here’s a screenshot of the first few lines.
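The table behind org-entities-help is also available from Lisp. According to its docstring, each entry in org-entities lists the entity name followed by its forms in this order:

1. LaTeX
2. math flag
3. HTML
4. ASCII
5. Latin-1
6. UTF-8

The function org-entity-get looks an entry up by name. The helper below is hypothetical. It just pulls out the LaTeX, HTML, and Unicode forms discussed in this post.

```elisp
(require 'org-entities)

(defun my-entity-forms (name)
  "Return a plist of the LaTeX, HTML, and UTF-8 forms of Org entity NAME.
`my-entity-forms' is a hypothetical helper, not part of Org."
  (let ((entry (org-entity-get name)))
    ;; Entry layout: (NAME LATEX MATHP HTML ASCII LATIN1 UTF-8)
    (when entry
      (list :latex (nth 1 entry)
            :html  (nth 3 entry)
            :utf-8 (nth 6 entry)))))

;; (my-entity-forms "approx")
;; => (:latex "\\approx" :html "&asymp;" :utf-8 "≈")
```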

Entering symbols in Emacs

Emacs has a relatively convenient way to add accents to letters or to insert a Unicode character if you know its code point. See these notes.
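From Lisp, the way to insert a character by code point is insert-char, the command behind C-x 8 RET. When used interactively, that command also accepts a Unicode character name. In code, char-from-name (Emacs 26 and later) converts a name to a character. In the sketch below, the function name is hypothetical.

```elisp
(defun my-insert-symbols-demo ()
  "Insert π and √ into a temporary buffer and return its text.
`my-insert-symbols-demo' is a hypothetical name for illustration."
  (with-temp-buffer
    ;; π by its code point, U+03C0.
    (insert-char #x3C0)
    ;; √ by its Unicode name (char-from-name needs Emacs 26 or later).
    (insert-char (char-from-name "SQUARE ROOT"))
    (buffer-string)))

;; (my-insert-symbols-demo) => "π√"
```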

But usually you don’t know the Unicode values of symbols. Then what do you do?

TeX commands

You can enter symbols by typing their TeX commands after running

    M-x set-input-method RET tex

After doing that, you could, for example, enter π by typing \pi.

You’ll see the backslash as you type the command, but once you finish you’ll see the symbol instead [1].
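You can also select the same input method from Lisp with set-input-method, which takes the method's name as a string. The sketch below turns it on in a mode hook. Using text-mode-hook is only an example; pick whichever modes you like.

```elisp
;; Use the TeX input method in every text-mode buffer, so that typing
;; \pi inserts π. The choice of text-mode-hook is just an example.
(add-hook 'text-mode-hook
          (lambda () (set-input-method "TeX")))
```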

HTML entities

You may know the HTML entity for a symbol and want to use that to enter characters in Emacs. Unfortunately, the following does NOT work.

    M-x set-input-method RET html

However, there is a slight variation on this that DOES work:

    M-x set-input-method RET sgml

Once you’ve set your input method to sgml, you could, for example, type &radic; to insert a √ symbol.

Why SGML rather than HTML?

HTML was created by simplifying SGML (Standard Generalized Markup Language). Emacs is older than HTML, and so maybe Emacs supported SGML before HTML was written.

There may be some useful SGML entities that are not in HTML, though I don’t know. I imagine these days hardly anyone knows anything about SGML beyond the subset that lives on in HTML and XML.

Changing input modes

If you want to move between your default input mode and TeX mode, you can use the command toggle-input-method, which is bound to C-\. With a prefix argument (C-u C-\) it prompts you for the input method to switch to.
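toggle-input-method normally switches to whatever default-input-method names. Setting that variable in your init file should make C-\ go straight to TeX without asking. Here is a one-line sketch.

```elisp
;; Make C-\ (toggle-input-method) switch between no input method and TeX.
(setq default-input-method "TeX")
```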

You can see a list of all available input methods with list-input-methods. Most of these are spoken languages, such as Arabic or Welsh, rather than technical input modes like TeX and SGML.


[1] I suppose there could be a problem if one command were a prefix of another. That is, if there were symbols \foo and \foobar and you intended to insert the latter, Emacs might think you’re done after you’ve typed the former. But I can’t think of a case where that would happen. TeX commands are nearly prefix codes. There are TeX commands like \tan and \tanh, but these don’t represent symbols per se. Emacs doesn’t need any help to insert the letters “tan” or “tanh” into a file.

Control characters

I didn’t realize until recently that there’s a connection between the control key on a computer keyboard and controlling a mechanical device. Both uses of the word control are related via ASCII control characters as I discovered by reading the blog post Four Column ASCII.

Computers work with bits in groups of eight, and ASCII uses seven of them, giving 128 possible values. That’s a lot more than there are letters in the Roman alphabet, so the first 32 values were reserved for control codes, many of them originally for controlling printers. This is most obvious when you arrange the table of ASCII values in four columns, hence the title of the post above.

Most of the codes for controlling printers are obsolete, but historical vestiges remain. When you hold down the control key and type a letter, it may produce the corresponding control character. That character differs from the letter by flipping its second bit from 1 to 0 (counting from the left in the eight-bit codes below). Often, though, the control keys have been put to other uses.
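The bit flip is easy to check in Emacs Lisp, where ?H is the character code of H. The sketch below upcases a character and then flips its 0100 0000 bit (#x40). The helper name is hypothetical. Emacs's own read syntax agrees with the result: ?\C-h is also 8.

```elisp
(defun my-control-char (c)
  "Return the ASCII control character for Control plus character C.
Upcases C, then flips its 0100 0000 bit (#x40).
`my-control-char' is a hypothetical helper name."
  (logxor (upcase c) #x40))

;; (my-control-char ?H)  => 8   backspace
;; (my-control-char ?I)  => 9   tab
;; (my-control-char ?J)  => 10  line feed
;; (my-control-char ?M)  => 13  carriage return
;; (my-control-char ?\[) => 27  escape
```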

Control-H

The letter H has ASCII code 0100 1000 and the backspace control character has ASCII code 0000 1000. In some software, such as the bash shell and the Windows command line cmd, holding down the control key and typing H has the same effect as using the backspace key [1].

Other software uses Control-H for its own purposes. For example, Windows software often uses it to bring up a find-and-replace dialog, and Emacs uses it as the prefix to a help command.

Control-I

In ASCII the letter I is 0100 1001 and the tab character is 0000 1001. In some software you can produce a tab character with Control-I. This works in Emacs and in Notepad, for example. It doesn’t work in WYSIWYG programs like Word where Control-I usually formats text in italic.

Control-J and Control-M

The letter J has ASCII code 0100 1010 and the line feed control character has ASCII code 0000 1010. In some software typing Control-J inserts a line feed, and in other software it does something analogous.

Unix uses a line feed character to denote the start of a new line, but DOS used a carriage return and a line feed. If you type Control-J in Windows Notepad, you’ll get a new line, but it will be saved as a carriage return and a line feed.

In Emacs, the behavior of Control-J depends on the mode. In text mode, it simply inserts a newline. In TeX mode, Control-J ends a paragraph, but it also checks the preceding paragraph for unbalanced delimiters. If you have something like an open brace with no corresponding close brace, you’ll see a warning “Paragraph being closed appears to contain a mismatch.”

The carriage return character has ASCII code 0000 1101, and M has ASCII code 0100 1101. That’s why, if a file was created on Windows and you open it on Unix, you may see ^M at the end of every line.
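Inside Emacs, the usual fix for a whole file is to change the buffer's line-ending convention. Run M-x set-buffer-file-coding-system RET unix and then save. If you just want to strip the carriage returns from a string in Lisp, here is a sketch with a hypothetical helper name.

```elisp
(defun my-strip-carriage-returns (text)
  "Return TEXT without the carriage return before each line end.
`my-strip-carriage-returns' is a hypothetical helper name."
  ;; "\r$" matches a carriage return just before a newline or at the end.
  (replace-regexp-in-string "\r$" "" text))

;; (my-strip-carriage-returns "one\r\ntwo\r\n") => "one\ntwo\n"
```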

Control-[

Some control characters correspond to characters other than letters. If you flip the second bit of the ASCII code for [ you get the control character for escape. And in some software, such as vi or Emacs, Control-[ has the same effect as the escape key.


[1] Control keys are often written with capital letters, like Control-H. This can be misleading if you think this means you have to also hold down the shift key as if you were typing a capital H. Control-h would be better notation. But the ASCII codes for control characters correspond to capital letters, so I use capital letters here.

Journalistic stunt with Emacs

Emacs has been called a text editor with ambitions of being an operating system, and some people semi-seriously refer to it as their operating system. Emacs does not want to be an operating system per se, but it is certainly ambitious. It can be a shell, a web browser, an email client, a calculator, a Lisp interpreter, etc. It’s possible to work all day and never leave Emacs. It would be an interesting experiment to do just that.

Journalist experiment

Journalists occasionally impose some restriction on themselves and write about the experience. For example, Kashmir Hill did an experiment earlier this year, blocking the Big Five tech companies—Amazon, Facebook, Google, Microsoft, and Apple—for a week each, then finally all in the same week, and wrote a series about her experience. It would be interesting for someone to work only from Emacs for a week and write about it.

Living exclusively inside Emacs would be hard. Emacs applications require effort to discover and learn how to use, and different people find different applications worth learning. Someone doing everything in Emacs for the sake of a story would have to use some features they would not otherwise find worthwhile.

Why stay inside Emacs?

The point of using a calculator, for example, inside Emacs is that it lets you stay in your primary work environment. You don’t have to open a new application to do a quick calculation. Also, since everything is text-based, everything can be navigated and edited the same way.

You may have run into a situation using Windows where some text can be copied, such as text inside an edit box, but other text cannot, such as text displayed on a dialog box. That doesn’t happen in Emacs since everything is editable text. Consistency and interoperability sometimes make it worthwhile to do things inside Emacs that could be done more easily in another application.

Finally, everything in Emacs is programmable. Something that is awkward to use manually might still be valuable since it can be automated.

Examples from recent posts

My previous post was about various ways to compute hash functions. I could have added Emacs to the list. Here’s how you could compute the SHA256 hash of “hello world” using Emacs Lisp:

    (secure-hash 'sha256 "hello world")

You could, for example, type the code above in the middle of a document, put the cursor just after the closing parenthesis, and type Control-x Control-e to evaluate it as a Lisp expression. The result appears in the echo area.
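To see exactly which algorithms secure-hash accepts in your Emacs, recent versions provide secure-hash-algorithms. The sketch below lists them and computes the well-known SHA-256 digest of "hello world".

```elisp
;; List the algorithms `secure-hash' accepts in this Emacs.
(secure-hash-algorithms)   ; e.g. (md5 sha1 sha224 sha256 sha384 sha512)

;; The hex digest of "hello world", returned as a string.
(secure-hash 'sha256 "hello world")
;; => "b94d27b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9"
```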

I also wrote about software to factor integers recently, and you could do this in Emacs as well. You could pull up the Emacs calculator (M-x calc), type ' to start an algebraic entry, and enter prfac(161393). It would return a list of the prime factors: [251, 643].

Neither of these functions is best of breed. The secure-hash function only supports the most popular hash functions, unlike openssl. And prfac will fail on large inputs, unlike PARI/GP. Emacs is ambitious, but not that ambitious. It doesn’t aim to replace specialized software but to provide a convenient way to carry out common tasks.
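You can also drive the calculator from Lisp with calc-eval. It takes an algebraic formula as a string and returns the result as a string, formatted according to Calc's current display mode. The sketch below assumes Calc's default display settings.

```elisp
(require 'calc)
(require 'calc-ext)   ; Calc's extensions, where functions like prfac live

;; Evaluate a Calc algebraic formula and get the result back as text.
(calc-eval "prfac(161393)")   ; => "[251, 643]" with default settings
```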

Emacs features that use regular expressions

The syntax of regular expressions in Emacs is a little disappointing, but the ways you can use regular expressions in Emacs are impressive.

I’ve written before about the syntax of Emacs regular expressions. It’s a pretty conservative subset of the features you may be used to from other environments as summarized in the diagram below.

But there are many, many ways to use regular expressions in Emacs. I did a quick search and found that about 15% of the pages in the massive Emacs manual contain at least one reference to regular expressions. Exhaustively listing the uses of regular expressions would not be practical or very interesting. Instead, I’ll highlight a few uses that I find helpful.

Searching and replacing

One of the most frequently used features in Emacs is incremental search. You can search forward or backward for a string, searching as you type, with the commands C-s (isearch-forward) and C-r (isearch-backward). The regular expression counterparts of these commands are C-M-s (isearch-forward-regexp) and C-M-r (isearch-backward-regexp).

Note that the regular expression commands add the Alt (meta) key to their string counterparts. Also, note that Emacs consistently refers to regular expressions as regexp and never, as far as I know, as regex. (Emacs relies heavily on conventions like this to keep the code base manageable.)

A common task in any editor is to search and replace text. In Emacs you can replace all occurrences of a regular expression with replace-regexp or interactively choose which instances to replace with query-replace-regexp.
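In Lisp code, the docstring of replace-regexp recommends a loop of re-search-forward and replace-match instead of calling the interactive command. Here is a sketch of that loop with a hypothetical helper name.

```elisp
(defun my-replace-all (regexp to-string)
  "Replace every match of REGEXP in the current buffer with TO-STRING.
`my-replace-all' is a hypothetical helper name."
  (save-excursion
    (goto-char (point-min))
    ;; With NOERROR = t the search returns nil at the last match,
    ;; which ends the loop.
    (while (re-search-forward regexp nil t)
      (replace-match to-string))))

;; In a buffer containing "color and colour":
;; (my-replace-all "colou?r" "hue") leaves "hue and hue"
```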

Purging lines

You can delete all lines in a file that contain a given regular expression with flush-lines. You can also invert this command, specifying which lines not to delete with keep-lines.
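Both commands can also be called from Lisp. In that case they act on the text from point to the end of the buffer, so the sketch below moves to the beginning first. The helper name is hypothetical.

```elisp
(defun my-filter-lines (text regexp keep)
  "Return TEXT with the lines matching REGEXP deleted.
If KEEP is non-nil, delete the lines that do NOT match instead.
`my-filter-lines' is a hypothetical helper name."
  (with-temp-buffer
    (insert text)
    ;; Called from Lisp, both commands act from point to the end.
    (goto-char (point-min))
    (if keep
        (keep-lines regexp)
      (flush-lines regexp))
    (buffer-string)))

;; (my-filter-lines "alpha\n# one\nbeta\n" "^#" nil) => "alpha\nbeta\n"
;; (my-filter-lines "alpha\n# one\nbeta\n" "^#" t)   => "# one\n"
```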

Aligning code

One lesser-known but handy feature is align-regexp. This command will insert white space as needed so that all instances of a regular expression in a region align vertically. For example, if you have a sequence of assignment statements in a programming language you could have all the equal signs line up by using align-regexp with the regular expression consisting simply of an equal sign. Of course you could also align based on a much more complex pattern.
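From Lisp, align-regexp takes the region bounds and a regexp whose first group marks the whitespace to widen. Interactively, Emacs prepends \\(\\s-*\\) to what you type. When you call it from code, you supply that group yourself. Here is a sketch with a hypothetical helper name.

```elisp
(defun my-align-assignments (text)
  "Return TEXT with its equal signs lined up, using `align-regexp'.
`my-align-assignments' is a hypothetical helper name."
  (with-temp-buffer
    ;; Pad with spaces only, so the result doesn't depend on tab settings.
    (setq-local indent-tabs-mode nil)
    (insert text)
    ;; Group 1, the whitespace before "=", is what gets widened.
    (align-regexp (point-min) (point-max) "\\(\\s-*\\)=")
    (buffer-string)))

;; (my-align-assignments "x = 1\nlonger = 2\n")
;; puts both "=" signs in the same column.
```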

Although I imagine this feature is primarily used when editing source code, you could use it in other contexts, such as aligning poetry or ASCII art diagrams.

Directory editing

The Emacs directory editor dired is something like the Windows File Explorer or the OSX Finder, but text-based. dired has many features that use regular expressions. Here are a few of the more common ones.

You can mark files based on the file names with % m (dired-mark-files-regexp) or based on the contents of the files with % g (dired-mark-files-containing-regexp). You can also mark files for deletion with % d (dired-flag-files-regexp).

Inside dired you can search across a specified set of files by typing A (dired-do-find-regexp), and you can interactively search and replace across a set of files by typing Q (dired-do-find-regexp-and-replace).
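The marking commands can be called from Lisp too. The sketch below opens a Dired buffer for a directory without displaying it. It then runs the same function that % m uses and returns the names of the marked files. The helper name is hypothetical.

```elisp
(require 'dired)

(defun my-files-marked-by-regexp (dir regexp)
  "Return the names of files in DIR that `% m' would mark for REGEXP.
`my-files-marked-by-regexp' is a hypothetical helper name."
  (with-current-buffer (dired-noselect dir)
    ;; The command behind % m; it matches against non-directory names.
    (dired-mark-files-regexp regexp)
    (prog1 (mapcar #'file-name-nondirectory (dired-get-marked-files))
      ;; Clean up the Dired buffer opened above.
      (kill-buffer))))

;; (my-files-marked-by-regexp "~/paper/" "\\.tex\\'")
```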

Miscellaneous

The help apropos command (C-h a) can take a string or a regular expression.

The command to list faces (list-faces-display) can take a regular expression when given a prefix argument. Faces are the combinations of font, color, and other attributes Emacs uses to display text.

Interactive highlighting commands (highlight-regexp, unhighlight-regexp, highlight-lines-matching-regexp) take a regular expression argument.

You can use a regular expression to specify which buffers to close with kill-matching-buffers.

Maybe the largest class of uses for regular expressions in Emacs is configuration. Many customizations in Emacs, such as giving Emacs hints to determine the right editing mode for a file or how to recognize comments in different languages, use regular expressions as arguments.
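For example, auto-mode-alist pairs file-name regexps with the major mode to use. Here is a sketch that uses a hypothetical .mytxt extension.

```elisp
;; Open files ending in .mytxt in text-mode. The extension is a
;; hypothetical example; the key is a regexp matched against file names.
(add-to-list 'auto-mode-alist '("\\.mytxt\\'" . text-mode))
```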

Resources

You can find more posts on regular expressions and on Emacs by going to my technical notes page. Note that the outline at the top has links for regular expressions and for Emacs.

For daily tips on regular expressions or Unix-native tools like Emacs, follow @RegexTip and @UnixToolTip on Twitter.

Selecting things in Emacs

You can select blocks of text in Emacs just as you would in most other environments. You could, for example, drag your mouse over a region. You could also hold down the Shift key and use arrow keys. But Emacs also has a number of commands that let you work in larger semantic units. That is, instead of working with an undifferentiated set of characters, you can select meaningful chunks of text, the meaning depending on context.

When you’re editing English prose, the semantic units you are concerned with might be words, sentences, or paragraphs. When you’re editing programming language source code, you care about functions or various “balanced expressions” such as the content between two parentheses or two curly brackets.

The following table gives some of the selection commands built into Emacs.

Unit                 Command            Key binding
word                 mark-word          M-@
paragraph            mark-paragraph     M-h
page                 mark-page          C-x C-p
buffer               mark-whole-buffer  C-x h
function             mark-defun         C-M-h
balanced expression  mark-sexp          C-M-@

The expand-region package offers an alternative to several of these commands. More on that later.

The command for selecting a word does just what you expect. Likewise, the commands for selecting a page or a buffer require little explanation. But the meaning of a “paragraph” depends on context (i.e. editing mode), as do the meanings of “function” and “balanced expression.”

When editing source code, a “paragraph” is typically a block of code without blank lines. However, each language implements its own editing mode and could interpret editing units differently. Function definition syntax varies across languages, so mark-defun has to be implemented differently in each language mode.

Balanced expressions could have different meanings in different contexts, but they’re fairly consistent. Content between matching delimiters (quotation marks, parentheses, square brackets, curly braces, etc.) is generally considered a balanced expression.

Here’s where expand-region comes in. It’s typically bound to C-=. It can be used as a substitute for mark-word and mark-sexp. And if you use it repeatedly, it can replace mark-defun.

Each time you call expand-region it takes in more context. For example, suppose you’re in text mode with your cursor in the middle of a word. The first call to expand-region selects to the end of the word. The second call selects the whole word, i.e. expanding backward to the beginning. The next call selects the enclosing sentence and the next call the enclosing paragraph.

The expand-region function works analogously when editing source code. Suppose you’re editing the bit of Emacs Lisp below and have your cursor on the slash between eshell and pwd.

    (setq eshell-prompt-function
          (lambda nil
            (concat
             (eshell/pwd)
             " $")))

Here’s what sequential invocations of expand-region will select.

1. /pwd
2. eshell/pwd
3. (eshell/pwd)
4. (eshell/pwd) "$ ")
5. (concat (eshell/pwd) " $")
6. (concat (eshell/pwd) "$ "))
7. (lambda nil (concat (eshell/pwd) " $"))
8. (lambda nil (concat (eshell/pwd) "$ ")))
9. (setq eshell-prompt-function (lambda nil (concat (eshell/pwd) " $ ")))

This is kinda tedious in this particular context because there are a lot of delimiters in a small region. In less dense code you’ll select larger blocks of code with each invocation of expand-region. Since each invocation requires only a single key (i.e. hold down Control and repeatedly type =) it’s easy to call expand-region over and over until you select the region you’d like.
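expand-region isn't built into Emacs; it's a package you install from a package archive. The C-= binding is something you set up yourself. The package's README suggests a setup along these lines, with the command named er/expand-region. This sketch assumes the package is installed.

```elisp
;; Assumes the expand-region package is installed.
(require 'expand-region)

;; Bind C-= so each press widens the selection to the next unit.
(global-set-key (kbd "C-=") 'er/expand-region)
```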

Setting up Emacs shell on a Mac

Here are a few things I’ve had to figure out in the process of setting up Emacs on a Mac, in particular with getting shell-mode to work as I’d like. Maybe this will save someone else some time if they want to do the same.

I’ve used a Mac occasionally since the days of the beige toasters, but I never owned one until recently. I’ve said for years that I’d buy a Mac as soon as I have a justification, and I recently started a project that needs a Mac.

I’d heard that Emacs was hard to set up on Mac, but that has not been my experience. I’m running Emacs 25.1 on macOS 10.12.1. Maybe there were problems with earlier versions of Emacs or OS X that I skipped. Or maybe there are quirks I haven’t run into yet. So far my only difficulties have been related to running a shell inside Emacs.

Path differences

The first problem I ran into is that my path is not the same inside shell-mode as in a terminal window. A little searching showed a lot of discussion of this problem but no good solutions. My current solution is to run source .bash_profile from my bash shell inside Emacs to manually force it to read the configuration file. There’s probably a way to avoid this, and if you know how please tell me, but this works OK for now.

Manually sourcing the .bash_profile file works for bash but doesn’t work for Eshell. I doubt I’ll have much use for Eshell, however. It’s more useful on Windows when you want a Unix-like shell inside Emacs.

Update: Dan Schmidt pointed out in the comments that Emacs reads .bashrc rather than .bash_profile. It seems that Mac doesn’t read .bashrc at all, at least not if it can find a .bash_profile file. I created a .bashrc file that sources .bash_profile and that fixed my problem, though it did not fix the problem with Eshell or the path problem below.

Scrolling command history

The second problem I had was that Control-up arrow does not scroll through shell history because that key combination has special meaning to the operating system, bringing up Mission Control. Quite a surprise when you expect to scroll through previous commands but instead your entire screen changes.

I got around this by putting the following code in my Emacs config file and using Alt-up and Alt-down instead of Control-up and Control-down to scroll shell history. (I’m using my beloved Microsoft Natural keyboard, so I have an Alt key.)

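(add-hook 'shell-mode-hook
          (lambda ()
            ;; Alt-up/Alt-down step through shell history, since macOS
            ;; reserves Control-up for Mission Control.
            (define-key shell-mode-map (kbd "<M-up>") #'comint-previous-input)
            (define-key shell-mode-map (kbd "<M-down>") #'comint-next-input)))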


Another path problem

The last problem I had was running the Clojure REPL inside Emacs. When I ran lein repl from bash inside Emacs I got an error saying command not found. Apparently running source .bash_profile didn’t give me entirely the same path in Emacs as in a terminal. I was able to fix this by adding the following to my Emacs config file.

(add-to-list 'exec-path "/usr/local/bin")

This works, though there are a couple things I don’t understand. First, I don’t understand why /usr/local/bin was missing from my path inside Emacs. Second, I don’t understand why adding the path customizations from my .bash_profile to exec-path doesn’t work. Until I need to understand this, I’m willing to let it remain a mystery.

Update: LaTeX path problem

After fixing the problems mentioned in the original post, I ran into another problem. Trying to run LaTeX on a file failed saying that pdflatex couldn’t be found. Adding the path to pdflatex to the exec-path didn’t work. But the following code from the TeX Stack Exchange did work:

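;; Evaluating this on its own just shows the current PATH; it changes nothing.
(getenv "PATH")
;; Put the TeX binaries (pdflatex etc.) at the front of the PATH that
;; Emacs passes to the programs it runs.
(setenv "PATH" (concat "/Library/TeX/texbin" ":" (getenv "PATH")))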


This is the path for El Capitan and Sierra. The path is different in earlier versions of the OS.
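A commonly used fix for this family of Mac path problems is the third-party exec-path-from-shell package. It asks your login shell for PATH once and copies the value into both exec-path and Emacs's own PATH environment variable. I haven't tried it against the specific problems above. Here is a sketch of the setup its README suggests, assuming the package is installed.

```elisp
;; Copy PATH from the login shell into Emacs in graphical Mac or X sessions.
;; Assumes the exec-path-from-shell package is installed.
(when (memq window-system '(mac ns x))
  (exec-path-from-shell-initialize))
```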

Portable Emacs config file

By the way, you can use one configuration file across operating systems by putting code like this in your file.

(cond
 ((eq system-type 'windows-nt)
  ;; Windows-specific configurations
  ;; ...
  )
 ((eq system-type 'gnu/linux)
  ;; Linux-specific configurations
  ;; ...
  )
 ((eq system-type 'darwin)
  ;; Mac-specific configurations
  ;; ...
  ))


If you need machine-specific configuration for two machines running the same OS, you can test the value of (system-name) rather than system-type.
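Fully qualified host names can vary with the network, so one option is to compare only the part before the first dot. Here is a sketch with a hypothetical helper and a hypothetical host name.

```elisp
(defun my-on-host-p (host)
  "Return non-nil if Emacs is running on the machine named HOST.
Only the part of (system-name) before the first dot is compared.
`my-on-host-p' is a hypothetical helper name."
  (string-equal (car (split-string (system-name) "\\.")) host))

;; Example, with a hypothetical host name:
;; (when (my-on-host-p "work-laptop")
;;   (setq visible-bell t))
```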

Five lemma, ASCII art, and Unicode

A few days ago I wrote about creating ASCII art in Emacs using ditaa. Out of curiosity, I wanted to try making the Five Lemma diagram. [1]

The examples in the ditaa site all have arrows between boxes, but you don’t have to have boxes.

Here’s the ditaa source:

A₀ ---> A₁ ---> A₂ ---> A₃ ---> A₄
|       |       |       |       |
| f₀    | f₁    | f₂    | f₃    | f₄
|       |       |       |       |
v       v       v       v       v
B₀ ---> B₁ ---> B₂ ---> B₃ ---> B₄


and here’s the image it produces:

It’s not pretty. You could make a nicer image with LaTeX. But as the old saying goes, the remarkable thing about a dancing bear is not that it dances well but that it dances at all.

The trick to getting the subscripts is to use the Unicode characters U+2080 through U+2089, i.e. code point 0x208n for subscript n. As I noted at the bottom of this post, ditaa isn’t strictly limited to ASCII art; you can use Unicode characters as well. You may or may not be able to see the subscripts in the source code, since they are not part of the most widely supported set of characters.
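Typing those characters by hand is tedious. The sketch below uses a hypothetical helper to turn every ASCII digit in a string into the matching subscript, adding #x2080 to the digit's value. You could then write the diagram source with ordinary digits and convert it afterward.

```elisp
(defun my-subscript-digits (text)
  "Return TEXT with each ASCII digit n replaced by Unicode subscript n.
Subscript n has code point #x2080 + n.
`my-subscript-digits' is a hypothetical helper name."
  (replace-regexp-in-string
   "[0-9]"
   (lambda (digit) (string (+ #x2080 (- (aref digit 0) ?0))))
   text
   t t))   ; FIXEDCASE and LITERAL: insert the replacement unchanged

;; (my-subscript-digits "A0 ---> A1") => "A₀ ---> A₁"
```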

* * *

[1]  The Five Lemma is a diagram-chasing result from homological algebra. It lets you infer properties of the middle map f₂ from properties of the other f’s.