Navigating a LaTeX file

I like generating long LaTeX documents from org-mode because, for one thing, org-mode has nice section folding. But not everyone I work with uses Emacs, so its better to work in LaTeX directly rather than have Emacs generate LaTeX.

AUCTeX has section folding for LaTeX documents, though so far I’ve only has limited success at getting it to work. However, RefTeX worked right out of the box.

If you enter reftex-mode ctrlc = then RefTeX will open a table of contents window.

RefTeX screen shot

Scrolling through the table of contents window scrolls through the body of the document. This isn’t exactly section folding, but it serves a similar purpose.

RefTeX ships with Emacs, so there’s probably no need to install it, but the mode is not enabled by default.

Good autocomplete

Woman using cellphone

I’m not sure whether automatic text completion on a mobile device is a net good. It sometimes saves a few taps, but it seems like it’s at least as likely to cause extra work.

Although I’m ambivalent about autocomplete on my phone, I like it in my text editor. The difference is that in my editor autocomplete is

  • configurable,
  • transparent,
  • intentional.

I value these characteristics, but I understand that not everyone does, especially on a mobile device.

Abbrevs

Emacs abbrevs (abbreviations) are completely configurable. The only abbrevs the editor knows about are the ones you ask for, and you can see (and edit) your set of abbrevs in an ordinary text file. You can easily toggle in and out of abbrev-mode, turning all abbrevs on or off.

You can also determine whether an abbrev applies to all modes or just some modes. For example, maybe you’d like imp to automatically expand to

    import matplotlib.pyplot as plt

when writing a Python file, but not when composing an email in which you might wish to call someone an imp.

Abbrevs are not saved by default, but Emacs will ask you when you exit if you’d like to save abbrevs created during your session. This encourages experimentation. You can try out an abbrev before deciding whether to keep it, or define an abbrev for a single task with no intention of keeping it.

Emacs users often use some naming convention so that all their abbrevs begin with a key sequence they’re unlikely to type unless they want it to be expanded. I use “jc” as a prefix, both because these are my initials and because I’m unlikely to type these two letters for any other purpose. (Here’s a grid of other rare two-letter combinations.)

For example, I have an abbrev jciff that expands to the HTML sequence

    ↔︎

for the symbol ↔︎ to denote iff (if and only if). The first character is the symbol itself, and the second character is a variation selector asking browsers to render the symbol as text and not as an emoji.

Dynamic abbrevs

An advantage of autocomplete on a mobile device is that you don’t have to go to the effort of programming completions that you might want. Emacs has something similar in what it calls dynamic abbrevs. These are completions based on the text in the file you’re editing.

For example, suppose you’re writing about politician Hery Rajaonarimampianina. After having typed his name once, the second time you could type a few letters of his name and then type Alt-/ to have Emacs fill in the rest. If you haven’t typed enough letters to uniquely determine his name, you can type Alt-/ repeatedly to cycle through the possibilities. Note that you have to initiate the text completion. Emacs will not insert a single character unless explicitly asked to do so.

What to abbreviate

Of course abbreviations are a matter of personal choice. Some people use text expander programs with hundreds of abbreviations. I reserve editor abbreviations for long blocks of text, such as boilerplate HTML, or text like the example above that is difficult to remember or error-prone to type. I don’t find it worthwhile to have lots of minor shortcuts because I won’t remember that I’ve created them. (See the previous post for a discussion of remembering what you’ve automated.)

If you abbreviate something because you find it hard to remember, you trade the task of remembering the thing itself with the task of remembering the abbreviation. I would be cautious about accepting this trade-off. It’s often better to simply memorize the thing that’s hard to remember, because then you can use it in more contexts.

In the example above I abbreviate two Unicode character code points. I abbreviate these rather than memorizing them because I don’t need to use them very often, but often enough that I’ve had to look them up a few times. I’ve memorized several Unicode code points because I need them often enough to justify memorizing them.

Related posts

Literate programming to reduce errors

I had some errors in a recent blog post that might have been eliminated if I had programmatically generated the content of the post rather than writing it by hand.

I rewrote the example in this post in using org-mode. My org file looked like this:

    #+begin_src python :session :exports none
    lax_lat =   33.94
    lax_lon = -118.41
    iah_lat =   29.98
    iah_lon =  -95.34
    #+end_src

    Suppose we want to find how far a plane would travel 
    between Los Angeles (LAX) and Houston (IAH), 
    ...

    #+begin_src python :session :exports none
    a = round(90 - lax_lat, 2)
    b = round(90 - iah_lat, 2)
    #+end_src

    /a/ = src_python[:session :results raw]{f"90° – {lax_lat}° = {a}°"}

    and

    /b/ = src_python[:session :results raw]{f"90° – {iah_lat}° = {b}°"}

    ...

Here are some observations about the experience.

First of all, writing the post in org-mode was more work than writing it directly, pasting in computed values by hand, but presumably less error-prone. It would also be easier to update. If, for example, I realized that I had the wrong coordinates for one of the airports, I could update the coordinates in one location and everything else would be updated when I regenerated the page.

I don’t think this was the best application of org-mode. It’s easier to use org-mode like a notebook, in which case you’re not trying to hide the fact that you’re mixing code and prose. I wanted to insert computed values into the text without calling attention to the fact that the values were computed. This is fine when you mostly have a text document and you only want to insert a few computed values. When you’re doing more computing it becomes tedious to repeatedly write

    src_python[:session :results raw]{...}

to insert values. It might have been easier in this case to simply write a Python program that printed out the HTML source of the example.

There are a couple advantages to org-mode that weren’t relevant here. One is that the same org-mode file can be exported to multiple formats: HTML, LaTeX, ODT, etc. Here, however, I was only interested in exporting to HTML.

Another advantage of org-mode is the ability to mix multiple programming languages. Here I was using Python for everything, but org-mode will let you mix dozens of languages. You could compute one result in R, another result in Haskell, pass both results as arguments into some Java code, etc. You could also include data tables and diagrams in your org-mode file with your prose and code.

Literate programming

In general, keeping code and documentation together reduces errors. Literate programming may be more or less work, depending on the problem, but it reduces certain kinds of errors.

The example above is sorta bargain-basement literate programming. The code being developed was very simple, and not of interest beyond the article it was used in. Literate programming really shines when used to develop complex code, as in the book Physically Based Rendering. (Update: The third edition of this book is available online.)

When code requires a lot of explanation, literate programming can be very helpful. I did a project in psychoacoustics with literate programming a few years ago that would have been hard to do any other way. The project required a lot of reverse engineering and research. A line of code could easily require a couple paragraphs of explanation. Literate programming made the code development much easier because we could develop the documentation and code together, explaining things in the order most suited to the human reader, not to the compiler.

Keeping data and code together with org-mode

With org-mode you can keep data, code, and documentation in one file.

Suppose you have an org-mode file containing the following table.

    #+NAME: mydata
    | Drug | Patients |
    |------+----------|
    |    X |      232 |
    |    Y |      351 |
    |    Z |      117 |

Note that there cannot be a blank line between the NAME header and the beginning of the table.

You can bring this table into Python simply by declaring it to be a variable in the header of a Python code block.

    #+begin_src python :var tbl=mydata :results output
    print(tbl)
    #+end_src

When you evaluate this block, you see that the table is imported as a list of lists.

    [['X', 232], ['Y', 351], ['Z', 117]]

Note that the column headings were not imported into Python. Now suppose you would like to retain the headers, and use them as column names in a pandas data frame.

    #+begin_src python :var tbl=mydata :colnames no :results output
    import pandas as pd
    df = pd.DataFrame(tbl[1:], columns=tbl[0])
    print(df, "\n")
    print(df["Patients"].mean())
    #+end_src

When evaluated, this block produces the following.

      Drug  Patients 
    0    X       232
    1    Y       351
    2    Z       117

    233.33333333333334

Note that in order to import the column names, we told org-mode that there are no column names! We did this with the header option

    :colnames no

This seems backward, but it makes sense. It says do bring in the first row of the table, even though it appears to be a column header that isn’t imported by default. But then we tell pandas that we want to make a data frame out of all but the first row (i.e. tbl[1:]) and we want to use the first row (i.e. tbl[0]) as the column names.

A possible disadvantage to keeping data and code together is that the data could be large. But since org files are naturally in outline mode, you could collapse the part of the outline containing the data so that you don’t have to look at it unless you need to.

Related posts

Naming probability functions

Given a random variable X, you often want to compute the probability that X will take on a value less than x or greater than x. Define the functions

FX(x) = Prob(Xx)

and

GX(x) = Prob(X > x)

What do you call F and G? I tend to call them the CDF (cumulative distribution function) and CCDF (complementary cumulative distribution function) but conventions vary.

The names of software functions to compute these two functions can be confusing. For example, Python (SciPy) uses the names cdf and sf (the latter for “survival function”) while the R functions to compute the CDF take an optional argument to return the CCDF instead [1].

In the Emacs calculator, the function ltpn computes the CDF. At first glace I thought this was horribly cryptic. It’s actually a very nice naming convention; it just wasn’t what I was expecting.

The “ltp” stands for lower tail probability and “n” stands for normal. The complementary probability function is utpn where “utp” stands for upper tail probability. Unlike other software libraries, Emacs gives symmetric names to these two symmetrically related functions.

“Lower tail” probability is clearer than “cumulative” probability because it leaves no doubt whether you’re accumulating from the left or the right.

You can replace the “n” at the end of ltpn and utpn with the first letters of binomial, chi-square, t, F, and Poisson to get the corresponding functions for these distributions. For example, utpt gives the upper tail probability for the Student t distribution [2].

The Emacs calculator can be quirky, but props to the developers for choosing good names for the probability functions.

Related posts

[1] Incidentally, the CCDF cannot always be computed by simply computing the CDF first and subtracting the value from 1. In theory this is possible, but not in floating point practice. See the discussion of erf and erfc in this post for an explanation.

[2] These names are very short and only a handful of distribution families are supported. But that’s fine in context. The reason to use the Emacs calculator is to do a quick calculation without having to leave Emacs, not to develop production quality statistical software.

Inline computed content in org-mode

The previous post discussed how to use org-mode as a notebook. You can have blocks of code and blocks of results, analogous to cells in a Jupyter notebook. The code and the results export as obvious blocks when you export the org file to another format, such as LaTeX or HTML. And that’s fine for a notebook.

Now suppose you want to do something more subtle. You want to splice in the result of a computed value without being obvious about it. Maybe you want to compute a value rather than directly enter it so that the document will remain consistent. Maybe you have a template and you want to set the parameters of the template at the top of the file.

Web development languages like PHP do this well. You can write a PHP file that is essentially an HTML file with pieces of code spliced in. You do this my inserting

    <?php … ?>

into the HTML code, and when the page is rendered the code between the <?php and ?> tags is replaced with the result of executing the code. We’d like to do something analogous in org-mode with org-babel. (org-babel is the subsystem of org-mode that interacts with code.)

Here’s an org-mode example that sets length and width as variables at the top of a file and multiplies them later in the body of the file to get area.

We define our variables as follows. The block is marked :exports none because we do not want to display the code or the values. We just want the code to run when we export the file.

    #+begin_src python :session :exports none
    length, width = 7, 13
    #+end_src

The following almost does what we want [1].

    Area equals src_python[:session]{length*width}.

This renders as

Area equals 91.

if we export our org file to HTML The number 91 is typeset differently than the words before it. This would be more obvious if the computed value were a string rather than a number.

Org-mode is wrapping <code> tags around the computed result. If we were to export the org file to LaTeX it would wrap the result with \texttt{}. This is because, by default, the output of a computation is displayed as computer output, which is conventionally set in a monospace font like Courier. That’s fine in a technical document when we want to make it obvious that a calculation is a calculation, but typically not in a business context. You wouldn’t want, for example, to generate a letter that starts

Dear Michael,

with Michael’s name set in Courier, announcing that this is a form letter.

The fix is to add :results raw to the header session, the part in square brackets between src_python and the Python code.

    Area equals src_python[:session :results raw]{length*width}.

Now the calculation result is reported “raw”, i.e. without any special markup surrounding it.

***

[1] In this example I’m using Python, and so I used the function src_python. org-babel supports dozens of languages, and each has its src_<language> counterpart.

Org-mode as a lightweight notebook

You can think of org-mode as simply a kind of markdown, a plain text file that can be exported to fancier formats such as HTML or PDF. It’s a lot more than that, but that’s a reasonable place to start.

Org-mode also integrates with source code. You can embed code in your file and have the code and/or the result of running the code appear when you export the file to another format.

Org-mode as notebook

You can use org-mode as a notebook, something like a Jupyter notebook, but much simpler. An org file is a plain text file, and you can execute embedded code right there in your editor. You don’t need a browser, and there’s no hidden state.

Here’s an example of mixing markup and code:

    The volume of an n-sphere of radius r is 

    $$\frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2} + 1\right)}r^n.$$

    #+begin_src python :session
    from scipy import pi
    from scipy.special import gamma

    def vol(r, n):
        return pi**(n/2)*r**n/gamma(n/2 + 1)

    vol(1, 5)
    #+end_src

If you were to export the file to PDF, the equation for the volume of a sphere would be compiled into a image using LaTeX.

To run the code [1], put your cursor somewhere in the code block and type C-c C-c. When you do, the following lines will appear below your code.

    #+RESULTS:
    : 5.263789013914324

If you think of your org-mode file as primary, and you’re just inserting some code as a kind of scratch area, an advantage of org-mode is that you never leave your editor.

Jupyter notebooks

Now let’s compare that to a Jupyter notebook. Jupyter organizes everything by cells, and a cell can contain markup or code. So you could create a markup cell and enter the exact same introductory text [2].

    The volume of an n-sphere of radius r is 

    $$\frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2} + 1\right)}r^n$$.

When you “run” the cell, the LaTeX is processed and you see the typeset expression rather than its LaTeX source. You can click on the cell to see the LaTeX code again.

Then you would enter the Python code in another cell. When you run the cell you see the result, much as in org-mode. And you could export your notebook to PDF as with org-mode.

File diff

Now suppose we make a couple small changes. We want the n and r in the comment section set in math italic, and we’d like to find the volume of a 5-sphere of radius 2 rather than radius 1. We do this, in Jupyter and in org-mode [3], by putting dollar signs around the “n” and the “r”, and we change vol(1, 5) to vol(2, 5).

Let’s run diff on the two versions of the org-mode file and on the two versions of the Jupyter notebook.

The differences in the org files are easy to spot:

    1c1
    < The volume of an n-sphere of radius r is 
    ---
    > The volume of an \(n\)-sphere of radius \(r\) is 
    12c12
    < vol(1, 5)
    ---
    > vol(2, 5)
    16c16,17
    < : 5.263789013914324
    ---
    > : 168.44124844525837

However, the differences in the Jupyter files are more complicated:

    5c5
    <    "id": "2a1b0bc4",
    ---
    >    "id": "a0a89fcf",
    8c8
    <     "The volume of an n-sphere of radius r is \n",
    ---
    >     "The volume of an $n$-sphere of radius $r$ is \n",
    15,16c15,16
    <    "execution_count": 1,
    <    "id": "888660a2",
    ---
    >    "execution_count": 2,
    >    "id": "1adcd8b1",
    22c22
    <        "5.263789013914324"
    ---
    >        "168.44124844525837"
    25c25
    <      "execution_count": 1,
    ---
    >      "execution_count": 2,
    37c37
    <     "vol(1, 5)"
    ---
    >     "vol(2, 5)"
    43c43
    <    "id": "f8d4d1b0",

There’s a lot of extra stuff in a Jupyter notebook. This is a trivial notebook, and more complex notebooks have more extra stuff. An apparently small change to the notebook can cause a large change in the underlying notebook file. This makes it difficult to track changes in a Jupyter notebook in a version control system.

Related posts

[1] Before this will work, you have to tell Emacs that Python is one of the languages you want to run inside org-mode. I have the following line in my init file to tell Emacs that I want to be able to run Python, DITAA, R, and Perl.

    (org-babel-do-load-languages 'org-babel-load-languages '((python . t) (ditaa . t) (R . t) (perl . t)))

[2] Org-mode will let you use \[ and \] to bracket LaTeX code for a displayed equation, and it will also let you use $$. Jupyter only supports the latter.

[3] In org-mode, putting dollar signs around variables sometimes works and sometimes doesn’t. And in this example, it works for the “r” but not for the “n”. This is very annoying, but it can be fixed by using \( and \) to enter and leave math mode rather than use a dollar sign for both.

Deleting reproducible files in Emacs dired

Imagine you could list the contents of a directory from a command line, and then edit the text output to make things happen. That’s sorta how Emacs dired works. It’s kind of a cross between a bash shell and the Windows File Explorer. Why would you ever want to use such a bizarre hybrid?

One reason is to avoid context switching. If you’re editing a file, you can pop over to a new buffer that is your file manager, do what you need to do, then pop back, all without ever leaving Emacs.

Another reason is that, as with everything else in Emacs, it’s all text. Everything in Emacs is just text, and so the same editing commands can be used everywhere. (More on that here.)

Even though I use Emacs daily, and even though I can make a case for why dired is great, I don’t use it that much. Or rather, I don’t use that much of what it can do. The good that I would, I do not.

I was reviewing dired‘s features and discovered something very handy: typing %& will mark files for deletion that can easily be created again. In particular, it will flag the byproducts of compiling a LaTeX file.

For example, the following is a screenshot of a dired buffer.

When I type %& it highlights the LaTeX temp files in red and marks them for deletion. (The D’s in the left column indicate files to be deleted.)

This doesn’t delete the files, but it marks them for deletion. If I then type x the files will be deleted.

In addition to the unneeded LaTeX files, it also highlighted a .bak file. However, it did not highlight the .o object file. I suppose the thought was that most people would manage C programs from a make file. I’m sure the class of files to mark is configurable, like everything else in Emacs.

More Emacs posts

Opening Windows files from bash and eshell

I often work in a sort of amphibious environment, using Unix software on Windows. As you can well imagine, this causes headaches. But I’ve found such headaches are generally more manageable than the headaches from alternatives I’ve tried.

On the Windows command line, you can type the name of a file and Windows will open the file with the default application associated with its file extension. For example, typing foo.docx and pressing Enter will open the file by that name using Microsoft Word, assuming that is your default application for .docx files.

Unix shells don’t work that way. The first thing you type at the command prompt must be a command, and foo.docx is not a command. The Windows command line generally works this way too, but it makes an exception for files with recognized extensions; the command is inferred from the extension and the file name is an argument to that command.

WSL bash

When you’re running bash on Windows, via WSL (Windows Subsystem for Linux), you can run the Windows utility start which will open a file according to its extension. For example,

    cmd.exe /C start foo.pdf

will open the file foo.pdf with your default PDF viewer.

You can also use start to launch applications without opening a particular file. For example, you could launch Word from bash with

    cmd.exe /C start winword.exe

Emacs eshell

Eshell is a shell written in Emacs Lisp. If you’re running Windows and you do not have access to WSL but you do have Emacs, you can run eshell inside Emacs for a Unix-like environment.

If you try running

    start foo.pdf

that will probably not work because eshell does not use the windows PATH environment.

I got around this by creating a Windows batch file named mystart.bat and put it in my path. The batch file simply calls start with its argument:

    start %

Now I can open foo.pdf from eshell with

    mystart foo.pdf

The solution above for bash

    cmd.exe /C start foo.pdf

also works from eshell.

(I just realized I said two contradictory things: that eshell does not use your path, and that it found a bash file in my path. I don’t know why the latter works. I keep my batch files in c:/bin, which is a Unix-like location, and maybe eshell looks there, not because it’s in my Windows path, but because it’s in what it would expect to be my path based on Unix conventions. I’ve searched the eshell documentation, and I don’t see how to tell what it uses for a path.)

More shell posts

Org entities

This morning I found out that Emacs org-mode has its own markdown entities, analogous to HTML entities or LaTeX commands. Often they’re identical to LaTeX commands. For example, \approx is the approximation symbol ≈, exactly as in LaTeX.

So what’s the advantage of org-entities? In fact, how does Emacs even know whether \approx is a LaTeX command or an org entity?

If you use the command C-c C-x \ , Emacs will show you the compiled version of the entity, i.e. ≈ rather than the command \approx. This is global: all entities are displayed. The org entities would be converted to symbols if you export the file to HTML or LaTeX, but this gives you a way to see the symbols before exporting.

Here something that’s possibly surprising, possibly useful. The symbol you see is for display only. If you copy and paste it to another program, you’ll see the entity text, not the symbol. And if you C-c C-x \ again, you’ll see the command again, not the symbol; Note that the full name of the command is org-toggle-pretty-entities with “toggle” the middle.

If you use set-input-method to enter symbols using LaTeX commands or HTML entities as I described here, Emacs inserts a Unicode character and is irreversible. Once you type the LaTeX command \approx or the corresponding HTML entity &asymp;, any knowledge of how that character was entered is lost. So org entities are useful when you want to see Unicode characters but want your source file to remain strictly ASCII.

Incidentally, there are org entities for Hebrew letters, but only the first four, presumably because these are the only ones used as mathematical symbols.

To see a list of org entities, use the command org-entities-help. Even if you never use org entities, the org entity documentation makes a nice reference for LaTeX commands and HTML entities. Here’s a screenshot of the first few lines.

First few lines of org-entities-help

Related posts