Do comments in a LaTeX file change the output?

When you add a comment to a LaTeX file, it makes no visible change to the output. The comment is ignored as far as the appearance of the output is concerned. But is that comment somehow included in the output file anyway?

If you compile a LaTeX file to PDF, then edit it by throwing in a comment, and compile again, your two files will differ. As I wrote about earlier, the time that a file is created is embedded in a PDF. That time stamp is also included in two or three hashes, so the files will differ by more than just the bits in the time stamp.

But even if you compile two files at the same time (within the resolution of the time stamp, which is one second), the PDF files will still differ. Apparently some kind of hash of the source file is included in the PDF.

So suppose you have two files. The content of foo.tex is

    \documentclass{article}
    \begin{document}
    Hello world.
    \end{document}

and the content of bar.tex is

    \documentclass{article}
    \begin{document}
    Hello world. % comment
    \end{document}

then the output of running pdflatex on both files will look the same.

Suppose you compile the files at the same time so that the time stamps are the same.

    pdflatex foo.tex && pdflatex bar.tex

It’s possible that the two time stamps could be different, one file compiling a little before the tick of a new second and one compiling a little after. But if your computer is fast enough and you don’t get unlucky, the time stamps will be the same.

Then you can compare hex dumps of the two PDF files with

    diff  <(xxd foo.pdf) <(xxd bar.pdf)

This produces the following

    < ...  ./ID [<F12AF1442
    < ...  E03CC6B3AB64A5D9
    < ... 8DEE2FE> <F12AF1
    < ...  442E03CC6B3AB64A
    < ... 5D98DEE2FE>]./Le
    --
    > ...  ./ID [<4FAA0E9F1
    > ...  CC6EFCC5068F481E
    > ...  0419AD6> <4FAA0E
    > ...  9F1CC6EFCC5068F4
    > ...  81E0419AD6>]./Le

You can’t recover the comment from the binary dump, but you can tell that the files differ.

I don’t know what hash is being used. My first guess was MD5, but that’s not it. It’s a 128-bit hash, so that rules out newer hashes like SHA256. I tried searching for it but didn’t find anything. If you know what hash pdflatex uses, please let me know.
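
One way to probe this, assuming your pdflatex is built on pdfTeX 1.40.17 or later, is to take the time stamp and the /ID seed out of the picture. The sketch below relies on three pdfTeX primitives as they are described in the pdfTeX manual; the seed string is an arbitrary placeholder, not anything from the files above.

    % Assumes pdfTeX 1.40.17 or later; these lines go before \documentclass.
    \pdfinfoomitdate=1         % leave /CreationDate and /ModDate out of the PDF
    \pdfsuppressptexinfo=-1    % leave out the PTEX.* banner entries
    \pdftrailerid{fixed-seed}  % seed the /ID from this string (arbitrary placeholder)
    \documentclass{article}
    \begin{document}
    Hello world. % comment
    \end{document}

Compiling foo.tex and bar.tex with the same three lines at the top and diffing again would show whether anything else in the PDF depends on the source. The manual describes the default /ID seed as the current time plus the output file name, so it may be that the difference above comes from the names foo.pdf and bar.pdf rather than from the comment. That is worth checking rather than taking on faith.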

LaTeX will also let you add text at the end of the file, after the \end{document} command. This also will change the hash code but will not change the appearance of the output.
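
For example, here is a hypothetical variant of foo.tex. The file name baz.tex and the trailing text are made up for illustration. LaTeX stops reading at \end{document}, so the last two lines never reach the page.

    % baz.tex: same visible output as foo.tex
    \documentclass{article}
    \begin{document}
    Hello world.
    \end{document}
    These lines come after the end of the document,
    so pdflatex never typesets them.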

Navigating a LaTeX file

I like generating long LaTeX documents from org-mode because, for one thing, org-mode has nice section folding. But not everyone I work with uses Emacs, so it’s better to work in LaTeX directly rather than have Emacs generate LaTeX.

AUCTeX has section folding for LaTeX documents, though so far I’ve only had limited success at getting it to work. However, RefTeX worked right out of the box.

If you enter reftex-mode and type Ctrl-c = then RefTeX will open a table of contents window.

RefTeX screen shot

Scrolling through the table of contents window scrolls through the body of the document. This isn’t exactly section folding, but it serves a similar purpose.

RefTeX ships with Emacs, so there’s probably no need to install it, but the mode is not enabled by default.

Lehman’s inequality, circuits, and LaTeX

Let A, B, C, and D be positive numbers. Then Lehman’s inequality says

\frac{(A+B)(C+D)}{A+B+C+D} \geq \frac{AC}{A+C} + \frac{BD}{B+D}

Proof by circuit

This inequality can be proved analytically, but Lehman’s proof is interesting because he uses electrical circuits [1].

Let A, B, C, and D be the resistances of resistors arranged as in the circuit on the left.

Resistors R1 and R2 in series have a combined resistance of

R = R_1 + R_2

and the same resistors in parallel have a combined resistance of

R = \frac{R_1 R_2}{R_1 + R_2}

This means the circuit on the left has combined resistance

\frac{(A+B)(C+D)}{A+B+C+D}

The resistance of the circuit on the right is

\frac{AC}{A+C} + \frac{BD}{B+D}

Adding a short cannot increase resistance, so the resistance of the circuit on the right must be the same or lower than the resistance of the one on the left. Therefore

\frac{(A+B)(C+D)}{A+B+C+D} \geq \frac{AC}{A+C} + \frac{BD}{B+D}
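
As a quick numerical check, with values chosen only for illustration, take A = 1, B = 2, C = 3, and D = 4. The left side is

\frac{(1+2)(3+4)}{1+2+3+4} = \frac{21}{10} = \frac{126}{60}

and the right side is

\frac{1 \cdot 3}{1+3} + \frac{2 \cdot 4}{2+4} = \frac{3}{4} + \frac{4}{3} = \frac{25}{12} = \frac{125}{60}

so the inequality holds, though only barely for these values.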

Drawing circuits in LaTeX

I drew the circuits above using the circuitikz package in LaTeX. Here’s the code to draw the circuit on the left.

    \documentclass{article}
    
    \usepackage{tikz}
    \usepackage{circuitikz}
    
    \begin{document}
    
    \begin{figure}[h!]
      \begin{center}
        \begin{circuitikz}
          \draw (0,0)
          to[R=$B$] (0,2) 
          to[R=$A$] (0,4)
          to[short] (2,4) 
          to[R=$C$] (2,2)
          to[R=$D$] (2,0)
          to[short] (0,0);
          \draw (1,4)
          to[short] (1, 5);
          \draw (1, 0)
          to[short] (1, -1);
          %\draw (0, 2)
          %to[short] (2, 2);
        \end{circuitikz}
      \end{center}
    \end{figure}
    
    \end{document}

The code to draw the second circuit removes the %’s commenting out the code that draws the short between the two parallel arms of the circuit.
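
For reference, here is a sketch of the second circuit under that description, trimmed to a minimal document without the figure and center wrappers. Nothing is new except that the two commented lines are now active; the comments are added for orientation.

    \documentclass{article}
    \usepackage{circuitikz}  % loads tikz as well

    \begin{document}
    \begin{circuitikz}
      % Left arm holds B and A, right arm holds C and D.
      \draw (0,0)
      to[R=$B$] (0,2)
      to[R=$A$] (0,4)
      to[short] (2,4)
      to[R=$C$] (2,2)
      to[R=$D$] (2,0)
      to[short] (0,0);
      % Leads at the top and bottom of the circuit.
      \draw (1,4)
      to[short] (1,5);
      \draw (1,0)
      to[short] (1,-1);
      % The short joining the midpoints of the two arms.
      \draw (0,2)
      to[short] (2,2);
    \end{circuitikz}
    \end{document}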

[1] Alfred Lehman proves a more general result using circuits in SIAM Review, Vol. 4, No. 2 (April 1962) pp. 150–151. Fazlollah Reza gives a purely mathematical proof on pages 151 and 152 of the same journal.

Convert LaTeX to Microsoft Word

I create nearly all my documents in LaTeX, even documents that might be easier to create in Word. The reason is that even if a particular document would be easier to write in Word, my workflow is more efficient if everything is in LaTeX. LaTeX makes small, plain text files that work well with version control and searching, and I can edit them with the same editor I use for writing code and everything else I do.

Usually I send read-only documents to clients. They don’t know or care what program created the PDF I sent them. The fact that they cannot edit my reports is a feature, not a bug: if I’m going to sign off on something, I need to be sure that it doesn’t include any changes that someone else made that I’m unaware of.

But occasionally I do need to send clients a file they can edit, and this usually means Microsoft Word. Lawyers particularly want Word documents.

It’s possible to create a PDF using LaTeX and copy-and-paste the content into a Word document. This works, but you’ll have to redo all your formatting.

A better approach is to use Pandoc. The command

    pandoc foo.tex -s -o foo.docx

will convert the LaTeX file foo.tex directly to the Word document foo.docx. You may have to touch up the Word document a little, but it will retain more of the original formatting than if you went from LaTeX to Word via PDF.

You could wrap this in a script for convenience and so you don’t have to remember the pandoc syntax.

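    #!/opt/local/bin/perl
    use strict;
    use warnings;

    # Derive the .docx name from the .tex name, then hand off to pandoc.
    my $tex = $ARGV[0] or die "usage: tex2doc file.tex\n";
    (my $doc = $tex) =~ s/\.tex$/.docx/;
    # List form of exec avoids the shell, so file names with spaces work.
    exec "pandoc", $tex, "-o", $doc;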

You could save this to tex2doc and run

    tex2doc foo.tex

to produce foo.docx.

Update: The syntax I used when I wrote this post did not work when I revisited it today (2023-11-30); it gave several warnings instead. What worked today was

    pandoc foo.tex --from latex --to docx > foo.docx

Unfortunately I don’t have the version number that I used when I first wrote this post. Today I was using pandoc version 2.9.2.1.

Matching delimiters and chiastic patterns

When I first started programming I’d occasionally get an error because a delimiter wasn’t matched. Maybe I had a { without a matching }, for example. Or if I were writing Pascal, which I did back in the day, I might have had a BEGIN statement without a corresponding END statement.

At some point I saw someone type an opening delimiter, then the closing delimiter, then back up and insert the content in between. That seemed strange for about five seconds, then I realized it was brilliant and have adopted that practice ever since.

I’ve rarely seen a mismatched delimiter since then, but I ran into one this morning. In order to explain what happened, I’ll need to talk about LaTeX syntax.

LaTeX delimiters

LaTeX has two modes for math symbols: inline and display. Inline symbols appear in the middle of a sentence, while displayed symbols are set apart on their own line and centered. This is analogous to inline elements and block elements in HTML.

When I learned LaTeX, everyone used one dollar sign for inline mode and two dollar signs for display mode. So, for example, you would write $x$ for an x in the context of prose, and $$x = 5$$ to display an equation stating that x equals 5.

Now the preferred syntax is to use \[ to begin display mode and \] to end it. This has several advantages. In particular it makes it easier to debug mismatched delimiter errors since you can count the number of open and closed delimiters; you couldn’t tell without context whether $$ begins or ends a displayed equation.

It’s still common to use dollar signs for inline math mode, but there is an alternate notation that uses \( to begin and \) to end. This notation is not commonly used as far as I know. Escaped parentheses have all the theoretical advantages of escaped brackets, but in practice the scope of inline math content is very small and so it’s not a problem that the opening and closing delimiters are the same.
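
Here is a minimal sketch showing the delimiters side by side. The sentence and equation are placeholders.

    \documentclass{article}
    \begin{document}
    % Inline math, old and new style: both typeset the same way.
    Let $x$ be a real number, and let \(y\) be another.

    % Display math: \[ and \] look different from each other,
    % so an unmatched one is easy to spot by counting.
    \[
      x + y = 5
    \]
    \end{document}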

Org-mode

The only time I use \( and \) is when I’m not directly writing LaTeX but writing an org-mode file that generates LaTeX. I chose to write a client’s report in org-mode rather than directly in LaTeX because the document is very long, and so org-mode’s outlining is convenient. I can hide or expand parts of the tree by pressing a tab, for example. Also, the document has a few tables, and tables are easier to write in org-mode’s markup than in LaTeX.

Unfortunately org-mode sometimes understands expressions like $x$ and sometimes it doesn’t. It would be safer to always write \(x\) but the former works often enough that I tend to use it out of habit. The root of my problem this morning was I had used dollar sign delimiters in a way that confused org-mode.

All else being equal, it’s best when opening and closing delimiters are different. I might advise someone learning LaTeX today to use the \(x\) notation, even though I continue to use $x$.

Chiastic patterns

When I was trying to debug my unmatched delimiter problem, I searched for all \begin and \end statements in my (org-mode generated) LaTeX file by searching on the regular expression \\begin\|\\end. Here’s an example of what I got back (with indentation added).

    \begin{quote}
    \end{quote}
    \begin{table}[htbp]
        \begin{tabular}{ll}
        \end{tabular}
    \end{table}
    \begin{algorithm}
        \begin{algorithmic}[1]
            \begin{enumerate}
            \end{enumerate}
        \end{algorithmic}
    \end{algorithm}

Sometimes opening and closing delimiters are consecutive, such as the beginning and end of the quote. But sometimes you’ll find a chiasmus pattern, named for the Greek letter χ. Notice the table above has an ABBA pattern and the algorithm has an ABCCBA pattern.

This kind of pattern is extremely common in the Bible, though it’s easy for modern readers to miss it. One reason this pattern matters is that the context of a sentence is not only the sentences above and below, but also the parallel sentence in the chiastic pattern. Furthermore, the most important point in a chiasmus is often at the middle.

While chiastic patterns were common in ancient eastern prose, they are far less common in modern western prose. However, chiastic patterns are common in programming, so common that no one ever comments on them.

You can see the chiastic pattern in the indentation of source code. The context of a delimiter is its matching delimiter, which may be many lines away. And the most important line of code for optimization is the code in the innermost loop, at the focus of a chiasmus.

I became aware of chiastic patterns in the Bible by reading Paul Through Mediterranean Eyes. This book has many outline illustrations that look a lot like source code such as nested for-loops.

Quartal melody: Star Trek fanfare

Intervals of a fourth, such as the interval from C to F, are common in western music, but consecutive intervals of this size are not. Quartal harmony is based on intervals of fourths, and quartal melodies use a lot of fourths, particularly consecutive fourths.

Maybe the most famous quartal melody is the opening fanfare to Star Trek (original series). Here’s a transcription of the opening line:

And here is the same music with the intervals of a fourth circled.

The theme opens with two consecutive fourths, there’s an augmented fourth in the middle, then two more consecutive fourths. There are two major thirds in the phrase above, which you could call diminished fourths.

Incidentally, there are four bell tones before the melody above begins, and the interval between the first two tones is a fourth.

Making the sheet music

Here’s the Lilypond source code I used to create the images above.

    \begin{lilypond}

    \score {
        \relative e' {
            \time 4/4
            \partial 2 a4. d8 |
            \tuplet 3/2 {g4~ 4 ges4} \tuplet 3/2 {d4 b4 e4} |
            a2 ~ 4 ~ 8 8 |
            des1
        }
    }
    \end{lilypond}

This uses a few Lilypond features I hadn’t used before.

  • The \partial command for the two pickup notes.
  • The \tuplet command for the triplets.
  • The shortcut of not repeating the names of repeated notes.

The last point applies twice, writing g4 4 rather than g4 g4 and writing a2 4 8 8 rather than a2 a4 a8 a8.

Morse code in musical notation

Maybe this has been done before, but I haven’t seen it: Morse code in musical notation.

Here’s the Morse code alphabet, one letter per measure; in practice there would be less space between letters [1]. A dash is supposed to be three times as long as a dot, so a dot is a sixteenth note and a dash is a dotted eighth note.

Morse code is often at a frequency between 600 and 800 Hz. I picked the E above middle C (660 Hz) because it’s in that range.

Rhythm

Officially a dash is three times as long as a dot. But there’s also a space equal to the length of a dot between parts of a letter. So the sheet music above would be more accurate if you imagined all the sixteenth notes are staccato and the dotted eighth notes are really eighth notes followed by a sixteenth rest.

This doesn’t make much difference because individual operators have varying “fists,” styles of sending Morse code, and won’t exactly follow the official length and spacing rules.

You could rewrite the music above as follows, but it’s all an approximation.

Tempo

According to Wikipedia, “the dit length at 20 words per minute is 50 milliseconds.” So if a sixteenth note has a duration of 50 milliseconds, this would mean five quarter notes per second, or 300 beats per minute. But according to this video, the shortest duration people can distinguish is about 50 milliseconds.

That would imply that copying Morse code at 20 wpm is pushing the limits of human hearing. But copying at 20 wpm is common. Some people can copy Morse code at 50 words per minute or more, but at that speed they’re not hearing individual dits and dahs. An H, for example, four dits in a row, sounds like a single rough sound. In fact, they’re not really hearing letters at all but recognizing the shape of words.

How the image was made

I made the image above with LaTeX and Lilypond.

Adding the letters above each measure was kind of a hack. I used rehearsal markings to label the measures, but there was one problem: the software skips from letter H to letter J. That meant that the labels I and all subsequent letters were one ahead of what they should be, and the final letter Z was labeled AA. I tried several tricks, and Lilypond steadfastly refused to label a measure with ‘I’ even though I’ve seen such a label in the documentation.

My way around this was to make it label two consecutive measures with H, then in image editing software I turned the second H into an I. No doubt there’s a better way, but this worked.

I may play around with this and try to improve it a bit. If you have any suggestions, particularly related to Lilypond, please let me know.

[1] You could think of the musical score above as a sort of transcription of the Farnsworth method of teaching Morse code. Students learn the letters at full speed, but with extra space between the letters at first. The faster speed discourages consciously counting the dits and dahs, forcing the student to listen to the overall rhythm of the letters.

LaTeX and Lawyers

Lawyers write Word documents and mathematicians write LaTeX documents. Of course this makes collaboration awkward, but there are ways to make it better.

One solution is to simply use Word. People who use LaTeX probably know how to use Word, even if they’d rather not, and asking someone else to learn LaTeX is a non-starter. So if I’m coauthoring a document with a lawyer, I’ll use Word.

If I’m writing a report that a lawyer needs to review, I’ll use LaTeX. Using different programs actually helps because it makes a clear distinction between copy editing feedback and authorial responsibility.

This post will give a couple tips for writing reports in LaTeX to be delivered to a lawyer, one trivial and one not quite trivial.

The trivial tip is that \S produces the section sign § (U+00A7) common in legal documents but not so common elsewhere. The not so trivial tip is that the enumitem package lets you change the default labels that LaTeX uses with enumerated items.
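
A minimal sketch of the trivial tip follows; the section numbers are placeholders. The tie (~) keeps the sign on the same line as its number, and doubling the command gives the plural form.

    \documentclass{article}
    \begin{document}
    % \S gives the section sign; ~ prevents a line break before the number.
    See \S~3 of the agreement, and compare \S\S~4--6.
    \end{document}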

Changing enumerated item labels

LaTeX was designed under the assumption that the user wants to focus on logical structure and leave the formatting up to the typesetting program. Consistent with this design philosophy, nested enumerated lists are simply wrapped with \begin{enumerate} and \end{enumerate}, and individual list items are marked with \item, regardless of the level of nesting. LaTeX takes responsibility for displaying different labels at different levels: Arabic numerals for top-level lists, lowercase letters for the next level of list, etc.

When you’re quoting legal documents, however, you don’t want to simply preserve the logical structure of (nested) lists; you want to preserve the labels as well.

Suppose you have the following nested list.

    \begin{enumerate}
    \item First top-level item
    \item Second top-level item
      \begin{enumerate}
      \item A sub-item
      \item Another sub-item
        \begin{enumerate}
        \item A third-level item
        \item Another third-level item
          \begin{enumerate}
          \item Four levels in
          \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}

By default, LaTeX will format this as follows.

But suppose in order to match another document you need the labels to progress as (a), (1), (A), and (i). The following LaTeX code will accomplish this.

    \begin{enumerate}[label={(\alph*)}]
    \item First top-level item
    \item Second top-level item
      \begin{enumerate}[label={(\arabic*)}]
      \item A sub-item
      \item Another sub-item
        \begin{enumerate}[label={(\Alph*)}]
        \item A third-level item
        \item Another third-level item
          \begin{enumerate}[label={(\roman*)}]
          \item Four levels in
          \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}

This produces the following.

Note the parentheses in the labels above. You can remove one or both, replace them with square brackets, add periods, etc. as the following example shows.

    \begin{enumerate}[label={\alph*)}]
    \item First top-level item
    \item Second top-level item
      \begin{enumerate}[label={\arabic*.}]
      \item A sub-item
      \item Another sub-item
        \begin{enumerate}[label={(\Alph*)}]
        \item A third-level item
        \item Another third-level item
          \begin{enumerate}[label={[\roman*]}]
          \item Four levels in
          \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}

Here’s what this looks like when compiled.
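
If every list in a report should use the same scheme, repeating the label option on each \begin{enumerate} gets tedious. The enumitem package also has a \setlist command that sets defaults per nesting level. Here is a sketch using the (a), (1), (A), (i) scheme from earlier; the list content is a placeholder.

    \documentclass{article}
    \usepackage{enumitem}

    % Default labels for enumerate at nesting levels 1 through 4.
    \setlist[enumerate,1]{label={(\alph*)}}
    \setlist[enumerate,2]{label={(\arabic*)}}
    \setlist[enumerate,3]{label={(\Alph*)}}
    \setlist[enumerate,4]{label={(\roman*)}}

    \begin{document}
    \begin{enumerate}
    \item First top-level item
      \begin{enumerate}
      \item A sub-item
      \end{enumerate}
    \end{enumerate}
    \end{document}

A label option on an individual list would presumably still override these defaults for that list, which is handy when one quoted passage uses a different scheme.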

Rotating symbols in LaTeX

Linear logic uses an unusual symbol, an ampersand rotated 180 degrees, for multiplicative disjunction.

⅋

The symbol is U+214B in Unicode.

I was looking into how to produce this character in LaTeX when I found that the package cmll has two commands that produce this character, one semantic and one descriptive: \parr and \invamp [1].

This got me wondering how you might create a symbol like the one above if there weren’t one built into a package. You can do that by using the graphicx package and the \rotatebox command. Here’s how you could roll your own par operator:

    \rotatebox[origin=c]{180}{\&}

There’s a backslash in front of the & because it’s a special character in LaTeX. If you wanted to rotate a K, for example, there would be no need for a backslash.

The \rotatebox command can rotate any number of degrees, and so you could rotate an ampersand 30° with

    \rotatebox[origin=c]{30}{\&}

to produce a tilted ampersand.
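
Putting the pieces together, here is a minimal sketch of a complete document. The macro name \mypar is made up for this example; wrapping the rotated box in \mathbin is one way to get binary-operator spacing in math mode.

    \documentclass{article}
    \usepackage{graphicx}  % provides \rotatebox

    % \mypar is a hypothetical name, not a command from any package.
    \newcommand{\mypar}{\mathbin{\rotatebox[origin=c]{180}{\&}}}

    \begin{document}
    Rotated 180 degrees: \rotatebox[origin=c]{180}{\&}

    Rotated 30 degrees: \rotatebox[origin=c]{30}{\&}

    As an operator: $A \mypar B$
    \end{document}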

[1] The name \parr comes from the fact that the operator is sometimes pronounced “par” in linear logic. (It’s not simply \par because LaTeX already has a command \par for inserting a paragraph break.)

The name \invamp is short for “inverse ampersand.” Note however that the symbol is not an inverted ampersand in the sense of being a reflection; it is an ampersand rotated 180°.

Including a little Hebrew in an English LaTeX document

I was looking up how to put a little Hebrew inside a LaTeX document and ran across a good answer on tex.stackexchange. Short answer: use the cjhebrew package.

In a nutshell, you put your Hebrew text between \< and > using the cjhebrew package’s transliteration. You write left-to-right, and the text will appear right-to-left. For example, \<'lp> produces

אלף

using ' for א, l for ל, and p for ף.

The code for each Hebrew letter is its English transliteration, with three footnotes.

First, when two Hebrew letters roughly correspond to the same English letter, one form may have a dot in front of it. For example, ט and ת both make a t sound; the former is encoded as .t and the latter as t.

Second, five Hebrew letters have a different form when used at the end of a word [1]. For such letters the final form is the capitalized value of the regular form. For example, פ and its final form ף are denoted by p and P respectively. The package will automatically choose between regular and final forms, but you can override this by using the capital letter in the middle of a word or by using a | after a regular form at the end of a word.

Finally, the letter ש is written as /s. The author already used s for ס and .s for צ, so he needed a new symbol to encode a third letter corresponding to s [2]. Also ש has a couple other forms. The letter can make either the sh or s sound, and you may see dots on top of the letter to distinguish these. The cjhebrew package uses +s for ש with a dot on the top right, the sh sound, and ,s for ש with a dot on the top left, the s sound.

Here is the complete consonant transliteration table from the cjhebrew documentation.

Note that the code for א is a single quote ' and the code for ע is a back tick (grave accent) `.

You can also add vowel points (niqqudim). These are also represented by their transliteration to English sounds, with one exception. The sh’va is either silent or represents a schwa sound, so there’s no convenient transliteration. But the sh’va looks like a colon, so it is represented by a colon. See the package documentation for more details.
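
Here is a minimal sketch of a document using the package without vowel points. The first word is the example from above. The second assumes that w and m are the codes for ו and מ, which follows the transliteration scheme but is not something stated in this post, so check it against the package's table.

    \documentclass{article}
    \usepackage{cjhebrew}
    \begin{document}
    % Typed left to right in transliteration, typeset right to left.
    The first letter of the alphabet is \<'lp>.

    % /s for the letter shin; the final form of m should be chosen automatically.
    A common greeting is \</slwm>.
    \end{document}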

[1] You may have seen something similar in Greek with sigma σ and final sigma ς. Even English had something like this. For example, people used to use a different form of t at the end of a word when writing cursive. My mother wrote this way.

[2] It would be more phonetically faithful to transliterate צ as ts, but that would make the LaTeX package harder to implement since it would have to disambiguate whether ts represents צ or תס.