# Quartal melody: Star Trek fanfare

Intervals of a fourth, such as the interval from C to F, are common in western music, but consecutive intervals of this size are not. Quartal harmony is based on intervals of fourths, and quartal melodies use a lot of fourths, particularly consecutive fourths.

Maybe the most famous quartal melody is the opening fanfare to Star Trek (original series). Here’s a transcription of the opening line:

And here is the same music with the intervals of a fourth circled.

The theme opens with two consecutive fourths, there’s an augmented fourth in the middle, then two more consecutive fourths. There are two major thirds in the phrase above, which you could call diminished fourths.

Incidentally, there are four bell tones before the melody above begins, and the interval between the first two tones is a fourth.

## Making the sheet music

Here’s the Lilypond source code I used to create the images above.

    \begin{lilypond}
    \score {
      \relative e'{
        \time 4/4
        \partial 2 a4. d8 |
        \tuplet 3/2 {g4~ 4 ges4} \tuplet 3/2 {d4 b4 e4} |
        a2 ~ 4 ~ 8 8 |
        des1
      }
    }
    \end{lilypond}


This uses a few Lilypond features I hadn’t used before.

• The \partial command for the two pickup notes.
• The \tuplet command for the triplets.
• The shortcut of not repeating the names of repeated notes.

The last point applies twice, writing g4 4 rather than g4 g4 and writing a2 4 8 8 rather than a2 a4 a8 a8.

# Morse code in musical notation

Maybe this has been done before, but I haven’t seen it: Morse code in musical notation.

Here’s the Morse code alphabet, one letter per measure; in practice there would be less space between letters [1]. A dash is supposed to be three times as long as a dot, so a dot is a sixteenth note and a dash is a dotted eighth note.

Morse code is often sent at a frequency between 600 and 800 Hz. I picked E5, the E an octave above the E just above middle C (about 660 Hz), because it’s in that range.
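As a quick check on the pitch, here is a minimal sketch that computes equal-tempered frequencies. It assumes the usual A4 = 440 Hz tuning, and the function name `note_frequency` is made up for illustration.

```python
def note_frequency(semitones_from_a4: int, a4_hz: float = 440.0) -> float:
    """Frequency of the note a given number of semitones above (or below) A4,
    assuming 12-tone equal temperament."""
    return a4_hz * 2 ** (semitones_from_a4 / 12)

e4 = note_frequency(-5)  # the E just above middle C
e5 = note_frequency(7)   # the E one octave higher
print(f"E4 = {e4:.1f} Hz, E5 = {e5:.1f} Hz")
```

Under that assumption only the higher E falls in the 600 to 800 Hz range.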

## Rhythm

Officially a dash is three times as long as a dot. But there’s also a space equal to the length of a dot between parts of a letter. So the sheet music above would be more accurate if you imagined all the sixteenth notes are staccato and the dotted eighth notes are really eighth notes followed by a sixteenth rest.

This doesn’t make much difference because individual operators have varying “fists,” styles of sending Morse code, and won’t exactly follow the official length and spacing rules.
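The official timing can be sketched in a few lines of Python. This is only an illustration of the rule described above (a dash is three dot-lengths, with a one-dot gap between parts of a letter); the function name `element_timeline` is hypothetical.

```python
def element_timeline(code: str) -> list[tuple[str, int]]:
    """Turn a string of '.' and '-' into (state, units) pairs,
    where one unit is the length of a dot."""
    timeline = []
    for i, symbol in enumerate(code):
        if symbol not in ".-":
            raise ValueError(f"unexpected symbol: {symbol!r}")
        if i > 0:
            timeline.append(("off", 1))  # one-dot gap between parts of a letter
        timeline.append(("on", 1 if symbol == "." else 3))
    return timeline

print(element_timeline("-.-"))  # the letter K
```

Each "on" of three units corresponds to an eighth note followed by nothing, and each "off" to a sixteenth rest, in the notation used above.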

You could rewrite the music above as follows, but it’s all an approximation.

## Tempo

According to Wikipedia, “the dit length at 20 words per minute is 50 milliseconds.” So if a sixteenth note has a duration of 50 milliseconds, this would mean five quarter notes per second, or 300 beats per minute. But according to this video, the shortest duration people can distinguish is about 50 milliseconds.

That would imply that copying Morse code at 20 wpm is pushing the limits of human hearing. But copying at 20 wpm is common. Some people can copy Morse code at 50 words per minute or more, but at that speed they’re not hearing individual dits and dahs. An H, for example, four dits in a row, sounds like a single rough sound. In fact, they’re not really hearing letters at all but recognizing the shape of words.
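The tempo arithmetic can be written out as a small Python sketch. It assumes, as in the score above, that a dit is a sixteenth note and that the quarter note gets the beat; `tempo_bpm` is a made-up helper name.

```python
def tempo_bpm(dit_ms: float, dits_per_beat: int = 4) -> float:
    """Beats per minute when one dit lasts dit_ms milliseconds and a beat
    (here a quarter note) spans dits_per_beat dits (sixteenth notes)."""
    beat_ms = dit_ms * dits_per_beat
    return 60_000 / beat_ms

print(tempo_bpm(50))  # 50 ms dits
```

With 50 ms dits a beat lasts 200 ms, which gives the 300 beats per minute mentioned above.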

## How the image was made

I made the image above with LaTeX and Lilypond.

Adding the letters above each measure was kind of a hack. I used rehearsal markings to label the measures, but there was one problem: the software skips from letter H to letter J. That meant that the labels I and all subsequent letters were one ahead of what they should be, and the final letter Z was labeled AA. I tried several tricks, and Lilypond steadfastly refused to label a measure with ‘I’ even though I’ve seen such a label in the documentation.

My way around this was to make it label two consecutive measures with H, then in image editing software I turned the second H into an I. No doubt there’s a better way, but this worked.

I may play around with this and try to improve it a bit. If you have any suggestions, particularly related to Lilypond, please let me know.


[1] You could think of the musical score above as a sort of transcription of the Farnsworth method of teaching Morse code. Students learn the letters at full speed, but with extra space between the letters at first. The faster speed discourages consciously counting the dits and dahs, forcing the student to listen to the overall rhythm of the letters.

# LaTeX and Lawyers

Lawyers write Word documents and mathematicians write LaTeX documents. Of course this makes collaboration awkward, but there are ways to make it better.

One solution is to simply use Word. People who use LaTeX probably know how to use Word, even if they’d rather not, and asking someone else to learn LaTeX is a non-starter. So if I’m coauthoring a document with a lawyer, I’ll use Word.

If I’m writing a report that a lawyer needs to review, I’ll use LaTeX. Using different programs actually helps because it makes a clear distinction between copy editing feedback and authorial responsibility.

This post will give a couple tips for writing reports in LaTeX to be delivered to a lawyer, one trivial and one not quite trivial.

The trivial tip is that \S produces the section sign § (U+00A7) common in legal documents but not so common elsewhere. The not so trivial tip is that the enumitem package lets you change the default labels that LaTeX uses with enumerated items.

## Changing enumerated item labels

LaTeX was designed under the assumption that the user wants to focus on logical structure and leave the formatting up to the typesetting program. Consistent with this design philosophy, nested enumerated lists are simply wrapped with \begin{enumerate} and \end{enumerate}, and individual list items are marked with \item, regardless of the level of nesting. LaTeX takes responsibility for displaying different labels at different levels: Arabic numerals for top-level lists, lowercase letters for the next level of list, etc.

When you’re quoting legal documents, however, you don’t want to simply preserve the logical structure of (nested) lists; you want to preserve the labels as well.

Suppose you have the following nested list.

    \begin{enumerate}
      \item First top-level item
      \item Second top-level item
      \begin{enumerate}
        \item A sub-item
        \item Another sub-item
        \begin{enumerate}
          \item A third-level item
          \item Another third-level item
          \begin{enumerate}
            \item Four levels in
          \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}


By default, LaTeX will format this as follows.

But suppose in order to match another document you need the labels to progress as (a), (1), (A), and (i). The following LaTeX code will accomplish this.

    % requires \usepackage{enumitem} in the preamble
    \begin{enumerate}[label={(\alph*)}]
      \item First top-level item
      \item Second top-level item
      \begin{enumerate}[label={(\arabic*)}]
        \item A sub-item
        \item Another sub-item
        \begin{enumerate}[label={(\Alph*)}]
          \item A third-level item
          \item Another third-level item
          \begin{enumerate}[label={(\roman*)}]
            \item Four levels in
          \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}


This produces the following.

Note the parentheses in the labels above. You can remove one or both, replace them with square brackets, add periods, etc. as the following example shows.

    \begin{enumerate}[label={\alph*)}]
      \item First top-level item
      \item Second top-level item
      \begin{enumerate}[label={\arabic*.}]
        \item A sub-item
        \item Another sub-item
        \begin{enumerate}[label={(\Alph*)}]
          \item A third-level item
          \item Another third-level item
          \begin{enumerate}[label={[\roman*]}]
            \item Four levels in
          \end{enumerate}
        \end{enumerate}
      \end{enumerate}
    \end{enumerate}


Here’s what this looks like when compiled.

# Rotating symbols in LaTeX

Linear logic uses an unusual symbol, an ampersand rotated 180 degrees, for multiplicative disjunction.

The symbol is U+214B in Unicode.

I was looking into how to produce this character in LaTeX when I found that the package cmll has two commands that produce this character, one semantic and one descriptive: \parr and \invamp [1].

This got me to wondering how you might create a symbol like the one above if there wasn’t one built into a package. You can do that by using the graphicx package and the \rotatebox command. Here’s how you could roll your own par operator:

    \rotatebox[origin=c]{180}{\&}

There’s a backslash in front of the & because it’s a special character in LaTeX. If you wanted to rotate a K, for example, there would be no need for a backslash.

The \rotatebox command can rotate any number of degrees, and so you could rotate an ampersand 30° with

    \rotatebox[origin=c]{30}{\&}

to produce a tilted ampersand.


[1] The name \parr comes from the fact that the operator is sometimes pronounced “par” in linear logic. (It’s not simply \par because LaTeX already has a command \par for inserting a paragraph break.)

The name \invamp is short for “inverse ampersand.” Note however that the symbol is not an inverted ampersand in the sense of being a reflection; it is an ampersand rotated 180°.

# Including a little Hebrew in an English LaTeX document

I was looking up how to put a little Hebrew inside a LaTeX document and ran across a good answer on tex.stackexchange. Short answer: use the cjhebrew package.

In a nutshell, you put your Hebrew text between \< and > using the cjhebrew package’s transliteration. You write left-to-right, and the text will appear right-to-left. For example, \<'lp> produces

using ' for א, l for ל, and p for ף.

The code for each Hebrew letter is its English transliteration, with three footnotes.

First, when two Hebrew letters roughly correspond to the same English letter, one form may have a dot in front of it. For example, ט and ת both make a t sound; the former is encoded as .t and the latter as t.

Second, five Hebrew letters have a different form when used at the end of a word [1]. For such letters the final form is the capitalized value of the regular form. For example, פ and its final form ף are denoted by p and P respectively. The package will automatically choose between regular and final forms, but you can override this by using the capital letter in the middle of a word or by using a | after a regular form at the end of a word.

Finally, the letter ש is written as /s. The author already used s for ס and .s for צ, so he needed a new symbol to encode a third letter corresponding to s [2]. Also ש has a couple of other forms. The letter can make either the sh or s sound, and you may see dots on top of the letter to distinguish these. The cjhebrew package uses +s for ש with a dot on the top right, the sh sound, and ,s for ש with a dot on the top left, the s sound.

Here is the complete consonant transliteration table from the cjhebrew documentation.

Note that the code for א is a single quote ' and the code for ע is a back tick (grave accent) `.

You can also add vowel points (niqqudim). These are also represented by their transliteration to English sounds, with one exception. The sh’va is either silent or represents a schwa sound, so there’s no convenient transliteration. But the sh’va looks like a colon, so it is represented by a colon. See the package documentation for more details.


[1] You may have seen something similar in Greek with sigma σ and final sigma ς. Even English had something like this. For example, people used to use a different form of t at the end of a word when writing cursive. My mother wrote this way.

[2] It would be more phonetically faithful to transliterate צ as ts, but that would make the LaTeX package harder to implement since it would have to disambiguate whether ts represents צ or תס.

# LaTeX command frequencies

In the previous post I presented a bash one-liner to search directories for LaTeX files and count the commands used.
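For readers who would rather not use bash, here is a rough Python sketch of the same idea. It is not the one-liner from the previous post: the regular expression, the `.tex` file pattern, and the function names are all assumptions, and it makes no attempt to skip comments or verbatim text.

```python
import re
from collections import Counter
from pathlib import Path

# A LaTeX command is approximated as a backslash followed by letters.
COMMAND = re.compile(r"\\[A-Za-z]+")

def count_commands(text: str) -> Counter:
    """Count command names appearing in a string of LaTeX source."""
    return Counter(COMMAND.findall(text))

def count_commands_in_tree(root: str) -> Counter:
    """Add up command counts over every .tex file under a directory."""
    totals = Counter()
    for path in Path(root).rglob("*.tex"):
        totals.update(count_commands(path.read_text(errors="ignore")))
    return totals

sample = r"\begin{document} $\int_\Omega f \, d\mu$, $x \in \Omega$ \end{document}"
for command, n in count_commands(sample).most_common(3):
    print(n, command)
```

Calling `count_commands_in_tree(".").most_common(10)` would give a top-ten list like the ones below for the current directory.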

## College files

I first tried this out on a directory that included some old files from grad school. I chose this directory because I knew it had a lot of LaTeX files, but I was surprised at the results. Here were the top 10 results:

1. \Omega
2. \partial
3. \bf
4. \in
5. \mu
6. \real
7. \int
8. \item
9. \alpha
10. \end

I was very surprised that the top command was \Omega. I expected maybe the integral command \int would come out on top.

The notes contain a lot of integrals, but these integrals were often over a domain Ω. The set inclusion command \in also appears frequently, probably in the context of saying x ∈ Ω.

The \partial came up frequently because I used it in two contexts. First, I was studying partial differential equations, so I used the symbol for partial derivative a lot. Second, I used ∂ to denote the boundary of a domain, as in ∂Ω.

You might notice that \end made the list above but \begin didn’t. Sounds like an error if LaTeX files have more \end statements than \begin statements. The reason is that I used to have an include file that had lots of macros and ended with \begin{document}. That saved a few keystrokes, but now I think such asymmetry is bad form. In the search described below, there are exactly the same number of \begin and \end statements.

## Client files

When I looked at the command frequencies in a directory containing some client work, I got very different frequencies. Here were the top commands in that directory.

1. \hline
2. \item
3. \end
4. \begin
5. \frac
6. \xi
7. \phi
8. \lambda
9. \texttt
10. \partial

I suppose \hline is at the top because the files contained a lot of tables. It makes sense that \item, \begin, \end and \frac were near the top because those are common LaTeX commands. I don’t remember what I was working on that used the symbol ξ so much.

When I first thought about this post I thought I could get a feel for what commands are used frequently in LaTeX in general. I started with my own files because they’re at hand, but the results say more about my usage of LaTeX than about LaTeX in general.

## Other collections

I imagine if you were to look at the frequency statistics for a large corpus, such as the articles submitted to a given math journal, the results would still depend somewhat on the journal: you’re going to see \int for integral more often in an analysis journal than in an algebra journal, etc.

If you run the code from the previous post on some collection of LaTeX files and get some interesting results, leave a comment describing what you found.

# Typesetting zodiac symbols in LaTeX

Typesetting zodiac symbols in LaTeX is admittedly an unusual thing to do. LaTeX is mostly used for scientific publication, and zodiac symbols are commonly associated with astrology. But occasionally zodiac symbols are used in more respectable contexts.

The wasysym package for LaTeX includes miscellaneous symbols, including zodiac symbols. Here are the symbols, their LaTeX commands, and their corresponding Unicode code points.

The only surprise here is that the command for Capricorn is based on the Latin form of the name: \capricornus.

Each zodiac sign is used to denote a 30° region of the sky. Since the Unicode symbols are consecutive, you can compute the code point of a symbol from the longitude angle θ in degrees:

$$9800 + \left\lfloor \frac{\theta}{30} \right\rfloor$$

Here 9800 is the decimal form of 0x2648, and the half brackets are the floor symbol, i.e. round down to the nearest integer.
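The code point calculation is easy to try out in Python. This is a small sketch; the name `zodiac_symbol` is made up, and reducing the angle mod 360 is an added assumption so that angles outside 0° to 360° still land on a sign.

```python
import math

ARIES = 0x2648  # 9800 in decimal

def zodiac_symbol(theta_degrees: float) -> str:
    """Return the zodiac character for an ecliptic longitude in degrees."""
    # Reduce to [0, 360), split into twelve 30-degree sectors.
    index = math.floor((theta_degrees % 360) / 30) % 12
    return chr(ARIES + index)

for theta in (0, 45, 359):
    symbol = zodiac_symbol(theta)
    print(theta, symbol, f"U+{ord(symbol):04X}")
```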

Here’s the LaTeX code that produced the table.

    \documentclass{article}
    \usepackage{wasysym}
    \begin{document}

    \begin{table}
    \begin{tabular}{lll}
    \aries       & \verb|\aries       | & U+2648 \\
    \taurus      & \verb|\taurus      | & U+2649 \\
    \gemini      & \verb|\gemini      | & U+264A \\
    \cancer      & \verb|\cancer      | & U+264B \\
    \leo         & \verb|\leo         | & U+264C \\
    \virgo       & \verb|\virgo       | & U+264D \\
    \libra       & \verb|\libra       | & U+264E \\
    \scorpio     & \verb|\scorpio     | & U+264F \\
    \sagittarius & \verb|\sagittarius | & U+2650 \\
    \capricornus & \verb|\capricornus | & U+2651 \\
    \aquarius    & \verb|\aquarius    | & U+2652 \\
    \pisces      & \verb|\pisces      | & U+2653 \\
    \end{tabular}
    \end{table}
    \end{document}


By the way, you can use the Unicode values in HTML by replacing U+ with &#x and adding a semicolon on the end.

# Regular expressions and special characters

Special characters make text processing more complicated because you have to pay close attention to context. If you’re looking at Python code containing a regular expression, you have to think about what you see, what Python sees, and what the regular expression engine sees. A character may be special to Python but not to regular expressions, or vice versa.

This post goes through an example in detail that shows how to manage special characters in several different contexts.

## Escaping special TeX characters

I recently needed to write a regular expression [1] to escape TeX special characters. I’m reading in text like ICD9_CODE and need to make that ICD9\_CODE so that TeX will understand the underscore to be a literal underscore, and not a subscript instruction.

Underscore isn’t the only special character in TeX. It has ten special characters:

    \ { } $ & # ^ _ % ~

The two that people most commonly stumble over are probably $ and % because these are fairly common in ordinary prose. Since % begins a comment in TeX, importing a percent sign without escaping it will fail silently. The result is syntactically valid. It just effectively cuts off the remainder of the line.

So whenever my script sees a TeX special character that isn’t already escaped, I’d like it to escape it.

## Raw strings

First I need to tell Python what the special characters are for TeX:

    special = r"\\{}$&#^_%~"

There’s something interesting going on here. Most of the characters that are special to TeX are not special to Python. But backslash is special to both. Backslash is also special to regular expressions. The r prefix in front of the quotes tells Python this is a “raw” string and that it should not interpret backslashes as special. It’s saying “I literally want a string that begins with two backslashes.”

Why two backslashes? Wouldn’t one do? We’re about to use this string inside a regular expression, and backslashes are special there too. More on that shortly.

## Lookbehind

Here’s my regular expression:

    re.sub(r"(?<!\\)([" + special + "])", r"\\\1", line)

I want special characters that have not already been escaped, so I’m using a negative lookbehind pattern. Negative lookbehind expressions begin with (?<! and end with ). So if, for example, I wanted to look for the string “ball” but only if it’s not preceded by “charity” I could use the regular expression

    (?<!charity )ball

This expression would match “foot ball” or “foosball” but not “charity ball”.

Our lookbehind expression is complicated by the fact that the thing we’re looking back for is a special character. We’re looking for a backslash, which is a special character for regular expressions [2].

After looking behind for a backslash and making sure there isn’t one, we look for our special characters. The reason we used two backslashes in defining the variable special is so the regular expression engine would see two backslashes and interpret that as one literal backslash.

## Captures

The second argument to re.sub tells it what to replace its match with. We put parentheses around the character class listing TeX special characters because we want to capture it to refer to later. Captures are referred to by position, so the first capture is \1, the second is \2, etc.

We want to tell re.sub to put a backslash in front of the first capture. Since backslashes are special to the regular expression engine, we send it \\ to represent a literal backslash. When we follow this with \1 for the first capture, the result is \\\1 as above.

## Testing

We can test the code above with the following.

    line = r"a_b $200 {x} %5 x\y"

and get

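    a\_b \$200 \{x\} \%5 x\\y

which would cause TeX to produce output that looks like a_b $200 {x} %5 x\y. (One caveat: in LaTeX, \\ is a line break rather than a literal backslash, so a literal backslash would need \textbackslash instead.)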

Note that we used a raw string for our test case. That was only necessary for the backslash near the end of the string. Without that we could have dropped the r in front of the opening quote.
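Putting the pieces together, a self-contained version of the test might look like the following. The wrapper function `escape_tex` and the precompiled `pattern` are additions for illustration; the regular expression and replacement string are the ones from the post.

```python
import re

# Characters that are special to TeX; the doubled backslash is for the regex engine.
special = r"\\{}$&#^_%~"

# A special character not immediately preceded by a backslash, captured as group 1.
pattern = re.compile(r"(?<!\\)([" + special + "])")

def escape_tex(line: str) -> str:
    """Put a backslash in front of each unescaped TeX special character."""
    return pattern.sub(r"\\\1", line)

line = r"a_b $200 {x} %5 x\y"
print(escape_tex(line))
```

One thing to watch: because backslash is itself in the character class, a backslash that is already part of an escape such as `\%` gets doubled even though the `%` after it is left alone. If the input may be partly escaped already, it may be worth dropping the backslash from `special`.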

## P.S. on raw strings

Note that you don’t have to use raw strings. You could just escape your special characters with backslashes. But we’ve already got a lot of backslashes here. Without raw strings we’d need even more. Without raw strings we’d have to say

    special = "\\\\{}$&#^_%~"

starting with four backslashes to send Python two to send the regular expression engine one.
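A quick way to convince yourself that the raw and non-raw spellings describe the same string is to compare them directly. This is just a sanity check, not part of the original script.

```python
import re

raw = r"\\{}$&#^_%~"       # raw string: two backslashes as typed
plain = "\\\\{}$&#^_%~"    # ordinary string: four typed, two stored

print(raw == plain)        # both spellings give the same 11-character string
print(len(raw))

# Inside a character class, the two stored backslashes match one literal backslash.
print(bool(re.fullmatch("[" + raw + "]", "\\")))
```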


[1] Whenever I write about using regular expressions someone will complain that my solution isn’t completely general and that they can create input that will break my code. I understand that, but it works for me in my circumstances. I’m just writing scripts to get my work done, not claiming to have written hardened production software for anyone else to use.

[2] Keep context in mind. We have three languages in play: TeX, Python, and regular expressions. One of the keys to understanding regular expressions is to see them as a small language embedded inside other languages like Python. So whenever you hear a character is special, ask yourself “Special to whom?”. It’s especially confusing here because backslash is special to all three languages.

# Trademark symbol, LaTeX, and Unicode

Earlier this year I was a coauthor on a paper about the Cap Score™ test for male fertility from Androvia Life Sciences [1]. I just noticed today that when I added the publication to my CV, it caused some garbled text to appear in the PDF.

Here is the corresponding LaTeX source code.

## Fixing the LaTeX problem

There were two problems: the trademark symbol and the non-printing symbol denoted by a red underscore in the source file. The trademark was a non-ASCII character (Unicode U+2122) and the underscore represented a non-printing character, a no-break space (U+00A0). At first I only noticed the trademark symbol, and I fixed it by including a LaTeX package to allow Unicode characters:

    \usepackage[utf8x]{inputenc}

An alternative fix, one that doesn’t require including a new package, would be to replace the trademark Unicode character with \texttrademark\ followed by a space. Note the trailing backslash. Without it, TeX would swallow the space after the command and there would be no space after the trademark symbol. The problem with the unprintable character would remain, but the character could just be deleted.

I found out there are two Unicode code points that render the trademark glyph, U+0099 and U+2122. The former is in the Latin 1 Supplement section and is officially a control character. The correct code point for the trademark symbol is the latter. Unicode files U+2122 under Letterlike Symbols and gives it the official name TRADE MARK SIGN.
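Python's `unicodedata` module is handy for this kind of detective work. The sketch below looks up the characters discussed here and lists the non-ASCII characters in a line of text; the helper name `non_ascii` and the sample string are made up. The last line shows one plausible reason U+0099 can show up as a trademark sign, namely that byte 0x99 is the trademark symbol in the Windows-1252 encoding, though that is a guess about the cause and not something established above.

```python
import unicodedata

for ch in ("\u2122", "\u0099", "\u00a0"):
    name = unicodedata.name(ch, "<no name>")  # control characters have no name
    print(f"U+{ord(ch):04X}  category={unicodedata.category(ch)}  name={name}")

def non_ascii(text: str) -> list[tuple[int, str]]:
    """Positions and code points of every non-ASCII character in text."""
    return [(i, f"U+{ord(c):04X}") for i, c in enumerate(text) if ord(c) > 127]

print(non_ascii("Cap Score\u2122\u00a0test"))

# Byte 0x99 decodes to the trademark sign under Windows-1252.
print(b"\x99".decode("cp1252") == "\u2122")
```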


[1] Jay Schinfeld, Fady Sharara, Randy Morris, Gianpiero D. Palermo, Zev Rosenwaks, Eric Seaman, Steve Hirshberg, John Cook, Cristina Cardona, G. Charles Ostermeier, and Alexander J. Travis. Cap-Score™ Prospectively Predicts Probability of Pregnancy, Molecular Reproduction and Development. To appear.

# Typesetting modal logic

Modal logic extends propositional logic with two new operators, □ (“box”) and ◇ (“diamond”). There are many interpretations of these two symbols, the most common being necessity and possibility respectively. That is, □p means the proposition p is necessary, and ◇p means that p is possible. Another interpretation is using the symbols to represent things a person knows to be true and things that may be true as far as that person knows.

There are also many axiom systems for inference concerning these operators. For example, some axiom systems include the rule

$$\Box p \rightarrow \Box \Box p$$

and some do not. If you interpret □ as saying a proposition is provable, this axiom says whatever is provable is provably provable, which makes sense. But if you take □ to be a statement about what an agent knows, you may not want to say that if an agent knows something, it knows that it knows it.

See the next post for an example of applying logic to security, a logic with lots of modal operators and axioms. But for now, we’ll focus on how to typeset the box and diamond operators.

## LaTeX

In LaTeX, the most obvious commands would be \box and \diamond, but that doesn’t work. The \box command is a TeX primitive for working with box registers, not a symbol, though there is a \square command. And although there is a \diamond command, it produces a symbol much smaller than \square and so the two look odd together. The two operators are dual in the sense that

$$\Box p \leftrightarrow \neg \Diamond \neg p \qquad \Diamond p \leftrightarrow \neg \Box \neg p$$

and so they should have symbols of similar size. A better approach is to use \Box and \Diamond. Those were used in the displayed equations above. These symbols are in the amsfonts package.

## Unicode

There are many box-like and diamond-like symbols in Unicode. It seems reasonable to use U+25A1 for box and U+25C7 for diamond. I don’t know of any more semantically appropriate characters. There are no Unicode characters with “modal” in their name, for example.

## HTML

You can always insert Unicode characters into HTML by using &#x, followed by the hexadecimal value of the codepoint, followed by a semicolon. For example, I typed &#x25a1; and &#x25c7; to enter the box and diamond symbols above.

If you want to stick to HTML entities because they’re easier to remember, you’re mostly out of luck. There is no HTML entity for the box operator. There is an entity &loz; for “lozenge,” the typographical term for a diamond. This HTML entity corresponds to U+25CA and is smaller than U+25C7 recommended above. As discussed in the context of LaTeX, you want the box and diamond operators to have a similar size.
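If you want to double check which character an entity or numeric reference stands for, Python's standard `html` module can decode it. The helper `numeric_entity` below is a made-up name for the "&#x, hex value, semicolon" recipe described above.

```python
import html

def numeric_entity(ch: str) -> str:
    """Hexadecimal numeric character reference for a single character."""
    return f"&#x{ord(ch):x};"

box, diamond = "\u25a1", "\u25c7"
print(numeric_entity(box), numeric_entity(diamond))

# Decoding goes the other way; the named entity &loz; maps to U+25CA.
print(f"U+{ord(html.unescape('&loz;')):04X}")
```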