Zareba, asterism, and dinkus

Occasionally I will use a row of three asterisks to indicate the separation between the body of a blog post and the footnotes, as I did in my previous post.

***

This page quotes a typesetter who said she calls this a zareba. The OED defines a zareba as a kind of cattle pen but does not mention its use as a typographical term.

According to Wikipedia, a row of three asterisks is a particular kind of dinkus, punctuation used to signify a section break. My guess is that the word is related to dingus and dingbat but I haven’t been able to find an etymology. The OED does not have an entry for dinkus.

This article attests to both zareba and dinkus as defined above.

Apparently the plural of dinkus is dinkuses.

When the three stars are arranged in a triangle, as above, this is called an asterism. This term is less obscure than zareba or dinkus. It has an OED entry and Unicode code point (U+2042). In case your browser didn’t display the asterism above as a character, the following another one as an image.

It’s my impression that the zareba symbol is more common than the asterism symbol, but the word asterism is more common than the word zareba, at least in typographical usage.

There is no command for the asterism in The Comprehensive LaTeX Symbol List, but the document does explain how to create your own, citing the following solution by Peter Flynn.

    \newcommand{\asterism}{
        \smash{
            \raisebox{-.5ex}{
                \setlength{\tabcolsep}{-.5pt}
                \begin{tabular}{@{}cc@{}}
                    \multicolumn2c*\\[-2ex]*&*
                \end{tabular}
            }
        }
    }

If you are using XeTeX you can just insert the Unicode character U+2042 directly, but the command above will work with plain TeX or LaTeX.

Rotating symbols in LaTeX

Linear logic uses an unusual symbol, an ampersand rotated 180 degrees, for multiplicative disjunction.

\invamp

The symbol is U+214B in Unicode.

I was looking into how to produce this character in LaTeX when I found that the package cmll has two commands that produce this character, one semantic and one descriptive: \parr and \invamp [1].

This got me to wondering how you might create a symbol like the one above if there wasn’t one built into a package. You can do that by using the graphicx package and the \rotatebox command. Here’s how you could roll your own par operator:

    \rotatebox[origin=c]{180}{\&}

There’s a backslash in front of the & because it’s a special character in LaTeX. If you wanted to rotate a K, for example, there would be no need for a backslash.

The \rotatebox command can rotate any number of degrees, and so you could rotate an ampersand 30° with

    \rotatebox[origin=c]{30}{\&}

to produce a tilted ampersand.

\invamp

Related posts

[1] The name \parr comes from the fact that the operator is sometimes pronounced “par” in linear logic. (It’s not simply \par because LaTeX already has a command \par for inserting a paragraph break.)

The name \invamp is short for “inverse ampersand.” Note however that the symbol is not an inverted ampersand in the sense of being a reflection; it is an ampersand rotated 180°.

Entering symbols in Emacs

Emacs has a relatively convenient way to add accents to letters or to insert a Unicode character if you know the code point for the value. See these notes.

But usually you don’t know the Unicode values of symbols. Then what do you do?

TeX commands

You enter symbols by typing their corresponding TeX commands by using

    M-x set-input-method RET tex

After doing that, you could, for example, enter π by typing \pi.

You’ll see the backslash as you type the command, but once you finish you’ll see the symbol instead [1].

HTML entities

You may know the HTML entity for a symbol and want to use that to enter characters in Emacs. Unfortunately, the following does NOT work.

    M-x set-input-method RET html

However, there is a slight variation on this that DOES work:

    M-x set-input-method RET sgml

Once you’ve set your input method to sgml, you could, for example, type √ to insert a √ symbol.

Why SGML rather than HTML?

HTML was created by simplifying SGML (Standard Generalized Markup Language). Emacs is older than HTML, and so maybe Emacs supported SGML before HTML was written.

There may be some useful SGML entities that are not in HTML, though I don’t know. I imagine these days hardly anyone knows anything about SGML beyond the subset that lives on in HTML and XML.

Changing input modes

If you want to move between your default input mode and TeX mode, you can use the command toggle-input-method. This is usually mapped to C-u C-\.

You can see a list of all available input methods with list-input-methods. Most of these are spoken languages, such as Arabic or Welsh, rather than technical input modes like TeX and SGML.

More Emacs posts

[1] I suppose there could be a problem if one command were a prefix of another. That is, if there were symbols \foo and \foobar and you intended to insert the latter, Emacs might think you’re done after you’ve typed the former. But I can’t think of a case where that would happen. TeX commands are nearly prefix codes. There are TeX commands like \tan and \tanh, but these don’t represent symbols per se. Emacs doesn’t need any help to insert the letters “tan” or “tanh” into a file.

Typesetting zodiac symbols in LaTeX

Typesetting zodiac symbols in LaTeX is admittedly an unusual thing to do. LaTeX is mostly used for scientific publication, and zodiac symbols are commonly associated with astrology. But occasionally zodiac symbols are used in more respectable contexts.

The wasysym package for LaTeX includes miscellaneous symbols, including zodiac symbols. Here are the symbols, their LaTeX commands, and their corresponding Unicode code points.

The only surprise here is that the command for Capricorn is based on the Latin form of the name: \capricornus.

Each zodiac sign is used to denote a 30° region of the sky. Since the Unicode symbols are consecutive, you can compute the code point of a symbol from the longitude angle θ in degrees:

9800 + \left\lfloor \frac{\theta}{30} \right\rfloor

Here 9800 is the decimal form of 0x2648, and the half brackets are the floor symbol, i.e. round down to the nearest integer.

Here’s the LaTeX code that produced the table.

\documentclass{article}
\usepackage{wasysym}
\begin{document}

\begin{table}
\begin{tabular}{lll}
\aries       & \verb|\aries       | & U+2648 \\
\taurus      & \verb|\taurus      | & U+2649 \\
\gemini      & \verb|\gemini      | & U+264A \\
\cancer      & \verb|\cancer      | & U+264B \\
\leo         & \verb|\leo         | & U+264C \\
\virgo       & \verb|\virgo       | & U+264D \\
\libra       & \verb|\libra       | & U+264E \\
\scorpio     & \verb|\scorpio     | & U+264F \\
\sagittarius & \verb|\sagittarius | & U+2650 \\
\capricornus & \verb|\capricornus | & U+2651 \\
\aquarius    & \verb|\aquarius    | & U+2652 \\
\pisces      & \verb|\pisces      | & U+2653 \\
\end{tabular}
\end{table}
\end{document}

By the way, you can use the Unicode values in HTML by replacing U+ with &#x and adding a semicolon on the end.

More LaTeX posts

Trademark symbol, LaTeX, and Unicode

Earlier this year I was a coauthor on a paper about the Cap Score™ test for male fertility from Androvia Life Sciences [1]. I just noticed today that when I added the publication to my CV, it caused some garbled text to appear in the PDF.

garbled LaTeX output

Here is the corresponding LaTeX source code.

LaTeX source

Fixing the LaTeX problem

There were two problems: the trademark symbol and the non-printing symbol denoted by a red underscore in the source file. The trademark was a non-ASCII character (Unicode U+2122) and the underscore represented a non-printing (U+00A0). At first I only noticed the trademark symbol, and I fixed it by including a LaTeX package to allow Unicode characters:

    \usepackage[utf8x]{inputenc}

An alternative fix, one that doesn’t require including a new package, would be to replace the trademark Unicode character with \texttrademark\. Note the trailing backslash. Without the backslash there would be no space after the trademark symbol. The problem with the unprintable character would remain, but the character could just be deleted.

Trademark and Unicode

I found out there are two Unicode code points render the trademark glyph, U+0099 and U+2122. The former is in the Latin 1 Supplement section and is officially a control character. The correct code point for the trademark symbol is the latter. Unicode files U+2122 under Letterlike Symbols and gives it the official name TRADE MARK SIGN.

Related posts

[1] Jay Schinfeld, Fady Sharara, Randy Morris, Gianpiero D. Palermo, Zev Rosenwaks, Eric Seaman, Steve Hirshberg, John Cook, Cristina Cardona, G. Charles Ostermeier, and Alexander J. Travis. Cap-Score™ Prospectively Predicts Probability of Pregnancy, Molecular Reproduction and Development. To appear.

Typesetting modal logic

Modal logic extends propositional logic with two new operators, □ (“box”) and ◇ (“diamond”). There are many interpretations of these two symbols, the most common being necessity and possibility respectively. That is, □p means the proposition p is necessary, and ◇p means that p is possible. Another interpretation is using the symbols to represent things a person knows to be true and things that may be true as far as that person knows.

There are also many axiom systems for inference concerning these operators. For example, some axiom systems include the rule

\Box p \rightarrow \Box \Box p

and some do not. If you interpret □ as saying a proposition is provable, this axiom says whatever is provable is provably provable, which makes sense. But if you take □ to be a statement about what an agent knows, you may not want to say that if an agent knows something, it knows that it knows it.

See the next post for an example of applying logic to security, a logic with lots of modal operators and axioms. But for now, we’ll focus on how to typeset the box and diamond operators.

LaTeX

In LaTeX, the most obvious commands would be \box and \diamond, but that doesn’t work. There is no \box command, though there is a \square command. And although there is a \diamond command, it produces a symbol much smaller than \square and so the two look odd together. The two operators are dual in the sense that

\begin{align*} \Box p &= \neg \Diamond \neg p \\ \Diamond p &= \neg \Box \neg p \end{align*}

and so they should have symbols of similar size. A better approach is to use \Box and \Diamond. Those were used in the displayed equations above. These symbols are in the amsfonts package.

Unicode

There are many box-like and diamond-like symbols in Unicode. It seems reasonable to use U+25A1 for box and U+25C7 for diamond. I don’t know of any more semantically appropriate characters. There are no Unicode characters with “modal” in their name, for example.

HTML

You can always insert Unicode characters into HTML by using &#x, followed by the hexadecimal value of the codepoint, followed by a semicolon. For example, I typed □ and ◇ to enter the box and diamond symbols above.

If you want to stick to HTML entities because they’re easier to remember, you’re mostly out of luck. There is no HTML entity for the box operator. There is an entity ◊ for “lozenge,” the typographical term for a diamond. This HTML entity corresponds to U+25CA and is smaller than U+25C7 recommended above. As discussed in the context of LaTeX, you want the box and diamond operators to have a similar size.

Related posts

Fraktur symbols in mathematics

When mathematicians run out of symbols, they turn to other alphabets. Most math symbols are Latin or Greek letters, but occasionally you’ll run into Russian or Hebrew letters.

Sometimes math uses a new font rather than a new alphabet, such as Fraktur. This is common in Lie groups when you want to associate related symbols to a Lie group and its Lie algebra. By convention a Lie group is denoted by an ordinary Latin letter and its associated Lie algebra is denoted by the same letter in Fraktur font.

lower case alphabet in Fraktur

LaTeX

To produce Fraktur letters in LaTeX, load the amssymb package and use the command \mathfrak{}.

Symbols such as \mathfrak{A} are math symbols and can only be used in math mode. They are not intended to be a substitute for setting text in Fraktur font. This is consistent with the semantic distinction in Unicode described below.

Unicode

The Unicode standard tries to distinguish the appearance of a symbol from its semantics, though there are compromises. For example, the Greek letter Ω has Unicode code point U+03A9 but the symbol Ω for electrical resistance in Ohms is U+2621 even though they are rendered the same [1].

The letters a through z, rendered in Fraktur font and used as mathematical symbols, have Unicode values U+1D51E through U+1D537. These values are in the “Supplementary Multilingual Plane” and do not commonly have font support [2].

The corresponding letters A through Z are encoded as U+1D504 through U+1D51C, though interestingly a few letters are missing. The code point U+1D506, which you’d expect to be Fraktur C, is reserved. The spots corresponding to H, I, and R are also reserved. Presumably these are reserved because they are not commonly used as mathematical symbols. However, the corresponding bold versions U+1D56C through U+ID585 have no such gaps [3].

Footnotes

[1] At least they usually are. A font designer could choose provide different glyphs for the two symbols. I used the same character for both because some I thought some readers might not see the Ohm symbol properly rendered.

[2] If you have the necessary fonts installed you should see the alphabet in Fraktur below:
𝔞 𝔟 𝔠 𝔡 𝔢 𝔣 𝔤 𝔥 𝔦 𝔧 𝔨 𝔩 𝔪 𝔫 𝔬 𝔭 𝔮 𝔯 𝔰 𝔱 𝔲 𝔳 𝔴 𝔵 𝔶 𝔷

I can see these symbols from my desktop and from my iPhone, but not from my Android tablet. Same with the symbols below.

[3] Here are the bold upper case and lower case Fraktur letters in Unicode:
𝕬 𝕭 𝕮 𝕯 𝕰 𝕱 𝕲 𝕳 𝕴 𝕵 𝕶 𝕷 𝕸 𝕹 𝕺 𝕻 𝕼 𝕽 𝕾 𝕿 𝖀 𝖁 𝖂 𝖃 𝖄 𝖅
𝖆 𝖇 𝖈 𝖉 𝖊 𝖋 𝖌 𝖍 𝖎 𝖏 𝖐 𝖑 𝖒 𝖓 𝖔 𝖕 𝖖 𝖗 𝖘 𝖙 𝖚 𝖛 𝖜 𝖝 𝖞 𝖟

Five lemma, ASCII art, and Unicode

A few days ago I wrote about creating ASCII art in Emacs using ditaa. Out of curiosity, I wanted to try making the Five Lemma diagram. [1]

The examples in the ditaa site all have arrows between boxes, but you don’t have to have boxes.

Here’s the ditaa source:

A₀ ---> A₁ ---> A₂ ---> A₃ ---> A₄
|       |       |       |       |            
| f₀    | f₁    | f₂    | f₃    | f₄    
|       |       |       |       |      
v       v       v       v       v      
B₀ ---> B₁ ---> B₂ ---> B₃ ---> B₄

and here’s the image it produces:

Five lemma diagram

It’s not pretty. You could make a nicer image with LaTeX. But as the old saying goes, the remarkable thing about a dancing bear is not that it dances well but that it dances at all.

The trick to getting the subscripts is to use Unicode characters 0x208n for subscript n. As I noted at the bottom of this post, ditaa isn’t strictly limited to ASCII art. You can use Unicode characters as well. You may or may not be able to see the subscripts in the source code they are not part of the most widely supported set of characters.

* * *

[1]  The Five Lemma is a diagram-chasing result from homological algebra. It lets you infer properties the middle function f from properties of the other f‘s.

Graphemes

Here’s something amusing I ran across in the glossary of Programming Perl:

grapheme A graphene is an allotrope of carbon arranged in a hexagonal crystal lattice one atom thick. Grapheme, or more fully, a grapheme cluster string is a single user-visible character, which in turn may be several characters (codepoints) long. For example … a “ȫ” is a single grapheme but one, two, or even three characters, depending on normalization.

In case the character ȫ doesn’t display correctly for you, here it is:

Unicode character U_022B

First, graphene has little to do with grapheme, but it’s geeky fun to include it anyway. (Both are related to writing. A grapheme has to do with how characters are written, and the word graphene comes from graphite, the “lead” in pencils. The origin of grapheme has nothing to do with graphene but was an analogy to phoneme.)

Second, the example shows how complicated the details of Unicode can get. The Perl code below expands on the details of the comment about ways to represent ȫ.

This demonstrates that the character . in regular expressions matches any single character, but \X matches any single grapheme. (Well, almost. The character . usually matches any character except a newline, though this can be modified via optional switches. But \X matches any grapheme including newline characters.)

   
# U+0226, o with diaeresis and macron 
my $a = "\x{22B}"; 

# U+00F6 U+0304, (o with diaeresis) + macron 
my $b = "\x{F6}\x{304}";    
     
# o U+0308 U+0304, o + diaeresis + macron   
my $c = "o\x{308}\x{304}"; 

my @versions = ($a, $b, $c);

# All versions display the same.
say @versions;

# The versions have length 1, 2, and 3.
# Only $a contains one character and so matches .
say map {length $_ if /^.$/} @versions;

# All versions consist of one grapheme.
say map {length $_ if /^\X$/} @versions;

Which Unicode characters can you depend on?

Unicode is supported everywhere, but font support for Unicode characters is sparse. When you use any slightly uncommon character, you have no guarantee someone else will be able to see it.

I’m starting a Twitter account @MusicTheoryTip and so I wanted to know whether I could count on followers seeing music symbols. I asked whether people could see ♭ (flat, U+266D), ♮ (natural, U+266E), and ♯ (sharp, U+266F). Most people could see all three symbols, from desktop or phone, browser or Twitter app. However, several were unable to see the natural sign from an Android phone, whether using a browser or a Twitter app. One person said none of the symbols show up on his Blackberry.

[Update: I gave @MusicTheoryTip over to someone else, and they didn’t keep it up for long.]

I also asked @diff_eq followers whether they could see the math symbols ∂ (partial, U+2202), Δ (Delta, U+0394), and ∇ (gradient, U+2207). One person said he couldn’t see the gradient symbol, but the rest of the feedback was positive.

So what characters can you count on nearly everyone being able to see? To answer this question, I looked at the characters in the intersection of several common fonts: Verdana, Georgia, Times New Roman, Arial, Courier New, and Droid Sans. My thought was that this would make a very conservative set of characters.

There are 585 characters supported by all the fonts listed above. Most of the characters with code points up to U+01FF are included. This range includes the code blocks for Basic Latin, Latin-1 Supplement, Latin Extended-A, and some of Latin Extended-B.

The rest of the characters in the intersection are Greek and Cyrillic letters and a few scattered symbols. Flat, natural, sharp, and gradient didn’t make the cut.

There are a dozen math symbols included:

0x2202 ∂
0x2206 ∆
0x220F ∏
0x2211 ∑
0x2212 −
0x221A √
0x221E ∞
0x222B ∫
0x2248 ≈
0x2260 ≠
0x2264 ≤
0x2265 ≥

Interestingly, even in such a conservative set of characters, there are a three characters included for semantic distinction: the minus sign (i.e. not a hyphen), the difference operator (i.e. not the Greek letter Delta), and the summation operator (i.e. not the Greek letter Sigma).

And in case you’re interested, here’s the complete list of the Unicode characters in the intersection of the fonts listed here. (Update: Added notes to indicate the start of a new code block and listed some of the isolated characters.)

0x0009           Basic Latin
0x000d
0x0020 - 0x007e 
0x00a0 - 0x017f  Latin-1 supplement
0x0192	         
0x01fa - 0x01ff
0x0218 - 0x0219  
0x02c6 - 0x02c7  
0x02c9
0x02d8 - 0x02dd 
0x0300 - 0x0301 
0x0384 - 0x038a  Greek and Coptic
0x038c
0x038e - 0x03a1
0x03a3 - 0x03ce
0x0401 - 0x040c 
0x040e - 0x044f  Cyrillic
0x0451 - 0x045c
0x045e - 0x045f
0x0490 - 0x0491
0x1e80 - 0x1e85  Latin extended additional
0x1ef2 - 0x1ef3
0x200c - 0x200f  General punctuation
0x2013 - 0x2015
0x2017 - 0x201e
0x2020 - 0x2022
0x2026
0x2028 - 0x202e
0x2030
0x2032 - 0x2033
0x2039 - 0x203a
0x203c
0x2044
0x206a - 0x206f  
0x207f           
0x20a3 - 0x20a4  Currency symbols ₣ ₤
0x20a7           ₧
0x20ac           €
0x2105           Letterlike symbols ℅
0x2116           №
0x2122           ™
0x2126           Ω
0x212e           ℮
0x215b - 0x215e  ⅛ ⅜ ⅝ ⅞
0x2202 	         Mathematical operators ∂
0x2206           ∆
0x220f           ∏
0x2211 - 0x2212  ∑ −
0x221a           √
0x221e           ∞
0x222b           ∫
0x2248           ≈
0x2260           ≠
0x2264 - 0x2265  ≤ ≥
0x25ca           Box drawing ◊
0xfb01 - 0xfb02  Alphabetic presentation forms fi fl