Fractions in Unicode

There are Unicode characters for a few fractions, such as ½. This looks a little better than 1/2, depending on the context.

Here’s the Taylor series for log(1 + x) written in pure HTML:

log(1 + x) = x – ½x² + ⅓x³ – ¼x⁴ + ⅕x⁵ – ⋯

See this post for how the exponents were made.

Notice that the three dots ⋯ on the end are centered vertically, like \cdots in LaTeX. This was done with ⋯ (U+22EF).

Available fractions

The selection of available fraction number forms is small and a little strange.

There are characters for fractions with denominator d equal to 2, 3, 4, 5, 6, and 8, with numerators 1 through d-1, except for fractions that can be reduced.

If d = 7, 9, or 10, there’s a character for 1/d but not for fractions with numerators other than 1. For example, there is a character for ⅐ but not for 2/7.

HTML Entities

For denominators 2, 3, 4, 5, 6, and 8 the HTML entity for characters is easy: they all have the form

& frac <n> <d> ;

where n is the numerator and d is the denominator. For example, &frac35; is the HTML entity for ⅗.

There are no HTML entities for 1/7, 1/9, or 1/10.

Related posts

Number sets in HTML and Unicode

When I started blogging I was very cautious about what characters I used because browsers often didn’t have font support for uncommon characters. Things have changed since then and I’ve gotten less cautious. Nobody has complained, so I assume readers are seeing the characters I intend them to see.

There are Unicode characters for sets of numbers such as the integers and the real numbers, double-struck letters similar to the blackboard bold letters \mathbb{Z} etc. in LaTeX.

\mathbb{N} \quad \mathbb{Z} \quad \mathbb{Q} \quad \mathbb{R} \quad \mathbb{C} \quad \mathbb{H}

Here’s a table of the characters, their Unicode values, and two HTML entities associated with each.

    ℕ U+2115 &Nopf; &naturals;
    ℤ U+2124 &Zopf; &integers;
    ℚ U+211A &Qopf; &rationals;
    ℝ U+211D &Ropf; &reals;
    ℂ U+2102 &Copf; &complexes;
    ℍ U+210D &Hopf; &quaternions;

If you’re going to use these symbols, you will likely also need to use ∈ (U+2208, &in;) and ∉ (U+2209, &notin;).

More letters

If you want more letters in the style of those above, you can find them starting at U+1D538 for . However, the characters corresponding to letters above are reserved.

So for example, is U+1D538, is U+1D539, but U+1D53A is reserved and you must use ℂ (U+2102) instead.

One letter not mentioned above is ℙ (U+2119). It has HTML entities &Popf; and &primes;.

So the double-struck versions of C, H, N, P, Q, R, and Z are down in the BMP (Basic Multilingual Plane) and the rest are in the SMP (Supplementary Multilingual Plane). I suspect characters in the SMP are less likely to have font support, but that may not be a problem.

Unicode superscripts and subscripts

There are alternatives to using <sup> and <sub> tags for superscripts and subscripts in HTML. These alternatives may look better, depending on context, and they can be used in plain (Unicode) text files where HTML markup isn’t available.

Superscripts

When I started blogging I would use <sup>2</sup> and <sup>3</sup> for squares and cubes. Then somewhere along the way someone told me about &sup2; (U+00B2) and &sup3; (U+00B3) and I started using these. The superscript characters generally produce slightly smaller subscripts and look nicer in my opinion.

Example with sup tags:

a2 + b2 = c2

Example with superscript characters:

a² + b² = c²

But there are no characters for exponents larger than 3. Or so I thought until recently.

There are no HTML entities for exponents larger than 3, nothing written in notation analogous to &sup2; and &sup3;. There are Unicode characters for other superscripts up to 9, but they don’t have corresponding HTML entities.

The Unicode code point for superscript n is

2070hex + n

except for n = 2 or 3. For example, U+2075 is a superscript 5. So you could write x⁵ as

<em>x</em>&#x2075;.

Subscripts

There are also Unicode symbols for subscripts, though they don’t have corresponding HTML entities. The Unicode code point for superscript n = 0, 1, 2, … 9 is

2080hex + n

For example, U+2087 is a subscript 7.

The subscript characters don’t extend below the baseline as far as subscripts in <sub> tags do. Here are x‘s with subsubcripts in <sub> tags.

x0, x1, x2, x3, x4, x5, x6, x7, x8, x9

And here are the single character subscripts.

x₀, x₁, x₂, x₃, x₄, x₅, x₆, x₇, x₈, x

I think the former looks better, but subscripts in HTML paragraphs may increase the vertical spacing between lines. If consistent line spacing is more important than conventional subscripts, you might prefer the subscript characters.

Related posts

Unicode and Emoji, or The Giant Pawn Mystery

I generally despise emoji, but I reluctantly learned a few things about them this morning.

My latest couple blog posts involved chess, and I sent out a couple tweets using chess symbols. Along the way I ran into a mystery: sometimes the black pawn is much larger than other chess symbols. I first noticed this in Excel. Then I noticed that sometimes it happens in the Twitter app, and sometimes not, sometimes on the twitter web site, and sometimes not.

For example, the following screen shot is from Safari on iOS.

screenshot of tweet with giant pawn

What’s going on? I explained in a footnote to this post, but I wanted to make this its own post to make it easier to find in the future.

In a nutshell, something in the software environment is deciding that 11 of the twelve chess characters are to be taken literally, but the character for the black pawn is to be interpreted as an emojus [1] representing chess. I’m not clear on whether this is happening in the font or in an app. Probably one, both, or neither depending on circumstances.

I erroneously thought that emoji were all outside Unicode’s BMP (Basic Multilingual Plane) so as not to be confused with ordinary characters. Alas, that is not true.

Here is a full list of Unicode characters interpreted (by …?) as emoji. There are 210 emoji characters in the BMP and 380 outside, i.e. 210 below FFFF and 380 above FFFF.

***

[1] I know that “emoji” is a Japanese word, not a Latin word, but to my ear the singular of “emoji” should be “emojus.”

Hebrew letters spotted in applied math

Math and physics use Greek letters constantly, but seldom do they use letters from any other alphabet.

The only Cyrillic letter I recall seeing in math is sha (Ш, U+0428) for the so-called Dirc comb distribution.

One Hebrew letter is commonly used in math, and that’s aleph (א, U+05D0). Aleph is used fairly often, but other Hebrew letters are much rarer. If you see any other Hebrew letter in math, it’s very likely to be one of the next three letters: beth (ב, U+05D1), gimel (ג, U+05D2), or dalet (ד, U+05D3).

To back up this claim, basic LaTeX only has a command for aleph (unsurprisingly, it’s \aleph). AMS-LaTeX adds the commands \beth, \gimel, and \daleth, but no more. Those are the only Hebrew letters you can use in LaTeX without importing a package or using XeTeX so you can use Unicode symbols.

Not only are Hebrew letters rare in math, the only area of math that uses them at all is set theory, where they are used to represent transfinite numbers.

So in short, if you see a Hebrew letter in math, it’s overwhelmingly likely to be in set theory, and it’s very likely to be aleph, or possibly beth, gimel, or dalet.

But today I was browsing through Morse and Feschbach and was very surprised to see the following on page 324.

gimel = lambda ayin + mu yod + mu yod star

I’ve never seen a Hebrew letter in applied math, and I’ve never seen ayin (ע, U+05E2) or yod (י, U+05D9) used anywhere in math.

In context, the authors had used Roman letters, Fraktur letters, and Greek letters and so they’d run out of alphabets. The entity denoted by gimel is related to a tensor the authors denoted with g, so presumably they used the Hebrew letter that sounds like “g”. But I have no idea why they chose ayin or yod.

Related posts

Zareba, asterism, and dinkus

Occasionally I will use a row of three asterisks to indicate the separation between the body of a blog post and the footnotes, as I did in my previous post.

***

This page quotes a typesetter who said she calls this a zareba. The OED defines a zareba as a kind of cattle pen but does not mention its use as a typographical term.

According to Wikipedia, a row of three asterisks is a particular kind of dinkus, punctuation used to signify a section break. My guess is that the word is related to dingus and dingbat but I haven’t been able to find an etymology. The OED does not have an entry for dinkus.

This article attests to both zareba and dinkus as defined above.

Apparently the plural of dinkus is dinkuses.

When the three stars are arranged in a triangle, as above, this is called an asterism. This term is less obscure than zareba or dinkus. It has an OED entry and Unicode code point (U+2042). In case your browser didn’t display the asterism above as a character, the following another one as an image.

It’s my impression that the zareba symbol is more common than the asterism symbol, but the word asterism is more common than the word zareba, at least in typographical usage.

There is no command for the asterism in The Comprehensive LaTeX Symbol List, but the document does explain how to create your own, citing the following solution by Peter Flynn.

    \newcommand{\asterism}{
        \smash{
            \raisebox{-.5ex}{
                \setlength{\tabcolsep}{-.5pt}
                \begin{tabular}{@{}cc@{}}
                    \multicolumn2c*\\[-2ex]*&*
                \end{tabular}
            }
        }
    }

If you are using XeTeX you can just insert the Unicode character U+2042 directly, but the command above will work with plain TeX or LaTeX.

Rotating symbols in LaTeX

Linear logic uses an unusual symbol, an ampersand rotated 180 degrees, for multiplicative disjunction.

\invamp

The symbol is U+214B in Unicode.

I was looking into how to produce this character in LaTeX when I found that the package cmll has two commands that produce this character, one semantic and one descriptive: \parr and \invamp [1].

This got me to wondering how you might create a symbol like the one above if there wasn’t one built into a package. You can do that by using the graphicx package and the \rotatebox command. Here’s how you could roll your own par operator:

    \rotatebox[origin=c]{180}{\&}

There’s a backslash in front of the & because it’s a special character in LaTeX. If you wanted to rotate a K, for example, there would be no need for a backslash.

The \rotatebox command can rotate any number of degrees, and so you could rotate an ampersand 30° with

    \rotatebox[origin=c]{30}{\&}

to produce a tilted ampersand.

\invamp

Related posts

[1] The name \parr comes from the fact that the operator is sometimes pronounced “par” in linear logic. (It’s not simply \par because LaTeX already has a command \par for inserting a paragraph break.)

The name \invamp is short for “inverse ampersand.” Note however that the symbol is not an inverted ampersand in the sense of being a reflection; it is an ampersand rotated 180°.

Entering symbols in Emacs

Emacs has a relatively convenient way to add accents to letters or to insert a Unicode character if you know the code point for the value. See these notes.

But usually you don’t know the Unicode values of symbols. Then what do you do?

TeX commands

You enter symbols by typing their corresponding TeX commands by using

    M-x set-input-method RET tex

After doing that, you could, for example, enter π by typing \pi.

You’ll see the backslash as you type the command, but once you finish you’ll see the symbol instead [1].

HTML entities

You may know the HTML entity for a symbol and want to use that to enter characters in Emacs. Unfortunately, the following does NOT work.

    M-x set-input-method RET html

However, there is a slight variation on this that DOES work:

    M-x set-input-method RET sgml

Once you’ve set your input method to sgml, you could, for example, type &radic; to insert a √ symbol.

Why SGML rather than HTML?

HTML was created by simplifying SGML (Standard Generalized Markup Language). Emacs is older than HTML, and so maybe Emacs supported SGML before HTML was written.

There may be some useful SGML entities that are not in HTML, though I don’t know. I imagine these days hardly anyone knows anything about SGML beyond the subset that lives on in HTML and XML.

Changing input modes

If you want to move between your default input mode and TeX mode, you can use the command toggle-input-method. This is usually mapped to C-u C-\.

You can see a list of all available input methods with list-input-methods. Most of these are spoken languages, such as Arabic or Welsh, rather than technical input modes like TeX and SGML.

More Emacs posts

[1] I suppose there could be a problem if one command were a prefix of another. That is, if there were symbols \foo and \foobar and you intended to insert the latter, Emacs might think you’re done after you’ve typed the former. But I can’t think of a case where that would happen. TeX commands are nearly prefix codes. There are TeX commands like \tan and \tanh, but these don’t represent symbols per se. Emacs doesn’t need any help to insert the letters “tan” or “tanh” into a file.

Typesetting zodiac symbols in LaTeX

Typesetting zodiac symbols in LaTeX is admittedly an unusual thing to do. LaTeX is mostly used for scientific publication, and zodiac symbols are commonly associated with astrology. But occasionally zodiac symbols are used in more respectable contexts.

The wasysym package for LaTeX includes miscellaneous symbols, including zodiac symbols. Here are the symbols, their LaTeX commands, and their corresponding Unicode code points.

The only surprise here is that the command for Capricorn is based on the Latin form of the name: \capricornus.

Each zodiac sign is used to denote a 30° region of the sky. Since the Unicode symbols are consecutive, you can compute the code point of a symbol from the longitude angle θ in degrees:

9800 + \left\lfloor \frac{\theta}{30} \right\rfloor

Here 9800 is the decimal form of 0x2648, and the half brackets are the floor symbol, i.e. round down to the nearest integer.

Here’s the LaTeX code that produced the table.

\documentclass{article}
\usepackage{wasysym}
\begin{document}

\begin{table}
\begin{tabular}{lll}
\aries       & \verb|\aries       | & U+2648 \\
\taurus      & \verb|\taurus      | & U+2649 \\
\gemini      & \verb|\gemini      | & U+264A \\
\cancer      & \verb|\cancer      | & U+264B \\
\leo         & \verb|\leo         | & U+264C \\
\virgo       & \verb|\virgo       | & U+264D \\
\libra       & \verb|\libra       | & U+264E \\
\scorpio     & \verb|\scorpio     | & U+264F \\
\sagittarius & \verb|\sagittarius | & U+2650 \\
\capricornus & \verb|\capricornus | & U+2651 \\
\aquarius    & \verb|\aquarius    | & U+2652 \\
\pisces      & \verb|\pisces      | & U+2653 \\
\end{tabular}
\end{table}
\end{document}

By the way, you can use the Unicode values in HTML by replacing U+ with &#x and adding a semicolon on the end.

More LaTeX posts

Trademark symbol, LaTeX, and Unicode

Earlier this year I was a coauthor on a paper about the Cap Score™ test for male fertility from Androvia Life Sciences [1]. I just noticed today that when I added the publication to my CV, it caused some garbled text to appear in the PDF.

garbled LaTeX output

Here is the corresponding LaTeX source code.

LaTeX source

Fixing the LaTeX problem

There were two problems: the trademark symbol and the non-printing symbol denoted by a red underscore in the source file. The trademark was a non-ASCII character (Unicode U+2122) and the underscore represented a non-printing (U+00A0). At first I only noticed the trademark symbol, and I fixed it by including a LaTeX package to allow Unicode characters:

    \usepackage[utf8x]{inputenc}

An alternative fix, one that doesn’t require including a new package, would be to replace the trademark Unicode character with \texttrademark\. Note the trailing backslash. Without the backslash there would be no space after the trademark symbol. The problem with the unprintable character would remain, but the character could just be deleted.

Trademark and Unicode

I found out there are two Unicode code points render the trademark glyph, U+0099 and U+2122. The former is in the Latin 1 Supplement section and is officially a control character. The correct code point for the trademark symbol is the latter. Unicode files U+2122 under Letterlike Symbols and gives it the official name TRADE MARK SIGN.

Related posts

[1] Jay Schinfeld, Fady Sharara, Randy Morris, Gianpiero D. Palermo, Zev Rosenwaks, Eric Seaman, Steve Hirshberg, John Cook, Cristina Cardona, G. Charles Ostermeier, and Alexander J. Travis. Cap-Score™ Prospectively Predicts Probability of Pregnancy, Molecular Reproduction and Development. To appear.