Double-struck capital letters

I’ve needed to use double-struck capital letters lately, also called blackboard bold. There are a few quirks in how they are represented in Unicode and in HTML entities, so I’m leaving some notes for myself here and for anyone else who might need to look this up.

Unicode

The double-struck capital letters are split into two blocks for historical reasons. The double-struck capital letters used most often in math — ℂ, ℍ, ℕ, ℙ, ℚ, ℝ, ℤ — are located in the U+21XX range, while the rest are in the U+1D5XX range.

Low characters

The letters in the U+21XX range were the first to be included in the Unicode standard. There’s no significance to the code points.

ℂ U+2102
ℍ U+210D 
ℕ U+2115
ℙ U+2119 
ℚ U+211A
ℝ U+211D
ℤ U+2124 

The names, however are consistent. The official name of ℂ is

DOUBLE-STRUCK CAPITAL C

and the rest are analogous.

High characters

The code point for double-struck capital A is U+1D538 and the rest are in alphabetical order: the nth letter of the English alphabet has code point

0x1D537 + n.

However, the codes corresponding to the letters in the low range are missing. For example, U+1D53A, the code that would logically be the code for ℂ, is unused so as not to duplicate the codes in the low range.

The official names of these characters are

MATHEMATICAL DOUBLE-STRUCK CAPITAL C

and so forth. Note the “MATHEMTICAL” prefix that the letters in the low range do not have.

Incidentally, X is the 24th letter, so the new logo for Twitter has code point U+1D54F.

HTML Entities

All double-struck letters have HTML shortcuts of the form *opf; where * is the letter, whether capital or lower case. For example, is 𝔸 and is 𝕒.

The letters with lower Unicode values also have semantic entities as well.

ℂ ℂ ℂ
ℍ ℍ ℍ
ℕ ℕ ℕ
ℙ ℙ ℙ
ℚ ℚ ℚ
ℝ ℝ ℝ
ℤ ℤ ℤ

LaTeX

The LaTeX command for any double-struck capital letter is \mathbb{}. This only applies to capital letters.

Python

Here’s Python code to illustrate the discussion of Unicode values above.

def doublestrike(ch):

    exceptions = {
        'C' : 0x2102,
        'H' : 0x210D,
        'N' : 0x2115,
        'P' : 0x2119,
        'Q' : 0x211A,
        'R' : 0x211D,
        'Z' : 0x2124
    }
    if ch in exceptions:
        codepoint = exceptions[ch]
    else:
        codepoint = 0x1D538 + ord(ch) - ord('A')
    print(chr(codepoint), f"U+{format(codepoint, 'X')}")

for n in range(ord('A'), ord('Z')+1):
    doublestrike(chr(n))

Ligatures for Logic

A ligature in typesetting is a way of presenting two (or more) consecutive characters differently the individual characters would be displayed. For example, “fi” is often rendered with the top of the ‘f’ dotting the ‘i’. Here’s an example from Computer Modern, the default font in LaTeX.

fi

Usually the difference is subtle—ordinarily readers are not consciously aware of them—but a ligature could look entirely different from its components. The previous post is an example of the latter: the two-letter abbreviation for a country is rendered as the flag of that country.

I’ve been playing around with Fira Code, a font with ligatures for programming. Fonts like this aim to do for programming what ordinary ligatures do for prose. For example, a programming font might include a ligature to render >= as ≥.

Programming fonts are obviously intended for use in programming, but I personally don’t like the idea of using ligatures in programming. They compromise the simplicity of plain text [1]. They’re supported in some environments but not in others, or they require some fiddly configuration before they’ll work, etc.

Still, I like the aesthetics of Fira Code, particularly in the way it handles logic symbols. Here are some examples comparing a common monospace font and Fira Code.

(a => b) <=> (¬a \/ b), {a} |= a \/ b, |= p → |- p

The image above is a screen shot of a document created in LibreOffice Writer. The ligatures didn’t work when I tried using them in Microsoft Word.

The Fira Code was designed as a monospace font, but has been extended to include proportional fonts. Fira Code with a proportional font might be useful in prose documents. You could insert a few symbols with a couple key strokes rather than searching for the symbol or entering Unicode.

However, it seems most of Fira Code’s ligatures are only available in monospaced versions of the font. If you use Fira Code in a prose document, you could switch from proportional font to monospace font just for an occasional symbol. It’s unclear whether that would be more or less work than other alternatives.

There’s one place where I believe Fira Code would be ideal: code examples inside a prose document. In that context you care about aesthetics and you want a monospaced font. Here again are some examples comparing Inconsolata and Fira Code.

if (a >= b /\ c != d) {…}

Related links

[1] If you use Fira Code font, your code doesn’t change a bit. You can have some aesthetic improvements along with the advantages of working in plain text. But it may not just work without some research and experimentation.

Preventing characters from displaying as emoji

I rarely intentionally use emoji, and yet I often run into them unbidden. This is because some Unicode characters double as emoji.

For example, the zodiac symbol for Aries is used both in celestial navigation and in astrology. The latter is much more common, and so when some software sees U+2648 it interprets the character as the emoji for the horoscope sign.

There is a way to prevent this: append the “variation selector” character U+FE0E after the symbol. This tells software that you want the preceding character to be interpreted as a character. And if you want to request that a character be displayed as an emoji, you can append U+FE0F. But a particular software package may or may not honor your request.

I tried this in several terminals to see what would happen. On Linux and Mac, the symbol for Aries (U+2648) prints as an emoji by default, and the symbol for a black pawn (U+265F) does not. But by when I add variation selectors to reverse the defaults the shells complied.

Here’s the same output as text:

    >>> print("\u2648")
    ♈
    >>> print("\u2648\ufe0e")
    ♈︎
    >>> print("\u265f")
    ♟
    >>> print("\u265f\ufe0f")
    ♟️

When I tested this on Windows I got different results in different terminals. The default cmd terminal was unable to display either Aries or the pawn. The ConEmu terminal displayed both as characters, even when I requested emoji. The new Windows Terminal app displayed both as emoji, even when I requested plain characters. This probably has something to do with the fonts the terminals use as well as the terminal software itself.

Update: View this page to see how your browser renders various characters as text or emoji. Also see this Twitter thread on how Twitter renders these characters.

Number slang and numbered lists

Here’s a list of five numbers used as slang in various contexts.

  1. Location (CB and police radio)
  2. End of column (journalism)
  3. Best wishes (ham radio)
  4. All aircraft in area (US Navy)
  5. I love you (text messages)

The motivation for this post was an article Those HTML attributes you never use. I wanted to make a note of how to change the numbering of an ordered list in HTML, and this post is that note.

Related posts

Look up HTML entity or Unicode

Fairly often I want to find out whether there is an HTML entity for a given Unicode character, or given an HTML entity I want to look up its Unicode value.

For example, ∃ has Unicode value U+2203. I might want to look up whether there is an HTML entity for this. There is, and it’s &exist;.

Another example is ○ (U+25CB). There is no HTML entity for this character.

I imagine a few other people need to do this occasionally, so I make a web page to make it easier.

The page is very similar to my page for converting back and forth between Unicode and LaTeX commands.

I’ve written a few other online tools. Here’s a full list.

Update: If you enter a single character, the page will look up the Unicode value of the input. For example, entering ∀ is the same as entering U+2200.

Fractions in Unicode

There are Unicode characters for a few fractions, such as ½. This looks a little better than 1/2, depending on the context.

Here’s the Taylor series for log(1 + x) written in pure HTML:

log(1 + x) = x – ½x² + ⅓x³ – ¼x⁴ + ⅕x⁵ – ⋯

See this post for how the exponents were made.

Notice that the three dots ⋯ on the end are centered vertically, like \cdots in LaTeX. This was done with &ctdot; (U+22EF).

Available fractions

The selection of available fraction number forms is small and a little strange.

There are characters for fractions with denominator d equal to 2, 3, 4, 5, 6, and 8, with numerators 1 through d-1, except for fractions that can be reduced.

If d = 7, 9, or 10, there’s a character for 1/d but not for fractions with numerators other than 1. For example, there is a character for ⅐ but not for 2/7.

HTML Entities

For denominators 2, 3, 4, 5, 6, and 8 the HTML entity for characters is easy: they all have the form

& frac <n> <d> ;

where n is the numerator and d is the denominator. For example, &frac35; is the HTML entity for ⅗.

There are no HTML entities for 1/7, 1/9, or 1/10.

Related posts

Number sets in HTML and Unicode

When I started blogging I was very cautious about what characters I used because browsers often didn’t have font support for uncommon characters. Things have changed since then and I’ve gotten less cautious. Nobody has complained, so I assume readers are seeing the characters I intend them to see.

There are Unicode characters for sets of numbers such as the integers and the real numbers, double-struck letters similar to the blackboard bold letters \mathbb{Z} etc. in LaTeX.

\mathbb{N} \quad \mathbb{Z} \quad \mathbb{Q} \quad \mathbb{R} \quad \mathbb{C} \quad \mathbb{H}

Here’s a table of the characters, their Unicode values, and two HTML entities associated with each.

    ℕ U+2115 &Nopf; &naturals;
    ℤ U+2124 &Zopf; &integers;
    ℚ U+211A &Qopf; &rationals;
    ℝ U+211D &Ropf; &reals;
    ℂ U+2102 &Copf; &complexes;
    ℍ U+210D &Hopf; &quaternions;

If you’re going to use these symbols, you will likely also need to use ∈ (U+2208, &in;) and ∉ (U+2209, &notin;).

More letters

If you want more letters in the style of those above, you can find them starting at U+1D538 for . However, the characters corresponding to letters above are reserved.

So for example, is U+1D538, is U+1D539, but U+1D53A is reserved and you must use ℂ (U+2102) instead.

One letter not mentioned above is ℙ (U+2119). It has HTML entities &Popf; and &primes;.

So the double-struck versions of C, H, N, P, Q, R, and Z are down in the BMP (Basic Multilingual Plane) and the rest are in the SMP (Supplementary Multilingual Plane). I suspect characters in the SMP are less likely to have font support, but that may not be a problem.

Unicode superscripts and subscripts

There are alternatives to using <sup> and <sub> tags for superscripts and subscripts in HTML. These alternatives may look better, depending on context, and they can be used in plain (Unicode) text files where HTML markup isn’t available.

Superscripts

When I started blogging I would use <sup>2</sup> and <sup>3</sup> for squares and cubes. Then somewhere along the way someone told me about &sup2; (U+00B2) and &sup3; (U+00B3) and I started using these. The superscript characters generally produce slightly smaller subscripts and look nicer in my opinion.

Example with sup tags:

a2 + b2 = c2

Example with superscript characters:

a² + b² = c²

But there are no characters for exponents larger than 3. Or so I thought until recently.

There are no HTML entities for exponents larger than 3, nothing written in notation analogous to &sup2; and &sup3;. There are Unicode characters for other superscripts up to 9, but they don’t have corresponding HTML entities.

The Unicode code point for superscript n is

2070hex + n

except for n = 2 or 3. For example, U+2075 is a superscript 5. So you could write x⁵ as

<em>x</em>&#x2075;.

Subscripts

There are also Unicode symbols for subscripts, though they don’t have corresponding HTML entities. The Unicode code point for superscript n = 0, 1, 2, … 9 is

2080hex + n

For example, U+2087 is a subscript 7.

The subscript characters don’t extend below the baseline as far as subscripts in <sub> tags do. Here are x‘s with subsubcripts in <sub> tags.

x0, x1, x2, x3, x4, x5, x6, x7, x8, x9

And here are the single character subscripts.

x₀, x₁, x₂, x₃, x₄, x₅, x₆, x₇, x₈, x

I think the former looks better, but subscripts in HTML paragraphs may increase the vertical spacing between lines. If consistent line spacing is more important than conventional subscripts, you might prefer the subscript characters.

Related posts

Unicode and Emoji, or The Giant Pawn Mystery

I generally despise emoji, but I reluctantly learned a few things about them this morning.

My latest couple blog posts involved chess, and I sent out a couple tweets using chess symbols. Along the way I ran into a mystery: sometimes the black pawn is much larger than other chess symbols. I first noticed this in Excel. Then I noticed that sometimes it happens in the Twitter app, and sometimes not, sometimes on the twitter website, and sometimes not.

For example, the following screen shot is from Safari on iOS.

screenshot of tweet with giant pawn

What’s going on? I explained in a footnote to this post, but I wanted to make this its own post to make it easier to find in the future.

In a nutshell, something in the software environment is deciding that 11 of the twelve chess characters are to be taken literally, but the character for the black pawn is to be interpreted as an emojus [1] representing chess. I’m not clear on whether this is happening in the font or in an app. Probably one, both, or neither depending on circumstances.

I erroneously thought that emoji were all outside Unicode’s BMP (Basic Multilingual Plane) so as not to be confused with ordinary characters. Alas, that is not true.

Here is a full list of Unicode characters interpreted (by …?) as emoji. There are 210 emoji characters in the BMP and 380 outside, i.e. 210 below FFFF and 380 above FFFF.

***

[1] I know that “emoji” is a Japanese word, not a Latin word, but to my ear the singular of “emoji” should be “emojus.”

Hebrew letters spotted in applied math

Math and physics use Greek letters constantly, but seldom do they use letters from any other alphabet.

The only Cyrillic letter I recall seeing in math is sha (Ш, U+0428) for the so-called Dirc comb distribution.

One Hebrew letter is commonly used in math, and that’s aleph (א, U+05D0). Aleph is used fairly often, but other Hebrew letters are much rarer. If you see any other Hebrew letter in math, it’s very likely to be one of the next three letters: beth (ב, U+05D1), gimel (ג, U+05D2), or dalet (ד, U+05D3).

To back up this claim, basic LaTeX only has a command for aleph (unsurprisingly, it’s \aleph). AMS-LaTeX adds the commands \beth, \gimel, and \daleth, but no more. Those are the only Hebrew letters you can use in LaTeX without importing a package or using XeTeX so you can use Unicode symbols.

Not only are Hebrew letters rare in math, the only area of math that uses them at all is set theory, where they are used to represent transfinite numbers.

So in short, if you see a Hebrew letter in math, it’s overwhelmingly likely to be in set theory, and it’s very likely to be aleph, or possibly beth, gimel, or dalet.

But today I was browsing through Morse and Feschbach and was very surprised to see the following on page 324.

gimel = lambda ayin + mu yod + mu yod star

I’ve never seen a Hebrew letter in applied math, and I’ve never seen ayin (ע, U+05E2) or yod (י, U+05D9) used anywhere in math.

In context, the authors had used Roman letters, Fraktur letters, and Greek letters and so they’d run out of alphabets. The entity denoted by gimel is related to a tensor the authors denoted with g, so presumably they used the Hebrew letter that sounds like “g”. But I have no idea why they chose ayin or yod.

Related posts