Which Unicode characters can you depend on?

Unicode is supported everywhere, but font support for Unicode characters is sparse. When you use any slightly uncommon character, you have no guarantee someone else will be able to see it.

I’m starting a Twitter account @MusicTheoryTip and so I wanted to know whether I could count on followers seeing music symbols. I asked whether people could see ♭ (flat, U+266D), ♮ (natural, U+266E), and ♯ (sharp, U+266F). Most people could see all three symbols, from desktop or phone, browser or Twitter app. However, several were unable to see the natural sign from an Android phone, whether using a browser or a Twitter app. One person said none of the symbols show up on his Blackberry.

I also asked @diff_eq followers whether they could see the math symbols ∂ (partial, U+2202), Δ (Delta, U+0394), and ∇ (gradient, U+2207). One person said he couldn’t see the gradient symbol, but the rest of the feedback was positive.

So what characters can you count on nearly everyone being able to see? To answer this question, I looked at the characters in the intersection of several common fonts: Verdana, Georgia, Times New Roman, Arial, Courier New, and Droid Sans. My thought was that this would make a very conservative set of characters.
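Once each font's coverage is known, the intersection is a plain set operation. Here is a minimal sketch, assuming each font's coverage is already available as a set of integer code points. The font names and tiny coverage sets below are made up for illustration and are not the real tables. In practice the coverage could be read from each font file's character map, for example with a third-party library such as fontTools, but that step is left out to keep the example self-contained.

```python
from functools import reduce

def common_code_points(coverage_by_font):
    """Return the sorted code points supported by every font."""
    sets = [set(points) for points in coverage_by_font.values()]
    if not sets:
        return []
    return sorted(reduce(set.intersection, sets))

# Hypothetical, heavily truncated coverage data (not the real font tables).
toy_coverage = {
    "FontA": {0x0041, 0x2202, 0x2207, 0x266D},
    "FontB": {0x0041, 0x2202, 0x2207},
    "FontC": {0x0041, 0x2202, 0x266D},
}

# Print the shared code points in the same style as the lists in this post.
for cp in common_code_points(toy_coverage):
    print(f"0x{cp:04x} {chr(cp)}")
```

With the toy data, only A and ∂ survive, because each of the other two characters is missing from at least one font.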

There are 585 characters supported by all the fonts listed above. Most of the characters with code points up to U+01FF are included. This range includes the code blocks for Basic Latin, Latin-1 Supplement, Latin Extended-A, and some of Latin Extended-B.

The rest of the characters in the intersection are Greek and Cyrillic letters and a few scattered symbols. Flat, natural, sharp, and gradient didn’t make the cut.

There are a dozen math symbols included:

0x2202 ∂
0x2206 ∆
0x220F ∏
0x2211 ∑
0x2212 −
0x221A √
0x221E ∞
0x222B ∫
0x2248 ≈
0x2260 ≠
0x2264 ≤
0x2265 ≥

Interestingly, even in such a conservative set of characters, there are three characters included for semantic distinction: the minus sign (i.e. not a hyphen), the difference operator (i.e. not the Greek letter Delta), and the summation operator (i.e. not the Greek letter Sigma).
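One way to see that these really are distinct characters is to ask Python's standard `unicodedata` module for their names and general categories. This is only an illustrative sketch; the helper name `describe` is made up for this example. Note that Unicode's formal name for the difference operator ∆ is INCREMENT.

```python
import unicodedata

# (look-alike, math operator) pairs mentioned in the paragraph above.
pairs = [
    ("\u002d", "\u2212"),  # hyphen-minus vs. minus sign
    ("\u0394", "\u2206"),  # Greek capital Delta vs. increment
    ("\u03a3", "\u2211"),  # Greek capital Sigma vs. n-ary summation
]

def describe(ch):
    """Return 'U+XXXX NAME (general category)' for one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"

for lookalike, operator in pairs:
    print(describe(lookalike), "vs.", describe(operator))
```

The operators all fall in the math symbol category (Sm), while the look-alikes are a dash (Pd) and two uppercase letters (Lu), so software that cares about meaning can tell them apart even when the glyphs look the same.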

And in case you’re interested, here’s the complete list of the Unicode characters in the intersection of the fonts listed here. (Update: Added notes to indicate the start of a new code block and listed some of the isolated characters.)

0x0009           Basic Latin
0x0020 - 0x007e 
0x00a0 - 0x017f  Latin-1 supplement
0x01fa - 0x01ff
0x0218 - 0x0219  
0x02c6 - 0x02c7  
0x02d8 - 0x02dd 
0x0300 - 0x0301 
0x0384 - 0x038a  Greek and Coptic
0x038e - 0x03a1
0x03a3 - 0x03ce
0x0401 - 0x040c 
0x040e - 0x044f  Cyrillic
0x0451 - 0x045c
0x045e - 0x045f
0x0490 - 0x0491
0x1e80 - 0x1e85  Latin extended additional
0x1ef2 - 0x1ef3
0x200c - 0x200f  General punctuation
0x2013 - 0x2015
0x2017 - 0x201e
0x2020 - 0x2022
0x2028 - 0x202e
0x2032 - 0x2033
0x2039 - 0x203a
0x206a - 0x206f  
0x20a3 - 0x20a4  Currency symbols ₣ ₤
0x20a7           ₧
0x20ac           €
0x2105           Letterlike symbols ℅
0x2116           №
0x2122           ™
0x2126           Ω
0x212e           ℮
0x215b - 0x215e  ⅛ ⅜ ⅝ ⅞
0x2202           Mathematical operators ∂
0x2206           ∆
0x220f           ∏
0x2211 - 0x2212  ∑ −
0x221a           √
0x221e           ∞
0x222b           ∫
0x2248           ≈
0x2260           ≠
0x2264 - 0x2265  ≤ ≥
0x25ca           Geometric shapes ◊
0xfb01 - 0xfb02  Alphabetic presentation forms ﬁ ﬂ
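A list like this can be turned into a quick check for text you are about to publish. The sketch below transcribes the ranges above into a string, expands them into a set, and reports any characters that fall outside it. The names `SAFE_RANGES`, `parse_ranges`, and `risky_characters` are made up for this example. Passing the check only means a character is in the intersection of the six fonts listed in this post; it does not guarantee that it renders on every device.

```python
# Ranges transcribed from the list above (inclusive, hexadecimal).
SAFE_RANGES = """
0009 0020-007e 00a0-017f 01fa-01ff 0218-0219 02c6-02c7 02d8-02dd 0300-0301
0384-038a 038e-03a1 03a3-03ce 0401-040c 040e-044f 0451-045c 045e-045f
0490-0491 1e80-1e85 1ef2-1ef3 200c-200f 2013-2015 2017-201e 2020-2022
2028-202e 2032-2033 2039-203a 206a-206f 20a3-20a4 20a7 20ac 2105 2116 2122
2126 212e 215b-215e 2202 2206 220f 2211-2212 221a 221e 222b 2248 2260
2264-2265 25ca fb01-fb02
"""

def parse_ranges(spec):
    """Expand 'xxxx' and 'xxxx-yyyy' hex tokens into a set of code points."""
    safe = set()
    for token in spec.split():
        start, _, end = token.partition("-")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        safe.update(range(lo, hi + 1))
    return safe

SAFE = parse_ranges(SAFE_RANGES)

def risky_characters(text):
    """Return the distinct characters of text outside the safe set, in order."""
    seen = []
    for ch in text:
        if ord(ch) not in SAFE and ch not in seen:
            seen.append(ch)
    return seen

print(risky_characters("B♭, B♮, F♯ and ∇f ≈ ∂f/∂x"))
```

For the sample string, the check flags flat, natural, sharp, and gradient, and lets ≈ and ∂ through.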

34 thoughts on “Which Unicode characters can you depend on?”

  1. I’m on a PC. Your blog was fine in IE and Firefox, but the 2nd and 3rd paragraphs were a mess in Google Chrome, which is currently my preferred browser.

  2. I’m missing flat and gradient above, but _only_ on Chrome. Firefox or IE on the same (Windows 7) computer renders them all.

    Obviously the same fonts are installed, so I don’t know why they are processed differently in Chrome. Maybe Firefox and IE are doing a per-character font fallback if they are missing a code point.

  3. Some browser+os combinations will automatically substitute a glyph from an alternate font if it’s absent from the target font. Chrome apparently does not. You could probably solve the missing character problem for many Chrome+Windows users by adding Arial Unicode MS to the end of your CSS font-family list. Many Windows users have this one installed already.

  4. I’m in the same position as Mark.

    I can see natural and sharp, but not flat. Likewise, I can see partial and delta, but not gradient.

    I am on Chrome on Windows 7.

  5. If Chrome can’t find a glyph and doesn’t do a substitution, why doesn’t it just display a placeholder for that character?

    On my PC, I see the flat sign, but the following parenthesis has a couple horizontal bars. Ditto with the gradient. Otherwise the page looks OK in Chrome for me.

    But others see a mess using Chrome, such as this screen shot.

  6. I too cannot see the flat and gradient symbols with Chrome. The problem is more subtle, however: when I highlight those symbols and right-click, the symbol is displayed correctly in the pop-up menu under “Search Google for ‘∇’”. Also, just now when I pasted this string into the comment box, the gradient symbol displayed correctly.

  7. Jack: I added Arial Unicode MS to my style sheet per your suggestion. I added Gnu Unifont too for good measure. Now the page looks fine on Chrome for me.

  8. It also looks good in my feed reader. Unfortunately, I don’t know if that’s the page now, or as it was originally published.

  9. John, that screen shot is exactly what I was seeing in Chrome, but now it looks much better. Only the flat and gradient symbols are missing now, and nothing is overlapping—just boxes where those two symbols should be.

  10. Saw all but the natural on an Android tablet (nexus 7). Using chrome the natural was just missing; using Firefox it was a gray box.

  11. It would be nice if you printed the first ten or twenty characters next to each hexadecimal range. Then you get an idea what’s supported. The hexadecimal ranges don’t say anything to me.

  12. Add “product operator” (i.e. not the Greek letter Pi) to the list of redundant symbols for semantic reasons.

  13. Using Chrome on a Mac, I saw everything fine.

    I know that Android can’t display x-bar x̄, which is frustrating when you’re trying to write a statistical calculator.

    It would be nice if every computer contained just one font with a complete set of Unicode symbols, and had the smarts to fail over to that font whenever necessary.

  14. Ed: Agreed. I don’t know why more fonts don’t support at least the first 2^16 characters, the “basic multilingual plane.” It’s not that big: 2^16 = 65,536.

    On Windows, Arial Unicode MS supports most of the BMP. Gnu Unifont does the same on Linux, though it’s ugly. I would assume these fonts are available for various operating systems. I’ve installed Gnu Unifont on Windows.

    You would think that browsers could ship with one of these fonts and fail over to it instead of displaying a missing character box.

  15. I used Deja Vu for a recent project (via web fonts). It rendered a ton of mathematical symbols on every device I tested — previously there were quite a few missing on some platforms.

  16. John, minus and hyphen are rendered differently in a good font: compare ‘-−+-−’. Minus is longer, and it is vertically centered on the digits rather than on lower-case letters.

  17. I can only see partial and delta. The rest are just blocks. I’m still running Win XP so the lack of Unicode support isn’t surprising. The results are the same in IE and Chrome.

  18. Roger: Windows XP uses Unicode internally, but it didn’t ship with good Unicode fonts.

    Arial Unicode MS doesn’t ship with Windows, but it ships with many Microsoft products, such as Office. If you have Office installed, I imagine your version of Office is as old as your version of Windows, and that’s why you don’t have Arial Unicode MS. If you’d like, you could try installing Gnu Unifont. It’s free. It’s ugly, but it’s a fallback for missing symbols.

  19. Chrome on WinXP is displaying everything quite well for me. No messes, no boxes indicating missing characters anywhere.

    I do have a fair number of fonts installed, and some graphic design programs (GIMP, Fireworks, Illustrator).

  20. Does anyone know of a good free Unicode font that has most of the modern characters?

    I really don’t care about ancient runes or a modern alphabet that is only used by a few thousand people, but I would like to go to Wikipedia and not have every 5th page or so have text I can’t see.

  21. Steve: Arial Unicode MS is nearly complete and looks nice. It’s expensive to buy stand-alone, but comes with many programs. Gnu Unifont is complete, free, and ugly.

  22. @Daniel Lemire Actually, union (0x222a ∪) and intersection (0x2229 ∩) are in the character map. You can view them with a character map program (in Windows Character Map, use Advanced View – Group by: Unicode Subrange – Mathematical Operators). I used a free font called Unifont to see them. There are many other symbols there, like: ⊂⊃⊄⊅∀∃∝∞∥∦
    With the program BabelMap, I even found domino and Mahjong tiles.

  23. Just pointing out that all the characters in the blog post render fine. Ubuntu 12.04.2, xxxterm 1.10.0 (based on webkit.)

  24. I wish the characters used by SymPy to do Unicode pretty printing had better support. Not just for including the character, but for rendering them correctly. For example,

    >>> Integral(f(x), x)

    ⌠
    ⎮ f(x) dx
    ⌡

    That character ⎮ in the middle of the integral is INTEGRAL EXTENSION (U+23AE), which is supposed to line up with and connect to the ⌠ (TOP HALF INTEGRAL, U+2320) and ⌡ (BOTTOM HALF INTEGRAL, U+2321) perfectly. But of all the monospace fonts, only DejaVu Sans Mono renders it correctly, and this is only because a SymPy developer implemented it several years ago.

    It also makes extensive use of the box drawing characters (like ╲, BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT, in the square root printing), which, to work correctly, need to connect completely to the corner of the box and to the other box drawing characters.
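To make the stacking concrete, here is a small standard-library sketch of how a tall integral sign is assembled from the three pieces named above. It is not SymPy's implementation; the function name `tall_integral` is made up for this illustration. Whether the pieces actually join up on screen still depends on the font, which is the point of the comment.

```python
import unicodedata

TOP, EXTENSION, BOTTOM = "\u2320", "\u23ae", "\u2321"

def tall_integral(height):
    """Return the lines of an integral sign that is `height` rows tall."""
    if height < 2:
        raise ValueError("need at least a top and a bottom row")
    return [TOP] + [EXTENSION] * (height - 2) + [BOTTOM]

# Show the code point and Unicode name of each piece.
for ch in (TOP, EXTENSION, BOTTOM):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Stack the pieces; gaps between rows indicate a font that does not align them.
print("\n".join(tall_integral(5)))
```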
