Unicode is supported everywhere, but font support for Unicode characters is sparse. When you use any slightly uncommon character, you have no guarantee someone else will be able to see it.
I’m starting a Twitter account @MusicTheoryTip and so I wanted to know whether I could count on followers seeing music symbols. I asked whether people could see ♭ (flat, U+266D), ♮ (natural, U+266E), and ♯ (sharp, U+266F). Most people could see all three symbols, from desktop or phone, browser or Twitter app. However, several were unable to see the natural sign from an Android phone, whether using a browser or a Twitter app. One person said none of the symbols show up on his Blackberry.
[Update: I gave @MusicTheoryTip over to someone else, and they didn’t keep it up for long.]
I also asked @diff_eq followers whether they could see the math symbols ∂ (partial, U+2202), Δ (Delta, U+0394), and ∇ (gradient, U+2207). One person said he couldn’t see the gradient symbol, but the rest of the feedback was positive.
So what characters can you count on nearly everyone being able to see? To answer this question, I looked at the characters in the intersection of several common fonts: Verdana, Georgia, Times New Roman, Arial, Courier New, and Droid Sans. My thought was that this would make a very conservative set of characters.
There are 585 characters supported by all the fonts listed above. Most of the characters with code points up to U+01FF are included. This range includes the code blocks for Basic Latin, Latin-1 Supplement, Latin Extended-A, and some of Latin Extended-B.
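The intersection described above could be computed along these lines. This is only a sketch: the helper names are made up for illustration, the toy coverage sets are not real font data, and reading a font's character map with the third-party fontTools package is an assumption on my part, not necessarily how the count of 585 was produced.

```python
def codepoints_in_font(path):
    """Return the set of code points that a font file maps to glyphs.

    Assumption: the third-party fontTools package is installed. It is
    imported lazily so the rest of this sketch runs without it.
    """
    from fontTools.ttLib import TTFont

    font = TTFont(path)
    try:
        # getBestCmap() returns a {code point: glyph name} dict, or None
        # if the font has no Unicode character map.
        return set(font.getBestCmap() or {})
    finally:
        font.close()


def common_codepoints(coverages):
    """Intersect an iterable of code point sets."""
    coverages = [set(c) for c in coverages]
    if not coverages:
        return set()
    return set.intersection(*coverages)


# Toy coverage sets (hypothetical, not taken from real fonts).
font_a = {0x41, 0x42, 0x2202, 0x266D}
font_b = {0x41, 0x42, 0x2202}
font_c = {0x41, 0x2202, 0x2207}

shared = common_codepoints([font_a, font_b, font_c])
print([hex(cp) for cp in sorted(shared)])
```

With real fonts, the coverage sets would come from calling `codepoints_in_font` on each font file, with paths that depend on your system.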
The rest of the characters in the intersection are Greek and Cyrillic letters and a few scattered symbols. Flat, natural, sharp, and gradient didn’t make the cut.
There are a dozen math symbols included:
0x2202 ∂
0x2206 ∆
0x220F ∏
0x2211 ∑
0x2212 −
0x221A √
0x221E ∞
0x222B ∫
0x2248 ≈
0x2260 ≠
0x2264 ≤
0x2265 ≥
Interestingly, even in such a conservative set of characters, there are three characters included for semantic distinction: the minus sign (i.e. not a hyphen), the difference operator (i.e. not the Greek letter Delta), and the summation operator (i.e. not the Greek letter Sigma).
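To see that these really are distinct characters and not just different glyphs, you can ask Python's standard `unicodedata` module for their official names. The `describe` helper and the `lookalikes` list are just names chosen for this sketch.

```python
import unicodedata

# Pairs of (symbol in the common set, similar-looking character).
lookalikes = [
    ("\u2212", "\u002d"),  # minus sign vs. hyphen-minus
    ("\u2206", "\u0394"),  # difference operator vs. Greek capital Delta
    ("\u2211", "\u03a3"),  # summation operator vs. Greek capital Sigma
]


def describe(ch):
    """Format a character as 'U+XXXX OFFICIAL NAME'."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for symbol, letter in lookalikes:
    print(f"{describe(symbol)}  vs.  {describe(letter)}")
```

Note that Unicode's official name for the difference operator U+2206 is INCREMENT, and the summation operator U+2211 is N-ARY SUMMATION.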
And in case you’re interested, here’s the complete list of the Unicode characters in the intersection of the fonts listed here. (Update: Added notes to indicate the start of a new code block and listed some of the isolated characters.)
0x0009             Basic Latin
0x000d
0x0020 - 0x007e
0x00a0 - 0x017f    Latin-1 supplement
0x0192
0x01fa - 0x01ff
0x0218 - 0x0219
0x02c6 - 0x02c7
0x02c9
0x02d8 - 0x02dd
0x0300 - 0x0301
0x0384 - 0x038a    Greek and Coptic
0x038c
0x038e - 0x03a1
0x03a3 - 0x03ce
0x0401 - 0x040c    Cyrillic
0x040e - 0x044f
0x0451 - 0x045c
0x045e - 0x045f
0x0490 - 0x0491
0x1e80 - 0x1e85    Latin extended additional
0x1ef2 - 0x1ef3
0x200c - 0x200f    General punctuation
0x2013 - 0x2015
0x2017 - 0x201e
0x2020 - 0x2022
0x2026
0x2028 - 0x202e
0x2030
0x2032 - 0x2033
0x2039 - 0x203a
0x203c
0x2044
0x206a - 0x206f
0x207f
0x20a3 - 0x20a4    Currency symbols ₣ ₤
0x20a7             ₧
0x20ac             €
0x2105             Letterlike symbols ℅
0x2116             №
0x2122             ™
0x2126             Ω
0x212e             ℮
0x215b - 0x215e    ⅛ ⅜ ⅝ ⅞
0x2202             Mathematical operators ∂
0x2206             ∆
0x220f             ∏
0x2211 - 0x2212    ∑ −
0x221a             √
0x221e             ∞
0x222b             ∫
0x2248             ≈
0x2260             ≠
0x2264 - 0x2265    ≤ ≥
0x25ca             Geometric shapes ◊
0xfb01 - 0xfb02    Alphabetic presentation forms ﬁ ﬂ
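A list in this range notation can be generated from a plain set of code points by merging runs of consecutive values. Here is a small sketch of that step; the function names are hypothetical and the sample uses only a handful of the math symbols listed earlier.

```python
def collapse_ranges(codepoints):
    """Group code points into inclusive (start, end) runs of consecutive values."""
    runs = []
    for cp in sorted(set(codepoints)):
        if runs and cp == runs[-1][1] + 1:
            # Extend the current run by one code point.
            runs[-1] = (runs[-1][0], cp)
        else:
            # Start a new run.
            runs.append((cp, cp))
    return runs


def format_run(start, end):
    """Render a run as '0xNNNN' or '0xNNNN - 0xMMMM'."""
    if start == end:
        return f"0x{start:04x}"
    return f"0x{start:04x} - 0x{end:04x}"


sample = [0x2211, 0x2212, 0x221A, 0x2264, 0x2265, 0x2202]
for start, end in collapse_ranges(sample):
    print(format_run(start, end))
```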
31 thoughts on “Which Unicode characters can you depend on?”
Interesting that we are missing set theory (union, intersection). I would think that they are pretty basic.
Pretty basic as in defining everything else I guess?
I’m on a PC. Your blog was fine in IE and Firefox, but the 2nd and 3rd paragraphs were a mess in Google Chrome, which is currently my preferred browser.
I’m missing flat and gradient above, but _only_ on Chrome. Firefox or IE on the same (Windows 7) computer renders them all.
Obviously the same fonts are installed, so I don’t know why they are processed differently in Chrome. Maybe Firefox and IE are doing a per-character font fallback if they are missing a code point.
I can see everything, and I’m using Chrome on Linux (27.0.1453.47).
Some browser+os combinations will automatically substitute a glyph from an alternate font if it’s absent from the target font. Chrome apparently does not. You could probably solve the missing character problem for many Chrome+Windows users by adding Arial Unicode MS to the end of your CSS font-family list. Many Windows users have this one installed already.
I’m in the same position as Mark.
I can see natural and sharp, but not flat. Likewise, I can see partial and delta, but not gradient.
I am on Chrome on Windows 7.
If Chrome can’t find a glyph and does a substitution, why doesn’t it just display a placeholder for that character?
On my PC, I see the flat sign, but the following parenthesis has a couple horizontal bars. Ditto with the gradient. Otherwise the page looks OK in Chrome for me.
But others see a mess using Chrome, such as this screen shot.
I too cannot see the flat and gradient symbols with Chrome. The problem is more subtle, however: when highlighting those symbols and right-clicking, the symbol is correctly displayed in the pop-up window under “Search Google for ‘∇'”. Also, just now when I pasted this string into the comment box, the gradient symbol displayed correctly.
Jack: I added Arial Unicode MS to my style sheet per your suggestion. I added Gnu Unifont too for good measure. Now the page looks fine on Chrome for me.
It also looks good in my feed reader. Unfortunately, I don’t know if that’s the page now, or as it was originally published.
John, that screen shot is exactly what I was seeing in Chrome, but now it looks much better. Only the flat and gradient symbols are missing now, and nothing is overlapping—just boxes where those two symbols should be.
Saw all but the natural on an Android tablet (nexus 7). Using chrome the natural was just missing; using Firefox it was a gray box.
You can input those characters in vim easily
I made a list of those I personally like to use in my .vimrc
https://github.com/hydroo/config-files/blob/master/.vimrc (scroll down)
Using those characters is very nice when writing emails about mathy things, documentation, or variable names in a Unicode-aware programming language like Go.
It would be nice if you printed the first ten or twenty characters next to each hexadecimal range. Then you get an idea what’s supported. The hexadecimal ranges don’t say anything to me.
Can’t see flat (U+266D) or gradient (U+2207) on Win 7 with Chrome.
Add “product operator” (i.e. not the Greek letter Pi) to the list of redundant symbols for semantic reasons.
Using Chrome on a Mac, I saw everything fine.
I know that android can’t display x-bar x̄, which is frustrating when you’re trying to write a statistical calculator.
It would be nice if every computer contained just one font with a complete set of Unicode symbols, and had the smarts to fail over to that font whenever necessary.
Ed: Agreed. I don’t know why more fonts don’t support at least the first 2^16 characters, the “basic multilingual plane.” It’s not that big: 2^16 = 65,536.
On Windows, Arial Unicode MS supports most of the BMP. Gnu Unifont does the same on Linux, though it’s ugly. I would assume these fonts are available for various operating systems. I’ve installed Gnu Unifont on Windows.
You would think that browsers could ship with one of these fonts and fail over to it instead of displaying a missing character box.
I used Deja Vu for a recent project (via web fonts). It rendered a ton of mathematical symbols on every device I tested — previously there were quite a few missing on some platforms.
John, minus and hyphen are rendered differently in a good font: compare ‘-−+-−’. Minus is longer, and it is positioned by height in the middle of numbers, not lower-case letters.
Sure. But in a pinch, you could use a hyphen for a minus.
All the symbols in your article display correctly on my Windows Phone 8
I can only see partial and delta. The rest are just blocks. I’m still running Win XP so the lack of Unicode support isn’t surprising. The results are the same in IE and Chrome.
Roger: Windows XP uses Unicode internally, but it didn’t ship with good Unicode fonts.
Arial Unicode MS doesn’t ship with Windows, but it ships with many Microsoft products, such as Office. If you have Office installed, I imagine your version of Office is as old as your version of Windows, and that’s why you don’t have Arial Unicode MS. If you’d like, you could try installing Gnu Unifont. It’s free. It’s ugly, but it’s a fallback for missing symbols.
Chrome on WinXP is displaying everything quite well for me. No messes, no boxes indicating missing characters anywhere.
I do have a fair number of fonts installed, and some graphic design programs (GIMP, Fireworks, Illustrator).
Does anyone know of a good free Unicode font that has most of the modern characters?
I really don’t care about ancient runes or a modern alphabet that is only used by a few thousand people, but I would like to go to Wikipedia and not have every 5th page or so have text I can’t see.
Steve: Arial Unicode MS is nearly complete and looks nice. It’s expensive to buy stand-alone, but comes with many programs. Gnu Unifont is complete, free, and ugly.
@Daniel Lemire Actually, union (0x222a ∪) and intersection (0x2229 ∩) are in the character map. You can view them with a character map program (in Windows Character Map, use Advanced View – Group by: Unicode Subrange – Mathematical Operators). I used a free font called Unifont to see them. Here are many other symbols like: ⊂⊃⊄⊅∀∃∝∞∥∦
With the program BabelMap, I even found domino and Mahjong tiles.
Just pointing out that all the characters in the blog post render fine. Ubuntu 12.04.2, xxxterm 1.10.0 (based on webkit.)
I wish the characters used by SymPy to do Unicode pretty printing had better support. Not just for including the character, but for rendering them correctly. For example,
>>> Integral(f(x), x)
⌠
⎮ f(x) dx
⌡
That character ⎮ in the middle of the integral is INTEGRAL EXTENSION (U+23AE), which is supposed to line up with and connect to the ⌠ (TOP HALF INTEGRAL, U+2320) and ⌡ (BOTTOM HALF INTEGRAL, U+2321) perfectly. But of all the monospace fonts, only DejaVu Sans Mono renders it correctly, and this is only because a SymPy developer implemented it several years ago.
It also makes extensive use of the box drawing characters (like ╲, BOX DRAWINGS LIGHT DIAGONAL UPPER LEFT TO LOWER RIGHT, in the square root printing), which, to work correctly, need to connect completely to the corner of the box and to the other box drawing characters.