Unicode and Emoji, or The Giant Pawn Mystery

I generally despise emoji, but I reluctantly learned a few things about them this morning.

My last couple of blog posts involved chess, and I sent out a couple of tweets using chess symbols. Along the way I ran into a mystery: sometimes the black pawn is much larger than the other chess symbols. I first noticed this in Excel. Then I noticed that it sometimes happens in the Twitter app and sometimes on the Twitter website, but not consistently in either.

For example, the following screen shot is from Safari on iOS.

[Screenshot of tweet with giant pawn]

What’s going on? I explained in a footnote to this post, but I wanted to make this its own post to make it easier to find in the future.

In a nutshell, something in the software environment is deciding that eleven of the twelve chess characters are to be taken literally, but the character for the black pawn is to be interpreted as an emojus [1] representing chess. I’m not clear on whether this is happening in the font or in an app. Probably one, both, or neither depending on circumstances.

I erroneously thought that emoji were all outside Unicode’s BMP (Basic Multilingual Plane) so as not to be confused with ordinary characters. Alas, that is not true.

Here is a full list of Unicode characters interpreted (by …?) as emoji. There are 210 emoji characters in the BMP and 380 outside, i.e. 210 below FFFF and 380 above FFFF.
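
To make the BMP point concrete, here is a minimal Python sketch that lists the twelve chess characters (code points U+2654 through U+265F) and checks that each one sits at or below FFFF. The names `CHESS_SYMBOLS` and `in_bmp` are made up for illustration, and the grinning face at U+1F600 is included only as a familiar example of a character above FFFF. Python's standard library does not say which characters are emoji, so this only checks planes, not emoji status.

```python
import unicodedata

# The twelve chess symbols occupy U+2654 (white king) through U+265F (black pawn).
CHESS_SYMBOLS = [chr(cp) for cp in range(0x2654, 0x2660)]


def in_bmp(ch: str) -> bool:
    """Return True if the single character ch lies in the Basic Multilingual Plane."""
    return ord(ch) <= 0xFFFF


if __name__ == "__main__":
    for ch in CHESS_SYMBOLS:
        print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  BMP: {in_bmp(ch)}")

    # For contrast, a character that lives outside the BMP.
    grin = "\U0001F600"
    print(f"U+{ord(grin):04X}  {unicodedata.name(grin)}  BMP: {in_bmp(grin)}")
```

Every chess symbol, including the troublesome black pawn, reports as being in the BMP, so position in the code space alone cannot tell you whether a character will be drawn as an emoji.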

***

[1] I know that “emoji” is a Japanese word, not a Latin word, but to my ear the singular of “emoji” should be “emojus.”

3 thoughts on “Unicode and Emoji, or The Giant Pawn Mystery”

  1. In some situations, you can prevent a character from being rendered as an emojus by following it with the special codepoint U+FE0E, “VARIATION SELECTOR-15”. This character indicates that the previous codepoint should be rendered as text, not as an emoji. (This is only applicable to those codepoints, like the black pawn, that can go either way.) There’s also U+FE0F VARIATION SELECTOR-16, which explicitly requests emoji rendering.

    Of course, these only have the desired effect if they’re supported by your font and/or application and/or text rendering system. I’m not sure whether Twitter would strip out these characters, for example.

  2. The ones in the BMP are the ones that existed before the Unicode Consortium turned into the International Emoji Factory; they were added to the emoji list retroactively. They were included in Unicode because they were included in some popular pre-existing encoding, and Unicode wanted to be able to round-trip all such encodings. The fact that they *were* included was what opened the door to requests for later versions of Unicode to codify more and more of them.

  3. David J. Littleboy

    The moji part of emoji is 文字, the Japanese word for character/letter. Since Japanese does not have a syntactic distinction between “singular” and “plural” (dizzyingly complex concepts in English (really!)), you are on your own here. (I like emojus, but that’s because I went to Boston Latin School, where they really do teach Latin. (But since you can’t muck with the root in Japanese, it may not sell in Japan.)) Interestingly, if you add in Wikipedia’s “emojis” as the plural form, you have a trinary system: singular/indeterminate/plural. Since I’ve only heard of singular/dual/plural syntactic systems in actually existent human languages, we may have invented a new linguistic phenomenon here.
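
The variation selectors described in the first comment can be tried out with a small Python sketch like the one below. The helper names `as_text` and `as_emoji` are invented for this example. All the code does is append the selector to the character; whether the result is actually drawn as text or as an emoji still depends on the font, the application, and the text rendering system, as the comment warns.

```python
import unicodedata

BLACK_PAWN = "\u265F"
TEXT_SELECTOR = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_SELECTOR = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation


def as_text(ch: str) -> str:
    """Append VS15 so that a supporting renderer draws ch as plain text."""
    return ch + TEXT_SELECTOR


def as_emoji(ch: str) -> str:
    """Append VS16 so that a supporting renderer draws ch as an emoji."""
    return ch + EMOJI_SELECTOR


if __name__ == "__main__":
    for label, s in [("text", as_text(BLACK_PAWN)), ("emoji", as_emoji(BLACK_PAWN))]:
        # Show the code points that make up each two-character sequence.
        points = " ".join(f"U+{ord(c):04X}" for c in s)
        print(f"{label}: {s}  ({points})")
    print(unicodedata.name(TEXT_SELECTOR), "/", unicodedata.name(EMOJI_SELECTOR))
```

Pasting the two printed sequences into different apps is a quick way to see which environments honor the selectors and which ignore or strip them.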
