My previous post quoted Randall Munroe saying Unicode “started out just trying to unify a couple different character sets” and grew much more ambitious.

The first version of Unicode, published in 1991, had 7,191 characters. The latest version has 137,994 characters, about 19 times as many. Here’s a plot of the number of characters in Unicode over time.

Number of Unicode characters over time

Here’s a slightly different plot where the horizontal axis is version number rather than time.

Number of Unicode characters by standard version number

There’s plenty of room left in Unicode. The maximum number of possible Unicode characters is 1,111,998, for reasons I get into in another post.
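For readers who want the arithmetic behind that figure, here is a minimal sketch. It assumes the usual breakdown: the Unicode code space runs from U+0000 to U+10FFFF (17 planes of 65,536 code points each), and the count of possible characters excludes the 2,048 surrogate code points reserved for UTF-16 and the 66 designated noncharacters. The names `is_surrogate` and `is_noncharacter` are just illustrative helpers, not part of any library.

```python
# Sketch: deriving the 1,111,998 ceiling on Unicode characters.
# Assumption: "possible characters" = all code points minus surrogates
# minus noncharacters.

PLANES = 17                       # planes 0 through 16
CODE_POINTS_PER_PLANE = 0x10000   # 65,536 code points per plane

# The full code space, U+0000..U+10FFFF.
total_code_points = PLANES * CODE_POINTS_PER_PLANE

# Surrogates U+D800..U+DFFF are reserved for UTF-16 and never encode characters.
surrogates = 0xDFFF - 0xD800 + 1

# Noncharacters: U+FDD0..U+FDEF (32 of them), plus the last two code points
# of every plane, U+xxFFFE and U+xxFFFF (2 * 17 = 34 of them).
noncharacters = (0xFDEF - 0xFDD0 + 1) + 2 * PLANES

max_characters = total_code_points - surrogates - noncharacters


# Cross-check by brute force: walk every code point and count the usable ones.
def is_surrogate(cp: int) -> bool:
    return 0xD800 <= cp <= 0xDFFF


def is_noncharacter(cp: int) -> bool:
    # (cp & 0xFFFE) == 0xFFFE matches both U+xxFFFE and U+xxFFFF in any plane.
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


brute_force = sum(
    1 for cp in range(total_code_points)
    if not is_surrogate(cp) and not is_noncharacter(cp)
)

print(f"{total_code_points:,} code points")
print(f"{surrogates:,} surrogates, {noncharacters} noncharacters")
print(f"{max_characters:,} possible characters (brute force: {brute_force:,})")
```

Running it prints 1,114,112 code points, 2,048 surrogates and 66 noncharacters, leaving 1,111,998 possible characters. The brute-force loop is there only to confirm that the closed-form subtraction doesn't double-count anything.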

  1. The apparent jump between 2000 and 2001 stands out immediately and made me curious about its cause. According to Wikipedia[1], this happened between versions 3.0 and 3.1, when about 42,000 CJK ideographs were added. That brought the total number of characters from about 49K to a bit over 92K in a single minor version.


  2. Ha, I just saw that you address that jump in a lot of detail in the next post. Came across that one in my RSS reader just now.

