Braille, Unicode, and Binary

Braille characters live in a 4×2 matrix. This means there are eight positions where the surface is either flat or raised. You can naturally denote a Braille character by an 8-bit binary number: the bit for each position is 0 for flat and 1 for raised.

This is how Braille characters are encoded in Unicode. Braille characters are U+2800 through U+28FF: hexadecimal 2800 plus the binary number corresponding to the pattern of dots. However, there’s one surprise: the dots are numbered irregularly, as indicated below:

1 4
2 5
3 6
7 8

Historically a Braille cell had six dots, a 3×2 matrix, and the numbering made more sense: consecutive numbers, by column, left to right, the way Fortran stores matrices:

1 4
2 5
3 6

But when Braille was extended to a 4×2 matrix, the new positions were labeled 7 and 8 so as not to rename the previous positions.

The numbered positions above correspond to the last eight bits of the Unicode code point, from right to left. That is, position 1 determines the least significant bit and position 8 determines the 8th bit from the end.
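
Here is a minimal Python sketch of that mapping. The name `braille_from_dots` is made up for illustration; the only facts it relies on are the U+2800 base and the rule that dot n sets the nth bit from the right.

```python
BRAILLE_BASE = 0x2800  # first code point of the Unicode Braille Patterns block

def braille_from_dots(dots):
    """Return the Braille character whose raised dots are the given positions (1-8)."""
    bits = 0
    for d in dots:
        if not 1 <= d <= 8:
            raise ValueError(f"dot position must be 1..8, got {d}")
        bits |= 1 << (d - 1)  # dot 1 is the least significant bit
    return chr(BRAILLE_BASE + bits)

# No dots gives the blank pattern; all eight dots gives the last one in the block.
print(hex(ord(braille_from_dots([]))))                        # 0x2800
print(hex(ord(braille_from_dots([1, 2, 3, 4, 5, 6, 7, 8]))))  # 0x28ff
```

One consequence of labeling the new dots 7 and 8: any pattern that uses only dots 1 through 6 has its top two bits clear, so the traditional six-dot patterns occupy U+2800 through U+283F.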

For example, here is Unicode character U+288A: ⢊

The dots that are filled in correspond to positions 2, 4, and 8, so the last eight bits of the Unicode value are 10001010. The hexadecimal form of 10001010 is 8A, and the Unicode character is U+288A.
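
You can check this example by going the other direction, from character to dot positions. The sketch below is only an illustration, and `dots_from_braille` is a hypothetical helper name.

```python
def dots_from_braille(ch):
    """Return the raised dot positions (1-8) of a Braille pattern character."""
    offset = ord(ch) - 0x2800
    if not 0 <= offset <= 0xFF:
        raise ValueError("not a character in U+2800..U+28FF")
    # Position p is raised when bit p-1 (counting from the right) is set.
    return [p for p in range(1, 9) if offset & (1 << (p - 1))]

ch = "\u288a"
print(format(ord(ch) & 0xFF, "08b"))  # 10001010
print(dots_from_braille(ch))          # [2, 4, 8]
```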

5 thoughts on “Braille, Unicode, and Binary”

1. Joseph

Seems like they planned for horizontal more than vertical extensibility. Any idea why they didn’t add a third column instead, and have a spare unused bit for more options?

2. I don’t know, but one idea is that human fingerprints are more rectangular than square. Maybe a Braille pattern is as wide as someone can read in one touch.

3. Joseph

If so, they should have gone row-major instead of column-major, no? Perhaps it’s also a lesson for planning in the inevitable extensions. 🙂

Regardless, it’s an interesting new set of facts of which I was previously unaware. Thanks!

4. 8-dot Braille is relatively recent and extremely rare (at least in American English). It’s mostly used for a variant called “computer Braille”, and “Braille display” devices usually do feature 8-dot cells. However, virtually all printed (“embossed”) material and all ADA labels in elevators, etc. are 6-dot Braille.

Even though 6-dot Braille has only 64 combinations, it can encode larger code sets through the use of multi-code sequences. Besides plain English, there are 6-dot variants used to encode math notation, music and others.

So, they didn’t plan for either horizontal or vertical extensibility. Extensibility in Braille has always been achieved through sequences, rather than by “adding dots”. I actually find it surprising that 8-dot Braille exists at all; adding dots is not a trivial matter since it takes a long time to train your fingers to recognize the patterns efficiently, and the more dots there are the harder it gets.

5. I’m amused by your phrase “the way Fortran stores matrices”, since my recollection of Fortran was that arrays were stored “backwards” in memory (lower indices in higher addresses), a legacy of IBM 704 hardware. Of course, I started out on machines that were BigEndian but LittleAddressian (for want of a better word). That is, lower addresses held higher significance, but variables were addressed by their LSB.

Also the 3/4 row Braille reminds me of IBM System3 cards, wherein after dissing multi-tiered cards with round holes and less-redundant codes for 50 years, they introduced their own. Originally designed with 4 tiers of 32 columns of 6-bit characters, it was tweaked to put 96 six-bit characters in the bottom three tiers, and the upper two bits of each tier in a sub-tier of the top tier.

And I couldn’t help thinking “UTF-6” while reading Euro Micelli’s comment.