How to memorize Unicode codepoints

At the end of each month I write a newsletter highlighting the most popular posts of that month. When I looked back at my traffic stats to write this month’s newsletter I noticed that a post I wrote last year about how to memorize the ASCII table continues to be popular. This post is a follow up, how to memorize Unicode values.

Memorizing all 128 ASCII values is doable. Memorizing all Unicode values would be insurmountable. There are nearly 150,000 Unicode characters at the moment, and the list grows over time. But knowing a few Unicode characters is handy. I often need to insert a π symbol, for example, and so I made an effort to remember its Unicode value, U+03C0.

There are convenient ways of inserting common non-ASCII characters without knowing their Unicode values, but these offer a limited range of characters and they work differently in different environments. Inserting Unicode values gives you access to more characters in more environments.

As with ASCII, you can memorize the Unicode value of a symbol by associating an image with a number and associating that image with the symbol. The most common way to associate an image with a number is the Major system. As with everything else, the Major system becomes easier with practice.

However, Unicode presents a couple challenges. First, Unicode codepoints are nearly always written in hexadecimal, and so you’ll run into the letters A through F as well as digits. Second, Unicode codepoints are four hex digits long (or five outside the Basic Multilingual Plane.) We’ll address both of these difficulties shortly.

It may not seem worthwhile to go to the effort of encoding and decoding numbers like this, but it scales well. Brute force is fine for small amounts of data and short-term memory, but image association works much better for large amounts of data and long-term memory.

Unicode is organized into blocks of related characters. For example, U+22xx are math symbols and U+26xx are miscellaneous symbols. If you know what block a symbols is in, you only need to remember the last two hex digits.

You can convert a pair of hex digits to decimal by changing bases. For example, you could convert the C0 in U+03C0 to 192. But this is a moderately difficult mental calculation.

An easier approach would be to leave hex digits alone that correspond to decimal digits, reduce hex digits A through F mod 10, and tack on an extra digit to disambiguate. Stick on a 0, 1, 2, or 3 according to whether no digits, the first digit, the second digit, or both digits had been reduced mod 10. See this page for details. With this system, C0 becomes 201. You could encode 201 as “nest” using the Major system, and imagine a π sitting in a nest, maybe something like the 3Blue1Brown plushie.

3Blue1Brown plushieFor another example, ♕ (U+2655), is the symbol for the white queen in chess. You might think of the White Queen from The Lion, the Witch, and the Wardrobe [2] and associate her with the hex number 0x55. If you convert 0x55 to decimal, you get 85, which you could associate with the Eiffel Tower using the Major system. So maybe imagine the White Queen driving her sleigh under the Eiffel Tower. If you convert 0x55 to 550 as suggested here, you might imagine her driving through a field of lilies.

Often Unicode characters are arranged consecutively in a logical sequence so you can compute the value of the rest of the sequence from knowing the value of the first element. Alphabets are arranged in alphabetical order (mostly [1]), symbols for Roman numerals are arranged in numerical order, symbols for chess pieces are arrange in an order that would make sense to chess players, etc.

[1] There are a few exceptions such as Cyrillic Ё and a gap in Greek capital letters.

[2] She’s not really a queen, but she thinks of herself as a queen. See the book for details.

Redoing images in Midjourney

My son in law was playing around with Midjourney v5 and I asked him to try to redo some of the images I’ve made with DALL-E 2.

Back in August i wrote a post about using DALL-E 2 to generate mnemonic images for memorizing the US presidents using the Major mnemonic system.

To memorize that Rutherford B. Hayes was the 19th president. you might visualize Hayes playing a tuba because you can encode 19 as tuba. The image I created last year was cartoonish and showed Hayes playing something more like a sousaphone than a tuba. Midjourney created a photorealistic image of Hayes playing some weird instrument which is something like a tuba.

Rutherford B. Hayes playing something like a ruba

Franklin Delano Roosevelt was the 32nd president. If we use an image of the moon as the peg for 32, we could imagine FDR looking up at the moon. The previous image of FDR was really creepy and looked nothing like him. The image Midjourney created with FDR was really good.

FDR looking up at the moon

Midjourney choked on the request to create a create an image of Grover Cleveland holding an onion and a wiener dog, just as DALL-E had. It didn’t do any better substituting Grover the Muppet for Grover Cleveland.

A few weeks ago I wrote a blog post about a thought experiment involving an alien astronomer with 12 fingers working in base 12. I could only get DALL-E to draw a hint of an extra finger. We weren’t able to get Midjourney to put 12 fingers on a hand either, but we did get an interesting image of an alien astronomer.

Alien astronomer

Square root mnemonics

Here’s a cute little poem:

I wish I knew
The root of two.

O charmed was he
To know root three.

So we now strive
To find root five.

The beginning of each stanza is a mnemonic for the number mentioned in the following line.

√ 2 = 1.414
√ 3 = 1.732
√ 5 = 2.236

I found this in Twenty Years Before the Blackboard by Michael Stueben. Steuben sites the sources as Dictionary of Mnemonics, London 1972. He doesn’t give any other information such as author or editor.

Update: Additional verse from the comments.

It rhymes with heaven
The root of seven.

Update: Here’s Python code to validate the mnemonics above.

    from math import sqrt

    def validate(line, num):
        digits = [str(len(w)) for w in line.split()]
        x = int("".join(digits)) / 1000
        assert(x == round(sqrt(num),3))

    validate("I wish I knew", 2)
    validate("O charmed was he", 3)
    validate("So we now strive", 5)
    validate("It rhymes with heaven", 7)

Related post: Numbers worth memorizing.

Chimera and sine of 60°

I was playing around with DALL-E last night. I pasted the definition of chimera into DALL-E and the results were bizarre. See this Twitter thread for images.

I also played around with some more mnemonic images. Students often memorize the sines and cosines of 30°, 45°, and 60°. I thought about making a mnemonic image for sin(60°) = √3 / 2, using the decimal value 0.8660 rather than the exact value.

Using the Major system, you can encode 60 as chess or shoes, and 86 as fish. So I thought of a chess board, with a fish wearing shoes standing on it. (Absurd images are easier to remember, and DALL-E excels at making absurd images, though it’s difficult to get it to generate a particular image you may have in mind.)

I was able to generate images of a fish wearing shoes, but when I asked for the fish to be standing on a chess board, it wouldn’t put the shoes on the fish. Here’s the best image I got:

A fish on a chessboard with someone standing behind it

DALL-E images tend to be very large. The original version of the image above was 2 MB. I used Squoosh to reduce the size of the image 98% to produce the image above.

You could make the image much smaller still. I didn’t crop the image because I wanted to show what DALL-E generated, but you could crop it to cut off some of the bottom. You could also edit the image to eliminate the marbling in the squares; solid colored squares would compress much better.

Memorizing Planck’s constant with DALL-E

Planck’s constant used to be a measured quantity and now it is exact by definition.

h = 6.62607015×10−34 J / Hz

Rather than the kilogram being implicit in the units used to measure Planck’s constant, the mass of a kilogram is now defined to be whatever it has to be to make Planck’s constant have the value above.

Now that it’s exact by definition, maybe you’d like to memorize it. Using the Major system described here we could encode the digits as “Judge enjoys quesadilla.” [1]

As with the previous post, I’m using a memorization exercise as an excuse to play around with DALL-E. I typed “A judge enjoying eating a quesadilla” into DALL-E 2 and got back four images, as always. The best of these was the following.

First attempt at judge enjoying a quesadilla

The food in the image looks like a quesadilla, but it’s not clear that the man eating it is a judge, or that he’s enjoying himself.

Next I changed “judge” to “a supreme court justice,” hoping DALL-E would create an image that more obviously features a judge.

Here’s one of the outputs:

Second attempt at judge enjoying a quesadilla

This fellow looks more like a judge, and he’s obviously enjoying himself. Maybe he’s eating a calzone, but we’ll call it a quesadilla.

Not all the images created by DALL-E are as accurate as the ones above. I suspect there’s a lot of selection bias in the examples of images posted online. I’m contributing to that selection bias by showing images that were good enough to include in a blog post. I tried other images for blogging on other topics, and the results were not worth sharing.

So in an attempt at mitigating selection bias, here’s another image generated from the prompt “A supreme court justice enjoying eating a quesadilla.”

Young lady who is not eating anything

This young lady is wearing black, as supreme court justices are wont to do. And she appears to be enjoying herself, but she’s definitely not eating a quesadilla.

Incidentally, another possible encoding of 662607015 is “Judge enjoys Costello” as in Abbot and Costello. When I typed “A supreme court justice enjoying watching Abbot and Costello on television” I got the following creepy image.

DALL-E attempt at a judge watching Abbot and Costello on television

Related posts

[1] This mnemonic is a little bit of a cheat, depending on how you pronounce quesadilla. The sound of ll is sorta like that of a y in English. Here I’m using it to represent 5 just as the l sound does. Here in southeast Texas, I believe most people use the Spanish pronunciation, at least approximately. If you completely anglicize the pronunciation so that ll is pronounced as in pillow, then you can use the mnemonic with no qualms.

DALL-E 2 and mnemonic images

I recently got an account for using OpenAI’s DALL-E 2 image generator. The example images I’ve seen are sorta surreal combinations of common words, and that made me think of the Major memory system.

I’ve written about the Major system before. For example, I give an overview here and I describe how to use it to memorize an ASCII table here. In a nutshell, there are consonant sounds associated with each digit. Choose constant sounds and add any vowel sounds you like to make words you can visualize.

There are a couple ways people use the Major system. One is simply to memorize numbers. Any encoding that leads to something you find easy to remember is OK. For example, suppose you want to encode 19. The consonant sounds for 1 are tth, and d, and the consonant sounds for 9 are p and b. So you could encode 19 as adobe, Debbie, Ethiopia, tuba, etc.

The other way people use the Major system is to memorize specific pegs for numbers. For example, you might choose tuba as your peg for 19. To memorize a list, you associate each item with its peg, such as associating the 19th item with a tuba. Pegs have to be unique so you can pull up a particular mental image to call a list item, such as remembering what you associated with a tuba.

For example, suppose you wanted to memorize a list of the US presidents. The 19th president was Rutherford B. Hayes, and so you might want to imagine him playing a tuba. I uploaded a photo of Hayes and asked DALL-E to make him play a tuba. The software rejected my request, saying that realistic photos of persons are not allowed at this time.

Rutherford B. Hayes playing a tuba

DALL-E knows about some people but not others. For example, it doesn’t know who Evariste Galois is, but apparently it has some idea who Rutherford B. Hayes is. When I asked for “Rutherford B. Hayes playing a tuba” it came back with the image above.

Franklin Delano Roosevelt was the 32nd president. The consonant sound for 3 is m and the consonant sound for 2 is n. Suppose your peg for 32 is moon, and you’d like to imagine FDR looking up at the moon. When I asked DALL-E to make an image of this, I got a very strange image of FDR, but he was looking up at the moon.

FDR looking up at the moon

The only US president to serve two non-consecutive terms was Grover Cleveland, the 22nd and 24th president. I asked DALL-E for an image of Grover the Muppet holding an onion (22) in one hand and a wiener dog (24) in the other [1]. The result was not great.

Blue dog holding cucumber dog?

I thought Grover the Muppet would be more memorable than Grover Cleveland himself. But DALL-E did better with Mr. Cleveland. Maybe there’s some copyright issue with the muppets?

Grover Cleveland with a onion-banana and a dog

Well, he does have an onion, with something weird underneath. Bananas? Eggplant? Cow udders? And he has a dog, though not a wiener dog.

Creating your own mental images is far more efficient than having DALL-E come up with images for you, but the DALL-E images are useful examples of what you might imagine for yourself.

Related posts

[1] The Major system doesn’t use w and so you can throw it in as you would a vowel. So wiener decodes as n and r, 24.

How to memorize the ASCII table

Before discussing how you could memorize a table of ASCII characters and numeric values, I should say a little about why you might do so.

One reason is simply for the challenge. It’s more doable than it may sound.

It’s also useful information, though it’s debatable whether it’s worth memorizing. YMMV. There was a time early in my career when I spent a lot of time staring at a hex editor, and it definitely would have been useful to know then. I still need to look up ASCII values occasionally, though not as often.

ASCII table at the command line: ascii -d

One step in the process of memorizing the ASCII table is to associate an image with each ASCII character. This is useful by itself without learning the ASCII values of the characters. You could memorize strong passwords, for example, by linking together the images associated with each character.

I’ll only consider printable characters in this post.

ASCII landmarks

It’s very handy to know that the ASCII code for the number 0 is 48 and that the digits codes are in order. So the ASCII code for the digit d is d+48.

Wouldn’t it have been easier if the digits started at a location ending in zero? They do, in hexadecimal: digits start at 0x30.

Similarly, capital letters start at 65 (or 0x41). So the nth capital letter starts at n+64 (or n+0x40).

Lower case letters start at 97 (0x61), and so the nth lower case letter starts at n+96 (or n+0x60).

Incidentally, you’ll sometimes see software that sorts capital letters before lower case letters. The software is probably simply sorting by ASCII value.

In short, if you know the ASCII values of 0, A, and a then you can calculate the values of all digits and letters.

Memory pegs

You can create images for each ASCII character based on three things you may already know, and which are worth learning if you don’t.

Letters

One is the NATO phonetic alphabet: Alpha, Bravo, Charlie, Delta, …. If you know the NATO alphabet then you already have an image for each letter.

If you want to memorize a strong password, something like u\#mC_cNJ$o, then you need to distinguish upper and lower case letters. One way to do this would be to memorize two variations on each NATO word, one large and one small. For example, you might split the word Charlie into Charlie Chaplin for capital C and Charlie Brown for lower case c. For another, you might split golf into golf cart for G and golf ball for g.

Digits

The most common way to memorize digits is to encode them as words using the Major system. Zero is associated with the S sound, one is associated with the T sound, two is associated with the N sound, etc. See the preceding link for details.

You should come up with your own pegs that work for you, but here’s an example where each peg is a spice.

  1. cilantro
  2. turmeric
  3. nutmeg
  4. mustard
  5. rosemary
  6. lemon
  7. chili powder
  8. cayenne pepper
  9. vanilla
  10. peppermint.

Your pegs don’t need to be thematically related but I thought it would be fun to create a list based on spices. Maybe the association with various tastes would make your mnemonics more memorable.

Symbols

As with letters and digits, we can appeal to prior art to come up with images. Hacker slang associates a word with each symbol: bang for !, rabbit ears for “, splat for *, etc.

You might start with hacker slang and modify it to suit your taste.

Code values

Using the landmarks mentioned above, you could calculate the ASCII codes for any letter or digit. But if you want to associate letters and digits with their ASCII values more quickly, you need to directly associate each symbol directly with its value. Even if you don’t do this with letters and digits, you have to do it with other symbols because there’s no way to calculate the ASCII value of * or %, for example.

The most common technique for memorizing a numbered list is to memorize a set of pegs for each digit, then associate each list item with its number.

Creating and memorizing pegs for the numbers 1 through 128 is a lot of work, but it’s reusable. If you go to the effort, you can use it for memorizing a lot more than the ASCII table. For example, you could use it to memorize chemical elements.

If your peg for 42 is rain, you can imagine water balloons raining down and splattering on your driveway. That links the number 42 with the symbol * whose hacker slang is splat.

For another example, 94 is the ASCII value of caret (^), sometimes called carrot in hacker slang. You could a bear (94) standing up and eating a carrot like Bugs Bunny.

Update: What if you’d like to memorize Unicode?

More posts on mnemonics

Voiced and unvoiced consonants and digits

The latest episode of The History of English Podcast discusses the history of pronunciation changes in the Elizabethan period. The episode has a lot to say about the connections between voiced and unvoiced pairs of consonants, and the circumstances under which a consonant might change from voiced to unvoiced and vice versa.

The major mnemonic system encodes digits as consonant sounds in order to make words out of numbers. The system works in terms of sounds, not spellings, and so some of the symbols below are not English letters but rather IPA symbols [1]. More on the system here.

0: S Z, 1: T D ð θ, 2: N ŋ, 3: M, 4: R, 5: L, 6: ʤ ʧ ʃ ʒ, 7: K G, 8: F V, 9: P B

If you’re not familiar with the concept of voiced and unvoiced vocal sounds it may seem arbitrary that, for example, the F and V sounds both decode to 8, or that the S and SH sounds map to different numbers, 0 and 6 respectively.

The allocation of sounds may seem inefficient at first.Some numbers get more sounds than others because some sounds belong to clusters of related sounds and some do not. For example, there’s no such thing as an unvoiced L sound, so 5 gets L and no other sound. But 8 gets P and B because these are unvoiced and voiced variations of the same sound.

The allocation is more uniform than it seems at first when you count consonant groups rather than individual consonant sounds.

Related

[1] Here are the IPA symbols above that do not correspond to English letters.

|-----+---------|
| IPA | Example |
|-----+---------|
| ð   | THis    |
| θ   | THistle |
| ŋ   | siNG    |
| ʤ   | Jar     |
| ʧ   | CHurCH  |
| ʃ   | SHoe    |
| ʒ   | corsaGe |
|-----+---------|

For more on IPA, see the Wikipedia IPA help page.

100 digits worth memorizing

I was thinking today about how people memorize many digits of π, and how it would be much more practical to memorize a moderate amount of numbers to low precision.

So suppose instead of memorizing 100 digits of π, you memorized 100 digits of other numbers. What might those numbers be? I decided to take a shot at it. I exclude things that are common knowledge, like multiplication tables up to 12 and familiar constants like the number of hours in a day.

There’s some ambiguity over what constitutes a digit. For example, if you say the speed of light is 300,000 km/s, is that one digit? Five digits? My first thought was to count it as one digit, but then what about Avagadro’s number 6×1023? I decided to write numbers in scientific notation, so the speed of light is 2 digits (3e5 km/s) and Avagadro’s number is 3 digits (6e23).

Here are 40 numbers worth remembering, with a total of 100 digits.

Powers of 2

23 = 8
24 = 16
25 = 32
26 = 64
27 = 128
28 = 256
29 = 512
210 = 1024

Squares

13² = 169
14² = 196
15² = 225

Probability

P(|Z| < 1) ≈ 0.68
P(|Z| < 2) ≈ 0.95

Here Z is a standard normal random variable.

Music

Middle C ≈ 262 Hz
27/12 ≈ 1.5

The second fact says that seven half steps equals one (Pythagorean) fifth.

Mathematical constants

π ≈ 3.14
√2 ≈ 1.414
1/√2 ≈ 0.707
φ ≈ 1.618
loge 10 ≈ 2.3
log10 e ≈ 0.4343
γ ≈ 0.577
e ≈ 2.718

Here φ is the golden ratio and γ is the Euler-Mascheroni constant.

I included √2 and 1/√2 because both come up so often.

Similarly, loge 10 and log10 e are reciprocals, but it’s convenient to know both.

The number of significant figures above is intentionally inconsistent. It’s at least as easy, and maybe easier, to remember √2 as 1.414 than as 1.41. Similarly, if you’re going to memorize that log10 e  is 0.43, you might as well memorize 0.4343. Buy two get two free.

Each constant is truncated before a digit less than 5, so all the figures are correct and correctly rounded. For φ and loge 10 the next digit is a zero, so you get an implicit extra digit of precision.

The requirement that truncation = rounding means that you have to truncate e at either 2.7 or 2.718. If you’re going to memorize the latter, you could memorize six more digits with little extra effort since these digits are repetitive:

e = 2.7 1828 1828

Measurements and unit conversion

c = 3e8 m/s
g = 9.8 m/s²
NA = 6e23
Earth circumference = 4e7m
1 AU = 1.5e8 km
1 inch = 2.54 cm

Floating point

Maximum double = 1.8e308
Epsilon = 2e-16

These numbers could change from system to system, but they rarely do. See Anatomy of a floating point number.

Decibels

1 dB = 100.1 = 1.25
2 dB = 100.2 = 1.6
3 dB = 100.3 = 2
4 dB = 100.4 = 2.5
5 dB = 100.5 = 3.2
6 dB = 100.6 = 4
7 dB = 100.7 = 5
8 dB = 100.8 = 6.3
9 db = 100.9 = 8

These numbers are handy, even if you don’t work with decibels per se.

Update: The next post points out a remarkable pattern between the first and last sets of numbers in this post.

Related posts

Major memory system telephone keypad

This weekend I had to enter an alphabetic passcode on a numeric keypad. The keypad used the same letter-to-digit convention as a phone, but the letters were not printed on the keypad. That made me think about how much better the Major system is.

I wondered what phone keypads would look like if they used the Major memory system, and so I made the image below.

0: S Z, 1: T D ð θ, 2: N ŋ, 3: M, 4: R, 5: L, 6: ʤ ʧ ʃ ʒ, 7: K G, 8: F V, 9: P B

The Major memory system is a way of encoding numbers as words to make them easier to memorize. The system associates a consonant sound with each digit; you’re free to insert any vowels you like. For example, if you wanted to memorize 745, you might encode it as gorilla.

Note that gorilla decodes to 745 and not 7455 because the word has only one L sound, even though it is spelled with two Ls.

One nice feature of the Major system is that if you multiply a number by 10, you can pluralize its mnemonic. For example, a possible mnemonic for 7450 is gorillas.

The Major system emphasizes sounds because humans remember sounds more easily than symbols. If you have a photographic memory for symbols, just memorize the digits and don’t bother with any mnemonic system.

Some of the sounds associated with digits are not represented by a single letter in English and so the keypad above contains a few IPA (International Phonetic Alphabet) symbols. The number 1 is encoded by any of the sounds “t”, “d”, or “th.” The IPA symbol θ represents the th sound in think and the ð represents the th sound in this. The symbol ŋ as a possible encoding for 2 represents the ng sound in sing. And the number 6 can be encoded as one of several similar sounds: ch, sh, soft g, or soft z.

The conventional phone keypad looks simpler: 2 = A, B, or C, 3 = D, E, or F, etc. It’s the kind of thing James Scott would call “legible,” something that looks simple on paper and warms a bureaucrat’s heart, but doesn’t necessarily work well in practice. The sounds associated with the letters for a given digit have nothing in common, so a number can be represented by dissimilar sounds, and similar sounds do not represent the same number.

Encoding telephone numbers as words is rarely possible using the conventional keypad letters. Phone numbers that do correspond to memorable words are highly valued. Every letter has to correspond to a digit, and it matters how the word is spelled.

The Major system is much more flexible since you’re free to supply vowels as you wish, and you can choose from a wide variety of words that spell a single consonant sound in different ways.

Related posts