Memorizing four-digit numbers

The Major mnemonic system is a method of converting numbers to words that can be more easily memorized. The basics of the system can be written on an index card, but there are practical details that are seldom written down.

Presentations of the Major system can be misleading, intentionally or unintentionally, by implying that it is easy to find single words that encode numbers with three or four digits. Books and articles can unintentionally leave a wrong impression by being brief, but I can think of one book that I thought was intentionally misleading, opening with an example that was obviously reverse-engineered from its mnemonic. It was something like “Isn’t it easier to remember ‘constitution’ than 7201162?” Indeed it is, but I made up that example by starting with “constitution,” not starting with 7201162.

The Major system maps digits to consonant sounds. Spelling doesn’t matter, only pronunciation, and you can insert any vowels (or semivowels) you like. I list the mapping here in ARPAbet notation and here in IPA notation. The former is less precise but easier for most people to understand, so I’ll repeat it here.

0: S or Z
1: D, DH, T, or TH
2: N or NG
3: M
4: R
5: L
6: CH, JH, SH, or ZH
7: G or K
8: F or V
9: P or B
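
As a quick check of the table, here is a minimal Python sketch that applies it to a hand-written pronunciation. The names `MAJOR` and `encode` are my own for illustration, and the phoneme list for "constitution" is an assumed ARPAbet transcription with stress marks already removed.

```python
# Digit for each ARPAbet consonant in the table above.
# Vowels and semivowels are left out on purpose, so they are skipped.
MAJOR = {
    "S": "0", "Z": "0",
    "D": "1", "DH": "1", "T": "1", "TH": "1",
    "N": "2", "NG": "2",
    "M": "3",
    "R": "4",
    "L": "5",
    "CH": "6", "JH": "6", "SH": "6", "ZH": "6",
    "G": "7", "K": "7",
    "F": "8", "V": "8",
    "P": "9", "B": "9",
}

def encode(phonemes):
    """Return the Major-system digits for a list of ARPAbet phonemes."""
    return "".join(MAJOR.get(p, "") for p in phonemes)

# Assumed transcription of "constitution" (stress digits removed).
constitution = ["K", "AA", "N", "S", "T", "AH", "T", "UW", "SH", "AH", "N"]
print(encode(constitution))  # 7201162
```

Going from a word to its number like this is mechanical. Going the other way, from an arbitrary number to a word, is the hard direction.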

It is easy to find words that encode single digits. It’s a little harder to find words that encode some two-digit numbers, but it’s certainly doable. But if you want to encode all three-digit numbers as single words, you have to make some compromises. I estimate there’s about a 52% chance of being able to encode a four-digit number as a single word, for reasons I’ll explain below.

The CMU Pronouncing Dictionary lists 134,373 words along with their ARPAbet pronunciation. In this post I describe how I mapped the words to numbers, creating a file cmu_major.txt.

Not every three-digit number is in this file. The command

    grep -P -o ' \d{3}$' cmu_major.txt | sort -u | wc -l

shows that there are 958 unique three-digit numbers in the file, i.e. 42 three-digit numbers cannot be encoded as words in the CMU dictionary. By changing the ‘3’ to a ‘4’ in the one-liner above we see there are 5,191 unique four-digit numbers in the file, i.e. about 52% of all possible four-digit numbers.
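
The same count can be sketched in Python. This assumes, as the grep pattern does, that each line of cmu_major.txt ends with a space followed by the word's digits. The function name and the sample lines are made up for illustration.

```python
import re

def count_unique_codes(lines, ndigits):
    """Count distinct encodings that are exactly ndigits long."""
    pattern = re.compile(r" (\d{%d})$" % ndigits)
    codes = set()
    for line in lines:
        m = pattern.search(line.rstrip("\n"))
        if m:
            codes.add(m.group(1))
    return len(codes)

# Made-up sample in the assumed "WORD digits" layout.
sample = ["GIRAFFE 648", "SHERIFF 648", "NEST 201", "LADYBUG 5197"]
print(count_unique_codes(sample, 3))  # 2
print(count_unique_codes(sample, 4))  # 1
```

To run it on the real file you could pass an open file object in place of `sample`, since the function only iterates over lines.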

Since it is very often not possible to encode numbers with four or more digits as single words, a common approach is to not try. Instead, just pay attention to the first three digits that a word would encode. The advantage of this is that it opens up more possibilities for encoding three-digit numbers. The downside is that you give up the possibility of encoding four-digit numbers in a single word, but this isn’t giving up much since there’s roughly a 48% chance you’d fail anyway.

So if you want to memorize a four-digit number, you could memorize a pair of two-digit numbers. Some people like to draw these two numbers from different sets, such as using the name of a person for the first two digits and an action for the second two digits. I’ll explore this more in my next post on the PAO system.

Code to convert words to Major system numbers

A few days ago I wrote about using the CMU Pronouncing Dictionary to search for words that decode to certain numbers in the Major mnemonic system. You can find a brief description of the Major system in that post.

As large as the CMU dictionary is, it did not contain words mapping to some three-digit numbers, so it would be good to explore a larger, or at least different, dictionary. But the CMU dictionary is apparently the largest dictionary with pronunciation openly available.

To get more pronunciation data, you’ll need to generate it. This is what linguists call the grapheme to phoneme problem. There are software packages that create phonetic spellings using large neural network models, including models trained on the CMU data.

Why quick-and-dirty is OK

However, it’s possible to do a good enough job with much simpler software. There are several reasons why we don’t need the sophistication of research software. First and foremost, we can tolerate errors. If we get a few false positives, we can skim through those and ignore them. And if we get a few false negatives, that’s OK as long as we find a few of the words we’re looking for.

Another thing in our favor is that we’re not looking for pronunciation per se, only the numbers generated from the pronunciation. The hardest part of the grapheme to phoneme problem is vowel sounds, and we don’t care about vowel sounds at all. And we don’t care about distinguishing, for example, between voiced and unvoiced variations on the th sound because they both map to 1.

Code

The Major mnemonic system is based on pronunciation, not spelling. Nevertheless, you can do a rough-and-ready conversion, adequate for our purposes, based on spelling. I take into account a minimal amount of context, such as noting that c is soft before i, e, and y, but hard before a, o, and u. The handling of ch is probably the biggest source of errors because the sound of ch depends on etymology.

I wrote this as a Python script initially because I wanted to share it with someone who knows Python. But I’ll present it here in Perl because the Perl code is much more compact.

sub word2num {
    local $_ = shift;
    
    tr/A-Z/a-z/; # lower case
    
    s/ng/n/g;
    s/sch/j/g;
    s/che/k/g;
    s/[cs]h/j/g;
    s/g[iey]/j/g; # soft g -> j
    s/c[eiy]/s/g; # soft c -> s
    s/c[aou]/k/g; # hard c -> k
    s/ph/f/g;
    s/([bflmprv])\1+/$1/g; # condense double letters
    s/qu/k/g;
    s/x/ks/g;

    tr/sztdnmrljgkfvpb/001123456778899/; # t and d both map to 1
    tr/a-z//d; # remove remaining letters
    
    return $_
}

Perl has implicit variables, for better and for worse, and here it’s for the better. All the translation (tr//) and substitution (s//) operate in place on the implicit argument, the word sent to the function.

The corresponding Python code is more verbose:

def word2num(w):

    w = w.lower()

    w = w.replace('ng', 'n')
    w = w.replace('sch', 'j')
    w = w.replace('che', 'k')

    for x in ['gi', 'ge', 'gy', 'ch', 'sh']:
        w = w.replace(x, 'j')

    ...

The order of the replacement statements matters. For example, you want to decide whether c and g are hard before you discard the vowels.
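
For readers who want something runnable in Python, here is one way the whole routine might look. This is my own line-by-line port of the Perl substitutions using `re`, with t mapped to 1 alongside d. It is a sketch and not the script elided above.

```python
import re

# Letter-to-digit table; everything not listed here is dropped at the end.
DIGITS = str.maketrans("sztdnmrljgkfvpb", "001123456778899")

def word2num(word):
    """Rough spelling-based Major encoding of a single word."""
    w = word.lower()
    w = w.replace("ng", "n")
    w = w.replace("sch", "j")
    w = w.replace("che", "k")
    w = re.sub(r"[cs]h", "j", w)
    w = re.sub(r"g[iey]", "j", w)   # soft g -> j
    w = re.sub(r"c[eiy]", "s", w)   # soft c -> s
    w = re.sub(r"c[aou]", "k", w)   # hard c -> k
    w = w.replace("ph", "f")
    w = re.sub(r"([bflmprv])\1+", r"\1", w)  # condense double letters
    w = w.replace("qu", "k")
    w = w.replace("x", "ks")
    w = w.translate(DIGITS)
    return re.sub(r"[a-z]", "", w)  # remove remaining letters

print(word2num("phobophobia"))  # 8989
print(word2num("giraffe"))      # 648
```

The soft and hard rules for c and g run before the final translation because they need to see the following vowel, which is gone once the remaining letters are removed.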

This script works better than I expected it would for being such a dirty hack. I ran it on some large word lists looking for more alternatives to the three-digit numbers not in the output of the script processing the CMU dictionary. I list a few of the words I found here. The most amusing find was phobophobia, the fear of phobias, for 898.

Aside from filling in gaps in three-digit numbers, you could also use a script like this to search for mnemonic words in specialized lists of words, such as baseball players, or animal species, or brand names.

ARPAbet and the Major mnemonic system

Giraffe

ARPAbet is a phonetic spelling system developed by (you guessed it) ARPA, before it became DARPA.

The ARPAbet system is less expressive than IPA, but much easier for English speakers to understand. Every sound is encoded as one or two English letters. So, for example, the sound denoted ʒ in IPA is ZH in ARPAbet.

In ARPAbet notation, the Major mnemonic system can be summarized as follows:

0: S or Z
1: D, DH, T, or TH
2: N or NG
3: M
4: R
5: L
6: CH, JH, SH, or ZH
7: G or K
8: F or V
9: P or B

Numbers are encoded using the consonant sounds above; the system is based on sounds and not on spelling. You can insert any vowels or semivowels (e.g. w or y) you like. For example, you could encode 648 as “giraffe” or 85 as “waffle.”

The CMU Pronouncing Dictionary lists 134,373 words along with their ARPAbet pronunciation. The Python code below will read in the pronouncing dictionary and produce a Major mnemonic dictionary. The resulting file is available here as a zip compressed text file.

To find a word that encodes a number, search the code output for that number. For example,

    grep ' 648' cmu_major.txt

will find words whose Major encoding begins with 648, and

    grep ' 648$' cmu_major.txt

will find words whose Major encoding is exactly 648.

From this we learn that “sheriff” is another possible encoding for 648.

Filling in the gaps

Suppose you’re looking for encodings for all three digit numbers, 000 through 999. This can be hard to do. A common compromise is to only regard up to the first three consonants in a word. For example, you might use “ladybug” to encode 519, ignoring the final G sound on the end.

The tradeoff is that if you adopt this rule then you can’t use “ladybug” to encode 5197. But finding single words that encode 4-digit numbers can be challenging if not impossible, so you may just forgo the possibility. (I quantify this here.) This is why in the example above I show both searching for numbers that begin with 648 and numbers that are exactly 648.
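
The two kinds of search can also be sketched in Python. The function name `find_words` and the small list of (word, digits) pairs are hypothetical stand-ins for the contents of cmu_major.txt.

```python
def find_words(entries, digits, exact=False):
    """Return words whose encoding starts with, or exactly equals, digits."""
    matches = []
    for word, code in entries:
        if code == digits or (not exact and code.startswith(digits)):
            matches.append(word)
    return matches

# Hypothetical sample entries: (word, Major encoding).
entries = [("giraffe", "648"), ("sheriff", "648"),
           ("ladybug", "5197"), ("nest", "201")]

print(find_words(entries, "519"))              # ['ladybug']
print(find_words(entries, "519", exact=True))  # []
print(find_words(entries, "648", exact=True))  # ['giraffe', 'sheriff']
```

Under the first-three-consonants rule you would use the prefix search. The exact search is the one to use if you want a word to stand for its full encoding.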

Despite the large size of the CMU dictionary, it contains no words for 42 of the three-digit numbers. I can offer suggestions for these numbers, but it’s hard to use anyone else’s mnemonics. You may have to make up your own, using, for example, names of people you know personally or brand names you’re familiar with.

Python code

# NB: File encoding is Latin-1, not UTF-8.
with open("cmudict-0.7b", "r", encoding="latin-1") as f:
    lines = f.readlines()

for line in lines:
    pieces = line.split()
    if not pieces or line.startswith(";;;"):
        continue  # skip blank lines and the file's ;;; comment lines
    numstr = ""
    for p in pieces[1:]:
        p = p.rstrip("012")  # remove stress notation from the phoneme
        match p:
            case "S" | "Z":
                numstr += "0"
            case "D" | "DH" | "T" | "TH":
                numstr += "1"
            case "N" | "NG":
                numstr += "2"
            case "M":
                numstr += "3"
            case "R":
                numstr += "4"
            case "L":
                numstr += "5"
            case "CH" | "JH" | "SH" | "ZH":
                numstr += "6"
            case "G" | "K":
                numstr += "7"
            case "F" | "V":
                numstr += "8"
            case "P" | "B":
                numstr += "9"
    print(pieces[0], numstr)

How to memorize Unicode codepoints

At the end of each month I write a newsletter highlighting the most popular posts of that month. When I looked back at my traffic stats to write this month’s newsletter I noticed that a post I wrote last year about how to memorize the ASCII table continues to be popular. This post is a follow up, how to memorize Unicode values.

Memorizing all 128 ASCII values is doable. Memorizing all Unicode values would be insurmountable. There are nearly 150,000 Unicode characters at the moment, and the list grows over time. But knowing a few Unicode characters is handy. I often need to insert a π symbol, for example, and so I made an effort to remember its Unicode value, U+03C0.

There are convenient ways of inserting common non-ASCII characters without knowing their Unicode values, but these offer a limited range of characters and they work differently in different environments. Inserting Unicode values gives you access to more characters in more environments.

As with ASCII, you can memorize the Unicode value of a symbol by associating an image with a number and associating that image with the symbol. The most common way to associate an image with a number is the Major system. As with everything else, the Major system becomes easier with practice.

However, Unicode presents a couple challenges. First, Unicode codepoints are nearly always written in hexadecimal, and so you’ll run into the letters A through F as well as digits. Second, Unicode codepoints are four hex digits long (or five outside the Basic Multilingual Plane.) We’ll address both of these difficulties shortly.

It may not seem worthwhile to go to the effort of encoding and decoding numbers like this, but it scales well. Brute force is fine for small amounts of data and short-term memory, but image association works much better for large amounts of data and long-term memory.

Unicode is organized into blocks of related characters. For example, U+22xx are math symbols and U+26xx are miscellaneous symbols. If you know what block a symbol is in, you only need to remember the last two hex digits.

You can convert a pair of hex digits to decimal by changing bases. For example, you could convert the C0 in U+03C0 to 192. But this is a moderately difficult mental calculation.

An easier approach would be to leave hex digits alone that correspond to decimal digits, reduce hex digits A through F mod 10, and tack on an extra digit to disambiguate. Stick on a 0, 1, 2, or 3 according to whether no digits, the first digit, the second digit, or both digits had been reduced mod 10. See this page for details. With this system, C0 becomes 201. You could encode 201 as “nest” using the Major system, and imagine a π sitting in a nest, maybe something like the 3Blue1Brown plushie.
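
Here is a small Python sketch of that rule as I've described it. The function name `encode_hex_pair` is mine, not something from the page linked above.

```python
def encode_hex_pair(pair):
    """Map two hex digits to three decimal digits.

    Hex digits A-F are reduced mod 10, and a final flag digit records
    which positions were reduced: 0 none, 1 first, 2 second, 3 both.
    """
    if len(pair) != 2:
        raise ValueError("expected exactly two hex digits")
    flag = 0
    out = []
    for position, ch in enumerate(pair):
        value = int(ch, 16)
        if value >= 10:
            value -= 10
            flag += 1 if position == 0 else 2
        out.append(str(value))
    return "".join(out) + str(flag)

print(encode_hex_pair("C0"))  # 201
print(encode_hex_pair("55"))  # 550
```

Since the flag digit is never more than 3, the last consonant sound of the mnemonic word is always one of the sounds for 0, 1, 2, or 3.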

3Blue1Brown plushie

For another example, ♕ (U+2655) is the symbol for the white queen in chess. You might think of the White Queen from The Lion, the Witch, and the Wardrobe [2] and associate her with the hex number 0x55. If you convert 0x55 to decimal, you get 85, which you could associate with the Eiffel Tower using the Major system. So maybe imagine the White Queen driving her sleigh under the Eiffel Tower. If you convert 0x55 to 550 as suggested here, you might imagine her driving through a field of lilies.

Often Unicode characters are arranged consecutively in a logical sequence so you can compute the value of the rest of the sequence from knowing the value of the first element. Alphabets are arranged in alphabetical order (mostly [1]), symbols for Roman numerals are arranged in numerical order, symbols for chess pieces are arranged in an order that would make sense to chess players, etc.
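
As an illustration, the following Python sketch walks the run of white chess pieces by adding offsets to the first codepoint. It relies only on the white king being at U+2654, one before the queen mentioned above. The constant name is my own.

```python
import unicodedata

WHITE_KING = 0x2654  # first codepoint in the run of white chess pieces

# Print each of the six white pieces with its codepoint and official name.
for offset in range(6):
    cp = WHITE_KING + offset
    print(f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}")
```

So if you remember one codepoint and the order of the pieces, you get the other five for free.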

[1] There are a few exceptions such as Cyrillic Ё and a gap in Greek capital letters.

[2] She’s not really a queen, but she thinks of herself as a queen. See the book for details.

Redoing images in Midjourney

My son-in-law was playing around with Midjourney v5 and I asked him to try to redo some of the images I’ve made with DALL-E 2.

Back in August I wrote a post about using DALL-E 2 to generate mnemonic images for memorizing the US presidents using the Major mnemonic system.

To memorize that Rutherford B. Hayes was the 19th president, you might visualize Hayes playing a tuba because you can encode 19 as tuba. The image I created last year was cartoonish and showed Hayes playing something more like a sousaphone than a tuba. Midjourney created a photorealistic image of Hayes playing some weird instrument which is something like a tuba.

Rutherford B. Hayes playing something like a tuba

Franklin Delano Roosevelt was the 32nd president. If we use an image of the moon as the peg for 32, we could imagine FDR looking up at the moon. The previous image of FDR was really creepy and looked nothing like him. The image Midjourney created with FDR was really good.

FDR looking up at the moon

Midjourney choked on the request to create an image of Grover Cleveland holding an onion and a wiener dog, just as DALL-E had. It didn’t do any better substituting Grover the Muppet for Grover Cleveland.

A few weeks ago I wrote a blog post about a thought experiment involving an alien astronomer with 12 fingers working in base 12. I could only get DALL-E to draw a hint of an extra finger. We weren’t able to get Midjourney to put 12 fingers on a hand either, but we did get an interesting image of an alien astronomer.

Alien astronomer

Square root mnemonics

Here’s a cute little poem:

I wish I knew
The root of two.

O charmed was he
To know root three.

So we now strive
To find root five.

The first line of each stanza is a mnemonic for the root named in the second line: the number of letters in each word gives one digit.

√ 2 = 1.414
√ 3 = 1.732
√ 5 = 2.236

I found this in Twenty Years Before the Blackboard by Michael Stueben. Stueben cites the source as Dictionary of Mnemonics, London 1972. He doesn’t give any other information such as author or editor.

Update: Additional verse from the comments.

It rhymes with heaven
The root of seven.

Update: Here’s Python code to validate the mnemonics above.

    from math import sqrt

    def validate(line, num):
        digits = [str(len(w)) for w in line.split()]
        x = int("".join(digits)) / 1000
        assert(x == round(sqrt(num),3))

    validate("I wish I knew", 2)
    validate("O charmed was he", 3)
    validate("So we now strive", 5)
    validate("It rhymes with heaven", 7)

Related post: Numbers worth memorizing.

Chimera and sine of 60°

I was playing around with DALL-E last night. I pasted the definition of chimera into DALL-E and the results were bizarre. See this Twitter thread for images.

I also played around with some more mnemonic images. Students often memorize the sines and cosines of 30°, 45°, and 60°. I thought about making a mnemonic image for sin(60°) = √3 / 2, using the decimal value 0.8660 rather than the exact value.

Using the Major system, you can encode 60 as chess or shoes, and 86 as fish. So I thought of a chess board, with a fish wearing shoes standing on it. (Absurd images are easier to remember, and DALL-E excels at making absurd images, though it’s difficult to get it to generate a particular image you may have in mind.)
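
If you want to see where the two pegs come from, this short Python sketch computes the decimal value and splits its digits into pairs. The variable names are just for illustration.

```python
import math

value = math.sin(math.radians(60))
digits = f"{value:.4f}".split(".")[1]  # four digits after the decimal point
pairs = [digits[i:i + 2] for i in range(0, len(digits), 2)]
print(pairs)  # ['86', '60']
```

The first pair gives the fish and the second gives the chess board or shoes.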

I was able to generate images of a fish wearing shoes, but when I asked for the fish to be standing on a chess board, it wouldn’t put the shoes on the fish. Here’s the best image I got:

A fish on a chessboard with someone standing behind it

DALL-E images tend to be very large. The original version of the image above was 2 MB. I used Squoosh to reduce the size of the image 98% to produce the image above.

You could make the image much smaller still. I didn’t crop the image because I wanted to show what DALL-E generated, but you could crop it to cut off some of the bottom. You could also edit the image to eliminate the marbling in the squares; solid colored squares would compress much better.

Memorizing Planck’s constant with DALL-E

Planck’s constant used to be a measured quantity and now it is exact by definition.

h = 6.62607015 × 10⁻³⁴ J / Hz

Rather than the kilogram being implicit in the units used to measure Planck’s constant, the mass of a kilogram is now defined to be whatever it has to be to make Planck’s constant have the value above.

Now that it’s exact by definition, maybe you’d like to memorize it. Using the Major system described here we could encode the digits as “Judge enjoys quesadilla.” [1]

As with the previous post, I’m using a memorization exercise as an excuse to play around with DALL-E. I typed “A judge enjoying eating a quesadilla” into DALL-E 2 and got back four images, as always. The best of these was the following.

First attempt at judge enjoying a quesadilla

The food in the image looks like a quesadilla, but it’s not clear that the man eating it is a judge, or that he’s enjoying himself.

Next I changed “judge” to “a supreme court justice,” hoping DALL-E would create an image that more obviously features a judge.

Here’s one of the outputs:

Second attempt at judge enjoying a quesadilla

This fellow looks more like a judge, and he’s obviously enjoying himself. Maybe he’s eating a calzone, but we’ll call it a quesadilla.

Not all the images created by DALL-E are as accurate as the ones above. I suspect there’s a lot of selection bias in the examples of images posted online. I’m contributing to that selection bias by showing images that were good enough to include in a blog post. I tried other images for blogging on other topics, and the results were not worth sharing.

So in an attempt at mitigating selection bias, here’s another image generated from the prompt “A supreme court justice enjoying eating a quesadilla.”

Young lady who is not eating anything

This young lady is wearing black, as supreme court justices are wont to do. And she appears to be enjoying herself, but she’s definitely not eating a quesadilla.

Incidentally, another possible encoding of 662607015 is “Judge enjoys Costello” as in Abbott and Costello. When I typed “A supreme court justice enjoying watching Abbot and Costello on television” I got the following creepy image.

DALL-E attempt at a judge watching Abbot and Costello on television


[1] This mnemonic is a little bit of a cheat, depending on how you pronounce quesadilla. The sound of ll is sorta like that of a y in English. Here I’m using it to represent 5 just as the l sound does. Here in southeast Texas, I believe most people use the Spanish pronunciation, at least approximately. If you completely anglicize the pronunciation so that ll is pronounced as in pillow, then you can use the mnemonic with no qualms.

DALL-E 2 and mnemonic images

I recently got an account for using OpenAI’s DALL-E 2 image generator. The example images I’ve seen are sorta surreal combinations of common words, and that made me think of the Major memory system.

I’ve written about the Major system before. For example, I give an overview here and I describe how to use it to memorize an ASCII table here. In a nutshell, there are consonant sounds associated with each digit. Choose consonant sounds and add any vowel sounds you like to make words you can visualize.

There are a couple ways people use the Major system. One is simply to memorize numbers. Any encoding that leads to something you find easy to remember is OK. For example, suppose you want to encode 19. The consonant sounds for 1 are t, th, and d, and the consonant sounds for 9 are p and b. So you could encode 19 as adobe, Debbie, Ethiopia, tuba, etc.

The other way people use the Major system is to memorize specific pegs for numbers. For example, you might choose tuba as your peg for 19. To memorize a list, you associate each item with its peg, such as associating the 19th item with a tuba. Pegs have to be unique so you can pull up a particular mental image to recall a list item, such as remembering what you associated with a tuba.

For example, suppose you wanted to memorize a list of the US presidents. The 19th president was Rutherford B. Hayes, and so you might want to imagine him playing a tuba. I uploaded a photo of Hayes and asked DALL-E to make him play a tuba. The software rejected my request, saying that realistic photos of persons are not allowed at this time.

Rutherford B. Hayes playing a tuba

DALL-E knows about some people but not others. For example, it doesn’t know who Evariste Galois is, but apparently it has some idea who Rutherford B. Hayes is. When I asked for “Rutherford B. Hayes playing a tuba” it came back with the image above.

Franklin Delano Roosevelt was the 32nd president. The consonant sound for 3 is m and the consonant sound for 2 is n. Suppose your peg for 32 is moon, and you’d like to imagine FDR looking up at the moon. When I asked DALL-E to make an image of this, I got a very strange image of FDR, but he was looking up at the moon.

FDR looking up at the moon

The only US president to serve two non-consecutive terms was Grover Cleveland, the 22nd and 24th president. I asked DALL-E for an image of Grover the Muppet holding an onion (22) in one hand and a wiener dog (24) in the other [1]. The result was not great.

Blue dog holding cucumber dog?

I thought Grover the Muppet would be more memorable than Grover Cleveland himself. But DALL-E did better with Mr. Cleveland. Maybe there’s some copyright issue with the muppets?

Grover Cleveland with a onion-banana and a dog

Well, he does have an onion, with something weird underneath. Bananas? Eggplant? Cow udders? And he has a dog, though not a wiener dog.

Creating your own mental images is far more efficient than having DALL-E come up with images for you, but the DALL-E images are useful examples of what you might imagine for yourself.


[1] The Major system doesn’t use w and so you can throw it in as you would a vowel. So wiener decodes as n and r, 24.

How to memorize the ASCII table

Before discussing how you could memorize a table of ASCII characters and numeric values, I should say a little about why you might do so.

One reason is simply for the challenge. It’s more doable than it may sound.

It’s also useful information, though it’s debatable whether it’s worth memorizing. YMMV. There was a time early in my career when I spent a lot of time staring at a hex editor, and it definitely would have been useful to know then. I still need to look up ASCII values occasionally, though not as often.

ASCII table at the command line: ascii -d

One step in the process of memorizing the ASCII table is to associate an image with each ASCII character. This is useful by itself without learning the ASCII values of the characters. You could memorize strong passwords, for example, by linking together the images associated with each character.

I’ll only consider printable characters in this post.

ASCII landmarks

It’s very handy to know that the ASCII code for the number 0 is 48 and that the digit codes are in order. So the ASCII code for the digit d is d+48.

Wouldn’t it have been easier if the digits started at a location ending in zero? They do, in hexadecimal: digits start at 0x30.

Similarly, capital letters start at 65 (or 0x41). So the nth capital letter is at n+64 (or n+0x40).

Lower case letters start at 97 (0x61), and so the nth lower case letter is at n+96 (or n+0x60).

Incidentally, you’ll sometimes see software that sorts capital letters before lower case letters. The software is probably simply sorting by ASCII value.

In short, if you know the ASCII values of 0, A, and a then you can calculate the values of all digits and letters.
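
Here is that calculation as a small Python sketch. The function and constant names are mine, and the loop at the end checks the results against Python's built-in `ord`.

```python
ZERO, UPPER_A, LOWER_A = 48, 65, 97  # ASCII codes for '0', 'A', 'a'

def ascii_code(ch):
    """Compute the ASCII code of a digit or letter from the three landmarks."""
    if "0" <= ch <= "9":
        return ZERO + int(ch)
    if "A" <= ch <= "Z":
        n = "ABCDEFGHIJKLMNOPQRSTUVWXYZ".index(ch) + 1  # nth capital letter
        return UPPER_A - 1 + n                          # n + 64
    if "a" <= ch <= "z":
        n = "abcdefghijklmnopqrstuvwxyz".index(ch) + 1  # nth lower case letter
        return LOWER_A - 1 + n                          # n + 96
    raise ValueError("only ASCII digits and letters are handled")

for ch in "7Gq":
    assert ascii_code(ch) == ord(ch)
    print(ch, ascii_code(ch), hex(ascii_code(ch)))
```

Punctuation and other symbols fall outside these three runs, so this approach does not help with them.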

Memory pegs

You can create images for each ASCII character based on three things you may already know, and which are worth learning if you don’t.

Letters

One is the NATO phonetic alphabet: Alpha, Bravo, Charlie, Delta, …. If you know the NATO alphabet then you already have an image for each letter.

If you want to memorize a strong password, something like u\#mC_cNJ$o, then you need to distinguish upper and lower case letters. One way to do this would be to memorize two variations on each NATO word, one large and one small. For example, you might split the word Charlie into Charlie Chaplin for capital C and Charlie Brown for lower case c. For another, you might split golf into golf cart for G and golf ball for g.

Digits

The most common way to memorize digits is to encode them as words using the Major system. Zero is associated with the S sound, one is associated with the T sound, two is associated with the N sound, etc. See the preceding link for details.

You should come up with your own pegs that work for you, but here’s an example where each peg is a spice.

  0. cilantro
  1. turmeric
  2. nutmeg
  3. mustard
  4. rosemary
  5. lemon
  6. chili powder
  7. cayenne pepper
  8. vanilla
  9. peppermint

Your pegs don’t need to be thematically related but I thought it would be fun to create a list based on spices. Maybe the association with various tastes would make your mnemonics more memorable.

Symbols

As with letters and digits, we can appeal to prior art to come up with images. Hacker slang associates a word with each symbol: bang for !, rabbit ears for “, splat for *, etc.

You might start with hacker slang and modify it to suit your taste.

Code values

Using the landmarks mentioned above, you could calculate the ASCII codes for any letter or digit. But if you want to associate letters and digits with their ASCII values more quickly, you need to associate each symbol directly with its value. Even if you don’t do this with letters and digits, you have to do it with other symbols because there’s no way to calculate the ASCII value of * or %, for example.

The most common technique for memorizing a numbered list is to memorize a set of pegs for each digit, then associate each list item with its number.

Creating and memorizing pegs for the numbers 1 through 128 is a lot of work, but it’s reusable. If you go to the effort, you can use it for memorizing a lot more than the ASCII table. For example, you could use it to memorize chemical elements.

If your peg for 42 is rain, you can imagine water balloons raining down and splattering on your driveway. That links the number 42 with the symbol * whose hacker slang is splat.

For another example, 94 is the ASCII value of caret (^), sometimes called carrot in hacker slang. You could imagine a bear (94) standing up and eating a carrot like Bugs Bunny.

Update: What if you’d like to memorize Unicode?
