ARPAbet and the Major mnemonic system


ARPAbet is a phonetic spelling system developed by (you guessed it) ARPA, before it became DARPA.

The ARPAbet system is less expressive than IPA, but much easier for English speakers to understand. Every sound is encoded as one or two English letters. So, for example, the sound denoted ʒ in IPA is ZH in ARPAbet.

In ARPAbet notation, the Major mnemonic system can be summarized as follows:

0: S or Z
1: D, DH, T, or TH
2: N or NG
3: M
4: R
5: L
6: CH, JH, SH, or ZH
7: G or K
8: F or V
9: P or B

Numbers are encoded using the consonant sounds above; the system is based on sounds and not on spelling. You can insert any vowels or semivowels (e.g. w or y) you like. For example, you could encode 648 as “giraffe” or 85 as “waffle.”
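The table translates directly into a lookup. Below is a minimal Python sketch; the names `MAJOR` and `major_encode` are illustrative and not part of the script further down. Any phone missing from the table (vowels, W, Y, HH) contributes no digit, which is what lets you pad a number out with vowels. One subtlety: ARPAbet treats the r-colored vowel ER (as in "bird") as a vowel, so a strict consonant lookup gives it no digit. If a dictionary transcribes "giraffe" with ER, as the hand-written CMU-style transcription below assumes, the strict reading yields 68 rather than 648. The optional `er_as_r` flag is one assumed way to count that R sound; it is not something the table itself specifies.

```python
# Digits for ARPAbet consonants, following the table above.
MAJOR = {
    "S": "0", "Z": "0",
    "D": "1", "DH": "1", "T": "1", "TH": "1",
    "N": "2", "NG": "2",
    "M": "3",
    "R": "4",
    "L": "5",
    "CH": "6", "JH": "6", "SH": "6", "ZH": "6",
    "G": "7", "K": "7",
    "F": "8", "V": "8",
    "P": "9", "B": "9",
}


def major_encode(pronunciation: str, er_as_r: bool = False) -> str:
    """Return the Major digits for a space-separated ARPAbet pronunciation.

    Phones not in MAJOR (vowels, W, Y, HH) add no digit. If er_as_r is
    True, the r-colored vowel ER is also counted as 4 (an assumption
    layered on top of the table, off by default).
    """
    digits = []
    for phone in pronunciation.split():
        phone = phone.rstrip("012")  # drop stress markers, e.g. AA1 -> AA
        if er_as_r and phone == "ER":
            digits.append("4")
        else:
            digits.append(MAJOR.get(phone, ""))
    return "".join(digits)


# Hand-written example transcriptions in CMU style (assumed, not looked up).
print(major_encode("W AA1 F AH0 L"))               # waffle: 85
print(major_encode("JH ER0 AE1 F"))                # giraffe: 68
print(major_encode("JH ER0 AE1 F", er_as_r=True))  # giraffe: 648
```

Whether a given word is affected by the ER question depends on how the dictionary transcribes it, so it is worth spot-checking a few entries before relying on either setting.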

The CMU Pronouncing Dictionary lists 134,373 words along with their ARPAbet pronunciations. The Python code below reads the pronouncing dictionary and produces a Major mnemonic dictionary. The resulting file is available here as a zip-compressed text file.

To find a word that encodes a number, search the code output for that number. For example,

    grep ' 648' cmu_major.txt

will find words whose Major encoding begins with 648, and

    grep ' 648$' cmu_major.txt

will find words whose Major encoding is exactly 648.

From this we learn that “sheriff” is another possible encoding for 648.
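If you would rather search from Python than with grep, the same two queries are a prefix test and an equality test on the digits column. This is a sketch that assumes each line has the `WORD DIGITS` form printed by the script below; `search` is a hypothetical helper, and the sample lines are made up for illustration.

```python
def search(lines, number, exact=False):
    """Yield words whose Major encoding begins with `number`,
    or equals it when exact=True.

    Lines are expected in the "WORD DIGITS" form; words with no encoded
    consonants print no digits and are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        word, code = parts
        if code == number or (not exact and code.startswith(number)):
            yield word


sample = ["SHERIFF 648", "SHERIFFS 6480", "WAFFLE 85", "A"]
print(list(search(sample, "648")))              # ['SHERIFF', 'SHERIFFS']
print(list(search(sample, "648", exact=True)))  # ['SHERIFF']
```

To search the real output, pass an open file instead of the sample list, for example `search(open("cmu_major.txt"), "648", exact=True)`.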

Filling in the gaps

Suppose you’re looking for encodings for all three-digit numbers, 000 through 999. This can be hard to do. A common compromise is to consider only the first three consonant sounds in a word. For example, you might use “ladybug” to encode 519, ignoring the final G sound.

The tradeoff is that if you adopt this rule, you can’t use “ladybug” to encode 5197. But finding single words that encode 4-digit numbers can be challenging if not impossible, so you may just forgo the possibility. (I quantify this here.) This is why in the example above I show searches both for encodings that begin with 648 and for encodings that are exactly 648.
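Under the first-three-consonants rule, a word’s key is simply the first three digits of its full encoding. A minimal sketch, assuming (word, encoding) pairs like those the script produces; `index_by_prefix` is a hypothetical helper and the sample encodings are illustrative. It makes the tradeoff concrete: LADYBUG is filed under 519, and nothing records it under 5197.

```python
from collections import defaultdict


def index_by_prefix(entries, width=3):
    """Group words by the first `width` digits of their Major encoding.

    `entries` yields (word, encoding) pairs. Digits beyond `width` are
    ignored (the "first three consonants" rule); encodings shorter than
    `width` are skipped.
    """
    index = defaultdict(list)
    for word, code in entries:
        if len(code) >= width:
            index[code[:width]].append(word)
    return dict(index)


sample = [("LADYBUG", "5197"), ("SHERIFF", "648"), ("WAFFLE", "85")]
print(index_by_prefix(sample))  # {'519': ['LADYBUG'], '648': ['SHERIFF']}
```

Skipping short encodings is a design choice: WAFFLE (85) drops out of a three-digit index, though it would still be a candidate for a two-digit one.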

Despite the size of the CMU dictionary, there are 42 three-digit numbers that no word’s encoding begins with. I can offer suggestions for these numbers, but it’s hard to use anyone else’s mnemonics. You may have to make up your own, using, for example, names of people you know personally or brand names you’re familiar with.
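Finding such gaps amounts to a set difference: collect the three-digit prefixes that occur, then list every number from 000 to 999 that is not among them. A sketch, assuming input lines in the `WORD DIGITS` form printed by the script below; `missing_prefixes` is a hypothetical helper and the sample is tiny and made up. Run against the complete output, this is one way to arrive at a count like the one above, though the exact figure depends on choices such as how TH and ER are handled.

```python
def missing_prefixes(lines, width=3):
    """Return the zero-padded `width`-digit numbers that no encoding begins with.

    Lines without a digits column (words with no encoded consonants)
    are ignored, as are encodings shorter than `width`.
    """
    covered = set()
    for line in lines:
        parts = line.split()
        if len(parts) == 2 and len(parts[1]) >= width:
            covered.add(parts[1][:width])
    all_numbers = (f"{n:0{width}d}" for n in range(10 ** width))
    return [s for s in all_numbers if s not in covered]


sample = ["LADYBUG 5197", "SHERIFF 648", "WAFFLE 85", "A"]
gaps = missing_prefixes(sample)
print(len(gaps))  # 998: only 519 and 648 are covered in this tiny sample
```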

Python code

# NB: File encoding is Latin-1, not UTF-8.
# The match statement below requires Python 3.10 or later.
with open("cmudict-0.7b", "r", encoding="latin-1") as f:
    lines = f.readlines()

for line in lines:
    # Skip blank lines and the ";;;" comment lines in the dictionary file.
    if not line.strip() or line.startswith(";;;"):
        continue

    pieces = line.split()  # the word, followed by its ARPAbet phones
    numstr = ""
    for p in pieces[1:]:
        p = p.rstrip("012")  # remove stress notation, e.g. AH0 -> AH
        match p:
            case "S" | "Z":
                numstr += "0"
            case "D" | "DH" | "T" | "TH":
                numstr += "1"
            case "N" | "NG":
                numstr += "2"
            case "M":
                numstr += "3"
            case "R":
                numstr += "4"
            case "L":
                numstr += "5"
            case "CH" | "JH" | "SH" | "ZH":
                numstr += "6"
            case "G" | "K":
                numstr += "7"
            case "F" | "V":
                numstr += "8"
            case "P" | "B":
                numstr += "9"
            # Vowels, semivowels, and HH match no case and add no digit.
    print(pieces[0], numstr)

6 thoughts on “ARPAbet and the Major mnemonic system”

  1. Wouldn’t 748 be a more appropriate encoding of “giraffe”, based on the table above?

  2. The Major system is based on pronunciation rather than spelling. Giraffe starts with a G, but the initial sound is a J sound. To put it another way, soft G encodes 6 and hard G encodes 7.

  3. A long time ago, in a seminar, I was shown a very similar system whose letters are easier to remember.

    1:t just a line down for the most part like a one
    2:n 2 lines down
    3:m 3 lines down
    4:r four ends in r
    5:L hand up with thumb out looks like an L
    6:b obvious
    7:k or hard c think 7 with line through down stroke
    8:f script f looks like 8
    9:g obvious
    0:s or z zero

    Then, as above, you add vowels to form words and phrases you can remember.

  4. I like the major system. I think Martin Gardner introduced it decades ago in a way similar to Bill Fite’s mnemonic.
    Vichyssoise would be 860, right?
    I’m surprised the dictionary doesn’t have “rarer” for 444. Is there a reason to avoid comparatives?

  5. Hey Scott. Maybe I mispronounce vichyssoise. Or maybe the third consonant sound is ambiguous to an English ear. That shows how the Major system can be somewhat personal. You run into differences like rhotic versus non-rhotic dialects.

    I suppose the CMU dictionary might leave out comparatives for brevity. Or maybe it contains “rarer” but for whatever reason that slipped past my script.

    For memorization it’s best when encodings are concrete nouns that are easy to visualize. But you make whatever compromises you have to. “Rarer” is a bit abstract, but maybe you could visualize sending a steak back, telling the waiter that you’d like it rarer. :)

  6. Another possible encoding of 898: phobophobia. Fear of phobias.

    Makes me think of FDR’s speech: “The only thing we have to fear is fear itself.”

    A few other alternates:

    444 rearwards, rarer, rare earth, Ruhr River
    466 archjoker, Rosh Hashanah
    688 chief of staff
    866 fishshop
    868 fishwife, fishfork
    988 Bay of Fundy
