Anagram frequency

An anagram of a word is another word formed by rearranging its letters. For example, “restful” and “fluster” are anagrams.

How common are anagrams? What is the largest set of words that are anagrams of each other? What are the longest words which have anagrams?

You’ll get different answers to these questions depending on what dictionary you use. I started with the american-english dictionary from the Linux package wamerican. This list contains 102,305 words. I removed all words containing an apostrophe so that possessive forms didn’t count as separate words. I then converted all words to lower case and removed duplicates. This reduced the list to 72,276 words.

Next I alphabetized the letters in each word to create a signature. Signatures that appear more than once correspond to anagrams.
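
As a small illustration of the signature idea, the sketch below sorts the letters of a word and joins them back into a string. The function name `signature` is a label chosen for this example only.

```python
def signature(word):
    # Sort the letters alphabetically and join them back into one string.
    return "".join(sorted(word))

# Two anagrams share the same signature.
print(signature("restful"))   # eflrstu
print(signature("fluster"))   # eflrstu
print(signature("restful") == signature("fluster"))   # True
```

Two words are anagrams of each other exactly when their signatures are equal, so grouping words by signature groups them into anagram classes.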

Here are the statistics on anagram classes.

    |------+-------|
    | size | count |
    |------+-------|
    |    1 | 62093 |
    |    2 |  3600 |
    |    3 |   646 |
    |    4 |   160 |
    |    5 |    61 |
    |    6 |    13 |
    |    7 |     2 |
    |    8 |     1 |
    |------+-------|
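
A table like this one can be built by grouping words by signature and then counting how many groups there are of each size. The following is a minimal sketch on a toy word list, not the wamerican dictionary; the names `class_size_counts` and `sample` are made up for the example.

```python
from collections import Counter, defaultdict

def class_size_counts(words):
    # Group words by signature (their letters in sorted order).
    classes = defaultdict(set)
    for w in words:
        classes["".join(sorted(w))].add(w)
    # Map each class size to the number of classes of that size.
    return Counter(len(members) for members in classes.values())

# Toy word list for illustration only.
sample = ["least", "slate", "stale", "baby", "abby", "cat"]
counts = class_size_counts(sample)
for size in sorted(counts):
    print(size, counts[size])
```

Here the sample has one class of size 3 (least, slate, stale), one of size 2 (baby, abby), and one word with no anagram.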

This means that 62,093 words, or about 86%, are in an anagram class by themselves. So about 14% of words are an anagram of at least one other word.
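
The percentages can be checked directly from the table: multiplying each class size by its count and summing should recover the total number of words. The short sketch below does that arithmetic; the variable names are chosen for the example.

```python
# Class-size table from above: size -> number of classes of that size.
table = {1: 62093, 2: 3600, 3: 646, 4: 160, 5: 61, 6: 13, 7: 2, 8: 1}

# Each class of a given size contributes that many words.
total_words = sum(size * count for size, count in table.items())
alone = table[1]

print(total_words)                                        # 72276
print(round(100 * alone / total_words))                   # 86
print(round(100 * (total_words - alone) / total_words))   # 14
```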

The largest anagram class had eight members:

least
slate
stael
stale
steal
tales
teals
tesla

Stael is a proper name. Tesla is a proper name, but it is also a unit of magnetic induction. In my opinion, tesla should count as an English word and Stael should not.

My search found two anagram classes of size seven:

pares
parse
pears
rapes
reaps
spare
spear

and

carets
caster
caters
crates
reacts
recast
traces

The longest words in this dictionary that have anagrams are the following: two pairs of 14-letter words and one pair of 12-letter words.

certifications, rectifications
impressiveness, permissiveness
teaspoonsful, teaspoonfuls
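
One way to find such words is to keep only the anagram classes with at least two members and sort them by word length. The following is a sketch under that assumption, run on a small hand-picked list; `longest_anagram_classes` and its `top` parameter are hypothetical names for this example.

```python
from collections import defaultdict

def longest_anagram_classes(words, top=3):
    # Group words by signature (their letters in sorted order).
    classes = defaultdict(set)
    for w in words:
        classes["".join(sorted(w))].add(w)
    # Keep only classes with at least two members.
    groups = [sorted(c) for c in classes.values() if len(c) > 1]
    # Longest words first; ties broken alphabetically for a stable order.
    groups.sort(key=lambda g: (-len(g[0]), g[0]))
    return groups[:top]

# Hand-picked sample, not the full dictionary.
sample = ["certifications", "rectifications", "teaspoonsful",
          "teaspoonfuls", "stale", "steal", "anagram"]
for group in longest_anagram_classes(sample):
    print(len(group[0]), ", ".join(group))
```

All members of a class have the same length, so the length of the first member is enough for sorting.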

Dictionary of anagrams

I made a dictionary of anagrams here. Every word which has an anagram is listed, followed by its anagrams. Here are the first few lines:

abby: baby
abeam: ameba
abed: bade, bead
abel: able, bale, bela, elba
abet: bate, beat, beta
abets: baste, bates, beast, beats, betas
abetter: beretta
abhorred: harbored 

There is some redundancy in this dictionary for convenience: every word in the list of anagrams will also appear as the first entry on a line.

Here’s the Python code that produced the dictionary.

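from collections import defaultdict

# Read the word list, skipping possessives and other words with apostrophes.
# Lowercasing and storing in a set removes duplicates such as "Bill"/"bill".
words = set()
with open("american-english", "r") as f:
    for line in f:
        if "'" not in line:
            words.add(line.strip().lower())

def sig(word):
    # Signature: the letters of the word in alphabetical order.
    return "".join(sorted(word))

# Group words by signature; each group is an anagram class.
d = defaultdict(set)
for w in words:
    d[sig(w)].add(w)

# Print each word that has at least one anagram, followed by its anagrams.
for w in sorted(words):
    anas = sorted(d[sig(w)])
    if len(anas) > 1:
        anas.remove(w)
        print("{}: {}".format(w, ", ".join(anas)))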

Related post

Words with the most consecutive vowels.

3 thoughts on “Anagram frequency”

  1. From a larger dictionary, TWL06, “pares” is still the biggest equivalence class, but “carets” is only number 10. Here are the top 2:

    {'apers',
    'apres',
    'asper',
    'pares',
    'parse',
    'pears',
    'prase',
    'presa',
    'rapes',
    'reaps',
    'spare',
    'spear'}

    {'alerts',
    'alters',
    'artels',
    'estral',
    'laster',
    'ratels',
    'salter',
    'slater',
    'staler',
    'stelar',
    'talers'}

    Also, I just did `Counter(map(sig, words)).most_common(10)` and then recovered the words with the common signatures.
