NLP software infers parts of speech by context. For example, the SpaCy NLP software can determine the parts of speech in the poem Jabberwocky even though the words are nonsense. More on this here.
If you want to determine the parts of speech of isolated words, software like SpaCy may not be the right tool. You might instead use lists of nouns, verbs, etc. That is what I’ll do in this post, using the WordNet corpus. I’d like to show how you could use WordNet to create a mnemonic system.
One way that people memorize six-digit numbers, or memorize longer numbers six digits at a time, is the PAO (person-action-object) system. They memorize a list of 100 people, 100 actions, and 100 direct objects. The first two digits of a six-digit number are encoded as a person, the next two digits as an action, and the last two digits as an object. For example, “Einstein dances with a broom” might encode 201294 if 20 is associated with Einstein, 12 with dance, and 94 with broom.
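The encoding above is just table lookup on two-digit chunks. Here is a minimal sketch in Python, using a few hypothetical table entries consistent with the example; a real PAO system would fill in all 100 entries per table.

```python
# Hypothetical fragments of the three PAO lookup tables.
people = {20: "Einstein"}
actions = {12: "dances with"}
objects = {94: "broom"}

def encode_pao(n):
    """Split a six-digit number into three two-digit chunks
    and look each chunk up in its table."""
    p, a, o = n // 10000, (n // 100) % 100, n % 100
    return f"{people[p]} {actions[a]} a {objects[o]}"

print(encode_pao(201294))  # Einstein dances with a broom
```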
These mappings could be completely arbitrary, but you could memorize them faster if there were some patterns to the mappings, such as using the Major mnemonic system.
I’ve written before about how to use the CMU Pronouncing Dictionary to create a list of words along with the numbers they correspond to in the Major system. This post will show how to pull out the nouns and verbs from this list. The nouns are potential objects and the verbs are potential actions. I may deal with persons in another post.
Noun and verb lists in WordNet
The WordNet data contains a file
index.noun with nouns and other information. We want to discard the first 29 lines of preamble and extract the nouns in the first column of the file. We can do this with the following one-liner.
tail -n +30 index.noun | cut -d' ' -f1 > nouns.txt
Likewise we can extract a list of verbs with the following
tail -n +30 index.verb | cut -d' ' -f1 > verbs.txt
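If you’d rather stay in Python than shell out, the same extraction is a few lines. This is a sketch assuming the WordNet index files have a 29-line preamble and space-separated columns, as above.

```python
def extract_first_column(lines, skip=29):
    """Drop the preamble lines, then keep the first
    whitespace-separated field of each remaining line."""
    return [line.split()[0] for line in lines[skip:] if line.strip()]

# Usage:
# with open("index.noun") as f:
#     nouns = extract_first_column(f.readlines())
```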
There is some overlap between the two lists since some words can be nouns or verbs depending on context. For example, running
grep '^read$' nouns.txt
grep '^read$' verbs.txt
shows that “read” is in both lists. (The regex above anchors the beginning of the match with
^ and the end with
$ so we don’t get unwanted matches like “treadmill” and “readjust.”)
Sorting CMU dictionary by part of speech
The following Python code will parse the file
cmu_major.txt from here to pull out a list of nouns and a list of verbs, along with their Major system encodings.
with open("nouns.txt") as f:
    nouns = set()
    for line in f.readlines():
        nouns.add(line.strip())

with open("verbs.txt") as f:
    verbs = set()
    for line in f.readlines():
        verbs.add(line.strip())

cmunouns = open("cmu_nouns.txt", "w")
cmuverbs = open("cmu_verbs.txt", "w")

with open("cmu_major.txt") as f:
    for line in f.readlines():
        w = line.split()[0].lower()
        if w in nouns:
            cmunouns.write(line)
        if w in verbs:
            cmuverbs.write(line)

cmunouns.close()
cmuverbs.close()
Going back to the example of 201294, the file
cmu_verbs.txt contains 82 words that correspond to numbers starting with 12. And the file
cmu_nouns.txt contains 1,057 words that correspond to numbers starting with 94.
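Counts like these amount to filtering on the Major encoding column. A sketch, assuming each line holds a word followed by its Major encoding (the actual column layout of cmu_major.txt may differ):

```python
def count_with_prefix(lines, prefix):
    """Count entries whose Major system encoding (assumed to be
    the second column) starts with the given digits."""
    count = 0
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1].startswith(prefix):
            count += 1
    return count

# Example with a few made-up entries:
sample = ["dine 12", "tan 12", "bat 91"]
print(count_with_prefix(sample, "12"))  # 2
```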
Choosing verbs is the hard part. Although there are verbs for every number 00 through 99, many of these would not be good choices. You want active verbs that can be combined with any subject and object.
My impression is that most people who use the PAO system (I don’t use it myself) pick their people, actions, and objects without regard to the Major system, and I can understand why: your choices are sometimes limited if you want to be compatible with the Major system. You might compromise, using Major-compatible pegs when possible and making exceptions as needed.