A bigram is a pair of letters. For various reasons—word games, cryptography, user interface development, etc.—people are interested in knowing which bigrams occur most often, and so such information is easy to find. But sometimes you might want to know which bigrams occur least often, and that’s harder to find. My interest is finding safe key-chord combinations for Emacs.
Peter Norvig calculated frequencies for all pairs of letters based on the corpus Google extracted from millions of books. He gives a table that will show you the frequency of a bigram when you mouse over it. I scraped his HTML page to create a simple CSV version of the data. My file lists bigrams, frequencies to three decimal places, and the raw counts: bigram_frequencies.csv. The file is sorted in decreasing order of frequency.
The Emacs key-chord module lets you bind pairs of letters to Emacs commands. For example, if you map a command to
jk, that command will execute whenever you type
k in quick succession. In that case if you want the literal sequence “jk” to appear in a file, pause between typing the
j and the
k. This may sound like a bad idea, but I haven’t run into any problems using it. It allows you to execute your most frequently used commands very quickly. Also, there’s no danger of conflict since neither basic Emacs nor any of its common packages use key chords.
The table below gives bigrams whose percentage frequency rounds to zero keeping three decimal places. See the data file for details.
Since Q is always followed by U in native English words, it’s safe to combine Q with any other letter. (If you need to type Qatar, just pause a little after typing the Q.) It’s also safe to use any consonant after J and most consonants after Z. (It’s rare for a consonant to follow Z, but not quite rare enough to round to zero. ZH and ZL occur with 0.001% frequency, ZY 0.002% and ZZ 0.003%.)
Double letters make especially convenient key chords since they’re easy to type quickly. JJ, KK, QQ, VV, WW, and YY all have frequency rounding to zero. HH and UU have frequency 0.001% and AA, XX, and ZZ have frequency 0.003%.
Note that the discussion above does not distinguish upper and lower case letters in counting frequencies, but Emacs key chords are case-sensitive. You could make a key chord out of any pair of capital letters unless you like to shout in online discussions, use a lot of acronyms, or write old-school FORTRAN.
Update (2 Feb 2015):
This post only considered ordered bigrams. But Emacs key chords are unordered, combinations of keys pressed at or near the same time. This means, for example, that
qe would not be a good keychord because although QE is a rare bigram, EQ is not (0.057%). The file unordered_bigram_frequencies.csv gives the combined frequencies of bigrams and their reverse (except for double letters, in which case it simply gives the frequency).
Combinations of J and a consonant are still mostly good key chords except for JB (0.023%), JN (0.011%), and JD (0.005%).
Combinations of Q and a consonant are also good key chords except for QS (0.007%), QN (0.006%), and QC (0.005%). And although O is a vowel, QO still makes a good key chord (0.001%).