Rare letter combinations and key chords

A bigram is a pair of letters. For various reasons—word games, cryptography, user interface development, etc.—people are interested in knowing which bigrams occur most often, and so such information is easy to find. But sometimes you might want to know which bigrams occur least often, and that’s harder to find. My interest is finding safe key-chord combinations for Emacs.

Peter Norvig calculated frequencies for all pairs of letters based on the corpus Google extracted from millions of books. He gives a table that will show you the frequency of a bigram when you mouse over it. I scraped his HTML page to create a simple CSV version of the data. My file lists bigrams, frequencies to three decimal places, and the raw counts: bigram_frequencies.csv. The file is sorted in decreasing order of frequency.

The Emacs key-chord module lets you bind pairs of letters to Emacs commands. For example, if you map a command to jk, that command will execute whenever you type j and k in quick succession. In that case if you want the literal sequence “jk” to appear in a file, pause between typing the j and the k. This may sound like a bad idea, but I haven’t run into any problems using it. It allows you to execute your most frequently used commands very quickly. Also, there’s no danger of conflict since neither basic Emacs nor any of its common packages use key chords.

The table below gives bigrams whose percentage frequency rounds to zero keeping three decimal places. See the data file for details.

Since Q is always followed by U in native English words, it’s safe to combine Q with any other letter. (If you need to type Qatar, just pause a little after typing the Q.) It’s also safe to use any consonant after J and most consonants after Z. (It’s rare for a consonant to follow Z, but not quite rare enough to round to zero. ZH and ZL occur with 0.001% frequency, ZY 0.002% and ZZ 0.003%.)

Double letters make especially convenient key chords since they’re easy to type quickly. JJ, KK, QQ, VV, WW, and YY all have frequency rounding to zero. HH and UU have frequency 0.001% and AA, XX, and ZZ have frequency 0.003%.

Note that the discussion above does not distinguish upper and lower case letters in counting frequencies, but Emacs key chords are case-sensitive. You could make a key chord out of any pair of capital letters unless you like to shout in online discussions, use a lot of acronyms, or write old-school FORTRAN.

Update (2 Feb 2015):

This post only considered ordered bigrams. But Emacs key chords are unordered, combinations of keys pressed at or near the same time. This means, for example, that qe would not be a good keychord because although QE is a rare bigram, EQ is not (0.057%). The file unordered_bigram_frequencies.csv gives the combined frequencies of bigrams and their reverse (except for double letters, in which case it simply gives the frequency).

Combinations of J and a consonant are still mostly good key chords except for JB (0.023%), JN (0.011%), and JD (0.005%).

Combinations of Q and a consonant are also good key chords except for QS (0.007%), QN (0.006%), and QC (0.005%). And although O is a vowel, QO still makes a good key chord (0.001%).

Numerical computing resources

This week’s resource post: some numerical computing pages on this site.

See also the Twitter account SciPyTip and numerical programming articles I’ve written for Code Project.

Last week: Regular expressions

Next week: Probability approximations

Blog seventh anniversary

Seven years

This blog is seven years old today. I’ve written 2,273 posts so far, a little less than one per day.

Over the holidays I combed through older posts looking for typos, broken links, etc. I fixed a lot of little things, but I’m sure I missed a few. If you find any problems, please let me know.

Most popular posts

Here are some of the most popular posts for the last three years.

In 2011 I split my most popular post list into three parts:

In 2010 I split the list into two parts:

I didn’t post a list of most popular posts for 2009, but the most popular post that year was Why programmers are not paid in proportion to their productivity.

Finally, my most popular posts for 2008.

Other writing

Here are a few other places you can find things I write:

Probability resources

Each Wednesday I post a list of notes on some topic. This week it’s probability.

See also posts tagged probability and statistics and the Twitter account ProbFact.

Last week: Python resources

Next week: Regular expression resources

Photo review

Here are some of the photos I took on my travels last year.

Bicycles on the Google campus in Mountainview, California:
Google bicycles

Sunrise at Isle Vista, California:

View from University of California Santa Barbara:

Reflection of the Space Needle in the EMP museum in Seattle, Washington:

Paradise Falls, Thousand Oaks, California:

Bed and breakfast in Geldermalsen, The Netherlands:

Amsterdam, The Netherlands:

Chinatown in San Francisco, California

Heidelberg, Germany

Bringing back Coltrane

In the novel Chasing Shadows the bad guys have built a time machine named Magog.

“The bottom line is this. And it is hard for me to believe. They are going to use Magog to bring someone back from the past.”

Jack did not blink or move. His heart was beating very quickly now, but he merely said wryly, “Yes? Who are they going to bring back? Tell me it is John Coltrane.”

“You are never really serious are you?”

“I’m very serious about my music. Why are bad guys always bringing back people that everyone was glad to see go the first time? We could use more Coltrane …”

Related post: Nunc dimittis

Thou, thee, you, and ye

Ever wonder what the rules were for when to use thou, thee, ye, or you in Shakespeare or the King James Bible?

For example, the inscription on front of the Main Building at The University of Texas says

Ye shall know the truth and the truth shall make you free.

Why ye at the beginning and you at the end?

The latest episode of The History of English Podcast explains what the rules were and how they came to be. Regarding the UT inscription, ye was the subject form of the second person plural and you was the object form. Eventually you became used for subject and object, singular and plural.

The singular subject form was thou and the singular object form was thee. For example, the opening lines of Shakespeare’s Sonnet 18:

Shall I compare thee to a summer’s day?
Thou art more lovely and more temperate.

Originally the singular forms were intimate and the plural forms were formal. Only later did thee and thou take on an air of reverence or formality.

Notes on HTML, XML, TeX, and Unicode

This week’s resource post: some notes on typesetting, Unicode, etc.

See also blog posts tagged LaTeX, HTML, and Unicode and the Twitter account TeXtip.

Last week: C++ resources

Next week: Special functions

Why assign two characters to the same symbol?

Unicode often counts the same symbol (glyph) as two or more different characters. For example, Ω is U+03A9 when it represents the Greek letter omega and U+2126 when it represents Ohms, the unit of electrical resistance. Similarly, M is U+004D when it’s used as a Latin letter but U+216F when it’s used as the Roman numeral for 1,000.

The purpose of such distinctions is to capture semantic differences. One example of how this could be useful is increased accessibility. A text-to-speech reader should pronounce things the same way people do. When such software sees “a 25 Ω resistor” it should say “a twenty five Ohm resistor” and not “a twenty five uppercase omega resistor,” just as a person would. [1]

Making text more accessible to the blind helps everyone else as well. For example, it makes the text more accessible to search engines as well. As Elliotte Rusty Harold points out in Refactoring HTML:

Wheelchair ramps are far more commonly used by parents with strollers, students with bicycles, and delivery people with hand trucks than they are by people in wheelchairs. When properly done, increasing accessibility for the disabled increases accessibility for everyone.

However, there are practical limits to how many semantic distinctions Unicode can make without becoming impossibly large, and so the standard is full of compromises. It can be quite difficult to decide when two uses of the same glyph should correspond to separate characters, and no standard could satisfy everyone.


[1] Someone may discover that when I wrote “a 25 Ω resistor” above, I actually used an Omega  (Ω, U+03A9) rather than an Ohm character (Ω, U+2126). That’s because font support for Unicode is disappointing. If I had used the technically correct Ohm character, some people would not be able to see it.  Ironically, this would make the text less accessible.

On my Android phone, I can see Ω (Ohm) but I cannot see Ⅿ (Roman numeral M) because the installed fonts have a glyph for the former but not the latter.


This post first appeared on Symbolism, a blog that I’ve now shut down.

Updating blog posts

I’ve been going through my old blog posts and fixing a few problems. I found a few missing images, code samples that had lost their indentation, etc. Most of the errors have been my fault, but some were due to bugs in plug-ins.

If you see any problems with a post, please let me know. You could send me an email, or leave a comment on the post. (For a while I had comments automatically turn off on older posts, but I’ve disabled that. Now you can comment on any post.)

For the first couple years, this blog didn’t have many readers, and so not many people pointed out my errors. Now that there are more readers, I find out about errors more quickly. But I’ve found some egregious errors in some of the older posts.

Thanks for your contribution to this blog. I’ve been writing here for almost seven years, and I’ve benefited greatly from your input.

What do you mean by can’t?

You can’t subtract 4 from 3 (and stay inside the natural numbers, but you can inside the integers).

You can’t divide 3 by 4 (inside the ring of integers, but you can inside the rational numbers).

You can’t take the square root of a negative number (in the real numbers, but in the complex numbers you can, once you pick a branch of the square root function).

You can’t divide by zero (in the field of real numbers, but you may be able to do something that could informally be referred to as dividing by zero, depending on the context, by reformulating your statement, often in terms of limits).

When people say a thing cannot be done, they may mean it cannot be done in some assumed context. They may mean that the thing is difficult, and assume that the listener is sophisticated enough to interpret their answer as hyperbole. Maybe they mean that they don’t know how to do it and presume it can’t be done.

When you hear that something can’t be done, it’s worth pressing to find out in what sense it can’t be done.

Related post: How to differentiate a non-differentiable function