Randomize, then humanize

Yesterday I wrote about a way to memorize a random 256-bit encryption key. This isn’t trivial, but it’s doable using memory techniques.

There’s a much easier way to create a memorable encryption key: start with something memorable, then apply a hash function. Why not just do that?
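As a minimal sketch of that easier route, here is a memorable phrase hashed into a 256-bit key. The choice of SHA-256, the function name `key_from_phrase`, and the example phrase are all assumptions made for illustration; nothing above specifies a particular hash.

```python
import hashlib

def key_from_phrase(phrase: str) -> bytes:
    """Hash a memorable phrase to a 256-bit (32-byte) key.

    SHA-256 is an assumed choice of hash function for this sketch.
    """
    return hashlib.sha256(phrase.encode("utf-8")).digest()

# Hypothetical memorable phrase, used only as an example.
key = key_from_phrase("correct horse battery staple")
print(len(key) * 8)  # 256
print(key.hex())     # looks random, but is fully determined by the phrase
```

The output looks random, but an attacker who guesses the phrase gets the key, so the key is only as strong as the phrase.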

There are two conflicting criteria to satisfy: cryptographic strength and ease of memorization. Any password is a compromise between these two goals.

You get better security if you generate a random password, then use some technique to make it more palatable to a human mind. You get something easier to remember if you start with something human-friendly and apply some process to make it appear more random.

In a nutshell, you can either randomize then humanize, or humanize then randomize.

Humanize then randomize, or randomize then humanize

You get better ease of use if you humanize then randomize. You get better security if you randomize then humanize.

This morning I ran across a paper by Arnold Reinhold suggesting that people generate 10-letter passwords by first generating 10 random letters, then creating a mnemonic sentence with each word starting with one of the letters. Reinhold says that this leads to a greater variety of passwords than if you were to start with a mnemonic sentence and somehow reduce it to 10 letters. This is an example of the randomize-then-humanize pattern.
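The "randomize" half of that recipe might look something like the sketch below. It assumes lowercase letters drawn uniformly with Python's `secrets` module; the function name `random_letters` is hypothetical, and this is not necessarily how Reinhold generates his letters. The "humanize" half, inventing a sentence whose words start with these letters, is left to the human.

```python
import secrets
import string

def random_letters(n: int = 10) -> str:
    """Return n lowercase letters, each drawn uniformly at random.

    secrets.choice uses the operating system's cryptographic
    random source rather than a seeded pseudorandom generator.
    """
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(n))

letters = random_letters()
print(letters)  # different on every run
```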

Why?

There are more possibilities for an attacker to have to explore if you start with random input.

For example, suppose a site requires an 8-character password and you choose an 8-letter word. There are only about 30,000 English words with eight letters [1], and people are far more likely to choose some of these words than others. If you randomly choose a 3-character password using digits and letters (upper case and lower case) there are 62^3 = 238,328 possibilities. A three-character random password is far better than an 8-character word.

In Reinhold’s example, there are 26^10 possible passwords made of 10 lowercase letters. That’s over 100 trillion possibilities. There are certainly fewer than 100 trillion pass phrases that humans are likely to come up with. Say you want to use your favorite sentence from a famous book. Suppose there are 1,000 famous books and each has 10,000 sentences. That’s only 10 million possibilities.
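The counts in this section are easy to check. The sketch below just redoes the arithmetic; the variable names are mine, and the 30,000 figure is the rough word count cited above rather than anything computed here.

```python
# Sizes of the search spaces discussed above.
eight_letter_words = 30_000        # rough count of 8-letter English words
random_3_char = 62 ** 3            # 10 digits + 26 lowercase + 26 uppercase
random_10_letters = 26 ** 10       # 10 random lowercase letters
famous_sentences = 1_000 * 10_000  # 1,000 books times 10,000 sentences each

print(random_3_char)                        # 238328
print(random_10_letters)                    # 141167095653376
print(famous_sentences)                     # 10000000
print(random_10_letters > 100 * 10**12)     # True: over 100 trillion
print(random_3_char / eight_letter_words)   # about 8 times as many options
```

And these counts flatter the human-chosen options, since they assume every word or sentence is equally likely to be picked.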

Human-generated randomness

People are not that good at generating randomness. Here’s a passage from David Kahn’s book The Codebreakers about the results of asking typists to create pages of random numbers for use in one-time pads.

Interestingly, some pads seem to be produced by typists and not by machines. They show strike-overs and erasures — neither likely to be made by machines. More significant are statistical analyses of the digits. One such pad, for example, has seven times as many groups in which digits in the 1-to-5 group alternate with digits in the 6-to-0 group, like 18293, as a purely random arrangement would have. This suggests that the typist is striking alternately with her left hand (which would type the 1-to-5 group on a Continental machine) and her right (which would type the 6-to-0 group). Again, instead of just half the groups beginning with a low number, which would be expected in a random selection, three quarters of them do, possibly because the typist is spacing with her right hand, then starting a new group with her left. Fewer doubles and triples appear than chance expects.
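To put a number on the baseline in Kahn's passage, here is a small enumeration of how often a truly random group would alternate hands. It assumes groups of five digits, as in the example 18293, and reads "alternate" as meaning every adjacent pair switches between the 1-to-5 set and the 6-to-0 set. Both are my reading of the passage, not something Kahn spells out.

```python
from itertools import product

LOW = set("12345")  # left-hand digits, per the quoted passage

def alternates(group: str) -> bool:
    """True if every adjacent pair of digits switches between low and high."""
    return all((a in LOW) != (b in LOW) for a, b in zip(group, group[1:]))

# Enumerate all 100,000 possible five-digit groups.
groups = ["".join(digits) for digits in product("0123456789", repeat=5)]
fraction = sum(alternates(g) for g in groups) / len(groups)

print(alternates("18293"))  # True
print(fraction)             # 0.0625, i.e. 1 group in 16
```

Under these assumptions, seven times the random rate would mean nearly 44% of the typist's groups alternate.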

How hacks work

Websites implicitly use the humanize-then-randomize approach. When you create a password, the site hashes what you type and stores the hashed value. (A naive site might store the actual password.) Then the next time you log in, the site hashes your password input and compares it to the stored value.

If the site is hacked, and the site’s hashing algorithm is known, then many of these passwords can be recovered. This happens routinely. If you apply a hash function to a list of 10,000 common passwords, there are only 10,000 hash values, and you simply search the hacked list for these values. And since people often reuse passwords, someone who knows your password on one site can try that password on another site.
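Here is a toy version of that attack. Everything in it is made up for illustration: the user names, the passwords, the short "common" list, and the assumption that the site stored plain unsalted SHA-256 hashes.

```python
import hashlib

def sha256_hex(password: str) -> str:
    """Unsalted SHA-256 of a password, as a hex string (assumed site scheme)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# Hypothetical leaked table of user -> stored hash.
leaked = {
    "alice": sha256_hex("password123"),
    "bob": sha256_hex("qwerty"),
    "carol": sha256_hex("k3Vq9tXw2LpZ"),  # random, not on the common list
}

# Stand-in for a list of 10,000 common passwords.
common_passwords = ["123456", "password123", "qwerty", "letmein"]

# Hash each common password once, then look up every leaked hash.
lookup = {sha256_hex(p): p for p in common_passwords}
recovered = {user: lookup[h] for user, h in leaked.items() if h in lookup}

print(recovered)  # {'alice': 'password123', 'bob': 'qwerty'}
```

The random password survives simply because it isn't on anyone's list of guesses.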

If you use a randomly generated password for each site, it’s less likely any individual password will be exposed. And if a password is exposed, a hacker cannot use it on another site.

[1] On my Macbook, grep '^........$' words | wc -l returned 30,001. You’d get different results from searching different word lists, but your results wouldn’t vary too much.