This post will look at a progression of ways to store passwords, from naive to sophisticated.
Most naive: clear text
Storing passwords in plain text is least secure thing a server could do. If this list is leaked, someone knows all the passwords with no effort.
Better: hash values
A better approach would be to run passwords through a secure hash function before storing them. The server would never store a password per se, only its hashed value. When someone enters a password, the server would check whether the hash of the password matches the stored hashed value. If the list of hashed values is leaked, it will take some effort to recover the original passwords, though maybe not much effort, as I wrote about in my previous post.
Better still: add salt
The next step in sophistication is to append a random value, called a salt, to the password before applying the hash function. The server would store each user’s salt value and the hash of the password with the salt added.
Now if the user has a naive password like “qwerty” you couldn’t crack the password by doing a simple Google search. You can find the hash value of “qwerty” by searching, but not the hash value of qwerty + random salt, because although the former is common the latter is probably unique. You could still crack the password if the hash is insecure, but it would take a little effort.
Better still: key stretching
Suppose an attacker has a list of salt values and corresponding hash values for salt + password. They could guess passwords, hashing each with a salt value, to see if any hash values match. They would start with the most common passwords and probably get some matches.
Key stretching is a way to make this brute force search more time consuming by requiring repeated hashing. In the following stretching algorithm, p is the password, s is the salt, h is the hash function, and || means string concatenation.
Now the time required to test each password has been multiplied by r. The idea is to pick a value of r that is affordable for legitimate use but expensive for attacks.
Next step in sophistication is to avoid sending the plaintext password to the server altogether, by using a protocol like OPAQUE. See https://blog.cryptographyengineering.com/2018/10/19/lets-talk-about-pake/
This key stretching algorithm has an unfortunate runtime complexity of O(|p| * r) which forces websites to artificially limit password length to avoid ddos / very slow hashing. There’s no other real reason to limit password length except network bandwidth, and even that could easily allow passwords of 10k+ characters without issue.
So I propose a new algorithm:
x[0] = h(theta||p||s), x[n] = h(x[n-1]||s) for i from 1 to r.
I.e. the password is only read once to initialize the hash, and then it’s iterated without the password again. This changes the complexity to O(|p| + r), and makes the iteration time constant for any length password (and thus easier to tune prececely). This is much more reasonable from a performance perspective and should allow for much larger passwords.
As far as I can tell this should have the same security as the “normal” algorithm but with a significant algorithmic complexity improvement. Is there something that I’m not seeing? Perhaps a security trade-off? (If there’s a worry about “losing” entropy from the password over repeated iterations, you could add one more round with p: K = h(x[r]||p||s), which doesn’t change the better complexity).
But for me, a password is seldom enough? With 100+ logins, I often need 5 or 6 pieces of information? I store them (plain text) as
key:{key:value [,key:value]*}
e.g. url: , login: , mmn: etc.
Could this be used for a more practical approach? I’d be interested.
@Dave: An individual could do things that wouldn’t be practical for a server. You might consider a password manager or a YubiKey.
@Joe: I don’t think password length matters because password lengths will be less than hash lengths. Hardly anybody would use a 256-bit password, for example.
Is this true?
“Now the time required to test each password has been multiplied by r. ”
As an attacker, I don’t know how many iterations you’ve used (r). Therefore I need to pick a test password and either iterate until I get a match (which may never happen if that password isn’t in the list,) or give up after some predetermined number of iterations. If that number is be less than r, I miss when I should have hit.
Also, I could make this harder by using a list of salts s_i; i = 1,…n and making each iteration
x_i+1 = h(x_i || p || s_i) i = 1,…,r
Now as an attacker, I need to know all the salts and the correct order in which to apply them, which multiplies the attack cost by s! at no additional encryption cost.
* said s!, meant n!
I’ve implicitly assumed that the attacker knows r. In practice they may or may not. Kerckhoff’s principle would say that it’s safest to assume any fixed parameters such as r are known.
@John: If you use a password generator you can get lots of entropy in 32 bytes, but most people should use passphrases — which are much better than passwords for real people — (and much less entropy dense) then suddenly 32 bytes start to seem very restrictive. 32 bytes is just a couple words and an ok-ish passphrase. I hate running up against these kinds of arbitrary password length limits, and I don’t like the idea of restricting the way a user of my service is allowed to secure themselves — who am I to say your scheme of typing a unique little story as your password is “wrong”? I would prefer the limit to be based on something *real*, like bandwidth or memory limits, UX reasons, or security reasons, not “I chose a bad algorithm so I have to punish my users with the consequences.”
From this post I deduce that you type on Dvorak or some other alternative keyboard layout, for how else does one misspell “qwerty”?
Dvorak is so mainstream. I use my own artisanal keyboard layout. :)