Microsoft replacing SHA-1

According to this article, Microsoft is patching Windows 7 and Windows Server 2008 to look for SHA-2 hash functions of updates. These older versions of Windows have been using SHA-1, while newer version are already using SHA-2.

This is a good move, but unnecessary. Here’s what I mean by that. The update was likely unnecessary for reasons I’ll explain below, but it was easy to do, and it increased consistency across Microsoft’s product line. It’s also good PR.

What are SHA-1 and SHA-2?

Let’s back up a bit. SHA-1 and SHA-2 are secure hash functions [1]. They take a file, in this case a Microsoft software update, and return a relatively small number, small relative to the original file size. In the case of SHA-1, the result is 160 bits (20 bytes).  They’re designed so that if a file is changed, the function value is nearly certain to change. That is, it’s extremely unlikely that a change to the file would not result in a change to the hash value.

The concern isn’t accidental changes. The probability of accidentally producing two files with the same hash function value is tiny as I show here.

The concern is a clever attacker who could modify the software update in such a way that the hash function remains unchanged, bypassing the hash as a security measure. That would be harder to do with SHA-2 than with SHA-1, hence Microsoft’s decision years ago to move to SHA-2 for new versions of the operating system, and its recent decision to make the change retroactive.

How hard is it to produce collisions?

By a collision we mean two files that hash to the same value. It’s obvious from the pigeon hole principle [2] that collisions are possible, but how hard are they to produce deliberately?

Google demonstrated two years ago that it could produce two PDF files with the same SHA-1 hash value. But doing so required over 6,500 years of CPU time running in parallel [3]. Also, Google started with a file designed to make collisions possible. According to their announcement,

We started by creating a PDF prefix specifically crafted to allow us to generate two documents with arbitrary distinct visual contents, but that would hash to the same SHA-1 digest.

It would be harder to start with a specified input, such as a software update file and generate a collision. It would be harder still to generate a collision that had some desired behavior.

According to this page, it’s known how to tamper with two files simultaneously so that they will have the same SHA-1 hash values. This is what Google did, at the cost of thousands of CPU years. But so far, nobody has been able to start with a given file and create another file with the same SHA-1 value. (Update: Now they have!)

As I said at the beginning, it made sense for Microsoft to decide to move from SHA-1 to SHA-2 because the cost of doing so was small. But the use of SHA-1 hash codes is probably not the biggest security risk in Windows 7.

More secure hash posts

[1] SHA-1 is a hash function, but SHA-2 is actually a family of hash functions: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. All are believed to provide at least 112 bits of security, while SHA-1 provides less than 63.

The SHA-x functions output x bits. The SHA-x/y functions use x bits of internal state and output y bits. To be consistent with this naming convention, SHA-1 should be called SHA-160.

[2] The pigeon hole principle says that if you put more than n things into n boxes, one of the boxes has to have more than one thing. If you hash files of more than n bits to n-bit numbers, at least two files have to go to the same value.

[3] If you were to rent this much CPU time in the cloud at 5 cents per CPU hour, it would cost about $2,800,000. If the only obstacle were the cost of computing resources, someone might be willing to pay that to tamper with a Microsoft update. (Update: with a new algorithm announced in January 2020, the estimated cost has dropped $45,000.)

Hash function menagerie

Here’s an oversimplified survey of cryptographic hash functions: Everyone used to use MD5, now they use some variation on SHA.

There’s some truth to that. MD5 was very popular, and remains popular years after it was proven insecure. And now variations on SHA like SHA1 and SHA256 are commonly used. But there are a lot more cryptographic hash functions in common use.

Python support

If Python’s hashlib is a reliable guide, the most common hashing algorithms are

  • MD5
  • SHA1
  • SHA224
  • SHA256
  • SHA384
  • SHA512

because these are the six algorithms guaranteed to be supported on every platform, as listed in the output of the algorithms_guaranteed method in hashlib.

The algorithms_available methods in hashlib includes additional algorithms available in a particular installation. On the computer I’m using at the moment, it lists 18 hash algorithms in addition to those on the guaranteed list.

Mathematica support

Mathematica supports the hash functions on hashlib‘s guaranteed list, and a few more:

  • Adler32
  • CRC32
  • MD3
  • MD4
  • RIPEMD160
  • RIPEMD160SHA256
  • SHA256SHA256

The first two hashes, Adler32 and CRC32, were never intended to be secure. They were designed simply as error detection codes and weren’t designed to be tamper-resistant. As the names imply, MD3 and MD4 were predecessors to MD5.

The hash that Mathematica calls RIPEMD160SHA256 is SHA 256 followed by the RIPEMD160. The reason this combination gets a name of its own is because it is used in Bitcoin. Finally, SHA256SHA256 is simply SHA256 applied twice.

The long tail

The hash functions mentioned above are the most commonly used, but there are hundreds of others in common use. The hashcat password cracking tool lists 260 kinds of hash functions it can attack.

Some of these hash functions are fundamental algorithms, such as Whirlpool and variations of GOST. Some are combinations of primitive functions, such as salted or iterated variations. Many of them are vendor and product specific. For example, hashcat lists nine different hashing algorithms associated with various versions of Microsoft Office, six algorithms for Cisco products, five algorithms for SAP, etc.

It’s interesting to speculate on why there are so many custom hash functions: hashing technology progress, differing emphases on security and speed, not-invented-here syndrome, etc.

Security by variety

There’s something going on that isn’t exactly security-by-obscurity, i.e. relying on keeping your encryption algorithm a secret. The hashing algorithms for all the products mentioned above are well known, but there may be some small advantage to being a little bit off the beaten path.

People have built special hardware and software for attacking popular hashing algorithms, and doing something a little different could prevent this from working on your hash. Of course doing something a little different could also introduce a weakness you didn’t anticipate. Creating your own encryption algorithm is a bad idea unless you’re an expert, and often even if you are an expert But making a new hash function by combining secure primitives is not as dangerous as creating your own encryption algorithm.

More hash function posts

Economics, power laws, and hacking

Increasing costs impact some players more than others. Those who know about power laws and know how to prioritize are impacted less than those who naively believe everything is equally important.

This post will look at economics and power laws in the context of password cracking. Increasing the cost of verifying a password does not impact attackers proportionately because the attackers have a power law on their side.

Key stretching

In an earlier post I explained how key stretching increases the time required to verify a password. The idea is that if authentication systems can afford to spend more computation time verifying a password, they can force attackers to spend more time as well.

The stretching algorithm increases the time required to test a single salt and password combination by a factor of r where r is a parameter in the algorithm. But increasing the amount of work per instance by a factor of r does not decrease the number of users an attacker can compromise by a factor of r.

Power laws

If an attacker is going through a list of compromised passwords, the number of user passwords recovered for a fixed amount of computing power decreases by less than r because passwords follow a power law distribution [1]. The nth most common password appears with frequency proportional to ns for some constant s. Studies suggest s is between 0.5 and 0.9. Let’s say s = 0.7.

Suppose an attacker has a sorted list of common passwords. He starts with the most common password and tries it for each user. Then he tries the next most common password, and keeps going until he runs out of resources. If you increase the difficulty of testing each password by r, the attacker will have to stop after trying fewer passwords.

Example

Suppose the attacker had time to test the 1000 most common passwords, but then you set r to 1000, increasing the resources required for each individual test by 1000. Now the attacker can only test the most common password. The most common password accounts for about 1/24 of the users of the 1000 most common passwords. So the number of passwords the attacker can test went down by 1000, but the number of users effected only went down by 24.

It would take 1000 times longer if the attacker still tested the same number of passwords. But the attacker is hitting diminishing return after the first password tested. By testing passwords in order of popularity, his return on resources does not need to go down by a factor of 1000. In this example it only went down by a factor of 24.

How much the hacker’s ROI decreases depends on several factors. It never decreases by more than r, and often it decreases less.

Related posts

[1] Any mention of power laws brings out two opposite reactions, one enthusiastic and one pedantic. The naive reaction is “Cool! Another example of how power laws are everywhere!” The pedantic reaction is “Few things actually follow a power law. When people claim that data follow a power law distribution the data usually follow a slightly different distribution.”

The pedantic reaction is technically correct but not very helpful. The enthusiastic reaction is correct with a little nuance: many things approximately follow a power law distribution over a significant range. Nothing follows a power law distribution exactly and everywhere, but the same is true of any probability distribution.

It doesn’t matter for this post whether data exactly follow a power law. This post is just a sort of back-of-the-envelope calculation. The power law distribution has some empirical basis in this context, and it’s a convenient form for calculations.

Salting and stretching a password

This post will look at a progression of ways to store passwords, from naive to sophisticated.

Most naive: clear text

Storing passwords in plain text is least secure thing a server could do. If this list is leaked, someone knows all the passwords with no effort.

Better: hash values

A better approach would be to run passwords through a secure hash function before storing them. The server would never store a password per se, only its hashed value. When someone enters a password, the server would check whether the hash of the password matches the stored hashed value. If the list of hashed values is leaked, it will take some effort to recover the original passwords, though maybe not much effort, as I wrote about in my previous post.

Better still: add salt

The next step in sophistication is to append a random value, called a salt, to the password before applying the hash function. The server would store each user’s salt value and the hash of the password with the salt added.

Now if the user has a naive password like “qwerty” you couldn’t crack the password by doing a simple Google search. You can find the hash value of “qwerty” by searching, but not the hash value of qwerty + random salt, because although the former is common the latter is probably unique. You could still crack the password if the hash is insecure, but it would take a little effort.

Better still: key stretching

Suppose an attacker has a list of salt values and corresponding hash values for salt + password. They could guess passwords, hashing each with a salt value, to see if any hash values match. They would start with the most common passwords and probably get some matches.

Key stretching is a way to make this brute force search more time consuming by requiring repeated hashing. In the following stretching algorithm, p is the password, s is the salt, h is the hash function, and || means string concatenation.

\begin{align*} x_0 &= \phi \\ x_i &= h(x_{i-1} || p || s) \mbox{ for i = 1, \ldots, }r \\ K &= x_r \end{align*}

Now the time required to test each password has been multiplied by r. The idea is to pick a value of r that is affordable for legitimate use but expensive for attacks.

Reversing an MD5 hash

The MD5 hashing algorithm was once considered secure cryptographic hash, but those days are long gone [1]. For a given hash value, it doesn’t take much computing power to create a document with the same hash.

Hash functions are not reversible in general. MD5 is a 128-bit hash, and so it maps any string, no matter how long, into 128 bits. Obviously if you run all strings of length, say, 129 bits, some of them have to hash to the same value. (Another win for the pigeon hole principle.)

And yet in practice it may be possible to reverse a hash, given some context. In the context of short, weak passwords, it may be possible to determine what text was hashed to create a particular value. All it may take is a quick web search [2]. For example, let’s take an insecure password like “password” and run it through a bit of Python to compute its MD5 hash.

    >>> import hashlib
    >>> def foo(x): 
    ...     print(hashlib.md5(x.encode('utf-8')).hexdigest())
    >>> foo("password")
    5f4dcc3b5aa765d61d8327deb882cf99

A Google search returns over 21,000 hits on 5f4dcc3b5aa765d61d8327deb882cf99, and the second one shows that it’s the hash of “password”.

If I try a slightly less naive password like “p@$$word” I still get several hits, indicating that the hash is part of a list of compromised passwords.

Not every hash of a short string can be reversed this way. If I hash my business phone number, for example, I get something that does not yet appear in Google searches. The hash could still be reversed easily, but it would take more than just a Google search.

See the next post for how salting can thwart web search attacks.

More security posts

[1] MD5 was invented in 1992 and the first flaw was discovered in 1996. Experts started moving away from MD5 at that time. More flaws were discovered over time. In 2010 the CMU Software Engineering Institute declared that MD5 was “cryptographically broken and unsuitable for further use.” It’s still being used, though not as much. MD5 still makes a useful checksum, though it’s not cryptographically secure.

[2] The same would be true of a secure hash of an insecure password. For example, SHA-256 is better than MD5, but you could look up the SHA-256 hash values of terrible passwords too. But MD5 hashes are easier to search on. In my experiments, I got far fewer hits searching on SHA-256 hash values.

If you’re trying to reverse hash values on your own computer without a web search, the MD5 value would require much less computation than the SHA-256 value.

Base 32 and base 64 encoding

Math has a conventional way to represent numbers in bases larger than 10, and software development has a couple variations on this theme that are only incidentally mathematical.

Math convention

By convention, math books typically represent numbers in bases larger than 10 by using letters as new digit symbols following 9. For example, base 16 would use 0, 1, 2, …, 9, A, B, C, D, E, and F as its “digits.” This works for bases up to 36; base 36 would use all the letters of the alphabet. There’s no firm convention for whether to use upper or lower case letters.

Base 64 encoding

The common use for base 64 encoding isn’t to represent bits as numbers per se, but to have an efficient way to transmit bits in a context that requires text characters.

There are around 100 possible characters on a keyboard, and 64 is the largest power of 2 less than 100 [1], and so base 64 is the most dense encoding using common characters in a base that is a power of 2.

Base 64 encoding does not follow the math convention of using the digits first and then adding more symbols; it’s free not to because there’s no intention of treating the output as numbers. Instead, the capital letters A through Z represent the numbers 0 though 25, the lower case letters a through z represent the numbers 26 through 51, and the digits 0 through 9 represent the numbers 52 through 61. The symbol + is used for 62 and / is used for 63.

Crockford’s base 32 encoding

Douglas Crockford proposed an interesting form of base 32 encoding. His encoding mostly follows the math convention: 0, 1, 2, …, 9, A, B, …, except he does not use the letters I, L, O, and U. This eliminates the possibility of confusing i, I, or l with 1, or confusing O with 0. Crockford had one more letter he could eliminate, and he chose U in order to avoid an “accidental obscenity.” [2]

Crockford’s base 32 encoding is a compromise between efficiency and human legibility. It is more efficient than hexadecimal, representing 25% more bits per character. It’s less efficient than base 64, representing 17% fewer bits per character, but is more legible than base 64 encoding because it eliminates commonly confused characters.

His encoding is also case insensitive. He recommends using only capital letters for output, but permitting upper or lower case letters in input. This is in the spirit of Postel’s law, also known as the robustness principle:

Be conservative in what you send, and liberal in what you accept.

See the next post for an explanation of Crockford’s check sum proposal.

A password generator

Here’s a Python script to generate passwords using Crockford’s base 32 encoding.

    from secrets import randbits
    from base32_crockford import encode

    def gen_pwd(numbits):
        print(encode(randbits(numbits)))

For example, gen_pwd(60) would create a 12-character password with 60-bits of entropy, and this password would be free of commonly confused characters.

Related posts

[1] We want to use powers of 2 because it’s easy to convert between base 2 and base 2n: start at the right end and convert bits in groups of n. For example, to convert a binary string to hexadecimal (base 24 = 16), convert groups of four bits each to hexadecimal. So to convert the binary number 101111001 to hex, we break it into 1 0111 1001 and convert each piece to hex, with 1 -> 1, 0111 -> 7, and 1001 -> 9, to find 101111001 -> 179. If we a base that is not a power of 2, the conversion would be more complicated and not so localized.

[2] All the words on George Carlin’s infamous list include either an I or a U, and so none can result from Crockford’s base 32 encoding. If one were willing to risk accidental obscenities, it would be good to put U back in and remove S since the latter resembles 5, particularly in some fonts.

RSA with Pseudoprimes

RSA setup

Recall the setup for RSA encryption given in the previous post.

  1. Select two very large prime numbers p and q.
  2. Compute n = pq and φ(n) = (p – 1)(q – 1).
  3. Choose an encryption key e relatively prime to φ(n).
  4. Calculate the decryption key d such that ed = 1 (mod φ(n)).
  5. Publish e and n, and keep dp, and q secret.

φ is Euler’s totient function, defined here.

There’s a complication in the first step. Maybe you think the numbers p and q are prime, but they’re not. In that case the calculation of φ in step 2 is wrong.

Pseudoprimes

The numbers p and q need to be “very large,” where the definition of what constitutes large changes over time due to Moore’s law and progress with factoring algorithms. Currently p and q would typically have at least 2048 bits each. It’s easy to find numbers that large that are probably prime, but it’s difficult to be certain.

At the moment, the fastest way to test for primes has a small chance making a false positive error, but no chance of a false negative. That is, if the test says a number is composite, it is certainly composite. But if the test says a number may be prime, there is a small chance that it is not prime. (Details here.) So in practice RSA starts by finding two large probable primes or pseudoprimes.

Discussions of RSA encryption often point out that large pseudoprimes are very rare, so it isn’t a problem that RSA starts with pseudoprimes. But what does that mean? Is there a one in a trillion chance that your private key won’t work and nobody can send you messages? Or that you can receive some messages and not others?

Encryption and decryption

RSA encryption works by taking a message m and raising it to the exponent e modulo n where e and n are defined at the top of the post. To decrypt the message, you raise it to the exponent d modulo n where d is your private decryption key. Because d was computed to satisfy

ed = 1 (mod φ(n)),

Euler’s theorem says that we’ll get our message back. We’ll give an example below.

What if p or q are pseudoprime?

If p and q are prime, then φ(n) = (p – 1)(q – 1). But if we’re wrong in our assumption that one of these factors is prime, our calculation of φ(n) will be wrong. Will our encryption and decryption process work anyway? Maybe.

We’ll do three examples below, all using small numbers. We’ll use e = 257 as our public encryption exponent and m = 42 as the message to encrypt in all examples.

In the first example p and q are indeed prime, and everything works as it should. In the next two examples we will replace p with a pseudoprime. In the second example everything works despite our error, but in the third example decryption fails.

Example 1: p and q prime

We’ll start with p = 337 and q = 283. Both are prime. The Python code below shows that d = 60833 and the encrypted message is 45431. Everything works as advertised.

Example 2: p pseudoprime

Now we use p = 341 and q = 283. Here p is a pseudoprime for base 2, i.e.

2340 = 1 mod 341

and so 341 passes Fermat’s primality test [1] for b = 2. Now d = 10073 and the encrypted message is 94956. Decrypting the encrypted message works, even though p is not prime and our calculation for φ is wrong. In fact, the process works not just for our message m = 42 but for every message.

Example 3: p pseudoprime

Here again we let p be the pseudoprime 341 but set q to the prime 389. Now d = 6673, the encrypted message is 7660, but decrypting the encrypted message returns 55669, not 42 as we started with. Decryption fails for all other messages as well.

If we use the correct value for φ(pq) the example works. That is, if we use

φ(341*389) = φ(11*31*389) = 10*30*388

rather than the incorrect value 340*388 the decryption process recovers our original message.

What can go wrong

The examples above show that if we mistakenly think one of our numbers is a prime when it is only a pseudoprime, decryption might succeed or it might fail. In either case, we assume npq has two prime factors when in fact it has more, and so n is easier to factor than we thought.

Python code

Here’s the code for the examples above.

    from sympy import gcd, mod_inverse
    
    message = 42
    e = 257
    
    def example(p, q):
        n = p*q
        phi = (p-1)*(q-1)
        assert( gcd(e, phi) == 1 )
        
        d = mod_inverse(e, phi)
        assert( d*e % phi == 1 )
    
        encrypted = pow(message, e, n)
        decrypted = pow(encrypted, d, n)
        return (message == decrypted)
    
    print(example(337, 283))
    print(example(341, 283))
    print(example(341, 389))

Related posts

[1] Fermat’s primality test is explained here. In our example we only tried one base, b = 2. If we had also tried b = 3 we would have discovered that 341 is not prime. In practice one would try several different bases. Using the heuristic that failures are independent for each base, it’s very unlikely that a composite number would be a pseudoprime for each of, say, 5o different bases.

Revealing information by trying to suppress it

FAS posted an article yesterday explaining how blurring military installations out of satellite photos points draws attention to them, showing exactly where they are and how big they are. The Russian mapping service Yandex Maps blurred out sensitive locations in Israel and Turkey. As the article says, this is an example of the Streisand effect, named after Barbra Streisand’s failed attempt to suppress photos of her Malibu home. Her efforts to protect her privacy caused the photos to go viral.

A similar issue arises in differential privacy. A practical implementation of differential privacy must rule out some queries a priori in some way that is not data-dependent. Otherwise it could be a violation of differential privacy either to answer or refuse to answer the same query. Short of refusing the query entirely, it could also be informative to reply “Are you sure you want to submit that query? It’s going to use up a lot of your privacy budget.”

Differential privacy adds random noise to query results, but in a more sophisticated way than Yandex blurring out portions of photos. Adding noise selectively reveals information about what the noise is trying to conceal. As the FAS article indicated, by blurring out only the sensitive areas, the blurred maps point out their exact location.

Selective security measures can have a similar effect. By placing greater protection on some information, you can mark that information as being particularly important. Once you realize this, otherwise nonsensical security measures start to make sense: applying uniform security results in excessive protection for some information, but keeps attackers from being able to identify the most valuable targets.

More privacy posts

Biometric security and hypothesis testing

A few weeks ago I wrote about how there are many ways to summarize the operating characteristics of a test. The most basic terms are accuracy, precision, and recall, but there are many others. Nobody uses all of them. Each application area has their own jargon.

Biometric security has its own lingo, and it doesn’t match any of the terms in the list I gave before.

facial scan authentication

False Acceptance Rate

Biometric security uses False Acceptance Rate (FAR) for the proportion of times a system grants access to an unauthorized person. In statistical terms, FAR is Type II error. Also known as False Match Rate (FRM).

False Rejection Rate

False Rejection Rate (FRR) is the proportion of times a biometric system fails to grant access to an authorized person. In statistical terms, FRR is Type I error. FAR is also known as False Non Match Rate (FNMR).

Crossover Error Rate

One way to summarize the operating characteristics of a biometric security system is to look at the Crossover Error Rate (CER), also known as the Equal Error Rate (EER). The system has parameters that can be tuned to adjust the FAR and FRR. Adjust these to the point where the FAR and FRR are equal. When the two are equal, their common value is the CER or EER.

The CER gives a way to compare systems. The smaller the CER the better. A smaller CER value means it’s possible to tune the system so that both the Type I and Type II error rates are smaller than they would be for another system.

Loss Function

CER is kind of a strange metric. Everyone agrees that you wouldn’t want to calibrate a system so that FAR = FRR. In security applications, FAR (unauthorized access) is worse than FRR (authorized user locked out). The former could be a disaster while the latter is an inconvenience. Of course there could be a context where the consequences of FAR and FRR are equal, or that FRR is worse, but that’s not usually the case.

A better approach would be to specify a loss function (or its negative, a utility function). If unauthorized access is K times more costly than locking out an authorized user, then you might want to know at what point K * FAR = FRR or your minimum expected loss [1] over the range of tuning parameters. The better system for you, in your application, is the one corresponding to your value of K.

Since everyone has a different value of K, it is easier to just use K = 1, even though everyone’s value of K is likely to be much larger than 1. Unfortunately this often happens in decision theory. When people can’t agree on a realistic loss function, they standardize on a mathematically convenient implicit loss function that nobody would consciously choose.

If everyone had different values of K near 1, the CER metric might be robust, i.e. it might often make the right comparison between two different systems even though the criteria is wrong. But since K is probably much larger than 1 for most people, it’s questionable that CER would rank two systems the same way people would if they could use their own value of K.

***

[1] These are not the same thing. To compute expected loss you’d need to take into account the frequency of friendly and unfriendly access attempts. In a physically secure location, friends may attempt to log in much more often than foes. On a public website the opposite is more likely to be true.