Microsoft replacing SHA-1 for software updates

According to this article, Microsoft is patching Windows 7 and Windows Server 2008 to look for SHA-2 hashes of updates. These older versions of Windows have been using SHA-1, while newer versions are already using SHA-2.

This is a good move, though probably an unnecessary one. The update was likely unnecessary for reasons I’ll explain below, but it was easy to do, and it increased consistency across Microsoft’s product line. It’s also good PR.

What are SHA-1 and SHA-2?

Let’s back up a bit. SHA-1 and SHA-2 are secure hash functions [1]. They take a file, in this case a Microsoft software update, and return a relatively small number, small relative to the original file size. In the case of SHA-1, the result is 160 bits (20 bytes). They’re designed so that if a file is changed, the function value is nearly certain to change. That is, it’s extremely unlikely that a change to the file would not result in a change to the hash value.
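To make this concrete, here is a minimal sketch using Python’s standard hashlib module. The byte strings are made-up stand-ins for an update file, not anything Microsoft ships, and SHA-256 is used as a representative member of the SHA-2 family.

```python
import hashlib

def digests(data):
    """Return the (SHA-1, SHA-256) hex digests of a byte string."""
    return hashlib.sha1(data).hexdigest(), hashlib.sha256(data).hexdigest()

# Hypothetical stand-ins for an update file and a tampered copy.
original = b"pretend this is a software update"
tampered = b"pretend this is a software updatE"  # last byte changed

sha1_a, sha256_a = digests(original)
sha1_b, sha256_b = digests(tampered)

# Each hex character encodes 4 bits.
print("SHA-1   bits:", len(sha1_a) * 4)
print("SHA-256 bits:", len(sha256_a) * 4)
print("SHA-1 original:", sha1_a)
print("SHA-1 tampered:", sha1_b)
```

The two SHA-1 digests should look unrelated even though the inputs differ in a single byte.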

The concern isn’t accidental changes. The probability of accidentally producing two files with the same hash function value is tiny as I show here.
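For a rough sense of scale, here is a sketch based on the standard birthday-problem approximation, which may or may not be the calculation in the linked post. It assumes the hash behaves like a uniformly random function, and the figure of a trillion files is an arbitrary choice for illustration.

```python
import math

def accidental_collision_probability(num_files, hash_bits):
    """Birthday approximation: p ~ 1 - exp(-n(n-1) / (2 * 2^bits)).

    Assumes hash values are uniformly distributed and independent.
    """
    x = num_files * (num_files - 1) / (2 * 2**hash_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-x)

# Assumed workload: a trillion distinct files hashed with 160-bit SHA-1.
p = accidental_collision_probability(10**12, 160)
print(p)
```

Under these assumptions the probability comes out below 10^-24, which is why accidental collisions are not the worry.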

The concern is a clever attacker who could modify the software update in such a way that the hash value remains unchanged, bypassing the hash as a security measure. That would be harder to do with SHA-2 than with SHA-1, hence Microsoft’s decision years ago to move to SHA-2 for new versions of the operating system, and its recent decision to make the change retroactive.

How hard is it to produce collisions?

By a collision we mean two files that hash to the same value. It’s obvious from the pigeonhole principle [2] that collisions are possible, but how hard are they to produce deliberately?
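The pigeonhole argument is easy to see on a toy scale. The sketch below truncates SHA-1 to 16 bits, which is an assumption made purely so the search finishes instantly. With only 65,536 possible values, any 65,537 distinct inputs must contain a collision, and in practice one turns up far sooner.

```python
import hashlib

def truncated_sha1(data, nbytes=2):
    """First nbytes of the SHA-1 digest: a deliberately weak toy hash."""
    return hashlib.sha1(data).digest()[:nbytes]

def find_collision(nbytes=2):
    """Hash the strings "0", "1", "2", ... until two share a truncated hash.

    Returns (first_message, second_message, number_of_messages_hashed).
    """
    seen = {}
    i = 0
    while True:
        msg = str(i).encode()
        h = truncated_sha1(msg, nbytes)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg
        i += 1

a, b, tries = find_collision()
print(a, b, tries)
```

The same brute-force search against full 160-bit SHA-1 would need on the order of 2^80 hashes, which is why practical attacks rely on weaknesses in the function's structure instead.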

Google demonstrated two years ago that it could produce two PDF files with the same SHA-1 hash value. But doing so required over 6,500 years of CPU time running in parallel [3]. Also, Google started with a file designed to make collisions possible. According to their announcement,

We started by creating a PDF prefix specifically crafted to allow us to generate two documents with arbitrary distinct visual contents, but that would hash to the same SHA-1 digest.

It would be harder to start with a specified input, such as a software update file, and generate a collision. It would be harder still to generate a collision that had some desired behavior.

According to this page, it’s known how to tamper with two files simultaneously so that they will have the same SHA-1 hash value. This is what Google did, at the cost of thousands of CPU years. But so far, nobody has been able to start with a given file and create another file with the same SHA-1 value. (Update: Now they have!)
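The gap between the two problems shows up even in a toy setting. The sketch below again truncates SHA-1 to 16 bits as an illustrative assumption, and the message names are made up. It searches for a second message matching the hash of one fixed target, which by brute force takes around 2^16 attempts on average, whereas finding any colliding pair takes only a few hundred.

```python
import hashlib

def truncated_sha1(data, nbytes=2):
    """First nbytes of the SHA-1 digest: a deliberately weak toy hash."""
    return hashlib.sha1(data).digest()[:nbytes]

def second_preimage(target, nbytes=2):
    """Brute-force a different message with the same truncated hash as target.

    Returns (message, number_of_candidates_tried).
    """
    goal = truncated_sha1(target, nbytes)
    i = 0
    while True:
        candidate = b"candidate-%d" % i
        if candidate != target and truncated_sha1(candidate, nbytes) == goal:
            return candidate, i + 1
        i += 1

target = b"official update"  # hypothetical fixed file the attacker must match
forged, tries = second_preimage(target)
print(forged, tries)
```

For an ideal 160-bit hash the generic costs scale the same way: roughly 2^80 hashes to find some colliding pair versus roughly 2^160 to match a given file.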

As I said at the beginning, it made sense for Microsoft to decide to move from SHA-1 to SHA-2 because the cost of doing so was small. But the use of SHA-1 hash codes is probably not the biggest security risk in Windows 7.
