The MD5 hashing algorithm was once considered a secure cryptographic hash, but those days are long gone [1]. It doesn’t take much computing power to create two documents with the same hash.
Hash functions are not reversible in general. MD5 is a 128-bit hash, and so it maps any string, no matter how long, into 128 bits. Obviously if you run all strings of length, say, 129 bits through MD5, some of them have to hash to the same value. (Another win for the pigeonhole principle.)
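The pigeonhole argument is easier to watch with a much smaller hash. The sketch below is only an illustration: it truncates MD5 to its first 16 bits (the helper names `tiny_hash` and `find_collision` are made up for this example) and hashes up to 2^16 + 1 distinct strings, so at least two of them must land on the same value.

```python
import hashlib

def tiny_hash(s: str, bits: int = 16) -> int:
    """Illustrative only: keep just the top `bits` bits of the MD5 digest."""
    digest = hashlib.md5(s.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)

def find_collision(bits: int = 16):
    """Hash distinct strings until two of them share a truncated hash."""
    seen = {}  # truncated hash value -> first input that produced it
    # 2**bits + 1 distinct inputs into 2**bits slots guarantees a repeat.
    for i in range(2**bits + 1):
        s = str(i)
        h = tiny_hash(s, bits)
        if h in seen:
            return seen[h], s, h
        seen[h] = s
    raise AssertionError("unreachable by the pigeonhole principle")

a, b, h = find_collision()
print(f"{a!r} and {b!r} both map to {h:#06x}")
```

In practice a repeat tends to show up long before the guaranteed limit, but the guarantee itself is the same one that applies to full 128-bit MD5 and 129-bit inputs.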
And yet in practice it may be possible to reverse a hash, given some context. In the context of short, weak passwords, it may be possible to determine what text was hashed to create a particular value. All it may take is a quick web search [2]. For example, let’s take an insecure password like “password” and run it through a bit of Python to compute its MD5 hash.
    >>> import hashlib
    >>> def foo(x):
    ...     print(hashlib.md5(x.encode('utf-8')).hexdigest())
    ...
    >>> foo("password")
    5f4dcc3b5aa765d61d8327deb882cf99
A Google search returns over 21,000 hits on 5f4dcc3b5aa765d61d8327deb882cf99, and the second one shows that it’s the hash of “password”.
If I try a slightly less naive password like “p@$$word” I still get several hits, indicating that the hash is part of a list of compromised passwords.
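A search engine is in effect acting as a giant lookup table from hash values back to the text that produced them. Here is a minimal local sketch of the same idea. The word list is a tiny made-up stand-in for a real list of compromised passwords, and the names `md5_hex`, `lookup`, and `crack` are hypothetical.

```python
import hashlib

def md5_hex(s: str) -> str:
    """Hex MD5 digest of a UTF-8 encoded string."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

# Hypothetical, tiny word list; real lists of leaked passwords are far larger.
candidates = ["123456", "password", "qwerty", "letmein", "p@$$word"]

# Precompute hash -> plaintext once; each lookup afterward is a dict access.
lookup = {md5_hex(p): p for p in candidates}

def crack(hash_hex: str):
    """Return the matching candidate, or None if the hash is not in the table."""
    return lookup.get(hash_hex.lower())

print(crack("5f4dcc3b5aa765d61d8327deb882cf99"))
```

This prints password, since that is the hash computed earlier. Nothing is being inverted here: the table only recognizes hashes of strings someone already thought to try.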
Not every hash of a short string can be reversed this way. If I hash my business phone number, for example, I get something that does not yet appear in Google searches. The hash could still be reversed easily, but it would take more than just a Google search.
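One way to go beyond a web search is brute force: phone numbers come from a small space, so every candidate can be hashed and compared. The following is a rough sketch, assuming the number was hashed as a bare string of digits with no punctuation. The function name `crack_digits`, the formatting, and the fictitious number are all assumptions made for illustration.

```python
import hashlib

def crack_digits(target_hex: str, n_digits: int, prefix: str = ""):
    """Try every n_digits-long digit string after `prefix`; return the match or None."""
    for i in range(10**n_digits):
        candidate = prefix + str(i).zfill(n_digits)
        if hashlib.md5(candidate.encode("utf-8")).hexdigest() == target_hex:
            return candidate
    return None

# Fictitious number; only the last four digits are searched to keep the demo fast.
target = hashlib.md5(b"5555550123").hexdigest()
print(crack_digits(target, n_digits=4, prefix="555555"))
```

Searching all ten digits means 10^10 candidates, which is slow in pure Python but a small job for dedicated password-cracking software.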
See the next post for how salting can thwart web search attacks.
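As a rough preview of the idea, and not necessarily the construction the next post uses: a salt is a random value stored alongside the hash and mixed into the input, so the stored hash of a common password no longer matches anything a search engine has indexed. The helper name `salted_md5` is hypothetical, and MD5 appears here only for continuity with the example above.

```python
import hashlib
import secrets

def salted_md5(password: str, salt: bytes) -> str:
    """Hash salt + password; the salt must be kept in order to verify later."""
    return hashlib.md5(salt + password.encode("utf-8")).hexdigest()

salt = secrets.token_bytes(16)  # fresh random salt for each password
stored = salted_md5("password", salt)

print(stored)  # differs from run to run
# Compare against the unsalted hash of "password" from the session above.
print(stored == "5f4dcc3b5aa765d61d8327deb882cf99")
# Verification still works because the salt is stored with the hash.
print(stored == salted_md5("password", salt))
```

The salt is not a secret. Its job is to make each stored hash unique so that one precomputed table, or one web search, cannot cover everybody's password at once.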
[1] MD5 was published in 1992 (as RFC 1321) and the first flaw was discovered in 1996. Experts started moving away from MD5 at that time. More flaws were discovered over time. In 2010 the CMU Software Engineering Institute declared that MD5 was “cryptographically broken and unsuitable for further use.” It’s still being used, though not as much. MD5 still makes a useful checksum, though it’s not cryptographically secure.
[2] The same would be true of a secure hash of an insecure password. For example, SHA-256 is better than MD5, but you could look up the SHA-256 hash values of terrible passwords too. But MD5 hashes are easier to search on. In my experiments, I got far fewer hits searching on SHA-256 hash values.
If you’re trying to reverse hash values on your own computer without a web search, the MD5 value would require much less computation than the SHA-256 value.
In the password-cracking community, we encourage the word ‘cracking’ over words like ‘reversing’ or ‘dehashing’ – in order to avoid accidentally conveying to the uninitiated the idea that reversal is actually happening.
https://www.techsolvency.com/passwords/dehashing-reversing/
This exemplifies one of the reasons salting helps protect your encrypted data.