4.1. Does hashing attributes protect privacy?
Hashing may indeed protect private information, but it can also fail in subtle ways.
Cryptographic hashing algorithms make it impractical to infer the input of the algorithm from its output, provided we know nothing about the input. But if the data values come from a small set of possible values, and if the hashing algorithm is known, then it is possible to hash all possible values in advance, making a lookup table loosely known as a “rainbow table.”
For example, there are only 50 US states. If US state of residence is hashed and the hash value is used as a database field, anyone could hash the 50 state names and see which one corresponds to each hashed value.
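The attack on hashed state names can be sketched in a few lines of Python. This is a minimal illustration, assuming the data holder used plain SHA-256 on the state name as a UTF-8 string; the helper `sha256_hex`, the shortened `STATES` list, and the sample column are made up for the example.

```python
import hashlib

def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a string as hex."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

# Only a handful of states for brevity; a real attack would list all 50.
STATES = ["California", "Texas", "Florida", "New York", "Wyoming"]

# The attacker hashes every candidate value once: hash -> state name.
lookup = {sha256_hex(state): state for state in STATES}

# A made-up "de-identified" database column holding hashed states.
hashed_column = [sha256_hex("Texas"), sha256_hex("Wyoming")]

# Reversing the hashes is now just a dictionary lookup.
recovered = [lookup[h] for h in hashed_column]
print(recovered)  # ['Texas', 'Wyoming']
```

The hash function was never inverted. The set of possible inputs was simply small enough to enumerate.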
Even for fields with many more possible values, such as phone numbers, it is feasible to create a rainbow table by exhaustively hashing the values. However, if you concatenate several attributes together before hashing, the universe of possible inputs may be too large to exhaustively hash.
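A rough back-of-the-envelope calculation shows how concatenation changes the picture. The hashing rate below is an assumed round number, not a measurement, and the attribute sizes (10-digit phone numbers, about 100 years of birth dates, five-digit ZIP codes) are illustrative choices. The calculation also assumes the attributes are independent and that the attacker knows none of them, which real data often violates.

```python
# Assumed attacker throughput: 10**8 hashes per second (illustrative only).
HASHES_PER_SECOND = 10**8

def seconds_to_enumerate(space_size: int, rate: int = HASHES_PER_SECOND) -> float:
    """Time to hash every value in a space of the given size."""
    return space_size / rate

phone_space = 10**10      # all 10-digit phone numbers
dob_space = 365 * 100     # roughly 100 years of birth dates
zip_space = 10**5         # all five-digit ZIP codes

# Concatenating attributes multiplies the sizes of their value sets.
combined_space = phone_space * dob_space * zip_space

SECONDS_PER_YEAR = 3600 * 24 * 365
print("phone only (seconds):", seconds_to_enumerate(phone_space))
print("combined (years):", seconds_to_enumerate(combined_space) / SECONDS_PER_YEAR)
```

Under these assumptions, phone numbers alone take on the order of minutes to enumerate, while the combined field takes thousands of years. If an attacker already knows some of the attributes for a target, the effective search space shrinks back toward the smaller figure.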
One way to make a rainbow table attack less feasible is to use a key with the hash so that in effect the hashing algorithm is unknown. (The hashing algorithm could be a standard algorithm like SHA-256, but if the data is XORed with a private key before hashing, then in a sense the hashing algorithm is unknown.) Another way to thwart rainbow table attacks is to use an algorithm designed to be time-consuming or memory-consuming, such as Argon2.
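The effect of a key can be sketched as follows. Note that this example swaps in HMAC-SHA-256 from Python's standard library, a standard keyed-hash construction, rather than the XOR-then-hash idea mentioned above. The key and the helper names `plain_hash` and `keyed_hash` are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical private key, known only to the data holder.
SECRET_KEY = secrets.token_bytes(32)

def plain_hash(value: str) -> str:
    """Unkeyed SHA-256, which anyone can compute."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA-256, which requires the key to compute."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

states = ["California", "Texas", "Florida", "New York", "Wyoming"]

# The attacker can still precompute unkeyed hashes of every state...
attacker_table = {plain_hash(s): s for s in states}

# ...but the stored value was produced with the key, so it is not in the table.
stored = keyed_hash("Texas", SECRET_KEY)
print(stored in attacker_table)  # False
```

The protection rests entirely on keeping the key secret. Anyone who obtains it can rebuild the table in moments.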
Sometimes it is possible to infer the input to a hash function after the fact even if it is not possible to pre-compute the hash values. For example, if someone hashed US states of residence using an expensive hash function with a private key, one could still infer that the hash value that appears most frequently in the data is likely to be California, since that is the most populous state.
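A frequency attack of this kind can be sketched with a toy data set. The records below are made up, and HMAC-SHA-256 stands in for whatever keyed or expensive hash the data holder might use. The attacker in this sketch never sees the key, only the hashed column and the public fact that California is more populous than Texas, which is more populous than Wyoming.

```python
import hashlib
import hmac
import secrets
from collections import Counter

# Hypothetical private key; the attacker does not have it.
key = secrets.token_bytes(32)

def keyed_hash(value: str) -> str:
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Made-up records, with counts chosen to follow the population ordering.
residents = ["California"] * 5 + ["Texas"] * 3 + ["Wyoming"] * 1
column = [keyed_hash(state) for state in residents]

# Attacker's side: rank the hash values by how often they appear...
ranked_hashes = [h for h, _ in Counter(column).most_common()]

# ...and line them up with states ranked by population (public knowledge).
states_by_population = ["California", "Texas", "Wyoming"]
guesses = dict(zip(ranked_hashes, states_by_population))

# Checking the guesses requires the key, so only the data holder can do this.
correct = all(guesses[keyed_hash(s)] == s for s in states_by_population)
print(correct)  # True
```

In real data the frequencies are noisy and states with similar populations would be hard to tell apart, so the inference is probabilistic rather than certain. The most common values are the most exposed.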
4.2. Does metadata pose a privacy risk if the content is encrypted?
Suppose you know that someone called a drug abuse hotline one night, and called several drug rehabilitation facilities the next day. You know what phone numbers they called and how long each call lasted. But the content of the calls was encrypted. What do you suppose the phone calls were about?