Personal information in digital photos

Is it possible to identify the people in the photo above? Maybe. Digital images potentially contain a large amount of metadata that could reveal the photographer’s identity and location. There may also be a surprising number of clues in the photo itself.

EXIF metadata

The standard format for image metadata is EXIF, Exchangeable Image File Format. Some of this information is obviously identifiable, such as fields called CameraOwnerName, Photographer, and ImageEditor. A camera may or may not include such information, and someone may remove this information from photos after they are taken, but it is possible for this information to be inside the photo.

Similarly, the photo may include information regarding where the photo was taken, such as in the GPSLatitude, GPSLongitude, and GPSAltitude fields. There are also fields for recording when the photo was taken or edited.
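EXIF stores each GPS coordinate as three rational numbers (degrees, minutes, seconds), with a separate reference field (GPSLatitudeRef, GPSLongitudeRef) giving the hemisphere. Here is a minimal sketch of turning such values into the signed decimal degrees that mapping tools expect. The function name and the sample coordinates are hypothetical, and reading the fields out of an image file is left to whatever EXIF library you prefer.

```python
from fractions import Fraction

def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus a hemisphere letter
    into signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(x) for x in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    if ref in ("S", "W"):
        value = -value
    return float(value)

# Hypothetical values, shaped like GPSLatitude/GPSLatitudeRef and
# GPSLongitude/GPSLongitudeRef. They are not taken from a real photo.
lat = dms_to_decimal((33, 9, 36), "N")
lon = dms_to_decimal((96, 49, 12), "W")
print(lat, lon)
```

A second of arc of latitude is only about 30 meters, so coordinates stored this way can be precise enough to point at a single building.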

A recurring theme in data privacy is that information that is not obviously identifiable may still be used to identify someone. If this data doesn’t do the whole job, it narrows down possibilities to the point that other known information may complete the identification.

For example, the highly technical fields contained in an image could identify the camera equipment. The camera serial number directly identifies the camera, but other fields may indirectly identify the camera.

Similarly, an image without GPS data may still contain indirect location information. For example, there are fields for recording temperature, humidity, and atmospheric pressure. These fields used in combination with timestamps could identify a location, or at least narrow down the set of possible locations.

There are many EXIF fields that are allowed to be arbitrarily long ASCII or Unicode (UTF-8) sequences. A program for editing EXIF data would allow someone to copy the contents of Moby Dick into one of these fields.

The next post describes a similar situation for medical images.

Clues in the photo itself

Stripping EXIF data from an image before making it public is a good idea both for privacy and for size. If a free text field does contain Moby Dick, you could make your image 1.2 MB smaller by removing it.

However, it’s often possible to detect from the photo itself where the photo was taken. I stumbled on a YouTube channel of someone who identifies photos as a hobby. No doubt there are many such people. The host invites people to send in photos and he uses openly available information to track down where they are.

If you strip the precise time and location information from the metadata, someone may be able to infer approximate replacements from clues in the photo itself such as shadows or seasonal vegetation.

Ordinary people have no idea how much location information can be inferred from a photo. Neither do some people who ought to know better. There was a story a few months ago about a photo at a secret military location whose position was inferred from, among other clues, stars that faintly appeared in the sky near dusk.

Update: As noted in the comments, Facebook has a patent on a way to identify people from the pattern of dust on their camera lenses.

Photo by Evgeniy Prokofiev on Unsplash

What can you learn from a phone number?

What can someone learn about you from your phone number?

The answer depends on what other information someone has. Identifiers always depend on context. To a naked man in a tree [1] the phone number doesn’t carry any information. But to someone with a list of names and phone numbers, some sort of reverse phone number look up, it might tell them your name.

A while back I wrote about area codes and how they are distributed among states. NANPA publicly posts data that goes into greater detail with central office codes. Using this data, you can look up the first six digits of a phone number and find more specifically where the central office associated with the number is located geographically.

For example, take the phone number 469 863 7090. This is a business phone number, and so you could type it into a search engine and find out exactly whose number it is. But if that weren’t possible, you could look up 469-863 in the NANPA database to find that the number is located in Frisco, Texas. In fact, the number belongs to Sky Rocket Burger. Recommended.
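The lookup described above amounts to splitting off the first six digits and using them as a key. Here is a rough sketch, assuming the NANPA data has already been loaded into a dictionary. The dictionary and the function names are hypothetical, and the single entry is just the example from this post.

```python
import re

# Hypothetical stand-in for the NANPA central office code data.
# The only entry is the example from the text.
CENTRAL_OFFICES = {("469", "863"): "Frisco, Texas"}

def prefix(phone):
    """Return (area code, central office code) for a 10-digit number."""
    digits = re.sub(r"\D", "", phone)        # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # drop the country code
    if len(digits) != 10:
        raise ValueError("expected a 10-digit number")
    return digits[:3], digits[3:6]

def locate(phone):
    """Look up the central office location, or None if the prefix is unknown."""
    return CENTRAL_OFFICES.get(prefix(phone))

print(locate("469 863 7090"))
```

The last four digits never enter the lookup, which is the point: the first six alone carry the geographic information.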

Now people can move around and keep their mobile phone numbers, so any kind of phone look up may tell you about where someone used to be rather than where they are. That could be even more useful.

[1] A lawyer once told me that his law school professor said that the only thing the interstate commerce clause of the US Constitution doesn’t apply to is a naked man in a tree.

What can you learn from a credit card number?

The first 4 to 6 digits of a credit card number are the bank identification number or BIN. The information needed to decode a BIN is publicly available, with some effort, and so anyone could tell from a credit card number what institution issued it, what bank it draws on, whether it’s a personal or business card, etc.

Suppose your credit card number was exposed in a data breach. Someone makes a suspicious purchase with your card, the issuer contacts you, you cancel the card, and you get a new card from the same source. The number can no longer be used to make purchases on your account, but what information did it leave behind?

The cancelled number might tell someone where you used to bank, which is probably where you still bank. And it may tell them the first few digits of your new card since the new card is issued by the same institution [1]. If the old BIN doesn’t directly reveal your new BIN, it at least narrows down the possibilities.

The information in your BIN, by itself, will not identify you, but it does provide clues that might lead to identifying you when combined with other information.

[1] According to Andrew in the comments, American Express often changes credit card numbers as little as possible when issuing a replacement, changing only one content digit and the checksum.
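The checksum on a payment card number is the Luhn check digit, the final digit of the number. The sketch below computes it and shows why changing one content digit forces the check digit to change as well. The function names are my own, and the number used is a commonly used test value, not a real account.

```python
def luhn_check_digit(payload):
    """Compute the Luhn check digit for a string of digits
    (the card number without its final digit)."""
    total = 0
    # Walk right to left, doubling every other digit,
    # starting with the rightmost digit of the payload.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9   # same as adding the two digits of d
        total += d
    return (10 - total % 10) % 10

def is_valid(number):
    """True if the last digit matches the Luhn check digit of the rest."""
    return luhn_check_digit(number[:-1]) == int(number[-1])

old = "4111111111111111"            # widely used test number
payload = old[:-2] + "2"            # change only the last content digit
new = payload + str(luhn_check_digit(payload))
print(old, is_valid(old))
print(new, is_valid(new))
```

If a replacement card really does differ in only one content digit plus the checksum, someone holding the old number has very few candidates to try for the new one.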

Computed IDs and privacy implications

Thirty years ago, a lot of US states thought it would be a good idea to compute someone’s driver’s license number (DLN) from their personal information [1]. In 1991, fifteen states simply used your Social Security Number as your DLN. Eleven other states computed DLNs by applying a hash function to personal information such as name, birth date, and sex. A few other states based DLNs in part but not entirely on personal information.

Presumably things have changed a lot since then. If you know of any states that still do this, please let me know in the comments. Even if states have stopped computing DLNs from personal data, I’m sure many organizations still compute IDs this way.

The article I stumbled on from 1991 gave no hint that encoding personal information into an ID number could be a problem. And at the time it wasn’t as much of a problem as it would be now.

Why is it a problem if IDs are computed from personal data? People don’t realize what information they’re giving away. Maybe they would be willing to give someone their personal information, but not their DLN, or vice versa, not realizing that the two are equivalent. They also don’t realize what information about them someone may already have; a little bit more info may be all an attacker needs. And they don’t realize the potential consequences of their loss of privacy.

In some cases the hashing functions were complicated, but not too complicated to carry out by hand. And even if states were applying a cryptographic hash function, which they certainly were not, this would still be a problem because the set of possible inputs is small enough to enumerate. If you have a database of personal information, say from voter registration records, you could compute the hash value of everyone in the state, or at least a large enough portion that you stand a good chance of being able to reverse a hashed value.
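To make the enumeration argument concrete, here is a small sketch. The ID scheme is hypothetical (a truncated SHA-256 of name, birth date, and sex) and is not any state's actual method, and the records are made up. The point is only that hashing everyone once turns "reversing" an ID into a table lookup.

```python
import hashlib

def computed_id(name, birth_date, sex):
    """Hypothetical ID scheme: truncated SHA-256 of the personal fields."""
    record = f"{name.upper()}|{birth_date}|{sex}".encode("utf-8")
    return hashlib.sha256(record).hexdigest()[:12]

# Made-up stand-in for a public roll such as voter registration records.
public_records = [
    ("Alice Example", "1970-01-31", "F"),
    ("Bob Sample", "1985-07-04", "M"),
    ("Carol Placeholder", "1992-11-15", "F"),
]

# Hash every known person once. After that, recovering the person
# behind an ID requires no cryptanalysis, only a dictionary lookup.
reverse_table = {computed_id(*person): person for person in public_records}

leaked_id = computed_id("Bob Sample", "1985-07-04", "M")
print(reverse_table.get(leaked_id))
```

The strength of the hash function never comes into play. The weakness is that the inputs are few and already known.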

[1] Joseph A. Gallian. Assigning Driver’s License Numbers. Mathematics Magazine, Vol. 64, No. 1 (Feb., 1991), pp. 13-22.

No funding for uncomfortable results

In 1997 Latanya Sweeney dramatically demonstrated that supposedly anonymized data was not anonymous. The state of Massachusetts had released data on 135,000 state employees and their families with obvious identifiers removed. However, the data contained zip code, birth date, and sex for each individual. Sweeney was able to cross reference this data with publicly available voter registration data to find the medical records of then Massachusetts governor William Weld.

An estimated 87% of Americans can be identified by the combination of zip code, birth date, and sex. A back-of-the-envelope calculation shows that this should not be surprising, but Sweeney appears to be the first to do this calculation and pursue the results. (Update: See such a calculation in the next post.)
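Here is one way such a back-of-the-envelope calculation might go. Every number in it is a rough assumption of mine rather than a figure from Sweeney's work: a US population of about 330 million, about 42,000 zip codes, birth dates spread evenly over 80 years, and people spread evenly over zip codes. None of those uniformity assumptions is true, so treat this as a plausibility check and nothing more.

```python
# All figures are rough assumptions for illustration only.
us_population = 330_000_000
zip_codes = 42_000
lifespan_years = 80

# Average number of people sharing a zip code.
people_per_zip = us_population / zip_codes

# Number of distinct (birth date, sex) combinations within a zip code.
slots_per_zip = 365 * lifespan_years * 2

# Under the uniform assumptions, the chance that none of a person's
# neighbors lands in the same (birth date, sex) slot.
p_unique = (1 - 1 / slots_per_zip) ** (people_per_zip - 1)

print(round(people_per_zip), slots_per_zip, round(p_unique, 2))
```

With these inputs there are several times more slots than people in a typical zip code, so most people have a slot to themselves, and the crude estimate lands in the same neighborhood as the 87% figure.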

In her paper “Only You, Your Doctor, and Many Others May Know,” Sweeney says that her research was unwelcome. Over 20 journals turned down her paper on the Weld study, and nobody wanted to fund privacy research that might reach uncomfortable conclusions.

“A decade ago, funding sources refused to fund re-identification experiments unless there was a promise that results would likely show that no risk existed or that all problems could be solved by some promising new theoretical technology under development. Financial resources were unavailable to support rigorous scientific studies otherwise.”

There’s a perennial debate over whether it is best to make security and privacy flaws public or to suppress them. The consensus, as much as there is a consensus, is that one should reveal flaws discreetly at first and then err on the side of openness. For example, a security researcher finding a vulnerability in Windows would notify Microsoft first and give the company a chance to fix the problem before announcing the vulnerability publicly. In Sweeney’s case, however, there was no single responsible party who could quietly fix the world’s privacy vulnerabilities. Calling attention to the problem was the only way to make things better.

Photo of Latanya Sweeney via Parker Higgins [CC BY 4.0], from Wikimedia Commons