Imagine this conversation.
“Could you tell me your social security number?”
“Absolutely not! That’s private.”
“OK, how about just the last four digits?”
“Oh, OK. That’s fine.”
When I was in college, professors would post grades by the last four digits of student social security numbers. Now that seems incredibly naive, but no one objected at the time. Using these four digits rather than names would keep your grades private from the most lazy observer but not from anyone willing to put out a little effort.
There’s a widespread belief in the US that your social security number is a deep secret, and that telling someone your social security number gives them power over you akin to a fairy telling someone his true name. On the other hand, we also believe that telling someone just the last four digits of your SSN is harmless. Both are wrong. It’s not that hard to find someone’s full SSN, and revealing the last four digits gives someone a lot of information to use in identifying you.
In an earlier post I looked at how easily most people could be identified by the combination of birth date, sex, and zip code. We’ll use the analytical results from that post to look at how easily someone could be identified by their birthday, state, and the last four digits of their SSN . Note that the previous post used birth date, i.e. including year, where here we only look at birth day, i.e. month and day but no year. Note also that there’s nothing special about social security numbers for our purposes. The last four digits of your phone number would provide just as much information.
If you know someone lives in Wyoming, and you know their birthday and the last four digits of their SSN, you can uniquely identify them 85% of the time, and in an addition 7% of cases you can narrow down the possibilities to just two people. In Texas, by contrast, the chances of a birthday and four-digit ID being unique are 0.03%. The chances of narrowing the possibilities to two people are larger but still only 0.1%.
Here are results for a few states. Note that even though Texas has between two and three times the population of Ohio, it’s over 100x harder to uniquely identify someone with the information discussed here.
|-----------+------------+--------+--------| | State | Population | Unique | Pairs | |-----------+------------+--------+--------| | Texas | 30,000,000 | 0.03% | 0.11% | | Ohio | 12,000,000 | 3.73% | 6.14% | | Tennessee | 6,700,000 | 15.95% | 14.64% | | Wyoming | 600,000 | 84.84% | 6.97% | |-----------+------------+--------+--------|
 In that post we made the dubious simplifying assumption that birth dates were uniformly distributed from 0 to 78 years. This assumption is not accurate, but it was good enough to prove the point that it’s easier to identify people than you might think. Here our assumptions are better founded. Birthdays are nearly uniformly distributed, though there are some slight irregularities. The last four digits of social security numbers are uniformly distributed, though the first digits are correlated with the state.