May I have the last four digits of your social?


Imagine this conversation.

“Could you tell me your social security number?”

“Absolutely not! That’s private.”

“OK, how about just the last four digits?”

“Oh, OK. That’s fine.”

When I was in college, professors would post grades by the last four digits of student social security numbers [1]. Now that seems incredibly naive, but no one objected at the time. Using these four digits rather than names would keep your grades private from the most lazy observer but not from anyone willing to put out a little effort.

There’s a widespread belief in the US that your social security number is a deep secret, and that telling someone your social security number gives them power over you akin to a fairy telling someone his true name. On the other hand, we also believe that telling someone just the last four digits of your SSN is harmless. Both are wrong. It’s not that hard to find someone’s full SSN, and revealing the last four digits gives someone a lot of information to use in identifying you.

In an earlier post I looked at how easily most people could be identified by the combination of birth date, sex, and zip code. We’ll use the analytical results from that post to look at how easily someone could be identified by their birthday, state, and the last four digits of their SSN [2]. Note that the previous post used birth date, i.e. including year, where here we only look at birth day, i.e. month and day but no year. Note also that there’s nothing special about social security numbers for our purposes. The last four digits of your phone number would provide just as much information.

If you know someone lives in Wyoming, and you know their birthday and the last four digits of their SSN, you can uniquely identify them 85% of the time, and in an additional 7% of cases you can narrow down the possibilities to just two people. In Texas, by contrast, the chances of a birthday and four-digit ID being unique are 0.03%. The chances of narrowing the possibilities to two people are larger but still only 0.1%.

Here are results for a few states. Note that even though Texas has between two and three times the population of Ohio, it’s over 100x harder to uniquely identify someone with the information discussed here.

|-----------+------------+--------+--------|
| State     | Population | Unique |  Pairs |
|-----------+------------+--------+--------|
| Texas     | 30,000,000 |  0.03% |  0.11% |
| Ohio      | 12,000,000 |  3.73% |  6.14% |
| Tennessee |  6,700,000 | 15.95% | 14.64% |
| Wyoming   |    600,000 | 84.84% |  6.97% |
|-----------+------------+--------+--------|
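
The numbers in the table can be reproduced with a few lines of Python. This is a minimal sketch, assuming the expressions exp(-r) and r exp(-r)/2 from the earlier post, where r is the population divided by the number of categories, here 365 birthdays times 10,000 four-digit endings. The function name and the rounded populations are just for illustration.

```python
from math import exp

# Number of (birthday, last four digits) categories: 365 days x 10,000 endings.
# Assumes both are uniformly distributed and ignores leap days.
CATEGORIES = 365 * 10_000

# Rounded state populations, as in the table above.
populations = {
    "Texas": 30_000_000,
    "Ohio": 12_000_000,
    "Tennessee": 6_700_000,
    "Wyoming": 600_000,
}

def unique_and_pairs(population, categories=CATEGORIES):
    """Return the Unique and Pairs columns as fractions.

    r is the average number of people per category. The two expressions
    are the ones from the earlier post: exp(-r) and r exp(-r)/2.
    """
    r = population / categories
    unique = exp(-r)
    pairs = r * exp(-r) / 2
    return unique, pairs

for state, pop in populations.items():
    unique, pairs = unique_and_pairs(pop)
    print(f"{state:<10} {pop:>11,} {unique:7.2%} {pairs:7.2%}")
```

For Wyoming, r = 600,000 / 3,650,000 ≈ 0.164, which is why the chance of being unique is so high. For Texas, r ≈ 8.2, so nearly every category is shared by several people.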


[1] Not only did they post the last four digits of SSNs, the lists often followed the order of an alphabetized class roster. You could guess about where someone’s grade would be on the list based on their last name alone.

[2] In that post we made the dubious simplifying assumption that birth dates were uniformly distributed from 0 to 78 years. This assumption is not accurate, but it was good enough to prove the point that it’s easier to identify people than you might think. Here our assumptions are better founded. Birthdays are nearly uniformly distributed, though there are some slight irregularities. The last four digits of social security numbers are uniformly distributed, though the first digits are correlated with the state.

4 thoughts on “May I have the last four digits of your social?”

  1. I must be older than you. Our professors would post our *entire* SSN along with our grades.

    I remember one time going down the list for a class I was doing really poorly in. Even though all the digits were there, I was scanning only the last 4 digits of SSN, looking for my own. Found my digits, scanned to the right for my grade — “F”!

    I knew I was doing poorly — but not that poorly! I repeated the process two more times, thinking maybe I shifted rows or something. Nope. “F” every time.

    Finally, I looked at the *whole* SSN — and it was not mine. A classmate and I had the same last four digits.

    I was never so happy to get a C in my life!

  2. Is part of this post missing? You say “We’ll use the analytical results from that post to look at how easily someone could be identified by their birthday, state, and the last four digits of their SSN [1]”, add a note about something being slightly off with yesterday’s post, and then just straight to results, with no discussion of how you generated them.

  3. Sol, Just use the expressions exp(-r) and r exp(-r)/2 from the earlier post where r = population size / demographic categories. For example, with Wyoming, r = 600,000 / (365*10,000).

  4. Had a class that did something similar, but still posted the results in alphabetical order. In the small class of people who knew each other, it was trivial to figure out who got what grade.
