The dormitory I lived in as an undergraduate had its own five-digit zip code at one time. It was rumored to be the largest dorm in the US, or maybe the largest west of the Mississippi, or something like that. There were about 3,000 of us living there. Although the dorm had enough people to justify its own zip code—some zip codes have far fewer people—zip code boundaries were redraw so that the dorm shares its zip code with other areas.
Some zip code are so sparsely populated that people living in these areas are relatively easy to identify if you have other data. The so-called Safe Harbor provision of HIPAA (Health Insurance Portability and Accountability Act) says that it’s usually OK to include the first three digits of someone’s zip code in de-identified data. But there are 17 areas so thinly populated that even listing the first three digits of their zip code is considered too much of an identification risk. These are areas such that the first three digits of the zip code are:
This list could change over time. These are the regions that currently contain fewer than 20,000 people, the criterion given in the HIPAA regulations.
Knowing that someone is part of an area containing 20,000 people hardly identifies them. The concern is that in combination with other information, zip code data is more informative in these areas.
Related post: Bayesian clinical trials in one zip code