Sparsely Populated Zip Codes | HIPAA Safe Harbor


The dormitory I lived in as an undergraduate had its own five-digit zip code. It was rumored to be the largest dorm in the US, or maybe the largest west of the Mississippi, or something like that. There were about 3,000 of us living there. Although the dorm had enough people to justify its own zip code (some zip codes have far fewer people), zip code boundaries were later redrawn so that the dorm now shares its zip code with other areas.

Some zip codes are so sparsely populated that people living in these areas are relatively easy to identify if you have other data. The so-called Safe Harbor provision of HIPAA (Health Insurance Portability and Accountability Act) says that it’s usually OK to include the first three digits of someone’s zip code in de-identified data. But there are some areas so thinly populated that even listing the first three digits of their zip code is considered too much of an identification risk.

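The Safe Harbor cutoff is 20,000 people: a three-digit zip code area with that many residents or fewer counts as sparsely populated. Knowing that someone is part of an area containing 20,000 people hardly identifies them by itself. The concern is that, in combination with other information, zip code data is more informative in these areas.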
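The three-digit rule described above can be sketched in a few lines of Python. This is a minimal illustration, not a compliance tool. The function name `generalize_zip` and the constant `SPARSE_PREFIXES` are made up for this example, and the set is deliberately incomplete: it holds only the two prefixes this post names as additions to the 2010 list. Replacing a sparse prefix with `000` follows the wording of the Safe Harbor rule rather than anything stated in this post.

```python
# Assumption: an INCOMPLETE, illustrative set of sparse three-digit prefixes.
# Only the two prefixes the post names as added for 2010 appear here;
# a real pipeline would load the full official list instead.
SPARSE_PREFIXES = frozenset({"205", "369"})


def generalize_zip(zip_code: str, sparse_prefixes=SPARSE_PREFIXES) -> str:
    """Return the three-digit prefix of a ZIP code, or "000" if that prefix is sparse."""
    # Keep only the five-digit part so ZIP+4 input such as "36901-1234" works.
    digits = zip_code.strip()[:5]
    if len(digits) < 5 or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a five-digit ZIP code: {zip_code!r}")
    prefix = digits[:3]
    # Sparse areas are masked entirely rather than reported by prefix.
    return "000" if prefix in sparse_prefixes else prefix
```

For example, `generalize_zip("20252")` keeps the prefix `202`, while any zip code starting with `205` or `369` is reduced to `000` under the placeholder set above.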

2000 census

According to data from the 2000 census, the sparsely populated 3-digit zip code areas were

2010 census

The list of sparsely populated zip codes is shorter now according to the 2010 census.

Zip codes 063, 790, 830, 831, and 890 dropped off the list, and zip codes 205 and 369 were added.
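If you maintain a suppression list in code, the change from 2000 to 2010 amounts to a set difference followed by a set union. The sketch below assumes you already have the 2000 list from an official source; the names `update_sparse_list`, `DROPPED_IN_2010`, and `ADDED_IN_2010` are invented for this example, and only the dropped and added prefixes come from this post.

```python
# Prefixes named in this post as leaving or joining the list in 2010.
DROPPED_IN_2010 = frozenset({"063", "790", "830", "831", "890"})
ADDED_IN_2010 = frozenset({"205", "369"})

# Two prefixes added, five dropped, so the list shrinks by three.
NET_CHANGE = len(ADDED_IN_2010) - len(DROPPED_IN_2010)


def update_sparse_list(previous):
    """Apply the 2010 changes to a caller-supplied 2000 list of prefixes."""
    previous = set(previous)
    # Guard against starting from the wrong list: every dropped prefix
    # should have been present beforehand.
    missing = DROPPED_IN_2010 - previous
    if missing:
        raise ValueError(f"prefixes not in the previous list: {sorted(missing)}")
    return sorted((previous - DROPPED_IN_2010) | ADDED_IN_2010)
```

The function takes the earlier list as an argument rather than hard-coding it, so the result is only as good as the list you pass in.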

Speculation for 2020

Based on the 2017 American Community Survey (ACS), it appears that the list of sparsely populated zip codes has not changed since 2010. However, one should go by official census data rather than ACS data, and things could change between 2017 and 2020.


3 thoughts on “Sparsely populated zip codes”

Dimitriy Masterov

1 July 2016 at 12:42

You can even have zip codes with a single fictional inhabitant: Smokey Bear has his own ZIP, 20252.

Do you have strong feelings about using ZIP codes (or some approximation like ZCTAs) for analysis over things like block groups? I’ve thought that looking at ZIP codes is not always a very illuminating exercise, but it is one I see people use all the time. ZIP codes are not geographic areas, but are simply arrays of street addresses or carrier routes, modified at will by USPS for the purpose of routing mail as efficiently as possible. That means that this way of clustering people is an arbitrary method of aggregation that may obscure valuable signal, since these clusters are based on factors the PO finds convenient for delivery, and not on any kind of underlying similarity. ZIP codes also change fairly frequently, so year-on-year comparisons can be compromised because they are not apples-to-apples comparisons. It might also be the case that bundling people who respond heterogeneously decreases the statistical power of our tests. Rural maps can have such a sparse network of roads with such strange zip code assignments that some rural areas cannot even be approximated with zip code regions. Finally, ZIPs don’t have characteristics such as population, so it is almost always an approximation to say that sales per capita in some ZIP code are low, since the denominator is mis-measured. The trouble is that while census blocks use streets as edge boundaries, postal delivery routes generally service both sides of a single street. Therefore, census blocks near the edge are commonly split between ZIP codes.

Randy

10 February 2020 at 13:25

Hi John,

Do you have a reference for the 2010 list of zip codes? I’m trying to find a resource through a ‘.gov’ website for those.

Thanks in advance.

John

10 February 2020 at 13:28

I don’t recall where I found it, but I remember it was hard to track down.
