Safe Harbor and the calendar rollover problem

elderly woman

Data privacy is subtle and difficult to regulate. The lawmakers who wrote the HIPAA privacy regulations took a stab at what would protect privacy when they crafted the “Safe Harbor” list. The list is neither necessary or sufficient, depending on context, but it’s a start.

Extreme values of any measurement are more likely to lead to re-identification. Age in particular may be newsworthy. For example, a newspaper might run a story about a woman in the community turning 100. For this reason, the Safe Harbor previsions require that ages over 90 be lumped together. Specifically,

All elements of dates (except year) for dates that are directly related to an individual, including birth date, admission date, discharge date, death date, and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.

One problem with this rule is that “age 90” is a moving target. Suppose that last year, in 2018, a data set recorded that a woman was born in 1930 and had a child in 1960. This data set was considered de-identified under the Safe Harbor provisions and published in a medical journal. On New Year’s Day 2019, does that data suddenly become sensitive? Or on New Year’s Day 2020? Should the journal retract the paper?!

No additional information is conveyed by the passage of time per se. However, if we knew in 2018 that the woman in question was still alive, and we also know that she’s alive now in 2019, we have more information. Knowing that someone born in 1930 is alive in 2019 is more informative than knowing that the same person was alive in 2018; there are fewer people in the former category than in the latter category.

The hypothetical journal article, committed to print in 2018, does not become more informative in 2019. But an online version of the article, revised with new information in 2019 implying that the woman in question is still alive, does become more informative.

No law can cover every possible use of data, and it would be a bad idea to try. Such a law would be both overly restrictive in some cases and not restrictive enough in others. HIPAA‘s expert determination provision allows a statistician to say, for example, that the above scenario is OK, even though it doesn’t satisfy the letter of the Safe Harbor rule.

More privacy posts