This week President Biden signed a long, technically detailed executive order (Executive Order 14110) that among other things requires the Secretary of Commerce to look into differential privacy.
Within 365 days of the date of this order … the Secretary of Commerce … shall create guidelines for agencies to evaluate the efficacy of differential-privacy-guarantee protections, including for AI. The guidelines shall, at a minimum, describe the significant factors that bear on differential-privacy safeguards and common risks to realizing differential privacy in practice.
I doubt many people have read this order. Print preview on my laptop said it would take 64 pages to print. Those brave enough to read it will encounter technical terms, like differential privacy, that they likely do not understand.
So just what is differential privacy? A technical definition involves bounds on ratios of cumulative probability distribution functions, not the kind of thing you usually see in newspapers, or in executive orders.
What is differential privacy?
The basic idea behind differential privacy is to protect the privacy of individuals represented in a database by limiting the degree to which each person’s presence in the database can impact queries of the database.
A calibrated amount of randomness is added to the result of each query. The amount of randomness is proportional to the sensitivity of the query. For innocuous queries the amount of added randomness may be very small, maybe even less than the amount of uncertainty inherent in the data. But if you ask a query that risks revealing information about an individual, the amount of added randomness increases, possibly increasing so much that the result is meaningless.
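To make the idea above concrete, here is a minimal sketch of the classic Laplace mechanism, the most common way of calibrating noise to a query's sensitivity. The function names (`laplace_noise`, `private_count`) are my own illustrative choices, not part of any standard; the math, though, is the standard recipe: noise drawn from a Laplace distribution with scale equal to sensitivity divided by ε.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution
    using inverse-CDF sampling on a uniform variate."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(true_count: int, epsilon: float) -> float:
    """Release a count with Laplace noise calibrated to epsilon.
    A counting query has sensitivity 1: adding or removing any
    one person changes the true count by at most 1."""
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon)
```

A counting query is the innocuous end of the spectrum: sensitivity 1, so the noise scale is just 1/ε. A query whose answer could swing wildly with one person's data (say, a maximum salary) has high sensitivity, and the same formula forces correspondingly large noise.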
Each question (database query) potentially reveals some information about a person, and so a privacy budget keeps track of the queries a person has posed. Once you’ve used up your privacy budget, you’re not allowed to ask any more questions. Otherwise you could ask the same question (or closely related questions) over and over, then average your results to essentially remove the randomness that was added.
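A privacy budget can be sketched as a simple accountant that relies on the basic sequential composition theorem: the total privacy loss of a series of queries is at most the sum of their individual ε values. The class below is an illustrative toy, not a production implementation (real systems use tighter composition bounds).

```python
class PrivacyBudget:
    """Track cumulative privacy loss under basic sequential
    composition: total loss is the sum of per-query epsilons."""

    def __init__(self, total_epsilon: float):
        self.total = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon: float) -> None:
        """Deduct a query's epsilon, refusing the query if it
        would push total spending past the budget."""
        if self.spent + epsilon > self.total:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon
```

Once `charge` starts raising, no further answers are released. That is what stops the averaging attack: repeating a query n times costs n times the ε, so you run out of budget long before the added noise averages away.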
Differential privacy is great in theory, and possibly in practice too. But the practicality depends a great deal on context. For example, exactly how much noise is added to query results? That depends on the level of privacy you want to achieve, usually denoted by a parameter ε. Smaller values of ε provide more privacy, and larger values provide less.
How big should ε be? There is no generic answer. The size of ε must depend on context. Set ε too small and the utility of the data vanishes. Set ε too large and there's effectively no privacy protection.
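The tradeoff is easy to see numerically. For a sensitivity-1 query under the Laplace mechanism, the expected absolute noise equals 1/ε, so this short loop (an illustration, using that standard formula) shows how quickly utility and privacy trade against each other:

```python
# Expected absolute Laplace noise for a sensitivity-1 query is 1/epsilon.
for eps in (0.01, 0.1, 1.0, 10.0):
    print(f"epsilon = {eps:>5}: typical noise ~ {1 / eps:g}")
```

At ε = 0.01 a simple count is off by about 100 on average, which is useless for small subpopulations; at ε = 10 the noise is about 0.1, which barely obscures anyone's presence.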
US Census Bureau
Biden’s executive order isn’t the US government’s first foray into differential privacy. The US Census Bureau used differential privacy on results released from the 2020 census. This means that the reported results are deliberately not accurate, though hopefully the results are accurate enough, with only the minimum amount of inaccuracy injected as necessary to preserve privacy. Opinions are divided on whether that was the case. Some researchers have complained that the results were too noisy for the data they care about. The Census Bureau could reply “Sorry, but we gave you the best results we could while adhering to our privacy framework.”
Implementing differential privacy at the scale of the US Census took an enormous amount of work. The census serves as a case study that would allow other government agencies to have an idea of what they’re getting into.
Pros and cons of differential privacy
Differential privacy rests on a solid mathematical foundation. While this means that it provides strong privacy guarantees (if implemented correctly), it also means that it takes some effort to understand. Differential privacy opens up new possibilities but requires new ways of working.
If you’d like help understanding how your company could take advantage of differential privacy, or minimize the disruption of being required to implement differential privacy, let’s talk.