Covered entities: TMPRA extends HIPAA

The US HIPAA law only protects the privacy of health data held by “covered entities,” which essentially means health care providers and insurance companies. If you give your heart monitoring data or DNA to your doctor, it comes under HIPAA. If you give it to Fitbit or 23andMe, it does not. Government entities are not covered by HIPAA either, a fact that Latanya Sweeney exploited to demonstrate how service dates can be used to identify individuals.

Texas passed the Texas Medical Records Privacy Act (a.k.a. HB 300 or TMPRA) to close this gap. Texas has a much broader definition of covered entity. In a nutshell, Texas law defines a covered entity to include anyone “assembling, collecting, analyzing, using, evaluating, storing, or transmitting protected health information.” The full definition, available here, says

“Covered entity” means any person who:

(A) for commercial, financial, or professional gain, monetary fees, or dues, or on a cooperative, nonprofit, or pro bono basis, engages, in whole or in part, and with real or constructive knowledge, in the practice of assembling, collecting, analyzing, using, evaluating, storing, or transmitting protected health information. The term includes a business associate, health care payer, governmental unit, information or computer management entity, school, health researcher, health care facility, clinic, health care provider, or person who maintains an Internet site;

(B) comes into possession of protected health information;

(C) obtains or stores protected health information under this chapter; or

(D) is an employee, agent, or contractor of a person described by Paragraph (A), (B), or (C) insofar as the employee, agent, or contractor creates, receives, obtains, maintains, uses, or transmits protected health information.


Inferring religion from fitness data


Fitness monitors reveal more information than most people realize. For example, it may be possible to infer someone’s religious beliefs from their heart rate data.

If you have location data, it’s trivial to tell whether someone is attending religious services. But you could make a reasonable guess from cardio monitoring data alone.

Muslim prayers occur at five prescribed times a day. If you could detect that someone is kneeling every day at precisely those prescribed times, it’s likely they are Muslim. Maybe they just happen to be stretching while Muslims are praying, but that’s less likely.

It should be possible to detect when a person is singing by looking at fitness data. If you find that someone is singing every Sunday morning, it’s likely they are attending a church service. And if someone is consistently singing on Saturday evenings, they may be attending a large church, likely Catholic, which added a Saturday night service. Maybe they just have Saturday evening voice lessons, but attending a church service is more likely.

Maybe you could infer that someone is an observant Jew because they are unusually inactive on Saturdays. Of course a lot of people take it easy on Saturdays. But if someone runs, for example, six days a week but not on Saturdays, something you could certainly tell from fitness data, that’s evidence that they may be Jewish. Not proof, but evidence.
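
Here is a minimal Python sketch of the kind of weekly-pattern inference described above. The step counts are fabricated and the threshold is arbitrary; it simply flags a day of the week whose activity is consistently far below the rest.

# Hypothetical example: flag a weekday with consistently low activity.
# 'daily_steps' is made-up data: (weekday, step count) pairs over two weeks.
from collections import defaultdict
from statistics import mean

daily_steps = [
    ("Mon", 9500), ("Tue", 10200), ("Wed", 9800), ("Thu", 10100),
    ("Fri", 9900), ("Sat", 1200), ("Sun", 9700),
    ("Mon", 9700), ("Tue", 9900), ("Wed", 10300), ("Thu", 9600),
    ("Fri", 10000), ("Sat", 1500), ("Sun", 9800),
]

by_day = defaultdict(list)
for day, steps in daily_steps:
    by_day[day].append(steps)

overall = mean(steps for _, steps in daily_steps)
for day, values in by_day.items():
    if mean(values) < 0.3 * overall:   # arbitrary threshold
        print(f"{day}: consistently low activity (weak evidence, not proof)")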

All these inferences are fallible, of course. But that’s the nature of most privacy leaks. They don’t usually offer irrefutable evidence, but they update probabilities. One of the contributions of differential privacy is to acknowledge that all personal data leaks at least a little bit of information, and that it’s better to quantify and control the amount of information leaked than to pretend it doesn’t exist.

By the way, if you try to keep your Fitbit data from revealing your religion, you might reveal it anyway. This is called the Barbra Streisand Effect for reasons explained here. If you take off your Fitbit five times a day, just before the Muslim call to prayer, you’re still giving someone who has access to your data clues to your religious affiliation.


Putting topological data analysis in context

I got a review copy of The Mathematics of Data recently. Five of the six chapters are relatively conventional, a mixture of topics in numerical linear algebra, optimization, and probability. The final chapter, written by Robert Ghrist, is entitled Homological Algebra and Data. Those who grew up with Sesame Street may recall the song “Which one of these is not like the other …”

When I first heard of topological data analysis (TDA), I was excited about the possibility of putting some beautiful mathematics to practical application. But it was hard for me to put TDA in context. How do you get actionable information out of it? If you find a seven-dimensional doughnut hiding in your data, that’s very interesting, but what do you do with that information?

Robert’s chapter in the book I’m reviewing has a nice introductory paragraph that helps put TDA in context. The section heading for the paragraph is “When is Homology Useful?”

Homological methods are, almost by definition, robust, relying on neither precise coordinates nor careful estimates for efficiency. As such, they are most useful in settings where geometric precision fails. With great robustness comes both great flexibility and great weakness. Topological data analysis is more fundamental than revolutionary: such techniques are not intended to supplant analytic, probabilistic, or spectral techniques. They can however reveal a deeper basis for why some data sets and systems behave the way they do. It is unwise to wield topological techniques in isolation, assuming that the weapons of unfamiliar “higher” mathematics are clad with incorruptible silver.

Robert’s background was in engineering and more conventional applied mathematics before he turned to applications of topology, and so he brings a broader perspective to TDA than someone trained in topology looking for ways to make topology useful. He also has a decade more experience applying TDA than when I interviewed him here. I’m looking forward to reading his new chapter carefully.

As I wrote about the other day, apparently the US Army believes that topological data analysis can be useful, presumably in combination with more quantitative methods. [1] More generally, it seems the Army is interested in mathematical models that are complementary to traditional models, models that are robust and flexible. The quote above cautions that with robustness and flexibility comes weakness, though ideally weakness that is offset by other models.


[1] Algebraic topology is quantitative in one sense and qualitative in another. It aims to describe qualitative properties using algebraic invariants. It’s quantitative in the sense of computing homology groups, but it’s not as directly quantitative as more traditional mathematical models. It’s quantitative at a higher level of abstraction.

Assumed technologies

I just had a client ship me a laptop. We never discussed what OS the computer would run. I haven’t opened the box yet, but I imagine it’s running Windows 10.

I’ve had clients assume I run Windows, but also others who assume I run Linux or Mac. I don’t recall anyone asking me whether I used a particular operating system.

When clients say “I’ll send you some code” without specifying the language, it’s usually Python. Maybe they’ve seen Python code I’ve written, but my impression is that they assume everyone knows Python.

Other tools some will assume everyone uses:

  • Bash
  • Git and GitHub
  • Skype
  • MS Office
  • Signal
  • Google apps
  • Adobe Photoshop and Illustrator
  • Markdown
  • Jupyter notebooks

When I was in college, nearly every computer I saw ran Unix or Mac. I had no idea that the vast majority of the world’s computers ran Windows. I was in a bubble. And like most people in a bubble, I didn’t know I was in a bubble. I wish I had seen something like this blog post.

Elementary solutions to differential equations

Differential equations rarely have closed-form solutions. Some do, and these are emphasized in textbooks.

For this post we want to look specifically at homogeneous second order linear equations:

y″ + a(x) y′ + b(x) y = 0.

If the coefficient functions a and b are constant, then the solution can be written down in terms of elementary functions, i.e. functions a first-year calculus student would recognize. This would include, for example, polynomials, sines, and cosines, but would not include the gamma function, Bessel functions, Airy functions, etc.
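
For example, here is a quick check in Python with SymPy, using the arbitrarily chosen constant coefficients a = 3 and b = 2; the solution comes back in terms of exponentials, which are elementary functions.

# Solve y'' + 3y' + 2y = 0 symbolically; constant coefficients give elementary solutions.
from sympy import Function, Eq, dsolve, symbols

x = symbols('x')
y = Function('y')
ode = Eq(y(x).diff(x, 2) + 3*y(x).diff(x) + 2*y(x), 0)
print(dsolve(ode, y(x)))
# Something like Eq(y(x), C1*exp(-2*x) + C2*exp(-x)): exponentials, nothing exotic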

If the coefficients a and b are not constant, the differential equation usually does not have an elementary solution. In fact, you might wonder if it is ever possible in that case for the differential equation to have an elementary solution. Experience would suggest not.

A paper by Kovacic [1] thoroughly answers this question. The author gives algorithms for determining whether elementary solutions exist and how to find them if they do exist. The following example comes from that paper.

Consider the equation

y″ = r y

where [2]

r(x) = (4x^6 - 8x^5 + 12x^4 + 4x^3 + 7x^2 - 20x + 4)/(4x^4).

Then

y(x) = x^(-3/2) (x^2 - 1) exp(x^2/2 - x - 1/x)

is a solution, which the following Mathematica code verifies by evaluating to 0.

r[x_] := (4 x^6 - 8 x^5 + 12 x^4 + 4 x^3 + 7 x^2 - 20 x + 4)/(4 x^4)
f[x_] := x^(-3/2) (x^2 - 1) Exp[x^2/2 - x - 1/x]
Simplify[D[f[x], {x, 2}] - r[x] f[x]]
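
If you would rather check this in Python, here is a rough SymPy equivalent (a sketch; it should also reduce to 0).

# Verify y'' = r y for the solution above, this time with SymPy.
from sympy import symbols, exp, diff, simplify, Rational

x = symbols('x', positive=True)
r = (4*x**6 - 8*x**5 + 12*x**4 + 4*x**3 + 7*x**2 - 20*x + 4) / (4*x**4)
y = x**Rational(-3, 2) * (x**2 - 1) * exp(x**2/2 - x - 1/x)

print(simplify(diff(y, x, 2) - r*y))  # should print 0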


[1] Jerald J. Kovacic. An algorithm for solving second order linear homogeneous differential equations. Journal of Symbolic Computation (1986) 2, 3–43.

[2] There’s an error in the paper, where the denominator of r is given as 4x rather than 4x^4.

Finite rings

It occurred to me recently that I rarely hear about finite rings. I did a Google Ngram search to make sure this isn’t just my experience.

[Google Ngram chart comparing the frequencies of “finite group,” “finite ring,” and “finite field”]

Why are finite groups and finite fields common while finite rings are not?

Finite groups have relatively weak algebraic structure, and demonstrate a lot of variety. Finite fields have very strong algebraic structure. Their complete classification has been known for a long time and is easy to state.

I imagine that most of the references to finite groups above have to do with classifying finite groups, and that most of the references to finite fields have to do with applications of finite fields, which are many.

You can see that references to finite groups hit their peak around the time of the Feit-Thompson theorem in 1962, and drop sharply after the classification of finite simple groups was essentially done in 1994. There’s a timeline of the progress toward the classification theorem on Wikipedia.

Rings have more structure than groups, but less structure than fields. Finite rings in particular are in a kind of delicate position: they easily become fields. Wedderburn’s little theorem says every finite domain is a field.
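
To make the contrast concrete, here is a small Python sketch (the moduli 6 and 7 are just convenient examples): Z/6Z is a finite ring that is not a field, because some nonzero elements have no multiplicative inverse, while Z/7Z is a finite field.

# Z/6Z is a finite ring but not a field; Z/7Z (7 prime) is a finite field.
def units(n):
    """Elements of Z/nZ that have a multiplicative inverse mod n."""
    return [a for a in range(1, n) if any(a * b % n == 1 for b in range(1, n))]

print(units(6))  # [1, 5]: 2, 3, and 4 have no inverse, so Z/6Z is not a field
print(units(7))  # [1, 2, 3, 4, 5, 6]: every nonzero element is invertible

Note also that 2 · 3 ≡ 0 mod 6, so Z/6Z has zero divisors and is not even a domain; that is consistent with Wedderburn’s theorem, since a finite domain would have to be a field.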

The classification of finite rings is much simpler than that of finite groups. And in applications you often want a finite field. Even if a finite ring (not necessarily a field) would do, you’d often use a finite field anyway.

In summary, my speculation as to why you don’t hear much about finite rings is that they’re not as interesting to classify as finite groups, and not as useful in application as finite fields.


Mixing error-correcting codes and cryptography

Secret codes and error-correcting codes have nothing to do with each other. Except when they do!

Error-correcting codes

Error-correcting codes make digital communication possible. Without some way to detect and correct errors, the corruption of a single bit could wreak havoc. A simple example of an error-detecting code is a checksum. A more sophisticated example would be erasure codes, a method used by data centers to protect customer data against hard drive failures or even entire data centers going offline.

People who work in coding theory are quick to point out that they do not work in cryptography. “No, not that kind of code. Error-correcting codes, not secret codes.” The goal isn’t secrecy. The goal is to maximize the probability of correctly transmitting data while minimizing the amount of extra information added.
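
As a toy example of error correction (not a code used in practice), here is a Python sketch of a 3-fold repetition code: each bit is sent three times and decoded by majority vote, so any single flipped bit per triple is corrected.

# Toy error-correcting code: repeat each bit 3 times, decode by majority vote.
def encode(bits):
    return [b for bit in bits for b in (bit, bit, bit)]

def decode(received):
    # Majority vote over each group of three received bits.
    return [int(sum(received[i:i+3]) >= 2) for i in range(0, len(received), 3)]

message = [1, 0, 1, 1]
sent = encode(message)          # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
sent[4] ^= 1                    # corrupt one bit in transit
print(decode(sent) == message)  # True: the single-bit error is corrected

Practical codes (Hamming, Reed–Solomon, LDPC, and the erasure codes mentioned above) are far more efficient, but the trade-off is the same: redundancy is added to buy the ability to detect and correct errors.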

Codes and ciphers

You don’t hear the word “code” used in connection with cryptography much anymore. People used to refer to “codes and ciphers” in one breath. Historically, the technical distinction was that a code operated on words, while a cipher operated on characters. Codes in this sense have long been obsolete, but people still speak of codes colloquially.

David Kahn’s classic book on pre-modern cryptography is entitled The Codebreakers, not the Cipherbreakers, because the public at the time was more familiar with the term code than the term cipher. Maybe that’s still the case because, for example, Jason Fagone entitled his biography of Elizebeth Friedman The Woman Who Smashed Codes. Perhaps the author suggested The Woman Who Smashed Ciphers and an editor objected.

Code-based cryptography

If you’re accustomed to the older use of “codes,” the term “code-based cryptography” is redundant. But it means something specific in modern usage: cryptographic systems that incorporate error-correction codes. So error-correcting codes and secret “codes” do have something to do with each other after all!

Robert McEliece had this idea back in 1978. His encryption method starts with a particular error-correcting code, a binary Goppa code, and scrambles it with an invertible linear transformation. At a very high level, McEliece’s method boils down to a secret factorization, sorta like RSA but even more like oil and vinegar. The public key is the product of the Goppa code and the linear transformation, but only the owner knows the factorization of this key.

To encrypt a message with McEliece’s method, the sender adds a specific amount of random noise, noise that the Goppa code can remove. An attacker faces a challenging computational problem to recover the message without knowing how to factor the public key.

Post-quantum cryptography

McEliece’s method did not attract much interest at the time because it requires much larger public keys than other methods, such as RSA. However, there is renewed interest in McEliece’s approach because his scheme is apparently quantum-resistant whereas RSA and other popular public key systems are not.

If and when large quantum computers become practical, they could factor the product of large primes efficiently, and thus break RSA. They could also solve the discrete logarithm and elliptic curve discrete logarithm problems, breaking Diffie-Hellman and elliptic curve cryptosystems. All public key cryptosystems now in common use would be broken.

Why worry about this now while quantum computers don’t exist? (They exist, but only as prototypes. So far the largest number a quantum computer has been able to factor is 21.) The reason is that it takes a long time to develop, analyze, standardize, and deploy encryption methods. There’s also the matter of forward security: someone could store encrypted messages with the hope of decrypting them in the future. This doesn’t matter for cat photos transmitted over TLS, but it could matter for state secrets; governments may be encrypting documents that they wish to keep secret for decades.

NIST is sponsoring a competition to develop and standardize quantum-resistant encryption methods. Two months ago NIST announced the candidates that advanced to the second round. Seven of these methods use code-based cryptography, including the classic McEliece method and six variations: BIKE, HQC, LEDAcrypt, NTS-KEM, ROLLO, and RQC.


US Army applying new areas of math

Many times on this blog I’ve argued that the difference between pure and applied math is motivation. As my graduate advisor used to say, “Applied mathematics is not a subject classification. It’s an attitude.”

Uncle Sam wants homotopy type theory

Traditionally there was general agreement regarding what is pure math and what is applied. Number theory and topology, for example, are pure, while differential equations and numerical analysis are applied.

But then public key cryptography and topological data analysis brought number theory and topology over into the applied column, at least for some people. And there are people working in differential equations and numerical analysis who aren’t actually interested in applications. It would be more accurate to say that some areas of math are more directly and more commonly applied than others. Also, some areas of math are predominantly filled with people interested in applications and some are not.

The US Army is interested in applying some areas of math that you would normally think of as very pure, including homotopy type theory (HoTT).

From an Army announcement:

Modeling frameworks are desired that are able to eschew the usual computational simplification assumptions and realistically capture … complexities of real world environments and phenomena, while still maintaining some degree of computational tractability. Of specific interest are causal and predictive modeling frameworks, hybrid model frameworks that capture both causal and predictive features, statistical modeling frameworks, and abstract categorical models (cf. Homotopy Type Theory).

And later in the same announcement

Homotopy Type Theory and its applications are such an area that is of significant interest in military applications.

HoTT isn’t the only area of math the Army announcement mentions. There are the usual suspects, such as (stochastic) PDEs, but also more ostensibly pure areas of math such as topology; the word “topological” appears 23 times in the document.

This would be fascinating. It can be interesting when a statistical model works well in application, but it’s no surprise: that’s what statistics was developed for. It’s more interesting when something finds an unexpected application, such as when number theory entered cryptography. The applications the Army has in mind are even more interesting because the math involved is more abstract and, one would have thought, less likely to be applied.


Riffing on mistakes

I mentioned on Twitter yesterday that one way to relieve the boredom of grading math papers is to explore mistakes. If a statement is wrong, what would it take to make it right? Is it approximately correct? Is there some different context where it is correct? Several people said they’d like to see examples, so this blog post is a sort of response.

***

One famous example of this is the so-called Freshman’s Dream theorem:

(a + b)^p = a^p + b^p

This is not true over the real numbers, but it is true, for example, when working with integers mod a prime p.

(More generally, the Freshman’s Dream is true in any ring of characteristic p. This is more than an amusing result; it’s useful in applications of finite fields.)
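
As a quick sanity check, here is a short Python snippet verifying the congruence for the arbitrarily chosen prime p = 7.

# Check (a + b)^p == a^p + b^p mod p for a prime p and all residues a, b.
p = 7
for a in range(p):
    for b in range(p):
        assert pow(a + b, p, p) == (pow(a, p, p) + pow(b, p, p)) % p
print("Freshman's Dream holds mod", p)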

***

A common misunderstanding in calculus is that a series converges if its terms converge to zero. The canonical counterexample is the harmonic series. Its terms converge to zero, but the sum diverges.

But this can’t happen in the p-adic numbers. There, if the terms of a series converge to zero, the series converges (though maybe not absolutely).
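
Here is a concrete instance, sketched in Python: take the series 1 + 2 + 4 + 8 + … in the 2-adic numbers. The terms 2^n go to zero 2-adically, and the partial sums 2^n − 1 converge 2-adically to −1, since their distance from −1 (measured by divisibility by powers of 2) shrinks at every step.

# Partial sums of 1 + 2 + 4 + 8 + ... converge 2-adically to -1:
# S_n = 2^n - 1, and S_n - (-1) = 2^n is divisible by an ever larger power of 2.
def two_adic_valuation(k):
    """Largest e with 2^e dividing k (a proxy for 2-adic closeness to 0)."""
    e = 0
    while k % 2 == 0:
        k //= 2
        e += 1
    return e

for n in [1, 5, 10, 20]:
    s = 2**n - 1                            # partial sum of the first n terms
    print(n, two_adic_valuation(s - (-1)))  # the valuation grows with n, so S_n -> -1 2-adically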

***

Here’s something sorta along these lines. It looks wrong, and someone might arrive at it via a wrong understanding, but it’s actually correct.

sin(x – y) sin(x + y) = (sin(x) – sin(y)) (sin(x) + sin(y))
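
A quick numerical spot check in Python (random test points, nothing special about the range):

# Numerically verify sin(x - y) sin(x + y) == (sin x - sin y)(sin x + sin y).
from math import sin, isclose
from random import random

for _ in range(1000):
    x, y = 10*random() - 5, 10*random() - 5
    lhs = sin(x - y) * sin(x + y)
    rhs = (sin(x) - sin(y)) * (sin(x) + sin(y))
    assert isclose(lhs, rhs, abs_tol=1e-12)
print("identity holds numerically")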

***

Odd integers end in odd digits, but that might not be true if you’re not working in base 10. See Odd numbers in odd bases.

***

You can misunderstand how percentages work, but still get a useful result. See Sales tax included.

***

When probabilities are small, you can often get by with adding them together even when strictly speaking they don’t add. See Probability mistake can make a good approximation.
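
As a minimal numeric illustration (the probabilities here are made up): for two independent events with small probabilities, inclusion-exclusion differs from the simple sum only by the product term, which is negligible.

# For small, independent probabilities, P(A or B) is approximately P(A) + P(B).
p, q = 0.001, 0.002
exact = p + q - p*q      # inclusion-exclusion for independent events
approx = p + q
print(exact, approx, approx - exact)  # the error is p*q = 2e-06, negligible here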