A sufficient statistic summarizes a set of data. If the data come from a known distribution with an unknown parameter, then the sufficient statistic carries as much information as the full data [0]. For example, given a set of *n* coin flips, the number of heads and the number of flips are sufficient statistics.
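To make the coin-flip example concrete, here is a minimal sketch (the particular sequences and probabilities are invented for illustration) showing that the Bernoulli likelihood depends on the flips only through the number of heads and the number of flips — any two sequences with the same counts have identical likelihoods:

```python
from math import prod

def likelihood(flips, p):
    """Likelihood of an i.i.d. Bernoulli(p) model for a 0/1 sequence."""
    return prod(p if f else 1 - p for f in flips)

# Two different sequences of 6 flips, each with 4 heads.
a = [1, 1, 1, 1, 0, 0]
b = [0, 1, 1, 0, 1, 1]

# The likelihoods agree at every value of p: the data enter only
# through (number of heads, number of flips).
for p in (0.3, 0.5, 0.8):
    assert abs(likelihood(a, p) - likelihood(b, p)) < 1e-15
```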

There is a theorem by Koopman, Pitman, and Darmois that essentially says that useful sufficient statistics exist only if the data come from a class of probability distributions known as exponential families [1]. This leads to what Persi Diaconis [2] labeled a paradox.

“Exponential families are quite a restricted family of measures. Modern statistics deals with far richer classes of probabilities. This suggests a kind of paradox. If statistics is to be of any real use it must provide ways of boiling down great masses of data to a few humanly interpretable numbers. The Koopman-Pitman-Darmois theorem suggests this is impossible unless nature follows highly specialized laws which no one really believes.”

Diaconis suggests two ways around the paradox. First, the theorem he refers to concerns sufficient statistics of a fixed size; it doesn’t apply if the summary size varies with the data size. Second, and more importantly, the theorem says nothing about a summary containing *approximately* as much information as the full data.

***

[0] Students may naively take this to mean that all you need is sufficient statistics. No, it says that *if you know the distribution* the data come from then all you need is the sufficient statistics. You cannot test whether a model fits given sufficient statistics that assume the model. For example, mean and variance are sufficient statistics assuming data come from a normal distribution. But knowing only the mean and variance, you can’t assess whether a normal distribution model fits the data.
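To illustrate the footnote's point, here is a small sketch (both samples are invented) of two datasets that share the same mean and variance — and so are indistinguishable given only those sufficient statistics — yet have very different shapes:

```python
import statistics

s = 1.2 ** 0.5

# A roughly bell-shaped sample and a two-point (bimodal) sample,
# constructed to share the same mean (0) and population variance (1.2).
normalish = [-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]
bimodal = [-s] * 5 + [s] * 5

assert statistics.mean(normalish) == statistics.mean(bimodal) == 0
assert abs(statistics.pvariance(normalish) - statistics.pvariance(bimodal)) < 1e-12
```

Knowing only the mean and variance, nothing distinguishes the two samples, even though one is plausibly normal and the other clearly is not.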

[1] This does not mean the data have to come from the normal *distribution*. The exponential family of distributions includes the normal (Gaussian) distribution, but other distributions as well.

[2] Persi Diaconis. Sufficiency as Statistical Symmetry. Proceedings of the AMS Centennial Symposium. August 8–12, 1988.

On the plus side: there are huge families of processes that do give rise to distributions we can work with, because they’re formed by combining the contributions of lots of small independent events.

Okay, maybe they’re not *really* independent, but somehow it often works out that the result is convincingly Gaussian or Poisson or Beta or whatever if you don’t look *too* close, and we’re able to do something useful with that.
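As a quick sanity check of that point, here is a minimal simulation sketch (the sample sizes and the choice of Uniform(0,1) summands are arbitrary): sums of independent non-Gaussian draws land close to the mean and variance the Gaussian approximation predicts.

```python
import random

random.seed(42)

# Each observation is the sum of 20 independent Uniform(0,1) draws:
# each summand is flat, not bell-shaped, but the sums cluster around
# 20 * 1/2 = 10 with variance near 20 * 1/12.
sums = [sum(random.random() for _ in range(20)) for _ in range(10_000)]

mean = sum(sums) / len(sums)
var = sum((x - mean) ** 2 for x in sums) / len(sums)

assert abs(mean - 10) < 0.1       # close to the theoretical mean
assert abs(var - 20 / 12) < 0.1   # close to the theoretical variance
```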

Thanks for this keen observation and the two ways to avoid the paradox: variable length coding and lossy compression. Two highly influential ideas.

It seems like we might be ready for a different measure of sufficiency than ones reliant on idealized source distributions that never quite hold in practice.

Just as Physics is turning towards Sufficient Reason and Background Independence, perhaps it is time for Statistics, AI and Machine Learning to do the same. Isn’t sufficiency more mundanely about the coverage of questions we might ask and then answer, in principle, about a given dataset?

Perhaps we might think of sufficiency in the context of a domain of discourse?

I quote here George Boole from 1853, who helped seed our current thinking on information and its completeness:

“In every discourse, whether of the mind conversing with its own thoughts, or of the individual in his intercourse with others, there is an assumed or expressed limit within which the subjects of its operation are confined. The most unfettered discourse is that in which the words we use are understood in the widest possible application, and for them the limits of discourse are co-extensive with those of the universe itself. But more usually we confine ourselves to a less spacious field. Sometimes, in discoursing of men we imply (without expressing the limitation) that it is of men only under certain circumstances and conditions that we speak, as of civilized men, or of men in the vigour of life, or of men under some other condition or relation. Now, whatever may be the extent of the field within which all the objects of our discourse are found, that field may properly be termed the universe of discourse. Furthermore, this universe of discourse is in the strictest sense the ultimate subject of the discourse.”

This is one of those instances where your philosophical interpretation of probabilities is decisive.

If you think probabilities are stable physical frequencies, then you tend to think a probability distribution is something physically given and hence it will only occasionally have sufficient statistics, which is a problem.

If you think of probabilities as describing uncertainty ranges though, then the statistic itself is primary. The distribution is the maximum entropy distribution (i.e. what the post calls an exponential family distribution) which is in some sense the most spread out distribution compatible with that statistic.
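To sketch that view numerically (the support {1, …, 6} and target mean 4.5 are chosen arbitrarily, the classic loaded-die example): fixing the expected value of a statistic and maximizing entropy yields a distribution of exponential-family form, p(i) ∝ exp(θ·i), and θ can be found by a simple bisection.

```python
from math import exp

def mean_for(theta, support=range(1, 7)):
    """Mean of the exponentially tilted distribution p(i) ∝ exp(theta * i)."""
    weights = [exp(theta * x) for x in support]
    z = sum(weights)
    return sum(x * w for x, w in zip(support, weights)) / z

# mean_for is increasing in theta, so bisection finds the theta whose
# maximum-entropy distribution has the prescribed mean 4.5.
lo, hi = -5.0, 5.0
for _ in range(100):
    mid = (lo + hi) / 2
    if mean_for(mid) < 4.5:
        lo = mid
    else:
        hi = mid

theta = (lo + hi) / 2
assert abs(mean_for(theta) - 4.5) < 1e-9
```

At θ = 0 the distribution is uniform with mean 3.5; the constraint mean of 4.5 tilts it toward higher faces, but no more than the constraint demands.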

In other words, with one interpretation you see the distribution as given and the statistics only rarely derived from it, and in the other you see the statistic as given and the distribution derived from it.