A probability distribution is called “fat tailed” if its probability density goes to zero slowly. Slowly relative to what? That is often implicit and left up to context, but generally speaking the exponential distribution is the dividing line. Probability densities that decay faster than the exponential distribution are called “thin” or “light,” and densities that decay slower are called “thick”, “heavy,” or “fat,” or more technically “subexponential.” The distinction is important because fat-tailed distributions tend to defy our intuition.
One surprising property of heavy-tailed (subexponential) distributions is the single big jump principle. Roughly speaking, most of the contribution to the sum of several heavy-tailed random variables comes from the largest of the samples. To be more specific, let “several” = 4 for reasons that’ll be apparent soon, though the result is true for any n. As x goes to infinity, the probability that
X1 + X2 + X3 + X4
is larger than a given x is asymptotically equal the probability that
max(X1, X2, X3, X4)
is larger than the same x.
The idea behind the obesity index  is to turn the theorem above around, making it an empirical measure of how thick a distribution’s tail is. If you draw four samples from a random variable and sort them, the obesity index is the probability that that the sum of the max and min, X1 + X4, is greater than the sum of the middle samples, X2 + X3.
The obesity index could be defined for any distribution, but it only measures what the name implies for right-tailed distributions. For any symmetric distribution, the obesity index is exactly 1/2. A Cauchy distribution is heavy-tailed, but it has two equally heavy tails, and so its obesity index is the same as the normal distribution, which has two light tails.
Note that location and scale parameters have no effect on the obesity index; shifting and scaling effect all the X values the same, so it doesn’t change the probability that X1 + X4 is greater than X2 + X3.
To get an idea of the obesity index in action, we’ll look at the normal, exponential, and Cauchy distributions, since these are the canonical examples of thin, medium, and thick tailed distributions. But for reasons explained above, we’ll actually look at the folded normal and folded Cauchy distributions, i.e. we’ll take their absolute values to create right-tailed distributions.
To calculate the obesity index exactly you’d need to do analytical calculations with order statistics. We’ll simulate the obesity index because that’s easier. It’s also more in the spirit of calculating the obesity index from data.
from scipy.stats import norm, expon, cauchy def simulate_obesity(dist, N): data = abs(dist.rvs(size=(N,4))) count = 0 for row in range(N): X = sorted(data[row]) if X + X > X + X: count += 1 return count/N for dist in [norm, expon, cauchy]: print( simulate_obesity(dist, 10000) )
When I ran the Python code above, I got
0.6692 0.7519 0.8396
This ranks the three distributions in the anticipated order of tail thickness.
Note that the code above takes the absolute value of the random samples. This lets us pass in ordinary (unfolded) versions of the normal and Cauchy distributions, and its redundant for any distribution like the exponential that’s already positive-valued.
[I found out after writing this blog post that SciPy now has
foldcauchy, but they don’t seem to work like I expect.]
Let’s try it on a few more distributions. Lognormal is between exponential and Cauchy in thickness. A Pareto distribution with parameter b goes to zero like x-1-b and so we expect a Pareto distribution to have a smaller obesity index than Cauchy when b is greater than 1, and a larger index when b is less than one. Once again the simulation results are what we’d expect.
for dist in [lognorm, pareto(2), pareto(0.5)]: print( simulate_obesity(dist, 10000) )
0.7766 0.8242 0.9249
By this measure, lognormal is just a little heavier than exponential. Pareto(2) comes in lighter than Cauchy, but not by much, and Pareto(0.5) comes in heavier.
Since the obesity index is a probability, it will always return a value between 0 and 1. Maybe it would be easier to interpret if we did something like take the logit transform of the index to spread the values out more. Then the distinctions between Pareto distributions of different orders, for example, might match intuition better.
 Roger M. Cooke et al. Fat-Tailed Distributions: Data, Diagnostics and Dependence. Wiley, 2014.