A probability distribution is called “fat tailed” if its probability density goes to zero slowly. Slowly relative to what? That is often implicit and left up to context, but generally speaking the exponential distribution is the dividing line. Probability densities that decay faster than the exponential distribution are called “thin” or “light,” and densities that decay slower are called “thick,” “heavy,” or “fat,” or more technically “subexponential.” The distinction is important because fat-tailed distributions tend to defy our intuition.

One surprising property of heavy-tailed (subexponential) distributions is the single big jump principle. Roughly speaking, most of the contribution to the sum of several heavy-tailed random variables comes from the largest of the samples. To be more specific, let “several” = 4 for reasons that’ll be apparent soon, though the result is true for any *n*. As *x* goes to infinity, the probability that

*X*_{1} + *X*_{2} + *X*_{3} + *X*_{4}

is larger than *x* is asymptotically equal to the probability that

max(*X*_{1}, *X*_{2}, *X*_{3}, *X*_{4})

is larger than *x*.
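The principle is easy to see numerically. Here’s a quick sketch using a Pareto distribution, a canonical subexponential example; the threshold *x* = 50 and the tail parameter 1.5 are arbitrary choices for illustration.

```python
import numpy as np
from scipy.stats import pareto

# Draw 100,000 groups of four Pareto(1.5) samples
samples = pareto(1.5).rvs(size=(100_000, 4), random_state=0)

x = 50
p_sum = (samples.sum(axis=1) > x).mean()  # P(X1 + X2 + X3 + X4 > x)
p_max = (samples.max(axis=1) > x).mean()  # P(max(X1, ..., X4) > x)

# The samples are positive, so sum >= max and the ratio is at least 1;
# it approaches 1 as x grows
print(p_sum / p_max)
```

At a finite threshold like *x* = 50 the ratio is a little above 1, and it creeps toward 1 as you push the threshold out.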

The idea behind the **obesity index** [1] is to turn the theorem above around, making it an empirical measure of how thick a distribution’s tail is. If you draw four samples from a random variable and sort them, the obesity index is the probability that the sum of the max and min, *X*_{1} + *X*_{4}, is greater than the sum of the middle samples, *X*_{2} + *X*_{3}.

The obesity index could be defined for any distribution, but it only measures what the name implies for right-tailed distributions. For any symmetric distribution, the obesity index is 1/2. A Cauchy distribution is heavy-tailed, but it has two equally heavy tails, and so its obesity index is the same as the normal distribution, which has two light tails.
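You can check the symmetric case in a few lines. The helper below is my variation on the simulation later in the post, with the folding step skipped so that symmetric distributions keep both tails.

```python
import numpy as np
from scipy.stats import norm, cauchy

def obesity_unfolded(dist, N):
    # Sort each row of raw samples; no absolute value, so a symmetric
    # distribution keeps both of its tails
    X = np.sort(dist.rvs(size=(N, 4), random_state=0), axis=1)
    return np.mean(X[:, 0] + X[:, 3] > X[:, 1] + X[:, 2])

for dist in [norm, cauchy]:
    print(obesity_unfolded(dist, 100_000))  # both near 0.5
```

Both the normal and the Cauchy come out near 1/2, as claimed.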

Note that location and scale parameters have no effect on the obesity index; shifting and scaling affect all the *X* values the same way, so they don’t change the probability that *X*_{1} + *X*_{4} is greater than *X*_{2} + *X*_{3}.
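Here’s a quick numeric illustration of the invariance. I use an exponential distribution so the absolute-value folding used later can’t interfere with the shift; the loc and scale values are arbitrary.

```python
import numpy as np
from scipy.stats import expon

def obesity(dist, N, seed=42):
    X = np.sort(dist.rvs(size=(N, 4), random_state=seed), axis=1)
    return np.mean(X[:, 0] + X[:, 3] > X[:, 1] + X[:, 2])

# Same seed, so the shifted/scaled samples are affine images of the
# originals; the comparison X1 + X4 > X2 + X3 is unchanged
print(obesity(expon, 100_000))
print(obesity(expon(loc=10, scale=7), 100_000))  # identical value
```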

To get an idea of the obesity index in action, we’ll look at the normal, exponential, and Cauchy distributions, since these are the canonical examples of thin, medium, and thick tailed distributions. But for reasons explained above, we’ll actually look at the *folded* normal and *folded* Cauchy distributions, i.e. we’ll take their absolute values to create right-tailed distributions.

To calculate the obesity index exactly you’d need to do analytical calculations with order statistics. We’ll simulate the obesity index because that’s easier. It’s also more in the spirit of calculating the obesity index from data.

```python
from scipy.stats import norm, expon, cauchy

def simulate_obesity(dist, N):
    data = abs(dist.rvs(size=(N, 4)))
    count = 0
    for row in range(N):
        X = sorted(data[row])
        if X[0] + X[3] > X[1] + X[2]:
            count += 1
    return count/N

for dist in [norm, expon, cauchy]:
    print( simulate_obesity(dist, 10000) )
```

When I ran the Python code above, I got

```
0.6692
0.7519
0.8396
```

This ranks the three distributions in the anticipated order of tail thickness.

Note that the code above takes the absolute value of the random samples. This lets us pass in ordinary (unfolded) versions of the normal and Cauchy distributions, and it’s redundant for any distribution like the exponential that’s already positive-valued.

[I found out after writing this blog post that SciPy now has `foldnorm` and `foldcauchy`, but they don’t seem to work like I expect.]

Let’s try it on a few more distributions. Lognormal is between exponential and Cauchy in thickness. A Pareto distribution with parameter *b* goes to zero like *x*^{-1-b} and so we expect a Pareto distribution to have a smaller obesity index than Cauchy when *b* is greater than 1, and a larger index when *b* is less than one. Once again the simulation results are what we’d expect.

The code

```python
from scipy.stats import lognorm, pareto

# lognorm requires a shape parameter; s = 1 is the standard lognormal
for dist in [lognorm(1), pareto(2), pareto(0.5)]:
    print( simulate_obesity(dist, 10000) )
```

returns

```
0.7766
0.8242
0.9249
```

By this measure, lognormal is just a little heavier than exponential. Pareto(2) comes in lighter than Cauchy, but not by much, and Pareto(0.5) comes in heavier.

Since the obesity index is a probability, it will always return a value between 0 and 1. Maybe it would be easier to interpret if we did something like take the logit transform of the index to spread the values out more. Then the distinctions between Pareto distributions of different orders, for example, might match intuition better.
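For example, applying the logit transform to the simulated indices quoted above spreads them out over the real line:

```python
import numpy as np

# Obesity indices simulated earlier in the post
indices = {
    "normal":      0.6692,
    "exponential": 0.7519,
    "lognormal":   0.7766,
    "Pareto(2)":   0.8242,
    "Cauchy":      0.8396,
    "Pareto(0.5)": 0.9249,
}

for name, p in indices.items():
    logit = np.log(p / (1 - p))  # maps (0, 1) to the whole real line
    print(f"{name:12} {logit:.3f}")
```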

[1] Roger M. Cooke et al. Fat-Tailed Distributions: Data, Diagnostics and Dependence. Wiley, 2014.

Interesting. This appears to be related to some of the definitions of skewness that use quantiles (rather than moments) to provide a robust estimate. (See “A quantile definition for skewness”.) In particular, I would rewrite the comparison from

X[3] + X[0] > X[1] + X[2]

to the equivalent statement

X[3] – X[2] > X[1] – X[0]

The statement is now a comparison of the lengths of tertiles for the samples of size 4. The simulation compares the average length of the right tertiles to the average length of the left tertiles. Do you understand why this estimates kurtosis (fatness of tails) and not skewness?
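The equivalence is easy to verify numerically; subtracting X[0] + X[2] from both sides of the original comparison gives the gap comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1_000):
    X = np.sort(rng.standard_cauchy(4))
    # Original comparison vs. the rewritten gap comparison
    assert (X[0] + X[3] > X[1] + X[2]) == (X[3] - X[2] > X[1] - X[0])
print("comparisons agree on all trials")
```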

1) When you’re trying to figure out your quartiles, what do you do with repeated values like a dataset 0, 0, 0, 5? Is that two numbers or four?

2) When a data set starts at zero, you have four zeros in the sample, and zero is a mode, does that give you a thick tail at zero?

3) Regardless of the second question, does every mode have a short tail?

Just for fun for R aficionados:

```r
simulate_obesity <- function(dist, N){
    data <- lapply(seq_len(N), function(x) abs(dist(4)))
    count <- 0
    for(row in seq_len(N)){
        X <- sort(data[[row]])
        comp <- (X[1] + X[4] > X[2] + X[3])
        if(comp) count <- count + 1
    }
    return(count/N)
}

lapply(list(rnorm, rexp, rcauchy), function(x) simulate_obesity(dist = x, 100))
```