There’s a theorem that’s often used and assumed to be true but rarely stated explicitly. I’m going to call it “the baseball inequality” for reasons I’ll get to shortly.

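Suppose you have two lists of *k* positive numbers each:

$$n_1, n_2, \ldots, n_k$$

and

$$d_1, d_2, \ldots, d_k.$$

Then

$$\min_{1 \le i \le k} \frac{n_i}{d_i} \;\le\; \frac{n_1 + n_2 + \cdots + n_k}{d_1 + d_2 + \cdots + d_k} \;\le\; \max_{1 \le i \le k} \frac{n_i}{d_i}.$$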

This says, for example, that the batting average of a baseball team is somewhere between the best individual batting average and the worst individual batting average.
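To make the batting average reading concrete, here is a small sketch that computes the three quantities in the inequality with exact fractions. The function name `mediant_bounds` and the hit and at-bat counts are made up for illustration.

```python
from fractions import Fraction


def mediant_bounds(numerators, denominators):
    """Return (smallest ratio, ratio of the sums, largest ratio) exactly."""
    ratios = [Fraction(n, d) for n, d in zip(numerators, denominators)]
    team = Fraction(sum(numerators), sum(denominators))
    return min(ratios), team, max(ratios)


# Made-up hits and at-bats for three hypothetical players.
hits = [30, 45, 12]
at_bats = [100, 120, 60]

low, team, high = mediant_bounds(hits, at_bats)

# The team average sits between the worst and best individual averages.
assert low <= team <= high
print(low, team, high)
```

Using `Fraction` rather than floating point keeps the comparisons exact, so the check cannot fail because of rounding.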

The only place I can recall seeing this inequality stated is in *The Cauchy-Schwarz Master Class* by Michael Steele. He states the inequality in exercise 5.1 and gives it the batting average interpretation. (**Update**: This is known as the “mediant inequality.” Thanks to Tom in the comments for letting me know. So the thing in the middle is called the “mediant” of the fractions.)

Note that this is not the same as saying the average of a list of numbers is between the smallest and largest numbers in the list, though that’s true. The batting average of a team as a whole is not the same as the average of the individual batting averages on that team. It might happen to be, but in general it is not.
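A quick numerical illustration of the difference, using made-up numbers for a regular starter and a player with very few at-bats:

```python
from fractions import Fraction

# Made-up numbers: a regular starter and a player with only ten at-bats.
hits = [90, 1]
at_bats = [300, 10]

individual = [Fraction(h, ab) for h, ab in zip(hits, at_bats)]

# Team average: total hits over total at-bats.
team_average = Fraction(sum(hits), sum(at_bats))

# Unweighted average of the two individual averages.
average_of_averages = sum(individual) / len(individual)

print(team_average, average_of_averages)

# The two differ, but both lie between the worst and best individual average.
assert team_average != average_of_averages
assert min(individual) <= team_average <= max(individual)
assert min(individual) <= average_of_averages <= max(individual)
```

The player with few at-bats pulls the average of averages down much more than he pulls the team average down, because the team average counts each at-bat equally rather than each player equally.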

I’ll give a quick proof of the baseball inequality. I’ll only prove the first of the two inequalities. That is, I’ll prove that the minimum fraction is no greater than the ratio of the sums of numerators and denominators. Proving that the latter is no greater than the maximum fraction is completely analogous.

Also, I’ll only prove the theorem for two numerators and two denominators. Once you have proved the inequality for two fractions, you can bootstrap that to prove it for three fractions, and continue this process for any number of fractions.
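One way to spell out the bootstrapping step is sketched here. The shorthand $N_j = n_1 + \cdots + n_j$ and $D_j = d_1 + \cdots + d_j$ is introduced only for this sketch. Suppose the lower bound already holds for $k-1$ fractions:

$$\min_{1 \le i \le k-1} \frac{n_i}{d_i} \le \frac{N_{k-1}}{D_{k-1}}.$$

Applying the two-fraction case to $N_{k-1}/D_{k-1}$ and $n_k/d_k$ gives

$$\min\left(\frac{N_{k-1}}{D_{k-1}}, \frac{n_k}{d_k}\right) \le \frac{N_{k-1} + n_k}{D_{k-1} + d_k} = \frac{N_k}{D_k},$$

and combining the two lines,

$$\min_{1 \le i \le k} \frac{n_i}{d_i} \le \min\left(\frac{N_{k-1}}{D_{k-1}}, \frac{n_k}{d_k}\right) \le \frac{N_k}{D_k}.$$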

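So we start by assuming

$$\frac{n_1}{d_1} \le \frac{n_2}{d_2}.$$

Since the denominators are positive, this is the same as $n_1 d_2 \le n_2 d_1$. Then we have

$$n_1 (d_1 + d_2) = n_1 d_1 + n_1 d_2 \le n_1 d_1 + n_2 d_1 = (n_1 + n_2)\, d_1,$$

and dividing both sides by $d_1 (d_1 + d_2)$ gives

$$\frac{n_1}{d_1} \le \frac{n_1 + n_2}{d_1 + d_2}.$$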


If you set each d equal to 1, you do recover the fact that the average of a set of numbers is between the smallest and the biggest.

This is also known as the mediant inequality, if you’re looking for other sources. Roger Nelsen’s *Proofs Without Words* books have a couple of visual proofs of the two-fraction fact.

You have actually mentioned mediants in your blog before :) This was in relation to the Farey sequence and rational approximation [1].

The average of the team isn’t the average of the averages, but it is a weighted average, isn’t it?

Set $N = \sum_i n_i$, $D = \sum_i d_i$, and $c_i = d_i / D$. The $c_i$ are positive and sum to 1. Then

$$\frac{N}{D} = \sum_i c_i \, \frac{n_i}{d_i}$$

and so the team average is a convex combination of the player averages and is therefore contained in their convex hull. (Am I making a dumb mistake here?) This seems a simpler and more general proof.
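The weighted-average identity is easy to check numerically. The sketch below uses made-up hit and at-bat counts and weights each player's average by his share of the team's at-bats.

```python
from fractions import Fraction

# Made-up hits (n_i) and at-bats (d_i) for three hypothetical players.
hits = [30, 45, 12]
at_bats = [100, 120, 60]

N = sum(hits)
D = sum(at_bats)

# c_i = d_i / D: each player's share of the team's at-bats.
weights = [Fraction(d, D) for d in at_bats]

# Weighted average of the individual averages n_i / d_i.
weighted = sum(c * Fraction(n, d) for c, n, d in zip(weights, hits, at_bats))

assert sum(weights) == 1           # the weights form a convex combination
assert weighted == Fraction(N, D)  # and reproduce the team average exactly
```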

You can also think of this as integrating a function that is defined piecewise to be n_i/d_i on an interval of size d_i (or more generally a set of measure d_i) for each i. Clearly this function is bounded by the constants that are the minimum and maximum n_i/d_i, and integrating all three over the domain with measure d_1+…+d_k gives the inequality.

And most importantly, this inequality is closely related to Simpson’s paradox.
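Here is a small sketch of that connection, with made-up numbers for two hypothetical players over two seasons. Player A has the better average in each season, yet player B has the better combined average. Each combined average is a mediant, so it has to lie between that player's own season averages, but nothing stops the two players' ranges from overlapping.

```python
from fractions import Fraction

# Made-up (hits, at-bats) per season for two hypothetical players.
player_a = [(4, 10), (25, 100)]
player_b = [(35, 100), (2, 10)]


def season_averages(seasons):
    """Batting average for each season separately."""
    return [Fraction(h, ab) for h, ab in seasons]


def combined_average(seasons):
    """Total hits over total at-bats across all seasons (the mediant)."""
    return Fraction(sum(h for h, _ in seasons), sum(ab for _, ab in seasons))


a_seasons, b_seasons = season_averages(player_a), season_averages(player_b)
a_total, b_total = combined_average(player_a), combined_average(player_b)

# A is better than B in every single season...
assert all(a > b for a, b in zip(a_seasons, b_seasons))
# ...but B is better over the two seasons combined.
assert a_total < b_total
# Each combined average still obeys the mediant inequality.
assert min(a_seasons) <= a_total <= max(a_seasons)
assert min(b_seasons) <= b_total <= max(b_seasons)
```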