The baseball inequality

There’s a theorem that’s often used and assumed to be true but rarely stated explicitly. I’m going to call it “the baseball inequality” for reasons I’ll get to shortly.

Suppose you have two lists of k positive numbers each:

n_1, n_2, n_3, \ldots, n_k

and

d_1, d_2, d_3, \ldots, d_k

Then

\min_{1 \leq i \leq k} \frac{n_i}{d_i} \leq \frac{n_1 + n_2 + n_3 + \cdots + n_k}{d_1 + d_2 + d_3 + \cdots + d_k} \leq \max_{1 \leq i \leq k} \frac{n_i}{d_i}

This says, for example, that the batting average of a baseball team is somewhere between the best individual batting average and the worst individual batting average.

The only place I can recall seeing this inequality stated is in The Cauchy-Schwarz Master Class by Michael Steele. He states the inequality in exercise 5.1 and gives it the batting average interpretation. (Update: This is known as the “mediant inequality.” Thanks to Tom in the comments for letting me know. So the thing in the middle is called the “mediant” of the fractions.)

Note that this is not the same as saying the average of a list of numbers is between the smallest and largest numbers in the list, though that’s true. The batting average of a team as a whole is not the same as the average of the individual batting averages on that team. It might happen to be, but in general it is not.
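Here is a small numerical illustration of the distinction. It is only a sketch: the (hits, at-bats) pairs are made-up numbers, and the variable names are my own.

```python
from fractions import Fraction

# Hypothetical (hits, at-bats) pairs for three players; the numbers are made up.
players = [(30, 100), (10, 50), (45, 120)]

# Individual batting averages n_i / d_i, kept exact with Fraction.
ratios = [Fraction(hits, at_bats) for hits, at_bats in players]

# Team batting average: sum of numerators over sum of denominators.
total_hits = sum(hits for hits, _ in players)
total_at_bats = sum(at_bats for _, at_bats in players)
team_average = Fraction(total_hits, total_at_bats)

# Plain (unweighted) average of the individual averages.
mean_of_averages = sum(ratios) / len(ratios)

# The baseball inequality: the team average sits between the extremes.
assert min(ratios) <= team_average <= max(ratios)

print(float(team_average), float(mean_of_averages))
```

With these numbers the team average is 17/54 while the average of the individual averages is 7/24, so the two differ, yet both fall between the worst average (1/5) and the best (3/8).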

I’ll give a quick proof of the baseball inequality. I’ll only prove the first of the two inequalities. That is, I’ll prove that the minimum fraction is no greater than the ratio of the sums of numerators and denominators. Proving that the latter is no greater than the maximum fraction is completely analogous.

Also, I’ll only prove the theorem for two numerators and two denominators. Once you have proved the inequality for two numerators and denominators, you can bootstrap that to prove the inequality for three numerators and three denominators, and continue this process for any number of numbers on top and bottom.

So we start by assuming

\frac{a}{b} \leq \frac{c}{d}

Then we have

\begin{align*} \frac{a}{b} &= \frac{a\left(1 + \dfrac{d}{b} \right )}{b\left(1 + \dfrac{d}{b} \right )} \\ &= \frac{a + \dfrac{a}{b}d}{b + d} \\ &\leq \frac{a + \dfrac{c}{d}d}{b+d} \\ &= \frac{a + c}{b+d} \end{align*}
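For completeness, here is a sketch of the two steps skipped above. The symbols N and D are shorthand introduced only for this sketch. Under the same assumption that a/b is no greater than c/d, the upper bound follows by the mirror-image calculation:

\begin{align*} \frac{c}{d} &= \frac{c\left(1 + \dfrac{b}{d} \right )}{d\left(1 + \dfrac{b}{d} \right )} \\ &= \frac{c + \dfrac{c}{d}b}{d + b} \\ &\geq \frac{c + \dfrac{a}{b}b}{d + b} \\ &= \frac{a + c}{b+d} \end{align*}

For the bootstrap, write N = n_1 + \cdots + n_{k-1} and D = d_1 + \cdots + d_{k-1}, and suppose the lower bound already holds for k-1 fractions, so that the smallest of the first k-1 fractions is at most N/D. Applying the two-fraction result to N/D and n_k/d_k then gives

\min_{1 \leq i \leq k} \frac{n_i}{d_i} \leq \min\left( \frac{N}{D}, \frac{n_k}{d_k} \right) \leq \frac{N + n_k}{D + d_k}

and the upper bound extends in the same way with max in place of min.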


7 thoughts on “The baseball inequality”

  1. If you set each d equal to 1, you do recover the fact that the average of a set of numbers is between the smallest and the biggest.

  2. This is also known as the mediant inequality if you’re looking for other sources. Roger Nelsen’s proof without words books have a couple of visual proofs of the two fraction fact.

  3. You have actually mentioned mediants in your blog before :) This was in relation to the Farey sequence and rational approximation [1].

  4. The average of the team isn’t the average of the averages, but it is a weighted average, isn’t it?

    Set N = sum n_i, D = sum d_i, c_i = d_i/D. Then

    N/D = sum c_i n_i / d_i

    and so the team average is a convex combination of the player averages and therefore contained in its convex hull. (Am I making a dumb mistake here?) This seems a simpler and more general proof.

  5. You can also think of this as integrating a function that is defined piecewise to be n_i/d_i on an interval of size d_i (or more generally a set of measure d_i) for each i. Clearly this function is bounded by the constants that are the minimum and maximum n_i/d_i, and integrating all three over the domain with measure d_1+…+d_k gives the inequality.
