Bounding the 3rd moment by the 4th moment

For a random variable X, the kth moment of X is the expected value of X^k.

For any random variable X with zero or negative mean, there’s an inequality that bounds the 3rd moment, m3, in terms of the 4th moment, m4:

m_3 \leq \left(\frac{4}{27} \right )^{1/4} m_4^{3/4}

The following example shows that this bound is the best possible.

Define

u = (1 – √3)/√2
v = (1 + √3)/√2
p = (3 + √3)/6

and suppose X = u with probability p and X = v with probability q = 1 – p. Then X has mean 0, and you can show that exact equality holds in the inequality above.

Here’s some Mathematica code to verify this claim.

    u = (1 - Sqrt[3])/Sqrt[2]
    v = (1 + Sqrt[3])/Sqrt[2]
    p = (Sqrt[3] + 1)/(2 Sqrt[3])
    q = 1 - p
    m3 = p u^3 + q v^3
    m4 = p u^4 + q v^4
    Simplify[ (4/27)^(1/4) m4^(3/4) / m3 ]

As hoped, the code returns 1.
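
As a quick check of the inequality itself (not part of the original argument), here’s a bit of Python that draws a large zero-mean sample from a skewed distribution and compares the sample third moment to the bound. The centered exponential is an arbitrary choice; its third central moment is 2, while the bound evaluates to about 3.2, so the inequality holds but is far from tight for this distribution.

    import numpy as np

    rng = np.random.default_rng(12345)          # arbitrary seed
    x = rng.exponential(1.0, 1_000_000) - 1.0   # exponential sample shifted to have mean 0

    m3 = np.mean(x**3)                          # sample 3rd moment
    m4 = np.mean(x**4)                          # sample 4th moment
    print(m3, (4/27)**0.25 * m4**0.75)          # the first number should be smaller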

Reference: Iosif Pinelis, Relations Between the First Four Moments, American Mathematical Monthly, Vol. 122, No. 5, pp. 479–481.

Gaussian correlation inequality

The Gaussian correlation inequality was proven in 2014, but the proof only became widely known this year. You can find Thomas Royen’s remarkably short proof here.

Let X be a multivariate Gaussian random variable with mean zero and let E and F be two symmetric convex sets, both centered at the origin. The Gaussian correlation inequality says that

Prob(X in E and F) ≥ Prob(X in E) Prob(X in F).

Here’s a bit of Python code for illustrating the inequality. For symmetric convex sets we take p-norm balls of radius r, where p ≥ 1 and r > 0. We could, for example, set one of the values of p to 1 to get a diamond-shaped ball (an octahedron in three dimensions) and set the other p to 2 to get a Euclidean ball.

    from scipy.stats import norm as gaussian

    def pnorm(v, p):
        # p-norm of the vector v
        return sum( abs(x)**p for x in v )**(1./p)

    def simulate(dim, r1, p1, r2, p2, numReps):
        # Estimate P(X in both balls) and compare with the product of the individual probabilities
        count_1, count_2, count_both = (0, 0, 0)
        for _ in range(numReps):
            x = gaussian.rvs(0, 1, dim)   # standard Gaussian sample in dim dimensions
            in_1 = (pnorm(x, p1) < r1)    # inside the first ball?
            in_2 = (pnorm(x, p2) < r2)    # inside the second ball?
            if in_1:
                count_1 += 1
            if in_2:
                count_2 += 1
            if in_1 and in_2:
                count_both += 1
        print("Prob in both:", count_both / numReps)
        print("Lower bound: ", count_1*count_2 * numReps**-2)

    simulate(3, 1, 2, 1, 1, 1000)

When numReps is large, we expect the simulated probability of the intersection to be greater than the simulated lower bound. In the example above, the former was 0.075 and the latter 0.015075, ordered as we’d expect.

If we didn’t know that the theorem had been proven, we could use code like this to try to find counterexamples. Of course a simulation cannot prove or disprove a theorem, but if we found what appeared to be a counterexample, we could see whether it persists with different random number generation seeds and with a large value of numReps. If so, then we could try to establish the inequality analytically. Now that the theorem has been proven, we know we’re not going to find real counterexamples, so the code is useful only as an illustration.


Truncated exponential series inequality

Define Tn to be the Taylor series for exp(x) truncated after the x^n term:

T_n(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!}

How does this function compare to its limit, exp(x)? We might want to know because it’s often useful to have polynomial upper or lower bounds on exp(x).

For x > 0 it’s clear that exp(x) is larger than Tn(x) since the discarded terms in the power series for exp(x) are all positive.

The case of x < 0 is more interesting. There exp(x) > Tn(x) if n is odd and exp(x) < Tn(x) if n is even.

Define fn(x) = exp(x) – Tn(x). If x > 0 then fn(x) > 0.

We want to show that if x < 0 then fn(x) > 0 for odd n and fn(x) < 0 for even n.

For n = 1, note that f1 and its derivative are both zero at 0. Now suppose f1 is zero at some point a < 0. Then by Rolle’s theorem, there is some point b with a < b < 0 where the derivative of f1 is 0. Since the derivative of f1 is also zero at 0, there must be some point c with b < c < 0 where the second derivative of f1 is 0, again by Rolle’s theorem. But the second derivative of f1 is exp(x), which is never 0. So our assumption f1(a) = 0 leads to a contradiction.

Now  f1(0) = 0 and f1(x) ≠ 0 for x < 0. So f1(x) must be always positive or always negative. Which is it? For negative x, exp(x) is bounded and so

f1(x)  = exp(x) – 1 – x

is eventually dominated by the –x term, which is positive since x is negative.

The proof for n = 2 is similar. If f2(x) is zero at some point a < 0, then we can use Rolle’s theorem to find a point b < 0 where the derivative of f2 is zero, and a point c < 0 where the second derivative is zero, and a point d < 0 where the third derivative is zero. But the third derivative of f2 is exp(x) which is never zero.

As before the contradiction shows  f2(x) ≠ 0 for x < 0. So is  f2(x) always positive or always negative? This time we have

f2(x) = exp(x) – 1 – x – x²/2

which is eventually dominated by the –x²/2 term, which is negative.

For general n, we assume fn is zero at some point x < 0 and apply Rolle’s theorem n+1 times to reach the contradiction that exp(x) is zero somewhere. This tells us that fn(x) is never zero for negative x. We then look at the dominant term –x^n/n! to argue that fn is positive or negative depending on whether n is odd or even.

Another way to show the sign of  fn(x) for negative x would be to apply the alternating series theorem to x = -1.
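
As a quick numerical sanity check (not part of the proof), here’s a bit of Python that evaluates fn(x) = exp(x) – Tn(x) at a few negative values of x. The sign should be positive for odd n and negative for even n.

    from math import exp, factorial

    def f(n, x):
        # f_n(x) = exp(x) minus the Taylor polynomial of exp truncated at the x^n term
        T_n = sum(x**k / factorial(k) for k in range(n + 1))
        return exp(x) - T_n

    for n in range(1, 5):
        print(n, [f(n, x) > 0 for x in (-0.5, -2.0, -10.0)])
    # expect all True for odd n, all False for even n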

Improving on Chebyshev’s inequality

Chebyshev’s inequality says that the probability of a random variable being more than k standard deviations away from its mean is at most 1/k². In symbols,

P(|X - E(X)| \geq k\sigma) \leq \frac{1}{k^2}

This inequality is very general, but also very weak. It assumes very little about the random variable X but it also gives a loose bound. If we assume slightly more, namely that X has a unimodal distribution, then we have a tighter bound, the Vysochanskiĭ-Petunin inequality.

P(|X - E(X)| \geq k\sigma) \leq \frac{4}{9k^2}

However, the Vysochanskiĭ-Petunin inequality does require that k be larger than √(8/3). In exchange for the assumption of unimodality and the restriction on k we get to reduce our upper bound by more than half.

While tighter than Chebyshev’s inequality, the stronger inequality is still very general. We can usually do much better if we can say more about the distribution family. For example, suppose X has a uniform distribution. What is the probability that X is more than two standard deviations from its mean? Zero, because two standard deviations puts you outside the interval the uniform is defined on!

Among familiar distributions, when is the Vysochanskiĭ-Petunin inequality most accurate? That depends, of course, on what distributions you consider familiar, and what value of k you use. Let’s look at normal, exponential, and Pareto. These were chosen because they have thin, medium, and thick tails. We’ll also throw in the double exponential, because it has the same tail thickness as exponential but is symmetric. We’ll let k be 2 and 3.

    Distribution family    P(|X – E(X)| > 2σ)    V-P bound    P(|X – E(X)| > 3σ)    V-P bound
    Uniform                0.0000                0.1111       0.0000                0.0494
    Normal                 0.0455                0.1111       0.0027                0.0494
    Exponential            0.0498                0.1111       0.0183                0.0494
    Pareto                 0.0277                0.1111       0.0156                0.0494
    Double exponential     0.0591                0.1111       0.0144                0.0494

A normal random variable is more than 2 standard deviations away from its mean with probability 0.0455, compared to the Vysochanskiĭ-Petunin bound of 1/9 = 0.1111. A normal random variable is more than 3 standard deviations away from its mean with probability 0.0027, compared to the bound of 4/81 = 0.0494.

An exponential random variable with mean μ also has standard deviation μ, so the only way it could be more than 2μ from its mean is to be 3μ from 0. So an exponential is more than 2 standard deviations from its mean with probability exp(-3) = 0.0498, and more than 3 standard deviations with probability exp(-4) = 0.0183.
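
Here’s a short Python check (not from the original analysis) of a few of these values. It computes the exact two-sided tail probability P(|X – E(X)| > kσ) with SciPy and compares it to the Vysochanskiĭ-Petunin bound 4/(9k²).

    from scipy.stats import norm, expon, laplace

    def tail_prob(dist, k):
        # P(|X - E(X)| > k*sigma) for a scipy.stats distribution
        mu, sigma = dist.mean(), dist.std()
        return dist.sf(mu + k*sigma) + dist.cdf(mu - k*sigma)

    for k in (2, 3):
        print("k =", k, "  V-P bound:", round(4/(9*k**2), 4))
        for name, dist in [("normal", norm()),
                           ("exponential", expon()),
                           ("double exponential", laplace())]:
            print("   ", name, round(tail_prob(dist, k), 4))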

We’ll set the minimum value of our Pareto random variable to be 1. As with the exponential, the Pareto cannot be 2 standard deviations less than its mean, so we look at the probability of it being more than 2 standard deviations greater than its mean. The shape parameter α must be bigger than 2 for the variance to exist. The probability of our random variable being more than k standard deviations away from its mean works out to ((α – 1)/((k + 1)α))^α and is largest as α converges down toward 2. The limiting values for k equal to 2 and 3 are 1/36 = 0.0277 and 1/64 = 0.0156 respectively. Even in this limit the Pareto doesn’t come very close to the Vysochanskiĭ-Petunin bounds.

The double exponential, also known as Laplace, has the highest probability of any of our examples of being two standard deviations from its mean, though this probability is still only a little more than half of the Vysochanskiĭ-Petunin bound. The exponential has the highest probability of being three standard deviations from its mean, still less than two-fifths of the Vysochanskiĭ-Petunin bound.

Generic bounds are useful, especially in theoretical calculations, but it’s usually possible to do much better with specific distributions.


Mortgages, banks, and Jensen’s inequality

Sam Savage’s new book The Flaw of Averages has a brilliantly simple explanation of why volatility in the housing market caused such problems for banks recently. When housing prices drop, more people default on their mortgages and obviously that hurts banks. But banks are also in trouble when the housing market is volatile, even if on average house prices are good.

Suppose there’s no change in the housing market on average. Prices go up in some areas and down in other areas. As long as the ups and downs average out, there should be no change in bank profits, right? Wrong!

When the housing market goes up a little bit, mortgage defaults drop a little bit and bank profits go up a little bit. But when the market goes down a little bit, defaults go up more than just a little bit, and bank profits go down more than a little bit. There is much more down-side potential than up-side potential. Say 95% of homeowners pay their mortgages. Then a good housing market can only improve repayments by 5%. But a bad housing market could decrease repayments by much more.

In mathematical terminology, the bank profits are a concave function of house prices. Jensen’s inequality says that if f() is a concave function (say bank profits) and X is a random variable (say, house prices) then the average of f(X) is less than f(average of X). Average profit is less than the profit from the average.
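
Here’s a toy numerical illustration in Python. The repayment function below is made up for the sake of the example (it caps repayment at 95%, so there’s little upside and plenty of downside); the point is only that for a concave f, averaging over equally likely up and down scenarios gives less than f evaluated at the average.

    def repayment_rate(price_change):
        # hypothetical concave model: repayment can't exceed 95%
        return min(0.95, 0.95 + 0.5 * price_change)

    up, down = 0.10, -0.10   # prices rise or fall 10%, equally likely
    avg_of_f = 0.5 * repayment_rate(up) + 0.5 * repayment_rate(down)
    f_of_avg = repayment_rate(0.5 * up + 0.5 * down)
    print(avg_of_f, f_of_avg)   # 0.925 < 0.95: average of f(X) < f(average of X)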


Three books on inequalities

The classic book on inequalities is Inequalities by Hardy, Littlewood, and Pólya, first published in 1934 (ISBN 0521358809). I’ve mentioned it a few times before. The book is quite thorough. I imagine more people use it as a reference than read it cover to cover.

The Cauchy-Schwarz Master Class by J. Michael Steele (ISBN 052154677X) is a more recent book on inequalities, published in 2004. The preface explains what the title means by a “master class.”

In the fine arts, a master class is a small class where students and coaches work together to support a high level of technical and creative excellence. This book tries to capture the spirit of a master class while providing coaching for readers who want to refine their skills as solvers of problems, especially those problems dealing with mathematical inequalities.

The Cauchy-Schwarz Master Class book lives up to its stated aim. Reading the book and working through the exercises reminds me of taking music lessons. The emphasis is on learning techniques rather than cataloging specific results.

The final book I’ll mention is When Less is More: Visualizing Basic Inequalities by Claudi Alsina and Roger B. Nelsen (ISBN 0883853426). Like The Cauchy-Schwarz Master Class, this book emphasizes problem solving technique. More specifically, it emphasizes geometric techniques for understanding and proving inequalities. (I’ve written a review of When Less is More, but unfortunately you have to log in as an MAA member to read it.) When Less is More is as concerned with beauty as with truth.

You can tell from the cover art and the architectural allusion in the title that this book cares about aesthetics. Each proof is a polished gem.


Means and inequalities for functions

A post on Monday looked at means and inequalities for lists of non-negative numbers. This post looks at analogous means and inequalities for non-negative functions. We go from means defined in terms of sums to means defined in terms of integrals.

Let p(x) be a non-negative function that integrates to 1. (That makes p(x) a probability density, but you don’t have to think of this in terms of probability.) Think of p(x) as being a weighting function. The dependence on p(x) will be implicit. For a non-negative function f(x) and real number r ≠ 0, define Mr( f ) by

M_r(f) = \left(\int p(x) \left( f(x) \right)^r \, dx \right)^{1/r}

If r > 0 and the integral defining Mr( f ) diverges, we say Mr( f ) = ∞. If r < 0 and the integral diverges, we say Mr( f ) = 0.

Define the geometric mean of the function f by

G(f) = \exp\left( \int p(x) \log f(x)\, dx \right)

There are a few special cases to consider when defining G(f). See Inequalities for details.

First we give several limiting theorems.

  • As r → –∞, Mr( f ) → min( f )
  • As r → +∞, Mr( f ) → max( f )
  • As r → 0+, Mr( f ) → G( f )

And now for the big theorem: If r ≤ s, then Mr( f ) ≤ Ms( f ). The conditions under which equality holds are a little complicated. Again, see Inequalities for details.

We could derive analogous results for infinite sums since sums are just a special case of integrals.

The assumption that the weight function p(x) has a finite integral is critical. We could change the definition of Mr( f ) slightly to accommodate the case that the integral of p(x) is finite but not equal to 1, and all the conclusions above would remain true. But if we allowed p(x) to have a divergent integral, the theorems would not hold. For example, suppose p(x) is constantly 1 and our region of integration is (0, ∞). Then Mr( f ) might be more or less than Ms( f ) depending on f. Let f(x) = b exp( –bx ) for some b > 0. Then M1( f ) = 1 but M∞( f ) = b, so M1( f ) is greater than M∞( f ) when b < 1 and less than M∞( f ) when b > 1.
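
Here’s a small numerical sketch in Python (not from the original post). It computes Mr( f ) by numerical integration over (0, ∞), using the standard exponential density as the weight p(x) and an arbitrary positive function f; the computed means should increase with r.

    from math import exp
    from scipy.integrate import quad

    def M(r, f, p):
        # M_r(f) = (integral of p(x) f(x)^r dx)^(1/r) over (0, infinity), for r nonzero
        integral, _ = quad(lambda x: p(x) * f(x)**r, 0, float("inf"))
        return integral**(1/r)

    p = lambda x: exp(-x)    # weight: a probability density on (0, infinity)
    f = lambda x: 1 + x      # an arbitrary positive function

    for r in (-1, 1, 2, 3):
        print(r, M(r, f, p))   # the values should increase with r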

Means and inequalities

The arithmetic mean of two numbers a and b is (a + b)/2.

The geometric mean of a and b is √(ab).

The harmonic mean of a and b is 2/(1/a + 1/b).

This post will generalize these definitions of means and state a general inequality relating the generalized means.

Let x be a vector of non-negative real numbers, x = (x1, x2, x3…, xn). Define Mr( x ) to be

M_r(x) = \left( \frac{1}{n} \sum_{i = 1}^n x_i^r \right)^{1/r}

unless r = 0 or r is negative and one of the xi is zero. If r = 0, define Mr( x ) to be the limit of Mr( x ) as r decreases to 0. And if r is negative and one of the xi is zero, define Mr( x ) to be zero. The arithmetic, geometric, and harmonic means correspond to M1, M0, and M-1 respectively.

Define M∞( x ) to be the limit of Mr( x ) as r goes to ∞. Similarly, define M-∞( x ) to be the limit of Mr( x ) as r goes to –∞. Then M∞( x ) equals max(x1, x2, x3…, xn) and M-∞( x ) equals min(x1, x2, x3…, xn).

In summary, the minimum, harmonic mean, geometric mean, arithmetic mean and maximum are all special cases of Mr( x ) corresponding to r = –∞, –1, 0, 1, and ∞ respectively. Of course other values of r are possible; these five are just the most familiar. Another common example is the root-mean-square (RMS) corresponding to r = 2.

A famous theorem says that the geometric mean is never greater than the arithmetic mean. This is a very special case of the following theorem.

If r ≤ s then Mr( x ) ≤ Ms( x ).

In fact we can say a little more. If r < s then Mr( x ) < Ms( x ) unless x1 = x2 = x3 = … = xn or s ≤ 0 and one of the xi is zero.

We could generalize the means Mr a bit more by introducing positive weights pi such that p1 + p2 + p3 + … + pn = 1. We could then define Mr( x ) as

M_r(x) = \left( \sum_{i = 1}^n p_i x_i^r \right)^{1/r}

with the same fine print as in the previous definition. The new definition reduces to the earlier one when each pi = 1/n. The statements above about the means Mr( x ) continue to hold under this more general definition.
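
And a sketch of the weighted version (again not from the original post), assuming r ≠ 0 and positive weights that sum to 1:

    def weighted_power_mean(x, w, r):
        # weighted M_r with positive weights w summing to 1, r nonzero
        return sum(wi * xi**r for wi, xi in zip(w, x))**(1/r)

    x = [1, 2, 4, 8]
    w = [0.4, 0.3, 0.2, 0.1]
    print(weighted_power_mean(x, w, 1), weighted_power_mean(x, w, 2))   # M_1 <= M_2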

For more on means and inequalities, see Inequalities by Hardy, Littlewood, and Pólya.
