How well do moments determine a probability distribution?

If two random variables X and Y have the same first few moments, how different can their distributions be?

Suppose E[Xⁱ] = E[Yⁱ] for i = 0, 1, 2, …, 2p. Then there is a polynomial P(x) of degree 2p such that

|F(x) − G(x)| ≤ 1/P(x)

where F and G are the CDFs of X and Y respectively.
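Here is a quick numerical sanity check on a pair of distributions we chose ourselves for illustration. X takes the values −1 and +1 with probability 1/2 each, and Y is standard normal. Both have E[X⁰] = 1, E[X] = 0 and E[X²] = 1, so their moments match through order 2p = 2 with p = 1. For these moments the polynomial works out to P(x) = 1 + x², so the sketch checks |F(x) − G(x)| ≤ 1/(1 + x²) at a handful of points. The helper names are hypothetical.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal distribution, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def coin_cdf(x):
    """CDF of a variable taking the values -1 and +1 with probability 1/2 each."""
    if x < -1:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

def bound(x):
    """1/P(x) with P(x) = 1 + x^2, the p = 1 polynomial for moments 1, 0, 1."""
    return 1.0 / (1.0 + x * x)

# Both variables have moments 1, 0, 1 for orders 0, 1, 2 (so p = 1).
for x in [-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0]:
    gap = abs(coin_cdf(x) - normal_cdf(x))
    print(f"x = {x:5.1f}  |F - G| = {gap:.4f}  bound = {bound(x):.4f}")
    assert gap <= bound(x)
```

Near x = 0 the bound is close to 1 and says almost nothing. By x = 3 it has shrunk to 0.1, which is consistent with the paper's title.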

The polynomial P(x) is given by

Vᵀ M⁻¹ V

where V is a vector of dimension p + 1 and M is a (p + 1) × (p + 1) matrix. The ith element of V is xⁱ, and the (i, j) element of M is E[X^(i+j)], with indexes starting from 0.
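The construction of P(x) takes only a few lines of Python. This is a minimal sketch that assumes NumPy is available. The function names `moment_matrix` and `moment_poly` are our own, and the standard normal moments are just a convenient test case, not an example taken from the paper.

```python
import numpy as np

def moment_matrix(moments, p):
    """(p+1) x (p+1) matrix whose (i, j) entry is E[X^(i+j)], indexes from 0.

    `moments[k]` must hold E[X^k] for k = 0, ..., 2p.
    """
    if len(moments) < 2 * p + 1:
        raise ValueError("need moments E[X^0] through E[X^(2p)]")
    return np.array([[moments[i + j] for j in range(p + 1)]
                     for i in range(p + 1)], dtype=float)

def moment_poly(x, moments, p):
    """Evaluate P(x) = V^T M^{-1} V with V = (1, x, ..., x^p)."""
    M = moment_matrix(moments, p)
    V = np.array([x ** i for i in range(p + 1)], dtype=float)
    # Solve M w = V instead of forming M^{-1}; then V^T w = V^T M^{-1} V.
    w = np.linalg.solve(M, V)
    return float(V @ w)

# Standard normal moments E[X^k] for k = 0..4.
normal_moments = [1.0, 0.0, 1.0, 0.0, 3.0]
for x in [0.0, 1.0, 2.0]:
    print(x, moment_poly(x, normal_moments, 1), moment_poly(x, normal_moments, 2))
```

For the standard normal this gives P(x) = 1 + x² when p = 1 and P(x) = (3 + x⁴)/2 when p = 2. Solving the linear system rather than inverting M is a common numerical choice; both give the same quadratic form.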

Reference: “Moments determine the tail of a distribution (but not much else)” by Bruce Lindsay and Prasanta Basak, The American Statistician, Vol. 54, No. 4, pp. 248–251.

12 thoughts on “How well do moments determine a distribution?”

Steven H. Noble

16 November 2012 at 21:58

The title of that paper really helps put this result in context.

Jan Galkowski

17 November 2012 at 00:21

Yes, I agree with Mr Noble: this is neat, but to be appreciated beyond the “priesthood”, it needs to be wrapped in explanatory language.

Jan Van lent

17 November 2012 at 08:13

I haven’t read the linked article, but I think the following Wikipedia links for the moment and truncated moment problem are relevant.
The second link shows a connection with orthogonal polynomials.

http://en.wikipedia.org/wiki/Moment_problem
http://en.wikipedia.org/wiki/Chebyshev%E2%80%93Markov%E2%80%93Stieltjes_inequalities

David Feuer

17 November 2012 at 16:54

If I read this summary correctly, if two distributions have the same first 2p-1 moments, then if you know the first 4p moments of one of the distributions, you can determine inverse-polynomial bounds for the difference between them. Those bounds tend to infinite width near the shared mean (revealing less than the trivial bound |F(x)-G(x)|≤1). The larger the higher moments of the chosen distribution, the faster the bound narrows. That’s how I read it anyway.

John

17 November 2012 at 17:30

David: Your comment made me realize I’d incorrectly written the dimensions of V and M. The matrix M depends only on the 2p moments in common between X and Y.

I updated the post. Thanks for pointing out the error.

Jonathan

18 November 2012 at 10:31

If V is of dimension p+1, wouldn’t this make P a polynomial of degree p?

John

18 November 2012 at 13:00

Jonathan: You multiply by V on the left and the right. That’s why it’s degree 2p. For example, if the matrix in the middle were the identity, you’d get 1 + x^2 + x^4 + … + x^(2p).
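Expanding the quadratic form term by term makes the degree count explicit. The highest power, x^(2p), comes only from the i = j = p term. Its coefficient, the (p, p) entry of M⁻¹, is positive whenever M is positive definite, so the degree really is 2p:

```latex
V^\top M^{-1} V = \sum_{i=0}^{p} \sum_{j=0}^{p} (M^{-1})_{ij}\, x^{i+j},
\qquad
V^\top I\, V = \sum_{i=0}^{p} x^{2i} = 1 + x^2 + x^4 + \cdots + x^{2p}.
```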

Vladimir Bakhrushin

19 November 2012 at 10:07

This is an interesting result. But I would like to draw attention to one circumstance: it is assumed that the moments we know, we know exactly. In practice, as a rule, this is not so. It would be interesting to obtain estimates that take errors in the moments into account.

Mirek Kukla

19 November 2012 at 16:28

I’m not exactly sure how to interpret this result.

On the one hand, the title of the paper is “Moments determine the tail of a distribution (but not much else).”

On the other hand, the fact that there is a polynomial P(x) where |F(x) – G(x)| ≤ 1/P(x) doesn’t tell us much if we don’t know what P(x) looks like. Moreover, this is an upper bound, which limits the dissimilarity of the distributions.

In other words, the title of the paper implies that moments don’t determine a distribution very well, whereas the result allows you to conclude nothing more than that moments *might* match a distribution quite well.

What’s the takeaway, then?

John

19 November 2012 at 17:24

For a particular set of moments, you can calculate the matrix M and get P(x) exactly. The paper I reference gives specific examples.

But in any case, you know that asymptotically the difference between the two distribution functions is O(1/x^(2p)). The reason moments tell you more in the tails than in the middle is that for sufficiently large values of x, only the leading term in the polynomial matters.
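The asymptotic claim can be checked numerically. The x^(2p) term of P(x) has coefficient equal to the (p, p) entry of M⁻¹, so the ratio P(x)/x^(2p) should approach that entry as x grows, and 1/P(x) shrinks like a constant times 1/x^(2p). This is a minimal sketch assuming NumPy. It again uses standard normal moments with p = 2 as an illustrative choice, and `moment_poly` is a hypothetical helper name.

```python
import numpy as np

def moment_poly(x, moments, p):
    """P(x) = V^T M^{-1} V, with M[i][j] = E[X^(i+j)] and V = (1, x, ..., x^p)."""
    M = np.array([[moments[i + j] for j in range(p + 1)]
                  for i in range(p + 1)], dtype=float)
    V = np.array([x ** i for i in range(p + 1)], dtype=float)
    return float(V @ np.linalg.solve(M, V))

# Standard normal moments through order 2p = 4.
moments = [1.0, 0.0, 1.0, 0.0, 3.0]
p = 2
M = np.array([[moments[i + j] for j in range(p + 1)]
              for i in range(p + 1)], dtype=float)
leading = np.linalg.inv(M)[p, p]  # coefficient of x^(2p) in P(x)

for x in [1.0, 10.0, 100.0]:
    ratio = moment_poly(x, moments, p) / x ** (2 * p)
    print(f"x = {x:6.1f}  P(x)/x^{2 * p} = {ratio:.6f}  leading coefficient = {leading:.6f}")
```

For these moments P(x) = (3 + x⁴)/2, so the ratio is 2 at x = 1 and approaches 0.5 as x increases.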

David W Locke

25 May 2018 at 15:10

Statistical inference depends on the tails. So how do we get to “Moments determine the tail[s] of a distribution (but not much else).”

The core is not informative with or without moments.

David W Locke

15 April 2019 at 21:48

Alas, statistical inference is tail driven. Alpha and Beta tell us how much of the tails we used. Cores tell us nothing.
