How well do moments determine a distribution?

If two random variables X and Y have the same first few moments, how different can their distributions be?

Suppose E[X^i] = E[Y^i] for i = 0, 1, 2, …, 2p. Then there is a polynomial P(x) of degree 2p such that

|F(x) – G(x)| ≤ 1/P(x)

where F and G are the CDFs of X and Y respectively.

The polynomial P(x) is given by

P(x) = V^T M^{-1} V

where V is a vector of dimension p+1 and M is a (p+1) × (p+1) matrix. The ith element of V is x^i and the (i, j) element of M is E[X^{i+j}], with indices starting from 0.
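
To make the definition concrete, here is a minimal sketch in Python with NumPy that builds M from a list of moments and evaluates P(x). The function name `moment_bound_polynomial` is made up for this illustration, and the standard normal moments 1, 0, 1, 0, 3 (so p = 2) are just a convenient test case, not an example taken from the paper. It also assumes M is invertible.

```python
import numpy as np

def moment_bound_polynomial(moments, x):
    """Evaluate P(x) = V^T M^{-1} V given moments E[X^0], ..., E[X^{2p}].

    Hypothetical helper for illustration; assumes the moment matrix M
    is invertible.
    """
    moments = np.asarray(moments, dtype=float)
    p = (len(moments) - 1) // 2
    if len(moments) != 2 * p + 1:
        raise ValueError("need an odd number of moments: E[X^0] through E[X^{2p}]")
    idx = np.arange(p + 1)
    # (i, j) entry of M is E[X^{i+j}], with indices starting from 0
    M = moments[idx[:, None] + idx[None, :]]
    # ith entry of V is x^i
    V = float(x) ** idx
    # Solve M w = V rather than forming the inverse explicitly
    return float(V @ np.linalg.solve(M, V))

# Assumed example: moments of a standard normal, E[X^0] through E[X^4]
normal_moments = [1, 0, 1, 0, 3]
for x in (0.0, 1.0, 2.0, 3.0):
    P = moment_bound_polynomial(normal_moments, x)
    print(f"x = {x}: P(x) = {P:.4f}, bound 1/P(x) = {1 / P:.4f}")
```

For these particular moments the algebra works out to P(x) = 1.5 + 0.5 x^4, so the bound is weak near 0 and tightens quickly in the tails.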

Reference: “Moments determine the tail of a distribution (but not much else)” by Bruce Lindsay and Prasanta Basak, The American Statistician, Vol 54, No 4, p. 248–251.

10 comments on “How well do moments determine a distribution?”
  1. The title of that paper really helps put this result in context.

  2. Yes, agree with Mr Noble: This is neat, but to be appreciated beyond the “priesthood”, needs to be wrapped in explanatory language.

  3. Jan Van lent says:

    I haven’t read the linked article, but I think the following Wikipedia links for the moment and truncated moment problem are relevant.
    The second link shows a connection with orthogonal polynomials.

  4. David Feuer says:

    If I read this summary correctly, if two distributions have the same first 2p-1 moments, then if you know the first 4p moments of one of the distributions, you can determine inverse-polynomial bounds for the difference between them. Those bounds tend to infinite width near the shared mean (revealing less than the trivial bound |F(x)-G(x)|≤1). The larger the higher moments of the chosen distribution, the faster the bound narrows. That’s how I read it anyway.

  5. John says:

    David: Your comment made me realize I’d incorrectly written the dimensions of V and M. The matrix M depends only on the 2p moments in common between X and Y.

    I updated the post. Thanks for pointing out the error.

  6. Jonathan says:

    If V is of dimension p+1, wouldn’t this make P a polynomial of degree p?

  7. John says:

    Jonathan: You multiply by V on the left and the right. That’s why it’s degree 2p. For example, if the matrix in the middle were the identity, you’d get 1 + x^2 + … + x^{2p}.

  8. Vladimir Bakhrushin says:

    This is an interesting result. But I would like to draw attention to one circumstance: it is assumed that the moments we know, we know exactly. As a rule, that is not so. It would be interesting to obtain estimates that take the errors in the moments into account.

  9. Mirek Kukla says:

    I’m not exactly sure how to interpret this result.

    On the one hand, the title of the paper is “Moments determine the tail of a distribution (but not much else).”

    On the other hand, the fact that there is a polynomial P(x) where |F(x) – G(x)| ≤ 1/P(x) doesn’t tell us much if we don’t know what P(x) looks like. Moreover, this is an upper bound, which limits the dissimilarity of the distributions.

    In other words, the title of the paper implies that moments don’t determine a distribution very well, whereas the result allows you to conclude nothing more than that moments *might* match a distribution quite well.

    What’s the takeaway, then?

  10. John says:

    For a particular set of moments, you can calculate the matrix M and get P(x) exactly. The paper I reference gives specific examples.

    But in any case, you know that asymptotically the difference between the two distribution functions is O(1/x^{2p}). The reason moments tell you more in the tails than in the middle is that for sufficiently large values of x, only the leading term in the polynomial matters.
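
    One way to spell out the leading-term argument, as a sketch that assumes M is invertible (and hence positive definite, so the diagonal entries of its inverse are positive): expanding the quadratic form gives

    ```latex
    P(x) = V^T M^{-1} V = \sum_{i=0}^{p} \sum_{j=0}^{p} (M^{-1})_{ij}\, x^{i+j},
    \qquad
    \lim_{|x| \to \infty} \frac{P(x)}{x^{2p}} = (M^{-1})_{pp} > 0 .
    ```

    So for large |x| the bound 1/P(x) behaves like 1/((M^{-1})_{pp} x^{2p}), where (M^{-1})_{pp} is the bottom-right entry of the inverse moment matrix.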