In geometry, you’d say that if a square has side *x*, then it has area *x*^{2}.

In calculus, you’d say more. First you’d say that if a square has side *near* *x*, then it has area *near* *x*^{2}. That is, area is a continuous function of the length of a side. As the length of the side changes, there’s never an abrupt jump in area. Next you could be more specific and say that a small change Δ*x* to a side of length *x* corresponds to approximately a change of 2*x* Δ*x* in the area.

In probability, you ask what the area of a square looks like when you pick the length of its side at random. If you pick the length of the side from a distribution with mean μ, does the distribution of the area have mean μ^{2}? Not in general, but if the probability distribution on side length is tightly concentrated around μ, then the distribution on area will be concentrated near μ^{2}. And you can approximate just how near the area is to μ^{2} using the delta method, analogous to the calculus discussion above.
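To make the delta method concrete, here is a minimal sketch. It assumes a normal distribution on side length with a hypothetical mean of 10 and standard deviation of 0.1; neither number comes from anything above. For the function *x*^{2}, whose derivative is 2*x*, the delta method says the standard deviation of the area should be roughly 2μσ.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma = 10.0, 0.1  # hypothetical side length: mean 10, small spread
N = 200_000

sides = [random.gauss(mu, sigma) for _ in range(N)]
areas = [s * s for s in sides]

mean_area = sum(areas) / N
sd_area = math.sqrt(sum((a - mean_area) ** 2 for a in areas) / N)

# Delta method for g(x) = x**2: sd of g(X) is roughly |g'(mu)| * sigma
delta_sd = 2 * mu * sigma

print(mean_area, mu**2)   # mean area should sit close to mu**2
print(sd_area, delta_sd)  # simulated spread should sit close to 2*mu*sigma
```

The approximation gets worse as σ grows relative to μ, which is where the less concentrated case comes in.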

If the distribution on side lengths is not particularly concentrated, finding the distribution on the area is more interesting. It will depend on the specific distribution on side length, and the mean area might not be particularly close to the square of the mean side length. The function to compute area is trivial, and yet the question of what happens when you stick a random variable into that function is not trivial. **Random variables** behave as you might expect when you stick them into linear functions, but **offer surprises when you stick them into nonlinear functions**.
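Here is a small sketch of that contrast, using perimeter (a linear function of the side) against area (a nonlinear one). The uniform distribution on [1, 3] is just an assumption picked for illustration.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable
N = 100_000

# Hypothetical side lengths, uniform on [1, 3]
xs = [random.uniform(1, 3) for _ in range(N)]

def mean(values):
    return sum(values) / len(values)

m = mean(xs)

# Linear: perimeter = 4 * side. The mean perimeter is 4 times the mean side.
mean_perimeter = mean([4 * x for x in xs])

# Nonlinear: area = side**2. The mean area is not the square of the mean side.
mean_area = mean([x * x for x in xs])

print(mean_perimeter, 4 * m)  # these agree
print(mean_area, m * m)       # these do not
```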

Suppose you pick the length of the side of a square uniformly from the interval [0, 1]. Then the average side is 1/2, and so you might expect the average area to be 1/4. But the expected area is actually 1/3. You could see this a couple of ways, analytically and empirically.

First an analytical derivation. If *X* has a uniform [0, 1] distribution and *Z* = *X*^{2}, then the CDF of *Z* is

Prob(*Z* ≤ *z*) = Prob(*X* ≤ √*z*) = √*z* for 0 ≤ *z* ≤ 1,

and so the PDF for *Z*, the derivative of the CDF, is 1/(2√*z*). From there you can compute the expected value by integrating *z* times the PDF.
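Written out, the integration goes as follows. The symbol f_Z for the density of *Z* is my own notation; it comes from differentiating the CDF √*z*.

```latex
\[
f_Z(z) = \frac{d}{dz}\sqrt{z} = \frac{1}{2\sqrt{z}}, \qquad 0 < z \le 1,
\]
\[
E[Z] = \int_0^1 z \cdot \frac{1}{2\sqrt{z}}\,dz
     = \frac{1}{2}\int_0^1 \sqrt{z}\,dz
     = \frac{1}{2}\cdot\frac{2}{3}
     = \frac{1}{3}.
\]
```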

You could check your calculations by seeing whether simulation gives you similar results. Here’s a little Python code to do that.

    from random import random

    N = 1000000
    print( sum(random()**2 for _ in range(N))/N )

When I run this, I get 0.33386, close to 1/3.

Now let's look at an exponential distribution on side length with mean 1. Then a calculation similar to the one above shows that the expected value of the area is 2. You can also check this with simulation. This time we’ll be a little fancier and let SciPy generate our random values for us.

    from scipy.stats import expon

    print( sum(expon.rvs(size=N)**2)/N )

When I ran this, I got 1.99934, close to the expected value of 2.
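The value 2 can be derived with the same CDF argument used for the uniform case. This is a sketch of one way to do it, with f_Z again my own notation for the density of the area; the substitution u = √z turns the integral into a gamma function.

```latex
\[
\Pr(Z \le z) = \Pr(X \le \sqrt{z}) = 1 - e^{-\sqrt{z}}, \qquad z \ge 0,
\]
\[
f_Z(z) = \frac{e^{-\sqrt{z}}}{2\sqrt{z}},
\]
\[
E[Z] = \int_0^\infty z \cdot \frac{e^{-\sqrt{z}}}{2\sqrt{z}}\,dz
     = \int_0^\infty u^2 e^{-u}\,du
     = \Gamma(3)
     = 2 \qquad (u = \sqrt{z},\ dz = 2u\,du).
\]
```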

You’ll notice that in both examples, the expected value of the area is more than the square of the expected value of the side. This is not a coincidence but a consequence of Jensen’s inequality. Squaring is a convex function, so the expected value of the square is at least as large as the square of the expected value for any random variable, and strictly larger unless the random variable is constant. In fact the gap between the two is exactly the variance.
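For squaring, the size of the gap in Jensen's inequality is the variance, since E[*X*^{2}] minus the square of E[*X*] is the definition of Var(*X*). Here is a rough sketch that measures the gap by simulation using only the standard library. The helper name `jensen_gap` is made up for this example, and the constant side length of 0.5 is an arbitrary choice to show the zero-variance case.

```python
import random

random.seed(2)  # fixed seed so the run is repeatable
N = 100_000

def jensen_gap(samples):
    """Return (mean of squares) - (square of mean) for a list of numbers."""
    n = len(samples)
    mean = sum(samples) / n
    mean_sq = sum(x * x for x in samples) / n
    return mean_sq - mean * mean

uniform_sides = [random.random() for _ in range(N)]        # variance 1/12
expon_sides = [random.expovariate(1.0) for _ in range(N)]  # variance 1
constant_sides = [0.5] * N                                 # variance 0

gap_uniform = jensen_gap(uniform_sides)
gap_expon = jensen_gap(expon_sides)
gap_constant = jensen_gap(constant_sides)

print(gap_uniform)   # should be close to 1/12
print(gap_expon)     # should be close to 1
print(gap_constant)  # no gap when the side never varies
```

The two simulated gaps line up with the earlier results: 1/3 − 1/4 = 1/12 for the uniform case and 2 − 1 = 1 for the exponential case.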