Fundamental theorem of calculus generalized

The first fundamental theorem of calculus says that integration undoes differentiation.

The second fundamental theorem of calculus says that differentiation undoes integration.

This post looks at the fine print of these two theorems, in their basic forms and when generalized to Lebesgue integration.

Second fundamental theorem of calculus

We’ll start with the second fundamental theorem because it’s simpler. In its basic form, it says that if f is a continuous function on an open interval I, and a is a point in I, then the function F defined by

F(x) = \int_a^x f(t)\, dt

is an antiderivative for f on the interval I, i.e.

F'(x) = f(x)

for all x in I. In that sense differentiation undoes integration.
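As a quick numerical illustration (a sketch, not part of the theorem; the choice f = cos, a = 0, and the evaluation point x = 1.2 are mine), we can approximate F by the trapezoid rule and check that a finite-difference derivative of F recovers f:

from math import cos

# F(x) = integral of f from a to x, approximated by the trapezoid rule
def F(f, a, x, n=10000):
    h = (x - a) / n
    total = 0.5 * (f(a) + f(x))
    for k in range(1, n):
        total += f(a + k*h)
    return total * h

f, a, x, h = cos, 0.0, 1.2, 1e-4

# centered difference quotient of F at x should be close to f(x)
print((F(f, a, x + h) - F(f, a, x - h)) / (2*h))
print(f(x))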

If we remove the requirement that f be continuous, we still have F′ = f almost everywhere as long as f is absolutely integrable, i.e. the integral of |f| over I is finite. In more detail,

F'(x) = f(x)

at every Lebesgue point x, i.e. every point x that satisfies

\lim_{\varepsilon \to 0} \, \frac{1}{2\varepsilon}\int_{x - \varepsilon}^{x + \varepsilon} |f(x) - f(y)| \ dy = 0
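As a concrete example (not from the original post), let f be the indicator function of [0, ∞) on the interval I = (−1, 1), with a = −1. Then

F(x) = \int_{-1}^x f(t)\, dt = \max(x, 0)

so F′(x) = f(x) at every x ≠ 0. At x = 0 the Lebesgue point condition fails:

\frac{1}{2\varepsilon}\int_{-\varepsilon}^{\varepsilon} |f(0) - f(y)| \, dy = \frac{1}{2\varepsilon}\int_{-\varepsilon}^{0} 1 \, dy = \frac{1}{2}

and indeed F is not differentiable at 0.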

First fundamental theorem of calculus

The first fundamental theorem of calculus says that if the derivative of F is f and f is continuous on an interval [a, b], then

\int_a^b f(x) \, dx = F(b) - F(a)

So if F has a continuous derivative, then integration undoes differentiation. What if F is continuous but only differentiable almost everywhere rather than at every point? Then the theorem doesn’t necessarily hold. But the theorem does hold if we require F to be absolutely continuous rather than just continuous.

An absolutely continuous function maps sets of measure zero to sets of measure zero. It’s not easy to imagine continuous functions that are not absolutely continuous, but Cantor’s function, a.k.a. the Devil’s staircase, provides an example: it takes the Cantor set, a set of measure zero, to a set of measure one.

The usual definition of absolute continuity takes the definition of uniform continuity and applies it to finitely many intervals at once. That is, for every ε > 0 there exists a δ > 0 such that for any finite collection of non-overlapping intervals of total length less than δ, the sum of the variation of f over the intervals is less than ε. With a single interval this is the definition of uniform continuity, so absolute continuity is a more demanding criterion than uniform continuity.
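To see concretely how the first fundamental theorem fails without absolute continuity, here is a small sketch (not from the original post) that evaluates the Cantor function by the usual ternary-digit construction. The Cantor function is constant on each gap in the Cantor set, so its derivative is 0 almost everywhere and the integral of the derivative over [0, 1] is 0, yet F(1) − F(0) = 1.

def cantor(x, depth=50):
    # Approximate the Cantor function: write x in base 3 up to the first
    # digit equal to 1, change 2s to 1s, and read the result in base 2.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    result, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = min(int(x), 2)
        x -= digit
        if digit == 1:                   # first ternary digit equal to 1
            return result + scale
        result += scale * (digit // 2)   # ternary 2 -> binary 1
        scale /= 2
    return result

print(cantor(1.0) - cantor(0.0))  # 1.0
print(cantor(0.6) - cantor(0.4))  # 0.0: constant on the middle-third gap (1/3, 2/3)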

Fractional parts, invariant measures, and simulation

A function f: X → X is measure-preserving if each iteration of f sends the same amount of stuff into a given set. To be more precise, given a measure μ and any μ-measurable set E with μ(E) > 0, we have

\mu( E ) = \mu( f^{-1}(E) )

You can read the right side of the equation above as “the measure of the set of points that f maps into E.” You can apply this condition repeatedly to see that the measure of the set of points mapped into E after n iterations is still just the measure of E.

If X is a probability space, i.e. μ(X) = 1, then you could interpret the definition of measure-preserving to mean that the probability that a point ends up in E after n iterations is independent of n. We’ll illustrate this below with a simulation.

Let X be the half-open unit interval (0, 1] and let f be the Gauss map, i.e.

f(x) = \frac{1}{x} - \left\lfloor \frac{1}{x}\right\rfloor

where ⌊z⌋ is the integer part of z. The function f is measure-preserving, though not for the usual Lebesgue measure. Instead it preserves the following measure:

\mu(E) = \frac{1}{\log 2} \int_E \frac{1}{1+x} \, dx

Let’s take as our set E an interval [a, b] and test via simulation whether the probability of landing in E after n iterations really is just the measure of E, independent of n.

We can’t just generate points uniformly in the interval (0, 1]. We have to generate the points so that the probability of a point coming from a set E is μ(E). That means the PDF of the distribution must be p(x) = 1 / (log(2) (1 + x)). The corresponding CDF is P(x) = log(1 + x) / log(2), so the inverse CDF is P⁻¹(u) = 2^u − 1. We use the inverse-CDF method to generate points with this density in the Python code below.

from math import log, floor
from random import random

def gauss_map(x):
    y = 1.0/x
    return y - floor(y)

# iterate gauss map n times
def iterated_gauss_map(x, n):
    while n > 0:
        x = gauss_map(x)
        n = n - 1
    return x      

# measure mu( [a,b] )
def mu(a, b):
    return (log(1.0+b) - log(1.0+a))/log(2.0)

# Generate samples with Prob(x in E) = mu( E )
def sample():
    u = random()
    return 2.0**u - 1.0

def simulate(num_points, num_iterations, a, b):
    count = 0
    for _ in range(num_points):
        x = sample()
        y = iterated_gauss_map(x, num_iterations)
        if a < y < b:
            count += 1
    return count / num_points

# Change these parameters however you like
a, b = 0.1, 0.25
N = 1000000

exact = mu(a, b)
print("Exact probability:", exact)
print("Simulated probability after n iterations")
for n in range(1, 10):
    simulated = simulate(N, n, a, b)
    print("n =", n, ":", simulated)

Here’s the output I got:

Exact probability: 0.18442457113742736
Simulated probability after n iterations
n = 1 : 0.184329
n = 2 : 0.183969
n = 3 : 0.184233
n = 4 : 0.184322
n = 5 : 0.184439
n = 6 : 0.184059
n = 7 : 0.184602
n = 8 : 0.183877
n = 9 : 0.184834

With 1,000,000 samples, the standard error is about √(p(1 − p)/N) ≈ 0.0004, so we expect the simulated values to agree with the exact probability to about three decimal places, which is what we see above.
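As a complementary check, not in the original post, we can compute μ(f⁻¹([a, b])) directly rather than by simulation. On the branch (1/(k+1), 1/k] the Gauss map sends x to 1/x − k, so the part of the preimage of [a, b] in that branch is the interval [1/(k+b), 1/(k+a)]. Summing the measures of these intervals over k (the sum in fact telescopes to μ([a, b])) gives the same answer:

from math import log

def mu(a, b):
    # Gauss measure of [a, b], same as mu() above
    return (log(1.0 + b) - log(1.0 + a)) / log(2.0)

def mu_preimage(a, b, num_branches=100000):
    # preimage of [a, b] on branch k is [1/(k+b), 1/(k+a)]
    return sum(mu(1.0/(k + b), 1.0/(k + a)) for k in range(1, num_branches + 1))

a, b = 0.1, 0.25
print(mu(a, b))           # 0.18442457...
print(mu_preimage(a, b))  # agrees to several decimal places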

Related post: Irrational rotations are ergodic.

(A transformation f is ergodic if it is measure-preserving and the only sets E with f⁻¹(E) = E are those with measure 0 or full measure. Rational rotations are measure-preserving but not ergodic. The Gauss map above is ergodic.)

Two definitions of expectation

In an introductory probability class, the expected value of a random variable X is defined as

E(X) = \int_{-\infty}^\infty x\, f_X(x) \,dx

where f_X is the probability density function of X. I’ll call this the analytical definition.

In a more advanced class the expected value of X is defined as

E(X) = \int_\Omega X \,dP

where (Ω, P) is a probability space. I’ll call this the measure-theoretic definition. It’s not obvious that these two definitions are equivalent. They may even seem contradictory unless you look closely: they’re integrating different functions over different spaces.
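In outline, the bridge between them is the distribution of X, that is, the pushforward measure P_X on the real line defined by P_X(B) = P(X ∈ B). The change-of-variables theorem for pushforward measures gives

\int_\Omega X \,dP = \int_{-\infty}^\infty x \, dP_X(x)

and if X has a density f_X, the right-hand side equals the analytical definition above. This is only a sketch; the notes mentioned below fill in the details.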

If for some odd reason you learned the measure-theoretic definition first, you could see the analytical definition as a theorem. But if, like most people, you learn the analytical definition first, the measure-theoretic version is quite mysterious. When you take an advanced course and look at the details previously swept under the rug, probability looks like an entirely different subject, unrelated to your introductory course. The definition of expectation is just one concept among many that takes some work to resolve.

I’ve written a couple pages of notes that bridge the gap between the two definitions of expectation and show that they are equivalent.