The contraction mapping theorem says that if a function moves points closer together, then there must be some point the function doesn’t move. We’ll make this statement more precise and give a historically important application.
Definitions and theorem
A function f on a metric space X is a contraction if there exists a constant q with 0 ≤ q < 1 such that for any pair of points x and y in X,
d( f(x), f(y) ) ≤ q d(x, y)
where d is the metric on X.
A point x is a fixed point of a function f if f(x) = x.
Banach’s fixed point theorem, also known as the contraction mapping theorem, says that every contraction on a complete metric space has a fixed point. The proof is constructive: start with any point in the space and repeatedly apply the contraction. The sequence of iterates will converge to the fixed point.
Application: Kepler’s equation
Kepler’s equation in for an object in an elliptical orbit says
M + e sin E = E
where M is the mean anomaly, e is the eccentricity, and E is the eccentric anomaly. These “anomalies” are parameters that describe the location of an object in orbit. Kepler solved for E given M and e using the contraction mapping theorem, though he didn’t call it that.
Kepler speculated that it is not possible to solve for E in closed form—he was right—and used a couple iterations [1] of
f(E) = M + e sin E
to find an approximate fixed point. Since the mean anomaly is a good approximation for the eccentric anomaly, M makes a good starting point for the iteration. The iteration will converge from any starting point, as we will show below, but you’ll get a useful answer sooner starting from a good approximation.
Proof of convergence
Kepler came up with his idea for finding E around 1620, and Banach stated his fixed point theorem three centuries later. Kepler had the idea of Banach’s theorem, but he didn’t have a rigorous formulation of the theorem or a proof.
In modern terminology, the real line is a complete metric space and so we only need to prove that the function f above is a contraction. By the mean value theorem, it suffices to show that the absolute value of its derivative is less than 1. That is, we can use an upper bound on |f ‘| as the q in the definition of contraction.
Now
f ‘ (E) = e cos E
and so
|f ‘(E)| ≤ e
for all E. If our object is in an elliptical orbit, e < 1 and so we have a contraction.
Example
The following example comes from [2], though the author uses Newton’s method to solve Kepler’s equation. This is more efficient, but anachronistic.
Consider a satellite on a geocentric orbit with eccentricity e = 0.37255. Determine the true anomaly at three hours after perigee passage, and calculate the position of the satellite.
The author determines that M = 3.6029 and solves Kepler’s equation
M + e sin E = E
for E, which she then uses to solve for the true anomaly and position of the satellite.
The following Python code shows the results of the first 10 iterations of Kepler’s equation.
from math import sin M = 3.6029 e = 0.37255 E = M for _ in range(10): E = M + e*sin(E) print(E)
This produces
3.437070 3.494414 3.474166 3.481271 3.478772 3.479650 3.479341 3.479450 3.479412 3.479425
and so it appears the iteration has converged to E = 3.4794 to four decimal places.
Note that this example has a fairly large eccentricity. Presumably Kepler would have been concerned with much smaller eccentricities. The eccentricity of Jupiter’s orbit, for example, is around 0.05. For such small values of e the iteration would converge more quickly.
Update: See this post for more efficient ways to solve Kepler’s equation.
Related posts
[1] Bertil Gustafsson saying in his book Scientific Computing: A Historical Perspective that Kepler only used two iterations. Since M gives a good starting approximation to E, two iterations would give a good answer. I imagine Kepler would have done more iterations if necessary but found empirically that two was enough. Incidentally, it appears Gustaffson has a sign error in his statement of Kepler’s equation.
[2] Euler Celestial Analysis by Dora Musielak.
I’ve been studying Interval Analysis recently and this excellent article strikes me as something very similar to that. Not being a mathematician, I’m not very good at articulating myself, so please forgive. For example, IA speaks of Lipschitz continuity when |f(x)-f(y)| <= L|x-y| where the metric they use is a Hausdorff Distance. They also speak of *Brouwer's* fixed point theorem, so I was very surprised to hear of Banach's.
I would love for you to write another short article on this, perhaps comparing the two or elaborating in some way how they are the same or diffferent in some way? Or perhaps I am completely wrong and you could clear up my confusion?!
Inspiration for another of your great articles – keep them coming!
Thanks. Brower’s fixed point theorem is fairly different. It’s not constructive, so it cannot be used in calculations the way Banach’s fixed point theorem can. On the other hand, it places weaker assumptions on the function in question.
A fun consequence of Brower’s fixed point theorem is that if you are standing somewhere looking at a map of that region, then some point on your map is exactly above the point it corresponds to geographically. If you crumple up the map, the theorem still holds. But if you tear the map, you the theorem doesn’t hold.