Klein bottle

One of my daughters gave me a Klein bottle for Christmas.


Imagine starting with a cylinder and joining the two ends together. This makes a torus (doughnut). But if you twist the ends before joining them, much like you twist the ends of a rectangular strip to make a Möbius strip, you get a Klein bottle. This isn’t possible to do in 3D without making the cylinder pass through itself, so you’re supposed to imagine that the part where the bottle intersects itself isn’t there.

But is a Klein bottle real? My Christmas present is a real physical object, so it’s real in that sense. Is a Klein bottle real as a mathematical object? Can it be defined without any appeal to imagining things that aren’t true? Yes it can.

Formal definition

Start with a unit square, the set of points (x, y) with 0 ≤ x, y ≤ 1. If you identify the top and bottom of the square, the points with y coordinate equal to 0 or 1, you get a cylinder. You can imagine curling the square in 3D and taping the top and bottom together.

Similarly, if you start with the unit square and identify the vertical sides together with a twist, you get a Möbius strip. You won’t be able to physically do this with a square, but you could with a rectangle. Or you could imagine the square to be made out of rubber, and you stretch it before you twist it and join the edges together.

If you start with the unit square and do both things described above—join the top and bottom as-is and join the sides with a twist—you get a Klein bottle. You can’t quite physically do both at the same time in 3D; you’d have to cut a little hole in the square to let part of the square pass through, as in the glass bottle at the top of the post.

Although you can’t construct a physical Klein bottle without a bit of cheating, there’s nothing wrong with the mathematical definition. There are some details that have been left out, but there’s nothing illegal about the construction.

More formality

To fill in the missing details, we have to say just what we mean by identifying points. When we identify the top edge and bottom edge of the square to make a cylinder, we mean that we imagine that for every x, (x, 0) and (x, 1) are the same point. Similarly, when we identify the sides with a twist, we imagine that for all y, (0, y) and (1, 1 − y) are the same point.

But this is unsatisfying. What does all this imagining mean? How is this any better than imagining that the hole in the glass bottle isn’t there? We can define what it means to “identify” or “glue” edges together in a way that’s perfectly rigorous.

We can say that as a set of points, the Klein bottle is

K = [0, 1) × [0, 1),

removing the top and right edge. But what makes this set of points a Klein bottle is the topology we put on it, the way we define which points are close together.

We define an ε neighborhood of a point (x, 0) to be the union of two half disks: the intersection with K of an open disk of radius ε centered at (x, 0) and the intersection with K of an open disk of radius ε centered at (x, 1). This is a way to make rigorous the idea of gluing (x, 0) and (x, 1) together.

Along the same lines, we define an ε neighborhood of a point (0, y) to be the union of the intersection with K of an open disk of radius ε centered at (0, y) and the intersection with K of an open disk of radius ε centered at (1, 1 − y).
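In code, the identifications amount to choosing a canonical representative for each point of the closed square. Here’s a minimal Python sketch, my own illustration rather than part of the formal construction above:

def canonical(x, y):
    # Map a point of the closed unit square to its representative in
    # K = [0, 1) x [0, 1), using (x, 0) ~ (x, 1) and (0, y) ~ (1, 1 - y).
    if x == 1:           # right edge glued to the left edge with a twist
        x, y = 0, 1 - y
    if y == 1:           # top edge glued to the bottom edge
        y = 0
    return (x, y)

# The four corners all represent the same point of the Klein bottle.
assert canonical(0, 0) == canonical(1, 0) == canonical(0, 1) == canonical(1, 1)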

The discussion with coordinates is more complicated than the talk about imagining this and that, but it’s more rigorous. You can’t have simplicity and rigor at the same time, so you alternate back and forth. You think in terms of the simple visualization, but when you’re concerned that you may be saying something untrue, you go down to the detail of coordinates and prove things carefully.

Topology can seem all hand-wavy because that’s how topologists communicate. They speak in terms of twisting this and gluing that. But they have in the back of their mind that all these manipulations can be justified. The formalism may be left implicit, even in a scholarly publication, when it’s assumed that the reader could fill in the details. But when things are more subtle, the formalism is written out.

Escaping 3D

In the construction above, we define the Klein bottle as a set of points in the 2D plane with a new topology. That works, but there’s another approach. I said above that you can’t join the edges to make a Klein bottle in three dimensions. I added this disclaimer because you can join the edges without cheating if you work in higher dimensions.

If you’d like a parameterization of the Klein bottle, say because you want to calculate something, you can do that, but you’ll need to work in four dimensions. There’s more room to move around in higher dimensions, letting you do things you can’t do in three dimensions.
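If you want something concrete to compute with, here’s a sketch of one commonly used 4D parameterization. The particular formulas and the radii R = 2, r = 1 are my choice of illustration, not something from the discussion above:

import numpy as np

def klein_4d(theta, phi, R=2.0, r=1.0):
    # A standard embedding of the Klein bottle in 4D; requires 0 < r < R.
    x = (R + r*np.cos(theta)) * np.cos(phi)
    y = (R + r*np.cos(theta)) * np.sin(phi)
    z = r*np.sin(theta) * np.cos(phi/2)
    w = r*np.sin(theta) * np.sin(phi/2)
    return np.array([x, y, z, w])

# Going once around in phi lands on the point with theta negated,
# which is exactly the twist in the gluing of the square.
p = klein_4d( 0.7, 0.3)
q = klein_4d(-0.7, 0.3 + 2*np.pi)
assert np.allclose(p, q)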

 

Automation and Validation

It’s been said that whatever you can validate, you can automate. An AI that produces correct work 90% of the time could be very valuable, provided you have a way to identify the 10% of cases where it is wrong. Often verifying a solution takes far less computation than finding a solution. Examples here.

Validating AI output can be tricky since the results are plausible by construction, though not always correct.

Consistency checks

One way to validate output is to apply consistency checks. Such checks are necessary but not sufficient, and they are often easy to implement. A simple consistency check might be that the inputs to a transaction equal the outputs. A more sophisticated consistency check might be conservation of energy or something analogous to it.
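Here’s a toy example of the first kind of check; the ledger format is made up purely for illustration:

def transaction_balances(entries, tol=1e-9):
    # Toy consistency check: signed debits and credits should sum to zero.
    return abs(sum(entries)) < tol

assert transaction_balances([100.0, -40.0, -60.0])
assert not transaction_balances([100.0, -40.0])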

Certificates

Some problems have certificates, ways of verifying that a calculation is correct that can be evaluated with far less effort than finding the solution that they verify. I’ve written about certificates in the context of optimization, solving equations, and finding prime numbers.
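As one illustration of the asymmetry (my example, not one of the linked posts): solving a linear system takes O(n³) work, but checking a purported solution takes only O(n²). The residual acts as the certificate.

import numpy as np

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

x = np.linalg.solve(A, b)             # finding the solution: O(n^3) work
residual = np.linalg.norm(A @ x - b)  # checking it: O(n^2) work
assert residual < 1e-8 * np.linalg.norm(b)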

Formal methods

Correctness is more important in some contexts than others. If a recommendation engine makes a bad recommendation once in a while, the cost is a lower probability of conversion in a few instances. If an aircraft collision avoidance system makes an occasional error, the consequences could be catastrophic.

When the cost of errors is extremely high, formal verification may be worthwhile. Formal correctness proofs using something like Lean or Rocq are extremely tedious and expensive to create, and hence not economical. But if an AI can generate a result and a formal proof of correctness, hurrah!

Who watches the watchmen?

But if an AI result can be wrong, why couldn’t a formal proof generated to defend the result also be wrong? As the Roman poet Juvenal asked, Quis custodiet ipsos custodes? Who will watch the watchmen?

An AI could indeed generate an incorrect proof, but if it does, the proof assistant will reject it. So the answer to who will watch Claude, Gemini, and ChatGPT is Lean, Rocq, and Isabelle.

Who watches the watchers of the watchmen?

Isn’t it possible that a theorem prover like Rocq could have a bug? Of course it’s possible; there is no absolute certainty under the sun. But hundreds of PhD-years of work have gone into Rocq (formerly Coq) and so bugs in the kernel of that system are very unlikely. The rest of the system is bootstrapped, verified by the kernel.

Even so, an error in the theorem prover does not mean an error in the original result. For an incorrect result to slip through, the AI-generated proof would have to be wrong in a way that happens to exploit an unknown error in the theorem prover. It is far more likely that you’re trying to prove the wrong thing than that the theorem prover let you down.

I mentioned collision avoidance software above. I looked into such software when I did some work for Amazon’s drone program. The software that had been formally verified was also unrealistic in its assumptions: it was guaranteed to work correctly provided, for example, that two objects were flying at precisely constant velocity at precisely the same altitude. If everything were operating according to geometrically perfect assumptions, there would be no need for collision avoidance software.

When was Newton born?

A young Isaac Newton unwrapping an apple as a Christmas present.

Newton’s birthday fell on Christmas when he was born, but by our current calendar it no longer does.

When Newton was born, England was still using the Julian calendar, and would continue to use the Julian calendar until 25 years after his death.

On the day of Newton’s birth, his parents would have said the date was December 25, 1642. We would now describe the date as January 4, 1643.

You’ll sometimes see Newton’s birthday written as December 25, 1642 O.S. The “O.S.” stands for “Old Style,” i.e. Julian calendar. Of course the Newton family would not have written O.S. because there was no old style until the new style (i.e. Gregorian calendar) was adopted, just as nobody living in the years before Christ would have written a date as B.C.

In a nutshell, the Julian year was too long, which made it drift out of sync with the astronomical calendar. The Julian year was 365 1/4 days, whereas the Gregorian year is 365 97/400 days, which more closely matches the time it takes Earth to orbit the sun. Dropping three leap days every 400 years (in century years not divisible by 400) keeps the calendar in sync going forward. When countries adopted the Gregorian calendar, they also had to remove the excess leap days that had already accumulated; that’s why Newton’s birthday got moved up 10 days.
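Here’s a quick sketch of the arithmetic. The formula below is the standard secular difference between the two calendars; it ignores exactly when within a century year the gap changes.

def julian_gregorian_gap(year):
    # Number of days the Gregorian calendar is ahead of the Julian calendar
    # during the given year (away from the changeover within century years).
    return year // 100 - year // 400 - 2

print(julian_gregorian_gap(1642))  # 10, so December 25 Julian = January 4 Gregorian
print(julian_gregorian_gap(2025))  # 13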

You can read more on the Julian and Gregorian calendars here.

The winter solstice in the northern hemisphere was two days ago: December 21, 2025. And in 1642, using the Gregorian calendar, the solstice was also on December 21. But in England in 1642, people would have said the solstice occurred on December 11, because the civil calendar had fallen 10 days behind the astronomical calendar.

Mason, Dixon, and Latitude

A few weeks ago I mentioned that I was reading Stephen Ambrose’s account of the Lewis & Clark expedition and wrote a post about their astronomical measurements. James Campbell left a comment recommending Edwin Danson’s book [1] on the history of the Mason-Dixon line. I ordered the book, and now that work has slowed down for Christmas I have had the time to open it.

In addition to determining their eponymous line, surveyors Charles Mason and Jeremiah Dixon were also the first to measure the length of a degree of latitude in North America, in 1767.

Observations for determining the Length of a Degree of Latitude in the Provinces of Maryland and Pennsylvania, in North America, by Messieurs Charles Mason and Jeremiah Dixon

What exactly did they measure? We’ll get to that, but first we need some background.

The shape of the Earth

To first approximation a degree of latitude is simply 1/360th of the Earth’s circumference, but Mason and Dixon were more accurate than that. Isaac Newton (1643–1727) deduced that our planet was not a perfect sphere but rather an oblate spheroid. The best measurement in Mason and Dixon’s time was that the Earth’s semi-major axis was 6,397,300 meters, with inverse flattening 1/f = 216.8.

It’s a bit of an anachronism to describe the distance in meters since the meter was defined in 1791. The meter was originally defined as one ten-millionth of the distance from the equator to the North Pole along a great circle through Paris.

What exactly is a degree of latitude?

If the Earth were a perfect sphere, a degree of latitude would be 1/360th of its circumference. Using the original definition of the meter, this would be exactly 40,000,000/360 meters, a little over 111 km. But because the Earth is not a perfect sphere, each degree of latitude has a slightly different length. To put it another way, the length of a degree of latitude varies with latitude.

Another complication due to the flattening of the Earth is that there are multiple ways to define latitude. The two most common are geocentric and geodetic. The geocentric latitude of a point P on the Earth’s surface is the angle between the equatorial plane and a line between the center of the earth and P. The geodetic latitude (a.k.a. geographic latitude) of P is the angle between the equatorial plane and a line perpendicular to the Earth’s surface at P. More on the difference between geocentric and geodetic latitude here.
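Here’s a sketch that puts the numbers above together. The meridian-radius-of-curvature formula and the geodetic-to-geocentric conversion are standard; the latitude value is just an illustration, roughly that of the Mason-Dixon line.

import numpy as np

# Ellipsoid figures quoted above
a = 6_397_300.0      # semi-major axis, meters
f = 1 / 216.8        # flattening
e2 = f * (2 - f)     # eccentricity squared

def degree_of_latitude(lat_deg):
    # Length in meters of one degree of latitude at a given geodetic latitude,
    # via the meridian radius of curvature M = a(1 - e^2)/(1 - e^2 sin^2 lat)^(3/2).
    lat = np.radians(lat_deg)
    M = a * (1 - e2) / (1 - e2 * np.sin(lat)**2) ** 1.5
    return M * np.pi / 180

def geocentric_latitude(lat_deg):
    # Geocentric latitude corresponding to a geodetic latitude, in degrees.
    lat = np.radians(lat_deg)
    return np.degrees(np.arctan((1 - f)**2 * np.tan(lat)))

print(degree_of_latitude(39.7))    # roughly 111 km
print(geocentric_latitude(39.7))   # slightly less than 39.7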

What did Mason and Dixon measure?

Since the length of a degree of latitude varies, we need to say at what latitude they measured the length of a degree. In short, they measured the length of a degree near what we now know as the Mason-Dixon line, the border between Pennsylvania and Maryland.

To be more precise, they measured from Stargazer’s Stone, a stone Mason and Dixon placed on John Harland’s farm near Embreeville, Pennsylvania, to a point about a degree and a half due south, near what is now Delmar, a town on the Delaware / Maryland border.

I’ve had some difficulty determining how accurate Mason and Dixon were. Some sources I’ve found are obviously wrong. I haven’t verified this, but it seems Mason and Dixon overestimated the length of a degree of latitude at their location by only 465.55 ft or about 0.13%, a remarkable feat given the technology of their day.


[1] Edwin Danson. Drawing The Line: How Mason and Dixon Surveyed the Most Famous Border in America. John Wiley & Sons. 2001.

Bowie integrator and the nonlinear pendulum

I recently learned of Bowie’s numerical method for solving ordinary differential equations of the form

y″ = f(y)

via Alex Scarazzini’s master’s thesis [1].

The only reference I’ve been able to find for the method, other than [1], is the NASA Orbital Flight Handbook from 1963. The handbook describes the method as “a method employed by C. Bowie and incorporated in many Martin programs” and says nothing more about its origin.

Martin Company

What is meant by “Martin programs”? The first line of the foreword of the handbook says

This handbook has been produced by the Space Systems Division of the Martin Company under Contract NAS8-S03l with the George C. Marshall Space Flight Center of the National Aeronautics and Space Administration.

The Martin Company was the Glenn L. Martin Company, which became Martin Marietta after merging with American-Marietta Corporation in 1961. The handbook was written after the merger but used the older name. Martin Marietta would go on to become Lockheed Martin in 1995.

Bowie’s method was used “in many Martin programs” and yet is practically unknown in academic circles. Scarazzini’s thesis shows the method works well for his problem.

Nonlinear pendulum

My first thought when I saw the form of differential equations Bowie’s method solves was the nonlinear pendulum equation

y″ = − sin(y)

where the initial displacement y(0) is too large for the approximation sin θ ≈ θ to be sufficiently accurate. I wrote some Python code to try out Bowie’s method on this equation.
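Before the code, it may help to write out the updates it implements. Differentiating y″ = f(y) gives y‴ = f′(y) y′ and y⁗ = f″(y)(y′)² + f′(y) f(y), so fourth-order Taylor steps of size h are

y_{n+1} = y_n + h y_n' + \frac{h^2}{2} f(y_n) + \frac{h^3}{6} f'(y_n)\, y_n' + \frac{h^4}{24}\left(f''(y_n)\,(y_n')^2 + f'(y_n)\, f(y_n)\right)

y_{n+1}' = y_n' + h f(y_n) + \frac{h^2}{2} f'(y_n)\, y_n' + \frac{h^3}{6}\left(f''(y_n)\,(y_n')^2 + f'(y_n)\, f(y_n)\right)

I’m reading this off of the Taylor series and the code rather than the handbook, so take it as my interpretation of the scheme.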

from scipy.special import ellipk
import numpy as np

N = 100
y  = np.zeros(N)
yp = np.zeros(N) # y'

y[0] = 1  # initial angle in radians
yp[0] = 0 # initial angular velocity

# period of the nonlinear pendulum; SciPy's ellipk takes the parameter m = k**2
T = 4*ellipk(np.sin(y[0]/2)**2)
h = T/N

f   = lambda x: -np.sin(x) # f in y'' = f(y)
fp  = lambda x: -np.cos(x) # f'
fpp = lambda x:  np.sin(x) # f''

for n in range(0, N-1):
    # fourth-order Taylor steps: y''' = f'(y) y', y'''' = f''(y)(y')^2 + f'(y) f(y)
    y[n+1] = y[n] + h*yp[n] + 0.5*h**2*f(y[n]) + \
              (h**3/6)*fp(y[n])*yp[n] + \
              (h**4/24)*(fpp(y[n])*yp[n]**2 + fp(y[n])*f(y[n]))
    yp[n+1] = yp[n] + h*f(y[n]) + 0.5*h**2*fp(y[n])*yp[n] + \
              (h**3/6)*(fpp(y[n])*yp[n]**2 + fp(y[n])*f(y[n]))
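To reproduce the graph below, something like the following works; it assumes matplotlib and the arrays from the code above.

import matplotlib.pyplot as plt

t = np.arange(N) * h
plt.plot(t, y)
plt.xlabel("t")
plt.ylabel("y")
plt.show()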

Here’s a graph of the numerical solution.

The solution looks like a cosine, but it isn’t exactly. As I explain here,

The solution to the nonlinear pendulum equation is also periodic, though the solution is a combination of Jacobi functions rather than a combination of trig functions. The difference between the two solutions is small when θ0 is small, but becomes more significant as θ0 increases.

The difference in the periods is more evident than the difference in shape for the two waves. The period of the nonlinear solution is longer than that of the linearized solution.

That’s why the period T in the code is not

2π ≈ 6.28

but rather

4 K(sin²(θ0/2)) ≈ 6.70.

You’ll also see the period of the nonlinear pendulum given as 4 K(sin(θ0/2)). As pointed out in the article linked above,

There are two conventions for defining the complete elliptic integral of the first kind. SciPy uses a convention for K that requires us to square the argument.


[1] Alex Scarazzini. 3D Visualization of a Schwarzschild Black Hole Environment. Master’s thesis, University of Bern. August 2025.

Trying to fit exponential data

The first difficulty in trying to fit an exponential curve to data is that the data may not follow an exponential curve. Nothing grows exponentially forever. Eventually growth slows down. The simplest way growth can slow down is to follow a logistic curve, but fitting a logistic curve has its own problems, as detailed in the previous post.

Suppose you are convinced that whatever you want to model follows an exponential curve, at least over the time scale you’re interested in. This is easier to fit than a logistic curve: if you take the logarithm of the data, you have a linear regression problem. Linear regression is numerically well-behaved and has been thoroughly explored.

There is a catch, however. When you extrapolate a linear regression, your uncertainty region flares out as you go from observed data to predictions beyond the observed data. Your uncertainty grows linearly. But remember that we’re not working with the data per se; we’re working with the logarithm of the data. So on the original scale, the uncertainty flares out exponentially.
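Here’s a minimal sketch of the log-transform approach; the data, the growth rate 0.4, and the scale 3.0 are made up for illustration.

import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 20)
y = 3.0 * np.exp(0.4 * t) * rng.lognormal(0, 0.05, size=t.size)  # fake exponential data

# Fitting a line to log(y) turns the problem into linear regression.
slope, intercept = np.polyfit(t, np.log(y), 1)
print(slope, np.exp(intercept))  # estimates of the growth rate 0.4 and the scale 3.0

# A symmetric error band on the log scale becomes an exponentially widening
# band once you transform back to the original scale.
print(np.exp(slope * 15 + intercept))  # extrapolated value at t = 15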

Trying to fit a logistic curve

A logistic curve, sometimes called an S curve, looks different in different regions. Like the proverbial blind men feeling different parts of an elephant, people looking at different segments of the curve could come to very different impressions of the full picture.
logistic curve with extrapolations

It’s naive to look at the left end and assume the curve will grow exponentially forever, even if the data are statistically indistinguishable from exponential growth.

A slightly less naive approach is to look at the left end, assume logistic growth, and try to infer the parameters of the logistic curve. In the image above, you may be able to forecast the asymptotic value if you have data up to time t = 2, but it would be hopeless to do so with only data up to time t = −2. (This post was motivated by seeing someone trying to extrapolate a logistic curve from just its left tail.)

Suppose you know with absolute certainty that your data have the form

y(t) = \frac{a}{\exp(-b(t - c)) + 1} + \varepsilon

where ε is some small amount of measurement error. The world is not obligated to follow a simple mathematical model, or any mathematical model for that matter, but for this post we will assume that for some inexplicable reason you know the future follows a logistic curve; the only question is what the parameters are.

Furthermore, we only care about fitting the a parameter. That is, we only want to predict the asymptotic value of the curve. This is easier than trying to fit the b or c parameters.

Simulation experiment

I generated 16 random t values between −5 and −2, plugged them into the logistic function with parameters a = 1, b = 1, and c = 0, then added Gaussian noise with standard deviation 0.05.

My intention was to do this 1000 times and report the range of fitted values for a. However, the software I was using (scipy.optimize.curve_fit) failed to converge. Instead it returned the following error message.

RuntimeError: Optimal parameters not found: Number of calls to function has reached maxfev = 800.

When you see a message like that, your first response is probably to tweak the code so that it converges. Sometimes that’s the right thing to do, but often such numerical difficulties are trying to tell you that you’re solving the wrong problem.

When I generated points between −5 and 0, the curve_fit algorithm still failed to converge.

When I generated points between −5 and 2, the fitting algorithm converged. The range of a values was from 0.8254 to 1.6965.

When I generated points between −5 and 3, the range of a values was from 0.9039 to 1.1815.

Increasing the number of generated points did not change whether the curve fitting method converged, though it did result in a smaller range of fitted parameter values when it did converge.

I said we’re only interested in fitting the a parameter. I looked at the ranges of the other parameters as well, and as expected, they had a wider range of values.

So in summary, fitting a logistic curve with data only on the left side of the curve, to the left of the inflection point in the middle, may completely fail or give you results with wide error estimates. And it’s better to have a few points spread out through the domain of the function than to have a large number of points only on one end.
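For anyone who wants to repeat something like the experiment above, here’s a sketch. The noise level and true parameters match the description above; the random seed, initial guesses, and other details are my own choices, so the exact numbers will differ.

import numpy as np
from scipy.optimize import curve_fit

def logistic(t, a, b, c):
    return a / (np.exp(-b * (t - c)) + 1)

rng = np.random.default_rng(0)

def fitted_a_range(t_max, n_points=16, n_reps=1000):
    # Fit noisy samples with t in [-5, t_max]; return the range of fitted a
    # values over the repetitions in which curve_fit converges.
    a_values = []
    for _ in range(n_reps):
        t = rng.uniform(-5, t_max, n_points)
        y = logistic(t, 1, 1, 0) + rng.normal(0, 0.05, n_points)
        try:
            popt, _ = curve_fit(logistic, t, y)
            a_values.append(popt[0])
        except RuntimeError:
            pass  # curve_fit gave up after maxfev evaluations
    return (min(a_values), max(a_values)) if a_values else None

print(fitted_a_range(-2))  # data only in the left tail
print(fitted_a_range(3))   # data extending past the inflection point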


Regular expressions that cross lines

One of the fiddly parts of regular expressions is how to handle line breaks. Should regular expression searches be applied one line at a time, or should an entire file be treated as a single line?

This morning I was trying to track down a LaTeX file that said “discussed in the Section” rather than simply “discussed in Section.” I wanted to search on “the Section” to see whether I had a similar error in other files.

Line breaks don’t matter to LaTeX [1], so “the” could be at the end of one line and “Section” at the beginning of another. I found what I was after by using

    grep -Pzo "the\s+Section" foo.tex

Here -P tells grep to use Perl regular expressions. That’s not necessary here, but I imprinted on Perl regular expressions long ago, and I use PCRE (Perl compatible regular expressions) whenever possible so I don’t have to remember the annoying little syntax differences between various regex implementations.

The -z option says to treat the entire file as one long string. This eliminates the line break issue.

The -o option says to output only what the regular expression matches. Otherwise grep will return the matching line. Ordinarily that wouldn’t be so bad, but because of the -z option, the matching line is the entire file.

The \s+ between “the” and “Section” matches one or more whitespace characters, such as a space or a newline.

The -P flag is a GNU feature, so it works on Linux. But macOS ships with BSD-derived versions of its utilities, and its version of grep does not support the -P option. On my MacBook I have ggrep mapped to the GNU version of grep.

Another option is to use ripgrep rather than grep. It uses Perl-like regular expressions, and so there is no need for anything like the -P flag. The analog of -z in ripgrep is -U, so the counterpart of the command above would be

    rg -Uo "the\s+Section" foo.tex

Usually regular expression searches are so fast that execution time doesn’t matter. But when it does matter, ripgrep can be an order of magnitude faster than grep.


[1] LaTeX decides how to break lines in the output independent of line breaks in the input. This allows you to arrange the source file logically rather than aesthetically.

Multiples with no large digits

Here’s a curious theorem I stumbled across recently [1]. Take an integer N which is not a multiple of 10. Then there is some multiple of N which only contains the digits 1, 2, 3, 4, and 5.

For example, my business phone number 8324228646 has a couple 8s and a couple 6s. But

6312 × 8324228646 = 52542531213552

which contains only digits 1 through 5.

For a general base b, let p be the smallest prime factor of b. Then for every integer N that is not a multiple of b, there is some multiple of N whose base b representation contains only the digits 1, 2, 3, …, b/p.

This means that for every number N that is not a multiple of 16, there is some k such that the hex representation of kN contains only the digits 1 through 8. For example, if we take the magic number at the beginning of every Java class file, 0xCAFEBABE, we find

1341 × 0xCAFEBABE = 0x42758583546.

In the examples above, we’re looking for a multiple containing only half the possible digits. If the smallest prime dividing the base is larger than 2, then we can find multiples whose digits lie in an even smaller range. For example, in base 35 we can find a multiple containing only the digits 1 through 7.
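Here’s a brute-force sketch for finding such multiples. The function is my own illustration, makes no attempt at efficiency, and may turn up a smaller multiplier than the ones quoted above.

def small_digit_multiple(N, base=10, max_digit=5):
    # Try k = 1, 2, 3, ... until k*N has all of its digits in 1..max_digit
    # when written in the given base.
    def digits(n):
        ds = []
        while n:
            n, d = divmod(n, base)
            ds.append(d)
        return ds
    k = 1
    while True:
        if all(1 <= d <= max_digit for d in digits(k * N)):
            return k, k * N
        k += 1

print(small_digit_multiple(8324228646))                        # the phone number example
print(small_digit_multiple(0xCAFEBABE, base=16, max_digit=8))  # the Java magic number example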

[1] Gregory Galperin and Michael Reid. Multiples Without Large Digits. The American Mathematical Monthly, Vol. 126, No. 10 (December 2019), pp. 950-951.

Golden iteration

The expression

\sqrt{1 + \sqrt{1 + \sqrt{1 + \cdots}}}

converges to the golden ratio φ. Another way to say this is that the sequence defined by x0 = 1 and

x_{n+1} = \sqrt{x_n + 1}

for n ≥ 0 converges to φ. This post will be about how it converges.

I wrote a little script to look at the error in approximating φ by xn and noticed that the error is about three times smaller at each step. Here’s why that observation was correct.

The ratio of the error at one step to the error at the previous step is

\frac{\sqrt{x + 1} - \varphi}{x - \varphi}

If x = φ + ε the expression above becomes

\frac{\sqrt{\varphi + \epsilon + 1} - \varphi}{\epsilon} = \frac{1}{\sqrt{2(3 + \sqrt{5})}} - \frac{\epsilon}{\sqrt{8} (3 + \sqrt{5})^{3/2}} + {\cal O}(\epsilon^2)

when you expand as a Taylor series in ε centered at 0. This says the error is multiplied by a factor of about

\frac{1}{\sqrt{2(3 + \sqrt{5})}} = 0.309016994\ldots

at each step. The next term in the Taylor series is approximately −0.03ε, and since the iterates approach φ from below, ε is negative, so the error ratio is slightly larger than 0.309 at first. But essentially the error is multiplied by 0.309 at each iteration.
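Here’s a sketch of the kind of script mentioned above, printing the ratio of successive errors; the ratios start around 0.33 and settle toward 0.309.

import numpy as np

phi = (1 + np.sqrt(5)) / 2  # golden ratio

x = 1.0
prev_err = x - phi
for n in range(1, 10):
    x = np.sqrt(x + 1)
    err = x - phi
    print(n, err, err / prev_err)  # ratio of successive errors
    prev_err = err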