Asimov’s question about π

In 1977, Isaac Asimov [1] asked how many terms of the slowly converging series

π = 4 – 4/3 + 4/5 – 4/7 + 4/9 – …

would you have to sum before doing better than the approximation

π ≈ 355/113.

A couple years later Richard Johnsonbaugh [2] answered Asimov’s question in the course of an article on techniques for computing the sum of series. Johnsonbaugh said you would need at least N = 3,748,630 terms.

Johnsonbaug’s answer is based on exact calculations. I wondered how well you’d do with N terms using ordinary floating point arithmetic. Would there be so much rounding error that the result is terrible?

I wrote the most direct implementation in Python, with no tricks to improve the accuracy.

    from math import pi
    s = 0
    N = 3748630
    for n in range(1, N+1):
        s += (-1)**(n+1) * 4/(2*n - 1)

I intended to follow this up by showing that you could do better by summing all the positive and negative terms separately, then doing one subtraction at the end. But the naive version actually does quite well. It’s essentially as accurate as 355/113, with both approximations having an error of 2.66764 × 10-7.

Extended precision with bc

Next, I translated my program to bc [3] so I could control the precision. bc lets you specify your desired precision with its scale parameter.

    scale = 20
    pi = 4*a(1)
    s = 0
    m = 3748630
    for (n = 1; n <= m; n++) {
        s += (-1)^(n+1) * 4/(2*n - 1)

Floating point precision is between 15 and 16 decimal places. I added more precision by setting set scale to 20, i.e. carrying out calculations to 20 decimal places, and summed the series again.

The absolute error in the series was less than the error in 355/113 in the 14th decimal place. When I used one less term in the series, its error was larger than the error in 355/113 in the 14th decimal place. In other words, the calculations suggest Johnsonbaugh found exactly the minimum number of terms needed.

I doubt Johnsonbaugh ever verified his result computationally. He doesn’t mention computer calculations in his paper [4], and it would have been difficult in 1979 to have access to the necessary hardware and software.

If he had access to an Apple II at the time, it would have run at 1 MHz. My calculation took around a minute to run on a 2 GHz laptop, so I’m guessing the same calculation would have taken a day or two on an Apple II. This assumes he could find extended precision software like bc on an Apple II, which is doubtful.

The bc programming language had been written four years earlier, so someone could have run a program like the one above on a Unix machine somewhere. However, such machines were hard to find, and their owners would have been reluctant to give up a couple days of compute time for a guest to run a frivolous calculation.

Related posts

[1] Isaac Asimov, Asimov on Numbers, 1977.

[2] Richard Johnsonbaugh, Summing an Alternating Series. The American Mathematical Monthly, Vol 86, No 8, pp.637–648.

[3] Notice that N from the Python program became m in bc. I’ve used bc occasionally for years, and didn’t know until now that you cannot use capital letters for variables in standard bc. I guess I never tried before. The next post explains why bc doesn’t allow capital letters in variable names.

[4] Johnsonbaugh’s paper does include some numerical calculations, but he only sums up 500 terms, not millions of terms, and it appears he only uses ordinary precision, not extended precision.

Lebesgue’s proof of Weierstrass’ theorem

A couple weeks ago I wrote about the Weierstrass approximation theorem, the theorem that says every continuous function on a closed finite interval can be approximated as closely as you like by a polynomial.

The post mentioned above uses a proof by Bernstein. And in that post I used the absolute value function as an example. Not only is |x| an example, you could go the other way around and use it as a step in the proof. That is, there is a proof of the Weierstrass approximation theorem that starts by proving the special case of |x| then use that result to build a proof for the general case.

There have been many proofs of Weierstrass’ theorem, and recently I ran across a proof due to Lebesgue. Here I’ll show how Lebesgue constructed a sequence of polynomials approximating |x|. It’s like pulling a rabbit out of a hat.

The staring point is the binomial theorem. If x and y are real numbers with |x| > |y| and r is any real number, then

(x+y)^r & =\sum_{k=0}^\infty {r \choose k} x^{r-k} y^k.

Now apply the theorem substituting 1 for x and x² – 1 for y above and you get

|x| = (1 + (x^2 - 1))^{1/2} =\sum_{k=0}^\infty {1/2 \choose k} (x^2-1)^k

The partial sums of the right hand side are a sequence of polynomials converging to |x| on the interval [-1, 1].


If you’re puzzled by the binomial coefficient with a top number that isn’t a positive integer, see the general definition of binomial coefficients. The top number can even be complex, and indeed the binomial theorem holds for complex r.

You might also be puzzled by the binomial theorem being an infinite sum. Surely if r is a positive integer we should get the more familiar binomial theorem which is a finite sum. And indeed we do. The general definition of binomial coefficients insures that if r is a positive integer, all the binomial coefficients with k > r are zero.

How to plot the spiral of Theodorus

You may have seen the spiral of Theodorus. It sticks a sequence of right triangles together to make a sort of spiral. Each triangle has a short side of length 1, and the hypotenuse of each triangle becomes the long leg of the next triangle as shown below.

Spiral of Theodorus

How would you plot this spiral? At each step, you need to draw a segment of length 1, perpendicular to the hypotenuse of the previous triangle. There are two perpendicular directions, and you want to choose the one that moves counterclockwise.

If we step outside the xy plane, we can compute the cross product of the unit vector in the z direction with the vector (x, y). The cross product will be perpendicular to both, and by the right-hand rule, it will point in the counterclockwise direction.

The cross product of (0, 0, 1) and (x, y, 0) is (-y, x, 0), so the direction we want to go in the xy plane is (-y, x). We divide this vector by its length to get a vector of length 1, then add it to our previous point. [1]

Here’s Python code to draw the spiral.

    import matplotlib.pyplot as plt
    def next_vertex(x, y):
        h = (x**2 + y**2)**0.5
        return (x - y/h, y + x/h)
    # base of the first triangle
    plt.plot([0, 1], [0, 0])
    N = 17
    x_old, y_old = 1, 0
    for n in range(1, N):
        x_new, y_new = next_vertex(x_old, y_old)
        # draw short side
        plt.plot([x_old, x_new], [y_old, y_new])
        # draw hypotenuse
        plt.plot([0, x_new], [0, y_new])
        x_old, y_old = x_new, y_new

If you’re not familiar with the plot function above, you might expect the two arguments to plot to be points. But the first argument is a list of x coordinates and the second a list of y coordinates.

Update: See the next post for how the hypotenuse angles fill in a circle.

Related posts

[1] Here’s an alternative derivation using complex numbers.

Label the current vertex in the complex plane zn. Then zn/|zn| has length 1 and points in the same direction from the origin. Multiplying this point by i rotates it a quarter turn counterclockwise, so

zn+1 = zn + i zn/|zn|.

Discriminant of a cubic

The discriminant of a quadratic equation

ax² + bx + c = 0


Δ = b² – 4ac.

If the discriminant Δ is zero, the equation has a double root, i.e. there is a unique x that makes the equation zero, and it counts twice as a root. If the discriminant is not zero, there are two distinct roots.

Cubic equations also have a discriminant. For a cubic equation

ax³ + bx² + cx + d = 0

the discriminant is given by

Δ = 18abcd – 4b³d + b²c² – 4ac³ – 27a²d².

If Δ = 0, the equation has a multiple root, but otherwise it has three distinct roots.

A change of variable can reduce the general cubic equation to a so-called “depressed” cubic equation of the form

x³ + px + q = 0

in which case the discriminant simplifies to

Δ = – 4 – 27q².

Here are a couple interesting connections. The idea reducing a cubic equation to a depressed cubic goes back to Cardano (1501–1576). What’s called a depressed cubic in this context is known as the Weierstrass (1815–1897) form in the context of elliptic curves. That is, an elliptic curve of the form

y² = x³ + ax + b

is said to be in Weierstrass form. In other words, an elliptic curve is in Weierstrass form if the right hand side is a depressed cubic.

Furthermore, an elliptic curve is required to be non-singular, which means it must satisfy

4 + 27b² ≠ 0.

In other words, the discriminant of the right hand side is non-zero. In the context of elliptic curves, the discriminant is defined to be

Δ = -16(4 + 27b²)

which is the same as the discriminant above, except for a factor of 16 that simplifies some calculations with elliptic curves.

A note on fields

In the context of solving quadratic and cubic equations, we’re usually implicitly working with real or complex numbers. Suppose all the coefficients of a quadratic equation are real. I the discriminant is positive, there are two distinct real roots. If the discriminant is negative, there are two distinct complex roots, and these roots are complex conjugates of each other.

Similar remarks hold for cubic equations when the coefficients are all real. If the discriminant is positive, there are three distinct real roots. If the discriminant is negative, there is one real root and a complex conjugate pair of complex roots.

In the first section I only considered whether the discriminant was zero, and so the statements are independent of the field the coefficients come from.

For elliptic curves, one works over a variety of fields. Maybe real or complex numbers, but also finite fields. In most of the blog posts I’ve written about elliptic curves, the field is integers modulo a large prime.

Related posts

Quantum leaps

A literal quantum leap is a discrete change, typically extremely small [1].

A metaphorical quantum leap is a sudden, large change.

I can’t think of a good metaphor for a small but discrete change. I was reaching for such a metaphor recently and my first thought was “quantum leap,” though that would imply something much bigger than I had in mind.

Sometimes progress comes in small discrete jumps, and only in such jumps. Or at least that’s how it feels.

There’s a mathematical model for this called the single big jump principle. If you make a series of jumps according to a fat tailed probability distribution, most of your progress will come from your largest jump alone.

Your distribution can be continuous, and yet there’s something subjectively discrete about it. If you have a Lévy distribution, for example, your jumps can be any size, and so they are continuous in that sense. But when the lion’s share of your progress comes from one jump, it feels discrete, as if the big jump counted and the little ones didn’t.

Related posts

[1] A literal quantum leap, such an electron moving from one energy level to another in a hydrogen atom, is on the order of a billionth of a billionth of a joule. A joule is roughly the amount of energy needed to bring a hamburger to your mouth.

Squircle perimeter and the isoparametric problem

If you have a fixed length of rope and you want to enclose the most area inside the rope, make it into a circle. This is the solution to the so-called isoparametric problem.

Dido’s problem is similar. If one side of your bounded area is given by a straight line, make your rope into a semi-circle.

This post looks at a similar problem for a squircle. Peter Panholzer mentioned the problem of finding the squircle exponent that led to the largest area in proportion to its arclength. This sounds like the isoparametric problem, but it’s not.

The isoparametric problem holds perimeter constant and lets the shape enclosed vary, maximizing the area. The question here is to hold the axes constant and maximize the ratio of the area to the perimeter. Panholzer reports the maximum occurs at p = 4.39365.

Computing the perimeter

The volume of a squircle can be found in closed form, and I’ve mentioned the equation a few times, for example here. The perimeter, however, cannot be found in closed form, as far as I know, except for special exponents.

We can solve for y as a function of x and find the arclength using the formula taught in calculus courses. However, the derivative of y has a singularity at x = 1. By switching to polar coordinates, we can find arclength in terms of an integrand with no singularities. We can also simplify things a little by computing the total arclength as 4 times the arclength in the first quadrant; this avoids having to take absolute values.

The following Python code computes the perimeter and the ratio of the area to the perimeter.

    from scipy import sin, cos, pi
    from scipy.integrate import quad
    from scipy.special import gamma
    def r(t, p):
        return (cos(t)**p + sin(t)**p)**-(1/p)
    def drdt(t, p):
        deriv = (cos(t)**p + sin(t)**p)**(-1-1/p)
        deriv *= cos(t)**(p-1)*sin(t) - sin(t)**(p-1)*cos(t)
        return deriv
    def integrand(t, p):
        return (drdt(t,p)**2 + r(t,p)**2)**0.5
    def arclength(p):
        integral = quad(lambda t: integrand(t,p), 0, pi/2)[0]
        return 4*integral
    def area(p):
        return 4*gamma(1 + 1/p)**2 / gamma(1 + 2/p)
    def ratio(p):
        return area(p) / arclength(p)

Basic geometry tells us the ratio is 1/2 when p = 2 and we have a circle. The ratio is also 1/2 when p = ∞, i.e. when we have a square. We can use the code above to find that the ratio when p = 0.528627, so there is at least one local maximum for values of p between 2 and ∞.

Here’s a plot of the ratio of area to perimeter as a function of p.

ratio of area to perimeter for squircle

The plot is quite flat for large values of p, but if we zoom in we can see that there is a local maximum near 4.4.

close up of previous graph near the maximum

When I find the maximum of the ratio function using Brent’s method (scipy.optimize.brent) I find p = 4.39365679, which agrees with the value Panholzer stated.

Here’s a plot of the squircle corresponding to this value of p.

squircle with largest area to perimeter ratio

Back to the isoparametric problem

Now why doesn’t this contradict the isoparametric problem? Area scales quadratically but perimeter scales linearly. If you don’t hold perimeter constant, you can find a larger ratio of area to volume by making the perimeter larger. And that’s what we’ve done. When p = 2, we have a unit circle, with area π and perimeter 2π. When p = 4.39365679 we have area 3.750961567 and permimeter 7.09566295. If we were to take a circle with the same perimeter, it would have area 4.00660097, larger than the squircle we started with.

Related posts

Ratio of Lebesgue norm ball volumes

As dimension increases, the ratio of volume between a unit ball and a unit cube goes to zero. Said another way, if you have a high-dimensional ball inside a high-dimensional box, nearly all the volume is in the corners. This is a surprising result when you first see it, but it’s well known among people who work with such things.

In terms of Lp (Lebesgue) norms, this says that the ratio of the volume of the 2-norm ball to that of the ∞-norm ball goes to zero. More generally, you could prove, using the volume formula in the previous post, that if p < q, then the ratio of the volume of a p-norm ball to that of a q-norm ball goes to zero as the dimension n goes to infinity.

Proof sketch: Write down the volume ratio, take logs, use the asymptotic series for log gamma, slug it out.

Here’s a plot comparing p = 2 and q = 3.

Plot of volume ratio for balls in L2 and L3 norm as dimension increases

Related posts

Higher dimensional squircles

The previous post looked at what exponent makes the area of a squircle midway between the area of a square and circle of the same radius. We could ask the analogous question in three dimensions, or in any dimension.

(What do you call a shape between a cube and a sphere? A cuere? A sphube?)


The sphube

In more conventional mathematical terminology, higher dimensional squircles are balls under Lp norms. The unit ball in n dimensions under the Lp norm has volume

2^n \frac{\Gamma\left(1 + \frac{1}{p}\right)^n}{\Gamma\left( 1 + \frac{n}{p} \right)}

We’re asking to solve for p so the volume of a p-norm ball is midway between that of 2-norm ball and an ∞-norm ball. We can compute this with the following Mathematica code.

    v[p_, n_] := 2^n Gamma[1 + 1/p]^n / Gamma[1 + n/p]
            v[p, n] - (2^n + v[2, n])/2, 
            {p, 3}
        {n, 2, 10}

This shows that the value of p increases steadily with dimension:


We saw the value 3.16204 in the previous post. The result for three dimensions is 3.43184, etc. The image above uses the solution for n = 3, and so it has volume halfway between that of a sphere and a cube.

In order to keep the total volume midway between that of a cube and a sphere, p has to increase with dimension, making each 2-D cross section more and more like a square.

Here’s the Mathematica code to draw the cuere/sphube.

    p = 3.43184
         Abs[x]^p + Abs[y]^p + Abs[z]^p == 1, 
         {x, -1, 1}, 
         {y, -1, 1}, 
         {z, -1, 1}

History of the “Squircle”

Architect Peter Panholzer coined the term “squircle” in the summer of 1966 while working for Gerald Robinson. Robinson had seen a Scientific American article on the superellipse shape popularized by Piet Hein and suggested Panholzer use the shape in a project.

Piet Hein used the term superellipse for a compromise between an ellipse and a rectangle, and the term “supercircle” for the special case of axes of equal length. While Piet Hein popularized the superellipse shape, the discovery of the shape goes back to Gabriel Lamé in 1818.

Squircle with p = 3.162034

You can find more on the superellipse and squircle by following these links, but essentially the idea is to take the equation for an ellipse or circle and replace the exponent 2 with a larger exponent. The larger the exponent is, the closer the superellipse is to being a rectangle, and the closer the supercircle/squircle is to being a square.

Panholzer contacted me in response to my article on squircles. He gives several pieces of evidence to support his claim to have been the first to use the term. One is a letter from his employer at the time, Gerald Robinson. He also cites these links. [However, see Andrew Dalke’s comment below.]

Optimal exponent

As mentioned above, squircles and more generally superellipses, involve an exponent p. The case p = 2 gives a circle. As p goes to infinity, the squircle converges to a square. As p goes to 0, you get a star-shape as shown here. As noted in that same post, Apple uses p = 4 in some designs. The Sergels Torg fountain in Stockholm is a superellipse with p = 2.5. Gerald Robinson designed a parking garage using a superellipse with p = e = 2.71828.

Panholzer experimented with various exponents [1] and decided that the optimal value of p would be the one for which the squircle has an area half way between the circle and corresponding square. This would create visual interest, leaving the viewer undecided whether the shape is closer to a circle or square.

The area of the portion of the unit circle contained in the first quadrant is π/4, and so we want to find the exponent p such that the area of the squircle in the first quadrant is (1 + π/4)/2. This means we need to solve

\int_0^1 (1 - x^p)^{1/p}\, dx = \frac{\Gamma\left(\frac{p+1}{p}\right)^2}{\Gamma\left(\frac{p+2}{p} \right )} = \frac{1}{2} + \frac{\pi}{8}

We can solve this numerically [2] to find p = 3.1620. It would be a nice coincidence if the solution were π, but it’s not quite.

Sometime around 1966 Panholzer had a conference table made in the shape of a squircle with this exponent.


I asked Panholzer how he created his squircles, and whether he had access to a computer in 1966. He did use a computer to find the optimal value of p; his brother in law, Hans Thurow, had access to a computer at McPhar Geophysics in Toronto. But he drew the plots by hand.

There was no plotter around at that time, but I used transparent vellum over graph paper and my architectural drawing skills with “French curves” to draw 15 squircles from p=2.6 (obviously “circlish”) to p=4.0 (obviously “squarish”).

Related posts

[1] The 15 plots mentioned in the quote at the end came first. A survey found that people preferred the curve corresponding to p around 3.1. Later the solution to the equation for the area to be half way between that of a circle and a square produced a similar value.

[2] Here are a couple lines of Mathematica code to find p.

    f[p_] := Gamma[1 + 1/p]^2/Gamma[1 + 2/p]
    FindRoot[f[p] - (1 + Pi/4)/2, {p, 4}]

The 4 in the final argument to FindRoot is just a suggested starting point for the search.

Finite rings

It occurred to me recently that I rarely hear about finite rings. I did a Google Ngram search to make sure this isn’t just my experience.

Finite group, finite ring, finite field ngram


Why are finite groups and finite fields common while finite rings are not?

Finite groups have relatively weak algebraic structure, and demonstrate a lot of variety. Finite fields have very strong algebraic structure. Their complete classification has been known for a long time and is easy to state.

I imagine that most of the references to finite groups above have to do with classifying finite groups, and that most of the references to finite fields have to do with applications of finite fields, which are many.

You can see that references to finite groups hit their peak around the time of the Feit-Thompson theorem in 1962, and drop sharply after the classification of finite simple groups was essentially done in 1994. There’s a timeline of the progress toward the classification theorem on Wikipedia.

Rings have more structure than groups, but less structure than fields. Finite rings in particular are in a kind of delicate position: they easily become fields. Wedderburn’s little theorem says every finite domain is a field.

The classification of finite rings is much simpler than that of finite groups. And in applications you often want a finite field. Even if a finite ring (not necessarily a field) would do, you’d often use a finite field anyway.

In summary, my speculation as to why you don’t hear much about finite rings is that they’re not as interesting to classify as finite groups, and not as useful in application as finite fields.

Posts on finite simple groups

Posts on finite fields