Gauss introduced the notation [*x*] for the greatest integer less than or equal to *x* in 1808. The notation was standard until relatively recently, though some authors used the same notation to mean the integer part of *x*. The two definitions agree if *x* is positive, but not if *x* is negative.

Not only is there an ambiguity between the two meanings of [*x*], it’s not immediately obvious that there *is* an ambiguity since we naturally think first of positive numbers. This leads to latent errors, such as software that works fine until the first person gives something a negative input.
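To make the ambiguity concrete, here is a minimal Python sketch contrasting the two readings: `math.floor` for Gauss’s definition, `math.trunc` for the integer part.

```python
from math import floor, trunc

# The two readings of [x] agree for positive x ...
assert floor(1.5) == trunc(1.5) == 1

# ... but not for negative x: floor rounds down, trunc rounds toward zero.
assert floor(-1.5) == -2
assert trunc(-1.5) == -1
```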

In 1962 Kenneth Iverson introduced the notation ⌊*x*⌋ (“floor of *x*”) and ⌈*x*⌉ (“ceiling of *x*”) in his book *A Programming Language*, the book that introduced APL. According to Concrete Mathematics, Iverson

found that typesetters could handle the symbols by shaving off the tops and bottoms of ‘[’ and ‘]’.

This slight modification of the existing notation made things much clearer. The notation [*x*] is not mnemonic, but clearly ⌊*x*⌋ means to move down and ⌈*x*⌉ means to move up.

Before Iverson introduced his ceiling function, there wasn’t a standard notation for the smallest integer greater than or equal to *x*. If you did need to refer to what we now call the ceiling function, it was awkward to do so. And if there was a symmetry in some operation between rounding down and rounding up, the symmetry was obscured by asymmetric notation.

My impression is that ⌊*x*⌋ became more common than [*x*] somewhere around 1990, maybe earlier in computer science and later in mathematics.

Iverson’s introduction of the floor and ceiling functions was brilliant. The notation is mnemonic, and it filled what in retrospect was a gaping hole. In hindsight, it’s obvious that if you have a notation for what we now call floor, you should also have a notation for what we now call ceiling.

Iverson also introduced the indicator function notation, putting a Boolean expression in brackets to denote the function that is 1 when the expression is true and 0 when the expression is false. Like his floor and ceiling notation, the indicator function notation is brilliant. I give an example of this notation in action here.
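Here is a sketch of the idea in Python; the function name `bracket` is mine, not standard.

```python
def bracket(p):
    # Iverson bracket: 1 if the Boolean expression is true, 0 if false
    return 1 if p else 0

# Example: count the multiples of 3 below 20 as a sum of brackets.
count = sum(bracket(n % 3 == 0) for n in range(1, 20))
assert count == 6
```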

I had a small consulting project once where my main contribution was to introduce indicator function notation. That simple change in notation made it clear how to untangle a complicated calculation.

Since two of Iverson’s notations were so simple and useful, might there be more? He introduced a lot of new notations in his programming language APL, and so it makes sense to mine APL for more notations that might be useful. But at least in my experience, that hasn’t paid off.

I’ve tried to read Iverson’s lecture Notation as a Tool of Thought several times, and every time I’ve given up in frustration. Judging by which notations have been widely adopted, the consensus seems to be that the floor, ceiling, and indicator function notations were the only ones worth stealing from APL.

The post Floor, ceiling, bracket first appeared on John D. Cook.

The circle fits inside the square better than the square fits inside the circle. That is, the ratio of the area of the circle to the area of the circumscribed square is larger than the ratio of the area of the inscribed square to the area of the circle.

David Singmaster [1] generalized this theorem to higher dimensions as follows:

The *n*-ball fits better in the *n*-cube than the *n*-cube fits in the *n*-ball if and only if *n* ≤ 8.

Singmaster rigorously proves his statement in his paper. Here I will illustrate that it’s true with a graph.

We need three facts. First, the volume of a ball in *n* dimensions is

*r*^{n} π^{n/2} / Γ((*n*+2)/2).

Second, the volume of a cube in *n* dimensions with edge 2*r* is

(2*r*)^{n}.

And finally, the edge of the cube inscribed in the unit ball is 2/√*n*. (To see that this is right, note that the distance from the center of the cube to a corner is 1.)

This gives us all we need to create the following plot.

Both ratio curves go to zero quickly as *n* increases. (Note that the vertical scale is logarithmic, so the curves are going to zero exponentially.) This result is surprising but well known. Singmaster’s theorem about the ratio of the two curves is not as well known.
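Here is a sketch in Python of the computation behind the graph, using the three facts above; the function names are mine.

```python
from math import pi, gamma, sqrt

def ball_volume(n, r=1.0):
    # volume of an n-dimensional ball of radius r
    return r**n * pi**(n/2) / gamma((n + 2)/2)

def fit_ratios(n):
    # unit ball inside its circumscribed cube (edge 2), and
    # inscribed cube (edge 2/sqrt(n)) inside the unit ball
    ball_in_cube = ball_volume(n) / 2**n
    cube_in_ball = (2/sqrt(n))**n / ball_volume(n)
    return ball_in_cube, cube_in_ball

# Singmaster: the ball fits better if and only if n <= 8
# (n = 1 is excluded since the two ratios are equal there)
for n in range(2, 20):
    b, c = fit_ratios(n)
    assert (b > c) == (n <= 8)
```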

[1] David Singmaster. On Round Pegs in Square Holes and Square Pegs in Round Holes. Mathematics Magazine, Nov., 1964, Vol. 37, No. 5, pp. 335-337

The post Box in ball in box in high dimension first appeared on John D. Cook.

In this post I want to look again at

and

It turns out that the approximations above are both Padé approximants [1], rational functions that match the first few terms of the power series of the function being approximated.

“First few” means up to degree *m* + *n* where *m* is the degree of the numerator and *n* is the degree of the denominator. In our examples, *m* = *n* = 1, and so the series terms up to order 2 match.

The approximations I wrote about before were derived by solving for a constant that made the approximation error vanish at the ends of the interval of interest. Note that there’s no interval in the definition of a Padé approximant.

Also, the constants that I derived were rounded in order to have something easy to compute mentally. The approximation for log, for example, works out to have a factor of 2.0413, but I rounded it to 2 for convenience.

And yet the end result was exactly a Padé approximant.

First let’s look at the exponential function. We can see that the series for our approximation and for exp match up to *x*².

The error in the Padé approximation for exp is less than the error in the 2nd order power series approximation for all *x* less than around 0.78.
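To check this claim numerically, here is a sketch assuming the approximation in question is the standard [1/1] Padé approximant of exp, (2 + *x*)/(2 − *x*).

```python
from math import exp

def pade(x):
    # [1/1] Padé approximant of exp at 0
    return (2 + x)/(2 - x)

def taylor2(x):
    # 2nd order power series approximation
    return 1 + x + x*x/2

# the Padé approximant wins at x = 0.5; the power series wins at x = 1
assert abs(pade(0.5) - exp(0.5)) < abs(taylor2(0.5) - exp(0.5))
assert abs(taylor2(1.0) - exp(1.0)) < abs(pade(1.0) - exp(1.0))
```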

Here again we see that our function and our approximation have series that agree up to the *x*² terms.

The error in the Padé approximation for log is less than the error in the 2nd order power series approximation for all *x*

[1] The other approximations I presented in that series are not Padé approximations.

The post More on why simple approximations work first appeared on John D. Cook.

If you set *x* = 0 and compute *f*(*x*) you will get exactly 0. Apply *f* a thousand times and you’ll never get anything but zero.

But this does not mean 0 is a stable attractor, and in fact it is not stable.

It’s easy to get mathematical stability and numerical stability confused, because **the latter often illustrates the former**. In this post I point out that the points on a particular unstable orbit cannot be represented exactly in floating point numbers, so iterations that start at the floating point representations of these points will drift away.

In the example here, 0 *can* be represented exactly, and so we do have a *computationally* stable fixed point. But 0 is not a stable fixed point in the mathematical sense. Any starting point close to 0 but not exactly 0 will eventually go all over the place.

If we start at some very small ε > 0, then *f*(ε) ≈ 4ε. Every iteration multiplies the result by approximately 4, and eventually the result is large enough that the approximation no longer holds.

For example, suppose we start with *x* = 10^{-100}. After 100 iterations we have

*x* = 4^{100} 10^{-100} = 1.6 × 10^{-40}.
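A quick numerical check of this arithmetic, as a sketch, assuming *f* is the logistic map 4*x*(1 − *x*), consistent with *f*(ε) ≈ 4ε above:

```python
def f(x):
    return 4*x*(1 - x)

x = 1e-100
for _ in range(100):
    x = f(x)

# still tiny: approximately 4**100 * 1e-100, about 1.6e-40
assert 1.5e-40 < x < 1.7e-40

for _ in range(100):
    x = f(x)

# after 200 iterations the iterates are order 1, bouncing around [0, 1]
assert 0 <= x <= 1
```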

If we were to plot this, we wouldn’t see any movement. But by the time we iterate our function 200 times, the result is chaotic. I get the following:

The operator `<=>` returns a three-state comparison. The expression

a <=> b

evaluates to -1, 0, or 1 depending on whether *a* < *b*, *a* = *b*, or *a* > *b*. You could think of `<=>` as a concatenation of `<`, `=`, and `>`.

The `<=>` operator is often called the “spaceship operator” because it looks like Darth Vader’s ship in Star Wars.

Python doesn’t have a spaceship operator, but you can get the same effect with `numpy.sign(a-b)`. For example, suppose you wanted to write a program to compare two integers.

You could write

```python
from numpy import sign

def compare(x, y):
    cmp = ["equal to", "greater than", "less than"][sign(x-y)]
    print(f"{x} is {cmp} {y}.")
```

Here we take advantage of the fact that an index of -1 points to the last element of a list.

The `sign` function will return an integer if its argument is an integer or a float if its argument is a float. The code above will break if you pass in floating point numbers because `sign` will return -1.0, 0.0, or 1.0. But if you replace `sign(x-y)` with `int(sign(x-y))` it will work for floating point arguments.
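A stdlib-only variant of the same idea replaces `numpy.sign` with the classic comparison trick `(a > b) - (a < b)`, which already returns an int for both integer and float arguments. This is a sketch; the function names are mine.

```python
def spaceship(a, b):
    # -1, 0, or 1, like <=>; works for ints and floats alike
    return (a > b) - (a < b)

def compare(x, y):
    cmp = ["equal to", "greater than", "less than"][spaceship(x, y)]
    return f"{x} is {cmp} {y}."

assert spaceship(1.5, 2.5) == -1
assert compare(2, 1) == "2 is greater than 1."
```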

**Related post**: Symbol pronunciation

64 *x*³ – 112 *x*² + 56 *x* – 7

and

64 *x*³ – 96 *x*² + 36 *x* – 3.

I ended the post by saying you could find their roots numerically using the `NSolve`

function in Mathematica.

What if you wanted to find the roots exactly? You could try using the `Solve` function rather than the `NSolve` function, but you won’t get anything back. Doesn’t Mathematica know how to solve a cubic equation? Yes it does, and you can force it to use the cubic equation by using the `ToRadicals` function.

If you apply `ToRadicals` to the roots, Mathematica will give you the roots in radical form, with expressions involving complex numbers. For example, here’s the smallest root of the first polynomial:

But the roots of these polynomials are clearly real as you can see from their graphs.


If we didn’t have access to a graph, we could show that the roots are real and distinct by computing the discriminant and seeing that it is positive.

The polynomials have degree less than 5, so their roots can be expressed in terms of radicals. And the roots are real. But they cannot be expressed in terms of real radicals!

This is an example of the “casus irreducibilis.” Most cubic equations with real roots cannot be solved using real radicals. The casus irreducibilis theorem says that the roots of a cubic polynomial with rational coefficients can be written in terms of *real* radicals if and only if one of the roots is rational.

This says there’s a sort of **excluded middle ground**: the roots cannot involve cubic roots with real numbers. They are either simpler or more complex. If there is a rational root, you can factor it out and find the other two roots with the quadratic equation, i.e. only using *square* roots, not cubic roots. But if there is no rational root, expressing the roots in terms of radicals requires cube roots and complex numbers.

The rational root test says that if a polynomial has a rational root *p*/*q* then *q* must be a factor of the leading coefficient and *p* must be a factor of the constant term. So in our first cubic

64 *x*³ – 112 *x*² + 56 *x* – 7

any rational root must have denominator 1, 2, 4, …, 64 and numerator either 1 or 7. We can check all the possibilities and see that none of them are zeros.

Similarly the only possible rational roots of

64 *x*³ – 96 *x*² + 36 *x* – 3

must have denominator 1, 2, 4, …, 64 and numerator either 1 or 3, and none of them are zeros.
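Checking all the candidates is easy to do exactly with rational arithmetic. Here is a sketch using Python’s `fractions` module:

```python
from fractions import Fraction

def p1(x): return 64*x**3 - 112*x**2 + 56*x - 7
def p2(x): return 64*x**3 - 96*x**2 + 36*x - 3

denominators = [1, 2, 4, 8, 16, 32, 64]

# no candidate rational root is a zero of either polynomial
assert all(p1(Fraction(s*p, q)) != 0
           for s in (1, -1) for p in (1, 7) for q in denominators)
assert all(p2(Fraction(s*p, q)) != 0
           for s in (1, -1) for p in (1, 3) for q in denominators)
```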

So the roots of our cubic polynomials can be expressed in terms of radicals, but not without complex numbers, even though the roots are all real.

The reason we were interested in these two polynomials is that we were looking for orbits of period 3 under iterations of 4*x*(1-*x*). And we found them. But they form an unstable orbit.

If we make a cobweb plot of the iterations starting at any one of the roots, it appears that multiple applications of 4*x*(1-*x*) just rotate between the three roots.

But with a few more iterations—the plot below shows 100 iterations—we can see that these points are not stable. Unless you start exactly on one of the roots, which you cannot since they’re irrational, you will wander away from them.

The post Real radical roots first appeared on John D. Cook.

Now suppose you want to go “upstream” in Sharkovskii’s chain of implications. If you have a point with period 5, do you have a point with period 3? The answer is no: a map can have points with period 5 without having points of period 3, though it necessarily has points with all other positive integer periods except 3.

There’s an example illustrating the claim above that I’ve seen in multiple places, but I haven’t seen it presented graphically. You could work through the example analytically, but here I present it graphically.

This is the example function written in Python.

```python
def f(x):
    assert(1 <= x <= 5)
    if x < 2: return 1 + 2*x
    if x < 3: return 7 - x
    if x < 4: return 10 - 2*x
    if x <= 5: return 6 - x
```

Here’s a graphical demonstration that *f* has a fixed point, but no points of period 3.

The only point fixed under applying *f* three times is the point that was already fixed under applying *f* once.

This graph shows that *f* has points with period 5:

By Sharkovskii’s theorem *f* must have points with all other periods, except 3. Here’s a demonstration that it has points with period 6.

The map *f* is chaotic, but it does not have a point with period 3.
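We can also check some of these claims directly, without a graph. For instance, the fixed point is 10/3, and the integer orbit 1 → 3 → 4 → 2 → 5 → 1 has period 5. This is a sketch; the choice of that particular orbit is mine.

```python
from fractions import Fraction

def f(x):
    assert 1 <= x <= 5
    if x < 2: return 1 + 2*x
    if x < 3: return 7 - x
    if x < 4: return 10 - 2*x
    return 6 - x

# the fixed point: f(10/3) = 10/3, in exact rational arithmetic
assert f(Fraction(10, 3)) == Fraction(10, 3)

# an orbit of period 5: 1 -> 3 -> 4 -> 2 -> 5 -> 1
orbit = [1]
for _ in range(5):
    orbit.append(f(orbit[-1]))
assert orbit == [1, 3, 4, 2, 5, 1]
```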

Let’s look at the most famous chaotic map, the logistic map.

*f*(*x*) = *rx* (1 – *x*)

where *x* is in [0, 1] and *r* is in [0, 4].

The image above shows orbits as *r* ranges over [0, 4]. Clearly *f* has points with period 2. There’s a whole interval of values of *r* that lead to points with period 2, roughly for *r* between 3 and 3.5. And we can see for *r* a little bigger there are points of period 4. But is there any point with period 3?

We can look for points of period 3 at the end of the plot, where *r* = 4, using Mathematica.

Define

f[x_] := 4 x (1 - x)

and look for points where *f*³(*x*) = *x* using

Factor[f[f[f[x]]] - x]

This shows that the solutions are the roots of

*x* (-3 + 4 *x*) (-7 + 56*x* – 112*x*² + 64 *x*³) (-3 + 36*x* – 96*x*² + 64*x*³)

The first two roots are fixed points, points of period 1, but the roots of the two cubic factors are points with period 3.

The cubics clearly have all their roots in the interval [0,1] and we could find their numerical values with

NSolve[f[f[f[x]]] == x, x]
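We can also verify numerically, here sketched in Python with `numpy.roots` rather than Mathematica, that the roots of the two cubic factors really are period-3 points:

```python
import numpy as np

def f(x):
    return 4*x*(1 - x)

for coeffs in ([64, -112, 56, -7], [64, -96, 36, -3]):
    for root in np.roots(coeffs):
        r = root.real
        assert 0 < r < 1
        assert abs(f(f(f(r))) - r) < 1e-8   # period divides 3
        assert abs(f(r) - r) > 1e-3         # but not period 1
```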

Although the roots are fixed points of *f*³, they are unstable fixed points, as demonstrated at the bottom of the next post.

First of all, Mr. Sarkovsky is variously known as Sharkovsky, Sharkovskii, etc. As with many Slavic names, his name can be anglicized multiple ways. You might use the regular expression `Sh?arkovsk(ii|y)` in a search.

The theorem in the previous post, by Li and Yorke, says that if a continuous function from a closed interval to itself has a point with period three, it has points with all positive periods. This was published in 1975.

Unbeknownst to Li and Yorke, and everyone else in the West at the time, Sarkovsky had published a more general result in 1964 in a Ukrainian journal. He demonstrated a total order on the positive integers so that the existence of a point with a given period implies the existence of points with all periods further down the sequence. The sequence starts with 3, and every other positive integer is in the sequence somewhere, so period 3 implies the rest.

Sarkovsky showed that period 3 implies period 5, period 5 implies period 7, period 7 implies period 9, etc. If a continuous map of an interval to itself has a point of odd period *n* > 1, it has points whose periods are all the odd numbers larger than *n*. That is, Sarkovsky’s order starts out

3 > 5 > 7 > …

The sequence continues

… 2×3 > 2×5 > 2×7 > …

then

… 2²×3 > 2²×5 > 2²×7 > …

then

… 2³×3 > 2³×5 > 2³×7 > …

and so on for all powers of 2 times odd numbers greater than 1.

The sequence ends with the powers of 2 in reverse order

… 2³ > 2² > 1.

Here’s Python code to determine whether period *m* implies period *n*, assuming *m* and *n* are not equal.

```python
from sympy import factorint

# Return whether m comes before n in Sarkovsky order
def before(m, n):
    assert(m != n)
    if m == 1 or n == 1:
        return m > n
    m_factors = factorint(m)
    n_factors = factorint(n)
    m_odd = 2 not in m_factors
    n_odd = 2 not in n_factors
    m_power_of_2 = len(m_factors) == 1 and not m_odd
    n_power_of_2 = len(n_factors) == 1 and not n_odd
    if m_odd:
        return m < n if n_odd else True
    if m_power_of_2:
        return m > n if n_power_of_2 else False
    # m is even and not a power of 2
    if n_odd:
        return False
    if n_power_of_2:
        return True
    if m_factors[2] < n_factors[2]:
        return True
    if m_factors[2] > n_factors[2]:
        return False
    return m < n
```

**Next post**: Can you swim “upstream” in Sarkovsky’s order?

[1] There are two parts to the paper of Li and Yorke. First, that period three implies all other periods. This is a very special case of Sarkovsky’s theorem. But Li and Yorke also proved that period three implies an uncountable number of non-periodic points, which is not part of Sarkovsky’s paper.

The post Sarkovsky’s theorem first appeared on John D. Cook.

This post will look at what the statement means, and the next post will look at a generalization.

First of all, the theorem refers to period in the sense of **function iterations**, *not* in the sense of **translations**. And it applies to particular **points**, not to the **function** as a whole.

The sine function is periodic in the sense that it doesn’t change if you shift it by 2π. That is,

sin(2π+*x*) = sin(*x*)

for all *x*. The sine function has period 2π in the sense of translations.

In dynamical systems, period refers to getting the same result, not when you shift a function, but when you apply it to itself. A point *x* has period *n* under a function *f* if applying the function *n* times gives you *x* back, but applying it any less than *n* times does not.

So, for example, the function *f*(*x*) = –*x* has period 2 for non-zero *x*, and period 1 for *x* = 0.
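The definition is easy to express in code. Here is a small helper, my own sketch, that returns the least period of a point, if it has one:

```python
def least_period(f, x, max_n=10):
    # smallest n <= max_n with n applications of f returning x, else None
    y = x
    for n in range(1, max_n + 1):
        y = f(y)
        if y == x:
            return n
    return None

neg = lambda x: -x
assert least_period(neg, 5) == 2   # nonzero points have period 2
assert least_period(neg, 0) == 1   # 0 is a fixed point
```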

Period three specifically means

*f*( *f*( *f*(*x*) ) ) = *x*

but

*x* ≠ *f*(*x*)

and

*x* ≠ *f*( *f*(*x*) ).

Note that this is a property of *x*, not a property of *f* per se. That is, it is a property of *f* that one such *x* **exists**, but it’s not true of all points.

In fact it’s necessarily **far** from true for all points, which leads to what we mean by chaos.

If *f* is a continuous function from some interval *I* to itself, it cannot be the case that all points have period 3.

If one point has period 3, then some other point must have period 4. And another point must have period 97. And some point has period 1776.

If some point in *I* has period 3, then there are points in *I* that have period *n* for **all** positive *n*. Buy one, get infinitely many for free.

And there are some points that are not periodic, but every point in *I* is arbitrarily close to a point that is periodic. That is, the periodic points are dense in the interval. That is what is meant by chaos [2].

This is really an amazing theorem. It says that if there is one point that satisfies a simple condition (period three) then the function as a whole must have very complicated behavior as far as iteration is concerned.

If the existence of a point with period 3 implies the existence of points with every other period, what can you say about, for example, period 10? That’s the subject of the next post.

[1] Li, T.Y.; Yorke, J.A. (1975). “Period Three Implies Chaos” American Mathematical Monthly. 82 (10): 985–92.

[2] There is no single definition of chaos. Some take the existence of dense periodic orbits as the defining characteristic of a chaotic system. See, for example, [3].

[3] Bernd Aulbach and Bernd Kieninger. On Three Definitions of Chaos. Nonlinear Dynamics and Systems Theory, 1(1) (2001) 23–37

The post Period three implies chaos first appeared on John D. Cook.

log(*x*) ≈ (2*x* − 2)/(*x* + 1)

for *x* between exp(-0.5) and exp(0.5). It’s accurate enough for quick mental estimates.

I recently found an approximation by Ronald Doerfler that is a little more complicated but much more accurate:

log(*x*) ≈ 6(*x* – 1)/(*x* + 1 + 4√*x*)

for *x* in the same range. This comes from Doerfler’s book Dead Reckoning.

It requires calculating a square root, and in exchange for this complication gives about three orders of magnitude better accuracy. You could use it, for example, on a calculator that has a square root key but no log key. Or if you’re doing dead reckoning, you could take the square root by hand.

Here’s a plot of the error for both approximations.

The simpler approximation has error about 10^{-2} on each end, whereas Doerfler’s algorithm has error about 10^{-5} on each end.
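Here is a sketch in Python checking the error claims at the right end of the interval:

```python
from math import exp, log, sqrt

def simple(x):
    # the earlier approximation, log(x) ≈ (2x - 2)/(x + 1)
    return (2*x - 2)/(x + 1)

def doerfler(x):
    return 6*(x - 1)/(x + 1 + 4*sqrt(x))

x = exp(0.5)  # right end of the interval
assert abs(simple(x) - log(x)) < 0.011     # error on the order of 10^-2
assert abs(doerfler(x) - log(x)) < 2e-5    # error on the order of 10^-5
```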

If you can reduce your range to [1/√2, √2] by pulling out powers of 2 first and remembering the value of log(2), then Doerfler’s algorithm is about six times more accurate.

By the way, you might wonder where Doerfler’s approximation comes from. It’s not recognizable as, say, a series expansion. It comes from doing Richardson extrapolation, but algebraically simplified rather than expressing it as an algorithm.

The post Better approximation for ln, still doable by hand first appeared on John D. Cook.

The beta distribution has two positive parameters, *a* and *b*, and has probability density proportional to [1]

*x*^{a−1} (1 − *x*)^{b−1}

for *x* between 0 and 1.

The mean of a beta(*a*, *b*) distribution is

μ = *a*/(*a* + *b*)

and the variance is

σ² = *ab*/((*a* + *b*)²(*a* + *b* + 1)).

Given μ and σ² we want to solve for *a* and *b*. In order for the problem to be meaningful μ must be between 0 and 1, and σ² must be less than μ(1-μ). [2]

As we will see shortly, these two necessary conditions for a solution are also sufficient.

Graphically, we want to find the intersection of a line of constant mean

with a line of constant variance.

Note that the scales in the two plots differ.

If we set

*k* = (1 − μ)/μ

then *b* = *ka* and so we can eliminate *b* from the equation for variance to get

σ² = *k* *a*² / ((1 + *k*)² *a*² ((1 + *k*)*a* + 1)).

Now since *a* > 0, we can divide the numerator and denominator by *a*²

σ² = *k* / ((1 + *k*)² ((1 + *k*)*a* + 1))

and from there solve for *a*:

*a* = μ(μ(1 − μ)/σ² − 1)

and *b* = *ka*.

We require σ² to be less than μ(1-μ), or equivalently we require the ratio of μ(1-μ) to σ² to be greater than 1. It works out that the solution *a* is the product of the mean and the amount by which the ratio of μ(1-μ) to σ² exceeds 1.

Here is a little code to check for errors in the derivation above. It generates μ and σ² values at random, solves for *a* and *b*, then checks that the beta(*a*, *b*) distribution has the specified mean and variance.

```python
from scipy.stats import uniform, beta

for _ in range(100):
    mu = uniform.rvs()
    sigma2 = uniform.rvs(0, mu*(1 - mu))
    a = mu*(mu*(1 - mu)/sigma2 - 1)
    b = a*(1 - mu)/mu
    x = beta(a, b)
    assert(abs(x.mean() - mu) < 1e-10)
    assert(abs(x.var() - sigma2) < 1e-10)
print("Done")
```

- Determining a distribution from two quantiles
- Error in the normal approximation to a beta
- Diagram of probability distribution relationships

[1] It’s often easiest to think of probability densities ignoring proportionality constants. Densities integrate to 1, so the proportionality constants are determined by the rest of the expression for the density. In the case of the beta distribution, the proportionality constant works out to Γ(*a* + *b*) / Γ(*a*) Γ(*b*).

[2] The variance of a beta distribution factors into μ(1-μ)/(*a* + *b* + 1), so it is less than μ(1-μ).

And by almost true, I mean correct to well over 200 decimal places. This sum comes from [1]. Here I will show why the two sides are very nearly equal and why they’re not exactly equal.

Let’s explore the numerator of the sum with a little code.

```python
>>> from math import tanh, pi
>>> for n in range(1, 12):
...     print(n*tanh(pi))

0.99627207622075
1.9925441524415
2.98881622866225
3.985088304883
...
10.95899283842825
```

When we take the floor (the integer part [2]) of the numbers above, the pattern seems to be

⌊*n* tanh π⌋ = *n* − 1

If the pattern continues, our sum would be 1/81. To see this, multiply the series by 100, evaluate the equation below at *x* = 1/10, and divide by 100.

1 + 2*x* + 3*x*² + … = 1/(1 − *x*)²

Our sum is close to 1/81, but not exactly equal to it, because

⌊*n* tanh π⌋ = *n* − 1

holds for a lot of *n*’s but not for all *n*.

Note that

tanh π = 0.996… = 1 – 0.00372…

and so

⌊*n* tanh π⌋ = *n* − 1

will hold as long as *n* < 1/0.00372… = 268.2…

Now

⌊268 tanh π⌋ = 268 − 1

but

⌊269 tanh π⌋ = 269 − 2.
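A quick check in Python; ordinary floating point suffices here since none of these products is dangerously close to an integer.

```python
from math import floor, tanh, pi

# the pattern holds for every n up to 268 ...
assert all(floor(n*tanh(pi)) == n - 1 for n in range(1, 269))

# ... and first breaks at n = 269
assert floor(269*tanh(pi)) == 269 - 2
```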

So the 269th term on the left side, 267/10^{269}, is less than the corresponding term, 268/10^{269}, of the sum

10^{-2} + 2×10^{-3} + 3×10^{-4} + … = 1/81

on the right side.

We can compare the decimal expansions of both sides by using the Mathematica command

N[Sum[Floor[n Tanh[Pi]]/10^n, {n, 1, 300}], 300]

This shows the following:

[1] J. M. Borwein and P. B. Borwein. Strange Series and High Precision Fraud. The American Mathematical Monthly, Vol. 99, No. 7, pp. 622-640

[2] The floor of a real number *x* is the greatest integer ≤ *x*. For positive *x*, this is the integer part of *x*, but not for negative *x*.

Let *a* and *b* be two positive numbers. Then the arithmetic and geometric means are defined by

*A*(*a*, *b*) = (*a* + *b*)/2

*G*(*a*, *b*) = √(*ab*)

The **arithmetic-geometric mean** (AGM) of *a* and *b* is the limit of the sequence that takes the arithmetic and geometric mean of the arithmetic and geometric mean over and over. Specifically, the sequence starts with

*a*_{0} = *A*(*a*, *b*)

*b*_{0} = *G*(*a*, *b*)

and continues

*a*_{n+1} = *A*(*a*_{n}, *b*_{n})

*b*_{n+1} = *G*(*a*_{n}, *b*_{n})

AGM(*a*, *b*) is defined as the common limit of the *a*’s and the *b*’s [1].
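The definition translates directly to code. Here is a minimal sketch; the function name is mine.

```python
def agm(a, b, tol=1e-14):
    # repeatedly replace (a, b) with their arithmetic and geometric means
    while abs(a - b) > tol*max(a, b):
        a, b = (a + b)/2, (a*b)**0.5
    return (a + b)/2

m = agm(1, 100)
assert 10 < m < 50.5   # between the geometric and arithmetic means
```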

The arithmetic mean dominates the geometric mean. That is,

*G*(*a*, *b*) ≤ *A*(*a*, *b*)

with equality if and only if *a* = *b*. So the *a*’s converge down to AGM(*a*, *b*) and the *b*’s converge up to AGM(*a*, *b*). The AGM is between the arithmetic and geometric means.

There’s another way to create means between the arithmetic and geometric means, and that is by varying the parameter *r* in the family of means

*M*_{r}(*a*, *b*) = ((*a*^{r} + *b*^{r})/2)^{1/r}.

The arithmetic mean corresponds to *r* = 1 and the geometric mean corresponds to the limit as *r* approaches 0. So if *r* is between 1 and 0, we have a mean that’s somewhere between the arithmetic and geometric mean. For a fixed argument, these means are an increasing function of *r* as discussed here.

So here’s the idea I wanted to get to. Since the AGM is between the arithmetic and geometric mean, as are the *r*-means for *r* between 0 and 1, is there some value of *r* where the AGM is well approximated by an *r*-mean? This question was inspired by the result I wrote about here that the perimeter of an ellipse is very well approximated by an *r*-mean of its axes.

We can simplify things a little by assuming one of our arguments is 1. This is because all the means mentioned here are homogeneous, i.e. you can pull out constants.

I looked at AGM(1, *x*) for *x* ranging from 1 to 100 and compared it to *r*-means for varying values of *r* and found that I got a good fit for *r* = 0.415. I have no theory behind this, just tinkering. The optimal value depends on how you measure the error, and probably depends on the range of *x*.

When I plot AGM(1, *x*) and *M*_{0.415}(1, *x*) it’s hard to tell the lines apart. Here’s what I get for their relative difference.

There’s a connection between the AGM and elliptic functions, and so maybe *r*-means provide useful approximations to elliptic functions.

[1] The sequence converges *very* rapidly. I intended to show a plot, but the convergence is so rapid that it’s hard to plot. If I start with *a* = 1 and *b* = 100, then *a* and *b* agree to 44 significant figures after just 7 iterations.

We can extend Newton’s method to find cube roots and *n*th roots in general. And when we do, we begin to see a connection to *r*-means. I’ve written about these means several times, most recently in connection with finding the perimeter of an ellipse and the surface area of an ellipsoid.

To find the *n*th root of *y*, we apply Newton’s root-finding method to find where the function

*f*(*x*) = *x*^{n} − *y*

is zero. We start with an initial estimate *x*_{0} and our updated estimate is

*x*_{1} = ((*n* − 1) *x*_{0} + *y*/*x*_{0}^{n−1}) / *n*.

When *n* = 2, this reduces to the method in our previous post: the updated estimate *x*_{1} equals the average of our initial estimate *x*_{0} and *y*/*x*_{0}. That is, our updated estimate is the **arithmetic mean** of *x*_{0} and *y*/*x*_{0}, and the **geometric mean** of the two terms is the square root of *y*.

For *n* in general, we have two terms whose geometric mean is the *n*th root of *y*, and we take their weighted arithmetic mean. Said another way, our updated estimate is a convex combination of these two terms. The rest of the post will explore this further and point out some connections.
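The update rule above can be written as a short function; this is a sketch, and the names are mine.

```python
def newton_nth_root(y, n, x0, iterations=30):
    # x_{k+1} = ((n-1) x_k + y / x_k^(n-1)) / n
    x = x0
    for _ in range(iterations):
        x = ((n - 1)*x + y/x**(n - 1))/n
    return x

assert abs(newton_nth_root(8, 3, 3.0) - 2) < 1e-12
assert abs(newton_nth_root(2, 2, 1.0) - 2**0.5) < 1e-12
```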

The ellipse and ellipsoid posts mentioned above make use of the means

*M*_{r}(*x*) = ((*x*_{1}^{r} + … + *x*_{n}^{r})/*n*)^{1/r}

as defined in Hardy, Littlewood, and Pólya. More generally the authors define

*M*_{r}(*x*; *p*) = (*p*_{1} *x*_{1}^{r} + … + *p*_{n} *x*_{n}^{r})^{1/r}

where the weights *p*_{i} are positive numbers that sum to 1. The unweighted mean corresponds to the special case where all *p*_{i} equal 1/*n*.

The authors also define the geometric mean

*G*(*x*) = (*x*_{1} *x*_{2} … *x*_{n})^{1/n}

and weighted extensions which we will not need here.

We note two connections between the geometric mean and the *r*-mean. First, the geometric mean is the exponential of the arithmetic mean of the logarithms.

Second, the geometric mean is the limit of the *r*-means as *r* goes to 0, and so you could define the *r*-mean with *r* = 0 to be the geometric mean.

Reading Newton’s method for *n*th roots in terms of *r*-means, we take an initial estimate *x*_{0} and an auxiliary estimate *x*_{0}′ such that their weighted geometric mean is the *n*th root we’re after. Then we take the arithmetic mean of *x*_{0} and *x*_{0}′ with weights (*n* − 1)/*n* and 1/*n*.

That is, let *x*_{0} be our initial estimate and solve for *x*_{0}′ such that

*x*_{0}^{(n−1)/n} (*x*_{0}′)^{1/n} = *y*^{1/n}.

Then

*x*_{1} = ((*n* − 1) *x*_{0} + *x*_{0}′)/*n*

where

*x*_{0}′ = *y*/*x*_{0}^{n−1}.

If we write the geometric mean as the *r*-mean with *r* = 0, we could describe Newton’s method entirely in terms of *r*-means.

If you want to compute square roots mentally or with pencil and paper, how accurate can you get with this method? Could you, for example, get within 1%?

Obviously the answer depends on your guess. One way to form an initial guess is to round *x* up to the nearest square and take the root of that as your guess. In symbols, we take ⌈√*x*⌉ as our guess.

For example, if *x* = 38, you could take 7 as your guess since 49 is the next square after 38. In that case you’d get

(7 + 38/7)/2 = 6.214

The correct value to three places is 6.164, so our error is about 0.8%.

But we could do better. 38 is much closer to 36 than to 49, so we could take 6 as our initial guess. That is, ⌊√38⌋. Then we have

(6 + 38/6)/2 = 6.167

and our relative error is less than 0.04%.

We’ll look at three strategies:

- ⌊√*x*⌋
- ⌈√*x*⌉
- ⌊√*x*⌋ or ⌈√*x*⌉

In strategy 3, we choose the guess whose square is closer to *x*.

Here’s what we get when we use the method above to estimate the square roots of the numbers from 1 to 100.

Guess 1 has maximum error 15.5%, whereas guess 2 and guess 3 have maximum error 6%.

Guess 2, taking the ceiling, is generally better than guess 1, taking the floor. But guess 3, taking the better of the two, is even better.
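These maximum errors are easy to reproduce. Here is a sketch; the function names are mine.

```python
from math import sqrt, floor, ceil

def estimate(x, g):
    # one step of the divide-and-average rule starting from guess g
    return (g + x/g)/2

def max_rel_error(xs, guess):
    return max(abs(estimate(x, guess(x))/sqrt(x) - 1) for x in xs)

guess1 = lambda x: max(floor(sqrt(x)), 1)   # floor
guess2 = lambda x: ceil(sqrt(x))            # ceiling
def guess3(x):                              # whichever square is closer
    f, c = guess1(x), guess2(x)
    return f if abs(f*f - x) <= abs(c*c - x) else c

xs = range(1, 101)
assert 0.15 < max_rel_error(xs, guess1) < 0.16   # about 15.5%
assert max_rel_error(xs, guess2) < 0.061         # about 6%
assert max_rel_error(xs, guess3) < 0.061         # about 6%

# restricting to [25, 100] helps more than the choice of guess
assert max_rel_error(range(25, 101), guess3) < 0.005
```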

More important than which of the three methods we use to find the initial guess is being toward the far end of the range. We could assume *x* is between 25 and 100, because we can multiply *x* by a square if necessary to get it into that range. To find the square root of 13, for example, we multiply by 4 and calculate the square root of 13 as half the square root of 52.

If we assume *x* is between 25 and 100, the maximum errors for the three initial guess methods are 1.4%, 1.3%, and 0.4%.

So to answer our initial question, yes, you can get relative error less than 1%, but you have to do a couple things. You have to multiply by a square to get your number into the range [25, 100] and you have to use the third method of guessing a starting approximation.

You don’t have to move your number into the range [25, 100] if you put a little more effort into your initial guess. In the example of *x* = 13 mentioned above, multiplying by 4 before taking the square root is essentially the same as taking 7/2 as your initial guess for √13 instead of using 3 or 4 as an initial guess.

In short, our discussion of the range on *x* was really a discussion of initial error in disguise. The size of *x* determines the quality of our initial guesses, but the size of *x* itself doesn’t matter to the relative error.

Here’s a plot of the final error as a function of the error in the initial guess.

Notice two things. First, the final error is roughly proportional to the square of the error in the initial guess. Second, it’s a little better to guess high than to guess low, which explains why the second method above for guessing starting points is a little better than the first.

Another way to get more accuracy is to repeat our process. That is, after we find our estimate, we use it as a new guess and apply our method again. For example, suppose we want to compute the square root of 30. We would find

(5 + 30/5)/2 = 5.5

(5.5 + 30/5.5)/2 = 5.47727

which is correct to four decimal places.

We can find square roots to any accuracy we wish by applying this method enough times. In fact, what we’re doing is applying a special case of Newton’s root-finding method.

We showed above that the error after each iteration is on the order of the square of the previous error. So roughly speaking, each iteration doubles the number of correct decimal places.
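Here’s a quick numerical check of that doubling, a sketch using the √30 example above:

```python
from math import sqrt

def heron(x, guess, n):
    # apply the divide-and-average step n times
    g = guess
    for _ in range(n):
        g = (g + x/g) / 2
    return g

# absolute errors after 1, 2, and 3 iterations starting from 5
errors = [abs(heron(30, 5, n) - sqrt(30)) for n in (1, 2, 3)]
```

Each error is roughly the square of the one before it (divided by 2√30), so the number of correct digits roughly doubles each time.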

See the next post for an analogous method for computing cube roots and *n*th roots in general.

M + *e* sin *E* = *E*.

Given mean anomaly *M* and eccentricity *e*, you want to solve for eccentric anomaly *E*.

There is a simple way to solve this equation. Define

*f*(*E*) = *M* + *e* sin *E*

and take an initial guess at the solution and stick it into *f*. Then take the output and stick it back into *f*, over and over, until you find a fixed point, i.e. *f*(*E*) = *E*.
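The fixed-point iteration just described can be sketched in a few lines; the values of *M* and *e* in any usage are up to the caller, and the tolerance and iteration cap here are arbitrary choices.

```python
from math import sin

def solve_kepler(M, e, tol=1e-12, max_iter=100):
    # iterate E <- M + e*sin(E) until it stops changing
    E = M  # starting guess
    for _ in range(max_iter):
        E_new = M + e * sin(E)
        if abs(E_new - E) < tol:
            return E_new
        E = E_new
    return E
```

The iteration converges for 0 ≤ *e* < 1, with the error shrinking by roughly a factor of *e* per pass, which is why it becomes slow when *e* is near 1 or when you have very many equations to solve.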

The algorithm above is elegant, and practical if you only need to do it once. However, if you need to solve Kepler’s equation billions of times, say in the process of tracking satellite debris, this isn’t fast enough.

An obvious improvement would be to use Newton’s root-finding method rather than the simple iteration scheme above, and this isn’t far from the state of the art. However, there have been improvements over Newton’s method, and a paper posted on arXiv this week gives an algorithm that is about 3 times faster than Newton’s method [1].

This paper is an example of a common pattern in applied math. It starts with a simple problem that has a simple solution, but this simple solution doesn’t scale. And so we apply advanced mathematics to a problem formulated in terms of elementary mathematics.

In particular, the paper makes use of contour integration. This seems like a step backward in two ways.

First, we have a root-finding problem, but you want to turn it into an integration problem?! Isn’t root-finding faster than integration? Not in this case.

Second, not only are we introducing integration, we’re introducing integration in the complex plane. Isn’t complex analysis *complex*? Not in the colloquial sense. The use of “complex” as a technical term is unfortunate because complex analysis often *simplifies* problems. As Jacques Hadamard put it,

The shortest path between two truths in the real domain passes through the complex domain.

[1] Oliver H. E. Philcox, Jeremy Goodman, Zachary Slepian. Kepler’s Goat Herd: An Exact Solution for Elliptical Orbit Evolution. arXiv:2103.15829

The post Efficiently solving Kepler’s equation first appeared on John D. Cook.

where *m* is the month, *d* is the day, and *y* is the last two digits of the year.

The sum for today is unusually round:

By contrast, the sum from yesterday is nowhere near round:

Out of curiosity, I looked at the numbers making up today’s sum. First, let’s look at the plot of today’s exponential sum with the axes.

So the graph is not centered at the origin as you might implicitly expect. It does seem to be centered near zero on the real axis, but it’s centered somewhere near 8 on the imaginary axis.

If our graph is approximately a circle, its real and imaginary parts should be approximately sines and cosines. And that’s what we have.

Going back to our original equation, we have empirically discovered that

Showing why the sum on the left should be roughly equal to the expression on the right is left as an exercise to the reader. :)

The post Unusually round exponential sum first appeared on John D. Cook.

Ashley Kanter left a comment on Tuesday’s post Within one percent with an approximation I’d never seen.

One that I find handy is that the hypotenuse of a right triangle with other sides *a* and *b* (where *a* < *b*) can be approximated to within 1% by 5(*a* + *b*)/7 when 1.04 ≤ *b*/*a* ≤ 1.50.

That sounds crazy, but it’s right. Since we’re talking about relative error, not absolute error, we can assume without loss of generality that *a* = 1. Here’s a plot of the relative error in the approximation.

Here’s the Python code that produced the plot.

```python
from numpy import linspace
import matplotlib.pyplot as plt

exact = lambda b: (b**2 + 1)**0.5
approx = lambda b: 5*(1 + b)/7
rel_error = lambda b: abs((exact(b) - approx(b))/exact(b))

x = linspace(1, 1.5, 100)
plt.plot(x, rel_error(x))
plt.xlabel("$b$")
plt.ylabel("relative error")
plt.title(r"$\sqrt{1+b^2} \approx 5(1+b)/7$")
plt.savefig("hyp_approx.png")
```

You can solve a quadratic equation to find that the error is zero at *b* = 4/3. In fact, this gives a clue to where the approximation came from: it is exact for a (3, 4, 5) triangle.
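A one-line check of that claim: at *b* = 4/3 the triangle is a (3, 4, 5) triangle scaled by 1/3, so the hypotenuse and the approximation both equal 5/3.

```python
from math import sqrt

b = 4/3
hyp = sqrt(1 + b*b)     # exactly 5/3: a (3, 4, 5) triangle scaled by 1/3
approx = 5*(1 + b)/7    # also 5/3, so the error vanishes here
```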

Let’s see where the relative error equals 0.01.

```python
from scipy.optimize import bisect

left = bisect(lambda b: rel_error(b) - 0.01, 0.0, 1.3)
right = bisect(lambda b: rel_error(b) - 0.01, 1.4, 1.6)
print(left, right)
```

This shows that the approximation is within 1% over the interval [1.0354, 1.5087], or to use rounder numbers, [1.04, 1.50] as advertised. The error is only 1.02% if you take the interval to be [1, 1.50]. So if you take “approximately equal” to mean “within 1.02%” then we can restate the rule above as

If one leg of a right triangle is no more than 50% bigger than the other, then the hypotenuse is approximately 5(*a* + *b*)/7.

On a similar note, see this post about the case where *b* is much larger than *a*. If you want to measure across a room, for example, and you can’t quite measure to the point you’d like, it makes surprisingly little difference.

Coulomb’s law says that the force between two charged particles is proportional to the product of their charges and inversely proportional to the square of the distance between them. In symbols,

The proportionality constant, the *k*_{e} term, is known as **Coulomb’s constant**.

What are the units on Coulomb’s constant? Well, they’re whatever they have to be. The left hand side is a force, so it’s measured in **newtons**, *N*. Charges are measured in **coulombs** and distances in **meters**, so the right hand side, aside from Coulomb’s constant, has units coulombs squared per meter squared, C² / m². So *k*_{e} must have units N m² / C².

OK, but what is a **coulomb**? That’s where things get interesting.

The informal definition that you might see in a textbook is that a **coulomb** is the amount of charge on a certain number of electrons, and that an **ampere** is a current of that many electrons flowing per second.

The formal definition, until two years ago, was that a **coulomb** was the amount of charge carried in one second by a current of one **ampere** [1], and an ampere was defined as

that constant current which, if maintained in two straight parallel conductors of infinite length, of negligible circular cross-section, and placed one metre apart in vacuum, would produce between these conductors a force equal to 2×10^{−7} newtons per metre of length.

There were several things about the definitions of SI units that were less than satisfying. For example, the infinitely long conductors in the definition of ampere are in short supply.

The definitions of fundamental units have changed over time as measurement technology changes. For example, the kilogram was defined as the mass of a particular physical object, the Prototypical International Kilogram. Obviously this is awkward, but it wasn’t technically feasible to do anything better until recently.

The SI base units were redefined effective May 20, 2019.

The **elementary charge**, the charge on a single electron, is

*e* = 1.602176634×10^{−19} coulomb.

This equation used to be an empirical statement, the measured value of the elementary charge in terms of the coulomb. Now the equation is taken to be **exact by definition**, defining the coulomb.

Now that we know what a coulomb is, let’s go back to Coulomb’s constant. We said that *k*_{e} must have units N m² / C². We’ve said what coulombs are, but what about newtons and meters? The newton is defined in terms of the kilogram, meter, and second, and the definitions of all these units changed as well.

The speed of light is now

*c* = 299792458 m⋅s^{−1}

**by definition**. The second is defined so that the transition frequency of a caesium-133 atom is 9,192,631,770 cycles per second, and the meter is defined in terms of the speed of light and the second.

The Planck constant is now exactly

*h* = 6.62607015×10^{−34} kg m² / s

by definition, which defines the kilogram in terms of the meter, the second, and *h*. Now someone on a distant planet without access to the standard kilogram can determine how much a kilogram is by measuring the speed of light, the frequency of a caesium-133 atom, and the Planck constant.

Coulomb’s constant is equal to

where ɛ_{0} is **vacuum permittivity**.

Now

where *c* is the speed of light and μ_{0} is vacuum permeability.

It used to be that

μ_{0} = 4π × 10^{−7} N/A^{2}

by definition, but now that the speed of light is specified as exact by definition, μ_{0} is a measured quantity. Still, the measured value is very close to the former definition, accurate to nine significant figures. Now the value of *c* is exact by definition, and so the product of ɛ_{0} and μ_{0} is exact by definition, but ɛ_{0} and μ_{0} individually are empirically determined.
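You can check the relationship between these constants numerically using the CODATA values in `scipy.constants`; a quick sketch:

```python
from math import pi
from scipy.constants import c, epsilon_0, mu_0

# c is exact by definition; epsilon_0 and mu_0 are measured,
# but their product is pinned down by epsilon_0 * mu_0 = 1/c**2
assert abs(epsilon_0 * mu_0 * c**2 - 1) < 1e-8

# Coulomb's constant in N m^2 / C^2
k_e = 1/(4 * pi * epsilon_0)
```

Coulomb’s constant comes out to about 8.988 × 10⁹ N m² / C².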

[1] The abbreviation for coulomb is *C* and the abbreviation for ampere is *A* because units named after people, such as Coulomb and Ampère, are capitalized. But why aren’t the full unit names “coulomb” and “ampere” capitalized? Because full names of SI units are *not* capitalized. Except for Celsius. *C’est comme ça parce que c’est comme ça*.

Whether 1% relative error is good enough completely depends on context.

The familiar approximations for π and *e* are good to within 1%: π ≈ 22/7 and *e* ≈ 19/7. (OK, the approximation for *e* isn’t so familiar, but it should be.)

Also, the speed of light is *c* ≈ 300,000 km/s and the fine structure constant is α ≈ 1/137. See also Koide’s coincidence.
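A quick check that the numerical approximations above really are within 1%:

```python
from math import pi, e

rel = lambda exact, approx: abs(approx - exact)/exact

assert rel(pi, 22/7) < 0.01           # about 0.04% high
assert rel(e, 19/7) < 0.01            # about 0.15% low
assert rel(299792.458, 300000) < 0.01 # speed of light in km/s, about 0.07% high
```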

The following hold for angles in radians.

- sin *x* ≈ *x* for |*x*| < 0.244.
- cos *x* ≈ 1 – *x*²/2 for |*x*| < 0.662.
- tan *x* ≈ *x* for |*x*| < 0.173.

Here again angles are in radians.

- arcsin *x* ≈ *x* for |*x*| < 0.242.
- arccos *x* ≈ π/2 – *x* for |*x*| < 0.4.
- arctan *x* ≈ *x* for |*x*| < 0.173.

Natural log has the following useful approximation:

- log(1 + *x*) ≈ *x* for -0.0199 < *x* < 0.0200.

Stirling’s approximation leads to the following.

- Γ(*x*) ≈ √(2π/*x*) (*x*/*e*)^{x} for *x* > 8.2876.
- *n*! ≈ √(2π/(*n*+1)) ((*n*+1)/*e*)^{(n+1)} for *n* ≥ 8.

Stirling’s approximation is different from the other approximations because it is an asymptotic approximation: it improves as its argument gets larger.

The rest of the approximations are valid over finite intervals. These intervals are symmetric when the function being approximated is symmetric, that is, even or odd. So, for example, it holds for sine but not for log.

For sine and tangent, and their inverses, the absolute error is *O*(*x*^{3}) and the value is *O*(*x*), so the relative error is *O*(*x*^{2}). [1]

The widest interval is for cosine. That’s because the absolute error and relative error are *O*(*x*^{4}). [2]

The narrowest interval is for log(1 + *x*), due to lack of symmetry. The absolute error is *O*(*x*^{2}), the value is *O*(*x*), and so the relative error is only *O*(*x*).

Here’s Python code to validate the claims above, assuming the maximum relative error always occurs on the ends, which it does in these examples. We only need to test one side of symmetric approximations to symmetric functions because they have symmetric error.

```python
from numpy import *
from scipy.special import gamma

def stirling_gamma(x):
    return sqrt(2*pi/x)*(x/e)**x

id = lambda x: x

for f, approx, x in [
        (sin, id, 0.244),
        (tan, id, 0.173),
        (arcsin, id, 0.242),
        (arctan, id, 0.173),
        (cos, lambda x: 1 - 0.5*x*x, 0.662),
        (arccos, lambda x: 0.5*pi - x, 0.4),
        (log1p, id, 0.02),
        (log1p, id, -0.0199),
        (gamma, stirling_gamma, 8.2876)]:
    assert( abs((f(x) - approx(x))/f(x)) < 0.01 )
```

- Sine approximation for small angles
- Simple approximations for logarithms
- Simple approximation for *e*^{x}
- Simple approximation for the gamma function
- Two useful asymptotic series

[1] Odd functions have only terms with odd exponents in their series expansion around 0. The error near 0 in a truncated series is roughly equal to the first series term not included. That’s why we get third order absolute error from a first order approximation.

[2] Even functions have only even terms in their series expansion, so our second order expansion for cosine has fourth order error. And because cos(0) = 1, the relative error is basically the absolute error near 0.

The post Within one percent first appeared on John D. Cook.

`grep` has a recursive switch `-R`, but it may not work like you’d expect. Suppose you want to find the names of all .org files in your current directory and below that contain the text “cheese.”

You have four files, two in the working directory and two below, that all contain the same string: “I like cheese.”

```
$ ls -R
.:
rootfile.org  rootfile.txt  sub

./sub:
subfile.org  subfile.txt
```

It seems that `grep -R` can either search all files of the form *.org in the current directory, ignoring the -R switch, or search all files recursively if you don’t give it a file glob, but it can’t do both.

```
$ grep -R -l cheese *.org
rootfile.org
$ grep -R -l cheese .
./rootfile.org
./rootfile.txt
./sub/subfile.org
./sub/subfile.txt
```

One way to solve this is with `find` and `xargs`:

```
$ find . -name '*.org' | xargs grep -l cheese
./rootfile.org
./sub/subfile.org
```

I was discussing this with Chris Toomey and he suggested an alternative using a subshell that seems more natural:

```
grep -l cheese $(find . -name '*.org')
```

Now the code reads more like an ordinary call to `grep`. From left to right, it essentially says “Search for ‘cheese’ in files ending in .org” whereas the version with `find` reads like “Find files whose names end in .org and search them for ‘cheese.'” It’s good to understand how both approaches work.
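For what it’s worth, GNU grep can also do both jobs at once with its `--include` option. A sketch, rebuilding the four example files in a scratch directory (the directory name here is arbitrary):

```shell
mkdir -p scratch/sub
for f in scratch/rootfile.org scratch/rootfile.txt \
         scratch/sub/subfile.org scratch/sub/subfile.txt
do
    echo "I like cheese." > "$f"
done

# -R recurses and --include restricts matches to the glob
grep -R -l --include='*.org' cheese scratch
```

Note that `--include` is a GNU extension, so it may not be available on every system’s grep.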

Oil wells are not simply vertical holes in the ground. The boreholes curve around underground, even if the intent is to drill a perfectly straight hole. With horizontal drilling the curvature can be substantial.

Dogleg severity (DLS) is calculated by measuring **inclination** and **azimuth** every 100 feet along a well. Inclination is the angle the hole makes with the vertical axis, like the angle φ in spherical coordinates, 90° minus the latitude. Azimuth is the angle of the projection of the well to a horizontal plane, like the θ or longitude angle in spherical coordinates. So if a well is nearly vertical, the inclination angles will be small. If a hole were shaped like a corkscrew, the inclination would be constant while the azimuth makes multiple rotations.

When I said 100 feet along the well, that is 100 feet of arc length. If a hole were perfectly vertical, this would be a change of 100 vertical feet, but generally it is less. If a segment were perfectly horizontal, it would be a change of 100 horizontal feet with no change in vertical depth.

Dogleg severity models a section of an oil well as a piece of a big circle. We assume our two inclination and azimuth measurements taken at two points along this circle, separated by an arc of 100 feet. If this arc makes an angle ψ, then the length of the arc is

ρ ψ = 100 ft

where ρ is the radius of the circle. Engineers call the angle ψ the **dog leg angle**, the angle of the sector between two measurements. Mathematicians call 1/ρ the curvature, so curvature is proportional to DLS.

To calculate curvature or DLS you have to calculate ψ from the two inclination and azimuth readings, (θ_{1}, φ_{1}) and (θ_{2}, φ_{2}). This is exactly the same as calculating distance from longitude and latitude. Instead of longitude and latitude on the earth, imagine a large sphere tangent to the wellbore at the two points where the measurements were taken. To put it another way, we imagine this section of the well as being a piece of a great circle on this sphere.

The only difference is that instead of the radius ρ being fixed and solving for distance, as in the longitude and latitude problem, our distance is fixed at 100 ft and we want to calculate ρ, or equivalently calculate ψ. From these notes we have

ψ = cos^{-1}(cos φ_{1} cos φ_{2} + sin φ_{1} sin φ_{2} cos(θ_{1}-θ_{2})).

So, for example, suppose we had inclination 4° and azimuth 30° at one point, and inclination 7° and azimuth 40° at the next measurement, 100 ft of arc length away. Then

ψ = cos^{-1}(cos 4° cos 7° + sin 4° sin 7° cos(40° – 30°)) = 3.138°

This says ρ = 1826 feet, i.e. our section of the well is curved like a circle of radius 1826 feet.
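The radius falls straight out of the arc length formula; a one-line check:

```python
from math import radians

# arc length = rho * psi with psi in radians, so rho = 100 ft / psi
psi = radians(3.138)
rho = 100/psi
```

This gives ρ ≈ 1826 ft.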

We could compute this in Python as follows.

```python
from numpy import sin, cos, arccos, deg2rad, rad2deg

def dls(inc1, az1, inc2, az2):
    ph1 = deg2rad(inc1)
    ph2 = deg2rad(inc2)
    th1 = deg2rad(az1)
    th2 = deg2rad(az2)
    return rad2deg(arccos(cos(ph1)*cos(ph2) + sin(ph1)*sin(ph2)*cos(th1 - th2)))
```

Here we assume input and output are in degrees, but internally we do calculations in radians.

If we call `dls(4, 30, 7, 40)` we get back 3.138°.

where *ad* – *bc* ≠ 0.

A Möbius transformation is uniquely determined by its values at three points. Last year I wrote a post that mentioned how to determine the coefficients of a Möbius transformation. There I said

The unique bilinear transform sending *z*_{1}, *z*_{2}, and *z*_{3} to *w*_{1}, *w*_{2}, and *w*_{3} is given by

Plug in your constants and solve for *w* = *f*(*z*).

This is correct, but it still leaves a bit of work to do. In a particular case it’s not hard to find the coefficients, but it would be harder to find the coefficients in general.

There is an explicit formula for each of the parameters *a*, *b*, *c*, and *d* given the specified points *z*_{1}, *z*_{2}, *z*_{3} and their images *w*_{1}, *w*_{2}, *w*_{3}, which we will present shortly. We could easily code up these formulas except for one complication: we may want one of our inputs or outputs to be ∞. This is not merely an edge case: in applications, say to signal processing, you often want to specify the location of poles.

If none of the inputs or outputs are infinite, the coefficients are given by

This is one solution; multiplying all four coefficients by a non-zero constant gives another solution. But modulo a constant the solution is unique. To put it another way, the Möbius transformation is unique but its representation is only unique up to a constant multiplying the numerator and denominator.

You might hope for a second that this formula would work if you just let floating point infinities take care of themselves. But if any of the *z*‘s or *w*‘s is infinite, every determinant above is infinite.

The possible cases of infinities are:

- No infinities
- Only one *z* is infinite
- Only one *w* is infinite
- A *z* and a *w* with the same subscript are infinite
- A *z* and a *w* with different subscripts are infinite

The order of our *z*‘s and *w*‘s is arbitrary, so we can rearrange them if necessary for our convenience. So without loss of generality we may assume

1. No infinities
2. Only *z*_{1} is infinite
3. Only *w*_{1} is infinite
4. *z*_{1} = *w*_{1} = ∞
5. *z*_{1} = *w*_{2} = ∞

To handle the case (2) of *z*_{1} = ∞, divide each of the equations above by *z*_{1} and take the limit as *z*_{1} approaches ∞. Since dividing one row of a matrix by a constant divides its determinant by the same amount, in each case we divide the first row by *z*_{1}. This works out nicely because *z*_{1} only ever appears in the first row of each matrix. We can handle the case (3) of *w*_{1} = ∞ analogously.

To handle the case (4) of *z*_{1} = *w*_{1} = ∞ we divide the first rows by *z*_{1} *w*_{1} and take the limit. To handle the case (5) of *z*_{1} = *w*_{2} = ∞ we divide the first rows by *z*_{1} and the second rows by *w*_{2} and take a limit.

If you find this limiting business dubious, it doesn’t matter: if the result is correct, it’s correct. And since Möbius transformations are determined by their values at three points, you can verify that each case is correct by sticking in *z*_{1}, *z*_{2}, *z*_{3} and checking that you get *w*_{1}, *w*_{2}, *w*_{3} out. You could do this manually, which I have, or trust the output of the code at the bottom of the post.

So here are our solutions.

Case (1), no infinities: given above.

Case (2), *z*_{1} = ∞:

*a* = *w*_{1} (*w*_{2} – *w*_{3})

*b* = *w*_{1} (*z*_{2} *w*_{3} – *z*_{3} *w*_{2}) + *w*_{2} *w*_{3} (*z*_{3} – *z*_{2})

*c* = *w*_{2} – *w*_{3}

*d* = *w*_{1} (*z*_{2} – *z*_{3}) – *z*_{2} *w*_{2} + *z*_{3} *w*_{3}

Case (3), *w*_{1} = ∞:

*a* = *z*_{1} (*w*_{2} – *w*_{3}) – *z*_{2} *w*_{2} + *z*_{3} *w*_{3}

*b* = *z*_{1} (*z*_{2} *w*_{3} – *z*_{3} *w*_{2}) + *z*_{2} *z*_{3} (*w*_{2} – *w*_{3})

*c* = *z*_{3} – *z*_{2}

*d* = *z*_{1} (*z*_{2} – *z*_{3})

NB: You could derive these from the case *z*_{1} = ∞ by inverting the transform. This means swapping *z*‘s with *w*‘s, swapping *a* and *d*, and negating *b* and *c*.

Case (4), *z*_{1} = *w*_{1} = ∞:

*a* = *w*_{2} – *w*_{3}

*b* = *z*_{2} *w*_{3} – *z*_{3} *w*_{2}

*c* = 0

*d* = *z*_{2} – *z*_{3}

Case (5), *z*_{1} = *w*_{2} = ∞:

*a* = *w*_{1}

*b* = –*z*_{2} *w*_{3} + *z*_{3} (*w*_{3} – *w*_{1})

*c* = 1

*d* = –*z*_{2}

```python
import numpy as np

def all_finite(z1, z2, z3, w1, w2, w3):
    a = np.linalg.det(
        [[z1*w1, w1, 1],
         [z2*w2, w2, 1],
         [z3*w3, w3, 1]])
    b = np.linalg.det(
        [[z1*w1, z1, w1],
         [z2*w2, z2, w2],
         [z3*w3, z3, w3]])
    c = np.linalg.det(
        [[z1, w1, 1],
         [z2, w2, 1],
         [z3, w3, 1]])
    d = np.linalg.det(
        [[z1*w1, z1, 1],
         [z2*w2, z2, 1],
         [z3*w3, z3, 1]])
    return (a, b, c, d)

def z1_infinite(z1, z2, z3, w1, w2, w3):
    assert(np.isinf(z1))
    a = w1*(w2 - w3)
    b = w1*(z2*w3 - z3*w2) + w2*w3*(z3 - z2)
    c = w2 - w3
    d = w1*(z2 - z3) - z2*w2 + z3*w3
    return (a, b, c, d)

def w1_infinite(z1, z2, z3, w1, w2, w3):
    assert(np.isinf(w1))
    a = z1*(w2 - w3) - z2*w2 + z3*w3
    b = z1*(z2*w3 - z3*w2) + z2*z3*(w2 - w3)
    c = z3 - z2
    d = z1*(z2 - z3)
    return (a, b, c, d)

def z1w1_infinite(z1, z2, z3, w1, w2, w3):
    assert(np.isinf(z1) and np.isinf(w1))
    a = w2 - w3
    b = z2*w3 - z3*w2
    c = 0
    d = z2 - z3
    return (a, b, c, d)

def z1w2_infinite(z1, z2, z3, w1, w2, w3):
    assert(np.isinf(z1) and np.isinf(w2))
    a = w1
    b = -z2*w3 + z3*(w3 - w1)
    c = 1
    d = -z2
    return (a, b, c, d)

def mobius_coeff(z1, z2, z3, w1, w2, w3):
    infz = np.isinf(z1) or np.isinf(z2) or np.isinf(z3)
    infw = np.isinf(w1) or np.isinf(w2) or np.isinf(w3)
    if infz:
        if np.isinf(z2):
            z1, z2 = z2, z1
            w1, w2 = w2, w1
        if np.isinf(z3):
            z1, z3 = z3, z1
            w1, w3 = w3, w1
        if infw:
            if np.isinf(w1):
                return z1w1_infinite(z1, z2, z3, w1, w2, w3)
            if np.isinf(w3):
                z2, z3 = z3, z2
                w2, w3 = w3, w2
            return z1w2_infinite(z1, z2, z3, w1, w2, w3)
        else:
            return z1_infinite(z1, z2, z3, w1, w2, w3)
    if infw:  # and all z finite
        if np.isinf(w2):
            z1, z2 = z2, z1
            w1, w2 = w2, w1
        if np.isinf(w3):
            z1, z3 = z3, z1
            w1, w3 = w3, w1
        return w1_infinite(z1, z2, z3, w1, w2, w3)
    return all_finite(z1, z2, z3, w1, w2, w3)

def mobius(x, a, b, c, d):
    if np.isinf(x):
        if c == 0:
            return np.inf
        return a/c
    if c*x + d == 0:
        return np.inf
    else:
        return (a*x + b)/(c*x + d)

def test_mobius(z1, z2, z3, w1, w2, w3):
    tolerance = 1e-6
    a, b, c, d = mobius_coeff(z1, z2, z3, w1, w2, w3)
    for (x, y) in [(z1, w1), (z2, w2), (z3, w3)]:
        m = mobius(x, a, b, c, d)
        assert(np.isinf(m) and np.isinf(y) or abs(m - y) <= tolerance)

test_mobius(1, 2, 3, 6, 4, 2)
test_mobius(1, 2j, 3+7j, 6j, -4, 2)
test_mobius(np.inf, 2, 3, 8j, -2, 0)
test_mobius(0, np.inf, 2, 3, 8j, -2)
test_mobius(0, -1, np.inf, 2, 8j, -2)
test_mobius(1, 2, 3, np.inf, 44j, 0)
test_mobius(1, 2, 3, 1, np.inf, 40j)
test_mobius(-1, 0, 3j, 1, -1j, np.inf)
test_mobius(np.inf, -1j, 5, np.inf, 2, 8)
test_mobius(1, np.inf, -1j, 5, np.inf, 2)
test_mobius(12, 0, np.inf, -1j, 5, np.inf)
test_mobius(np.inf, -1j, 5, 0, np.inf, -1)
test_mobius(6, np.inf, -1j, 0, 8, np.inf)
test_mobius(6, 3j, np.inf, -1j, np.inf, 1)
```

I found these fractions using Mathematica’s `Convergents` function.

For any irrational number, the “convergents” of its continued fraction representation give a sequence of rational approximations, each the most accurate possible given the size of its denominator. The convergents of a continued fraction are like the partial sums of a series, the intermediate steps you get in evaluating the limit. More on that here.
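If you don’t have Mathematica, the convergents are easy to generate from the continued fraction coefficients with the standard recurrence; a sketch, with the first few coefficients for π hard-coded:

```python
from fractions import Fraction

def convergents(cf):
    # standard recurrence: h_n = a_n * h_{n-1} + h_{n-2}, same for k_n
    h0, k0, h1, k1 = 1, 0, cf[0], 1
    yield Fraction(h1, k1)
    for a in cf[1:]:
        h0, k0, h1, k1 = h1, k1, a*h1 + h0, a*k1 + k0
        yield Fraction(h1, k1)

pi_cf = [3, 7, 15, 1, 292]  # continued fraction of pi begins [3; 7, 15, 1, 292, ...]
convs = list(convergents(pi_cf))
```

This produces 3, 22/7, 333/106, 355/113, and 103993/33102.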

Notice there are some repeated entries in the approximations for π. For example, the best approximation for π after the familiar 22/7 is 333/106 = 3.141509…. The fraction with the smallest denominator that gives you at least 3 decimal places actually gives you 4 decimal places. Buy one get one free.

There’s only one repeated row in the *e* column and none in the φ column. So it may seem there are no interesting patterns in the approximations to φ. But there are. It’s just that our presentation conceals them.

For one thing, all the numerators and denominators in the φ column are Fibonacci numbers. In fact, each fraction contains *consecutive* Fibonacci numbers: each numerator is the successor of the denominator in the Fibonacci series. There are no repeated rows because these ratios converge slowly to φ.

We don’t see the pattern in the convergents for φ clearly in the table because we pick out the ones that meet our accuracy goals. If we showed all the convergents we’d see that the *n*th convergent is the ratio of the (*n*+1)st and *n*th Fibonacci numbers.
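Here’s a small check that consecutive Fibonacci ratios do converge to φ, slowly:

```python
fibs = [1, 1]
for _ in range(25):
    fibs.append(fibs[-1] + fibs[-2])

phi = (1 + 5**0.5)/2
ratios = [b/a for a, b in zip(fibs, fibs[1:])]
```

The error of F_{n+1}/F_n falls off like 1/(√5 F_n²), consistent with φ being slow to approximate by rationals.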

In a sense made precise here, φ is the hardest irrational number to approximate with rational numbers. The bottom row of the table above gives the 8th convergent for π, the 15th convergent for *e*, and the 25th convergent for φ.

One reason I find simple approximations interesting is that they suggest further exploration. For example, the approximation for the perimeter of an ellipse here is interesting because of the connection to *r*-means. There are more accurate approximations, but not more interesting approximations, in my opinion.

Another reason I find simple approximations interesting is that if they are simple enough to be memorable and easy to evaluate, they’re useful for quick mental calculations.

I’ve written several posts about simple approximations, and this may be the last one, at least for a while. I’ve covered trig functions, logs and exponents. The most commonly used function I haven’t discussed is the gamma function, which I cover now.

The most important function that’s not likely to be on a calculator is the gamma function Γ(*x*). This function comes up all the time in probability and statistics, as well as in other areas of math.

The gamma function satisfies

Γ(*x* + 1) = *x* Γ(*x*),

and so you can evaluate the function anywhere if you can evaluate it on an interval of length 1. For example, the next section gives an approximation on the interval [2, 3]. We could reduce calculating Γ(4.2) to a problem on that interval by

Γ(4.2) = 3.2 Γ(3.2) = 3.2 × 2.2 Γ(2.2)

It turns out that

Γ(*x*) ≈ 2/(4 – *x*)

for 2 ≤ *x* ≤ 3. The approximation is exact on the end points, and the relative error is less than 0.9% over the interval.
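Here’s a numerical scan of the error, a sketch using `scipy.special.gamma` for the exact values:

```python
import numpy as np
from scipy.special import gamma

x = np.linspace(2, 3, 10001)
rel_err = np.abs((gamma(x) - 2/(4 - x))/gamma(x))
```

The error vanishes at both endpoints and stays under one percent in between.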

An even simpler approximation over the interval [2, 3] is

Γ(*x*) ≈ *x* – 1.

It has a maximum relative error of about 13%, so it’s far less accurate, but it’s trivial to compute in your head.

The way I found the rational approximation was to use Mathematica to find the best bilinear approximation using

```
MiniMaxApproximation[Gamma[x], {x, {2, 3}, 1, 1}]
```

and rounding the coefficients. In hindsight, I could have simply used bilinear interpolation. I did use linear interpolation to derive the approximation *x* – 1.

exp(*x*) = *e*^{x}.

We will assume -0.5 ≤ *x* ≤ 0.5. You can bootstrap your way from there to other values of *x*. For example,

exp(1.3) = exp(1 + 0.3) = *e* exp(0.3)

and

exp(0.8) = exp(1 – 0.2) = *e* / exp(0.2).

I showed here that

log_{e}(*x*) ≈ (2*x* – 2)/(*x* + 1)

for *x* between exp(-0.5) and exp(0.5).

Inverting both sides of the approximation shows

exp(*x*) ≈ (2 + *x*)/(2 – *x*)

for *x* between -0.5 and 0.5.

The maximum relative error in this approximation is less than 1.1% and occurs at *x* = 0.5. For *x* closer to the middle of the interval [-0.5, 0.5] the relative error is much smaller.
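A quick numerical check of that bound:

```python
import numpy as np

x = np.linspace(-0.5, 0.5, 10001)
rel = np.abs((np.exp(x) - (2 + x)/(2 - x))/np.exp(x))
```

The maximum relative error is about 1.09%, attained at the right endpoint.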

Here’s a plot of the relative error.

This was produced with the following Mathematica code.

```
re[x_] := (Exp[x] - (2 + x)/(2 - x))/Exp[x]
Plot[re[x], {x, -0.5, 0.5}]
```

log_{10} *x* ≈ (*x* – 1)/(*x* + 1)

for 1/√10 ≤ *x* ≤ √10. You could convert this into an approximation for logs in any base by multiplying by the right scaling factor, but why does it work out so simply for base 10?

Define

*m*(*x*) = (*x* – 1)/(*x* + 1).

Notice that *m*(1) = 0 and *m*(1/*x*) = –*m*(*x*), two properties that *m* shares with logarithms. This is a clue that there’s a connection between *m* and logs.

Now suppose we want to approximate log_{b} *x* by *k* *m*(*x*) over the interval 1/√*b* ≤ *x* ≤ √*b*. How do we pick *k*? If we choose

*k* = 1/(2*m*(√*b*))

then our approximation error will be zero at both ends of our interval.

If *b* = 10, *k* = 0.9625, i.e. close to 1. That’s why the approximation rule is particularly simple when *b* = 10. And although it’s good enough to round 0.9625 to 1 for rough calculations, our approximation for logs base 10 would be a little better if we didn’t.
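The recipe above is easy to code up; a sketch:

```python
from math import sqrt

def m(x):
    return (x - 1)/(x + 1)

def k(b):
    # pick k so that k*m(x) equals log_b(x) at x = sqrt(b),
    # where log_b is exactly 1/2
    return 1/(2*m(sqrt(b)))
```

This gives k(10) ≈ 0.9625, k(*e*) ≈ 2.041, and k(2) ≈ 2.9142.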

In the process of asking why base 10 was special, we came up with a general way of constructing logarithm approximations for any base.

Using the method above we find that

log_{e} *x* ≈ 2.0413 (*x* – 1)/(*x* + 1)

over the interval 1/√*e* ≤ *x* ≤ √*e* and that

log_{2} *x* ≈ 2.9142 (*x* – 1)/(*x* + 1)

over the interval 1/√2 ≤ *x* ≤ √2.

These rules are especially convenient if you round 2.0413 down to 2 and round 2.9142 up to 3. These changes are no bigger than rounding 0.9625 up to 1, which we did implicitly in the rule for logs base 10.

The log base 100 of a number is half the log base 10 of the number. So you could calculate the log of a number base 100 two ways: directly using *b* = 100 with the method above, or indirectly by setting *b* = 10 and dividing your result by 2. Do these give you the same answer? Nope! Our scaling factor is not linear in the logarithm of the base.

Well, then which is better? Taking the log base 10 first and dividing by 2 is better. In general, the smaller the base, the more accurate the result.

This seems strange at first, like something for nothing, but realize that when *b* is smaller, so is the interval we’re working over. You’ve done more work in reducing the range when you use a smaller base, and your reward is that you get a more accurate result. Since 2 is the smallest logarithm base that comes up with any regularity, the approximation for log base 2 is the most accurate of its kind that is likely to come up.

For every base *b*, we’ve shown that the approximation error is zero at 1/√*b*, 1, and √*b*. What we haven’t shown is that we have what’s known as “equal ripple” error. For example, here’s a plot of the error for our rule for approximating log base 2.

The error goes exactly as far negative on the left of 1 as it goes positive on the right of 1. The minimum is -0.00191483 and the maximum is 0.00191483. This follows from the property

*m*(1/*x*) = –*m*(*x*)

mentioned above. The location of the minimum (signed) error is the reciprocal location of the maximum error, and the value of that minimum is the negative of the maximum.
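The equal-ripple behavior is easy to confirm numerically; a sketch for log base 2:

```python
import numpy as np

x = np.linspace(1/np.sqrt(2), np.sqrt(2), 100001)
err = 2.9142*(x - 1)/(x + 1) - np.log2(x)
```

The sampled maximum and minimum agree in magnitude to many decimal places, just as the symmetry argument predicts.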

The post What makes the log10 trick work? first appeared on John D. Cook.

The general equation for the surface of an ellipsoid is

If two of the denominators {*a*, *b*, *c*} are equal then there are formulas for the area in terms of elementary functions. But in general, the area requires special functions known as “incomplete elliptic integrals.” For more details, see this post.

Here I want to focus on Knud Thomsen’s approximation for the surface area and its connection to the previous post.

The previous post reported that the perimeter of an ellipse is 2π*r* where *r* is the *effective radius*. An ellipse doesn’t have a *radius* unless it’s a circle, but we can define something that acts like a radius: the perimeter divided by 2π. It turns out that

*r* ≈ M_{3/2}(*x*)

makes a very good approximation for the effective radius, where *x* = (*a*, *b*). Here

M_*p*(*x*) = ((x₁^p + ⋯ + x_n^p)/n)^{1/p}

using the notation of Hardy, Littlewood, and Pólya.
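In code, the *r*-mean (the power mean of Hardy, Littlewood, and Pólya) is a one-liner; the same `rmean` helper appears in the scripts later in the post. As sanity checks, *r* = 1 gives the arithmetic mean and *r* = −1 the harmonic mean.

```python
# Power mean (r-mean) of the components of a tuple x
def rmean(r, x):
    n = len(x)
    return (sum(t**r for t in x)/n)**(1/r)

print(rmean(1, (2, 8)))   # arithmetic mean: 5.0
print(rmean(-1, (2, 8)))  # harmonic mean: 3.2
```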

This suggests we define an effective radius for an ellipsoid and look for an approximation to it in terms of an *r*-mean. The area of a sphere is 4π*r*², so we define the effective radius of an ellipsoid to be √(*A*/4π), where *A* is its surface area. With this definition, the effective radius of a sphere is its radius.

Thomsen found that

*r*² ≈ M_*p*(*x*)

where *x* = (*ab*, *bc*, *ac*) and *p* = 1.6. More explicitly,

*A* ≈ 4π (((*ab*)^*p* + (*bc*)^*p* + (*ac*)^*p*)/3)^{1/*p*}.

Note that it’s the *square* of the effective radius that is an *r*-mean, and that the argument to the mean is not simply (*a*, *b*, *c*). I would have naively tried looking for a value of *p* so that the *r*-mean of (*a*, *b*, *c*) gives a good approximation to the effective radius. In hindsight, it makes sense in terms of dimensional analysis that the inputs to the mean have units of area, and so the output has units of area.

The maximum relative error in Thomsen’s approximation is 1.178% with *p* = 1.6. You can tweak the value of *p* to reduce the worst-case error, but the value of 1.6 is optimal for approximately spherical ellipsoids.
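A quick sanity check on Thomsen's formula before the Saturn example: for a sphere the three products *ab*, *bc*, *ac* all equal *r*², any mean of equal values is that value, and the approximation reduces to the exact sphere area 4π*r*². A minimal sketch (`thomsen_area` is an illustrative helper name, not from the original post):

```python
from math import pi

# Power mean (r-mean) of the components of a tuple x
def rmean(r, x):
    n = len(x)
    return (sum(t**r for t in x)/n)**(1/r)

# Thomsen's approximation for an ellipsoid with semi-axes a, b, c
def thomsen_area(a, b, c, p=1.6):
    return 4*pi*rmean(p, (a*b, b*c, a*c))

# For a sphere, a = b = c = r, so the approximation gives exactly 4*pi*r**2
r = 3.0
print(thomsen_area(r, r, r), 4*pi*r**2)
```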

For an example, let’s compute the surface area of Saturn, the planet with the largest equatorial bulge in our solar system. Saturn is an oblate spheroid with equatorial diameter about 11% greater than its polar diameter.

Assuming Saturn is a perfect ellipsoid, Thomsen’s approximation over-estimates its surface area by about 4 parts per million. By comparison, approximating Saturn as a sphere under-estimates its area by 2 parts per thousand. The code for these calculations, based on the code here, is given at the bottom of the post.

```python
from numpy import pi, sin, cos, arccos
from scipy.special import ellipkinc, ellipeinc

# Saturn in km
equatorial_radius = 60268
polar_radius = 54364

a = b = equatorial_radius
c = polar_radius

# Exact surface area of an oblate spheroid (a = b > c)
phi = arccos(c/a)
m = 1
temp = ellipeinc(phi, m)*sin(phi)**2 + ellipkinc(phi, m)*cos(phi)**2
ellipsoid_area = 2*pi*(c**2 + a*b*temp/sin(phi))

# r-mean (power mean) of the components of x
def rmean(r, x):
    n = len(x)
    return (sum(t**r for t in x)/n)**(1/r)

# Thomsen's approximation
approx_ellipsoid_area = 4*pi*rmean(1.6, (a*b, b*c, a*c))

# Sphere with the same volume as the ellipsoid
r = (a*b*c)**(1/3)
sphere_area = 4*pi*r**2

def rel_error(exact, approx):
    return (exact - approx)/approx

print( rel_error(ellipsoid_area, sphere_area) )
print( rel_error(ellipsoid_area, approx_ellipsoid_area) )
```

The post Simple approximation for surface area of an ellipsoid first appeared on John D. Cook.

So this post has two parts: exact calculation, and simple approximation.

The perimeter can be computed exactly in terms of an elliptic integral. The name’s not a coincidence: elliptic integrals are so named because they were motivated by trying to find the perimeter of an ellipse.

Let *a* be the length of the major semi-axis and *b* the length of the minor semi-axis. So *a* ≥ *b*, and if our ellipse is actually a circle, *a* = *b* = *r*.

If ε is the eccentricity of our ellipse,

ε² = 1 – *b*² / *a*²

and the perimeter is given by

*p* = 4*a* *E*(ε²)

where *E* is the “complete elliptic integral of the second kind.” This function has popped up several times on this blog, most recently in the post about why serpentine walls save bricks.

You could calculate the perimeter of an ellipse in Python as follows.

```python
from scipy.special import ellipe

def perimeter(a, b):
    assert a >= b > 0
    return 4*a*ellipe(1 - (b/a)**2)
```

But what if you don’t have a scientific library like SciPy at hand? Is there a simple approximation to the perimeter?

In fact there is. In [1] the authors present the approximation

*p* ≈ 2π ((*a*^{3/2} + *b*^{3/2})/2)^{2/3}.

If we define the *effective radius* as *r* = *p*/2π, then the approximation above says

*r* ≈ M_{3/2}(*x*)

in the notation of Hardy, Littlewood, and Pólya, where *x* = (*a*, *b*).

We could code this up in Python as follows.

```python
from math import pi

def rmean(r, x):
    n = len(x)
    return (sum(t**r for t in x)/n)**(1/r)

def approx(a, b):
    return 2*pi*rmean(1.5, (a, b))
```

It would have been less code to write up `approx` directly without defining `rmean`, but we’ll use `rmean` again in the next post.

For eccentricity less than 0.05, the error is less than 10⁻¹⁵, i.e. the approximation is correct to all the precision in a double precision floating point number. Even for fairly large eccentricity, the approximation is remarkably good.

The eccentricity of Pluto’s orbit is approximately 0.25. So the approximation above could compute the perimeter of Pluto’s orbit to nine significant figures, if the orbit were exactly an ellipse. A bigger source of error would be the failure of the orbit to be perfectly elliptical.
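You can check the nine-figure claim directly, idealizing the orbit as an ellipse with unit semi-major axis and comparing the exact elliptic-integral perimeter against the *r*-mean approximation:

```python
from math import pi
from scipy.special import ellipe

def rmean(r, x):
    n = len(x)
    return (sum(t**r for t in x)/n)**(1/r)

eps = 0.25                        # roughly Pluto's orbital eccentricity
a = 1.0                           # semi-major axis, arbitrary units
b = a*(1 - eps**2)**0.5           # semi-minor axis
exact = 4*a*ellipe(eps**2)        # complete elliptic integral of the second kind
approx = 2*pi*rmean(1.5, (a, b))
print(abs(approx - exact)/exact)  # roughly 1e-9: about nine significant figures
```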

If an ellipse has major axes several times longer than its minor axis, the approximation here is less accurate, but still perhaps good enough, depending on the use. There is an approximation due to Ramanujan that works well even for much more eccentric ellipses, though it’s a little more complicated.

To my mind, the most interesting thing here is the connection to *r*-norms. I’ll explore that more in the next post, applying *r*-norms to the surface area of an ellipsoid.

[1] Carl E. Linderholm and Arthur C. Segal. An Overlooked Series for the Elliptic Perimeter. Mathematics Magazine, June 1995, pp. 216-220

The post Simple approximation for perimeter of an ellipse first appeared on John D. Cook.

One of the secrets to the success of Google’s PageRank algorithm is that it ranks based on revealed preferences: If someone links to a site, they’re implicitly endorsing it.

I got to thinking about revealed preferences when it comes to reference books the other day when I used some packing tape to keep the cover of my copy of Abramowitz and Stegun from falling off [1].

Instead of asking “What are some of your favorite books,” it might be more informative to ask “**Which of your books show the most wear?**” [2] This confounds frequent use and poor binding, but that’s life: there are always confounding effects.

My most worn math books are A&S, Bak and Newman, and Dunford and Schwartz. Bak and Newman was my undergraduate complex analysis book; I think it may have had a poor binding. Dunford and Schwartz got a lot of wear in college when I was into functional analysis.

I used A&S a lot when I was developing a numerical library for Bayesian statistics. I still open it up occasionally, though not as often as I used to.

My volumes of TAOCP are in good shape, but I think that’s because they are well bound. I’ve cracked open Volume 2 quite a bit, though I hardly ever look at the other volumes.

What are some of your most worn books?

- Deserted island books
- Banned math book
- Books you’d like to have read
- Assignment complete, 20 years later

[1] Yes, I know it’s available online, but I prefer the dead tree edition. And yes, I know there are more extensive references, but in my experience anything I need that isn’t in A&S is unlikely to be in any other reference book.

[2] Benford’s law was discovered via revealed preferences. Simon Newcomb noticed that the early pages of a book of logarithms were much dirtier than the later pages. (Yes, Newcomb discovered Benford’s law, consistent with Stigler’s law of eponymy.)

The post Books and revealed preferences first appeared on John D. Cook.