Distance from a point to a line

Eric Lengyel’s new book Projective Geometric Algebra Illuminated arrived yesterday and I’m enjoying reading it. Imagine if someone started with ideas like dot products, cross products, and determinants that you might see in your first year of college, then thought deeply about those things for years. That’s kinda what the book is.

Early in the book is the example of finding the distance from a point q to a line of the form p + tv.

If you define u = q − p then a straight-forward derivation shows that the distance d from q to the line is given by

d = \sqrt{{\bold u}^2 - \frac{({\bold u}\cdot {\bold v})^2}{{\bold v}^2}}

But as the author explains, it is better to calculate d by

d = \sqrt{\frac{({\bold u}\times {\bold v})^2}{{\bold v}^2}}

Why is that? The two expressions are algebraically equal, but the latter is better suited for numerical calculation.

The cardinal rule of numerical calculation is to avoid subtracting nearly equal floating point numbers. If two numbers agree to b bits, you may lose up to b bits of significance when computing their difference.

If u and v are vectors with large magnitude, but q is close to the line, then the first equation subtracts two large, nearly equal numbers under the square root.

The second equation involves subtraction too, but it’s less obvious. This is a common theme in numerical computing. Imagine this dialog.

[Student produces first equation.]

Mentor: Avoid subtracting nearly equal numbers.

[Student produces section equation.]

Student: OK, did it.

Mentor: That’s much better, though it could still have problems.

Where is there a subtraction in the second equation? We started with a subtraction in defining u. More subtly, the definition of cross product involves subtractions. But these subtractions involve smaller numbers than the first formula, because the first formula subtracts squared values. Eric Lengyel points this out in his book.

None of this may matter in practice, until it does matter, which is a common pattern in numerical computing. You implement something like the first formula, something that can be derived directly. You implicitly have in mind vectors whose magnitude is comparable to d and this guides your choice of unit tests, which all pass.

Some time goes by and a colleague tells you your code is failing. Impossible! You checked your derivation by hand and in Mathematica. Your unit tests all pass. Must be your colleague’s fault. But it’s not. Your code would be correct in infinite precision, but in an actual computer it fails on inputs that violate your implicit assumptions.

This can be frustrating, but it can also be fun. Implementing equations from a freshman textbook accurately, efficiently, and robustly is not a freshman-level exercise.

Related posts

Accelerating Archimedes

One way to approximate π is to find the areas of polygons inscribed inside a circle and polygons circumscribed outside the circle. The approximation improves as the number of sides in the polygons increases. This idea goes back at least as far as Archimedes (287–212 BC). Maybe you’ve tried this. It’s a lot of work.

In 1667 James Gregory came up with a more efficient way to carry out Archimedes’ approach. Tom Edgar and David Richeson cite Gregory’s theorem in [1] as follows.

Let Ik and Ck denote the areas of regular k-gons inscribed in and circumscribed around a given circle. Then I2n is the geometric mean of In and Ck, and C2n is the harmonic mean of I2n and Cn; that is,

\begin{align*} I_{2n} &= \sqrt{I_nC_n} \\ C_{2n} &= \frac{2C_n I_{2n}}{C_n + I_{2n}} \end{align*}

We can start with n = 4. Obviously the square circumscribed around the unit circle has vertices at (±1, ±1) and area 4. A little thought finds that the inscribed square has vertices at (±√2/2, ±√2/2) and area 2.

The following Python script finds π to six decimal places.

n = 4
I, C = 2, 4 # inscribed area, circumscribed area
delta = 2

while delta > 1e-6:
    I = (I*C)**0.5
    C = 2*C*I / (C + I)
    delta = C - I
    n *= 2
    print(n, I, C, delta)

The script stops when n = 8192, meaning that the difference between the areas of an inscribed and circumscribed 8192-gon is less than 10−6. The final output of the script is.

8192 3.1415923 3.1415928 4.620295e-07

If we average the upper and lower bound we get one more decimal place of accuracy.

Gregory’s algorithm isn’t fast, though it’s much faster than carrying out Archimedes’ approach by computing each area from scratch.

Gregory gives an interesting recursion. It has an asymmetry that surprised me: you update I, then C. You don’t update them simultaneously as you do when you’re computing the arithmetic-geometric mean.

Related posts

[1] Tom Edgar and David Richeson. A Visual Proof of Gregory’s Theorem. Mathematics Magazine, December 2019, Vol. 92, No. 5 (December 2019), pp. 384–386

How much will a cable sag? A simple approximation

Suppose you have a cable of length 2s suspended from two poles of equal height a distance 2x apart. Assuming the cable hangs in the shape of a catenary, how much does it sag in the middle?

If the cable were pulled perfectly taut, we would have s = x and there would be no sag. But in general, s > x and the cable is somewhat lower in the middle than on the two ends. The sag is the difference between the height at the end points and the height in the middle.

The equation of a catenary is

y = a cosh(t/a)

and the length of the catenary between t = −x and t = x is

2s = 2a sinh(x/a).

You could solve this equation for a and the sag will be

g = a cosh(x/a) − a.

Solving for a given s is not an insurmountable problem, but there is a simple approximation for g that has an error of less than 1%. This approximation, given in [1], is

g² = (s − x)(sx/2).

To visualize the error, I set x = 1 and let a vary to produce the plot below.

The relative error is always less than 1/117. It peaks somewhere around x = 1/3 and decreases from there, the error getting smaller as a increases (i.e. the cable is pulled tighter).

Related posts

[1] J. S. Frame. An approximation for the dip of a catenary. Pi Mu Epsilon Journal, Vol. 1, No. 2 (April 1950), pp. 44–46

A surprising result about surprise index

Surprise index

Warren Weaver [1] introduced what he called the surprise index to quantify how surprising an event is. At first it might seem that the probability of an event is enough for this purpose: the lower the probability of an event, the more surprise when it occurs. But Weaver’s notion is more subtle than this.

Let X be a discrete random variable taking non-negative integer values such that

\text{Prob}(X = i) = p_i

Then the surprise index of the ith event is defined as

S_i = \frac{\sum_{j=0}^\infty p_j^2 }{p_i}

Note that if X takes on values 0, 1, 2, … N−1 all with equal probability 1/N, then Si = 1, independent of N. If N is very large, each outcome is rare but not surprising: because all events are equally rare, no specific event is surprising.

Now let X be the number of legs a human selected at random has. Then p2 ≈ 1, and so the numerator in the definition of Si is approximately 1 and S2 is approximately 1, but Si is large for any value of i ≠ 2.

The hard part of calculating the surprise index is computing the sum in the numerator. This is the same calculation that occurs in many contexts: Friedman’s index of coincidence, collision entropy in physics, Renyi entropy in information theory, etc.

Poisson surprise index

Weaver comments that he tried calculating his surprise index for Poisson and binomial random variables and had to admit defeat. As he colorfully says in a footnote:

I have spent a few hours trying to discover that someone else had summed these series and spent substantially more trying to do it myself; I can only report failure, and a conviction that it is a dreadfully sticky mess.

A few years later, however, R. M. Redheffer [2] was able to solve the Poisson case. His derivation is extremely terse. Redheffer starts with the generating function for the Poisson

e^{-\lambda} e^{\lambda x} = \sum p_mx^m

and then says

Let x = eiθ; then eiθ; multiply; integrate from 0 to 2π and simplify slightly to obtain

\sum p_m^2 = (e^{2\lambda}/\pi) \int_0^\pi e^{2\lambda \cos \theta}\, d\theta

The integral on the right is recognized as the zero-order Bessel function …

Redheffer then “recognizes” an expression involving a Bessel function. Redheffer acknowledges in a footnote at a colleague M. V. Cerrillo was responsible for recognizing the Bessel function.

It is surprising that the problem Weaver describes as a “dreadfully sticky mess” has a simple solution. It is also surprising that a Bessel function would pop up in this context. Bessel functions arise frequently in solving differential equations but not that often in probability and statistics.

Unpacking Redheffer’s derivation

When Redheffer says “Let x = eiθ; then eiθ; multiply; integrate from 0 to 2π” he means that we should evaluate both sides of the equation for the Poisson generating function equation at these two values of x, multiply the results, and average the both sides over the interval [0, 2π].

On the right hand side this means calculating

\frac{1}{2\pi} \int_0^{2\pi}\left(\sum_{m=0}^\infty p_m \exp(im\theta)\right) \left(\sum_{n=0}^\infty p_n \exp(-in\theta)\right)\, d\theta

This reduces to

\sum_{m=0}^\infty p_m^2

because

alt=

i.e. the integral evaluates to 1 when m = n but otherwise equals zero.

On the left hand side we have

\begin{align*} \frac{1}{2\pi} \int_0^{2\pi} \exp(-2\lambda) \exp(\lambda(e^{i\theta} + e^{-i\theta})) \, d\theta &= \frac{1}{2\pi} \int_0^{2\pi} \exp(-2\lambda) \exp(2 \cos \theta) \, d\theta \\ &= \frac{e^{-2\lambda}}{\pi} \int_0^{\pi} \exp(2 \cos \theta) \, d\theta \end{align*}

Cerrillo’s contribution was to recognize the integral as the Bessel function J0 evaluated at -2iλ or equivalently the modified Bessel function I0 evaluated at -2λ. This follows directly from equations 9.1.18 and 9.6.16 in Abramowitz and Stegun.

Putting it all together we have

\frac{\pi}{e^{2\lambda}}\sum_{m=0}^\infty p_m^2 = \int_0^{\pi} \exp(2 \cos \theta) \, d\theta = J_0(-2i\lambda ) = I_0(-2\lambda )

Using the asymptotic properties of I0 Redheffer notes that for large values of λ,

\sum_{m=0}^\infty p_m^2 \sim \frac{1}{2\sqrt{\pi\lambda}}

[1] Warren Weaver, “Probability, rarity, interest, and surprise,” The Scientific Monthly, Vol 67 (1948), p. 390.

[2] R. M. Redheffer. A Note on the Surprise Index. The Annals of Mathematical Statistics, Mar., 1951, Vol. 22, No. 1 pp. 128ndash;130.

Blow up in finite time

A few years ago I wrote a post about approximating the solution to a differential equation even though the solution did not exist. You can ask a numerical method for a solution at a point past where the solution blows up to infinity, and it will dutifully give you a finite solution. The result is meaningless, but will give a result anyway.

The more you can know about the solution to a differential equation before you attempt to solve it numerically the better. At a minimum, you’d like to know whether there even is a solution before you compute it. Unfortunately, a lot of theorems along these lines are local in nature: the theorem assures you that a solution exists in some interval, but doesn’t say how big that interval might be.

Here’s a nice theorem from [1] that tells you that a solution is going to blow up in finite time, and it even tells you what that time is.

The initial value problem

y′ = g(y)

with y(0) = y0 with g(y) > 0 blows up at T if and only if the integral

\int_{y_0}^\infty \frac{1}{g(t)} \, dt
converges to T.

Note that it is not necessary to first find a solution then see whether the solution blows up.

Note also that an upper (or lower) bound on the integral gives you an upper (or lower) bound on T. So the theorem is still useful if the integral is hard to evaluate.

This theorem applies only to autonomous differential equations, i.e. the right hand side of the equation depends only on the solution y and not on the solution’s argument t. The differential equation alluded to at the top of the post is not autonomous, and so the theorem above does not apply. There are non-autonomous extensions of the theorem presented here (see, for example, [2]) but I do not know of a theorem that would cover the differential equation presented here.

[1] Duff Campbell and Jared Williams. Exloring finite-time blow-up. Pi Mu Epsilon Journal, Spring 2003, Vol. 11, No. 8 (Spring 2003), pp. 423–428

[2] Jacob Hines. Exploring finite-time blow-up of separable differential equations. Pi Mu Epsilon Journal, Vol. 14, No. 9 (Fall 2018), pp. 565–572

Finite differences and Pascal’s triangle

The key to solving a lot of elementary what-number-comes-next puzzles is to take first or second differences. For example, if asked what the next item in the series

14, 29, 50, 77, 110, …

the answer (or at lest the answer the person posing the question is most likely looking for) is 149. You might discover this by first looking at the differences of the items:

15, 21, 27, 33, …

The differences all differ by 6, i.e. the second difference of the series is constant. From there you can infer that the next item in the original series will be 39 more than the previous, i.e. it will be 149.

We can apply the same technique for exploring series that are not artificial puzzles. For example, a one-page article by Harlan Brothers [1] asks what would happen if you looked at the products of elements in each row of Pascal’s triangle.

The products grow very quickly, which suggests we work on a log scale. Define

s(n) = \log \prod{k=0}^n \binom{n}{k}  = \sum_{k=0}^n \log \binom{n}{k}

Let’s use a little Python script to look at the first 10 elements in the series.

    from scipy.special import binom
    from numpy import vectorize, log

    def s(n):
        return sum([log(binom(n, k)) for k in range(n+1)])
    s = vectorize(s)

    n = range(1, 11)
    x = s(n)
    print(x)

This prints

0.
0.69314718
2.19722458
4.56434819
7.82404601
11.9953516
17.0915613
23.1224907
30.0956844
38.0171228

Following the strategy at the top of the post, let’s look at the first differences of the sequence with [2]

    y = x[1:] - x[:-1]
    print(y)

This prints

0.69314718
1.50407740
2.36712361
3.25969782
4.17130560
5.09620968
6.03092943
6.97319372
7.92143836

The first differences are increasing by about 0.9, i.e. the second differences are roughly constant. And if we look at the third differences, we find that they’re small and getting smaller the further out you go.

We can easily look further out in the sequence by changing range(1, 11) to range(1, 101). When we do, we find that the second difference are

…, 0.99488052, 0.9949324. 0.99498325

If we look even further out, looking at a thousand terms, the last of the second differences is

…, 0.99949883, 0.99949933, 0.99949983

We might speculate that the second differences are approaching 1 as n → ∞. And this is exactly what is proved in [1], though the author does not work on the log scale. The paper shows that the ratio of the ratio of consecutive lines converges to e. This is equivalent on a log scale to saying the second differences converge to 1.

[1] Harlan J. Brothers. Math Bite: Finding e in Pascal’s Triangle. Mathematics Magazine , Vol. 85, No. 1 (February 2012), p. 51

[2] In Python, array elements are numbered starting at 0, and x[1:] represents all but the first elements of x. The index −1 is a shorthand for the last element, so x{:-1] means all the elements of x up to (but not including) the last.

Bounding the perimeter of a triangle between circles

Suppose you have a triangle and you know the size of the largest circle that can fit inside (the incircle) and the size of the smallest circle that can fit outside (the circumcircle). How would you estimate the perimeter of the triangle?

In terms of the figure below, if you know the circumference of the red and blue circles, how could you estimate the perimeter of the black triangle?

Triangle with inscribed and circumscribed circles

A crude estimate is that the triangle perimeter must be greater than the incircle circumference and less than the circumcircle circumference. But we can do better.

Gerretsen’s inequalities

It is conventional in this kind of problem to work with the semiperimeter of the triangle, half the perimeter, rather than the perimeter. Let r be the radius of the incircle, R the radius of the circumcircle, and s the semiperimeter of the triangle. Then Gerretsen’s inequalities say

16Rr − 5r² ≤ s² ≤ 4R² + 4Rr + 3r²

In the figure above,

r = 1.058, R = 2.5074, s = 6.1427

and Gerretsen’s inequalities give

36.8532  ≤ s² = 37.7327 ≤ 39.1200.

In the case of an equilateral triangle, Gerretsen’s inequalities are in fact equations.

Note that Gerretsen’s inequalities say nothing about the centers of the circles. An incircle must be inside a circumcircle, but for a variety of triangles with the centers of the two circles in different relations you have the same bounds.

Kooi’s inequality

Kooi’s inequality [2] gives another upper bound for the perimeter of the triangle:

s² ≤ ½ R(4R + r)² / (2Rr)

which in the example above gives a tighter upper bound, 38.9515.

Kooi’s upper bound is uniformly better than the upper bound half of Gerretsen’s inequalities. But further refinements are possible [3].

Related posts

[1] J. C. H. Gerretsen, Ongelijkheden in de driehoek. Nieuw Tijdschr 41 (1953), 1–7.

[2] O. Kooi, Inequalities for the triangle, Simon Stevin, 32 (1958), 97–101.

[3] Martin Lukarevski and Dan Stefan Marinescu. A refinement of the Kooi’s inequality, Mittenpunkt and applications. Journal of Mathematical Applications. Volume 13, Number 3 (2019), 827–832

Determinant of an infinite matrix

What does the infinite determinant

D = \left| \begin{array}{lllll} 1 & a_1 & 0 & 0 & \cdots \\ b_1 & 1 & a_2 & 0 & \cdots \\ 0 & b_2 & 1 & a_3 & \\ 0 & 0 & b_3 & 1 & \ddots \\ \vdots & \vdots & & \ddots & \ddots \\ \end{array} \right|

mean and when does it converge?

The determinant D above is the limit of the determinants Dn defined by

D_n = \left| \begin{array}{lllll} 1 & a_1 & & & \\ b_1 & 1 & a_2 & & \\ & b_2 & 1 & \ddots & \\ & & \ddots & \ddots & a_{n-1} \\ & & & b_{n-1} & 1 \\ \end{array} \right|

If all the a‘s are 1 and all the b‘s are −1 then this post shows that Dn = Fn, the nth Fibonacci number. The Fibonacci numbers obviously don’t converge, so in this case the determinant of the infinite matrix does not converge.

In 1895, Helge von Koch said in a letter to Poincaré that the infinite determinant is absolutely convergent if and only if the sum

\sum_{i=1}^\infty a_ib_i

is absolutely convergent. A proof is given in [1].

The proof shows that the Dn are bounded by

\prod_{i=1}^n\left(1 + |a_ib_i| \right)

and so the infinite determinant converges if the corresponding infinite product converges. And a theorem on infinite products says

\prod_{i=1}^\infty\left(1 + |a_ib_i| \right)

converges absolute if the sum in Koch’s theorem converges. In fact,

\prod_{i=1}^\infty\left(1 + |a_ib_i| \right) \leq \exp\left(\sum_{i=1}^\infty |a_ib_i| \right )

and so we have an upper bound on the infinite determinant.

Related post: Triadiagonal systems, determinants, and cubic splines

[1] A. A. Shaw. H. von Koch’s First Lemma and Its Generalization. The American Mathematical Monthly, April 1931, Vol. 38, No. 4, pp. 188–194

Area of quadrilateral as a determinant

I’ve written several posts about how determinants come up in geometry. These determinants often look similar, having columns related to coordinates and a column of ones. You can find several examples here along with an explanation for this pattern.

If you have three points z1, z2, and z3 in the complex plane, you can find the area of a triangle with these points as vertices

A(z_1, z_2, z_3) = \frac{i}{4} \, \left| \begin{matrix} z_1 & \bar{z}_1 & 1 \\ z_2 & \bar{z}_2 & 1 \\ z_3 & \bar{z}_3 & 1 \\ \end{matrix} \right|

You can read more about this here.

If you add the second column to the first, and subtract the first column from the second, you can get the equation for the area of a triangle in the real plane.

A = \frac{1}{2} \, \left| \begin{matrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \\ \end{matrix} \right|

Presumably the former equation was first derived from the latter, but the two are equivalent.

Now suppose you have a quadrilateral whose vertices are numbered in clockwise or counterclockwise order. Then the area is given by

A = \frac{1}{2} \, \left| \begin{matrix} x_1 & y_1 & 1 & 0\\ x_2 & y_2 & 1 & 1\\ x_3 & y_3 & 1 & 0\\ x_4 & y_4 & 1 & 1\\ \end{matrix} \right|

The proof is easy. If you expand the determinant by minors on the last column, you get the sum of two 3 × 3 determinants corresponding to the areas of the two triangles formed by splitting the quadrilateral along the diagonal connecting the first and third points.

Handy approximation for roots of fractions

This post will discuss a curious approximation with a curious history.

Approximation

Let x be a number near 1, written as a fraction

x = p / q.

Then define s and d as the sum and difference of the numerator and denominator.

s = p + q
d = pq

Since we are assuming x is near 1, s is larger relative to d.

We have the following approximation for the nth root of x.

nx ≈ (ns + d) / (ns − d).

This comes from a paper written in 1897 [1]. At the time there was great interest in approximations that are easy to carry out by hand, and this formula would have been very convenient.

The approximation assumes x is near 1. If not, you can multiply by a number of known square root to make x near 1. There will be an example below.

Examples

Positive d

Let’s find the cube root of x = 112/97. We have n = 3, p = 112, q = 97, s = 209, and d = 15. The approximation tells says

3x ≈ 642/612 = 107/102 = 1.049019…

while the exact value is 1.049096… .

Negative d

The value of d might be negative, as when x = 31/32. If we want to find the fifth root, n = 5, p = 31, q = 32, s = 63, and d = −1.

5x ≈ 312/314= 156/157 = 0.9936708…

while the exact value is 0.9936703… .

x not near 1

If x is not near 1, you can make it near 1. For example, suppose you wanted to compute the square root of 3. Since 17² = 289, 300/289 is near 1. You could find the square root of 300/289, then multiply the result by 17/10 to get an approximation to √3.

History

The author refers to this approximation as Mercator’s formula, presumable Gerardus Mercator (1512–1594) [2] of map projection fame. A brief search did not find this formula because Mercator’s projection drowns out Mercator’s formula in search results.

The author says a proof is given in Hutton’s Tracts on Mathematics, Vol 1. I tracked down this reference, and the full title in all its 19th century charm is

TRACTS
ON
MATHEMATICAL
AND
PHILOSOPHICAL SUBJECTS,
COMPRISING,
AMONG NUMEROUS IMPORTANT ARTICLES,
THE THEORY OF BRIDGES,
WITH SEVERAL PLANS OF IMPROVEMENT,
ALSO,
THE RESULTS OF NUMEROUS EXPERIMENTS ON
THE FORCE OF GUNPOWER,
WITH APPLICATIONS TO
THE MODERN PRACTICE OF ARTILLERY.
IN THREE VOLUMES
BY CHARLES HUTTON, LL.D. AND F.R.S. &c.
Late Professor of Mathematics in the Royal Military Academy, Woolwich.

Hutton’s book looks interesting. You can find it on Archive.org. Besides bridges and gunpowder, the book has a lot to say about what we’d now call numerical analysis, such as ways to accelerate the convergence of series. Hutton’s version of the formula above does not require that x be near 1.

Related posts

[1] Ansel N. Kellogg. Empirical formulæ; for Approximate Computation. The American Mathematical Monthly. February 1897, Vol. 4 No. 2, pp. 39–49.

[2] Mercator’s projection is so familiar that we may not appreciate what a clever man he was. We can derive his projection now using calculus and logarithms, but Mercator developed it before Napier developed logarithms or Newton developed calculus. More on that here.