Branch cuts for elementary functions

As far as I know, all contemporary math libraries use the same branch cuts when extending elementary functions to the complex plane. It seems that the current conventions date back to Kahan’s paper [1]. I imagine to some extent he codified existing practice, but he also settled some issues, particularly regarding floating point implementation.

I’ve verified that the following branch cuts are used by Mathematica, Common Lisp, and SciPy. If you know of any software that follows other conventions, please let me know in a comment.

The conventional branch cuts are as follows.

  • sqrt: [-∞, 0)
  • log: [-∞, 0]
  • arcsin: [-∞, -1] and [1, ∞]
  • arccos: [-∞, -1] and [1, ∞]
  • arctan: [-∞i, -i] and [i, ∞i]
  • arcsinh: [-∞i, -i] and [i, ∞i]
  • arccosh: [-∞, 1]
  • arctanh: [-∞, -1] and [1, ∞]
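
As a quick sanity check, here’s a small sketch using Python’s standard cmath module, which as far as I know follows the same conventions. Approaching the cut for log from just above and just below the negative real axis gives imaginary parts near +π and –π.

    import cmath

    eps = 1e-12

    # log has its branch cut along the negative real axis
    print(cmath.log(complex(-1,  eps)))   # ~ 0 + 3.14159...j
    print(cmath.log(complex(-1, -eps)))   # ~ 0 - 3.14159...j

    # sqrt has the same cut; the imaginary part flips sign across it
    print(cmath.sqrt(complex(-4,  eps)))  # ~ 0 + 2j
    print(cmath.sqrt(complex(-4, -eps)))  # ~ 0 - 2j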


[1] W. Kahan. Branch Cuts for Complex Elementary Functions or Much Ado About Nothing’s Sign Bit. The State of the Art in Numerical Analysis. Clarendon Press (1987).

Literate programming to reduce errors

I had some errors in a recent blog post that might have been eliminated if I had programmatically generated the content of the post rather than writing it by hand.

I rewrote the example in this post using org-mode. My org file looked like this:

    #+begin_src python :session :exports none
    lax_lat =   33.94
    lax_lon = -118.41
    iah_lat =   29.98
    iah_lon =  -95.34
    #+end_src

    Suppose we want to find how far a plane would travel 
    between Los Angeles (LAX) and Houston (IAH), 
    ...

    #+begin_src python :session :exports none
    a = round(90 - lax_lat, 2)
    b = round(90 - iah_lat, 2)
    #+end_src

    /a/ = src_python[:session :results raw]{f"90° – {lax_lat}° = {a}°"}

    and

    /b/ = src_python[:session :results raw]{f"90° – {iah_lat}° = {b}°"}

    ...

Here are some observations about the experience.

First of all, writing the post in org-mode was more work than writing it directly, pasting in computed values by hand, but presumably less error-prone. It would also be easier to update. If, for example, I realized that I had the wrong coordinates for one of the airports, I could update the coordinates in one location and everything else would be updated when I regenerated the page.

I don’t think this was the best application of org-mode. It’s easier to use org-mode like a notebook, in which case you’re not trying to hide the fact that you’re mixing code and prose. I wanted to insert computed values into the text without calling attention to the fact that the values were computed. This is fine when you mostly have a text document and you only want to insert a few computed values. When you’re doing more computing it becomes tedious to repeatedly write

    src_python[:session :results raw]{...}

to insert values. It might have been easier in this case to simply write a Python program that printed out the HTML source of the example.
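
For comparison, here’s a minimal sketch of that alternative, using the same airport data as above and printing HTML fragments directly. The exact markup is just for illustration.

    # Compute values and print the HTML directly, so the prose
    # and the computed numbers can't drift apart.
    lax_lat = 33.94
    iah_lat = 29.98

    a = round(90 - lax_lat, 2)
    b = round(90 - iah_lat, 2)

    print(f"<p><i>a</i> = 90&deg; &minus; {lax_lat}&deg; = {a}&deg;</p>")
    print("<p>and</p>")
    print(f"<p><i>b</i> = 90&deg; &minus; {iah_lat}&deg; = {b}&deg;</p>")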

There are a couple advantages to org-mode that weren’t relevant here. One is that the same org-mode file can be exported to multiple formats: HTML, LaTeX, ODT, etc. Here, however, I was only interested in exporting to HTML.

Another advantage of org-mode is the ability to mix multiple programming languages. Here I was using Python for everything, but org-mode will let you mix dozens of languages. You could compute one result in R, another result in Haskell, pass both results as arguments into some Java code, etc. You could also include data tables and diagrams in your org-mode file with your prose and code.

Literate programming

In general, keeping code and documentation together reduces errors. Literate programming may be more or less work, depending on the problem, but it reduces certain kinds of errors.

The example above is sorta bargain-basement literate programming. The code being developed was very simple, and not of interest beyond the article it was used in. Literate programming really shines when used to develop complex code, as in the book Physically Based Rendering. (Update: The third edition of this book is available online.)

When code requires a lot of explanation, literate programming can be very helpful. I did a project in psychoacoustics with literate programming a few years ago that would have been hard to do any other way. The project required a lot of reverse engineering and research. A line of code could easily require a couple paragraphs of explanation. Literate programming made the code development much easier because we could develop the documentation and code together, explaining things in the order most suited to the human reader, not to the compiler.

Law of cosines on a sphere

The previous post looked at the analog of the Pythagorean theorem on a sphere. This post looks at the law of cosines on a sphere.

Yesterday we looked at triangles on a sphere with sides a and b meeting at a right angle and hypotenuse c. Denote the angle opposite a side with the capital version of the same letter, so C is the angle opposite c. We assumed C is a right angle, but now we will remove that assumption.

The spherical analog of the law of cosines says

cos(c) = cos(a) cos(b) + sin(a) sin(b) cos(C).

Note that we have two kinds of angles: the arcs that make up the sides, and the angle formed by the intersection of arcs [1]. So the cos(C) term at the end is a different animal from the other terms in the equation.

If C is a right angle, cos(C) = 0 and the second term drops out, leaving us with the spherical counterpart of the Pythagorean theorem. But we do not require that C be a right angle.

Application to air distance

Suppose we want to find how far a plane would travel between Los Angeles (LAX) and Houston (IAH), assuming it takes a great circle path. The lat/long coordinates of the two airports are (33.94°, -118.41°) and (29.98°, -95.34°).

Los Angeles and Houston are approximately at the same latitude, but even if they were at exactly the same latitude we couldn’t just find the flight distance by finding the length of the arc of constant latitude between them because that arc would not be part of a great circle.

To find the distance between LAX and IAH we form a triangle with vertices at LAX, IAH, and the north pole. Call the arc from LAX to the north pole a and the arc from IAH to the north pole b. Since latitude is the angle up from the equator, the angle down from the pole is 90° minus latitude. So

a = 90° – 33.94° = 56.06°

and

b = 90° – 29.98° = 60.02°

The arcs a and b meet at an angle C equal to the difference between the two longitudes. That is,

C = 118.41° – 95.34° = 23.07°.

The law of cosines

cos(c) = cos(a) cos(b) + sin(a) sin(b) cos(C)

reduces to

cos(c) = cos(56.06°) cos(60.02°) + sin(56.06°) sin(60.02°) cos(23.07°)

in our problem.

This tells us

cos(c) = 0.9402

from which we find c = 0.3477 radians or 19.92°. Assume the earth is a perfect sphere of radius r = 3959 miles. The arc length we’re after is r times c in radians, or 1376 miles.
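
Here’s the whole calculation as a short Python sketch, under the same perfect-sphere assumption.

    from math import radians, sin, cos, acos

    a = radians(90 - 33.94)      # colatitude of LAX
    b = radians(90 - 29.98)      # colatitude of IAH
    C = radians(118.41 - 95.34)  # difference in longitudes

    # spherical law of cosines
    c = acos(cos(a)*cos(b) + sin(a)*sin(b)*cos(C))
    print(3959*c)  # arc length in miles, about 1376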


[1] In the language of differential geometry, the arcs are geodesics on the sphere and the angles of intersection are measured in the tangent space at the intersection point.

Naming probability functions

Given a random variable X, you often want to compute the probability that X will take on a value less than or equal to x, or greater than x. Define the functions

FX(x) = Prob(X ≤ x)

and

GX(x) = Prob(X > x)

What do you call F and G? I tend to call them the CDF (cumulative distribution function) and CCDF (complementary cumulative distribution function) but conventions vary.

The names of software functions to compute these two functions can be confusing. For example, Python (SciPy) uses the names cdf and sf (the latter for “survival function”) while the R functions to compute the CDF take an optional argument to return the CCDF instead [1].
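
For example, here is the SciPy pair for a standard normal distribution; the corresponding R calls would be pnorm(x) and pnorm(x, lower.tail = FALSE).

    from scipy.stats import norm

    x = 1.96
    print(norm.cdf(x))  # Prob(X <= x), about 0.975
    print(norm.sf(x))   # Prob(X > x),  about 0.025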

In the Emacs calculator, the function ltpn computes the CDF. At first glance I thought this was horribly cryptic. It’s actually a very nice naming convention; it just wasn’t what I was expecting.

The “ltp” stands for lower tail probability and “n” stands for normal. The complementary probability function is utpn where “utp” stands for upper tail probability. Unlike other software libraries, Emacs gives symmetric names to these two symmetrically related functions.

“Lower tail” probability is clearer than “cumulative” probability because it leaves no doubt whether you’re accumulating from the left or the right.

You can replace the “n” at the end of ltpn and utpn with the first letters of binomial, chi-square, t, F, and Poisson to get the corresponding functions for these distributions. For example, utpt gives the upper tail probability for the Student t distribution [2].

The Emacs calculator can be quirky, but props to the developers for choosing good names for the probability functions.


[1] Incidentally, the CCDF cannot always be computed by simply computing the CDF first and subtracting the value from 1. In theory this is possible, but not in floating point practice. See the discussion of erf and erfc in this post for an explanation.

[2] These names are very short and only a handful of distribution families are supported. But that’s fine in context. The reason to use the Emacs calculator is to do a quick calculation without having to leave Emacs, not to develop production quality statistical software.

Floating point inverses and stability

Let f be a monotone, strictly convex function on a real interval I and let g be its inverse. For example, we could have f(x) = e^x and g(x) = log x.

Now suppose we round our results to N digits. That is, instead of working with f and g we actually work with fN and gN where

fN(x) = round(f(x), N)

and

gN(x) = round(g(x), N)

and round(y, N) is the number y rounded to N significant figures [1].

This is what happens when we implement our functions f and g in floating point arithmetic. We don’t actually get the values of f and g but the values of fN and gN.

We assumed that f and g are inverses, but in general fN and gN will not be exact inverses. And yet in some sense the functions fN and gN are like inverses. Harold Diamond [2] proved that if you go back and forth applying fN and gN, the values stop changing after two round trips.

To make this more precise, define

hN(x) = gN( fN(x)).

In general, hN(x) does not equal x, but we do have the following:

hN( hN( hN(x) ) ) = hN( hN(x) ).

The value we get out of hN(x) might not equal x, but after we’ve applied hN twice, the value stays the same if we apply hN more times.

Connection to connections

Diamond’s stability theorem looks a lot like a theorem about Galois connections. My first reaction was that Diamond’s theorem was simply a special case of a more general theorem about Galois connections, but it is not.

A pair of monotone functions F and G form a Galois connection if for all a in the domain of F and for all b in the domain of G,

F(a) ≤ b ⇔ a ≤ G(b).

Let F and G form a Galois connection and define

H(x) = G( F(x) ).

Then

H( H(x) ) = H(x).

This result is analogous to Diamond’s result, and stronger. It says we get stability after just one round trip rather than two.
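
For a concrete example, the ceiling function F from the reals to the integers and the inclusion map G from the integers back into the reals form a Galois connection: ⌈a⌉ ≤ b if and only if a ≤ b. Here H is just the ceiling function, and indeed one round trip stabilizes. A minimal sketch:

    from math import ceil

    # F = ceiling, G = inclusion of the integers into the reals,
    # so H(x) = G(F(x)) = ceil(x) and H(H(x)) == H(x).
    def H(x): return ceil(x)

    for x in [2.3, -1.7, 5.0, 0.0]:
        assert H(H(x)) == H(x)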

The hitch is that although the functions f and g form a Galois connection, the functions fN and gN do not. Nevertheless, Diamond proved that fN and gN form some sort of weaker numerical analog of a Galois connection.

Example

The following example comes from [2]. Note that the example rounds to two significant figures, not two decimal places.

    from decimal import getcontext, Decimal

    # round to two significant figures: adding 0 to a Decimal
    # forces rounding to the context precision
    getcontext().prec = 2
    def round(x): return float( Decimal(x) + 0 )

    def f(x):  return 115 - 35/(x - 97)   # f and g are exact inverses
    def f2(x): return round(f(x))         # f rounded to 2 figures
    def g(x):  return 97 + 35/(115 - x)
    def g2(x): return round(g(x))         # g rounded to 2 figures
    def h2(x): return g2(f2(x))           # one round trip

    N = 110
    print(h2(N), h2(h2(N)), h2(h2(h2(N))))

This prints

    100.0 99.0 99.0

showing that the function h2 satisfies Diamond’s theorem but does not satisfy the identity above for Galois connections. That is, we stabilize after two round trips but not after just one round trip.


[1] Our “digits” here need not be base 10 digits. The stability theorem applies in any radix b provided b^N ≥ 3.

[2] Harold G. Diamond. Stability of Rounded Off Inverses Under Iteration. Mathematics of Computation, Volume 32, Number 141. January 1978, pp. 227–232.


Not so fast

James Gregory’s series for π

π = 4(1 – 1/3 + 1/5 – 1/7 + ⋯)

is not so fast. It converges very slowly and so does not provide an efficient way to compute π. After summing half a million terms, we only get five correct decimal places. We can verify this with the following bc code.

    s = 0
    scale = 50
    for(k = 1; k <= 500000; k++) { 
        s += (-1)^(k-1)/(2*k-1) 
    }
    4*s

which returns a value beginning

3.141590

which differs from π in the sixth decimal place. So does that mean there’s nothing interesting about Gregory’s formula? Not so fast!

When anyone speaks of a number of correct decimals, they nearly always mean number of consecutive correct digits following the decimal point. But for this post only I’ll take the term literally to mean the number of decimals that match the decimals in the correct answer.

The number of correct decimals (in this non-standard use of the term) in the series above is not so bad. Here’s the result, with the first 40 decimals of π below it for comparison, so you can see which digits differ:

3.1415906535897932404626433832695028841972
3.1415926535897932384626433832795028841971…

So even though the sixth decimal value is wrong, the next 10 after that are correct, and then after a couple errors we get another string of correct digits.

In [1] the authors explain what makes this example tick and show how to create similar sequences. We see a similar pattern whenever the number of terms is half of a power of 10, but not so much for other numbers of terms. For example, let’s increase 500,000 to 600,000. We get

3.141590986923126572953384124016

which is completely wrong after the sixth digit. So even though the result is slightly more accurate, it has fewer correct decimals.
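
Here’s a sketch of how one might count the matching decimals programmatically, using Python’s decimal module for 50-digit arithmetic. The digits of π are hard-coded for comparison, and the sums take a few seconds to run.

    from decimal import Decimal, getcontext

    getcontext().prec = 50
    PI = "3.14159265358979323846264338327950288419716939937510"

    def gregory(n):
        # partial sum of Gregory's series with n terms, times 4
        s = Decimal(0)
        for k in range(1, n + 1):
            s += Decimal((-1)**(k - 1)) / (2*k - 1)
        return 4*s

    for n in [500_000, 600_000]:
        digits = str(gregory(n))
        # count decimal positions that agree with pi, position by position
        matches = sum(1 for p, q in zip(digits[2:], PI[2:]) if p == q)
        print(n, matches)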


[1] Jonathan Borwein, Peter Borwein, Karl Dilcher. Pi, Euler Numbers, and Asymptotic Expansions. American Mathematical Monthly, vol. 96 (1989), pp. 681–687.

Galois connections and Galois theory

What are Galois connections and what do they have to do with Galois theory?

Galois connections are much more general than Galois theory, though Galois theory provided the first and most famous example of what we now call a Galois connection.

Galois connections

Galois connections are much more approachable than Galois theory. A Galois connection is just a pair of functions between partially ordered sets that have some nice properties. Specifically, let (A, ≤) and (B, ≤) be partially ordered sets. Here “≤” denotes the partial order on each set, and so could be defined differently on A and B.

We could add subscripts to distinguish the two meanings of ≤ but this is unnecessary because the meaning is always clear from context: if we’re comparing two things from A, we’re using the ≤ operator on A, and similarly for B.

Note that “≤” could have something to do with “less than” but it need not; it represents a partial order that may or may not be helpful to think of as “less than” or something analogous to less than such as “contained in.”

Monotone and antitone functions

Before we can define a Galois connection, we need to define monotone and antitone functions.

A monotone function is an order preserving function. That is, a function f is monotone if

x ≤ y ⇔ f(x) ≤ f(y).

Similarly, an antitone function is order reversing. That is, a function f is antitone if

x ≤ y ⇔ f(x) ≥ f(y).

Here ≥ is defined by

y ≥ x ⇔ x ≤ y.

Monotone and antitone connections

Galois connections have been defined two different ways, and you may run into each in different contexts. Fortunately it’s trivial to convert between the two definitions.

The first definition says that a Galois connection between A and B is a pair of monotone functions F and G such that for all a in A and b in B,

F(a) ≤ b ⇔ a ≤ G(b).

The second definition says that a Galois connection between A and B is a pair of antitone functions F and G such that for all a in A and b in B,

F(a) ≤ b ⇔ a ≥ G(b).

If you need to specify which definition you’re working with, you can call the former a monotone Galois connection and the latter an antitone Galois connection. We only need one of these definitions: if we reverse the definition of ≤ on B then a monotone connection becomes antitone and vice versa. [1]

How can we just reverse the meaning of ≤ willy-nilly? Recall that we said above that ≤ is just a notation for a partial order. There’s no need for it to mean “less than” in any sense, and the opposite of a partial order is another partial order.

We’ll use the antitone definition for the rest of the post because our examples are antitone. Importantly, the Fundamental Theorem of Galois Theory involves an antitone connection.

Examples

For our first example, let A be sets of points in the plane, and let B be sets of lines in the plane. For both sets let ≤ mean subset.

For a set of points X, define F(X) to be the set of lines that go through all the points of X.

Similarly, for a set of lines Y, define G(Y) to be the set of points on all the lines in Y.

Then the pair (F, G) form a Galois connection.

This example can be greatly generalized. Let R be any binary relation between A and B and let ≤ mean subset.

Define

F(X) = { y | x R y for all x in X }

G(Y) = { x | x R y for all y in Y }

Then the pair (F, G) form a Galois connection. The example above is a special case of this construction where x R y is true if and only if x is a point on y. Garrett Birkhoff made this observation in 1940 [2].
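
Here’s a sketch of this construction for a small made-up relation R, verifying the Galois condition in the equivalent form b ≤ F(a) ⇔ a ≤ G(b), with ≤ meaning subset.

    from itertools import chain, combinations

    A = {1, 2, 3}
    B = {"p", "q"}
    R = {(1, "p"), (2, "p"), (2, "q"), (3, "q")}  # arbitrary relation

    def F(X): return {y for y in B if all((x, y) in R for x in X)}
    def G(Y): return {x for x in A if all((x, y) in R for y in Y)}

    def subsets(S):
        return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

    # check Y <= F(X) iff X <= G(Y) for all subsets X of A and Y of B
    for X in map(set, subsets(A)):
        for Y in map(set, subsets(B)):
            assert (Y <= F(X)) == (X <= G(Y))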

Galois theory

Galois theory is concerned with fields, extension fields, and automorphisms of fields that keep a subfield fixed.

I recently wrote a series of blog posts explaining what images on the covers of math books were about, and one of these posts was an explanation of the following diagram:

[diagram: a lattice of extension fields]

Each node in the graph is a field, and a line means the field on the higher level is an extension of the field on the lower level. For each graph like this of extension fields, there is a corresponding graph of Galois groups. Specifically, let L be the field at the top of the diagram and let E be any field in the graph.

The corresponding graph of groups replaces E with the group of field isomorphisms from L to L that leave the elements of E unchanged, that is, the automorphisms of L that fix E. This map from fields to groups is half of a Galois connection pair. The other half is the map that takes each group G to the subfield of L fixed by G. This connection is antitone because if a field F is an extension of E, then the group of automorphisms that fix F is a subgroup of the group of automorphisms that fix E.

***

[1] We could think of A and B as categories, where there is a morphism between x and y iff x ≤ y. Then a monotone Galois connection between A and B is an antitone Galois connection between A and B^op.

[2] Garrett Birkhoff. Lattice Theory. American Mathematical Society Colloquium Publications, volume 25. New York, 1940.

Galois diagram

The previous post listed three posts I’d written before about images on the covers of math books. This post is about the image on the first edition of Dummit and Foote’s Abstract Algebra.

Here’s a version of the image on the cover I recreated using LaTeX.

The image on the cover appears on page 495 and represents extension fields. If you’re going through this book sequentially as a textbook, it’s likely time will run out before you find out what the image on the cover means. If you do get to it, it will be near the end of your course.

My diagram is topologically equivalent to the original. I took the liberty of moving things around a bit to keep the diagram from being awkwardly wide.

At the bottom of the diagram we have ℚ, the field of rational numbers. At the top of the diagram we have ℚ(i, 2^(1/8)), the smallest field containing the rational numbers, the imaginary unit i, and the eighth root of 2. Lines between fields on two levels indicate that the higher field is an extension of the lower. The constant ζ in the diagram is

ζ = √i = √2(1 + i)/2.

The significance of the diagram is that extension field relationships like these are important in Galois theory. The image appears late in the book because the majority of the book is leading up to Galois theory.

Book cover posts

When a math book has an intriguing image on the cover, it’s fun to get to the point in the book where the meaning of the image is explained. I have some ideas for book covers I’d like to write about, but here I’d like to point out three such posts I’ve already written.

Weierstrass elliptic function

The book on the left is Abramowitz and Stegun, affectionately known as A&S. I believe the function plotted on the cover is the Weierstrass elliptic function, as I wrote about here.

As the image suggests, my copy of A&S has seen a bit of wear. At one point the cover fell off and I put it back on with packing tape.

Möbius transformation of circles

The book in the middle is my well-worn copy of my undergraduate complex analysis text. The cover is a bit dirty and pages are falling out, a sort of Velveteen Rabbit of math books.

I haven’t written a post about the cover per se, but I did write about the necessary math to recreate the image on the cover here. That post explains how to compute the image of a circle under a Möbius transformation. The image on the left is mapped to the image on the right via the function

f(z) = 1/(z – α).

Here α is the point on the right where all the outer circles are tangent. If you wanted to reconstruct the image on the cover, it would be easier to proceed from right to left: start with the image on the right because it’s easier to describe, and apply the inverse transformation using the instructions in the blog post to produce the image on the left.

Hankel functions

I wrote about the book on the right here. I believe the image on the cover is the plot of a Hankel function.

Updates

Here’s a post explaining the image on the cover of Abstract Algebra by Dummit and Foote.

And here is a post explaining the image on the cover of A Course of Modern Analysis by Whittaker and Watson.


Ducci sequences

Pick four integers a, b, c, and d. Now iterate the procedure that takes these four integers to

|b – a|, |c – b|, |d – c|, |a – d|

You could think of the four integers as being arranged clockwise in a circle, taking the absolute value of the difference between each number and its neighbor a quarter turn clockwise. The sequence of 4-tuples created this way is called a Ducci sequence.

We can play with Ducci sequences using the following Python code.

    def next(x):
        # one Ducci step: absolute differences of cyclically adjacent entries
        n = len(x)
        return [abs(x[i] - x[(i+1)%n]) for i in range(n)]

If we start with four numbers taken from today’s date, the sequence will terminate in zeros in five steps. The code

    x = [7, 10, 20, 22]
    for _ in range(5):
        x = next(x)
        print(x)

produces

    [3, 10, 2, 15]
    [7, 8, 13, 12]
    [1, 5, 1, 5]
    [4, 4, 4, 4]
    [0, 0, 0, 0]

The sequence always terminates for any starting point, and usually it terminates quickly, as in the example above. However, you can find starting values so that the sequence takes arbitrarily long to terminate.

In fact, the worst case for termination comes from consecutive Tribonacci numbers [1]. These numbers are defined analogously to Fibonacci numbers, but starting with 0, 0, 1, with each subsequent number being the sum of the previous three.

0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81, 149, 274, 504, …

If you start with T(n), T(n+1), T(n+2), T(n+3), then the corresponding Ducci sequence takes at least 3n/2 steps to converge to 0.
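
Here’s a sketch that counts the steps empirically, reusing the next function defined above.

    def tribonacci(n):
        # first n Tribonacci numbers, starting 0, 0, 1
        t = [0, 0, 1]
        while len(t) < n:
            t.append(sum(t[-3:]))
        return t[:n]

    def steps_to_zero(x):
        count = 0
        while any(x):
            x = next(x)
            count += 1
        return count

    T = tribonacci(20)
    for n in range(10):
        print(n, steps_to_zero(T[n:n+4]))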

Note that the code above does not assume we use sequences of 4 numbers. If we use n-tuples where n is a power of 2, the Ducci sequence always terminates in zeros. More generally the sequence will terminate in a cycle.

For example, suppose we start with 7, 10, and 22. Then the sequence of iterations is

    [3, 12, 15]
    [9, 3, 12]
    [6, 9, 3]
    [3, 6, 3]
    [3, 3, 0]
    [0, 3, 3]
    [3, 0, 3]
    [3, 3, 0] 
    ...
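
The cycle can also be found programmatically by iterating until a state repeats. A minimal sketch, again reusing the next function above:

    def find_cycle(x):
        # iterate until a tuple repeats; return the number of steps
        # before the cycle begins and the cycle's length
        seen = {}
        step = 0
        while tuple(x) not in seen:
            seen[tuple(x)] = step
            x = next(x)
            step += 1
        start = seen[tuple(x)]
        return start, step - start

    print(find_cycle([7, 10, 22]))  # (5, 3): a 3-cycle after 5 steps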

***

[1] Achim Clausing, Ducci Matrices. American Mathematical Monthly, December 2018.