Naming probability functions

Given a random variable X, you often want to compute the probability that X will take on a value less than x or greater than x. Define the functions

FX(x) = Prob(Xx)

and

GX(x) = Prob(X > x)

What do you call F and G? I tend to call them the CDF (cumulative distribution function) and CCDF (complementary cumulative distribution function) but conventions vary.

The names of software functions to compute these two functions can be confusing. For example, Python (SciPy) uses the names cdf and sf (the latter for “survival function”) while the R functions to compute the CDF take an optional argument to return the CCDF instead [1].

In the Emacs calculator, the function ltpn computes the CDF. At first glace I thought this was horribly cryptic. It’s actually a very nice naming convention; it just wasn’t what I was expecting.

The “ltp” stands for lower tail probability and “n” stands for normal. The complementary probability function is utpn where “utp” stands for upper tail probability. Unlike other software libraries, Emacs gives symmetric names to these two symmetrically related functions.

“Lower tail” probability is clearer than “cumulative” probability because it leaves no doubt whether you’re accumulating from the left or the right.

You can replace the “n” at the end of ltpn and utpn with the first letters of binomial, chi-square, t, F, and Poisson to get the corresponding functions for these distributions. For example, utpt gives the upper tail probability for the Student t distribution [2].

The Emacs calculator can be quirky, but props to the developers for choosing good names for the probability functions.

Related posts

[1] Incidentally, the CCDF cannot always be computed by simply computing the CDF first and subtracting the value from 1. In theory this is possible, but not in floating point practice. See the discussion of erf and erfc in this post for an explanation.

[2] These names are very short and only a handful of distribution families are supported. But that’s fine in context. The reason to use the Emacs calculator is to do a quick calculation without having to leave Emacs, not to develop production quality statistical software.

Floating point inverses and stability

Let f be a monotone, strictly convex function on a real interval I and let g be its inverse. For example, we could have f(x) = ex and g(x) = log x.

Now suppose we round our results to N digits. That is, instead of working with f and g we actually work with fN and gN where

fN(x) = round(f(x), N)

and

gN(x) = round(g(x), N)

and round(y, N) is the number y rounded to N significant figures [1].

This is what happens when we implement our functions f and g in floating point arithmetic. We don’t actually get the values of f and g but the values of fN and gN.

We assumed that f and g are inverses, but in general fN and gN will not be exact inverses. And yet in some sense the functions fN and gN are like inverses. Harold Diamond [2] proved that if go back and forth applying fN and gN two times, after two round trips the values quit changing.

To make this more precise, define

hN(x) = gN( fN(x)).

In general, hN(x) does not equal x, but we do have the following:

hN( hN( hN(x) ) )  = hNhN(x) ).

The value we get out of hN(x) might not equal x, but after we’ve applied hN twice, the value stays the same if we apply hN more times.

Connection to connections

Diamond’s stability theorem looks a lot like a theorem about Galois connections. My first reaction was that Diamond’s theorem was simply a special case of a more general theorem about Galois connections, but it cannot.

A pair of monotone functions F and G form a Galois connection if for all a in the domain of F and for all b in the domain of G,

F(a) ≤ baG(b).

Let F and G form a Galois connection and define

H(x) = G( F(x) ).

Then

H( H(x) ) = H(x).

This result is analogous to Diamond’s result, and stronger. It says we get stability after just one round trip rather than two.

The hitch is that although the functions f and g form a Galois connection, the functions fN and gN do not. Nevertheless, Diamond proved that fN and gN form some sort of weaker numerical analog of a Galois connection.

Example

The following example comes from [2]. Note that the example rounds to two significant figures, not two decimal places.

    from decimal import getcontext, Decimal
    
    # round to two significant figures
    getcontext().prec = 2
    def round(x): return float( Decimal(x) + 0 )
    
    def f(x):  return 115 - 35/(x - 97)
    def f2(x): return round(f(x))
    def g(x):  return 97 + 35/(115 - x)
    def g2(x): return round(g(x))
    def h2(x): return g2(f2(x))
    
    N = 110
    print(h2(N), h2(h2(N)), h2(h2(h2(N))))

This prints

   100.0 99.0 99.0

showing that It shows that the function h2 satisfies Diamond’s theorem, but it does not satisfy the identify above for Galois compositions. That is, we stabilize after two round trips but not after just one round trip.

Related posts

[1] Our “digits” here need not be base 10 digits. The stability theorem applies in any radix b provided bN ≥ 3.

[2] Harold G. Diamond. Stability of Rounded Off Inverses Under Iteration. Mathematics of Computation, Volume 32, Number 141. January 1978, pp. 227–232.

 

Not so fast

James Gregory’s series for π

\pi = 4\left(1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \right)

is not so fast. It converges very slowly and so does not provide an efficient way to compute π. After summing half a million terms, we only get five correct decimal places. We can verify this with the following bc code.

    s = 0
    scale = 50
    for(k = 1; k <= 500000; k++) { 
        s += (-1)^(k-1)/(2*k-1) 
    }
    4*s

which returns

3.141590

which differs from π in the sixth decimal place. So does that mean there’s nothing interesting about Gregory’s formula? Not so fast!

When anyone speaks of a number of correct decimals, they nearly always mean number of consecutive correct digits following the decimal point. But for this post only I’ll take the term literally to mean the number of decimals that match the decimals in the correct answer.

The number of correct decimals (in this non-standard use of the term) in the series above is not so bad. Here’s the result, with the digits that differ from those of π underlined:

3.1415906535897932404626433832695028841972

So even though the sixth decimal value is wrong, the next 10 after that are correct, and then after a couple errors we get another string of correct digits.

In [1] the authors explain what makes this example tick, and show how to create similar sequences. For example, we see a similar pattern whenever the limit of the sum is half of a power of 10, but not so much for other limits. For example, let’s increase 500,000 to 600,000. We get

3.141590986923126572953384124016

which is completely wrong after the sixth digit. So even though the result is slightly more accurate, it has fewer correct decimals.

Related posts

[1] Jonathan Borwein, Peter Borwein, Karl Dilcher. Pi, Euler Numbers, and Asymptotic Expansions. American Mathematical Monthly. vol 96, p. 681–687.

Galois connections and Galois theory

What are Galois connections and what do they have to do with Galois theory?

Galois connections are much more general than Galois theory, though Galois theory provided the first and most famous example of what we now call a Galois connection.

Galois connections

Galois connections are much more approachable than Galois theory. A Galois connection is just a pair of functions between partially ordered sets that have some nice properties. Specifically, let (A, ≤) and (B, ≤) be partially ordered sets. Here “≤” denotes the partial order on each set, and so could be defined differently on A and B.

We could add subscripts to distinguish the two meanings of ≤ but this is unnecessary because the meaning is always clear from context: if we’re comparing two things from A, we’re using the ≤ operator on A, and similarly for B.

Note that “≤” could have something to do with “less than” but it need not; it represents a partial order that may or may not be helpful to think of as “less than” or something analogous to less than such as “contained in.”

Monotone and antitone functions

Before we can define a Galois connection, we need to define monotone and antitone functions.

A monotone function is an order preserving function. That is, a function f is monotone if

xyf(x) ≤ f(y).

Similarly, an antitone function is order reversing. That is, a function f is antitone if

xy ⇔ f(x) ≥ f(y).

Here ≥ is defined by

yxxy.

Monotone and antitone connections

Galois connections have been defined two different ways, and you may run into each in different contexts. Fortunately it’s trivial to convert between the two definitions.

The first definition says that a Galois connection between A and B is a pair of monotone functions F and G such that for all a in A and b in B,

F(a) ≤ baG(b).

The second definition says that a Galois connection between A and B is a pair of antitone functions F and G such that for all a in A and b in B,

F(a) ≤ ba ≥ G(b).

If you need to specify which definition you’re working with, you can call the former a monotone Galois connection and the latter an antitone Galois connection. We only need one of these definitions: if we reverse the definition of ≤ on B then a monotone connection becomes antitone and vice versa. [1]

How can we just reverse the meaning of ≤ willy-nilly? Recall that we said above that ≤ is just a notation for a partial order. There’s no need for it to mean “less than” in any sense, and the opposite of a partial order is another partial order.

We’ll use the antitone definition for the rest of the post because our examples are antitone. Importantly, the Fundamental Theorem of Galois Theory involves an antitone connection.

Examples

For our first example, let A be sets of points in the plane, and let B be sets of lines in the plane. For both sets let ≤ mean subset.

For a set of points X, define F(X) to be the set of lines that go through all the points of X.

Similarly, for a set of lines Y, define G(Y) to be the set of points on all the lines in Y.

Then the pair (F, G) form a Galois connection.

This example can be greatly generalized. Let R be any binary relation between A and B and let let ≤ mean subset.

Define

F(X) = { y | x R y for all x in X }

G(Y) = { x | x R y for all y in Y }

Then the pair (F, G) form a Galois connection. The example above is a special case of this construction where x R y is true if and only if x is a point on y. Garrett Birkhoff made this observation in 1940 [2].

Galois theory

Galois theory is concerned with fields, extension fields, and automorphisms of fields that keep a subfield fixed.

I recently wrote a series of blog posts explaining what images on the covers of math books were about, and one of these posts was an explanation of the following diagram:

 

Each node in the graph is a field, and a line means the field on the higher level is an extension of the field on the lower level. For each graph like this of extension fields, there is a corresponding graph of Galois groups. Specifically, let L be the field at the top of the diagram and let E be any field in the graph.

The corresponding graph of groups replaces E with the group of group isomorphims from L to L that leave the elements of E unchanged, the automorphisms of L that fix E. This map from fields to groups is half of a Galois connection pair. The other half is the map that takes each group to the field of elements of L fixed by G. This connection is antitone because if a field F is an extension of E, then the group of automorphisms that fix F are a subgroup of the automorphisms that fix E.

***

[1] We could think of A and B as categories, where there is a morphism between x and y iff xy. Then a monotone Galois connection between A and B is an antitone Galois connection between A and Bop.

[2] Garrett Birkhoff. Lattice Theory. American Mathematical Society Colloquium Publications, volume 25. New York, 1940.

Galois diagram

The previous post listed three posts I’d written before about images on the covers of math books. This post is about the image on the first edition of Dummit and Foote’s Abstract Algebra.

Here’s a version of the image on the cover I recreated using LaTeX.

The image on the cover appears on page 495 and represents extension fields. If you’re going through this book sequentially as a text book, it’s likely time will run out before you ever find out what the image on the cover means. If you do get to it, you get to it near the end of your course.

My diagram is topologically equivalent to the original. I took the liberty of moving things around a bit to keep the diagram from being awkwardly wide.

At the bottom of the diagram we have ℚ, the field of rational numbers. At the top of the diagram we have ℚ(i, 21/8), the smallest field containing the rational numbers, the imaginary unit i, and the eighth root of 2. Lines between fields on two levels indicate that the higher is an extension of the lower. The constant ζ in the diagram is

ζ = √i = √2(1 + i)/2.

The significance of the diagram is that extension field relationships like these are important in Galois theory. The image appears late in the book because the majority of the book is leading up to Galois theory.

Book cover posts

When a math book has an intriguing image on the cover, it’s fun to get to the point in the book where the meaning of the image is explained. I have some ideas for book covers I’d like to write about, but here I’d like to point out three such posts I’ve already written.

Weierstrass elliptic function

The book on the left is Abramowitz and Stegun, affectionately known as A&S. I believe the function plotted on the cover is the Weierstrass elliptic function, as I wrote about here.

As the image suggests, my copy of A&S has seen a bit of wear. At one point the cover fell off and I put it back on with packing tape.

Möbius transformation of circles

The book in the middle is my well-worn copy of my undergraduate complex analysis text. The cover is a bit dirty and pages are falling out, a sort of Velveteen rabbit of math books.

I haven’t written a post about the cover per se, but I did write about the necessary math to recreate the image on the cover here. That post explains how to compute the image of a circle under a Möbius transformation. The image on the left is mapped to the image on the right via the function

f(z) = 1/(z – α).

Here α is the point on the right where all the outer circles are tangent. If you wanted to reconstruct the image on the cover, it would be easier to proceed from right to left: start with the image on the right because it’s easier to describe, and apply the inverse transformation using the instructions in the blog post to produce the image on the left.

Hankel functions

I wrote about the book on the right here. I believe the image on the cover is the plot of a Hankel function.

Updates

Here’s a post explaining the image on the cover Abstract Algebra by Dummit and Foote.

And here is a post explaining the image on the cover of A Course of Modern Analysis by Whittaker and Watson.

 

Ducci sequences

Pick four integers a, b, c, and d. Now iterate the procedure that takes these four integers to

|ba|, |cb|, |dc|, |ad|

You could think of the four integers being arranged clockwise in a circle, taking the absolute value of difference between each number and its neighbor a quarter turn clockwise. The sequence of 4-tuples created this way is called a Ducci sequence.

We can play with Ducci sequences using the following Python code.

    def next(x):
        n = len(x)
        return [abs(x[i] - x[(i+1)%n]) for i in range(n)]

If we start with four numbers taken from today’s date, the sequence will terminate in zeros in five steps. The code

    x = [7, 10, 20, 22]
    for _ in range(5):
        x = next(x)
        print(x)

This produces

    [3, 10, 2, 15]
    [7, 8, 13, 12]
    [1, 5, 1, 5]
    [4, 4, 4, 4]
    [0, 0, 0, 0]

The sequence always terminates for any starting point, and usually it terminates quickly, as in the example below. However, you can find starting values so that the sequence takes arbitrarily long to terminate.

In fact, the worse case for termination comes from consecutive Tribonacci numbers [1]. These numbers are defined analogously to Fibonacci numbers, but starting with 0, 0, 1, and each subsequent number being the sum of the previous 3 numbers.

0, 0, 1, 1, 2, 4, 7, 13, 24, 44, 81, 149, 274, 504, …

If you start with Tn, Tn+1, Tn+2, Tn+3 then the corresponding Ducci sequence takes at least 3n/2 steps to converge to 0.

Note that the code above does not assume we use sequences of 4 numbers. If we use n-tuples where n is a power of 2, the Ducci sequence always terminates in zeros. More generally the sequence will terminate in a cycle.

For example, suppose we start with 7, 10, and 22. Then the sequence of iterations is

    [3, 12, 15]
    [9, 3, 12]
    [6, 9, 3]
    [3, 6, 3]
    [3, 3, 0]
    [0, 3, 3]
    [3, 0, 3]
    [3, 3, 0] 
    ...

***

[1] Achim Clausing, Ducci Matrices. American Mathematical Monthly, December 2018.

Dijkstra extends Pythagoras

Suppose a triangle has sides a, b, and c. Label the angles opposite these three sides α, β, and γ respectively.

Edsger Dijkstra published (EWD975) a note proving the following extension of the Pythagorean theorem:

sgn(α + β – γ) = sgn(a² + b² – c²).

Here the sgn function is -1, 0, or 1 depending on whether its input is negative, zero, or positive.

To see that this really is an extension of the Pythagorean theorem, if γ is a right angle, then α + β = γ and so the sgn on the left hand side evaluates to 0. This forces the right hand side to 0, which says a² + b² = c².

As Dijkstra points out, his is a theorem about triangles, not simply a theorem about right triangles.

More Dijkstra posts

Bump functions

A bump function is a smooth (i.e. infinitely differentiable) function that is positive on some open interval (a, b) and zero outside that interval. I mentioned bump functions a few weeks ago and discussed how they could be used to prevent clicks in radio transmissions.

Today I ran into a twitter thread that gave a more general construction of bump functions that I’d seen before. The thread concludes with this:

You can create bump functions by this recipe:

    1. Take any f(x) growing faster than polynomial (e.g. exp)
    2. Define g(x) = 1 / f(1/x)
    3. Let h(x) = g(1+x) g(1−x).
    4. Zero out x∉(−1,1)
    5. Scale, shift, etc

In this post I’ll give a quick asymptotic proof that the construction above works.

Let a positive integer n be given and define g(x) to be zero for negative x. We’ll show that if f grows faster than xn. then g is n times differentiable at 0.

As x → ∞, f(x) is eventually bounded below by a function growing faster than xn. And so as x → 0, f(1/x) grows faster than xn and 1/f(1/x) goes to zero faster than xn, and so its nth derivative is zero.

It follows that g(1 + x) is n times differentiable at -1 and g(1 – x) is n times differentiable at 1. So h is n times differentiable at -1 and 1. The function h is positive on the open interval (-1, 1)  and zero outside. Our choice of n was arbitrary, so h is infinitely differentiable, and so h is a bump function. We could shift and scale h to make it a bump function on any other finite interval.

Discussion

When I was a student, I would have called this kind of proof hand waving. I’d want to see every inequality made explicit: there exists some M > 0 such that for x > M …. Now I find arguments like the one above easier to follow and more convincing. I imagine if a lecturer gives a proof with all the inequalities spelled out, he or she is probably thinking about something like the proof above and expanding the asymptotic argument on the fly.

Note that a slightly more general theorem falls out of the proof. Our goal was to show if f grows faster than every polynomial, then g and h are infinitely differentiable. But along the way we proved, for example, that if f eventually grows like x7 then g and h are six-times differentiable.

In fact, let’s look at the case f(x) = x7.

    f[x_] := x^7
    g[x_] := If[x > 0, 1/ f[1/x], 0]
    h[x_] := g[x + 1] g[1 - x]
    Plot[h[x], {x, -1.5, 1.5}]

This produces the following plot.

Just looking at the plot, h looks smooth; it’s plausible that it has six derivatives. It appears h is zero outside [-1, 1] and h is positive over at least part of [-1, 1]. If you look at the Mathematica code you can convince yourself that h really is positive on the entire interval open interval (-1, 1), though it is very small as it approaches either end.

Related posts

Polynomial analog of the Collatz conjecture

The Collatz conjecture, a.k.a. the 3n + 1 problem, a.k.a. the hailstone conjecture, asks whether the following sequence always terminates.

Start with a positive integer n.

  1. If n is even, set nn /2. Otherwise n ← 3n + 1.
  2. If n = 1, stop. Otherwise go back to step 1.

The Collatz conjecture remains an unsolved problem, though there has been progress toward a proof. Some people, however, are skeptical whether the conjecture is true.

This post will look at a polynomial analog of the Collatz conjecture. Instead of starting with a positive integer, we start with a polynomial with coefficients in the integers mod m.

If the polynomial is divisible by x, then divide by x. Otherwise, multiply by (x + 1) and add 1. Here x is analogous to 2 and (x + 1) is analogous to 3 in the (integer) Collatz conjecture.

Here is Mathematica code to carry out the polynomial Collatz operation.

    c[p_, m_] := 
        PolynomialMod[
            If[ (p /. x -> 0) == 0, 
                 p/x, 
                 (x + 1) p + 1
            ], 
            m
        ]

If m = 2, the process always converges. In fact, it converges in at most n² + 2n steps where n is the degree of the polynomial [1].

Here’s an example starting with the polynomial x7 + x3 + 1.

    Nest[c[#, 2] &, x^7 + x^3 + 1, 15]

This returns 1, and so in this instance 15 iterations are enough.

If m = 3, however, the conjecture is false. In [1] the authors report that the sequence starting with x² + 1 is periodic with period 6.

The following code produces the sequence of values.

    NestList[c[#, 3] &, x^2 + 1, 6]

This returns the sequence

  • 1 + x2
  • 2 + x + x2 + x3
  • 2 x2 + 2 x3 + x4
  • 2 x + 2 x2 + x3
  • 2 + 2 x + x2
  • x + x3
  • 1 + x2

Related posts

[1] Kenneth Hicks, Gary L. Mullen, Joseph L. Yucas and Ryan Zavislak. A Polynomial Analogue of the 3n + 1 Problem. The American Mathematical Monthly, Vol. 115, No. 7. pp. 615-622.