Splitting proofs in two

“Ever since Euclid, mathematical proofs have served a dual purpose: certifying that a statement is true and explaining why it is true. In the future these two epistemological functions may be divorced. In the future, the computer assistant may take care of the certification and leave the mathematician to look for an explanation that humans can understand.”

Dana Mackenzie, “What in the Name of Euclid Is Going On Here?”, Science, 2005


Permutations and tests

Suppose a test asks you to place 10 events in chronological order. Label these events A through J so that chronological order is also alphabetical order.

If a student answers BACDEFGHIJ, did they make two mistakes or just one? Two events are in the wrong position, but the student made a single transposition error. The simplest way to grade such a test would be to count the number of events that are in the correct position. Is that the fairest way to grade?

If you decide to count how many transpositions are needed to correct a student’s answer, do you count any transposition or only adjacent transpositions? For example, if someone answered JBCDEFGHIA, then transposing the A and the J is enough to put the results in order. But reversing the first and last events seems like a bigger mistake than reversing the first two events. Counting only adjacent transpositions would penalize this mistake far more heavily: moving the J to the end takes nine adjacent swaps, and moving the A to the front takes eight more, 17 in all. But it hardly seems that answering JBCDEFGHIA is seventeen times worse than answering BACDEFGHIJ.

Maybe counting transpositions is too much work. So we just go back to counting how many events are in the right place. But then suppose someone answers JABCDEFGHI. This is completely wrong since every event is in the wrong position. But the student obviously knows something, since the relative order of nearly all of the events is correct. From one perspective there was only one mistake: J comes last, not first.

What is the worst possible answer? Maybe getting the order exactly backward? If you have an odd number of events, then getting the order backward means one event is in the right place, and so that doesn’t receive the lowest possible score.

This is an interesting problem beyond grading exams. (As for grading exams, I’d suggest simply not using questions of this type on an exam.) In manufacturing, how serious a mistake is it to reverse two consecutive components versus two distant components? You could also ask the same question when comparing DNA sequences or other digital signals. The best way to assign a distance between the actual and desired sequence would depend entirely on context.
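Here is a sketch comparing three of the scores discussed above on a few possible answers: the number of events in the correct position, the minimum number of adjacent transpositions (the number of inversions), and the minimum number of arbitrary transpositions. The function names are mine, chosen only for illustration.

def correct_positions(answer, key):
    # Number of events in the right place.
    return sum(a == k for a, k in zip(answer, key))

def adjacent_transpositions(answer, key):
    # Minimum number of adjacent swaps needed to fix the answer,
    # i.e. the number of inversions.
    pos = {k: i for i, k in enumerate(key)}
    seq = [pos[a] for a in answer]
    return sum(seq[i] > seq[j]
               for i in range(len(seq))
               for j in range(i + 1, len(seq)))

def any_transpositions(answer, key):
    # Minimum number of swaps of any two events, which equals the
    # number of events minus the number of cycles in the permutation.
    pos = {k: i for i, k in enumerate(key)}
    perm = [pos[a] for a in answer]
    seen, cycles = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            cycles += 1
            i = start
            while i not in seen:
                seen.add(i)
                i = perm[i]
    return len(perm) - cycles

key = "ABCDEFGHIJ"
for answer in ["BACDEFGHIJ", "JBCDEFGHIA", "JABCDEFGHI", "JIHGFEDCBA"]:
    print(answer,
          correct_positions(answer, key),
          adjacent_transpositions(answer, key),
          any_transpositions(answer, key))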

Fibonacci formula for pi

Here’s an unusual formula for pi based on the product and least common multiple of the first m Fibonacci numbers.


\pi = \lim_{m\to\infty} \sqrt{\frac{6 \log(F_1 \cdots F_m)}{\log \operatorname{lcm}(F_1, \ldots, F_m)}}

Unlike the formula I wrote about a few days ago relating Fibonacci numbers and pi, this one is not as simple to prove. The numerator inside the root is easy enough to estimate asymptotically, but estimating the denominator depends on the distribution of primes.

Source: Yuri V. Matiyasevich and Richard K. Guy, A new formula for π, American Mathematical Monthly, Vol 93, No. 8 (October 1986), pp. 631-635.
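Here is a quick numerical check of the formula, a sketch of mine rather than anything from the paper. It relies on the fact that Python’s math.log handles arbitrarily large integers, and it uses math.lcm, which requires Python 3.9 or later.

from math import log, lcm, sqrt

def fibonacci(m):
    # Return the first m Fibonacci numbers F_1, ..., F_m.
    fibs = []
    a, b = 1, 1
    for _ in range(m):
        fibs.append(a)
        a, b = b, a + b
    return fibs

for m in [10, 100, 500]:
    F = fibonacci(m)
    product = 1
    for f in F:
        product *= f
    print(m, sqrt(6 * log(product) / log(lcm(*F))))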


Fibonacci numbers, arctangents, and pi

Here’s an unusual formula for π. Let Fₙ be the nth Fibonacci number. Then

\pi = 4 \sum_{n=1}^\infty \arctan\left( \frac{1}{F_{2n+1}} \right)

As mysterious as this equation may seem, it’s not hard to prove. The arctangent identity

\arctan\left(\frac{1}{F_{2n+1}}\right) = \arctan\left(\frac{1}{F_{2n}}\right) - \arctan\left(\frac{1}{F_{2n+2}}\right)

shows that the sum telescopes, leaving only the first term, arctan(1) = π/4. To prove the arctangent identity, take the tangent of both sides, use the addition law for tangents, and use the Fibonacci identity

F_{n+1} F_{n-1} - F_n^2 = (-1)^n
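Here’s a quick numerical check of the arctangent identity and of the series itself, a sketch using nothing beyond the math module.

from math import atan, pi

def fib(n):
    # Return F_n with F_1 = F_2 = 1.
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

# Check the arctangent identity for the first few n.
for n in range(1, 6):
    lhs = atan(1 / fib(2*n + 1))
    rhs = atan(1 / fib(2*n)) - atan(1 / fib(2*n + 2))
    print(n, abs(lhs - rhs))

# Partial sums of the series converge to pi.
total = sum(atan(1 / fib(2*n + 1)) for n in range(1, 40))
print(4*total, pi)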

See this post for an even more remarkable formula relating Fibonacci numbers and π.

Number of digits in n!

The other day I ran across the fact that 23! has 23 digits. That made me wonder how often n! has n digits.

There can only be a finite number of cases, because n! grows faster than 10ⁿ for n > 10, and it’s reasonable to guess that 23 might be the largest case. Turns out it’s not, but it’s close. The only cases where n! has n digits are 1, 22, 23, and 24. Once you’ve found these by brute force, it’s not hard to show that they must be the only ones because of the growth rate of n!.
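Here is the brute force search, a one-line sketch:

from math import factorial

print([n for n in range(1, 100) if len(str(factorial(n))) == n])
# prints [1, 22, 23, 24]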

Is there a convenient way to find the number of digits in n! without having to compute n! itself? Sure. For starters, the number of digits in the base 10 representation of a number x is

⌊ log₁₀ x ⌋ + 1

where ⌊ z ⌋ is the floor of z, the largest integer less than or equal to z. The log of the factorial function is easier to compute than the factorial itself because it won’t overflow. You’re more likely to find a function to compute the log of the gamma function than the log of factorial, and more likely to find software that uses natural logs than logs base 10. So in Python, for example, you could compute the number of digits with this:

from scipy.special import gammaln
from math import log, floor

def digits_in_factorial(n):
    # gammaln(n+1) = log(n!), so dividing by log(10) gives log10(n!)
    return floor( gammaln(n+1)/log(10.0) ) + 1

What about a more elementary formula, one that doesn’t use the gamma function? If you use Stirling’s approximation for the factorial and take the log of that, you should at least get a good approximation. Here it is in Python:

from math import log, floor, pi

def stirling(n):
    # Stirling's approximation to log(n!), converted to base 10
    return floor( ((n+0.5)*log(n) - n + 0.5*log(2*pi))/log(10) ) + 1

The code above is exact for every n > 2 as far as I’ve tested, up to n = 1,000,000. (Note that one million factorial is an extremely large number. It has 5,565,709 digits. And yet we can easily say something about this number, namely how many digits it has!)

The code may break down somewhere because of the error in Stirling’s approximation or the limitations of floating point arithmetic. Stirling’s approximation gets more accurate as n increases, but it’s conceivable that a factorial value could be so close to a power of 10 that the approximation error pushes it from one side of the power of 10 to the other. Maybe that’s not possible, and perhaps someone could prove as much.
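Here is the sort of empirical check described above, a sketch rather than a proof. It assumes the two functions defined earlier are in scope and compares both against the exact digit count for moderate n.

from math import factorial

for n in range(3, 2001):
    exact = len(str(factorial(n)))
    assert digits_in_factorial(n) == exact
    assert stirling(n) == exact
print("no disagreements for 2 < n <= 2000")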

You could extend the code above to optionally take another base besides 10.

def digits_in_factorial(n, b=10):
    return floor( gammaln(n+1)/log(b) ) + 1

def stirling(n, b=10):
    return floor( ((n+0.5)*log(n) - n + 0.5*log(2*pi))/log(b) ) + 1

The code using Stirling’s approximation still works for all n > 2, even for b as small as 2. This is slightly surprising since the number of bits in a number is more detailed information than the number of decimal digits.

Nicholas Higham on Mathematics in Color

This is a reprint of Nick Higham’s post of the same title from the Princeton University Press blog, used with permission.


Color is a fascinating subject. Important early contributions to our understanding of it came from physicists and mathematicians such as Newton, Young, Grassmann, Maxwell, and Helmholtz. Today, the science of color measurement and description is well established and we rely on it in our daily lives, from when we view images on a computer screen to when we order paint, wallpaper, or a car of a specified color.

For practical purposes color space, as perceived by humans, is three-dimensional, because our retinas have three different types of cones, which have peak sensitivities at wavelengths corresponding roughly to red, green, and blue. It’s therefore possible to use linear algebra in three dimensions to analyze various aspects of color.


A good example of the use of linear algebra is to understand metamerism, which is the phenomenon whereby two objects can appear to have the same color but are actually giving off light having different spectral decompositions. This is something we are usually unaware of, but it is welcome in that color output systems (such as televisions and computer monitors) rely on it.

Mathematically, the response of the cones on the retina to light can be modeled as a matrix-vector product Af, where A is a 3-by-n matrix and f is an n-vector that contains samples of the spectral distribution of the light hitting the retina. The parameter n is a discretization parameter that is typically about 80 in practice. Metamerism corresponds to the fact that Af₁ = Af₂ is possible for different vectors f₁ and f₂. This equation is equivalent to saying that Ag = 0 for the nonzero vector g = f₁ − f₂, or, in other words, that a matrix with fewer rows than columns has a nontrivial null space.
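Here is a small numerical sketch of the null space argument, using a random 3-by-n matrix as a stand-in for real cone response data (and ignoring the physical requirement that spectra be nonnegative).

import numpy as np

n = 80                                # number of spectral samples
rng = np.random.default_rng(0)
A = rng.random((3, n))                # stand-in for the cone response matrix

# The rows of Vt beyond the third span the null space of A.
_, _, Vt = np.linalg.svd(A)
g = Vt[-1]

f1 = rng.random(n)                    # one spectral distribution
f2 = f1 + 5*g                         # a different distribution

print(np.allclose(A @ f1, A @ f2))    # True: both spectra produce the same cone response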

Metamerism is not always welcome. If you have ever printed your photographs on an inkjet printer you may have observed that a print that looked fine when viewed indoors under tungsten lighting can have a color cast when viewed in daylight.

LAB Space: Separating Color from Luminosity

In digital imaging the term channel refers to the grayscale image representing the values of the pixels in one of the coordinates, most often R, G, or B (for red, green, and blue) in an RGB image. It is sometimes said that an image has ten channels. The number ten is arrived at by combining coordinates from the representation of an image in three different color spaces. RGB supplies three channels, a space called LAB (pronounced “ell-A-B”) provides another three channels, and the last four channels are from CMYK (cyan, magenta, yellow, black), the color space in which all printing is done.

LAB is a rather esoteric color space that separates luminosity (or lightness, the L coordinate) from color (the A and B coordinates). In recent years photographers have realized that LAB can be very useful for image manipulations, allowing certain things to be done much more easily than in RGB. This usage is an example of a technique used all the time by mathematicians: if we can’t solve a problem in a given form then we transform it into another representation of the problem that we can solve.

As an example of the power of LAB space, consider this image of aeroplanes at Schiphol airport.


Original image.

Suppose that KLM are considering changing their livery from blue to pink. How can the image be edited to illustrate how the new livery would look? “Painting in” the new color over the old using the brush tool in image editing software would be a painstaking task (note the many windows to paint around and the darker blue in the shadow area under the tail). The next image was produced in just a few seconds.


Image converted to LAB space and A channel flipped.

How was it done? The image was converted from RGB to LAB space (which is a nonlinear transformation) and then the coordinates of the A channel were replaced by their negatives. Why did this work? The A channel represents color on a green–magenta axis (and the B channel on a blue–yellow axis). Apart from the blue fuselage, most pixels have a small A component, so reversing the sign of this component doesn’t make much difference to them. But for the blue, which has a negative A component, this flipping of the A channel adds just enough magenta to make the planes pink.
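The same A-channel flip can be sketched in code, here with scikit-image; the original edit was presumably done in an image editor, and the file name below is just a placeholder.

import numpy as np
from skimage import io
from skimage.color import rgb2lab, lab2rgb

rgb = io.imread("planes.jpg") / 255.0   # RGB values scaled to [0, 1]
lab = rgb2lab(rgb)                      # nonlinear transform from RGB to LAB
lab[:, :, 1] *= -1                      # negate the A (green-magenta) channel
flipped = np.clip(lab2rgb(lab), 0, 1)   # back to RGB
io.imsave("planes_pink.jpg", (255*flipped).astype(np.uint8))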

You may recall from earlier this year the infamous photo of a dress that generated a huge amount of interest on the web because some viewers perceived the dress as being blue and black while others saw it as white and gold. A recent paper What Can We Learn from a Dress with Ambiguous Colors? analyzes both the photo and the original dress using LAB coordinates. One reason for using LAB in this context is its device independence, which contrasts with RGB, for which the coordinates have no universally agreed meaning.

Nicholas J. Higham is the Richardson Professor of Applied Mathematics at The University of Manchester, and editor of The Princeton Companion to Applied Mathematics. His article Color Spaces and Digital Imaging in The Princeton Companion to Applied Mathematics gives an introduction to the mathematics of color and the representation and manipulation of digital images. In particular, it emphasizes the role of linear algebra in modeling color and gives more detail on LAB space.


Related posts:

Three algorithms for converting to gray scale

FFT and Big Data (quotes from Princeton Companion to Applied Mathematics)

Generalization of Fibonacci ratios

Each Fibonacci number is the sum of its two predecessors. My previous post looked at generalizing this to the so-called Tribonacci numbers, each being the sum of its three predecessors. One could keep going, defining the Tetrabonacci numbers and in general the n-Fibonacci numbers for any n at least 2.

For the definition to be complete, you have to specify the first n of the n-Fibonacci numbers. However, these starting values hardly matter for our purposes. We want to look at the limiting ratio of consecutive n-Fibonacci numbers, and this doesn’t depend on the initial conditions. (If you were determined, you could find starting values where this isn’t true. It’s enough to pick integer initial values, at least one of which is not zero.)

As shown in the previous post, the ratio is the largest eigenvalue of an n by n matrix with 1’s on the first row and 1’s immediately below the main diagonal. The characteristic polynomial of such a matrix is

λ^n – λ^(n-1) – λ^(n-2) – ⋯ – 1

and so we look for the largest zero of this polynomial. We can sum the terms with negative coefficients as a geometric series and show that the eigenvalues satisfy

λ^n – 1/(2 – λ) = 0.

So the limiting ratio of consecutive n-Fibonacci numbers is the largest root of the above equation. You could verify that when n = 2, we get the golden ratio φ as we should, and when n = 3 we get around 1.8393 as in the previous post.

As n gets large, the limiting ratio approaches 2. You can see this by taking the log of the previous equation.

n = -log(2 – λ)/log(λ).

As n goes to infinity, λ must approach 2 so that the right side of the equation also goes to infinity.
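Here is a sketch that solves the equation above numerically for several values of n. Note that λ = 1 satisfies the transformed equation but not the original characteristic polynomial, so the search is restricted to an interval just inside (1, 2).

from scipy.optimize import brentq

def limiting_ratio(n):
    f = lambda x: x**n - 1/(2 - x)
    return brentq(f, 1.001, 2 - 1e-9)

for n in [2, 3, 4, 5, 10, 20]:
    print(n, limiting_ratio(n))

# n = 2 gives the golden ratio 1.618..., n = 3 gives about 1.8393,
# and the values approach 2 as n increases.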

Reading equations forward and backward

There is no logical difference between writing A = B and writing B = A, but there is a psychological difference.

Equations are typically applied left to right. When you write A = B you imply that it may be useful to replace A with B. This is helpful to keep in mind when learning something new: the order in which an equation is written gives a hint as to how it may be applied. However, this way of thinking can also be a limitation. Clever applications often come from realizing that you can apply an equation in the opposite of the usual direction.

For example, Euler’s reflection formula says

Γ(z) Γ(1-z) = π / sin(πz).

Reading from left to right, this says that two unfamiliar/difficult things, values of the Gamma function, are related to a more familiar/simple thing, the sine function. It would be odd to look at this formula and say “Great! Now I can compute sines if I just know values of the Gamma function.” Instead, the usual reaction would be “Great! Now I can relate the value of Gamma at two different places by using sines.”

When we see Einstein’s equation

E = mc²

the first time, we think about creating energy from matter, such as the mass lost in nuclear fission. This applies the formula from left to right, relating what we want to know, an amount of energy, to what we do know, an amount of mass. But you could also read the equation from right to left, calculating the amount of energy, say in an accelerator, necessary to create a particle of a given mass.

Calculus textbooks typically have a list of equations, either inside the covers or in an appendix, that relate an integral on the left to a function or number on the right. This makes sense because calculus students compute integrals. But mathematicians often apply these equations in the opposite direction, replacing a number or function with an integral. To a calculus student this is madness: why replace a familiar thing with a scary thing? But integrals aren’t scary to mathematicians. Expressing a function as an integral is often progress. Properties of a function may be easier to see in integral form. Also, the integral may lend itself to some computational technique, such as reversing the order of integration in a double integral, or reversing the order of taking a limit and an integral.

Calculus textbooks also have lists of equations involving infinite sums, the summation always being on the left. Calculus students want to replace the scary thing, the infinite sum, with the familiar thing, the expression on the right. Generating functions turn this around, wanting to replace things with infinite sums. Again this would seem crazy to a calculus student, but it’s a powerful problem solving technique.

Differential equation students solve differential equations. They want to replace what they find scary, a differential equation, with something more familiar, a function that satisfies the differential equation. But mathematicians sometimes want to replace a function with a differential equation that it satisfies. This is common, for example, in studying special functions. Classical orthogonal polynomials satisfy 2nd order differential equations, and the differential equation takes a different form for different families of orthogonal polynomials. Why would you want to take something as tangible and familiar as a polynomial, something you might study as a sophomore in high school, and replace it with something as abstract and mysterious as a differential equation, something you might study as a sophomore in college? Because some properties, properties that you would not have cared about in high school, are more clearly seen via the differential equations.

Defining zero factorial

Things are defined the way they are for good reasons. This seems blatantly obvious now, but it was eye-opening when I learned this my first year in college. Our professor, Mike Starbird, asked us to go home and think about how convergence of a series should be defined. Not how it is defined, but how it should be defined. We were not to look up the definition but to think about what it should be. The next day we proposed our definitions. In good Socratic fashion Starbird showed us the flaws of each and led us to arrive at the standard definition.

This exercise gave me confidence that mathematical definitions were created by ordinary mortals like myself. It also began my habit of examining definitions carefully to understand what motivated them.

One question that comes up frequently is why zero factorial equals 1. The pedantic answer is “Because it is defined that way.” This answer alone is not very helpful, but it does lead to the more refined question: Why is 0! defined to be 1?

The answer to the revised question is that many formulas are simpler if we define 0! to be 1. If we defined 0! to be 0, for example, countless formulas would have to include qualifications such as “except when n is zero.”

For example, the binomial coefficients are defined by

C(n, k) = n! / (k! (n − k)!).

The binomial coefficient C(n, k) tells us how many ways one can take a set of n things and select k of them. For example, the number of ways to deal a hand of five cards from a deck of 52 is C(52, 5) = 52! / (5! 47!) = 2,598,960.

How many ways are there to deal a hand of 52 cards from a deck of 52 cards? Obviously one: the deck is the hand. But our formula says the answer is

C(52, 52) = 52! / (52! 0!),

and the formula is only correct if 0! = 1. If 0! were defined to be anything else, we’d have to say “The number of ways to deal a hand of k cards from a deck of n cards is C(n, k), except when k = 0 or k = n, in which case the answer is 1.” (See [1] below for picky details.)

The example above is certainly not the only one where it is convenient to define 0! to be 1. Countless theorems would be more awkward to state if 0! were defined any other way.

Sometimes people appeal to the gamma function for justification that 0! should be defined to be 1. The gamma function extends factorial to real numbers, and the gamma function value associated with 0! is 1. (In detail, n! = Γ(n+1) for positive integers n and Γ(1) = 1.) This is reassuring, but it raises another question: Why should the gamma function be authoritative?

Indeed, there are many ways to extend factorial to non-integer values, and historically many ways were proposed. However, the gamma function won and its competitors have faded into obscurity. So why did it win? Analogous to the discussion above, we could say that the gamma function won because more formulas work out simply with this definition than with others. That is, you can very often replace n! with Γ(n + 1) in a formula true for positive integer values of n and get a new formula valid for real or even complex values of n.

There is another reason why gamma won, and that’s the Bohr–Mollerup theorem. It says that if you’re looking for a function f(x) defined for x > 0 that satisfies f(1) = 1 and f(x+1) = x f(x), then the gamma function is the only log-convex solution. Why should we look for log-convex functions? Because factorial is log-convex, and so this is a natural property to require of its extension.

Update: Occasionally I hear someone say that the gamma function (shifting its argument by 1) is the only analytic function that extends factorial to the complex plane, but this isn’t true. For example, if you add sin(πx) to the gamma function, you get another analytic function that takes on the same values as gamma for positive integer arguments.
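Here is a quick check of that claim: sin(πx) vanishes at the integers, so the perturbed function takes the same values there (up to floating point noise).

from scipy.special import gamma
from math import factorial, sin, pi

for n in range(1, 8):
    x = n + 1
    print(n, factorial(n), gamma(x), gamma(x) + sin(pi*x))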


* * *

[1] Theorems about binomial coefficients have to make some restrictions on the arguments. See these notes for full details. But in the case of dealing cards, the only necessary constraints are the natural ones: we assume the number of cards in the deck and the number we want in a hand are non-negative integers, and that we’re not trying to draw more cards for a hand than there are in a deck. Defining 0! as 1 keeps us from having to make any unnatural qualifications such as “unless you’re dealing the entire deck.”

Mathematical arbitrage

I suspect there’s a huge opportunity in moving mathematics from the pure column to the applied column. There may be a lot of useful math that never sees application because the experts are unconcerned with or unaware of applications.

In particular I wonder what applications there may be of number theory, especially analytic number theory. I’m not thinking of the results of number theory but rather the elegant machinery developed to attack problems in number theory. I expect more of this machinery could be useful to problems outside of number theory.

I also wonder about category theory. The theory certainly finds uses within pure mathematics, but I’m not sure how useful it is in direct application to problems outside of mathematics. Many of the reported applications don’t seem like applications at all, but window dressing applied after-the-fact. On the other hand, there are also instances where categorical thinking led the way to a solution, but did its work behind the scenes; once a solution was in hand, it could be presented more directly without reference to categories. So it’s hard to say whether applications of category theory are over-reported or under-reported.

The mathematical literature can be misleading. When researchers say their work has various applications, they may be blowing smoke. At the same time, there may be real applications that are never mentioned in journals, either because the work is proprietary or because it is not deemed original in the academic sense of the word.

Quaternions in Paradise Lost

Last night I checked a few books out from a library. One was Milton’s Paradise Lost and another was Kuipers’ Quaternions and Rotation Sequences. I didn’t expect any connection between these two books, but there is one.

photo of books mentioned here

The following lines from Book V of Paradise Lost, starting at line 180, are quoted in Kuipers’ book:

Air and ye elements, the eldest birth
Of nature’s womb, that in quaternion run
Perpetual circle, multiform, and mix
And nourish all things, let your ceaseless change
Vary to our great maker still new praise.

When I see quaternion I naturally think of Hamilton’s extension of the complex numbers, discovered in 1843. Paradise Lost, however, was published in 1667.

Milton uses quaternion to refer to the four elements of antiquity: air, earth, water, and fire. The last three are “the eldest birth of nature’s womb” because they are mentioned in Genesis before air is mentioned.


Random walks and the arcsine law

Suppose you stand at 0 and flip a fair coin. If the coin comes up heads, you take a step to the right. Otherwise you take a step to the left. How much of the time will you spend to the right of where you started?

As the number of steps N goes to infinity, the probability that the proportion of your time in positive territory is less than x approaches 2 arcsin(√x)/π. The arcsine term gives this rule its name, the arcsine law.

Here’s a little Python script to illustrate the arcsine law.

import random
from numpy import arcsin, pi, sqrt

def step():
    u = random.random()
    return 1 if u < 0.5 else -1

M = 1000   # outer loop: number of random walks
N = 1000   # inner loop: steps per walk
x = 0.3    # Use any 0 < x < 1 you'd like.

outer_count = 0
for _ in range(M):
    position = 0
    inner_count = 0
    for __ in range(N):
        position += step()
        if position > 0:
            inner_count += 1
    if inner_count/N < x:
        outer_count += 1

print(outer_count/M)
print(2*arcsin(sqrt(x))/pi)

Playing with continued fractions and Khinchin’s constant

Take a real number x and expand it as a continued fraction. Compute the geometric mean of the first n coefficients.

Aleksandr Khinchin proved that for almost all real numbers x, as n → ∞ the geometric means converge. Not only that, they converge to the same constant, known as Khinchin’s constant, 2.685452001…. (“Almost all” here means in the sense of measure theory: the set of real numbers that are exceptions to Khinchin’s theorem has measure zero.)

To get an idea how fast this convergence is, let’s start by looking at the continued fraction expansion of π to 100 decimal places. Its coefficients are

[3, 7, 15, 1, 292, 1, 1, 1, 2, 1, 3, 1, 14, 2, 1, 1, 2, 2, 2, 2, 1, 84, 2, 1, 1, 15, 3]

The geometric mean of these coefficients is 2.84777288486, which matches Khinchin’s constant to only one significant figure.
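For reference, something along these lines computes the expansion in Sage. This is a sketch of a plausible invocation rather than the exact code behind the numbers above; RealField’s precision argument is in bits, and 100 decimal digits is roughly 333 bits.

# Sage sketch (assumed API)
x = RealField(333)(pi)          # pi to roughly 100 decimal digits
print(continued_fraction(x))    # [3; 7, 15, 1, 292, 1, ...]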

Let’s try choosing random numbers and working with more decimal places.

There may be a more direct way to find geometric means in Sage, but here’s a function I wrote. It leaves off any leading zeros that would cause the geometric mean to be zero.

from numpy import exp, mean, log

def geometric_mean(x):
    # Skip any zero coefficients; a leading zero would make the geometric mean zero.
    return exp( mean([log(k) for k in x if k > 0]) )

Now let’s try 10 random numbers, each generated with 1,000 bits of precision (about 300 decimal places).

for _ in range(10):
    r = RealField(1000).random_element(0, 1)
    # quotients() is assumed here to return the list of partial quotients
    print(geometric_mean([int(q) for q in continued_fraction(r).quotients()]))

Three of the ten geometric means agreed with Khinchin’s constant to two significant figures; the rest agreed only to one. Apparently the convergence is very slow.

If we go back to π, this time looking out 10,000 decimal places, the same computation produces 2.67104567579, which differs from Khinchin’s constant by about 0.5%.

Grand unification of mathematics

Greg Egan’s short story Glory features a “xenomathematician” who discovers that an ancient civilization had produced a sort of grand unification of their various branches of mathematics.

It was not a matter of everything in mathematics collapsing in on itself, with one branch turning out to have been merely a recapitulation of another under a different guise. Rather, the principle was that every sufficiently beautiful mathematical system was rich enough to mirror in part — and sometimes in a complex and distorted fashion — every other sufficiently beautiful system. Nothing became sterile and redundant, nothing proved to have been a waste of time, but everything was shown to be magnificently intertwined.

Another reason natural logarithms are natural

In mathematics, log means natural logarithm by default; the burden of explanation is on anyone taking logarithms to a different base. I elaborate on this a little here.

Looking through Andrew Gelman and Jennifer Hill’s regression book, I noticed a justification for natural logarithms I hadn’t thought about before.

We prefer natural logs (that is, logarithms base e) because, as described above, coefficients on the natural-log scale are directly interpretable as approximate proportional differences: with a coefficient of 0.06, a difference of 1 in x corresponds to an approximate 6% difference in y, and so forth.

This is because

exp(x) ≈ 1 + x

for small values of x based on a Taylor series expansion. So in Gelman and Hill’s example, a difference of 0.06 on a natural log scale corresponds to roughly multiplying by 1.06 on the original scale, i.e. a 6% increase.

The corresponding Taylor series expansion for powers of 10 is not so tidy:

10^x ≈ 1 + 2.302585 x

where 2.302585 is the natural log of 10. This means that a change of 0.01 on a log₁₀ scale corresponds to an increase of about 2.3% on the original scale.
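A quick check of these approximations:

from math import exp, log

print(exp(0.06))    # 1.0618..., roughly a 6% increase
print(10**0.01)     # 1.0233..., roughly a 2.3% increase
print(log(10))      # 2.302585...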

Related post: Approximation relating lg, ln, and log10