Automated theorem proving

When I first heard of automated theorem proving, I imagined computers being programmed to search for mathematical theorems interesting to a wide audience. Maybe that’s what a few of the pioneers in the area had in mind too, but that’s not how things developed.

The biggest uses for automated theorem proving have been highly specialized applications, not mathematically interesting theorems. Computer chip manufacturers use formal methods to verify that given certain inputs their chips produce certain outputs. Compiler writers use formal methods to verify that their software does the right thing. A theorem saying your product behaves correctly is very valuable to you and your customers, but nobody else. These aren’t the kinds of theorems that anyone would cite the way they might site the Pythagorean theorem. Nobody would ever say “And therefore, by the theorem showing that this particular pacemaker will not fall into certain error modes, I now prove this result unrelated to pacemakers.”

Automated theorem provers are important in these highly specialized applications in part because the results are of such limited interest. For every theorem of wide mathematical interest, there are a large number of mathematicians who are searching for a proof or who are willing to scrutinize a proposed proof. A theorem saying that a piece of electronics performs correctly appeals to only the tiniest audience, and yet is probably much easier (for a computer) to prove.

The term “automated theorem proving” is overloaded to mean a couple things. It’s used broadly to include any use of computing in proving theorems, and it’s used more narrowly to mean software that searches for proofs or even new theorems. Most theorem provers in the broad sense are not automated theorem provers in the more narrow sense but rather proof assistants. They verify proofs rather than discover them. (There’s some gray zone. They may search on a small scale, looking for a way to prove a minor narrow result, but not search for the entire proof to a big theorem.) There have been computer-verified proofs of important mathematical theorems, such as the Feit-Thompson theorem from group theory, but I’m not aware of any generally interesting discoveries that have come out of a theorem prover.

Related post: Formal methods let you explore the corners

Beta reduction: The difference typing makes

Beta reduction is essentially function application. If you have a function described by what it does to x and apply it to an argument t, you rewrite the xs as ts. The formal definition of β-reduction is more complicated than this in order to account for free versus bound variables, but this informal description is sufficient for this blog post. We will first show that β-reduction holds some surprises, then explain how these surprises go away when you add typing.

Suppose you have an expression (λx.x + 2)y, which means apply to y the function that takes its argument and adds 2. Then β-reduction rewrites this expression as y + 2. In a more complicated expression, you might be able to apply β-reduction several times. When you do apply β-reduction several times, does the process always stop? And if you apply β-reduction to parts of an expression in a different order, can you get different results?

Failure to normalize

You might reasonably expect that if you apply β-reduction enough times you eventually get an expression you can’t reduce any further. Au contraire!

Consider the expression  (λx.xx) (λx.xx).  Beta reduction says to replace each of the red xs with the expression in blue. But when we do that, we get the original expression (λx.xx) (λx.xx) back. Beta reduction gets stuck in an infinite loop.

Next consider the expression L = (λx.xxy) (λx.xxy). Applying β-reduction the first time gives  (λx.xxy) (λx.xxyy or Ly. Applying β-reduction again yields Lyy. Beta “reduction” doesn’t reduce the expression at all but makes it bigger.

The technical term for what we’ve seen is that β-reduction is not normalizing. A rewrite system is strongly normalizing if applying the rules in any order eventually terminates. It’s weakly normalizing if there’s at least some sequence of applying the rules that terminates. Beta reduction is neither strongly nor weakly normalizing in the context of (untyped) lambda calculus.

Types to the rescue

In simply typed lambda calculus, we assign types to every variable, and functions have to take the right type of argument. This additional structure prevents examples such as those above that fail to normalize. If x is a function that takes an argument of type A and returns an argument of type B then you can’t apply x to itself. This is because x takes something of type A, not something of type function from A to B. You can prove that not only does this rule out specific examples like those above, it rules out all possible examples that would prevent β-reduction from terminating.

To summarize, β-reduction is not normalizing, not even weakly, in the context of untyped lambda calculus, but it is strongly normalizing in the context of simply typed lambda calculus.


Although β-reduction is not normalizing for untyped lambda calculus, the Church-Rosser theorem says it is confluent. That is, if an expression P can be transformed by β-reduction two different ways into expressions M and N, then there is an expression T such that both M and N can be reduced to T. This implies that if β-reduction does terminate, then it terminates in a unique expression (up to α-conversion, i.e. renaming bound variables). Simply typed lambda calculus is confluent as well, and so in that context we can say β-reduction always terminates in a unique expression (again up to α-conversion).

Proofs and programs

Here’s an interesting quote comparing writing proofs and writing programs:

Building proofs and programs are very similar activities, but there is one important difference: when looking for a proof it is often enough to find one, however complex it is. On the other hand, not all programs satisfying a specification are alike: even if the eventual result is the same, efficient programs must be preferred. The idea that the details of proofs do not matter—usually called proof irrelevance—clearly justifies letting the computer search for proofs …

The quote is saying “Any proof will do, but among programs that do the same job, more efficient programs are better.” And this is certainly true, depending on what you want from a proof.

Proofs serve two main purposes: to establish that a proposition is true, and to show why it is true. If you’re only interested in the existence of a proof, then yes, any proof will do. But proofs have a sort of pedagogical efficiency just as programs have an execution efficiency. Given two proofs of the same proposition, the more enlightening proof is better by that criteria.

I assume the authors of the above quote would not disagree because they say it is often enough to find one, implying that for some purposes simpler proofs are better. And existence does come first: a complex proof that exists is better than an elegant proof that does not exist! Once you have a proof you may want to look for a simpler proof. Or not, depending on your purposes. Maybe you have an elegant proof that you’re convinced must be essentially true, even if you doubt some details. In that case, you might want to complement it with a machine-verifiable proof, no matter how complex.

Source: Interactive Theorem Proving and Program Development
Coq’Art: The Calculus of Inductive Constructions

Primitive recursive functions and enumerable sets

The set of primitive recursive (PR) functions is the smallest set of functions of several integer arguments satisfying five axioms:

  1. Constant functions are PR.
  2. The function that picks the ith element of a list of n arguments is PR.
  3. The successor function S(n) = n+1 is PR.
  4. PR functions are closed under composition.
  5. PR functions are closed under primitive recursion.

The last axiom obviously gives PR functions their name. So what is primitive recursion? Given a PR function  that takes k arguments, and another PR function g that takes k+2 arguments, the primitive recursion of f and g is a function h of k+1 arguments satisfying two properties:

  1. h(0, x1, …, xk) = f(x1, …, xk)
  2. h(S(y), x1, …, xk) = g(yh(yx1, … xk), x1, …, xk)

Not every computable function is primitive recursive. For example, Ackermann’s function is a general recursive function, but not a primitive recursive function. General recursive functions are Turing complete. Turing machines, lambda calculus, and general recursive functions are all equivalent models of computation, but the first two are better known.

For this post, the main thing about general recursive functions is that, as the name implies, they are more general than primitive recursive functions.

Now we switch from functions to sets. The characteristic function of a set A is the function that is 1 for elements of A and zero everywhere else. In other areas of math, there is a sort of duality between functions and sets via characteristic functions. For example, the indicator function of a measurable set is a measurable function. And the indicator function of a convex set is a convex function. But in recursive functions, there’s an unexpected wrinkle in this analogy.

A set of integers is recursively enumerable if it is either empty or the image of a general recursive function. But there’s a theorem, due to Alonzo Church, that a non-empty recursively enumerable set is actually the image of a primitive recursive function. So although general recursive functions are more general, you can’t tell that from looking at function images. For example, although the Ackermann function is not primitive recursive, there is a primitive recursive function with the same image.

Intuitionistic logic in Gilbert and Sullivan

In a recent preprint, Philip Wadler introduces intuitionistic logic using the comic opera The Gondoliers.

In Gilbert and Sullivan’s The Gondoliers, Casilda is told that as an infant she was married to the heir of the King of Batavia, but that due to a mix-up no one knows which of two individuals, Marco or Giuseppe, is the heir. Alarmed, she wails “Then do you mean to say that I am married to one of two gondoliers, but it is impossible to say which?” To which the response is “Without any doubt of any kind whatever.”

… Intuitionists … insist that a proof of AB must show which of A or B holds, and hence they would reject the claim that Casilda is married to Marco or Giuseppe until one of the two was identified as her husband. Perhaps Gilbert and Sullivan anticipated intuitionism, for their story’s outcome is that the heir turns out to be a third individual, Luiz, with whom Casilda is, conveniently, already in love.

Proofs of false statements

Mark Dominus brought up an interesting question last month: have there been major screw-ups in mathematics? He defines a “major screw-up” to be a flawed proof of an incorrect statement that was accepted for a significant period of time. He excludes the case of incorrect proofs of statements that were nevertheless true.

It’s remarkable that he can even ask the question. Can you imagine someone asking with a straight face whether there have ever been major screw-ups in, say, software development? And yet it takes some hard thought to come up with examples of really big blunders in math.

No doubt there are plenty of flawed proofs of false statements in areas too obscure for anyone to care about. But in mainstream areas of math, blunders are usually uncovered very quickly. And there are examples of theorems that were essentially correct but neglected some edge case. Proofs of statements that are just plain wrong are hard to think of. But Mark Dominus came up with a few.

Yesterday he gave an example of a statement by Kurt Gödel that was flat-out wrong but accepted for over 30 years. Warning: reader discretion advised. His post is not suitable for those who get queasy at the sight of symbolic logic.