# More readable lambda calculus

Lambda calculus is simple. The definitions and rules of lambda calculus would easily fit on an index card.

But you can’t really “read” lambda calculus so much as you can mentally execute it. Not many people can look at more than a few characters of lambda calculus and have any idea what it represents, though it’s not hard to mechanically manipulate it. I suppose in hindsight that’s what you’d expect from a theoretical model for computing.

The post The Programmer’s Ring by Loup Vaillant gives a notation for lambda calculus that I hadn’t seen before, notation that in my opinion is easier to understand. The author combines two ideas: de Bruijn indices (which I’d seen before), and replacing lambdas with brackets (which I had not seen).

The idea of de Bruijn indices is to replace variables with numbers. Vaillant describes De Bruijn indices saying

… variables are not referenced by name, but by stack offset: the last variable is accessed with the number 0, the second last with the number 1, and so on …

This would be a terrible idea in an ordinary programming language, but in lambda calculus it actually helps. Lambda calculus is so low-level that the variables aren’t meaningful anyway. It’s not like you can look at a few lines of lambda calculus and say “Oh, this is calculating net present value, so this variable must be the interest rate. It would be easier to read if they just called it `interest_rate` instead of 42 because it’s the 42nd variable.”

Wikipedia describes de Bruijn indices differently than Vaillant:

Each De Bruijn index is a natural number that represents an occurrence of a variable in a λ-term, and denotes the number of binders that are in scope between that occurrence and its corresponding binder.

It’s not immediately obvious that these two definitions of a de Bruijn index are the same, and in fact they’re not. But the only difference is that the first definition numbers from 0 and the second numbers from 1. This post will count from 0 with Vaillant.

After rewriting lambda calculus expressions to use de Bruijn indices, Vaillant fully parenthesizes the expressions (using braces, saving parentheses for grouping, as Mathematica does) then deletes the λs: every bracketed expression starts with a λ, so the λ itself is redundant. Also, you can delete the name of the variable that the λ takes since the variable numbering takes care of that.

OK, so what does all this buy us? As Vaillant points out, this notation is more concise, and it makes some patterns easier to recognize. For example, the famous Y combinator

`    Y = λf. (λx. f (x x)) λx. f (x x)`

becomes

`    Y = [ [1 (0 0)] [1 (0 0)] ]`

The latter makes the repetition inside more apparent.

As another example, let’s look at Church encoding for numerals:

```
0 → λf. λx. x
1 → λf. λx. f x
2 → λf. λx. f (f x)
3 → λf. λx. f (f (f x))
…
```

Here’s what Church numerals would look like in the notation described above.

```
0 → [[ 0 ]]
1 → [[ 1 0 ]]
2 → [[ 1 (1 0) ]]
3 → [[ 1 (1 (1 0)) ]]
…
```

You can convert a Church numeral to its corresponding integer by adding all the numbers in the lambda calculus expression.

Vaillant’s notation is more formal than traditional lambda calculus notation, but lambda calculus is formal anyway, so you might as well carry the formality a step further if it makes things easier.

# NP vs small P

The P vs NP conjecture has always seemed a little odd to me. Or rather the interpretation of the conjecture that any problem in P is tractable. How reassuring is to know a problem can be solved in time less than some polynomial function of the size of the input if that polynomial has extremely high degree?

But this evening I ran across a fascinating comment by Avi Wigderson [1] that puts the P vs NP conjecture in a different light:

Very few known polynomial time algorithms for natural problems have exponents above 3 or 4 (even though at discovery the initial exponent may have been 30 or 40).

Problems in P may be more tractable in practice than in (current) theory. Wigderson’s comment suggests that if you can find any polynomial time algorithm, your chances are improved that you can find a small-order polynomial time algorithm. It seems there’s something deep going on here that would be hard to formalize.

## Related posts

[1] From his new book Mathematics and Computation

# Does computer science help you program?

The relationship between programming and computer science is hard to describe. Purists will say that computer science has nothing to do with programming, but that goes too far.

Computer science is about more than programming, but it’s is all motivated by getting computers to do things. With few exceptions. students major in computer science in college with the intention of becoming programmers.

I asked on Twitter yesterday how helpful people found computer science in writing software.

In a follow up tweet I said “For this poll, basic CS would be data structures and analysis of algorithms. Advanced CS is anything after that.”

So about a quarter didn’t find computer science useful, but the rest either expected it to be useful or at least found the basic theory useful.

I suspect some of those who said they haven’t found (advanced) CS theory useful don’t know (advanced) CS theory. This isn’t a knock on them. It’s only the observation that you can’t use what you aren’t at least aware of. In fact, you may need to know something quite well before you can recognize an opportunity to use it. (More on that here.)

Many programmers are in jobs where they don’t have much need for computer science theory. I thought about making that a possible choice, something like “No, but I wish I were in a job that could use more theory.” Unfortunately Twitter survey responses have to be fairly short.

Of course this isn’t a scientific survey. (Even supposedly scientific surveys aren’t that great.) People who follow the CompSciFact twitter account have an interest in computer science. Maybe people who had strong feelings about CS, such as resentment for having to study something they didn’t want to or excitement for being able to use something they find interesting, were more likely to answer the question.

# Beta reduction: The difference typing makes

Beta reduction is essentially function application. If you have a function described by what it does to x and apply it to an argument t, you rewrite the xs as ts. The formal definition of β-reduction is more complicated than this in order to account for free versus bound variables, but this informal description is sufficient for this blog post. We will first show that β-reduction holds some surprises, then explain how these surprises go away when you add typing.

Suppose you have an expression (λx.x + 2)y, which means apply to y the function that takes its argument and adds 2. Then β-reduction rewrites this expression as y + 2. In a more complicated expression, you might be able to apply β-reduction several times. When you do apply β-reduction several times, does the process always stop? And if you apply β-reduction to parts of an expression in a different order, can you get different results?

## Failure to normalize

You might reasonably expect that if you apply β-reduction enough times you eventually get an expression you can’t reduce any further. Au contraire!

Consider the expression  (λx.xx) (λx.xx).  Beta reduction says to replace each of the red xs with the expression in blue. But when we do that, we get the original expression (λx.xx) (λx.xx) back. Beta reduction gets stuck in an infinite loop.

Next consider the expression L = (λx.xxy) (λx.xxy). Applying β-reduction the first time gives  (λx.xxy) (λx.xxyy or Ly. Applying β-reduction again yields Lyy. Beta “reduction” doesn’t reduce the expression at all but makes it bigger.

The technical term for what we’ve seen is that β-reduction is not normalizing. A rewrite system is strongly normalizing if applying the rules in any order eventually terminates. It’s weakly normalizing if there’s at least some sequence of applying the rules that terminates. Beta reduction is neither strongly nor weakly normalizing in the context of (untyped) lambda calculus.

## Types to the rescue

In simply typed lambda calculus, we assign types to every variable, and functions have to take the right type of argument. This additional structure prevents examples such as those above that fail to normalize. If x is a function that takes an argument of type A and returns an argument of type B then you can’t apply x to itself. This is because x takes something of type A, not something of type function from A to B. You can prove that not only does this rule out specific examples like those above, it rules out all possible examples that would prevent β-reduction from terminating.

To summarize, β-reduction is not normalizing, not even weakly, in the context of untyped lambda calculus, but it is strongly normalizing in the context of simply typed lambda calculus.

## Confluence

Although β-reduction is not normalizing for untyped lambda calculus, the Church-Rosser theorem says it is confluent. That is, if an expression P can be transformed by β-reduction two different ways into expressions M and N, then there is an expression T such that both M and N can be reduced to T. This implies that if β-reduction does terminate, then it terminates in a unique expression (up to α-conversion, i.e. renaming bound variables). Simply typed lambda calculus is confluent as well, and so in that context we can say β-reduction always terminates in a unique expression (again up to α-conversion).

# What is calculus?

When people ask me what calculus is, my usual answer is “the mathematics of change,” studying things that change continually. Algebra is essentially static, studying things frozen in time and space. Calculus studies things that move, shapes that bend, etc. Algebra deals with things that are exact and consequently can be fragile. Calculus deals with approximation and is consequently more robust.

I’m happier with the paragraph above if you replace “calculus” with “analysis.” Analysis certainly seeks to understand and model things that change continually, but calculus per se is the mechanism of analysis.

I used to think it oddly formal for people to say “differential and integral calculus.” Is there any other kind? Well yes, yes there is, though I didn’t know that at the time. A calculus is a system of rules for computing things. Differential and integral calculus is a system of rules for calculating derivatives and integrals. Lately I’ve thought about other calculi more than differential calculus: propositional calculus, lambda calculus, calculus of inductive constructions, etc.

In my first career I taught (differential and integral) calculus and was frustrated with students who would learn how to calculate derivatives but never understood what a derivative was or what it was good for. In some sense though, they got to the core of what a calculus is. It would be better if they knew what they were calculating and how to apply it, but they still learn something valuable by knowing how to carry out the manipulations. A computer science major, for example, who gets through (differential) calculus knowing how to calculate derivatives without knowing what they are is in a good position to understand lambda calculus later.

# Primitive recursive functions and enumerable sets

The set of primitive recursive (PR) functions is the smallest set of functions of several integer arguments satisfying five axioms:

1. Constant functions are PR.
2. The function that picks the ith element of a list of n arguments is PR.
3. The successor function S(n) = n+1 is PR.
4. PR functions are closed under composition.
5. PR functions are closed under primitive recursion.

The last axiom obviously gives PR functions their name. So what is primitive recursion? Given a PR function  that takes k arguments, and another PR function g that takes k+2 arguments, the primitive recursion of f and g is a function h of k+1 arguments satisfying two properties:

1. h(0, x1, …, xk) = f(x1, …, xk)
2. h(S(y), x1, …, xk) = g(yh(yx1, … xk), x1, …, xk)

Not every computable function is primitive recursive. For example, Ackermann’s function is a general recursive function, but not a primitive recursive function. General recursive functions are Turing complete. Turing machines, lambda calculus, and general recursive functions are all equivalent models of computation, but the first two are better known.

For this post, the main thing about general recursive functions is that, as the name implies, they are more general than primitive recursive functions.

Now we switch from functions to sets. The characteristic function of a set A is the function that is 1 for elements of A and zero everywhere else. In other areas of math, there is a sort of duality between functions and sets via characteristic functions. For example, the indicator function of a measurable set is a measurable function. And the indicator function of a convex set is a convex function. But in recursive functions, there’s an unexpected wrinkle in this analogy.

A set of integers is recursively enumerable if it is either empty or the image of a general recursive function. But there’s a theorem, due to Alonzo Church, that a non-empty recursively enumerable set is actually the image of a primitive recursive function. So although general recursive functions are more general, you can’t tell that from looking at function images. For example, although the Ackermann function is not primitive recursive, there is a primitive recursive function with the same image.

# Where combinator names come from

Today I found out where the one-letter names of some functions in combinatory logic come from. I’d seen these before (for example, in To Mock a Mockingbird) but I had no idea what inspired the names.

These functions — I, K, S, T, and Z — are known as the Schönfinkel combinators, and their names are somewhat mnemonic in German. (Only somewhat. Don’t get your hopes up.)

 Definition Name Name origin λx. x I Identitätsfunktion (identity function) λx,y. x K Konstanzfunktion (constant function) λx,y,z. xz(yz) S Verschmelzungsfunktion (amalgamation function) λx,y,z. xzy T Vertauschungsfunktion (exchange function) λx,y,z. x(yz) Z Zusammensetzungsfunktion (composition function)

Source: Practical Foundations of Mathematics, footnote on page 89. Available online here.

# Seven John McCarthy papers in seven weeks

I recently ran across a series of articles from Carin Meier going through seven papers by the late computer science pioneer John McCarthy in seven weeks. Published so far:

Carin has announced that the next paper will be “First Order Theories of Individual Concepts and Propositions” but she hasn’t posted a commentary on it yet.

# How to solve supposedly intractable problems

Contrary to semi-educated belief, NP-complete problems are not necessarily intractable. Often such problems are intractable, but not always. If a problem is NP-complete, this means that there is no polynomial-time algorithm for solving the problem

1. for the worst case,
2. as the problem size grows,
3. finding exact solutions,
4. with certainty,
5. as far as anyone knows.

One way out of this (#5) is to show how to solve NP-complete problems efficiently. That may not be possible, and it hasn’t been possible so far, so we’ll not worry about this one. But that still leaves four options:

1. special cases
2. small problem sizes
3. approximate solutions
4. probabilistic solutions.

For example, the Traveling Salesman problem is NP-complete. But it’s easy to find solutions for a small number of cities by brute force. And even for a moderate number of cities, people routinely find solutions; there’s even an iPhone app for finding solutions. For larger problems, there are algorithms that yield near-optimal solutions with high probability.

Similar remarks hold for other kinds of problems deemed intractable. For example, some say you can’t find the roots of a 5th degree polynomial. Isn’t that a theorem? No. There’s a theorem that says you cannot find

1. exact solutions,
2. in general,
3. using a finite sequence of certain mathematical operations.

So there are three loop holes here:

1. approximate solutions,
2. special cases, and
3. more general operations.

If you only want to find solutions to 100 decimal places, for example, that’s doable.

Most integrals cannot be computed

1. by a typical calculus student,
2. exactly,
3. with certainty,
4. in terms of elementary functions.

But often integrals can be computed by some combination of

1. more sophisticated exact techniques,
2. numerical approximation,
3. randomized algorithms, and
4. a broader class of functions.

For example, some say that the integral of exp(-x2) cannot be computed. Of course it can, though there is not a finite combination of elementary functions that gives the exact integral. You can compute the integral exactly in terms of the error function erf(x), or compute it numerically to any desired accuracy. You could even approximate the integral with a Monte Carlo method, though there’s no point in using that approach here. For many high-dimensional integration problems, some sort of Monte Carlo is the only practical approach.

Maybe you need to solve a problem for which none of the loop holes given here apply. That’s often the case. I’m not saying that there’s always a way out, only that sometimes there is. Sometimes problems thought to be intractable are not.

# Collatz 3n + 1 conjecture possibly solved

Gerhard Opfer has posted a paper that claims to resolve the famous Collatz conjecture.

Start with a positive number n and repeatedly apply these simple rules:

1. If n = 1, stop.
2. If n is even, divide n by 2.
3. If n is odd, multiply n by 3 and add 1.

In 1937, Lothar Collatz asked whether this procedure always stops for every positive starting value of n. If Gerhard Opfer is correct, we can finally say that indeed it always stops.

Update: It appears there’s a flaw in the proof. See discussion here. Perhaps the gap can be filled in, or perhaps an idea in the paper can be of use somewhere else.

Update (September 10. 2019): As of this date, the full Collatz conjecture remains unsolved, but Terence Tao has just posted a paper chipping away at the problem.

Related post: Easy to guess, hard to prove