Does computer science help you program?

The relationship between programming and computer science is hard to describe. Purists will say that computer science has nothing to do with programming, but that goes too far.

Computer science is about more than programming, but it is all motivated by getting computers to do things. With few exceptions, students major in computer science in college with the intention of becoming programmers.

I asked on Twitter yesterday how helpful people found computer science in writing software.

In a follow-up tweet I said “For this poll, basic CS would be data structures and analysis of algorithms. Advanced CS is anything after that.”

So about a quarter didn’t find computer science useful, but the rest either expected it to be useful or at least found the basic theory useful.

I suspect some of those who said they haven’t found (advanced) CS theory useful don’t know (advanced) CS theory. This isn’t a knock on them. It’s only the observation that you can’t use what you aren’t at least aware of. In fact, you may need to know something quite well before you can recognize an opportunity to use it. (More on that here.)

Many programmers are in jobs where they don’t have much need for computer science theory. I thought about making that a possible choice, something like “No, but I wish I were in a job that could use more theory.” Unfortunately Twitter survey responses have to be fairly short.

Of course this isn’t a scientific survey. (Even supposedly scientific surveys aren’t that great.) People who follow the CompSciFact twitter account have an interest in computer science. Maybe people who had strong feelings about CS, such as resentment for having to study something they didn’t want to or excitement for being able to use something they find interesting, were more likely to answer the question.

 

Beta reduction: The difference typing makes

Beta reduction is essentially function application. If you have a function described by what it does to x and apply it to an argument t, you rewrite the xs as ts. The formal definition of β-reduction is more complicated than this in order to account for free versus bound variables, but this informal description is sufficient for this blog post. We will first show that β-reduction holds some surprises, then explain how these surprises go away when you add typing.

Suppose you have an expression (λx.x + 2)y, which means apply to y the function that takes its argument and adds 2. Then β-reduction rewrites this expression as y + 2. In a more complicated expression, you might be able to apply β-reduction several times. When you do apply β-reduction several times, does the process always stop? And if you apply β-reduction to parts of an expression in a different order, can you get different results?

Failure to normalize

You might reasonably expect that if you apply β-reduction enough times you eventually get an expression you can’t reduce any further. Au contraire!

Consider the expression (λx.xx) (λx.xx). Beta reduction says to replace each x in the body of the first function with the argument, which is again (λx.xx). But when we do that, we get the original expression (λx.xx) (λx.xx) back. Beta reduction gets stuck in an infinite loop.

Next consider the expression L = (λx.xxy) (λx.xxy). Applying β-reduction the first time gives (λx.xxy) (λx.xxy) y, i.e. Ly. Applying β-reduction again yields Lyy. Beta “reduction” doesn’t reduce the expression at all but makes it bigger.

The technical term for what we’ve seen is that β-reduction is not normalizing. A rewrite system is strongly normalizing if applying the rules in any order eventually terminates. It’s weakly normalizing if there’s at least some sequence of applying the rules that terminates. Beta reduction is neither strongly nor weakly normalizing in the context of (untyped) lambda calculus.
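The looping and growing examples above can be checked mechanically. Here is a minimal sketch of β-reduction in Python, using naive substitution — adequate for closed examples like these, though a full implementation would need to rename variables to avoid capture:

```python
# Minimal untyped lambda calculus: ('var', name), ('lam', name, body), ('app', f, a).
def subst(term, name, value):
    """Substitute `value` for occurrences of `name` in `term` (naive: no
    renaming to avoid variable capture, which these closed examples don't need)."""
    kind = term[0]
    if kind == 'var':
        return value if term[1] == name else term
    if kind == 'lam':
        if term[1] == name:              # binder shadows `name`; stop here
            return term
        return ('lam', term[1], subst(term[2], name, value))
    return ('app', subst(term[1], name, value), subst(term[2], name, value))

def beta_step(term):
    """One leftmost-outermost beta reduction step, or None if none applies."""
    if term[0] == 'app' and term[1][0] == 'lam':
        _, var, body = term[1]
        return subst(body, var, term[2])
    if term[0] == 'app':
        for i in (1, 2):
            reduced = beta_step(term[i])
            if reduced is not None:
                parts = list(term)
                parts[i] = reduced
                return tuple(parts)
        return None
    if term[0] == 'lam':
        body = beta_step(term[2])
        return ('lam', term[1], body) if body is not None else None
    return None

omega = ('lam', 'x', ('app', ('var', 'x'), ('var', 'x')))
Omega = ('app', omega, omega)
assert beta_step(Omega) == Omega     # Omega reduces to itself: an infinite loop

lam_xxy = ('lam', 'x', ('app', ('app', ('var', 'x'), ('var', 'x')), ('var', 'y')))
L = ('app', lam_xxy, lam_xxy)
assert beta_step(L) == ('app', L, ('var', 'y'))   # L reduces to L y: it grows
```

The representation and function names here are my own; the point is only that one reduction step maps Ω back to Ω and maps L to Ly.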

Types to the rescue

In simply typed lambda calculus, we assign types to every variable, and functions have to take the right type of argument. This additional structure prevents examples such as those above that fail to normalize. If x is a function that takes an argument of type A and returns a value of type B, then you can’t apply x to itself. This is because x takes something of type A, not something of type function from A to B. You can prove that not only does this rule out specific examples like those above, it rules out all possible examples that would prevent β-reduction from terminating.

To summarize, β-reduction is not normalizing, not even weakly, in the context of untyped lambda calculus, but it is strongly normalizing in the context of simply typed lambda calculus.

Confluence

Although β-reduction is not normalizing for untyped lambda calculus, the Church-Rosser theorem says it is confluent. That is, if an expression P can be transformed by β-reduction two different ways into expressions M and N, then there is an expression T such that both M and N can be reduced to T. This implies that if β-reduction does terminate, then it terminates in a unique expression (up to α-conversion, i.e. renaming bound variables). Simply typed lambda calculus is confluent as well, and so in that context we can say β-reduction always terminates in a unique expression (again up to α-conversion).

What is calculus?

When people ask me what calculus is, my usual answer is “the mathematics of change,” studying things that change continually. Algebra is essentially static, studying things frozen in time and space. Calculus studies things that move, shapes that bend, etc. Algebra deals with things that are exact and consequently can be fragile. Calculus deals with approximation and is consequently more robust.

I’m happier with the paragraph above if you replace “calculus” with “analysis.” Analysis certainly seeks to understand and model things that change continually, but calculus per se is the mechanism of analysis.

I used to think it oddly formal for people to say “differential and integral calculus.” Is there any other kind? Well yes, yes there is, though I didn’t know that at the time. A calculus is a system of rules for computing things. Differential and integral calculus is a system of rules for calculating derivatives and integrals. Lately I’ve thought about other calculi more than differential calculus: propositional calculus, lambda calculus, calculus of inductive constructions, etc.

In my first career I taught (differential and integral) calculus and was frustrated with students who would learn how to calculate derivatives but never understood what a derivative was or what it was good for. In some sense though, they got to the core of what a calculus is. It would be better if they knew what they were calculating and how to apply it, but they still learn something valuable by knowing how to carry out the manipulations. A computer science major, for example, who gets through (differential) calculus knowing how to calculate derivatives without knowing what they are is in a good position to understand lambda calculus later.

Primitive recursive functions and enumerable sets

The set of primitive recursive (PR) functions is the smallest set of functions of several integer arguments satisfying five axioms:

  1. Constant functions are PR.
  2. The function that picks the ith element of a list of n arguments is PR.
  3. The successor function S(n) = n+1 is PR.
  4. PR functions are closed under composition.
  5. PR functions are closed under primitive recursion.

The last axiom obviously gives PR functions their name. So what is primitive recursion? Given a PR function f that takes k arguments, and another PR function g that takes k+2 arguments, the primitive recursion of f and g is the function h of k+1 arguments satisfying two properties:

  1. h(0, x1, …, xk) = f(x1, …, xk)
  2. h(S(y), x1, …, xk) = g(y, h(y, x1, …, xk), x1, …, xk)
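The primitive recursion scheme translates directly into code. Here's a sketch in Python (the helper name primitive_recursion is my own):

```python
def primitive_recursion(f, g):
    """Build h from f (k arguments) and g (k+2 arguments) per the
    two defining equations of primitive recursion."""
    def h(y, *xs):
        if y == 0:
            return f(*xs)                    # h(0, xs) = f(xs)
        return g(y - 1, h(y - 1, *xs), *xs)  # h(S(y), xs) = g(y, h(y, xs), xs)
    return h

# Example: addition defined by primitive recursion from identity and successor:
# add(0, x) = x;  add(y+1, x) = S(add(y, x))
add = primitive_recursion(lambda x: x, lambda y, r, x: r + 1)
assert add(3, 4) == 7
```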

Not every computable function is primitive recursive. For example, Ackermann’s function is a general recursive function, but not a primitive recursive function. General recursive functions are Turing complete. Turing machines, lambda calculus, and general recursive functions are all equivalent models of computation, but the first two are better known.

For this post, the main thing about general recursive functions is that, as the name implies, they are more general than primitive recursive functions.
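For instance, Ackermann's function has a short doubly recursive definition; the nested recursion on the first argument is what puts it beyond primitive recursion, even though it is easy to compute for small arguments:

```python
def ackermann(m, n):
    """Ackermann's function: computable and general recursive,
    but provably not primitive recursive."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

assert ackermann(2, 3) == 9    # ackermann(2, n) = 2n + 3
assert ackermann(3, 3) == 61   # ackermann(3, n) = 2^(n+3) - 3
```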

Now we switch from functions to sets. The characteristic function of a set A is the function that is 1 for elements of A and zero everywhere else. In other areas of math, there is a sort of duality between functions and sets via characteristic functions. For example, the indicator function of a measurable set is a measurable function. And the indicator function of a convex set is a convex function. But in recursive functions, there’s an unexpected wrinkle in this analogy.

A set of integers is recursively enumerable if it is either empty or the image of a general recursive function. But there’s a theorem, due to Alonzo Church, that a non-empty recursively enumerable set is actually the image of a primitive recursive function. So although general recursive functions are more general, you can’t tell that from looking at function images. For example, although the Ackermann function is not primitive recursive, there is a primitive recursive function with the same image.

Where combinator names come from

Today I found out where the one-letter names of some functions in combinatory logic come from. I’d seen these before (for example, in To Mock a Mockingbird) but I had no idea what inspired the names.

These functions — I, K, S, T, and Z — are known as the Schönfinkel combinators, and their names are somewhat mnemonic in German. (Only somewhat. Don’t get your hopes up.)

Definition        Name  Name origin
λx. x             I     Identitätsfunktion (identity function)
λx,y. x           K     Konstanzfunktion (constant function)
λx,y,z. xz(yz)    S     Verschmelzungsfunktion (amalgamation function)
λx,y,z. xzy       T     Vertauschungsfunktion (exchange function)
λx,y,z. x(yz)     Z     Zusammensetzungsfunktion (composition function)
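In any language with first-class functions, these combinators can be written down directly. A sketch in Python, currying each argument:

```python
# The Schönfinkel combinators as curried Python functions.
I = lambda x: x                                 # identity
K = lambda x: lambda y: x                       # constant
S = lambda x: lambda y: lambda z: x(z)(y(z))    # amalgamation
T = lambda x: lambda y: lambda z: x(z)(y)       # exchange (swap last two args)
Z = lambda x: lambda y: lambda z: x(y(z))       # composition

# Classic identity: S K K x = K x (K x) = x.
assert S(K)(K)(41) == 41
```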

Source: Practical Foundations of Mathematics, footnote on page 89. Available online here.

If you’re not familiar with the notation in the function definitions, see this introduction to lambda calculus.

* * *

For a daily dose of computer science and related topics, follow @CompSciFact on Twitter.


Seven John McCarthy papers in seven weeks

I recently ran across a series of articles from Carin Meier going through seven papers by the late computer science pioneer John McCarthy in seven weeks. Published so far:

Prologue
#1: Ascribing Mental Qualities to Machines
#2: Towards a Mathematical Science of Computation

Carin has announced that the next paper will be “First Order Theories of Individual Concepts and Propositions” but she hasn’t posted a commentary on it yet.

How to solve supposedly intractable problems

Contrary to semi-educated belief, NP-complete problems are not necessarily intractable. Often such problems are intractable, but not always. If a problem is NP-complete, this means that there is no polynomial-time algorithm for solving the problem

  1. for the worst case,
  2. as the problem size grows,
  3. finding exact solutions,
  4. with certainty,
  5. as far as anyone knows.

One way out of this (#5) is to show how to solve NP-complete problems efficiently. That may not be possible, and it hasn’t been possible so far, so we’ll not worry about this one. But that still leaves four options:

  1. special cases
  2. small problem sizes
  3. approximate solutions
  4. probabilistic solutions.

For example, the Traveling Salesman problem is NP-complete. But it’s easy to find solutions for a small number of cities by brute force. And even for a moderate number of cities, people routinely find solutions; there’s even an iPhone app for finding solutions. For larger problems, there are algorithms that yield near-optimal solutions with high probability.
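As a sketch of the brute-force approach for a small instance (the distance matrix below is made up):

```python
from itertools import permutations

def tsp_brute_force(dist):
    """Exact Traveling Salesman by brute force; fine for a handful of
    cities, since the search examines (n-1)! tours."""
    n = len(dist)
    best_tour, best_len = None, float('inf')
    for perm in permutations(range(1, n)):     # fix city 0 as the start
        tour = (0,) + perm
        length = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len

# Four cities with symmetric, made-up distances.
d = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 3],
     [10, 4, 3, 0]]
tour, length = tsp_brute_force(d)
assert length == 18
```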

Similar remarks hold for other kinds of problems deemed intractable. For example, some say you can’t find the roots of a 5th degree polynomial. Isn’t that a theorem? No. There’s a theorem that says you cannot find

  1. exact solutions,
  2. in general,
  3. using a finite sequence of certain mathematical operations.

So there are three loopholes here:

  1. approximate solutions,
  2. special cases, and
  3. more general operations.

If you only want to find solutions to 100 decimal places, for example, that’s doable.

Most integrals cannot be computed

  1. by a typical calculus student,
  2. exactly,
  3. with certainty,
  4. in terms of elementary functions.

But often integrals can be computed by some combination of

  1. more sophisticated exact techniques,
  2. numerical approximation,
  3. randomized algorithms, and
  4. a broader class of functions.

For example, some say that the integral of exp(-x²) cannot be computed. Of course it can, though there is not a finite combination of elementary functions that gives the exact integral. You can compute the integral exactly in terms of the error function erf(x), or compute it numerically to any desired accuracy. You could even approximate the integral with a Monte Carlo method, though there’s no point in using that approach here. For many high-dimensional integration problems, some sort of Monte Carlo is the only practical approach.
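Here's a sketch in Python comparing three of those approaches on the integral of exp(-x²) over [0, 1]:

```python
import math
from random import random, seed

# Closed form via the error function: erf(1) * sqrt(pi)/2.
exact = math.sqrt(math.pi) / 2 * math.erf(1)

# Numerical quadrature: a simple midpoint rule.
n = 10_000
midpoint = sum(math.exp(-((i + 0.5) / n) ** 2) for i in range(n)) / n

# Monte Carlo: average the integrand at uniform random points.
seed(42)
monte_carlo = sum(math.exp(-random() ** 2) for _ in range(100_000)) / 100_000

assert abs(midpoint - exact) < 1e-6
assert abs(monte_carlo - exact) < 1e-2
```

Monte Carlo is wasteful in one dimension, as the error bounds above show, but unlike the quadrature rule its convergence rate does not degrade as the dimension grows.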

Maybe you need to solve a problem for which none of the loopholes given here apply. That’s often the case. I’m not saying that there’s always a way out, only that sometimes there is. Sometimes problems thought to be intractable are not.


Collatz 3n + 1 conjecture possibly solved

Gerhard Opfer has posted a paper that claims to resolve the famous Collatz conjecture.

Start with a positive number n and repeatedly apply these simple rules:

  1. If n = 1, stop.
  2. If n is even, divide n by 2.
  3. If n is odd, multiply n by 3 and add 1.
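The rules are a few lines of code. A sketch in Python that counts how many steps the procedure takes, assuming (as the conjecture asserts) that it terminates:

```python
def collatz_steps(n):
    """Apply the 3n+1 rules until reaching 1; return the number of steps."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

assert collatz_steps(6) == 8    # 6 → 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1
assert collatz_steps(27) == 111 # small start, surprisingly long trajectory
```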

In 1937, Lothar Collatz asked whether this procedure always stops for every positive starting value of n. If Gerhard Opfer is correct, we can finally say that indeed it always stops.

Update: It appears there’s a flaw in the proof. See discussion here. Perhaps the gap can be filled in, or perhaps an idea in the paper can be of use somewhere else.

Related post: Easy to guess, hard to prove

Math/CS cheat sheet

Here’s something called a theoretical computer science cheat sheet. I don’t know whether I agree with the name, but it’s a nice cheat sheet.

The first two pages of the cheat sheet have to do with sums, combinatorics, and recurrence relations, the kinds of things you’d find in Concrete Mathematics and certainly useful in theoretical computer science. The chart mentions the “master method” that I blogged about here. But a large part of the cheat sheet is devoted to calculus and other general mathematics not as directly related to computer science.

There’s a nice summary of finite calculus on page 8, a useful little area of math that’s not very widely known.

Thanks to Bill the Lizard for pointing out the cheat sheet.

Counterfeit coins and rare diseases

Here’s a puzzle I saw a long time ago that came to mind recently.

You have a bag of 27 coins. One of these coins is counterfeit and the rest are genuine. The genuine coins all weigh exactly the same, but the counterfeit coin is lighter. You have a simple balance. How can you find the counterfeit coin using the scale as few times as possible?

The surprising answer is that the false coin can be found while using the scales only three times. Here’s how. Put nine coins on each side of the balance. If one side is lighter, the counterfeit is on that side; otherwise, it is one of the nine not on the scales. Now that you’ve narrowed it down to nine coins, apply the same idea recursively by putting three of the suspect coins on each side of the balance. The false coin is now either on the lighter side if the scales do not balance or one of the three remaining coins if the scales do balance. Now apply the same idea one last time to find which of the remaining three coins is the counterfeit. In general, you can find one counterfeit in 3^n coins by using the scales n times.
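The recursive strategy can be simulated in code. A sketch in Python, where a weighing is simulated by comparing the total weight on each side (the function name find_light_coin is my own):

```python
def find_light_coin(weights):
    """Locate the single lighter coin among 3^n coins using n weighings.

    `weights` is a list whose length is a power of 3. Each round splits the
    suspects into thirds and "weighs" the first third against the second.
    """
    lo, hi = 0, len(weights)
    weighings = 0
    while hi - lo > 1:
        third = (hi - lo) // 3
        left = sum(weights[lo:lo + third])
        right = sum(weights[lo + third:lo + 2 * third])
        weighings += 1
        if left < right:
            hi = lo + third                       # fake is in the left pan
        elif right < left:
            lo, hi = lo + third, lo + 2 * third   # fake is in the right pan
        else:
            lo = lo + 2 * third                   # balance: fake was set aside
    return lo, weighings

coins = [10] * 27
coins[17] = 9                                     # the counterfeit is lighter
assert find_light_coin(coins) == (17, 3)          # found in 3 weighings
```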

The counterfeit coin problem came to mind when I was picking out homework problems for a probability class and ran into the following (problem 4.56 here):

A large number, N = mk, of people are subject to a blood test. This can be administered in two ways.

  1. Each person can be tested separately. In this case N tests are required.
  2. The blood samples of k people can be pooled and analyzed together. If the test is negative, this one test suffices for k people. If the test is positive, each of the k people must be tested separately, and, in all, k+1 tests are required for the k people.

Suppose each person being tested has the disease with probability p. If the disease is rare, i.e. p is sufficiently small, the second approach will be more efficient. Consider the extremes. If p = 0, the first approach takes mk tests and the second approach takes only m tests. At the other extreme, if p = 1, the first approach still takes mk tests but the second approach now takes m(k+1) tests.

The homework problem asks for the expected number of tests used with each approach as a function of p for fixed k. Alternatively, you could assume that you always use the second method but need to find the optimal value of k. (This includes the possibility that k=1, which is equivalent to using the first method.)
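To see how much pooling saves, here's a sketch in Python of the expected number of tests per person under the second method, treating k = 1 as equivalent to the first method, as in the parenthetical above:

```python
def expected_tests_per_person(p, k):
    """Expected number of tests per person when pooling samples in groups of k.

    A pool of k is all-negative with probability (1-p)^k and needs 1 test;
    otherwise it needs k+1 tests. k = 1 means testing everyone separately.
    """
    if k == 1:
        return 1.0
    q = (1 - p) ** k
    return (q + (1 - q) * (k + 1)) / k

# For a rare disease (p = 1%), pooling cuts the work by roughly a factor of 5.
p = 0.01
best_k = min(range(1, 101), key=lambda k: expected_tests_per_person(p, k))
assert best_k == 11
assert expected_tests_per_person(p, best_k) < 0.2
```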

I’d be curious to know whether these algorithms have names. I suspect computer scientists have given the coin testing algorithm a name. I also suspect the idea of pooling blood samples has several names, possibly one name when it is used in literally testing blood samples and other names when the same idea is applied to analogous testing problems.

* * *

For daily posts on probability, follow @ProbFact on Twitter.


The cost of breaking things down and putting them back together

The basic approach to solving large, complex problems is to break them down into smaller problems then re-assemble the solutions. The total cost of a solution has to account for not only the cost of solving all the sub-problems, but also the cost of dividing the original problem into sub-problems and the cost of assembling the solutions to the sub-problems into a solution to the original problem. For example, the cost of writing a program is not simply the cost of each of the subroutines that comprise the final program. The cost has to include the cost of dividing the original task into subroutines and the cost of integrating these subroutines into the final product.

There is a theorem in computer science that quantifies the fuzzy statements above.

Let a ≥ 1 and b > 1 be constants. Let T(n) be defined on non-negative integers by the recurrence T(n) = a T(n/b) + f(n). In applications, T(n) is the total cost of solving a problem of size n. The recurrence relation says we have a way of breaking the original problem into a sub-problems, each of size n/b. The cost of breaking the original problem into sub-problems and assembling the solutions into the final solution is described by f(n).

The theorem, which Introduction to Algorithms (ISBN 0262032937) calls the “master theorem,” says that the total cost T(n) of the solution can be bounded according to three separate cases.

  1. If f(n) = O(n^(log_b a − ε)) for some constant ε > 0, then T(n) = Θ(n^(log_b a)).
  2. If f(n) = Θ(n^(log_b a)), then T(n) = Θ(n^(log_b a) lg n).
  3. If f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and if a f(n/b) ≤ c f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

The notation O, Ω, and Θ is explained in these notes on asymptotic order notation.

The main point here is that the term f(n), the one we tend to think less about, is the most important term in the theorem. T(n), the total time for the solution, breaks into three cases depending on whether f(n) grows at a rate slower than, equal to, or faster than a multiple of n^(log_b a).

Strictly speaking, the theorem applies to the cost (operation count) of running a program. However, you can draw an analogy from the theorem to the cost of developing a program as well. You have to consider the cost of the human subdividing the original problem (design) and the cost of making the pieces work together (integration).
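As a sanity check, you can evaluate the recurrence numerically and compare it with the theorem's prediction. A sketch in Python for the mergesort recurrence T(n) = 2 T(n/2) + n, which falls under case 2:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n, a=2, b=2):
    """Evaluate T(n) = a T(n/b) + f(n) with f(n) = n and T(1) = 1."""
    if n <= 1:
        return 1
    return a * T(n // b, a, b) + n

# Here log_b a = 1, so f(n) = n = Θ(n^(log_b a)) and case 2 predicts
# T(n) = Θ(n lg n); the ratio T(n)/(n lg n) should level off.
ratios = [T(2 ** k) / (2 ** k * k) for k in (10, 15, 20)]
assert max(ratios) - min(ratios) < 0.1
```

For powers of two the recurrence solves exactly to T(2^k) = 2^k (k + 1), so the ratio is (k + 1)/k, which tends to 1.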
