Why is the word problem hard?

This post is about the word problem. When mathematicians talk about “the word problem” we’re not talking about elementary math problems expressed in prose, such as “If Johnny has three apples, ….”

The word problem in algebra is to decide whether two strings of symbols are equivalent given a set of algebraic rules. I go into a longer description here.

I know that the word problem is hard, but I hadn’t given much thought to why it is hard. It has always seemed more like a warning sign than a topic to look into.

“Why don’t we just …?”

“That won’t work because … is equivalent to the word problem.”

In logic terminology, the word problem is undecidable. There can be no algorithm to solve the word problem in general, though there are algorithms for special cases.

So why is the word problem undecidable? As with most (all?) things that have been shown to be undecidable, the word problem is undecidable because the halting problem is undecidable. If there were an algorithm that could solve the word problem in general, you could use it to solve the halting problem. The halting problem is to write a program that can determine whether another arbitrary program will always terminate, and it can be shown that no such program can exist.

The impulse for writing this post came from stumbling across Keith Conrad’s expository article Why word problems are hard. His article goes into some of the algebraic background of the problem—free groups, (in)finitely generated groups, etc.—and gives some flavor for why the word problem is harder than it looks. The bottom line is that the word problem is hard because the halting problem is hard, but Conrad’s article gives you a little more of an idea what’s going on that just citing a logic theorem.

I still don’t have a cocktail-party answer for why the word problem is hard. Suppose a bright high school student who had heard of the word problem were at a cocktail party (drinking a Coke, of course) and asked why the word problem is hard. Suppose also that this student had not heard of the halting problem. Would the simplest explanation be to start by explaining the halting problem?

Suppose we change the setting a bit. You’re teaching a group theory course and the students know about groups and generators, but not about the halting problem, how might you give them a feel for why the word problem is hard? You might ask them to read Keith Conrad’s paper and point out that it shows that simpler problems than the word problem are harder than they seem at first.

Related posts

Logic in moral terminology

I got an email from Fr. John Rickert today, and with his permission I’ll share part of it here.

A sin of commission occurs when we do something we should not do. A system is consistent (or maybe I should say “sound”) if the results of proofs really are true. Gödel’s 2nd Incompleteness Theorem says that it is undecidable whether Peano Arithmetic commits any “sins of commission.”

A sin of omission occurs when we fail to do something that we should do. A system is complete if every true statement actually has a proof (in finitely many steps). Gödel’s 1st Incompleteness Theorem says that Peano Arithmetic does commit some “sins of omission”: There are truths that cannot be proved.

Finally, a conscience is perplexed if it does not know whether to do or refrain from a proposed action; the conscience is de facto in a state of invincible ignorance. Undecidability is invincible ignorance.

Of course a formal system isn’t under any moral obligations, and certainly not under obligation to do what it cannot do. These are just analogies. But they are interesting analogies. Sins of commission and omission, things done and things left undone, are more verbally parallel than completeness and soundness.

Here’s another post based on an email exchange with Fr. Rickert exactly one year ago: Unexpected square wave.

When are formal proofs worth the effort?

Formal verification of theorems takes a lot of work. And it takes more work where it is least needed. But the good news is that it takes less effort in contexts where it is needed most.

Years of effort have gone into formally verifying the proofs of theorems that no one doubted were correct. For example, a team worked for six years to formally verify the proof of the odd-order theorem in group theory.

Mathematical proofs are fallible, but it’s rare for a proof to be accepted that reaches a wrong conclusion. Flaws in proofs are much more likely to be gaps in reasoning than to be arguments in favor of false statements. And the more people who study a proof, the more likely it is that flaws will be found and fixed.

The return on investment for formal verification is highest for theorems that are the opposite of the odd-order theorem, theorems that are obscure and shallow rather than famous and deep.

For example, suppose you want to prove that a complex set of security protocols doesn’t have any gaps. Your problem is obscure in that only your company cares about it. You don’t have the assurance that would come from thousands of mathematicians around the world reviewing your work.

But on the plus side, your security rules are logically shallow. These rules may be too complicated to hold in your head, but they can be formally specified with far less effort than some mathematical object like a Möbius strip. In general, discrete things like if-then rules are much easier to formalize than continuous things like real numbers.

Mathematicians choose to study things they have some intuition for, and that intuition is often correct even when the reasoning around it is logically incomplete. But businesses are often faced with complex problems they did not chose and problems for which no one has much intuition, such as proving that a complex circuit does what it’s supposed to do, or that an encryption protocol didn’t neglect some edge case. That’s where the ROI on formal methods is greatest.

Minimizing random Boolean expressions

The previous post looked at all Boolean expressions on three or four variables and how much they can be simplified. The number of Boolean expressions on n variables is

2^{2^n}

and so the possibilities explode as n increases. We could do n = 3 and 4, but 5 would be a lot of work, and 6 is out of the question.

So we do what we always do when a space is too big to explore exhaustively: we explore at random.

The Python module we’ve been using, qm, specifies a function of n Boolean variables in terms of the set of product terms on which the function evaluates to 1. These product terms can be encoded as integers, and so a Boolean function of n variables corresponds to a subset of the integers 0 through 2n – 1.

We can generate a subset of these numbers by generating a random mask consisting of 0s and 1s, and keeping the numbers where the mask value is 1. We could do this with code like the following.

     N= 2**n
     x = np.arange(N)
     mask = np.random.randint(2, size=N)
     ones = set(mask*x)

There’s a small problem with this approach: the set ones always contains 0. We want it to contain 0 if and only if the 0th mask value is a 1.

The following code generates a Boolean expression on n variables, simplifies it, and returns the length of the simplified expression [1].

    def random_sample(n):
        N = 2**n
        x = np.arange(N)
        mask = np.random.randint(2, size=N)
        ones = set(mask*x)
        if mask[0] == 0:
            ones.remove(0)
        return len(qm(ones=ones, dc={}))

We can create several random samples and make a histogram with the following code.

    def histogram(n, reps):
        counts = np.zeros(2**n+1, dtype=int)
        for _ in range(reps):
            counts[random_sample(n)] += 1
        return counts

The data in the following graph comes from calling histogram(5, 1000).

data = [0, 0, 0, 0, 4, 44, 145, 339, 296, 128, 38, 5, 0, 1]

Note that the length of the random expressions is distributed symmetrically around 16 (half of 25). So minimization turns a distribution centered around 16 into a distribution centered around 8.

The code is slow because the Quine-McCluskey algorithm is slow, and our Python implementation of the algorithm isn’t as fast as it could be. But Boolean minimization is an NP problem, so no exact algorithm is going to scale well. To get faster results, we could switch to something like the Expresso Heuristic Logic Minimizer, which often gets close to a minimum expression.

***

[1] The code above will fail if the set of terms where the function is 1 is empty. However this is extremely unlikely: we’d expect it to happen once in every 2^(2^n) times and so when n = 5 this is less than one time in four billion. The fully correct approach would be to call qm with zeros=x when ones is empty.

How much can Boolean expressions be simplified?

In the previous post we looked at how to minimize Boolean expressions using a Python module qm. In this post we’d like to look at how much the minimization process shortens expressions.

Witn n Boolean variables, you can create 2^n terms that are a product of distinct variables. You can specify a Boolean function by specifying the subset of such terms on which it takes the value 1, and so there are 2^(2^n) Boolean functions on n variables. For very small values of n we can minimize every possible Boolean function.

To do this, we need a way to iterate through the power set (set of all subsets) of the integers up to 2^n. Here’s a function to do that, borrowed from itertools recipes.

    from itertools import chain, combinations
    def powerset(iterable):
        xs = list(iterable)
        return chain.from_iterable(
            combinations(xs, n) for n in range(len(xs) + 1))

Next, we use this code to run all Boolean functions on 3 variables through the minimizer. We use a matrix to keep track of how long the input expressions are and how long the minimized expressions are.

    from numpy import zeros
    from qm import q 

    n = 3
    N = 2**n
    tally = zeros((N,N), dtype=int)
    for p in powerset(range(N)):
        if not p:
            continue # qm can't take an empty set
        i = len(p)
        j = len(qm(ones=p, dc={}))
        tally[i-1, j-1] += 1 

Here’s a table summarizing the results [1].

The first column gives the number of product terms in the input expression and the subsequent columns give the number of product terms in the output expressions.

For example, of the expressions of length 2, there were 12 that could be reduced to expressions of length 1 but the remaining 16 could not be reduced. (There are 28 possible input expressions of length 2 because there are 28 ways to choose 2 items from a set of 8 things.)

There are no nonzero values above the main diagonal, i.e. no expression got longer in the process of minimization. Of course that’s to be expected, but it’s reassuring that nothing went obviously wrong.

We can repeat this exercise for expressions in 4 variables by setting n = 4 in the code above. This gives the following results.

We quickly run into a wall as n increases. Not only does the Quine-McCluskey algorithm take about twice as long every time we add a new variable, the number of possible Boolean functions grows even faster. There were 2^(2^3) = 256 possibilities to explore when n = 3, and 2^(2^4) = 65,536 when n = 4.

If we want to explore all Boolean functions on five variables, we need to look at 2^(2^5) = 4,294,967,296 possibilities. I estimate this would take over a year on my laptop. The qm module could be made more efficient, and in fact someone has done that. But even if you made the code a billion times faster, six variables would still be out of the question.

To explore functions of more variables, we need to switch from exhaustive enumeration to random sampling. I may do that in a future post. (Update: I did.)

***

[1] The raw data for the tables presented as images is available here.

Minimizing boolean expressions

This post will look at how to take an expression for a Boolean function and look for a simpler expression that corresponds to the same function. We’ll show how to use a Python implementation of the Quine-McCluskey algorithm.

Notation

We will write AND like multiplication, OR like addition, and use primes for negation. For example,

wx + z

denotes

(w AND x) OR (NOT z).

Minimizing expressions

You may notice that the expression

wxz + wxz

can be simplified to wz, for example, but it’s not feasible to simplify complicated expressions without a systematic approach.

One such approach is the Quine-McCluskey algorithm. Its run time increases exponentially with the problem size, but for a small number of terms it’s quick enough [1]. We’ll show how to use the Python module qm which implements the algorithm.

Specifying functions

How are you going to pass a Boolean expression to a Python function? You could pass it an expression as a string and expect the function to parse the string, but then you’d have to specify the grammar of the little language you’ve created. Or you could pass in an actual Python function, which is more work than necessary, especially if you’re going to be passing in a lot of expressions.

A simpler way is pass in the set of places where the function evaluates to 1, encoded as numbers.

For example, suppose your function is

wxyz + wxyz

This function evaluates to 1 when either the first term evaluates to 1 or the second term evaluates to 1. That is, when either

(w, x, y, z) = (1, 1, 0, 1)

or

(w, x, y, z) = (0, 1, 1, 0).

Interpreting the left sides as binary numbers, you could specify the expression with the set {13, 6} which describes where the function is 1.

If you prefer, you could express your numbers in binary to make the correspondence to terms more explicit, i.e. {0b1101,0b110}.

Using qm

One more thing before we use qm: your Boolean expression might not be fully specified. Maybe you want it to be 1 on some values, 0 on others, and you don’t care what it equals on the rest.

The qm module lets you specify these with arguments ones, zeroes, and dc. If you specify two out of these three sets, qm will infer the third one.

For example, in the code below

    from qm import qm
    print(qm(ones={0b111, 0b110, 0b1101}, dc={}))

we’re asking qm to minimize the expression

xyz + xyz‘ + wxyz.

Since the don’t-care set is empty, we’re saying our function equals 0 everywhere we haven’t said that it equals 1. The function prints

    ['1101', '011X']

which corresponds to

wxyz + wxy,

the X meaning that the fourth variable, z, is not part of the second term.

Note that the minimized expression is not unique: we could tell by inspection that

xyz + xyz‘ + wxyz.

could be reduced to

xy + wxyz.

Also, our code defines a minimum expression to be one with the fewest sums. Both simplifications in this example have two sums. But xy + wxyz is simpler than wxyz + wxy in the sense of having one less term, so there’s room for improvement, or at least discussion, as to how to quantify the complexity of an expression.

In the next post I use qm to explore how much minimization reduces the size of Boolean expressions.

***

[1] The Boolean expression minimization problem is in NP, and so no known algorithm that always produces an exact answer will scale well. But there are heuristic algorithms like Espresso and its variations that usually provide optimal or near-optimal results.

Rotating symbols in LaTeX

Linear logic uses an unusual symbol, an ampersand rotated 180 degrees, for multiplicative disjunction.

\invamp

The symbol is U+214B in Unicode.

I was looking into how to produce this character in LaTeX when I found that the package cmll has two commands that produce this character, one semantic and one descriptive: \parr and \invamp [1].

This got me to wondering how you might create a symbol like the one above if there wasn’t one built into a package. You can do that by using the graphicx package and the \rotatebox command. Here’s how you could roll your own par operator:

    \rotatebox[origin=c]{180}{\&}

There’s a backslash in front of the & because it’s a special character in LaTeX. If you wanted to rotate a K, for example, there would be no need for a backslash.

The \rotatebox command can rotate any number of degrees, and so you could rotate an ampersand 30° with

    \rotatebox[origin=c]{30}{\&}

to produce a tilted ampersand.

\invamp

Related posts

[1] The name \parr comes from the fact that the operator is sometimes pronounced “par” in linear logic. (It’s not simply \par because LaTeX already has a command \par for inserting a paragraph break.)

The name \invamp is short for “inverse ampersand.” Note however that the symbol is not an inverted ampersand in the sense of being a reflection; it is an ampersand rotated 180°.

Following an idea to its logical conclusion

Following an idea to its logical conclusion might be extrapolating a model beyond its valid range.

Suppose you have a football field with area A. If you make two parallel sides twice as long, then the area will be 2A. If you double the length of the sides again, the area will be 4A. Following this reason to its logical conclusion, you could double the length of the sides as many times as you wish, say 15 times, and each time the area doubles.

Except that’s not true. By the time you’ve doubled the length of the sides 15 times, you have a shape so big that it is far from being a rectangle. The fact that Earth is round matters a lot for figure that big.

Euclidean geometry models our world really well for rectangles the size of a football field, or even rectangles the size of Kansas. But eventually it breaks down. If the top extends to the north pole, your rectangle becomes a spherical triangle.

The problem in this example isn’t logic; it’s geometry. If you double the length of the sides of a Euclidean rectangle 15 times, you do double the area 15 times. A football field is not exactly a Euclidean rectangle, though it’s close enough for all practical purposes. Even Kansas is a Euclidean rectangle for most practical purposes. But a figure on the surface of the earth with sides thousands of miles long is definitely not Euclidean.

Models are based on experience with data within some range. The surprising thing about Newtonian physics is not that it breaks down at a subatomic scale and at a cosmic scale. The surprising thing is that it is usually adequate for everything in between.

Most models do not scale up or down over anywhere near as many orders of magnitude as Euclidean geometry or Newtonian physics. If a dose-response curve, for example, is linear for based on observations in the range of 10 to 100 milligrams, nobody in his right mind would expect the curve to remain linear for doses up to a kilogram. It wouldn’t be surprising to find out that linearity breaks down before you get to 200 milligrams.

“Any sufficiently advanced logic is indistinguishable from stupidity.” — Alex Tabarrok

Related posts

Curry-Howard-Lambek correspondence

Curry-Howard-Lambek is a set of correspondences between logic, programming, and category theory. You may have heard of the slogan proofs-as-programs or propositions-as-types. These refer to the Curry-Howard correspondence between mathematical proofs and programs. Lambek’s name is appended to the Curry-Howard correspondence to represent connections to category theory.

The term Curry-Howard isomorphism is often used but is an overstatement. Logic and programming are not isomorphic, nor is either isomorphic to category theory.

You’ll also hear the term computational trinitarianism. That may be appropriate because logic, programming, and category theory share common attributes without being identical.

The relations between logic, programming, and categories are more than analogies, but less than isomorphisms.

There are formal correspondences between parts of each theory. And aside from these rigorous connections, there is also a heuristic that something interesting in one area is likely to have an interesting counterpart in the other areas. In that sense Curry-Howard-Lambek is similar to the Langlands program relating number theory and geometry, a set of theorems and conjectures as well as a sort of philosophy.

Logic and applications Twitter account

I stopped posting to the @FormalFact Twitter account last July, but I didn’t deactivate the account. Now I’m going to restart it.

Unlike my other Twitter accounts, I don’t plan to have a regular posting schedule. I may not post often. We’ll see how it goes.

LogicPractice icon

I’ve changed the account name from @FormalFact to @LogicPractice. The “formal” part of the original name referred to formal theorem proving, the initial focus of the account. The new name reflects a focus on logic more generally, and practical applications of logic that are less laborious than formal theorem proving.