How much can Boolean expressions be simplified?

In the previous post we looked at how to minimize Boolean expressions using a Python module qm. In this post we’d like to look at how much the minimization process shortens expressions.

Witn n Boolean variables, you can create 2^n terms that are a product of distinct variables. You can specify a Boolean function by specifying the subset of such terms on which it takes the value 1, and so there are 2^(2^n) Boolean functions on n variables. For very small values of n we can minimize every possible Boolean function.

To do this, we need a way to iterate through the power set (set of all subsets) of the integers up to 2^n. Here’s a function to do that, borrowed from itertools recipes.

    from itertools import chain, combinations
    def powerset(iterable):
        xs = list(iterable)
        return chain.from_iterable(
            combinations(xs, n) for n in range(len(xs) + 1))

Next, we use this code to run all Boolean functions on 3 variables through the minimizer. We use a matrix to keep track of how long the input expressions are and how long the minimized expressions are.

    from numpy import zeros
    from qm import q 

    n = 3
    N = 2**n
    tally = zeros((N,N), dtype=int)
    for p in powerset(range(N)):
        if not p:
            continue # qm can't take an empty set
        i = len(p)
        j = len(qm(ones=p, dc={}))
        tally[i-1, j-1] += 1 

Here’s a table summarizing the results [1].

The first column gives the number of product terms in the input expression and the subsequent columns give the number of product terms in the output expressions.

For example, of the expressions of length 2, there were 12 that could be reduced to expressions of length 1 but the remaining 16 could not be reduced. (There are 28 possible input expressions of length 2 because there are 28 ways to choose 2 items from a set of 8 things.)

There are no nonzero values above the main diagonal, i.e. no expression got longer in the process of minimization. Of course that’s to be expected, but it’s reassuring that nothing went obviously wrong.

We can repeat this exercise for expressions in 4 variables by setting n = 4 in the code above. This gives the following results.

We quickly run into a wall as n increases. Not only does the Quine-McCluskey algorithm take about twice as long every time we add a new variable, the number of possible Boolean functions grows even faster. There were 2^(2^3) = 256 possibilities to explore when n = 3, and 2^(2^4) = 65,536 when n = 4.

If we want to explore all Boolean functions on five variables, we need to look at 2^(2^5) = 4,294,967,296 possibilities. I estimate this would take over a year on my laptop. The qm module could be made more efficient, and in fact someone has done that. But even if you made the code a billion times faster, six variables would still be out of the question.

To explore functions of more variables, we need to switch from exhaustive enumeration to random sampling. I may do that in a future post. (Update: I did.)

***

[1] The raw data for the tables presented as images is available here.