Machine learning by satisfiability solving

Define B = {0, 1} and a Boolean function fp: BN → B where p is a Boolean parameter vector in Bn. Consider that fp(x) can be represented as a Boolean expression whose variables are the entries of vectors p and x. Assume that c is the cost of computing fp(x) measured in some way, for example, the number of operator evaluations based on some complete set of Boolean operators. Then, given y in B and x in BN, the cost to solve the equation fp(x) = y for satisfying y has cost c2n, using the most naïve brute force search.

For 3SAT solving, a much better worst case bound is known, namely O(1.307n), see here, for a related discussion see here. For the inexactly-defined class of “industrial problems,” performance in practice is often much better; for a discussion see here.

Now consider a set of Boolean feature vectors xi and labels yi. for i in {1, …, d}. One can now solve fp(xi) = yi for all i. Since the number of unknown variables to be solved for is unchanged, the naïve bound on computational cost is cd2n, scaling linearly in the number of data values d. Note this is tractable if the phenomenon in question can be modeled by a small number of logical parameters n.

In practice, one generally does not solve a machine learning problem exactly but approximately, minimizing a loss function. One can use the AtLeast operator in a SAT solver like Z3 to require that fp(xi) = yi be satisfied for at least K values of i, for some K. One can then find the maximal such K by performing bisection on K, requiring log2(d) such SAT solves. On exact solvers for this problem see also here.

The AtLeast operator can be implemented by either binary adder / totalizer encoding (Bailleux and Boufkhad 2003), sequential counter encoding (Sinz 2005), or Batcher sorting network approach (Abío et al. 2013). Unfortunately, all these methods require adding significant numbers of auxiliary variables, adversely affecting the naïve complexity bound. However, one can hope that performance is much better than this bound for industrial problems, as is often the case in practice. Furthermore, randomized approximation algorithms known to run in polynomial time can provably find assignments that satisfy a guaranteed fraction (e.g., 3/4 or more) of the maximum number of satisfiable Boolean constraints (see for example here, here, here and here). This might serve as a proxy for exactly solving for the optimizer.

If the problem instead has Boolean expressions with embedded linear inequality predicates on integer variables of bounded range, one could apply SMT solver methods directly using the ideas described above, or convert the problem to a SAT problem and apply the above methods directly.

The idea of using SAT solvers for machine learning in the way described here goes back to (Kamath et al. 1992). The method is described with an example in Donald Knuth’s TAOCP fascicle on Satisfiability, section on Learning a Boolean function.

Getting some (algorithmic) SAT-isfaction

How can you possibly solve a mission-critical problem with millions of variables—when the worst-case computational complexity of every known algorithm for that problem is exponential in the number of variables?

SAT (Satisfiability) solvers have seen dramatic orders-of-magnitude performance gains for many problems through algorithmic improvements over the last couple of decades or so. The SAT problem—finding an assignment of Boolean variables that makes a given Boolean expression true—represents the archetypal NP-complete problem and in the general case is intractable.

However, for many practical problems, solutions can be found very efficiently by use of modern methods. This “killer app” of computer science, as described by Donald Knuth, has applications to many areas, including software verification, electronic design automation, artificial intelligence, bioinformatics, and planning and scheduling.

Its uses are surprising and diverse, from running billion dollar auctions to solving graph coloring problems to computing solutions to Sudoku puzzles. As an example, I’ve included a toy code below that uses SMT, a relative of SAT, to find the English language suffix rule for regular past tense verbs (“-ed”) from data.

When used as a machine learning method, SAT solvers are quite different from other methods such as neural networks. SAT solvers can for some problems have long or unpredictable runtimes (though MAXSAT can sometimes relax this restriction), whereas neural networks have essentially fixed inference cost (though looping agent-based models do not).

On the other hand, answers from SAT solvers are always guaranteed correct, and the process is interpretable; this is currently not so for neural network-based large language models.

To understand better how to think about this difference in method capabilities, we can take a lesson from the computational science community. There, it is common to have a well-stocked computational toolbox of both slow, accurate methods and fast, approximate methods.

In computational chemistry, ab initio methods can give highly accurate results by solving Schrödinger’s equation directly, but only scale to limited numbers of atoms. Molecular dynamics (MD), however, relies more on approximations, but scales efficiently to many more atoms. Both are useful in different contexts. In fact, the two methodologies can cross-pollenate, for example when ab initio calculations are used to devise force fields for MD simulations.

A lesson to take from this is, it is paramount to find the best tool for the given problem, using any and all means at one’s disposal.

The following are some of my favorite general references on SAT solvers:

It would seem that unless P = NP, commonly suspected to be false, the solution of these kinds of problems for any possible input is hopelessly beyond reach of even the world’s fastest computers. Thankfully, many of the problems we care about have an internal structure that makes them much more solvable (and likewise for neural networks). Continued improvement of SAT/SMT methods, in theory and implementation, will greatly benefit the effective solution of these problems.

A toy example: find the English past tense suffix rule using Z3

import csv
import z3

def char2int(c): return ord(c) - ord('a')

def int2char(i): return chr(i + ord('a'))

# Access the language data from the file.
with open('eng_cols.txt', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter='\t')
    table = [row for row in reader]

nrow, ncol = len(table), len(table[0])

# Identify which columns of input table have stem and targeted word form.
stem_col, form_col = 0, 1

# Calculate word lengths.
nstem = [len(table[i][stem_col]) for i in range(nrow)]
nform = [len(table[i][form_col]) for i in range(nrow)]

# Length of suffix being sought.
ns = 2

# Initialize optimizer.
solver = z3.Optimize()

# Define variables to identify the characters of suffix; add constraints.
var_suf = [z3.Int(f'var_suf_{i}') for i in range(ns)]

for i in range(ns):
    solver.add(z3.And(var_suf[i] >= 0, var_suf[i] < 26))

# Define variables to indicate whether the given word matches the rule.
var_m = [z3.Bool(f'var_m_{i}') for i in range(nrow)]

# Loop over words.
for i in range(nrow):

    # Constraint on number of characters.
    constraints = [nform[i] == nstem[i] + ns]

    # Constraint that the form contains the stem.
    for j in range(nstem[i]):
        constraints.append(
            table[i][stem_col][j] == table[i][form_col][j]
                if j < nform[i] else False)

    # Constraint that the end of the word form matches the suffix. 
    for j in range(ns):
        constraints.append(
            char2int(table[i][form_col][nform[i]-1-j]) == var_suf[j]
                if j < nform[i] else False)

    # var_m[i] is the "and" of all these constraints.
    solver.add(var_m[i] == z3.And(constraints))

# Seek suffix that maximizes number of matches.
count = z3.Sum([z3.If(var_m[i], 1, 0) for i in range(nrow)])
solver.maximize(count)

# Run solver, output results.
if solver.check() == z3.sat:
    model = solver.model()
    suf = [model[var_suf[i]] for i in range(ns)]
    print('Suffix identified: ' +
          ''.join(list([int2char(suf[i].as_long())
                        for i in range(ns)]))[::-1])
    print('Number of matches: ' + str(model.evaluate(count)) +
          ' out of ' + str(nrow) + '.')

    var_m_values = [model[var_m[i]] for i in range(nrow)]

    print('Matches:')
    for i in range(nrow):
        if var_m_values[i]:
            print(table[i][stem_col], table[i][form_col])