Automation and Validation

Posted on 24 December 2025 by John

It’s been said whatever you can validate, you can automate. An AI that produces correct work 90% of the time could be very valuable, provided you have a way to identify the 10% of the cases where it is wrong. Often verifying a solution takes far less computation than finding a solution. Examples here.

Validating AI output can be tricky since the results are plausible by construction, though not always correct.

Consistency checks

One way to validate output is to apply consistency checks. Such checks are necessary, but not sufficient, and often easy to implement. A simple consistency check might be that the inputs to a transaction equal the outputs. A more sophisticated consistency check might be conservation of energy or something analogous to it.
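A minimal sketch of the transaction check described above. The `Transaction` record and its field names are hypothetical, chosen only for illustration, and amounts are assumed to be integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    # Hypothetical record: amounts are integer cents
    inputs: list[int]
    outputs: list[int]

def is_balanced(tx: Transaction) -> bool:
    """Necessary (not sufficient) check: total inputs must equal total outputs."""
    return sum(tx.inputs) == sum(tx.outputs)

good = Transaction(inputs=[500, 250], outputs=[700, 50])
bad = Transaction(inputs=[500, 250], outputs=[700, 60])
print(is_balanced(good), is_balanced(bad))
```

A transaction that passes could still be wrong in other ways, which is why such a check is necessary but not sufficient.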

Certificates

Some problems have certificates, ways of verifying that a calculation is correct that can be evaluated with far less effort than finding the solution that they verify. I’ve written about certificates in the context of optimization, solving equations, and finding prime numbers.
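One familiar illustration, not taken from the posts mentioned above: a claimed factorization acts as a certificate that a number is composite. Checking it takes a few multiplications, while finding a factor may require a search. The function names below are illustrative.

```python
from typing import Optional

def verify_factorization(n: int, factors: list[int]) -> bool:
    """Check a certificate: every factor is nontrivial and the product is n."""
    product = 1
    for f in factors:
        if f <= 1 or f >= n:
            return False
        product *= f
    return product == n

def find_factor(n: int) -> Optional[int]:
    """Trial division: the slower search that a certificate lets a checker skip."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return None

print(verify_factorization(91, [7, 13]))  # cheap check of a claimed answer
print(find_factor(91))                    # search for an answer
```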

Formal methods

Correctness is more important in some contexts than others. If a recommendation engine makes a bad recommendation once in a while, the cost is a lower probability of conversion in a few instances. If an aircraft collision avoidance system makes an occasional error, the consequences could be catastrophic.

When the cost of errors is extremely high, formal verification may be worthwhile. Formal correctness proofs using something like Lean or Rocq are extremely tedious and expensive to create, and hence rarely economical. But if an AI can generate a result and a formal proof of correctness, hurrah!

Who watches the watchmen?

But if an AI result can be wrong, why couldn’t a formal proof generated to defend the result also be wrong? As the Roman poet Juvenal asked, Quis custodiet ipsos custodes? Who will watch the watchmen?

An AI could indeed generate an incorrect proof, but if it does, the proof assistant will reject it. So the answer to who will watch Claude, Gemini, and ChatGPT is Lean, Rocq, and Isabelle.
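A toy illustration in Lean 4, not tied to any particular AI system. The proof checker accepts the first proof and would reject the second if it were uncommented, no matter who or what wrote it.

```lean
-- Accepted: both sides compute to the same numeral.
example : 2 + 2 = 4 := rfl

-- Rejected if uncommented: the two sides do not reduce to the same value.
-- example : 2 + 2 = 5 := rfl
```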

Who watches the watchers of the watchmen?

Isn’t it possible that a theorem prover like Rocq could have a bug? Of course it’s possible; there is no absolute certainty under the sun. But hundreds of PhD-years of work have gone into Rocq (formerly Coq) and so bugs in the kernel of that system are very unlikely. The rest of the system is bootstrapped, verified by the kernel.

Even so, an error in the theorem prover does not mean an error in the original result. For an incorrect result to slip through, the AI-generated proof would have to be wrong in a way that happens to exploit an unknown error in the theorem prover. It is far more likely that you’re trying to prove the wrong thing than that the theorem prover let you down.

I mentioned collision avoidance software above. I looked into collision avoidance software when I did some work for Amazon’s drone program. The software that was formally verified was also unrealistic in its assumptions. The software was guaranteed to work correctly, if two objects are flying at precisely constant velocity at precisely the same altitude etc. If everything were operating according to geometrically perfect assumptions, there would be no need for collision avoidance software.

Formulating eight queens as a SAT problem

Posted on 11 May 2025 by John

The Boolean satisfiability problem is to determine whether there is a way to assign values to variables in a set of Boolean formulas to make the formulas hold [1]. If there is a solution, the next task would be to enumerate the solutions.

You can solve the famous eight queens problem, or its generalization to n-queens, by formulating the problem as a Boolean formula then using a SAT solver to find the solutions.

It’s pretty obvious how to start. A chessboard is an 8 by 8 grid, so you have a variable for each square on the board, representing whether or not that square holds a queen. Call the variables b_ij where i and j run from 1 to 8.

The requirement that every row contains exactly one queen can be turned into two subrequirements:

Each row contains at least one queen.
Each row contains at most one queen.

The first requirement is easy. For each row i, we have a clause

b_i1 ∨ b_i2 ∨ b_i3 ∨ … ∨ b_i8

The second requirement is harder. How do you express in terms of our Boolean variables that there is no more than one queen in each row? This is the key difficulty. If we can solve this problem, then we’re essentially done. We just need to do the analogous thing for columns and diagonals. (We don’t require a queen on every diagonal, but we require that there be at most one queen on every diagonal.)

First approach

There are two ways to encode the requirement that every row contain at most one queen. The first is to use implication. If there’s a queen in the first column, then there is no queen in any of the remaining columns. If there’s a queen in the second column, then there is no queen in any column but the second, etc. We have an implication for each column in each row. Let’s just look at the first row and first column.

b₁₁ ⇒ ¬ (b₁₂ ∨ b₁₃ ∨ … ∨ b₁₈)

We can turn an implication of the form a ⇒ b into the clause ¬a ∨ b.
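Note that ¬b₁₁ ∨ ¬(b₁₂ ∨ … ∨ b₁₈) is not yet a single clause. By De Morgan's law and distributivity it expands to the seven clauses (¬b₁₁ ∨ ¬b₁₂), …, (¬b₁₁ ∨ ¬b₁₈). The sketch below checks that equivalence by brute force over all 2⁸ assignments to the first row; the function names are illustrative.

```python
from itertools import product

def implication(row):
    # b11 => not (b12 or ... or b18), rewritten as (not b11) or not (...)
    return (not row[0]) or not any(row[1:])

def pairwise_clauses(row):
    # (not b11 or not b12) and ... and (not b11 or not b18)
    return all((not row[0]) or (not b) for b in row[1:])

# Compare the two forms on every assignment of the eight variables
equivalent = all(
    implication(row) == pairwise_clauses(row)
    for row in product((False, True), repeat=8)
)
print(equivalent)
```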

Second approach

The second way to encode the requirement that every row contain at most one queen is to say that for every pair of squares in a row (a, b) either a has no queen or b has no queen. So for the first row we would have ₈C₂ = 28 clauses because there are 28 ways to choose pairs from a set of 8 things.

(¬b₁₁ ∨ ¬b₁₂) ∧ (¬b₁₁ ∨ ¬b₁₃) ∧ … ∧ (¬b₁₇ ∨ ¬b₁₈)

An advantage of this approach is that it directly puts the problem into conjunctive normal form (CNF). That is, our formula is a conjunction of terms that contain only disjunctions, an AND of ORs.
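Here is a sketch of the pairwise encoding for the full n-queens problem. It assumes the common convention of numbering variables from 1 and writing a negated variable as a negative integer, which is the form many SAT solvers accept. No particular solver is assumed; a brute-force loop stands in for one and is only feasible for tiny boards.

```python
from itertools import combinations, product

def var(i, j, n):
    """Map square (i, j), 0-based, to a positive integer variable."""
    return i * n + j + 1

def at_most_one(variables):
    """Pairwise encoding: in every pair, at least one variable is false."""
    return [[-a, -b] for a, b in combinations(variables, 2)]

def queens_cnf(n):
    clauses = []
    for i in range(n):
        row = [var(i, j, n) for j in range(n)]
        clauses.append(row)            # at least one queen in row i
        clauses += at_most_one(row)    # at most one queen in row i
    for j in range(n):
        clauses += at_most_one([var(i, j, n) for i in range(n)])
    # Diagonals are the squares with constant i - j or constant i + j
    for d in range(-(n - 1), n):
        clauses += at_most_one([var(i, i - d, n) for i in range(n) if 0 <= i - d < n])
    for s in range(2 * n - 1):
        clauses += at_most_one([var(i, s - i, n) for i in range(n) if 0 <= s - i < n])
    return clauses

def count_solutions(n):
    """Try every assignment; a stand-in for a real SAT solver."""
    clauses = queens_cnf(n)
    count = 0
    for bits in product((False, True), repeat=n * n):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            count += 1
    return count

print(len(at_most_one(range(1, 9))))  # clauses for one row of an 8 by 8 board
print(count_solutions(4))
```

The first line printed is 28, matching the count above, and the second is 2, the number of solutions to the four queens problem.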

[1] You’ll see the SAT problem described as finding the solution to a Boolean formula. If you have multiple formulas, then the first holds, and the second, etc. So you can AND them all together to make one formula.

Decoupling formal theorem proving effort

Posted on 19 November 2023 by John

Terence Tao has been experimenting with formal theorem proving using Lean and writing about his experience.

Here’s something Tao said on Mathstodon that I thought was interesting.

It is remarkable how much “decoupling” is achieved by the Lean+Blueprint combo. Contributors can work locally on proving a lemma, without necessarily fully understanding the global proof structure. Mathematicians who do understand the global proof can work on the blueprint, without necessarily understanding the mechanics of Lean. Lean experts can work on technical aspects of the implementation, such as optimizing the selection of classes and definitions, without needing expert domain knowledge. A theorem can be formalized, before, after, or concurrently with the lemmas it relies on, or the applications it has. Two participants who want to discuss some finer point of the argument can localize to a very specific and highly formalized step and have a constructive discussion even if they come from quite different backgrounds. It allows for (certain types of) high-level mathematical activity to be done at a far more atomized level than is usually possible.

The 10th Dedekind number

Posted on 27 June 2023 by John

The nth Dedekind number M(n) is the number of monotone Boolean functions of n variables. The 9th Dedekind number was recently computed to be

M(9) = 286386577668298411128469151667598498812366.

The previous post defines monotone Boolean functions and explicitly enumerates the functions for one, two, or three variables. As that post demonstrates, M(1) = 3, M(2) = 6, and M(3) = 20. But as n increases, M(n) increases rapidly, with M(9) being on the order of 10⁴¹.

Although computing the Dedekind numbers exactly is difficult—M(8) was computed in 1991 and M(9) now in 2023—there is an explicit formula for these numbers, and much is known about their asymptotic growth. This post speculates about what M(10) might be.

Write the number k in binary and let b_i^k be its ith bit:

$b_i^k=\left\lfloor\frac{k}{2^i}\right\rfloor - 2\left\lfloor\frac{k}{2^{i+1}}\right\rfloor$

Then the nth Dedekind number is given by

$M(n)=\sum_{k=1}^{2^{2^n}} \prod_{j=1}^{2^n-1} \prod_{i=0}^{j-1} \left(1-b_i^k b_j^k\prod_{m=0}^{\log_2 i} (1-b_m^i+b_m^i b_m^j)\right)$

and so

$M(10)=\sum_{k=1}^{2^{1024}} \prod_{j=1}^{1023} \prod_{i=0}^{j-1} \left(1-b_i^k b_j^k\prod_{m=0}^{\log_2 i} (1-b_m^i+b_m^i b_m^j)\right)$

In principle, all you have to do to compute M(10) is evaluate the sum above. However, since this sum has more than 10³⁰⁸ terms, it would take a while.
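For small n the sum is easy to evaluate directly. The sketch below transcribes the formula term by term, under two assumptions about notation: the upper limit log₂ i is read as ⌊log₂ i⌋, and the product over m is taken to be empty (equal to 1) when i = 0.

```python
def bit(k, i):
    """i-th binary digit of k, computed with floors as in the definition above."""
    return k // 2**i - 2 * (k // 2**(i + 1))

def dedekind(n):
    """Evaluate the explicit sum for M(n); practical only for very small n."""
    N = 2**n
    total = 0
    for k in range(1, 2**N + 1):
        term = 1
        for j in range(1, N):
            for i in range(j):
                inner = 1
                # m runs from 0 to floor(log2 i); no factors when i == 0
                for m in range(i.bit_length()):
                    inner *= 1 - bit(i, m) + bit(i, m) * bit(j, m)
                term *= 1 - bit(k, i) * bit(k, j) * inner
        total += term
    return total

print([dedekind(n) for n in (1, 2, 3)])
```

This prints [3, 6, 20]. The outer loop has 2^(2^n) iterations, so the running time explodes well before n = 10.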

What can we say about M(10) without computing it? The number of monotone Boolean functions of n variables is less than the total number of Boolean functions of n variables, which equals

$2^{2^n}$

That tells us M(10) < 1.8 × 10³⁰⁸.

There are more useful bounds. It has been proven that

${n\choose \lfloor n/2\rfloor}\le \log_2 M(n)\le {n\choose \lfloor n/2\rfloor}\left(1+O\left(\frac{\log n}{n}\right)\right)$

This gives us a definite lower bound but not a definite upper bound. We know M(10) ≥ 2²⁵² which is approximately 7.237 × 10⁷⁵, but we don’t know what the big-O term is. All we know is that for sufficiently large n, this term is smaller than some multiple of log(n)/n. How large does n need to be and what is this constant? I don’t know. Maybe researchers in this area have some partial results.
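The two definite bounds can be reproduced with integer arithmetic. This sketch works with logarithms because 2¹⁰²⁴ is too large to convert to a floating point number; the helper name `sci` is illustrative.

```python
from math import comb, floor, log10

def sci(x: int) -> tuple[float, int]:
    """Return (mantissa, exponent) such that x is about mantissa * 10**exponent."""
    e = floor(log10(x))
    return 10 ** (log10(x) - e), e

lower = 2 ** comb(10, 5)   # 2^252, from the binomial coefficient bound
upper = 2 ** (2 ** 10)     # 2^1024, the number of all Boolean functions

for name, value in (("lower", lower), ("upper", upper)):
    m, e = sci(value)
    print(f"{name}: {m:.3f} x 10^{e}")
```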

Let’s take a guess at the upper bound by seeing what the big-O term was for M(9). Find k such that

$\log_2 M(9) = \binom{9}{4}\left(1 + k \frac{\log 9}{9}\right)$

We get

$k = \left(\frac{\log_2M(9)}{126} - 1 \right)\frac{9}{\log 9} \approx 0.3809$

and we can use this to guess that

$\log_2 M(10) \stackrel{?}{=} \binom{10}{5}\left(1 + 0.3809 \frac{\log 10}{10}\right) \approx 274.1$

which would imply M(10) = 3.253 × 10⁸².
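The heuristic can be sketched in a few lines. One assumption to flag: log is taken to be the natural logarithm, which is the reading that reproduces k ≈ 0.3809.

```python
from math import comb, log, log2

M9 = 286386577668298411128469151667598498812366

# Solve log2 M(9) = C(9, 4) * (1 + k * ln(9) / 9) for k
k = (log2(M9) / comb(9, 4) - 1) * 9 / log(9)

# Reuse the same k as a guess for n = 10
log2_M10 = comb(10, 5) * (1 + k * log(10) / 10)
M10_guess = 2.0 ** log2_M10

print(f"k = {k:.4f}")
print(f"log2 M(10) guess = {log2_M10:.1f}")
print(f"M(10) guess = {M10_guess:.3e}")
```

The last digits of the final guess are sensitive to rounding in k, since a small change in the exponent is magnified when raising 2 to that power.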

So to recap, we know for certain that M(10) is between 7.237 × 10⁷⁵ and 1.8 × 10³⁰⁸, and our guess based on the heuristic above is that M(10) = 3.253 × 10⁸².

Enumerating monotone Boolean functions

Posted on 27 June 2023 by John

The 9th Dedekind number was recently computed. What is a Dedekind number and why was it a big deal to compute just the 9th one of them?

We need to define a couple terms before we can define Dedekind numbers.

A Boolean function is a function whose inputs are 0’s and 1’s and whose output is 0 or 1.

A function f is monotone if increasing the input cannot decrease the output:

x ≤ y ⇒ f(x) ≤ f(y).

Obviously a monotone Boolean function is a Boolean function that is monotone, but monotone with respect to what order? How are we to define when x ≤ y when x and y are sequences of bits?

There are numerous ways one might order the inputs, but the conventional order [1] in this context is to say x ≤ y if every bit in x is less than or equal to the corresponding bit in y. So if the ith bit of x is a 1, then the ith bit of y must be a 1.
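A small sketch of this ordering, with inputs represented as tuples of bits. The function name `leq` is illustrative.

```python
def leq(x, y):
    """Bitwise order: x <= y iff every bit of x is <= the corresponding bit of y."""
    return all(a <= b for a, b in zip(x, y))

print(leq((0, 1, 0), (1, 1, 0)))                   # comparable
print(leq((0, 1), (1, 0)), leq((1, 0), (0, 1)))    # neither direction holds
```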

A Boolean function is monotone if and only if flipping an input bit from 0 to 1 cannot change the output from 1 to 0.

Enumerating monotone Boolean functions

The nth Dedekind number M(n) is the number of monotone Boolean functions of n variables. We’ll enumerate a few of these. Let a, b, c and d be Boolean variables and denote AND by ∧ and OR by ∨. As usual, we assume ∧ is higher precedence than ∨ so that, for example,

x ∨ y ∧ z

means

x ∨ (y ∧ z).

One variable

There are three monotone functions of one variable a: always return 0, always return a, and always return 1.

The only Boolean function of one variable that isn’t monotone is the function that flips a, i.e. f(a) = ¬a.

Two variables

There are six monotone Boolean functions with two variables:

0
a
b
a ∧ b
a ∨ b
1

and so M(2) = 6.

We can verify that the six functions above are monotone with the following Python code.

    from itertools import product
    
    f = [None]*6
    f[0] = lambda a, b: 0
    f[1] = lambda a, b: a
    f[2] = lambda a, b: b
    f[3] = lambda a, b: a & b
    f[4] = lambda a, b: a | b
    f[5] = lambda a, b: 1
    
    for i in range(6):
        for (a, b) in product((0,1), repeat=2):
            for (x, y) in product((0,1), repeat=2):
                if a <= x and b <= y:
                    assert(f[i](a, b) <= f[i](x, y))

Three variables

There are 20 monotone Boolean functions of three variables:

0
a
b
c
a ∧ b
b ∧ c
a ∧ c
a ∨ b
b ∨ c
a ∨ c
a ∧ b ∨ c
b ∧ c ∨ a
a ∧ c ∨ b
a ∧ b ∨ b ∧ c
a ∧ c ∨ b ∧ c
a ∧ b ∨ a ∧ c
a ∧ b ∨ b ∧ c ∨ a ∧ c
a ∧ b ∧ c
a ∨ b ∨ c
1

and so M(3) = 20.

As before, we can verify that the functions above are monotone with a script.

    from itertools import product
    
    g = [None]*20
    g[ 0] = lambda a, b, c: 0
    g[ 1] = lambda a, b, c: a 
    g[ 2] = lambda a, b, c: b
    g[ 3] = lambda a, b, c: c
    g[ 4] = lambda a, b, c: a & b
    g[ 5] = lambda a, b, c: b & c
    g[ 6] = lambda a, b, c: a & c
    g[ 7] = lambda a, b, c: a | b
    g[ 8] = lambda a, b, c: b | c
    g[ 9] = lambda a, b, c: a | c
    g[10] = lambda a, b, c: a & b | c
    g[11] = lambda a, b, c: b & c | a
    g[12] = lambda a, b, c: a & c | b
    g[13] = lambda a, b, c: a & b | b & c
    g[14] = lambda a, b, c: a & c | b & c
    g[15] = lambda a, b, c: a & b | a & c
    g[16] = lambda a, b, c: a & b | b & c | a & c
    g[17] = lambda a, b, c: a & b & c
    g[18] = lambda a, b, c: a | b | c 
    g[19] = lambda a, b, c: 1
    
    for i in range(20):
        for (a, b, c) in product((0,1), repeat=3):
            for (x, y, z) in product((0,1), repeat=3):
                if a <= x and b <= y and c <= z:
                    assert(g[i](a, b, c) <= g[i](x, y, z))

More variables

The concrete approach to enumerating monotone Boolean functions does not scale. There are 168 monotone functions of four variables, 7581 of five variables, and 7,828,354 functions of six variables. The Dedekind numbers M(n) grow very quickly. The next post will quantify just how quickly.
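The first few counts can be confirmed by brute force: enumerate every truth table and keep those that pass the bit-flip test for monotonicity. This is a sketch of that idea, not an efficient algorithm; the number of truth tables is 2^(2^n), so it stalls after n = 4.

```python
from itertools import product

def count_monotone(n):
    """Count monotone Boolean functions of n variables by checking every truth table."""
    inputs = range(2 ** n)   # each input is encoded as an n-bit integer
    count = 0
    # A function is a truth table with one output bit per input
    for table in product((0, 1), repeat=2 ** n):
        # Flipping any single input bit from 0 to 1 must not lower the output
        if all(
            table[x] <= table[x | (1 << i)]
            for x in inputs
            for i in range(n)
            if not x & (1 << i)
        ):
            count += 1
    return count

print([count_monotone(n) for n in range(1, 5)])
```

This prints [3, 6, 20, 168], in agreement with the values above.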

[1] This “order” is technically a partial order. If x = (0, 1) and y = (1, 0) then x and y are not comparable; neither is less than or equal to the other.

Ligatures for Logic

Posted on 3 October 2022 by John

A ligature in typesetting is a way of presenting two (or more) consecutive characters differently than the individual characters would be displayed. For example, “fi” is often rendered with the top of the ‘f’ dotting the ‘i’. Here’s an example from Computer Modern, the default font in LaTeX.

Usually the difference is subtle—ordinarily readers are not consciously aware of them—but a ligature could look entirely different from its components. The previous post is an example of the latter: the two-letter abbreviation for a country is rendered as the flag of that country.

I’ve been playing around with Fira Code, a font with ligatures for programming. Fonts like this aim to do for programming what ordinary ligatures do for prose. For example, a programming font might include a ligature to render >= as ≥.

Programming fonts are obviously intended for use in programming, but I personally don’t like the idea of using ligatures in programming. They compromise the simplicity of plain text [1]. They’re supported in some environments but not in others, or they require some fiddly configuration before they’ll work, etc.

Still, I like the aesthetics of Fira Code, particularly in the way it handles logic symbols. Here are some examples comparing a common monospace font and Fira Code.

$(a => b) <=> (¬a \/ b), {a} |= a \/ b, |= p → |- p$

The image above is a screen shot of a document created in LibreOffice Writer. The ligatures didn’t work when I tried using them in Microsoft Word.

Fira Code was designed as a monospace font, but has been extended to include proportional fonts. Fira Code with a proportional font might be useful in prose documents. You could insert a few symbols with a couple key strokes rather than searching for the symbol or entering Unicode.

However, it seems most of Fira Code’s ligatures are only available in monospaced versions of the font. If you use Fira Code in a prose document, you could switch from proportional font to monospace font just for an occasional symbol. It’s unclear whether that would be more or less work than other alternatives.

There’s one place where I believe Fira Code would be ideal: code examples inside a prose document. In that context you care about aesthetics and you want a monospaced font. Here again are some examples comparing Inconsolata and Fira Code.

$if (a >= b /\ c != d) {…}$

Modal logic posts

Posted on 30 August 2022 by John

Selected articles about modal logic:

Temporal and polymodal logics

Posted on 25 January 2022 by John

My posts on modal logic have mostly been about monomodal logic, logic with one modal operator. This may not seem accurate because I’ve talked about □ (“box”) and ◇ (“diamond”). But these are really just one mode: you can define either in terms of the other.

◇p = ¬ □ ¬p
□p = ¬ ◇ ¬p
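The two identities can be checked on a small model. The sketch below assumes the usual Kripke semantics, which this post does not spell out: □p holds at a world when p holds at every accessible world, and ◇p holds when p holds at some accessible world. The three-world frame is arbitrary and hypothetical.

```python
from itertools import product

# Hypothetical frame: worlds 0, 1, 2 with an accessibility relation
access = {0: [1, 2], 1: [1], 2: []}
worlds = list(access)

def box(p):
    return {w: all(p[v] for v in access[w]) for w in worlds}

def diamond(p):
    return {w: any(p[v] for v in access[w]) for w in worlds}

def neg(p):
    return {w: not p[w] for w in worlds}

# Check both identities for every truth assignment to p on this frame
dual_ok = all(
    diamond(p) == neg(box(neg(p))) and box(p) == neg(diamond(neg(p)))
    for values in product((False, True), repeat=len(worlds))
    for p in [dict(zip(worlds, values))]
)
print(dual_ok)
```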

Temporal logic

Temporal logic is an example of a polymodal logic. There are many varieties of temporal logic, but one version defined by Arthur Prior in the 1950s has two basic modes: P (past) and F (future). Prior called his system tense logic but the term temporal logic is more common now.

We can interpret Fp to mean that at some point in the future, p will be the case.

We can interpret Pp to mean at some point in the past p was the case.

In temporal logic we have two analogs of the box operator, and each has its dual counterpart analogous to diamond.

Gp = ¬ F ¬p
Hp = ¬ P ¬p

So Gp can be read as “from now on p will be the case.”

And Hp can be read as “always up until now p has been the case.”

The combination GFp can be interpreted as “always eventually p”: after any future moment, p will hold at some later moment.

The combination PHp can be interpreted as “up until some point in the past, p.”

I’ve mentioned before how Kripke semantics—models of “accessible worlds”—sounds like science fiction. Temporal logic can have that flavor too when you get into whether you view time as one linear stream or a branching tree, whether time is discrete or continuous, etc.

I mentioned briefly at the bottom of my post on duality that the algorithm given there for monomodal logics extends to polymodal logics.

In that post I said that the shell command

    tr '□◇→' '◇□←'

can turn a modal proposition into its dual. We could find the dual of a temporal proposition with the shell command

    tr 'FGPH→' 'GFHP←'

In other words, we replace Fs with Gs, Gs with Fs, Ps with Hs, Hs with Ps, and reverse the direction of implication.
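The same substitution can be sketched in Python with `str.translate`, which, like tr, replaces all characters in a single pass. As with the shell command, this assumes the letters F, G, P, and H appear only as operators, never as names of propositions. The function name is illustrative.

```python
DUAL = str.maketrans("FGPH→", "GFHP←")

def temporal_dual(prop: str) -> str:
    """Swap F with G and P with H, and reverse the arrow, all in one pass."""
    return prop.translate(DUAL)

print(temporal_dual("Fp → GFp"))
```

This prints Gp ← FGp.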

More modes

Polymodal logics can have a lot more than two modes. For example, in security applications of modal logic, you might have a modal operator for every user of a system. Here we might model what a user knows rather than what is true: something might be true without a particular agent knowing that it’s true.

Simulation

Obviously monomodal logics are contained inside polymodal logics; if you have several modalities, box could be one of them. But here’s the amazing converse: polymodal logics can be simulated by monomodal logics.

In the abstract to their paper “Normal monomodal logics can simulate all others,” Marcus Kracht and Frank Wolter announce

This paper shows … that polymodal normal logics can be simulated by monomodal (normal) logics. Many properties of logics are shown to be reflected and preserved by such simulations … shedding new light on the power of normal monomodal logic.

Dual axioms in modal logic

Posted on 24 January 2022 by John

Axioms in modal logic often say that one sequence of boxes and diamonds in front of a proposition p implies another sequence of boxes and diamonds in front of p. For example, Axiom 4 says

□ p → □□ p

and Axiom 5 says

◇p → □◇p.

Every axiom has a dual form. The dual form of Axiom 4 is

◇◇p → ◇p

and the dual of Axiom 5 is

◇□ p → □ p.

Computing duals

There’s a simple way to compute the dual of such axioms:

Rotate all the squares 45° and rotate the arrow 180°.

This turns boxes into diamonds, diamonds into boxes, and flips the direction of implication.

Shell and Perl

We could do this using the tr utility at the command line

    $ echo '□□◇□◇p → □◇p' | tr '□◇→' '◇□←'
    ◇◇□◇□p ← ◇□p

We could also do the same thing in Perl, using its tr operator

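    use utf8;                             # the source contains non-ASCII characters
    binmode(STDOUT, ":encoding(UTF-8)");  # print them as UTF-8
    $prop = "□□◇□◇p → □◇p";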
    ($dual = $prop) =~ tr/□◇→/◇□←/;
    print "$prop\n$dual\n";

This prints

    □□◇□◇p → □◇p
    ◇◇□◇□p ← ◇□p

It’s important to note that tr in both its incarnations does simultaneous replacement. It did what we expected, so it might be hard to notice.

tr takes two strings of the same length as arguments. Call the first one from and the second to. The easiest way to implement tr would have been to replace the first character of from with the first character of to, then replace the second character of from with the second character of to, etc.

This would have turned all our boxes into diamonds, then turned all diamonds into boxes, and so we’d be left with nothing but boxes! Our sequence □□◇□◇ would have turned into □□□□□.
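The difference is easy to see in a short Python sketch, where chained `str.replace` calls play the role of the naive sequential implementation and `str.translate` plays the role of tr.

```python
s = "□□◇□◇"

# Naive sequential replacement: boxes become diamonds, then every diamond becomes a box
sequential = s.replace("□", "◇").replace("◇", "□")

# Simultaneous replacement, the behavior tr provides
simultaneous = s.translate(str.maketrans("□◇", "◇□"))

print(sequential)
print(simultaneous)
```

The first line printed is □□□□□ and the second is ◇◇□◇□.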

Proof

Why is the rule above valid?

Let ○ stand for either a box or a diamond and suppose we start with

○₁ ○₂ … ○_m p → ○_m+1 ○_m+2 … ○_n p

where p is an arbitrary proposition.

Now let ○’ stand for the dual of ○. So if ○ is a box, ○’ is a diamond, and vice versa. Then

○ p = ¬○’ ¬p

by definition. (If you take □ as primary, then the equation above is the definition of ◇. If you take ◇ as primary, it’s the definition of □.) Apply this rule everywhere.

¬○’₁ ¬¬○’₂ ¬… ¬○’_m ¬p → ¬○’_m+1 ¬¬○’_m+2 … ¬○’_n ¬p

Now cancel out all the pairs of consecutive negations.

¬○’₁ ○’₂ … ○’_m ¬p → ¬○’_m+1 ○’_m+2 … ○’_n ¬p

Now take the contrapositive: (¬P → ¬Q) → (Q → P).

○’_m+1 ○’_m+2 … ○’_n ¬p → ○’₁ ○’₂ … ○’_m ¬p

Since p was an arbitrary proposition, we can replace p with ¬p.

○’_m+1 ○’_m+2 … ○’_n p → ○’₁ ○’₂ … ○’_m p

What we have above is the proposition we started with, with all the boxes replaced with diamonds, all the diamonds replaced with boxes, and the direction of the implication reversed.

More modalities

Notice that the theorem and proof still hold if there are multiple modalities. Suppose we have a set of modalities K_i. You could interpret

K_ip

as saying the ith agent knows proposition p is true. Then the dual is defined by

K’_ip = ¬ K_i¬ p,

which could be interpreted as saying the ith agent does not know p to be false.

You could form the dual of a proposition involving K and K’ expressions by adding primes to terms that don’t have them, and removing primes from terms that do, and turning the implication around. The proof would be the same as above, only we don’t restrict ○ to being □ or ◇.

Word problems, logic, and regular expressions

Posted on 23 January 2022 by John

Word problems

Suppose you have a sequence of symbols and a set of rewriting rules for replacing some patterns of symbols with others. Now you’re given two such sequences. Can you tell whether there’s a way to turn one of them into the other?

This is known as the word problem, and in general it’s undecidable: no program can solve every instance, though some instances can be solved. We’ll look at a word problem that can be solved with a few regular expressions.

Modal logic

Basic modal logic has two symbols, □ (“box”) and ◇ (“diamond”), and concatenations of these symbols. In general, there are infinitely many non-equivalent sequences of boxes and diamonds, depending on the axioms of your modal logic.

In the axiom system S4, every non-empty sequence of boxes and diamonds is equivalent to one of six possibilities:

□
◇
□◇
◇□
□◇□
◇□◇

An arbitrary sequence of boxes and diamonds can be reduced to one of the forms above by applying the following rules:

□ □ → □
◇ ◇ → ◇
□◇□◇ → □◇
◇□◇□ → ◇□

Regular expressions

We can apply the reduction rules above using regular expressions with the following Perl code.

    use utf8;
    binmode(STDOUT, ":encoding(UTF-8)");  # encode output as UTF-8

    $_ = "□□◇□◇◇◇◇□□";

    s/□+/□/g;
    s/◇+/◇/g; 
    s/(□◇)+/□◇/g; 
    s/(◇□)+/◇□/g;

    print;

The directive use utf8; tells Perl to be prepared for non-ASCII characters, namely boxes and diamonds. In Perl, $_ is the implicit variable; all the following substitution commands will modify this variable, and the print statement will output the final value of this variable.

The first substitution replaces one or more consecutive boxes with one box and the second does the analogous substitution for consecutive diamonds. The third and fourth substitution commands replace repetitions of □◇ or ◇□ with a single instance.

The script above outputs

□◇□

meaning that

□□◇□◇◇◇◇□□p ⟷ □◇□p

is a theorem in S4.

Word problems can’t always be solved using regular expressions, or any other programming technique, but this one could.
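For comparison, here is a sketch of the same four substitutions in Python using `re.sub`, applied in the same order as the Perl script. It also checks by brute force that every sequence of up to ten boxes and diamonds lands on one of the six forms listed above. The names `RULES`, `NORMAL_FORMS`, and `reduce_s4` are illustrative.

```python
import re
from itertools import product

RULES = [(r"□+", "□"), (r"◇+", "◇"), (r"(□◇)+", "□◇"), (r"(◇□)+", "◇□")]
NORMAL_FORMS = {"□", "◇", "□◇", "◇□", "□◇□", "◇□◇"}

def reduce_s4(seq: str) -> str:
    """Apply the four substitutions in the same order as the Perl script."""
    for pattern, replacement in RULES:
        seq = re.sub(pattern, replacement, seq)
    return seq

print(reduce_s4("□□◇□◇◇◇◇□□"))

# Every sequence of length 1 through 10 reduces to one of the six forms
all_reduced = all(
    reduce_s4("".join(chars)) in NORMAL_FORMS
    for n in range(1, 11)
    for chars in product("□◇", repeat=n)
)
print(all_reduced)
```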
