I was curious how well LLMs would do at minimizing a Boolean expression, that is, taking a Boolean expression and producing a smaller equivalent expression.

I didn’t expect good performance because this problem is more about logic than recall, but sometimes LLMs surprise you, so I wanted to give it a chance. I thought it might accomplish some reduction in size though not as much as possible. Let’s see what it did.

## Prompt

My prompt for the three platforms I tried—GPT4All, ChatGPT, and Google Bard—was

Minimize the boolean expression ¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC¬D + ABCD

I knew the expression could be minimized to

B¬C¬D + A¬D + AC.

## GPT4All

I first tried GPT4All using the Groovy data set. It crashed immediately. I tried a few variations on the same prompt and it crashed right away every time.

This was on a MacBook Pro. I didn’t try it on Windows or Linux.

## ChatGPT

I’ll go through ChatGPT’s response a little at a time.

To minimize the given boolean expression ¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC¬D + ABCD, we can use Karnaugh maps or Boolean algebra simplification techniques. I’ll demonstrate both methods.

This is a sensible start. Karnaugh maps are a common classroom technique for solving small Boolean minimization problems, so there were lots of examples of Karnaugh maps in the data scraped to train ChatGPT.

Next ChatGPT draws a Karnaugh map. The screenshot at the top of the post was from a different experiment, a version of the experiment described here but containing an error on my part. This problem produced a black image simply containing “CD” but also contained a Karnaugh map in text. Cleaning up the formatting we get the following.

| 00 | 01 | 11 | 10 | 00 | 1 | 0 | 1 | 0 | 01 | 1 | 1 | 1 | 1 | 11 | 1 | 1 | 1 | 1 | 10 | 0 | 1 | 0 | 0 |

This is nonsense. Nine out of sixteen entries are wrong.

ChatGPT concludes

Grouping the adjacent 1’s in the Karnaugh map, we can simplify the expression to: ¬A¬C + ¬A¬D + A¬B + ACD

This is also nonsense. It does not correctly solve the original problem, nor does it solve the problem corresponding to the incorrect Karnaugh map above.

Next let’s see how it does with Boolean algebra. It starts out OK.

Boolean Algebra Method: We can simplify the expression using Boolean algebra identities and rules.

¬AB¬C¬D + A¬B¬C¬D + A¬B¬CD + A¬BC¬D + A¬BCD + AB¬C¬D + ABC¬D + ABCD

Rearranging the terms: A¬B¬C¬D + A¬B¬CD + AB¬C¬D + A¬BC¬D + ABC¬D + A¬BCD + ABCD + A¬B¬C¬D

Grouping the terms: A¬B¬C¬D + A¬B¬CD + AB¬C¬D + A¬BC¬D + ABC¬D + A¬BCD + ABCD

Combining like terms: (A + A)¬B¬C¬D + (A + A)¬B¬CD + AB¬C¬D + (A + A)¬BC¬D + ABC¬D + A¬BCD + ABCD

These steps haven’t accomplished anything, but they’re not wrong. Then it goes off the rails.

Using the complement law A + ¬A = 1: ¬B¬C¬D + ¬B¬CD + AB¬C¬D + ¬BC¬D + ABC¬D + A¬BCD + ABCD

The explanation is wrong and the solution is wrong.

The solution is completely wrong, but it looks plausible. At first glance I thought it might be right, or more right than it was. If you were making a movie and you needed to generate electrical engineering homework, this would be adequate for a realistic-looking homework solution.

## Google Bard

Google Bard gives a more succinct solution with five steps:

- Identify the common terms in the expression. The common terms are ¬AB¬C¬D, A¬B¬CD, and ABC¬D.
- Combine the common terms. This gives us the expression (¬AB¬C¬D + A¬B¬CD + ABC¬D) + (A¬B¬C¬D + AB¬C¬D).
- Apply the distribution property. This gives us the expression ¬AB¬C¬D + A¬B¬CD + ABC¬D + A¬B¬C¬D + AB¬C¬D.
- Identify the redundant terms. The redundant terms are ¬AB¬C¬D and A¬B¬C¬D.
- Remove the redundant terms. This gives us the expression A¬B¬CD + ABC¬D.

This too is nonsense.

This is a highly succinct demonstration of the core inadequacy of generative AI. It needs some kind of confidence level qualifier for answers. That would probably be adding a different kind of AI and with that in a loop it might actually be able to converge on 99% confidence answers a lot more of the time. It may also be that such a confidence level indicator is not computable. Like you feed it itself and it freaks out. Above my pay grade, obviously.

You might be interested in how ChatGPT with code interpreter handles this https://chat.openai.com/share/af745d0e-ec05-4038-84eb-56f879f95ea8. It generates SymPy code to solve the problem, which it then executes to get the right answer.

Sorry, here is a corrected link https://chat.openai.com/share/5e48f2eb-bf24-4cea-870f-69bf1286fdb5