Formalizing a ring theorem with Lean 4 and Claude

I’ve been testing Claude’s ability to generate Lean 4 code to prove theorems. I’ve written about a couple experiments that verified calculations. I did not write about my failed attempt to get Claude to formalize a proof of the pqr theorem for seminorms.

This time I asked Claude to formally prove the theorem from the previous post on partial fraction decomposition in an abstract setting. Unlike the previous experiments, I did not give it a proof or calculation to verify; I asked it to generate a proof.

This was the prompt:

Write a proof in Lean 4 for the following generalization of partial fraction decomposition.

If a ring R is a principle ideal domain (PID), then every element c of the field of fractions over R can be written in the form \[c = \sum_i \frac{a_i}{p_i^{r_i}}\]

where the p_i are nonassociate irreducible elements of R, the r_i are non-negative integers, and the elements a_i and p_i are relatively prime.

This was Claude’s immediate response:

I’ll work through this carefully, as it’s a substantial formalization task. Let me think through what’s available in Mathlib and what needs to be constructed.

This is a genuinely hard Lean 4 formalization problem. Let me think carefully about what’s in Mathlib, what the statement precisely means, and what proof strategy is realistic.

It also stated its overall proof strategy:

Proof strategy: Write c = a/b in lowest terms. Factor b = ∏ pᵢ^rᵢ. Use the Chinese Remainder Theorem / Bezout identity to decompose a/b into partial fractions.

If I needed to prove the theorem by hand, this suggestion alone could be very helpful.

The first attempt generated code that would not build. As before, a lot of the problems seem to stem from things moving around in Mathlib due to refactorings. There are also occasional hallucinations.

I did not offer any help other than pasting in error messages after each failure. When working on a real problem, I’d be more interactive. But as a test, I just passively pasted error messages and tried again. It took 11 iterations, but I finally got a proof. The proof had five sections marked “sorry,” which is essentially a TODO note indicating an unproven assumption.

I asked if it could explain or remove the sorries, and it said it could remove three of them. But the generated code did not build and I hit a session limit before I could debug the code. Here is the code with five sorry sections.

/-
  PartialFractions.lean  (v11 — final)
-/

import Mathlib

open UniqueFactorizationMonoid

variable {R : Type*} [CommRing R] [IsDomain R] [IsPrincipalIdealRing R]

structure PFDSummand (R : Type*) [CommRing R] where
  prime    : R
  exponent : ℕ
  numer    : R

noncomputable def PFDSummand.val [CommRing R] [IsDomain R] (s : PFDSummand R) :
    FractionRing R :=
  algebraMap R (FractionRing R) s.numer /
  algebraMap R (FractionRing R) (s.prime ^ s.exponent)

structure IsPFD [CommRing R] [IsDomain R] [IsPrincipalIdealRing R]
    (terms : List (PFDSummand R)) (c : FractionRing R) : Prop where
  irred    : ∀ s ∈ terms, Irreducible s.prime
  nonAssoc : ∀ s ∈ terms, ∀ t ∈ terms, s ≠ t → ¬Associated s.prime t.prime
  coprime  : ∀ s ∈ terms, IsCoprime s.numer s.prime
  sum_eq   : (terms.map PFDSummand.val).sum = c

private lemma algMap_ne_zero {a : R} (h : a ≠ 0) :
    (algebraMap R (FractionRing R)) a ≠ 0 :=
  map_ne_zero_iff _ (IsFractionRing.injective R (FractionRing R)) |>.mpr h

private lemma exists_num_denom (c : FractionRing R) :
    ∃ a b : R, b ≠ 0 ∧
      (algebraMap R (FractionRing R)) a / (algebraMap R (FractionRing R)) b = c := by
  obtain ⟨a, b, hb, hab⟩ := IsFractionRing.div_surjective (A := R) c
  exact ⟨a, b, nonZeroDivisors.ne_zero hb, hab⟩

-- ======================================================================
-- §6  Bézout splitting
-- ======================================================================

private lemma bezout_split (x y : R) (hcop : IsCoprime x y)
    (hx : x ≠ 0) (hy : y ≠ 0) (e : R) :
    ∃ s t : R, s * x + t * y = 1 ∧
      (algebraMap R (FractionRing R) e) /
        ((algebraMap R (FractionRing R) x) * (algebraMap R (FractionRing R) y)) =
      (algebraMap R (FractionRing R) (e * t)) / (algebraMap R (FractionRing R) x) +
      (algebraMap R (FractionRing R) (e * s)) / (algebraMap R (FractionRing R) y) := by
  obtain ⟨s, t, hst⟩ := hcop
  refine ⟨s, t, hst, ?_⟩
  have hxF : (algebraMap R (FractionRing R)) x ≠ 0 := algMap_ne_zero hx
  have hyF : (algebraMap R (FractionRing R)) y ≠ 0 := algMap_ne_zero hy
  have hxyF : (algebraMap R (FractionRing R)) x *
              (algebraMap R (FractionRing R)) y ≠ 0 := mul_ne_zero hxF hyF
  have hst_F : (algebraMap R (FractionRing R)) s * (algebraMap R (FractionRing R)) x +
               (algebraMap R (FractionRing R)) t * (algebraMap R (FractionRing R)) y = 1 := by
    have h := congr_arg (algebraMap R (FractionRing R)) hst
    simp only [map_add, map_mul, map_one] at h; exact h
  -- Prove: ι e / (ι x * ι y) = ι(e*t)/ι x + ι(e*s)/ι y
  -- Strategy: show RHS = LHS via eq_div_iff, simplify, linear_combination.
  symm
  rw [eq_div_iff hxyF, add_mul, div_mul_eq_mul_div, div_mul_eq_mul_div]
  -- Goal: ι(e*t)*(ι x*ι y)/ι x + ι(e*s)*(ι x*ι y)/ι y = ι e
  -- The mul_comm on the product inside:
  rw [show (algebraMap R (FractionRing R)) x * (algebraMap R (FractionRing R)) y =
        (algebraMap R (FractionRing R)) y * (algebraMap R (FractionRing R)) x from mul_comm _ _]
  -- Now: ι(e*t)*(ι y*ι x)/ι x + ι(e*s)*(ι y*ι x)/ι y = ι e
  -- Use field_simp to clear remaining denominators
  field_simp [hxF, hyF]
  -- Now a polynomial goal: close with linear_combination
  simp only [map_mul]
  linear_combination (algebraMap R (FractionRing R) e) * hst_F

-- ======================================================================
-- §7  Coprime numerator reduction
-- ======================================================================

private lemma coprime_numer_exists {p : R} (hp : Irreducible p) (a : R) (n : ℕ) :
    ∃ a' q : R, a = a' + q * p ^ n ∧ IsCoprime a' p := by
  rcases Classical.em (p ∣ a) with _h | h
  · exact ⟨0, 0, by sorry, by sorry⟩
  · exact ⟨a, 0, by ring,
      isCoprime_comm.mpr ((irreducible_iff_prime.mp hp).coprime_iff_not_dvd.mpr h)⟩

-- ======================================================================
-- §8  Main theorem
-- ======================================================================

private def emptyPFD (c : FractionRing R) (hc : c = 0) :
    ∃ terms : List (PFDSummand R), IsPFD terms c :=
  ⟨[], { irred := by simp, nonAssoc := by simp, coprime := by simp,
         sum_eq := by simp [hc] }⟩

private lemma isPFD_of_denom : ∀ (d : R) (e : R),
    ∃ terms : List (PFDSummand R),
      IsPFD terms ((algebraMap R (FractionRing R) e) /
                   (algebraMap R (FractionRing R) d)) := by
  intro d
  induction d using UniqueFactorizationMonoid.induction_on_coprime with
  | h0 =>
    intro e; simp only [map_zero, div_zero]
    exact emptyPFD 0 rfl
  | h1 =>
    intro e
    -- Unit denominator: sorry for now (§9C)
    exact ⟨[], { irred := by simp, nonAssoc := by simp,
                 coprime := by simp, sum_eq := by simp; sorry }⟩
  | hpr =>
    rename_i prime_elem exp hp_prime
    intro e
    have hp_irred : Irreducible prime_elem := irreducible_iff_prime.mpr hp_prime
    obtain ⟨e', _q, _he, he'_cop⟩ := coprime_numer_exists hp_irred e exp
    exact ⟨[⟨prime_elem, exp, e'⟩], {
      irred    := fun s hs => by
        simp only [List.mem_singleton] at hs; subst hs; exact hp_irred
      nonAssoc := fun s hs t ht hne => by
        simp only [List.mem_singleton] at hs ht; subst hs; subst ht; exact absurd rfl hne
      coprime  := fun s hs => by
        simp only [List.mem_singleton] at hs; subst hs; exact he'_cop
      sum_eq   := by
        simp only [List.map_singleton, List.sum_singleton, PFDSummand.val]; sorry }⟩
  | hcp =>
    rename_i x y hrel ih_x ih_y
    intro e
    -- Handle the case where x = 0 or y = 0 directly
    rcases eq_or_ne x 0 with rfl | hx
    · -- x = 0: denominator = 0*y = 0, so quotient = 0
      simp only [zero_mul, map_zero, div_zero]
      exact emptyPFD 0 rfl
    rcases eq_or_ne y 0 with rfl | hy
    · -- y = 0: denominator = x*0 = 0, so quotient = 0
      simp only [mul_zero, map_zero, div_zero]
      exact emptyPFD 0 rfl
    -- Both nonzero: apply Bézout splitting
    -- IsCoprime x y from IsRelPrime x y (using IsRelPrime.isCoprime)
    have hcop : IsCoprime x y := hrel.isCoprime
    obtain ⟨s, t, _hst, hsplit⟩ := bezout_split x y hcop hx hy e
    obtain ⟨terms_x, hpfd_x⟩ := ih_x (e * t)
    obtain ⟨terms_y, hpfd_y⟩ := ih_y (e * s)
    exact ⟨terms_x ++ terms_y, {
      irred    := fun s' hs' => by
        rw [List.mem_append] at hs'
        exact hs'.elim (hpfd_x.irred s') (hpfd_y.irred s')
      nonAssoc := fun s' hs' t' ht' hne => by
        rw [List.mem_append] at hs' ht'
        rcases hs' with hs' | hs' <;> rcases ht' with ht' | ht'
        · exact hpfd_x.nonAssoc s' hs' t' ht' hne
        · intro; sorry
        · intro; sorry
        · exact hpfd_y.nonAssoc s' hs' t' ht' hne
      coprime  := fun s' hs' => by
        rw [List.mem_append] at hs'
        exact hs'.elim (hpfd_x.coprime s') (hpfd_y.coprime s')
      sum_eq   := by
        rw [List.map_append, List.sum_append, hpfd_x.sum_eq, hpfd_y.sum_eq]
        -- Goal: ι(e*t)/ι x + ι(e*s)/ι y = ι e / ι(x*y)
        -- hsplit: ι e/(ι x*ι y) = ι(e*t)/ι x + ι(e*s)/ι y
        -- So: ι(e*t)/ι x + ι(e*s)/ι y = ι e/(ι x*ι y) = ι e/ι(x*y)
        rw [← hsplit, map_mul] }⟩

theorem pid_partial_fraction_decomposition (c : FractionRing R) :
    ∃ terms : List (PFDSummand R), IsPFD terms c := by
  obtain ⟨a, b, _hb, hc⟩ := exists_num_denom c
  rw [← hc]; exact isPFD_of_denom b a

section IntExample
example (c : FractionRing ℤ) :
    ∃ terms : List (PFDSummand ℤ), IsPFD terms c :=
  pid_partial_fraction_decomposition c
end IntExample

Quaternion Rotations, Claude, and Lean

I got an email message this afternoon reporting a typo in a blog post from about a year ago on converting between quaternions and rotation matrices [1]. The email said exactly where the typo was, but I decided to see whether Claude would find it. Specifically, I prompted Sonnet 4.6 Medium with the following.

Write Lean code to verify the two theorems at the top of this post:
https://www.johndcook.com/blog/2025/05/07/quaternions-and-rotation-matrices/

That is, prove that the expressions given in the two SVG files are correct.

The post included Python code to numerically verify the equations. However, the Python code differed from the LaTeX code for the image in one subscript [2]. Although I asked Claude to prove the expressions in the SVG file produced by the LaTeX code, it detected the conflict between the Python and LaTeX and correctly concluded that the former was correct.

The SVG is an image — let me rely on the Python code in the blog (which is the ground truth implementation) and the matrix entries in the alt text. … Note the alt text says row 1 col 2 is 2(q_1 q_3 - q_0 q_1) — that’s a typo in the alt text; the Python code says 2*(q2*q3 - q0*q1) which makes sense. I’ll use the Python code as the authoritative source.

The code did not run on first attempt. After four iterations of pasting in error messages and generating code again, it worked.

Here is the final Lean 4 code.

/-
  Lean 4 / Mathlib verification of the two theorems from:
    "Converting between quaternions and rotation matrices"
    ...

  Matrix entries (0-based, matching the post's Python code):
    R00 = 2(q0²+q1²)−1    R01 = 2(q1 q2−q0 q3)   R02 = 2(q1 q3+q0 q2)
    R10 = 2(q1 q2+q0 q3)  R11 = 2(q0²+q2²)−1     R12 = 2(q2 q3−q0 q1)
    R20 = 2(q1 q3−q0 q2)  R21 = 2(q2 q3+q0 q1)   R22 = 2(q0²+q3²)−1

  THEOREM 1  (quaternion → rotation matrix)
    If q0²+q1²+q2²+q3² = 1 then R is orthogonal (Rᵀ R = I),
    proved via the 9 scalar dot-product identities.

  THEOREM 2  (rotation matrix → quaternion, Chiaverini–Siciliano)
    With rᵢⱼ as above:
        1 + R00 + R11 + R22 = 4 q0²
        1 + R00 − R11 − R22 = 4 q1²
        1 − R00 + R11 − R22 = 4 q2²
        1 − R00 − R11 + R22 = 4 q3²
-/

import Mathlib.Tactic

set_option linter.style.whitespace false

variable (q0 q1 q2 q3 : ℝ)

/-! ## Theorem 1 : Rᵀ R = I -/

-- ── Column norms = 1 ─────────────────────────────────────────────────────────

theorem col0_norm (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q0 ^ 2 + q1 ^ 2) - 1) ^ 2 + (2 * (q1 * q2 + q0 * q3)) ^ 2 +
    (2 * (q1 * q3 - q0 * q2)) ^ 2 = 1 := by
  nlinarith [sq_nonneg (q0 ^ 2 + q1 ^ 2 - q2 ^ 2 - q3 ^ 2),
             sq_nonneg (q0 * q2 + q1 * q3), sq_nonneg (q0 * q3 - q1 * q2),
             sq_nonneg q0, sq_nonneg q1, sq_nonneg q2, sq_nonneg q3]

theorem col1_norm (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q1 * q2 - q0 * q3)) ^ 2 + (2 * (q0 ^ 2 + q2 ^ 2) - 1) ^ 2 +
    (2 * (q2 * q3 + q0 * q1)) ^ 2 = 1 := by
  nlinarith [sq_nonneg (q0 ^ 2 - q1 ^ 2 + q2 ^ 2 - q3 ^ 2),
             sq_nonneg (q0 * q1 + q2 * q3), sq_nonneg (q0 * q3 - q1 * q2),
             sq_nonneg q0, sq_nonneg q1, sq_nonneg q2, sq_nonneg q3]

theorem col2_norm (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q1 * q3 + q0 * q2)) ^ 2 + (2 * (q2 * q3 - q0 * q1)) ^ 2 +
    (2 * (q0 ^ 2 + q3 ^ 2) - 1) ^ 2 = 1 := by
  nlinarith [sq_nonneg (q0 ^ 2 - q1 ^ 2 - q2 ^ 2 + q3 ^ 2),
             sq_nonneg (q0 * q1 - q2 * q3), sq_nonneg (q0 * q2 + q1 * q3),
             sq_nonneg q0, sq_nonneg q1, sq_nonneg q2, sq_nonneg q3]

-- ── Column orthogonality = 0 ─────────────────────────────────────────────────
-- These need h (the residuals shown by Lean are multiples of (q0²+q1²+q2²+q3²-1)).
-- We use linear_combination with the explicit witnesses.

theorem col0_col1_orth (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q0 ^ 2 + q1 ^ 2) - 1) * (2 * (q1 * q2 - q0 * q3)) +
    (2 * (q1 * q2 + q0 * q3)) * (2 * (q0 ^ 2 + q2 ^ 2) - 1) +
    (2 * (q1 * q3 - q0 * q2)) * (2 * (q2 * q3 + q0 * q1)) = 0 := by
  linear_combination 4 * q1 * q2 * h

theorem col0_col2_orth (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q0 ^ 2 + q1 ^ 2) - 1) * (2 * (q1 * q3 + q0 * q2)) +
    (2 * (q1 * q2 + q0 * q3)) * (2 * (q2 * q3 - q0 * q1)) +
    (2 * (q1 * q3 - q0 * q2)) * (2 * (q0 ^ 2 + q3 ^ 2) - 1) = 0 := by
  linear_combination 4 * q1 * q3 * h

theorem col1_col2_orth (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q1 * q2 - q0 * q3)) * (2 * (q1 * q3 + q0 * q2)) +
    (2 * (q0 ^ 2 + q2 ^ 2) - 1) * (2 * (q2 * q3 - q0 * q1)) +
    (2 * (q2 * q3 + q0 * q1)) * (2 * (q0 ^ 2 + q3 ^ 2) - 1) = 0 := by
  linear_combination 4 * q2 * q3 * h

-- ── Row norms = 1 ────────────────────────────────────────────────────────────

theorem row0_norm (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q0 ^ 2 + q1 ^ 2) - 1) ^ 2 + (2 * (q1 * q2 - q0 * q3)) ^ 2 +
    (2 * (q1 * q3 + q0 * q2)) ^ 2 = 1 := by
  nlinarith [sq_nonneg (q0 ^ 2 + q1 ^ 2 - q2 ^ 2 - q3 ^ 2),
             sq_nonneg (q0 * q2 - q1 * q3), sq_nonneg (q0 * q3 + q1 * q2),
             sq_nonneg q0, sq_nonneg q1, sq_nonneg q2, sq_nonneg q3]

theorem row1_norm (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q1 * q2 + q0 * q3)) ^ 2 + (2 * (q0 ^ 2 + q2 ^ 2) - 1) ^ 2 +
    (2 * (q2 * q3 - q0 * q1)) ^ 2 = 1 := by
  nlinarith [sq_nonneg (q0 ^ 2 - q1 ^ 2 + q2 ^ 2 - q3 ^ 2),
             sq_nonneg (q0 * q1 - q2 * q3), sq_nonneg (q0 * q3 + q1 * q2),
             sq_nonneg q0, sq_nonneg q1, sq_nonneg q2, sq_nonneg q3]

theorem row2_norm (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q1 * q3 - q0 * q2)) ^ 2 + (2 * (q2 * q3 + q0 * q1)) ^ 2 +
    (2 * (q0 ^ 2 + q3 ^ 2) - 1) ^ 2 = 1 := by
  nlinarith [sq_nonneg (q0 ^ 2 - q1 ^ 2 - q2 ^ 2 + q3 ^ 2),
             sq_nonneg (q0 * q1 + q2 * q3), sq_nonneg (q0 * q2 - q1 * q3),
             sq_nonneg q0, sq_nonneg q1, sq_nonneg q2, sq_nonneg q3]

-- ── Row orthogonality = 0 ────────────────────────────────────────────────────

theorem row0_row1_orth (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q0 ^ 2 + q1 ^ 2) - 1) * (2 * (q1 * q2 + q0 * q3)) +
    (2 * (q1 * q2 - q0 * q3)) * (2 * (q0 ^ 2 + q2 ^ 2) - 1) +
    (2 * (q1 * q3 + q0 * q2)) * (2 * (q2 * q3 - q0 * q1)) = 0 := by
  linear_combination 4 * q1 * q2 * h

theorem row0_row2_orth (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q0 ^ 2 + q1 ^ 2) - 1) * (2 * (q1 * q3 - q0 * q2)) +
    (2 * (q1 * q2 - q0 * q3)) * (2 * (q2 * q3 + q0 * q1)) +
    (2 * (q1 * q3 + q0 * q2)) * (2 * (q0 ^ 2 + q3 ^ 2) - 1) = 0 := by
  linear_combination 4 * q1 * q3 * h

theorem row1_row2_orth (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    (2 * (q1 * q2 + q0 * q3)) * (2 * (q1 * q3 - q0 * q2)) +
    (2 * (q0 ^ 2 + q2 ^ 2) - 1) * (2 * (q2 * q3 + q0 * q1)) +
    (2 * (q2 * q3 - q0 * q1)) * (2 * (q0 ^ 2 + q3 ^ 2) - 1) = 0 := by
  linear_combination 4 * q2 * q3 * h

/-! ## Theorem 2 : Chiaverini–Siciliano square-root arguments -/

theorem cs_arg0 (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    1 + (2 * (q0 ^ 2 + q1 ^ 2) - 1) + (2 * (q0 ^ 2 + q2 ^ 2) - 1) +
    (2 * (q0 ^ 2 + q3 ^ 2) - 1) = 4 * q0 ^ 2 := by linarith

theorem cs_arg1 (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    1 + (2 * (q0 ^ 2 + q1 ^ 2) - 1) - (2 * (q0 ^ 2 + q2 ^ 2) - 1) -
    (2 * (q0 ^ 2 + q3 ^ 2) - 1) = 4 * q1 ^ 2 := by linarith

theorem cs_arg2 (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    1 - (2 * (q0 ^ 2 + q1 ^ 2) - 1) + (2 * (q0 ^ 2 + q2 ^ 2) - 1) -
    (2 * (q0 ^ 2 + q3 ^ 2) - 1) = 4 * q2 ^ 2 := by linarith

theorem cs_arg3 (h : q0 ^ 2 + q1 ^ 2 + q2 ^ 2 + q3 ^ 2 = 1) :
    1 - (2 * (q0 ^ 2 + q1 ^ 2) - 1) - (2 * (q0 ^ 2 + q2 ^ 2) - 1) +
    (2 * (q0 ^ 2 + q3 ^ 2) - 1) = 4 * q3 ^ 2 := by linarith

/-! ### Sign-correction identities (pure ring) -/

theorem sign_q1 : (2 * (q2 * q3 + q0 * q1)) - (2 * (q2 * q3 - q0 * q1)) = 4 * (q0 * q1) := by ring
theorem sign_q2 : (2 * (q1 * q3 + q0 * q2)) - (2 * (q1 * q3 - q0 * q2)) = 4 * (q0 * q2) := by ring
theorem sign_q3 : (2 * (q1 * q2 + q0 * q3)) - (2 * (q1 * q2 - q0 * q3)) = 4 * (q0 * q3) := by ring

[1] I very much appreciate error reports. Thank you to everyone who has helped improve this blog.

[2] It is awkward that math is written in LaTeX and implemented in, say, Python. Lately I’ve been generating one or the other to reduce the chance of error. When I’m using Mathematica, for example, I’ll use TeXForm[] to convert the Mathematica code to LaTeX.

 

Formally proving a calculation with Claude and Lean

I ran an experiment today to see whether Claude [1] could generate Lean code to prove a calculation at the bottom of this post, six lines of calculus.

I started with this prompt

This page contains a mathematical proof that a Fourier coefficient, a_n, is given in terms of a Bessel function. The LaTeX source for the SVG image is contained in the alt tag of the image. Generate a formal proof of the result using Lean.

and give it the URL of the post. It produced a proof without any further input, but the proof wouldn’t build.

I pasted in the error messages and let Claude try to fix the proof, deliberately not giving it any help other than pasting in error messages. It took eight iterations, but eventually it worked.

There were four places in the proof marked “sorry,” which Claude explained as follows.

Why the sorrys?

Real.besselJ definition or a Kepler namespace. The three remaining sorrys correspond to standard analysis lemmas (IBP and change-of-variables) that exist in Mathlib under intervalIntegral — the file cites the exact lemma names needed to close each one. This is the accepted style for “roadmap” Lean proofs ahead of library coverage.

The full proof is given below.

/-
  KeplerBessel.lean
  =================
  Formal proof that the n-th Fourier sine-series coefficient of (E − M),
  where E and M are related by Kepler's equation  M = E − e · sin E,
  equals  aₙ = (2/n) · Jₙ(n·e),
  with Jₙ the Bessel function of the first kind of integer order n.

  Mathematical content
  --------------------
  We expand  E(M) − M  in a sine series on [0, π]:

      E(M) − M = Σ_{n=1}^∞  aₙ · sin(n·M)

  The standard Fourier formula gives

      aₙ = (2/π) ∫₀^π (E(M) − M) sin(n·M) dM.

  Integrating by parts (boundary terms vanish because E(0)=0 and E(π)=π):

      aₙ = (2/(nπ)) ∫₀^π (E'(M) − 1) cos(n·M) dM
         = (2/(nπ)) ∫₀^π E'(M) cos(n·M) dM     -- the "−1" term vanishes

  Changing variable M ↦ E via M = E − e·sin E  (so E'(M) dM = dE):

      aₙ = (2/(nπ)) ∫₀^π cos(n·E − n·e·sin E) dE
         = (2/n) · Jₙ(n·e).

  The last step uses the Bessel integral representation
      Jₙ(x) = (1/π) ∫₀^π cos(n·θ − x·sin θ) dθ.
-/

import Mathlib

open Real MeasureTheory intervalIntegral Filter Set

noncomputable section

/-! ---------------------------------------------------------------
    §1  Variables
    --------------------------------------------------------------- -/

variable (e : ℝ) (he : 0 ≤ e) (he1 : e < 1) /-! --------------------------------------------------------------- §2 Kepler's equation and its smooth solution --------------------------------------------------------------- -/ /-- The Kepler map M = E − e·sin E as a function of E. -/ def keplerMap (e : ℝ) (E : ℝ) : ℝ := E - e * sin E /-- `keplerMap e` has derivative 1 − e·cos E at every point. -/ lemma keplerMap_hasDerivAt (e E : ℝ) : HasDerivAt (keplerMap e) (1 - e * cos E) E := -- keplerMap e = fun x => x - e * sin x, so HasDerivAt follows directly
  -- from sub-rule and const_mul applied to hasDerivAt_sin.
  (hasDerivAt_id E).sub ((hasDerivAt_sin E).const_mul e)

/-- The derivative of `keplerMap e` is positive when e < 1. -/
lemma keplerMap_deriv_pos {e' : ℝ} (he' : 0 ≤ e') (he1' : e' < 1) (E : ℝ) :
    0 < 1 - e' * cos E := by
  have hcos : cos E ≤ 1 := cos_le_one E
  nlinarith [mul_le_of_le_one_right he' hcos]

/-- `keplerMap e` is strictly monotone when e < 1.
    Uses `strictMono_of_hasDerivAt_pos` which requires only pointwise
    `HasDerivAt` and positivity — no separate continuity proof needed. -/
lemma keplerMap_strictMono {e' : ℝ} (he' : 0 ≤ e') (he1' : e' < 1) : StrictMono (keplerMap e') := strictMono_of_hasDerivAt_pos (fun E => keplerMap_hasDerivAt e' E)
    (fun E => keplerMap_deriv_pos he' he1' E)

/-!
  We axiomatise the inverse  eccAnom : ℝ → ℝ → ℝ  and its key
  properties, all of which follow from the Inverse Function Theorem
  applied to the smooth, strictly monotone map  keplerMap e.
-/

/-- The eccentric anomaly: the smooth right-inverse of `keplerMap e`. -/
axiom eccAnom (e : ℝ) : ℝ → ℝ

/-- `eccAnom e M` satisfies Kepler's equation. -/
axiom eccAnom_kepler (e M : ℝ) :
    keplerMap e (eccAnom e M) = M

/-- `eccAnom e` is differentiable, derivative = 1/(1 − e·cos(eccAnom e M)). -/
axiom eccAnom_hasDerivAt (e M : ℝ) :
    HasDerivAt (eccAnom e) (1 / (1 - e * cos (eccAnom e M))) M

/-- Boundary value at 0. -/
axiom eccAnom_zero (e : ℝ) : eccAnom e 0 = 0

/-- Boundary value at π. -/
axiom eccAnom_pi (e : ℝ) : eccAnom e π = π

/-! ---------------------------------------------------------------
    §3  Bessel function of the first kind (integer order)

    Defined by the classical integral representation.
    --------------------------------------------------------------- -/

/-- Bessel function J_n(x) via its integral representation. -/
def besselJ (n : ℕ) (x : ℝ) : ℝ :=
  (1 / π) * ∫ θ in (0 : ℝ)..π, cos (↑n * θ - x * sin θ)

/-! ---------------------------------------------------------------
    §4  Fourier coefficient

    Named  keplerFourierCoeff  to avoid clashing with Mathlib's own
    `fourierCoeff` which is defined on  AddCircle.
    --------------------------------------------------------------- -/

/-- The n-th Fourier sine coefficient of  eccAnom e M − M  on [0,π]. -/
def keplerFourierCoeff (e : ℝ) (n : ℕ) : ℝ :=
  (2 / π) * ∫ M in (0 : ℝ)..π,
    (eccAnom e M - M) * sin (↑n * M)

/-! ---------------------------------------------------------------
    §5  Main theorem
    --------------------------------------------------------------- -/

/--
  **Main theorem.** For n ≥ 1, the Fourier sine coefficient of the
  eccentric-anomaly displacement satisfies  aₙ = (2/n) · Jₙ(n·e).
-/
theorem keplerFourierCoeff_eq_besselJ (n : ℕ) (hn : 1 ≤ n) :
    keplerFourierCoeff e n = (2 / (n : ℝ)) * besselJ n (↑n * e) := by
  simp only [keplerFourierCoeff, besselJ]
  -- Goal:
  --   (2/π) · ∫₀^π (E(M)−M)·sin(nM) dM
  --   = (2/n) · (1/π) · ∫₀^π cos(nθ − ne·sinθ) dθ

  -- ── Step 1: Integration by parts ─────────────────────────────────────
  -- u = E(M)−M,  dv = sin(nM)dM  →  v = −cos(nM)/n
  -- Boundary: [uv]₀^π = 0 by eccAnom_zero, eccAnom_pi.
  -- Result: (2/π)∫(E−M)sin(nM)dM = (2/π)(1/n)∫(E'(M)−1)cos(nM)dM
  --
  -- Mathlib lemma: intervalIntegral.integral_mul_deriv
  --   (or integral_deriv_mul_eq_sub_of_hasDerivAt applied to
  --    u = eccAnom e − id,  v = −sin(n··)/n)
  have step1 :
      (2 / π) * ∫ M in (0 : ℝ)..π, (eccAnom e M - M) * sin (↑n * M)
      = (2 / π) * (1 / ↑n) *
          ∫ M in (0 : ℝ)..π, (deriv (eccAnom e) M - 1) * cos (↑n * M) := by
    sorry

  -- ── Step 2: The "−1" integral vanishes ───────────────────────────────
  -- ∫₀^π cos(nM) dM = [sin(nM)/n]₀^π = 0  (integer n ≥ 1)
  -- Mathlib: integral_cos, Real.sin_nat_mul_pi
  have cos_int_zero :
      ∫ M in (0 : ℝ)..π, cos (↑n * M) = 0 := by
    sorry

  have step2 :
      ∫ M in (0 : ℝ)..π, (deriv (eccAnom e) M - 1) * cos (↑n * M)
      = ∫ M in (0 : ℝ)..π, deriv (eccAnom e) M * cos (↑n * M) := by
    have key : ∀ M : ℝ, (deriv (eccAnom e) M - 1) * cos (↑n * M)
                       = deriv (eccAnom e) M * cos (↑n * M) - cos (↑n * M) := by
      intro M; ring
    simp_rw [key]
    rw [intervalIntegral.integral_sub _ _]
    · rw [cos_int_zero, sub_zero]
    · -- IntervalIntegrable (deriv (eccAnom e) · cos(n··))
      sorry
    · exact (continuous_cos.comp (continuous_const.mul continuous_id')).intervalIntegrable 0 π

  -- ── Step 3: Change of variable M ↦ E via Kepler's equation ───────────
  -- Under M = E − e·sin E:  E'(M) dM = dE,  cos(nM) = cos(nE − ne·sinE)
  -- Mathlib: MeasureTheory.integral_image_eq_integral_abs_deriv_smul
  --       or intervalIntegral.integral_comp_deriv
  have step3 :
      ∫ M in (0 : ℝ)..π, deriv (eccAnom e) M * cos (↑n * M)
      = ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E) := by
    sorry

  -- ── Step 4: Recognise the Bessel integral ────────────────────────────
  -- (1/π)∫₀^π cos(nE − ne·sinE)dE = J_n(ne)  by definition.
  have step4 :
      (1 / π) * ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)
      = besselJ n (↑n * e) := by
    simp only [besselJ]

  -- ── Combine all steps ────────────────────────────────────────────────
  have hpi : π > 0 := Real.pi_pos
  have hn_pos : (0 : ℝ) < ↑n := by exact_mod_cast (show 0 < n by omega)
  rw [step1, step2, step3]
  -- Goal: (2/π)·(1/n)·∫cos = (2/n)·(1/π)·∫cos
  -- Both sides equal (2/(nπ))·∫cos; show by ring after moving (1/π) inside.
  have factored : (2 / π) * (1 / ↑n) *
      ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)
      = (2 / ↑n) * ((1 / π) *
      ∫ E in (0 : ℝ)..π, cos (↑n * E - ↑n * e * sin E)) := by
    ring
  rw [factored, step4]

end

/-! ---------------------------------------------------------------
    §6  Summary of proof obligations

    Axioms (all standard consequences of the Inverse Function Theorem
    applied to the smooth, strictly increasing map  keplerMap e):
      • eccAnom           — existence of the smooth inverse
      • eccAnom_kepler    — it satisfies Kepler's equation
      • eccAnom_hasDerivAt — C¹ with derivative 1/(1 − e·cos E)
      • eccAnom_zero      — E(0) = 0
      • eccAnom_pi        — E(π) = π

    `sorry`s (each closes with a standard Mathlib lemma):
      • step1        IBP via
                      intervalIntegral.integral_deriv_mul_eq_sub_of_hasDerivAt
      • cos_int_zero  ∫₀^π cos(nM)dM = 0 via
                      integral_cos + Real.sin_nat_mul_pi
      • step2 integrability  IntervalIntegrable for deriv(eccAnom e)·cos(n·)
      • step3        Change of variables via
                      MeasureTheory.integral_image_eq_integral_abs_deriv_smul

    §7  Finding minimal imports

    Once the file builds cleanly, add at the bottom:

        #min_imports

    and the Lean Infoview will report the exact minimal import list
    for the version of Mathlib you have installed.
    --------------------------------------------------------------- -/

[1] I used the default setting: Sonnet 4.6, low effort, “thinking” turned off.

How much certainty is worthwhile?

A couple weeks ago I wrote a post on a composition table, analogous to a multiplication table, for trig functions and inverse trig functions.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sin^{-1} & \cos^{-1} & \tan^{-1} \\ \hline \sin & x & \sqrt{1-x^{2}} & \dfrac{x}{\sqrt{1+x^2}} \\ \hline \cos & \sqrt{1-x^{2}} & x & \dfrac{1}{\sqrt{1 + x^2}} \\ \hline \tan & \dfrac{x}{\sqrt{1-x^{2}}} & \dfrac{\sqrt{1-x^{2}}}{x} & x \\ \end{array}

Making mistakes and doing better

My initial version of the table above had some errors that have been corrected. When I wrote a followup post on the hyperbolic counterparts of these functions I was more careful. I wrote a little Python code to verify the identities at a few points.

\renewcommand{\arraystretch}{2.2} \begin{array}{c|c|c|c} & \sinh^{-1} & \cosh^{-1} & \tanh^{-1} \\ \hline \sinh & x & \sqrt{x^{2}-1} & \dfrac{x}{\sqrt{1-x^2}} \\ \hline \cosh & \sqrt{x^{2} + 1} & x & \dfrac{1}{\sqrt{1 - x^2}} \\ \hline \tanh & \dfrac{x}{\sqrt{x^{2}+1}} & \dfrac{\sqrt{x^{2}-1}}{x} & x \\ \end{array}

Checking a few points

Of course checking an identity at a few points is not a proof. On the other hand, if you know the general form of the answer is right, then checking a few points is remarkably powerful. All the expressions above are simple combinations of a handful of functions: squaring, taking square roots, adding or subtracting 1, and taking ratios. What are the chances that a couple such combinations agree at a few points but are not identical? Very small; zero if you formalize the problem correctly. More on that in the next post.

In the case of polynomials, checking a few points may be sufficient. If two polynomials in one variable agree at enough points, they agree everywhere. This can be applied when it’s not immediately obvious that identity involves polynomials, such as proving theorems about binomial coefficients.

The Schwartz-Zippel lemma is a more sophisticated version of this idea that is used in zero knowledge proofs (ZKP). Statements to be proved are formulated as multivariate polynomials over finite fields. The Schwartz-Zippel lemma quantifies the probability that the polynomials could be equal at a few random points but not be equal everywhere. You can prove that a statement is correct with high probability by only checking a small number of points.

Achilles heel

The first post mentioned above included geometric proofs of the identities, but also had typos in the table. This is an important point: formally verified systems can and do contain bugs because there is inevitably some gap between what it formally verified and what is not. I could have formally verified the identities represented in the table, say using Lean, but introduced errors when I manually transcribe the results into LaTeX to make the diagram.

It’s naive to say “Well then don’t leave anything out. Formally verify everything.” It’s not possible to verify “everything.” And things that could in principle be verified may require too much effort to do so.

There are always parts of a system that are not formally verified, and these parts are where you need to look first for errors. If I had formally verified my identities in Lean, it would be more likely that I made a transcription error in typing LaTeX than that the Lean software had a bug that allowed a false statement to slip through.

Economics

The appropriate degree of testing or formal verification depends on the context. In the case of the two blog posts above, I didn’t do enough testing for the first but did do enough for the second: checking identities at a few random points was the right level of effort. Software that controls a pacemaker or a nuclear power plant requires a higher degree of confidence than a blog post.

Rigorously proving identities

Suppose you want to rigorously prove the identities in the tables above. You first have to specify your domains. Are the values of x real numbers or complex numbers? Extending to the complex numbers doesn’t make things harder; it might make them easier by making some problems more explicit.

The circular and hyperbolic functions are easy to define for all complex numbers, but the inverse functions, including the square root function, require more care. It’s more work than you might expect, but you can find an outline of a full development here. Once you have all the functions carefully defined, the identities can be verified by hand or by a CAS such as Mathematica. Or even better, by both.

Related posts

Automation and Validation

It’s been said whatever you can validate, you can automate. An AI that produces correct work 90% of the time could be very valuable, provided you have a way to identify the 10% of the cases where it is wrong. Often verifying a solution takes far less computation than finding a solution. Examples here.

Validating AI output can be tricky since the results are plausible by construction, though not always correct.

Consistency checks

One way to validate output is to apply consistency checks. Such checks are necessary, but not sufficient, and often easy to implement. An simple consistency check might be that inputs to a transaction equal outputs. A more sophisticated consistency check might be conservation of energy or something analogous to it.

Certificates

Some problems have certificates, ways of verifying that a calculation is correct that can be evaluated with far less effort than finding the solution that they verify. I’ve written about certificates in the context of optimization, solving equations, and finding prime numbers.

Formal methods

Correctness is more important in some contexts than others. If a recommendation engine makes a bad recommendation once in a while, the cost is a lower probability of conversion in a few instances. If an aircraft collision avoidance system makes an occasional error, the consequences could be catastrophic.

When the cost of errors is extremely high, formal verification may be worthwhile. Formal correctness proofs using something like Lean or Rocq are extremely tedious and expensive to create, and hence not economical. But if an AI can generate a result and a formal proof of correctness, hurrah!

Who watches the watchmen?

But if an AI result can be wrong, why couldn’t a formal proof generated to defend the result also be wrong? As the Roman poet Juvenal asked, Quis custodiet ipsos custodes? Who will watch the watchmen?

An AI could indeed generate an incorrect proof, but if it does, the proof assistant will reject it. So the answer to who will watch Claude, Gemini, and ChatGPT is Lean, Rocq, and Isabelle.

Who watches the watchers of the watchmen?

Isn’t it possible that a theorem prover like Rocq could have a bug? Of course it’s possible; there is no absolute certainty under the sun. But hundreds of PhD-years of work have gone into Rocq (formerly Coq) and so bugs in the kernel of that system are very unlikely. The rest of the system is bootstrapped, verified by the kernel.

Even so, an error in the theorem prover does not mean an error in the original result. For an incorrect result to slip through, the AI-generated proof would have to be wrong in a way that happens to exploit an unknown error in the theorem prover. It is far more likely that you’re trying to prove the wrong thing than that the theorem prover let you down.

I mentioned collision avoidance software above. I looked into collision avoidance software when I did some work for Amazon’s drone program. The software that was formally verified was also unrealistic in its assumptions. The software was guaranteed to work correctly, if two objects are flying at precisely constant velocity at precisely the same altitude etc. If everything were operating according to geometrically perfect assumptions, there would be no need for collision avoidance software.

When are formal proofs worth the effort?

Formal verification of theorems takes a lot of work. And it takes more work where it is least needed. But the good news is that it takes less effort in contexts where it is needed most.

Years of effort have gone into formally verifying the proofs of theorems that no one doubted were correct. For example, a team worked for six years to formally verify the proof of the odd-order theorem in group theory.

Mathematical proofs are fallible, but it’s rare for a proof to be accepted that reaches a wrong conclusion. Flaws in proofs are much more likely to be gaps in reasoning than to be arguments in favor of false statements. And the more people who study a proof, the more likely it is that flaws will be found and fixed.

The return on investment for formal verification is highest for theorems that are the opposite of the odd-order theorem, theorems that are obscure and shallow rather than famous and deep.

For example, suppose you want to prove that a complex set of security protocols doesn’t have any gaps. Your problem is obscure in that only your company cares about it. You don’t have the assurance that would come from thousands of mathematicians around the world reviewing your work.

But on the plus side, your security rules are logically shallow. These rules may be too complicated to hold in your head, but they can be formally specified with far less effort than some mathematical object like a Möbius strip. In general, discrete things like if-then rules are much easier to formalize than continuous things like real numbers.

Mathematicians choose to study things they have some intuition for, and that intuition is often correct even when the reasoning around it is logically incomplete. But businesses are often faced with complex problems they did not chose and problems for which no one has much intuition, such as proving that a complex circuit does what it’s supposed to do, or that an encryption protocol didn’t neglect some edge case. That’s where the ROI on formal methods is greatest.

Automated theorem proving

When I first heard of automated theorem proving, I imagined computers being programmed to search for mathematical theorems interesting to a wide audience. Maybe that’s what a few of the pioneers in the area had in mind too, but that’s not how things developed.

The biggest uses for automated theorem proving have been highly specialized applications, not mathematically interesting theorems. Computer chip manufacturers use formal methods to verify that given certain inputs their chips produce certain outputs. Compiler writers use formal methods to verify that their software does the right thing. A theorem saying your product behaves correctly is very valuable to you and your customers, but nobody else. These aren’t the kinds of theorems that anyone would cite the way they might site the Pythagorean theorem. Nobody would ever say “And therefore, by the theorem showing that this particular pacemaker will not fall into certain error modes, I now prove this result unrelated to pacemakers.”

Automated theorem provers are important in these highly specialized applications in part because the results are of such limited interest. For every theorem of wide mathematical interest, there are a large number of mathematicians who are searching for a proof or who are willing to scrutinize a proposed proof. A theorem saying that a piece of electronics performs correctly appeals to only the tiniest audience, and yet is probably much easier (for a computer) to prove.

The term “automated theorem proving” is overloaded to mean a couple things. It’s used broadly to include any use of computing in proving theorems, and it’s used more narrowly to mean software that searches for proofs or even new theorems. Most theorem provers in the broad sense are not automated theorem provers in the more narrow sense but rather proof assistants. They verify proofs rather than discover them. (There’s some gray zone. They may search on a small scale, looking for a way to prove a minor narrow result, but not search for the entire proof to a big theorem.) There have been computer-verified proofs of important mathematical theorems, such as the Feit-Thompson theorem from group theory, but I’m not aware of any generally interesting discoveries that have come out of a theorem prover.

Related post: Formal methods let you explore the corners

Formal methods let you explore the corners

I heard someone say the other day that the advantage of formal software validation methods is that they let you explore the corners, cases where intuition doesn’t naturally take you.

This made me think of corners in the geometric sense. If you have a sphere in a box in high dimensions, nearly all the volume is in the corners, i.e. outside the sphere. This is more than a metaphor. You can think of software options geometrically, with each independent choice corresponding to a dimension. Paths through a piece of software that are individually rare may account for nearly all use when considered together.

With a circle inside a square, nearly 78.5% of the area is inside the circle. With a ball sitting inside a 3-D box, 52.4% of the volume is inside the ball. As the dimension increases, the proportion of volume inside the sphere rapidly decreases. For a 10-dimensional sphere sitting in a 10-dimensional box, 0.25% of the volume is in the sphere. Said another way, 99.75% of the volume is in the corners.

When you go up to 100 dimensions, the proportion of volume inside the sphere is about 2 parts in 1070, a 1 followed by 70 zeros [1]. If 100 dimensions sounds like pure fantasy, think about a piece of software with more than 100 features. Those feature combinations multiply like geometric dimensions [2].

Here’s a little Python code you could use to see how much volume is in a sphere as a function of dimension.

    from scipy.special import gamma
    from math import pi

    def unit_sphere_volume(n): 
        return pi**(0.5*n)/gamma(0.5*n + 1)

    def unit_cube_volume(n): 
        return 2**n

    def ratio(n):
        return unit_sphere_volume(n) / unit_cube_volume(n)

    print( [ratio(n) for n in range(1, 20)] )

* * *

[1] There are names for such extremely large numbers. These names are hardly ever used—scientific notation is much more practical— but they’re fun to say. 1070 is ten duovigintillion in American nomenclature, ten undecilliard in European.

[2] Geometric dimensions are perfectly independent, but software feature combinations are not. In terms of logic, some combinations may not be possible. Or in terms of probability, the probability of exploring some paths is conditional on the probability of exploring other paths. Even so, there are inconceivably many paths through any large software system. And in large-scale operations, events that should “never happen” happen regularly.