Striving for simplicity, arriving at complexity

This post is a riff on a line from Mathematics without Apologies, the book I quoted yesterday.

In an all too familiar trade-off, the result of striving for ultimate simplicity is intolerable complexity; to eliminate too-long proofs we find ourselves “hopelessly lost” among the too-long definitions. [emphasis added]

It’s as if there’s some sort of conservation of complexity, but not quite in the sense of a physical conservation law. Conservation of momentum, for example, means that if one part of a system loses 5 units of momentum, other parts of the system have to absorb exactly 5 units of momentum. But perceived complexity is psychological, not physical, and the accounting is not the same. By moving complexity around we might increase or decrease the overall complexity.

The opening quote suggests that complexity is an optimization problem, not an accounting problem. It also suggests that driving the complexity of one part of a system to its minimum may disproportionately increase the complexity of another part. Striving for the simplest possible proofs, for example, could make the definitions much harder to digest. There’s a similar dynamic in programming languages and programs.

Larry Wall said that Scheme is a beautiful programming language, but every Scheme program is ugly. Perl, on the other hand, is ugly, but it lets you write beautiful programs. Scheme can be simple because it requires libraries and applications to implement functionality that is part of more complex languages. I had similar thoughts about COM. It was an elegant object system that lead to hideous programs.

Scheme is a minimalist programming language, and COM is a minimalist object framework. By and large the software development community prefers complex languages and frameworks in hopes of writing smaller programs. Additional complexity in languages and frameworks isn’t felt as strongly as additional complexity in application code. (Until something breaks. Then you might have to explore parts of the language or framework that you had blissfully ignored before.)

The opening quote deals specifically with the complexity of theorems and proofs. In context, the author was saying that the price of Grothendieck’s elegant proofs was a daunting edifice of definitions. (More on that here.) Grothendieck may have taken this to extremes, but many mathematicians agree with the general approach of pushing complexity out of theorems and into definitions. Michael Spivak defends this approach in the preface to his book Calculus on Manifolds.

… the proof of [Stokes’] theorem is, in the mathematician’s sense, an utter triviality — a straight-forward calculation. On the other hand, even the statement of this triviality cannot be understood without a horde of definitions … There are good reasons why the theorems should all be easy and the definitions hard. As the evolution of Stokes’ theorem revealed, a single simple principle can masquerade as several difficult results; the proofs of many theorems involve merely stripping away the disguise. The definitions, on the other hand, serve a twofold purpose: they are rigorous replacements for vague notions, and machinery for elegant proofs. [emphasis added]

Mathematicians like to push complexity into definitions like software developers like to push complexity into languages and frameworks. Both strategies can make life easier on professionals while making it harder on beginners.

Related post: A little simplicity goes a long way

Problem solvers and theory builders

From Mathematics without Apologies:

It’s conventional to classify mathematicians as “problem solvers” or “theory builders,” depending on temperament. My experiences and the sources I consulted in writing this book convince me that curiosity about problems guides the growth of theories, rather than the other way around. Alexander Grothendieck and Robert Langlands … count among the most ambitious of all builders of mathematical theories, but everything they built was addressed to specific problems with ancient roots.

Related post: Examples bring a subject to life

Counting primitive bit strings

A string of bits is called primitive if it is not the repetition of several copies of a smaller string of bits. For example, the 101101 is not primitive because it can be broken down into two copies of the string 101. In Python notation, you could produce 101101 by "101"*2. The string 11001101, on the other hand, is primitive. (It contains substrings that are not primitive, but the string as a whole cannot be factored into multiple copies of a single string.)

For a given n, let’s count how many primitive bit strings there are of length n. Call this f(n). There are 2n bit strings of length n, and f(n) of these are primitive. For example, there are f(12) primitive bit strings of length 12. The strings that are not primitive are made of copies of primitive strings: two copies of a primitive string of length 6, three copies of a primitive string of length 4, etc. This says

 2^{12} = f(12) + f(6) + f(4) + f(3) + f(2) + f(1)

and in general

2^n = \sum_{d \mid n} f(d)

Here the sum is over all positive integers d that divide n.

Unfortunately this formula is backward. It gives is a formula for something well known, 2n, as a sum of things we’re trying to calculate. The Möbius inversion formula is just what we need to turn this formula around so that the new thing is on the left and sums of old things are on the right. It tells us that

f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) 2^d

where μ is the Möbius function.

We could compute f(n) with Python as follows:

from sympy.ntheory import mobius, divisors

def num_primitive(n):
    return sum( [mobius(n/d)*2**d for d in divisors(n)] )

The latest version of SymPy, version 0.7.6, comes with a function mobius for computing the Möbius function. If you’re using an earlier version of SymPy, you can roll your own mobius function:

from sympy.ntheory import factorint

def mobius(n):
    exponents = factorint(n).values()
    lenexp = len(exponents)
    m = 0 if lenexp == 0 else max(exponents)
    return 0 if m > 1 else (-1)**lenexp

The version of mobius that comes with SymPy 0.7.6 may be more efficient. It could, for example, stop the factorization process early if it discovers a square factor.

How medieval astronomers made trig tables

How would you create a table of trig functions without calculators or calculus?

It’s not too hard to create a table of sines at multiples of 3°. You can use the sum-angle formula for sines

sin(α+β) = sin α cos β + sin β cos α.

to bootstrap your way from known values to other values. Elementary geometry gives you the sines of 45° and 30°, and the sum-angle formula will then give you the sine of 75°. From Euclid’s construction of a 5-pointed star you can find the sine of 72°. Then you can use the sum-angle formula to find the sine of 3° from the sines of 75° and 72°. Ptolemy figured this out in the 2nd century AD.

But if you want a table of trig values at every degree, you need to find the sine of 1°. If you had that, you could bootstrap your way to every other integer number of degrees. Ptolemy had an approximate solution to this problem, but it wasn’t very accurate or elegant.

The Persian astronomer Jamshīd al-Kāshī had a remarkably clever solution to the problem of finding the sine of 1°. Using the sum-angle formula you can find that

sin 3θ = 3 sin θ – 4 sin3 θ.

Setting θ = 1° gives you a cubic equation for the unknown value of sin 1° involving the known value of sin 3°. However, the cubic formula wasn’t discovered until over a century after al-Kāshī. Instead, he used a numerical algorithm more widely useful than the cubic formula: finding a fixed point of an iteration!

Define f(x) = (sin 3° + 4x3)/3. Then sin 1° is a fixed point of f. Start with an approximate value for sin 1° — a natural choice would be (sin 3°)/3 — and iterate. Al-Kāshī used this procedure to compute sin 1° to 16 decimal places.

Here’s a little Python code to play with this algorithm.

from numpy import sin, deg2rad

sin3deg = sin(deg2rad(3))

def f(x):
    return (sin3deg + 4*x**3)/3

x = sin3deg/3
for i in range(4):
    x = f(x)

This shows that after only three iterations the method has converged to floating point precision, which coincidentally is about 16 decimal places, the same as al-Kāshī’s calculation.

Source: Heavenly Mathematics: The Forgotten Art of Spherical Trigonometry


Roughly speaking, an ergodic system is one that mixes well. You get the same result whether you average its values over time or over space.

This morning I ran across the etymology of the word:

In the late 1800s, the physicist Ludwig Boltzmann needed a word to express the idea that if you took an isolated system at constant energy and let it run, any one trajectory, continued long enough, would be representative of the system as a whole. Being a highly-educated nineteenth century German-speaker, Boltzmann knew far too much ancient Greek, so he called this the “ergodic property”, from ergon “energy, work” and hodos “way, path.” The name stuck.

Found here, footnote on page 479.

Other etymological footnotes:


Googol and googolplex

Numericon gives the history of the words googol and googolplex:

… the famous googol, 10100 (a 1 followed by 100 zeros), defined in 1929 by American mathematician Edward Kasner and named by his nine-year-old nephew, Milton Sirotta. Milton went even further and came up with the googolplex, now defined as 10googol but initially defined by Milton as a 1, followed by writing zeros until you get tired.

Related post: There isn’t a googol of anything

John Napier

Julian Havil has written a new book John Napier: Life, Logarithms, and Legacy.

I haven’t read more than the introduction yet — a review copy arrived just yesterday — but I imagine it’s good judging by who wrote it. Havil’s book Gamma is my favorite popular math book. (Maybe I should say “semi-popular.” Havil’s books have more mathematical substance than most popular books, but they’re still aimed at a wide audience. I think he strikes a nice balance.) His latest book is a scientific biography, a biography with an unusual number of equations and diagrams.

Napier is best known for his discovery of logarithms. (People debate endlessly whether mathematics is discovered or invented. Logarithms are so natural — pardon the pun — that I say they were discovered. I might describe other mathematical objects, such as Grothendieck’s schemes, as inventions.) He is also known for his work with spherical trigonometry, such as Napier’s mnemonic. Maybe Napier should be known for other things I won’t know about until I finish reading Havil’s book.

Mathematical beauty

Michael Atiyah quoted Hermann Weyl in the opening talk at the second Heidelberg Laureate Forum:

I believe there is, in mathematics, in contrast to the experimental disciplines, a character which is nearer to that of free creative art.

There is evidence that the relation of artistic beauty and mathematical beauty is more than an analogy. Michael Atiyah recently published a paper with Semir Zeki et al that suggests the same part of the brain responds to both.

Sum of geometric means

Let xn be a sequence of non-negative numbers. Then the sum of their running geometric means is bounded by e times their sum. In symbols

\sum_{n=1}^\infty \left(x_1 x_2 \cdots x_n\right)^{1/n} \leq e \sum_{n=1}^\infty x_n

The inequality is strict unless all the x‘s are zero, and the constant e on the right side is optimal. Torsten Carleman proved this theorem in 1923.

A subway topologist

One of my favorite books when I was growing up was the Mathematics volume in the LIFE Science Library. I didn’t own the book, but my uncle did, and I’d browse through the book whenever I visited him. I was too young at the time to understand much of what I was reading.

One of the pages that stuck in my mind was a photo of Samuel Eilenberg. His name meant nothing to me at the time, but the caption titled “A subway topologist” caught my imagination.

… Polish-born Professor Samuel Eilenberg sprawls contemplatively in his Greenwich Village apartment in New York City. “Sometimes I like to think lying down,” he says, “but mostly I like to think riding on the subway.” Mainly he thinks about algebraic topology — a field so abstruse that even among mathematicians few understand it. …

I loved the image of Eilenberg staring intensely at the ceiling or riding around on a subway thinking about math. Since then I’ve often thought about math while moving around, though usually not on a subway. I’ve only lived for a few months in an area with a subway system.

The idea that a field of math would be unknown to many mathematicians sounded odd. I had no idea at the time that mathematicians specialized.

Algebraic topology doesn’t seem so abstruse now. It’s a routine graduate course and you might get an introduction to it in an undergraduate course. The book was published in 1963, and I suppose algebraic topology would have been more esoteric at the time.


Making change

How many ways can you make change for a dollar? This post points to two approaches to the problem, one computational and one analytic.

SICP gives a Scheme program to solve the problem:

(define (count-change amount) (cc amount 5))

(define (cc amount kinds-of-coins)
    (cond ((= amount 0) 1)
    ((or (< amount 0) (= kinds-of-coins 0)) 0)
    (else (+ (cc amount
                 (- kinds-of-coins 1))
             (cc (- amount

(define (first-denomination kinds-of-coins)
    (cond ((= kinds-of-coins 1) 1)
          ((= kinds-of-coins 2) 5)
          ((= kinds-of-coins 3) 10)
          ((= kinds-of-coins 4) 25)
          ((= kinds-of-coins 5) 50)))

Concrete Mathematics explains that the number of ways to make change for an amount of n cents is the coefficient of z^n in the power series for the following:

\frac{1}{(1 - z)(1 - z^5)(1 - z^{10})(1 - z^{25})(1 - z^{50})}

Later on the book gives a more explicit but complicated formula for the coefficients.

Both show that there are 292 ways to make change for a dollar.

Iterative linear solvers as metaphor

Gaussian elimination is systematic way to solve systems of linear equations in a finite number of steps. Iterative methods for solving linear systems require an infinite number of steps in theory, but may find solutions faster in practice.

Gaussian elimination tells you nothing about the final solution until it’s almost done. The first phase, factorization, takes O(n^3) steps, where n is the number of unknowns. This is followed by the back-substitution phase which takes O(n^2) steps. The factorization phase tells you nothing about the solution. The back-substitution phase starts filling in the components of the solution one at a time. In application n is often so large that the time required for back-substitution is negligible compared to factorization.

Iterative methods start by taking a guess at the final solution. In some contexts, this guess may be fairly good. For example, when solving differential equations, the solution from one time step gives a good initial guess at the solution for the next time step. Similarly, in sequential Bayesian analysis the posterior distribution mode doesn’t move much as each observation arrives. Iterative methods can take advantage of a good starting guess while methods like Gaussian elimination cannot.

Iterative methods take an initial guess and refine it to a better approximation to the solution. This sequence of approximations converges to the exact solution. In theory, Gaussian elimination produces an exact answer in a finite number of steps, but iterative methods never produce an exact solution after any finite number of steps. But in actual computation with finite precision arithmetic, no method, iterative or not, ever produces an exact answer. The question is not which method is exact but which method produces an acceptably accurate answer first. Often the iterative method wins.

Successful projects often work like iterative numerical methods. They start with an approximation solution and iteratively refine it. All along the way they provide a useful approximation to the final product. Even if, in theory, there is a more direct approach to a final product, the iterative approach may work better in practice.

Algorithms iterate toward a solution because that approach may reach a sufficiently accurate result sooner. That may apply to people, but more important for people is the psychological benefit of having something to show for yourself along the way. Also, iterative methods, whether for linear systems or human projects, are robust to changes in requirements because they are able to take advantage of progress made toward a slightly different goal.

Related post: Ten surprises from numerical linear algebra

Every exercise in the book

When I did an independent study course with Ted Odell, he told me to get a copy of De Vito’s Functional Analysis and work every exercise. I don’t recall whether I actually worked every problem, though I believe I at least did most of them. I heard of someone who learned algebraic geometry by working every problem in Hartshorne.

Doing all the exercises in a book isn’t a bad way to learn something, though it depends on the book, what you’re trying to accomplish, and on the quality and quantity of the exercises.

Have you ever gone through a book working every exercise? If so, what book? How was your experience?



Swedish superellipse

Sergels torg, Stockholm, Sweden. Photo via Wikipedia.

The fountain in Sergels torg (English: Sergel’s Square) in Sockholm is in the shape of a “superellipse.” Piet Hein proposed this shape as a practical and aesthetically pleasing compromise between a circle and a rectangle.

What Piet Hein dubbed a superellipse is a curve satisfying

|x/a|2.5 + |y/b|2.5 = 1

In the case of Sergels torg, a/b = 6/5. A superellipse is what an analyst would call an Lp ball with p = 2.5 and rescaled axes.

Incidentally, I used curves of this form in a dose-finding method a few years ago. A clinical trial design would pick a, b, and p to match a physician’s desired trade-off between probability of efficacy and probability of toxicity.

Update: A few minutes after I posted this, I realized that there was a bowl on my desk shaped like a superellipse:

The bowl came from Ikea, so there’s another connection to Sweden.

Related post: Volumes of Lp balls

Sometimes definitions are enough

Sometimes you can apply math just by raiding it for vocabulary. You may not need to apply a single theorem.

This has been a surprise to me. I’m more used to creating a mathematical model so you can compute something or apply some theorem. But sometimes you can move a project along just by providing a name for a concept. A meandering discussion can snap into focus because someone has a name for an idea everyone vaguely understands.

Sometimes it may be clear that only part of a mathematical definition applies. In this case math can guide the discussion by asking whether the rest of the definition applies. “It sounds like we’ve got a widget here. A widget has to have these five properties and clearly we have the first three. Let’s think about whether the last two hold.” The answers don’t have to be positive to be useful. You might realize something important in the process of explaining why your thing is not a widget.

Sometimes a definition may not apply at all and still be useful! “This reminds me of a widget. It’s not a widget in any strict sense. But if it were, this is what we’d do next. I wonder whether we can do something like that.”