Programmers like highly expressive programming languages, but programming managers do not. I wrote about this on Twitter a few months ago.

Q: Why do people like Lisp so much?

A: Because Lisp is so expressive.

Q: Why don’t teams use Lisp much?

A: Because Lisp is so expressive.

Q: Why do programmers complain about Java?

A: Because it’s not that expressive.

Q: Why do businesses use Java?

A: Because it’s not that expressive.

A highly expressive programming language offers lots of options. This can be a good thing. It makes programming more fun, and it can lead to better code. But it can also lead to more idiosyncratic code.

A large programming language like Perl allows developers to carve out language subsets that hardly overlap. A team member has to learn not only the parts of the language he understands and wants to use, but also all the parts that his colleagues might use. And those parts that he might accidentally use.

While Perl has maximal syntax, Lisp has minimal syntax. But Lisp is also very expressive, albeit in a different way. Lisp makes it very easy to extend the language via macros. While Perl is a big language, Lisp is an extensible language. This can also lead to each programmer practically having their own language.

With great expressiveness comes great responsibility. A team using a highly expressive language needs to develop conventions for how the language will be used in order to avoid fracturing into multiple de facto languages.

But what if you’re a team of one? Now you don’t need to be as concerned how other people use your language. You still may need to care somewhat. You want to be able to grab sample code online, and you may want to share code or ask others for help. It pays not to be entirely idiosyncratic, though you’re free to wander further from the mainstream.

Even when you’re working in a team, you still may have code that only you use. If your team is producing C# code, and you secretively use a Perl script to help you find things in the code, no one needs to know. On the other hand, there’s a tendency for personal code to become production code, and so personal tools in a team environment are tricky.

But if you’re truly working by yourself, you have great freedom in your choice of tools. This can take a long time to sort out when you leave a team environment to strike out on your own. You may labor under your previous restrictions for a while before realizing they’re no longer necessary. At the same time, you may choose to stick to your old tools, not because they’re optimal for your new situation, but because it’s not worth the effort to retool.

Related posts

(Regarding the last link, think myth as in Joseph Campbell, not myth as in Myth Busters.)

From shell to system

Routine computer tasks and system programming require different tools, though I’m not entirely sure why.

Many people have thought about how inconsistent shells and system programming languages are and tried to unite them. Wouldn’t it be nice to use one language for everything? But attempts to bring system languages down to the shell, or to push shell programming up to large programs, have not been very successful.

I learned Perl in college so I wouldn’t have to learn shell programming. That’s what Perl was initially designed to be: an alternative to shell scripting. Larry Wall called Perl a “distillation of Unix culture.”

Perl is the most disliked programming language according to Stack Overflow. And yet I imagine many who complain about Perl gladly use the menagerie of quirky tools that Perl was created to unify. Bash is popular while Perl is unpopular, and yet the quirkiest parts of Perl are precisely those it shares with bash.

I expect much of the frustration with Perl comes from using it as a language for writing larger programs. Perl is very terse and expressive. These features are assets for one-liners and individual use. They are liabilities for large programs and team development.

Compared to a system programming language like Java, Perl is complex, inconsistent, and unsafe. But compared to shell scripting, Perl is simple, consistent, and safe!

Related posts

A warped perspective on math history

Yesterday I posted on @TopologyFact

The uniform limit of continuous functions is continuous.

John Baez replied that this theorem was proved by his “advisor’s advisor’s advisor’s advisor’s advisor’s advisor.” I assume he was referring to Christoph Gudermann.

The impressive thing is not that Gudermann was able to prove this simple theorem. The impressive thing is that he saw the need for the concept of uniform convergence. My impression from reading the Wikipedia article on uniform convergence is that Gudermann alluded to uniform convergence in passing and didn’t explicitly define it or formally prove the theorem above. He had the idea and applied it but didn’t see the need to make a fuss about it. His student Karl Weierstrass formalized the definition and saw how generally useful the concept was.

It’s easy for a student to get a warped perspective of math history. You might implicitly assume that mathematics was developed in the order that you learn it. If as a student you learn about uniform convergence and that the term was coined around 1840, you might reasonably conclude that in 1840 mathematicians were doing what is now sophomore-level math, which is far from true.

Gudermann tossed off the idea of uniform convergence in passing while working on elliptic functions, a topic I wasn’t exposed to until sometime after graduate school. My mathematics education was more nearly reverse-chronological than chronological. I learned 20th century mathematics in school and 19th century mathematics later. Much of the former was a sort of dehydrated abstraction of the latter. Much of my career has been rehydrating, discovering the motivation for and application of ideas I was exposed to as a student.

Related posts

Decomposing functions of many variables to functions of one variable

Suppose you have a computer that can evaluate and compose continuous functions of one real variable and can do addition. What kinds of functions could you compute with it? You could compute functions of one variable by definition, but could you bootstrap it to compute functions of two variables?

Here’s an example that shows this computer might be more useful than it seems at first glance. We said it can do addition, but can it multiply? Indeed it can [1].

We can decompose the function

f(x, y) = xy


f(x, y) = g_1(g_2(x) + g_2(y)) + g_3(x) + g_3(y)


\begin{align*} g_1(x) &= \exp(x) \\ g_2(x) &= \log(x+1) \\ g_3(x) &= -x - \frac{1}{2} \end{align*}

So multiplication of two variables can be decomposed into the addition and composition of functions of one variable. What other functions of two variables can be decomposed this way? What about functions of three or more variables?

The Kolmogorov-Arnol’d theorem says that all continuous functions, of however many variables you like, can be decomposed this way.

Here is the original version of the Kolmogorov-Arnol’d theorem from 1957:

Let f : [0,1]nR be an arbitrary multivariate continuous function. Then it has the representation

f(x_1  \ldots, x_n ) = \sum_{q=0}^{2n} \Phi_q\left( \sum_{p=1}^n \psi_{q,p}(x_p) \right)

where the univariate functions Φq and ψq,p are continuous and defined on the real line. Furthermore the ψq,p can be chosen independent of f.

Later refinements showed that it is possible to get by with a single outer function Φ depending on f , but independent of q [1].

The original theorem was not constructive, but later Jürgen Braun and Michael Griebel gave a constructive proof [2].


  1. Sidney A. Morris. “Hilbert 13: Are there any genuine continuous multivariate real-valued functions?” Journal: Bull. Amer. Math. Soc. DOI: Posted online July 6, 2020
  2. Jürgen Braun, Michael Griebel. On a Constructive Proof of Kolmogorov’s Superposition Theorem. Constructive Approximation 30, 653 (2009).

Eliminating polynomial terms

The first step in solving a cubic equation is to apply a change of variables to reduce an equation of the form

x³ + bx² + cx + d = 0

to one of the form

y³ + py + q = 0.

This process can be carried further through Tschirnhausen transformations, a generalization of an idea going back to Ehrenfried Walther von Tschirnhaus in 1683.

For a polynomial of degree n > 4, a Tschirnhausen transformations is a rational change of variables

y = g(x) / h(x)

turning the equation

xn + an-1 xn-1 + an-2 xn-2 + … + a0 = 0


yn + bn-4 yn-4 + bn-5 yn-5 + … + b0 = 0

where the denominator h(x) of the transformation is not zero at any root of the original equation.

I believe the details of how to construct the transformations are in An essay on the resolution of equations by G. B. Jerrad.

Related posts

Leapfrog integrator

The so-called “leapfrog” integrator is a numerical method for solving differential equations of the form

x'' = f(x)

where x is a function of t. Typically x is position and t is time.

This form of equation is common for differential equations coming from mechanical systems. The form is more general than it may seem at first. It does not allow terms involving first-order derivatives, but these terms can often be eliminated via a change of variables. See this post for a way to eliminate first order terms from a linear ODE.

The leapfrog integrator is also known as the Störmer-Verlet method, or the Newton-Störmer-Verlet method, or the Newton-Störmer-Verlet-leapfrog method, or …

The leapfrog integrator has some advantages over, say, Runge-Kutta methods, because it is specialized for a particular (but important) class of equations. For one thing, it solves the second order ODE directly. Typically ODE solvers work on (systems of) first order equations: to solve a second order equation you turn it into a system of first order equations by introducing the first derivative of the solution as a new variable.

For another thing, it is reversible: if you advance the solution of an ODE from its initial condition to some future point, make that point your new initial condition, and reverse time, you can step back to exactly where you started, aside from any loss of accuracy due to floating point; in exact arithmetic, you’d return to exactly where you started.

Another advantage of the leapfrog integrator is that it approximately conserves energy. The leapfrog integrator could perform better over time compared to a method that is initially more accurate.

Here is the leapfrog method in a nutshell with step size h.

\begin{align*} x_{i+1} &= x_i + v_i h + \frac{1}{2} f(x_i) h^2 \\ v_{i+i} &= v_i + \frac{1}{2}\left(f(x_i) + f(x_{i+1})\right) h \end{align*}

And here’s a simple Python demo.

    import numpy as np
    import matplotlib.pyplot as plt
    # Solve x" = f(x) using leapfrog integrator
    # For this demo, x'' + x = 0
    # Exact solution is x(t) = sin(t)
    def f(x):
        return -x
    k = 5               # number of periods
    N = 16              # number of time steps per period
    h = 2*np.pi/N       # step size
    x = np.empty(k*N+1) # positions
    v = np.empty(k*N+1) # velocities
    # Initial conditions
    x[0] = 0
    v[0] = 1
    anew = f(x[0])
    # leapfrog method
    for i in range(1, k*N+1):
        aold = anew
        x[i] = x[i-1] + v[i-1]*h + 0.5*aold*h**2
        anew = f(x[i])
        v[i] = v[i-1] + 0.5*(aold + anew)*h

Here’s a plot of the solution over five periods.

There’s a lot more I hope to say about the leapfrog integrator and related methods in future posts.

More on ODE solvers

Counterexample to Dirichlet principle

Let Ω be an open set in some Euclidean space and v a real-valued function on Ω.

Dirichlet principle

Dirichlet’s integral for v, also called the Dirichlet energy of v, is

\int_\Omega \frac{1}{2} | \nabla v |^2

Among functions with specified values on the boundary of Ω, Dirichlet’s principle says that minimizing Dirichlet’s integral is equivalent to solving Laplace’s equation.

In a little more detail, let g be a continuous function on the boundary ∂Ω of the region Ω. A function u has minimum Dirichlet energy, subject to the requirement that u = g on ∂Ω, if and only if u solves Laplace’s equation

\Delta; u = 0

subject to the same boundary condition.

Dirichlet’s principle requires some hypotheses not stated here, as Hadamard’s example below shows.

Hadamard’s example

Let g(θ) be the function [1]

g(\theta) = \sum_{n=1}^\infty \frac{\sin n!\theta}{n^2}

The function g is continuous and so there exists a unique solution to Laplace’s equation on the unit disk with boundary values given by g, but the Dirichlet energy of the solution diverges.

The solution, in polar coordinates, is

u(r, \theta) = \sum_{n=1}^\infty r^{n!} \,\,\frac{\sin n!\theta}{n^2}

The Laplace operator in polar coordinates is

\frac{1}{r} \frac{\partial }{\partial r}\left(r \frac{\partial u}{\partial r} \right) + \frac{1}{r^2} \frac{\partial^2 u}{\partial \theta^2}

and you can differentiate u term by-term to show that it satisfies Laplace’s equation.

Dirichlet’s integral in polar coordinates is

\int_0^{2\pi} \int_0^1 \frac{1}{2} \left\{ \left( \frac{\partial u}{\partial r}\right)^2 + \frac{1}{r^2}\left(\frac{\partial f}{\partial \theta}\right)^2 \right\} \, r\,dr\,d\theta

Integrating term-by-term, the nth term in the series for the Dirichlet energy in Hadamard’s example is

\frac{(n!)^2}{2n^4(2n! - 1)}

and so the series rapidly diverges.

Dirichlet’s principle requires that there be at least one function satisfying the specified boundary conditions that has finite Dirichlet energy. In the example above, the solution to Laplace’s equation with boundary condition g has infinite Dirichlet energy. It turns out the same is true for every function satisfying the same boundary condition, whether it satisfies Laplace’s equation or not.

Related posts

[1] What is the motivation for this function? The function is given by a lacunary series, a Fourier series with increasingly large gaps between the frequency components. The corresponding series for u cannot be extended to an analytic function outside the closed unit circle. If it could be so extended, Dirichlet’s principle would apply and the example wouldn’t work.

Software analysis and synthesis

People who haven’t written large programs think that writing software is easy. All you have to do is break a big problem into smaller problems until you have something so small that it’s easy to program.

The problem is putting the pieces back together. If you’ve only written small programs, you haven’t had many pieces to put together. It’s harder to put the pieces together when you write a large program by yourself. It’s even harder when you work on a large program with other people.

Synthesis is harder than analysis. Or as Perdita Stevens put it, integration is harder than separation.

The image above is a screenshot from her keynote at the RC2020 conference on reversible computation.

Related post: The cost of taking things apart and putting them back together.

COVID19 mortality per capita by state

Here’s a silly graph by Richard West with a serious point. States with longer names tend to have higher covid19 mortality. Of course no one believes there’s anything about the length of a state’s name that should impact the health of its residents. The correlation is real, but it’s a coincidence.

The variation between mortality in different states is really large. Something caused that, though not the length of the names. But here’s the kicker: you may come up with an explanation that’s much more plausible than length of name, and be just as wrong. Discovering causation is hard work, much harder than looking for correlations.

Morse code golf

You can read the title of this post as ((Morse code) golf) or as (Morse (code golf)).

Morse code is a sort of approximate Huffman coding of letters: letters are assigned symbols so that more common letters can be transmitted more quickly. You can read about how well Morse code achieves this design objective here.

But digits in Morse code are kinda strange. I imagine they were an afterthought, tacked on after encodings had been assigned to each of the letters, and so had to avoid encodings that were already in use. Here are the assignments:

    | Digit | Code  |
    |     1 | .---- |
    |     2 | ..--- |
    |     3 | ...-- |
    |     4 | ....- |
    |     5 | ..... |
    |     6 | -.... |
    |     7 | --... |
    |     8 | ---.. |
    |     9 | ----. |
    |     0 | ----- |

There’s no attempt to relate transmission length to frequency. Maybe the idea was that all digits are equally common. While in some contexts this is true, it’s not true in general for mathematical and psychological reasons.

There is a sort of mathematical pattern to the Morse code symbols for digits. For 1 ≤ n ≤ 5, the symbol for n is n dots followed by 5-n dashes. For 6 ≤ n ≤ 9, the symbol is n-5 dashes followed by 10-n dots. The same rule extends to 0 if you think of 0 as 10.

A more mathematically satisfying way to assign symbols would have been binary numbers padded to five places:

    0 -> .....
    1 -> ....-
    2 -> ..._.

Because the Morse encoding of digits is awkward, it’s not easy to describe succinctly. And here is where golf comes in.

The idea of code golf is to write the shortest program that does some task. Fewer characters is better, just as in golf the lowest score wins.

Here’s the challenge: Write two functions as small you can, one to encode digits as Morse code and another to decode Morse digits. Share your solutions in the comments below.

Related posts