One practical application of functional programming

Arguments in favor of functional programming are often unconvincing. For example, the most common argument is that functional programming makes it easier to “reason about your code.” That’s true to some extent. All other things being equal, it’s easier to understand a function if all its inputs and outputs are explicit. But all other things are not equal. In order to make one function easier to understand, you may have to make something else harder to understand.

Here’s an argument from Brian Beckman for using a functional style of programming in a particular circumstance that I find persuasive. The immediate context is Kalman filtering, but it applies to a broad range of mathematical computation.

By writing a Kalman filter as a functional fold, we can test code in friendly environments and then deploy identical code with confidence in unfriendly environments. In friendly environments, data are deterministic, static, and present in memory. In unfriendly, real-world environments, data are unpredictable, dynamic, and arrive asynchronously.

If you write the guts of your mathematical processing as a function to be folded over your data, you can isolate the implementation of your algorithm from the things that make code hardest to test, i.e. the data source. You can test your code in a well-controlled “friendly” test environment and deploy exactly the same code into production, i.e. an “unfriendly” environment.
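
To make the idea concrete, here is a minimal sketch in Python. It folds a running mean and variance over a sequence of observations rather than a Kalman filter, and the names step, summarize, and consume are illustrative, not taken from Beckman's paper. But the shape is the same: the numerical core is a pure function of (state, observation), and only a thin wrapper knows whether the data come from a list in a test or from a live stream in production.

from functools import reduce

def step(state, observation):
    # Pure state update (Welford's running mean and sum of squared deviations).
    # A Kalman filter update would have the same (state, observation) -> state shape.
    n, mean, m2 = state
    n += 1
    delta = observation - mean
    mean += delta / n
    m2 += delta * (observation - mean)
    return (n, mean, m2)

def summarize(state):
    n, mean, m2 = state
    return mean, (m2 / n if n else float("nan"))

# Friendly environment: deterministic data, static, all in memory.
test_data = [1.0, 2.0, 4.0, 8.0]
print(summarize(reduce(step, test_data, (0, 0.0, 0.0))))  # (3.75, 7.1875)

# Unfriendly environment: exactly the same step function, folded over a stream.
def consume(stream, state=(0, 0.0, 0.0)):
    for observation in stream:  # e.g. readings arriving asynchronously
        state = step(state, observation)
    return summarize(state)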

Brian continues:

The flexibility to deploy exactly the code that was tested is especially important for numerical code like filters. Detecting, diagnosing and correcting numerical issues without repeatable data sequences is impractical. Once code is hardened, it can be critical to deploy exactly the same code, to the binary level, in production, because of numerical brittleness. Functional form makes it easy to test and deploy exactly the same code because it minimizes the coupling between code and environment.

I ran into this early on when developing clinical trial methods first for simulation, then for production. Someone would ask whether we were using the same code in production as in simulation.

“Yes we are.”

“Exactly the same code?”

“Well, essentially.”

“Essentially” was not good enough. We got to where we would use the exact same binary code for simulation and production, but something analogous to Kalman folding would have gotten us there sooner, and would have made it easier to enforce this separation between numerical code and its environment across applications.

Why is it important to use the exact same binary code in test and production, not just a recompile of the same source code? Brian explains:

Numerical issues can substantially complicate code, and being able to move exactly the same code, without even recompiling, between testing and deployment can make the difference to a successful application. We have seen many cases where differences in compiler flags, let alone differences in architectures, even between different versions of the same CPU family, introduce enough differences in the generated code to cause qualitative differences in the output. A filter that behaved well in the lab can fail in practice.

Emphasis added, here and in the first quote above.

Note that this post gives an argument for a functional style of programming, not necessarily for the use of functional programming languages. Whether the numerical core or the application that uses it would best be written in a functional language is a separate discussion.


Dividing projects into math, statistics, and computing

If you’ve read this blog for long, you know that my work is a combination of math, statistics, and computing.

I was looking over my records and tried to see how my work divides into these three areas. In short, it doesn’t.

The boundaries between these areas are fuzzy or arbitrary to begin with, but a few projects fell cleanly into one of the three categories. However, 85% of my income has come from projects that involve a combination of two areas or all three areas.

If you calculate a confidence interval using R, you could say you’re doing math, statistics, and computing. But for the accounting above I’d simply call that statistics. When I say a project uses math and computation, for example, I mean it requires math outside what is typical in programming, and programming outside what is typical in math.

Gaussian correlation inequality

The Gaussian correlation inequality was proven in 2014, but the proof only became widely known this year. You can find Thomas Royen’s remarkably short proof here.

Let X be a multivariate Gaussian random variable with mean zero and let E and F be two symmetric convex sets, both centered at the origin. The Gaussian correlation inequality says that

Prob(X in E and F) ≥ Prob(X in E) Prob(X in F).

Here’s a bit of Python code for illustrating the inequality. For symmetric convex sets we take p-norm balls of radius r, where p ≥ 1 and r > 0. We could, for example, set one of the values of p to 1 to get an octahedron (in three dimensions) and set the other p to 2 to get a Euclidean ball.

from scipy.stats import norm as gaussian

def pnorm(v, p):
    # p-norm of the vector v
    return sum(abs(x)**p for x in v)**(1./p)

def simulate(dim, r1, p1, r2, p2, numReps):
    # Estimate Prob(X in both balls) and compare it to the product
    # Prob(X in ball 1) * Prob(X in ball 2) by Monte Carlo simulation.
    count_1, count_2, count_both = 0, 0, 0
    for _ in range(numReps):
        x = gaussian.rvs(0, 1, dim)  # standard Gaussian sample in dim dimensions
        in_1 = pnorm(x, p1) < r1
        in_2 = pnorm(x, p2) < r2
        if in_1:
            count_1 += 1
        if in_2:
            count_2 += 1
        if in_1 and in_2:
            count_both += 1
    print("Prob in both:", count_both / numReps)
    print("Lower bound: ", count_1*count_2 / numReps**2)

simulate(3, 1, 2, 1, 1, 1000)

When numReps is large, we expect the simulated probability of the intersection to be greater than the simulated lower bound. In the example above, the former was 0.075 and the latter 0.015075, ordered as we’d expect.

If we didn’t know that the theorem has been proven, we could use code like this to search for counterexamples. Of course a simulation cannot prove or disprove a theorem, but if we found what appeared to be a counterexample, we could see whether it persists with different random number generation seeds and with a larger value of numReps. If so, we could then try to establish the inequality analytically. Now that the theorem has been proven, we know we’re not going to find real counterexamples, so the code serves only as an illustration.
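
For example, one might rerun the simulation above under several seeds and with more repetitions, along these lines. The particular seeds and sample size here are arbitrary; seeding NumPy’s global random state works because scipy.stats draws from it by default.

import numpy as np

# Check whether an apparent counterexample persists across seeds.
for seed in (1, 2, 3, 4, 5):
    np.random.seed(seed)
    simulate(3, 1, 2, 1, 1, 100000)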


Irrational rotations are ergodic

In a blog post yesterday, I mentioned that the golden angle is an irrational portion of a circle, and so a sequence of rotations by the golden angle will not repeat itself. We can say more: rotations by an irrational portion of a circle are ergodic. Roughly speaking, this means that not only does the sequence not repeat itself, the sequence “mixes well” in a technical sense.

Ergodic functions have the property that “the time average equals the space average.” We’ll unpack what that means and illustrate it by simulation.

Suppose we pick a starting point x on the circle and then repeatedly rotate it by the golden angle. Take an integrable function f on the circle and form the average of its values at the sequence of rotations. This is the time average. The space average is the integral of f over the circle, divided by the circumference of the circle. The ergodic theorem says that the time average equals the space average, except possibly for a set of starting values of measure zero.

More generally, let X be a measure space (like the unit circle) with measure μ, and let T be an ergodic transformation (like rotation by the golden angle). Then for almost all starting values x we have the following:

\lim_{n\to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f(T^k x) = \frac{1}{\mu(X)} \int_X f\, d\mu

Let’s do a simulation to see this in practice by running the following Python script.

from numpy import pi, cos
from scipy.constants import golden
from scipy.integrate import quad

# golden angle = 2*pi*(1 - 1/golden) = 2*pi/golden**2
golden_angle = 2*pi*golden**-2

def T(x):
    # rotate by the golden angle, reduced mod 2*pi to stay on the circle
    return (x + golden_angle) % (2*pi)

def time_average(x, f, T, n):
    # average f over the first n points of the orbit of x under T
    s = 0
    for k in range(n):
        s += f(x)
        x = T(x)
    return s/n

def space_average(f):
    # integral of f over the circle divided by the circumference
    integral = quad(f, 0, 2*pi)[0]
    return integral / (2*pi)

f = lambda x: cos(x)**2
N = 1000000

print( time_average(0, f, T, N) )
print( space_average(f) )

In this case we get 0.49999996 for the time average, and 0.5 for the space average. They’re not the same, but we only used a finite value of n; we didn’t take a limit. We should expect the two values to be close because n is large, but we shouldn’t expect them to be equal.

Update: The code and results were updated to fix a bug pointed out in the comments below. I had written ... % 2*pi when I should have written ... % (2*pi). I assumed the modulo operator had lower precedence than multiplication, but in Python the two have the same precedence and group left to right, so x % 2*pi parses as (x % 2)*pi. It was a coincidence that the buggy code was fairly accurate.
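
To see the difference concretely, here is a tiny example (the value 7.0 is arbitrary):

from math import pi

x = 7.0
print(x % 2*pi)    # parsed as (x % 2) * pi, about 3.1416
print(x % (2*pi))  # 7.0 reduced mod 2*pi, about 0.7168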

A friend of mine, a programmer with decades of experience, recently made a similar error. He’s a Clojure fan but was writing in C or some similar language. He rightfully pointed out that this kind of error simply cannot happen in Clojure. Lisps, including Clojure, don’t have operator precedence because they don’t have operators. They only have functions, and the order in which functions are called is made explicit with parentheses. The Python code x % 2*pi corresponds to (* (mod x 2) pi) in Clojure, and the Python code x % (2*pi) corresponds to (mod x (* 2 pi)).

Related: Origin of the word “ergodic”