A few weeks ago I started a series of posts on various things you could do with a functional fold. In the first post I mentioned that the idea came from a paper by Brian Beckman on Kalman filters and folds:
This post was inspired by a paper by Brian Beckman (in progress) that shows how a Kalman filter can be implemented as a fold. From a Bayesian perspective, the thing that makes the Kalman filter work is that a certain multivariate normal model has a conjugate prior. This post shows that conjugate models more generally can be implemented as folds over the data. That’s interesting, but what does it buy you? Brian’s paper discusses this in detail, but one advantage is that it completely separates the accumulator function from the data, so the former can be tested in isolation.
At the time Brian was working on one big paper in private. This has since been split into several papers and they’re now public.
I’m starting a new Twitter account @FunctorFact for functional programming and category theory.
These two subjects have a lot of overlap, and some tweets will combine both, but many will be strictly about one or the other. So some content will be strictly about programming, some strictly about math, and some combining ideas from both.
These have all been numerical algorithms. Insertion sort is an example of a non-numerical algorithm that could be implemented as a fold.
Insertion sort is not the fastest sorting algorithm. It takes O(n2) operations to sort a list of size n while the fastest algorithms take O(n log n) operations. But insertion sort is simple, and as it works its way down a list, the portion it has processed is sorted. If you interrupt it, you have correct results given the input so far. If you interrupt something like a quicksort, you don’t have a sorted list. If you’re receiving a stream of data points and want to sort them as they come, you have to use insertion sort rather than something like quicksort.
The function to fold over a list looks like this:
f as a = [x | x <- as, x < a] ++ [a] ++ [x | x <- as, x >= a]
Given a sorted list as and a value a, return a new sorted list that has inserted the element a in the proper position. Our function f does this by joining together the list of the elements less than a, the list containing only a, and the list of elements at least as big as a.
Here’s how we could use this to alphabetize the letters in the word “example.”
foldl f  "example"
This returns "aeelmpx".
Haskell takes our function f and an empty list of characters  and returns a new list of characters by folding f over the list of characters making up the string "example".
You can always watch how foldl works by replacing it with scanl to see intermediate results.
Folds in functional programming are often introduced as a way to find the sum or product of items in a list. In this case the fold state has the same type as the list items. But more generally the fold state could have a different type, and this allows more interesting applications of folds. Previous posts look at using folds to update conjugate Bayesian models and numerically solve differential equations.
This post uses a fold to compute mean, variance, skewness, and kurtosis. See this earlier post for an object-oriented approach. The code below seems cryptic out of context. The object-oriented post gives references for where these algorithms are developed. The important point for this post is that we can compute mean, variance, skewness, and kurtosis all in one pass through the data even though textbook definitions appear to require at least two passes. It’s also worth noting that the functional version is less than half as much code as the object-oriented version.
(Algorithms that work in one pass through a stream of data, updating for each new input, are sometimes called “online” algorithms. This is unfortunate now that “online” has come to mean something else.)
The Haskell function moments below returns the number of samples and the mean, but does not directly return variance, skewness and kurtosis. Instead it returns moments from which these statistics can easily be calculated using the mvks function.
The foldl applies moments first to its initial value, the 5-tuple of zeros. Then it iterates over the data, taking data points one at a time and visiting each point only once, returning a new state from moments each time. Another way to say this is that after processing each data point, moments returns the 5-tuple that it would have returned if that data only consisted of the values up to that point.
For a non-numerical example of folds, see my post on sorting.
One of the most widely used numerical algorithms for solving differential equation is the 4th order Runge-Kutta method. This post shows how the Runge-Kutta method can be written naturally as a fold over the set of points where the solution is needed. These points do not need to be evenly spaced.
Given a differential equation of the form
with initial condition y(t0) = y0, the 4th order Runge-Kutta method advances the solution by an amount of time h by
The Haskell code for implementing the accumulator function for foldl looks very much like the mathematical description above. This is a nice feature of the where syntax.
rk (t, y) t' = (t', y + h*(k1 + 2.0*k2 + 2.0*k3 + k4)/6.0)
h = t' - t
k1 = f t y
k2 = f (t + 0.5*h) (y + 0.5*h*k1)
k3 = f (t + 0.5*h) (y + 0.5*h*k2)
k4 = f (t + 1.0*h) (y + 1.0*h*k3)
Suppose we want to solve the differential equation y ‘ = (t2 – y2) sin(y) with initial condition y(0) = -1, and we want to approximate the solution at [0.01, 0.03, 0.04, 0.06]. We would implement the right side of the equation as
f t y = (t**2 - y**2)*sin(y)
and fold the function rk over our time steps with
foldl rk (0, -1) [0.01, 0.03, 0.04, 0.06]
This returns (0.06, -0.9527). The first part, 0.06, is no surprise since obviously we asked for the solution up to 0.06. The second part, -0.9527, is the part we’re more interested in.
If you want to see the solution at all the specified points and not just the last one, replace foldl with scanl.
As pointed out in the previous post, writing algorithms as folds separates the core of the algorithm from data access. This would allow us, for example, to change rk independently, such as using a different order Runge-Kutta method. (Which hardly anyone does. Fourth order is a kind of sweet spot.) Or we could swap out Runge-Kutta for a different ODE solver entirely just by passing a different function into the fold.
Next see how you could use a fold to compute mean, variance, skewness, and kurtosis in one pass through a stream of data using a fold.
Allen Wirfs-Brock gave the following defense of OOP a few days ago in a series of six posts on Twitter:
A young developer approached me after a conf talk and said, “You must feel really bad about the failure of object-oriented programming.” I was confused. I said, “What do you mean that object-orient programming was a failure. Why do you think that?”
He said, “OOP was supposed to fix all of our software engineering problems and it clearly hasn’t. Building software today is just as hard as it was before OOP. came along.”
“Have you ever look at the programs we were building in the early 1980s? At how limited their functionality and UIs were? OOP has been an incredible success. It enabled us to manage complexity as we grew from 100KB applications to today’s 100MB applications.”
Of course OOP hasn’t solved all software engineering problems. Neither has anything else. But OOP has been enormously successful in allowing ordinary programmers to write much larger applications. It has become so pervasive that few programmers consciously think about it; it’s simply how you write software.
I’ve written several posts poking fun at the excesses of OOP and expressing moderate enthusiasm for functional programming, but I appreciate OOP. I believe functional programming will influence object oriented programming, but not replace it.
Haskell advocates are fond of saying that a Haskell function cannot launch missiles without you knowing it. Pure functions have no side effects, so they can only do what they purport to do. In a language that does not enforce functional purity, calling a function could have arbitrary side effects, including launching missiles. But this cannot happen in Haskell.
The difference between pure functional languages and traditional imperative languages is not quite that simple in practice.
Programming with pure functions is conceptually easy but can be awkward in practice. You could just pass each function the state of the world before the call, and it returns the state of the world after the call. It’s unrealistic to pass a program’s entire state as an argument each time, so you’d like to pass just that state that you need to, and have a convenient way of doing so. You’d also like the compiler to verify that you’re only passing around a limited slice of the world. That’s where monads come in.
Suppose you want a function to compute square roots and log its calls. Your square root function would have to take two arguments: the number to find the root of, and the state of the log before the function call. It would also return two arguments: the square root, and the updated log. This is a pain, and it makes function composition difficult.
Monads provide a sort of side-band for passing state around, things like our function call log. You’re still passing around the log, but you can do it implicitly using monads. This makes it easier to call and compose two functions that do logging. It also lets the compiler check that you’re passing around a log but not arbitrary state. A function that updates a log, for example, can effect the state of the log, but it can’t do anything else. It can’t launch missiles.
Once monads get large and complicated, it’s hard to know what side effects they hide. Maybe they can launch missiles after all. You can only be sure by studying the source code. Now how do you know that calling a C function, for example, doesn’t launch missiles? You study the source code. In that sense Haskell and C aren’t entirely different.
The Haskell compiler does give you assurances that a C compiler does not. But ultimately you have to study source code to know what a function does and does not do.
People often dislike static type systems because they’ve only met weak ones. A weak or not very expressive type system gets in your way all the time. It prevents you from writing functions you want to write that you know are fine. … The solution is not to abandon the type system but to make the type system more expressive.
In particular, he mentions Haskell’s polymorphic types and type inference as ways to make strong static typing convenient to use.
Pure functional languages have this advantage: all flow of data is made explicit. And this disadvantage: sometimes it is painfully explicit.
That’s the problem monads solve: they let you leave implicit some of the repetitive code otherwise required by functional programming. That simple but critical point left out of many monad tutorials.
Dan Piponi wrote a blog post You Could Have Invented Monads! (And Maybe You Already Have) very much in the spirit of Wadler’s paper. He starts with the example of adding logging to a set of functions. You could have every function return not just the return value that you’re interested in but also the updated state of the log. This quickly gets messy. For example, suppose your basic math functions write to an error log if you call them with illegal arguments. That means your square root function, for example, has to take as input not just a real number but also the state of the error log before it was called. Monads give you a way of effectively doing the same thing, but conceptually separating the code that takes square roots from the code that maintains the error log.
Someone asked me this weekend how to get started with functional programming. My answer: Start by writing pure functions in the programming language you’re currently using.
The only input to a pure function is its argument list and the only output is its return value. If you haven’t seen this before, you might think all functions are pure. After all, any function takes in values and returns a value. But in conventional programming there are typically out-of-band ways for information to flow in or out of a function. For example, an impure function may depend on a global variable or class member data. In that case, it’s behavior is not entirely determined by its arguments. Similarly, an impure function might set a global variable or write to a database. In that case the function has a side effect in addition to its return value.
You can write pure functions in any language, though it’s easier in some languages than others. For example, no one would call Fortran a functional language, but there are people (M. J. D. Powell comes to mind) who discipline themselves to write pure functions in Fortran.
Why write pure functions? A pure function has referential transparency, meaning it will always return the same value when given the same inputs. The output does not depend on the system time, the state of a database, which functions were called previously, or anything else that is not explicitly passed as an argument to the function. This means pure functions are easier to understand (and hence easier to debug and test).
You can’t always write pure functions. If you need to stick a value in a database, this cannot be accomplished with a pure function. Or if you’re calling a random number generator, you don’t want it to have referential transparency, always returning the same output! But the goal is to use pure functions when practical. You want to eliminate out-of-band communication when it’s convenient to do so. Total purity is not practical; some argue that the sweet spot is about 85% purity.
So why don’t programmers use pure functions more often? One reason is that pure functions require longer argument lists. In an object oriented language, object methods can have shorter argument lists by implicitly depending on object state. The price to pay for shorter method signatures is that you can’t understand a method by itself. You have to know the state of the object when the method is called. Is it worthwhile to give up referential transparency in order to have shorter method signatures? It depends on your context and your taste, though in my opinion its often worthwhile to use longer function signatures in exchange for more pure functions.
Another reason people give for not writing pure functions is that its too expensive to copy large data structures to pass them into a function. But that’s what pointers are for. You can conceptually pass an object into a function without having to actually make a copy of the object’s bits.
You can also fake purity for the sake of efficiency. For example, Mike Swaim left a comment recently giving an example of how memoization sped up a program by several orders of magnitude. (Memoization is a technique of caching computations. When a function is asked to compute something, it first looks to see whether it has already done the calculation. If so, it returns the cached value. If not, it does the calculation and adds its output to the cache.) A function that uses memoization is not strictly pure — its calculations have a persistent impact on the state of its cache — but such a function can still have referential transparency, always returning the same output given the same input. You could say it’s cheating to call such functions pure, and it is, but if you’re really a stickler about it, all pure functions have side effects.
Joe Armstrong, creator of Erlang, on software reusability.
I think the lack of reusability comes in object-oriented languages, not functional languages. Because the problem with object-oriented languages is they’ve got all this implicit environment that they carry around with them. You wanted a banana but what you got was a gorilla holding the banana and the entire jungle.
If you have referentially transparent code, if you have pure functions — all the data comes in its input arguments and everything goes out and leave no state behind — it’s incredibly reusable.
I do most of my work in object-oriented languages and I don’t see that changing any time soon. I’m more interested in functional techniques than functional languages: using pure functions, using functions as arguments and return values, etc. As Joe Armstrong says, such code is easier to reuse. If you want to reuse (or test) a functional banana, you don’t have to set up a stateful gorilla to hold the banana first.
Functional programming emphasizes “pure” functions, functions that have no side effects. When you call a pure function, all you need to know is the return value of the function. You can be confident that calling a function doesn’t leave any state changes that will effect future function calls.
But pure functions are only pure at a certain level of abstraction. Every function has some side effect: it uses memory, it takes CPU time, etc. Harald Armin Massa makes this point in his PyCon 2010 talk “The real harm of functional programming.” (His talk is about eight minutes into the February 21, 2010 afternoon lightning talks: video.)
Even pure functions in programming have side effects. They use memory. They use CPU. They take runtime. And if you look at those evil languages, they are quite fast at doing Fibonacci or something, but in bigger applications you get reports “Hmm, I have some runtime problems. I don’t know how to get it faster or what it going wrong.
Massa argues that the concept of an action without side effects is dangerous because it disassociates us from the real world. I disagree. I appreciate his warning that the “no side effect” abstraction may leak like any other abstraction. But pure functions are a useful abstraction.
You can’t avoid state, but you can partition the stateful and stateless parts of your code. 100% functional purity is impossible, but 85% functional purity may be very productive.
James Hague offers this assessment of functional programming:
My real position is this: 100% pure functional programing doesn’t work. Even 98% pure functional programming doesn’t work. But if the slider between functional purity and 1980s BASIC-style imperative messiness is kicked down a few notches — say to 85% — then it really does work. You get all the advantages of functional programming, but without the extreme mental effort and unmaintainability that increases as you get closer and closer to perfectly pure.
I found James Hague’s blog via a link from Greg Wilson. I’ve gone back through several posts on Hague’s blog Programming in the 21st Century and look forward to reading more.
Philip Wadler wrote an article a decade ago entitled Why no one uses functional languages. He begins the article by explaining that yes, there have been a number of large successful projects developed in functional programming languages. But compared to the number of programmers who work in procedural languages, the number working in functional languages is essentially zero. The reasons he listed fall into eight categories.
Lack of compatibility with existing code
Limited library support compared to popular languages
Lack of portability across operating systems
Small communities and correspondingly little community support
Inability to package code well for reuse
Lack of sophisticated tool support
Lack of training for new developers in functional programming
Lack of popularity
Most of these reasons do not apply to Microsoft’s new functional language F# since it is built on top of the .NET framework. For example, F# has access to the enormous Common Language Runtime library and smoothly interoperates with anything developed with .NET. And as far as tool support, Visual Studio will support F# starting with the 2010 release. Even portability is not a barrier: The Mono Project has been quite successful in porting .NET code to non-Microsoft platforms. (Listen to this Hanselminutes interview with Aaron Bockover for an idea how mature Mono is.)
The only issues that may apply to F# are training and popularity. Programmers receive far more training in procedural programming, and the popularity of procedural programming is self-reinforcing. Despite these disadvantages, interest in functional programming in general is growing. And when programmers want to learn a functional programming language, I believe many will choose F#.
It will be interesting to see whether F# catches on. It resolves many of the accidental difficulties of functional programming, but the intrinsic difficulties remain. Functional programming requires a different mindset, one that programmers have been reluctant to adopt. But now programmers have a new incentive to give functional languages a try: multi-core processors.
Individual processor cores are not getting faster, but we’re getting more of them per box. We have to write multi-threaded code to take advantage of extra cores, and multi-threaded programming in procedural languages is hard, beyond the ability of most programmers. Strict functional languages eliminate many of the difficulties with multi-threaded programming, and so it seems likely that at least portions of systems will be written in functional languages.
According to an article on The Snowtide Blog, MIT has decided to teach beginning CS classes in Python rather than Scheme. (Thanks to procoders.net for the link.) The article paraphrases remarks by Gerdald Sussman at the announcement.The main rationale was that the world is more complicated now than when the course was designed and SICP was written. Now programmers spend more time researching libraries than writing everything from scratch.
Here’s something I expected Sussman to say though apparently he did not: you can use Python as a functional language if you like, you just don’t have to. I don’t know why more people don’t emphasize this. Python is not a functional language, but Python does make it easy to declare functions on the fly, pass functions as arguments, etc. You’re free to write Scheme-like programs in Python if you want to, but you’re also free to use other styles when they are more appropriate.