# Computing higher moments with a fold

Folds in functional programming are often introduced as a way to find the sum or product of items in a list. In this case the fold state has the same type as the list items. But more generally the fold state could have a different type, and this allows more interesting applications of folds. Previous posts look at using folds to update conjugate Bayesian models and numerically solve differential equations.

This post uses a fold to compute mean, variance, skewness, and kurtosis. See this earlier post for an object-oriented approach. The code below seems cryptic out of context. The object-oriented post gives references for where these algorithms are developed. The important point for this post is that we can compute mean, variance, skewness, and kurtosis all in one pass through the data even though textbook definitions appear to require at least two passes. It’s also worth noting that the functional version is less than half as much code as the object-oriented version.

(Algorithms that work in one pass through a stream of data, updating for each new input, are sometimes called “online” algorithms. This is unfortunate now that “online” has come to mean something else.)

The Haskell function `moments` below returns the number of samples and the mean, but does not directly return variance, skewness and kurtosis. Instead it returns moments from which these statistics can easily be calculated using the `mvks` function.

```    moments (n, m1, m2, m3, m4) x = (n', m1', m2', m3', m4')
where
n' = n + 1
delta = x - m1
delta_n = delta / n'
delta_n2 = delta_n**2
t = delta*delta_n*n
m1' = m1 + delta_n
m4' = m4 + t*delta_n2*(n'*n' - 3*n' + 3) + 6*delta_n2*m2 - 4*delta_n*m3
m3' = m3 + t*delta_n*(n' - 2) - 3*delta_n*m2
m2' = m2 + t

mvsk (n, m1, m2, m3, m4) = (m1, m2/(n-1.0), (sqrt n)*m3/m2**1.5, n*m4/m2**2 - 3.0)
```

Here’s an example of how you would use this Haskell code to compute statistics for the list [2, 30, 51, 72]:

```    ghci>  mvsk \$ foldl moments (0,0,0,0,0) [2, 30, 51, 72]
(38.75, 894.25,-0.1685, -1.2912)
```

The `foldl` applies `moments` first to its initial value, the 5-tuple of zeros. Then it iterates over the data, taking data points one at a time and visiting each point only once, returning a new state from `moments` each time. Another way to say this is that after processing each data point, `moments` returns the 5-tuple that it would have returned if that data only consisted of the values up to that point.

For a non-numerical example of folds, see my post on sorting.

## 7 thoughts on “Computing higher moments with a fold”

1. Unfortunately, if floating point arithmetic is used, beware these kinds of mathematical transformations…

2. Sjoerd Visscher

Looking at the OO approach, you could even define a RunningStats Monoid, and then use foldMap! (I love foldMap for its elegance.)

Altough I think that would be just a curiosity, and a lot slower than your foldl implementation.

3. Andre

What a funny coincidence. I study computer science at a large German university and today we were introduced to moments in our math class and after that we had excercises where we had to write fold ops (foldl, foldr, foldb, etc.) in generic lambda calculus (our tutor uses Haskell to practice abstractions and applications) and then I find this post somewhere down the twitter stream :)

4. David W Locke

If only statistics was dynamic and watching these values was what we intended to do. Too much reliance on snapshot samples these days. Once we have parallel computing, we’ll still be dealing with serialized data.