Kalman filters and bottom-up learning


Kalman filtering is a mixture of differential equations and statistics. Kalman filters are commonly used in tracking applications, such as tracking the location of a space probe or tracking the amount of charge left in a cell phone battery. Kalman filters provide a way to synthesize theoretical predictions and actual measurements, accounting for error in both.
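Here is a minimal sketch of "accounting for error in both" in the simplest possible setting. There is one scalar quantity, one model prediction and one measurement, and both error variances are assumed known. Each source is weighted by the reciprocal of its variance, so the less reliable one counts for less. The function name combine and the numbers are illustrative only, not taken from any library.

```python
def combine(x_pred: float, var_pred: float, z: float, var_meas: float):
    """Blend a model prediction and a measurement, weighting each by 1/variance.

    x_pred, var_pred: theoretical prediction and its (assumed known) error variance
    z, var_meas:      actual measurement and its (assumed known) error variance
    """
    w_pred = 1.0 / var_pred          # confidence in the model
    w_meas = 1.0 / var_meas          # confidence in the sensor
    estimate = (w_pred * x_pred + w_meas * z) / (w_pred + w_meas)
    variance = 1.0 / (w_pred + w_meas)   # smaller than either input variance
    return estimate, variance

# Model predicts 10.0 (variance 4.0); sensor reads 12.0 (variance 1.0).
est, var = combine(10.0, 4.0, 12.0, 1.0)   # gives 11.6 and 0.8
```

The combined variance is smaller than either input variance, which is the payoff of fusing the two sources. For a scalar linear Gaussian model, this inverse-variance average is exactly what the Kalman filter's update step computes.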

Engineers naturally emphasize the differential equations and statisticians naturally emphasize the statistics. Both perspectives are valuable, but in my opinion and experience, the engineering perspective must come first.

From an engineering perspective, a Kalman filtering problem starts as a differential equation. In an ideal world, one would simply solve the differential equation and be done. But the experienced engineer realizes his or her differential equations don’t capture everything. (Unlike the engineer in this post.) Along the road to the equations at hand, there were approximations, terms left out, and various unknown unknowns.

The Kalman filter accounts for some level of uncertainty in the process dynamics and in the measurements taken. This uncertainty is modeled as randomness, but this doesn’t mean that there’s necessarily anything “random” going on. It simply acknowledges that random variables are an effective way of modeling miscellaneous effects that are unknown or too complicated to account for directly. (See Random is as random does.)
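Here is a sketch of how the engineering starting point becomes a filtering model. It assumes the simplest tracking dynamics: a body moving at constant velocity, with dx/dt = v and dv/dt = 0. Discretizing over a time step dt gives a transition matrix F. Whatever the equations leave out is lumped into a random acceleration with an assumed intensity q, and that becomes a process-noise covariance Q. The names constant_velocity_model and simulate, and the parameter values, are hypothetical.

```python
import numpy as np

def constant_velocity_model(dt: float, q: float):
    """Discretize dx/dt = v, dv/dt = 0 over a time step dt.

    The deterministic part is exact: exp(A*dt) with A = [[0, 1], [0, 0]].
    Unmodeled forces are treated as white-noise acceleration with
    intensity q (an assumed tuning parameter), giving the process-noise
    covariance Q.
    """
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])
    Q = q * np.array([[dt**3 / 3, dt**2 / 2],
                      [dt**2 / 2, dt]])
    return F, Q

def simulate(x0, F, Q, steps, rng):
    """Step the discretized model forward, adding random process noise."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        # w stands in for approximations and neglected terms, not true randomness
        w = rng.multivariate_normal(np.zeros(len(xs[-1])), Q)
        xs.append(F @ xs[-1] + w)
    return np.array(xs)

rng = np.random.default_rng(0)
F, Q = constant_velocity_model(dt=0.1, q=0.5)
path = simulate([0.0, 1.0], F, Q, steps=50, rng=rng)   # rows are [position, velocity]
```

Setting q to zero recovers the pure differential-equation solution. A positive q is an admission that the equations do not capture everything.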

The statistical approach to Kalman filtering is to say that it is simply another estimation problem. You start from a probability model and apply Bayes’ theorem. That probability model has a term inside that happens to come from a differential equation in practice, but this is irrelevant to the statistics. The basic Kalman filter is a linear model with normal probability distributions, and this makes a closed-form solution for the posterior possible.
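From the statistical side, one cycle of the filter is Bayes' theorem applied to Gaussians. The previous posterior N(m, P) is pushed through the linear dynamics to form a prior. Combining that prior with a Gaussian measurement likelihood gives another Gaussian posterior in closed form. Below is a minimal NumPy sketch of that recursion. The function name kalman_step, the constant-velocity matrices and the noise levels in the usage lines are assumptions chosen for illustration.

```python
import numpy as np

def kalman_step(m, P, z, F, Q, H, R):
    """One predict/update cycle of a linear-Gaussian Kalman filter.

    m, P : posterior mean and covariance from the previous step
    z    : new measurement vector
    F, Q : state transition matrix and process-noise covariance
    H, R : measurement matrix and measurement-noise covariance
    """
    # Predict: push the Gaussian belief through the linear dynamics (the prior).
    m_pred = F @ m
    P_pred = F @ P @ F.T + Q

    # Update: Gaussian prior times Gaussian likelihood is a Gaussian posterior.
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = np.linalg.solve(S, H @ P_pred).T          # Kalman gain P_pred H^T S^-1
    m_post = m_pred + K @ (z - H @ m_pred)
    P_post = (np.eye(len(m)) - K @ H) @ P_pred
    return m_post, P_post

# Usage sketch: track position and velocity from noisy position readings.
rng = np.random.default_rng(1)
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # assumed constant-velocity dynamics, dt = 1
Q = 0.01 * np.eye(2)                     # assumed process-noise covariance
H = np.array([[1.0, 0.0]])               # only position is measured
R = np.array([[4.0]])                    # assumed measurement-noise variance

m, P = np.zeros(2), 100.0 * np.eye(2)    # vague prior
true_x = np.array([0.0, 1.0])
for _ in range(30):
    true_x = F @ true_x
    z = H @ true_x + rng.normal(0.0, 2.0, size=1)
    m, P = kalman_step(m, P, z, F, Q, H, R)
```

Solving against S rather than forming its inverse is a common numerical choice. Production code often also uses the Joseph form of the covariance update to keep P symmetric and positive definite.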

You’d be hard pressed to start from a statistical description of Kalman filtering, such as that given here, and have much appreciation for the motivating dynamics. Vital details have simply been abstracted away. As a client told me once when I tried to understand his problem starting from the top down, “You’ll never get here from there.”

The statistical perspective is complementary. Some things are clear from the beginning with the statistical formulation that would take a long time to see from the engineering perspective. But while both perspectives are valuable, I believe it’s easier to start on the engineering end and work toward the statistics end rather than the other way around.

History supports this claim. The Kalman filter from the engineering perspective came first and its formulation in terms of Bayesian statistics came later. Except that’s not entirely true.

Rudolf Kálmán published his seminal paper in 1960, and four years later papers began to appear making the connection to Bayesian statistics. But while Kálmán and others were working in the US starting from the engineering end, Ruslan Stratonovich was working in Russia starting from the statistical end. Still, I believe it’s fair to say that most of the development and application of Kalman filters has proceeded from the engineering to the statistics rather than the other way around.

6 thoughts on “Kalman filters and bottom-up learning”

  1. Is there a Kalman flavor (closed form or otherwise) based on non-normal distributions?

  2. Ross: Kalman filters depend on having finite, constant-size sufficient statistics. The internal state that they update recursively is a set of sufficient statistics, and this must be bounded to be practical. And having such sufficient statistics is practically synonymous with closed-form posteriors, which is practically synonymous with conjugate priors.

    Since regression is a special case of Kalman filtering, Poisson regression would be a Kalman filter, though I don’t know if anyone thinks of it that way.

  3. Deep Variational Bayes Filters work with intractable posteriors. These arise from arbitrary likelihood functions and nonlinear transition functions.

    My colleagues Max Karl, Max Sölch, Patrick van der Smagt and I devised this method over the past year to combine deep learning with Bayesian filtering. The method relies heavily on variational inference in the form of stochastic gradient variational Bayes (SGVB), which uses an approximation of the posterior to estimate the model parameters.

    Check it out in case you are interested: http://arxiv.org/abs/1605.06432.

  4. There are some extensions, including Extended Kalman Filters (which allow nonlinear state dynamics) and Unscented Kalman Filters.

  5. @Scott, thanks. Extended filters handle nonlinearity, aye, but they linearize around the region of interest and, more importantly, still assume underlying Gaussian processes. (Interesting to be reminded of these, thank you…)

  6. @justin

    I am punching (way) above my weight class here, but you use LGMs as an example. Is this necessary, or sufficient? (Thanks for the arxiv link to your work.)
