Suppose you have a linear dynamic system. That is, the function that predicts the next state from the current state is linear. Suppose also that the states in your system are not known precisely but have some uncertainty modeled by a (multivariate) normal distribution. Then the uncertainty in the state at the next step also has a normal distribution, because a linear transformation of a normal distribution remains normal. This is a very high-level description of the classic Kalman filter.
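The "normal stays normal" fact is what makes the predict step of a linear Kalman filter a couple of matrix products. Here is a minimal sketch (variable names are my own, not standard): if the state is distributed N(m, P) and the dynamics are x' = Fx + w with process noise w ~ N(0, Q), the next state is distributed N(Fm, FPFᵀ + Q).

```python
import numpy as np

# A toy constant-velocity model: state = (position, velocity).
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # linear transition matrix
Q = 0.1 * np.eye(2)          # process noise covariance

m = np.array([0.0, 1.0])     # prior mean
P = np.eye(2)                # prior covariance

# Kalman predict step: the pushed-forward distribution is exactly normal.
m_next = F @ m               # new mean:       F m
P_next = F @ P @ F.T + Q     # new covariance: F P F^T + Q
```

No approximation is involved here; that only becomes necessary once F is replaced by a nonlinear function.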

When the transition from one state to another is nonlinear, the probability distribution around future states is not normal. There are many variations on the Kalman filter that amount to various approximations to get around this core difficulty: extended Kalman filters, unscented Kalman filters, particle filters, etc. Here I’ll just discuss unscented Kalman filters and particle filters. This will be a very hand-wavy discussion, but it will give the basic ideas.

It’s easy to push discrete random points through a nonlinear transformation. Calculating the effect of a nonlinear transformation on continuous random variables is more work. The idea of an unscented Kalman filter is to create a normal distribution that approximates the result of a nonlinear transformation by seeing what happens to a few carefully chosen points. The idea of particle filtering is to randomly generate a cloud of points and push these points through the nonlinear transformation.
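The two ideas can be contrasted in a few lines. The following is a deliberately simplified 1-D sketch, not a full filter: real UKFs use weighted sigma points with tuning parameters (α, β, κ), and real particle filters add weighting and resampling steps. Here both approaches just push points through an example nonlinearity and summarize the results with a mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.sin                       # an example nonlinear transformation

mu, var = 0.5, 0.04              # 1-D prior: N(0.5, 0.2^2)
sigma = np.sqrt(var)

# Unscented idea: a few deterministically chosen "sigma points".
points = np.array([mu, mu - sigma, mu + sigma])
y = f(points)
ukf_mean, ukf_var = y.mean(), y.var()   # fit a normal to the outputs

# Particle idea: a random cloud of points through the same transformation.
particles = rng.normal(mu, sigma, size=10_000)
yp = f(particles)
pf_mean, pf_var = yp.mean(), yp.var()
```

Both estimates of the transformed mean land close together and below sin(0.5), reflecting the concavity of sin near 0.5; the unscented version gets there with three function evaluations instead of ten thousand.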

This may make it sound like (unscented) Kalman filters and particle filters are trivial to implement. Far from it. The basic framework is easy to understand, but getting good performance on a particular problem can take a lot of work.

These sound interesting. Count me as one ‘Yes’ vote for posting about them further.

Yes, it is difficult to get good performance. In many cases the extended Kalman filter gave us better results than the UKF or particle filter. We also tried some advanced resampling techniques based on swarm intelligence algorithms, but they did not help much.

Moreover, in many cases we did not have any information about the initial parameters, and all of the nonlinear filtering algorithms were very sensitive to initial conditions.

It would be great to read some material about practical nonlinear filtering.

I’ve implemented and optimized lots of PID filters, many linear Kalman filters, a couple EKFs, and have completely failed in my attempts to use UKFs and PFs.

I’m encouraged by literature on the use of MNNs (Multi-layer Neural Nets) for optimizing UKFs and PFs, such as: https://www.researchgate.net/publication/3343462_Neural_network-aided_adaptive_unscented_Kalman_filter_for_nonlinear_state_estimation

I’m a neural net newbie, and have been delighted to see how a simple 6-node 3-layer network (http://molefrog.com/pidnn-talk/) can quickly and easily optimize a PID filter. This provides a useful check of neural nets against traditional PID tuning algorithms.

I’m also amazed how much GPU time you can get for $200 on AWS. No more multi-day runs on my dual-Xeon server.

The key, of course, is getting good training/testing/validation data. The GIGO Principle applies in its most vicious form when training neural nets. There’s also the need to avoid over-training, which means doing many training runs with variable data set sizes to see when training convergence first occurs.

I’d estimate that under 10% of the training I do is needed to get the answer I’m looking for. The other 90+% ensures I’m not fooling myself. I hope to improve on this as my experience grows.

I think somewhere you should mention that KF == properties of the normal distribution + repeated use of Bayes’ rule.

Hi Bob, you mention using the cloud to train and not wanting to overfit. Have you thought about using Google Cloud? They have home-grown TPUs (tensor processing units) optimized specifically for neural nets, which helps avoid overfitting, and they run faster than GPUs for neural net processing. Just a thought 🙂