Learning Nonlinear Dynamic Models (2009)

Authors

Abstract

We present a novel approach for learning nonlinear dynamic models, which leads to a new set of tools capable of solving problems that are otherwise difficult. We provide theory showing this new approach is consistent for models with long range structure, and apply the approach to motion capture and high-dimensional video data, yielding results superior to standard alternatives.

Discussion

Bob Williamson, 2009/06/15 15:00

There's a lot of work from the 1960s on the sufficient-statistic idea for predicting dynamical systems. The motivation was to generalise the Kalman filter. If the transition and observation functions are linear and the observation and process noise are Gaussian, then of course the mean and covariance are sufficient statistics.
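To make the linear-Gaussian case concrete, here is a minimal NumPy sketch of one Kalman filter step. The function name `kalman_step` and the matrix names `A`, `C`, `Q`, `R` are assumptions chosen for illustration, not notation from the paper. The point it shows is that the belief about the state is carried forward entirely by a mean and a covariance.

```python
import numpy as np

def kalman_step(mean, cov, y, A, C, Q, R):
    """One predict/update cycle for the assumed model
    x_next = A x + w,  y = C x + v,  w ~ N(0, Q),  v ~ N(0, R)."""
    # Predict: push the Gaussian belief through the linear dynamics.
    mean_pred = A @ mean
    cov_pred = A @ cov @ A.T + Q
    # Update: condition the predicted Gaussian on the new observation y.
    S = C @ cov_pred @ C.T + R              # innovation covariance
    K = cov_pred @ C.T @ np.linalg.inv(S)   # Kalman gain
    mean_new = mean_pred + K @ (y - C @ mean_pred)
    cov_new = (np.eye(len(mean)) - K @ C) @ cov_pred
    return mean_new, cov_new
```

Nothing besides `mean` and `cov` is passed from one step to the next, which is the sense in which they are sufficient statistics here.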

Sorenson is an author I recall, and this is a survey paper:

@article{alspach1972nbe,
  title={{Nonlinear Bayesian estimation using Gaussian sum approximations}},
  author={Alspach, D. and Sorenson, H.},
  journal={IEEE Transactions on Automatic Control},
  volume={17},
  number={4},
  pages={439--448},
  year={1972}
}

This may (or may not!) be relevant to your stuff.

I would be interested to understand your invertibility condition in the time-invariant linear case. My (wild) guess is it corresponds to what the control engineers call “observability” http://en.wikipedia.org/wiki/Observability

John Langford, 2009/06/15 17:09

Thanks. This paper looks similar in spirit to the Rao-Blackwellized particle filter, where you think of the posterior as a sum of Gaussians represented by particle-filter points. A key difference compared to this paper is that we aren't assuming that the dynamics are known.

Invertibility appears to be a weaker condition than observability. Observability says “after some fixed number of observations, you can infer the state”. Invertibility says “given the conditional distribution over the observation, you can infer the conditional distribution over the state”. In particular, after one observation you might not be able to infer the state, but you might be able to infer the correct distribution over states. In the other direction, if you have observability after n observations, then you must have invertibility for any sequence of n observations.
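For the time-invariant linear case, the control-theory notion of observability can be checked with the standard rank test on the observability matrix. The sketch below is only an illustration of that textbook test; the helper names and the toy position/velocity system are assumptions, and it does not implement the paper's invertibility condition.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1) for an n-dimensional state."""
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def is_observable(A, C):
    """The pair (A, C) is observable iff the observability matrix has full rank n."""
    rank = np.linalg.matrix_rank(observability_matrix(A, C))
    return bool(rank == A.shape[0])

# Toy system (assumed for illustration): state = (position, velocity).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
C_pos = np.array([[1.0, 0.0]])  # observe position only
C_vel = np.array([[0.0, 1.0]])  # observe velocity only

pos_ok = is_observable(A, C_pos)  # two position readings determine the velocity
vel_ok = is_observable(A, C_vel)  # velocity readings never reveal the position
```

With `C_pos`, a single observation does not determine the state, but two consecutive observations do, which matches the “after some fixed number of observations” reading of observability.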

John Langford, 2009/06/15 17:24

I finally figured out a minimal example: if you get the example, you get the mathematics in the paper.

(1) Suppose we have 3 observations in sequence: x_1, x_2, and x_3.
(2) Suppose we form a predictor p'(x_2|x_1) of p(x_2|x_1) using whatever technology you prefer.
(3) Now, think of p' as a circuit and partition the circuit into two pieces, with inputs in one partition and outputs in the other. Record the values of all wires crossing the partition and call that vector u_2.
(4) Now, form a predictor p'(x_3|x_2,u_2) of p(x_3|x_2,u_2).
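The four steps can be sketched in a few lines of NumPy. This is a rough illustration under loud assumptions, not the paper's experimental setup: the toy data (noisy readings of a hidden 2-D rotation), the random tanh first layer, the least-squares fits, and the use of point predictions in place of conditional distributions are all choices made here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_linear(F, Y):
    """Least-squares weights W minimizing ||F W - Y||^2."""
    W, *_ = np.linalg.lstsq(F, Y, rcond=None)
    return W

# Step 1: N independent sequences x_1, x_2, x_3 (toy data, assumed).
# The hidden state rotates in 2-D; only its first coordinate is observed, with noise.
N, theta = 2000, 0.5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
s = rng.normal(size=(N, 2))
xs = []
for _ in range(3):
    xs.append(s[:, :1] + 0.1 * rng.normal(size=(N, 1)))
    s = s @ R.T
x1, x2, x3 = xs

# Step 2: a predictor of x_2 from x_1 with one hidden layer.
# The first layer is random and fixed (an assumption); only the output layer is fit.
W_in = rng.normal(size=(1, 8))
b_in = rng.normal(size=(8,))

def hidden(x):
    return np.tanh(x @ W_in + b_in)

# Step 3: cut the "circuit" at the hidden layer; the wire values are u_2.
u2 = hidden(x1)
W_out = fit_linear(u2, x2)

# Step 4: predict x_3 from (x_2, u_2), and from x_2 alone for comparison.
F_with_u = np.hstack([x2, u2])
W3 = fit_linear(F_with_u, x3)
mse_with_u = float(np.mean((F_with_u @ W3 - x3) ** 2))

W3_base = fit_linear(x2, x3)
mse_without_u = float(np.mean((x2 @ W3_base - x3) ** 2))
```

Comparing `mse_with_u` against `mse_without_u` shows how much information about x_1 the vector u_2 carries forward; on training data the first can never exceed the second, since x_2 is among the features in both fits.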

The consistency statement says that if your p' converges to the true probability estimate (and the true system is invertible), then you get a good estimate for p(x_3|x_2,x_1). In other words, good prediction creates a sufficient statistic for the belief state as a side effect.

John Langford, 2009/06/15 17:37

A pointer to the hunch.net discussion: http://hunch.net/?p=777

I also wanted to address a comment at the end of Russ's talk. Someone said “u is really a state”, and Russ said yes. This is correct, but delicate. If you are thinking of the general technique for transforming a POMDP into an MDP, where the belief state becomes a state, that's right. But if you aren't, it's important to understand that u is summarizing a belief state.

Another detail: I didn't fully understand Nando's comment at the time, but we had a chance to talk afterwards. The basic point was that for the particular architecture we used in the experiments, it's possible to directly apply EM to an associated probabilistic model, which would be a natural competitor to this approach. I expect the approach here will win in such a comparison, because the local training process avoids local minima issues, but we'll look into trying it. More generally, there are architectures without an associated probabilistic semantics to which this technique could be applied.

Anonymous, 2009/07/23 13:38

I don't follow how invertibility is weaker than observability. From the paper's definition, it appears that we need to be able to perfectly recover the state from a single observation.

 
paper/2009/295.txt · Last modified: 2009/05/24 18:42 (external edit)
 