Probabilistic Models of
Human and Machine Learning

CSCI 7222

Assignment 9

Assigned 12/3/2015
Due 12/14/2015


The goal of this assignment is to gain some experience with models that deal with sequential or temporal data.  

Part 1

Generate a time series using a special case of a linear dynamical system that is similar to an autoregressive model from the time-series literature. The hidden state of the dynamical system is a vector with 2 elements, which evolve over time according to:

$$y_2(t) = y_1(t-1) + \mu_1$$
$$y_1(t) = \alpha_1 y_1(t-1) + \alpha_2 y_2(t-1) + \mu_2$$

The observation will be a scalar:

$$z(t) = y_1(t) + \mu_3$$

where, for $i \in \{1,2,3\}$, $\mu_i \sim \mathcal{N}(0, \sigma^2)$, and you should pick $\{\alpha_1, \alpha_2, \sigma\}$ to obtain interesting dynamics.
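Substituting the first state equation, shifted back one step, into the second shows why this system resembles an autoregressive model. This derivation is offered as a supplementary aid, writing μi(t) for the noise drawn at time t; the stationarity conditions are the standard ones for an AR(2) process and are only a guide for choosing α1 and α2.

$$y_1(t) = \alpha_1 y_1(t-1) + \alpha_2 y_1(t-2) + \varepsilon(t), \qquad \varepsilon(t) = \mu_2(t) + \alpha_2\,\mu_1(t-1) \sim \mathcal{N}\big(0,\ (1+\alpha_2^2)\,\sigma^2\big)$$

Because ε(t) and ε(t+1) share no noise terms, ε is white noise, so y1 is an AR(2) process and z is that process plus observation noise. The process is stationary when

$$\alpha_1 + \alpha_2 < 1, \qquad \alpha_2 - \alpha_1 < 1, \qquad |\alpha_2| < 1,$$

and it shows damped oscillations when, in addition, $\alpha_1^2 + 4\alpha_2 < 0$ (complex roots of $\lambda^2 - \alpha_1\lambda - \alpha_2 = 0$).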

Plot a couple hundred points that are representative of this process.
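A minimal sketch of the Part 1 generator, written in plain Python so it runs without extra packages. The function names `is_stationary` and `simulate_lds`, the zero starting state `y_init`, and the seeded `random.Random` generator are assumptions made for illustration, not part of the assignment. Both state updates read the state from time t-1, so they are computed before the state is overwritten.

```python
import random


def is_stationary(alpha1, alpha2):
    """True if (alpha1, alpha2) lie in the AR(2) stationarity triangle.

    y1 on its own obeys y1(t) = alpha1*y1(t-1) + alpha2*y1(t-2) + noise,
    so the standard AR(2) conditions apply.
    """
    return alpha1 + alpha2 < 1 and alpha2 - alpha1 < 1 and abs(alpha2) < 1


def simulate_lds(alpha1, alpha2, sigma, T, seed=0, y_init=(0.0, 0.0)):
    """Simulate the Part 1 system for T steps; return lists (y1, y2, z).

    y_init = (y1(0), y2(0)) is an assumed starting state; the assignment
    does not specify one.
    """
    rng = random.Random(seed)
    y1_prev, y2_prev = y_init
    y1s, y2s, zs = [], [], []
    for _ in range(T):
        # Fresh N(0, sigma^2) draws for mu1, mu2, mu3 at every time step.
        mu1, mu2, mu3 = (rng.gauss(0.0, sigma) for _ in range(3))
        # Both updates use the t-1 state, so compute them before overwriting it.
        y2 = y1_prev + mu1
        y1 = alpha1 * y1_prev + alpha2 * y2_prev + mu2
        y1s.append(y1)
        y2s.append(y2)
        zs.append(y1 + mu3)  # scalar observation z(t)
        y1_prev, y2_prev = y1, y2
    return y1s, y2s, zs
```

For example, the hypothetical choice `simulate_lds(1.6, -0.9, 1.0, 200)` is stationary and has α1² + 4α2 < 0, so it produces noisy damped oscillations; the returned `z` can be passed to any plotting routine, such as `matplotlib.pyplot.plot(z)`.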

Part 2

Turn your model into a switched linear dynamical system with 3 modes.  Each mode has an associated set of {α1,α2} parameters.
Switching is memoryless: at each time step, with some small probability β, the mode m is re-drawn from the set {1,2,3} with uniform probability.

Draw a graphical model that depicts this switched dynamical system. Show the conditional distributions on each arc; e.g., write out the transition table for P(m(t) | m(t-1)).
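The transition table follows directly from the switching rule. Because a re-draw can land on the current mode, the probability of keeping the mode is 1 − β + β/3 = 1 − 2β/3, and each of the other two modes gets β/3, so the effective switching probability is 2β/3 rather than β. A minimal sketch that builds this table (the function name `transition_matrix` is hypothetical):

```python
def transition_matrix(beta, n_modes=3):
    """Return P with P[i][j] = P(m(t) = j | m(t-1) = i).

    With probability 1 - beta the mode is kept; with probability beta it is
    re-drawn uniformly from all n_modes modes, which may return the same mode.
    """
    move = beta / n_modes        # reach one particular mode via a re-draw
    stay = (1.0 - beta) + move   # keep the mode, or re-draw the same one
    return [[stay if i == j else move for j in range(n_modes)]
            for i in range(n_modes)]
```

For instance, with β = 0.06 each row is approximately [0.96, 0.02, 0.02] up to a permutation, and every row sums to 1.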

Generate a nice graph illustrating the switching dynamics.
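One way to produce such a graph is to simulate the switched system and record the active mode alongside z. The sketch below extends the Part 1 recursion under stated assumptions: the function name `simulate_switched_lds` is hypothetical, modes are 0-indexed, the hidden state starts at zero, and the mode drawn at time t sets the {α1, α2} used to produce y(t).

```python
import random


def simulate_switched_lds(alphas, beta, sigma, T, seed=0, m_init=0):
    """Simulate a 3-mode (or n-mode) switched version of the Part 1 system.

    alphas : list of (alpha1, alpha2) pairs, one per mode
    Returns (modes, y1, z) as lists of length T; modes are 0-indexed.
    Assumption: the mode at time t governs the update that produces y(t).
    """
    rng = random.Random(seed)
    n_modes = len(alphas)
    m = m_init
    y1_prev, y2_prev = 0.0, 0.0  # assumed starting state
    modes, y1s, zs = [], [], []
    for _ in range(T):
        # Memoryless switching: with probability beta, re-draw uniformly.
        if rng.random() < beta:
            m = rng.randrange(n_modes)
        a1, a2 = alphas[m]
        mu1, mu2, mu3 = (rng.gauss(0.0, sigma) for _ in range(3))
        y2 = y1_prev + mu1
        y1 = a1 * y1_prev + a2 * y2_prev + mu2
        modes.append(m)
        y1s.append(y1)
        zs.append(y1 + mu3)
        y1_prev, y2_prev = y1, y2
    return modes, y1s, zs
```

Hypothetical modes such as (1.6, −0.9), (0.5, 0.3) and (1.9, −0.95) behave visibly differently: a fast damped oscillation, a smooth non-oscillating drift, and a slow, lightly damped oscillation. Plotting `z` and shading each run of constant mode (for example with matplotlib's `axvspan`) makes the switches easy to see.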

Part 3 (Optional)

Now try to perform inference using the data set you generated in either part 1 or part 2.  

If you use the data set from part 1, you can model it with a Kalman filter (KF). Implement a KF (it's just a few lines of code in MATLAB) or find a toolbox, and plot a portion of the time series along with the posterior predictive distribution, P(z(t+1) | z(1), ..., z(t)). Since this distribution is Gaussian, plot its mean and some representation of uncertainty (e.g., lines at ±2 SD, or a shaded band).
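A minimal Kalman filter sketch for the Part 1 model, in plain Python with explicit 2×2 arithmetic. It assumes the generating parameters are known, takes the state to be x(t) = [y1(t), y2(t)] with transition matrix A = [[α1, α2], [1, 0]], process-noise covariance σ²I (μ2 drives y1 and μ1 drives y2), observation vector H = [1, 0] and observation variance σ², and uses an assumed prior N(m0, p0·I). The function name `kalman_one_step_predictive` is hypothetical.

```python
def kalman_one_step_predictive(z, alpha1, alpha2, sigma, m0=(0.0, 0.0), p0=1.0):
    """Return (means, variances): means[t], variances[t] describe the Gaussian
    predictive P(z[t] | z[0..t-1]), i.e. P(z(t+1) | z(1..t)) in 1-indexed form.
    """
    A = [[alpha1, alpha2], [1.0, 0.0]]
    q = r = sigma ** 2             # process (per component) and observation noise
    m = list(m0)
    P = [[p0, 0.0], [0.0, p0]]     # assumed prior covariance
    means, variances = [], []
    for obs in z:
        # Predict: m <- A m,  P <- A P A^T + q I
        m = [A[0][0] * m[0] + A[0][1] * m[1], A[1][0] * m[0] + A[1][1] * m[1]]
        AP = [[sum(A[i][k] * P[k][j] for k in range(2)) for j in range(2)]
              for i in range(2)]
        P = [[sum(AP[i][k] * A[j][k] for k in range(2)) + (q if i == j else 0.0)
              for j in range(2)] for i in range(2)]
        # Predictive for the next observation: N(m[0], P[0][0] + r), since H = [1, 0]
        s = P[0][0] + r
        means.append(m[0])
        variances.append(s)
        # Update with the observed value: gain k = P H^T / s
        k = [P[0][0] / s, P[1][0] / s]
        innov = obs - m[0]
        m = [m[0] + k[0] * innov, m[1] + k[1] * innov]
        P = [[P[i][j] - k[i] * P[0][j] for j in range(2)] for i in range(2)]
    return means, variances
```

Plotting `z` together with `means` and the band `means[t] ± 2 * variances[t] ** 0.5` gives the requested picture. In MATLAB or NumPy the same steps would be written with matrix products; the explicit loops here only keep the sketch dependency-free.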

If you use the data set from part 2, you could be ambitious and use a particle filter to perform inference, in which case you'll obtain a set of samples that represent the posterior predictive distribution. Or you could try using a model that isn't quite up to the task, e.g., an HMM with a Gaussian observation model (i.e., each state of the HMM outputs a constant value corrupted by Gaussian noise). You'd have to train the HMM, and it would require more than 3 states, since each state represents a constant level, but it would be interesting to see how poorly it models the data.
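For the part 2 data, a bootstrap particle filter is one option. The sketch below is minimal and unoptimized, and it rests on assumptions not stated in the assignment: the true {α1, α2} pairs, β and σ are known; the mode at time t sets the dynamics of y(t); and particles start at y = 0 with a uniformly random mode. The function name `bootstrap_particle_filter` is hypothetical. For each time step it returns samples from the one-step predictive distribution of z and the filtered mode probabilities.

```python
import math
import random


def bootstrap_particle_filter(z, alphas, beta, sigma, n_particles=1000, seed=0):
    """Bootstrap particle filter for the switched system (illustrative sketch).

    z      : observed series (list of floats)
    alphas : list of (alpha1, alpha2) pairs, one per mode (assumed known)
    Returns (predictive, mode_probs):
      predictive[t] : n_particles samples approximating P(z[t] | z[:t])
      mode_probs[t] : estimate of P(m(t) = k | z[:t+1]) for each mode k
    """
    rng = random.Random(seed)
    n_modes = len(alphas)
    # Assumed prior: zero hidden state, uniform mode. Each particle is (m, y1, y2).
    particles = [(rng.randrange(n_modes), 0.0, 0.0) for _ in range(n_particles)]
    predictive, mode_probs = [], []
    for obs in z:
        proposed, log_w, z_samples = [], [], []
        for m, y1, y2 in particles:
            # Propagate through the generative model (the "bootstrap" proposal).
            if rng.random() < beta:  # memoryless switching
                m = rng.randrange(n_modes)
            a1, a2 = alphas[m]
            new_y2 = y1 + rng.gauss(0.0, sigma)
            new_y1 = a1 * y1 + a2 * y2 + rng.gauss(0.0, sigma)
            proposed.append((m, new_y1, new_y2))
            # A draw of z(t) made before looking at obs: a predictive sample.
            z_samples.append(new_y1 + rng.gauss(0.0, sigma))
            # Gaussian log-likelihood of obs given this particle, up to a constant.
            log_w.append(-0.5 * ((obs - new_y1) / sigma) ** 2)
        predictive.append(z_samples)
        # Subtract the max before exponentiating to avoid underflow.
        w_max = max(log_w)
        weights = [math.exp(w - w_max) for w in log_w]
        # Multinomial resampling in proportion to the weights.
        particles = rng.choices(proposed, weights=weights, k=n_particles)
        counts = [0] * n_modes
        for m, _, _ in particles:
            counts[m] += 1
        mode_probs.append([c / n_particles for c in counts])
    return predictive, mode_probs
```

Plotting, say, the 5th and 95th percentiles of each `predictive[t]` against `z` gives a band comparable to the ±2 SD band of the Kalman filter, and `mode_probs` can be compared with the true modes recorded during generation.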