Probabilistic Models of
Human and Machine Learning

CSCI 5822

Assignment 9

Assigned 4/12/2018
Due  5/3/2018

Your options for this assignment

I'm hoping that folks in the class would like to do a final project. I will describe a couple of options in class, and some of your colleagues may propose projects as well.  As an alternative to a final project, I give you the option of doing a somewhat larger homework assignment, which is laid out below. This homework assignment is more open-ended than the previous assignments, so you have an opportunity to be creative and explore.

If you are concerned with grades or making up for missed work, we will look most favorably on an ambitious final project (Option 1), then next most favorably on Option 2 if you go above and beyond the bare minimum.

Option 1: Final Project

I described two possible projects in class. One project involves using Bayesian optimization to discover an optimal training policy for simulated students. These lecture notes describe the project.  The other project involves predicting human forgetting over time. Here are slides that describe the forgetting project, and here are a few pages of a paper draft.  I also encourage you to devise your own project, ideally one that you can tie into your own research interests. We'll spend some time in class on April 17 discussing possible projects.


Option 2: Bayesian Optimization and Time Series

For Parts 1a and 1b, you will explore Bayesian optimization using Gaussian processes. I would like you to get first-hand experience setting priors over hyperparameters, optimizing the hyperparameters, iteratively running experiments, and observing the convergence to a global optimum. You are welcome to write your own software, but you can also use existing software packages. Here are a few packages I know about: MOE, Spearmint, BayesOpt, PyBO.

For Parts 2a-c, you will explore time series models.

Part 1a

Write code to implement a function over which we will optimize. This function takes an input vector and outputs a function value. I'd suggest using a 2-dimensional input space so that you can easily visualize both the function and the progress of Bayesian optimization.  For example, your function might be based on a mixture of Gaussians added to a linear function of the inputs. Make the function complex enough that there are some local optima and that the global optimum isn't in one corner of the space. I'd also like your function to add noise to the returned value, e.g., zero-mean Gaussian noise. Make a plot of the expected value of the function.
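Here's a minimal sketch of what such a function might look like in Python; the particular bump locations, weights, and noise level below are arbitrary illustrative choices, not requirements.

# A noisy 2-D test function: a mixture of Gaussian bumps plus a linear
# trend, with zero-mean Gaussian observation noise. All constants here are
# just one choice that gives a local optimum and an off-corner global optimum.
import numpy as np
import matplotlib.pyplot as plt

# (center, weight) pairs; one negative weight adds a dip to the surface.
BUMPS = [(np.array([2.0, 7.0]), 3.0),
         (np.array([7.0, 3.0]), 2.0),
         (np.array([5.0, 5.0]), -1.5)]
LINEAR = np.array([0.05, -0.03])   # mild linear trend
NOISE_SD = 0.1                     # observation noise standard deviation

def expected_value(x):
    """Noise-free value of the objective at a 2-D point x."""
    x = np.asarray(x, dtype=float)
    val = LINEAR @ x
    for center, weight in BUMPS:
        val += weight * np.exp(-0.5 * np.sum((x - center) ** 2))
    return val

def query(x, rng=np.random.default_rng()):
    """Noisy observation of the objective, as the assignment asks for."""
    return expected_value(x) + rng.normal(0.0, NOISE_SD)

# Plot the expected value over the square [0, 10] x [0, 10].
g = np.linspace(0, 10, 200)
X, Y = np.meshgrid(g, g)
Z = np.array([[expected_value([x, y]) for x in g] for y in g])
plt.contourf(X, Y, Z, levels=30)
plt.colorbar(); plt.xlabel('x1'); plt.ylabel('x2')
plt.show()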

Part 1b

Run Bayesian optimization with queries being answered by calls to your function. For the Bayesian optimization, do not build in any knowledge you have about the solution; e.g., allow the optimization procedure to estimate the function mean and observation noise variance.  Describe the assumptions you made in your model and report on the outcome of your experiments, e.g., report how close the estimated optimum is to the true optimum as more experiments are performed. To get a reliable estimate of convergence, you'll need to run the optimization process multiple times.

Test at least two different GP covariance functions, and run the optimization process enough times that you can determine which covariance function converges most rapidly. Summarize your results in graphs and a few sentences.

Test at least two different acquisition functions (i.e., schemes for performing active selection), and run the optimization procedure enough times that you can determine which acquisition function is more effective. Summarize your results in graphs and a few sentences.
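If you'd rather roll your own optimizer than use one of the packages listed above, here is a rough sketch of the loop built on scikit-learn's Gaussian process regressor, assuming the query() function from the Part 1a sketch. The expected-improvement acquisition and the two kernels shown are just one possible configuration; swapping a different acquisition rule (e.g., an upper-confidence-bound score) in place of expected_improvement covers the comparison asked for in the previous paragraph.

# A hand-rolled Bayesian-optimization loop on top of scikit-learn,
# assuming query() from the Part 1a sketch is in scope.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern, WhiteKernel, ConstantKernel

def expected_improvement(mu, sd, best):
    """EI for maximization, given GP posterior mean/sd and incumbent best."""
    sd = np.maximum(sd, 1e-9)
    z = (mu - best) / sd
    return (mu - best) * norm.cdf(z) + sd * norm.pdf(z)

def run_bo(kernel, n_iters=30, n_init=5, rng=np.random.default_rng(0)):
    X = rng.uniform(0, 10, size=(n_init, 2))       # random initial design
    y = np.array([query(x, rng) for x in X])
    for _ in range(n_iters):
        # WhiteKernel lets the GP estimate the observation-noise variance;
        # normalize_y handles the unknown function mean.
        gp = GaussianProcessRegressor(kernel=kernel + WhiteKernel(),
                                      normalize_y=True).fit(X, y)
        cand = rng.uniform(0, 10, size=(2000, 2))   # random candidate points
        mu, sd = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sd, y.max()))]
        X = np.vstack([X, x_next])
        y = np.append(y, query(x_next, rng))
    return X, y

# Compare two covariance functions; repeat over seeds for reliable curves.
for kern in [ConstantKernel() * RBF(), ConstantKernel() * Matern(nu=2.5)]:
    X, y = run_bo(kern)
    print(kern, 'best x found:', X[np.argmax(y)])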

Part 2a

Generate a time series using a special case of a linear dynamical system that is similar to an autoregressive model in the time series literature. The hidden state of the dynamical system will be a vector with 2 elements, which evolve over time according to:

y2(t) = y1(t-1) + μ1
y1(t) = α1 y1(t-1) + α2 y2(t-1) + μ2

The observation will be a scalar:

z(t) = y1(t) + μ3

where, for i = {1, 2, 3}, μi ~ Normal(0, σ²), and you should pick {α1, α2, σ} to obtain interesting dynamics.

Plot a couple hundred points that are representative of this process.
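A direct sketch of this generative process in Python follows; the parameter values shown are just one choice that yields noisy oscillatory dynamics, not a required setting.

# Simulate the Part 2a linear dynamical system and plot the observations.
import numpy as np
import matplotlib.pyplot as plt

alpha1, alpha2, sigma = 1.6, -0.8, 0.5   # complex roots -> damped oscillation
T = 300
rng = np.random.default_rng(1)

y = np.zeros((T, 2))      # hidden state: columns are [y1, y2]
z = np.zeros(T)           # scalar observation
for t in range(1, T):
    y[t, 1] = y[t-1, 0] + rng.normal(0, sigma)                            # y2(t)
    y[t, 0] = alpha1*y[t-1, 0] + alpha2*y[t-1, 1] + rng.normal(0, sigma)  # y1(t)
    z[t] = y[t, 0] + rng.normal(0, sigma)                                 # z(t)

plt.plot(z)
plt.xlabel('t'); plt.ylabel('z(t)')
plt.show()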

Part 2b

Turn your model into a switched linear dynamical system with 3 modes.  Each mode has an associated set of {α1, α2} parameters.
Switching is memoryless, i.e., at each time step, with some small probability β, the mode m is redrawn from the set {1, 2, 3} with uniform probability.

Draw a graphical model that depicts this switched dynamical system.  Show the conditional distributions on each arc, e.g., write out the transition table for P(m(t) | m(t-1)).

Generate a nice graph illustrating the switching dynamics.
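One way to extend the Part 2a simulator to the switched model is sketched below; the per-mode (α1, α2) pairs and the switch probability β are illustrative choices only.

# A 3-mode switched LDS built on the Part 2a equations.
import numpy as np
import matplotlib.pyplot as plt

modes = [(1.6, -0.8), (0.3, 0.5), (-1.2, -0.5)]   # (alpha1, alpha2) per mode
beta, sigma, T = 0.02, 0.5, 500
rng = np.random.default_rng(2)

m = np.zeros(T, dtype=int)    # mode sequence
y = np.zeros((T, 2))          # hidden state [y1, y2]
z = np.zeros(T)               # scalar observation
for t in range(1, T):
    # Memoryless switching: with probability beta, redraw the mode uniformly.
    m[t] = rng.integers(3) if rng.random() < beta else m[t-1]
    a1, a2 = modes[m[t]]
    y[t, 1] = y[t-1, 0] + rng.normal(0, sigma)
    y[t, 0] = a1*y[t-1, 0] + a2*y[t-1, 1] + rng.normal(0, sigma)
    z[t] = y[t, 0] + rng.normal(0, sigma)

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(z); ax1.set_ylabel('z(t)')
ax2.step(range(T), m); ax2.set_ylabel('mode'); ax2.set_xlabel('t')
plt.show()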

Part 2c

Now try to perform inference using the data set you generated in either part 2a (easy option) or part 2b (harder option).  

If you use the data set from part 2a, then you can model it with a Kalman filter. Implement a KF and plot a portion of the time series along with the posterior predictive distribution, P(z(t+1) | z(1), ..., z(t)). Since this distribution is Gaussian, plot its mean and some representation of its uncertainty (e.g., lines at +/- 2 SD, or the shaded region one sometimes sees).
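Here is a minimal Kalman filter sketch for the part 2a model, assuming the series z and the true parameters (alpha1, alpha2, sigma) from the simulator above are available; it computes the one-step-ahead predictive mean and standard deviation just before each update.

# Kalman filter for the Part 2a model in state-space form:
# state x = [y1, y2], A = [[alpha1, alpha2], [1, 0]], H = [1, 0],
# process noise Q = sigma^2 * I, observation noise R = sigma^2.
import numpy as np
import matplotlib.pyplot as plt

A = np.array([[alpha1, alpha2], [1.0, 0.0]])
H = np.array([[1.0, 0.0]])
Q = sigma**2 * np.eye(2)
R = np.array([[sigma**2]])

x, P = np.zeros(2), np.eye(2)          # filtering mean and covariance
pred_mean, pred_sd = [], []            # one-step-ahead predictive of z
for t in range(len(z)):
    # Predict step: one-step-ahead distribution of the state and observation.
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    s = H @ P_pred @ H.T + R           # predictive variance of z(t)
    pred_mean.append((H @ x_pred)[0])
    pred_sd.append(np.sqrt(s[0, 0]))
    # Update step: condition on the observed z(t).
    K = P_pred @ H.T @ np.linalg.inv(s)
    x = x_pred + K @ (z[t] - H @ x_pred)
    P = (np.eye(2) - K @ H) @ P_pred

pred_mean, pred_sd = np.array(pred_mean), np.array(pred_sd)
t = np.arange(len(z))
plt.plot(t, z, '.', label='observed z(t)')
plt.plot(t, pred_mean, label='predictive mean')
plt.fill_between(t, pred_mean - 2*pred_sd, pred_mean + 2*pred_sd, alpha=0.3)
plt.legend(); plt.xlabel('t'); plt.show()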

If you use the data set from part 2b, you could be ambitious and use a particle filter to perform inference, in which case you'll obtain a set of samples that represent the posterior predictive distribution. Or you could try using a model that isn't quite up to the task, e.g., an HMM with a Gaussian observation model (i.e., each state of the HMM outputs a constant value corrupted by Gaussian noise). You'd have to train the HMM, and it would require more than 3 states, since each state represents a constant level, but it would be interesting to see how poorly it models the data.
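If you go the particle filter route, here is a rough bootstrap-filter sketch for the part 2b model, assuming z, modes, beta, and sigma from the part 2b simulator above; this is one of several reasonable ways to set it up, not the only one.

# Bootstrap particle filter: each particle carries a mode and a 2-D state.
import numpy as np

N = 2000
rng = np.random.default_rng(3)
pm = rng.integers(3, size=N)           # particle modes
py = np.zeros((N, 2))                  # particle hidden states [y1, y2]
pred_samples = []                      # samples from P(z(t+1) | z(1:t))

for t in range(len(z)):
    # Propagate: redraw each particle's mode with prob. beta, then step the LDS.
    switch = rng.random(N) < beta
    pm = np.where(switch, rng.integers(3, size=N), pm)
    a = np.array(modes)[pm]            # (N, 2) array of (alpha1, alpha2)
    y1 = a[:, 0]*py[:, 0] + a[:, 1]*py[:, 1] + rng.normal(0, sigma, N)
    y2 = py[:, 0] + rng.normal(0, sigma, N)
    py = np.column_stack([y1, y2])
    # Predictive samples of z(t), then weight by the actual observation.
    pred_samples.append(y1 + rng.normal(0, sigma, N))
    w = np.exp(-0.5*((z[t] - y1)/sigma)**2) + 1e-300
    w /= w.sum()
    # Resample particles in proportion to their weights.
    idx = rng.choice(N, size=N, p=w)
    pm, py = pm[idx], py[idx]

# pred_samples[t] now holds samples from the one-step-ahead predictive at time t.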