Probabilistic Models of
Human and Machine Intelligence

CSCI 7222
Fall 2012

Assigned 11/8/2012
Due 11/17/2012

Goal

To further develop the model we discussed in class that infers a student's latent learning state from their performance over a sequence of problems.  The relevant material is contained in the lecture of 11/6. If you need additional background, see the optional paper in the syllabus for 11/6.

Baker's data is available here.  In addition to the data, there is a file called notes.txt that explains the various data files and contains some summary statistics.  There is also a file called process.m, which contains MATLAB code I wrote to read the Excel spreadsheets and pull out the data that we care about.

Task 1

Make a concrete suggestion for computing P(T_s | {X_{s,i}}, hyperparameters) in the graphical model we came up with in class (slide 14 of the lecture notes).  This posterior is the probability distribution over the (discrete) time at which student s learns the concept, given not only the data from that particular student but also the data from the entire population.  It's almost certainly the case that the posterior can't be computed analytically, but if you can do that, fabulous!  Otherwise, you will have to suggest a sampling scheme.  The variables that need to be jointly sampled are: α_0, α_1, ρ_0, ρ_1, λ, γ, and T.  You should write out equations (as relevant), describe proposal distributions (as relevant), and possibly even present pseudocode.

I don't have a good answer to this problem.  Some of the variables can be efficiently sampled via Gibbs sampling.  To do Gibbs sampling, you'll have to determine the Markov blanket of each variable and compute its probability conditioned on its Markov blanket.  I suspect that other variables cannot be sampled via Gibbs sampling, so you'll need an inner loop (with Gibbs sampling in the outer loop) to do sampling for the nasty variables.
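To make the outer-loop/inner-loop idea concrete, here is a minimal sketch on a toy stand-in, not the model from slide 14.  Everything in it is an assumption made for illustration: responses are Bernoulli(p0) before the learning time T and Bernoulli(p1) from T onward, T has a Poisson(lam) prior truncated to 0..n, and the function names are hypothetical.  The first part draws a discrete variable by enumerating its conditional, which is what a Gibbs update amounts to when the variable takes finitely many values.  The second part is one random-walk Metropolis-Hastings update, one possible choice for a variable whose conditional has no standard form.

```python
import math
import random


def log_poisson_pmf(t, lam):
    """Log of the Poisson(lam) probability mass at integer t >= 0."""
    return t * math.log(lam) - lam - math.lgamma(t + 1)


def conditional_over_T(x, p0, p1, lam):
    """Normalized P(T = t | x, p0, p1, lam) for t = 0..len(x).

    Toy assumption: responses x[i] with i < t are Bernoulli(p0) (not yet
    learned) and the rest are Bernoulli(p1) (learned). Requires 0 < p0, p1 < 1.
    """
    log_post = []
    for t in range(len(x) + 1):
        lp = log_poisson_pmf(t, lam)
        for i, xi in enumerate(x):
            p = p0 if i < t else p1
            lp += math.log(p if xi == 1 else 1.0 - p)
        log_post.append(lp)
    m = max(log_post)  # subtract the max before exponentiating, for stability
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


def gibbs_draw_T(x, p0, p1, lam, rng=random):
    """Gibbs update: draw T exactly from its enumerated conditional."""
    probs = conditional_over_T(x, p0, p1, lam)
    return rng.choices(range(len(x) + 1), weights=probs, k=1)[0]


def metropolis_step_positive(current, log_target, step=0.3, rng=random):
    """One random-walk Metropolis-Hastings update for a positive scalar.

    The proposal is a Gaussian step on log(current); the log(proposal/current)
    term is the Hastings correction for that asymmetric proposal.
    """
    proposal = current * math.exp(step * rng.gauss(0.0, 1.0))
    log_accept = (log_target(proposal) - log_target(current)
                  + math.log(proposal) - math.log(current))
    if rng.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return current
```

In a full sampler, log_target would be the log of the variable's unnormalized conditional given its Markov blanket, and one or more such Metropolis updates would form the inner loop inside each Gibbs sweep.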

Task 2

We have a lot of hyperparameters that have to be set by hand.  Based on your understanding of the Gamma, Poisson, and Beta distributions (i.e., their means and variances), what values would you assign to these hyperparameters?  Briefly justify.
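One way to turn beliefs about means and variances into hyperparameter values is moment matching.  The sketch below assumes a Beta(a, b) distribution with mean a/(a+b) and variance ab/((a+b)^2 (a+b+1)), and a Gamma distribution in the shape/rate parameterization with mean shape/rate and variance shape/rate^2; if the lecture notes use a scale parameterization instead, the rate would need to be inverted.  The function names are hypothetical and the numbers in the usage lines are purely illustrative, not recommended settings.

```python
def beta_from_mean_var(mean, var):
    """Return (a, b) such that Beta(a, b) has the requested mean and variance."""
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    nu = mean * (1.0 - mean) / var - 1.0  # nu = a + b, the concentration
    return mean * nu, (1.0 - mean) * nu


def gamma_from_mean_var(mean, var):
    """Return (shape, rate) such that Gamma(shape, rate) has this mean and variance."""
    if mean <= 0.0 or var <= 0.0:
        raise ValueError("mean and variance must be positive")
    return mean * mean / var, mean / var


# Illustrative inputs only: a Beta with mean 0.2 and standard deviation 0.1,
# and a Gamma with mean 4 and variance 2.
a, b = beta_from_mean_var(0.2, 0.1 ** 2)
shape, rate = gamma_from_mean_var(4.0, 2.0)
print(a, b, shape, rate)
```

For the Poisson there is nothing to match separately, since its mean and variance both equal its single parameter.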

Optional

If you have ideas for an improved version of the model, propose them here.  The changes can be small or large.  For example, Nicole (I believe) suggested augmenting the model to incorporate multiple learning tasks.  As another example, we talked about alternative prior distributions, such as a Geometric distribution instead of a Poisson.  And finally, we're covering sequential models in class now, and a model that takes advantage of sequences might be useful.  Sequence models such as HMMs or linear dynamical systems will probably be useful only if we allow for more than 2 states of knowledge (don't know and know).  But extending the model to allow for graded states of knowledge would be an interesting direction to move in.  We can use the likelihood of a test set to compare our models.
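As a small illustration of how the Geometric and Poisson alternatives differ as priors over a discrete learning time, the sketch below compares the two with matched means.  It assumes the Geometric is defined on 0, 1, 2, ... with per-trial probability p, so that its mean is (1-p)/p and its variance is (1-p)/p^2; the function names and the mean of 5 are hypothetical choices for illustration.

```python
import math


def poisson_pmf(t, lam):
    """Poisson(lam) probability mass at integer t >= 0."""
    return math.exp(t * math.log(lam) - lam - math.lgamma(t + 1))


def geometric_pmf(t, p):
    """Geometric probability mass at t = 0, 1, 2, ... (failures before first success)."""
    return p * (1.0 - p) ** t


def matched_geometric_p(mean):
    """Per-trial probability p whose Geometric on 0, 1, 2, ... has the given mean."""
    return 1.0 / (1.0 + mean)


mean = 5.0  # illustrative value, not a recommended setting
p = matched_geometric_p(mean)
for t in range(0, 16, 3):
    print(t, round(poisson_pmf(t, mean), 4), round(geometric_pmf(t, p), 4))
```

With equal means, the Geometric puts its mode at 0 and has variance mean * (1 + mean), larger than the Poisson's variance of mean.  It corresponds to a constant per-trial probability of learning, whereas the Poisson concentrates mass around its mean.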